experimental controls as opportunity cost

A recent press release from NYU celebrates the release of some new video games that “train your brain.” The games were developed by “developmental psychologists, neuroscience researchers, learning scientists, and game designers,” so you can be sure they’re thrilling. There are two issues I want to discuss. The first is their choice of control. The article is paywalled so I don’t know all the details of the experimental setup, but the abstract gives you a fairly good idea:

lessons from the cuckoo’s egg

Cliff Stoll
Doubleday, 1989

There was a time before we defaulted to locking our doors. In the 1980s, the nascent internet was mostly used by research scientists and the military. The community was small and, as tends to be the case in small communities, the level of trust was high. Administrators didn’t invest much effort in locking down their systems, because the possibility of bad actors hadn’t been seriously considered.

book review: search patterns

Peter Morville and Jeffery Callender
O’Reilly, 2010

Data science projects can (very roughly) be divided into two types. The first is a study, aimed at providing quantitative insights to other business units. These typically involve building reports, calculating p-values, and answering product managers’ questions. Fifteen years ago the people doing this were called “statisticians” or “data analysts.” The second type of project feeds directly into customer-facing products. Examples include recommender systems, tagging/classification systems, and search engines.

visualizing piero scaruffi’s music database

Scaruffi’s music database

Since the mid-1980s, Piero Scaruffi has written essays on countless topics, and published them all for free on the internet – which he helped develop. You can learn more about him (and pretty much anything else that might interest you) on his legendary website.

the lowest form of wit: modelling sarcasm on reddit

A while back Kaggle introduced a database containing all the comments that were posted to reddit in May 2015. (The data is 30 GB in SQLite format, and you can still download it here.) Kagglers were encouraged to try NLP experiments with the data. One of the more interesting responses was a script that queried and displayed comments containing the /s flag, which indicates sarcasm.