Reflections on remote data science work
It’s been about a year and a half since I joined Automattic as a remote data scientist. This is the longest I’ve been in one position since finishing my PhD in 2012. This is also the first time I’ve...
Introducing pipe, The Automattic Machine Learning Pipeline
One of the main projects I’ve been working on over the past year. Data for Breakfast A generalized machine learning pipeline, pipe serves the entire company and helps Automatticians seamlessly build...
The most practical causal inference book I’ve read (is still a draft)
I’ve been interested in the area of causal inference in the past few years. In my opinion it’s more exciting and relevant to everyday life than more hyped data science areas like deep learning....
Hackers beware: Bootstrap sampling may be harmful
Bootstrap sampling techniques are very appealing, as they don’t require knowing much about statistics and opaque formulas. Instead, all one needs to do is resample the given data many times, and...
How to Increase Retention and Revenue in 1,000 Nontrivial Steps
One of the main projects I worked on last year. Data for Breakfast Recently, Automattic created a Marketing Data team to support marketing efforts with dedicated data capabilities. As we got started,...
Bootstrapping the right way?
Bootstrapping the right way is a talk I gave earlier this year at the YOW! Data conference in Sydney. You can now watch the video of the talk and have a look through the slides. The content of the talk...
A day in the life of a remote data scientist
Earlier this year, I gave a talk titled A Day in the Life of a Remote Data Scientist at the Data Science Sydney meetup. The talk covered similar ground to a post I published on remote data science...
Software commodities are eating interesting data science work
The passage of time makes wizards of us all. Today, any dullard can make bells ring across the ocean by tapping out phone numbers, cause inanimate toys to march by barking an order, or activate remote...
Many is not enough: Counting simulations to bootstrap the right way
Previously, I encouraged readers to test different approaches to bootstrapped confidence interval (CI) estimation. Such testing can be done by relying on the definition of CIs: Given an infinite number of...
Some highlights from 2020
My track record of posting here has been pretty poor in 2020, partly because of a bunch of content I’ve contributed elsewhere. In general, my guiding principle for posting is to only add stuff I’d...