    Don't Overfit

    Posted February 27, 2017

    Overfitting is a subject that isn’t discussed nearly enough. In machine learning, overfitting is when an algorithm learns a model so specialized that it is unable to generalize or handle new tasks...

    Kubernetes Cron Container

    Posted May 30, 2016

    So, you’ve got a Kubernetes cluster, and a cron task you need to run. Running it on your machine is an obviously bad idea, as is shoehorning it into another machine in your cloud fleet. Wouldn’t it...

    Undeleting Kafka Topics

    Posted May 19, 2016

    Accidentally deleted a topic, but hadn’t set delete.topic.enable in your file, and need to undo the topic deletion? Just delete the topic deletion in ZooKeeper! Just ssh into your...

    Simple, Clean Python Deploys with Anaconda

    Posted December 17, 2015

    Deploying Python projects can be a pain - especially with Python 3.5. Anaconda is the emerging replacement for pip/virtualenv deploys, with its scope expanding past Python packages to binaries like...

    Reducers - A Productive Stream Processing Pattern

    Posted October 24, 2015

    The software industry’s shift to stream processing is well underway. Open source systems like Kafka handle huge throughputs with surprisingly few resources, and aid heavily in decomposing...

    LDA Alpha and Beta Parameters - The Intuition

    Posted October 21, 2015

    Latent Dirichlet Allocation (LDA) is a fantastic tool for topic modeling, but its alpha and beta hyperparameters cause a lot of confusion for those coming to the model for the first time (say, via an...

    The Last Analytics Company

    Posted October 7, 2015

    The analytics market is crowded - there are countless companies offering nearly identical services. What’s worse, the technical task of recording analytics has become easy: many technologies...

    Python 3 on Spark - Return of the PYTHONHASHSEED

    Posted September 8, 2015

    If you’re anything like me, you’ve been stuck using Python 2 for the last 10 years, and for 8 of them you’ve been trying to switch to 3. Since the release of Spark and PySpark 1.4, Apache has started...

    Pipelining - A Successful Data Processing Model

    Posted March 10, 2015

    It’s finally time to implement that new personalization service — the one you’ve been pushing for for months. With it, your app will be serving up relevant, personalized content to every user. But the...