SQL Server for Data Scientists

SQL is not the sexiest language on the block, and many, if not most, data scientists I know prefer to stick to R and/or Python. Some common complaints I hear about SQL are: it is hard to read, and as a consequence large SQL statements are hard to debug. Version control with databases often requires additional tooling to …

Package development in R – Overview

Creating an R package is as easy as typing:

package.skeleton(name = "YourPackageName")

As you might have guessed, this function creates the basic file and folder structure you need to create an R package. You will get:

YourPackageName/
  DESCRIPTION
  man/
  NAMESPACE
  R/

You can also use RStudio to create a package with File > New Project …
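To make the listed layout concrete, here is a minimal Python sketch that creates the same top-level folders and files on disk. It is only an illustration under stated assumptions: `scaffold_r_package` is a hypothetical helper, not part of R. It leaves the files empty, whereas `package.skeleton()` also writes template content into them.

```python
from pathlib import Path
import tempfile


def scaffold_r_package(parent: Path, name: str) -> Path:
    """Create a bare R package layout (hypothetical helper, not part of R).

    Mirrors the top-level entries listed above: DESCRIPTION, NAMESPACE,
    man/ and R/. The files are created empty; package.skeleton() in R
    fills them with template content.
    """
    root = parent / name
    # R/ holds the package's R source files, man/ its .Rd documentation.
    for folder in ("R", "man"):
        (root / folder).mkdir(parents=True, exist_ok=True)
    # Package metadata files, created empty in this sketch.
    for filename in ("DESCRIPTION", "NAMESPACE"):
        (root / filename).touch()
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pkg = scaffold_r_package(Path(tmp), "YourPackageName")
        print(sorted(p.name for p in pkg.iterdir()))
        # prints ['DESCRIPTION', 'NAMESPACE', 'R', 'man']
```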

Agile Project Management for Data Science

Many data scientists are former academics who are used to working on specific and often quite narrow research problems for long periods of time, often years. With data science in high demand at the moment in nearly all industries, more and more researchers are switching from an academic career to one in the private …

Implementing QANet (Question Answering Network) with CNNs and Self-Attention

Apr 15, 2018

In this post, we will tackle one of the most challenging yet interesting problems in Natural Language Processing: Question Answering. We will implement Google's QANet in TensorFlow. Like its machine translation counterpart, the Transformer network, QANet doesn't use RNNs at all, which makes it faster to train and test. I'm assuming …
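QANet replaces recurrence with convolutions and self-attention. The following minimal pure-Python sketch covers only the self-attention half: single-head scaled dot-product attention as formulated for the Transformer. The names `self_attention`, `wq`, `wk` and `wv` are illustrative assumptions. This is not QANet's encoder block, which also stacks convolutions, layer normalization and feed-forward layers, and it is not the TensorFlow code from the post.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention over a sequence x.

    x has one row per token; wq, wk, wv project tokens to queries, keys
    and values. Each output row is a weighted average of all value rows,
    so every position sees the whole sequence at once, with no recurrence.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # With identity projections, each token attends most strongly to itself.
    print(self_attention(identity, identity, identity, identity))
```

Because every token attends to every other token in a single matrix product, the computation parallelizes across positions. That property is what makes RNN-free models such as QANet faster to train.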

What I wish I’d done differently as a data science manager

On centralizing siloed data
Apr 12, 2018

I still get nostalgic looking at the very first Pebbles. (Photo courtesy of Pebble's first Kickstarter)

In 2014, I joined Pebble, the smartwatch maker later acquired by Fitbit, to lead their data science & analytics team. I was interested in the challenges of managing a data organization at a …

Machine Learning for People Who Don’t Care About Machine Learning

Greg Lamp, former co-founder of the data science startup Yhat and current co-founder & CTO of Waldo, shares his thoughts on Machine Learning for those of us who just don't care about Machine Learning. What is Machine Learning? The definition I have come up with for Machine Learning is as follows… machine learning is using …

Office Ribbons

I am an absolute fan of adapting your work environment to your needs. Spending an hour setting up some shortcuts is virtually always a good time investment. Then you can easily drag your most-used commands into a new bar. You should be able to save a lot of time on tasks such as aligning objects …

Hierarchical Clustering on Categorical Data in R

Dissimilarity Matrix

Arguably, this is the backbone of your clustering. A dissimilarity matrix is a mathematical expression of how different, or distant, the points in a data set are from each other, so you can later group the closest ones together or separate the furthest ones, which is a core idea of clustering. This is the step where …
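For purely categorical data, one common choice is the simple matching dissimilarity: the fraction of attributes on which two records differ. Treat this as an assumption, since the post may use a different measure. With no missing values and equal weights, it coincides with Gower distance on categorical variables. The minimal pure-Python sketch below builds the matrix and then picks the closest pair, which is the first merge any agglomerative (hierarchical) method would make. The function names and toy records are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def simple_matching_dissimilarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Share of categorical attributes on which two records disagree (0 to 1)."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


def dissimilarity_matrix(rows: List[Sequence[str]]) -> List[List[float]]:
    # Symmetric n x n matrix with zeros on the diagonal.
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = simple_matching_dissimilarity(rows[i], rows[j])
    return d


def closest_pair(d: List[List[float]]) -> Tuple[int, int]:
    # The pair of records an agglomerative method would merge first.
    return min(combinations(range(len(d)), 2), key=lambda ij: d[ij[0]][ij[1]])


records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "square"),
]
D = dissimilarity_matrix(records)
# D[0][1] is 1/3, D[0][2] is 1.0, D[1][2] is 2/3
print(closest_pair(D))  # prints (0, 1)
```

Hierarchical clustering repeats this "merge the closest" step, updating distances between clusters according to the chosen linkage. The dissimilarity matrix therefore determines every grouping that follows.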