Package development in R – Overview

Creating an R package is as easy as typing: package.skeleton(name = "YourPackageName"). As you might have guessed, this function creates the basic file and folder structure you need for an R package. You will get a YourPackageName/ directory containing DESCRIPTION, man/, NAMESPACE, and R/. You can also use RStudio to create a package with File > New Project …

Agile Project Management for Data Science

Many data scientists are former academics who are used to working on specific and often quite narrow research problems for long periods of time, often years. With data science in high demand at the moment in nearly all industries, more and more researchers switch from an academic career to one in the private …

Implementing QANet (Question Answering Network) with CNNs and self attentions

Apr 15, 2018. In this post, we will tackle one of the most challenging yet interesting problems in Natural Language Processing: Question Answering. We will implement Google's QANet in TensorFlow. Just like its machine translation counterpart, the Transformer network, QANet doesn't use RNNs at all, which makes it faster to train and test. I'm assuming …

What I wish I’d done differently as a data science manager

On centralizing siloed data. Apr 12, 2018. (Photo caption: I still get nostalgic looking at the very first Pebbles. Photo courtesy of Pebble's first Kickstarter.) In 2014, I joined Pebble, the smartwatch maker later acquired by Fitbit, to lead their data science & analytics team. I was interested in the challenges of managing a data organization at a …

Machine Learning for People Who Don’t Care About Machine Learning

Greg Lamp, former co-founder of the data science startup Yhat and current co-founder & CTO of Waldo, shares his thoughts on Machine Learning for those of us who just don't care about Machine Learning. What is Machine Learning? The definition I have come up with for Machine Learning is as follows… machine learning is using …

Hierarchical Clustering on Categorical Data in R

Dissimilarity Matrix. Arguably, this is the backbone of your clustering. A dissimilarity matrix is a mathematical expression of how different, or distant, the points in a data set are from each other, so you can later group the closest ones together or separate the furthest ones, which is a core idea of clustering. This is the step where …
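
To make the idea of a dissimilarity matrix concrete, here is a minimal sketch. It is written in Python rather than R, and it assumes simple matching dissimilarity (the fraction of attributes on which two records disagree). The excerpt does not say which measure the post uses, so both the measure and the toy records below are illustrative assumptions, not the author's method.

```python
from itertools import combinations


def simple_matching_dissimilarity(a, b):
    """Fraction of attributes on which two categorical records disagree."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def dissimilarity_matrix(records):
    """Symmetric n x n matrix of pairwise dissimilarities, zero on the diagonal."""
    n = len(records)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = simple_matching_dissimilarity(records[i], records[j])
        matrix[i][j] = d
        matrix[j][i] = d
    return matrix


# Hypothetical toy data: (colour, size, shape)
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "square"),
]

matrix = dissimilarity_matrix(records)
for row in matrix:
    print([round(value, 2) for value in row])
```

The first two records differ in one attribute out of three, so they end up closest in the matrix; the first and third differ in all three, so they are the furthest apart. A hierarchical clustering step would merge the closest pair first.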