(i.e. how to explain machine learning algorithms to your grandma) As a recent graduate of the Flatiron School’s Data Science Bootcamp, I’ve been inundated with advice on how to ace technical interviews. A soft skill that keeps coming to the forefront is the ability to explain complex machine learning algorithms to a non-technical person. This … Read more: Machine Learning Algorithms In Layman’s Terms, Part 1
This tutorial will cover exploring and visualizing data through 2018 for the Minneapolis, MN bike sharing service NiceRide. Part of what makes R incredible is the number of great packages. Part of what makes packages like ggmap and gganimate great is how they build on existing packages. First step, as always, is to include the … Read more: Visualizing Bike Share Data (NiceRide)
Identifying the lanes of the road is a very common task that human drivers perform. It is important for keeping the vehicle within the constraints of the lane. It is also a very critical task for an autonomous vehicle to perform. A very simple lane detection pipeline is possible with simple computer vision techniques. This article will describe … Read more: Finding Lane Lines — Simple Pipeline For Lane Detection.
It’s been a dream of mine to break into the data science field, so prior to my move, I decided to add another project to my portfolio – a sleek Shiny dashboard. A brutal truth about this project was that I had to invest time in finding my own data and deciding what to do … Read more: My Shiny Dashboard, Milwaukee Beer
Now that you know the basics of convolution, we can start building one! Preparing the data: This part is useful only if you want to use your own data, or data that can’t be found on the web easily, to build a convolutional neural network that may be better adapted to your needs. Otherwise, here is the … Read more: All the Steps to Build your first Image Classifier (with code)
Solution: The Setup Jupyter Notebook Extension. Rather than just complaining about the problem (it’s easy to be a critic but a lot harder to do something positive), I decided to see what could be done with Jupyter Notebook extensions. The result is an extension that, on opening a new notebook, automatically: Creates a template to … Read more: Set Your Jupyter Notebook up Right with this Extension
Let’s say you’ve developed a predictive model in R, and you want to embed predictions (scores) from that model into another application (like a mobile or Web app, or some automated service). If you expect a heavy load of requests, R running on a single server isn’t going to cut it: you’ll need some kind … Read more: An architecture for real-time scoring with R
Suppose that you have a sample of a variable of interest, e.g. the heights of men in a certain population, and for some obscure reason you are interested not in the mean height μ but in its square μ². How would you make inferences about μ², e.g. test a hypothesis or calculate a confidence interval? The delta … Read more: The delta method and its implementation in R
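As a minimal sketch of the question posed in the excerpt above (and not the linked post's own R implementation), the first-order delta method approximates the standard error of g(x̄) = x̄² by |g′(x̄)|·s/√n = |2x̄|·s/√n, which gives an approximate normal confidence interval for μ². The function name `delta_method_mu_squared`, the simulated heights, and the 1.96 normal quantile are assumptions made purely for illustration; Python is used here only because the listing itself contains no code.

```python
import math
import random
import statistics


def delta_method_mu_squared(sample, z=1.96):
    """Approximate inference on mu^2 via the first-order delta method.

    Hypothetical helper for illustration. Returns the point estimate,
    its approximate standard error, and a normal-theory interval.
    """
    n = len(sample)
    mean = statistics.fmean(sample)   # estimate of mu
    sd = statistics.stdev(sample)     # sample standard deviation
    estimate = mean ** 2              # plug-in estimate of mu^2
    # g(mu) = mu^2 has derivative g'(mu) = 2 * mu, so
    # se(g(mean)) is approximately |2 * mean| * sd / sqrt(n).
    se = abs(2 * mean) * sd / math.sqrt(n)
    return estimate, se, (estimate - z * se, estimate + z * se)


# Simulated heights in cm (assumed data, not from the original post).
random.seed(0)
heights = [random.gauss(175, 7) for _ in range(200)]
estimate, se, ci = delta_method_mu_squared(heights)
print(f"estimate of mu^2: {estimate:.1f}")
print(f"approximate 95% CI: ({ci[0]:.1f}, {ci[1]:.1f})")
```

The approximation relies on the sample mean being roughly normal and on μ not being close to zero, where the derivative 2μ vanishes and a first-order expansion is no longer informative.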
The US Powerball lottery hysteria took another step when no one won the big jackpot in the last draw that took place on October 20, 2018. So, the total jackpot is now 2.22 billion dollars. I am sure that you want to win this jackpot. I myself want to win it. Actually, there are two different … Read more: Powerball demystified
The R Journal is the open access, refereed journal of the R project for statistical computing. It features short to medium length articles covering topics that should be of interest to users or developers of R. Christoph Weiss, Gernot Roetzer and I have joined forces to write an R package and the accompanying paper: Forecast … Read more: R Journal publication
The earliest report of a clinical trial is probably provided in the Book of Daniel. Daniel and a group of other Jewish people who stayed at the palace of the king of Babylon did not want to eat the king’s non-Kosher food and preferred a vegetarian diet. To show that a vegetarian and Kosher diet is healthier, … Read more: A brief history of clinical trials
So I’ve been back in Australia for five months now. While things have been very busy in my new role at Nous Group, it’s not so busy that I’ve failed to notice there’s a Federal election due some time by November this year. I’m keen to apply some of the techniques I used in New … Read more: Bayesian state space modelling of the Australian 2019 election by @ellis2013nz
A brief introduction to Markov chains, by Joseph Rocca (19 min read). In 1998, Lawrence Page, Sergey Brin, Rajeev Motwani and Terry Winograd published “The PageRank Citation Ranking: Bringing Order to the Web”, an article in which they introduced the now-famous PageRank algorithm that was at the origin of Google.
Suppose that you are being interviewed for a data scientist role. You are asked about logistic regression, and you answer all sorts of questions: how to run it in Python, how you would perform feature selection, and how you would use it for prediction. For the last question you answer that if you have the estimates of the regression … Read more: What is logistic in the logistic regression?
Some of the 300,000+ images I captured while leaving the machine running for a few days. If you are a hobbyist or researcher working on an AI project, it’s quite likely that you’ve run into the unfortunate situation of having to generate a large amount of labeled training data. Of course, having spent all your funding … Read more: How I created over 100,000 labeled LEGO training images
Parameter Sharing: You might have noticed another key difference between Figure 1 and Figure 3. In the former, multiple different weights are applied to the different parts of an input item to generate a hidden layer neuron, which in turn is transformed using further weights to produce an output. There seems to be a lot of … Read more: Recurrent Neural Networks
Whether you are a Data Engineer or a Data Scientist, getting up and running with Apache Spark is a relatively easy process from a development perspective. It does require a slight change in thinking, though, and understanding how Spark executes code and how it functions on our clusters is an important part of being efficient … Read more: How does Apache Spark run on a cluster?
Roz King just wrote an interesting article on binning data (a common data analytics step) in a database. He compares a case-based approach (where the bin divisions are stuffed into code) with a join-based approach. He shares code and timings. Best of all: rquery gets some attention and turns out to be the dominant … Read more: Binning Data in a Database
In a recent post, I presented some of the theory underlying ROC curves, and outlined the history leading up to their present popularity for characterizing the performance of machine learning models. In this post, I describe how to search CRAN for packages to plot ROC curves, and highlight six useful packages. Although I began with … Read more: Some R Packages for ROC Curves