A Great Public Health Conspiracy?

The Facts on Public Water Fluoridation: With any health topic, especially one that has attracted controversy, we must be careful about where we get our data. Even studies in peer-reviewed journals can have biases, intentional or not. Therefore, the best practice for reviewing medical evidence is to look at meta-analyses, reviews that evaluate results from dozens …

Canny Edge Detection Step by Step in Python — Computer Vision

Noise Reduction: Because the underlying mathematics is based mainly on derivatives (cf. Step 2: Gradient calculation), edge detection results are highly sensitive to image noise. One way to remove noise from the image is to apply a Gaussian blur to smooth it. To do so, the image convolution technique is applied …
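The smoothing step described in this excerpt can be sketched in a few lines. This is a minimal pure-Python illustration, not the article's own code: the function names `gaussian_kernel` and `blur`, the 5x5 kernel size, `sigma = 1.0`, and the edge-replication border handling are all assumptions made for the example.

```python
import math


def gaussian_kernel(size=5, sigma=1.0):
    """Return a size x size Gaussian kernel normalised to sum to 1."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    k = size // 2
    kernel = [
        [math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
         for x in range(-k, k + 1)]
        for y in range(-k, k + 1)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def blur(image, kernel):
    """Smooth a 2D list of floats with the kernel.

    Border pixels are replicated (an assumption of this sketch). The
    Gaussian kernel is symmetric, so sliding it without flipping gives
    the same result as a true convolution.
    """
    h, w = len(image), len(image[0])
    k = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-k, k + 1):
                for dj in range(-k, k + 1):
                    ii = min(max(i + di, 0), h - 1)  # clamp row index
                    jj = min(max(j + dj, 0), w - 1)  # clamp column index
                    acc += image[ii][jj] * kernel[di + k][dj + k]
            out[i][j] = acc
    return out


# A 5x5 image with a single bright "noise" pixel in the centre.
noisy = [[0.0] * 5 for _ in range(5)]
noisy[2][2] = 9.0
smoothed = blur(noisy, gaussian_kernel(5, 1.0))
```

After blurring, the isolated spike is spread over its neighbours, which is why derivative-based edge detectors respond to it less strongly.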

Weekly Selection — Jan 25, 2019

Attn: Illustrated Attention, by Raimi Karim (12 min read). For decades, Statistical Machine Translation was the dominant translation model, until the birth of Neural Machine Translation (NMT). NMT is an emerging approach to machine translation that attempts to build and train a single, large neural network that reads an input text and outputs a translation …

What the world agrees with by @ellis2013nz

A serious, decades-long attempt to understand different peoples’ values. David Hood (@Thoughfulnz) has been posting some interesting snippets of analysis using the World Values Survey data (like this example). This inspired me to have a look at the data myself, something that’s been on my to-do list for years. I have analysed it before, but …

Artificial Intelligence, Music, and the Human Sublime

Hannah Fry, in her book “Hello World”, talks of how computers can be programmed to mimic music nearly perfectly. A program was written that perfectly mimicked Bach’s musical lexicon, right down to the notes, words, and phrases he used in all his body of work. Even the most astute musician could never get this “data’s eye …

How Frequently Do People Use Different Drugs?

One of the most frustrating things about Tell Your Children, the anti-marijuana tract from former New York Times reporter (and spy novelist!) Alex Berenson, is that its most interesting points are almost entirely unrelated to its thesis. That central idea (roughly, that marijuana causes psychosis and schizophrenia, and psychosis causes violence, therefore marijuana causes violence) is compelling …

How to develop data products and not die trying

Authors: David Flórez Fernández, Data and AI Solution Architect @ Microsoft; Pablo Peris, Digital Architect @ Microsoft. Companies struggle to thrive with Analytics projects. In the present days of data accumulation there is a global craving for the innovative and business use of AI at all levels. Maybe it’s time to stop and reflect on that burning …

Understanding Machine Learning on Point Clouds through PointNet++

Introduction: Data can take on a variety of forms. For processing visual information, images are extremely common. Images store a two-dimensional grid of pixels that often represent our three-dimensional world. Some of the most successful advances in machine learning have come from problems involving images. However, for capturing data in 3D directly, it is less …

My #TidyverseDevDay and #RStudioConf 2019 Reflections!

This was my second RStudio Conference following last year’s edition in San Diego! In addition, at Tidyverse Developer Day I got a really cool chance to work on issues and contribute to making the Tidyverse better. This post won’t be a complete overview of the talks at the conference (others have already released some good blog posts on that note: …

Right Now It’s KDA…Asset Allocation.

This post will introduce KDA Asset Allocation. KDA, i.e. Kipnis Defensive Adaptive Asset Allocation, is a combination of Wouter Keller’s and TrendXplorer’s Defensive Asset Allocation, along with ReSolve Asset Management’s Adaptive Asset Allocation. This is an asset allocation strategy with a profile unlike most tactical asset allocation strategies I’ve seen before (namely, it barely …

A Neural Algorithm of Artistic Style: A Modern Form of Creation

Understanding Convolutional Neural Networks: Seeing as convolutional neural networks are the underlying concept for the entirety of N.A.A.S., it is important to have a clear idea of what they do. If you already know about CNNs, that’s great; move on to the next section. Conv nets are a type of artificial neural network which lever a …

Data Science vs Decision Science

What’s the difference between a Data Scientist and a Decision Scientist? At Instagram, we have many different job roles that analyze data. A few of the ‘data’ job titles include: Data Scientists, Analysts, Researchers and Growth marketing. But there’s often a lot of confusion between the roles of Data Scientist vs Decision Scientist. We have …

Quick Hit: Automating Production Graphics Uploads in R Markdown Documents with googledrive

As someone who measures all kinds of things on the internet as part of his $DAYJOB, I can say with some authority that huge swaths of organizations are using cloud-services such as Google Apps, Dropbox and Office 365 as part of their business process workflows. For me, one regular component that touches the “cloud” is …

Data science unicorns might be right under your nose

Our society produces data at an astounding rate. By some estimates, as many as 2.5 million terabytes of new information appear on servers around the world every day. That’s as much data as could fit on a billion iPhones, a quantity of zeros and ones so large you need eighteen zeros just to count it. …

Neural Networks Intuitions: 2. Dot product, Gram Matrix and Neural Style Transfer

Problem 2, Generate Style: The problem is to produce an image which contains the style of the style image. Solution: To extract the style of an image (or more specifically to compute the style loss), we need something called a Gram matrix. Wait, what is a Gram matrix? Before talking about how to compute the style …
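For readers wondering what a Gram matrix looks like in practice, here is a minimal pure-Python sketch. It assumes the usual definition (the matrix of pairwise dot products between flattened feature maps); the function name `gram_matrix` and the toy `features` values are hypothetical, and any normalisation that a particular style-loss formulation applies is deliberately left out.

```python
def gram_matrix(features):
    """Return the C x C matrix of dot products between C feature vectors.

    `features` is a list of C channels, each already flattened to a
    list of N activations (an assumption of this sketch).
    """
    return [
        [sum(a * b for a, b in zip(fi, fj)) for fj in features]
        for fi in features
    ]


# Two toy "channels" with three activations each.
features = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 1.0],
]
G = gram_matrix(features)
# G == [[5.0, 2.0], [2.0, 2.0]]
```

Entry `G[i][j]` is large when channels `i` and `j` tend to activate together, and the matrix is symmetric because the dot product is.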

DeepTraffic – DQN Tuning for Traffic Navigation (75.01 MPH Solution)

Crowdsourced Hyperparameter Tuning Competition. In today’s article, we are going to approach a traffic navigation problem with Reinforcement Learning (RL). In order to do so, we will revise our RL skills and participate in the DeepTraffic competition hosted by MIT Deep Learning. Americans spend 8 billion hours stuck in traffic every year. Deep neural networks can help! …

3 Tips to Improving Your Data Science Workflow

3. Optimising Parameters Efficiently: When I first started learning to apply machine learning, I would manually change the parameter inputs one by one and take a note of the results for my final output. Although this helped my understanding of the parameters, it was time-consuming and inefficient. As time has gone on, I have …

Reinforcement Learning with Exploration by Random Network Distillation

Ever since the seminal DQN work by DeepMind in 2013, in which an agent successfully learned to play Atari games at a level above that of an average human, Reinforcement Learning (RL) has been making headlines frequently. From Atari games to robotics, and the amazing defeat of world Go champion Lee Sedol by AlphaGo, it …

Introduction to ResNets

‘We need to go Deeper’ meme: classical CNNs do not perform well as the depth of the network grows past a certain threshold. ResNets allow for the training of deeper networks. This article is based on Deep Residual Learning for Image Recognition by He et al. [2] (Microsoft Research): https://arxiv.org/pdf/1512.03385.pdf In 2012, Krizhevsky et al. …

What is AI bias?

The AI bias trouble starts, but doesn’t end, with definition. “Bias” is an overloaded term which means remarkably different things in different contexts. Here are just a few definitions of bias for your perusal. In statistics: Bias is the difference between the expected value of an estimator and its estimand. That’s awfully technical, so allow …
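The statistical definition quoted in this excerpt can be written compactly. The notation is an assumption of this note rather than something taken from the article: write θ for the estimand and θ̂ for the estimator.

```latex
\operatorname{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta
```

An estimator is called unbiased when this quantity is zero.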

How to beat Google’s AutoML – Hyperparameter Optimisation with Flair

This is a follow-up to our previous post about State of the Art Text Classification. We explain how to do hyperparameter optimisation using Flair to achieve optimal results in text classification outperforming Google’s AutoML Natural Language. What is hyperparameter optimisation and why can’t we simply do it by hand? Hyperparameter optimisation (or tuning) is the process …

Python’s Collections Module — High-performance container data types.

Let us now turn to the actual objective of this article, which is to get to know Python’s collections module. This is just an overview; for detailed explanations and examples, please refer to the official Python documentation. Collections Module: collections is a built-in Python module that implements specialized container datatypes providing …
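As a quick taste of those container types, the sketch below exercises four of them from the standard library: `Counter`, `defaultdict`, `deque`, and `namedtuple`. The sample data and variable names are made up for illustration and do not come from the article.

```python
from collections import Counter, defaultdict, deque, namedtuple

words = ["spam", "egg", "spam", "ham", "spam", "egg"]

# Counter: tally hashable items.
counts = Counter(words)
top = counts.most_common(1)  # the single most frequent word with its count

# defaultdict: missing keys get a default value (here, an empty list).
by_length = defaultdict(list)
for w in words:
    by_length[len(w)].append(w)

# deque with maxlen: a bounded buffer that discards the oldest items.
recent = deque(maxlen=3)
for w in words:
    recent.append(w)

# namedtuple: a lightweight tuple subclass with named fields.
Point = namedtuple("Point", ["x", "y"])
p = Point(x=1, y=2)
```

Each of these replaces a common hand-written pattern (a counting loop, a key-existence check, a manual sliding window, an index-only tuple) with a purpose-built type.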

Zen and The Art of Competing Against MBA’s

“I appreciate your ambition, but we’re looking for an MBA…” My senior manager smiled and indicated the topic was closed. Despite the fact I was effectively running our direct mail program in the absence of my recently departed boss, the door was closed and locked. I quit two months later. Within three years, I was promoted …

stringfix : new R package for string manipulation in a %>% way

I usually write here in French and mainly report on French hospitals’ data management and the statistical tasks it implies. As today’s post is about a new package I have created, I’ll be writing in English. The package is called stringfix because it uses infix operators to manipulate character strings. This post is an …

Animating Data Transformations III – separate()

We have recently published two blog posts on animating data transformations. The first, Animating Data Transformations, illustrated the spread() and gather() functions for going between wide and tall representations of data. The second, Animating Data Transformations II, covered the unnest() function for transforming a list column into a one value per row format. Today, we’re going to …

Time Series of Price Anomaly Detection

Anomaly detection identifies data points that do not fit well with the rest of the data. Also known as outlier detection, anomaly detection is a data mining process used to determine types of anomalies found in a data set and to determine details about their occurrences. Automatic anomaly detection is critical in …

Tel Aviv artists: build yourself a mapping app

tl;dr: I went from experimenting with mapping libraries to building a reusable mapping app. This is how I did it and how you can re-use it. Intro: As a data scientist, most of my work stays behind the scenes. When training models, the farthest I reach in exposure is deploying a simple Flask web-app as REST …

Get Started with Support Vector Machines (SVM)

A hands-on tutorial with 4 examples on how to implement support vector machines for classification. In a previous post, I introduced the theory of support vector machine (SVM). Now, I will further explain how SVMs work with four different exercises! The first part will show how to perform classification with …

AI Thinks Rachel Maddow Is A Man (and this is a problem for all of us)

A data-driven review of AI bias in production systems. In 2011, IBM Watson made headlines when it beat Jeopardy legends Ken Jennings and Brad Rutter in a $1M match. In Final Jeopardy, Jennings admitted defeat by writing “I, for one, welcome our new computer overlords.” That was in 2011, when a good score in a …

From prediction to decision making

Why your predictions might be falling short (opinion). “There are a number of gaps between making a prediction and making a decision” (Susan Athey [1]). Correlation does not imply causation: this is one of the most repeated phrases in statistical testing. It is repeated for a reason, I believe, and that …

Information Flows in You — And Your Friends

Upper limits of predictability using social media information even if a person has deleted their social media presence. You’ve had enough. Of baby pictures, of political rants by ‘friends’, even of cute cat pictures! Of fearing for your privacy and future career security. You decide to delete your accounts on Facebook, Twitter and Instagram. And you’re …

Introducing Feast

Google’s New Feature Store for Machine Learning Applications. Feature extraction and storage is one of the most important and often overlooked aspects of machine learning solutions. Features play a key role helping machine learning models to process and understand datasets for training and production. If you are building a single machine learning model, feature extraction …

LondonR calling

It’s the New Year and we’re kicking off 2019 with our first LondonR! The meetup took place on the 15th of January, and we were delighted to have about 100 people in attendance. With excellent speakers lined up and a free bar for networking, we started 2019 with a BANG! Please find all the presentations here. Dawid Kaledkowski, ClickMeeting – …

If wealth had anything to do with intelligence…

…the richest man on earth would have a fortune of no more than $43,000! If you don’t believe me read this post! Have you ever thought about the distribution of wealth as a function of some quality? Rich people in particular pride themselves on extraordinary abilities, so that they somehow “deserve” their wealth. Now “abilities” is …

Simple Soybean Price Regression with Fast.ai Random Forests

As a student in the fast.ai Machine Learning for Coders MOOC¹ with an interest in agriculture, the first application of the fast.ai random forest regression library that came to mind was the prediction of soybean prices from historical data. Soybeans are a global commodity and their price-per-bushel has varied a great deal over the past decade. …

Level up your Data Visualizations with quick plot

K-Means plot for Spotify. Data visualization is an essential part of a Data Scientist’s workflow. It allows us to visually understand our problem, analyze our models, and provide deep, meaningful understanding to communities. As Data Scientists, we always look for new ways of improving our data science workflow. Why should I use this over …

Playlist Classification on Spotify using KNN and Naive Bayes Classification

One day, I thought it would be cool if Spotify helped me pick a playlist when I like a song. The idea is to tap the plus button when my phone is locked and have Spotify add the song to one of my playlists rather than the library, so that I don’t go into the app …

How to prepare data for NLP (text classification) with Keras and TensorFlow

In the past, I have written and taught quite a bit about image classification with Keras (e.g. here). Text classification isn’t too different in terms of using the Keras principles to train a sequential or functional model. You can even use Convolutional Neural Nets (CNNs) for text classification. What is very different, however, is how to …

Onboard and Offboard Data Manipulation in Flexdashboard

Harrison Schramm is a Professional Statistician and Non-Resident Senior Fellow at the Center for Strategic and Budgetary Assessments. The Shiny set of tools, and, by extension, Flexdashboard, give professional analysts tools to rapidly put interactive versions of their work in the hands of clients. Frequently, an end user will interact with data by either uploading …

EEG Motor Imagery Classification in Node.js with BCI.js

Detecting brainwaves associated with imagined movements. Brain-computer interfaces (BCIs) allow for the control of computers and other devices using only your thoughts. A popular way to achieve this is with motor imagery detected with electroencephalography (EEG). This tutorial will serve as an introduction to the detection and classification of motor imagery. I’ve broken it down …