Weekly Selection — Jan 25, 2019

Attn: Illustrated Attention

By Raimi Karim (12 min read). For decades, Statistical Machine Translation was the dominant translation model, until the birth of Neural Machine Translation (NMT). NMT is an emerging approach to machine translation that attempts to build and train a single, large neural network that reads an input text and outputs a translation … Read more

What the world agrees with by @ellis2013nz

A serious, decades-long attempt to understand different peoples’ values David Hood (@Thoughfulnz) has been posting some interesting snippets of analysis using the World Values Survey data (like this example). This inspired me to have a look at the data myself; something that’s been on my to-do list for years. I have analysed it before, but … Read more

How Frequently Do People Use Different Drugs?

One of the most frustrating things about Tell Your Children, the anti-marijuana tract from former New York Times reporter (and spy novelist!) Alex Berenson, is that its most interesting points are almost entirely unrelated to its thesis. That central idea — roughly, that marijuana causes psychosis and schizophrenia, and psychosis causes violence, therefore marijuana causes violence—is compelling … Read more

R Conference Costs v2.0

Last year we gave you a price breakdown of some of the most popular R conferences around the globe for 2017. We’re going to do it again for 2018. Remember, you can get up-to-date information on upcoming conferences via our GitHub page. It’s important to note that these costs are the prices of an industry … Read more

How to develop data products and not die trying

Authors: David Flórez Fernández, Data and AI Solution Architect @ Microsoft, and Pablo Peris, Digital Architect @ Microsoft. Companies struggle to thrive with Analytics projects. In these days of data accumulation there is a global craving for the innovative and business use of AI at all levels. Maybe it’s time to stop and reflect on that burning … Read more

Understanding Machine Learning on Point Clouds through PointNet++

Introduction Data can take on a variety of forms. For processing visual information, images are extremely common. Images store a two-dimensional grid of pixels that often represent our three-dimensional world. Some of the most successful advances in machine learning have come from problems involving images. However, for capturing data in 3D directly, it is less … Read more

Keeping up to date with R news

I’ve now given my talk about “How to be a resilient R user” three times, at R-Ladies Strasbourg and R-Ladies Paris in person, and at R-Ladies San José via Google Hangouts. It was fun! I covered part of the content of that talk in a blog post about where to get R help. Today, it’s … Read more

My #TidyverseDevDay and #RStudioConf 2019 Reflections!

This was my second RStudio Conference following last year’s edition in San Diego! In addition, at Tidyverse Developer Day I got a really cool chance to work on issues and contribute to making the Tidyverse better. This post won’t be a complete overview of the talks at the conference (others have already released some good blog posts on that note: … Read more

more concentration, everywhere

Although it may sound like an excessive notion of optimality, one can hope to obtain an estimator δ of a unidimensional parameter θ that is always closer to θ than any other estimator. In distribution if not almost surely, meaning the cdf of (δ-θ) is steeper than for other estimators enjoying the same cdf at … Read more

Right Now It’s KDA…Asset Allocation.

This post will introduce KDA Asset Allocation. KDA, i.e. Kipnis Defensive Adaptive Asset Allocation, is a combination of Wouter Keller’s and TrendXplorer’s Defensive Asset Allocation, along with ReSolve Asset Management’s Adaptive Asset Allocation. This is an asset allocation strategy with a profile unlike most tactical asset allocation strategies I’ve seen before (namely, it barely … Read more

A Neural Algorithm of Artistic Style: A Modern Form of Creation

Understanding Convolutional Neural Networks Seeing as convolutional neural networks are the underlying concept for the entirety of N.A.A.S., it is important to have a clear idea of what they do. If you already know about CNNs, that’s great. Move on to the next section. Conv nets are a type of artificial neural networks which lever a … Read more

Data Science vs Decision Science

What’s the difference between a Data Scientist and a Decision Scientist? At Instagram, we have many different job roles that analyze data. A few of the ‘data’ job titles include: Data Scientists, Analysts, Researchers and Growth marketing. But there’s often a lot of confusion between the roles of Data Scientist vs Decision Scientist. We have … Read more

Quick Hit: Automating Production Graphics Uploads in R Markdown Documents with googledrive

As someone who measures all kinds of things on the internet as part of his $DAYJOB, I can say with some authority that huge swaths of organizations are using cloud-services such as Google Apps, Dropbox and Office 365 as part of their business process workflows. For me, one regular component that touches the “cloud” is … Read more

Neural Networks Intuitions: 2. Dot product, Gram Matrix and Neural Style Transfer

Problem 2: Generate Style. The problem is to produce an image which contains the style of the style image. Solution: To extract the style of an image (or more specifically to compute the style loss), we need something called a Gram matrix. Wait, what is a Gram matrix? Before talking about how to compute the style … Read more
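
As a rough illustration of the Gram matrix mentioned in this excerpt (this is not the linked article's own code), the sketch below assumes each feature map of a layer has already been flattened to a list of numbers. The function name `gram_matrix` and the toy `features` values are hypothetical. Entry (i, j) of the result is the dot product of feature map i with feature map j.

```python
def gram_matrix(features):
    """Return the Gram matrix of a list of flattened feature maps.

    `features` is assumed to be a list of C equal-length lists of numbers,
    one list per channel. The result is a C x C nested list whose (i, j)
    entry is the dot product of channel i and channel j.
    """
    return [
        [sum(a * b for a, b in zip(f_i, f_j)) for f_j in features]
        for f_i in features
    ]


# Two toy "channels", each flattened to three values (made-up numbers).
features = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 1.0],
]

gram = gram_matrix(features)
print(gram)
```

In style transfer the same computation is usually done with a single matrix product on a tensor library; the nested loops here only make the definition explicit.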

DeepTraffic – DQN Tuning for Traffic Navigation (75.01 MPH Solution)

Crowdsourced Hyperparameter Tuning Competition. In today’s article, we are going to approach a traffic navigation problem with Reinforcement Learning (RL). In order to do so, we will revise our RL skills and participate in the DeepTraffic competition hosted by MIT Deep Learning. Americans spend 8 billion hours stuck in traffic every year. Deep neural networks can help! … Read more

3 Tips to Improving Your Data Science Workflow

3. Optimising Parameters Efficiently. When I first started learning to apply machine learning, I would manually change the parameter inputs one by one and take a note of the results for my final output. Although this helped my understanding of the parameters, it was time-consuming and inefficient. As time has gone on, I have … Read more

Reinforcement Learning with Exploration by Random Network Distillation

Ever since the seminal DQN work by DeepMind in 2013, in which an agent successfully learned to play Atari games at a level higher than that of an average human, Reinforcement Learning (RL) has been making headlines frequently. From Atari games to robotics, and the amazing defeat of world Go champion Lee Sedol by AlphaGo, it … Read more

Introduction to ResNets

(Figure: ‘We need to go Deeper’ meme.) Classical CNNs do not perform well as the depth of the network grows past a certain threshold. ResNets allow for the training of deeper networks. This article is based on Deep Residual Learning for Image Recognition by He et al. [2] (Microsoft Research): https://arxiv.org/pdf/1512.03385.pdf In 2012, Krizhevsky et al. … Read more

What is AI bias?

The AI bias trouble starts, but doesn’t end, with definition. “Bias” is an overloaded term which means remarkably different things in different contexts. Here are just a few definitions of bias for your perusal. In statistics: Bias is the difference between the expected value of an estimator and its estimand. That’s awfully technical, so allow … Read more
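
To make the statistical definition quoted in this excerpt concrete, here is a small sketch (not taken from the linked article). It assumes a made-up two-value population and a sample size of 2, and it uses the divide-by-n variance formula as the estimator purely as a familiar example of a biased estimator. Because the population is tiny, the expected value can be computed exactly by enumerating every possible sample.

```python
from itertools import product
from statistics import mean, pvariance

# Hypothetical population: the values 0.0 and 1.0, each equally likely.
population = [0.0, 1.0]
n = 2  # assumed sample size

# Estimand: the true population variance.
true_variance = pvariance(population)

# Enumerate every equally likely ordered sample of size n (with replacement).
samples = list(product(population, repeat=n))

# Estimator: the divide-by-n variance of each sample, averaged over all samples.
expected_estimate = mean(pvariance(sample) for sample in samples)

# Bias = expected value of the estimator minus the estimand.
bias = expected_estimate - true_variance
print(true_variance, expected_estimate, bias)
```

The bias comes out negative, meaning this estimator underestimates the variance on average; dividing by n - 1 instead of n is the usual correction.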

How to beat Google’s AutoML – Hyperparameter Optimisation with Flair

This is a follow-up to our previous post about State of the Art Text Classification. We explain how to do hyperparameter optimisation using Flair to achieve optimal results in text classification outperforming Google’s AutoML Natural Language. What is hyperparameter optimisation and why can’t we simply do it by hand? Hyperparameter optimisation (or tuning) is the process … Read more

Python’s Collections Module — High-performance container data types.

Let us now hop over to the actual objective of this article, which is to get to know Python’s collections module. This is just an overview; for detailed explanations and examples please refer to the official Python documentation. Collections Module: collections is a built-in Python module that implements specialized container datatypes providing … Read more
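
For readers who have not used the module, the sketch below shows four of the container types from the standard library's `collections` module. It is only an illustration with made-up data, not code from the linked article, and the variable names are arbitrary.

```python
from collections import Counter, defaultdict, deque, namedtuple

# Counter: tally hashable items.
words = "the quick brown fox jumps over the lazy dog the end".split()
word_counts = Counter(words)
top_word = word_counts.most_common(1)[0]  # (word, count) pair

# defaultdict: missing keys get a default value, here an empty list.
by_initial = defaultdict(list)
for fruit in ["apple", "avocado", "banana"]:
    by_initial[fruit[0]].append(fruit)

# deque with maxlen: a bounded buffer that drops the oldest items.
recent = deque(maxlen=3)
for i in range(5):
    recent.append(i)

# namedtuple: a lightweight tuple with named fields.
Point = namedtuple("Point", ["x", "y"])
origin = Point(x=0, y=0)

print(top_word, dict(by_initial), list(recent), origin)
```

Each of these replaces a common hand-written pattern, for example a `dict` with manual key checks or a list that is trimmed after every append.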

Zen and The Art of Competing Against MBA’s

“I appreciate your ambition, but we’re looking for an MBA…” My senior manager smiled and indicated the topic was closed. Despite the fact I was effectively running our direct mail program in the absence of my recently departed boss, the door was closed and locked. I quit two months later. Within three years, I was promoted … Read more

stringfix : new R package for string manipulation in a %>% way

I usually write around here in French and mainly report on French hospitals’ data management and the statistical tasks it implies. As today’s post is about a new package I have created, I’ll be writing in English. The package is called stringfix because it uses infix operators to manipulate character strings. This post is an … Read more

Let’s call it tidysearch

R became 25 years old last year, and yet it’s only in relatively recent years that the language has really taken off with numerous conferences every year driven by a passionate and vibrant community of users. A large part of this has been driven by an ecosystem of R packages called the Tidyverse, which many … Read more

Animating Data Transformations III – separate()

We have recently published two blog posts on animating data transformations. The first, Animating Data Transformations, illustrated the spread() and gather() functions for going between wide and tall representations of data. The second, Animating Data Transformations II, covered the unnest() function for transforming a list column into a one value per row format. Today, we’re going to … Read more

Le Monde puzzle [#1081]

A “he said-she said” Le Monde mathematical puzzle (again in the spirit of the famous Singapore high-school birthdate problem): Abigail and Corentin are both given a positive integer, a and b, such that a+b is either 19 or 20. They are asked one after the other and repeatedly if they are sure of the other’s … Read more

Time Series of Price Anomaly Detection

Anomaly detection detects data points that do not fit well with the rest of the data. Also known as outlier detection, anomaly detection is a data mining process used to determine types of anomalies found in a data set and to determine details about their occurrences. Automatic anomaly detection is critical in … Read more

Introduction to Data Analysis in RStudio

I’ve just started doing one of my favourite parts of my job – teaching a term of Data Analysis in R to about three hundred Bioscientists in their first year of higher education. My blog last week included a figure of their expected level of enjoyment. However, I find they become very competent in both statistics … Read more

Get Started with Support Vector Machines (SVM)

A hands-on tutorial with 4 examples on how to implement support vector machines for classification. In a previous post, I introduced the theory of support vector machines (SVM). Now, I will further explain how SVMs work with four different exercises! The first part will show how to perform classification with … Read more

AI: Why it Actually Makes a Difference

Simple Human Decisions are Hard for Computers: Think of all the decisions you make in a single day. From what you eat in the morning to how you get home from work at night. Many things you do right are like second nature by now, but they’re actually really hard to do. For instance, how … Read more

From prediction to decision making

Why your predictions might be falling short (opinion). “There are a number of gaps between making a prediction and making a decision” (Susan Athey [1]). Correlation does not imply causation: this is one of the most repeated phrases in statistical testing. It’s done so for a reason, I believe, and that … Read more

Information Flows in You — And Your Friends

Upper limits of predictability using social media information, even if a person has deleted their social media presence. You’ve had enough. Of baby pictures, of political rants by ‘friends’, even of cute cat pictures! Of fearing for your privacy and future career security. You decide to delete your accounts on Facebook, Twitter and Instagram. And you’re … Read more

Introducing Feast

Google’s New Feature Store for Machine Learning Applications Feature extraction and storage is one of the most important and often overlooked aspects of machine learning solutions. Features play a key role helping machine learning models to process and understand datasets for training and production. If you are building a single machine learning model, feature extraction … Read more

LondonR calling

It’s the New Year and we’re kicking off 2019 with our first LondonR! The meetup took place on the 15th of January, and we were delighted to have about 100 people in attendance. With excellent speakers lined-up and a free bar for networking, we started 2019 with a BANG! Please find all the presentations here. Dawid Kaledkowski, ClickMeeting – … Read more

If wealth had anything to do with intelligence…

…the richest man on earth would have a fortune of no more than $43,000! If you don’t believe me read this post! Have you ever thought about the distribution of wealth as a function of some quality? Especially rich people pride themselves on extraordinary abilities, so that they somehow “deserve” their wealth. Now “abilities” is … Read more

Simple Soybean Price Regression with Fast.ai Random Forests

As a student in the fast.ai Machine Learning for Coders MOOC with an interest in agriculture, the first application of the fast.ai random forest regression library that came to mind was prediction of soybean prices from historical data. Soybeans are a global commodity and their price-per-bushel has varied a great deal over the past decade. … Read more

Level up your Data Visualizations with quick plot

(Figure: K-Means plot for Spotify.) Data Visualization is an essential part of a Data Scientist’s workflow. It allows us to visually understand our problem, analyse our models, and provide deep, meaningful understanding to communities. As Data Scientists, we always look for new ways of improving our data science workflow. Why should I use this over … Read more

How to prepare data for NLP (text classification) with Keras and TensorFlow

In the past, I have written and taught quite a bit about image classification with Keras (e.g. here). Text classification isn’t too different in terms of using the Keras principles to train a sequential or functional model. You can even use Convolutional Neural Nets (CNNs) for text classification. What is very different, however, is how to … Read more

Onboard and Offboard Data Manipulation in Flexdashboard

Harrison Schramm is a Professional Statistician and Non-Resident Senior Fellow at the Center for Strategic and Budgetary Assessments. The Shiny set of tools, and, by extension, Flexdashboard, give professional analysts tools to rapidly put interactive versions of their work in the hands of clients. Frequently, an end user will interact with data by either uploading … Read more

Geo Experiments (Part 1)

What Is It and How Will It Help You In Marketing This will be a three-part series discussing the topic of Geo Experimentation and its use in marketing. Part 1: What Is It and How Will It Help You In Marketing? Part 2: Understanding the mathematics behind Geo Experiments Part 3: Application of Geo Experiments … Read more

EEG Motor Imagery Classification in Node.js with BCI.js

Detecting brainwaves associated with imagined movements Brain-computer interfaces (BCIs) allow for the control of computers and other devices using only your thoughts. A popular way to achieve this is with motor imagery detected with electroencephalography (EEG). This tutorial will serve as an introduction to the detection and classification of motor imagery. I’ve broken it down … Read more

RStudio Server on Azure

RStudio Server Pro is now available on the Azure Marketplace, the company announced on the RStudio Blog earlier this month. This means you can launch RStudio Server Pro on a virtual machine with the memory, disk, and CPU configuration of your choice, and pay by the minute for the VM instance plus the RStudio … Read more

Image Dithering in R

This January I played the most intriguing computer game I’ve played in ages: The Return of the Obra Dinn. Besides being a masterpiece of murder-mystery storytelling, it also has a unique art style, as it only uses black and white pixels. To pull this off, Obra Dinn makes use of image dithering: the arrangement … Read more

Mass Shootings and Terrorism

Our obsession with small probabilities and rare events I started considering this article last month around the anniversary of the death of my father. Even with Christmas, the weeks leading up to and after the holiday are always a little somber. Thoughts of death and mortality intermingle with my children’s innocent excitement for Santa’s arrival and … Read more

Getting Creative with Algorithms

How to stop being mechanical and keep your innovative edge always sharp in data science. In April 1972, The New York Times published an article, “Workers Increasingly Rebel Against Boredom on Assembly Line”. Though the car industry was considered very innovative, the type of work was very mechanical and repetitive. The reason was that … Read more