Clustering Pollock

I am not very passionate about art: when I visit a museum, I am a casual tourist who walks around observing paintings and sculptures, trying to learn as much as possible, unfortunately without appreciating the depth that is behind an art piece. A few years ago I was at a cocktail party organized inside the … Read more

PDSwR2 Free Excerpt and New Discount Code

Manning has a new discount code and a free excerpt of our book Practical Data Science with R, 2nd Edition: here. This section is elementary, but things really pick up speed later on (also available in a paid preview). … Read more

Categories R Tags ExcerptFavorite

Online color apps at

The web page has been relaunched, hosting three online color apps based on the HCL (Hue-Chroma-Luminance) color model: a palette constructor, a color vision deficiency emulator, and a color picker. HCL wizard: Somewhere over the rainbow The web page had originally been started to accompany the manuscript: “Somewhere over the Rainbow: How to … Read more

A Shiny app for your perfect circle

Abstract: The perfect circle is a shiny app providing a user-friendly interface to the algorithm described in the previous blog post Judging Freehand Circle Drawing Competitions. The app allows one to score freehand circles directly from a mobile phone by uploading photos of them to a shiny server. An R package “perfectcircle” contains the … Read more

Basketball Analytics: Predicting Win Shares

Analysis Objective: Can we predict individual win shares of NBA players using other basketball metrics? The data used for this analysis is from the 2016–17 and 2017–18 NBA seasons, taken from Basketball-Reference. Essentially, I used data from the 2016–17 season to build the model and stats from the most recent season to predict win shares. … Read more

AI, Machine Learning and Data Science Roundup: February 2019

A monthly roundup of news about Artificial Intelligence, Machine Learning and Data Science. This is an eclectic collection of interesting blog posts, software announcements and data applications from Microsoft and elsewhere that I’ve noted over the past month or so. Open Source AI, ML & Data Science News ONNX, the open interchange format for AI … Read more

Backpropagation for people who are afraid of math

Backpropagation is one of the most important concepts in machine learning. There are many online resources that explain the intuition behind this algorithm (IMO the best of these is the backpropagation lecture in the Stanford cs231n video lectures; another very good source is this), but getting from the intuition to practice can be (put gently) … Read more

How to Automatically Import Your Favorite Libraries into IPython or a Jupyter Notebook

No more typing “import pandas as pd” 10 times a day. If you often use interactive IPython sessions or Jupyter Notebooks and you’re getting tired of importing the same libraries over and over, try this: (1) navigate to ~/.ipython/profile_default; (2) create a folder called startup if it’s not already there; (3) add a new Python file called …; (4) put … Read more
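The steps in this excerpt can be sketched as a small helper. This is only an illustration under stated assumptions: the excerpt cuts off before naming the file, so `00-imports.py` is a hypothetical name, and `write_startup_file` is a helper invented for this example. It relies on IPython running the `.py` files it finds in a profile's `startup` folder when a session starts.

```python
from pathlib import Path


def write_startup_file(profile_dir, filename="00-imports.py",
                       imports=("import pandas as pd",)):
    """Create <profile_dir>/startup/<filename> containing the given import lines.

    The default filename is a hypothetical choice; the original post's
    file name is not visible in the excerpt.
    """
    startup_dir = Path(profile_dir).expanduser() / "startup"
    # Create the startup folder if it is not already there.
    startup_dir.mkdir(parents=True, exist_ok=True)
    target = startup_dir / filename
    # One import statement per line, with a trailing newline.
    target.write_text("\n".join(imports) + "\n", encoding="utf-8")
    return target


# Example call (commented out so that running this block does not touch
# your real IPython profile):
# write_startup_file("~/.ipython/profile_default",
#                    imports=("import pandas as pd", "import numpy as np"))
```

Writing the file by hand works just as well; the helper only makes the folder layout explicit.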

Kaggle Days 2019 in Paris

Kaggle Days are the first global offline event series for Data Scientists and Kagglers. Such an event provides an opportunity both to create and to build the data science community. The first-ever event, Kaggle Days in Warsaw, was held successfully in 2018. Over 100 participants learned from Kaggle Grandmasters in lively presentations and workshops. For many, the … Read more

Boosting and AdaBoost clearly explained

II. Introduction to Boosting. a. Concept. The intuition described above can be summarized as follows: train the model h1 on the whole set; train the model h2 with exaggerated data on the regions in which h1 performs poorly; train the model h3 with exaggerated data on the regions in which h1 ≠ h2; … Instead of … Read more
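The reweighting idea in this excerpt (train the next model with extra emphasis on the points the previous one got wrong) can be sketched with AdaBoost on a toy one-dimensional dataset. This is a minimal illustration, not the article's code: the decision-stump weak learner, the function names, and the toy data are all assumptions made for the example.

```python
import math


def stump_predict(x, threshold, polarity):
    """A decision stump: predicts `polarity` at or above the threshold, else the opposite."""
    return polarity if x >= threshold else -polarity


def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds=3):
    """Fit an ensemble of weighted stumps; labels must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n  # start with uniform emphasis
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote strength of this stump
        ensemble.append((alpha, threshold, polarity))
        # "Exaggerate" misclassified points: their weights grow, the others shrink.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data that no single stump can separate.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
for rounds in (1, 3):
    model = adaboost(xs, ys, rounds=rounds)
    mistakes = sum(predict(model, x) != y for x, y in zip(xs, ys))
    print(f"rounds={rounds}: {mistakes} training mistakes")
```

Each round refits the weak learner on the same points but with updated weights, which is one concrete way to read "exaggerated data" in the excerpt.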

Neural Quantum States

One of the most challenging problems in modern theoretical physics is the so-called many-body problem. Typical many-body systems are composed of a large number of strongly interacting particles. Few such systems are amenable to exact mathematical treatment and numerical techniques are needed to make progress. However, since the resources required to specify a generic many-body … Read more

How Artificial Intelligence (AI) is Adding New Horizons to Cybersecurity Solutions?

AI For Cybersecurity: McCarthy and Minsky described Artificial Intelligence as a task performed by a machine which, if performed by a human instead, would require a great deal of intelligence. Collective data on all the behavioral qualities is required to make a precise decision. These behavioral qualities are planning, problem-solving, reasoning and manipulation. Massive … Read more

How to Find Your Partner and Grow Your Relationship

Valentine’s Day is arriving. Whether or not you will share this day with someone, you have probably thought about what your ideal partner and relationship should look like. If you want to know what data (source in footnote) says about “how to find your partner and grow your relationship?”, we can provide you with some helpful insights. … Read more

Happy Valentines day by Nerds

Real nerds on Valentines day graph hearts instead of drawing them. My drawing skills are not what I would like them to be, but my R skills are! Therefore, let’s draw a heart in R instead of on paper!
dat <- data.frame(t = seq(0, 2*pi, by = 0.01))
xhrt <- function(t) 16*sin(t)^3
yhrt <- function(t) 13*cos(t) - 5*cos(2*t) - 2*cos(3*t) - cos(4*t)
dat$y = yhrt(dat$t)
dat$x = xhrt(dat$t)
with(dat, plot(x, y, type = "l", axes = FALSE, frame.plot = FALSE, labels = FALSE, xlab … Read more

Roses Are Red, Violets Are Blue, Statistics Can Be Romantic Too!

It’s Valentine’s day, making this the most romantic time of the year. But actually, 2018 was already a year full of love here at STATWORX: many of my STATWORX colleagues got engaged. And so we began to wonder, some fearful, some hopeful: who will be next? Therefore, today we’re going to tackle this question … Read more

The Future and Philosophy of Machine Consciousness

Can a machine think? Short answer: Sure, why not. Long answer: It’s complicated. In 1950, Alan Turing, known as the father of modern computing and artificial intelligence, wondered the same thing. In an attempt to answer, he coined the famous Turing test. Put simply: “The Turing test, developed by Alan Turing in 1950, is a test … Read more

NLP and Sarcasm: What’s the Deal?

Sarcasm is incredibly hard for chatbots and NLP applications. Here, we’ll take a look at why. Sarcasm or Not? As humans, you and I can look at these two chats and determine that in the first the person appears to be sincere, while in the other they come off as sarcastic and cold simply due to the way it … Read more

Do you know what I mean?

Why thinking too much is just as bad as thinking too little. The cognitive trap Imagine you’re standing in front of your computer, much like you are now, and you’re prompted to play a game. “You are playing this game with thousands of people all over the world”, you’re told. Intrigued as to what this game might be, … Read more

Teaching an AI to Write Pop Music

Because who has time to write it themselves anymore? I was never any good at writing lyrics. In my high-school ska band, I wrote the horn parts and some other instrumentals, and in my college a cappella group, I was always just arranging covers of songs that had already been written. There was rarely … Read more

Generate multiple language version plots

The use case is to create the same plot in different languages. I used this technique for Wikipedia plots. We are going to build a list containing all translations; we will then loop over each language, generating and saving the plot. # Mauna Loa atmospheric CO2 change # multi language plot for Wikipedia # Required … Read more

New discretization method: Recursive information gain ratio maximization

Hello everyone, I’m happy to share a new method to discretize variables that I have been working on for the last few months: Recursive discretization using gain ratio for multi-class variable. tl;dr: funModeling::discretize_rgr(input, target) The problem: We need to convert a numeric variable into a categorical one, considering the relationship with the target variable. How do we choose the … Read more

Introduction to gradient boosting on decision trees with Catboost

Today I would like to share my experience with an open source machine learning library, based on gradient boosting on decision trees, developed by the Russian search engine company Yandex. [Figure: GitHub profile as of the 12th of February] The library is released under the Apache license and offered as a free service. ‘Cat’, by the way, is a shortening of ‘category’, … Read more

Deeper into DCGANs

My last post about DCGANs was primarily focused on the idea of replacing fully connected layers with convolutions and implementing upsampling convolutions with Keras. This article will further explain the architectural guidelines mentioned by Radford et al. [1], as well as additional topics mentioned in the paper such as Unsupervised Feature Learning with GANs, GAN … Read more

FastAI Image Classification

Creating model and initial training The FastAI library is designed to let you create models (FastAi calls them learners) with only a few lines of code. They provide a method called create_cnn, which can be used to create a convolutional neural network. The method needs two arguments, the data and the architecture, but also supports many … Read more

Finding Your Flavor of Data Science Career

Three Approaches to Guide You in Choosing Your Path Does your concept of a Data Scientist look something like a fictional super hero, possessing such a broad and deep skillset that it is simply humanly impossible? And yet, does that unrealistic image make you sometimes feel like a data science imposter? (It’s really worth reading that … Read more

Building fully custom machine learning models on AWS SageMaker: a practical guide

AWS SageMaker is a cloud machine learning SDK designed for speed of iteration, and it’s one of the fastest-growing toys in the Amazon AWS ecosystem. Since launching in late 2017 SageMaker’s growth has been remarkable — last year’s AWS re:Invent stated that there are now over 10,000 companies using SageMaker to standardize their machine learning processes. SageMaker … Read more

Try out RStudio Connect on Your Desktop for Free

Have you heard of RStudio Connect, but do not know where to start? Maybe you are trying to show your manager how Shiny applications can be deployed in production, or convince a DevOps engineer that R can fit into her existing tooling. Perhaps you want to explore the functionality of RStudio’s Professional products to see if they fit the … Read more

Learning from Graph data using Keras and Tensorflow

Motivation: There is a lot of data out there that can be represented in the form of a graph in real-world applications, such as Citation Networks, Social Networks (Followers graph, Friends network, … ), Biological Networks or Telecommunications. Using graph-extracted features can boost the performance of predictive models by relying on information flow between neighboring nodes. … Read more

Performing multidimensional matrix operations using numpy’s broadcasting

Numpy’s broadcasting feature can be somewhat confusing for new users of this library, but as it allows for very clean, elegant and FUN coding, it is definitely worth the effort of getting used to. In this short article, I wanted to show a nice implementation of broadcasting to save some for loops and even computation … Read more

Machine Learning: Regularization and Over-fitting Simply Explained

I am going to give an intuitive understanding of the Regularization method in as simple words as possible. Firstly, I will discuss some basic ideas, so if you think you are already familiar with those, feel free to move ahead. A Linear Model: A linear model is one that follows a straight line in the prediction … Read more
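To make the link between a straight-line model and regularization concrete, here is a minimal sketch. It assumes L2 (ridge) regularization applied to the slope only, which the excerpt does not specify; the function name `fit_ridge_line` and the sample data are invented for the illustration. For a single feature, minimizing the squared error plus `lam * slope**2` has the closed form used below.

```python
def fit_ridge_line(xs, ys, lam=0.0):
    """Fit y = slope * x + intercept, penalizing lam * slope**2.

    With lam = 0 this is ordinary least squares; a larger lam shrinks
    the slope toward zero, giving a flatter, less flexible line.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Covariance-like and variance-like sums on centered data.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + lam)
    intercept = y_mean - slope * x_mean
    return slope, intercept


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # points lying exactly on y = 2x + 1
for lam in (0.0, 5.0):
    slope, intercept = fit_ridge_line(xs, ys, lam)
    print(f"lam={lam}: slope={slope}, intercept={intercept}")
```

Comparing the two fitted slopes shows the effect of the penalty: the regularized line is deliberately less steep than the unpenalized fit.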

Limitations of Deep Learning in AI Research

Related articles: References: [1] Deep Learning Review| Yann LeCun, Yoshua Bengio, Geoffrey Hinton | [2] 30 Amazing Applications of Deep Learning | Yaron Hadad | [3] Introduction to Deep Learning | Bhiksha Raj | Carnegie Mellon University | [4] Understanding LSTM Networks | Christopher Olah | [5] Memory Augmented Neural-Networks | … Read more

Winning Blackjack using Machine Learning

Genetic Algorithm Configurations: One of the unusual aspects of working with a GA is that it has so many settings that need to be configured. The following items can be configured for a run: population size, selection method, mutation rate and impact, and termination conditions. Varying each of these gives different results. The best way to … Read more

Predicting presence of Heart Diseases using Machine Learning

Machine Learning is used across many spheres around the world. The healthcare industry is no exception. Machine Learning can play an essential role in predicting presence/absence of Locomotor disorders, Heart diseases and more. Such information, if predicted well in advance, can provide important insights to doctors who can then adapt their diagnosis and treatment per … Read more

KubernetesExecutor for Airflow

Scale Airflow natively on Kubernetes. In the 1.10 release, Airflow introduced a new executor to run workers at scale: the Kubernetes executor. In this article we’ll look into: what Airflow is and which problem it solves; the Kubernetes executor and how it compares to the Celery executor; an example deployment on minikube. TL;DR Airflow has … Read more

What makes an active timebank?

In this article, I explain my process for collecting data, feature engineering, and using linear regression to identify predictors of active timebanks. But first… What’s a timebank? Without fail, when I say I am working on a project about timebanks, the first question someone asks is, “What’s a timebank?” As explained on the TimeBanks USA … Read more

Around the world in 90.414 kilometers

This article compares several search algorithms applied to a Traveling Salesman Problem of 85 cities. The goal is to show the intuition behind some well-known and effective search algorithms to people new to the subject of optimization. I chose to build less complex algorithms and attempted to describe them as understandably as possible. If you … Read more

Deep Compression: Optimization Techniques for Inference & Efficiency

As technology cozies up to the physical limits of Moore’s law, computation is increasingly limited by heat dissipation rather than the number of transistors that can be packed onto a given area of silicon. Modern chips already routinely idle whole sections of their area forming what’s referred to as “dark silicon,” referring to design constraining … Read more

Predict malignancy in breast cancer tumors with your own neural network and the Wisconsin Dataset

In the final part of this series, we predict malignancy in breast cancer tumors using the network we coded from scratch. In part 1 of this series, we understood in depth the architecture of our neural network. In part 2, we built it using Python. We also understood in depth back-propagation and the gradient descent optimization … Read more

In memory of Monty Hall

Some find it common knowledge, some find it weird. As a professor I usually teach about the Monty Hall problem, and year after year I see puzzled looks from students regarding the solution. The original and simplest scenario of the Monty Hall problem is this: You are in a prize … Read more
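For readers who find the solution puzzling, a quick simulation can help. The sketch below assumes the standard three-door version of the problem (the excerpt is cut off before stating it in full): one prize, a host who always opens a non-chosen, non-prize door, and a player who either always stays or always switches. The function names and the fixed random seed are choices made for this illustration.

```python
import random


def play(switch, rng):
    """Play one round; return True if the player ends up with the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Move to the only door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize


def win_rate(switch, trials=100_000, seed=0):
    """Estimate the probability of winning for a fixed strategy."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials


print("stay:  ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

The estimates should land close to the theoretical values of 1/3 for staying and 2/3 for switching.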

Quantum rush — New light on pushing AI & Machine learning boundaries

The quantum computer is a successor to the classical digital computer. It adds an extra dimension to existing computers, which fall short in solving optimization problems and doing things in parallel. Quantum computers work on the principles of quantum mechanics. Quantum mechanics is the body of scientific laws that describe the motion and interaction of photons, electrons, … Read more

Towards Fast Neural Style Transfer

The seminal paper of Neural Style Transfer presented by Gatys et al. [1] demonstrates a remarkable characteristic of Deep Convolutional Neural Networks. The sequential representations learned from layers of parametric convolutions can be separated into ‘content’ and ‘style’. The fundamental idea behind Style Transfer is that pre-trained DCNNs on tasks such as ImageNet classification can … Read more

Embeddings-free Deep Learning NLP model

Word embeddings (e.g. word2vec, GloVe) were introduced several years ago and have fundamentally changed NLP tasks. With embeddings, we do not need one-hot encoding, which causes very high-dimensional features in most NLP tasks. We can use 300 dimensions to represent over 1 million words. Different kinds of embeddings such as character embedding, sentence embeddings … Read more

How to make GDPR and ONA work together?

Searching online for ONA (Organizational Network Analysis) gets you various definitions, including those with curly math symbols and graph theory. However, in a nutshell, ONA is about who communicates with whom in an organization. Although ONA is regarded as one of the latest buzzwords, it can be traced back at least to the 80s. In … Read more

How to “farm” Kaggle in the right way

This article describes the advice and approaches on how to effectively use Kaggle as a competition platform to improve practitioner skills in Data Science with maximum efficiency and profitability. farm (farming) — gaming tactic where a player performs repetitive actions to gain experience, points or some form of in-game currency. Description These methods helped me with getting … Read more

So, what is AI really?

One of the topics that is totally hyped at the moment is obviously Artificial Intelligence or AI for short. There are many self-proclaimed experts running around trying to sell you the stuff they have been doing all along under this new label. When you ask them what AI means you will normally get some convoluted … Read more

Where should you live in San Francisco?

Moving to San Francisco would be awesome except for the cost of rent. I use Zillow data in this project to find cheap/high-value rental opportunities in San Francisco. The following analysis was scripted in R, and analysis is neighborhood-based. Credits to Ken Steif and Keith Hassel for their great tutorial on exploring home prices in … Read more

Cultural overfitting and underfitting. Or why the “Netflix Culture” won’t work in your company.

A couple of weeks back, I gave a talk at the SFELC (San Francisco Engineering Leadership Conference). As I was preparing the slides for the talk I reflected on something pretty interesting: I have been managing technical teams for over 25 years. I have also been giving public talks for about the same time. However, … Read more

Holy Grail for Bias-Variance tradeoff, Overfitting & Underfitting

There are two more important terms related to bias and variance that we must understand now: overfitting and underfitting. I am again going to use a real-life analogy here. I have referred to the blog of Machine Learning @ Berkeley for this example. There is a very delicate balancing act when machine learning algorithms try to … Read more