Python’s Collections Module — High-performance container data types.

Let us now turn to the actual objective of this article: getting to know Python’s collections module. This is just an overview; for detailed explanations and examples, please refer to the official Python documentation. Collections is a built-in Python module that implements specialized container datatypes providing …
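Here is a quick taste of what the module offers. This is a minimal sketch of four of its standard-library classes (Counter, defaultdict, deque and namedtuple). The sample data and variable names are illustrative only.

```python
from collections import Counter, defaultdict, deque, namedtuple

# Counter: a dict subclass that counts hashable items
word_counts = Counter("the cat and the hat".split())
print(word_counts.most_common(1))  # [('the', 2)]

# defaultdict: a missing key gets a default value built by the factory (here, list)
groups = defaultdict(list)
for name in ["ann", "bob", "amy"]:
    groups[name[0]].append(name)
print(dict(groups))  # {'a': ['ann', 'amy'], 'b': ['bob']}

# deque: fast appends/pops at both ends; maxlen turns it into a bounded buffer
recent = deque(maxlen=3)
for i in range(5):
    recent.append(i)  # the oldest items fall off the left end
print(list(recent))  # [2, 3, 4]

# namedtuple: a tuple subclass whose fields can be accessed by name
Point = namedtuple("Point", ["x", "y"])
p = Point(1, 2)
print(p.x + p.y)  # 3
```

Each of these can replace a hand-rolled pattern: counting with a plain dict, initialising keys before appending, trimming a list to its last n items, or indexing tuples by position.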

Time Series of Price Anomaly Detection

Anomaly detection identifies data points that do not fit well with the rest of the data. Also known as outlier detection, anomaly detection is a data mining process used to determine the types of anomalies found in a data set and the details of their occurrences. Automatic anomaly detection is critical in …
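The sketch below shows one simple way to flag points that "do not fit well with the rest of the data". It uses a z-score baseline, which is a swapped-in technique and not necessarily the method used in the article. It also ignores the time ordering of the series. The price values and the threshold of 2.0 are illustrative assumptions. With only a handful of points, a single outlier inflates the standard deviation, so a threshold of 3 might never trigger.

```python
from statistics import mean, stdev


def zscore_anomalies(values, threshold=3.0):
    """Return the indices of values more than `threshold` sample standard
    deviations away from the mean (a basic outlier rule)."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:  # all values identical: nothing stands out
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical daily prices with one obvious spike
prices = [10.1, 10.3, 9.9, 10.0, 10.2, 25.0, 10.1, 9.8]
print(zscore_anomalies(prices, threshold=2.0))  # [5]
```

Time-series methods usually compare each point with a rolling window or a forecast rather than with the global mean, so that trends and seasonality are not mistaken for anomalies.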

Tel Aviv artists: build yourself a mapping app

tl;dr: I went from experimenting with mapping libraries to building a reusable mapping app. This is how I did it and how you can reuse it. As a data scientist, most of my work stays behind the scenes. When training models, the farthest I reach in exposure is deploying a simple Flask web app as REST …

Get Started with Support Vector Machines (SVM)

A hands-on tutorial with 4 examples on how to implement support vector machines for classification. In a previous post, I introduced the theory of support vector machines (SVMs). Now I will further explain how SVMs work with four different exercises! The first part will show how to perform classification with …

AI Thinks Rachel Maddow Is A Man (and this is a problem for all of us)

A data-driven review of AI bias in production systems. In 2011, IBM Watson made headlines when it beat Jeopardy legends Ken Jennings and Brad Rutter in a $1M match. In Final Jeopardy, Jennings admitted defeat by writing “I, for one, welcome our new computer overlords.” That was in 2011, when a good score in a …

From prediction to decision making

Why your predictions might be falling short (opinion). “There are a number of gaps between making a prediction and making a decision.” (Susan Athey [1]) Correlation does not imply causation. This is one of the most repeated phrases in statistical testing. It is repeated for a reason, I believe, and that …

Information Flows in You — And Your Friends

Upper limits of predictability using social media information, even if a person has deleted their social media presence. You’ve had enough. Of baby pictures, of political rants by ‘friends’, even of cute cat pictures! Of worrying about your privacy and future career security. You decide to delete your accounts on Facebook, Twitter and Instagram. And you’re …

Introducing Feast

Google’s New Feature Store for Machine Learning Applications. Feature extraction and storage are among the most important and most often overlooked aspects of machine learning solutions. Features play a key role in helping machine learning models process and understand datasets for training and production. If you are building a single machine learning model, feature extraction …

Simple Soybean Price Regression with Fast.ai Random Forests

As a student in the fast.ai Machine Learning for Coders MOOC¹ with an interest in agriculture, the first application of the fast.ai random forest regression library that came to mind was predicting soybean prices from historical data. Soybeans are a global commodity, and their price per bushel has varied a great deal over the past decade. …

Level up your Data Visualizations with quick plot

Data visualization is an essential part of a data scientist’s workflow. It allows us to visually understand our problem, analyze our models, and provide deep, meaningful understanding to communities. As data scientists, we are always looking for new ways to improve our data science workflow. Why should I use this over …

Playlist Classification on Spotify using KNN and Naive Bayes Classification

One day, I thought it would be cool if Spotify helped me pick a playlist when I like a song. The idea is to tap the plus button while my phone is locked and have Spotify add the song to one of my playlists rather than to my library, so that I don’t go into the app …

EEG Motor Imagery Classification in Node.js with BCI.js

Detecting brainwaves associated with imagined movements. Brain-computer interfaces (BCIs) allow you to control computers and other devices using only your thoughts. A popular way to achieve this is with motor imagery detected with electroencephalography (EEG). This tutorial will serve as an introduction to the detection and classification of motor imagery. I’ve broken it down …

Mass Shootings and Terrorism

Our obsession with small probabilities and rare events. I started considering this article last month, around the anniversary of my father’s death. Even with Christmas, the weeks leading up to and after the holiday are always a little somber. Thoughts of death and mortality intermingle with my children’s innocent excitement for Santa’s arrival and …

Getting Creative with Algorithms

How to stop being mechanical and keep your innovative edge sharp in data science. In April 1972, The New York Times published an article titled “Workers Increasingly Rebel Against Boredom on Assembly Line.” Though the car industry was considered very innovative, the work itself was very mechanical and repetitive. The reason was that …

Graph Databases. What’s the Big Deal?

Continuing the analysis of semantics and data science, it’s time to talk about graph databases and what they have to offer us. Should we invest our precious time in learning a new way of ingesting, storing, and analyzing data, one that touches on the mathematics of graphs? For me, the answer was unclear when I started …

Experiment sample size calculation using power analysis

If you use experiments to evaluate a product feature (and I hope you do), the question of the minimum sample size required to get statistically significant results often comes up. In this article, we explain how we apply mathematical statistics and power analysis to calculate the sample size for A/B tests. Before launching an experiment, it …
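The sketch below shows the kind of power calculation involved. It assumes a two-sided, two-proportion z-test and uses the common normal-approximation formula n = (z₁₋α/₂ + z₁₋β)² · (p₁(1 − p₁) + p₂(1 − p₂)) / (p₁ − p₂)². The function name, the default α = 0.05 and 80% power, and the example conversion rates are all illustrative assumptions. The article’s own procedure may differ.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p1, p2, alpha=0.05, power=0.8):
    """Approximate users needed per variant to detect a change in conversion
    rate from p1 to p2 with a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)  # round up: a fractional user is not possible


# Hypothetical baseline conversion of 10%, hoping to detect a lift to 12%
print(sample_size_per_group(0.10, 0.12))
```

Because n scales with 1/(p₁ − p₂)², halving the detectable difference roughly quadruples the required sample size. Raising the desired power also increases it.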

Is the Difference in Work Hours the Real Reason for the Gender Wage Gap? [Interactive Infographic]

Every year, the Department of Labor issues a report on the pay gap between women and men. Women earn a median of $30,000¹ per year, while men earn $40,000 per year. In other words, working women earn 75% of what men earn. But this gap doesn’t take into account the fact that on average, men …

Scrape Reddit data using Python and Google BigQuery

Let’s get started with data collection from Reddit. Reddit API: While web scraping is one of the famous (or infamous!) ways of collecting data from websites, many websites offer APIs to access the public data they host. This avoids the unnecessary traffic that scraping bots create, which often crashes their websites …

Creating AI for GameBoy Part 1: Coding a Controller

Released in 2003, Fire Emblem: The Blazing Sword is a strategy game so successful that its characters are featured in Super Smash Bros., and the 15th installment of the series will be released in early 2019. The game is played by selecting characters (aka units), deciding where to move them, and then deciding …

Why you should care about Docker?

The Dockerfile: where it all begins. Docker is a powerful tool, but its power is harnessed through files called Dockerfiles. A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image. Using docker build, users can create an automated …

Python for Data Science: From Scratch (Part II)

2.2 Pandas: Pandas is an open-source Python library created specifically for data manipulation and analysis of large amounts of data. Pandas offers robust data structures and functions for manipulating data easily. But wait, lists, dicts and NumPy’s ndarrays can do that too, so why Pandas? …

Getting Started with Recommender Systems and TensorRec

TensorRec is a Python package for building recommender systems. A TensorRec recommender system consumes three pieces of input data: user features, item features, and interactions. Based on the user and item features, the system predicts which items to recommend. The interactions are used when fitting the model: predictions are compared to the interactions and …
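The sketch below makes the flow from features to recommendations concrete. It is written in pure Python and is hypothetical. Each user and each item is mapped to a representation by multiplying its features by a weight matrix, and a (user, item) score is the dot product of the two representations. This is NOT the TensorRec API. The function names and the hand-picked weights are illustrative assumptions, and fitting the weights against the interactions is omitted.

```python
# Hypothetical sketch of feature-based recommendation scoring (not the TensorRec API).


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def predict_scores(user_features, item_features, user_weights, item_weights):
    """Score every (user, item) pair as the dot product of their representations."""
    user_repr = matmul(user_features, user_weights)  # n_users x n_components
    item_repr = matmul(item_features, item_weights)  # n_items x n_components
    return matmul(user_repr, transpose(item_repr))   # n_users x n_items


def recommend(scores, k):
    """Return the indices of the top-k items for each user, best first."""
    return [sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k] for row in scores]


# Two users and three items, each described by one-hot features (illustrative values)
user_features = [[1, 0], [0, 1]]
item_features = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
user_weights = [[1.0, 0.0], [0.0, 1.0]]              # 2 user features -> 2 components
item_weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 item features -> 2 components

scores = predict_scores(user_features, item_features, user_weights, item_weights)
print(recommend(scores, k=2))  # [[0, 2], [1, 2]]
```

In a trained system, the weights would be learned so that high scores line up with observed interactions. Here they are fixed by hand only to show how features turn into a ranking.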

Introduction to Logistic Regression

In this blog, we will discuss the basic concepts of logistic regression and what kinds of problems it can help us solve. Logistic regression is a classification algorithm used to assign observations to a discrete set of classes. Some examples of classification problems are email spam or not …
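This minimal sketch shows how logistic regression turns features into a class. A weighted sum of the features is passed through the sigmoid function to get a probability, which is then thresholded (here at 0.5). The "spam" features, weights and bias are hypothetical values chosen by hand, not a trained model.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """P(class = 1 | features) for a logistic regression model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_class(weights, bias, features, threshold=0.5):
    return 1 if predict_proba(weights, bias, features) >= threshold else 0


# Hypothetical spam model: features = [number of links, contains the word "free"]
weights, bias = [0.8, 1.5], -2.0
print(predict_class(weights, bias, [3, 1]))  # 1 (spam): z = 1.9
print(predict_class(weights, bias, [0, 0]))  # 0 (not spam): z = -2.0
```

Training means choosing the weights and bias that maximize the likelihood of the labelled examples, typically with gradient descent on the log loss.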

My Machine Learning Journey and First Kaggle Competition

Let the Game Begin… In all the time I have spent on Kaggle, I have seen the competitions but never dared to enter one. I was prejudiced about competitions, believing that to enter one you had to be at least an 8 out of 10. What a nonsensical idea! My first application to a …

Predicting Customer Churn with Spark

For many companies, churn is a major concern. It is natural that some people stop using a service, but if this proportion becomes too large, it can hinder growth regardless of the revenue source (ad sales, subscriptions, or a mix of both). With that in mind, the ability of firms to predict churn by identifying customers …

A Comprehensive List of Handy R Packages

Stuff I have found super useful for work and life. Gang Su, Jan 21. Whether Python or R is superior for data science / machine learning is an open debate. Despite its quirkiness and its not-so-true-but-generally-perceived slowness, R really shines in exploratory data analysis (EDA) in terms of data wrangling, visualizations, dashboards, myriad choices of …

Detecting malaria using deep learning.

Set-up: First, create a folder/directory to store the project. Then, inside it, create a directory called malaria, download the dataset into that directory and unzip it:

$ cd whatever-you-named-your-directory
$ mkdir malaria
$ cd malaria
$ wget https://ceb.nlm.nih.gov/proj/malaria/cell_images.zip
$ unzip cell_images.zip

We’re going to switch back to our parent directory and make another directory called cnn where we …

QuickBlarks

The next chart, generated from the following R code, shows the accumulated sum of the diff.delta values in each block bin:

library(dplyr)
library(ggplot2)

difficulty %>%
  group_by(block.bin) %>%
  summarize(sum.diff.delta = sum(diff.delta, na.rm = TRUE)) %>%
  ggplot(aes(x = block.bin, y = sum.diff.delta)) +
  geom_line()

You can clearly see the battle waged by the pre-Byzantium difficulty bomb. Up, down, up, down. The fact that the difficulty hovers around a target is exactly what the difficulty …

Quality over quantity: building the perfect data science project

In startup lingo, a “vanity metric” is a number that companies keep track of in order to convince the world, and sometimes themselves, that they’re doing better than they actually are. To pick on a prominent example, about eight years ago Twitter announced that 200 million tweets per day were being sent on its app. …

3 Methods for Parallelization in Spark

Scaling data science tasks for speed. Spark is great for scaling up data science tasks and workloads! As long as you’re using Spark data frames and libraries that operate on these data structures, you can scale to massive data sets distributed across a cluster. However, there are some scenarios where libraries may …

Artificial Intelligence is just a Tool

There are scenarios in which AI applications deliver better results, but no general superiority can be derived from this. More importantly, IT managers have to check very carefully which approach to use for each project. To answer this question, decision-makers must consider AI in connection with other concepts. AI does not replace; AI supplements …

Building an interactive computer vision demo in a few hours on AWS DeepLens

A couple of months ago, I posted an article about explaining my job as a technology consultant to my daughter’s preschool class of 3-year-olds. One of the more understandable parts of what I’m doing these days is working on computer vision problems. People (even the toddler crowd) inherently understand the idea of recognizing what is in …

Counting No. of Parameters in Deep Learning Models by Hand

5 simple examples to count parameters in FFNN, RNN and CNN models. Counting the number of trainable parameters of deep learning models is considered too trivial, because your code can already do this for you. But I’d like to keep my notes here for us to refer to once in a while. Here are the models …
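For reference, the sketch below encodes the usual counting rules for dense, simple RNN/LSTM and 2-D convolution layers. It is written in Python and assumes standard layers that all use biases. GRUs are left out because their count depends on implementation details (some variants carry an extra set of biases). The function names are illustrative.

```python
def dense_params(n_in, n_out):
    """Fully connected layer: one weight per input-output pair plus one bias per output."""
    return (n_in + 1) * n_out


def rnn_params(n_in, n_hidden, n_gates=1):
    """Recurrent layer: per gate, input weights (n_in x n_hidden), recurrent
    weights (n_hidden x n_hidden) and biases (n_hidden).
    n_gates=1 for a simple RNN, 4 for an LSTM."""
    return n_gates * (n_hidden * (n_hidden + n_in) + n_hidden)


def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """2-D convolution: each filter has kernel_h * kernel_w * in_channels weights plus one bias."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


print(dense_params(3, 2) + dense_params(2, 1))  # 11: a 3 -> 2 -> 1 feed-forward net
print(rnn_params(3, 2))                         # 12: simple RNN, 3 inputs, 2 units
print(rnn_params(3, 2, n_gates=4))              # 48: LSTM, 3 inputs, 2 units
print(conv2d_params(3, 3, 3, 16))               # 448: 16 filters of size 3x3 over an RGB input
```

A model’s total is the sum over its layers. Pooling, flatten and activation layers contribute zero trainable parameters.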

Think your Data Different

Taboola’s content recommender system gathers lots of data, some of which can be represented as a graph. Let’s inspect one type of data as a case study for using node2vec. Taboola recommends articles in a widget shown on publishers’ websites. Each article has named entities: the entities described by the title. For example, …

Seamlessly Integrated Deep Learning Environment with Terraform, Google cloud, Gitlab and Docker

When you start serious deep learning projects, you usually need a proper GPU. Buying workstations suitable for deep learning workloads can easily become very expensive. Luckily, there are some options in the cloud. One that I tried out was using the wonderful Google Compute …

PU Learning

Dealing with a negative class hidden in unlabelled data. A challenge that keeps presenting itself at work is having to train a binary classifier without a labelled negative class. Typically, the issue is paired with horribly imbalanced data sets and pressed for …

A.I. Demilitarisation Won’t Happen

Artificial intelligence is already being integrated into next-generation defence systems, and its demilitarisation is highly unlikely. Restricting it from military use is probably not the smartest strategy to pursue anyway. This year’s annual meeting of the World Economic Forum is about to start. While browsing through this year’s agenda, I …

Sentiment of the Union: Analyzing Presidential State of the Union Addresses with Python

Analyzing Presidential State of the Union Addresses using Sentiment Analysis and Python tools. In Article II, Section 3 of the Constitution, the President of the United States is directed to “give to the Congress information of the State of the Union, and recommend to their consideration such measures as he shall judge necessary …

Key Steps for Building an Effective AI Organization

Recently, I became fascinated by the impact of artificial intelligence on businesses in every sector (tech, banking, manufacturing, etc.). This led me to explore the subject further while trying to understand what a corporation should do to transform its processes using AI. In this article, I would love to summarize my observations into a …

Visualizing Principal Component Analysis with Matrix Transforms

A guide to understanding eigenvalues, eigenvectors, and principal components. Principal Component Analysis (PCA) is a method of decomposing data into uncorrelated components by identifying the eigenvalues and eigenvectors of the data’s covariance matrix. The following is meant to help visualize what these different values represent and how they’re calculated. First I’ll show how matrices can be used to transform data, then …
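The sketch below traces the steps for 2-D data in pure Python: it computes the covariance matrix, finds its eigenvalues and eigenvectors in closed form (which is possible for a symmetric 2×2 matrix), and projects the centered data onto the eigenvectors. Real code would normally call a linear-algebra routine such as numpy.linalg.eigh instead. The sample points and function names are illustrative assumptions.

```python
import math


def covariance_matrix(xs, ys):
    """Sample covariance entries (sxx, sxy, syy) of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, sxy, syy


def eig_2x2_symmetric(a, b, d):
    """Eigenpairs of [[a, b], [b, d]] as (eigenvalue, unit eigenvector), largest first."""
    if b == 0:  # already diagonal: the coordinate axes are the eigenvectors
        return sorted([(a, (1.0, 0.0)), (d, (0.0, 1.0))], key=lambda p: p[0], reverse=True)
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    pairs = []
    for lam in (mean + radius, mean - radius):
        vx, vy = b, lam - a  # solves (A - lam*I) v = 0 when b != 0
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs


def project(xs, ys, vector):
    """Scores of the centered data along a direction (the principal-component scores)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return [(x - mx) * vector[0] + (y - my) * vector[1] for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
sxx, sxy, syy = covariance_matrix(xs, ys)
(l1, v1), (l2, v2) = eig_2x2_symmetric(sxx, sxy, syy)
pc1, pc2 = project(xs, ys, v1), project(xs, ys, v2)
print(l1, l2)                          # variances along the first and second components
print(covariance_matrix(pc1, pc2)[1])  # ~0: the component scores are uncorrelated
```

The eigenvalues sum to the total variance (sxx + syy). The first eigenvector points in the direction of greatest spread, which is why keeping only the leading components preserves most of the variance.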

3 steps to a clean dataset with Pandas

Data science isn’t all fancy charts! It’s a set of tools that we use to clean, explore, and model data in order to extract real-world, meaningful information. Getting real-world information first requires real-world data, and real-world data is dirty. Think about how companies, big and small, collect their data. It’s usually done by a non-expert; …

The Poisson Distribution and Poisson Process Explained

Waiting time: An intriguing part of a Poisson process involves figuring out how long we have to wait until the next event (this is sometimes called the interarrival time). Consider the situation: meteors appear once every 12 minutes on average. If we arrive at a random time, how long can we expect to wait to …
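This is a sketch of the meteor example, assuming a rate λ = 1/12 meteors per minute. In a Poisson process, waiting times are exponentially distributed, so P(wait > t) = e^(−λt). Because the exponential distribution is memoryless, the expected wait from a random arrival time is still 1/λ = 12 minutes. The simulation below checks this by generating a long Poisson process and dropping in at uniformly random times. The function names, horizon and seed are illustrative choices.

```python
import bisect
import math
import random

RATE = 1 / 12  # meteors per minute: one every 12 minutes on average


def prob_wait_longer_than(t, rate=RATE):
    """Exponential waiting time: P(T > t) = exp(-rate * t)."""
    return math.exp(-rate * t)


def simulate_wait_from_random_arrival(horizon=200_000.0, visits=10_000, rate=RATE, seed=0):
    """Simulate a Poisson process, arrive at uniformly random times, and average the wait."""
    rng = random.Random(seed)
    events, t = [], 0.0
    while t < horizon:
        t += rng.expovariate(rate)  # exponential interarrival times
        events.append(t)
    total = 0.0
    for _ in range(visits):
        arrival = rng.uniform(0.0, 0.9 * horizon)  # stay clear of the end of the timeline
        next_event = events[bisect.bisect_right(events, arrival)]
        total += next_event - arrival
    return total / visits


print(round(prob_wait_longer_than(12), 3))  # 0.368, i.e. e**-1
print(simulate_wait_from_random_arrival())  # close to 12 minutes
```

So there is about a 37% chance of waiting longer than the 12-minute average, since the exponential distribution is skewed toward short waits with a long tail.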

The basics of deploying Logstash pipelines to Kubernetes

Now that we’ve walked through the configuration of our pipeline, we can move on to Kubernetes. The first thing we have to do is create a ConfigMap. A ConfigMap lets us store key-value pairs of configuration data that our Pods can access. So we could have a ConfigMap that would store a directory …

How to visualize convolutional features in 40 lines of code

Below you will find feature visualizations for filters in several layers of a VGG-16 network. While looking at them, I would like you to observe how the complexity of the generated patterns increases the deeper we get into the network. Layer 7: Conv2d(64, 128) filters 12, 16, 86, 110 (top left to bottom right, …