Simple and multiple linear regression with Python

Linear regression is a linear approach to model the relationship between a dependent variable (target variable) and one (simple regression) or more (multiple regression) independent variables. Python has different libraries that allow us to plot a data set and analyze the relation between variables. In case we observe a linear trend, we can calculate the …
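
To make the idea of fitting a linear trend concrete, here is a minimal sketch using NumPy's least-squares solver. The data values are made up for illustration and are not taken from the article; the article itself may well use a different library.

```python
import numpy as np

# Hypothetical toy data: y is roughly 2*x + 1 with a little noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Design matrix with a column of ones so the model has an intercept.
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: find coefficients minimizing ||X @ coef - y||^2.
coef, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef

print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

For multiple regression, the same call works unchanged once further independent variables are added as extra columns of `X`.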

How to use data version control (dvc) in a machine learning project

When working in a production machine learning project you probably deal with a ton of data and several models. To keep track of which models were trained with which data, you should use a system to version the data, similar to versioning and tracking your code. One way to solve this problem is dvc (Data …

Foundations of ML: Parameterized Functions

A soft introduction to parameterized functions, a foundational topic in both machine learning and statistics, explained through small programming examples. By the end of this article, I would like the following sentence to make sense: My favorite visualization of parameterized functions comes from the geometric interpretation of higher-order functions. To accomplish that goal, I will …

Using Deep Learning for Image Analogies

Note: the notebook for this article can be found at: https://colab.research.google.com/github/tomer1amit/tomer1amit.github.io/blob/master/ImageAnalogies.ipynb I am going to answer the following question: A dog is to wolf as a cat is to ___ ? by using a deep convolutional neural network trained to classify photos. I …

Introducing tsviz, interactive time series visualization in R Studio

Why and how we developed an R Studio add-in We all know. R is great for data visualization. Hadley Wickham’s ggplot2 provides a simple yet elegant syntax to create very effective charts. Moreover, in the unlikely event that ggplot2 does not support some exotic chart, R offers lots of valid alternatives, like lattice. …

Reinforcement Learning — Tile Coding Implementation

Step by step explanation of tile coding We have come so far and extended our reinforcement learning theories into continuous space. If you would like to go further, you need to know tile coding, which is probably the most practical and computationally efficient tool used in continuous-space reinforcement learning problems. Essentially, tile coding …

Introducing Distython. New Python package implementing novel distance metrics

Distython with Scikit-Learn Those metrics were designed in such a way that they can be directly applied to the Nearest Neighbors or Dbscan class in Scikit-Learn. You can use it for your personal projects together with Scikit-Learn classes as long as they provide an interface to call a custom metric function. Please note that if …

The Opioid Crisis in Data

Exploratory Analysis of opioid overdose deaths using Medicare and Medicaid prescription data It’s no secret that the United States is in the midst of an opioid crisis. What can be debated is the reason for this crisis. Coming from a pharmacy background, I was interested in prescriptions. Specifically, if I could find a correlation between …

Solving Fake News with AI, Blockchain and a global Community

An immutable registry of labeled ‘fake news’ and other classes of misleading content, accessible by all — humans and machines — to help quantify the problem and raise global awareness. Photo by Bank Phrom on Unsplash Fake news is a major problem in the era of AI. Although, similar dysfunctional scenarios involving misinformation and propaganda …

Avoiding Side effects and Reward Hacking in Artificial Intelligence

A short summary of an excerpt of Concrete Problems in AI Safety I decided to take a step back, again. This time to the paper written about AI Safety published on the OpenAI page in June 2016 called Concrete Problems in AI Safety. It is now July the 26th at the time of writing. However …

Modeling with Reinforcement Learning

Concepts and use cases Reinforcement learning involves figuring out what to do in which situation. This can be tricky. Only a tiny fraction of all possible situations might have been experienced. If that. Even in a familiar situation, a tried-and-true action might, in a particular instance, produce an unexpected result. The environment might throw a …

It’s time to think more about the pipeline

— An introduction of the TPOT tool TPOT (Tree-Based Pipeline Optimization Tool) It is always exciting to see that data science is consistently concerned with improving algorithms and techniques to analyze the booming data to extract patterns. These patterns can create new insights or become new decision-making methods, and these patterns are otherwise hard to …

Sweet Home Chicago: Examining Crime in the City of Chicago

A data driven approach. by Tyler Doll, Jeff Greene, Gazi Morshed This paper was originally submitted for my database class in college. We found it interesting and we spent a good amount of time on it so I wanted to share it here. It has been edited for this format. Introduction The city of Chicago …

Convolution vs. Cross-Correlation

This post will overview the difference between convolution and cross-correlation. This post is the only resource online that contains a step-by-step worked example of both convolution and cross-correlation together (as far as I know — and trust me, I did a lot of searching). This post also deals precisely with indices, which it turns out …
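
As a quick illustration of the difference, the sketch below uses NumPy's 1-D `np.convolve` and `np.correlate` on made-up arrays (the values are hypothetical and not from the post). The only distinction it demonstrates is the kernel flip: convolution reverses the kernel before sliding it, cross-correlation does not.

```python
import numpy as np

# Hypothetical 1-D signal and kernel, chosen so the flip is visible.
signal = np.array([1, 2, 3, 4])
kernel = np.array([1, 0, -1])

# Convolution flips the kernel before sliding it across the signal.
conv = np.convolve(signal, kernel, mode="full")

# Cross-correlation slides the kernel as-is (no flip).
xcorr = np.correlate(signal, kernel, mode="full")

# Correlating with the reversed kernel reproduces the convolution.
xcorr_flipped = np.correlate(signal, kernel[::-1], mode="full")

print("convolution:              ", conv)
print("cross-correlation:        ", xcorr)
print("cross-corr, flipped kernel:", xcorr_flipped)
```

For a symmetric kernel the two operations give the same result, which is one reason the terms are often used interchangeably.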

Image Panorama Stitching with OpenCV

Image stitching is one of the most successful applications in Computer Vision. Nowadays, it is hard to find a cell phone or an image processing API that does not contain this functionality. In this piece, we will talk about how to perform image stitching using Python and OpenCV. Given a pair of images that share …

Overview of feature selection methods

Common strategies for choosing the most relevant features in your data set The importance of feature selection Selecting the right set of features to be used for data modelling has been shown to improve the performance of supervised and unsupervised learning, to reduce computational costs such as training time or required resources, in the case …

How to Communicate Clearly About Machine Learning.

Say you’re building a system to optimize a labor-intensive task so that costs can be reduced. Does it matter that your system is highly accurate if it actually makes the process slower? That wouldn’t really reduce costs. A good KPI for this case could be something like Average Workflow Speed. Define a workflow and measure …

Soup of the Day

Webscraping With Beautiful Soup — A Beginner’s Guide Though there are many thousands of lovely clean datasets available out there for a data scientist’s delectation (mostly on Kaggle), you’re always going to have those pesky hypotheses that stay out of their scope. Creating the dataset you do need from scratch is a potentially daunting prospect …

Weekly Trading Roundup — Week 2

Week 2: We discuss earnings season, market sentiment, and adjusting the portfolio for changing market conditions… Disclaimer: Past performance is NOT indicative of future returns and you should under no circumstances try to match the portfolio in this article. Nothing herein is financial advice, and this is NOT a recommendation to trade real money or …

Deep Reinforcement Learning Tutorial with Open AI Gym

Q learning for playing Space Invaders This blog is Part 3 of the series on reinforcement learning. Feel free to read the earlier parts, which cover solving the mountain car environment and the bipedal walker environment using reinforcement learning in Open AI Gym …

How to resume an interrupted training session in fastai

What do I do if a training with fit_one_cycle was interrupted halfway through? Courtesy of [email protected] What do you do if you have a huge dataset, a large and slow-to-train network and your training session was interrupted after several hours of training? This can happen for many reasons: because you reached your 12 hours of …

How AI can bring back the ‘lost 80%’ data into decision-making?

Picture Credit: Dilbert.com, by Scott Adams A business leader in a logistics company once shared with me that his business was losing revenue, as it could not optimise capacity of its carriers effectively. The operations could not cope with fluctuations in cargo deliveries; this problem was more pronounced during peak periods. Not optimising capacity …

Keeping an eye on confounds: a walk through for calculating a partial correlation matrix

An R demo illustrating two approaches for calculating partial correlation matrices Don’t forget about potential confounding variables! Photo from Wikimedia Commons One of the most common steps analysts perform following data munging/pre-processing is to run a correlation analysis to check the pairwise associations among variables in a standardized way. It’s a quick and pretty …

Are you likely to be attacked by a bear?

My Viz Improvement Journey The problem statement for the visualization below is “In which park and when, are you most likely to be killed by a bear?” Data collected, prepared, and distributed by Ali Sanne on data.world | Visualization by Zachary Crockett, Vox Naturally, I was expecting the visualization to have locations of parks and …

Breaking BERT Down

BERT is short for Bidirectional Encoder Representations from Transformers. It is a new type of language model developed and released by Google in late 2018. Pre-trained language models like BERT play an important role in many natural language processing tasks, such as Question Answering, Named Entity Recognition, Natural Language Inference, Text Classification etc. BERT is …

What the numbers tell us about a “Kawhi Effect”

Enough storytelling. Let’s get right into the numbers. Both Kawhi and Paul George are superstars in the league. You can say whatever you want about the Warriors being injured, but Kawhi led the Toronto Raptors to their first NBA championship. Even though Damian Lillard waved goodbye to the Thunder and their franchise as we knew …

Linear Regression In Python

So you’ve decided to learn about machine learning. Whether you’re doing it for career reasons or strictly out of curiosity, you’ve come to the right place. In the following article, we’ll take a look at the “Hello World” of machine learning: linear regression. In the context of machine learning, when people speak of a model, …

Creating Reproducible Data Science Projects

A Nightmare Scenario Imagine you completed a one-off analysis a few months ago, creating a fairly complex data pipeline, machine learning model and visualisations. Fast forward to today and you have Emily, a senior executive at your company, asking you to reuse that work to help solve a similar, time-critical business problem. She looks stressed. …

The Human Side of Precision vs. Recall

A People-First Approach to Model Evaluation Seems like we lose either way. Using precision alone risks not catching students who would drop out. And, using recall alone risks predicting too many students will drop out. Let’s flip our mindset. We’ve been thinking about the data science first and then observing the effect on the students …
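
To see the trade-off in numbers, here is a small sketch that computes both metrics from confusion-matrix counts. The function name and the student counts are hypothetical, chosen only to illustrate the definitions; they do not come from the article.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from confusion-matrix counts."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    # Precision: of the students flagged as dropouts, how many really drop out?
    precision = true_positives / predicted_positive if predicted_positive else 0.0
    # Recall: of the students who really drop out, how many did we flag?
    recall = true_positives / actual_positive if actual_positive else 0.0
    return precision, recall


# Hypothetical counts: 30 dropouts flagged correctly, 10 students flagged
# wrongly, and 20 dropouts the model missed.
precision, recall = precision_recall(
    true_positives=30, false_positives=10, false_negatives=20
)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Flagging more students tends to raise recall and lower precision, which is the tension the excerpt describes.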

In the New Era of Knowledge, Connection Beats Collection Every Time

Photo by Clint Adair on Unsplash “Look at the data — the numbers don’t lie.” It’s an often given piece of advice, but a less often understood one. Because what the person giving the advice really means is “Look at the data, and think about what it means for the situation we’re facing. Once you …

What if Your Colleague is a Robot

Artificial Intelligence, Robotics and the Bizarre Future of Collaboration Throughout history, we have seen how organisations across entire industries have embraced robotic technology, and how today, it is almost impossible for some of these organisations to operate without it. Every day, we are witnesses of how technology is integrating into nearly every aspect of our …

The Little Robot that Lived at the Library

How we built an emotive social robot to guide library customers to books Our team at Futurice designed and built a social robot to guide people to books at Helsinki’s new central library, Oodi. Opened in 2018, Oodi is the biggest of Helsinki’s 37 public libraries. It has 10,000 visitors a day, and an estimated …

Sentiment Analysis: a benchmark

Recurrent neural networks explained. Classifying customer reviews using FCNNs, CNNs, RNNs and Embeddings. This article gently introduces recurrent units, how their memory works and how they are used to handle sequence data such as text. With hands-on practical Python code, we demonstrate limitations of simple recurrent neural networks and show how embeddings improve fully connected …

Domo Arigato, Misses Roboto

The Story of My Super Confusing Existential Relationship with Amazon Echo Don’t have time to read? Listen to a podcast of this episode. When Amazon first announced the Echo in 2014, I put my name on the waiting list to order the world’s first pure-voice-controlled robot assistant. Three days before Christmas, Alexa (Amazon Echo’s name) …

Learning SQL 201: Optimizing Queries, Regardless of Platform

TL;DR: Disk is Frickin’ Slow. Network is Worse. Caveat: As of this writing, I’ve used the following database-like systems in a production environment: MySQL, PostgreSQL, Hive, MapReduce on Hadoop, AWS Redshift, GCP BigQuery, in various mixes of on-prem/hybrid/cloud setups. My optimization knowledge largely stems from those. I’ll stick to strategies/thinking process here, but there are …

Predicting vs. Explaining

Even in academic fields like economics and other social sciences, the concepts of predictive power and explanatory power are often conflated — models showing high explanatory power are often assumed to be highly predictive. But the approach to building the best predictive model is totally different from the approach to building the best explanatory model, …

A gentle introduction to Recommendation Systems

If you are here, reading about Recommendation Systems, surely you already know what we’ll be talking about, so maybe you can just jump over this brief chapter. But if you came here attracted by the cover image, or if you want to know more about how Recommendation Systems emerged and grew up in the last …

Dependency Parser or how to find syntactic neighbours of a word

This article will go through the theory to demystify this insufficiently known part of NLP. Then, in a second article, we will suggest tools to help you understand how to easily implement a Dependency Parser. When we think about a word’s neighbors, we could think about the neighborhood as their location in a sentence, their …

Bist-Parser : an end-to-end implementation of a Dependency Parser

This article is the 2nd and last article on Dependency Parsing. We will give you some easy guidelines for implementation and the tools to help you improve it. A TreeBank is a parsed text corpus that annotates syntactic or semantic sentence structure. Dependency TreeBanks are created using different approaches: either thanks to human annotators …

Millennials’ Favorite Fruit: Forecasting Avocado Prices with ARIMA Models

Forecasts I think it’s pretty safe to say that the ship has sailed for Millennials in terms of being able to afford that two-bedroom I mentioned earlier. I mean Americans are spending around $7,000,000 a week on avocados so all hopes of the white picket fence should be dashed for Millennials. Thankfully, according to the …

Python Basics — Classes and Objects

It refers to defining a new class with little or no modification to an existing class. A sub-class is derived from a base class, inheriting its behaviour and adding behaviour specific to the sub-class.

Syntax

    # Base class
    class BaseClass:
        Body of base class

    # Derived class
    class DerivedClass(BaseClass):
        Body of derived class

Why Inheritance? Inheritance allows a derived class to inherit …
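
The syntax above can be made runnable with a small example. The class and method names below (`Animal`, `Dog`, `speak`, `describe`) are hypothetical and chosen only for illustration; they are not from the article.

```python
class Animal:
    """Base class: behaviour shared by every animal."""

    def __init__(self, name):
        self.name = name

    def speak(self):
        return f"{self.name} makes a sound"

    def describe(self):
        return f"{self.name} is an animal"


class Dog(Animal):
    """Derived class: inherits from Animal and overrides one method."""

    def speak(self):
        # Overrides the base-class behaviour with something dog-specific.
        return f"{self.name} barks"


dog = Dog("Rex")
print(dog.speak())     # uses the overridden method in Dog
print(dog.describe())  # inherited unchanged from Animal
```

`Dog` defines no `__init__` or `describe` of its own, so both are looked up on `Animal`; only `speak` is replaced.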

Relative vs Absolute: How to Do Compositional Data Analyses. Part — 2

This is a continuation of my earlier post on compositional data analyses where I showed the pitfalls of treating compositional data as absolute data instead of relative data. In this post, I will summarize the techniques we can use to correctly analyze compositional data with specific examples demonstrated using RNA-Seq data. Two main strategies exist …

Predicting unknown unknowns

Reference Paper: Reducing Network Agnostophobia: https://arxiv.org/pdf/1811.04110.pd For classification models in many domains and scenarios, it is important to predict when the input given to the model does not belong to the classes it was trained on. For computer vision / object detector models, the authors provide the following justification: Object detectors have evolved over time from using …

An Introduction to Recurrent Neural Networks for Beginners

A simple walkthrough of what RNNs are, how they work, and how to build one from scratch in Python. Recurrent Neural Networks (RNNs) are a kind of neural network that specializes in processing sequences. They’re often used in Natural Language Processing (NLP) tasks because of their effectiveness in handling text. In this post, we’ll explore …
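
To give a feel for what "processing sequences" means, here is a minimal sketch of a vanilla RNN forward pass in NumPy. It assumes the common tanh recurrence h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h); the sizes, weight names and random inputs are hypothetical, and this is not the article's own implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 3 input features per time step, 4 hidden units.
input_size, hidden_size = 3, 4

# Small random weights; a real model would learn these during training.
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)


def rnn_forward(inputs):
    """Run the recurrence over a sequence and return the final hidden state."""
    h = np.zeros(hidden_size)
    for x in inputs:
        # The new state depends on the current input and the previous state.
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
    return h


# A made-up sequence of 5 time steps.
sequence = rng.normal(size=(5, input_size))
final_hidden = rnn_forward(sequence)
print(final_hidden.shape)
```

The same weights are reused at every time step, which is what lets the network handle sequences of any length.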

Introduction to dtplyr

Learn how to easily combine dplyr’s readability with data.table’s performance! I recently saw a Tweet by Hadley Wickham about the release of dtplyr. It is a package that enables working with dplyr syntax on data.table objects. dtplyr automatically translates the dplyr syntax to the data.table equivalent, which in the end results in a performance boost. Marvel: …

Managing R&D in Data Science — Part 1

Lessons learned: what worked and what did not go so well As long as teams exist, inter-communication issues arise (credit to John B.) When I accepted the challenging opportunity to become the Head of an R&D team, I thought I could mimic some methodologies from other tech companies. A lot of these companies do R&D, …