Toronto on Fire in Data, Part 1

Fire Incidents Analysis by Segmentation & Poisson in Practice Each year the Toronto Fire Services (TFS) are dispatched to between 9,000 and 10,000 fires in the city of 2.7 million inhabitants. The severity ranges from minor fires in grass or rubbish to major fires in warehouses or residential high-rises. In this study, the first of a … Read more

Introducing DoWhy

Microsoft’s Framework for Causal Inference The human mind has a remarkable ability to associate causes with a specific event. From the outcome of an election to an object dropping on the floor, we are constantly associating chains of events that cause a specific effect. Neuropsychology refers to this cognitive ability as causal reasoning. Computer science … Read more

What’s happening on the roads of Bangalore?

A visual analysis of traffic accidents and other incidents on roads of Bengaluru/Bangalore in India. Image: High speed traffic on NICE Road, Bangalore. OBJECTIVE We spend a lot of time on roads, stuck in traffic jams, which are usually caused by overflow of vehicles at specific times of day or unexpected incidents at any … Read more

Women Leading in AI — 10 Principles for Responsible AI

“To establish an education and training programme to meet the needs identified by the skills audit, including content on data ethics and social responsibility. As part of that, we recommend the set up of a solid, courageous and rigorous programme to encourage young women and other underrepresented groups into technology.” — Recommendation 10 Men do science and … Read more

Web Scraping Craigslist: A Complete Tutorial

I’ve been looking to make a move recently. And what better way to know I’m getting a good price than to sample from the “population” of housing on Craigslist? Sounds like a job for…Python and web scraping! In this article, I’m going to walk you through my code that scrapes East Bay Area Craigslist for … Read more

Predicting Rich Attributes in Real Estate Images Using fastai

by David Samuel and Naveen Kumar Overview Visual attribute search can greatly improve the user experience, and SEO for home listing and travel websites. Although Zillow, Redfin, Airbnb and TripAdvisor have some metadata already about the amenities of a property, they can expand searchable attributes by analyzing the property images with vision models. In this … Read more

Predicting the social determinants of health

DataScience@HF Social determinants of health — factors outside of a person’s direct health status such as employment, where they live, and education level — drive at least twenty percent of health outcomes. At Healthfirst, members facing challenges related to these determinants go to the hospital 30% more, stay up to 2 days longer in the hospital when they go, … Read more

Technology and the Origins of Creativity

AI and emerging technologies bring humanity powerful new tools, but are we ready to transition to them? by Dirk Knemeyer and Jonathan Follett The smartware evolution We are on the cusp of the next major evolution in computing technologies that will unleash significant changes in our work and our lives. While artificial intelligence (AI) receives outsized … Read more

Almost Everything You Need to Know About Time Series

Understand moving average, exponential smoothing, stationarity, autocorrelation, SARIMA, and more Photo by Lukas Blazek on Unsplash Whether we wish to predict the trend in financial markets or electricity consumption, time is an important factor that must now be considered in our models. For example, it would be interesting to not only know when a stock will move … Read more

15 Docker Commands You Should Know

Part 5 of Learn Enough Docker to be Useful In this article we’ll look at 15 Docker CLI commands you should know. If you haven’t yet, check out the rest of this series on Docker concepts, the ecosystem, Dockerfiles, and keeping your images slim. In Part 6 we’ll explore data with Docker. I’ve got a series … Read more

The keys of Deep Learning in 100 lines of code

In search of the mystery function A lot of what happens in the universe can be expressed with functions. A function is a mathematical construction that takes an input and produces an output. Cause and effect. Input and Output. When we look at the world and its challenges, we see information, we see data. And we can … Read more

Review: LapSRN & MS-LapSRN — Laplacian Pyramid Super-Resolution Network (Super Resolution)

Progressively Reconstructs Residuals, Charbonnier Loss, Parameter Sharing, Local Residual Learning, Outperforms SRCNN, VDSR, DRCN, DRRN 32×, 16×, 8×, 4× and 2× SR In this story, LapSRN (Laplacian Pyramid Super-Resolution Network) and MS-LapSRN (Multi-Scale Laplacian Pyramid Super-Resolution Network) are reviewed. By progressively reconstructing the sub-band residuals and using Charbonnier loss functions, LapSRN outperforms SRCNN, FSRCNN, VDSR, and DRCN. With … Read more

Explaining Data Science/Artificial Intelligence

What do you do? This is a difficult question for Data Scientists to answer, although they’re asked it often, which is why we’re writing this article. We will try to explain what they do for a living while covering the basics of Artificial Intelligence. So, we’re going to focus this article on what … Read more

Machine Learning — Text Classification, Language Modelling using

Applying latest deep learning techniques for text processing Transfer learning is a technique where instead of training a model from scratch, we reuse a pre-trained model and then fine-tune it for another related task. It has been very successful in computer vision applications. In natural language processing (NLP) transfer learning was mostly limited to the … Read more

Advancing Open Domain Dialog Systems Through Alexa Prize

Building an Open-Domain Dialogue system is one of the most challenging tasks. Almost all of the tasks related to Open-Domain Dialogue system are believed to be “AI-complete”. In other words, solving the problems of Open-Domain Dialogue systems would need “true intelligence” or “human intelligence”. Open-Domain Dialogue systems require the understanding of Natural Languages. The absence … Read more

Zooming In and Zooming Out

A Note on Qualitative Sample Sizes Zooming in or zooming out? How close a picture you need will impact your sample size. (Photos from @ryoji__iwata and @13on.) This article covers: why small sample sizes are acceptable in qualitative research; what the overall goals of qualitative research are; how to determine sample size, and what … Read more

Understanding Encoder-Decoder Sequence to Sequence Model

In this article, I will try to give a short and concise explanation of sequence to sequence models, which have recently achieved significant results on pretty complex tasks like machine translation, video captioning, question answering etc. Prerequisites: the reader should already be familiar with neural networks and, in particular, recurrent neural networks (RNNs). In … Read more

What is a Recurrent NNs and Gated Recurrent Unit (GRUS)

Photo by Tom Grimbert on Unsplash Recurrent Neural Networks (RNNs) are popular models that have shown great promise on many sequential-data tasks and are used by, among others, Apple’s Siri and Google’s Voice Search. Their great advantage is that the algorithm remembers its input, thanks to an internal memory. But despite their recent popularity there exists a … Read more

Spectral clustering

The intuition and math behind how it works! What is clustering? Clustering is a widely used unsupervised learning method. The grouping is such that points in a cluster are similar to each other, and less similar to points in other clusters. Thus, it is up to the algorithm to find patterns in the data and group … Read more

Machine Learning for Anyone who Took Math in 8th Grade

Explaining modern “AI” with easy math, pop-culture references, and oversimplified analogies I usually see Artificial Intelligence explained in 1 of 2 ways: through the increasingly sensationalist perspective of the media, or through dense scientific literature riddled with superfluous language and field-specific terms. There’s a less publicized area in between these extremes where I think … Read more

NLP Kaggle Competition

Class Imbalance As we saw above, we have a class imbalance problem. Imbalanced classes are a common problem in machine learning classification where there is a disproportionate ratio of observations in each class. (In this post I explore methods for dealing with class imbalance.) With just 6.6% of our dataset belonging to the target class, … Read more

Predicting the Frequency of Asteroid Impacts with a Poisson Processes

Simulating Asteroid Impacts Our objective is to determine the probability distribution of the number of expected impacts in each size category, which means we need a time range. To keep things in perspective, we’ll start with 100 years, about the lifespan of a human. This means our distribution will show the probabilities for number of impacts … Read more
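As a rough illustration of the idea in this teaser, the sketch below computes a Poisson distribution for the number of impacts over a 100-year window. The impact rate is a made-up placeholder, not a figure from the article, and the names `poisson_pmf`, `RATE_PER_YEAR`, and `YEARS` are hypothetical.

```python
import math


def poisson_pmf(k, expected):
    """Probability of exactly k events when `expected` events occur on average."""
    return math.exp(-expected) * expected ** k / math.factorial(k)


# ASSUMPTION: a made-up rate for one size category (one impact per 200 years).
RATE_PER_YEAR = 0.005
YEARS = 100  # the time range mentioned in the teaser

# Expected number of impacts in the window is rate * time.
expected_impacts = RATE_PER_YEAR * YEARS

# Probabilities of seeing 0, 1, ..., 5 impacts within the window.
distribution = {k: poisson_pmf(k, expected_impacts) for k in range(6)}

for k, prob in distribution.items():
    print(f"P({k} impacts in {YEARS} years) = {prob:.4f}")
```

Changing `YEARS` or the per-category rate rescales the expected count, which is the only parameter the Poisson distribution needs.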

Semantic Segmentation of Aerial images Using Deep Learning

What is Semantic Segmentation? What are its Practical Applications? Semantic segmentation of drone images to classify different attributes is quite a challenging job because the variations are very large: you can’t expect the places to be the same. Manually segmenting these images for use in different applications is a challenge and a … Read more

February Edition: Data Visualization

8 of the best articles on visualizing data Data visualization is an essential step in any data science process. It’s the final bridge between the data scientist and end users. It communicates, validates, confronts and educates. And when done correctly, it opens up the insights from a data science project to a wider audience. Great … Read more

Building a Better Profanity Detection Library with scikit-learn

Why existing libraries are uninspiring and how I built a better one. A few months ago, I needed a way to detect profanity in user-submitted text strings. This shouldn’t be that hard, right? I ended up building and releasing my own library for this purpose called profanity-check. Of course, before I did that, I looked in the … Read more

ML Algorithms: One SD (σ)- Regression

An intro to machine learning regression algorithms The obvious question to ask when facing a wide variety of machine learning algorithms is “which algorithm is better for a specific task, and which one should I use?” The answer varies depending on several factors, including: (1) The size, quality, and nature of data; (2) The … Read more

A Guide to Data Visualisation in R for Beginners

Visualisation libraries in R R comes equipped with sophisticated visualisation libraries having great capabilities. Let us have a closer look at some of the commonly used ones. In this section, we will use the built-in mtcars dataset to show the uses of the various libraries. This dataset has been extracted from the 1974 Motor Trend US … Read more

Blender 2.8 Grease Pencil Scripting and Generative Art

By 5agado, Feb 4. Quick, Draw! — Flock — Conway’s Game of Life What: learning the basics of scripting for Blender Grease-Pencil tool, with focus on generative art as a concrete playground. Less talking, more code (commented) and many examples. Why: mostly because we can. Also because Blender is a very rich ecosystem, and Grease-Pencil in version 2.8 is a powerful … Read more

Is the #10YearChallenge A Sign of the AI Apocalypse?

Viral social media “challenges,” memes, and gimmicks have taken over our feeds in recent years. The term “challenge” is used loosely though since these viral sensations aren’t so much challenging as they are just unique ways to spice up your social media presence. But are they also signs of the impending AI apocalypse? Let’s look … Read more

How It Feels to Learn Data Science in 2019

Seeing the (Random) Forest Through the (Decision) Trees The following is inspired by the article How it Feels to Learn JavaScript in 2016. Do not take this too seriously. This piece is just an opinion, much like people’s definition of data science. I heard you are the one to go to. Thank you for meeting … Read more

Maximum Likelihood Estimation

Coin Flip MLE Let’s derive the MLE estimator for our coin flip model from before. I’ll cover the MLE estimator for our linear model in a later post on linear regression. Recall that we’re modeling the outcome of a coin flip by a Bernoulli distribution, where the parameter p represents the probability of getting a heads. … Read more
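A minimal sketch of the coin-flip case described above, assuming independent flips encoded as 1 for heads and 0 for tails. For a Bernoulli model the log-likelihood is maximized at the sample proportion of heads; the data and the function names below are made up for illustration and are not taken from the article.

```python
import math


def bernoulli_log_likelihood(p, flips):
    """Log-likelihood of a sequence of 0/1 flips under Bernoulli(p), 0 < p < 1."""
    heads = sum(flips)
    tails = len(flips) - heads
    return heads * math.log(p) + tails * math.log(1 - p)


def bernoulli_mle(flips):
    """Closed-form maximum likelihood estimate: the fraction of heads."""
    return sum(flips) / len(flips)


# ASSUMPTION: made-up data, 7 heads in 10 flips.
flips = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
p_hat = bernoulli_mle(flips)

print("MLE of p:", p_hat)
print("log-likelihood at MLE:", bernoulli_log_likelihood(p_hat, flips))
print("log-likelihood at 0.5:", bernoulli_log_likelihood(0.5, flips))
```

Comparing the log-likelihood at `p_hat` with the value at any other `p` is a quick sanity check that the closed form really is the maximizer.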

Review: DRRN — Deep Recursive Residual Network (Super Resolution)

Up to 52 Convolutional Layers, With Global and Local Residual Learnings, Outperforms SRCNN, FSRCNN, ESPCN, VDSR, DRCN, and RED-Net. Digital Image Enlargement, The Need of Super Resolution In this story, DRRN (Deep Recursive Residual Network) is reviewed. With Global Residual Learning (GRL) and Multi-path mode Local Residual Learning (LRL), plus the recursive learning to control the … Read more

Python Basics: Mutable vs Immutable Objects

After reading this blog post you’ll know: what an object’s identity, type, and value are; what mutable and immutable objects are. Introduction (Objects, Values, and Types) All data in a Python program is represented by objects or by relations between objects. Every object has an identity, a type, and a value. Identity An … Read more
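To make identity, type, and value concrete, here is a small sketch using only built-ins (`id`, `type`, and the `is` operator). The variable names are illustrative and not taken from the article.

```python
# A list is mutable: changing it in place keeps the same object (same identity).
numbers = [1, 2, 3]
numbers_alias = numbers          # second name bound to the same object
numbers.append(4)                # in-place mutation
list_is_same_object = numbers is numbers_alias

# A string is immutable: "changing" it builds a new object and rebinds the name.
text = "data"
text_original = text             # keeps a reference to the original object
text = text + " science"         # creates a new string object
str_is_same_object = text is text_original

print(type(numbers).__name__, id(numbers), numbers, list_is_same_object)
print(type(text).__name__, id(text), text, str_is_same_object)
```

The alias sees the appended element because both names refer to one list, while `text_original` still holds the untouched string.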

Tweets Data Visualization with Circles and User Interaction

Adding Interactivity: Tweet Info by Click After plotting and packing all the circles, we can make each circle work like a button. To achieve this, we can use the function fig.canvas.mpl_connect. The function takes two arguments: the first is a string that corresponds to the type of interaction (in our case … Read more

Understanding Studies of Racial Demarcations

Studies of racial demarcations are typically carried out in the context of what are referred to as regression analyses. Simply put, a regression enables assessments of relations between some variable of interest, say students’ test scores, and variables that define said students, such as race, family income, parents’ professions, parents’ education etc. Pictorially, with x’s denoting variables … Read more

Learning aggregate functions

Machine Learning with relational data This article is inspired by the Kaggle competition. While I did not participate in the competition, I used the data to explore another problem that often arises when working with realistic data. All machine learning algorithms work great with tabular data, but in reality a lot of data … Read more

These are the Easiest Data Augmentation Techniques in Natural Language Processing you can think of…

Augmentation operations for NLP proposed in [this paper]. SR=synonym replacement, RI=random insertion, RS=random swap, RD=random deletion. The Github repository for these techniques can be found [here]. Data augmentation is commonly used in computer vision. In vision, you can almost certainly flip, rotate, or mirror an image without risk of changing the original label. However, in natural … Read more

Transfer Learning using ELMO Embedding

Last year, the major developments in “Natural Language Processing” were about Transfer Learning. Basically, Transfer Learning is the process of training a model on a large-scale dataset and then using that pre-trained model to learn another target task. Transfer Learning became popular in the field of NLP thanks to the state-of-the-art performance of … Read more

Model-Free Prediction: Reinforcement Learning

Part 4: Model-Free Predictions with Monte-Carlo Learning, Temporal-Difference Learning and TD(λ) Previously, we looked at planning by dynamic programming to solve a known MDP. In this post, we will use model-free prediction to estimate the value function of an unknown MDP. That is, we will look at policy evaluation of an unknown MDP. This series of … Read more

Matplotlib Tutorial: Learn basics of Python’s powerful Plotting library

What is Matplotlib To make statistical inferences, it becomes necessary to visualize your data, and Matplotlib is one such solution for Python users. It is a very powerful plotting library useful for those working with Python and NumPy. The most used module of Matplotlib is Pyplot, which provides an interface like MATLAB but … Read more

Introduction to TWO approaches of Content-based Recommendation System

A complete guide to resolve the confusion Content-based filtering is one of the common methods in building recommendation systems. While researching the details, I found it interesting that there are 2 approaches that claim to be “Content-based”. Below I will share my findings and hope it can … Read more

Machine Learning and Particle Motion in Liquids: An Elegant Link

The gradient descent algorithm is one of the most popular optimization techniques in machine learning. It comes in three flavors: batch or “vanilla” gradient descent (GD), stochastic gradient descent (SGD), and mini-batch gradient descent, which differ in the amount of data used to compute the gradient of the loss function at each iteration. The goal … Read more
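The three flavors named above can be sketched with a single routine in which only the batch size changes. This is a toy illustration under stated assumptions: a one-parameter least-squares fit of y = w·x on made-up, noise-free data, with a hypothetical function name, learning rate, and epoch count that are not taken from the article.

```python
import random


def gradient_descent(xs, ys, batch_size, lr=0.01, epochs=200, seed=0):
    """Fit y = w * x by minimizing mean squared error.

    batch_size == len(xs) -> batch ("vanilla") gradient descent
    batch_size == 1       -> stochastic gradient descent
    anything in between   -> mini-batch gradient descent
    """
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over the current batch only.
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad
    return w


# ASSUMPTION: made-up noise-free data generated with a true slope of 2.0.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [2.0 * x for x in xs]

w_batch = gradient_descent(xs, ys, batch_size=len(xs))
w_sgd = gradient_descent(xs, ys, batch_size=1)
w_mini = gradient_descent(xs, ys, batch_size=4)

print(w_batch, w_sgd, w_mini)
```

On noise-free data all three variants settle on the same slope; with noisy data the smaller batches would give noisier but cheaper updates.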

Three steps for a successful machine learning project

Less technical considerations to make for all ML projects As people and companies venture into machine learning (ML), it is common for some to expect to dive right into building models and generating useful output. And while some parts of ML feel like this technical wizardry with magical predictions, there are other aspects that are less … Read more

Contextual Embeddings for NLP Sequence Labeling

Text representation (aka text embeddings) is a breakthrough in solving NLP tasks. In the beginning, a single word vector represented a word even though the word carries different meanings in different contexts. For example, “Washington” can be a location, a name, or a state. “University of Washington” Zalando released an amazing NLP library, flair, that makes our lives easier. It already implements … Read more