Hybrid Humans and Conscious Robots

Musings on the intersection of Artificial Intelligence, Consciousness, and Reinforcement Learning At what level are you conscious? Staring into the eyes of a comatose loved one, many of us have agonized over whether the patient was conscious of caresses received or whispered prayers. Increasingly, we will have answers to such questions, thanks in a large … Read more

People Tracking using Deep Learning

Doing cool things with data! Introduction Object Tracking is an important domain in computer vision. It involves the process of tracking an object which could be a person, ball or a car across a series of frames. For people tracking we would start with all possible detections in a frame and give them an ID. In … Read more

Supervised Machine Learning: Model Validation, a Step by Step Approach

Model validation is the process of evaluating a trained model on a test data set. This provides an estimate of the generalization ability of the trained model. Here I provide a step-by-step approach to complete a first iteration of model validation in minutes. The basic recipe for applying a supervised machine learning model is: Choose a class of model … Read more

BigQuery without a credit card: Discover, learn and share

If you ever had trouble signing up for BigQuery, worry no more — now it’s easier than ever to sign up and start querying. The new sandbox mode even includes free storage, no credit card required. See the official blog post “Query without a credit card: introducing BigQuery sandbox” for more details. Here we are going to … Read more

Learn Enough Python to be Useful Part 2

How to Use if __name__ == "__main__" This article is one in a series to help you become comfortable in Python scripting land. It’s for data scientists and anyone new to Python programming. if __name__ == "__main__": is one of those things you see in Python scripts that often isn’t explained. You might have … Read more
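A minimal sketch of the idiom, for illustration only: the names `greet` and `main` are hypothetical and do not come from the article. Code under the guard runs when the file is executed as a script, but not when the file is imported as a module.

```python
def greet(name: str) -> str:
    """Return a greeting; safe to import because it has no side effects."""
    return f"Hello, {name}!"


def main() -> None:
    # Script entry point: only reached when the file is run directly.
    print(greet("world"))


# __name__ is "__main__" when this file is executed as a script,
# and the module's own name when it is imported from elsewhere.
if __name__ == "__main__":
    main()
```

Keeping the script logic inside `main()` lets other modules import `greet` without triggering the print.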

Intuitive Deep Learning Part 1a: Introduction to Neural Networks

As mentioned above, Deep Learning is simply a subset of the architectures (or templates) that employs “neural networks” which we can specify during Step 1. “Neural networks” (more specifically, artificial neural networks) are loosely based on how our human brain works, and the basic unit of a neural network is a neuron. At the basic … Read more

Fashion Science takes on Seasonal Color Analysis

Turns out ‘seasons’ just aren’t found in the data Wear the right color clothes and be more attractive! That’s the allure of seasonal color analysis. By appropriately placing you into one of four seasons — spring, summer, autumn, and winter — each has its own palette of colors appropriate for you. This paper applies Fashion Science to explore a simple … Read more

Image Classification for E-commerce [Part I]

System Requirements Download or clone the ResNet model from Facebook’s Github link. Install the Torch ResNet dependencies on Ubuntu 14.04+: Install Torch on a machine with CUDA GPU (NVIDIA GPU with compute capability 3.5 or above) Install cuDNN v4 or v5 and the Torch cuDNN bindings See the installation instructions for a step-by-step guide. Let’s … Read more

Perfume Recommendations using Natural Language Processing

Introduction Natural Language Processing(NLP) has many intriguing applications to Recommender Systems and Information Retrieval. As a perfume lover and a Data Scientist, the unusual and highly descriptive language used in the niche perfume community inspired me to use NLP to create a model to help me discover perfumes I might want to purchase. “Niche” perfumes … Read more

Hand Keypoints Detection

Detect the keypoint positions on hand images with a small training data set. How many labelled images are needed to train a network to accurately predict the locations of fingers and palm lines? I was inspired by this blog post where the author reported 97.5% accuracy in classifying whether a person was wearing glasses with … Read more

Your Data is Using a lot of Energy

All the data collected on users and saved in the cloud is having a big impact on the environment If you begin to look into the amount of data created each year, you’ll quickly find a statistic that at first glance seems overused and outdated: 90% of data ever created was created in the … Read more

Algorithms for Text Classification — Part 1: Naive Bayes

Next, let’s see how to run this algorithm using Python with real data:

import pandas as pd
import numpy as np

spam_data = pd.read_csv('spam.csv')
spam_data['target'] = np.where(spam_data['target'] == 'spam', 1, 0)
print(spam_data.shape)
spam_data.head(10)

from sklearn.model_selection import train_test_split

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(spam_data['text'], spam_data['target'], random_state=0)

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import roc_auc_score

# Train and evaluate … Read more

How Do I Write About Data Science On Medium

5 Core Principles to Write about Data Science, and Beyond (Source) 1. Be conversational Your articles are always read by individual readers — one reader at any given time. What this means is that readers mostly read your articles individually without anyone beside them. Therefore, to really attract and engage with readers, your writing should be in a … Read more

Community Forums Meets Data Science

Analysis of forum members’ activity, posts, and behavior Summary As a community builder and strategist with a passion for data science, I have found that the use of data science techniques has deepened my understanding of the communities I manage, allowing me to make better strategic and operational decisions. In this article, I aim to exemplify how … Read more

AI: The Future of Technology and the World

Artificial intelligence (AI) has become a bigger topic of controversy than ever before. Many people are worried about robots taking over the world. The concept of AI scares people because they are afraid that we are creating bots without any idea of how they work. But what if I … Read more

Community detection of survey responses based on Pearson correlation coefficient with Neo4j

Just a few days ago a new version of the Neo4j graph algorithms plugin was released. With the new release come new algorithms, and the Pearson correlation algorithm is one of them. To demonstrate how to use the Pearson correlation algorithm in Neo4j, we will use the data from the “Young People Survey” Kaggle dataset made available by Miroslav … Read more
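For readers unfamiliar with the measure itself, here is a rough sketch of the Pearson correlation coefficient in plain Python. It does not use Neo4j or the graph algorithms plugin; the function name `pearson` and the sample vectors are assumptions made purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical survey answers from two respondents on a 1-5 scale.
respondent_a = [5, 3, 4, 1, 2]
respondent_b = [4, 3, 5, 1, 1]
print(round(pearson(respondent_a, respondent_b), 3))
```

Values near 1 indicate that two respondents answer in a similar pattern, which is the kind of similarity a community detection step could then build on.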

Toronto on Fire in Data, Part 1

Fire Incidents Analysis by Segmentation & Poisson in Practice Each year the Toronto Fire Services (TFS) are dispatched to between 9,000 and 10,000 fires in the city of 2.7 million inhabitants. The severity ranges from minor fires in grass or rubbish to major fires in warehouses or residential high-rises. In this study, the first of a … Read more

Introducing DoWhy

Microsoft’s Framework for Causal Inference The human mind has a remarkable ability to associate causes with a specific event. From the outcome of an election to an object dropping on the floor, we are constantly associating chains of events that cause a specific effect. Neuropsychology refers to this cognitive ability as causal reasoning. Computer science … Read more

What’s happening on the roads of Bangalore?

A visual analysis of traffic accidents and other incidents on roads of Bengaluru/Bangalore in India. Image: High speed traffic on NICE Road, Bangalore. Copyrighted Source: https://500px.com/photo/12002497/speed-by-supratim-haldar OBJECTIVE We spend a lot of time on roads, stuck in traffic jams, which are usually caused by overflow of vehicles at specific times of day or unexpected incidents at any … Read more

Women Leading in AI — 10 Principles for Responsible AI

“To establish an education and training programme to meet the needs identified by the skills audit, including content on data ethics and social responsibility. As part of that, we recommend the set up of a solid, courageous and rigorous programme to encourage young women and other underrepresented groups into technology.” — Recommendation 10 Men do science and … Read more

Web Scraping Craigslist: A Complete Tutorial

I’ve been looking to make a move recently. And what better way to know I’m getting a good price than to sample from the “population” of housing on Craigslist? Sounds like a job for…Python and web scraping! In this article, I’m going to walk you through my code that scrapes East Bay Area Craigslist for … Read more

Predicting Rich Attributes in Real Estate Images Using fastai

by David Samuel and Naveen Kumar Overview Visual attribute search can greatly improve the user experience, and SEO for home listing and travel websites. Although Zillow, Redfin, Airbnb and TripAdvisor have some metadata already about the amenities of a property, they can expand searchable attributes by analyzing the property images with vision models. In this … Read more

Predicting the social determinants of health

DataScience@HF Social determinants of health — factors outside of a person’s direct health status such as employment, where they live, and education level — drive at least twenty percent of health outcomes. At Healthfirst, members facing challenges related to these determinants go to the hospital 30% more, stay up to 2 days longer in the hospital when they go, … Read more

Technology and the Origins of Creativity

AI and emerging technologies bring humanity powerful new tools, but are we ready to transition to them? by Dirk Knemeyer and Jonathan Follett The smartware evolution We are on the cusp of the next major evolution in computing technologies that will unleash significant changes in our work and our lives. While artificial intelligence (AI) receives outsized … Read more

Almost Everything You Need to Know About Time Series

Understand moving average, exponential smoothing, stationarity, autocorrelation, SARIMA, and more Photo by Lukas Blazek on Unsplash Whether we wish to predict the trend in financial markets or electricity consumption, time is an important factor that must now be considered in our models. For example, it would be interesting to not only know when a stock will move … Read more
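Two of the techniques named above, the moving average and simple exponential smoothing, can be sketched in a few lines of plain Python. This is an illustrative sketch, not the article's code: the function names, the window size, and the smoothing factor `alpha` are all assumptions.

```python
def moving_average(series, window):
    """Mean of each consecutive block of `window` observations."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be in [0, 1]")
    if not series:
        return []
    smoothed = [series[0]]  # the first smoothed value is the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical observations, e.g. daily closing prices.
prices = [1, 2, 3, 4, 5]
print(moving_average(prices, 3))
print(exponential_smoothing(prices, 0.5))
```

A larger `window` or a smaller `alpha` gives a smoother curve that reacts more slowly to recent changes.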

15 Docker Commands You Should Know

Part 5 of Learn Enough Docker to be Useful In this article we’ll look at 15 Docker CLI commands you should know. If you haven’t yet, check out the rest of this series on Docker concepts, the ecosystem, Dockerfiles, and keeping your images slim. In Part 6 we’ll explore data with Docker. I’ve got a series … Read more

The keys of Deep Learning in 100 lines of code

In search of the mystery function A lot of what happens in the universe can be expressed with functions. A function is a mathematical construction that takes an input and produces an output. Cause and effect. Input and Output. When we look at the world and its challenges, we see information, we see data. And we can … Read more

Review: LapSRN & MS-LapSRN — Laplacian Pyramid Super-Resolution Network (Super Resolution)

Progressively Reconstructs Residuals, Charbonnier Loss, Parameter Sharing, Local Residual Learning, Outperforms SRCNN, VDSR, DRCN, DRRN 32×, 16×, 8×, 4× and 2× SR In this story, LapSRN (Laplacian Pyramid Super-Resolution Network) and MS-LapSRN (Multi-Scale Laplacian Pyramid Super-Resolution Network) are reviewed. By progressively reconstructing the sub-band residuals, with Charbonnier loss functions, LapSRN outperforms SRCNN, FSRCNN, VDSR, and DRCN. With … Read more

Explaining Data Science/Artificial Intelligence

What do you do ? This is a difficult one for Data Scientists to answer, although they’re often asked this very question which is why we’re writing this article. We will try to explain what they do for a living while covering the basics of Artificial Intelligence. So, we’re going to focus this article on what … Read more

Machine Learning — Text Classification, Language Modelling using fast.ai

Applying latest deep learning techniques for text processing Transfer learning is a technique where instead of training a model from scratch, we reuse a pre-trained model and then fine-tune it for another related task. It has been very successful in computer vision applications. In natural language processing (NLP) transfer learning was mostly limited to the … Read more

Advancing Open Domain Dialog Systems Through Alexa Prize

Building an Open-Domain Dialogue system is one of the most challenging tasks. Almost all of the tasks related to Open-Domain Dialogue system are believed to be “AI-complete”. In other words, solving the problems of Open-Domain Dialogue systems would need “true intelligence” or “human intelligence”. Open-Domain Dialogue systems require the understanding of Natural Languages. The absence … Read more

Zooming In and Zooming Out

A Note on Qualitative Sample Sizes Zooming in or zooming out? How close a picture you need will impact your sample size. (Photos from @ryoji__iwata and @13on via unsplash.com) This article covers: Why small sample sizes are acceptable in qualitative research What the overall goals of qualitative research are How to determine sample size, and what … Read more

Understanding Encoder-Decoder Sequence to Sequence Model

In this article, I will try to give a short and concise explanation of sequence-to-sequence models, which have recently achieved significant results on fairly complex tasks like machine translation, video captioning, and question answering. Prerequisites: the reader should already be familiar with neural networks and, in particular, recurrent neural networks (RNNs). In … Read more

What is a Recurrent NNs and Gated Recurrent Unit (GRUS)

Photo by Tom Grimbert on Unsplash Recurrent Neural Networks (RNNs) are popular models that have shown great promise on many kinds of sequential data and are used by, among others, Apple’s Siri and Google’s Voice Search. Their great advantage is that the algorithm remembers its input, due to an internal memory. But despite their recent popularity there exists a … Read more

Spectral clustering

The intuition and math behind how it works! What is clustering? Clustering is a widely used unsupervised learning method. The grouping is such that points in a cluster are similar to each other, and less similar to points in other clusters. Thus, it is up to the algorithm to find patterns in the data and group … Read more

Machine Learning for Anyone who Took Math in 8th Grade

Explaining modern “AI” with easy math, pop-culture references, and oversimplified analogies I usually see Artificial Intelligence explained in 1 of 2 ways: through the increasingly sensationalist perspective of the media, or through dense scientific literature riddled with superfluous language and field-specific terms. Source — a classic There’s a less publicized area in between these extremes where I think … Read more

NLP Kaggle Competition

Class Imbalance As we saw above, we have a class imbalance problem. Imbalanced classes are a common problem in machine learning classification where there is a disproportionate ratio of observations in each class. (In this post I explore methods for dealing with class imbalance.) With just 6.6% of our dataset belonging to the target class, … Read more
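One common way to compensate for imbalance, not necessarily the approach taken in the post, is to weight each class inversely to its frequency. The sketch below uses the widely used heuristic n_samples / (n_classes * class_count); the function name and the synthetic labels, chosen so that 6.6% belong to the target class, are assumptions for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Synthetic labels: 66 of 1000 observations (6.6%) are in the target class.
labels = [0] * 934 + [1] * 66
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, each class contributes the same total weight to a weighted loss, so the rare target class is not drowned out by the majority class.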

Predicting the Frequency of Asteroid Impacts with a Poisson Processes

Simulating Asteroid Impacts Our objective is to determine the probability distribution of the number of expected impacts in each size category, which means we need a time range. To keep things in perspective, we’ll start with 100 years, about the lifespan of a human. This means our distribution will show the probabilities for the number of impacts … Read more
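As a hedged sketch of the underlying calculation: if impacts in one size category arrive as a Poisson process with a constant annual rate, the number of impacts in a window of T years follows a Poisson distribution with mean rate × T. The rate used below (0.01 impacts per year) is a made-up placeholder, not a figure from the article, and the function names are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(N = k) for a Poisson distribution with mean lam."""
    if k < 0 or lam < 0:
        raise ValueError("k and lam must be non-negative")
    return exp(-lam) * lam**k / factorial(k)


def impact_distribution(rate_per_year, years, max_k):
    """Probabilities of 0..max_k impacts in a window of `years` years."""
    lam = rate_per_year * years  # expected number of impacts in the window
    return [poisson_pmf(k, lam) for k in range(max_k + 1)]


# Placeholder rate: one impact per 100 years on average (an assumption).
dist = impact_distribution(rate_per_year=0.01, years=100, max_k=20)
for k, p in enumerate(dist[:4]):
    print(f"P({k} impacts in 100 years) = {p:.4f}")
```

Changing `years` rescales the mean, so the same function covers longer horizons once a rate for each size category is available.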

Semantic Segmentation of Aerial images Using Deep Learning

What is Semantic Segmentation? What are its Practical Applications? Semantic segmentation of drone images to classify different attributes is quite a challenging job because the variations are very large; you can’t expect the places to be the same. And doing manual segmentation of these images to use them in different applications is a challenge and a … Read more

February Edition: Data Visualization

8 of the best articles on visualizing data Data visualization is an essential step in any data science process. It’s the final bridge between the data scientist and end users. It communicates, validates, confronts and educates. And when done correctly, it opens up the insights from a data science project to a wider audience. Great … Read more

Building a Better Profanity Detection Library with scikit-learn

Why existing libraries are uninspiring and how I built a better one. A few months ago, I needed a way to detect profanity in user-submitted text strings: This shouldn’t be that hard, right? I ended up building and releasing my own library for this purpose called profanity-check: Of course, before I did that, I looked in the … Read more

ML Algorithms: One SD (σ)- Regression

An intro to machine learning regression algorithms The obvious questions to ask when facing a wide variety of machine learning algorithms are “which algorithm is better for a specific task, and which one should I use?” The answers vary depending on several factors, including: (1) The size, quality, and nature of data; (2) The … Read more