Data Scientist’s Guide to Summarization

A text summarization tutorial for beginners. Team members: Richa Bathija, Abhinaya Ananthakrishnan, Akhilesh Reddy (@akhilesh.narapareddy), Preetika Srivastava (@preetikasrivastava30). Have you ever had to scroll through a 400-word article only to realize that it contains just four key points? All of us have been there. In this age …

Financial Machine Learning Part 1: Labels

Setting up a supervised learning problem. In the previous post, we explored several approaches for aggregating raw data for a financial instrument to create observations called bars. In this post, we focus on the next crucial stage of the machine learning pipeline: labeling observations. As a reminder, labels in machine learning denote the outcomes of …
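The excerpt stops before the labeling method itself, so as a rough illustration only: one simple scheme is fixed-time-horizon labeling, where each bar is labeled by the sign of its forward return over a set number of bars, with a dead zone around zero. This is not necessarily the method the post goes on to use; the function name `fixed_horizon_labels`, its `horizon` and `threshold` parameters, and the toy prices are hypothetical.

```python
import numpy as np

def fixed_horizon_labels(close, horizon=5, threshold=0.01):
    """Label bar t by its forward return over the next `horizon` bars.

    +1 if the return exceeds `threshold`, -1 if it is below -`threshold`,
    0 otherwise. The last `horizon` bars lack a full forward window and
    are left as NaN. (Hypothetical helper, not from the post.)
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    close = np.asarray(close, dtype=float)
    labels = np.full(close.shape, np.nan)
    # Simple forward return from bar t to bar t + horizon
    fwd_ret = close[horizon:] / close[:-horizon] - 1.0
    labels[:-horizon] = np.where(fwd_ret > threshold, 1.0,
                                 np.where(fwd_ret < -threshold, -1.0, 0.0))
    return labels

# Hypothetical bar close prices (not data from the post)
closes = [100, 101, 103, 102, 99, 98, 100]
print(fixed_horizon_labels(closes, horizon=2, threshold=0.01))
# → [1, 0, -1, -1, 1, nan, nan] (printed by NumPy as floats)
```

Bars near the end of the series have no complete forward window, so they are left unlabeled (NaN) rather than guessed.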

Finding similar images using Deep learning and Locality Sensitive Hashing

A simple walkthrough of finding similar images using image embeddings from a ResNet-34, built with fastai and PyTorch, plus fast semantic similarity search over huge collections of image embeddings. (Figure: final output showing similar images for a given input image from Caltech 101.) In this post, we are trying to achieve the above result, i.e., given an image, …
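The teaser names locality sensitive hashing (LSH) as the tool for fast search but does not show the scheme. Below is a minimal sketch of one common LSH family, random-hyperplane hashing for cosine similarity. Random vectors stand in for the ResNet-34 embeddings, and the sizes (512 dimensions, 10,000 items, 16 hash bits) plus the helper names `hash_vec` and `query` are assumptions for illustration, not the post's code.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(42)
dim, n_items, n_bits = 512, 10_000, 16  # assumed sizes, not from the article

# Stand-in embeddings; in the article these would come from a ResNet-34
embeddings = rng.normal(size=(n_items, dim))
# Random hyperplanes through the origin, one per hash bit
planes = rng.normal(size=(n_bits, dim))

def hash_vec(v):
    """Bucket key: which side of each hyperplane the vector falls on."""
    return ((planes @ v) > 0).tobytes()

# Index every embedding: bucket key -> list of item ids
table = defaultdict(list)
for i, e in enumerate(embeddings):
    table[hash_vec(e)].append(i)

def query(v, k=5):
    """Return up to k item ids from v's bucket, ranked by exact cosine similarity."""
    candidates = table.get(hash_vec(v), [])
    if not candidates:
        return []
    cand = embeddings[candidates]
    sims = cand @ v / (np.linalg.norm(cand, axis=1) * np.linalg.norm(v))
    top = np.argsort(-sims)[:k]
    return [candidates[j] for j in top]

# A slightly perturbed copy of item 123 usually hashes to the same bucket
q = embeddings[123] + 0.01 * rng.normal(size=dim)
print(query(q))
```

Only items sharing the query's bucket are compared exactly, which is where the speedup comes from. A single table keeps the sketch short; practical setups typically use several tables with independent hyperplanes, so a near neighbor separated by one unlucky bit can still be found in another table.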

Bayesian Modeling of Pro Overwatch Matches with PyMC3

(Photo by AC De Leon on Unsplash.) Professional eSports are becoming increasingly popular, and the industry is growing rapidly. Many of these professional leagues are based on games in which two teams battle it out; Call of Duty, League of Legends, and Overwatch are all examples. Although these are comparable to traditional team sports, …

The path to being the best data analyst: Help, Build, then Do.

The core competency of a data analyst is “Speed to Insight”. A data team often consists of many people, with many skills, using potentially overlapping techniques. This focus on speed distinguishes the role from that of data scientists or statisticians. Today I’m focused on answering questions about the business or about how users behave. I’ll refer to …

Six Recommendations for Aspiring Data Scientists

Building experience before landing a job. Data science is a field with huge demand, in part because it seems to require experience as a data scientist to be hired as a data scientist. But many of the best data scientists I’ve worked with have diverse backgrounds ranging from humanities to neuroscience, and it …

Taking Google Sheets to (a) Class.

I am currently building a Flask app for teachers. Because many teachers have adopted Google Drive, they also use Google Sheets. One of my app’s features lets teachers simply paste a sheet link into the app and submit it through a form. It will then convert it …

Machine Learning Models as Micro Services in Docker

One of the biggest and most underrated challenges in machine learning development is deploying trained models to production, and doing so in a scalable way. One joke I have read about this: “Most common way Machine Learning gets deployed today is PowerPoint slides :)”. Why Docker? Docker is a containerization platform which packages an application …

How to setup the PySpark environment for development, with good software engineering practices

In this article we discuss how to set up our development environment to write good-quality Python code, and how to automate some of the tedious tasks to speed up deployments. We will go over the following steps: set up our dependencies in an isolated virtual environment with pipenv, how to set up …

Convolutional Neural Network: A Step By Step Guide

“Artificial intelligence, deep learning, machine learning: whatever you’re doing, if you don’t understand it, learn it. Because otherwise, you’re going to be a dinosaur within three years.” (Mark Cuban, serial entrepreneur) Hello and welcome, aspirant! If you are reading this and are interested in the topic, I’m assuming that you are familiar with the basic concepts of deep …

Let’s build an Article Recommender using LDA

Driven by a keen interest in learning new topics, I decided to work on a project in which a Latent Dirichlet Allocation (LDA) model recommends Wikipedia articles based on a search phrase. This article explains my approach to building the project in Python; the project itself is on GitHub. Structure (photo by Ricardo Cruz on Unsplash) …

Object Detection On Aerial Imagery Using RetinaNet

ESRI Data Science Challenge 2019: 3rd-place solution for detecting cars and swimming pools using RetinaNet. (Figure: left, the original image; right, car detections by RetinaNet, marked with green boxes.) For tax assessment purposes, surveys are usually conducted manually on the ground. These surveys are important for calculating the true value of properties. For example, having a swimming …

Light on Math ML: Attention with Keras

Why Keras? With the unveiling of TensorFlow 2.0, it is hard to ignore the conspicuous attention (no pun intended!) given to Keras, with a greater focus on advocating Keras for implementing deep networks. Keras in TensorFlow 2.0 will come with three powerful APIs for implementing deep networks. Sequential API: this is the simplest API, where you …

Why you should be a Generalist first, Specialist later as a Data Scientist?

So what are a generalist and a specialist? Before going any further, let’s first understand what we mean when we talk about being a generalist or a specialist in data science. A generalist is someone who has knowledge in many areas, whereas a specialist knows a lot about one area. Simple as that. Particularly in data …

Who are Independent Voters?

The differences between people who identify with a party “not very strongly” and those who identify as independent but “are closer to” a party. The data we use is polling from YouGov Blue and the progressive data organization Data For Progress; it covers 3,215 voters and is weighted by “age, sex, …

Data Scientist Knowledge and Skills

A data scientist creates knowledge from data and has skills in statistics, programming, and the domain under study. A data scientist creates knowledge from data through quantitative and programming methods and knowledge of the domain under study. Data science is the field in which data scientists work. A data scientist should have skills and knowledge in …

Robotic Control with Graph Networks

Exploiting relational inductive bias to improve generalization and control. Machine learning is helping to transform many fields across diverse industries, as anyone interested in technology undoubtedly knows. Fields such as computer vision and natural language processing have changed dramatically due to deep learning algorithms in the past few years, and the effects of that change are …

PCA and SVD explained with numpy

How exactly are principal component analysis and singular value decomposition related, and how can they be implemented using NumPy? Principal component analysis (PCA) and singular value decomposition (SVD) are commonly used dimensionality reduction approaches in exploratory data analysis (EDA) and machine learning. They are both classical linear dimensionality reduction methods that attempt to find linear combinations of …
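The excerpt stops before the derivation, so here is a minimal NumPy sketch of the standard relationship, using a small synthetic dataset rather than the article's data. For a centered data matrix Xc = U S V^T with n rows, the columns of V are the principal axes (the eigenvectors of the covariance matrix Xc^T Xc / (n - 1)), and the covariance eigenvalues equal s_i^2 / (n - 1).

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples of 3 correlated features (not from the article)
A = np.array([[2.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.3, 0.2]])
X = rng.normal(size=(200, 3)) @ A
n = X.shape[0]
Xc = X - X.mean(axis=0)  # PCA works on centered data

# PCA route: eigendecomposition of the sample covariance matrix
cov = Xc.T @ Xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # sort descending to match SVD
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# SVD route: Xc = U @ diag(S) @ Vt, singular values already descending
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# 1) Covariance eigenvalues equal squared singular values over (n - 1)
print(np.allclose(eigvals, S**2 / (n - 1)))                 # True
# 2) Principal axes agree up to the sign of each column
print(np.allclose(np.abs(eigvecs), np.abs(Vt.T)))           # True
# 3) Projected data (component scores): Xc @ V equals U * S
scores_eig = Xc @ eigvecs
scores_svd = U * S
print(np.allclose(np.abs(scores_eig), np.abs(scores_svd)))  # True
```

Comparing absolute values sidesteps the sign ambiguity, since eigenvectors and singular vectors are only defined up to a factor of -1. The SVD route is usually preferred in practice because it never forms the covariance matrix explicitly, which avoids squaring the condition number of the data.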

Hyper-parameter Tuning Techniques in Deep Learning

The process of setting hyper-parameters requires expertise and extensive trial and error. There are no simple and easy ways to set hyper-parameters, specifically learning rate, batch size, momentum, and weight decay. Deep learning models are full of hyper-parameters, and finding the best configuration for these parameters in such a high-dimensional space is not a …

Crash Course in Quantum Computing Using Very Colorful Diagrams

Representing information in quantum computing. In a traditional computer, information is represented using classical bits that can only have a value of 1 or 0, not both at the same time. In quantum computers, we represent information using qubits (quantum bits). We can represent qubits using bra-ket notation: |0⟩ or |1⟩, pronounced ‘ket …
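As a small numerical companion to this notation (a sketch added here, not from the article, which explains it with diagrams): a qubit's state can be written as a unit-length vector of two complex amplitudes, with |0⟩ = (1, 0) and |1⟩ = (0, 1). A superposition α|0⟩ + β|1⟩ requires |α|² + |β|² = 1, and measuring it yields 0 with probability |α|² and 1 with probability |β|².

```python
import numpy as np

# Basis states as vectors of complex amplitudes
ket0 = np.array([1, 0], dtype=complex)  # |0>
ket1 = np.array([0, 1], dtype=complex)  # |1>

# An equal superposition (|0> + |1>) / sqrt(2): unlike a classical bit,
# the state carries amplitude on both basis states at once
psi = (ket0 + ket1) / np.sqrt(2)

# A valid qubit state has unit norm
assert np.isclose(np.linalg.norm(psi), 1.0)

# Measurement probabilities are the squared magnitudes of the amplitudes
probs = np.abs(psi) ** 2
print(probs)  # both entries are (numerically) 0.5

# A different state with the same measurement statistics: the relative
# phase (the factor 1j) does not change these probabilities
psi_i = (ket0 + 1j * ket1) / np.sqrt(2)
probs_i = np.abs(psi_i) ** 2
```

The second state shows why a qubit holds more than a probability pair: the complex phase is part of the state even though a single measurement in this basis cannot see it.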

How to ask good questions?

What is a good question? Why ask a good question? How to do so? This story explores these three questions. Why ask good questions? Before diving into the how part, you might want to know why you should care about asking good questions, so let us take a moment to understand why. Questions are intended …

Identifying the Sources of Winter Air Pollution in Bangkok Part I

(Figure: air pollution map near Bangkok in January 2019.) Air pollution is a serious environmental threat in many Asian countries. In Thailand, this issue has recently gained prominence due to the high levels of air pollution in Bangkok during the winter of 2019. Air pollution is reported through the air quality index (AQI), with higher values indicating …

Web Traffic Forecasting

Motivation: Although time series are an important concept in statistics and machine learning, they are often less explored by data enthusiasts like us. To change that, we decided to work on one of the most pressing time series problems of our day: “predicting web traffic”. This blog reflects the brainstorming involved in Web Traffic …

Measuring Financial Turbulence and Systemic Risk

This project illustrates 2 unique approaches for measuring financial risk. The Financial Turbulence Indicator measures the turbulence of global financial markets across time. This matters because: We can predict the future path of financial turbulence, since financial turbulence is highly persistent across time. You can …
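The excerpt does not define the indicator. In the finance literature, the financial turbulence indicator is commonly computed as the squared Mahalanobis distance of each period's vector of asset returns r_t from the historical mean μ, d_t = (r_t − μ)^T Σ^-1 (r_t − μ), where Σ is the covariance matrix of returns; whether the project uses exactly this form is an assumption. A minimal NumPy sketch, with the function name `turbulence` and the simulated returns as hypothetical placeholders:

```python
import numpy as np

def turbulence(returns):
    """Squared Mahalanobis distance of each period's return vector from the
    sample mean, using the sample covariance of all periods (in-sample).
    Hypothetical helper, not the project's code."""
    R = np.asarray(returns, dtype=float)      # shape (T periods, n assets)
    mu = R.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(R, rowvar=False))
    diff = R - mu
    # d_t = diff_t @ cov_inv @ diff_t, evaluated for every row t at once
    return np.einsum("ti,ij,tj->t", diff, cov_inv, diff)

# Hypothetical daily returns for 4 assets, with one extreme day at the end
rng = np.random.default_rng(0)
R = rng.normal(0.0, 0.01, size=(500, 4))
R[-1] = [-0.05, 0.06, -0.04, 0.05]  # an unusual joint move
d = turbulence(R)
print(d.argmax())  # → 499 (the appended extreme day)
```

Because the mean and covariance here come from the full sample, this is an in-sample measure; a real-time version would estimate them from a trailing window that ends before each period.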

Developing a DCGAN Model in Tensorflow 2.0

In early March 2019, the TensorFlow 2.0 alpha was released, and we decided to create an image generator based on Taehoon Kim’s implementation of DCGAN. Here’s a tutorial on how to develop a DCGAN model in TensorFlow 2.0. “To avoid the fast convergence of D (discriminator) network, G (generator) network is updated twice for each D …

Basic Binary Sentiment Analysis using NLTK

“Your most unhappy customers are your greatest source of learning.” (Bill Gates) So what does the customer say? In today’s context, it turns out, a LOT. Social media has opened the floodgates of customer opinions, which now flow freely in mammoth proportions for businesses to analyze. Today, using machine learning, companies are able to extract …

The sexiest job of the 22nd century

Three questions you should ask in a data science interview. Data science has been called “the sexiest job of the 21st century”, a sentiment I’d believe if I saw more business leaders hiring data scientists into environments where we can be effective. Instead, many of us feel misunderstood and invisible. The world isn’t ready for us …

IT Support Ticket Classification and Deployment using Machine Learning and AWS Lambda

Project description and initial assumptions: As part of our final project for Cognitive Computing, we decided to address a real-life business challenge, for which we chose IT Service Management. Of all the business cases, we were interested in four use cases that might be befitting …

Enterprise Technology 101: How Five Practices Can Make Your Organization a Leader or a Loser

Why me? One of the many interesting things about being a technologist (leading the design, development, and deployment of custom software solutions for nearly 20 years) has been the opportunity to learn experientially by observing trials and errors. The diversity of these opportunities has enhanced that learning. Projects have involved many types of technologies: client-server, application development, …

Those Racist Robots…

Artificial intelligence (AI) is one of the hottest topics out there, especially given the debate over whether robots are likely to take over the world. Regardless of whether we view artificial intelligence as a genuine advancement in our history or just another reckless, clumsy integration of accumulated knowledge, examining this …

Questions pairs identification

Background: You have a burning question, so you log in to Quora, post your question, and wait for responses. There is a chance that what you asked is truly unique, but more often than not, if you have a question, someone else has had it too. Did you notice that Quora tells you that a similar question has been …

Data Science with no Math

Using AI to build mathematical datasets. This is an addendum to my last article, at the end of which I had to add a caveat that I was not a mathematician and was new to Python. I added this because I struggled to come up with a mathematical formula to generate patient data that …

How do I know if my AI idea is possible?

One of the questions I am asked most often as an AI consultant is, in some ways, the simplest: is this possible? People come to me with some very vague notion of something they want automated or some sort of AI product they want to create. They usually don’t come from a technology …

How to Learn Data Science

(Photo by Paul Schafer on Unsplash.) Why most online data science courses will fail to teach you the skills you need. Right now, I’m in a fairly unique position. On the one hand, I’m writing a book (The Science of Data Science), which I hope will be as inclusive and as easy to read as possible. On …

What is the difference between AI, machine learning, and deep learning?

People like to throw buzzwords like artificial intelligence, machine learning, and deep learning into conversations. I plead guilty. They accurately describe the work I do. That does not offer an excuse to hide behind buzzwords without understanding what they mean. So, let’s go over what they mean so you know when to use each in …

Key Ingredients to Being Data Driven

PSA: if you’re still showing data in pie charts, stop. Companies love to exclaim “we’re data driven”. There are obvious benefits to being a data driven organization, and everyone nowadays has more data than they can shake a stick at. But what exactly does an organization need to be “data driven”? Just because you have a …

The Evolved Transformer — Enhancing Transformer with Neural Architecture Search

Neural architecture search (NAS) is the process of algorithmically searching for new designs of neural networks. Though researchers have developed sophisticated architectures over the years, the ability to find the most efficient ones is limited, and recently NAS has reached the point where it can outperform human-designed models. A new paper by Google Brain presents …

The Iceberg secret in Machine Learning

Revealing the secret of AI projects in the real world. I first came across the Iceberg secret a few years back while reading a post by Joel Spolsky. Joel does a great job at revealing the secret that many software projects face. Since then I have come to realize that the same principle or secret translates …

Checklist for debugging neural networks

This article provides a framework to help you debug your neural networks: start simple; confirm your loss; check intermediate outputs and connections; diagnose parameters; and track your work. Feel free to skip to a particular section or read straight through. Please note: we do not cover data preprocessing or specific model algorithm selection. There are …

Intro To Deep Learning: Taught by a 14 Year Old

Many of you may have heard the legendary tales of artificial intelligence curing cancer, or the frightening horror stories about intelligent robots taking over. Today, or whatever day you are reading this (hello, future robot overlords), I will explain what deep learning is, why people are so afraid of it, and how …

Deep Learning versus Biological Neurons: floating-point numbers, spikes, and neurotransmitters

(Image: Conjoined Dichotomy by Melting Miltons.) In recent years, “deep learning” AI models have often been touted as “working like the brain,” in that they are composed of artificial neurons mimicking those of biological brains. From the perspective of a neuroscientist, however, the differences between deep learning neurons and biological neurons are numerous and distinct. In this …

Machine-Learning Real Estate Valuation: Not Only a Data Affair

Valuations are relatively straightforward yet still involved exercises when similar properties in terms of hedonic variables[i] (also called comparables) transacted in the market close to the valuation date. In the absence of reliable comparable transactions, the possible value of a piece of real estate (be it residential or commercial) needs to be assessed using a …

Deep Learning: it's not only about kitties in mobiles, or how we proceeded in locomotive bogies…

A few days ago, the company Aurorai sent its system for recognizing defects and monitoring bogie condition on the Ermak locomotive for operational testing. This problem is uncommon and very interesting; the first stage included evaluating the condition of the brake pads and the bandage width. We managed to solve this task with an accuracy of up to 1 mm at locomotive speeds not exceeding …