Zooming In and Zooming Out

A Note on Qualitative Sample Sizes. Zooming in or zooming out? How close a picture you need will affect your sample size. (Photos from @ryoji__iwata and @13on via unsplash.com) This article covers: why small sample sizes are acceptable in qualitative research; what the overall goals of qualitative research are; how to determine sample size, and what …

Unsupervised learning for anomaly detection in stock options pricing

3. Unsupervised learning for finding outliers (anomalies). What would count as an anomaly? In our case (and in general), an anomaly is any mismatch in the logic of the option prices. For example, the Bid (or Ask) prices of two call options that have the same Strike price but, say, a 1–2 day difference in the Exercise day …
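
One such logical mismatch can be caught with a plain rule, without any learning. The sketch below is a rule-based check, not the unsupervised model the article builds: it assumes American-style calls on one underlying, for which more time to expiry should never make an option cheaper at the same strike. The `CallQuote` record, the `calendar_mismatches` helper, and the quotes are hypothetical.

```python
from collections import defaultdict
from typing import List, NamedTuple, Tuple


class CallQuote(NamedTuple):
    """Hypothetical quote record: one call option on a single underlying."""
    strike: float
    days_to_expiry: int
    ask: float


def calendar_mismatches(quotes: List[CallQuote]) -> List[Tuple[CallQuote, CallQuote]]:
    """Flag same-strike pairs where the later-expiring call is quoted cheaper.

    Assumes American-style calls, where extra time to expiry should never
    lower the price. Only consecutive expirations per strike are compared.
    """
    by_strike = defaultdict(list)
    for q in quotes:
        by_strike[q.strike].append(q)

    flagged = []
    for group in by_strike.values():
        group.sort(key=lambda q: q.days_to_expiry)
        for earlier, later in zip(group, group[1:]):
            if later.ask < earlier.ask:
                flagged.append((earlier, later))
    return flagged


# Hypothetical quotes for illustration only.
quotes = [
    CallQuote(strike=100, days_to_expiry=10, ask=2.50),
    CallQuote(strike=100, days_to_expiry=12, ask=2.30),  # cheaper despite more time
    CallQuote(strike=105, days_to_expiry=10, ask=1.00),
    CallQuote(strike=105, days_to_expiry=12, ask=1.20),
]
for earlier, later in calendar_mismatches(quotes):
    print("mismatch:", earlier, "->", later)
```

A hand-written rule like this can also serve as a sanity check on what an unsupervised model flags.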

Understanding Encoder-Decoder Sequence to Sequence Model

In this article, I will try to give a short and concise explanation of the sequence-to-sequence model, which has recently achieved significant results on complex tasks such as machine translation, video captioning, and question answering. Prerequisites: the reader should already be familiar with neural networks and, in particular, recurrent neural networks (RNNs). In …

What is a Recurrent NNs and Gated Recurrent Unit (GRUS)

Photo by Tom Grimbert on Unsplash. Recurrent Neural Networks (RNNs) are popular models that have shown great promise on many kinds of sequential data and are used, among others, by Apple's Siri and Google's Voice Search. Their great advantage is that they remember their input thanks to an internal memory. But despite their recent popularity, there exists a …

Machine Learning for Anyone who Took Math in 8th Grade

Explaining modern “AI” with easy math, pop-culture references, and oversimplified analogies. I usually see Artificial Intelligence explained in one of two ways: through the increasingly sensationalist perspective of the media, or through dense scientific literature riddled with superfluous language and field-specific terms. (Image source: a classic.) There’s a less publicized area between these extremes where I think …

Predicting the Frequency of Asteroid Impacts with a Poisson Processes

Simulating Asteroid Impacts. Our objective is to determine the probability distribution of the number of expected impacts in each size category, which means we need a time range. To keep things in perspective, we’ll start with 100 years, about the lifespan of a human. This means our distribution will show the probabilities for the number of impacts …
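
For a Poisson process with an average rate of λ impacts per year, the number of impacts N in a window of t years follows P(N = k) = (λt)^k e^(-λt) / k!. A minimal sketch of that calculation for the 100-year window is below. The rate is a hypothetical placeholder, not a figure from the article; in practice it would be the observed rate for each size category.

```python
import math


def poisson_pmf(k, rate_per_year, years):
    """P(N = k) impacts in `years` for a Poisson process with the given yearly rate."""
    lam = rate_per_year * years  # expected number of impacts in the window
    return lam ** k * math.exp(-lam) / math.factorial(k)


RATE = 1 / 1000   # hypothetical: one impact per 1,000 years in some size category
YEARS = 100       # roughly a human lifespan, as in the text

for k in range(4):
    print(f"P({k} impacts in {YEARS} years) = {poisson_pmf(k, RATE, YEARS):.5f}")
print(f"P(at least one) = {1 - poisson_pmf(0, RATE, YEARS):.5f}")
```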

Time Series Analysis Tutorial Using Financial Data

VIX predictions from my ARMA (8,2) time window refitting model. For my second project at Metis, I created a model that predicted the price of the CBOE Volatility Index (VIX) using time series analysis. The VIX is computed from S&P 500 index option prices, which indicate how much volatility is in the overall …

Semantic Segmentation of Aerial images Using Deep Learning

What is Semantic Segmentation? What are its practical applications? Semantic segmentation of drone images to classify different attributes is quite a challenging job because the variations are very large: you can’t expect places to look the same. Manually segmenting these images for use in different applications is also a challenge and a …

Audio AI: isolating vocals from stereo music using Convolutional Neural Networks

From A to Z. OK, now that I’m done preaching, let’s get into what you came for! Just like with every other data problem I’ve worked on in my career, I’ll begin by asking the question “what does the data look like?” Let’s take a look at the following fragment of singing voice from an …

February Edition: Data Visualization

8 of the best articles on visualizing data. Data visualization is an essential step in any data science process. It’s the final bridge between the data scientist and end users. It communicates, validates, confronts and educates. And when done correctly, it opens up the insights from a data science project to a wider audience. Great …

Building a Better Profanity Detection Library with scikit-learn

Why existing libraries are uninspiring and how I built a better one. A few months ago, I needed a way to detect profanity in user-submitted text strings. This shouldn’t be that hard, right? I ended up building and releasing my own library for this purpose, called profanity-check. Of course, before I did that, I looked in the …

Making deep neural networks paint to understand how they work

It’s a mystery that deep learning works so well. Even though there are several hints about why deep neural networks are so effective, the truth is that nobody is entirely sure, and the theoretical understanding of deep learning is very much an active area of research. In this tutorial, we’ll scratch the surface of a tiny aspect of the …

ML Algorithms: One SD (σ)- Regression

An intro to machine learning regression algorithms. The obvious questions to ask when facing a wide variety of machine learning algorithms are “which algorithm is better for a specific task, and which one should I use?” The answers vary depending on several factors, including: (1) the size, quality, and nature of the data; (2) the …

A Guide to Data Visualisation in R for Beginners

Visualisation libraries in R. R comes equipped with sophisticated, highly capable visualisation libraries. Let us have a closer look at some of the commonly used ones. In this section, we will use the built-in mtcars dataset to show the uses of the various libraries. This dataset has been extracted from the 1974 Motor Trend US …

Blender 2.8 Grease Pencil Scripting and Generative Art

5agado, Feb 4. Quick, Draw!; Flock; Conway’s Game of Life. What: learning the basics of scripting Blender’s Grease Pencil tool, with a focus on generative art as a concrete playground. Less talking, more code (commented) and many examples. Why: mostly because we can. Also because Blender is a very rich ecosystem, and Grease Pencil in version 2.8 is a powerful …

A Beginner’s Tutorial on Building an AI Image Classifier using PyTorch

This is a step-by-step guide to building an image classifier. The AI model will learn to label images. I use Python and PyTorch. Step 1: Import libraries. When we write a program, it is a huge hassle to manually code every small action we perform. Sometimes, we want to use packages of code …

Is the #10YearChallenge A Sign of the AI Apocalypse?

Viral social media “challenges,” memes, and gimmicks have taken over our feeds in recent years. The term “challenge” is used loosely, though, since these viral sensations aren’t so much challenges as unique ways to spice up your social media presence. But are they also signs of the impending AI apocalypse? Let’s look …

How It Feels to Learn Data Science in 2019

Seeing the (Random) Forest Through the (Decision) Trees. The following is inspired by the article How it Feels to Learn JavaScript in 2016. Do not take this too seriously. This piece is just an opinion, much like people’s definition of data science. I heard you are the one to go to. Thank you for meeting …

Review: DRRN — Deep Recursive Residual Network (Super Resolution)

Up to 52 Convolutional Layers, With Global and Local Residual Learning, Outperforms SRCNN, FSRCNN, ESPCN, VDSR, DRCN, and RED-Net. Digital Image Enlargement: The Need for Super Resolution. In this story, DRRN (Deep Recursive Residual Network) is reviewed. With Global Residual Learning (GRL) and multi-path-mode Local Residual Learning (LRL), plus recursive learning to control the …

Python Basics: Mutable vs Immutable Objects

Source: https://www.quora.com/Can-you-suggest-some-good-books-websites-for-learning-Python-for-a-layman. After reading this blog post you’ll know: what an object’s identity, type, and value are; what mutable and immutable objects are. Introduction (Objects, Values, and Types). All the data in Python code is represented by objects or by relations between objects. Every object has an identity, a type, and a value. Identity. An …
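
A short sketch of how identity, type, and value behave for a mutable `list` versus an immutable `tuple`, using only the built-ins `id`, `type`, and the `is` operator. The variable names are arbitrary.

```python
numbers = [1, 2]
numbers_alias = numbers          # a second name bound to the same list object
print(id(numbers), type(numbers), numbers)

numbers += [3]                   # list += extends the existing object in place
print(numbers_alias)             # the alias sees the change: [1, 2, 3]

point = (1, 2)
point_alias = point              # a second name bound to the same tuple object
point += (3,)                    # tuple += builds a new tuple and rebinds `point`
print(point_alias, point)        # the alias still refers to the original tuple
```

The list keeps its identity while its value changes; the tuple's value can only "change" by binding the name to a different object.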

Tweets Data Visualization with Circles and User Interaction

Adding Interactivity: Tweet Info by Click. After plotting and packing all the circles, we can make each circle work like a button. To achieve this, we can use the function fig.canvas.mpl_connect. The function takes two arguments: the first is a string that corresponds to the type of interaction (in our case …

Understanding Studies of Racial Demarcations

Studies of racial demarcations are typically implemented in the context of what are referred to as regression analyses. Simply put, a regression assesses the relationship between some variable of interest, say students’ test scores, and variables that describe those students, such as race, family income, parents’ professions, parents’ education, etc. Pictorially, with x’s denoting variables …
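
In the simplest case of a single explanatory variable, ordinary least squares picks the slope and intercept that minimize the squared prediction errors: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The sketch below computes these with plain Python on hypothetical data; real studies of the kind described would include several explanatory variables at once (multiple regression).

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours of study (x) and test scores (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]
slope, intercept = fit_line(hours, scores)
print(f"score = {intercept:.1f} + {slope:.1f} * hours")
```

The slope reads as the estimated change in the outcome for a one-unit change in the explanatory variable.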

Learning aggregate functions

Machine Learning with relational data. (Image: https://pixabay.com) This article is inspired by the Kaggle competition https://www.kaggle.com/c/elo-merchant-category-recommendation. While I did not participate in the competition, I used the data to explore another problem that often arises when working with real-world data. Most machine learning algorithms work well with tabular data, but in reality a lot of data …

These are the Easiest Data Augmentation Techniques in Natural Language Processing you can think of…

Augmentation operations for NLP proposed in [this paper]: SR = synonym replacement, RI = random insertion, RS = random swap, RD = random deletion. The GitHub repository for these techniques can be found [here]. Data augmentation is commonly used in computer vision, where you can almost always flip, rotate, or mirror an image without changing its label. However, in natural …
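
The four operations are simple enough to sketch in plain Python. This is a minimal illustration, not the code from the linked repository: the paper draws synonyms from WordNet, whereas here a tiny hand-written `toy_synonyms` dictionary stands in for it, and all function names are hypothetical.

```python
import random

# Hypothetical helpers; not the API of the paper's repository.


def synonym_replacement(words, synonyms, n, rng):
    """SR: replace up to n words that have an entry in `synonyms`."""
    words = list(words)
    candidates = [i for i, w in enumerate(words) if w in synonyms]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        words[i] = rng.choice(synonyms[words[i]])
    return words


def random_insertion(words, synonyms, n, rng):
    """RI: n times, insert a synonym of some word at a random position."""
    words = list(words)
    candidates = [w for w in words if w in synonyms]
    for _ in range(n if candidates else 0):
        new_word = rng.choice(synonyms[rng.choice(candidates)])
        words.insert(rng.randint(0, len(words)), new_word)
    return words


def random_swap(words, n, rng):
    """RS: n times, swap the positions of two randomly chosen words."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p, rng):
    """RD: delete each word independently with probability p (never all of them)."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


rng = random.Random(42)
sentence = "the quick brown fox jumps over the lazy dog".split()
toy_synonyms = {"quick": ["fast", "speedy"], "lazy": ["idle"]}  # stand-in for WordNet
print(" ".join(synonym_replacement(sentence, toy_synonyms, 1, rng)))
print(" ".join(random_insertion(sentence, toy_synonyms, 1, rng)))
print(" ".join(random_swap(sentence, 1, rng)))
print(" ".join(random_deletion(sentence, 0.1, rng)))
```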

Transfer Learning using ELMO Embedding

Last year, the major developments in “Natural Language Processing” were about Transfer Learning. Basically, Transfer Learning is the process of training a model on a large-scale dataset and then using that pre-trained model as the starting point for learning another target task. Transfer Learning became popular in the field of NLP thanks to the state-of-the-art performance of …

Model-Free Prediction: Reinforcement Learning

Part 4: Model-Free Prediction with Monte-Carlo Learning, Temporal-Difference Learning, and TD(λ). Previously, we looked at planning by dynamic programming to solve a known MDP. In this post, we will use model-free prediction to estimate the value function of an unknown MDP; that is, we will look at policy evaluation for an unknown MDP. This series of …
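
A compact sketch of two model-free prediction methods on a toy problem: tabular TD(0) and every-visit constant-α Monte-Carlo, both estimating the state values of a fixed random policy from sampled episodes, without ever using transition probabilities. The environment is an assumption for illustration, the 5-state random walk popularized by Sutton and Barto, chosen because its true values (1/6 through 5/6) are known; TD(λ) is not shown.

```python
import random

N_STATES = 5  # non-terminal states 0..4; the toy walk (assumed for illustration) starts at 2


def random_walk_episode(rng):
    """Generate one episode as (state, reward, next_state) transitions.

    Each step moves left or right with equal probability. Leaving past the
    right end gives reward 1, past the left end reward 0; next_state is None
    when the episode terminates.
    """
    s = N_STATES // 2
    transitions = []
    while True:
        s_next = s + rng.choice((-1, 1))
        if s_next < 0:
            transitions.append((s, 0.0, None))
            return transitions
        if s_next >= N_STATES:
            transitions.append((s, 1.0, None))
            return transitions
        transitions.append((s, 0.0, s_next))
        s = s_next


def td0_prediction(episodes, alpha, gamma=1.0, seed=0):
    """TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')."""
    rng = random.Random(seed)
    V = [0.5] * N_STATES
    for _ in range(episodes):
        for s, r, s_next in random_walk_episode(rng):
            target = r if s_next is None else r + gamma * V[s_next]
            V[s] += alpha * (target - V[s])
    return V


def mc_prediction(episodes, alpha, gamma=1.0, seed=0):
    """Every-visit constant-alpha Monte-Carlo: move V(s) toward the full return G."""
    rng = random.Random(seed)
    V = [0.5] * N_STATES
    for _ in range(episodes):
        G = 0.0
        # Walk the episode backwards so G accumulates the return from each step.
        for s, r, _s_next in reversed(random_walk_episode(rng)):
            G = r + gamma * G
            V[s] += alpha * (G - V[s])
    return V


true_values = [(s + 1) / 6 for s in range(N_STATES)]  # known for this walk
V_td = td0_prediction(episodes=30000, alpha=0.002)
V_mc = mc_prediction(episodes=30000, alpha=0.002)
print("true: ", [round(v, 3) for v in true_values])
print("TD(0):", [round(v, 3) for v in V_td])
print("MC:   ", [round(v, 3) for v in V_mc])
```

The only difference between the two update rules is the target: TD(0) bootstraps from the current estimate of the next state, while Monte-Carlo waits for the actual return at the end of the episode.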

Matplotlib Tutorial: Learn basics of Python’s powerful Plotting library

What is Matplotlib? To make statistical inferences, it is often necessary to visualize your data, and Matplotlib is one such solution for Python users. It is a very powerful plotting library for those working with Python and NumPy. The most used module of Matplotlib is Pyplot, which provides a MATLAB-like interface but …

Introduction to TWO approaches of Content-based Recommendation System

A complete guide to resolving the confusion. Content-based filtering is one of the common methods for building recommendation systems. While researching the details, I found, interestingly, that there are two approaches that both claim to be “content-based”. Below I will share my findings and hope they can …

Machine Learning and Particle Motion in Liquids: An Elegant Link

The gradient descent algorithm is one of the most popular optimization techniques in machine learning. It comes in three flavors: batch or “vanilla” gradient descent (GD), stochastic gradient descent (SGD), and mini-batch gradient descent, which differ in the amount of data used to compute the gradient of the loss function at each iteration. The goal …
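
Since the three flavors differ only in how many examples feed each gradient estimate, a single function with a `batch_size` switch can illustrate all of them. A minimal sketch, assuming a one-parameter model y ≈ w·x with mean squared error and hypothetical noise-free data; the function name and learning rate are illustrative choices, not the article's code.

```python
import random


def fit_slope(xs, ys, batch_size, lr=0.01, epochs=500, seed=0):
    """Fit y ≈ w * x by gradient descent on the mean squared error.

    batch_size == len(xs) -> batch ("vanilla") gradient descent
    batch_size == 1       -> stochastic gradient descent
    anything in between   -> mini-batch gradient descent
    """
    rng = random.Random(seed)
    w = 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit examples in a new random order each epoch
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # Gradient of mean((w*x - y)^2) over this batch with respect to w.
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad
    return w


# Hypothetical noise-free data generated with w = 3.
xs = [i / 10 for i in range(1, 21)]
ys = [3 * x for x in xs]
for name, bs in [("batch", len(xs)), ("SGD", 1), ("mini-batch", 5)]:
    print(name, round(fit_slope(xs, ys, bs), 4))
```

With the same number of epochs, SGD makes 20 parameter updates per pass over the data, mini-batch 4, and batch GD only one, which is the trade-off between cheap noisy steps and expensive exact ones.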

Three steps for a successful machine learning project

Less technical considerations for all ML projects. As people and companies venture into machine learning (ML), some expect to dive right into building models and generating useful output. And while some parts of ML feel like technical wizardry with magical predictions, there are other aspects that are less …

Contextual Embeddings for NLP Sequence Labeling

Text representation (a.k.a. text embeddings) has been a breakthrough for solving NLP tasks. Initially, a single word vector represented each word, even though a word can carry different meanings in different contexts. For example, “Washington” can be a location, a name, or a state. “University of Washington”. Zalando released an amazing NLP library, flair, which makes our life easier. It already implements …

Deep Learning with Satellite Data

“The rockets and the satellites, spaceships that we’re creating now, we’re pollinating the universe.” - Neil Young. Contents: Overview, Satellite Data, Data Collection, Model, Results. Overview. While at the University of Sannio in Benevento, Italy, this January, my friend Tuomas Oikarinen and I created a (semi-automated) pipeline for downloading publicly available images and trained a 3-D Convolutional Neural Network on …

Making Programming Easier with Keyboard Macros — Video

A recent video from Linus Tech Tips showed how one of their editors uses macros for video editing. This got me thinking: can macros be easily created to improve my programming? This video demonstrates how to create code macros and how useful they can be. Background. Source: Linus Tech Tips, Can your Keyboard do …

Unsupervised Feature Learning

Deep Convolutional Networks on image tasks take in image matrices of the form (height × width × channels) and process them into low-dimensional features through a series of parametric functions. Supervised and unsupervised learning tasks both aim to learn a semantically meaningful representation of features from raw data. Training deep supervised learning models requires a …

Predicting Kickstarter Campaign Success with Gradient Boosted Decision Trees: A Machine Learning…

Fitting the models, evaluating performance, choosing a final model, and predicting on a new (totally real) campaign. Another common step in the data science workflow is trying out multiple models. There are ways to minimize the effort at this stage depending on what you want to accomplish, what the dataset is, or what the problem is (you …

Best practices in Ads Search

Big Data is the process of collecting and analyzing large amounts of information. The complexity and volume of the data our society currently generates make it impossible to capture, manage, process, or analyse with conventional technologies. Big Data is characterized by five features: volume (managing terabytes or petabytes of information), variety …

Modeling cumulative impact — Part I

Create simple features of cumulative impact and predict sports performance with the fitness-fatigue model. “Little by little, a little becomes a lot.” - Tanzanian proverb. Welcome to Modeling cumulative impact, a series that views the cumulative impact of athletic training on sports performance through a variety of modeling lenses. The journey starts here in Part I with …
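
The fitness-fatigue (Banister impulse-response) model is commonly written as p(t) = p0 + k1·Σ w(s)·e^(-(t-s)/τ1) - k2·Σ w(s)·e^(-(t-s)/τ2), summing over days s < t, where w(s) is the training load on day s, the first sum is "fitness", and the second is "fatigue". The sketch below assumes this standard form; the article's exact formulation and fitted parameters may differ, and the defaults (k1 = 1, k2 = 2, τ1 = 42 days, τ2 = 7 days) are illustrative placeholders, not estimates.

```python
import math


def fitness_fatigue(loads, p0=0.0, k1=1.0, k2=2.0, tau1=42.0, tau2=7.0):
    """Predicted performance for each day t given daily training loads.

    loads[s] is the training load on day s; the prediction for day t only
    uses loads from earlier days (s < t). Parameter defaults are placeholders.
    """
    predictions = []
    for t in range(len(loads)):
        fitness = sum(loads[s] * math.exp(-(t - s) / tau1) for s in range(t))
        fatigue = sum(loads[s] * math.exp(-(t - s) / tau2) for s in range(t))
        predictions.append(p0 + k1 * fitness - k2 * fatigue)
    return predictions


# A single hypothetical training session of load 100 on day 0, then 59 rest days.
perf = fitness_fatigue([100] + [0] * 59, p0=500)
print([round(p, 1) for p in perf[:5]], round(perf[30], 1))
```

Because fatigue decays faster than fitness (τ2 < τ1) but carries a larger weight (k2 > k1), one session first lowers predicted performance and later raises it above baseline.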

What should you learn from Google Ads Search to manage your business advertising?

In addition to keywords, there are other concepts we need in order to work with Ads; we will explain the most important ones. Impressions: the number of times an ad is shown. Cost per Click (CPC): the amount we pay for each click users make on our ad. …
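
These definitions combine into a couple of standard ratios: click-through rate (CTR) is clicks divided by impressions, and the average CPC actually paid is total cost divided by clicks. A small sketch with a hypothetical campaign; the function name is illustrative.

```python
def ad_metrics(impressions: int, clicks: int, cost: float) -> dict:
    """Basic search-ad ratios from raw campaign totals (hypothetical helper)."""
    return {
        "ctr": clicks / impressions if impressions else 0.0,  # click-through rate
        "avg_cpc": cost / clicks if clicks else 0.0,          # average cost per click
    }


# Hypothetical campaign: 2,000 impressions, 50 clicks, 25.00 spent.
print(ad_metrics(impressions=2000, clicks=50, cost=25.0))
```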

WTF is image classification?

Conquering convolutional neural networks for the curious and confused. (Photo by Micheile Henderson on Unsplash) “One thing that struck me early is that you don’t put into a photograph what’s going to come out. Or, vice versa, what comes out is not what you put in.” ― Diane Arbus. A notification pops up on your favorite social …

Review: DCN — Deformable Convolutional Networks, 2nd Runner Up in 2017 COCO Detection (Object…

With Deformable Convolution, Improved Faster R-CNN and R-FCN, Got 2nd Runner Up in COCO Detection & 3rd Runner Up in COCO Segmentation. After reviewing STN, this time DCN (Deformable Convolutional Networks), by Microsoft Research Asia (MSRA), is reviewed. Figure: (a) Conventional Convolution, (b) Deformable Convolution, (c) Special Case of Deformable Convolution with Scaling, (d) Special Case …

Statistics is the Grammar of Data Science — Part 3/5

Moments. Moments describe various aspects of the nature and shape of our distribution. #1: The first moment is the mean of the data, which describes the location of the distribution. #2: The second (central) moment is the variance, which describes the spread of the distribution; a higher variance means the values are more spread out around the mean. #3: The third moment is …
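
The first four moments can be computed directly from their definitions. In this sketch, skewness and kurtosis are the standardized third and fourth central moments (population versions, no small-sample correction); the kurtosis here is not "excess" kurtosis, so a normal distribution scores about 3.

```python
def moments(data):
    """Mean, variance, skewness and kurtosis (population formulas)."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    sd = var ** 0.5
    skew = sum((x - mean) ** 3 for x in data) / (n * sd ** 3)
    kurt = sum((x - mean) ** 4 for x in data) / (n * sd ** 4)
    return mean, var, skew, kurt


print(moments([1, 2, 3, 4, 5]))   # symmetric data: skewness 0
print(moments([1, 1, 1, 10]))     # long right tail: positive skewness
```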

Comparing Different Classification Machine Learning Models for an imbalanced dataset

A data set is called imbalanced if it contains many more samples from one class than from the rest of the classes. Data sets are imbalanced when at least one class is represented by only a small number of training examples (called the minority class) while other classes make up the majority. In this scenario, …
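
A quick way to quantify imbalance, and to see why plain accuracy can mislead, is to compare class counts with the accuracy of a trivial model that always predicts the majority class. The sketch below also computes per-class weights as n_samples / (n_classes × class_count), the formula scikit-learn documents for `class_weight="balanced"`; the helper name and labels are hypothetical.

```python
from collections import Counter


def imbalance_report(labels):
    """Summarize class balance for a list of class labels (hypothetical helper)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    ranked = counts.most_common()          # sorted from most to least frequent
    majority_count, minority_count = ranked[0][1], ranked[-1][1]
    return {
        "counts": dict(counts),
        "imbalance_ratio": majority_count / minority_count,
        # Accuracy of a "model" that always predicts the majority class.
        "majority_baseline_accuracy": majority_count / n_samples,
        # Inverse-frequency weights: n_samples / (n_classes * class_count).
        "balanced_weights": {c: n_samples / (n_classes * k) for c, k in counts.items()},
    }


# Hypothetical labels: 95 negatives, 5 positives.
print(imbalance_report([0] * 95 + [1] * 5))
```

Here a classifier that never predicts the minority class already reaches 95% accuracy, which is why metrics such as precision, recall, or F1 are usually reported alongside accuracy for imbalanced data.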

ML Algorithms: One SD (σ)- Instance-based Algorithms

An intro to machine learning instance-based algorithms. The obvious questions to ask when facing a wide variety of machine learning algorithms are “which algorithm is better for a specific task, and which one should I use?” The answers vary depending on several factors, including: (1) the size, quality, and nature of the data; (2) the …

The Data Driven Partier: Movie Mustache

The concept behind ‘Movie Mustache’ is simple, but revolutionary. Watch a movie with friends, but with a mustache or two on the TV: whenever the mustache lines up with a character’s upper lip, everyone drinks. This game was foreign to me until a few weeks ago, when I got to experience it watching the Adam Sandler …

Comparing Python Virtual Environment tools

Thanks to Keith Smith, Alexander Mohr, Victor Kirillov and Alain SPAITE for recommending pew, venv and pipenv. I just love the community that we have on Medium. I recently published an article on using Virtual Environments for Python projects. The article was well received, and the feedback from readers gave me a new perspective. …