Towards Data Science – Page 90 – Data Science Austria

## Binary Tree: The Diameter.

Dynamic programming sequences sub-problems together, having each sub-problem lead to the solution. With dynamic programming we no longer have to visit paths we’ve been down before; instead, we can prune the shorter branches and track the diameter at each step. With the dynamic approach the algorithm travels down the tree and counts lengths on the …
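
As a rough illustration of the idea in the excerpt above, the following Python sketch computes the diameter in a single pass, keeping only the deeper branch at each node and tracking the best path seen so far. The `Node` class and the `diameter` function are hypothetical names chosen for this sketch, and the diameter is assumed to be measured in edges; the article's own implementation may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical minimal tree node used only for this sketch.
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def diameter(root: Optional[Node]) -> int:
    """Return the number of edges on the longest path between any two nodes."""
    best = 0

    def height(node: Optional[Node]) -> int:
        nonlocal best
        if node is None:
            return 0
        left = height(node.left)
        right = height(node.right)
        # The longest path through this node joins its two deepest branches.
        best = max(best, left + right)
        # Only the deeper branch can extend a path through the parent,
        # so the shorter one is dropped here.
        return 1 + max(left, right)

    height(root)
    return best


tree = Node(1, Node(2, Node(4), Node(5)), Node(3))
print(diameter(tree))  # longest path is 4 -> 2 -> 1 -> 3, i.e. 3 edges
```

Each node is visited once, so the work grows linearly with the size of the tree.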

## Revisiting Adam Smith’s Invisible Hand in the Data Economy

Fundamental paradigms of the free market should also be scrutinized by data science. An unobservable market force that helps the demand and supply of goods in a free market to reach equilibrium automatically and efficiently is what we call the invisible hand. But I am a data scientist; I don’t deal in unobservable forces, no observations …

## Set Theory — Cardinality & Power Sets

With basic notation & operations cleared in articles one & two in this series, we’ve now built a fundamental understanding of Set Theory. This third article further compounds this knowledge by zeroing in on the most important property of any given set: the total number of unique elements it contains. Also known as the cardinality, …
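
A small Python sketch may help make the two terms in the title concrete. It relies on the standard fact that a finite set with n elements has 2^n subsets; the helper name `power_set` is an assumption made for this sketch and does not come from the article.

```python
from itertools import chain, combinations


def power_set(s):
    """Return every subset of s as a list of frozensets (hypothetical helper)."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(subset) for subset in subsets]


a = {1, 2, 2, 3}          # the duplicate 2 is stored only once
print(len(a))             # cardinality: number of unique elements
print(len(power_set(a)))  # number of subsets, equal to 2 ** len(a)
```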

## Fast, static D3 maps built with Turf.js and the command-line

Combining Mike Bostock’s command-line cartography tutorial with the flexibility of Node.js. (Figure: Estimated percent of undocumented residents in U.S. metro areas. Source: Pew Research Center.) Recently, I needed to build a handful of U.S. state bubble maps to be embedded in a story for San Antonio Express-News. I wanted to use D3 but was concerned about slow asset …

## Introduction to Unsupervised Learning

Understand principal component analysis (PCA) and clustering methods. (Photo by Oscar Keys on Unsplash.) Unsupervised learning is a set of statistical tools for scenarios in which there is only a set of features and no targets. Therefore, we cannot make predictions, since there are no responses associated with each observation. Instead, we are interested in finding …

## The Basics of Cryptography

With Applications in R. Have you ever wondered how companies securely store your passwords? Or how your credit card information is kept private when making online purchases? The answer is cryptography. The vast majority of internet sites now use some form of cryptography to ensure the privacy of their users. Even information such as emails …

## Assessing NHL award winners using K-means

Data sets. The final data set used is a combination of traditional and advanced player metrics. Traditional statistics concern metrics like goals and assists (total being known as points), plus-minus, penalty minutes and time on ice, whilst advanced player metrics deal more with player behavior and puck possession. Using Python’s beautifulsoup library, I scraped more traditional …

## Reinforcement Learning and Deep Reinforcement Learning with Tic Tac Toe

In this article I want to share my project on implementing reinforcement learning and deep reinforcement learning methods on a Tic Tac Toe game. The article contains: 1. Rigorous definition of the game as a Markov decision process. 2. How to implement the reinforcement learning method, called TD(0), to create an agent that plays the …

## On Canonical Companies

In Information Technology and tech startups, we talk about “systems of record”. A system of record is the source for a particular piece of data that may also exist in other systems. That “system of record” is the ultimate source of truth. It’s the canonical record of data. We often talk about these systems as the place …

## H2O for Inexperienced Users

Some background: I am a senior in high school, and in the summer of 2018 I interned at H2O.ai. With no ML experience beyond Andrew Ng’s Introduction to Machine Learning course on Coursera and a couple of his deep learning courses, I initially found myself slightly overwhelmed by the variety of new algorithms H2O has to offer …

## Software 2.0 — Playing with Neural Networks (Part 1)

In this article we are going to discuss neural networks (from scratch), the innovative concept that has taken the world by storm. I will assume that the reader is already familiar with the following concepts: cost function (MSE and cross entropy), gradient descent, logistic regression, activation function, and binary classification. In particular, this article will try …

## Sentiment Analysis with Word Bags and Word Sequences

For generic text, word bag approaches are very efficient at text classification. For a binary text classification task studied here, LSTM working with word sequences is on par in quality with SVM using tf-idf vectors. But performance is a different matter… The bag-of-words approach to turning documents into numerical vectors ignores the sequence of words …
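
To illustrate the last point of the excerpt, here is a minimal Python sketch in which two sentences with opposite meanings produce the same bag of words. The whitespace tokenizer and the `bag_of_words` helper are simplifying assumptions for this sketch, not the article's pipeline.

```python
from collections import Counter


def bag_of_words(text):
    """Count lower-cased, whitespace-separated tokens (hypothetical helper)."""
    return Counter(text.lower().split())


first = "the dog bit the man"
second = "the man bit the dog"

# Word order differs, but the word counts are identical.
print(bag_of_words(first) == bag_of_words(second))
print(first.split() == second.split())
```

A sequence model such as an LSTM receives the tokens in order, so it can in principle tell these two sentences apart.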

## Data Visualization in Music

Last fall I went to an Edward Tufte lecture where he began with a very effective, very sweet video of music visualized in a sequence. Tufte knows this is a great and charming start to a lecture. He knows it provides a welcome change from the outside world; an elegant fusion of music, color, and …

## Visualising Machine Learning Datasets with Google’s FACETS.

Data. Although you can work with data provided on the demo page, I shall be working with another set of data. I will be doing EDA with FACETS on the Loan Prediction Dataset. The problem statement is to predict whether an applicant who has been granted a loan by a company will repay it …

## The Science Behind AlphaStar

How DeepMind Uses Reinforcement Learning to Beat Human Pros in StarCraft II. Long-term strategic planning has long been considered a unique quality of the human mind that would be very difficult for artificial intelligence (AI) agents to imitate. Conceptually, strategic thinking involves evaluating a large number of data points in the present in order to …

## Main benefits of using a Chatbot for your business

I’m going to tell you about the future: messenger chatbots. What is a Messenger chatbot, and why is it crucial for your business? Basically, it’s a digital assistant, most of the time based on AI, that follows various commands in what looks like a natural-sounding conversation with your customers. Now you have the opportunity …

## Uncertainty estimation for Neural Network — Dropout as Bayesian Approximation

Uncertainty Estimation. One of the key distinctions of the Bayesian approach is that parameters are distributions instead of fixed weights. Error = model uncertainty + model misspecification + inherent noise. The Bayesian neural network decomposes uncertainty into model uncertainty, model misspecification, and inherent noise. MCDropout. One of the keys here in the Bayesian approach is that everything is …

## Machine Learning Techniques applied to Stock Price Prediction

(Image generated using Neural Style Transfer.) Machine learning has many applications, one of which is to forecast time series. One of the most interesting (or perhaps most profitable) time series to predict is, arguably, stock prices. Recently I read a blog post applying machine learning techniques to stock price prediction. You can read it here. …

## Sliding Puzzle – Solving Search Problem with Iterative Deepening A*

Now that we are more familiar with the game, let’s solve it! Search Algorithms. Let’s begin our graph traversal journey by visualizing and setting up our problem. “A goal properly set is halfway reached.” (Zig Ziglar) Problem: Given a board state, find a combination of moves that leads to the final state. Graph Representation. Now that we …

## What follows AlphaStar for Academic AI Researchers?

DeepMind continues making progress, but the path forward for AI researchers in academia is unclear. Ten years ago I challenged AI researchers across the globe to build a professional-level bot for StarCraft 1. The Brood War API was recently released, and for the first time academics and professionals could test out AI systems on a highly competitive …

## Using AI For Good

How to Help Developing Countries with Artificial Intelligence. By CE Kan, Jan 27. Recently, I have come across quite a few articles stating how artificial intelligence may threaten the developing world by eliminating the need for repetitive, labor-intensive manufacturing roles. Automation of factories can potentially lead to higher unemployment rates in poorer nations, thereby disrupting local …

## Hierarchical Bayesian Modeling for Ford GoBike Ridership with PyMC3 — Part II

(Photo by sabina fratila on Unsplash.) In the first part of this series, we explored the basics of using a Bayesian-based machine learning model framework, PyMC3, to construct a simple Linear Regression model on Ford GoBike data. In this example problem, we aimed to forecast the number of riders that would use the bike share tomorrow …

## Handling imbalanced datasets in machine learning

Reworking the problem is better. Up to now the conclusion is pretty disappointing: if the dataset is representative of the true data, if we can’t get any additional features, and if we target a classifier with the best possible accuracy, then a “naive behaviour” (always answering the same class) is not necessarily a problem and should …

## Interactive Controls for Jupyter Notebooks

How to use interactive IPython widgets to enhance data exploration and analysis. There are few actions less efficient in data exploration than re-running the same cell over and over again, each time slightly changing the input parameters. Despite knowing this, I still find myself repeatedly executing cells just to make the slightest change, for example, choosing …

## Understanding Entity Embeddings and It’s Application

As of late I’ve been reading a lot on entity embeddings after being tasked with working on a forecasting problem. The task at hand was to predict the salary of a given job title, given the historical job ads data that we have in our data warehouse. Naturally, I just had to seek out how …

## Mario vs. Wario — round 2: CNNs in PyTorch and Google Colab

For quite some time I had been meaning to play with Google Colab (yes, free access to a GPU…). I think this is a really awesome initiative, which enables people with no GPU on their personal computers to play around with Deep Learning and train models they would not be able to train otherwise. Basically we …

## Degrees of Freedom and Sudoko

An intuitive explanation of degrees of freedom and how degrees of freedom affect Sudoku. (Source: Pixabay.) A lot of aspiring data scientists take courses on statistics and get befuddled by the concept of degrees of freedom. Some memorize it by rote as ‘n-1’. But there is an intuitive reason why it is ‘n-1’. The Intuitive …
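
One common way to see where ‘n-1’ comes from (not necessarily the explanation the article goes on to give): once the mean of n values is fixed, only n-1 of them can be chosen freely, because the last one is forced. The sketch below shows this with made-up numbers; `last_value` is a hypothetical helper name.

```python
def last_value(free_values, mean, n):
    """Return the one value that is forced once n - 1 values and the mean are known."""
    if len(free_values) != n - 1:
        raise ValueError("exactly n - 1 values can be chosen freely")
    # The n values must sum to mean * n, so the remainder is determined.
    return mean * n - sum(free_values)


free = [4, 7, 9]  # three values picked freely (illustrative numbers)
forced = last_value(free, mean=6, n=4)
print(forced)     # 6 * 4 - (4 + 7 + 9) = 4
```

Filling the last empty cell of a Sudoku row works the same way: once the other eight digits are placed, the ninth is no longer a choice.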

## 10 Tips for Choosing the Optimal Number of Clusters

By Matt.0, Jan 27. (Photo by Pakata Goh on Unsplash.) Clustering is one of the most common unsupervised machine learning problems. Similarity between observations is defined using some inter-observation distance measures or correlation-based distance measures. There are 5 classes of clustering methods:

+ Hierarchical Clustering
+ Partitioning Methods (k-means, PAM, CLARA)
+ Density-Based Clustering
+ Model-based Clustering
+ Fuzzy Clustering

My …

## A Gentle Introduction to Deep Learning : Part 3

PCA & Linear Algebra (Advanced). (Photo by Antoine Dautry.) “You can’t build a great building on a weak foundation.” This quote truly justifies what I am trying to do here: you cannot learn the true form of machine learning or deep learning until you have knowledge of some of the important mathematical concepts like linear algebra …

## Data Augmentation for Natural Language Processing

Lessons learned from a hate speech detection task to improve supervised NLP models. Note: this post is mainly targeted at an audience unfamiliar with Natural Language Processing and will hence cover some basic concepts before moving on to data augmentation. (Source: Harvard Political Review.) Natural Language Processing (NLP) has become increasingly popular in both academia and …

## Learning to Drive Smoothly in Minutes

Learning to Drive in Minutes: The Updated Approach. Although the Wayve.ai technique may work in principle, it has some issues that need to be addressed to apply it to a self-driving RC car. First, because the feature extractor (VAE) is trained after each episode, the distribution of features is not stationary. That is to say, the features are …

## The New Dawn of AI: Federated Learning

The emerging AI market model is dominated by tech giants such as Google, Amazon and Microsoft, who offer cloud-based AI solutions and APIs. This model offers users little control over the usage of AI products and their own data that is collected from their devices, locations etc. In the long run, such a centralized model …

## Analytics Building Blocks: Regression

A modularized notebook to tune and compare 11 regression algorithms with minimal coding in a control panel fashion. This article summarizes and explains key modules of my regression block (one of the simple modularized notebooks I am developing to execute common analysis tasks). The notebook is intended to facilitate quicker experimentation for the users with …

## Generative Adversarial Networks — Learning to Create

A peek into the design, training, loss functions and arithmetic behind GANs. Let’s say we have a dataset of images of bedrooms and an image classifier CNN that was trained on this dataset to tell us if a given input image is a bedroom or not. Let’s say the images are of size 16 * 16. …

## Machine Learning from First Principles

Machine Learning ~ Applied Mathematics (https://bit.ly/2Wns7eN). Roadmap. Goal: First and foremost, machine learning carries with it the connotation that it is extremely complex. While it is mathematically rigorous, it is really simple when you break it down into mathematical terms and even simpler to grasp once you see a real-world example of how …

## Tensorflow — The core concepts

[source: https://tensorflow.org] Like most machine learning libraries, TensorFlow is “concept-heavy and code-lite”. The syntax is not very difficult to learn. But it is very important to understand its concepts. What is a Tensor? According to Wikipedia, “A tensor is a geometric object that maps in a multi-linear manner geometric vectors, scalars, and other tensors to …

## Understanding Markov Decision Processes

At a high level, a Markov Decision Process (MDP) is a type of mathematical model that is very useful for machine learning, reinforcement learning to be specific. The model allows machines and agents to determine the ideal behavior within a specific environment, in order to maximize the model’s ability to achieve a certain state in …

## How to store financial market data for backtesting

I am working on moderately large financial price data sets. By moderately large I mean less than 4 million rows per asset. Four million rows can cover the last 20 years of minute price bars for a regular asset without extended trading hours, such as index futures contracts or regular cash stocks. When dealing with …
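
The row count in the excerpt can be sanity-checked with a little arithmetic. The sketch below assumes a 390-minute regular session (6.5 hours) and 252 sessions per year; both figures are assumptions typical of U.S. cash equities and are not taken from the article, so other markets will give different totals.

```python
# Assumed figures for this sketch (not from the article):
MINUTES_PER_SESSION = 390   # 6.5-hour regular session, one bar per minute
SESSIONS_PER_YEAR = 252     # approximate number of trading days per year
YEARS = 20                  # history length mentioned in the excerpt


def minute_bars(years, sessions_per_year, minutes_per_session):
    """Estimate the number of one-minute bars for a single asset."""
    return years * sessions_per_year * minutes_per_session


rows = minute_bars(YEARS, SESSIONS_PER_YEAR, MINUTES_PER_SESSION)
print(f"{rows:,} rows")     # 20 * 252 * 390 = 1,965,600
print(rows < 4_000_000)     # fits under the 4 million row bound
```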

## Learning NLP Language Models with Real Data

Part 2: Applying Language Models to Real Data. Data Source and Pre-Processing. For this demonstration, we will be using the IMDB large movie review dataset made available by Stanford. The data contains the rating given by the reviewer, the polarity and the full comment. For example, the first negative comment here in full is the following: …

## How Twitter does it? Challenges in implementing recommender systems at scale

A summarized view of the challenges in implementing recommender systems from an industry point of view. Most of the time, data science projects stop at achieving some satisfactory accuracy on a subset of data. This is also the case with recommender systems. In a controlled environment and with a limited dataset, it might be possible …

## Analyzing and Predicting Starbucks’ Location Strategy

Logistic Regression Prediction. A basic logistic regression using demographic variables can correctly predict about 60% of zip codes that have a Starbucks and 90% of those that don’t. Given the unbalanced nature of the data set (31K observations and ~5,500 with a Starbucks), a 60% prediction rate should be sufficient for the purposes of this exercise. Our …
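
To put the two per-class rates into perspective, the sketch below combines them into an overall accuracy and compares it with always predicting “no Starbucks”. It uses the rounded figures quoted in the excerpt (31,000 observations, 5,500 positives, 60% and 90%), so the results are approximate; `overall_accuracy` is a hypothetical helper, not code from the article.

```python
def overall_accuracy(n_total, n_positive, rate_positive, rate_negative):
    """Combine per-class correct-prediction rates into one accuracy figure."""
    n_negative = n_total - n_positive
    correct = rate_positive * n_positive + rate_negative * n_negative
    return correct / n_total


# Rounded figures from the excerpt above.
N_TOTAL = 31_000
N_WITH_STARBUCKS = 5_500

accuracy = overall_accuracy(N_TOTAL, N_WITH_STARBUCKS, 0.60, 0.90)
# Baseline: always predict the majority class ("no Starbucks").
baseline = (N_TOTAL - N_WITH_STARBUCKS) / N_TOTAL

print(f"model accuracy    ~ {accuracy:.3f}")
print(f"majority baseline ~ {baseline:.3f}")
```

With these inputs the model lands at roughly 85% overall against a baseline of roughly 82%, which shows why the per-class rates say more than a single accuracy number on an unbalanced data set.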

## Python Virtual Environments made easy

I was starting a project where I had to quickly check if a package, Flask, worked with the Python installed on my machine. As I ran the command to install Flask, it alerted me that the package was already installed as I had Anaconda on my machine. But when I tried to run the Hello …

## Hypothesis Testing Glossary for the Weary Reader

From “alpha” to “z-score”. Why So Weary? When I try to read about statistics I get mired in the jargon. Even just moving past the phrase, “For a given parameterized distribution,” requires that I think about what it means for something to be “parameterized” and what a “distribution” is. I wind up reading in …

## Artificial Neural Network Implementation using NumPy and Classification of the Fruits360 Image…

This tutorial builds an artificial neural network from scratch in Python using NumPy in order to classify images from the Fruits360 dataset. Everything (i.e. images and source code) used in this tutorial, other than the color Fruits360 images, is covered by the exclusive rights of my book, cited as “Ahmed Fawzy Gad ‘Practical Computer Vision Applications …

## Quick guide to run your Python scripts on Google Colaboratory

If you are looking for an interactive way to run your Python script, say you want to start a machine learning project with a couple of friends, look no further: Google Colab is the best solution for you. You can work online and save your code on your local Google Drive, and it allows you to …

## How to Learn More in Less Time with Natural Language Processing (Part 2)

And how to create your own bag of words classifier. With the nifty extractive text summarizer we created in Part 1, we were able to take news articles and cut them down to half their size or more! Now it is time to take these articles and classify them by subject. In this part we …

## How to Learn More in Less Time with Natural Language Processing (Part 1)

And how to create your own extractive text summarizer. Imagine you are given an assignment from school or work that involves A LOT of research. You spend all night grinding it out, so you can acquire the knowledge you need for a high-quality end product. Now imagine you are given the exact same assignment and …

## User guide to My First Data Product: Medium Post Metric Displayer

Know Your Medium Post Better with Data. Origin. As a regular writer on Medium as well as a data geek, after the busy year of 2018, I’d like to reflect on what I have achieved on my Medium blog. Furthermore, based on the performance in 2018, I plan to make a more aggressive writing plan in the year …

## EMPOWERING A CITIZEN DATA SCIENTIST FOR HARDWARE DESIGN & MANUFACTURING

Improving productivity of a hardware design and manufacturing professional with an advanced AI tool. Authors: Partha Deka and Rohit Mittal. What is a citizen data scientist? Expert data scientists rely on custom coding to make sense out of data. The use case could be data cleansing, data imputation, creating segments, finding patterns in the data, …

## How to do Bayesian hyper-parameter tuning on a blackbox model

Optimization of arbitrary functions on Cloud ML Engine. Google Cloud ML Engine offers a hyper-parameter tuning service that uses Bayesian methods. It is not restricted to TensorFlow or scikit-learn. In fact, it is not even limited to machine learning. You can use the Bayesian approach to tune pretty much any blackbox model. To demonstrate, I’ll tune …