Blockchain can be the new paradigm of the net

The popularization of blockchain will not depend on users understanding how it works but on the existence of friendly and effective applications that solve real problems. Nov 22, 2018 Historically, each paradigm of the internet has had its killer application: before the web, it was email; with the original web, it was Google; and with …

Combating media bias with AWS Comprehend

Nov 19, 2018 Photo by Randy Colas on Unsplash In a world of fake news and ideology-driven, subjective media coverage, it is questionable which sources of journalism can be considered “reliable”. Two news outlets often share completely different takes on the same story. “Experts” point out different consequences of events …

Becoming An Analytics Manager Isn’t A Promotion.

(Photo by rawpixel on Unsplash) Nov 18, 2018 It’s A Career Change. Starting out as a data scientist may be the modern version of becoming a rock star, but no one really seems to be talking about what happens a few years further into your career. Analysing big data sets. Building models. Connecting data pipelines. The challenges …

Training your staff in data science? Here’s how to pick the right programming language

Businesses from every sector are investing in data science education programmes. Working at tech education company Decoded, I have found it fascinating to see the immense value data skills can bring to every sector, from banks and retailers to charities and government. When embarking on such an initiative, there are plenty of strategic decisions for …

Kaggle: TGS Salt Identification Challenge

Nov 13, 2018 A few weeks ago, the TGS Salt Identification Challenge wrapped up on Kaggle, a popular platform for data science competitions. The task was to accurately identify whether a subsurface target in seismic images is salt or not. Our team: Insaf Ashrapov, Mikhail Karchevskiy, Leonid Kozinkin. We finished 28th (top 1%) and would …

DOGNET: can an AI model fool a human?

The experiment was simple: could a machine learning (ML) model produce Golden Retriever images that people would mistake for real ones? The reason for choosing dogs… is that dogs are awesome! In our current climate, we often hear the term ‘fake news’, and with ML models becoming more advanced, their ability to create non-human content …

Telling Apart AI and Humans: #2 Photo VS GAN generated image

If you missed the 1st installment of this series, Humans vs Robots is here. Prompted by advances in Generative Adversarial Networks (GAN), a year ago I tweeted a thread about telling apart pictures taken with a camera from generated pictures. Here is the updated version of that thread. A few of my tips are still …

Installing Hadoop 3.1.0 multi-node cluster on Ubuntu 16.04 Step by Step

Image Source: www.mapr.com/products/apache-hadoop/ There are many guides on the web about installing Hadoop 3. Many of them do not work well or need improvements. This article draws on the official documentation, other articles and many answers from Stackoverflow.com. Note: All prerequisites must be applied on the name node and the data nodes. First, …

PyTorch 101 for Dummies like Me

Nov 5, 2018 What is PyTorch? It’s a Python-based package that serves as a replacement for NumPy arrays and provides a flexible platform for deep learning development. As for why I prefer PyTorch over TensorFlow, see Fast AI’s blog post on the reasons to switch to PyTorch. Or simply put, …

The Austrian Quant: My Machine Learning Trading Algorithm Outperformed the SP500 For 10 Years

The Austrian Quant is named after the Austrian School of Economics, which serves as the inspiration for how I structured the portfolio. I designed a trading strategy composed of 3 different investment funds to gain a better understanding of investments, machine learning and programming and how they all combine in the world …

Industrial strength Natural Language Processing

I have spent much of my career as a graduate student researcher, and now as a Data Scientist in the industry. One thing I have come to realize is that a vast majority of solutions proposed both in academic research papers and in the workplace are just not meant to ship: they just don’t scale! And …

Building a Sentiment Detection Bot with Google Cloud, a Chat Client, and Ruby.

Introduction In this series, I’ll explain how to create a chat bot that is capable of detecting sentiment, analyzing images, and finally having the basis of an evolving personality. This is part 1 of that series. The Pieces: Ruby, Sinatra, Google Cloud APIs, Line (a chat client). Since I live in Japan, I’ll be using …

Debugging a Machine Learning model written in TensorFlow and Keras

Things that could go wrong, and how to diagnose if they did. Oct 24, 2018 In this article, you get to look over my shoulder as I go about debugging a TensorFlow model. I did a lot of dumb things, so please don’t judge. Cheat sheet. The numbers refer to sections in this article (https://bit.ly/2PXpzRh) 1 …

Introduction to Linear Regression in Python

Basic concepts and mathematics There are two kinds of variables in a linear regression model: The input or predictor variables are the variables that help predict the value of the output variable. They are commonly referred to as X. The output variable is the variable that we want to predict. It is commonly referred to …

A line-by-line layman’s guide to Linear Regression using TensorFlow

Computing the Graph With generate_dataset() and linear_regression(), we are now ready to run the program and begin finding our optimal gradient W and bias b! [line 2, 3] x_batch, y_batch = generate_dataset(); x, y, y_pred, loss = linear_regression(). In this run() function, we start off by calling generate_dataset() and linear_regression() to get x_batch, y_batch, x, y, y_pred …

Perplexity Intuition (and Derivation)

The perplexity of a discrete probability distribution p is defined as PP(p) = 2^{H(p)} (from https://en.wikipedia.org/wiki/Perplexity), where H(p) is the entropy of the distribution p(x) and x is a random variable over all possible events. In the previous post, we derived H(p) from scratch and intuitively showed why entropy is the average number of bits that we need to …
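As a rough illustration of the definition above, here is a minimal sketch that computes the entropy in bits and the perplexity 2^H(p) of a small distribution. The function names and the two example dice are assumptions made for this illustration, not taken from the original post.

```python
import math


def entropy_bits(p):
    """Shannon entropy H(p) in bits; p is a list of probabilities summing to 1."""
    # Zero-probability events contribute nothing, so they are skipped.
    return -sum(px * math.log2(px) for px in p if px > 0)


def perplexity(p):
    """Perplexity of a discrete distribution, defined as 2 ** H(p)."""
    return 2 ** entropy_bits(p)


# Hypothetical examples: a fair six-sided die and a loaded one.
fair_die = [1 / 6] * 6
loaded_die = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]

print(perplexity(fair_die))
print(perplexity(loaded_die))
```

For a uniform distribution over k outcomes the perplexity comes out as k, which is why it is often read as an "effective number of choices"; the loaded die scores lower because its outcomes are easier to guess.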

Telling Apart AI and Humans: #1 Humans VS Androids

ALife 2018 conference, © Lana Sinapayen Prompted by a video where people thought a human was actually a hyper-realistic robot, I decided to write about how to spot humanoid robots. Here are a few tips! After spending so much time with Alter the android and various hyper-realistic robots, I know a thing or two about …

The future of data visualization

Tools to shape the future According to many product announcements from Google, Apple and BMW, more and more data will be overlaid on our physical environments through augmented reality or projection. That means not only will data be visualized more, but the visual reality around us will be turned into data. Data visualization of a new AR …

Waiting for Weekends: Some Insights on How to Select the Best Wine

There is a huge selection of wines on the market, and for a wine lover it is always a quest to select the best one. Wines from the US, France, Spain, Germany and many other wine countries, in numerous varieties, are easily available in any liquor store. Prices can also vary drastically. From my experience, …

The Best Public Datasets for Machine Learning

First, a couple of pointers to keep in mind when searching for datasets. According to Carnegie Mellon University: 1. A high-quality dataset should not be messy, because you do not want to spend a lot of time cleaning data. 2. A high-quality dataset should not have too many rows or columns, so it is easy …

The intuition behind Shannon’s Entropy

Now, back to our formula 3.49. [Equation: the definition of Entropy for a probability distribution (from The Deep Learning Book)] I(x) is the information content of X. I(x) itself is a random variable. In our example, the possible outcomes of the War. Thus, H(x) is the expected value of every possible information. Using the definition of expected …
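For reference, the entropy definition that the excerpt describes in words can be written out as follows. This is a reconstruction from the surrounding text (H(x) as the expected value of the information content I(x)), assuming the usual self-information I(x) = -log P(x); the discrete-sum form on the last line is the standard expansion of the expectation and is added here for illustration.

```latex
\begin{aligned}
I(x) &= -\log P(x) \\
H(x) &= \mathbb{E}_{x \sim P}\left[ I(x) \right]
      = -\,\mathbb{E}_{x \sim P}\left[ \log P(x) \right] \\
     &= -\sum_{x} P(x) \log P(x) \qquad \text{(discrete case)}
\end{aligned}
```

With a base-2 logarithm the result is measured in bits; with the natural logarithm it is measured in nats.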

Forecasting Exchange Rates Using ARIMA In Python

Sep 29, 2018 Nearly all sectors use time series data to forecast future time points. Forecasting the future can help analysts and management make better-calculated decisions to maximise returns and minimise risks. I will be demonstrating how we can forecast exchange rates in this article. If you are new to finance and want to …

‘I want to learn Artificial Intelligence and Machine Learning. Where can I start?’

I bought a plane ticket to the US with no return flight. I’d been studying for a year and I figured it was about time I started putting my skills into practice. My plan was to rock up to the US and get hired. Then Ashlee messaged me on LinkedIn, “Hey I’ve seen your posts …

Object Detection using Deep Learning Approaches: An End to End Theoretical Perspective

Fast RCNN So the next idea from the same authors: Why not create a convolutional feature map of the input image and then just select the regions from that map? Do we really need to run so many convnets? What we can do is run just a single convnet and then apply region proposal crops on the …

Image Processing Class (EGBE443) #2 - Histogram

Computing the histogram In this section, the histogram is calculated with Python code (Python 3.6). For Python 3.6, there are many common modules used in image processing, such as Pillow, NumPy, OpenCV, etc., but this program uses only the Pillow and NumPy modules. To import the image from your computer, …
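To make the idea of an intensity histogram concrete, here is a minimal plain-Python sketch. It is a stand-in for illustration only: the original article uses Pillow and NumPy, whereas this version assumes the grayscale image has already been loaded into a nested list of integer intensities in the range 0 to 255. The function name and the tiny sample image are hypothetical.

```python
def gray_histogram(pixels, levels=256):
    """Count how many pixels fall on each intensity level.

    pixels: nested list (rows of integer intensities in [0, levels - 1]).
    Returns a list where hist[v] is the number of pixels with value v.
    """
    hist = [0] * levels
    for row in pixels:
        for value in row:
            hist[value] += 1
    return hist


# Hypothetical 2 x 3 grayscale image.
image = [
    [0, 0, 255],
    [128, 255, 255],
]

hist = gray_histogram(image)
print(hist[0], hist[128], hist[255])
```

The histogram always sums to the number of pixels, which is a handy sanity check when swapping in a real image loader.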

Get system metrics for 5 min with Docker, Telegraf, Influxdb and Grafana

Hi all, this is a very quick guide on how to configure system monitoring for one or more servers using a modern technology stack: Grafana, Docker and Telegraf with InfluxDB. The main goal of this article is to show how to start getting system metrics from your servers quickly and easily, without spending …

How to get fbprophet working on AWS Lambda

Solving package size issues of fbprophet serverless deployment Adi Goldstein / Unsplash I assume you’re reading this post because you’re looking for ways to use the awesome fbprophet (Facebook open source forecasting) library on AWS Lambda and you’re already familiar with the various issues around getting it done. I will be using a Python 3.6 …

Machine Learning – Particle Swarm Optimization (PSO) and Twitter

We all live in a world where analyzing massive sets of unstructured data is becoming a business need. And the time we spend on the internet is basically the time we spend on social media. Even our daily life is affected by the people around us. And we tend to change our opinions …

Multi-Layer perceptron using Tensorflow

Sep 11, 2018 In this blog, we are going to build a neural network (multilayer perceptron) using TensorFlow and train it to recognize digits in images. TensorFlow is a very popular deep learning framework released by Google, and this notebook will guide you through building a neural network with this library. If you want to understand …

Diving into K-Means…

Sep 9, 2018 We completed our first basic supervised learning model, i.e. the Linear Regression model, in the last post here. In this post we get started with the most basic unsupervised learning algorithm: K-means clustering. Let’s get started without further ado! Background: K-means clustering, as the name itself suggests, is a clustering algorithm, …

3 approaches for backtesting historical data

Reading and processing data for statistical and quantitative analysis in trading Sep 8, 2018 Anyone interested in the statistical analysis of financial markets needs to process historical data. Historical data is needed in order to backtest or train: Quantitative trading. Statistical trading. Price action replay/walkthrough. Each need comes from different goals. 3 examples on …

Why feature weights in a machine learning model are meaningless

Don’t make decisions based on the weights of an ML model Aug 31, 2018 As I see our customers fall in love with BigQuery ML, an old problem rears its head: I find that they cannot resist the temptation to assign meaning to feature weights. “The largest weight in my model to predict customer lifetime value,” …

Doing XGBoost hyper-parameter tuning the smart way — Part 1 of 2

Aug 29, 2018 Picture taken from Pixabay In this post and the next, we will look at one of the trickiest and most critical problems in Machine Learning (ML): Hyper-parameter tuning. After reviewing what hyper-parameters, or hyper-params for short, are and how they differ from plain vanilla learnable parameters, we introduce three general purpose discrete optimization …

Automatic Image Quality Assessment in Python

Aug 28, 2018 Image quality is a notion that highly depends on observers. Generally, it is linked to the conditions in which it is viewed; therefore, it is a highly subjective topic. Image quality assessment aims to quantitatively represent the human perception of quality. These metrics are commonly used to analyze the performance of algorithms in …

The One Probability Review That You Need

Probability and statistics are everywhere: from finance and demographic projections to casino games, these disciplines help us make sense of the world. They also underlie much of the machine learning apparatus that is the rage nowadays. What resources should we turn to, if we were to dust off our knowledge of them? (Disclaimer: I received …

Mapping the UK’s Traffic Accident Hotspots

While looking for some interesting geographical data to work with, I came across the Road Safety Data published by the UK government. This is a very comprehensive road accident data set that includes the incident’s geographical coordinates, as well as other related data such as the local weather conditions, visibility, police attendance and more. There …

What Does It Really Mean to Operationalize a Predictive Model?

It is not enough to just stand up a web service that can make predictions. Aug 13, 2018 Original Image Source, meme overlay by Imgflip In a 2017 SAS survey, 83% of organizations had made moderate-to-significant investments in big data, but only 33% said they had derived value from their investments. Other more recent surveys have …

Practical tips for class imbalance in binary classification

4. Class weighted / cost sensitive learning Without resampling the data, one can also make the classifier aware of the imbalanced data by incorporating the weights of the classes into the cost function (aka objective function). Intuitively, we want to give a higher weight to the minority class and a lower weight to the majority class. scikit-learn has a …
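The class-weighting idea above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the article's code: the helper names are hypothetical, the weight formula n_samples / (n_classes * class_count) mirrors the "balanced" heuristic documented for scikit-learn's class_weight option, and the labels and predicted probabilities are made-up toy data.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency:
    n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


def weighted_log_loss(labels, probs, weights):
    """Class-weighted binary cross-entropy.

    probs[i] is the predicted probability that sample i belongs to class 1.
    Each sample's loss is scaled by the weight of its true class, and the
    total is normalised by the sum of the weights.
    """
    total = 0.0
    weight_sum = 0.0
    for y, p in zip(labels, probs):
        w = weights[y]
        total += -w * (math.log(p) if y == 1 else math.log(1 - p))
        weight_sum += w
    return total / weight_sum


# Toy imbalanced data: 8 majority-class samples (0) and 2 minority-class samples (1).
labels = [0] * 8 + [1] * 2
probs = [0.1] * 8 + [0.6, 0.4]  # hypothetical model outputs

weights = balanced_class_weights(labels)
print(weights)
print(weighted_log_loss(labels, probs, weights))
```

With these weights, a mistake on a minority-class sample costs four times as much as the same mistake on a majority-class sample, which is the intuition the paragraph describes.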

Feature Engineering for Healthcare Fraud Detection

The nature of the problem: medical fraud and abuse The U.S. Department of Health and Human Services, in its pamphlet Avoiding Medicare Fraud and Abuse: A Roadmap for Physicians, states that “most physicians strive to work ethically, render high-quality medical care to their patients, and submit proper claims for payment,” yet “the presence of some dishonest …

Math Behind Reinforcement Learning, the Easy Way

Aug 2, 2018 Photo by JESHOOTS.COM on Unsplash Look at this equation: [Equation: value function of Reinforcement Learning] If it does not intimidate you, then you are mathematically savvy and there is no point in reading this article 🙂 This article is not about teaching Reinforcement Learning (RL) but about explaining the math behind it. So it …

Cooking with Machine Learning: Dimension Reduction

Recently I came across this cooking recipes data set on Kaggle, and it inspired me to combine 2 of my main interests in life: food and machine learning. What makes this data set special is that it contains recipes from 20 different cuisines, 6714 different ingredients, but only 26648 samples. Some cuisines have way fewer …

An In-depth Review of Andrew Ng’s deeplearning.ai Specialization

So you’ve seen the recent news about how artificial intelligence (AI) is changing everything. However, the idea of AI has been around for a long time. Machines that think and talk like humans have been the inspiration for movies and stories for decades. But what’s the deal? Why has AI been getting better and better …

An Advanced Example of Tensorflow Estimators Part (1/3)

Estimators were introduced in version 1.3 of the TensorFlow API, and are used to abstract and simplify training, evaluation and prediction. If you haven’t worked with Estimators before, I suggest you start by reading this article to gain some familiarity, as I won’t be covering all of the basics of using estimators. By no means …

Hypothesis Analysis Explained

Jul 19, 2018 Hypothesis analysis is a widely known concept and is used extensively by researchers, statisticians and quantitative analysts. It allows them to follow a set of formal steps to perform calculated analysis on their data. It is also widely used in machine learning and artificial intelligence. In this article, I will be explaining core concepts of …

PySpark ML and XGBoost full integration tested on the Kaggle Titanic dataset

Jul 8, 2018 In this tutorial we will discuss integrating PySpark and XGBoost using a standard machine learning pipeline. We will use data from Titanic: Machine Learning from Disaster, one of the many Kaggle competitions. Before getting started, please note that you should be familiar with Apache Spark, XGBoost and Python. The …

Acoustic Noise Cancellation by Machine Learning

DIY Noise-Cancellation System prototype made with TensorFlow. Jun 25, 2018 Image by TheDigitalArtist on Pixabay In this post I describe how I built an active noise cancellation system on my own using neural networks. I’ve just got my first results, which I am sharing, but the system looks like a tangle of scripts, binaries, …