Why feature weights in a machine learning model are meaningless

Don’t make decisions based on the weights of an ML model. Aug 31, 2018. As I see our customers fall in love with BigQuery ML, an old problem rears its head: I find that they cannot resist the temptation to assign meaning to feature weights. “The largest weight in my model to predict customer lifetime value,” …
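One well-known reason raw weights resist interpretation, offered here as an illustration and not necessarily the argument the article goes on to make, is that a weight's magnitude depends on the units of its feature. The sketch below fits a one-feature least-squares line twice on made-up data, once with distance in kilometres and once in metres; the helper `fit_slope` and all data values are hypothetical.

```python
def fit_slope(xs, ys):
    """Closed-form least-squares slope for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Made-up data: cost grows by 2 per kilometre, plus a fixed offset of 10.
distance_km = [1.0, 2.0, 3.0, 4.0, 5.0]
cost = [12.0, 14.0, 16.0, 18.0, 20.0]

# The same feature expressed in metres instead of kilometres.
distance_m = [1000.0 * d for d in distance_km]

w_km = fit_slope(distance_km, cost)  # weight when the feature is in km
w_m = fit_slope(distance_m, cost)    # weight when the feature is in m

print(f"weight (km): {w_km:.6f}")
print(f"weight (m):  {w_m:.6f}")
```

Both fits make identical predictions, yet the two weights differ by a factor of about 1000, so comparing weight magnitudes across features with different scales says little about which feature matters more.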

Doing XGBoost hyper-parameter tuning the smart way — Part 1 of 2

Aug 29, 2018. In this post and the next, we will look at one of the trickiest and most critical problems in machine learning (ML): hyper-parameter tuning. After reviewing what hyper-parameters, or hyper-params for short, are and how they differ from plain vanilla learnable parameters, we introduce three general-purpose discrete optimization …

Introduction to stochastic control theory

I had my first contact with stochastic control theory in one of my Master’s courses about Continuous Time Finance. I found the subject really interesting and decided to write my thesis about optimal dividend policy, which is mainly about solving stochastic control problems. In this post I want to give you a brief overview of …

Automatic Image Quality Assessment in Python

Aug 28, 2018. Image quality is a notion that highly depends on observers. Generally, it is linked to the conditions in which it is viewed; therefore, it is a highly subjective topic. Image quality assessment aims to quantitatively represent the human perception of quality. These metrics are commonly used to analyze the performance of algorithms in …

Neural Processes: Probabilistic Gaussian Process+Deep Learning

Neural Processes (NPs) caught my attention because they are essentially a neural network (NN) based probabilistic model that can represent a distribution over stochastic processes. NPs thus combine elements from two worlds:

Deep Learning – neural networks are flexible non-linear functions which are straightforward to train
Gaussian Processes – GPs offer a probabilistic framework for learning a distribution over a wide class of non-linear functions

The One Probability Review That You Need

Probability and statistics are everywhere: from finance and demographic projections to casino games, these disciplines help us make sense of the world. They also underlie much of the machine learning apparatus that is all the rage nowadays. What resources should we turn to if we want to dust off our knowledge of them? (Disclaimer: I received …

Mapping the UK’s Traffic Accident Hotspots

While looking for some interesting geographical data to work with, I came across the Road Safety Data published by the UK government. This is a very comprehensive road accident data set that includes the incident’s geographical coordinates, as well as other related data such as the local weather conditions, visibility, police attendance and more. There …

R Functions for Bayesian Stats and Summaries

A new update of my sjstats-package just arrived at CRAN. This blog post demonstrates those functions of the sjstats-package that deal especially with Bayesian models. The update contains some new and some revised functions to compute summary statistics of Bayesian models, which are now described in more detail.

Google’s AutoML Killer: Auto-Keras Opensource Automated ML

Auto-Keras is an open source software library for automated machine learning (AutoML). It is developed by DATA Lab at Texas A&M University and community contributors. The ultimate goal of AutoML is to provide easily accessible deep learning tools to domain experts with limited data science or machine learning background. Auto-Keras provides functions to automatically search …

Multiplicative RNN-LSTM for Sequence-based Recommenders

Recommender systems support the decision-making processes of customers with personalized suggestions. They are widely used and influence the daily life of almost everyone in domains such as e-commerce, social media, and entertainment. Quite often the dimension of time plays a dominant role in the generation of a relevant recommendation. Which user interaction occurred just before …

A Guide to Restricted Boltzmann Machines Using Pytorch

A Boltzmann machine defines a probability distribution over binary-valued patterns. What makes Boltzmann machine models different from other deep learning models is that they’re undirected and don’t have an output layer. The other key difference is that all the hidden and visible nodes are connected with each other. Due to this interconnection, Boltzmann machines …

What Does It Really Mean to Operationalize a Predictive Model?

It is not enough to just stand up a web service that can make predictions. Aug 13, 2018. In a 2017 SAS survey, 83% of organizations reported having made moderate-to-significant investments in big data, but only 33% said they had derived value from their investments. Other more recent surveys have …

Practical tips for class imbalance in binary classification

4. Class-weighted / cost-sensitive learning. Without resampling the data, one can also make the classifier aware of the imbalanced data by incorporating the weights of the classes into the cost function (aka objective function). Intuitively, we want to give a higher weight to the minority class and a lower weight to the majority class. scikit-learn has a …
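The idea of folding class weights into the cost function can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the article's code and not scikit-learn's API: the weights follow an inverse-frequency rule, `n_samples / (n_classes * class_count)`, the cost is a weighted binary log loss, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: rarer classes receive larger weights."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_log_loss(labels, probs, weights):
    """Binary log loss in which each sample is scaled by its class weight."""
    total = 0.0
    weight_sum = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, 1e-15), 1 - 1e-15)  # guard against log(0)
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum


# Toy data: nine majority-class samples (0) and one minority-class sample (1).
labels = [0] * 9 + [1]
# A classifier that assigns everyone a low positive-class probability.
probs = [0.1] * 10

weights = balanced_class_weights(labels)
unweighted = weighted_log_loss(labels, probs, {0: 1.0, 1: 1.0})
weighted = weighted_log_loss(labels, probs, weights)

print("class weights:", weights)
print(f"unweighted loss: {unweighted:.4f}")
print(f"weighted loss:   {weighted:.4f}")
```

With these weights the single misjudged minority sample contributes as much to the cost as all nine majority samples combined, so the weighted loss comes out higher than the unweighted one and a model trained on it is pushed to take the minority class seriously.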

Azure SQL DWH – Overview

There are a multitude of options when it comes to storing and processing data. In this post I want to give you a brief overview of Azure SQL Data Warehouse, Microsoft’s data warehouse solution for the Azure cloud and its answer to Amazon Redshift on AWS. I will start off by talking briefly about its technical architecture …

Math Behind Reinforcement Learning, the Easy Way

Aug 2, 2018. Look at this equation: [equation image: value function of Reinforcement Learning]. If it does not intimidate you, then you are mathematically savvy and there is no point in reading this article 🙂 This article is not about teaching Reinforcement Learning (RL) but about explaining the math behind it. So it …

Cooking with Machine Learning: Dimension Reduction

Recently I came across this cooking recipes data set on Kaggle, and it inspired me to combine two of my main interests in life: food and machine learning. What makes this data set special is that it contains recipes from 20 different cuisines, 6714 different ingredients, but only 26648 samples. Some cuisines have way fewer …