Doing XGBoost hyper-parameter tuning the smart way — Part 1 of 2

Aug 29, 2018. In this post and the next, we will look at one of the trickiest and most critical problems in Machine Learning (ML): hyper-parameter tuning. After reviewing what hyper-parameters, or hyper-params for short, are and how they differ from plain vanilla learnable parameters, we introduce three general-purpose discrete optimization … Read more

Introduction to stochastic control theory

I had my first contact with stochastic control theory in one of my Master’s courses on Continuous Time Finance. I found the subject really interesting and decided to write my thesis about optimal dividend policy, which is mainly about solving stochastic control problems. In this post I want to give you a brief overview of … Read more

Automatic Image Quality Assessment in Python

Aug 28, 2018. Image quality is a notion that highly depends on observers. Generally, it is linked to the conditions in which an image is viewed; therefore, it is a highly subjective topic. Image quality assessment aims to quantitatively represent the human perception of quality. These metrics are commonly used to analyze the performance of algorithms in … Read more

Neural Processes: Probabilistic Gaussian Process+Deep Learning

Neural Processes (NPs) caught my attention as they essentially are a neural network (NN) based probabilistic model which can represent a distribution over stochastic processes. So NPs combine elements from two worlds:

Deep Learning – neural networks are flexible non-linear functions which are straightforward to train
Gaussian Processes – GPs offer a probabilistic framework for learning a distribution over a wide class of non-linear functions

Categories: Featured, Excerpt

Opensourcing TransmogrifAI: Automated ML for Structured Data

Despite huge progress in machine learning over the past decade, building production-ready machine learning systems is still hard. Three years ago when we set out to build machine learning capabilities into the Salesforce platform, we learned that building enterprise-scale machine learning systems is even harder.

Categories: Featured, Excerpt

Program Synthesis: Can We Teach Computers to Write Code?

Can we teach computers to write code? This is the question behind an entire branch of research specialized in program synthesis. Programming is a demanding task that requires extensive knowledge, experience and no small degree of creativity.

Categories: Featured, Excerpt

The One Probability Review That You Need

Probability and statistics are everywhere: from finance and demographic projections to casino games, these disciplines help us make sense of the world. They also underlie much of the machine learning apparatus that is all the rage nowadays. What resources should we turn to if we want to dust off our knowledge of them? (Disclaimer: I received … Read more

Differentiable Rendering

Sounds cool, but … what is it? As I’ve started to pay more attention to machine learning, differentiable rendering is one topic that caught my attention and has been popping up with some frequency. My first thought was, “cooooool is this a new system for generating pixels that somehow can leverage machine learning?” After digging … Read more

Mapping the UK’s Traffic Accident Hotspots

While looking for some interesting geographical data to work with, I came across the Road Safety Data published by the UK government. This is a very comprehensive road accident data set that includes the incident’s geographical coordinates, as well as other related data such as the local weather conditions, visibility, police attendance and more. There … Read more

R Functions for Bayesian Stats and Summaries

A new update of my sjstats-package just arrived at CRAN. This blog post demonstrates those functions of the sjstats-package that deal specifically with Bayesian models. The update contains some new and some revised functions to compute summary statistics of Bayesian models, which are now described in more detail.

Categories: R, Excerpt

Google’s AutoML Killer: Auto-Keras Opensource Automated ML

Auto-Keras is an open source software library for automated machine learning (AutoML). It is developed by DATA Lab at Texas A&M University and community contributors. The ultimate goal of AutoML is to provide easily accessible deep learning tools to domain experts with limited data science or machine learning background. Auto-Keras provides functions to automatically search … Read more

Categories: Python, Excerpt

Multiplicative RNN-LSTM for Sequence-based Recommenders

Recommender Systems support the decision-making processes of customers with personalized suggestions. They are widely used and influence the daily life of almost everyone in domains like e-commerce, social media, or entertainment. Quite often the dimension of time plays a dominant role in the generation of a relevant recommendation. Which user interaction occurred just before … Read more

Categories: Python, Excerpt

A Guide to Restricted Boltzmann Machines Using PyTorch

A Boltzmann machine defines a probability distribution over binary-valued patterns. What makes Boltzmann machine models different from other deep learning models is that they’re undirected and don’t have an output layer. The other key difference is that the hidden and visible nodes are all connected with each other. Due to this interconnection, Boltzmann machines … Read more

Categories: Python, Excerpt

Data Science Austria

Over the last few months, I set out to build a news and event aggregator. You can see the work in progress here: data-science-austria.at. WordPress Plugins: Here is a list of plugins that I use for the site, grouped by general purpose. The first one is a collection that I would … Read more

What Does It Really Mean to Operationalize a Predictive Model?

It is not enough to just stand up a web service that can make predictions. Aug 13, 2018. In a 2017 SAS survey, 83% of organizations had made moderate-to-significant investments in big data, but only 33% said they had derived value from their investments. Other more recent surveys have … Read more

Practical tips for class imbalance in binary classification

4. Class weighted / cost sensitive learning: Without resampling the data, one can also make the classifier aware of the imbalance by incorporating class weights into the cost function (aka objective function). Intuitively, we want to give higher weight to the minority class and lower weight to the majority class. scikit-learn has a … Read more
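To make the idea of class weights in the cost function concrete, here is a minimal sketch in plain Python. It is not taken from the post itself: the helper names `balanced_class_weights` and `weighted_log_loss` are hypothetical, and the inverse-frequency heuristic (`n_samples / (n_classes * class_count)`) is just one common choice assumed for illustration.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: rarer classes get larger weights.

    Assumed heuristic: n_samples / (n_classes * count_of_class).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_log_loss(labels, probs, weights, eps=1e-12):
    """Binary log loss where each sample is scaled by its class weight.

    `probs` holds the predicted probability of the positive class (label 1).
    The result is normalized by the sum of the sample weights.
    """
    total = 0.0
    weight_sum = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum


# Toy data: 9 negatives, 1 positive, and a model that always predicts 0.1.
labels = [0] * 9 + [1]
probs = [0.1] * 10

weights = balanced_class_weights(labels)
uniform = {0: 1.0, 1: 1.0}

unweighted_loss = weighted_log_loss(labels, probs, uniform)
weighted_loss = weighted_log_loss(labels, probs, weights)
```

On this toy data the single missed positive dominates the weighted loss, so a model that ignores the minority class is penalized more heavily than under the unweighted objective.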

Feature Engineering for Healthcare Fraud Detection

The nature of the problem: medical fraud and abuse. The U.S. Department of Health and Human Services, in its pamphlet Avoiding Medicare Fraud and Abuse: A Roadmap for Physicians, states “most physicians strive to work ethically, render high-quality medical care to their patients, and submit proper claims for payment,” yet “the presence of some dishonest … Read more

Azure SQL DWH – Overview

There are a multitude of options when it comes to storing and processing data. In this post I want to give you a brief overview of Azure SQL Data Warehouse, Microsoft’s data warehouse solution for the Azure cloud and its answer to Amazon Redshift on AWS. I will start off by talking briefly about its technical architecture … Read more

Math Behind Reinforcement Learning, the Easy Way

Aug 2, 2018. Look at this equation: [Equation image: value function of Reinforcement Learning] If it does not intimidate you, then you are mathematically savvy and there is no point in reading this article 🙂 This article is not about teaching Reinforcement Learning (RL) but about explaining the math behind it. So it … Read more