Don’t make decisions based on the weights of an ML model Aug 31, 2018 As I see our customers fall in love with BigQuery ML, an old problem rears its head: I find that they cannot resist the temptation to assign meaning to feature weights. “The largest weight in my model to predict customer lifetime value,” … Read more: Why feature weights in a machine learning model are meaningless
Aug 29, 2018 In this post and the next, we will look at one of the trickiest and most critical problems in Machine Learning (ML): Hyper-parameter tuning. After reviewing what hyper-parameters, or hyper-params for short, are and how they differ from plain vanilla learnable parameters, we introduce three general purpose discrete optimization … Read more: Doing XGBoost hyper-parameter tuning the smart way - Part 1 of 2
I had my first contact with stochastic control theory in one of my Master’s courses about Continuous Time Finance. I found the subject really interesting and decided to write my thesis about optimal dividend policy, which is mainly about solving stochastic control problems. In this post I want to give you a brief overview of … Read more: Introduction to stochastic control theory
Aug 28, 2018 Image quality is a notion that highly depends on observers. Generally, it is linked to the conditions in which it is viewed; therefore, it is a highly subjective topic. Image quality assessment aims to quantitatively represent the human perception of quality. These metrics are commonly used to analyze the performance of algorithms in … Read more: Automatic Image Quality Assessment in Python
Neural Processes (NPs) caught my attention as they essentially are a neural network (NN) based probabilistic model which can represent a distribution over stochastic processes. So NPs combine elements from two worlds:
Deep Learning – neural networks are flexible non-linear functions which are straightforward to train
Gaussian Processes – GPs offer a probabilistic framework for learning a distribution over a wide class of non-linear functions
Despite huge progress in machine learning over the past decade, building production-ready machine learning systems is still hard. Three years ago when we set out to build machine learning capabilities into the Salesforce platform, we learned that building enterprise-scale machine learning systems is even harder.
Can we teach computers to write code? This is the question that drives an entire branch of research specialized in program synthesis. Programming is a demanding task that requires extensive knowledge, experience, and no small degree of creativity.
Probability and statistics are everywhere: from finance and demographic projections to casino games, these disciplines help us make sense of the world. They also underlie much of the machine learning apparatus that is the rage nowadays. What resources should we turn to, if we were to dust off our knowledge of them? (Disclaimer: I received … Read more: The One Probability Review That You Need
Sounds cool, but … what is it? As I’ve started to pay more attention to machine learning, differentiable rendering is one topic that caught my attention and has been popping up with some frequency. My first thought was, “cooooool is this a new system for generating pixels that somehow can leverage machine learning?” After digging … Read more: Differentiable Rendering
While looking for some interesting geographical data to work with, I came across the Road Safety Data published by the UK government. This is a very comprehensive road accident data set that includes the incident’s geographical coordinates, as well as other related data such as the local weather conditions, visibility, police attendance and more. There … Read more: Mapping the UK’s Traffic Accident Hotspots
A new update of my sjstats-package just arrived at CRAN. This blog post demonstrates those functions of the sjstats-package that deal especially with Bayesian models. The update contains some new and some revised functions to compute summary statistics of Bayesian models, which are now described in more detail.
TDAstats is an R pipeline for topological data analysis, specifically, the use of persistent homology in Vietoris-Rips simplicial complexes to study the shape of data.
Auto-Keras is an open source software library for automated machine learning (AutoML). It is developed by DATA Lab at Texas A&M University and community contributors. The ultimate goal of AutoML is to provide easily accessible deep learning tools to domain experts with limited data science or machine learning background. Auto-Keras provides functions to automatically search … Read more: Google’s AutoML Killer: Auto-Keras Opensource Automated ML
Recommender Systems support the decision making processes of customers with personalized suggestions. They are widely used and influence the daily life of almost everyone in different domains like e-commerce, social media, or entertainment. Quite often the dimension of time plays a dominant role in the generation of a relevant recommendation. Which user interaction occurred just before … Read more: Multiplicative RNN-LSTM for Sequence-based Recommenders
A Boltzmann machine defines a probability distribution over binary-valued patterns. What makes Boltzmann machine models different from other deep learning models is that they’re undirected and don’t have an output layer. The other key difference is that all the hidden and visible nodes are connected with each other. Due to this interconnection, Boltzmann machines … Read more: A Guide to Restricted Boltzmann Machines Using Pytorch
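To make "a probability distribution over binary-valued patterns" concrete, here is a minimal sketch of a tiny, fully connected Boltzmann machine, small enough to enumerate every state. It assumes the standard energy form E(s) = -Σ w_ij s_i s_j - Σ b_i s_i with P(s) = exp(-E(s)) / Z. The three units, the weights, and the biases are made-up illustration values, not taken from the linked guide.

```python
import itertools
import math

# Hypothetical parameters for a 3-unit Boltzmann machine (illustration only).
# Every pair of units is connected by one symmetric, undirected weight.
pair_weights = {(0, 1): 0.5, (0, 2): -1.0, (1, 2): 0.8}
biases = [0.1, -0.2, 0.0]


def energy(state):
    """Energy of one binary state: lower energy means higher probability."""
    pair_term = sum(w * state[i] * state[j] for (i, j), w in pair_weights.items())
    bias_term = sum(b * s for b, s in zip(biases, state))
    return -(pair_term + bias_term)


# Enumerate all 2^3 binary patterns and normalize exp(-E) by the partition function Z.
states = list(itertools.product([0, 1], repeat=3))
unnormalized = [math.exp(-energy(s)) for s in states]
Z = sum(unnormalized)
probabilities = {s: u / Z for s, u in zip(states, unnormalized)}

for s, p in probabilities.items():
    print(s, round(p, 4))
```

Exhaustive enumeration only works for a handful of units, since the number of states doubles with each unit added. Larger models have to rely on sampling instead.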
Over the last few months I set out to build a news and event aggregator. You can see the work in progress here: data-science-austria.at WordPress Plugins Here is a list of plugins that I use for the site, grouped by general purpose. The first one is a collection that I would … Read more: Data Science Austria
It is not enough to just stand up a web service that can make predictions. Aug 13, 2018 In a 2017 SAS survey, 83% of organizations have made moderate-to-significant investments in big data, but only 33% say they have derived value from their investments. Other more recent surveys have … Read more: What Does It Really Mean to Operationalize a Predictive Model?
4. Class weighted / cost sensitive learning Without resampling the data, one can also make the classifier aware of the imbalanced data by incorporating the weights of the classes into the cost function (aka objective function). Intuitively, we want to give higher weight to the minority class and lower weight to the majority class. scikit-learn has a … Read more: Practical tips for class imbalance in binary classification
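As a rough illustration of folding class weights into the cost function, the sketch below computes inverse-frequency class weights and a class-weighted binary log loss in plain Python. The helper names (`balanced_class_weights`, `weighted_log_loss`), the toy labels, and the inverse-frequency heuristic itself are assumptions chosen for illustration. The teaser above is truncated, so this is not necessarily the approach the linked article describes.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_log_loss(labels, probs, weights):
    """Binary cross-entropy where each sample is scaled by its class weight."""
    eps = 1e-12  # guard against log(0)
    total, weight_sum = 0.0, 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += weights[y] * loss
        weight_sum += weights[y]
    return total / weight_sum


# Toy data (made up): 8 majority samples (class 0), 2 minority samples (class 1).
y_true = [0] * 8 + [1] * 2
weights = balanced_class_weights(y_true)

# A "lazy" classifier that predicts a low positive probability for every sample.
p_lazy = [0.1] * 10
unweighted = weighted_log_loss(y_true, p_lazy, {0: 1.0, 1: 1.0})
weighted = weighted_log_loss(y_true, p_lazy, weights)
print(weights, unweighted, weighted)
```

With these weights, each minority sample counts four times as much as a majority sample. The lazy classifier is therefore penalized more heavily under the weighted loss than under the unweighted one.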
The nature of the problem: medical fraud and abuse The U.S. Department of Health and Human Services, in its pamphlet Avoiding Medicare Fraud and Abuse: A Roadmap for Physicians, states “most physicians strive to work ethically, render high-quality medical care to their patients, and submit proper claims for payment,” yet “the presence of some dishonest … Read more: Feature Engineering for Healthcare Fraud Detection
There are a multitude of options when it comes to storing and processing data. In this post I want to give you a brief overview of Azure SQL Data Warehouse, Microsoft’s data warehouse solution for the Azure cloud and its answer to Amazon Redshift on AWS. I will start off by talking briefly about its technical architecture … Read more: Azure SQL DWH – Overview
Aug 2, 2018 Look at this equation: [equation image: value function of Reinforcement Learning] If it does not intimidate you, then you are mathematically savvy and there is no point in reading this article 🙂 This article is not about teaching Reinforcement Learning (RL) but about explaining the math behind it. So it … Read more: Math Behind Reinforcement Learning, the Easy Way
Recently I came across this cooking recipes data set on Kaggle, and it inspired me to combine two of my main interests in life: food and machine learning. What makes this data set special is that it contains recipes from 20 different cuisines, 6714 different ingredients, but only 26648 samples. Some cuisines have way fewer … Read more: Cooking with Machine Learning: Dimension Reduction