Creating Azure Logic Apps from R using httr

Logic Apps is a serverless framework in Azure, quite similar to IFTTT (if this, then that) and Zapier, that allows you to connect different services and create workflows. You can define different types of triggers based on time and events (e.g. http requests, messages received, …) to start workflows. Logic Apps can be created using a …

RStudio Addin

If you want to create your own RStudio addins, all you need to do is: Create an R package Create some R functions Create a file at inst/rstudio/addins.dcf Links 1. Create an R Package Set up tools for package development library(devtools) library(roxygen2) # getwd() # setwd("path/to/repo") Create Package I am mainly following: create("rstudio_addin") This …

Working with REST APIs for Data Scientists in R

With the growing importance of cloud computing, more and more services are exposed as REST APIs. In this post, I want to give a hands-on introduction for data scientists from non-software-engineering backgrounds on how to work with REST APIs. But before we dive straight into the code, let’s start with some background information: A (short) …

Introduction – Analysing Customer Churn

At first glance, analysing customer churn seems pretty easy. All we have to know is how many customers we have at a certain point in time and how many customers chose to leave our business over a given period in order to calculate a churn rate. We could simply define customer churn rate as: \[ …

Git & SSH with Powershell Core

In this post I want to give a quick outline of how to set up Powershell Core (Microsoft’s cross-platform version of Powershell) to work with git and ssh. While you can simply install Git for Windows and work with Git Bash, personally I quite like Powershell Core, because it is more tightly integrated with Windows and …

Pandas for data.table Users

R and Python are both great languages for data analysis. While they are remarkably similar in some aspects, they are drastically different in others. In this post, I will focus on the similarities and differences between Pandas and data.table, two of the most prominent data manipulation packages in Python/R. There is already an excellent post …

The Perceptron Algorithm

In my blog post Neural Nets: From Linear Regression to Deep Nets I talked about how a deep neural net is simply a sequence of simple building blocks of the form: \[\sigma(\underbrace{w^T}_{weights}x + \overbrace{b}^{bias}) = a\] and that a linear regression model is one of the most basic neural networks where the activation function \(\sigma\) …

Blogging with Hugo and Jupyter

I really love blogging with Hugo+Blogdown, but unfortunately Blogdown is still mostly restricted to R (although Python is now also possible using the reticulate package). Jupyter offers a great literate programming environment for multiple languages and so being able to publish Jupyter notebooks as Hugo blogposts would be a huge plus. I have been looking …

Getting started – Azure SQL Server Managed Instance

There are a lot of options for data scientists to store data in the Azure cloud. In this blog post I will cover the pros and cons of Azure SQL Server Managed Instance and will provide a few tips so you can hit the ground running if you decide to take it for a test …

Neural Nets: From Linear Regression to Deep Nets

Neural networks, especially deep neural networks, have received a lot of attention over the last couple of years. They perform remarkably well on image and speech recognition and form the backbone of the technology used for self-driving cars. What many people find hard to believe is that the mathematics of neural networks have been around …

SQL Server

Columnstore: A columnstore index can provide a very high level of data compression, typically by 10 times, to significantly reduce your data warehouse storage cost. For analytics, a columnstore index offers an order of magnitude better performance than a btree index. Columnstore indexes are the preferred data storage format for data warehousing and analytics workloads. …

Box Cox Transformation

When we do time series analysis, we are usually interested either in uncovering causal relationships (Does \(X_t\) influence \(Y_{t+1}\)?) or in getting the most accurate forecast possible. Especially in the second case it can be beneficial to transform our historical data to make it easier to extract a signal. A very common transformation is to …

Introduction to stochastic control theory

I had my first contact with stochastic control theory in one of my Master’s courses about Continuous Time Finance. I found the subject really interesting and decided to write my thesis about optimal dividend policy which is mainly about solving stochastic control problems. In this post I want to give you a brief overview of …

Azure SQL DWH – Overview

There are a multitude of options when it comes to storing and processing data. In this post I want to give you a brief overview of Azure SQL Data Warehouse, Microsoft’s data warehouse solution for the Azure cloud and its answer to Amazon Redshift on AWS. I will start off by talking briefly about its technical architecture …

More advanced SQL Server for Data Scientists

In the previous post I covered the basics you need to know to work with SQL Server. In this post, I want to show you some more advanced techniques that I found pretty helpful. The topics I will cover include: How to speed up your queries with indices and using columnstore; Using Views and Table …

Object Oriented Programming in Data Science with R

Since R is mostly a functional language and data science work lends itself to being expressed in a functional form, you can get by just fine without learning about object-oriented programming. Personally, I mostly follow a functional programming style (although often not a pure one, i.e. w/o side-effects, because of limited RAM). Expressing mathematical concepts in …

Estimating Intervention Effects using Bayesian Models in R

Measuring the effect of an intervention on some metric is an important problem in many areas of business and academia. Imagine you want to know the effect of a recently launched advertising campaign on product sales. In an ideal setting, you would have a treatment and a control group so that you can measure the …

A Framework to tackle tough Data Science Problems

One of the things I particularly like about working in data science is the science part: figuring out the right questions to ask, how to frame a problem correctly and finally trying to solve it. While there are many problems that you can simply solve by library(caret) or from sklearn import * and dumping your …

Package development in R – Overview

Creating an R package is as easy as typing: package.skeleton(name = "YourPackageName") As you might have guessed, this function creates the basic file and folder structure you need to create an R package. You will get: YourPackageName/ DESCRIPTION man/ NAMESPACE R/ You can also use RStudio to create a package with File > New Project …

Agile Project Management for Data Science

Many data scientists are former academics who are used to working on specific and often quite narrow research problems for long periods of time, often years. With data science being in high demand at the moment in nearly all industries, more and more researchers switch from an academic career to one in the private …

Parallel processing in R using Azure Batch and Docker

While (personal) computers have become increasingly powerful over the last years, there are still lots of workloads that easily bring even the best workstation to its knees. Running huge Monte-Carlo simulations or training thousands of models takes hours, if not days, even on very beefy machines. Now enter Azure Batch processing. Azure Batch is a …

Azure Container Registry – Quick Start Guide

Azure Container Registry is the Microsoft equivalent to private Dockerhub repositories. First, I will show you how to quickly push an image to Azure Container Registry. In a second step, I will cover how to manage your registries and repositories using the AzureRM PowerShell module as well as the Azure CLI. Quick start To push …

Azure Machine Learning Services – Overview

We rely heavily on Microsoft’s cloud platform Azure for our analytics workloads at the Austrian Postal Service. Azure has grown rapidly over the past few years and is adding features at a very fast pace, so it is easy to lose track of which services are (still) offered and what services one should use. …

Conway’s Law

Many organizations have become adept at identifying what they need from software development projects, based on a keen understanding of their business goals. Even so, they’re often surprised to find out that the end results don’t achieve the transformative impact they were expecting. Their mistake? Overlooking the importance of Conway’s Law. In 1967, Melvin Conway …


Hi, I am Christoph, the Lead Data Scientist in the BI Competence Center at the Austrian Postal Service. I am responsible for designing the data science architecture, building the data science team and for coding up predictive models. Prior to joining the Austrian Post, I worked as a financial consultant at KPMG. I have a …

Famous Laws of Software Development

Murphy’s Law Probably one of the most famous of all laws, mostly because it is not only applicable to Software Development. If something can go wrong, it will. First derivation: If it works, you probably didn’t write it. Second derivation: Cursing is the only language all programmers speak fluently. Conclusion: A computer will do what …

Machine Learning Overview

Broadly, there are three types of Machine Learning Algorithms. 1. Supervised Learning How it works: This algorithm consists of a target or outcome variable (or dependent variable) which is to be predicted from a given set of predictors (independent variables). Using this set of variables, we generate a function that maps inputs to desired outputs. …

Docker Python

Testing the base image docker run python:3 /bin/echo 'Hello world' docker run is a command to run a container. python:3 is the image you run. For example, the Ubuntu operating system image. When you specify an image, Docker looks first for the image on your Docker host. If the image does not exist locally, then …

EARL 2018, London

Conference Day 1, Wednesday 12 September. Keynote: Edwina Dunn, Starcount; Garrett Grolemund, RStudio. Session 1: 1. “A Validated R Environment in the Cloud for Life Science R&D” Jobst Loffler, Bayer Business Services GmbH Waiting on Rstudio Item 2 2. “A brief history of Data at Autotrader; how R has got us here” Paul …