DevOps: To do or not to do?

Over the past few decades, four key change initiatives have been taking place in organizations: strategic planning, re-engineering, total quality management and downsizing. The aim of these initiatives was to achieve economic effectiveness, but around 75% of them failed or created problems that were serious enough to threaten the organization’s survival (1). It has been …

Estimating Intervention Effects using Bayesian Models in R

Measuring the effect of an intervention on some metric is an important problem in many areas of business and academia. Imagine you want to know the effect of a recently launched advertising campaign on product sales. In an ideal setting, you would have a treatment and a control group so that you can measure the …

A Framework to tackle tough Data Science Problems

One of the things I particularly like about working in data science is the science part: figuring out the right questions to ask, how to frame a problem correctly, and finally trying to solve it. While there are many problems that you can simply solve by library(caret) or from sklearn import * and dumping your …

The Mathematics of Decision Trees, Random Forest and Feature Importance in Scikit-learn and Spark

This post attempts to consolidate information on tree algorithms and their implementations in Scikit-learn and Spark. In particular, it was written to clarify how feature importance is calculated. There are many great resources online discussing how decision trees and random forests are created, and this post is not intended to be one of them. Although …

Package development in R – Overview

Creating an R package is as easy as typing: package.skeleton(name = "YourPackageName"). As you might have guessed, this function creates the basic file and folder structure you need to create an R package. You will get a YourPackageName/ directory containing DESCRIPTION, man/, NAMESPACE and R/. You can also use RStudio to create a package with File > New Project …

Agile Project Management for Data Science

Many data scientists are former academics who are used to working on specific and often quite narrow research problems for long periods of time, often years. With data science being in high demand at the moment in nearly all industries, more and more researchers switch from an academic career to one in the private …

Implementing QANet (Question Answering Network) with CNNs and self attentions

Apr 15, 2018 In this post, we will tackle one of the most challenging yet interesting problems in Natural Language Processing: Question Answering. We will implement Google’s QANet in TensorFlow. Just like its machine translation counterpart, the Transformer network, QANet doesn’t use RNNs at all, which makes it faster to train and test. I’m assuming …

What I wish I’d done differently as a data science manager

On centralizing siloed data Apr 12, 2018 I still get nostalgic looking at the very first Pebbles. (Photo courtesy of Pebble’s first Kickstarter) In 2014, I joined Pebble, the smartwatch maker later acquired by Fitbit, to lead their data science & analytics team. I was interested in the challenges of managing a data organization at a …

Machine Learning for People Who Don’t Care About Machine Learning

Greg Lamp, former co-founder of the data science startup Yhat and current co-founder & CTO of Waldo, shares his thoughts on Machine Learning for those of us who just don’t care about Machine Learning. What is Machine Learning? The definition I have come up with for Machine Learning is as follows… machine learning is using …

Hierarchical Clustering on Categorical Data in R

Dissimilarity Matrix Arguably, this is the backbone of your clustering. A dissimilarity matrix is a mathematical expression of how different, or distant, the points in a data set are from each other, so you can later group the closest ones together or separate the furthest ones, which is the core idea of clustering. This is the step where …
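To make the idea of a dissimilarity matrix concrete, here is a minimal sketch in plain Python rather than R. It assumes purely categorical columns and uses simple matching (the share of attributes on which two rows disagree) as the distance; that choice, the function names and the toy rows are illustrative assumptions, not necessarily what the post itself uses.

```python
def simple_matching(a, b):
    """Share of attributes on which two categorical rows disagree (0 = identical)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def dissimilarity_matrix(rows):
    """Pairwise dissimilarities between all rows, as a list of lists."""
    return [[simple_matching(a, b) for b in rows] for a in rows]


# Hypothetical toy data: (colour, size, shape)
rows = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
]
matrix = dissimilarity_matrix(rows)
```

Rows that share most of their categories end up with small entries, which is what a hierarchical clustering routine would then use to decide which points to merge first.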

Coding the Matrix

How to test the solutions: python3 submit.py python_lab.py Lab 1: Introduction to Python - sets, lists, dictionaries, and comprehensions Python provides some simple data structures for grouping together multiple values, and integrates them with the rest of the language. These data structures are called collections. Sets A set is an unordered collection in which each value occurs …
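The collections named in the lab title can be illustrated with a few lines of Python. This is a small sketch with made-up values, not one of the lab's exercises.

```python
# A set keeps a single copy of each value and has no defined order.
values = {1, 2, 2, 3, 3, 3}

# Set comprehension: the squares of the distinct values.
squares = {x * x for x in values}

# List comprehension: the odd values, in ascending order.
odds = [x for x in sorted(values) if x % 2 == 1]

# Dictionary comprehension: map each value to its square.
square_of = {x: x * x for x in values}
```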

Which Leading Artificial Intelligence Course Should You Take and What Should You Do After?

4. Course Content The content of each course is world-class. I’ve frequently stated in my videos that these are the best courses I’ve ever taken. The DLND is broken into six parts, five of which have significant projects attached. 1. Introduction 2. Neural Networks: creating your first neural network. 3. Convolutional Neural Networks: building …

Automatic GPUs

A reproducible R / Python approach to getting up and running quickly on GCloud with GPUs in TensorFlow “A high view of a sea of clouds covering a mountain valley in the Dolomites” by paul morris on Unsplash Backstory After completing Google’s excellent Data Engineering Certified Specialization on Coursera recently (which I highly recommend), I …

Crossing Your Data Science Chasm

An analytics roadmap for growth Scenario: You’re an up-and-coming ecommerce/SaaS startup. You’ve got your site up, you have A/B tested your message, and you’ve got your SEO and social ad buys. You’ve set up your email drip campaign and reminders. You also have basic BI reporting telling you channel traffic and conversions. Traffic is …

Python WebServer With Flask and Raspberry Pi

Let’s create a simple WebServer to control things in your home. There are a lot of ways to do that. For example, in my tutorial “IoT - Controlling a Raspberry Pi Robot Over Internet With HTML and Shell Scripts Only”, we explored how to control a robot over the local network using the LIGHTTPD WebServer. For …

Parallel processing in R using Azure Batch and Docker

While (personal) computers have become increasingly powerful over the last few years, there are still lots of workloads that easily bring even the best workstation to its knees. Running huge Monte Carlo simulations or training thousands of models takes hours, if not days, even on very beefy machines. Now enter Azure Batch processing. Azure Batch is a …

Azure Container Registry – Quick Start Guide

Azure Container Registry is the Microsoft equivalent to private Dockerhub repositories. First, I will show you how to quickly push an image to Azure Container Registry. In a second step, I will cover how to manage your registries and repositories using the AzureRM PowerShell cmdlets as well as the Azure CLI. Quick start To push …

Azure Machine Learning Services – Overview

We rely heavily on Microsoft’s cloud platform Azure for our analytics workloads at the Austrian Postal Service. Azure has grown rapidly over the past few years and is adding features at a very fast pace, so it is easy to lose track of which services are (still) offered and which services one should use. …

Quick implementation of Yolo V2 with Keras!

Feb 22, 2018 I do not hold ownership of any of the above pictures. These are merely used for educational purposes to describe the concepts. Real-time multiple object localization has been a major topic of debate in the field of digital image processing for many years. With the advent of deep learning and convolutional neural networks, the …

Ordinal Logistic Regression

An overview and implementation in R Feb 19, 2018 Fig 1: Performance of an individual (Poor, Fair, Excellent) Can you guess the common link among the variables below? Job satisfaction level: Dissatisfied, Satisfied, Highly Satisfied. Performance of an individual: Poor, Fair, Excellent. Impact of a regulation on a bank’s performance: Positive, Neutral, Negative. The variables are not only …

Conway’s Law

Many organizations have become adept at identifying what they need from software development projects, based on a keen understanding of their business goals. Even so, they’re often surprised to find out that the end results don’t achieve the transformative impact they were expecting. Their mistake? Overlooking the importance of Conway’s Law. In 1967, Melvin Conway …

About

Hi, I am Christoph, the Lead Data Scientist in the BI Competence Center at the Austrian Postal Service. I am responsible for designing the data science architecture, building the data science team and for coding up predictive models. Prior to joining the Austrian Post, I worked as a financial consultant at KPMG. I have a …

Writing Custom Keras Generators

The idea behind using a Keras generator is to get batches of input and corresponding output on the fly during the training process, e.g. reading in 100 images, getting the corresponding 100 label vectors and then feeding this set to the GPU for a training step. The problem I faced was the memory requirement of the standard Keras generator. …
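The batching idea described above can be sketched with an ordinary Python generator. This is only an illustration of the pattern, not the post's implementation and not a Keras API: batch_generator, its parameters and the load callback are hypothetical names, and a real pipeline would return arrays rather than lists.

```python
import random


def batch_generator(paths, labels, load, batch_size=100, seed=0):
    """Yield (inputs, labels) batches forever, loading each item only when needed.

    `load` is a caller-supplied function that turns one path into one input,
    so only `batch_size` items are held in memory at a time.
    """
    rng = random.Random(seed)
    indices = list(range(len(paths)))
    while True:
        rng.shuffle(indices)  # new sample order for every pass over the data
        for start in range(0, len(indices), batch_size):
            chunk = indices[start:start + batch_size]
            yield [load(paths[i]) for i in chunk], [labels[i] for i in chunk]
```

Because items are loaded inside the loop, memory use depends on the batch size rather than on the size of the whole data set, which is the point of generating batches on the fly.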

Tips for Using Data to Solve Company Issues that You Can Master Today

It’s not enough to do data analysis (Credit rawpixel: Unsplash) Stop management from ignoring your analysis Feb 3, 2018 To a data analyst, there is nothing more exciting than the data revealing insights about real organizational issues. However, it is completely deflating to present the insights to management and accomplish nothing. From talking to other data …

On the importance of DSLs in ML and AI

4) Under the hood: expressing computations TensorFlow could be considered a programming system and runtime, not just a “library” in the traditional sense: TensorFlow’s graph even supports constructs like variable scoping and control flow - but rather than using Python syntax, you manipulate these constructs through an API. (Innes2017) TensorFlow and similar tools present themselves as “just …

Bootstrapping microservices — your microservice architecture ready

Jan 13, 2018 The computing world has seen increasing attention on microservices software architecture in order to enhance software scalability and efficiency. Microservices bring many benefits for tech organizations. However, it is also clear that despite the benefits of modularization and containerization, many organizations continue to struggle with microservices. The microservices-based application comprises numerous …

Famous Laws of Software Development

Murphy’s Law Probably one of the most famous of all laws, mostly because it is not only applicable to Software Development. If something can go wrong, it will. First derivation: If it works, you probably didn’t write it. Second derivation: Cursing is the only language all programmers speak fluently. Conclusion: A computer will do what …

Machine Learning Overview

Broadly, there are three types of Machine Learning algorithms. 1. Supervised Learning How it works: This algorithm consists of a target or outcome variable (or dependent variable) which is to be predicted from a given set of predictors (independent variables). Using this set of variables, we generate a function that maps inputs to desired outputs. …

Reinforcement Learning Series — 01 (Key Concepts)

Reinforcement Learning (RL) is one of the most active fields of Machine Learning (ML) and Artificial Intelligence (AI). Though RL has existed for many decades, the giant has awakened only recently, after the explosion of neural-network-based Deep Learning. This blog is an attempt to explain basic concepts of Reinforcement Learning using simple examples and explanations that …

ToneNet : A Musical Style Transfer

Nov 27, 2017 By: Team Vesta, University of Southern California. CSCI:599 Deep Learning and Its Applications Suraj Jayakumar, Rakesh Ramesh, Pradeep Thalasta Introduction: The recent success of Generative Adversarial Networks (GANs) in the vision domain, such as style transfer, inspired us to experiment with these techniques in the musical domain. Music generation mainly delves …

Finding Magic: The Gathering archetypes with Latent Dirichlet Allocation

Combining card games and topic modeling Nov 27, 2017 This article sparked an interesting discussion on reddit and was featured by Wizards of the Coast. One of the coolest projects I’ve done using machine learning revolved around using a method for topic modeling called Latent Dirichlet Allocation (LDA). Topic modeling simply means allocating topics to documents. …

Applied Predictive Modelling

Source cran Chapter 1 Introduction Prediction Versus Interpretation, Key Ingredients of Predictive Models; Terminology; Example Data Sets and Typical Data Scenarios; Overview; Notation (15 pages, 3 figures) Part I: General Strategies Chapter 2 A Short Tour of the Predictive Modeling Process Case Study: Predicting Fuel Economy; Themes; Summary (8 pages, 6 figures, R packages used) …

Binary Logistic Regression

An overview and implementation in R Oct 31, 2017 Customer satisfaction for a product: Satisfied vs Dissatisfied (Source: pixabay.com) Have you ever come across a situation where you want to predict a binary outcome like: Whether a person is satisfied with a product or not? Whether a candidate will secure admission to a graduate school or not? …

Hard Problems in Data Science: Causality, Sequential Learning and Complex Dynamic Theories

Oct 20, 2017 In the second of four informal discussion sessions, Professor Maurits Kaptein from Tilburg University discussed the methodological challenges of data science. “Before discussing the biggest challenges we face in Data Science, we need to first have some common understanding about what data science actually is.” Maurits starts his talk by attempting to define …

CycleGANS and Pix2Pix

Aug 15, 2017 Credits: Presenting an abridged version of these blogs to explain the idea and concepts behind pix2pix and cycleGANs. Christopher Hesse blog: Olga Liakhovich blog: Pix2Pix: paper: https://phillipi.github.io/pix2pix/ pix2pix uses a conditional generative adversarial network (cGAN) to learn a mapping from an input image to an output image. An example of a dataset would …

How I’m Learning Deep Learning in 2017 — Part 3

I’ve said yes to far too many hidden units Part of the How I’m Learning Deep Learning Series: Part I: A new beginning. Part II: Learning Python on the fly. Part III: Too much breadth, not enough depth. (You’re currently reading this) Part IV: AI(ntuition) versus AI(ntelligence). Extra: My Self-Created AI Master’s Degree Before we get into specifics, and …

Attempting to Visualize a Convolutional Neural Network in Realtime

While replicating the End-to-End Deep Learning approach for Self-Driving Cars, I was frustrated by the lack of visibility into what the network is seeing. I built a tool to fix this. Mar 5, 2017 The simulator and the Python script running the neural network communicate over a websocket connection. I decided to write a small …

Perceptron : Where It All Started

Implementing the Perceptron The perceptron is one of the most elegant yet easy-to-implement learning algorithms. Though most packages/libraries provide an implementation of this algorithm, let us try our hand at it as well. The following Python function helps in updating the weight vector: # update weights of the perceptron def update_weights(row,weights_array=[1,0,0],lr=0.001): # get predicted label using current weight …
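Since the excerpt cuts off before the body of the function, here is a minimal sketch of a standard perceptron update for reference. It is not the post's code: the signature differs (the label is passed separately instead of being part of row), the first weight is assumed to be the bias, and labels are assumed to be 0 or 1.

```python
def predict(row, weights):
    """Return 1 if bias + weighted sum of the features is non-negative, else 0."""
    activation = weights[0] + sum(w * x for w, x in zip(weights[1:], row))
    return 1 if activation >= 0 else 0


def update_weights(row, label, weights, lr=0.001):
    """Return new weights after one perceptron step on a single example."""
    error = label - predict(row, weights)  # 0 when the prediction is correct
    bias = weights[0] + lr * error
    rest = [w + lr * error * x for w, x in zip(weights[1:], row)]
    return [bias] + rest
```

When the prediction is already correct the error is zero and the weights are returned unchanged; otherwise each weight moves by lr times the error times its input.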

Visualising high-dimensional datasets using PCA and t-SNE in Python

Oct 29, 2016 Update: April 29, 2019. Updated some of the code to not use ggplot but instead use seaborn and matplotlib. I also added an example for a 3D plot. I also changed the syntax to work with Python 3. The first step around any data related challenge is to start by exploring the data itself. …

A Concise History of Neural Networks

Aug 13, 2016 “From the barren landscapes inside our personal devices come furtive anthems hummed by those digital servants who will one day be our overlords” A.I. winter The idea of neural networks began, unsurprisingly, as a model of how neurons in the brain function, termed ‘connectionism’, and used connected circuits to simulate intelligent behaviour. In …