Data Science Austria

Don’t make this mistake when clustering time series data!

Disclaimer: This is not original work; the aim of this post is to spread and make more accessible the content of the paper “Clustering of Time Series Subsequences is Meaningless: Implications for Previous and Future Research” by Eamonn Keogh & Jessica Lin. Recently, I’ve started doing some research in …

Neural Networks Generated Lamb of God Drum Tracks

An experiment in using different neural network structures to generate drum beats. (Photo by Michael Dobrinski on Unsplash.) Intro: I’ve had the idea of implementing some simple neural networks to generate music since last October. I did some quick experiments in a Jupyter notebook back then, and recently I finally had time to rewrite …

It’s Only Natural: An Excessively Deep Dive Into Natural Gradient Optimization

I’m going to tell a story: one you’ve almost certainly heard before, but with a different emphasis than you’re used to. To a first (order) approximation, all modern deep learning models are trained using gradient descent. At each step of gradient descent, your parameter values begin at some starting point, …

NLP Learning Series: Part 3 — Attention, CNN and what not for Text Classification

Making machines read for us. (Photo by Francesco Ungaro on Unsplash.) This post is the third in the NLP text classification series. To recap, I started with an NLP text classification competition on Kaggle called the Quora Question insincerity challenge, so I thought I would share the knowledge via …

Imbalanced Class Sizes and Classification Models: A Cautionary Tale

Avoiding Imbalanced Class Pitfalls in Classification. For a recent data science project, I developed a supervised learning model to classify the booking location of a first-time user of the vacation home site Airbnb. This dataset is available on Kaggle as part of a 2015 Kaggle competition. For my project, …

Market Segmentation with R (PCA & K-means Clustering) — Part 1

Principal Component Analysis (PCA): The term “dimension reduction” used to freak me out. However, it is not as complicated as it sounds: it’s simply the process of extracting the essence from a myriad of data, so the new, smaller dataset can represent the unique features of the original data without losing …

Automatically Storing Data from Analyzed Data Sets

How to Store Data Analysis Results to Facilitate Later Regression Analysis. (Figure 1: Example Folder Hierarchy.) This is the fifth article in a series teaching you how to write programs that automatically analyze scientific data. The first presented the concept and motivation, then laid out the high-level steps. …

Interactive spreadsheets in Jupyter

A spreadsheet is an interactive tool for data analysis in tabular form. It consists of cells and cell ranges. It supports value-dependent cell formatting and styling, and one can apply mathematical functions to cells and perform chained computations. It is the perfect user interface for statistical and financial operations. The …