Data Science Austria

What to do when your data fails OLS Regression assumptions

Regression analysis falls in the realm of inferential statistics. Consider the following equation: y ≈ β0 + β1x + e. The approximately-equals sign indicates that there is an approximately linear relationship between x and y. The error term e indicates that this model isn’t going to fully reflect reality via

Using Machine Learning to Identify the Minerals in Meteorites

How Meteorites are Studied: Scientists scan meteorites using an electron microprobe (EMP). An EMP shoots a beam of electrons at the meteorite. When the beam of electrons collides with the atoms in the meteorite, the atoms emit X-rays. Each element has distinct, characteristic frequencies. A graph of characteristic frequencies

Word2vec from Scratch with NumPy

How to implement a Word2vec model with Python and NumPy. Introduction: Recently, I have been working on several NLP-related projects at work. Some of them involved training the company’s in-house word embedding. At work, the tasks were mostly done with the help of a Python

How to Stay Up To Date as a Data/Research Scientist

1. CONFERENCE At the Rework Deep Learning Conference I want to start with conferences because I just came back from one in January 2019. This is probably the most expensive but also the most fun option in my opinion. Since I became a data scientist, I have attended the following

The Advent of Architectural AI

Artificial Intelligence: Artificial Intelligence is fundamentally a statistical approach to architecture. The premise of AI, which blends statistical principles with computation, is a new approach that can improve on the drawbacks of parametric architecture. “Learning”, as understood by machines, corresponds to the ability of a computer, when faced with a

The complete beginner’s guide to data cleaning and preprocessing

How to successfully prepare your data for a machine learning model in minutes. Data preprocessing is the first (and arguably most important) step toward building a working machine learning model. It’s critical! If your data hasn’t been cleaned and preprocessed, your model won’t work. It’s that simple. Data preprocessing is

Hyperparameter Tuning

Kaggle’s Don’t Overfit II competition presents an interesting problem. We have 20,000 rows of continuous variables, with only 250 of them belonging to the training set. The challenge is not to overfit. With such a small dataset, and an even smaller training set, this can be a difficult task! In this article,

Statistics is the Grammar of Data Science — Part 5/5

[Image: neon sign showing the simple statement of Bayes’ theorem. Courtesy: Wikipedia] Bayes’ Theorem: Having just explored what conditional probability is, let’s take a look at Bayes’ theorem. It simply says: the probability of A given B is equal to the probability of B given A times the probability

Reinforcement Learning Using a Single Demonstration

Learning from Demonstration: Reinforcement learning holds a lot of promise for the future; recent achievements have shown its ability to solve problems at a superhuman level, such as playing board games, predicting the structure of proteins from their genetic sequence, and playing real-time strategy games at a professional level.

Python vs SQL: Comparison for Data Pipelines

When I broke into the workforce as a web developer, my first interaction with databases and SQL was through an object-relational mapper (ORM). I was using the Django QuerySet API and had an excellent experience with the interface. After that, I moved into a data engineering role and became much more