Data Science Austria

Probability theory and the optimal dating strategy for 2018

The naive (or the desperate) approach: Let’s say we foresee N potential people who will come into our lives sequentially, ranked according to some ‘matching/best-partner statistics’. Of course, you want to end up with the person who ranks 1st; let’s call this person X. If you can rank …
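The excerpt stops before the strategy itself is described, so the following is only a minimal sketch of the setup, not the article's method. It assumes the N people arrive in uniformly random order (an assumption not stated in the excerpt) and estimates how often committing to one fixed arrival position happens to select person X. The function name and its parameters are hypothetical.

```python
import random


def fixed_position_success_rate(n, position, trials=20000, seed=0):
    """Estimate how often the person at a fixed arrival position is rank 1.

    Assumes a uniformly random arrival order (an assumption, not taken
    from the excerpt). `position` is a 0-based index into that order.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        ranks = list(range(1, n + 1))  # rank 1 is person X
        rng.shuffle(ranks)             # random arrival order
        if ranks[position] == 1:
            hits += 1
    return hits / trials


rate = fixed_position_success_rate(n=10, position=0)
print(f"estimated: {rate:.3f}, exact 1/N: {1 / 10:.3f}")
```

Under that assumption every position is equally likely to hold X, so the estimate should sit close to 1/N whichever position is chosen.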

Using MongoDB Change Streams to replicate data into BigQuery

Learnings and challenges we faced while building a MongoDB to BigQuery data pipeline using MongoDB Change Streams. Before jumping into the technical details, it’s good to review why we decided to build this pipeline. We had two main reasons to develop it: Querying MongoDB …

Build a Predictive Model on Snowflake in 1 day with Xpanse AI

“Building a Churn Model is at least 3 months of work” You are the Data Analytics Manager and you have successfully implemented your cloud data warehouse on Snowflake. Great news! You deserve a pat on the back. Thanks to Business Intelligence and Data Warehousing you can easily identify problems …

Distributed Data Pre-processing using Dask, Amazon ECS and Python (Part 2)

Using Dask for EDA and Hyperparameter Optimization (HPO): In Part 1 of this series, I explained how to build a serverless cluster of Dask scheduler and workers on AWS Fargate. Scaling the number of workers up and down is quite simple. You can achieve that by running the …

Predicting Invasive Ductal Carcinoma using Convolutional Neural Network (CNN) in Keras

Tackling data imbalance by random undersampling: y_train.count(1) counts the number of 1s and y_train.count(0) counts the number of 0s. Counting the 1s and 0s in the array Y, we find that there are 44478 images of class 0 and 15522 images of class 1. This problem is known as data …
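As a rough illustration of random undersampling, the sketch below balances a toy label list that uses the class counts quoted above. It is not the article's code: the helper `random_undersample`, the placeholder `x_train`, and the fixed seed are all assumptions made for this example, and only the Python standard library is used.

```python
import random
from collections import Counter


def random_undersample(features, labels, seed=0):
    """Randomly drop samples so every class matches the smallest class.

    Hypothetical helper for illustration; returns new shuffled lists.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    indices_by_class = {}
    for idx, label in enumerate(labels):
        indices_by_class.setdefault(label, []).append(idx)

    # Keep as many samples per class as the smallest class has.
    minority_size = min(len(ix) for ix in indices_by_class.values())
    kept = []
    for ix in indices_by_class.values():
        kept.extend(rng.sample(ix, minority_size))

    rng.shuffle(kept)  # avoid leaving the classes in contiguous runs
    return [features[i] for i in kept], [labels[i] for i in kept]


# Toy data with the class counts from the excerpt; the integers in
# x_train merely stand in for the images.
y_train = [0] * 44478 + [1] * 15522
x_train = list(range(len(y_train)))

x_bal, y_bal = random_undersample(x_train, y_train)
print(Counter(y_bal))
```

After undersampling, both classes contain 15522 samples. The trade-off is that 28956 class-0 images are discarded.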

5 Steps of a Data Science Project Lifecycle

The OSEMN framework: Data Science Process (a.k.a. the O.S.E.M.N. framework). I will walk you through this process using the OSEMN framework, which covers every step of the data science project lifecycle from end to end. 1. Obtain Data: The very first step of a data science project is straightforward. We obtain the …

A Lesson on Modern Classification Models

In machine learning, classification problems are among the most exciting and yet most challenging problems. The implications of a competent classification model are enormous: these models are leveraged for natural language processing, text classification, image recognition, data prediction, reinforcement training, and countless further applications. However, the …

Cellular Automata and Driverless Cars

Self-organizing networks, IoT, Machine Learning, and Trains. Introduction: This article started with a simple thought experiment: if all cars were driverless, would we still need traffic lights? In this case I’m speaking specifically of driverless cars, as opposed to self-driving cars, which still require a human at the wheel. Perhaps a …