Data Science Austria

Surprising Sorting Tips for Data Scientists

Python, NumPy, Pandas, PyTorch, TensorFlow & SQL

Sorting data is a basic task for data scientists and data engineers. Python users have a number of libraries to choose from with built-in, optimized sorting options. Some even work in parallel on GPUs. Surprisingly, some sort methods don't use the stated algorithm …
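As a small standard-library illustration of one sorting property worth knowing (this is not taken from the article): Python's built-in `sorted` is guaranteed stable, so you can sort by several keys in successive passes, secondary key first. The `records` list is made-up example data.

```python
from operator import itemgetter

# Hypothetical example records: (name, department, salary)
records = [
    ("Dana", "ops", 52000),
    ("Ali", "dev", 61000),
    ("Chen", "ops", 48000),
    ("Bea", "dev", 61000),
]

# Pass 1: sort by the secondary key (name).
by_name = sorted(records, key=itemgetter(0))
# Pass 2: sort by the primary key (salary, descending). Because sorted() is
# stable (even with reverse=True), equal salaries keep their name order from pass 1.
multi_pass = sorted(by_name, key=itemgetter(2), reverse=True)

# Equivalent single pass with a composite key; salary is negated for descending order.
single_pass = sorted(records, key=lambda r: (-r[2], r[0]))

for row in multi_pass:
    print(row)
```

The composite-key version is usually simpler; the multi-pass version is handy when a key cannot be negated, such as descending order on strings.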

Using Publicly Available FracFocus Data and Python’s Matplotlib Function to Visualize Oil and Gas…

I recently wrote a script that automates data pulls from the publicly available FracFocus database, a government-operated data source that provides a comprehensive listing of the hydraulic fracturing chemicals pumped in unconventional oil and gas completion jobs in the United States. This database is a great resource, not only for …

Supercharging Jupyter Notebooks

Jupyter Notebooks are currently one of the most popular programming environments for Pythonistas the world over, especially those working in machine learning and data science. I discovered Jupyter Notebooks a few months ago, when I first started to get serious about machine learning. Initially, I was simply amazed and loved how everything ran …

Easily Scrape and Summarize News Articles Using Python

Web scraping: Now let's scrape! First, we'll turn the page content into a BeautifulSoup object, which will allow us to parse the HTML tags.

    from bs4 import BeautifulSoup

    # Turn page into BeautifulSoup object to access HTML tags
    soup = BeautifulSoup(page, "html.parser")

Then, we'll need to figure out which HTML tags contain the headline and the main …
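The parse-then-summarize idea can be sketched end to end without network access or third-party packages. This sketch swaps BeautifulSoup for the standard library's `html.parser`, assumes (hypothetically) that the headline sits in an `<h1>` and the body in `<p>` tags, and uses a naive word-frequency extractive summarizer; the article's own summarization method is not shown in the excerpt, so treat this as an illustration, not the author's approach.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collects text inside <h1> (assumed headline) and <p> (assumed body) tags."""

    def __init__(self):
        super().__init__()
        self.headline = ""
        self.paragraphs = []
        self._current = None  # tag whose text we are currently collecting

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "p"):
            self._current = tag
            if tag == "p":
                self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "h1":
            self.headline += data
        elif self._current == "p":
            self.paragraphs[-1] += data


STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "is", "it", "for", "with", "was"}


def summarize(text, n_sentences=2):
    """Naive extractive summary: keep the sentences whose words are most frequent overall."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS)

    def score(sentence):
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


# Hypothetical page content standing in for a downloaded article.
html = """<html><body><h1>City Opens New Library</h1>
<p>The city opened a new library on Monday.</p>
<p>The library holds thousands of books. Residents praised the library design.</p>
<p>Parking remains limited.</p></body></html>"""

parser = ArticleParser()
parser.feed(html)
body = " ".join(p.strip() for p in parser.paragraphs)
summary = summarize(body, n_sentences=2)
print(parser.headline)
print(summary)
```

Raw frequency scoring favors long sentences; dividing each score by the sentence's word count is a common tweak.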

Maximizing group happiness in White Elephants using the Hungarian optimal assignment algorithm

Let's consider a simple scenario in which four players (Alex, Brad, Chloe, and Daisy) are participating in a White Elephant gift exchange. After the presents have been opened in order, everyone feels that the distribution of presents is suboptimal. They feel that if they only knew how much each person liked each present, they …
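A minimal pure-Python sketch of the Hungarian algorithm (the O(n³) shortest-augmenting-path formulation with row and column potentials) for the four-player scenario. The happiness scores and gift labels are made-up illustrative values, not data from the article. Because this formulation minimizes total cost, each score is converted to a cost by subtracting it from the largest score.

```python
def hungarian_min_cost(cost):
    """Return assign, where assign[i] is the column given to row i, minimizing total cost.

    Expects a square n x n matrix. Arrays below are 1-indexed; index 0 is a sentinel.
    """
    n = len(cost)
    INF = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (n + 1)   # column potentials
    p = [0] * (n + 1)     # p[j] = row matched to column j (0 = unmatched)
    way = [0] * (n + 1)   # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Update potentials so the new tight edge becomes usable.
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        # Flip matches along the augmenting path back to the root.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [0] * n
    for j in range(1, n + 1):
        assign[p[j] - 1] = j - 1
    return assign


players = ["Alex", "Brad", "Chloe", "Daisy"]
gifts = ["gift A", "gift B", "gift C", "gift D"]  # hypothetical gift labels
happiness = [  # hypothetical scores: rows = players, columns = gifts
    [9, 8, 2, 1],  # Alex
    [9, 2, 3, 1],  # Brad
    [1, 3, 7, 6],  # Chloe
    [2, 1, 8, 4],  # Daisy
]
top = max(max(row) for row in happiness)
cost = [[top - h for h in row] for row in happiness]  # maximize happiness = minimize cost
assignment = hungarian_min_cost(cost)
total = sum(happiness[i][j] for i, j in enumerate(assignment))
for i, j in enumerate(assignment):
    print(f"{players[i]} -> {gifts[j]} (happiness {happiness[i][j]})")
print("total happiness:", total)
```

With these numbers, letting players pick greedily in order (Alex takes gift A) totals only 19, while the best possible total is 31: each gift's highest score (9, 8, 8, 6) sums to 31, and that bound is reachable by giving Brad gift A, Alex gift B, Daisy gift C, and Chloe gift D. In practice, SciPy's `scipy.optimize.linear_sum_assignment(happiness, maximize=True)` solves the same problem.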

Visualizing Support Vector Machine Decision Boundary

Pipeline, GridSearchCV and Contour Plot Decision Boundary

(Picture: Author's Own Work, Saitama, Japan)

In a previous post I described principal component analysis (PCA) in detail, and in another the mathematics behind the support vector machine (SVM) algorithm. Here, I will combine SVM, PCA, and grid-search cross-validation to create a …
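The core of a decision-boundary plot is to evaluate a trained classifier's decision function over a grid of points; the boundary is where that function equals zero. The sketch below is not the article's scikit-learn pipeline: it swaps in a from-scratch linear soft-margin SVM (hinge loss plus L2 penalty, fit with Pegasos-style subgradient steps), skips PCA and grid search, and uses hypothetical toy points.

```python
def train_linear_svm(points, labels, lam=0.01, epochs=50):
    """Linear soft-margin SVM fit by Pegasos-style subgradient descent.

    A constant 1.0 is appended to each 2-D point so w[2] acts as the bias
    (unlike the textbook SVM, the bias is also regularized here).
    """
    w = [0.0, 0.0, 0.0]
    t = 0
    for _ in range(epochs):
        for (x1, x2), label in zip(points, labels):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            xa = (x1, x2, 1.0)
            margin = label * sum(wi * xi for wi, xi in zip(w, xa))
            # Shrink the weights (gradient step on the L2 penalty) ...
            w = [(1 - eta * lam) * wi for wi in w]
            # ... and add the hinge-loss subgradient if the point violates the margin.
            if margin < 1:
                w = [wi + eta * label * xi for wi, xi in zip(w, xa)]
    return w


def decision(w, point):
    """Signed score: positive means class +1, negative means class -1."""
    return w[0] * point[0] + w[1] * point[1] + w[2]


def decision_grid(w, xs, ys):
    """Decision values on a grid; a contour plot at level 0 traces the boundary."""
    return [[decision(w, (gx, gy)) for gx in xs] for gy in ys]


# Hypothetical, linearly separable toy data.
X = [(2.0, 2.5), (3.0, 1.5), (2.5, 3.0), (-2.0, -1.5), (-3.0, -2.5), (-1.5, -3.0)]
y = [1, 1, 1, -1, -1, -1]

w = train_linear_svm(X, y)
predictions = [1 if decision(w, p) > 0 else -1 for p in X]
print("weights:", w)
print("training predictions:", predictions)

# Text rendering of the decision regions (top row = largest y value).
xs = [i * 0.5 for i in range(-8, 9)]
grid = decision_grid(w, xs, xs)
for row in reversed(grid):
    print("".join("+" if val > 0 else "-" for val in row))
```

In scikit-learn the same role is played by a fitted model's `decision_function` evaluated on a `numpy.meshgrid`, which Matplotlib's `contourf` can then draw.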

Reducing Data Inconsistencies with POI Normalization

By Franki Chamaki. Data normalization is an elegant technique for reducing data inconsistency, especially when we are dealing with a huge dataset. In this article, I will guide you through the steps I followed and a couple of choices I made. First of all, let me introduce the …
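The excerpt does not show the article's actual steps, so here is a hedged sketch of one common kind of normalization, assuming POI means point-of-interest records with free-text names: reduce each name to a canonical string so that spelling variants of the same place group together. The function name and sample names are hypothetical.

```python
import re
import unicodedata


def normalize_poi_name(name):
    """Canonical form of a point-of-interest name, for matching variants."""
    # Decompose accented characters (é -> e + combining accent), then drop the accents.
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold, turn punctuation into spaces, and collapse runs of whitespace.
    folded = without_accents.casefold()
    no_punct = re.sub(r"[^\w\s]", " ", folded)
    return " ".join(no_punct.split())


# Hypothetical raw names containing inconsistent variants of the same place.
raw = ["Café Central", "CAFE  CENTRAL", "cafe-central", "Café Sacher"]
groups = {}
for name in raw:
    groups.setdefault(normalize_poi_name(name), []).append(name)
print(groups)
```

Exact-match grouping on the normalized key is the simplest choice; fuzzy matching or address fields would be needed for variants that differ by more than case, accents, and punctuation.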

Machine Learning Pipelines: Feature Engineering Numbers

A crucial part of any machine learning model is the data, especially the features used. In this article, we will go over where feature engineering falls in the machine learning pipeline and how to engineer numeric features using binning, transformations, and normalization. The real benefit of …
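The three techniques named above can be sketched with the standard library alone. The helper names, bin edges, and income values are hypothetical; the article's own implementation is not shown in the excerpt.

```python
import bisect
import math


def bin_values(values, edges):
    """Binning: map each value to a bin index given sorted interior bin edges."""
    return [bisect.bisect_right(edges, v) for v in values]


def log_transform(values):
    """Transformation: log(1 + x) compresses a long right tail (requires x > -1)."""
    return [math.log1p(v) for v in values]


def min_max_scale(values):
    """Normalization: rescale to [0, 1] (assumes the values are not all equal)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


incomes = [18_000, 25_000, 40_000, 52_000, 250_000]  # hypothetical, right-skewed
bins = bin_values(incomes, edges=[30_000, 60_000])  # 0: below 30k, 1: 30k up to 60k, 2: 60k and above
logged = log_transform(incomes)
scaled = min_max_scale(incomes)
print(bins)  # [0, 0, 1, 1, 2]
print([round(x, 2) for x in logged])
print([round(x, 3) for x in scaled])
```

In a real pipeline, the bin edges and the min and max used for scaling should be computed on the training split only and then reused on validation and test data, to avoid leaking information.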