Data Science Austria

Rating London Properties by their “Pub Score”: An Alternative Lens on the London Housing Market

I’m looking to buy a house soon. That was the spark for this project. I started looking on various property websites, and whilst the information on them is great, I felt that if I could get hold of the data used to populate such sites, I could approach my house hunt …

Random forest text classification: Trump v. Obama

Can I successfully determine the differences in speech content between the two most recent American Presidents? There may never have been two consecutive presidents who differ so much in character as Presidents Donald J. Trump and Barack Obama. I thought it would, therefore, be pretty interesting to do some …

Unsupervised Classification Project: Building a Movie Recommender with Clustering Analysis and…

Introduction: The goal of this project is to find similarities within groups of people in order to build a movie recommendation system for users. We are going to analyze a dataset from the Netflix database to explore the characteristics that people share in their taste in movies, based on how they …

Dockerizing Python Flask app and Conda environment

Originally published at www.easy-analysis.com. Use Docker to package your Python Flask app and your Conda environment. This post describes how to dockerize your Python Flask app and recreate your Conda Python environment. So you are developing a Python Flask app, and you have set up a Conda virtual environment …

Virtual, Headless, and Distributed (Oh My!)

Fearless Web Scraping with Python in DataLab Notebooks. This post empowers the Pythonista with a complete framework to explore the world of data on the internet, all behind randomized proxy servers in a fast parallelized sequence, while protecting your company’s immutable IP from curious eyes and other potential trolls. With this …

We Must Prevent Data Pseudoscience Before It’s Too Late

A Beacon of Hope: The Hypatic Oath. (Image: Hypatia at the Haymarket Theatre, H. M. Paget, via Wikimedia Commons.) Hypatia was a philosopher, mathematician, and astronomer who lived in Alexandria, Egypt around 400 CE. While much of her legend is most likely apocryphal, it is believed that Hypatia’s death resulted in the …

An Orwellian Approach to the Litter Problem

(Photo by Paweł Czerwiński on Unsplash.) Using computer vision to detect someone missing the garbage. Anyone who has lived in an urban environment knows how filthy it can be. No matter the effort exerted by municipalities, trash finds a way to roll through cities like tumbleweeds. Simple solutions involve sending individuals …

It’s OK to use spreadsheets in data science

Because they’re great in a bunch of messy sub-optimal data science contexts. With all the great sophisticated data tools that exist out there these days, it’s easy to think that spreadsheets are too primitive for use in serious data science work. The fact that there’s literally 20+ years of literature …