Data Science Austria

Custom Named Entity Recognition Using spaCy

What is Named Entity Recognition (NER)? Named entity recognition (NER) is a sub-task of information extraction (IE) that seeks out and categorises specified entities in a body or bodies of text. NER is also known as entity identification, entity chunking, and entity extraction. NER is used in

Fixing the machine behind the machines

Immersion as a means to address bias. (Photo by Alina Grubnyak on Unsplash.) Recently, a technology firm sent a group of its senior leaders on a week-long learning expedition to a small town in southern West Virginia. On the surface, the location was a strange choice: with a population of ~3000 and

How to learn data science on your own: a practical guide

If you’ve ever worked from home, you know that it’s not the magical, liberating experience most people imagine. Keeping your focus and morale up and keeping colleagues in the loop aren’t as easy as most people assume. But fortunately, the work-from-home problem has gotten a lot of attention:

Spark JOIN using REGEX

Context: For the past couple of months I have been struggling with this small problem. I have a list of regex patterns and I want to know which Wikipedia articles contain them. What I wanted to end up with was a table with the following columns: Wikipedia Article ID, Wikipedia Article
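The kind of result table described in this excerpt can be illustrated with a minimal sketch in plain Python. This is not Spark code and not the author's method: the function name `regex_join`, the sample articles, and the sample patterns are all hypothetical, and the approach is simply a cross join filtered by a regex match. In Spark itself, a join on an `RLIKE` condition might express the same idea, but the excerpt does not confirm how the author solved it.

```python
import re


def regex_join(articles, patterns):
    """Return (article_id, pattern) rows for every pattern found in an article.

    articles: iterable of (article_id, text) pairs (hypothetical layout).
    patterns: iterable of regex pattern strings.
    """
    # Compile each pattern once so it is not re-parsed for every article.
    compiled = [(raw, re.compile(raw)) for raw in patterns]
    rows = []
    for article_id, text in articles:
        for raw, regex in compiled:
            # search() matches anywhere in the text, not only at the start.
            if regex.search(text):
                rows.append((article_id, raw))
    return rows


# Hypothetical sample data, for illustration only.
articles = [
    (1, "Apache Spark is a cluster computing engine"),
    (2, "Vienna is the capital of Austria"),
]
patterns = [r"Spark\b", r"[Aa]ustria", r"\d{4}"]

matches = regex_join(articles, patterns)
for article_id, pattern in matches:
    print(article_id, pattern)
```

Every article is compared against every pattern, so the cost grows with the product of the two list sizes; that is presumably why the problem becomes interesting at Wikipedia scale.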

Tailored Savings Recommender

Building a Recommender. Step 1: Gather + Clean. The first step to building my recommender was to gather San Francisco business data. I used a subset of the Registered Business Location dataset provided by the City and County of San Francisco for a list of businesses. From this business list, I

Code for Boston’s Safe Drinking Water Project

My first time at a Code for Boston Hack Night was last December. It was right in the middle of the intensive data science program I was in. We had multiple labs due that week (as was the norm), and several of my classmates were taken aback by the thought

Master basics of machine learning by solving a hackathon problem

Learn how to solve a regression problem step by step and gain a decent rank on hackathon leaderboards. Hackathons are a good way to learn and implement new concepts in a short span of time. Today we are going to cover basic steps in machine learning and how to get a good

4 Ways Data Science Could Revolutionize the Testing Phase in Nearly Every Industry

The most successful companies in all industries typically have testing phases that help them develop new products, test new materials, guide marketing campaigns and more. Data science and big data platforms could collectively upend the testing phase in almost every industry, helping companies save money and better assess their results.

The Major Flaw with Data Scientists

The “academic working style”. In academia, a typical PhD thesis goes like this: (1) pick a narrowly defined problem; (2) focus on it for years; (3) find the best humanly possible solution; (4) finally publish, but only after extensive reviews. A large portion of data science recruits have been through this process, and it has

How to assess a binary Logistic Regressor with scikit-learn

Functionality Overview: Logistic regression is a valuable classifier because of its interpretability. This code snippet provides a cut-and-paste function that displays the metrics that matter when logistic regression is used for binary classification problems. Everything here is provided by scikit-learn already, but can be time-consuming and repetitive to manually call
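The original snippet is not included in this excerpt, so the following is only a sketch of what such a helper might look like. The function name `binary_classification_report`, the choice of metrics, and the sample labels are assumptions; the calls themselves come from `sklearn.metrics`.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def binary_classification_report(y_true, y_pred, y_score):
    """Collect common binary-classification metrics in one dictionary.

    y_true:  true labels (0 or 1).
    y_pred:  predicted labels (0 or 1).
    y_score: predicted probability of the positive class.
    """
    # With labels=[0, 1], ravel() yields the counts in the order tn, fp, fn, tp.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "true_negatives": int(tn),
        "false_positives": int(fp),
        "false_negatives": int(fn),
        "true_positives": int(tp),
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_score),
    }


# Hypothetical labels and scores, for illustration only.
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
y_score = [0.1, 0.6, 0.7, 0.9]

report = binary_classification_report(y_true, y_pred, y_score)
for name, value in report.items():
    print(f"{name}: {value}")
```

With a fitted `LogisticRegression` model, `y_pred` would typically come from `model.predict(X)` and `y_score` from `model.predict_proba(X)[:, 1]`.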