Featured – Page 144 – Data Science Austria

## De-Googling Bach: Counterpointing Bach’s Rules of the Road With American Populist Music

Counterpointing Bach’s Rules of the Road With American Populist Music. “Thinking Outside the Bachs” by Max Harper Ellert. I had fun this week playing with the Google Doodle to create Bach harmonies from simple melodies. Looking through some of the articles about the process of melding A.I. and the principles of counterpoint was interesting too, like this … Read more De-Googling Bach: Counterpointing Bach’s Rules of the Road With American Populist Music

## Corners in Images and Angular Representation of Their Relationships

Corner detection has long been an important subject in image processing. It is essential because it helps us find the unique features in images. There are several methods for detecting corners in images. The most famous one, I assume, is Harris Corner Detection. After I read about it in the OpenCV documentation, it gave … Read more Corners in Images and Angular Representation of Their Relationships

## Understanding Negative Log Loss

While learning fast.ai, I decided to test out the “3 lines of code” on some dataset other than the ones used in the course. The wiki page of fast.ai has some recommendations and I decided to try out as many as possible. The first recommended dataset under the easy category was Dogs vs. Cats Redux: … Read more Understanding Negative Log Loss

## How to build an Autonomous Sailboat Using Machine Learning

We teach machine learning in quite an unorthodox fashion: by letting participants turn a real sailing yacht into a sail racing robot. In this blog post we will take apart this challenge and focus on the first sub-task: using machine learning to find the optimal course to steer in a sailing race. You will learn how … Read more How to build an Autonomous Sailboat Using Machine Learning

## Breaking down Mean Average Precision (mAP)

What AP does, in this case, is to penalize models that are not able to sort G’ with TPs leading the set. It provides a number that is able to quantify the goodness of the sort based on the score function d( , ). By dividing the sum of precision with the total GTP instead of … Read more Breaking down Mean Average Precision (mAP)
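The idea of dividing the summed precision by the total GTP can be sketched in a few lines of Python. This is a minimal illustration under the common definition of average precision, not the article's own code: the function name `average_precision`, its parameters, and the two example rankings are assumptions made for the sketch.

```python
def average_precision(ranked_relevance, total_gtp):
    """Average precision for one query (illustrative sketch).

    ranked_relevance: list of booleans ordered by descending score;
        True marks a true positive (TP) at that rank.
    total_gtp: total number of ground-truth positives (GTP).
    """
    if total_gtp == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_tp in enumerate(ranked_relevance, start=1):
        if is_tp:
            hits += 1
            # precision@rank, accumulated only at ranks that hold a TP
            precision_sum += hits / rank
    # Divide by the total GTP, so TPs that rank late (or never appear) cost score
    return precision_sum / total_gtp


# The same three TPs, once leading the ranking and once trailing it.
tps_first = average_precision([True, True, True, False, False], total_gtp=3)
tps_last = average_precision([False, False, True, True, True], total_gtp=3)
```

Both rankings contain the same three TPs, yet the ranking with TPs leading scores higher, which is the penalty on poor sorting described above.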

## On Retractions in Biomedical Literature

The fierce competition in academia and the rush to publish often lead to flawed results and conclusions in scientific publications. While some of these are honest mistakes, others are deliberate scientific misconduct. According to one study, 76% of retractions were due to scientific misconduct in papers retracted from a specific journal¹. Another study from … Read more On Retractions in Biomedical Literature

## Will Scientific Research be able to avoid Artificial Intelligence pitfalls?

It’s now obvious that AI, Machine Learning and Deep Learning are no longer buzzwords, as they are increasingly present in every industry. Although the trend was overhyped in 2017, we are now certain that these technologies will be ubiquitous by 2020. Scientific research has not been left behind and AI has been … Read more Will Scientific Research be able to avoid Artificial Intelligence pitfalls?

## Don’t let them GO!

Using machine learning to detect customer churn. We have an example of a virtual company called ‘Sparkify’ that offers paid and free listening services. Customers can switch between the two services, and they can cancel their subscription at any time. The given customer dataset is huge (12GB), thus the standard tools for analysis and machine learning … Read more Don’t let them GO!

## Exploring FIFA

SalRite, Mar 24. ‘The thing about football – the important thing about football – is that it is not just about football.’ ~Sir Terry Pratchett. Soccer, or Association Football, is not just a game; it’s an emotion for many. People follow their favorite clubs no less than their religion! Great players are celebrated all over the world. But not … Read more Exploring FIFA

## The Deployment Pain

Possible Causes of Deployment Anxiety. This article was co-authored with Patrick Slavenburg. The data science cycle in magnets: data access, data processing, model training, and deployment. In October 2017, I was running the KNIME booth at the ODSC London conference. At the booth, we had the usual conference material to distribute: informative papers, various gadgets, … Read more The Deployment Pain

## Natural Language Processing with Spacy in Node.js

Show Me some Examples. Extract Dates. Say you want to extract all of the dates from this text: The United States increased diplomatic, military, and economic pressures on the Soviet Union, at a time when the communist state was already suffering from economic stagnation. On 12 June 1982, a million protesters gathered in Central Park, New … Read more Natural Language Processing with Spacy in Node.js

## Something You don’t know about data File if you just a Starter in Data Science, Import data File…

To be a master in data science, you have to understand how to manage your data and import it from the web, because approximately 90% of real-world data comes straight from the internet. Data Engineer Life (Source: Agula). If you are new to the Data Science field, then you must be working hard to learn … Read more Something You don’t know about data File if you just a Starter in Data Science, Import data File…

## DeViSE Zero-shot learning

Let’s take a closer look at the class probabilities an image classifier returns: With a softmax output layer, each picture can belong to only one single category as softmax is designed to assign a high probability to one single class. This means that you should not introduce an additional category “dog” because the network would … Read more DeViSE Zero-shot learning
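To make the softmax behavior concrete, here is a small pure-Python sketch. It is not taken from the article: the `softmax` helper and the example scores are hypothetical, chosen only to show that the outputs sum to one and that two equally plausible classes must split the probability mass.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores: one class clearly dominates
one_winner = softmax([4.0, 1.0, 0.5])

# Hypothetical scores: two classes fit the picture equally well,
# so neither of them can receive a probability above 0.5
two_candidates = softmax([2.0, 2.0, 0.0])
```

Because the probabilities compete for a fixed total of one, two overlapping labels for the same picture end up sharing the mass rather than both scoring highly.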

## The complete beginner’s guide to machine learning: simple linear regression in four lines of code!

Even you can build a machine learning model. Seriously! Good data alone doesn’t always tell the whole story. Are you trying to figure out what someone’s salary should be based on their years of experience? Do you need to examine how much you’re spending on advertising in relation to your yearly sales? Linear regression might … Read more The complete beginner’s guide to machine learning: simple linear regression in four lines of code!

## Which Data Science Bootcamp is right for you?

Photo by NESA by Makers on Unsplash. If you’re thinking about attending a data science bootcamp but have zero data science experience yourself, you’ll probably not be able to sort the good from the bad. You won’t know which ones focus on the right things, the unnecessary things, the weird edge-case things. And most importantly, you … Read more Which Data Science Bootcamp is right for you?

## Learning Theory: (Agnostic) Probably Approximately Correct Learning

In my previous article, I discussed what Empirical Risk Minimization is and the proof that it yields a satisfactory hypothesis under certain assumptions. Now I want to discuss Probably Approximately Correct Learning (which is quite a mouthful but kinda cool), which is a generalization of ERM. For those who are not familiar with ERM, I … Read more Learning Theory: (Agnostic) Probably Approximately Correct Learning

## Everybody has a right to know what’s happening with the planet: towards a global commons

The importance of knowing our environmental history. How can we judge today if we don’t know what happened yesterday? For anyone to be able to understand ecosystem services and the value they represent to the environment, they must first have insight into past environmental conditions. Some selected point in the past (often referred to in … Read more Everybody has a right to know what’s happening with the planet: towards a global commons

## Overcoming challenges when designing a fraud detection system

A word on how oversampling our data and choosing the right model and metric can improve our prediction systems. The recent advances in the field of AI have opened the doors to a plethora of smart and personalized methods of fraud detection. What once was an area that required a great deal of manual labor, is … Read more Overcoming challenges when designing a fraud detection system

## Statistical Overview of Linear Regression (Examples in Python)

coef: These are the estimates of the factor coefficients. Oftentimes it would not make sense to consider the interpretation of the intercept term. For instance, in our case, the intercept term has to do with the case where the house has 0 rooms…it doesn’t make sense for a house to have no rooms. On the … Read more Statistical Overview of Linear Regression (Examples in Python)

## When Excel isn’t enough: Using Python to clean your Data, automate Excel and much more…

@headwayio. How a Data Analyst can survive in a spreadsheet-driven organization. Excel is a very popular tool in many companies, and Data Analysts and Data Scientists alike often find themselves making it part of their daily arsenal of tools for data analysis and visualization, but not always by choice. This was certainly my experience at … Read more When Excel isn’t enough: Using Python to clean your Data, automate Excel and much more…

## Should you Fly or Should you Drive?

The thought of a plane crash gives me the creeps because I need to fly home regularly to visit my family. Recently there was a tragic accident involving an Ethiopian airline in which all passengers died. If you are interested in details about the plane crash, you can get them here: If such a crash happens, … Read more Should you Fly or Should you Drive?

## Facial Keypoint Detection: Detect relevant features of face in a go using CNN & your own dataset…

Facial key-points are relevant for a variety of tasks, such as face filters, emotion recognition, pose recognition, and so on. So if you’re into these projects, keep reading! In this project, facial key-points (also called facial landmarks) are the small magenta dots shown on each of the faces in the image below. In each training … Read more Facial Keypoint Detection: Detect relevant features of face in a go using CNN & your own dataset…

## Big data analytics: Predicting customer churn with PySpark

Exploratory analysis. Just from a cursory look at the data, we noticed that there were rows where the userId was missing. Upon further investigation, it looks like only the following pages have no userId: `+-------------------+` `| page|` `+------------------` … Read more Big data analytics: Predicting customer churn with PySpark

## Exploratory Data Analysis: An Illustration in Python

Import the Toolkit. We begin by importing some Python packages. These will serve as your toolkit for an effective EDA: `import numpy as np`, `import seaborn as sns`, `import matplotlib.pyplot as plt`, `import pandas as pd`, followed by the notebook magics `%config InlineBackend.figure_format = 'retina'` and `%matplotlib inline`. In this example, we will use the Boston housing dataset (practice with it afterward and convince yourself). Let’s … Read more Exploratory Data Analysis: An Illustration in Python

## Statistician proves that statistics are boring

Back-to-basics with nuanced vocabulary. I’m about to prove to you that statistics are boring… to help you appreciate the point of all those fancy calculations that statisticians like myself get up to. As an added bonus, this is pretty much what you’d learn about on day 1 of most STAT101 classes, so it doubles as … Read more Statistician proves that statistics are boring

## Predicting the ‘Future’ with Facebook’s Prophet

Making the Predictions. Making the dataset ‘Prophet’ compliant. Let’s convert the data into the format desired by Prophet. We shall rename ‘Date’ to ‘ds’ and ‘Views’ to ‘y’: `df.columns = ['ds', 'y']`, then `df.head()`. Prophet follows the sklearn model API wherein an instance of the Prophet class is created and then the fit and predict methods are called. The model … Read more Predicting the ‘Future’ with Facebook’s Prophet

## A Design Thinking Mindset for Data Science

Adapted from a research paper written for The University of Texas capstone. Abstract. Data science has received recent attention in technical research and business strategy; however, there is an opportunity for increased research and improvements on the data science research process itself. Through the research methods described in this paper, we believe there … Read more A Design Thinking Mindset for Data Science

## Managing Data Science Workflows the Uber Way

Orchestrating workflows is one of the main challenges of machine learning solutions in the real world. A machine learning solution involves more than just picking the right model and productizing it. Data ingestion, training, deployment or optimization are common steps in any machine learning workflow. Unfortunately, the technology stacks for building and managing coordinated actions … Read more Managing Data Science Workflows the Uber Way

## Data science productionization: maintenance

In the last post, I used a simple word-normalizing function to illustrate a few principles of code portability: Now let’s look at the same function, but this time prioritizing maintenance: The first part doesn’t even include the function itself. What I’ve set up here is logging infrastructure. I’ve designated a file for recording errors (called … Read more Data science productionization: maintenance

## Weekly Selection — Mar 22, 2019

Data scientist: The sexiest job of the 22nd century. By Cassie Kozyrkov (6 min read). Data science has been called “the sexiest job of the 21st century”, a sentiment I’d believe if I saw more business leaders hiring data scientists into environments where we can be effective. Instead, many of us feel misunderstood and invisible.

## Creating AI for GameBoy Part 4: Q-Learning and Variations

The part we’ve all been waiting for. Hello and welcome to part 4 of Building an AI for Gameboy! This is where the real magic happens: we’ve built our tools and now it is time to set them in motion. A quick recap of our journey so far: first, we built a controller so that we could … Read more Creating AI for GameBoy Part 4: Q-Learning and Variations

## Colorizing Old B&W Photos and Videos With the Help of AI

This project is based on research developed at the University of California, Berkeley by Richard Zhang, Phillip Isola, and Alexei A. Efros: Colorful Image Colorization. The idea behind this tutorial is to develop a fully automatic approach that will generate realistic colorizations of Black & White (B&W) photos and, by extension, videos. As … Read more Colorizing Old B&W Photos and Videos With the Help of AI

## Evaluating Keras neural network performance using Yellowbrick visualizations

If you have ever used Keras to build a machine learning model, you’ve probably made a plot like this one before. (Figure: {training, validation} {loss, accuracy} plots from a Keras model training run.) This is a matrix of training loss, validation loss, training accuracy, and validation accuracy plots, and it’s an essential first step for evaluating the … Read more Evaluating Keras neural network performance using Yellowbrick visualizations

## BERT in Keras with Tensorflow hub

At Strong Analytics, many of our projects involve using deep learning for natural language processing. In one recent project we worked to encourage kids to explore freely online while making sure they stayed safe from cyberbullying and online abuse, while another involved predicting deductible expenses from calendar and email events. A key component of any … Read more BERT in Keras with Tensorflow hub

## Data science productionization: portability

The first step to productionizing data science is to make it portable. To explain what I mean, let’s look at a simple example of code portability: The above code performs a simple task that is commonly found in text analysis: take a word, remove all white space on the ends, and lowercase all the characters; … Read more Data science productionization: portability
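The task described here (strip the white space from both ends of a word, then lowercase it) can be sketched as follows. This is not the author's original listing, and the excerpt is truncated, so the original may do more; the name `normalize_word` is a hypothetical chosen for illustration.

```python
def normalize_word(word: str) -> str:
    """Remove surrounding white space and lowercase every character."""
    # str.strip() drops leading/trailing white space; str.lower() lowercases
    return word.strip().lower()


example = normalize_word("  Portable \n")
```

Keeping the logic in one small, dependency-free function is one way to make it easy to move between scripts, notebooks, and services.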

## Playing Poker on Mars: How AI Mastered the Game

Or, the Edge of Trillions of Hands. By Dirk Knemeyer and Jonathan Follett. Figure 01: Poker, the quintessentially human game of gamblers and dreamers [Illustration: Le Poker (Poker) by Félix Vallotton, 1896 woodcut, National Gallery of Art, Open Access]. Poker seems like a quintessentially human game. At a superficial level poker is a more casual, social, and approachable … Read more Playing Poker on Mars: How AI Mastered the Game

## From software engineering to Data Science: What resources helped me?

Almost 2 years ago, I made the decision to quit my job as a software engineer and to start looking for a job in the machine learning field. Right after quitting my job, I wrote an article on my blog, “Up to my new Tech challenges”, and from there the journey started. Photo by … Read more From software engineering to Data Science: What resources helped me?

## Building a Music Recommendation Engine with Probabilistic Matrix Factorization in PyTorch

Recommendation systems are one of the most widespread forms of machine learning in modern society. Whether you are looking for your next show to watch on Netflix or listening to an automated music playlist on Spotify, recommender systems impact almost all aspects of the modern user experience. One of the most common ways to build … Read more Building a Music Recommendation Engine with Probabilistic Matrix Factorization in PyTorch

## Data Cleaning with R and the Tidyverse: Detecting Missing Values

Data cleaning is one of the most important aspects of data science. As a data scientist, you can expect to spend up to 80% of your time cleaning data. In a previous post I walked through a number of data cleaning tasks using Python and the Pandas library. That post got so much attention, I … Read more Data Cleaning with R and the Tidyverse: Detecting Missing Values

## AI, Machine Learning and Data Science Roundup: March 2019

A monthly roundup of news about Artificial Intelligence, Machine Learning and Data Science. This is an eclectic collection of interesting blog posts, software announcements and data applications from Microsoft and elsewhere that I’ve noted over the past month or so. Open Source AI, ML & Data Science News. TensorFlow Privacy: a Python library for training … Read more AI, Machine Learning and Data Science Roundup: March 2019

## Humans Have Struggled to Understand Rare Cancers, Let’s Give Artificial Intelligence a Turn.

The Challenges of Cancer Subtyping & How Artificial Intelligence Can Help. There is a certain level of loneliness that comes with being diagnosed with a rare cancer like intrahepatic cholangiocarcinoma (ICC). Most people wouldn’t even know where that cancer occurs. It is a cancer of the bile duct, a group of tube-like structures that extend out … Read more Humans Have Struggled to Understand Rare Cancers, Let’s Give Artificial Intelligence a Turn.

## Relationships validated between population health chronic indicators

In the last story, we started looking into a 15-year chronic disease dataset from the U.S. Centers for Disease Control and Prevention, or CDC. The exploratory data analysis began with understanding the columns and rows of data and what was relevant for further analysis. In this post, we are going to … Read more Relationships validated between population health chronic indicators

## The technologies that every analytics group needs to have

Four game-changing technologies can enable a world-class analytics group. As data becomes more and more pervasive in workplaces, many executives are now realizing the value of having an analytics function to support their mission, whatever that is. However, some think that by hiring a few analysts or data scientists they have done all they need … Read more The technologies that every analytics group needs to have

## 10-Step guide to schedule your script using cloud services

Scheduling Python/R scripts using Kaggle and PythonAnywhere cloud services. Kaggle account: we will use the kernels to host and run our Python script. PythonAnywhere account: we will use the task scheduling to trigger our script hosted on Kaggle. What do we need to do? Set up a Kaggle account and go to the ‘Kernels’ tab, then ‘New Kernel’. You can … Read more 10-Step guide to schedule your script using cloud services

## Implementing the XOR Gate using Backpropagation in Neural Networks

Let’s implement the first part of the algorithm. We’ll initialize our inputs and expected outputs as per the truth table of XOR: `inputs = np.array([[0,0],[0,1],[1,0],[1,1]])` and `expected_output = np.array([[0],[1],[1],[0]])`. Step 1: Initialize the weights and biases with random values. After `import numpy as np`, set `inputLayerNeurons, hiddenLayerNeurons, outputLayerNeurons = 2,2,1`, then `hidden_weights = np.random.uniform(size=(inputLayerNeurons,hiddenLayerNeurons))`, `hidden_bias = np.random.uniform(size=(1,hiddenLayerNeurons))`, `output_weights = np.random.uniform(size=(hiddenLayerNeurons,outputLayerNeurons))`, `output_bias =` … Read more Implementing the XOR Gate using Backpropagation in Neural Networks

## Learn Machine Learning and Computer Vision using Chicken Rice

The O.S.E.M.N. Framework. Let’s follow the 5-step OSEMN framework to guide us through the process. The OSEMN framework is designed to help us focus and prioritize the right data science tasks at different stages. Step 1: Obtain Data. We will need to find chicken rice images as our training data set. Luckily for us, there is … Read more Learn Machine Learning and Computer Vision using Chicken Rice

## Making Bets: Predicting When AI Self-Driving Cars Will Be Prevalent

An essential equation for the advent of driverless cars. Defining an equation for predicting when AI self-driving cars will be prevalent. We all enjoy a good equation. How many times have you quoted or seen Einstein’s famous equation about matter and energy? One of the most famous probabilistic formulas is the celebrated Drake equation, which … Read more Making Bets: Predicting When AI Self-Driving Cars Will Be Prevalent

## Model Based Reinforcement Learning

Introduction. If you have ever played a real-time strategy (RTS) game such as Age of Empires, you surely know that you start with an almost black screen. The first thing you do is send units in every direction to scout the terrain and discover the enemy location, as well as the strategic … Read more Model Based Reinforcement Learning

## How To Query the Future

The snapshot images that proved that horses do leave the ground, and sparked the invention of the motion picture camera. A new physics of business intelligence, and thinking. BI hasn’t had a breakthrough in a decade. Ten years ago, visual analytics pioneers ushered in a new era of business intelligence with tools that connected data at the … Read more How To Query the Future

## The Data-Driven Guide to Crime in Milwaukee

Time-Based Visualizations. When Crimes Occur. The following results are indicative of when the crimes were reported, not when the crimes were committed. There is likely to be variability between when a crime occurred and when it was reported; please keep this in mind while reading. Daily Crime Reporting Peaks at 4:00 pm. According to Total Crime Reports … Read more The Data-Driven Guide to Crime in Milwaukee