data_admin – Page 126 – Data Science Austria

## Assumptions Of Linear Regression Algorithm

These are the assumptions which, when satisfied while building a linear regression model, produce a best-fit model for the given set of data. Linear Regression: Introduction. Linear regression is a machine learning algorithm based on supervised learning. It performs a regression task to compute the regression coefficients. Regression models a target prediction value based on independent variables. Linear Regression performs the … Read more Assumptions Of Linear Regression Algorithm
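
As a rough illustration of what "computing the regression coefficients" means, here is a minimal sketch (not taken from the post) of ordinary least squares for a single independent variable. The function name `fit_simple_ols` is hypothetical; it returns the intercept and slope that minimize the sum of squared residuals.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Spread of x around its mean; zero means the slope is undefined.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    # Co-movement of x and y around their means.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Points lying exactly on y = 1 + 2x recover those coefficients.
b0, b1 = fit_simple_ols([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # 1.0 2.0
```

The "best fit" promised by the assumptions is best in this least-squares sense; the assumptions (linearity, independent errors, constant variance and so on) are what make these estimates trustworthy.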

## Predicting the Future (of Music)

Using Python and linear regression to predict the popularity of music. By Taylor Fogarty, Peter Worcester, Lata Goudel. In the era of Big Data, we have an unprecedented advantage when it comes to making predictions. With websites like Kaggle.com and OurWorldinData.org, we have endless data at our fingertips: data we can use to predict things like who … Read more Predicting the Future (of Music)

## Model Assumptions for Regression Problems

The Key Ideas to Consider for Your Next Regression Model. By Tom Allport, Jammie Wang, Jack Stalfort. This article walks through a few of the ideas you need to think about when building a regression model. In order to produce a well-fitting regression model for your … Read more Model Assumptions for Regression Problems

## Morbidity, Mortality & Murder in Westeros

Survival analysis and a data-based look at death in Game of Thrones. So the long-running hit TV show Game of Thrones has finally come to an end after 8 years, 73 episodes and over 200 deaths. To keep this article spoiler-free, I’ll end the discussion of Season 8 there… This is … Read more Morbidity, Mortality & Murder in Westeros

## An Easy Guide to Gauge Equivariant Convolutional Networks

Manifolds: A manifold is a simple thing. Every 2-D surface you see can be considered a manifold: the surface of a sphere, the surface of a cube, all manifolds. But it’s not restricted to 2-D; heck, it’s not even restricted to things that can be imagined. A curve is a manifold. 4-D space-time is a … Read more An Easy Guide to Gauge Equivariant Convolutional Networks

## Binary Classification Model Evaluation

Authors: Ishaan Dey, Evan Heitman, & Jagerynn T. Verano. Introduction to Classification: A doctor wants to know if his patient has a disease. A credit card company is interested in determining whether a certain transaction is fraudulent. A graduate school candidate wants to know whether she is likely to get accepted into … Read more Binary Classification Model Evaluation

## Intro to Feature Selection Methods for Data Science

A guide to making data more manageable. Authored by: Ryan Farmar, Ning Han, Madeline McCombe. What is feature selection? Well, let’s start by defining what a feature is. A feature is an X variable in your dataset, most often represented by a column. Many datasets nowadays can have 100+ features … Read more Intro to Feature Selection Methods for Data Science

## The Importance of Analyzing Model Assumptions in Machine Learning

By Reilly Meinert, Adeet Patel, & Simon Li. Checking model assumptions is essential prior to building a model that will be used for prediction. If assumptions are not met, the model may inaccurately reflect the data and will likely result in inaccurate predictions. Each model has different assumptions that must be met, so checking assumptions … Read more The Importance of Analyzing Model Assumptions in Machine Learning

## Maximizing Scarce Maintenance Resources with Data

A common scenario facing governmental and non-governmental organizations is how to deploy finite resources to maximum impact. Often these organizations must make decisions without full information transparency. This essay demonstrates how data analysis and modeling can be used to help organizations in this scenario maximize their resources. To bring the concepts to life, we will … Read more Maximizing Scarce Maintenance Resources with Data

## Rstudio & ThinkR roadshow – June 6 – Paris

On June 6th, 2019, RStudio is partnering with ThinkR to offer you a one-day event around “R in production”. See you in Paris! If you’re an experienced developer or a decision-maker looking to learn more about what R and RStudio have to offer, then this event is for you! During the first … Read more Rstudio & ThinkR roadshow – June 6 – Paris

## Codifying adversarial examples as Features

Separating robust and non-robust features. Adversarial examples, though a huge nuisance for AI practitioners, are nevertheless a huge bonus in AI theory, helping us understand the internals of machine learning models and algorithms, and energizing emerging technologies like GANs. So it is no surprise that the new paper, which I’m going to review here, … Read more Codifying adversarial examples as Features

## Weekly Selection — May 24, 2019

The Simpsons meets Data Visualization, by Adam Reevesman (9 min read). There are few things I love more than The Simpsons. It is one of those shows that I think about on a daily basis. With thirty seasons and over 600 episodes, the animated comedy show holds a special place in my heart.

## Data science for hit song prediction

Can an algorithm predict hit songs? Let’s explore how we can successfully build a hit song classifier using only audio features, as described in my publication (Herremans et al., 2014). During my PhD research I came across a paper by Pachet & Roy (2008) entitled “Hit song science is not yet a science”. This was intriguing … Read more Data science for hit song prediction

## Estimators, Loss Functions, Optimizers —Core of ML Algorithms

In order to understand how a machine learning algorithm learns from data to predict an outcome, it is essential to understand the underlying concepts involved in training an algorithm. I assume you have a basic understanding of machine learning and basic knowledge of probability and statistics. If not, please go through my earlier posts here and … Read more Estimators, Loss Functions, Optimizers —Core of ML Algorithms

## Why and How to do Cross Validation for Machine Learning

Cross-validation is a statistical technique for testing the performance of a machine learning model. In particular, a good cross-validation method gives us a comprehensive measure of our model’s performance across the whole dataset. All cross-validation methods follow the same basic procedure: (1) divide the dataset into 2 parts, training and testing; (2) train … Read more Why and How to do Cross Validation for Machine Learning
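
The divide/train/test procedure can be sketched with k-fold cross-validation, one common variant: every observation lands in exactly one test fold, so the averaged score covers the whole dataset. This is a minimal sketch, not the post’s code; the function names are hypothetical, the folds are contiguous (real data is usually shuffled first), and the "model" is a deliberately trivial baseline that predicts the training mean.

```python
def k_fold_splits(n, k):
    """Yield (train_idx, test_idx) pairs; each index is in exactly one test fold."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_val_mse(ys, k):
    """Average test MSE of a baseline model that predicts the training mean."""
    scores = []
    for train, test in k_fold_splits(len(ys), k):
        prediction = sum(ys[i] for i in train) / len(train)  # "train" step
        mse = sum((ys[i] - prediction) ** 2 for i in test) / len(test)  # "test" step
        scores.append(mse)
    return sum(scores) / len(scores)


print(cross_val_mse([1, 2, 3, 4], 2))  # 4.25
```

Swapping the baseline for a real model only changes the two commented lines; the splitting and averaging stay the same.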

## The Thin Line Between Parasites and Mutualists

How agent-based simulations can be used to understand the evolution from mutualism to parasitism and vice versa The word parasite evokes images of tapeworms and ticks while the word mutualist conjures quaint scenes of cleaner fish nibbling inside the mouths of giant sharks and birds riding on the backs of ungulates on the savannah. Clearly one … Read more The Thin Line Between Parasites and Mutualists

## Gradient Descent in Deep Learning

Deep Learning, to a large extent, is really about solving massive, nasty optimization problems. A neural network is merely a very complicated function, consisting of millions of parameters, that represents a mathematical solution to a problem. Consider the task of image classification. AlexNet is a mathematical function that takes an array … Read more Gradient Descent in Deep Learning
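
To make the optimization view concrete, here is a minimal sketch of plain (full-batch) gradient descent with a fixed learning rate. It is a toy, not AlexNet: two parameters stand in for millions, and the loss is an assumed quadratic, sum of (p − t)², whose gradient is 2(p − t). The names `gradient_descent`, `grad` and `target` are hypothetical.

```python
def gradient_descent(grad, params, learning_rate=0.1, steps=100):
    """Repeatedly step every parameter against its gradient."""
    params = list(params)
    for _ in range(steps):
        g = grad(params)
        params = [p - learning_rate * gi for p, gi in zip(params, g)]
    return params


# Toy loss: sum of (p - t)**2, minimized exactly at the target values.
target = [3.0, -1.0]


def grad(params):
    return [2 * (p - t) for p, t in zip(params, target)]


print([round(p, 4) for p in gradient_descent(grad, [0.0, 0.0])])  # [3.0, -1.0]
```

In a real network the gradient comes from backpropagation instead of a closed form, but the update rule is the same idea.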

## The craft of intelligent design: An attempt to break free from the pejorative notion of artificial

Have you ever looked up the word “artificial” in the dictionary? Well, I have! Before writing this story, I tried my best to come up with a neutral term to replace “artificial intelligence”. Take a look at some of the results according to the Oxford dictionary: feigned, insincere, false, affected, mannered, unnatural, stilted, contrived, pretended, … Read more The craft of intelligent design: An attempt to break free from the pejorative notion of artificial

## [R]eady for Production: a Joint Event with RStudio and EODA

We’re excited to team up with EODA, an RStudio Full Service Certified Partner, to host a free data science in production event in Frankfurt, Germany, on June 13. This one-day event will be geared for data science and IT teams that want to learn how to integrate their analysis solutions with the optimal IT infrastructure. … Read more [R]eady for Production: a Joint Event with RStudio and EODA

## A Guide to Python’s Virtual Environments

(Illustration: Dante speaks with the traitors in the ice, Canto XXXII, by Gustave Doré.) How Virtual Environments Do Their Thing. So you want to know more about virtual environments, eh? Like how an active environment knows how to use the right Python interpreter and how to find the right third-party libraries. `echo $PATH`. It all comes down … Read more A Guide to Python’s Virtual Environments
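
A small sketch, assuming a standard CPython install, for poking at this from inside Python. Activating a venv puts the environment’s `bin/` directory (`Scripts\` on Windows) at the front of `PATH`, so a bare `python3` resolves to the environment’s interpreter. Inside a venv-style environment, `sys.prefix` points at the environment while `sys.base_prefix` points at the base install. The helper names are hypothetical, and the prefix check is not meant for conda environments.

```python
import os
import shutil
import sys


def in_virtual_env():
    """True when running inside a venv-style environment."""
    return sys.prefix != sys.base_prefix


def first_python_on_path():
    """The interpreter a bare `python3` would resolve to, following PATH order."""
    return shutil.which("python3")


print("inside venv:", in_virtual_env())
print("sys.executable:", sys.executable)
print("python3 on PATH:", first_python_on_path())
print("first PATH entries:", os.environ.get("PATH", "").split(os.pathsep)[:3])
```

Run it once with an environment activated and once without, and compare the first `PATH` entry and the resolved interpreter.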

## Pandas for Football Analysis

Merging the Newly Constructed DataFrame with Scraped Non-Tabular Data. A prize money column added to this newly formed ‘Merged_premier_league_table’ would make it even more informative. We could start to draw relationships between the average minutes leading in a Premier League game and prize money received. However, in this case, when I go to a website, … Read more Pandas for Football Analysis

## Spatial Modelling Tidbits: Honeycomb or Fishnets?

Why we at Locale are fond of hexagonal grids! Introduction: If you are a two-degree marketplace like Uber, you connect millions of users requesting a ride with driver partners who accept and fulfill those requests. For a three-degree marketplace like Swiggy, there is an additional static component (like restaurants or stores), where … Read more Spatial Modelling Tidbits: Honeycomb or Fishnets?

## A Guide to Conda Environments

Where Environments Live. When you create an environment with Python’s venv module, you need to say where it lives by specifying its path: `python3 -m venv /path/to/new/environment`. Environments created with conda, on the other hand, live by default in the envs/ folder of your Conda directory, whose path will look something like `/Users/user-name/miniconda3/envs` … Read more A Guide to Conda Environments

## Behind The Models: Cholesky Decomposition

The 19th Century Map-Maker’s Trick That Runs Today’s Linear Models and Monte Carlo Simulations. André-Louis Cholesky is a bit of an oddity among mathematicians: his work was published posthumously after he died in battle during WWI. He discovered the linear algebra method that carries his name through his work as a late 19th century map … Read more Behind The Models: Cholesky Decomposition
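
For readers who want the mechanics, here is a minimal pure-Python sketch of the Cholesky factorization A = L·Lᵀ for a symmetric positive-definite matrix. It also shows one standard Monte Carlo use: multiplying independent standard normal draws by L gives draws with covariance A. (In linear models, the same factor is commonly used to solve the normal equations.) The names `cholesky` and `correlated_normals` are hypothetical, and this is not the post’s code.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with L @ L^T == a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def correlated_normals(L, rng):
    """One draw from N(0, L @ L^T): transform iid standard normals by L."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(L))]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(L))]


L = cholesky([[4.0, 2.0], [2.0, 3.0]])
print(L)  # [[2.0, 0.0], [1.0, 1.4142135623730951]]
sample = correlated_normals(L, random.Random(0))
```

Factorizing once and reusing L for every draw is the point: the expensive step is the factorization, and each sample afterwards is a cheap triangular matrix-vector product.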

## Estimates on training vs. validation samples

Before moving to cross-validation, it was natural to say “I will burn 50% (say) of my data to train a model, and then use the remaining to fit the model”. For instance, we can use training data for variable selection (e.g. using some stepwise procedure in a logistic regression), and then, once variables have been … Read more Estimates on training vs. validation samples

## Real-Time Dashboards to Support eSports Spectating

This must have been my favorite project during my time in academia. The League of Legends and Counter-Strike: Global Offensive dashboards in action! Read on to learn more. My PhD was funded through multiple (European and smaller) projects, on topics such as Learning Analytics, Digital Humanities, and Unemployment. And they were interesting: I designed data … Read more Real-Time Dashboards to Support eSports Spectating

## CNNs, Part 1: An Introduction to Convolutional Neural Networks

A simple guide to what CNNs are, how they work, and how to build one from scratch in Python. There’s been a lot of buzz about Convolutional Neural Networks (CNNs) in the past few years, especially because of how they’ve revolutionized the field of Computer Vision. In this post, we’ll build on a basic background knowledge … Read more CNNs, Part 1: An Introduction to Convolutional Neural Networks
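
The core operation of a convolutional layer can be written from scratch in a few lines. This is a minimal sketch, not the post’s implementation, and `conv2d_valid` is a hypothetical name. Like most deep-learning libraries, it slides the kernel without flipping it (technically cross-correlation) and keeps only "valid" positions, so an H×W image and a kh×kw kernel give an (H−kh+1)×(W−kw+1) output. The example kernel is an assumed vertical-edge detector.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution as used in CNN layers (cross-correlation: no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            # Weighted sum of the kh x kw patch whose top-left corner is (r, c).
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


image = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
vertical_edge = [[-1, 1], [-1, 1]]
print(conv2d_valid(image, vertical_edge))  # [[0, 20, 0], [0, 20, 0]]
```

The output responds only where dark pixels meet bright ones. A CNN learns the kernel values instead of hand-picking them.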

## What would a hockey 2-point line look like?

Thanks to the NHL stats API, we can find out! The Idea: The idea of a 2-point line in hockey isn’t exactly new. Nor is there anything unique about hockey that makes a 2-point line more necessary than, say, soccer. I chose to examine hockey because the NHL has the most … Read more What would a hockey 2-point line look like?

## Royal Society of Biology: Introduction to Reproducible Analyses in R

Learn to experiment with R to make analyses and figures more reproducible. If you’re in the UK and not too far from York, you might be interested in a Royal Society of Biology course which forms part of the Industry Skills Certificate. More details at this link: Introduction to Reproducible Analyses in R, 24 June … Read more Royal Society of Biology: Introduction to Reproducible Analyses in R

## Pandas for People In A Hurry

DataFrame for all your data exploration needs. Pandas is the most popular Python library for data manipulation and data analysis. It is a must-know for all data scientists! The two main Pandas data structures are the Pandas DataFrame and the Pandas Series. I like to think of the Pandas DataFrame almost like an … Read more Pandas for People In A Hurry

## Spotlight on: Julia Silge, Stack Overflow

Julia Silge is joining us as one of our keynote speakers at EARL London 2019. We can’t wait to hear Julia’s full keynote, but until then she kindly answered a few questions. Julia shared with us what we can expect from her address – which will focus on how Stack Overflow uses R and their … Read more Spotlight on: Julia Silge, Stack Overflow

## The CX Revolution: Implementing AI to Deliver Outstanding Customer Experience

By Atif M., May 23. Artificial Intelligence (AI) has been around for a long time now; in the late 1980s, we used its power to beat chess pros at their own game and today we’re using it to power self-driving cars. The AI revolution has been one of the most significant evolutions of this century, and … Read more The CX Revolution: Implementing AI to Deliver Outstanding Customer Experience

## What’s Linear About Logistic Regression

There are already a bunch of amazing articles and videos on logistic regression, but it was a struggle for me to understand the connection between the probabilities and the linearity of logistic regression, so I figured I would document it here for myself and for those who might be going through the same thing. This will also … Read more What’s Linear About Logistic Regression
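
The connection can be checked numerically. Logistic regression models the log-odds, log(p / (1 − p)), as a linear function b0 + b1·x. The sigmoid maps that linear score to a probability, and the logit is its inverse. The coefficients below are hypothetical, chosen only for illustration; this is a sketch, not the post’s code.

```python
import math


def sigmoid(z):
    """Map a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log-odds: the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))


b0, b1 = -1.0, 0.5  # hypothetical fitted coefficients

for x in [0.0, 2.0, 4.0]:
    p = sigmoid(b0 + b1 * x)
    # p follows an S-curve in x, but logit(p) rises by exactly b1 per unit of x.
    print(x, round(p, 3), round(logit(p), 6))
```

The probabilities are not linear in x. The log-odds are, and that is the "linear" in logistic regression.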

## Bayesian estimation of fatality rates and accidents involving cyclists on Queensland roads

In my previous post I built a Shiny app mapping accidents on Queensland roads which was great at showing the problematic areas within cities and regional areas. I have added to this by estimating the fatality rate given the observed accidents and the rate of accidents involving cyclists for SA3 and SA4 areas. I have … Read more Bayesian estimation of fatality rates and accidents involving cyclists on Queensland roads

## OpenAI GPT-2 writes alternate endings for Game of Thrones

I trained the GPT-2 language model on GRRM’s book series “A Song of Ice and Fire” and let it complete the HBO show’s storyline. Can it do better than HBO’s season 8 train-wreck? The Game of Thrones season 8 storyline has left its fandom divided, with millions of fans (including myself) disappointed by its rushed and … Read more OpenAI GPT-2 writes alternate endings for Game of Thrones

## Decision Tree Regressor explained in depth

The decision tree has become one of the most widely used machine learning algorithms, both in competitions like Kaggle and in business environments. Decision trees can be used for both classification and regression problems. This article presents the Decision Tree Regression algorithm along with some advanced topics. Table of Contents: Introduction, Important Terminology, How does … Read more Decision Tree Regressor explained in depth
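
The step that makes a tree a *regression* tree is choosing where to split. This minimal sketch assumes the common squared-error (variance-reduction) criterion, since the excerpt does not show the article’s exact method. For one feature, it tries each midpoint between distinct sorted values and keeps the threshold that minimizes the summed squared error of the two child nodes. Each leaf would then predict the mean of its targets. The names `sse` and `best_split` are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean (what a leaf predicting the mean incurs)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, cost) minimizing the children's total squared error."""
    pairs = sorted(zip(xs, ys))
    best_cost, best_threshold = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_cost, best_threshold = cost, threshold
    return best_threshold, best_cost


xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 5.2, 4.8, 20.0, 19.5, 20.5]
threshold, cost = best_split(xs, ys)
print(threshold)  # 6.5
```

A full tree applies this search recursively to each child, over every feature, until a stopping rule (such as maximum depth or minimum leaf size) is reached.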

## Comparing Frequentist, Bayesian and Simulation methods and conclusions

So, a programmer, a frequentist, and a bayesian walk into a bar. No, this post isn’t really on the path to some politically incorrect stereotypical humor. Just trying to make it fun and catch your attention. As the title implies, this post is really about applying the differing viewpoints and methodologies inherent in those approaches to statistics. To be … Read more Comparing Frequentist, Bayesian and Simulation methods and conclusions

## Analysing the HIV pandemic, Part 4: Classification of lab samples

Andrie de Vries is the author of “R for Dummies” and a Solutions Engineer at RStudio. Phillip (Armand) Bester is a medical scientist, researcher, and lecturer at the Division of Virology, University of the Free State, and National Health Laboratory Service (NHLS), Bloemfontein, South Africa. In this post we complete our series on analysing the … Read more Analysing the HIV pandemic, Part 4: Classification of lab samples

## MRAN snapshots, and you

For almost five years, the entire CRAN repository of R packages has been archived on a daily basis at MRAN. If you use CRAN snapshots from MRAN, we’d love to hear how you use them in this survey. If you’re not familiar with the concept, or just want to learn more, read on. Every day … Read more MRAN snapshots, and you

## The Simpsons meets Data Visualization

Introduction: There are few things I love more than The Simpsons. It is one of those shows that I think about on a daily basis. With thirty seasons and over 600 episodes, the animated comedy show holds a special place in my heart. Every so often, I find myself singing along when Mr. Plow or … Read more The Simpsons meets Data Visualization

## Finding Business Value in Simple Models

Lesson 3: Interpret the results as a guide, not as the answer. I tend to assume that a significant amount of bias exists in my models. Working very closely with data over the years has had a sobering effect on me, now that I know how the sausage is made. There are so many assumptions that … Read more Finding Business Value in Simple Models

## The man with a suit and a straw hat

Demystifying anomaly detection for a non-technical audience. By Aileen, May 22. To detect anomalies is to detect outliers. This might seem straightforward but is not always the case. The question is, how do you define an outlier? Let’s first imagine a man wearing a suit. This is normal. Then imagine a man wearing a straw hat, … Read more The man with a suit and a straw hat

## The Quest of Higher Accuracy for CNN Models

In this post, we will learn techniques to improve accuracy using data redesigning, hyper-parameter tuning and model optimization. Performance is key when it comes to deep learning models, and achieving it becomes an arduous task when you have limited resources. One of the important metrics for measuring performance is ‘Accuracy’. This article is all about achieving … Read more The Quest of Higher Accuracy for CNN Models

## Do you Understand it?

An intro to Natural Language Processing. Understanding a Language Started with Checking a Dictionary: I randomly browsed through the Oxford Dictionary of English today, on a quest to boil language down to the “first principle”. I first checked something that a kid will probably learn at the age of 3, the meaning of “good”: “To be desired or … Read more Do you Understand it?

## How to Get a Return on Your Predictive Analytics Investment: What Companies Should Know

Thanks to predictive analytics, companies can dig into current and historical data and predict future events with more certainty than with experience and educated guesses alone. But a predictive analytics tool is not a magic solution for enterprises. Applying predictive analytics to business needs is crucial for optimal results. Here are five things you can … Read more How to Get a Return on Your Predictive Analytics Investment: What Companies Should Know

## Converting a Simple Deep Learning Model from PyTorch to TensorFlow

Converting the model to TensorFlow. Now, we need to convert the .pt file to a .onnx file using the torch.onnx.export function. There are two things we need to take note of here: 1) we need to pass a dummy input through the PyTorch model first before exporting, and 2) the dummy input needs to have the shape (1, … Read more Converting a Simple Deep Learning Model from PyTorch to TensorFlow

## Deep (learning) like Jacques Cousteau – Part 5 – Vector addition

(TL;DR: You can add vectors that have the same number of elements.) LaTeX and MathJax warning for those viewing my feed: please view directly on the website! “You want to know how to rhyme, you better learn how to add / It’s mathematics” (Mos Def, ‘Mathematics’). Last time, we learnt about scalar multiplication. Let’s get to adding vectors together! We will … Read more Deep (learning) like Jacques Cousteau – Part 5 – Vector addition
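
The TL;DR rule in code: a minimal sketch, with the hypothetical helper name `add_vectors`, that adds two vectors element by element and refuses vectors of different lengths, as the mathematical definition requires. (R itself behaves differently: it recycles the shorter vector rather than raising an error.)

```python
def add_vectors(u, v):
    """Element-wise sum; only defined for vectors with the same number of elements."""
    if len(u) != len(v):
        raise ValueError(f"cannot add vectors of length {len(u)} and {len(v)}")
    return [a + b for a, b in zip(u, v)]


print(add_vectors([1, 2, 3], [4, 5, 6]))  # [5, 7, 9]
```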

## Pretty displaying tricks for columnar data in Python

Improve how Python and its libraries show data and make it readable. Everyone who has extensively wrangled data using lists, Pandas, or NumPy before might have experienced issues with printing the data in the right way. Especially if there are a lot of columns, displaying the data becomes a hassle. This article … Read more Pretty displaying tricks for columnar data in Python

## Calculating New and Returning Customers in R

I recently came across an issue in which I wanted to calculate new and returning customers. So, I naturally googled it and was surprised to find no solutions in R. This has generally been the issue with most blogs/tutorials on R: they are not very business … Read more Calculating New and Returning Customers in R
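
The post itself works in R. As a language-neutral illustration, here is a Python sketch of one common definition. A customer counts as "new" in the period of their first purchase and as "returning" in any later period where they buy again, with each customer counted once per period. The function name, the `(customer_id, period)` input shape and the `'YYYY-MM'` period strings are all assumptions for illustration, not the author’s approach.

```python
from collections import defaultdict


def new_vs_returning(orders):
    """orders: iterable of (customer_id, period), periods sortable like 'YYYY-MM'.
    Returns {period: {"new": n, "returning": m}} counting distinct customers."""
    orders = list(orders)  # allow generators, since we iterate twice
    first_seen = {}
    for customer, period in orders:
        if customer not in first_seen or period < first_seen[customer]:
            first_seen[customer] = period
    active = defaultdict(set)
    for customer, period in orders:
        active[period].add(customer)
    summary = {}
    for period in sorted(active):
        new = sum(1 for c in active[period] if first_seen[c] == period)
        summary[period] = {"new": new, "returning": len(active[period]) - new}
    return summary


orders = [("a", "2019-01"), ("b", "2019-01"), ("a", "2019-02"), ("c", "2019-02"), ("a", "2019-02")]
print(new_vs_returning(orders))
# {'2019-01': {'new': 2, 'returning': 0}, '2019-02': {'new': 1, 'returning': 1}}
```

Computing each customer’s first period up front keeps the per-period classification a simple lookup, and it works even when the orders are not sorted by date.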

## New Color Palette for R

As I was preparing some graphics for a presentation recently, I started digging into some of the different color palette options. My motivation was entirely about creating graphics that weren’t too visually overwhelming, which I found the default “rainbow” palette to be. But as the creators of the viridis R package point out, we also … Read more New Color Palette for R