July 2019 – Page 15 – Data Science Austria

## Don’t Put All Your Eggs in One Basket

The Model: the purpose is to determine what fraction of a portfolio to invest in each of several possible assets, with the goal of minimizing the volatility of the portfolio subject to a target return. To frame the question mathematically, suppose f is an n-dimensional vector of the fractions that I’ll invest in each of …
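
A minimal sketch of this kind of model, under assumptions the excerpt does not state: short selling is allowed (fractions may be negative), the expected returns `mu` and covariance matrix `sigma` are already estimated, and volatility is handled by minimizing the variance f^T Σ f (which also minimizes its square root). With only the two equality constraints (target return and fractions summing to 1), the optimum solves a single linear system from the Lagrangian. The function name `min_variance_weights` and all numbers are hypothetical, not taken from the post.

```python
import numpy as np


def min_variance_weights(mu, sigma, target_return):
    """Fractions f minimizing f^T sigma f subject to mu^T f = target and sum(f) = 1.

    Assumes short selling is allowed and that the system is non-singular
    (e.g. sigma positive definite and mu not a constant vector).
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n = mu.size
    ones = np.ones(n)
    # Stationarity of the Lagrangian: 2*sigma f + lam1*mu + lam2*1 = 0,
    # stacked with the two equality constraints into one (n + 2) system.
    kkt = np.zeros((n + 2, n + 2))
    kkt[:n, :n] = 2.0 * sigma
    kkt[:n, n] = mu
    kkt[:n, n + 1] = ones
    kkt[n, :n] = mu
    kkt[n + 1, :n] = ones
    rhs = np.zeros(n + 2)
    rhs[n] = target_return
    rhs[n + 1] = 1.0
    solution = np.linalg.solve(kkt, rhs)
    return solution[:n]  # drop the two Lagrange multipliers


# Hypothetical three-asset example with made-up statistics.
mu = [0.05, 0.08, 0.12]
sigma = [[0.04, 0.006, 0.002],
         [0.006, 0.09, 0.01],
         [0.002, 0.01, 0.16]]
f = min_variance_weights(mu, sigma, target_return=0.09)
print(f, f.sum(), f @ np.array(mu))
```

Adding a no-short-selling constraint (every fraction at least 0) turns this into an inequality-constrained quadratic program, which needs a QP solver rather than one linear solve.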

## Artificial Intelligence in Video Games

An overview of how video game A.I. has developed over time and how it is used in games today. Written by Laura E. Shummon Maass and Andy Luc. [Photo: Virtual Reality, by Harsch Shivam] Most people probably imagine that the majority of games released in the last couple of years have highly sophisticated A.I. for any non-player-controlled characters, …

## My first year as a Project Manager for Artificial Intelligence (AI)

It has already been more than a year since I started working as a Project Manager for Artificial Intelligence (AI). I suppose you don’t notice the time passing when you love your job. I started in this role with a background in wireless communication, something that is unusual but mostly helpful when working at …

## Implement K-Nearest Neighbors classification Algorithm

Building a heart disease classifier using the K-NN algorithm. The most crucial task in the healthcare field is disease diagnosis: if a disease is diagnosed early, many lives can be saved. Machine learning classification techniques can significantly benefit the medical field by providing accurate and quick diagnoses of diseases, saving time for both doctors …
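
A minimal plain-Python sketch of the K-NN idea, assuming numeric feature vectors, Euclidean distance, and a simple majority vote; the function name `knn_predict` and the toy points are hypothetical and not taken from the heart disease post.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest points, nearest first.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote; ties go to the label that appears first among the sorted neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two features per patient, label 1 = disease present.
train_X = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5), (5.5, 6.0)]
train_y = [0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (1.2, 1.4), k=3))  # prints 0
```

Because K-NN compares raw distances, features measured on very different scales (for example, age versus cholesterol) are usually standardized first; otherwise the largest-scale feature dominates the vote.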

## A Python Beginner’s Look at .loc – Part 2

Setting row and column values in a pandas DataFrame. As a Python beginner, using .loc to retrieve and update values in a pandas DataFrame just wasn’t clicking for me. In an earlier post, I shared what I’d learned about retrieving data with .loc. Today, we’ll talk about setting values. As a refresher, here are the …
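
A short sketch of common ways to set values with `.loc`, assuming a small hypothetical DataFrame (the column names, index labels, and values are made up for illustration, not taken from the post):

```python
import pandas as pd

# Hypothetical toy DataFrame with string index labels.
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "score": [70, 85, 90], "grade": ["C", "B", "A"]},
    index=["a", "b", "c"],
)

# Set a single cell by row label and column label.
df.loc["b", "score"] = 88

# Set one column for every row that matches a boolean condition.
df.loc[df["score"] >= 88, "grade"] = "A"

# Set several columns of one row at once.
df.loc["a", ["score", "grade"]] = [75, "C+"]

# Assigning to a row label that does not exist yet appends a new row.
df.loc["d"] = ["Dee", 60, "D"]
print(df)
```

Doing the row and column selection in a single `.loc[row, column]` call, rather than chained indexing such as `df["score"]["b"] = 88`, is the form pandas recommends, because chained indexing may modify a temporary copy instead of `df`.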

## Ecom Data Series: What is Demand Forecasting?

Making ecommerce data science concepts simple, one topic at a time. Black magic that has powered retail and logistics operations for generations. Ecom Data Talk Episode 4: What is Demand Forecasting? Understanding past events to predict future sales 📈📊 is fundamental to optimizing retail and ecommerce operations. Before you can accurately measure your pricing and …
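
As a generic illustration of using past sales to predict future sales (not the method from the Ecom Data Talk episode, which this excerpt does not show), a simple moving-average baseline can be sketched as follows; the function name and the weekly figures are hypothetical.

```python
def moving_average_forecast(sales, window=3, horizon=1):
    """Forecast the next `horizon` periods as the mean of the last `window` values.

    Each forecast is fed back into the history as if it had been observed,
    so multi-step forecasts level off quickly.
    """
    if len(sales) < window:
        raise ValueError("need at least `window` observations")
    history = list(sales)
    forecasts = []
    for _ in range(horizon):
        next_value = sum(history[-window:]) / window
        forecasts.append(next_value)
        history.append(next_value)
    return forecasts


weekly_units = [120, 135, 128, 150, 162, 158]  # hypothetical weekly sales
print(moving_average_forecast(weekly_units, window=3, horizon=2))
```

A baseline like this is mainly useful as a yardstick: a more elaborate forecasting model should beat it before it is worth its extra complexity.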

## Language Detection Benchmark using Production Data

This is a benchmark of multilingual language detection algorithms on real-life social media data. [Image: The Tower of Babel by Pieter Bruegel the Elder (1563)] As data scientists, we’re accustomed to processing many different types of data. But when it comes to text-based data, knowing the language of the data is a top priority. I experienced this …

## Press Coverage of the early 2020 Primary

Observations of the early press coverage of the 2020 Democratic presidential primary race. Admittedly, we’re still in the year 2019, and the next U.S. presidential election is in 2020, about 17 months from the time of this writing. However, the election process has already begun, and there are over 20 individuals who have declared candidacies and are …

## Apply and Lambda usage in pandas

Filtering a dataframe. The pandas library makes filtering and subsetting dataframes pretty easy: you can filter and subset them using ordinary comparison operators combined with the `&` (and), `|` (or), and `~` (not) operators. Single condition, a dataframe with all movies rated greater than 8: `df_gt_8 = df[df['Rating'] > 8]`. Multiple conditions with AND, a dataframe with all movies rated greater than 8 and having more than …
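
A small sketch of boolean filtering plus `apply` with a `lambda`, assuming a hypothetical movie DataFrame; only the `Rating` column echoes the example above, while the `Year` column, titles, and values are made up for illustration.

```python
import pandas as pd

# Hypothetical movie data for illustration only.
df = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Rating": [8.5, 7.2, 9.1, 6.4],
    "Year": [2010, 1995, 1999, 2015],
})

# Combine conditions with & (and), | (or), ~ (not). Each comparison needs its own
# parentheses because & and | bind more tightly than > in Python.
recent_and_good = df[(df["Rating"] > 8) & (df["Year"] > 2000)]
not_good = df[~(df["Rating"] > 8)]

# Series.apply with a lambda: transform each value of one column.
df["RatingBand"] = df["Rating"].apply(lambda r: "high" if r > 8 else "low")

# DataFrame.apply with axis=1: the lambda receives each row as a Series.
df["Label"] = df.apply(lambda row: f"{row['Title']} ({row['Year']})", axis=1)
print(df)
```

For plain arithmetic on columns, vectorized expressions such as `df["Rating"] * 10` are usually much faster than `apply` with a lambda; `apply` earns its place when the per-value or per-row logic is awkward to vectorize.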

## Malware Detection Using Deep Learning

Malware detection using convolutional neural networks in fast.ai. [Photo by Markus Spiske on Unsplash] What is malware? Malware refers to malicious software that perpetrators dispatch to infect individual computers or an entire organization’s network. It exploits target-system vulnerabilities, such as a bug in legitimate software (e.g., a browser or web application plugin) that can be …

## Bayesian inference problem, MCMC and variational inference

Markov Chain Monte Carlo (MCMC). As we mentioned before, one of the main difficulties faced when dealing with a Bayesian inference problem comes from the normalisation factor. In this section we describe MCMC sampling methods, which offer a possible way to overcome this issue as well as some other computational difficulties related to Bayesian inference. The …
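
To show why MCMC sidesteps the normalisation factor, here is a minimal sketch of random-walk Metropolis, the symmetric-proposal special case of Metropolis-Hastings, for a one-dimensional target. It is an illustration under simple assumptions (Gaussian proposals, fixed burn-in, no convergence diagnostics), not the article's own implementation; the function names and the example target are hypothetical.

```python
import math
import random


def metropolis_hastings(log_unnormalized, n_samples, x0=0.0, step=1.0,
                        burn_in=1000, seed=0):
    """Random-walk Metropolis sampler for a 1-D density known only up to a constant.

    `log_unnormalized(x)` returns log p(x) plus an arbitrary unknown constant.
    """
    rng = random.Random(seed)
    x = x0
    log_p = log_unnormalized(x)
    samples = []
    for i in range(burn_in + n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        log_p_new = log_unnormalized(proposal)
        # Accept with probability min(1, p(proposal) / p(x)). Only the ratio is
        # needed, so the unknown normalisation constant cancels out.
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        if i >= burn_in:  # discard the first `burn_in` states
            samples.append(x)
    return samples


def log_target(x):
    # Unnormalised log-density of a normal with mean 2 and variance 1; the
    # "+ 10.0" stands in for an arbitrary, unknown normalising term.
    return -0.5 * (x - 2.0) ** 2 + 10.0


samples = metropolis_hastings(log_target, n_samples=20000, step=2.4)
print(sum(samples) / len(samples))  # should be close to 2
```

Changing the added constant in `log_target` leaves the chain unchanged, which is exactly the property that makes MCMC usable when the evidence term of Bayes’ rule cannot be computed.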

## Uncovering what neural nets “see” with FlashTorch

Motivation behind FlashTorch. When I discovered the world of feature visualisation, I was immediately drawn to its potential for making neural nets more interpretable and explainable. Then I quickly realised that there was no tool available to easily apply these techniques to neural networks I’ve built in PyTorch. So I decided to build one: FlashTorch, which …

## VLOOKUP in R with Schwartau Beehive Data

I started learning R back in 2016 in college thanks to a couple of my professors who used it to teach statistics: Dr. Grimshaw and Dr. Lawson. Thanks to the R community I’ve learned a lot more since then, but recently I did an embarrassing Google search for “how to do VLOOKUP in r.” For those of …
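
An exact-match VLOOKUP boils down to “find the row whose key equals this value and return another column from that row”. A minimal Python sketch of that behaviour follows; the helper names and the toy lookup table are hypothetical, and the post’s own R approach is not shown in this excerpt (common R equivalents include base `merge()` and dplyr’s `left_join()`).

```python
def vlookup(key, table, key_column, return_column, default=None):
    """Exact-match lookup, similar to a spreadsheet VLOOKUP with range_lookup = FALSE.

    `table` is a list of dict rows; the first row whose `key_column` equals `key`
    wins, and `default` is returned when nothing matches.
    """
    for row in table:
        if row[key_column] == key:
            return row[return_column]
    return default


def vlookup_column(keys, table, key_column, return_column, default=None):
    """Look up many keys at once by building a dict index a single time."""
    index = {}
    for row in table:
        # setdefault keeps the first match per key, like vlookup() above.
        index.setdefault(row[key_column], row[return_column])
    return [index.get(k, default) for k in keys]


# Hypothetical lookup table mapping short codes to descriptions.
codes = [
    {"code": "T", "label": "temperature"},
    {"code": "H", "label": "humidity"},
    {"code": "W", "label": "weight"},
]
print(vlookup_column(["H", "T", "Z"], codes, "code", "label"))  # prints ['humidity', 'temperature', None]
```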

## 7 Ways to Secure Amazon Athena

Broadly, data security can be considered in two areas: when data is at rest and when data is in flight. Let’s consider data at rest. Scenario #1: You have an S3 bucket containing data you want to query from Athena. How can you ensure the data is secure in the bucket? First, make sure the …

## Tweepy for beginners

Using Twitter’s API to build your own data set. A good way to build out your portfolio is with a natural language processing project, but as with every project, the first step is getting hold of the data. Twitter can be a great resource for text data; it has an API, credentials are easy to acquire and …

## Prob/Stat for Data Sci: Math + R + Data

My new book, Probability and Statistics for Data Science: Math + R + Data, pub. by the CRC Press, was released on June 24! This book arose from an open-source text I wrote and have been teaching from. The open source version will still be available, though rather different from the published one. This is …

## Powerlytics: Impact of Age, Gender, and Weight on Total Weight Lifted in Powerlifting Meets

A. Background. The Open Powerlifting initiative attempts to create an accurate and open archive of all powerlifting meet data throughout the world. As someone who recently started competing again after a six-year hiatus from powerlifting, I often mess around with the Open Powerlifting data, as it’s of personal interest. Most of the analysis that …

## How to write a do-while loop on Tensorflow?

Two difficulties arise. First, there is no simple while statement in TensorFlow; instead we must use the function `tf.while_loop(cond, body, loop_vars)`. Second, TensorFlow (at least in graph mode) prohibits using `tf.Tensor` objects as Python booleans (True/False) for control flow, so we must instead use the `tf.cond(pred, true_fn, false_fn)` statement. Concerning the first …
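
The core reasoning behind a do-while loop can be sketched without TensorFlow: run the body once unconditionally, then hand its result to an ordinary while loop. The plain-Python sketch below only mimics the calling convention of `tf.while_loop(cond, body, loop_vars)` (cond and body receive the loop variables as positional arguments); the helpers `while_loop` and `do_while` are hypothetical stand-ins, not TensorFlow functions. In real graph-mode code the same structure would be built from `tf.while_loop`, with `tf.cond` replacing any Python `if` on tensor values.

```python
def while_loop(cond, body, loop_vars):
    """Plain-Python stand-in for the calling convention of tf.while_loop."""
    loop_vars = tuple(loop_vars)
    while cond(*loop_vars):
        loop_vars = tuple(body(*loop_vars))
    return loop_vars


def do_while(cond, body, loop_vars):
    """Run `body` once unconditionally, then keep running it while `cond` holds."""
    first = tuple(body(*loop_vars))
    return while_loop(cond, body, first)


# Example: count doublings of x while x <= 100, always doubling at least once.
cond = lambda i, x: x <= 100
body = lambda i, x: (i + 1, 2 * x)
print(do_while(cond, body, (0, 1)))    # prints (7, 128)
print(do_while(cond, body, (0, 500)))  # prints (1, 1000)
```

The second call shows the difference from a plain while loop: starting from x = 500 the condition is already false, yet the body still runs once.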

## Comrades Marathon (2019) Splits

I’m looking at ways to effectively visualise the splits data for the 2019 edition of the Comrades Marathon. My objectives are to provide an overall view of the splits across the entire field and a detailed view for individual runners (relative to the rest of the field). Ridge Plot: my working solution for visualising the …

## Hubway Station Metrics

Hubway, a bike-sharing system in Boston, was launched in July 2011. In the past 8 years, it has expanded to over 150 locations throughout the city. In 2014, as part of a data science challenge, Hubway made 3 years of its data public. This data reflected every time a user started or ended …

## NVIDIA Jetson Nano and LEGO Minifigures

LEGO Minifigures object detection with NVIDIA Jetson Nano. NVIDIA Jetson Nano is a small AI computer that people often refer to as “Raspberry Pi on steroids.” I received my Jetson Nano Developer Kit a few days ago and decided to build a small project with it: LEGO Minifigures object detection. Setting up Jetson Nano …

## Reordering and facetting for ggplot2

I recently wrote about the release of tidytext 0.2.1, and among the most useful new features in this release are a couple of helper functions for making plots with ggplot2. These helper functions address a class of challenges that often arises when dealing with text data, so we’ve included them in the tidytext package. …

## B3 is shutting down its ftp site

Well, bad news travels fast. Over the last couple of weeks I’ve been receiving several emails regarding B3’s decision to shut down its ftp site. More specifically, users are eager to know how it will impact my data-grabbing packages on CRAN. I’ll use this post to explain the situation for everyone. The …

## Imagine your Data Before You Collect It

As data scientists, we are often presented with a dataset and are asked to use it to produce insights. We use R to wrangle, visualize, model, and produce tables and plots for sharing or publication. When we focus on the data in hand in this way, we don’t get to consider where the data came …