data_admin – Page 3 – Data Science Austria

## Seaborn — Let’s make plotting fun

Introduction to the Seaborn library in Python

If you’ve ever worked with plots such as line plots, bar plots and others in Python, you must have encountered the library called matplotlib. When I started with visualizations in Python, I started with it and even wrote an exciting article about how to use matplotlib to make data …

## Breaking Down Richard Sutton’s Policy Gradient With PyTorch And Lunar Lander

Theory Behind The Policy Gradient Algorithm

Before we can implement the policy gradient algorithm, we should go over the specific math involved. The math is straightforward and easy to follow and, for the most part, is reinterpreted from the OpenAI resource mentioned above. First, we define tau to be a trajectory …

## How to Find a Descent Learning Rate using Tensorflow 2

Taken from http://www.merzpraxis.de/index.php/2016/06/13/der-suchende/

When it comes to building and training neural networks, you need to set a large number of hyperparameters. Setting those parameters right has a tremendous influence on the success of your net and also on the time you spend heating the air, aka training your model. One of those parameters that you …

## Beating the S&P500 Using Machine Learning

Objectives and Methodology

A machine learning algorithm written in Python was designed to predict which companies from the S&P 1500 index are likely to beat the S&P 500 index on a monthly basis. To do so, an algorithm based on random forest regression, taking as input the financial ratios of all the constituents of the S&P …

## Python Data Preprocessing Using Pandas DataFrame, Spark DataFrame, and Koalas DataFrame

Preparing data for machine learning in Python

With widespread use in data preprocessing, data analytics, and machine learning, Pandas, in conjunction with NumPy, Scikit-Learn, and Matplotlib, has become a de facto data science stack in Python. However, one of the major limitations of Pandas is that it was designed for small datasets that can be handled …

## AutoAI: The Secret Sauce

Ramp up your path to AI

On making the data scientist’s life easier

Photo by Sam X on Unsplash

In a recent competition for predicting consumer credit risk, AutoAI beat 90% of the participating data scientists. AutoAI is a new tool that utilizes sophisticated training features to automate many of the complicated and time-consuming tasks …

## Sorting data frames in pandas

How to sort data frames quickly and efficiently

Many beginner data scientists try to sort their data frames by writing complicated functions. This is neither the most efficient nor the easiest way to do it. Do not reinvent the wheel: use the sort_values() function provided by the pandas package. Let’s have a look at a real-life example …
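The `sort_values()` call mentioned above can be sketched as follows. This is a minimal illustration, not the article's own example: the data frame and its column names (`name`, `team`, `score`) are hypothetical, since the real-life example is cut off in this excerpt.

```python
import pandas as pd

# Hypothetical data; the article's real-life example is not part of this excerpt.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee"],
    "team": ["b", "a", "b", "a"],
    "score": [72, 91, 85, 60],
})

# Sort by a single column, highest score first.
by_score = df.sort_values("score", ascending=False)

# Sort by several columns, with a separate direction for each one:
# team ascending, then score descending within each team.
by_team_then_score = df.sort_values(["team", "score"], ascending=[True, False])

print(by_team_then_score)
```

Note that `sort_values()` returns a new data frame by default and leaves `df` itself unchanged.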

## Kepler.GL & Jupyter Notebooks: Geospatial Data Visualization with Uber’s opensource Kepler.GL

Plot geospatial data inside a Jupyter notebook and easily interact with Kepler’s user interface to tweak the visualisation. kepler.gl for Jupyter is an excellent tool for big geospatial data visualisation. It combines a world-class visualisation tool, an easy-to-use user interface (UI), and the flexibility of Python and Jupyter notebooks (3D visualization GIF below, more in the article). 3D …

## BLEU-BERT-y: Comparing sentence scores

Blueberries. Photo by Joanna Kosinska on Unsplash

The goal of this story is to understand BLEU, a widely used metric for MT models, and to investigate its relation to BERT. This is the first story of my project where I try to use BERT contextualised embedding vectors in the Neural Machine …

## What I’ve Learned Doing Data Science and Analytics at 8 Different Companies and 4 Jobs in 6 Years

Over the past 6 years, I’ve done Data Science and Analytics projects at companies like Adobe, USAA Bank, Nu Skin, Purple Mattress, Franklin Sports, and others. I’ve also held 4 different jobs in analytics, one with a mid-size IT consulting firm, one with a major corporation, one with a startup, and one with an e-commerce …

## Private Security and the Pareto Principle

[This article was first published on R on datawookie, and kindly contributed to R-bloggers].

Private Security is a big industry in South Africa. Most Private …

## Predicting What Song Phish Will Play Next with Deep Learning

Phish, Hampton 2018

Phish, an iconic live rock & roll band, and the world of machine learning… what can they possibly have in common? Unlike the vast majority of music artists’ live performances, most Phish events aren’t planned. From the days and hours leading up to the moment that the band sets foot onto …

## How the cloud can drive economic growth in APAC (and everywhere)

Public cloud adoption in the Asia Pacific (APAC) region continues to outstrip the pace of growth in North America and Europe, according to BCG’s “Ascent to the Cloud: How Six Key APAC Economies can Lift-off” report. BCG’s report examines the public cloud’s economic impact in six key APAC markets: Australia, India, Indonesia, Japan, Singapore, and …

## New package: GetEdgarData

Example 01 – Apple’s Quarterly Net Profit

The first step in using GetEdgarData is finding information about available companies:

    library(GetEdgarData)
    library(tidyverse)

    my_year <- 2018
    type_form <- '10-K'

    df_info <- get_info_companies(years = my_year, type_data = 'yearly', type_form = type_form)
    glimpse(df_info)
    ## Observations: 450
    ## Variables: 13
    ## $ current_name "AIR PRODUCTS & CHEMICALS INC /DE/", "ALICO …

## Easy introduction to Offensive Programming

[This article was first published on NEONIRA, and kindly contributed to R-bloggers].

Many of us are using R in one way or another and we …

## Recognizing Depth in Autonomous Driving

This article will describe some of the state-of-the-art methods for depth prediction in image sequences captured by vehicles, which help in the development of new autonomous driving models without the use of extra cameras or sensors. As mentioned in my previous article “How does Autonomous Driving Work? An Intro into SLAM”, there are many sensors …

## Le Monde puzzle [#1114]

[This article was first published on R – Xi’an’s Og, and kindly contributed to R-bloggers].

Another very low-key arithmetic problem as Le Monde current mathematical …

## Amazon EC2 Instances are Now Available in South America (Sao Paulo)

The 3 new C5 instance sizes are powered by custom 2nd Generation Intel Xeon Scalable Processors (based on the Cascade Lake architecture) with sustained all-core turbo frequency of 3.6 GHz and maximum single core turbo frequency of 3.9 GHz. C5 instances are optimized for compute-intensive workloads and deliver cost-effective high performance at a low price …

## Trends in data science with O’Reilly Media’s Chief Data Scientist

Editor’s note: This is the ninth episode of the Towards Data Science podcast’s “Climbing the Data Science Ladder” series, hosted by Jeremie Harris, Edouard Harris and Russell Pollari. Together, they run a data science mentorship startup called SharpestMinds.

When I started grad school, data science was only just …

## Latent Semantic Analysis: Distributional Semantics in NLP

Hello Folks! In this article, I will be describing an algorithm used in Natural Language Processing: Latent Semantic Analysis (LSA). The major applications of this method are wide-ranging in linguistics: comparing documents in low-dimensional spaces (document similarity), finding recurring topics across documents (topic modeling), and finding relations between terms (text synonymy). Picture Courtesy: …

## The essential step towards joined human and computational forces

We humans have always been learning from data. However, there is a shift in the availability of knowledge. At this moment, knowledge is everywhere as part of our daily life. We are exposed to an overload of information on topics ranging from the lives of people (close or unknown) to news from all over the …

## What is Teacher Forcing?

Frequently Asked Questions

Q: Since we pass the whole ground truth sequence through the RNN model, is it possible for the model to “cheat” by simply memorizing the ground truth?

A: No. At timestep t, the input of the model is the ground truth at timestep t – 1, and the hidden states of the …
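The one-step offset described in the answer above can be made concrete with a small sketch. It only shows how inputs and targets line up under teacher forcing; the helper name `teacher_forcing_pairs` and the `<s>` start token are assumptions made for illustration, and no actual RNN is involved.

```python
def teacher_forcing_pairs(ground_truth, start_token="<s>"):
    """Pair each target token with the input the model sees at that timestep.

    Under teacher forcing, the input at timestep t is the ground truth
    token from timestep t - 1 (a start token at the very first step),
    so the model never receives the token it is currently asked to predict.
    """
    inputs = [start_token] + ground_truth[:-1]
    return list(zip(inputs, ground_truth))


# Hypothetical target sequence.
pairs = teacher_forcing_pairs(["the", "cat", "sat"])
for model_input, target in pairs:
    print(f"input={model_input!r} -> target={target!r}")
```

Each printed pair shows the input lagging the target by one position, which is why seeing the whole ground truth sequence does not let the model copy its current target.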

## How to manage files in Google Drive with Python

As a Data Analyst, most of the time I need to share my extracted data with my product manager or stakeholders, and Google Drive is always my first choice. One major issue here is that I have to do it on a weekly or even daily basis, which is very boring. All of us hate repetitive tasks, including …

## upcoming AI-related courses

I forgot to do some marketing for the following upcoming AI-related courses, which will be given in Leuven, Belgium by BNOSAC:

- 2019-10-17&18: Statistical Machine Learning with R
- 2019-11-14&15: Text Mining with R
- 2019-12-17&18: Applied Spatial Modelling with R
- 2020-02-19&20: Advanced R programming
- 2020-03-12&13: Computer Vision with R …

## How IMF applies BERT to gain insights in Article IV reports

Background

Just like a doctor who conducts your annual health-check, IMF teams conduct surveillance of our member countries, diagnose their vulnerabilities in various parts of the economy, and advise them on policy-making (fiscal and monetary policy, structural reforms, etc). The entire process is summarized in an “Article IV” report publication for each country. One hidden …

## Free R/datascience Extract: Evaluating a Classification Model with a Spam Filter

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers].

We are excited to share a free extract of Zumel, …

## How To Assess Statistical Significance In Your Data with Permutation Tests

Permuting a color grid means shuffling it! The proportion in the data is the same, but the structure is lost with every new iteration.

A permutation test is basically doing what this image is doing, but to our data. We shuffle and mix everything together to get a big pool of data and compare this …
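The shuffle-and-compare idea above can be sketched with NumPy. This is a generic two-sample permutation test on the difference of means, which may differ from the statistic the article goes on to use; the function name, the sample values and the default number of permutations are all assumptions made for illustration.

```python
import numpy as np


def permutation_test(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()

    # Pool both groups, then repeatedly shuffle and split them again.
    pooled = np.concatenate([a, b])
    extreme = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)
        diff = shuffled[:len(a)].mean() - shuffled[len(a):].mean()
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add one to both counts so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical measurements for two groups.
observed, p_value = permutation_test([10, 11, 12, 13, 14], [1, 2, 3, 4, 5])
print(observed, p_value)
```

A small p-value means that few random relabelings produce a difference as large as the observed one, so the group structure in the original data is unlikely to be a product of chance.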

## Microsoft Azure AI hackathon’s winning projects

We are excited to share the winners of the first Microsoft Azure AI Hackathon, hosted on Devpost. Developers of all backgrounds and skill levels were welcome to join and submit any form of AI project, whether using Azure AI to enhance existing apps with pre-trained machine learning (ML) models or building ML models from …

## Parsing SDA Pages

[This article was first published on R on kieranhealy.org, and kindly contributed to R-bloggers].

SDA is a suite of software developed at Berkeley for the …

## Linear Regression with Gradient Descent from Scratch in Numpy

I strongly advise you to read the article linked above. It will set the foundations on the topic, plus some math is already discussed there. To start out, I’ll define my dataset: only three points that are in a linear relationship. I’ve chosen so few points only because the math will be shorter …
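A three-point dataset like the one described above is enough to sketch gradient descent for a line in NumPy. The points, learning rate and iteration count below are hypothetical choices rather than the article's own values, and the loss is assumed to be the mean squared error.

```python
import numpy as np

# Hypothetical dataset: three points lying exactly on y = 2x + 1.
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])


def fit_line(x, y, learning_rate=0.05, n_iterations=5000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(n_iterations):
        error = (w * x + b) - y
        # Partial derivatives of mean(error ** 2) with respect to w and b.
        grad_w = (2.0 / n) * np.dot(error, x)
        grad_b = (2.0 / n) * error.sum()
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = fit_line(x, y)
print(round(float(w), 4), round(float(b), 4))
```

Because the points are exactly linear, the fitted slope and intercept approach 2 and 1; with noisy data they would instead settle on the least-squares line.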

## A Full-Time ML Role, 1 Million Blog Views, 10k Podcast Downloads: A Community Taught ML Engineer

A quick recap of a community taught ML Engineer’s journey. 15th of October, 2019 marks a special milestone, actually quite a few milestones. So I considered sharing it in the form of a blog post, on a publication that has been home to all of my posts 🙂 (Note: This was originally posted on HackerNoon) The …

## Super Solutions for Shiny Apps #4 of 5: Using R6 Classes

TL;DR

Why use object-oriented programming in Shiny applications? It’ll help organize the code in your application!

Organize Your Shiny Code with Object-Oriented Programming

Classes are used widely in R programming, usually the S3 ones. Even if you’ve never heard of them, as an R user you’re surely familiar with object classes …

## Simulating data with Bayesian networks

Bayesian networks are really useful for many applications and one of those is to simulate new data. Bayes nets represent data as a probabilistic graph and from this structure it is then easy to simulate new data. This post will demonstrate how to do this with bnlearn.

Fit a Bayesian network

Before simulating new data …

## JAMA retraction after miscoding – new Finalfit function to check recoding

[This article was first published on R – DataSurg, and kindly contributed to R-bloggers].

Riinu and I are sitting in Frankfurt airport discussing the paper …

## Finding free science books from Springer

Today the biggest book fair in the world starts again in Frankfurt, Germany. I thought this might be a good opportunity to do you some good! Springer is one of the most renowned scientific publishing companies in the world. Normally, their books are quite expensive, but in the publishing business, too, Open Access is a …

## Merge MLP And CNN in Keras

In the post (https://statcompute.wordpress.com/2017/01/08/an-example-of-merge-layer-in-keras), it was shown how to build a merge-layer DNN by using the Keras Sequential model. In the example below, I tried to build a merge-layer DNN from scratch with the Keras functional API in both R and Python. In particular, the merge-layer DNN is the average of a multilayer perceptron network and a …

## Looking into Natural Language Processing (NLP)

Natural language processing (NLP) is a branch of artificial intelligence. It helps computers understand, interpret and manipulate human text language. Today there is an enormous amount of email, social media text, video streams, customer reviews, customer support requests, etc. All of this textual data is the perfect place to apply NLP. We need NLP tools …

## EPL Fantasy GW8 Recap and GW9 Algorithm Picks

Our Moneyball approach to the Fantasy EPL (team_id: 2057677)

If this is the first time you have landed on one of my Fantasy EPL blogs, you might want to check out Part 1, Part 2, Part 3, Part 4 and Part 5 first to get familiar with our overall approach and the improvements we’ve made over time. My partner in crime …

## Fear Tells Us What We Have To Do

My deep learning self-study for 09/30/19–10/07/19

I’m a math lecturer and aspiring data scientist hoping to participate in artificial general intelligence research, and this week I decided to start keeping a weekly blog of what I’ve been doing, both for my own reference and potentially to help others on a similar path, following the advice …

## Are you Bilingual? Be Fluent in R and Python!

If you ask me whether to invest your time in R or Python, I will advise you to become fluent in both. I cannot tell you which language (Mandarin, English, Hindustani, Spanish, Arabic, Malay, Russian, Greek, or Hindi) is superior. Each language has its long history of development and merits, just like R and …

## Prototyping an anomaly detection system for videos, step by step using LSTM convolutional…

If we want to treat the problem as a binary classification problem, we need labeled data, and in this case collecting labeled data is hard for the following reasons: Abnormal events are challenging to obtain due to their rarity. There is a massive variety of abnormal events, and manually detecting and labeling such events …

## The top 20 CO₂ polluters, visualized

“The Guardian” had the data and I had a free afternoon

Polluting factory (Patrick Hendry, Unsplash)

“The Guardian” has recently published a list of just 20 companies that are responsible for a third of all the global energy-related CO₂ emissions since 1965, a year when the political and industry leaders acknowledged that burning fossil …

## The AI Box Experiment

What a Simple Experiment Can Teach Us About Superintelligence

Imagine it’s 2040. After years of research and dedicated programming, you believe you have created the world’s first Artificial General Intelligence (AGI): an Artificial Intelligence (AI) that’s roughly as intelligent as humans are across all their intellectual domains. A superintelligence will find a way to get …

## Word cloud 101: The Fundamentals

A Step-By-Step Guide on How and When to Use Them

Word clouds are killer visualisation tools. They present text data in a simple and clear format, that of a cloud in which the size of the words depends on their respective frequencies. As such, they are visually nice to look at as well as easy …

## Selenium Tutorial: Scraping Glassdoor.com in 10 Minutes

I had to scrape job data from Glassdoor.com for a project. Let me tell you how I did it…

What is Scraping? It’s a method for collecting information from web pages.

Why Scraping? Other than the fact that it is fun, Glassdoor’s library provides a limited number of data points. It doesn’t allow you to …

## On the Automation of Time Series Forecasting Models: Technical and Organizational Considerations.

This post is an elaboration on a reply that I originally posted to a question on Cross Validated (Stack Overflow’s sister site for statistics and data science related topics). The original question was: I would like to build an algorithm that would be able to analyze any time series and “automatically” choose the best traditional/statistical forecasting method …

## Shh…The Secret to Building Great AI

When you finish a PhD in machine learning, they take you to a special room and explain that great data is way more important than all the fancy math and algorithms you just learned.

At least, that is how I imagine it. I don’t have a PhD, so I couldn’t say for sure, but it …

## Generating MRI Images of Brain Tumors with GANs

The need for more data within the field of artificial intelligence is significant, especially in medical imaging. In order to produce ways in which we can speed up the process of diagnosing certain disorders through AI, we need many data sets with accurate imaging first, so that they can be fed into neural networks accordingly. …

## Strange Attractors: an R experiment about maths, recursivity and creative coding

[This article was first published on R on Coding Club UC3M, and kindly contributed to R-bloggers].

By Antonio Sánchez

Learning to code can be quite …