Data Science Austria

Modeling Price with Regularized Linear Model & Xgboost

Developing statistical models for predicting individual house prices. We would like to model the price of a house. We know that the price depends on the location of the house, its square footage, the year built, the year renovated, the number of bedrooms, the number of garages, etc. So those factors contribute to …

Spark Joy — Saying Konmari to your event logs with grammar of data manipulation

When you have a tonne of event logs to parse, what should the go-to weapon of choice be? In this article I’ll share with you my experience with using spark/sparklyr to tackle this nasty problem. At Honestbee, event logs are stored in AWS S3 buckets, delivered to us …

Essentials of Hypothesis Testing and the Mistakes to Avoid

Hypothesis testing is the bedrock of the scientific method and, by implication, scientific progress. It allows you to investigate a thing you’re interested in and tells you how surprised you should be about the results. It’s the detective that tells you whether you should continue investigating your theory or divert …

REDDIT: A one word reason why I support OpenAI’s GPT-2 decision

TLDR: They seeded their webscrape via REDDIT, the mother lode of all ideas tinderboxy and weaponizable. So, it will at the very least be a PR disaster if they release the bigger model. The smaller 117M is nasty as is sans the subtleties! Table of Contents Intro: One paper that I …

Reinforcement Learning from Scratch: Simple Application and Evaluating Parameters in Detail

Introducing RL Algorithms: We have introduced episodes and how to choose actions, but we have yet to demonstrate how an algorithm uses this to learn the best actions. Therefore, we will formally define our first RL algorithm, Temporal Difference 0. Temporal Difference (Zero): Temporal Difference λ is a family of algorithms …

My journey to Performance Analysis 2/2 (HAR files)

This article mostly explains the second type of performance analysis I have performed recently. It is a follow-up to the previous article. This article mainly covers the analysis of HAR files and what types of KPI you can retrieve from them. What are HAR files? Following the W3C GitHub …

Building a WiFi spots Map of networks around you with WiGLE and R

It’s always fun to explore the world around us, and it’s even more fun when you get to explore it with the vision of a data scientist. In this analysis, we are going to identify the open WiFi networks around us and map them on an interactive map. Toolkit: wiglr R …