Data Science Austria


When Data is Scarce… Ways to Extract Valuable Insights

Getting to know the data: We use the pandas library for this, and here is what one of the files found in Open Data looks like. (Figure: Sample of the 1999 Freedom of Information Request File.) We have 18 files, one for each year from 1999 to 2016, with 576 requests in total, and amazingly all

Create basic graph visualizations with SeaBorn

When it comes to data preparation and getting acquainted with data, the one step we normally skip is data visualization. While part of that could be attributed to the lack of good visualization tools for the platforms we use, most of us also get lazy at times. For

How a team of deep learning newbies came 3rd place in a kaggle contest

Train model for 5 cycles with learning rate = 1e-2: We’ll train our model for 5 epochs (5 cycles through all our data) using the fit_one_cycle function. Training and validation losses: Notice the metrics being displayed, i.e. training_loss and valid_loss? We use them to monitor model improvements over time. Our

When Clustering Doesn’t Make Sense

Considerations before clustering: Clustering is one of the most widely used forms of unsupervised learning. It’s a great tool for making sense of unlabeled data and for grouping similar data points together. A powerful clustering algorithm can decipher structure and patterns in a data set that are not apparent to

Classification: A Linear Approach (Part 1)

Attempt #2: Linear Discriminant Analysis (LDA). Figure 4: Real dataset (left), LDA fitted dataset (right). Linear Discriminant Analysis (LDA) is an immediate improvement over our first attempt. Figure 4 shows the output from the LDA model on our training set. We no longer see masking, and the number of misclassifications has been greatly reduced.

Need for Explainability in AI and Robotics

Introduction: Breakthroughs in Artificial Intelligence (AI) over the last few years have enabled computers to perform tasks that would have been impossible using traditional software programming. These advancements are now opening up an entirely new world of potential applications for

Which Democratic Candidate Gets the Most News Coverage?

A Data Analysis of the 2020 Presidential Contenders: In the 2016 primaries, one key to Donald Trump’s success was his ability to get media attention. By some estimates, the obsessive wall-to-wall coverage of Trump was the equivalent of 2 billion dollars of free advertising¹ for his campaign. In this age

Data Lake: an asset or a liability?

Build it, they will come: Building a Data Lake should not be an objective in itself but a means to an end, the end being to address digital transformation and data-driven initiatives in a company. Yet many IT departments started building Data Lakes because it’s cool and

Survival analysis and the stratified sample

Part II: Case-control sampling and regression strategy. Due to resource constraints, it is often impractical to perform logistic regression on data sets with millions of observations and dozens (or even hundreds) of explanatory variables. Luckily, there are proven methods of data compression that allow for accurate, unbiased model generation. Traditional logistic