Time series anomaly detection with “anomalize” library

As with any other machine learning algorithm, preparing the data is probably the most important step you can take towards anomaly detection. On the positive side, though, you’ll likely use only one column at a time. So, unlike other machine learning techniques with hundreds of features, you can focus on the one column that is being …

Useful sites for finding datasets for Data Analysis tasks

Let’s now look at some useful sites for finding open, publicly available datasets quickly and without much hassle. (Figure: Screenshot of the Google Dataset Search page. Image by Author.) Google Dataset Search is a search engine dedicated to finding datasets. It searches over metadata from data providers. This implies that …

Bias, Variance and How they are related to Underfitting, Overfitting

In the first image, we try to fit the data using a linear equation. The model is rigid and not at all flexible. Because a linear equation has so little flexibility, it cannot predict the samples (training data) well, so the error rate is high and the model has high bias, which …

Top 10 Libraries every Java Developer should know

A curated list of the essential libraries for Java and JVM software development. (Photo by Min An from Pexels.) Java is the number one programming language in business application development. It is also one of the top programming languages. One of the key features of Java is that it has a feature-rich and vast …

Will The Next Hurricane Hit My Home?

A Data Analysis Based On Historical Storm Trajectories. (Photo by Shashank Sahay on Unsplash.) We’re in the midst of a very active hurricane season, with Hurricane Sally making landfall last night and several other tropical storms brewing in the Atlantic Ocean. The big question on everyone’s mind is always: “Will the next hurricane hit close …

Will AutoML take away my job? What is it?

(Photo by Alexandre Debiève on Unsplash.) Developing a model involves many repetitive and tedious tasks within the Model Development Life Cycle (MDLC), such as tuning hyperparameters and generating and selecting features. These tasks consume a lot of development time because they are iterative, and various permutations and combinations have to …

Continual learning — where are we?

“Continuous learning ability is one of the hallmarks of human intelligence.” (Lifelong Machine Learning) As the deep learning community aims to bridge the gap between human and machine intelligence, the need for agents that can adapt to continuously evolving environments is growing more than ever. This was evident at ICML 2020, which hosted …

My Odyssey, Finding The Most Popular Python Function

We all love Python, but how often do we use which mighty functionality? An article about my quest to figure it out. (Figure: The most-mentioned Python functions inside Python repositories, calculated via GitHub commits. Image by Author.) The other day, while I was running some zip() with some lists through a map(), I couldn’t stop …

Oh, the Places You’ll Go in Monopoly

This updated equation, which fully describes the probabilities of moving between any two spaces on the board, is fairly easy to solve, since all terms for P(R) and P(M|R) have been determined earlier (shown in Figure 2 and Table 1). Transition matrix. For any given space i, there are a total of 40 different destinations …

Getting Started with Python Classes

In computer programming, classes are a convenient way to organize data and functions such that they are easy to reuse and extend later. In this post, we will walk through how to build a basic class in Python. Specifically, we will discuss the example of implementing a class that represents Instagram users. Let’s get started! …

Gold-Mining Week 2 (2020)

[This article was first published on R – Fantasy Football Analytics, and kindly contributed to R-bloggers].

The tale of Ultra Modern Visualizations – Sankey chart

Let’s explore the use case of Sankey charts in this series on Advanced Visualisation Techniques for Data Science. Data science has been gaining momentum over the past couple of years. It’s undoubtedly one of the hottest fields today. In this article, I am going to discuss an essential part of …

Amazon Kinesis Data Analytics is now available in the Europe (Milan) AWS region

Amazon Kinesis Data Analytics is the easiest way to transform and analyze streaming data in real time with Apache Flink. Apache Flink is an open source framework and engine for processing data streams. Amazon Kinesis Data Analytics reduces the complexity of building and managing Apache Flink applications. Amazon Kinesis Data Analytics for Apache Flink integrates …

D3.js — How to Make a Beautiful Bar Chart With The Most Powerful Visualization Library

First things first: the data. We’ll store some dummy data in the JSON format. I’ve named mine sales.json and it looks like this: [{"Period": "Q1-2020", "Amount": 1000000}, {"Period": "Q2-2020", "Amount": 875000}, {"Period": "Q3-2020", "Amount": 920000}, {"Period": "Q4-2020", "Amount": 400000}] And that’s it for the data. Next, we need the HTML file. Don’t worry if you don’t know …

Your ML Algorithm Is Not Performing Well

(Photo by Rob Schreckhise on Unsplash.) How to Detect the Problem. We spend so much time developing a machine learning algorithm. But if that algorithm performs poorly after deployment, it becomes frustrating. The question is what the next step should be when the algorithm does not work as expected. What went wrong? Was the number of …

Learning Data Science with RStudio Cloud: A Student’s Perspective

[This article was first published on RStudio Blog, and kindly contributed to R-bloggers]. On August 5, 2020, RStudio announced the general availability of RStudio Cloud, …

High School Swimming State-Off Tournament Championship California (1) vs. Texas (2)

Two swimmers come into this meet on Swimmer of the Meet streaks. Lillie Nordmann has won two in a row for the Texas girls, and Zoie Hartman has won two in a row for the California girls. The Swimmer of the Meet criteria are still the same as they’ve been for the entire State-Off. We’ll look for …

Improving a Famous NFL Prediction Model

You should now have a better understanding of how the FiveThirtyEight model works. It is a simple but effective model for predicting game outcomes, win probabilities, and point spreads. However, the model can be improved even further, and improve it I did! I have found other adjustments that the people at …

Find Highly Correlated Stocks with Python!

Whether you are crafting a portfolio and want to incorporate diversification or trying to find stocks for a pairs trading strategy, the ability to calculate the correlation between the movements of two stocks is a must. Having a portfolio of stocks that are not closely correlated allows you to tap into differently performing assets that …

Export data from Cloud SQL without performance overhead

While there are a variety of reasons to export data out of your databases – such as to maintain backups, meet regulatory data retention policies, or feed downstream analytics – exports can put undue strain on your production systems, making them challenging to schedule and manage. To eliminate that resource strain, we’ve launched a new …

Risk Scoring in Digital Contact Tracing Apps

Abstract: We attempt a mathematical description of the risk scoring algorithm used in the Google and Apple exposure notification framework (v1). This algorithm is used in many digital contact tracing apps for COVID-19, e.g., in Germany or Switzerland. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. The markdown+Rknitr source code of …

How to Fail The Azure Fundamentals Certification

The amount of time you need to prepare for the exam will vary significantly. If you’re a senior cloud engineer who’s transferring from AWS to Azure, you probably don’t need more than a day, tops. If you’re wondering why Microsoft offers a certification covering those fluffy white things in the sky, give yourself a …

Association Rule Mining: What Frequent Itemsets is all about?

(Photo by Kelly Sikkema on Unsplash.) Finding the frequency of occurrence of unique combinations of items. To understand frequent itemsets, one first needs to understand “frequent” and “itemsets”. Let us first look at what itemsets mean. Simply put, an itemset is a group of items that appear together in a transaction or record. The size of …

Ultimate Guide to Merging/Joining Data in Pandas

Introduction. The goal of this article is for you to come away with a strong knowledge of combining data in pandas, using precise methods suited to any question you want to ask about your data. With each data science project or dataset, you want to perform several analyses and create plots to find insights. Often, the …

Azure Container Instances – Docker integration now in Docker Desktop stable release

We’re happy to announce that the new stable release of Docker Desktop includes the Azure Container Instances – Docker integration. Install or update to the latest release and get started deploying containers to Azure Container Instances (ACI) today. Azure Docker integration. The Azure Docker integration enables you to deploy serverless containers to Azure Container Instances (ACI) …

Build a scalable security practice with Azure Lighthouse and Azure Sentinel

The Microsoft Azure Lighthouse product group is excited to launch a blog series covering areas in Azure Lighthouse where we are investing to make our service provider partners and enterprise customers successful with Azure. Our first blog in this series covers a top area of consideration for companies worldwide: security, with a focus on how Azure Lighthouse …

Distributed Deep Learning Training with Horovod on Kubernetes

You may have noticed that even a powerful machine like the Nvidia DGX cannot train a deep learning model quickly enough, not to mention the long wait just to copy data into the DGX. Datasets are getting larger, GPUs are disaggregated from storage, workers with GPUs need to coordinate for model …

Understanding Signals. It’s not that complicated.

Sound is a wave that results from the back-and-forth vibration of the particles of the medium through which the sound wave moves. These sound waves consist of a repeating pattern of high-pressure and low-pressure regions. They are also referred to as pressure waves. (Figure: An example of a sine wave.) When we hear something, our brain …

Best places for new businesses in Florianópolis, Brazil (North Shore): a Foursquare data analysis

Which opportunities could the data show? Introduction. The Background. Brazil is known for Rio de Janeiro and São Paulo. But one of the southernmost states of Brazil has a peculiar capital on an island. Or rather, most of the capital is on this island. I’m talking about Florianópolis. Unlike the idyllic idea of an …

How to Build a Machine Learning Model to Identify Credit Card Fraud in 5 Steps

When starting a new modeling project, it is important to start with EDA in order to understand the dataset. In this case, the credit card fraud dataset from Kaggle contains 284,807 rows with 31 columns. This particular dataset contains no nulls, but note that this may not be the case when dealing with datasets in …

Deutsche Börse Group continues its journey to the cloud

The word “transformation” brings many things to mind, like innovation, agility, and change. Consistency and stability are probably not as high on the list of synonyms, but for regulated industries undergoing digital transformation initiatives, those characteristics are just as critical; in fact, they’re critically important for digital transformation to succeed. Deutsche Börse Group, an international financial …

Automate Data Preparation using Google Colab: Read and Process Citi Bike Data in Zip File

(Photo by Anthony Fomin on Unsplash.) From time to time I get requests from colleagues to process some large data files and report some statistics from the data. Since they rely on Excel as their main data processing/analysis tool and don’t use Python, R or SQL, reading and processing data files with more than 1,048,576 …

Building a Command Line Application to Check For Open Source Vulnerabilities

Vulnerabilities in open source software, programming languages, or projects are a big deal, as a single exploit could cause a lot of chaos and lead to the loss of thousands of dollars for big organizations. Many companies have been paying attention to vulnerabilities in the software, dependencies, and languages they use to power their …

Active and Semi-Supervised machine learning: Aug 31 — Sep 11

Explainable AI is a big thing nowadays. In ALEX: Active Learning based Enhancement of a Model’s Explainability, the authors use a novel kind of query strategy: prioritization of instances that are “difficult to explain”. (They use the SHAP framework to determine the latter.) Their goal is to arrive at a classifier that is optimized for …

Predicting Poetic Movements

Analyzing and categorizing poetry to prepare for content-based recommendation. (Image by abi ismail on Unsplash.) Within written media, poetry is often regarded as enigmatic, frivolous, or too niche. As a result, poems (even by established poets) are often overlooked by larger publishers and literature-focused websites alike. (The anti-capitalist nature of poetry may play a role …

4.5 years of a relationship, in Facebook activity

Analysing data from Facebook interactions and messages with my girlfriend. I recently downloaded all the data Facebook has about me. There were many interesting (read: cringeworthy) things I found, but one area I was particularly keen to examine was how the data reflects the progression of my relationship with my girlfriend. In this article, I’m …

Machine Learning Tasks on Graphs

A graph is an interesting type of data. One might think that we can make predictions and train a model in the same way as with “normal” data. Surprisingly, machine learning tasks are defined quite differently on graphs, and we can categorize them into 4 types: node classification, link prediction, learning over the whole graph, …

Learn to Create a Doodle Draw Game on Android

With PyTorch and Deep Java Library. (Figure: QuickDraw Dataset. Source: https://github.com/googlecreativelab/quickdraw-dataset/blob/master/preview.jpg) The objective of a doodle draw game is to race to create a drawing of a particular item, like a house or a cat, as fast as you can. While the drawing part is simple, before the advent of deep learning it would have been impossible …

The Most Efficient Way to Read Code Written by Someone Else

As developers, regardless of our specialty, whether it is data science, front end, or back end, we spend more than 75% of our time reading code written by others. That can be a demanding task. That being said, the ability to read others’ code efficiently is one of the skills that could make …

Predicting pneumonia outcomes: Results (using DataRobot API)

Performance of models. The AUC was high and similar for all models. For this study, it was more important to identify as many as possible of the patients who became worse despite seeing a doctor (i.e. true positives). Identifying these patients with poor outcomes would allow better intervention to be provided to increase their chances of a better clinical evolution. …

How to Explore Data: {DataExplorer} Package

Let’s get started by loading our packages and importing a bit of data.

2.1 Load Packages

    # Core Packages
    library(tidyverse)
    library(tidyquant)
    library(recipes)
    library(rsample)
    library(knitr)

    # Data Cleaning
    library(janitor)

    # EDA
    library(skimr)
    library(DataExplorer)

    # ggplot2 Helpers
    library(scales)
    theme_set(theme_tq())

2.2 Import Data

For our case study we are using data from the Tidy Tuesday Project archive. Each record …