## Performance Metrics in ML - Part 3: Clustering

In the first two parts of this series, we explored the main types of performance metrics used to evaluate Machine Learning models. These covered the two major types of ML tasks, Classification and Regression. While these types of tasks make up most of the usual applications, another key category exists: Clustering. To read the … Read more

Categories Featured Excerpt

## Image Processing with Python — Using RG Chromaticity

How to use the Gaussian Distribution for Image Segmentation Masked Red Tree (Image by Author) Segmenting images by their color is an extremely useful skill to have. I’ve previously written an article on how to do this via the RGB and HSV color spaces. However, another effective way to segment images based on their color … Read more

## Predicting HDB Prices Using Machine Learning Algorithms(Part 2)

Photo by Galen Crout on Unsplash Part 1: Predicting HDB prices using Neural Networks. Note: Good day everyone, this story is a follow-up to my previous post. Since then, I have been working on a few ways to improve the model through feature engineering. I realized that in the dataset there were … Read more

Categories Featured Excerpt

## Simple SVD algorithms

The aim of this post is to show some simple, educational examples of how to calculate the singular value decomposition using simple methods. If you are interested in industrial-strength implementations, you might find this useful. Singular value decomposition (SVD) is a matrix factorization method that generalizes the eigendecomposition of a square matrix (n x n) to … Read more
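
As a rough illustration of what a "simple method" for SVD can look like, the sketch below uses power iteration on AᵀA to approximate the largest singular value and its singular vectors. It is not taken from the post itself: the helper names, the example matrix, and the fixed iteration count are assumptions chosen for the illustration, and the starting vector is assumed not to be orthogonal to the dominant right singular vector.

```python
import math


def matvec(m, x):
    """Multiply matrix m (list of rows) by vector x."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def transpose(m):
    """Return the transpose of matrix m."""
    return [list(col) for col in zip(*m)]


def norm(x):
    """Euclidean norm of vector x."""
    return math.sqrt(sum(xi * xi for xi in x))


def dominant_singular_triplet(a, iterations=500):
    """Approximate (sigma, u, v) for the largest singular value of a.

    Power iteration on a^T a converges to the dominant right singular
    vector v; then sigma = ||a v|| and u = a v / sigma.
    """
    at = transpose(a)
    v = [1.0] * len(a[0])  # hypothetical starting vector
    for _ in range(iterations):
        w = matvec(at, matvec(a, v))
        n = norm(w)
        v = [wi / n for wi in w]
    av = matvec(a, v)
    sigma = norm(av)
    u = [x / sigma for x in av]
    return sigma, u, v


# Example matrix (3 x 2), chosen only for illustration.
a = [[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]]
sigma, u, v = dominant_singular_triplet(a)
print(sigma)
```

The remaining singular values could be found by deflating the matrix and repeating the iteration, which is one common way such simple schemes are extended.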

Categories Featured Excerpt

## Functional Modeling and Quantitative System Analysis in Python

We restrict ourselves to the case of a unidirectional functional chain without feedback loops. Feedback loops introduce dynamic effects requiring an extension of the pattern — this will be covered in a follow-up article. Consider the following illustrative functional block diagram. Illustrative functional block diagram (image by author). It shows a transformation chain from polar coordinates … Read more

Categories Featured Excerpt

## Simple GPS data visualization using Python and Open Street Maps

The image below shows the goal of this method. There are three main elements that need to be included:

- Map image: a map in some image format like .png, .jpg, etc.
- GPS records: records that consist of (latitude, longitude) pairs.
- Geographical coordinates: conversion from pixels to geographical coordinates.

The final result of the … Read more
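
The third element, relating pixels to geographical coordinates, can be sketched as a linear mapping over the bounding box of the map image. This is a minimal illustration and not the article's implementation: the function names, the bounding-box values, and the image size are hypothetical, and a linear mapping is only a reasonable approximation for maps that cover a small area.

```python
def geo_to_pixel(lat, lon, bbox, size):
    """Map (lat, lon) to (x, y) image pixels by linear interpolation.

    bbox is (lat_min, lat_max, lon_min, lon_max) for the map image and
    size is (width, height) in pixels; both are assumptions of this sketch.
    """
    lat_min, lat_max, lon_min, lon_max = bbox
    width, height = size
    x = (lon - lon_min) / (lon_max - lon_min) * width
    # Image rows grow downward, so the northern edge (lat_max) maps to y = 0.
    y = (lat_max - lat) / (lat_max - lat_min) * height
    return x, y


def pixel_to_geo(x, y, bbox, size):
    """Inverse of geo_to_pixel: map image pixels back to (lat, lon)."""
    lat_min, lat_max, lon_min, lon_max = bbox
    width, height = size
    lon = lon_min + x / width * (lon_max - lon_min)
    lat = lat_max - y / height * (lat_max - lat_min)
    return lat, lon


# Hypothetical map: 800 x 400 pixels covering 45-46 N and 15-17 E.
bbox = (45.0, 46.0, 15.0, 17.0)
size = (800, 400)
x, y = geo_to_pixel(45.5, 16.0, bbox, size)
print(x, y)
```

Each GPS record can then be passed through `geo_to_pixel` and drawn onto the map image at the resulting position.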

## Predicting Music Genres Using Waveform Features

The dataset was divided into a training set and a test set, so we could score our predictor later on data that was not used in the training process. All the data exploration will be done using the training data, as we keep our test set as unseen data. Now, we will take a closer … Read more
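
The split described above can be sketched in a few lines. This is an illustrative version only, not the article's code: the function name, the 30% test fraction, and the seed are assumptions, and integers stand in for the real audio samples.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder data: ten sample identifiers.
samples = list(range(10))
train, test = train_test_split(samples, test_fraction=0.3, seed=42)
print(len(train), len(test))
```

Keeping the test rows out of every exploration and fitting step is what lets the final score stand in for performance on unseen data.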

Categories Featured Excerpt

## When Did the US Senate Best Reflect the US Population?

The data for this analysis will come from two primary sources. Information on the US Senators will come from the same ProPublica Congress API as the original visualization. Information on the US Population Age Distribution will come from a variety of sources from the US Census Bureau. Setting up the libraries While the workhorse functions … Read more

Categories R Tags Excerpt

## Shiny 1.6

[This article was first published on RStudio Blog, and kindly contributed to R-bloggers]. We are thrilled to announce that Shiny 1.6.0 is now on CRAN! … Read more

Categories R Tags Excerpt

## Enjoy More Rstudio::global(2021)

[This article was first published on RStudio Blog, and kindly contributed to R-bloggers]. It’s been just over a week since we wrapped up the final … Read more

Categories R Tags Excerpt

## All You Need To Know About Merging (Joining) Datasets in R

Merging (also known as joining) two datasets by one or more common ID variables (keys) is a common task for any data scientist. If you get the merge wrong, you can do serious damage to your downstream analysis, so you’d better make sure you’re doing the right thing! In order to do so, I’ll walk you … Read more
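
One way a merge can go wrong is when the key is not unique on one side: matching rows are then repeated for every match. The sketch below shows this with a small inner join written in plain Python rather than R. The function name and the toy records are hypothetical and are not taken from the article.

```python
def inner_join(left, right, key):
    """Join two lists of dicts on key, keeping only keys present in both."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Every left row is paired with each matching right row.
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Toy data: customer 1 has two orders, customer 2 has none,
# and order id 3 has no matching customer.
customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [
    {"id": 1, "item": "book"},
    {"id": 1, "item": "pen"},
    {"id": 3, "item": "cup"},
]
joined = inner_join(customers, orders, "id")
for row in joined:
    print(row)
```

Here Ann appears twice and Bob disappears, so any later count or sum over customers would be off unless the duplication and the dropped rows are accounted for.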

Categories R Tags Excerpt

## You write R packages and functions? This package will change your life!

[This article was first published on R on easystats, and kindly contributed to R-bloggers]. What is it? We are talking about the insight package. It … Read more

Categories R Tags Excerpt

## HAPPY KAGGLING | Data Scientist’s Competition

Kaggle, a popular platform for data science competitions, can be intimidating for beginners to get into. After all, some of the listed competitions have over \$1,000,000 prize pools and hundreds of competitors. Top teams boast decades of combined experience, tackling ambitious problems such as improving airport security or analyzing satellite data. It’s no surprise that … Read more

Categories Featured Excerpt

## Creating beautiful maps with Python

I always liked city maps, and a few weeks ago I decided to build my own artistic versions of them. After googling a little bit, I discovered this incredible tutorial written by Frank Ceballos. It is a fascinating and handy tutorial, but I prefer more detailed, realistic blueprint maps. Because of that, I decided to … Read more

Categories Featured Excerpt

## iPad Pro + Raspberry Pi for Data Science Part 1: First Time OS Initialization

Raspberry Pi + iPad Getting you going for iPad + Pi work from unboxing to OS initialization! A few weeks ago, I wrote a post about the state of being able to perform data science activities on an iPad Pro. While I was truly surprised to see how far Apple has come in recent years, … Read more

Categories Featured Excerpt

## Everything You Ever Wanted to Know About Decision Trees in Python

From first principles to deployment in a production environment including worked examples and easy-to-follow explanations Photo by Shahadat Rahman on Unsplash I have come across many articles on decision tree machine learning algorithms in Python across various mediums but they have always left me wanting more. They either seem to leap in part-way through the … Read more

Categories Featured Excerpt

## Let’s Put Soft Skills on the Spot for a Little While

It is time to close the gap between technical and non-technical people! Master these five soft skills to shine as a data scientist Photo by Farnoosh Abdollahi on Unsplash Many organizations do not experience full gains from introducing data science methods. This is not because the technology is immature. It is simply because the gap … Read more

Categories Featured Excerpt

## Displaying Logging While Drilling (LWD) Image Logs in Python

Utilizing the power of matplotlib to display wellbore image data Logging While Drilling image data displayed using matplotlib in Python. Image created by the author. Borehole image logs are false-color pseudo images of the borehole wall generated from different logging measurements/tools. How borehole images are acquired differs between wireline logging and logging while drilling (LWD). … Read more

## Getting data from the Canada Covid-19 Tracker using R

Last semester (Fall 2020) I taught a new course in healthcare data science for the Johnson Shoyama Graduate School in Public Policy. One of the final topics of the course was querying application programming interfaces (APIs) from within R. The example we used was querying data on the Covid 19 pandemic from the Covid-19 Tracker … Read more

Categories R Tags Excerpt

## Video Tutorial: Build a Video Game in R Shiny with Appsilon’s Pedro Silva

A Video About Building a Video Game in Shiny Did you know you can use R and Shiny to build video games? Well, don’t expect to create the next GTA in R, but you can still develop simple, enjoyable, and easy to play games. Our engineer Pedro Silva used Shiny to create a game called Shiny Decisions, which was … Read more

Categories R Tags Excerpt

## Implementing Random Forest

If you don’t consider runtime, building more trees might help solve this problem a bit, as each tree has another set of randomly selected features. However, this method is not recommended, because Random Forest is prone to sparsity/density by design. If you prefer using tree algorithms, XGBoost is insensitive to sparse/dense data and worth trying, … Read more

Categories Featured Excerpt

## Mastering Git Commands & the logic behind Git

Part IV: Branch Management & Pull Request In this part, we will see how to create new branches, and we will also look at commands such as `git log` for a specific branch, `git checkout`, `git switch`, and `git branch`. We will see how to remove branches locally and in the remote repository. Now, imagine that as … Read more

Categories Featured Excerpt

## How to Speed up Your K-Means Clustering by up to 10x Over Scikit-Learn

Using the Faiss library Chire, CC BY-SA 4.0, via Wikimedia Commons K-Means Clustering is one of the most well-known and commonly used clustering algorithms in Machine Learning. Specifically, it is an unsupervised Machine Learning algorithm, meaning that it is trained without the need for ground-truth labels. Indeed, all you have to do to use it … Read more

Categories Featured Excerpt

## CART: Classification and Regression Trees for Clean but Powerful Models

Machine Learning How does the CART algorithm work, and how to successfully use it in Python? CART model prediction surface. See how the chart was made in the Python section at the end of this story. Image by author. If you want to be a successful Data Scientist, it is essential to understand how different … Read more

## Use R To Pull Energy Data From The Department of Energy’s EIA API

Now that we have our API key and the Series IDs, we can write the R code to access the data. First, import the necessary libraries. We need to use the httr and jsonlite libraries: `install.packages(c("httr", "jsonlite"))`, then `library(httr)` and `library(jsonlite)`. Now, paste your API key into the code. Then paste in the series IDs you want to … Read more

Categories Featured, R Excerpt

## Must-read Guide to Hypothesis Tests You Will Never Use

Hypothesis Testing Pipeline So far, we have talked about the first two steps of hypothesis testing: (1) setting up the null and alternative hypotheses, and (2) identifying error types and setting a significance threshold. Now, we will look at a simple scenario using Python code. Below, we have the tips dataset from Seaborn, which contains 244 records of clients … Read more

## BachGAN: Using GANs to generate original Baroque Music

With playable audio files to listen to the generated music Photo by Marius Masalar on Unsplash GANs are highly versatile, allowing for the generation of anything that can be synthesized into images. By utilizing this feature of GANs, it is possible to generate very unorthodox content, at least from the perspective of machine learning. This … Read more

Categories Featured Excerpt

## Analysing Passing in the Top Five European Domestic Leagues

First, we turn our attention to the leagues that attempt the most passes per 90 minutes of football. Other than when Serie A had the highest passing attempts in the 2017–18 season, the German Bundesliga has reigned supreme when it comes to the number of passes attempted, with the current season the highest … Read more

Categories Featured Excerpt

## Message queues for data UI

IPC to simplify medium and large applications development Image prepared by the author. All rights reserved. Introduction Data Science visualisation normally gravitates around displaying individual charts and graphs. Therefore, there is a lot of material covering graphic libraries and frameworks to generate charts. Most advanced charting and plotting libraries allow interaction with data, but normally … Read more

Categories Featured Excerpt

## Time Series Demystified

Components of Time Series Time series can be decomposed into four components, each expressing a particular aspect of the movement of the values of the time series. They are: Trend, Seasonality, Cycles, and Irregularities. (Time Series components – Image by Author.) Seasonal and Cyclic Variations are the short-term fluctuations, whereas the trend is long-term movements and … Read more
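
To make the idea of separating components concrete, the sketch below builds a synthetic series from a linear trend plus a repeating seasonal pattern and recovers the trend with a centered moving average. It assumes an additive model and leaves out cycles and irregularities. The function name, the period of 3, and the synthetic values are assumptions made for the illustration.

```python
def centered_moving_average(series, window):
    """Estimate the trend with a centered moving average (odd window).

    Positions where the window does not fit are returned as None.
    """
    half = window // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        trend[i] = sum(series[i - half:i + half + 1]) / window
    return trend


# Synthetic additive series: trend 2*t plus a seasonal pattern of period 3
# whose values sum to zero over one full period.
period = 3
seasonal_pattern = [1.0, 0.0, -1.0]
series = [2.0 * t + seasonal_pattern[t % period] for t in range(12)]

trend = centered_moving_average(series, window=period)
# Subtracting the estimated trend leaves the seasonal component.
detrended = [y - m for y, m in zip(series, trend) if m is not None]
print(trend)
print(detrended)
```

Because the window spans exactly one seasonal period, the seasonal values cancel inside each average and only the trend remains.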

Categories Featured Excerpt

## Correlation Analysis in R, Part 2: Performing and Reporting Correlation Analysis

[This article was first published on Data Enthusiast’s Blog, and kindly contributed to R-bloggers]. This is the second part of the Correlation Analysis in R … Read more

Categories R Tags Excerpt

## Productivity Tip: Adding Jupyter and Anaconda prompts to Windows’ right-click context menu

After searching for the possibility online, I discovered that it’s not so cut and dry to do, but is not that complicated either, so bear with me! The first thing to do is discover the PATH to your Anaconda installation. If you used the default location during the installation process it should be located somewhere … Read more

Categories Featured Excerpt

## What is Data Condensation?

Data-efficient learning is an important topic in Data Science and an active area of research. Training large models on big data can take a lot of time and resources, so the question is: can we replace a large data set with a smaller one that will nevertheless contain all useful information from … Read more

Categories Featured Excerpt

## Use R to Exploit Unexplored Data Territories!

Source: https://unsplash.com/@andrewtneel A case study on how to use R to collect data from outside sources. Let’s assume for a project you need data about your customers’ socioeconomic background such as the average income of the neighborhood where they live, education level, employment level, and so on. Typically such data are made available by some … Read more

Categories Featured Excerpt

Decision Intelligence is the Missing link in most Projects Digital transformation is the flavor of the season. Every company has accelerated its efforts to digitize operations, gather intelligence, and rapidly respond to a changing market. McKinsey senior partner Kate Smaje says that organizations are now accomplishing in 10 days what used to take them 10 … Read more

Categories Featured Excerpt

## Eyes on RT-PCR tests with echarts and french open data — COVID-19

[This article was first published on Guillaume Pressiat, and kindly contributed to R-bloggers]. Data on COVID-19 screening tests for France; a shared dashboard … Read more

Categories R Tags Excerpt

## LogicGamesSolver— How to solve logic games using Computer Vision and Artificial Intelligence

So, here we are! We have all the elements to solve the game. Like many other logic puzzle games, Sudoku, Star Battle and Skyscrapers can be described as Constraint Satisfaction Problems (CSPs). A CSP consists of three elements: a set of variables for which we want to find the right value, a domain of possible values for … Read more

Categories Featured Excerpt

## Shinyapp to monitor Covid-19 cases, deaths, recoveries and vaccinations

[This article was first published on Statistics & R, and kindly contributed to R-bloggers]. The data pertaining to cases, deaths and recoveries is pooled … Read more

Categories R Tags Excerpt

## Parsing portfolio optimization

[This article was first published on R on OSM, and kindly contributed to R-bloggers]. Our last few posts on risk factor models haven’t discussed how … Read more

Categories R Tags Excerpt

## Value-based Methods in Deep Reinforcement Learning

Deep reinforcement learning has been a rising field in the last few years. A good approach to start with is the value-based method, where the state (or state-action) values are learned. In this post, a comprehensive review is provided where we focus on Q-learning and its extensions. There are three types of common machine … Read more

Categories Featured Excerpt

## Rolling Regression and Pairs Trading in R

[This article was first published on R – Predictive Hacks, and kindly contributed to R-bloggers]. In a previous post, we have provided an example of … Read more

Categories R Tags Excerpt

…is like driving an old car Photo by Ralph (Ravi) Kayden on Unsplash The algorithm to be discussed will allow you to routinely reach lower cost function values by a factor of about 10¹⁰ compared to the Adam optimization algorithm. Keep on reading to see why it’s like “driving an old car”! There are many … Read more

Categories Featured Excerpt

## Replicate Avro Messages To Target, Conflicting Schema Exists On Target Schema Registry under same subject

This is a follow-up article based on this, where we discussed what to expect when Replicator tries to copy a topic with Avro messages to the target, but the target schema registry already has the same schema ID (which is embedded into the messages) residing in a different subject, and that schema object is completely different from what it had on … Read more

Categories Featured Excerpt

## Practical experimentation tips using the Robinhood/GME fiasco

tl;dr You don’t need (and probably can’t use) an A/B test to know that Robinhood churned its user base by restricting GME trading. I’m going to use the recent Robinhood/GME fiasco as a hypothetical example in sharing a couple of practical ideas around experimentation I’ve picked up over the years. Disclaimer: this is obviously hypothetical. … Read more

Categories Featured Excerpt

## 8 Fundamental Statistical Concepts for Data Science

Probability, in simple terms, is the likelihood of an event occurring. In statistics, an event is the outcome of an experiment, which could be something like the roll of a die or the result of an A/B test. Probability for a single event is calculated by dividing the number of events by the number of … Read more

Categories Featured Excerpt

## Two new versions of gratia released

While the Covid-19 pandemic and teaching a new course in the fall put paid to most of my development time last year, some time off work this January allowed me time to work on gratia 📦 again. I released 0.5.0 to CRAN in part to fix an issue with tests not running on the new … Read more

Categories R Tags Excerpt

## 4 Must-Know Properties of Databases

How to make a database ACID-compliant Photo by Code Mnml on Unsplash Everything about data science starts with data. Without proper and accurate data, data science is like a luxury car with no gas. A well-maintained, easily accessible, scalable, and hard-to-fail database is essential to provide access to data. In order to make sure a … Read more

Categories Featured Excerpt

## Why Data Analysts Should Apply to Data Scientist Jobs

Trends versus predictions Data analysts use data at an aggregate level to find trends and provide recommendations to improve business performance. Data scientists will use data in machine learning models to predict an event typically at a customer level. Data analysts look at the past to find trends while data scientists use the past to … Read more

Categories Featured Excerpt

## Large-Scale Analysis of On-line Conversation about Vaccines before COVID-19

Twitter and news sources played a role in the pre-pandemic world Do you remember the old days of anti-vaccination debates? Will it affect today’s attitude? Data can expose it all Photo by Mehmet Turgut Kirkgoz on Unsplash Discussion over the role and need of vaccines has never been so strong. The COVID-19 pandemic has changed … Read more

Categories Featured Excerpt

## Two Advanced Tips for Event Logs in Power BI

Getting quick answers from logs without having to deploy a server It’s an old trope, but the companies of the world are sitting on a goldmine of insights locked away in data that no-one has ever looked at. This hidden data is nowhere more common than in server or machine event logs and over the … Read more

Categories Featured Excerpt