What is sound?

Machine Learning and data analysis on sound is a growing domain with a huge potential for Data Science use cases. In this article, you will learn what sound is and how it can be represented on a computer. You will also learn what a Fourier Transform is and how it works. You will then see … Read more

Python 3.10 — Five New Features And Considerations

Not just a list, but also examples and considerations. A few days ago, Python 3.10 was finally released. Many articles online were published even before the release. However, I found that most of them just list the new features without much discussion. Therefore, in my article, I’ll try to … Read more

Why does gradient descent work?

You might understand it, but have you seen it? I) Why bother First, let’s get what might be the elephant in the room for some out of the way. Why should you read this blog? First, because it has awesome animations like figure 1 below. Don’t you want to know what’s going on in this … Read more

More Fun with Codex and COBOL

image — shutterstock Codex generates Python and JavaScript from (slightly) more complex COBOL Since I got access to OpenAI’s Codex I have been trying to see what it can do and looking for some experiments that haven’t already been explored. I wrote an article and published a video about some basic tests I did getting … Read more

Reduced-Rank Vector Autoregressive Model for High-Dimensional Time Series Forecasting

Nowadays, with the remarkable development of data collection/availability techniques, we have more opportunities to approach many kinds of time series data in a lot of scientific and industrial fields. There are many types of time series data, including univariate time series, multivariate time series, and multidimensional time series. For multivariate time series, the data has … Read more

Avoiding Data Leakage in Timeseries 101

In what follows, I will try to standardize some practices that can be used to avoid this trap. Some of the practices detailed here are very commonly accepted among ML practitioners, and I will mention those for the sake of completeness. Split first, normalize later. This practice is well known among the ML … Read more
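The "split first, normalize later" rule can be sketched in a few lines. This is a minimal illustration, not the article's own code: the toy series, the 75% split point, and the `normalize` helper are all assumptions made for the example. The point it shows is that the mean and standard deviation are computed on the training portion only and then reused for the test portion, so no information from the future leaks into the scaling.

```python
from statistics import mean, pstdev

# Toy time series (made-up values); the last points behave differently
# from the earlier ones, as often happens with real data.
series = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0, 30.0, 32.0]

# 1) Split first: keep chronological order, no shuffling.
split_at = int(len(series) * 0.75)  # assumed 75/25 split
train, test = series[:split_at], series[split_at:]

# 2) Normalize later: statistics come from the training part only.
train_mean = mean(train)
train_std = pstdev(train)


def normalize(values, mu, sigma):
    """Standardize values with externally supplied statistics."""
    return [(v - mu) / sigma for v in values]


train_scaled = normalize(train, train_mean, train_std)
# The test set reuses the training statistics instead of its own.
test_scaled = normalize(test, train_mean, train_std)

print(train_mean, train_std)
print(test_scaled)
```

Computing the statistics on the full series instead would let the large final values shift the mean used to scale the training data, which is exactly the leakage the rule is meant to prevent.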

Production Planning and Resource Management of Manufacturing Systems in Python

First, let’s flesh out the case study we will be using for this exercise. In this application, we will be discussing biologics manufacturing within pharmaceuticals. Many vaccines such as those for COVID manufactured today are derived from microorganisms modified to produce the bulk material eliciting an immune response necessary to prevent disease transmission and illness. … Read more

Optimal Control for Robotics: Part 2

TOPP-CO When we talked about optimal control problems, we until now evoked two results, the Euler-Lagrange algorithm (leading to the calculus of variations theory), and Pontryagin’s maximum principle. In these two cases, the goal is to transform the problem formulation into something more tractable. These methods can be referred to as indirect methods. Having methods … Read more

Why you should use Bayesian Neural Network?

So now you are able to distinguish SNN and BNN and know the difference between them. As mentioned, BNN is used to measure the uncertainties of the model. In fact, there are two types of uncertainties. Aleatory Uncertainty Aleatoric uncertainty is also known as statistical uncertainty. In Statistics, it is representative of unknowns that differ … Read more

9 Clean Code Patterns I wish I knew earlier

Photo by Ruthson Zimmerman on Unsplash Every programming language has a syntax that you need to follow, otherwise, it will not work. And then there are conventions. You don’t have to follow them; it will still work. However, it makes the life of others way easier if you do follow them. One of the simplest … Read more

How to Use Abstract Classes in Python

We can create an abstract class by inheriting from the ABC class, which is part of the abc module.

    from abc import ABC, abstractmethod

    class BasicPokemon(ABC):
        def __init__(self, name):
            self.name = name
            self._level = 1

        @abstractmethod
        def main_attack(self):
            ...

In the code above, we create a new abstract class called BasicPokemon. We indicate that the method main_attack is an abstract method by … Read more
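To make the effect of the abstract method concrete, here is a small sketch that repeats the BasicPokemon class from the excerpt and adds a subclass. The `Pikachu` subclass, its attack text, and the `try_instantiate_base` helper are hypothetical names introduced only for this illustration; they do not come from the article.

```python
from abc import ABC, abstractmethod


class BasicPokemon(ABC):
    def __init__(self, name):
        self.name = name
        self._level = 1

    @abstractmethod
    def main_attack(self):
        ...


# Hypothetical subclass (not from the article): it becomes instantiable
# because it implements every abstract method of its base class.
class Pikachu(BasicPokemon):
    def main_attack(self):
        return f"{self.name} uses Thunder Shock"


def try_instantiate_base():
    """Return the error message raised when instantiating the abstract class."""
    try:
        BasicPokemon("Missingno")
    except TypeError as exc:
        return str(exc)
    return None


pikachu = Pikachu("Pikachu")
print(pikachu.main_attack())
# Instantiating the abstract base class directly raises TypeError.
print(try_instantiate_base())
```

A subclass that forgets to implement main_attack would fail in the same way as the base class, which is the main practical benefit of declaring the method abstract.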

How to Combine Data in Pandas — 5 Functions You Should Know

1. concat The concat function is named after concatenation, which allows you to combine data horizontally (side by side) or vertically. When you combine data that have the same columns (or, in practice, mostly the same columns), you can call concat with axis set to 0, which is also the default value. >>> … Read more
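As a rough illustration of the two directions described above, the sketch below uses `pandas.concat` on small made-up DataFrames. The frame names and values are assumptions for the example, not the article's data.

```python
import pandas as pd

# Two made-up frames with the same columns.
df_a = pd.DataFrame({"name": ["Ann", "Bob"], "score": [90, 85]})
df_b = pd.DataFrame({"name": ["Cid", "Dee"], "score": [70, 95]})

# Vertical concatenation: axis=0 is the default; ignore_index=True
# renumbers the rows instead of repeating 0, 1, 0, 1.
stacked = pd.concat([df_a, df_b], axis=0, ignore_index=True)

# Horizontal concatenation: axis=1 places frames side by side,
# aligning rows on their index.
df_c = pd.DataFrame({"grade": ["A", "B"]})
side_by_side = pd.concat([df_a, df_c], axis=1)

print(stacked.shape)
print(list(side_by_side.columns))
```

Without ignore_index=True the stacked frame keeps the original row labels, which is worth remembering before selecting rows by label afterwards.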

Causal Inference

Answering causal questions with Python This is the second post in a series of three on causality. In the last post I introduced this “new science of cause and effect” [1], and gave a flavor for causal inference and causal discovery. In this post we will dive further into some details of causal inference and … Read more

How I Doubled My Salary in 4 Months

Photo by Rich Tervet on Unsplash Coding Prep Leetcode is a reliable resource to hone and touch up your coding interview skills. It has a huge bank of questions, and the questions are tagged by topic and by company. I did not want to study aimlessly on leetcode. I had to strategize in how I … Read more

Overcoming ImageNet dataset biases with PASS.

The PASS paper on arXiv | Source: https://arxiv.org/pdf/2109.13228.pdf What is PASS? PASS stands for Pictures without humAns for Self-Supervision. It is a large-scale unlabelled image dataset created to alleviate ethical, legal, and privacy concerns around the famous Imagenet dataset. Issues with ImageNet Some of the current issues with ImageNet are: Issues with ImageNet | Image … Read more

The Life of a Data Analyst

This is definitely the biggest struggle for most projects dealing with a large amount of data: the data everyone wants is not easily available. That’s part of the reason why data engineers and data analysts exist. However, sometimes it would be nice if a simple query — select * from… — was all it took … Read more

Visual Studio Code for Python and Data Science? Top 3 Plugins You Must Have

Are you struggling to find an optimal code editor for Python programming and data science? You’re not alone. There’s a ton of options to choose from — both free and paid — and today I’ll show you my favorite free one. It’s Visual Studio Code — a completely free code editor from Microsoft. It’s by … Read more

5 Data Science Podcasts To Follow in 2022

Hear the voices from within the world of data science. Photo by Lee Campbell on Unsplash Data science is not something you can learn from listening to podcasts. However, what podcasts provide is something you cannot learn from other resources: real life experience. We live in a time where it is extremely easy and cheap … Read more

Fuzzy Systems: Life between the 1’s and 0’s

AI can be fuzzy, let’s look at three ways Fuzzy Systems is a branch of Computational Intelligence hoping to represent the uncertainty of a fuzzy, uncertain world. Fuzzy systems find their inspiration in the imprecision of human language. Fuzzy systems are widely known by Fuzzy Logic, and in this post, we dive into Fuzzy Rule … Read more

R²: An intuitive metric to measure the accuracy of a model

Photo by Tolga Ulkan on Unsplash Modeling data is probably the most frequent task in Machine Learning and Data Science. The question that comes inevitably along with modeling is the prediction accuracy for unseen data points. Various measures for accuracy have been proposed, and each of them has its upsides and downsides. This article explains … Read more

Benefits of the CatBoost Machine Learning Algorithm

Contents: Introduction, Categorical Features are More Powerful, Integrated Plotting Features, Efficient Processing, Summary, References. Out with the old, and in with the new. More specifically, CatBoost [2] may be replacing XGBoost for many data scientists and ML engineers moving forward. Not only is this a great algorithm for data science competitions, but it is also very … Read more

Time-Series Forecasting and Causal Analysis in R with Facebook Prophet and Google CausalImpact

A study of Montreal’s crime forecasting in conjunction with COVID’s lockdown impact This article will be part of my annual dive in R; the idea will be to use two R libraries in time-series forecasting and causal inference. I wanted to write an article for a long time, but I never found the time/resources to … Read more

Salami Coffee Grinding

This experiment was neat and informative for me. I didn’t think grinding was homogenous, but this was definitely not what I expected. I wonder if grinding directly into a basket causes issues with grounds distribution being too coarse on the bottom. I suspect this is why a method like WDT or some kind of distribution … Read more

Clustering Made Easy with PyCaret

Low-code Machine Learning with a Powerful Python Library Photo by Lucas Hobbs on Unsplash The content of this article was originally published in my latest book, Simplifying Machine Learning with PyCaret. You can click here to learn more about it. One of the fundamental tasks in unsupervised machine learning is clustering. The goal in this … Read more

Eliminating AI Bias

Identifying AI Bias and knowing how to prevent it from occurring within the AI/ML pipeline Photo by Sushil Nash on Unsplash The primary purpose of Artificial Intelligence (AI) is to reduce manual labour by using a machine’s ability to scan large amounts of data to detect underlying patterns and anomalies in order to save time … Read more

Data Splitting for Model Evaluation

Time to return to fundamentals. Data splitting, or train-test split, is such a basic concept that we sometimes forget its importance. Photo by Karsten Winegeart on Unsplash Data splitting, commonly known as train-test split, is the partitioning of data into subsets for separate model training and evaluation. In 2017, a Stanford research team under … Read more

MySQL vs Cassandra DB

Another RDBMS and NoSQL showdown… Photo by Daniil Kuželev on Unsplash In my last MySQL vs article, I talked about Redis, which was a database I hadn’t heard about before. This time, I wanted to talk about a database that I’ve heard of but never gotten the time to try out. Cassandra DB is another … Read more

Not Merely Averages: Using Machine Learning to Estimate Heterogeneous Treatment Effects

The original paper does what an econometrics paper has to do — it shows how to get theoretically correct inference. Here, we will not review their proofs but instead focus on how to apply the method. The main idea is the following: We first create predictions of individual treatment effects and then use these predictions … Read more

Multidimensional Scaling (MDS) for Dimensionality Reduction and Data Visualization

Explaining and reproducing Multidimensional Scaling (MDS) using different distance approaches with a Python implementation Dimensionality reduction methods allow examining the dataset along another axis according to the relationship between various parameters such as correlation, distance, and variance in datasets with many features. After this stage, operations such as classification are performed on the dataset with supervised or … Read more

Getting Started with R Shiny

Take the first steps towards becoming an R Shiny expert Photo by Luke Chesser on Unsplash Shiny is an R package that allows programmers to build web applications within R. For someone like me, who found building GUI applications in Java really hard, Shiny makes it much easier. This blog article will get you building … Read more

MetaClean Automates Peak Quality Assessments

Leverage machine learning to detect poor quality integrations and save hours assessing peaks manually Source: Author Even the best metabolomics pipelines have a degree of variance, which can cause poor peak integration between samples. This reduces your ability to accurately quantify a metabolite, and usually means that you have to manually check the quality of … Read more

How to Use W&B Sweeps with LightGBM for Hyperparameter Tuning

Understand different hyperparameters’ effects on model performance Hyperparameter tuning is an important step in the modeling process to improve model performance and to customize model hyperparameters to better suit your dataset. There are different useful tools and packages that help with hyperparameter tuning, using grid, random or Bayesian searches. These search functions return outputs … Read more

Julia Tutorial for 3D Data Science

3D Tutorial Discover the make-it-all alternative to Python, Matlab, R, Perl, Ruby, and C through a 6-step workflow for 3D point cloud and mesh processing. If you are always on the lookout for great ideas and new “tools” that make them easier to achieve, then you may have heard of Julia before. A very young … Read more

Kernel Methods: A Simple Introduction

Machine Learning Must Know The basics of kernel methods and Radial Basis Functions Photo by Markus Winkler on Unsplash The bias-variance dilemma dominates machine learning methods. If a model is too simple, the model will struggle to find appropriate relationships between inputs and outputs. However, if a model is too complex, it will perform better … Read more

5 Practical Data Science Projects That Will Help You Solve Real Business Problems for 2022

What? Recommendation systems are algorithms with an objective to suggest the most relevant information to users, whether that be similar products on Amazon, similar TV shows on Netflix, or similar songs on Spotify. There are two main types of recommendation systems: collaborative filtering and content-based filtering. Content-based recommendation systems recommend particular items based on previously … Read more

Build Your First Mood-Based Music Recommendation System in Python

Audio-Based Recommendations From Scratch Using the Spotify API Photo by Alena Darmel While music genre plays an enormous role in building and displaying social identity, the emotional expression of a song and — even more importantly — its emotional impression on the listener is often underestimated in the domain of music preferences. Genre is not … Read more

Unlocking the Value of AI in Business Applications with ModelOps

Image from Canva AI is fast becoming critical to business and IT applications and operations. Organizations have been investing in artificial intelligence capabilities for years to stay competitive, are hiring the best data scientist teams and are investing more and more in artificial intelligence and machine learning systems. However, implementing AI / ML models is … Read more

From Jupyter Notebook to Deployment — A Straightforward Example

A step-by-step example of taking typical machine learning research code and building a production-ready microservice. This article is intended to serve as a consolidated example of the journey I took in my work as a Data Scientist, beginning from a typical solved problem in Jupyter Notebook format and developing it into a deployed microservice. Although … Read more

4 Key Figures to Understand the Climate Crisis

Image from Unsplash by Christian Lue Global Energy Consumption Trend The Industrial Revolution that began in the 18th century, when agricultural societies became more industrialized and urban, changed human society forever. Courtesy of inventions such as the transcontinental rail network, the cotton gin, electricity, internal combustion engine vehicles, etc., global energy consumption started ramping up from … Read more

The Variable

At its core, data science is a discipline whose purpose is to help people—business owners, doctors, educators, public servants—make good real-world decisions. The assumption is that outcomes improve as our actions become more strongly informed by data. But is that always true? Can data lead us astray? Vishesh Khemani explores this question in depth in … Read more

The Most Amazing Chart!

Why density charts make everything clearer I wrote in a previous post how our attempt to interpret large amounts of data is a doomed effort: Ideally, we would want to explore the entire data set, comprehend every single data point, but that is an impossible undertaking of course. So we revert instead to summary descriptors … Read more

Tesla AI Day 2021 Review — Part 4: Why Tesla Won’t Have an Autonomous Humanoid Robot in 2022

Elon Musk didn’t hesitate to promise Tesla, one of his many companies, will have the prototype of a general-purpose humanoid robot before the end of 2022. But he won’t be able to fulfill that promise. The technicality of the AI day presentation made it very obscure for most viewers. The press didn’t very much echo … Read more