Learning Disentangled Representations with Invertible (Flow-based) Interpretation Networks

What are disentangled representations? How can we learn disentangled representations for an arbitrary model using flow-based generative models? Fig. 1: The IIN can be applied to arbitrary existing models. IIN takes the representation z learned by the arbitrary model and factorises it into smaller factors such that each factor learns to represent one generative …
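
IIN relies on invertible, flow-based transformations, so the factors can always be mapped back to z exactly. A building block commonly used in flow-based models is the affine coupling layer. The sketch below is a minimal NumPy illustration of that idea, not the IIN authors' architecture: the class name is hypothetical, and fixed random matrices stand in for the learned scale and shift networks. It checks that the inverse recovers the input.

```python
import numpy as np


class AffineCoupling:
    """Minimal affine coupling layer (illustrative and untrained)."""

    def __init__(self, dim, seed=0):
        assert dim % 2 == 0, "dim must be even so it splits into two halves"
        rng = np.random.default_rng(seed)
        half = dim // 2
        # Hypothetical stand-ins for learned scale/shift networks: fixed random linear maps.
        self.w_s = rng.normal(scale=0.1, size=(half, half))
        self.w_t = rng.normal(scale=0.1, size=(half, half))

    def _scale_shift(self, x1):
        s = np.tanh(x1 @ self.w_s)  # bounded log-scale keeps exp(s) well-behaved
        t = x1 @ self.w_t
        return s, t

    def forward(self, z):
        z1, z2 = np.split(z, 2, axis=-1)
        s, t = self._scale_shift(z1)
        y2 = z2 * np.exp(s) + t    # transform the second half conditioned on the first
        log_det = s.sum(axis=-1)   # log|det Jacobian| of the transform, per sample
        return np.concatenate([z1, y2], axis=-1), log_det

    def inverse(self, y):
        y1, y2 = np.split(y, 2, axis=-1)
        s, t = self._scale_shift(y1)  # y1 == z1, so s and t are recomputed exactly
        z2 = (y2 - t) * np.exp(-s)
        return np.concatenate([y1, z2], axis=-1)


layer = AffineCoupling(dim=4)
z = np.random.default_rng(1).normal(size=(3, 4))
y, log_det = layer.forward(z)
print(np.allclose(layer.inverse(y), z))  # True
```

Because the first half passes through unchanged, the layer is invertible no matter how complex the scale and shift networks are; stacking such layers with alternating halves lets every dimension be transformed.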

Designing a Fairness Workflow for Your ML Models

Trustworthy AI. How do you ensure your model is fair from start to finish? Russell Holz contributed to this article. Photo by Gio Bartlett on Unsplash. In the first blog post of this series, we discussed three key points for creating a comprehensive workflow that ensures fair machine learning model outcomes. They are: …

Recursive Least Squares: Learning on the fly

Simple Online Learning Algorithm. One of the most basic yet powerful online learning algorithms in the literature. Image by Author. Online learning is a booming field of AI research. Many problems in today’s world require machines to learn on the fly and improve or adapt as they collect new information. In this …
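
To make the idea concrete, here is a minimal NumPy sketch of the standard recursive least squares update with a forgetting factor. The function name, the default forgetting factor lam and the initialisation scale delta are illustrative choices, not taken from the article. Each incoming sample updates the weights without revisiting earlier data, at a cost quadratic in the number of features.

```python
import numpy as np


def rls_fit(X, y, lam=1.0, delta=1000.0):
    """Recursive least squares: update the weights one sample at a time.

    lam   -- forgetting factor (1.0 means no forgetting)
    delta -- initial scale of the inverse-covariance estimate P
    """
    n_features = X.shape[1]
    w = np.zeros(n_features)
    P = np.eye(n_features) * delta
    for x, target in zip(X, y):
        Px = P @ x
        k = Px / (lam + x @ Px)          # gain vector
        err = target - w @ x             # a priori prediction error
        w = w + k * err                  # weight update
        P = (P - np.outer(k, Px)) / lam  # inverse-covariance update (P stays symmetric)
    return w


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -3.0])  # noise-free synthetic targets
w = rls_fit(X, y)
print(w)  # close to [2, -3]
```

With lam below 1, older samples are progressively down-weighted, which lets the estimate track data whose underlying relationship drifts over time.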

Fourier Transforms: An Intuitive Visualisation

Time-series Data Processing. An intuitive visualization of discrete Fourier transforms applied to simple time-series data. Image by Author. This article visualizes the decomposition of a time-series signal into its harmonics using the Fourier transform. The formula is explained visually to help convey its meaning. The Fourier Transform is an extremely powerful …
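
As a companion to the visual explanation, this is a direct, naive O(N^2) implementation of the DFT formula X[k] = sum over n of x[n] * exp(-2*pi*i*k*n/N), applied to a toy signal built from two harmonics (the signal is an assumption for illustration). The result matches NumPy's np.fft.fft, and the peaks of the amplitude spectrum sit at the two harmonics used to build the signal.

```python
import numpy as np


def dft(x):
    """Naive discrete Fourier transform: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)  # column vector so k * n builds the full N x N matrix
    return np.exp(-2j * np.pi * k * n / N) @ x


# Toy signal: 3 cycles of a sine plus 0.5 times 7 cycles of a cosine, 64 samples.
N = 64
t = np.arange(N)
signal = np.sin(2 * np.pi * 3 * t / N) + 0.5 * np.cos(2 * np.pi * 7 * t / N)

X = dft(signal)
amplitudes = 2 * np.abs(X[: N // 2]) / N  # one-sided amplitude spectrum
print(np.flatnonzero(amplitudes > 0.1))   # [3 7]
```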

A Broad and Practical Exposition of Online Learning Techniques

An overview of online learning techniques, focusing on those that are most effective for the practitioner. In this blog post, I will take a deep dive into the topic of online learning, a very popular research area within the deep learning community. Like many research topics in deep learning, online learning has wide …

Regular Expressions Clearly Explained with Examples

Don’t worry if the regex characters above don’t make much sense to you yet; they merely serve as a reference for the examples we are about to go through. In this section, we will focus on six different examples that will hopefully reinforce your understanding of regular expressions. Specifically, we will be looking …
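
The six examples themselves are not reproduced here, so the following is a separate, hypothetical example using Python's re module: it finds ISO-style dates (YYYY-MM-DD) in a sentence, uses capture groups to split each match, and reorders the groups with a substitution.

```python
import re

text = "Released 2021-03-15, patched 2021-04-02, docs updated on 4/2/21."

# (\d{4})-(\d{2})-(\d{2}): four digits, hyphen, two digits, hyphen, two digits.
# Each pair of parentheses is a capture group: (year, month, day).
pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

print(pattern.findall(text))  # [('2021', '03', '15'), ('2021', '04', '02')]

# Backreferences \3, \2, \1 rewrite each match as day/month/year;
# "4/2/21" does not match the pattern, so it is left unchanged.
print(pattern.sub(r"\3/\2/\1", text))
# Released 15/03/2021, patched 02/04/2021, docs updated on 4/2/21.
```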

Interview Series Part III: Crack Any Business Case Interview

Machine Learning/Data Science Interview Series: how you should approach business case interviews to make it to the next stage. Photo by Firmbee.com on Unsplash. This is my third post in the “Machine Learning/Data Science Interview” series. The first post focused on technical interviews and can be found here. The second post focused …

Function Decorators in Python

Next, let’s look at an example of decorating a function. Assume we have a simple function that returns the sum of two integers:

def sum_up(n, m):
    return n + m

Then we can run it like so:

print(sum_up(3, 7))

Output: 10

Now, assume we want to log what is happening in this part of our codebase …
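
One way the logging step could look is sketched below, assuming the standard-library logging module is acceptable. The decorator name log_calls is hypothetical and not taken from the article; functools.wraps keeps the wrapped function's name and docstring intact.

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def log_calls(func):
    """Decorator that logs a function's arguments and its return value."""
    @functools.wraps(func)  # preserve func.__name__ and func.__doc__
    def wrapper(*args, **kwargs):
        logger.info("Calling %s with args=%s kwargs=%s", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        logger.info("%s returned %r", func.__name__, result)
        return result
    return wrapper


@log_calls  # equivalent to: sum_up = log_calls(sum_up)
def sum_up(n, m):
    return n + m


print(sum_up(3, 7))  # logs the call and the result, then prints 10
```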

Use these Principles to Design Brilliant Dashboards

As a Software Product Analyst working in Data Science, and someone who loves data in general, I frequently create dashboards to convey information about key performance indicators and metrics of interest to project stakeholders. Most recently, I’ve been digging into product usage data at work and using Elasticsearch and Kibana to create dashboards. …

Forecasting when accuracy is no longer the goal: a new world of possibilities.

In this series of articles [1], we’ve so far demonstrated why and how existing forecast accuracy metrics prevent demand planners from delivering more business value. More precisely, using data from the M5 competition, we’ve empirically established that accuracy metrics correlate only weakly with cost-effectiveness. In other words, existing metrics do not take into account the intended …

Bad Data Visualizations and How To Fix Them

Using data visualization principles to fix misleading and uninformative charts. Photo by Firmbee.com on Unsplash. Building data visualizations is the stage in the data science cycle where you get to present your findings after you have worked on understanding and cleaning a dataset. I am sure you have wondered what the best way to go about …

Mathematics Hidden Behind Linear Regression

Exploring statistics using calculus. Hi everyone, this article is about the mathematics used in the linear regression (with gradient descent) algorithm. It was part of my IB HL Mathematics Exploration. Linear regression is a statistical tool that produces a line of best fit for a given dataset analytically. To produce the regression line …
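
A minimal sketch of the gradient-descent version, assuming the mean squared error J(m, b) = (1/n) * sum((y_i - (m*x_i + b))^2) as the cost. Its partial derivatives are dJ/dm = -(2/n) * sum(x_i * (y_i - y_hat_i)) and dJ/db = -(2/n) * sum(y_i - y_hat_i), and each step moves (m, b) against the gradient. The data points, learning rate and epoch count are made up for illustration; np.polyfit supplies the closed-form least-squares line for comparison.

```python
import numpy as np


def fit_line_gd(x, y, lr=0.01, epochs=5000):
    """Fit y ~ m*x + b by gradient descent on the mean squared error."""
    m, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        residual = y - (m * x + b)            # y_i - y_hat_i
        dm = -2.0 / n * np.sum(x * residual)  # dJ/dm
        db = -2.0 / n * np.sum(residual)      # dJ/db
        m -= lr * dm                          # step against the gradient
        b -= lr * db
    return m, b


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

m, b = fit_line_gd(x, y)
m_ls, b_ls = np.polyfit(x, y, 1)  # closed-form least-squares slope and intercept
print(m, b)
print(m_ls, b_ls)  # gradient descent converges to the same line
```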

Cloud Native Data Pipelines using ArgoWorkflow

Leveraging containers and Kubernetes to scale your data engineering pipelines. Whether you are a data engineer, platform engineer, data scientist, or ML engineer, when working with data we all face the challenge of creating pipelines. Despite the wide differences between our data processing goals, one aspect remains constant: “we need the ability to …

A comprehensive study of Mixed Integer Programming with JuMP on Julia (Part 3)

I am solving a problem with an exponential number of constraints using the Branch-and-Cut framework. Photo by Claudio Schwarz on Unsplash. Yes, it’s possible. Even though it’s very counter-intuitive, we can handle a linear program with an exponential number of constraints provided that we have a practical (even approximate) way of separating these constraints. This …

How To Iterate Over Keys and Values in Python Dictionaries

Iterating over both keys and values

If you need to iterate over both keys and values in one go, you can call items(). The method returns a view of the dictionary’s key-value pairs, each given as a (key, value) tuple.

for key, value in my_dict.items():
    print(f'Key: {key}, Value: {value}')

# Output
Key: a, Value: 1
Key: b, Value: …
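
For completeness, here is a self-contained version covering keys, values, and both together. The dictionary contents are hypothetical: the original output is truncated after the second entry, so the third entry is made up for illustration.

```python
my_dict = {"a": 1, "b": 2, "c": 3}  # hypothetical contents

for key in my_dict:  # iterating over a dict directly yields its keys
    print(f"Key: {key}")

for value in my_dict.values():  # values only
    print(f"Value: {value}")

for key, value in my_dict.items():  # (key, value) tuples, unpacked in the loop header
    print(f"Key: {key}, Value: {value}")
```

Since Python 3.7, dictionaries preserve insertion order, so all three loops visit the entries in the order they were added.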

Implementing Deep Convolutional Neural Networks in C without External Libraries

We apply FSRCNN to the Y component of the YUV420 video. Since the human visual system (HVS) is less sensitive to the U and V components, they can be upsampled using simpler interpolation algorithms such as bicubic. Hence, in this post we focus on implementing the CNN in C; for upsampling the U and V components, we simply repeat existing elements to …
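
The article's implementation is in C. Purely to illustrate the "repeat existing elements" idea, here is a NumPy sketch that assumes the 2x-per-dimension chroma subsampling of YUV420, so each U or V sample becomes a 2x2 block (nearest-neighbour upsampling). The function name is hypothetical.

```python
import numpy as np


def upsample_chroma_repeat(plane):
    """2x nearest-neighbour upsampling: repeat each sample along rows, then columns.

    In YUV420 the U and V planes are half the width and half the height of Y,
    so this restores them to the luma resolution.
    """
    return np.repeat(np.repeat(plane, 2, axis=0), 2, axis=1)


u = np.array([[10, 20],
              [30, 40]], dtype=np.uint8)
print(upsample_chroma_repeat(u))
# [[10 10 20 20]
#  [10 10 20 20]
#  [30 30 40 40]
#  [30 30 40 40]]
```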

Automating Machine Learning Using FLAML

Using FLAML to automate the machine learning process. Photo by Pietro Jeng on Unsplash. Machine learning is a process in which we try to solve real-life business problems using a variety of algorithms. Creating a machine learning model is easy, but selecting which model performs best for our data in terms of generalization and performance …

Functions That Generate a Multi-index in Pandas and How to Remove the Levels

Introduction. In this article, we will look at what a multi-index is, where and when to use it, which functions generate a multi-index, and how to collapse it into a single index. But first, let’s get some basic definitions out of the way. An index is a column in a DataFrame that ‘uniquely’ identifies each …
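
A small sketch of two common sources of a multi-index in pandas and ways to flatten them, using a made-up sales table: grouping by several columns creates a row MultiIndex, and aggregating with several functions creates MultiIndex columns. reset_index, droplevel and joining the column tuples are standard options; which of them the article covers is not shown in this excerpt.

```python
import pandas as pd

df = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "product": ["x", "y", "x", "y"],
    "sales": [10, 15, 7, 12],
})

# Grouping by two columns produces a row MultiIndex with levels (store, product).
grouped = df.groupby(["store", "product"]).sum()
print(grouped.index.nlevels)  # 2

# Option 1: move the index levels back into ordinary columns.
flat = grouped.reset_index()

# Option 2: drop one level entirely (the remaining labels may repeat).
by_product = grouped.droplevel("store")

# Aggregating with several functions produces MultiIndex *columns*;
# joining each (column, function) tuple gives flat string names.
agg = df.groupby("store").agg({"sales": ["sum", "mean"]})
agg.columns = ["_".join(col) for col in agg.columns]
print(list(agg.columns))  # ['sales_sum', 'sales_mean']
```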

ROK Defeats Niche Zero Part 3

I use two metrics to evaluate the differences between techniques: Final Score and Coffee Extraction. Final Score is the average of a scorecard of seven metrics (Sharp, Rich, Syrup, Sweet, Sour, Bitter, and Aftertaste). These scores were subjective, of course, but they were calibrated to my tastes and helped me improve my shots. There is …

Demystified: Wasserstein GANs (WGAN)

What is the Wasserstein distance? What is the intuition behind using the Wasserstein distance to train GANs? How is it implemented? Fig. 1: Optimal discriminator and critic when learning to differentiate two Gaussians [1]. In this article, we will look at Wasserstein GANs. Specifically, we will focus on the following: i) What is the Wasserstein distance? ii) Why …
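
To build intuition for the distance itself, here is a small NumPy sketch. For two one-dimensional empirical distributions with the same number of samples, the optimal transport plan pairs the sorted values, so the Wasserstein-1 distance is the mean absolute difference of the sorted samples; the equal-size restriction is an assumption of this sketch. Shifting a Gaussian sample by 3 gives a distance of 3, and the distance keeps growing with the shift even when the two distributions barely overlap, which is the property that makes it attractive as a GAN training signal.

```python
import numpy as np


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    In one dimension the optimal transport plan matches the i-th smallest
    value of `a` with the i-th smallest value of `b`, so W1 is the mean
    absolute difference of the sorted samples.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes samples of equal size")
    return np.mean(np.abs(a - b))


rng = np.random.default_rng(0)
p = rng.normal(0.0, 1.0, size=10_000)
q = p + 3.0  # the same samples shifted by 3

print(wasserstein_1d(p, q))  # 3.0, up to floating-point rounding
```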

Graph Neural Network (GNN) Architectures for Recommendation Systems

Recommendation systems are used to generate a list of recommended items for one or more users. Recommendations are drawn from the available set of items (e.g., movies, groceries, webpages, research papers) and are tailored to individual users based on the user’s preferences (implicit or explicit), item features, and/or past user-item interactions. The quantity and the quality …

Algorithmic Thinking for Data Science

One prominent question that data science students constantly ask is, “Why algorithms?” In all honesty, I do not blame them. Libraries and languages advance every day, and Python with scikit-learn can implement almost any machine learning algorithm in one line of code. Why would one want to know the science and mathematics behind …

Is your data strategy missing the “Mark”?

The benefits realized by any data initiative will be tied to, and limited by, the maturity of an organization’s information literacy. Ask any data leader about their data strategy; they’ll likely start with their modern data architecture, mentioning buzzwords like data lakes, event streaming, or unstructured/semi-structured data. Next, they may dive into the …

Sibling Rivalry and Cointegration in Game of Thrones

Granger Causality between Tyrion and Cersei Lannister. In Python. No Dragons. Testing for time series cointegration, Granger causality, stationarity (ADF and KPSS) and white noise residuals. With a quick helping of vector autoregression. Process Steps: It is a truth universally acknowledged that a data scientist in possession of a somewhat nerdy mind must be in …

Introducing the Synthetic Data Community

Not interested in the research stuff, and just want to know how to get started? Sure: fire up your terminal and type the following:

pip install ydata-synthetic

That’s it. You have all the synthesizers installed with a single command. Now, to walk you through the library’s various uses, we have included multiple examples presented as Jupyter …

Visualizing Spotify Data with Python and Tableau

Create a dynamic dashboard using your streaming data and Spotify’s API. I didn’t need to do this project to find out that I’m still addicted to ‘Ribs’ by Lorde, but it was fun to work on nonetheless. See below to learn how you can easily replicate it using your own Spotify data! Tableau Public Dashboard | Jupyter …

Computing on Coupled Data Streams with Beam

Coupled data streams need to be analyzed together, paying particular attention to simultaneity in event time and to other process-specific variables controlling the participating streams. Data streams are ubiquitous nowadays: IoT devices, ATM transactions, apps, sensors, etc. all push out a steady stream of data. Analyzing these streams of data as they arrive poses challenges, especially …

Basic Molecular Representation for Machine Learning

RDKit supports several fingerprint functions, whose outputs can be used to calculate molecular similarity or as inputs to downstream machine learning models. Figure 8 shows the code for retrieving the RDKit fingerprint and the Morgan fingerprint, and Figure 9 shows the results of these fingerprint functions. Fig. 8. Retrieving RDKit Fingerprint and Morgan Fingerprint. Fig. …
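
Once fingerprints are computed, similarity between molecules is often measured with the Tanimoto coefficient: the number of bits set in both fingerprints divided by the number set in either. The sketch below is plain Python operating on sets of "on" bit indices. It is not RDKit's own API (RDKit provides its own similarity functions), and the bit indices are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of 'on' bit indices."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 1.0  # convention chosen for this sketch: two empty fingerprints are identical
    return len(a & b) / len(a | b)


fp1 = {1, 5, 9, 42, 100}  # hypothetical on-bits for molecule 1
fp2 = {1, 5, 9, 77}       # hypothetical on-bits for molecule 2

print(tanimoto(fp1, fp2))  # 0.5  (3 shared bits / 6 bits in the union)
```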

Will You Switch From PyCharm to DataSpell — the Latest Data Science IDE from JetBrains?

Review of the key features of the DataSpell IDE. Photo by Nick Fewings on Unsplash. Among the common Python IDEs, PyCharm is my favorite for several reasons, to name just a few: 1) PyCharm gives me a more coherent user experience because I used to use Android Studio a lot; 2) great auto-completion intelligence for high …

Feedback Alignment Methods

Backpropagation’s simplicity, efficiency, high accuracy, and fast convergence make it the de facto algorithm for training neural networks. However, there is evidence that such an algorithm could not be implemented biologically by the human brain [1]. One of the main reasons is that backpropagation requires synaptic symmetry in the forward and backward paths. Since …
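
Here is a minimal NumPy sketch of basic feedback alignment on a toy two-layer regression problem; the data, layer sizes and learning rate are assumptions for illustration. The only change from backpropagation is that the output error is sent back to the hidden layer through a fixed random matrix B instead of the transpose of the forward weights W2, so no synaptic symmetry is required.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task: targets come from a random linear map (an assumption of this sketch).
X = rng.normal(size=(256, 10))
T = X @ rng.normal(size=(10, 3))

W1 = rng.normal(scale=0.1, size=(10, 32))  # input -> hidden (trained)
W2 = rng.normal(scale=0.1, size=(32, 3))   # hidden -> output (trained)
B = rng.normal(scale=0.1, size=(3, 32))    # fixed random feedback weights (never trained)

lr = 0.01
losses = []
for _ in range(2000):
    h = np.tanh(X @ W1)  # hidden activations
    y = h @ W2           # linear output
    e = y - T            # output error = gradient of 0.5 * squared error w.r.t. y
    losses.append(0.5 * np.mean(np.sum(e ** 2, axis=1)))

    # Backprop would compute (e @ W2.T); feedback alignment uses the fixed B instead.
    delta_h = (e @ B) * (1 - h ** 2)  # times tanh'(pre-activation)

    W2 -= lr * h.T @ e / len(X)
    W1 -= lr * X.T @ delta_h / len(X)

print(losses[0], losses[-1])  # the loss falls even though B is random and fixed
```

Even though B never changes, training still makes progress, because the forward weights tend to come into alignment with the feedback weights during learning.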

Programming An Intuitive Image Classifier, Part 1

Learning the “Mental” Generative Models. Now it’s time to learn the “mental models”: the generative probability models, one for each class of handwritten digit, that the computer will learn from data in order to make educated inferences at classification time. To create the actual model, we will use a technique from the scikit-learn module called …

Analysis of the emotion data: a dataset for emotion recognition tasks

We’ll start by importing the necessary libraries and visualizing the data. As we already know, the data has been preprocessed, which is a bonus. To start with, we’ll typically look for class imbalance in the dataset and at the length of the tweets. Beyond that, feel free to dive in further. Creating a column with label names. …

Is Hands-On Knowledge More Important than Theory?

We see these debates ebb and flow almost every week on TDS: should data scientists first master high-level concepts and get fluent in, say, probability theory, or dig right into the (occasionally) messy world of model tuning and data cleaning? In truth, the best posts we read and share blend these two sides of data science …

The Confusing Matrix

There are many types of questions we can ask about the various performance metrics. Here we will cover only a few types of questions; you can build on these and come up with many more variations. Type A: Only a single metric is presented in the question, and we condition on the true condition …
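
A worked example of why the conditioning matters, using hypothetical counts (100 actual positives, 900 actual negatives). Recall (sensitivity) and specificity condition on the true condition, while precision conditions on the prediction, so the same classifier can have 90% recall and 95% specificity yet only about 67% precision when positives are rare.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Common metrics computed from the four cells of a binary confusion matrix."""
    return {
        "recall": tp / (tp + fn),        # sensitivity: P(predicted + | actually +)
        "specificity": tn / (tn + fp),   # P(predicted - | actually -)
        "precision": tp / (tp + fp),     # P(actually + | predicted +)
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts: 100 actual positives (tp + fn), 900 actual negatives (tn + fp).
m = confusion_metrics(tp=90, fp=45, fn=10, tn=855)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
# recall: 0.900
# specificity: 0.950
# precision: 0.667
# accuracy: 0.945
```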

Fast Feature Engineering in Python: Image Data

The Albumentations library can also be used to create augmentations for other tasks, such as object detection. Object detection requires us to create bounding boxes around the object of interest. Working with raw data can prove challenging when trying to annotate images with the coordinates of the bounding boxes. Fortunately, there are many …

18 Months Into My Data Science Journey

Lessons learned, mistakes made, and future plans. Photo by Jayden Yoon ZK on Unsplash. I started learning data science around 18 months ago. I remember how it all started. A Covid-19 lockdown had just been imposed on the entire country, and I was stuck at home. I was also on a semester break from college …

14 Tips for Nonprofits Working with Data

Photo taken by Delta Analytics. The first and most important part of starting any data project is framing the question to be answered. This may require focusing on a specific area of your nonprofit’s mission or goals that you want to understand better, or on a question where you believe data will provide actionable insights. This …

What Is The Difference Between predict() and predict_proba() in scikit-learn?

The predict_proba() method. In the context of classification tasks, some sklearn estimators also implement the predict_proba method, which returns the class probabilities for each data point. The method accepts a single argument that corresponds to the data over which the probabilities will be computed, and returns an array of shape (n_samples, n_classes) containing the class probabilities for …
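
A minimal sketch with scikit-learn's LogisticRegression on a made-up one-feature dataset. predict_proba returns one row per sample and one column per class (in the order given by classes_), each row summing to 1, while predict returns a single label per sample; for this estimator that label is the class with the highest probability.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up data: small feature values belong to class 0, large ones to class 1.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

X_new = np.array([[1.0], [4.5]])
proba = clf.predict_proba(X_new)  # shape (n_samples, n_classes)
labels = clf.predict(X_new)       # one class label per sample

print(clf.classes_)        # [0 1] -> column order of proba
print(proba)               # each row sums to 1
print(labels)              # [0 1]
```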

A Simple Interpretation of Logistic Regression Coefficients

Odds ratios simply explained. Image by Ian Dooley (source: Unsplash), thanks Ian! I’ve always been fascinated by logistic regression. It’s a fairly simple yet powerful machine learning model that can be applied to various use cases. It has been widely explained and applied, and yet I haven’t seen many correct and simple interpretations of the …
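
A short worked example, assuming a hypothetical fitted coefficient of 0.4 for one feature. Exponentiating the coefficient gives the odds ratio: the factor by which the odds of the positive class are multiplied for a one-unit increase in that feature, holding the other features fixed. The check at the end confirms this by converting to probabilities and back.

```python
import math

coef = 0.4  # hypothetical fitted coefficient for one feature

odds_ratio = math.exp(coef)  # multiplicative change in the odds per +1 unit of the feature
print(round(odds_ratio, 3))  # 1.492

# Check: start from a predicted probability of 0.2 (odds = 0.25).
p = 0.2
logit = math.log(p / (1 - p))                    # log-odds before the change
p_new = 1 / (1 + math.exp(-(logit + coef)))      # +1 unit adds `coef` to the log-odds
new_odds = p_new / (1 - p_new)
print(round(new_odds / (p / (1 - p)), 3))        # 1.492, the same odds ratio
```

Note that the odds ratio is constant, but the change in probability is not: it depends on the starting probability.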

Rebuild The Chain Rule to Automatic Differentiation

Neural networks are cool. The various machine learning frameworks are even cooler. As you may know, modern neural networks are large formulas with a huge number of variables. Given a problem, these frameworks help you find a suitable set of parameter values through a process called “training”. This training process can be done quite …
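
To show how the chain rule becomes automatic differentiation, here is a minimal reverse-mode sketch in plain Python. The Var class is hypothetical and supports only +, * and sin. Each operation records its inputs along with the local derivatives, and backward() walks the graph from the output, multiplying and accumulating derivatives exactly as the chain rule prescribes.

```python
import math


class Var:
    """A scalar that records how it was computed, for reverse-mode autodiff."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local derivative d self / d parent)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

    def sin(self):
        return Var(math.sin(self.value), ((self, math.cos(self.value)),))

    def backward(self):
        # Topologically order the graph (parents before children) ...
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        # ... then propagate gradients from the output back to the inputs.
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad  # chain rule: upstream grad * local derivative


x = Var(2.0)
y = Var(3.0)
f = x * y + x.sin()  # f(x, y) = x*y + sin(x)
f.backward()
print(x.grad, y.grad)  # df/dx = y + cos(x), df/dy = x
```

Because x is used twice, its gradient is the sum of the contributions from both paths, which is what the += accumulation implements.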

Deduplication of streaming data in Kafka ksqlDB or How to find distinct events in streaming…

Or how to find distinct events in a streaming pipeline. Photo by Tamanna Rumee on Unsplash. As stated by its creators, “ksqlDB is a database that is purpose-built for stream processing applications”. When it first came into play, it certainly changed the world of streaming data processing. Being built on Kafka Streams, it allows you …

Don’t Let Tooling and Management Approaches Stifle Your AI Innovation

Image from Canva. It is no coincidence that companies are investing in AI at unprecedented levels at a time when they are under tremendous pressure to innovate. The artificial intelligence models developed by data scientists give enterprises new insights, enable new and more efficient ways of working, and help identify opportunities to reduce costs and …