Designing a Fairness Workflow for Your ML Models

Trustworthy AI: How do you ensure your model is fair from start to finish? Russell Holz contributed to this article. Photo by Gio Bartlett on Unsplash. In the first blog post of this series, we discussed three key points for creating a comprehensive fairness workflow for machine learning model outcomes. They are: … Read more

Recursive Least Squares: Learning on the fly

Simple Online Learning Algorithm: One of the most basic yet powerful online learning algorithms in the literature. Image by Author. Online learning is a booming field of AI research. Many problems in today’s world require machines to learn on the fly and improve or adapt as they collect new information. In this … Read more

Fourier Transforms: An Intuitive Visualisation

Time-series Data Processing: An intuitive visualization of discrete Fourier transforms applied to simple time-series data. Image by Author. This article visualizes the decomposition of a time-series signal into its harmonics using the Fourier transform. The formula is explained visually to help convey its meaning. The Fourier Transform is an extremely powerful … Read more
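To make the idea of decomposing a signal into harmonics concrete, here is a minimal sketch of a discrete Fourier transform in plain Python. It is not the article's code: the test signal (two sine waves with 3 and 5 cycles per window), the sample count, and the names dft, signal, and amplitudes are all assumptions chosen for illustration.

```python
import cmath
import math


def dft(samples):
    """Naive O(n^2) discrete Fourier transform of a list of real samples."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Hypothetical test signal: amplitude 1.0 at 3 cycles plus amplitude 0.5 at 5 cycles.
n_samples = 32
signal = [
    math.sin(2 * math.pi * 3 * t / n_samples)
    + 0.5 * math.sin(2 * math.pi * 5 * t / n_samples)
    for t in range(n_samples)
]

spectrum = dft(signal)

# For a real signal, the amplitude of harmonic k (0 < k < n/2) is 2*|X[k]|/n.
# The k = 0 (mean) term would need a factor of 1 instead of 2; it is zero here.
amplitudes = [2 * abs(c) / n_samples for c in spectrum[: n_samples // 2]]

for k, amplitude in enumerate(amplitudes):
    if amplitude > 1e-9:
        print(f"harmonic {k}: amplitude {amplitude:.3f}")
```

Only harmonics 3 and 5 carry energy, which is the decomposition the excerpt describes. A production version would typically use an FFT routine rather than this quadratic-time loop.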

A Broad and Practical Exposition of Online Learning Techniques

An overview of online learning techniques, focusing on those that are most effective for the practitioner. In this blog post, I will take a deep dive into online learning, a very popular research area within the deep learning community. Like many research topics in deep learning, online learning has wide … Read more

Regular Expressions Clearly Explained with Examples

Don’t worry if the regex characters above don’t make much sense to you now — they merely serve as references for the examples that we are about to go through. In this section, we will be focusing on 6 different examples that will hopefully reinforce your understanding of regular expressions. Specifically, we will be looking … Read more

Interview Series Part III: Crack any Business Case Interviews

Machine Learning/Data Science Interview Series: How you should approach business case interviews to make it to the next stage. Photo by Firmbee.com on Unsplash. This is my 3rd post in the “Machine Learning/Data Science Interview” series. The first post I wrote focused on technical interviews and can be found here. The second post I wrote focused … Read more

Function Decorators in Python

Next, let’s look at an example of decorating a function. Assume we have a simple function that returns the sum of two integers:

def sum_up(n, m):
    return n + m

Then we can run it like so:

print(sum_up(3, 7))

Output: 10

Now, assume we want to log what is happening in this part of our codebase … Read more
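The excerpt stops just before the logging step, so here is one way it might look, as a sketch rather than the article's own code. The decorator name log_calls and its message format are hypothetical; only sum_up comes from the excerpt.

```python
import functools


def log_calls(func):
    """Hypothetical decorator: print the arguments and result of every call."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print(f"{func.__name__} called with args={args}, kwargs={kwargs}; returned {result}")
        return result

    return wrapper


@log_calls
def sum_up(n, m):
    return n + m


total = sum_up(3, 7)  # logs the call, then returns the sum as before
```

Using functools.wraps is a design choice worth keeping: without it, sum_up.__name__ would report "wrapper", which makes logs and tracebacks harder to read.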

Use these Principles to Design Brilliant Dashboards

As a Software Product Analyst working in Data Science, and someone who loves data in general, I frequently create dashboards to convey information about key performance indicators and metrics of interest to project stakeholders. Most recently, at work, I’ve been digging into product usage data and have been using Elasticsearch and Kibana to create dashboards. … Read more

Forecasting when accuracy is no longer the goal: a new world of possibilities.

In this series of articles[1], we’ve so far demonstrated why and how existing forecast accuracy metrics prevent demand planners from delivering more business value. More precisely, using data from the M5 competition, we’ve empirically established the weak correlation between accuracy metrics and cost-effectiveness. In other words: existing metrics do not take into account the intended … Read more

Bad Data Visualizations and How To Fix Them

Using data visualization principles to fix misleading and uninformative charts. Photo by Firmbee.com on Unsplash. Building data visualizations: the stage in the data science cycle where you get to present your findings after you have worked on understanding and cleaning a dataset. I am sure you have wondered what the best way to go about … Read more

Mathematics Hidden Behind Linear Regression

Exploring statistics using calculus. Hi everyone, this is about the mathematics used in the linear regression (with gradient descent) algorithm. This was part of my IB HL Mathematics Exploration. Linear regression is a statistical tool that produces a line of best fit for a given dataset analytically. To produce the regression line … Read more
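As a rough illustration of linear regression with gradient descent, the sketch below fits a line y = m*x + b by repeatedly stepping against the gradient of the mean squared error. The function name fit_line, the learning rate, the epoch count, and the toy data are all assumptions made for this example, not taken from the article.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = m*x + b by gradient descent on the mean squared error."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Partial derivatives of MSE = (1/n) * sum((y - (m*x + b))**2)
        grad_m = (-2.0 / n) * sum(x * (y - (m * x + b)) for x, y in zip(xs, ys))
        grad_b = (-2.0 / n) * sum(y - (m * x + b) for x, y in zip(xs, ys))
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.4f}, intercept={intercept:.4f}")
```

The learning rate has to be small enough for the updates to converge; the value here was picked for this particular dataset and would likely need tuning on other data.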

Cloud Native Data Pipelines using ArgoWorkflow

Leveraging containers and Kubernetes to scale your data engineering pipelines. Whether you are a data engineer, platform engineer, data scientist, or ML engineer, when working with data, we all face the challenge of creating pipelines. Despite the disparities between our data processing goals, one aspect remains constant: “we need the ability to … Read more

Automate Application Migration with GKE Autopilot and Migrate for GKE

Many developers today are choosing to develop and deploy new greenfield applications on Google Kubernetes Engine (GKE). And it’s easy to understand why—GKE offers a great combination of scalability, security, and ease of use.  However, what might surprise a lot of people is that GKE is also often chosen to run existing brownfield workloads. For … Read more

tsbox 0.3.1: extended functionality

[This article was first published on cynkra, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t. The tsbox package provides a set of tools that are agnostic towards existing … Read more

Amazon MSK now supports running multiple authentication modes and updates to TLS encryption settings

Amazon Managed Streaming for Apache Kafka (Amazon MSK) now supports the simultaneous use of multiple authentication modes and updates to encryption-in-transit settings for Amazon MSK clusters. These features allow you to migrate your clients seamlessly from one authentication mode to another and update encryption settings to match those changes.  With this launch, you can now activate … Read more

A comprehensive study of Mixed Integer Programming with JuMP on Julia (Part 3)

I am solving a problem with an exponential number of constraints with the Branch-and-Cut framework. Photo by Claudio Schwarz on Unsplash. Yes, it’s possible. Even though it’s very counter-intuitive, we can handle a linear program with an exponential number of constraints provided that we have a practical (even approximate) way of separating these constraints. This … Read more

How To Iterate Over Keys and Values in Python Dictionaries

Iterating over both keys and values: if you need to iterate over both keys and values in one go, you can call items(). The method returns a view of the dictionary’s key-value pairs, each as a tuple of the form (key, value).

for key, value in my_dict.items():
    print(f'Key: {key}, Value: {value}')

# Output
Key: a, Value: 1
Key: b, Value: … Read more
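Because the excerpt's output is cut off, here is a small self-contained sketch of the same pattern. The dictionary inventory and its contents are hypothetical stand-ins, not the article's my_dict.

```python
# Hypothetical dictionary used only for illustration.
inventory = {"apples": 3, "pears": 5}

pairs = []
for key, value in inventory.items():
    # Each item is a (key, value) tuple, unpacked directly in the for statement.
    print(f"Key: {key}, Value: {value}")
    pairs.append((key, value))
```

Dictionaries preserve insertion order in Python 3.7 and later, so the pairs come back in the order the keys were added.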

Implementing Deep Convolutional Neural Networks in C without External Libraries

We apply FSRCNN to the Y component of the YUV420 video. Since the HVS is less sensitive to U and V, these components can be upsampled using simpler interpolation algorithms such as bicubic. Hence, in this post, we focus on implementing the CNN in C; to upsample the U and V components, we simply repeat existing elements to … Read more

Automating Machine Learning Using FLAML

Using FLAML for automating the machine learning process. Photo by Pietro Jeng on Unsplash. Machine learning is a process where we try to solve real-life business problems using a different set of algorithms. Creating a machine learning model is easy, but selecting which model performs the best for our data in terms of generalization and performance … Read more

AWS RoboMaker now supports container images in simulation

AWS RoboMaker, a service that allows customers to simulate robotics applications at cloud scale, now supports container images. This feature enables customers to use the container tools that they are already familiar with to build and package their code for running simulations in RoboMaker.   With container support, you can now take advantage of container … Read more

What type of data processing organization are you?

Every organization has its own unique data culture and capabilities. Yet each is expected to use technology trends and solutions in the same way as everyone else. Your organization may be built on years of legacy applications, you may have developed a considerable amount of expertise and knowledge, yet you may be asked to adopt … Read more

Common mistakes we Data Scientists make

DISCLAIMER I am a data scientist and have made all these mistakes, but I have had the privilege of sitting on the managerial, project lead and developer side of the fence, and here are some tips to getting your stakeholders (i.e. anyone involved in the project team or has an interest in the success of … Read more

Fast and {furrr}-ious: real time economic monitoring using R

Mango’s ‘Meet-Up’ at Big Data London on 22nd September features guest speaker Adam Hughes, Data Scientist for The Bank of England, whose remit involves working with incredibly rich datasets, feeding into strategic decision-making on monetary policy. You can read about Adam’s incredibly interesting data remit and his team’s journey through Covid-19, in this short Q&A. Can … Read more

Functions That Generate a Multi-index in Pandas and How to Remove the Levels

Introduction: In this article, we will look at what a multiindex is, where and when to use it, functions that generate a multiindex, and how to collapse it into a single index. But first, let’s get some basic definitions out of the way. An index is a column in a DataFrame that ‘uniquely’ identifies each … Read more

ROK Defeats Niche Zero Part 3

I use two metrics for evaluating the differences between techniques: Final Score and Coffee Extraction. Final score is the average of a scorecard of 7 metrics (Sharp, Rich, Syrup, Sweet, Sour, Bitter, and Aftertaste). These scores were subjective, of course, but they were calibrated to my tastes and helped me improve my shots. There is … Read more

Demystified: Wasserstein GANs (WGAN)

What is the Wasserstein distance? What is the intuition behind using Wasserstein distance to train GANs? How is it implemented? Fig. 1: Optimal discriminator and critic when learning to differentiate two Gaussians[1]. In this article we will read about Wasserstein GANs. Specifically we will focus on the following: i) What is Wasserstein distance?, ii) Why … Read more

Graph Neural Network (GNN) Architectures for Recommendation Systems

Recommendation systems are used to generate a list of recommended items for a given user or users. Recommendations are drawn from the available set of items (e.g., movies, groceries, webpages, research papers) and are tailored to individual users based on the user’s preferences (implicit or explicit), item features, and/or past user-item interactions. The quantity and the quality … Read more

Applications are open: 2022 summer school on stats methods for ling and psych

[This article was first published on Shravan Vasishth’s Slog (Statistics blog), and kindly contributed to R-bloggers]. Applications are now open for the sixth SMLP … Read more

Algorithmic Thinking for Data Science

The one prominent question that data science students constantly ask is, “Why Algorithms?” And with all honesty, I do not blame them. You see libraries and languages advancing every day, Python with scikit-learn can implement almost any data structure in one line of code. Why would one want to know the science and mathematics behind … Read more

Is your data strategy missing the “Mark”?

The benefits realized by any and every data initiative will be coupled to and limited by the maturity of an organization’s information literacy. Ask any data leader about their data strategy; they’ll likely start with their modern data architecture, mentioning buzzwords like data lakes, event streaming, or unstructured/semi-structured data. Next, they may dive into the … Read more

Sibling Rivalry and Cointegration in Game of Thrones

Granger Causality between Tyrion and Cersei Lannister. In Python. No Dragons. Testing for time series cointegration, Granger causality, stationarity (ADF and KPSS) and white noise residuals. With a quick helping of vector autoregression. Process Steps: It is a truth universally acknowledged that a data scientist in possession of a somewhat nerdy mind must be in … Read more

Introducing the Synthetic Data Community

Not interested in the research stuff and just want to know how to get started? Sure, fire up your terminal and type in the following:

pip install ydata-synthetic

That’s it. You have all the synthesizers installed in a single command. Now, to walk you through various library usages, we have included multiple examples presented as Jupyter … Read more

Visualizing Spotify Data with Python and Tableau

Create a dynamic dashboard using your streaming data & Spotify’s API. I didn’t need to do this project to find out that I’m still addicted to ‘Ribs’ by Lorde, but it was fun to work on nonetheless. See below to learn how you can easily replicate it using your own Spotify data! Tableau Public Dashboard Jupyter … Read more

{emayili} Rendering R Markdown

[This article was first published on R – datawookie, and kindly contributed to R-bloggers]. In a previous post I documented a new feature in {emayili}, … Read more

Beautiful Maps with MazamaSpatialPlots

Many of us have become addicted to The NY Times COVID maps — maps of US state or county level data colored by cases, vaccinations, per capita infections, etc. While recreating maps like these in R is possible, it is disappointingly difficult. The just released MazamaSpatialPlots R package takes a first stab at remedying this … Read more

Computing on Coupled Data Streams with Beam

Coupled data streams need to be analyzed together, paying particular attention to simultaneity in event-time and other process-specific variables controlling the participant streams… Data streams are ubiquitous nowadays. IoT devices, ATM transactions, apps, sensors, etc. push out a steady stream of data. Analyzing these streams of data as they arrive poses challenges, especially … Read more

Basic Molecular Representation for Machine Learning

RDKit supports several fingerprint functions, whose outputs can be used for calculating molecular similarity or as inputs to downstream machine learning models. Figure 8 shows the code for retrieving the RDKit Fingerprint and Morgan Fingerprint, and Figure 9 shows the results of these fingerprint functions. Fig. 8. Retrieving RDKit Fingerprint and Morgan Fingerprint Fig. … Read more

Will You Switch From PyCharm to DataSpell — the Latest Data Science IDE from JetBrains?

Review of the key features of the DataSpell IDE. Photo by Nick Fewings on Unsplash. Among the common Python IDEs, PyCharm is my favorite for several reasons, to name just a few: 1) PyCharm gives me a more coherent user experience because I used to use Android Studio a lot; 2) great auto-completion intelligence for high … Read more

Creating Successful R User Groups in Abuja, Nigeria

[This article was first published on R Consortium, and kindly contributed to R-bloggers]. Bilikisu Aderinto, Founder/Organizer of the Abuja R User Group and R-Ladies Abuja, … Read more

Feedback Alignment Methods

Backpropagation’s simplicity, efficiency, high accuracy, and convergence rates make it the de facto algorithm for training neural networks. However, there is evidence that such an algorithm could not be biologically implemented by the human brain [1]. One of the main reasons is that backpropagation requires synaptic symmetry in the forward and backward paths. Since … Read more

Programming An Intuitive Image Classifier, Part 1

Learning the “Mental” Generative Models. Now it’s time to learn the “mental models”: the generative probability models for each class of handwritten digit that the computer will learn from data in order to make educated inferences at classification time. To create the actual model, we will use a technique in the scikit-learn module called … Read more

Analysis of the emotion data — a dataset for emotion recognition tasks.

We’ll start by importing the necessary libraries and visualizing the data. As we already know, the data has been preprocessed, so that is a bonus. We’ll typically look for imbalance in the dataset and length of the tweets to start with. Beyond that, feel free to dive in further. Creating a column with label names. … Read more