Everything About Python Dictionary Data Structure: Beginner’s Guide

In this section we continue working with the countries dictionary and discuss ways of adding elements to a Python dictionary. Let’s say we want to add another country with its population to our countries dictionary. For example, we want to add Japan’s population of 126,476,461. We can easily do it by adding it as an … Read more
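As a minimal sketch of the step described above, one common way to add an entry is to assign a value to a new key. The starting contents of `countries` below are placeholder values invented for illustration; only Japan's figure comes from the excerpt.

```python
# Hypothetical starting dictionary: these entries are placeholders,
# not data from the article.
countries = {"Country A": 1_000_000, "Country B": 2_000_000}

# Assigning to a key that does not exist yet adds a new key-value pair.
countries["Japan"] = 126_476_461

print(countries["Japan"])
```

Assigning to a key that already exists would overwrite its value instead of adding a second entry.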

Storytelling with Maps: Elections in Santa Fe Province

Open source free software for geospatial data. (Image by Arnaud Jaegers from Unsplash.) Maps are defined as a graphic representation of a territory on a two-dimensional surface. They are also defined as a schematic drawing or layout that represents the characteristics of a given territory, such as its dimensions, coordinates, geographic features, or other relevant … Read more

How to process a DataFrame with millions of rows in seconds?

What are the main steps of data processing with Terality? Terality comes with a Python client that you import into a Jupyter Notebook. Then you write the code in “a pandas way” and Terality securely uploads your data and takes care of distributed processing (and scaling) to calculate your analysis. After processing is completed, you … Read more

Setting up Apple’s new M1 MacBooks for Machine Learning

In a previous post, I connected the process of getting things going with our innate desire to learn. This post serves as a follow-up: It shows how to prepare the M1 MacBooks for Machine Learning. A typical setup of Machine Learning includes a) using virtual environments, b) installing all packages within them, c) using python, … Read more

5 Must-Know Terms in Time Series Analysis

It is better to start our discussion by distinguishing between deterministic and stochastic processes. The time-dependent values in a deterministic process can be calculated. For instance, how much you will have in your savings account in two years can be calculated using the initial deposited amount and the interest rate. We can’t really talk about … Read more
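The savings-account example can be sketched as a small function. This assumes interest compounds once per year, which the excerpt does not specify; the function name, deposit, and rate are hypothetical values chosen for illustration.

```python
def savings_balance(deposit: float, annual_rate: float, years: int) -> float:
    """Balance after `years`, assuming interest compounds once per year."""
    return deposit * (1 + annual_rate) ** years


# Illustrative inputs: 1,000 deposited at 5% for two years.
balance = savings_balance(1000.0, 0.05, 2)
print(round(balance, 2))
```

Because the result depends only on the inputs, running it again always gives the same value, which is what makes the process deterministic.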

TensorFlow for Computer Vision — How to Implement Convolutions From Scratch in Python

And that’s a convolution in a nutshell! Convolutional layers are useful for finding the optimal filter matrices, but a convolution in itself only applies the filter to the image. There’s a ton of well-known filter matrices for different image operations, such as blurring and sharpening. Let’s see how to work with them next. We’ll use … Read more
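To make "applying the filter to the image" concrete, here is a rough pure-Python sketch rather than the article's own implementation. It assumes a square kernel, stride 1, and no padding, and, as in most deep learning libraries, it does not flip the kernel. The function name `convolve2d` and the tiny test image are hypothetical; the sharpening kernel is a commonly used one.

```python
def convolve2d(image, kernel):
    """Slide a square kernel over a 2D image (no padding, stride 1)."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Sum of element-wise products between the kernel and the patch
            # of the image whose top-left corner is at (i, j).
            total = 0
            for di in range(k):
                for dj in range(k):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# A commonly used 3x3 sharpening kernel; its weights sum to 1.
sharpen = [[0, -1, 0],
           [-1, 5, -1],
           [0, -1, 0]]

# A flat 4x4 test image: sharpening a uniform region leaves it unchanged.
image = [[10] * 4 for _ in range(4)]
result = convolve2d(image, sharpen)
print(result)
```

Without padding, a 4x4 input and a 3x3 kernel give a 2x2 output.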

Classification with Imbalanced Data

Using various resampling methods to improve machine learning models. (Photo by Aziz Acharki on Unsplash.) Building classification models on data that has largely imbalanced classes can be difficult. Using techniques such as oversampling, undersampling, resampling combinations, and custom filtering can improve accuracy. In this article, I’ll walk through a few different approaches to deal with … Read more

Considerations for Deploying Machine Learning Models in Production

By Jules S. Damji and Michael Galarnyk. (Figure: Phases of the Model Development Cycle; image by Jules S. Damji.) A common grumble among data science or machine learning researchers or practitioners is that putting a model in production is difficult. As a result, some claim that a large percentage, 87%, of models never see the light of … Read more

Feature Engineering in Python

Beyond the basics. (Photo by Antoine Dautry on Unsplash.) In my decade-plus as a data scientist, my experience largely agrees with Andrew Ng’s statement, “Applied machine learning is basically feature engineering.” From the very start of my career, building credit card fraud models at SAS, most of my value as a data scientist came … Read more

Rational UI Design with Streamlit

Data Visualization. From one point of view, Streamlit is a retrograde step in web development because it lets you mix up the logic of your app with the way it is presented. But from another, it greatly simplifies web design. (Figure: the demonstrator app; image by author.) When Tim Berners-Lee first invented the … Read more

How to Use Streamlit and Python to Build a Data Science App

Web apps are still useful tools for data scientists to present their data science projects to users. Since we may not have web development skills, we can use open-source Python libraries like Streamlit to easily develop web apps in a short time. Contents: Introduction to Streamlit; Installation and Set up; Develop the Web App; Test the … Read more

Area Under the Curve and Beyond

Machine Learning & Diagnostic Statistics. Example, with Python code, demonstrating how to generate and dive deeper with AUC, IDI, and NRI metrics. (Image by author.) AUC is a good starting metric when comparing the performance of two models, but it does not always tell the whole story. NRI looks at the new model’s ability to … Read more

What to Log? From Python ETL Pipelines!

A detailed log structure for ETL pipelines! (Image credit: Burst.) As part of my work, I have been converting some of my ETL jobs developed on the traditional tool-based framework into a Python framework and came across a few challenges. These few challenges are orchestration, code management, logging, version control, etc. For a few of … Read more

Changing the Cell’s Default Output — A Quick Jupyter Notebook Productivity Tip

Changing the Behavior “Permanently”. When you’re done with the current Notebook and you create another one, you may notice that you’ll have to run the following code again. Otherwise, the new Notebook will just behave as before, only showing the last expression’s evaluation result.

    from IPython.core.interactiveshell import InteractiveShell
    InteractiveShell.ast_node_interactivity = "last_expr_or_assign"

Is it tedious to … Read more

Building A Tennis Match Simulator in Python

(Photo by Moises Alex on Unsplash.) Using Python to verify the math behind points-based modelling of tennis games. Tennis, like other racket sports (and volleyball), has a specific scoring system in which points are grouped into subsets, and it is these chunks that matter to the overall match, not each individual point. This leads … Read more
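To illustrate how points roll up into a larger unit, the sketch below simulates a single standard game: the first player to reach at least four points with a two-point lead wins. It is not the article's simulator. The function name, the fixed per-point win probability of 0.6, and the seed are all assumptions made for illustration.

```python
import random


def server_wins_game(p_point: float, rng: random.Random) -> bool:
    """Simulate one game; return True if the server wins it.

    Assumes every point is won by the server independently with
    probability `p_point`.
    """
    server, receiver = 0, 0
    while True:
        if rng.random() < p_point:
            server += 1
        else:
            receiver += 1
        # A game is won with at least 4 points and a lead of at least 2.
        if server >= 4 and server - receiver >= 2:
            return True
        if receiver >= 4 and receiver - server >= 2:
            return False


# Estimate the game-win probability from repeated simulation.
rng = random.Random(42)
games = 10_000
wins = sum(server_wins_game(0.6, rng) for _ in range(games))
estimate = wins / games
print(estimate)
```

The estimated game-win rate comes out higher than the per-point probability, because the scoring structure amplifies a small edge on individual points.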

3 Mistakes to Avoid When You Write Your Machine Learning Model

Machine Learning. Some tips on how to optimize the development process of a Machine Learning model in order to avoid surprises during the deployment phase. (Photo by Nick Owuor (astro.nic.visuals) on Unsplash.) Eventually I was able to breathe a sigh of relief: my Machine Learning model works perfectly both on training and on the test … Read more

Documenting Your Python Code

Always add relevant comments to your code where possible. And be mindful of overdoing it by adding comments to everything. The best comments are the ones that you don’t have to write, as the code is very clear. Generally speaking, there are two types of comments: single and multi-line comments. In Python, commenting is usually … Read more
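A short illustration of the comment styles mentioned above. The `area` function is a hypothetical example, and the comments are deliberately more verbose than you would normally write them.

```python
# A single-line comment starts with a hash sign and runs to the end of the line.

# Python has no dedicated multi-line comment syntax, so a longer note
# is usually written as several consecutive hash-prefixed lines,
# like this one.
def area(width, height):
    return width * height  # an inline comment follows code on the same line


print(area(2, 3))
```

In line with the advice above, a well-named function like `area` would usually need none of these comments.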

Visualising Global Population Datasets with Python

Mapping information concerning distribution of people is vital to a host of public policy questions across our planet’s different country settings. The ability to capture geographic distribution of population and their key characteristics is integral to measuring exposure to disasters and climate change, access differentials to key services such as health, and environmental and land-use … Read more

To Monitor or Not to Monitor a Model — Is there a question?

MODEL MONITORING GUIDE. This is the first part of a series of blogs on model monitoring. This article outlines the need to monitor a model and demonstrates two examples with an open-source library called Evidently AI. By Devanshi Verma*, Matthew Fligiel*, Rupa Ghosh*, and Dr. Arnab Bose. *Contributed equally to this work, names are arranged in alphabetical … Read more

What does Weights & Biases do?

Another common MLOps task that falls under model experimentation and development is hyperparameter tuning. My go-to analogy for hyperparameter tuning is cake baking 🍰. Consider for a moment that the framework, training data and code you write for building a model are the ingredients you combine to make the cake mixture. You then run that … Read more

Understanding Consumer Behavior With The Market Basket Analysis

Learn about the data mining technique used to optimize sales in retail and e-commerce industries. (Photo by Tamanna Rumee on Unsplash.) Anticipating customers’ interests is a strategy employed in many business models. Companies invest heavily in tactics ranging from taking customer surveys to building sophisticated machine learning models to better understand customer behavior. One of … Read more

Detecting the most popular tourist attractions in Valencia using unsupervised learning techniques

Theory. The DBSCAN (Density-Based Spatial Clustering of Applications with Noise) method is a density-based clustering algorithm used to separate high-density from low-density regions. This algorithm is based on two hyperparameters: (1) the radius (eps), the maximum distance between two samples for them to be considered neighbors; and (2) the minimum number of points (MinPts), the number of samples in … Read more
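The two hyperparameters can be illustrated with the core-point test that DBSCAN builds on. This is only that one step, not the full clustering algorithm. The function name and sample coordinates are hypothetical, and the neighbor count is assumed to include the point itself, as scikit-learn's `min_samples` does.

```python
import math


def core_points(points, eps, min_pts):
    """Return indices of points with at least `min_pts` samples
    (the point itself included) within distance `eps`."""
    cores = []
    for i, p in enumerate(points):
        neighbors = sum(1 for q in points if math.dist(p, q) <= eps)
        if neighbors >= min_pts:
            cores.append(i)
    return cores


# Three tightly grouped points and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
print(core_points(points, eps=0.5, min_pts=3))  # → [0, 1, 2]
```

The isolated point has no neighbors within `eps` other than itself, so it is not a core point. In full DBSCAN, a non-core point that is not reachable from any core point is labeled as noise.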

Do Not Use Python Pickle Unless You Know All These Facts

Pros and cons of Pickle serialisation, and when we should use it. Compared with most of the other popular programming languages, Python probably has the most flexible serialisation of objects. In Python, everything is an object, so we can say that almost everything can be serialised. Yes, the module that I was talking about is … Read more
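A minimal round trip with the standard-library `pickle` module gives a sense of what serialising an object looks like. The `record` dictionary is a made-up example.

```python
import pickle

# A hypothetical nested object to serialise.
record = {"name": "example", "values": [1, 2, 3], "nested": {"ok": True}}

# dumps() turns the object into bytes; loads() rebuilds an equal object.
data = pickle.dumps(record)
restored = pickle.loads(data)

print(type(data).__name__, restored == record)
```

As the Python documentation warns, unpickling can execute arbitrary code, so `pickle.loads` should only be used on data from a trusted source.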

Build Your First Machine Learning Model With Zero Configuration — Exploring Google Colab

It’s just that easy to start your machine learning journey. (Photo by ray rui on Unsplash.) Machine learning (ML) is trending, and every company wants to leverage ML to help them better their products or services. Thus, we’ve been observing a growing demand for ML engineers, and such demand has drawn the attention of many … Read more

Announcing PyCaret’s New Time Series Module

(Image by author.) PyCaret’s New Time Series Module. PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that speeds up the experiment cycle exponentially and makes you more productive. In comparison with the other open-source machine learning libraries, PyCaret … Read more

How to Make your Computer Talk with Python

Your first step in becoming a billionaire playboy. (Figure: Your robot butler. Image by Steven Miller.) If you’re a fan of movies like Iron Man, you’ve probably fantasised about getting your very own Jarvis. Well, in this post, I’m going to show you how you can get started making your own computer assistant. We’ll do so … Read more

How to Skyrocket Your Python Speed with Numba

Let me give you a high-level overview of Numba first 👍 Numba describes itself in the following way: “Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code.” (Numba Documentation) Let’s unpack the above statement. Numba is an open-source and lightweight Python library that … Read more

Slicing NumPy Arrays like an expert

The ultimate NumPy array indexing and slicing guide. (Photo by Daniel Lincoln on Unsplash.) “Slicing numpy arrays is like peeling fruits. You cut a part away and keep the rest.” (The numpy ninja) After reading this post you should be able to slice through arrays like through butter. I will start out by … Read more

Building a Univariate GARCH Model In Excel

Volatility forecasting using GARCH in Excel with Python and PyXLL. (Figure: Volatility modelling in Excel with Python and PyXLL. Image is author’s own.) In this article we are going to build a univariate GARCH model in Excel. GARCH models are used to estimate the volatility of financial assets. This article first appeared on the PyXLL blog … Read more

Time Series Forecasting with ThymeBoost

TLDR: ThymeBoost combines the traditional decomposition process with gradient boosting to provide a flexible mix-and-match time series framework for trend/seasonality/exogenous decomposition and forecasting, all a pip away. All code lives here: ThymeBoost GitHub. Traditional time-series decomposition typically involves a sequence of steps: (1) approximate trend/level; (2) detrend to approximate seasonality; (3) detrend and deseasonalize to approximate other factors … Read more

Python and the Module Search Path

(Photo by Chris Ried on Unsplash.) How Python knows which packages to import, where to find them, and how modern tools (conda, pyenv, poetry) make this easy for us. Previously we’ve looked at (article here) how various tools (conda, pyenv) manipulate the $PATH variable so that when you type python, the version of python you … Read more
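For a quick look at the module search path itself, the standard library exposes it as `sys.path`, and `importlib.util.find_spec` reports where a given module would be loaded from. The choice of `json` as the module to look up is arbitrary.

```python
import importlib.util
import sys

# sys.path is the ordered list of locations Python searches on import.
for entry in sys.path:
    print(entry)

# find_spec() resolves a module name without importing the module.
spec = importlib.util.find_spec("json")
print(spec.origin)
```

The entries printed will differ between interpreters and environments, which is exactly what environment tools rely on.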

Meet Datascienv — A Fail-Proof Method for Setting up Data Science Environments

Put simply, datascienv is a Python package that offers data science environment setup with a single pip install. Here’s the list of libraries it installs for you, according to the official PyPI page. (Image 1: Python packages installed by datascienv.) It’s a lot: from your everyday data analysis, preprocessing, and visualization to machine … Read more

Essential guide to Multi-Class and Multi-Output Algorithms in Python

How to train an ML model for multi-learning tasks. (Image by LUM3N from Pixabay.) Most of the classification or regression that one works with involves one target class label (dependent variable) and multiple independent features. Some machine learning classification or regression tasks may have two or more dependent variables. The standard machine learning algorithms … Read more

Building a Network of Related IT-Skills

I analyzed 30k job descriptions to build a network chart of related IT-skills. IT job descriptions mention tons of different frameworks, programming languages and other skills. Languages like HTML and CSS obviously go hand in hand, but what are the less obvious connections? In this analysis I parsed 30k job descriptions in order to map … Read more

7 Tips I Wish I Knew Before Clearing All HackerRank Python Challenges

(Photo by Roman Synkevych on Unsplash.) I called it the RRR approach: record, reapply and repeat. Let me explain what that means. Record: If you are doing language-specific challenges rather than doing algorithm or data structures problems, it’s pretty safe to assume that you are, to some extent, still learning the ropes. If this is … Read more

3 Solutions for the Setting with Copy Warning of Python Pandas

Warnings should never be ignored. (Photo by NeONBRAND on Unsplash.) If you have ever done data analysis or manipulation with Pandas, it is highly likely that you have encountered the SettingWithCopy warning at least once. This warning occurs when we try to do an assignment using chained indexing because chained indexing has inherently unpredictable results. Here … Read more

How to Combine Two String Columns in Pandas

Concatenating string with non-string columns. Now let’s assume that one of the columns you are trying to concatenate is not in string format:

    import pandas as pd

    df = pd.DataFrame(
        [
            (1, 2017, 10, 'Q1'),
            (2, 2017, 20, 'Q2'),
            (3, 2016, 35, 'Q4'),
            (4, 2019, 25, 'Q2'),
            (5, 2020, 44, 'Q3'),
            (6, 2021, 51, 'Q3'),
        ],
        columns=['colA', 'colB', 'colC', 'colD']
    )
    print(df.dtypes)

    colA int64
    colB int64
    colC int64
    colD object
    dtype: … Read more

Understanding Surrogate Models and Their Benefits in Data Science

Now that the features of each dynamic variable are extracted and that a correlation is established with the initial conditions, we can create a linear model for each of them. To make these surrogate models more accessible, we can create a “SurrogatePrediction” class with 3 different methods that will generate predicted time series, based on … Read more