3 (and Half) Powerful Tricks To Effectively Read CSV Data In Python

The usecols parameter of pandas.read_csv() is extremely useful for loading only specific columns from a CSV data set. Here is a direct comparison of the time taken by read_csv() with and without usecols (pandas.read_csv() usecols | Image by Author). Importing a .csv file into a pandas DataFrame using usecols is ⚡️ 2.4X ⚡️ faster than importing … Read more
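As a minimal sketch of what usecols does, the snippet below reads the same CSV twice, once in full and once restricted to two columns. The in-memory CSV and its column names are hypothetical stand-ins for a real file, and no timing is measured here.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk.
csv_text = "id,name,price,quantity\n1,apple,0.5,10\n2,pear,0.75,4\n"

# Default behaviour: every column is parsed and loaded.
df_all = pd.read_csv(io.StringIO(csv_text))

# With usecols, only the listed columns are parsed and loaded.
df_subset = pd.read_csv(io.StringIO(csv_text), usecols=["id", "price"])

print(df_all.shape)
print(list(df_subset.columns))
```

On a wide file, skipping unused columns is where the time saving would be expected to come from.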

Learn Plotly for Advanced Python Visualization: A Use Case Approach

In order to add customizations such as cluster colors, bubble sizes, and hover-over tips, we need to first add three new columns to our data frame that assign these ‘customization parameters’ to each data point. The following code will add a new column called ‘color’ to the data frame. We first define a function called … Read more

Getting Started with Geospatial Analysis

Using geographic data and geospatial images to study the impact of climate change, natural disasters, or human activity. Geographic data includes geospatial data captured using satellite imagery and the Global Positioning System (GPS), as well as other geographic data generally described explicitly in terms of geographic coordinates. Geospatial analysis includes collecting, reporting, plotting, and analyzing this data using … Read more

Defining the Moving Average Model for Time Series Forecasting in Python

Explore the moving average model and discover how we can use the ACF plot to identify the right MA(q) model for our time series (photo by Pawel Czerwinski on Unsplash). One of the foundational models for time series forecasting is the moving average model, denoted as MA(q). This is one of the basic statistical models … Read more
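To illustrate why the ACF helps identify q, here is a rough sketch that simulates an MA(1) process and computes its sample autocorrelation by hand. The coefficient 0.9, the series length, and the helper names are assumptions chosen for illustration; the article itself may use a plotting library instead.

```python
import random


def simulate_ma1(n, theta, seed=0):
    """Simulate y_t = e_t + theta * e_{t-1} with standard normal noise."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [noise[t] + theta * noise[t - 1] for t in range(1, n + 1)]


def sample_acf(series, max_lag):
    """Return the sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - lag] for t in range(lag, n)) / variance
        for lag in range(max_lag + 1)
    ]


series = simulate_ma1(n=5000, theta=0.9)
acf = sample_acf(series, max_lag=4)
for lag, value in enumerate(acf):
    print(f"lag {lag}: {value:+.3f}")
```

For an MA(q) process the autocorrelation is theoretically zero beyond lag q, so in this MA(1) sketch only lag 1 should stand out clearly from zero.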

How RGB and Grayscale Images Are Represented in NumPy Arrays?

Let’s start with image basics (image by author, made with draw.io). Today, you’re going to learn some of the most important and fundamental topics in machine learning and deep learning. I guarantee that today’s content will deliver some of the foundational concepts that are key to start learning deep learning, a subset of machine … Read more

Will Data Scientists Still Be in Demand in 2022?

I have read many different viewpoints online about data engineering replacing data science as the hottest job of the 21st century. After working closely with both data engineering and data science teams, I have come to the conclusion that both fields are equally valuable. Companies need data engineers. They need people who are able to … Read more

How to Decrease the Carbon Footprint of Digital Communication

An assessment of the influence of email behaviours on greenhouse gas emissions using System Dynamics (photo by Mikaela Wiedenhoff on Unsplash). Nowadays, it is quite normal to communicate via digital tools. We have social media platforms to chat with friends, videotelephony services to conduct job interviews and, of course, the good old email. One would think that … Read more

Sentiments of Rome

Because Latin is a low-resource language, it is not easy to acquire an annotated Latin corpus of substantial size. I used one accessible dataset[4][5] from the internet, which consists of 45 Latin sentences classified into 4 sentiment classes (POSITIVE, NEGATIVE, NEUTRAL, MIXED) that have been extracted from Horace’s Odes (The creators of the dataset have … Read more

Mastering Histograms in Matplotlib

Details of Making Histograms. The histogram is one of the most popular plots. It is useful for understanding the overall distribution of a continuous variable. So in almost any data analysis, exploratory data analysis, or machine learning project, you will start with some histograms. In this article, I will explain how to make histograms … Read more

Essential guide to Machine Learning Model Monitoring in Production

Techniques to detect data drift (image by Mediamodifier from Pixabay). Model monitoring is an important component of the end-to-end data science model development pipeline. The robustness of the model depends not only on the training of the feature-engineered data but also on how well the model is monitored after deployment. Typically a machine … Read more

Tutorial on Surface Crack Classification with Visual Explanation (Part 2)

Now, we are going to generate visual explanation heat maps. To generate the heat maps, we are going to use the Grad-CAM[1] algorithm. The heat maps identify the image regions that influence the network’s decision. If we look at the heat map, we can easily understand which image pixels contribute to the network’s decision. To work with Grad-CAM, … Read more

Webhook vs API — Which One Do You Need?

Even if you are completely unfamiliar with technology, you likely utilize APIs on a daily basis. Whether ordering from an online store, determining your train schedule, or checking a weather app, we constantly pose requests for information, which is retrieved from a system or database we don’t necessarily know anything about. The layer between our … Read more

Setting up a Text Summarisation Project (Part 2)

Leveraging zero-shot learning for text summarisation with Hugging Face’s Pipeline API (photo by David Dvořáček on Unsplash). This is the second part of a tutorial on setting up a text summarisation project. For more context and an overview of this tutorial, please refer back to the introduction as well as part 1 in which we … Read more

Believe Rationally

How To Update Your Beliefs Based On Evidence. Imagine you took a rapid at-home covid-19 test. If you test positive, how worried should you be? Alternatively, if you test negative, how safe should you feel? (Photo by Medakit Ltd on Unsplash.) This article will arm you with the knowledge and the tools to correctly and … Read more
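The two questions above can be worked through with Bayes' rule. The sketch below is illustrative only: the prior, sensitivity, and specificity are hypothetical numbers, not figures from the article or from any real test.

```python
def posterior_infected(prior, sensitivity, specificity, positive):
    """Return P(infected | test result) using Bayes' rule."""
    if positive:
        likelihood_infected = sensitivity        # P(positive | infected)
        likelihood_healthy = 1.0 - specificity   # P(positive | healthy)
    else:
        likelihood_infected = 1.0 - sensitivity  # P(negative | infected)
        likelihood_healthy = specificity         # P(negative | healthy)
    numerator = likelihood_infected * prior
    evidence = numerator + likelihood_healthy * (1.0 - prior)
    return numerator / evidence


# Hypothetical values, chosen only to make the arithmetic concrete.
prior = 0.05        # belief of being infected before testing
sensitivity = 0.80  # share of infected people who test positive
specificity = 0.98  # share of healthy people who test negative

after_positive = posterior_infected(prior, sensitivity, specificity, positive=True)
after_negative = posterior_infected(prior, sensitivity, specificity, positive=False)
print(f"P(infected | positive) = {after_positive:.3f}")
print(f"P(infected | negative) = {after_negative:.3f}")
```

Under these assumed numbers a positive result raises the belief from 5% to roughly two in three, while a negative result lowers it to about 1%. Different priors or test characteristics would change both figures.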

Introduction to Applied Linear Algebra: Norms & Distances

(Photo of Yan Krukov from Pexels.) Goal: This article gives an introduction to vector norms, vector distances, and their application in the field of data science. Why you should learn it: Vector norms and distances are used to describe attributes of vectors and the relationships between different vectors. They are widely used … Read more
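For a concrete feel for norms and distances, here is a small plain-Python sketch of three common vector norms and the Euclidean distance. The function names and sample vectors are chosen for illustration and are not taken from the article.

```python
import math


def l1_norm(v):
    """Sum of absolute components (Manhattan norm)."""
    return sum(abs(x) for x in v)


def l2_norm(v):
    """Square root of the sum of squared components (Euclidean norm)."""
    return math.sqrt(sum(x * x for x in v))


def max_norm(v):
    """Largest absolute component (infinity norm)."""
    return max(abs(x) for x in v)


def euclidean_distance(a, b):
    """Distance between two vectors: the L2 norm of their difference."""
    return l2_norm([x - y for x, y in zip(a, b)])


a = [3.0, -4.0]
b = [0.0, 0.0]

print(l1_norm(a))                # 7.0
print(l2_norm(a))                # 5.0
print(max_norm(a))               # 4.0
print(euclidean_distance(a, b))  # 5.0
```

Defining the distance as the norm of the difference vector means any of the norms above could be swapped in to get a different distance measure.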

Self-Training Classifier: How to Make Any Algorithm Behave Like a Semi-Supervised One

You may think that Self-Training involves some magic or uses a highly complex approach. In reality, though, the idea behind Self-Training is very straightforward and can be explained by the following steps: First, we gather all labeled and unlabeled data, but we only use labeled observations to train our first supervised model. Then we use … Read more

Localization of indoor Wi-Fi users by Bayesian statistical modelling

Identifying indoor Wi-Fi users’ locations, with a tolerance for uncertainty, using PyMC3. (Figure: Wi-Fi sensor network.) With the help of GPS, outdoor positioning has seen significant development. However, positioning still suffers from substantial inaccuracies indoors. The existence of Wi-Fi networks offers an alternative way to build a localization system, and significant research has … Read more

How to Build a Poisson Hidden Markov Model Using Python and Statsmodels

(Figure: Manufacturing strikes in the United States plotted against time. Data source: R data sets. Image by Author.) A step-by-step tutorial to get up and running with the Poisson HMM. A Poisson Hidden Markov Model is a mixture of two regression models: a Poisson regression model, which is visible, and a Markov model, which is ‘hidden’. … Read more

Scaled Line chart — What are they and why do you absolutely need them

Combining the power of two simple things to get something awesome (image by author). Sometimes when you combine two seemingly simple things, you get something awesome. In this article, you will see the power of combining a line chart and a numeric scaler. Just for the sake of vocabulary: The line chart is very useful … Read more

FuzzyWuzzy — the Before and After

Data Preprocessing: Cleaning the Data Before Analysis. Before we choose our FuzzyWuzzy function and start comparing strings, we want to clean the data to ensure that our results will be as accurate as possible. Cleaning the data means removing irrelevant strings, and thus improving the functions’ performance. For example, let’s assume we compare strings … Read more

Paper explained: Unsupervised Learning of Visual Features by Contrasting Cluster Assignments

The secrets of why the SwAV model architecture works so well for self-supervised pre-training. Teaching a neural network to understand the world around it without human supervision has been one of the north stars of the computer vision research community for years. Recently, multiple publications have shown the potential of novel methods to make significant … Read more

Analysing Interactions with SHAP

The mean prediction is the average predicted bonus across all 1000 employees. If you add up all the values in the contribution matrix and add the mean prediction, you will get the model’s actual prediction for that employee. In our case, the mean predicted bonus was $148.93. All the values in the matrix add up … Read more
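The additivity described above can be checked with simple arithmetic. In the sketch below only the mean prediction of $148.93 comes from the text. The 3x3 contribution matrix is made up, and its layout (main effects on the diagonal, each interaction effect split evenly between the two off-diagonal cells) is an assumption based on how SHAP interaction values are commonly arranged.

```python
# Mean predicted bonus quoted in the text.
mean_prediction = 148.93

# Hypothetical contribution matrix for one employee and three features.
# Diagonal: main effects. Off-diagonal: interaction effects, split evenly
# between cell (i, j) and cell (j, i), so the matrix is symmetric.
contributions = [
    [12.0, 1.5, -0.5],
    [1.5, -8.0, 2.0],
    [-0.5, 2.0, 5.0],
]

# Summing every cell and adding the mean recovers the model's prediction.
total_contribution = sum(sum(row) for row in contributions)
prediction = mean_prediction + total_contribution

print(f"total contribution: {total_contribution:.2f}")
print(f"prediction: {prediction:.2f}")
```

With these invented values the cells sum to 15.00, so the reconstructed prediction is 163.93.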

Normalization, Standardization and Normal Distribution

I will start this post with a statement: normalization and standardization will not change the distribution of your data. In other words, if your variable is not normally distributed, it won’t turn into one with the normalize method. normalize() or StandardScaler() from sklearn won’t change the shape of your data. Standardization. Standardization can be … Read more
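One way to check the claim about shape is to compare skewness before and after rescaling. The sketch below uses plain Python rather than sklearn, generates a hypothetical right-skewed sample, and assumes "normalization" means min-max scaling of a single variable.

```python
import random
import statistics


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((x - mean) / std) ** 3 for x in values) / len(values)


# Hypothetical right-skewed data drawn from an exponential distribution.
rng = random.Random(42)
raw = [rng.expovariate(1.0) for _ in range(2000)]

# Standardization: subtract the mean, divide by the standard deviation.
mean, std = statistics.fmean(raw), statistics.pstdev(raw)
standardized = [(x - mean) / std for x in raw]

# Min-max scaling: map the values linearly onto the interval [0, 1].
low, high = min(raw), max(raw)
min_max_scaled = [(x - low) / (high - low) for x in raw]

print(f"skewness raw:            {skewness(raw):.4f}")
print(f"skewness standardized:   {skewness(standardized):.4f}")
print(f"skewness min-max scaled: {skewness(min_max_scaled):.4f}")
```

Both transformations are linear with a positive scale factor, so they shift and stretch the values without changing the skewness. The three printed numbers should agree up to floating-point error.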

Examples of Multi-Cursor for working with Data

How to save time and nerves when coding for data analysis in VS Code using Multi-Cursor and selection features. (Doing multiple things at once. Photo by Matt Bero at Unsplash.) Working with data can be very dynamic with repeated forward and backward motions through your code to adjust and copy snippets, introducing new assumptions, … Read more

Codex — a bridge between clouds?

(Image: Shutterstock.) Can Codex translate commands between AWS and Google Cloud? Many businesses need to deal with multiple cloud environments. AWS, Azure, and Google Cloud each have their own sets of commands to carry out cloud actions, such as setting up buckets, defining service accounts, and other administration tasks. It would be great to … Read more

Towards Controlled Generation of Text: A Summary

An easy-to-understand explanation of the research paper (photo by Scott Graham on Unsplash). In this article, I am going to summarize an influential paper in the field of Natural Language Processing, Towards Controlled Generation of Text[1]. This paper was published in 2017 at the ICML conference and has been cited more than 800 times at the … Read more

How to Build a Knowledge Graph with Neo4J and Transformers

Using custom Named Entity Recognition and Relation Extraction models (image by author: Knowledge Graph in Neo4j). In my previous article “Building a Knowledge Graph for Job Search using BERT Transformer”, we explored how to create a knowledge graph from job descriptions using entities and relations extracted by a custom transformer model. While we were able … Read more

Anomaly Detection in IoT Enabled Smart Battery Management Systems

Understanding the usage of data engineering and machine learning in the electric mobility world (photo by Kumpan Electric on Unsplash). We are living in the world of electric mobility. Globally, the adoption of electric cars and two-wheelers is rising steeply. Electric mobility devices rely on expensive rechargeable lithium-ion batteries for power. These batteries … Read more

An Efficient Hybrid Algorithm to Solve Nonlinear Least Squares Problems

Hands-on Tutorials. When Levenberg-Marquardt meets Quasi-Newton. And yes, we build it from scratch with Python! (Photo by Jeremy Bishop on Unsplash.) In previous articles, we’ve seen the Gradient Descent and Conjugate Gradient algorithms in action, as two of the simplest optimization methods there are. We implemented line search for searching the direction to which the objective … Read more

Statistics is dead, long live statistics!

Meet resampling, a one-size-fits-all modern approach to doing stats. Estimating confidence intervals and hypothesis testing are two common statistical tasks. They have an air of mystery around them as they rely on math accompanied by complex and case-specific assumptions. While this is still the way many people do stats, I argue it is not the … Read more
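As one example of what a resampling approach to confidence intervals can look like, here is a minimal sketch of a percentile bootstrap for the mean. The bootstrap is only one resampling technique and may not be the exact procedure the article uses. The sample data, the helper name bootstrap_ci, and its parameters are hypothetical.

```python
import random
import statistics


def bootstrap_ci(data, stat, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for an arbitrary statistic."""
    rng = random.Random(seed)
    # Resample with replacement, recompute the statistic, and sort the results.
    estimates = sorted(
        stat(rng.choices(data, k=len(data))) for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical sample: 200 draws from a normal distribution.
rng = random.Random(1)
sample = [rng.gauss(10.0, 2.0) for _ in range(200)]

lower, upper = bootstrap_ci(sample, statistics.fmean)
print(f"sample mean: {statistics.fmean(sample):.3f}")
print(f"95% bootstrap CI: ({lower:.3f}, {upper:.3f})")
```

Because stat is passed in as a function, the same helper could be reused for a median or a standard deviation without changing the resampling logic.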

Medical AI: Why Clinicians Swipe Left

The most common reason your medical AI will be rejected by clinicians and how to overcome it (image by Evin Felix). So, you’ve found success training algorithms for industrial use, and now you want to train an algorithm to help doctors help patients. Developing algorithms for medical applications is a challenging pursuit but also a … Read more

Developing a Convolutional Neural Network Model Using the Unlabeled Image Files Directly From the…

Using Image-Generator to label the images automatically from the subdirectories. A Convolutional Neural Network is a great tool for image classification. It can perform other artificial intelligence tasks as well. But this article will focus primarily on image recognition. I have a detailed article on how a forward pass of a Convolutional Neural Network works and … Read more

Introducing FugueSQL — SQL for Pandas, Spark, and Dask DataFrames

An End-To-End SQL Interface for Data Science and Analytics. As a data scientist, you might be familiar with both Pandas and SQL. However, there might be some queries and transformations that you feel more comfortable doing in SQL than in Python. Wouldn’t it be nice if you could query a pandas DataFrame like below: … using SQL? … Read more

A Great Python Library: Great Expectations

The id column should always be unique, and duplicate id values might have severe consequences. We can easily check for the uniqueness of the values in this column.
df.expect_column_values_to_be_unique(column="id")
# output
{"meta": {}, "result": {"element_count": 1000, "missing_count": 0, "missing_percent": 0.0, "unexpected_count": 0, "unexpected_percent": 0.0, "unexpected_percent_total": 0.0, "unexpected_percent_nonmissing": 0.0, "partial_unexpected_list": []}, "success": true, "exception_info": {"raised_exception": false, "exception_traceback": null, "exception_message": null}}
The functions of the Great Expectations library return a JSON … Read more

Why You Should Use Callbacks in TensorFlow 2

Customize your training of deep neural networks: a practical guide (photo by John Schnobrich on Unsplash). Callbacks are essential when you want to control the training of a model. And you do want to control the training… Callbacks help us prevent overfitting, visualize our training progress, save checkpoints and much more. But why TensorFlow? … Read more

Leading by Example to Improve Civic Life

It’s uncommon to find an in-house AI Lab at the municipal level of government. I would love to learn more about the team. Could you share what the Lab does at the City of London? What does a typical day look like? The Municipal Artificial Intelligence Applications Lab is a part of the Information Technology … Read more

Efficiently Shortlisting a Classification Model

Converging to the right model in a faster manner (photo by Djim Loic on Unsplash). It’s 8 PM and you are still cleaning up the data, performing EDA, and creating more features. Your initial discussion with business is first thing tomorrow and the expectation is to discuss what potential “classification” methodologies are under consideration and … Read more