Understand your Algorithm with Grad-CAM

Warning: Grad-CAM can be difficult to wrap your head around. Gradient-weighted Class Activation Mapping (Grad-CAM) uses the gradients of any target concept (say ‘dog’ in a classification network or a sequence of words in a captioning network) flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the …
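The definition above boils down to two steps: pool the gradients into one weight per feature map, then take a ReLU of the weighted sum of the maps. Here is a minimal NumPy sketch of those steps. It assumes you have already pulled the final convolutional layer's activations, and the gradients of the target class score with respect to them, out of your deep learning framework (for example with hooks). That extraction is not shown, and the function name grad_cam is our own.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Compute a Grad-CAM heatmap for one image.

    activations: feature maps A^k of the final conv layer, shape (K, H, W).
    gradients:   d y^c / d A^k for the target class c, same shape.
    Both are assumed to come from your framework's forward/backward pass.
    """
    if activations.shape != gradients.shape:
        raise ValueError("activations and gradients must have the same shape")
    # Neuron-importance weights: global-average-pool the gradients per channel.
    weights = gradients.mean(axis=(1, 2))  # shape (K,)
    # Weighted sum over the K feature maps, then ReLU to keep only regions
    # with a positive influence on the target class.
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)  # (H, W)
    # Scale to [0, 1] for display (an all-zero map is returned unchanged).
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Random stand-ins for real activations/gradients, just to show the shapes.
rng = np.random.default_rng(0)
heatmap = grad_cam(rng.random((8, 7, 7)), rng.standard_normal((8, 7, 7)))
print(heatmap.shape)  # (7, 7)
```

In practice, the coarse (H, W) map is then upsampled to the input image size and overlaid on the image. That step is omitted here.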

Integrate Neo4j with KarateClub node embedding package

Learn how to integrate the KarateClub library with Neo4j to calculate various node and graph embeddings. Lately, I have been on a quest to learn as much as possible about node embedding techniques. The goal of node embedding is to encode nodes so that similarity in the embedding space approximates similarity in the original network. …

NumPy Basics for People in a Hurry

A simple guide for beginners learning NumPy in Python. NumPy is a Python library on which most data science packages, such as SciPy (Scientific Python), Matplotlib, and Scikit-learn, depend to some extent. It adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical …
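To make "multi-dimensional arrays plus mathematical functions" concrete, here is a minimal sketch. The array values are arbitrary, and the selection of operations is ours rather than the article's.

```python
import numpy as np

# A 2-D array (matrix) built from a nested Python list.
m = np.array([[1, 2, 3], [4, 5, 6]])
print(m.shape)  # (2, 3)
print(m.ndim)   # 2

# Element-wise arithmetic runs in compiled code, with no Python loop.
doubled = m * 2
# Broadcasting: the 1-D row is stretched across both rows of m.
shifted = m + np.array([10, 20, 30])

# A few of the high-level mathematical routines.
col_means = m.mean(axis=0)  # mean of each column
product = m @ m.T           # matrix multiplication, shape (2, 2)
```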

Filling in the Gaps: Imputation 3 Ways

A look at the imputation of missing data, ranging from simple replacement approaches to the more complex Multiple Imputation by Chained Equations. If you’ve done any predictive analytics (or attempted any IFoA actuarial examinations), you’ll have come across the phrase “garbage in, garbage out”. It’s been around since …
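The simplest end of that range, replacing each missing value with a column statistic, takes one line in pandas. The sketch below uses a small hypothetical DataFrame. It is not the article's dataset or code.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one gap per column.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 35.0],
    "income": [50_000.0, 62_000.0, np.nan, 58_000.0],
})

# Simple replacement: fill each column's gaps with that column's mean.
mean_imputed = df.fillna(df.mean())

# The median is a common alternative that is less sensitive to outliers.
median_imputed = df.fillna(df.median())
```

For the chained-equations end of the range, scikit-learn's IterativeImputer implements a MICE-style approach. It is still marked experimental and must be enabled with `from sklearn.experimental import enable_iterative_imputer` before it can be imported.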

Python: Managing Our Love-Hate Relationship With Its Syntactic Beauty And Crazy Slow Speeds

How we can give a much-loved language a little boost of speed. Python is ubiquitous in the data science and quantitative finance community for its ease of use, extensive libraries, and syntactic beauty. Yet Python code often does not make it to production as is; in the quantitative finance context, …

Data Manipulation: SQL vs. Pandas

Which tool would you like to use in your next data science project? Data cleaning and manipulation are essential steps in any data science project. Both SQL and Pandas are popular tools among data analysts and data scientists. Which tool to use depends on where the data …
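For a side-by-side feel, here is the same aggregation written once in SQL and once in pandas. The sales table is hypothetical. SQLite (from the standard library) stands in for a real database server so the example runs without setup.

```python
import sqlite3

import pandas as pd

# Hypothetical sales table used for both approaches.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "east", "south"],
    "amount": [100, 250, 300, 50, 100],
})

# SQL: load the table into an in-memory SQLite database and aggregate there.
con = sqlite3.connect(":memory:")
sales.to_sql("sales", con, index=False)
sql_result = pd.read_sql_query(
    "SELECT region, SUM(amount) AS total FROM sales "
    "GROUP BY region HAVING SUM(amount) > 100 ORDER BY total DESC",
    con,
)
con.close()

# pandas: the same GROUP BY / HAVING / ORDER BY expressed as method calls.
pandas_result = (
    sales.groupby("region", as_index=False)["amount"].sum()
    .rename(columns={"amount": "total"})
    .query("total > 100")
    .sort_values("total", ascending=False)
    .reset_index(drop=True)
)
```

Both produce north (400) followed by south (350). The SQL version pushes the work to the database, while the pandas version works on data already in memory.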

Object-Oriented Programming (OOP) in Python

Demystifying classes, objects, inheritance, and more. Object-oriented programming is a method of organizing a program by grouping related properties and behaviors into individual objects. The basic building blocks of OOP are objects and classes. A class is a code template for creating objects; we can think of it as a blueprint. It describes …
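The blueprint idea is easiest to see in a small example. The sketch below covers a class, an object created from it, and a subclass that inherits from it. The Animal and Dog names are hypothetical and not taken from the article.

```python
class Animal:
    """A class is the blueprint; each instance is an object built from it."""

    def __init__(self, name: str, sound: str) -> None:
        # Properties (attributes) stored on each object.
        self.name = name
        self.sound = sound

    def speak(self) -> str:
        # Behavior (method) shared by all objects of this class.
        return f"{self.name} says {self.sound}"


class Dog(Animal):
    """Inheritance: Dog reuses Animal's attributes and methods."""

    def __init__(self, name: str) -> None:
        super().__init__(name, sound="woof")

    def fetch(self) -> str:
        # Behavior added only in the subclass.
        return f"{self.name} fetches the ball"


rex = Dog("Rex")
print(rex.speak())              # Rex says woof
print(rex.fetch())              # Rex fetches the ball
print(isinstance(rex, Animal))  # True
```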

How to Visualize the Rest of Your Life

A step-by-step tutorial to create a data viz chart of your life in weeks using Python and Altair. I try not to be too philosophical in this article. However, I recently saw someone showing a poster that depicted his life in weeks. While all weeks in the past were visualized …

How to use image preprocessing to improve the accuracy of Tesseract

Applying computer vision techniques to sharpen accuracy. Previously, in “How to get started with Tesseract”, I gave you a practical quick-start tutorial on Tesseract using Python. It is a pretty simple overview, but it should help you get started with Tesseract and clear some hurdles I faced when I was in your shoes. Now, I’m …

7 Best UI Graphics Tools For Python Developers With Starter Codes

Python is such a versatile language that it can accomplish most tasks that other programming languages are designed for. Although Python is used most often for applications and projects related to artificial intelligence, data science, data visualization, data analytics, and similar work, we are by no means limited to these areas. You can …

Ultra Concise Python Code — Shaving Code and Finding Better Mental Health

I would call this razor-fine code, but “fine” makes it ambiguous what we’re talking about here. It’s tight Python code. Its job is to express a lot in a few lines. But whether it’s fine or good is your call. It’s your mental health call. Let us set the stage: when reviewers have had the misfortune to inspect my …

Bayesian Structural Time-Series Interruption Method

A Bayesian approach to difference-in-differences. When dealing with real-world data, it is extremely rare to find clean, isolated, and controlled lab-like datasets. We often find that the classical statistical methods we all learn simply do not work: their assumptions are too unrealistic or rigid, and the …

Python for Excel Users — Part 1

Getting started with handling data using pandas. This is part one of a tutorial series for everyone who wants to start working in Python instead of Excel, e.g. to automate tasks or improve speed and scalability, or who is simply curious. You’re in the right spot if you use Excel to combine …

Extracting Solar Power Potential Using Global Solar Atlas

How to use GIS data layers provided by the Global Solar Atlas (GSA) to extract photovoltaic potential for any region. (Image prepared by Solargis for The World Bank, provided under the CC BY 4.0 license.) With increasing instances of wildfire, flooding, global warming, and other natural disasters, the push toward renewable sources of energy is getting stronger. Many …

How to Speed up Python Data Pipelines up to 91X?

The frustrating thing about being a data scientist is waiting for big-data pipelines to finish. Although Python is the romantic language of data scientists, it isn’t the fastest. This scripting language is interpreted at run time, which makes it slow and makes parallel execution hard. Sadly, not every data scientist is an expert in C++. …

How To Simulate Traffic On Urban Networks Using SUMO

Understanding, predicting, and ultimately reducing traffic congestion in urban networks is a complex problem. Even understanding how congestion emerges in the simplest case, a single-lane road, is challenging. Simulation of Urban Mobility (SUMO) is an open-source platform that enables simulation of traffic flows in complex environments. …

Forecasting the 2020 US Election Using Multilevel Regression with Post-stratification

The most commonly used method for estimating state-level opinion is called disaggregation. The process is simple and easy to implement: after combining a set of national polls, you calculate the opinion percentages disaggregated by state. The problem with disaggregation is that it requires a large number of …

Getting started with NLP in Python

First, we need a dataset to work with, and Kaggle is where we turned. Kaggle hosts many competitions that aim to challenge any budding data scientist. We will review the datasets provided in the CommonLit Readability competition. CommonLit provided the Kaggle community with the opportunity to develop algorithms that …

How to learn Matlab

It is just my own opinion, but I strongly believe that the first step in learning a language should always be getting familiar with its data types. The least important step is memorizing a variety of functions. None of us can know every single function, but thanks to the documentation and our “best …

Side-by-side comparison of strings in Python

Implementation of a tool that compares texts side by side with Python, giving a better overview of the differences. I am currently working on a privacy filter for text in Python. During development, I ran into a problem I often face: how to quickly compare two strings and easily evaluate the differences. …
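The article's own tool is not shown in this excerpt. As a stand-in, here is a minimal sketch of a side-by-side, word-level comparison built on the standard library's difflib.SequenceMatcher. The function name side_by_side, the row markers, and the example strings are all hypothetical.

```python
import difflib


def side_by_side(a: str, b: str, width: int = 30) -> str:
    """Return a two-column, word-level comparison of strings a and b.

    Each row starts with a marker: '  ' (equal), '- ' (only in a),
    '+ ' (only in b) or '~ ' (replaced).
    """
    a_words, b_words = a.split(), b.split()
    matcher = difflib.SequenceMatcher(None, a_words, b_words)
    markers = {"equal": "  ", "delete": "- ", "insert": "+ ", "replace": "~ "}
    rows = []
    # Each opcode describes how a_words[i1:i2] maps onto b_words[j1:j2].
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        left = " ".join(a_words[i1:i2])
        right = " ".join(b_words[j1:j2])
        rows.append(f"{markers[tag]}{left:<{width}} | {right}")
    return "\n".join(rows)


# E.g. checking what a (hypothetical) privacy filter replaced.
print(side_by_side("My name is John Smith", "My name is [NAME]"))
```

Comparing words rather than characters keeps each row readable for prose. If character-level precision is needed, you can pass the raw strings to SequenceMatcher instead.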

Create “Weather-Proof” Validations for your Time Series Forecasting Model

A guide to building the ultimate time series forecast using Python. TIME TRAVEL! It is time to take our explorations to the next level and work on predicting outcomes that depend on time. A time series is a sequence of data points recorded at intervals of time. These intervals …
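The excerpt does not show the article's validation scheme. A common way to validate time-dependent models is walk-forward (expanding-window) validation, where every test window lies strictly after its training data. Here is a minimal sketch of that general technique. The function name and parameters are our own; scikit-learn's TimeSeriesSplit offers a similar ready-made splitter.

```python
from collections.abc import Iterator


def expanding_window_splits(
    n_samples: int, n_splits: int, test_size: int
) -> Iterator[tuple[range, range]]:
    """Yield (train_indices, test_indices) for walk-forward validation.

    Each fold trains on everything before its test window, so the model
    never sees observations from the future.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        yield range(0, test_start), range(test_start, test_start + test_size)


for train, test in expanding_window_splits(n_samples=10, n_splits=3, test_size=2):
    print(list(train), "->", list(test))
```

Unlike a shuffled k-fold split, this never trains on observations that come after the test window. That is the leakage a time series validation scheme has to avoid.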

4 Pandas Functions For Index Manipulation

Reindex: the reindex function can be used to assign a new column or row index to a DataFrame. For example, we can change the order of the columns as follows: df.reindex(columns=["C", "A", "B", "Date"]). If an item in the new index is not present in the current DataFrame, it is filled with …
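Here is a small runnable version of the reindex call above, with a made-up DataFrame containing the columns it mentions. It also shows pandas' default behavior for labels that do not exist yet (filled with NaN) and the fill_value parameter that overrides it.

```python
import pandas as pd

# Made-up data with the columns used in the reindex call above.
df = pd.DataFrame({
    "A": [1, 2],
    "B": [3, 4],
    "C": [5, 6],
    "Date": pd.to_datetime(["2021-01-01", "2021-01-02"]),
})

# Reorder the columns; reindex returns a new DataFrame and leaves df as is.
reordered = df.reindex(columns=["C", "A", "B", "Date"])

# A label that does not exist yet ("D") is added and filled with NaN.
extended = df.reindex(columns=["A", "D"])

# fill_value replaces the default NaN for the new labels.
filled = df.reindex(columns=["A", "D"], fill_value=0)
```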

Kats: a Generalizable Framework to Analyze Time Series Data in Python

Now let’s analyze the count of Stack Overflow questions related to Python. The data is split into a training set and a test set to evaluate the forecast. Start by constructing a time series object, using time_col_name='month' to specify the time column. To plot the data, call the plot method. Cool! It looks like …

Frequentist vs Bayesian Statistics

A practical introduction to parameter estimation with Python. Parameter estimation is a critical component of statistical inference, data science, and machine learning workflows. Though this topic can be complex, we offer an introduction to the process by comparing two approaches, with some theory and some code. Whether it be an outcome that is yet …
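To see how the two approaches can give different answers from the same data, consider estimating a coin's probability of heads. This is a minimal sketch with assumptions of our own: a coin-flip example, a Beta prior, and exact fractions for readability. It is not the article's code.

```python
from fractions import Fraction


def frequentist_estimate(heads: int, flips: int) -> Fraction:
    """Maximum-likelihood estimate of the probability of heads."""
    return Fraction(heads, flips)


def bayesian_posterior_mean(heads: int, flips: int, a: int = 1, b: int = 1) -> Fraction:
    """Posterior mean under a Beta(a, b) prior (Beta-Binomial conjugacy).

    The posterior is Beta(a + heads, b + flips - heads), whose mean is
    (a + heads) / (a + b + flips). a = b = 1 is a uniform prior.
    """
    return Fraction(a + heads, a + b + flips)


heads, flips = 7, 10
print(frequentist_estimate(heads, flips))     # 7/10
print(bayesian_posterior_mean(heads, flips))  # 2/3
```

The frequentist estimate uses only the data. The Bayesian posterior mean is pulled toward the prior's mean of 1/2, and the pull weakens as more flips are observed.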

Keep your Notebooks Consistent with JupyterLab Templates

Create a directory where you will store your notebook templates. For instance, my templates are stored in ~/.jupyter/templates. Create the following file if it does not yet exist: ~/.jupyter/jupyter_notebook_config.py. Add the following line to this file. This tells JupyterLab the full path to your template directory. This must be the full path; do not use the …

Python is Perfect — Why Anti-Python Developers Won’t Give Up.

To start off, here’s a Quora post that made me chuckle. Apparently, some people don’t and won’t take Python as seriously as they should. Interesting! Anyway, I personally think that what is preventing Python from mass adoption is the mindset with which people approach programming. In other words, the problem is not …

Chi-square Test for Independence

Using the pingouin library to implement chi-square analysis. Data scientists sometimes need to examine whether one categorical variable is related to another in the same population. If the data is continuous, one can simply calculate the correlation between the variables and determine whether they are highly correlated depending on the …
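Libraries such as pingouin wrap this test in a single call, so the arithmetic stays hidden. To make it visible, here is a minimal pure-Python sketch of Pearson's chi-square statistic for a contingency table. The function name and the example counts are hypothetical. This version applies no continuity correction, so its result can differ slightly from library output that applies Yates' correction to 2×2 tables.

```python
def chi_square_statistic(table: list[list[float]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    Expected count for cell (i, j) = row_total_i * col_total_j / grand_total,
    i.e. what we would see if the two variables were independent.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical 2x2 table: rows = group, columns = answer (yes / no).
stat, dof = chi_square_statistic([[10, 20], [20, 10]])
print(round(stat, 3), dof)  # 6.667 1
```

With one degree of freedom, the 5% critical value is about 3.841. A statistic of 6.667 therefore suggests that the two variables are not independent at that level.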

Exploring Disruptions in Real-Estate During Covid Using PyCaret’s Rapid Modeling Pipeline

The real-estate market has had a wild year, as indicated by this plot I made of Case-Shiller housing price indices from a bunch of different markets. Check out that spike beginning about halfway into 2020! PyCaret is an excellent library for using machine learning to explore disruptions in business markets such as …

Scrape Data from PDF Files Using Python

You want to make friends with tabula-py and Pandas. Data science professionals deal with data in all shapes and forms. Data could be stored in popular SQL databases, such as PostgreSQL or MySQL, or in an old-fashioned Excel spreadsheet. Sometimes data might also be saved in an unconventional format, such as PDF. In this article, I …

Regex essential for NLP

In NLP, we must preprocess our text according to the task at hand. In some cases we need to remove all punctuation from a string, but in a task like sentiment analysis it is important to hold on to some punctuation marks, like ‘!’, that express strong sentiment. If we want to remove all punctuation, we can …
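Both cases can be handled with Python's re module. Here is a minimal sketch that builds a character class from string.punctuation, once for all marks and once for all marks except ‘!’. The sample sentence is made up, and the excerpt does not show which patterns the article itself uses.

```python
import re
import string

text = "Wow!!! This movie was great, really great... I loved it!"

# Remove all punctuation. re.escape keeps characters such as "]", "\\" and
# "-" literal inside the character class.
all_punct = re.compile(f"[{re.escape(string.punctuation)}]")
no_punct = all_punct.sub("", text)

# Keep "!" (a strong-sentiment signal) but drop every other punctuation mark.
keep_bang = re.compile(f"[{re.escape(string.punctuation.replace('!', ''))}]")
sentiment_ready = keep_bang.sub("", text)

print(no_punct)         # Wow This movie was great really great I loved it
print(sentiment_ready)  # Wow!!! This movie was great really great I loved it!
```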

How to build a Digital Twin

Python implementation of a digital twin for a Li-ion battery. In this tutorial, we will show how to create a simple but functional Digital Twin in Python. A Li-ion battery will be our physical asset. This Digital Twin will enable us to analyze and predict …

Creating Joy Plots Using JoyPy

Using JoyPy to create a series of stacked histograms as joy plots. Visualization is a core part of finding insights and can be used for storytelling. While creating a visualization, we need to think about which plot to use, which features to consider, what story will come out of it, and whether it helps with root cause analysis. Have …

Forms, Files, Static and Templates in FastAPI

Note how we create a Jinja2Templates instance by passing in the directory name (templates) and how we mount the static directory by providing the path and name inside the app.mount() function. Because we want to serve the form.html file, we use an app.get decorator. Make sure to set the response_class parameter to HTMLResponse. It …

Solving The Probabilistic Deutsch and Jozsa Quantum Algorithm

Concluding a series of posts on the famous Deutsch and Jozsa quantum algorithm. Do you want to get started with Quantum Machine Learning? Have a look at Hands-On Quantum Machine Learning With Python. In the previous post, we developed a probabilistic version of Deutsch and Jozsa’s quantum algorithm. Deutsch and Jozsa’s original …

Shell scripts for Data Science in Python

One lesson I learned the hard way is that boolean values can be supplied in two ways: via an argument with type=bool or via an argument with action="store_true". The choice changes the behavior drastically. Let’s look at the following two cases: $ python3 run.py --name Louis --age 27 --alive --debug >>> Namespace(name='Louis', age=27, …
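The difference comes from what argparse does with the raw string. With type=bool, it calls bool() on the string, and any non-empty string (including "False") is truthy. With action="store_true", the option is a flag that takes no value. Here is a minimal sketch reusing the option names from the command line above. The parser setup and the "--alive False" call are hypothetical, since the article's run.py is not shown.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--name", type=str)
parser.add_argument("--age", type=int)
# Pitfall: type=bool applies bool() to the raw string, so "False" -> True.
parser.add_argument("--alive", type=bool, default=False)
# store_true: a flag with no value; present -> True, absent -> False.
parser.add_argument("--debug", action="store_true")

# An explicit argument list stands in for the real command line here.
args = parser.parse_args(["--name", "Louis", "--age", "27", "--alive", "False"])
print(args.alive, args.debug)  # True False
```

Because of this pitfall, store_true (or store_false) is usually the safer choice for on/off switches.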

Detection of vertebral column pathologies using decision trees

Decision trees are a supervised machine learning algorithm based on a sequence of hierarchical questions. These questions divide the feature space into multiple non-overlapping regions, and each region gets its own prediction of the outcome. The response can be discrete (a class) or continuous (a real number). Decision trees are made up of three elements: the nodes, the branches, and …
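How does a tree choose each question? One common criterion (and scikit-learn's default for classification) is Gini impurity. Here is a minimal sketch that finds the single best "x <= t?" question on one feature. The function names, the feature values, and the "normal"/"abnormal" labels are hypothetical, and this is not the article's implementation.

```python
def gini(labels: list[str]) -> float:
    """Gini impurity: 1 - sum over classes of p_k ** 2 (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: dict[str, int] = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(xs: list[float], ys: list[str]) -> tuple[float, float]:
    """Find the question "x <= t?" that minimises the weighted Gini impurity.

    Returns (threshold, weighted_impurity); candidate thresholds are the
    observed feature values.
    """
    best = (float("nan"), float("inf"))
    n = len(xs)
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best


xs = [1, 2, 3, 10, 11, 12]
ys = ["normal"] * 3 + ["abnormal"] * 3
print(best_threshold(xs, ys))  # (3, 0.0)
```

A full tree applies this search recursively to each resulting region, across all features, until a stopping rule such as maximum depth is met.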

Create GitHub’s style contributions plot for your Time Series data

Seaborn is a statistical data visualization library in Python. It is based on matplotlib but has some great default themes and plotting options. Technically, creating a heatmap essentially means replacing the numbers with colors. More precisely, it means plotting the data as a color-encoded matrix. Let’s see how we can achieve this …
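Before plotting, a daily time series has to be reshaped into the weekday-by-week matrix that a GitHub-style chart displays. Here is a minimal pandas sketch of that step using hypothetical daily counts. The excerpt does not show the article's own reshaping code.

```python
import numpy as np
import pandas as pd

# Hypothetical daily counts for eight weeks, starting on Monday 2021-01-04.
days = pd.date_range("2021-01-04", periods=56, freq="D")
counts = pd.Series(np.arange(56) % 5, index=days)

frame = pd.DataFrame({
    "count": counts.values,
    "weekday": days.dayofweek,          # 0 = Monday ... 6 = Sunday
    "week": (days - days[0]).days // 7,  # week number since the first day
})

# Rows = weekdays, columns = weeks: the GitHub contributions layout.
matrix = frame.pivot(index="weekday", columns="week", values="count")
print(matrix.shape)  # (7, 8)
```

The resulting matrix can then be passed to seaborn's heatmap function, for example with `cmap="Greens"`, `square=True`, and a small `linewidths` value to get the separated squares of the GitHub look.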