Geocoding in Python Using the Google Maps API

To continue following this tutorial, we will need three Python libraries: pandas, geopy, and googlemaps. If you don’t have them installed, open “Command Prompt” (on Windows) and install them using the following commands:

pip install pandas
pip install geopy
pip install googlemaps

Import the required libraries: Once the libraries are downloaded, installed, and imported, we can … Read more

Build Your First AWS CDK Project

Infrastructure as Code made easy with Python
(Image from Unsplash by Tezos)
A while ago, I wrote an article about AWS CloudFormation, an Infrastructure as Code tool that makes it easy to define and deploy infrastructure resources in a central YAML or JSON file. This makes it easy for developers to track and monitor any … Read more

Replace Python Lists and Make Your Code Orders of Magnitude Faster!

No fancy libraries and no complicated engineering, just one data structure: hash tables.
(Image by author)
Every beginner programmer loves for-loops because of their utility and how easy they are to understand. Similarly, everyone loves arrays. However, more often than not, we start using arrays for everything without giving it a second thought. We … Read more
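As a rough illustration of the idea in this excerpt, the sketch below counts shared items twice: once with a plain list lookup and once after converting the second list to a `set`, Python's built-in hash table. The function names and sample data are made up for this example and do not come from the article.

```python
def common_items_list(a, b):
    """Count items of `a` that also occur in `b`; each lookup scans the list."""
    return sum(1 for x in a if x in b)


def common_items_set(a, b):
    """Same count, but `b` is turned into a set (hash table) first,
    so each membership test takes roughly constant time on average."""
    b_set = set(b)
    return sum(1 for x in a if x in b_set)


# Hypothetical sample data: even numbers vs. multiples of three.
a = list(range(0, 2000, 2))
b = list(range(0, 2000, 3))

print(common_items_list(a, b))
print(common_items_set(a, b))
```

Both functions return the same count. The difference shows up in running time as the inputs grow, because the list version repeats a linear scan for every lookup.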

ANSYS in a Python Web App, Part 1: Post Processing with PyDPF

The DPF in PyDPF stands for Data Processing Framework and (according to the PyAnsys Documentation) is dedicated to post-processing: DPF is a workflow-based framework which allows simple and/or complex evaluations by chaining operators. The data in DPF is defined based on physics agnostic mathematical quantities described in a self-sufficient entity called field. This allows DPF … Read more

Fuzzy String Matching using Python

The simple ratio approach from the fuzzywuzzy library computes the standard Levenshtein distance similarity ratio between two strings, which is the basis of fuzzy string matching in Python. Let’s say we have two words that are very similar to each other (with some misspelling): Airport and Airprot. By just looking at these, we can tell … Read more
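To make the Airport/Airprot example concrete, here is a minimal pure-Python sketch of the Levenshtein distance. The `similarity_ratio` helper uses one common normalisation (1 minus distance divided by the longer length); that formula is an assumption for illustration and is not necessarily the one fuzzywuzzy applies internally.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string `a` into string `b`."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity_ratio(a, b):
    """Illustrative similarity in [0, 1]; an assumed normalisation."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("Airport", "Airprot"))
print(similarity_ratio("Airport", "Airprot"))
```

The swapped letters count as two edits under the standard Levenshtein definition, because a transposition is not a single operation there.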

How to Explain Decision Trees’ Predictions

We develop an approach to explain why a learned tree model chooses a certain class for a given sample, providing examples in Python.
A decision tree model learned from the Iris dataset, in which flowers are classified into three different Iris species: Setosa, Versicolor and Virginica. The plot is explained below in “Visualizing the Learned … Read more

Combating the curse of dimensionality

Feature selection with a variance threshold
As the name suggests, variance represents the variability in a dataset, indicating how spread out the distribution is. If the variance of a feature is very low, the feature changes little from sample to sample and is therefore less useful for telling samples apart. Let’s suppose we have a dog health dataset with different features indicating a dog’s weight, … Read more
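The idea can be sketched in a few lines of standard-library Python. The column names and values below are hypothetical stand-ins for the dog health dataset, and keeping only features whose variance is strictly above the threshold is an assumption about how the cut-off is applied.

```python
from statistics import pvariance


def select_by_variance(columns, threshold):
    """Keep only the features whose population variance exceeds `threshold`."""
    return {
        name: values
        for name, values in columns.items()
        if pvariance(values) > threshold
    }


# Hypothetical data: `num_legs` is constant, so its variance is zero.
data = {
    "weight_kg": [8.0, 12.5, 30.0, 22.0],
    "num_legs": [4, 4, 4, 4],
    "age_years": [2, 7, 4, 10],
}

selected = select_by_variance(data, threshold=0.0)
print(sorted(selected))
```

Note that variance depends on the scale of each feature, so in practice features are often put on a comparable scale before a single threshold is applied.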

Finding distance between two latitudes and longitudes in Python

Earth’s equatorial radius is 6378 km and its polar radius is 6356 km, so Earth is not a perfect sphere. However, assuming a spherical Earth lets us easily find approximate distances, which is satisfactory in some applications. In this section, we will use the haversine formula to find the spherical distance between two locations from their … Read more
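A minimal sketch of the haversine formula follows. It assumes a mean Earth radius of 6371 km, a commonly used value that lies between the two radii quoted above; the function and constant names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Two points on the equator, half a world apart: half the circumference.
print(haversine_km(0.0, 0.0, 0.0, 180.0))
```

Because the formula treats Earth as a sphere, results are approximate; the error is small for many applications but not zero.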

The Fourier Transform (4): Putting the FFT to work

So far, we’ve treated the Fourier transform as a mathematical black-box operation. Similarly, we’ll now introduce the inverse Fourier transform without dissecting the implementation details. The inverse fast Fourier transform (IFFT) lets us reverse the FFT! The inverse Fourier transform is the mathematical operation that maps our function in the frequency domain to a function … Read more
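To show the round trip without any library, the sketch below uses a naive discrete Fourier transform and its inverse. This is the slow O(n²) textbook definition rather than a fast (FFT) algorithm, and the sample signal is made up for illustration.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform: time domain -> frequency domain."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse transform: frequency domain -> time domain."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


signal = [0.0, 1.0, 0.0, -1.0]   # one cycle of a sampled sine wave
recovered = idft(dft(signal))    # should match `signal` up to rounding
print([round(value.real, 6) for value in recovered])
```

The recovered values are complex numbers whose imaginary parts are zero up to floating-point rounding, which is why only the real parts are printed.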

Optimizing Patient Scheduling

Transforming Mixed Integer Programming into Python
Mathematically, this is an NP-hard problem that uses Mixed Integer Programming. Integer programming attempts to find optimal arrangements for given variables that satisfy constraints (really functions) while also either maximizing or minimizing some objective function. In our case, the objective is to shorten the total workday. I’ll also … Read more

6 Common Mistakes Machine Learning Beginners Make and How to Avoid Them

Mistakes I’ve made on my journey and how you can avoid being like me when starting out
(Photo from Unsplash by Lala Azizli)
Machine learning is a hot topic that has been growing rapidly in popularity. It’s easy to understand why: AI and machine learning are taking over! However, it can be overwhelming for those … Read more

Train a neural net to predict continuous properties from an image in 40 lines of code with PyTorch.

Now that we can load our data, it’s time to load the neural net:

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
Net = torchvision.models.resnet18(pretrained=True)  # Load net
Net.fc = torch.nn.Linear(in_features=512, out_features=1, bias=True)
Net = Net.to(device)
optimizer = torch.optim.Adam(params=Net.parameters(), lr=Learning_Rate)

The first part identifies whether the computer has a CUDA-capable GPU or only a CPU. If there is a CUDA GPU, the training will be … Read more

Deep Feed Forward Neural Networks and the Advantage of ReLU Activation Function

It’s now time to have some fun and develop our own Deep Neural Network capable of recognizing MNIST digits.
Setup
We’ll need the following data and libraries: Let’s import all the libraries: The above code prints package versions used in this example:

Tensorflow/Keras: 2.7.0
pandas: 1.3.4
numpy: 1.21.4
sklearn: 1.0.1
matplotlib: 3.5.1

Next, we ingest MNIST handwritten digits data … Read more

Reshaping a DataFrame using Pandas melt()

With the knowledge we have learned so far, let’s take a look at a real-world problem: the COVID-19 time-series data available from the Johns Hopkins University CSSE GitHub. (Image by author) There are two problems: Confirmed, deaths, and recovered cases are kept in different CSV files, which makes it hard to plot them in one graph. Dates … Read more
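As a small sketch of what `melt()` does, the example below reshapes a wide table into a long one. The column names and numbers are invented for illustration and are not taken from the Johns Hopkins files; it assumes pandas is installed.

```python
import pandas as pd

# Hypothetical wide table: one row per country, one column per date.
wide = pd.DataFrame({
    "Country": ["A", "B"],
    "1/22/20": [1, 0],
    "1/23/20": [3, 2],
})

# melt() keeps the id column and stacks the remaining columns into
# (variable, value) pairs, giving one row per country and date.
long = wide.melt(id_vars="Country", var_name="Date", value_name="Confirmed")

print(long)
```

The long format has one observation per row, which is the shape most plotting and grouping functions expect.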

Merit Order and Marginal Abatement Cost Curve in Python

An electricity authority or a utility in a country could have competing power plants of different types in its portfolio to offer electricity output to retailers. This is known as a wholesale electricity market (also known as a spot market). The cost of generating electricity differs according to the type of power plant. For example, there … Read more
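A merit order ranks power plants from the cheapest to the most expensive marginal cost and stacks up their capacity. The sketch below shows that ranking on made-up plants; the capacities, costs, and the rule that the last plant needed to meet demand sets the price are simplifying assumptions for illustration.

```python
# Hypothetical plants: (name, capacity in MW, marginal cost per MWh).
plants = [
    ("gas", 150, 60.0),
    ("solar", 80, 0.0),
    ("coal", 200, 35.0),
    ("hydro", 120, 5.0),
]


def merit_order(plants):
    """Sort plants by marginal cost and accumulate their capacity."""
    curve = []
    cumulative_mw = 0
    for name, capacity_mw, cost in sorted(plants, key=lambda p: p[2]):
        cumulative_mw += capacity_mw
        curve.append((name, cumulative_mw, cost))
    return curve


def clearing_price(plants, demand_mw):
    """Marginal cost of the last plant needed to cover `demand_mw`."""
    for _name, cumulative_mw, cost in merit_order(plants):
        if cumulative_mw >= demand_mw:
            return cost
    raise ValueError("demand exceeds total available capacity")


for name, cumulative_mw, cost in merit_order(plants):
    print(name, cumulative_mw, cost)
print(clearing_price(plants, demand_mw=250))
```

Plotting cumulative capacity on the x-axis against marginal cost on the y-axis gives the familiar stepped merit order curve.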

Cosine Similarity Explained using Python — Machine Learning — PyShark

Cosine similarity is a measure of similarity between two non-zero vectors. It is calculated as the cosine of the angle between these vectors (which is the same as the inner product of the vectors after each is normalized to unit length). Well, that sounded like a lot of technical information that may be new or difficult for the learner. We will break it down part by part along … Read more
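In code, the definition comes down to a dot product divided by the product of the two vector lengths. The following is a minimal standard-library sketch; the function name and the sample vectors are illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # same direction
print(cosine_similarity([1, 0], [0, 1]))        # perpendicular
```

Vectors pointing in the same direction score close to 1 regardless of their length, while perpendicular vectors score 0.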

How To Build An AutoML API

Simple guide to building reusable ML classes
(Image from Unsplash by Scott Graham)
There’s been a lot of interest in AutoML recently. Ranging from open-source projects to scalable algorithms in the cloud, there’s been a surge in projects that make ML more accessible for non-technical users. Examples of AutoML in the cloud include SageMaker Canvas … Read more

An End-to-End Machine Learning Project — Heart Failure Prediction Part 1

Data exploration, model training, validation and storage
In this series, I will be walking through an end-to-end machine learning project which will cover everything from data exploration to model deployment via a web application. My goal is to provide general insight into the different components involved in getting a model to production; this series is … Read more

10 Features Your Streamlit ML App Can’t Do Without — Implemented

Add JupyterLab, session management, multi-page, file explorer, conda envs, parallel processing, and deployment to your app
Much has been written about Streamlit killer data apps, and it is no surprise to see Streamlit is the fastest growing platform in this field.
(Image by Star history, edited by author)
However, developing an object segmentation app … Read more

Intro to Comparing and Analyzing Multiple Unevenly Spaced Time-Series Signals

Methods to analyze multiple time-series signals that occur over the same time period but have different timestamps and time spacings
(Photo by Nathan Dumlao on Unsplash)
Say we have the following scenario: two different sensors are measuring the current and voltage across a battery pack. Now, we want to do some … Read more

Bayesian Linear Regression with Bambi

Leverage Bayesian inference to get a distribution of your predictions
When fitting a regression line to sample data, you might get a regression line like the one below:
(Image by Author)
Instead of getting one single regression line, wouldn’t it be nice if you could get a distribution of predictions?
(Image by Author)
That is when … Read more

8 Guidelines to Create Professional Data Science Notebooks

Sometimes during the analysis, you add code to cells and execute them, and then, after that, you modify and run another cell that comes before them. This may obviously cause some inconsistencies. For example, using variables defined in cells below the current cell will produce errors. See the straightforward example below, where we create a … Read more

Forecasting Chess Elo On A Time Series

Using the Glicko rating system to make predictions about your future chess rating.
(Photo by Hassan Pasha on Unsplash)
Not long ago, I came across this video[1] by 1littlecoder showing how you can use berserk, the Python client for the Lichess API, to extract information on your chess games. As a regular player on Lichess, … Read more

How to Deploy Machine Learning Models

The easiest way to deploy machine learning models on the web
Introduction
I will introduce the easiest way to deploy machine learning applications on the web. In the previous notebooks, I built machine learning models using linear and tree-based models. It turned out that the hyperparameter-tuned XGBoost model performed best. That is why today … Read more

Nearest Neighbor Analysis for Large Datasets in Helsinki Region

In the last few months, I have been part of the well-known Automating GIS course at the University of Helsinki as a Research Assistant. It has been a remarkable experience to share my tips for automating GIS processes with students as they worked on their tasks. I am glad to see how geographers are taking over the GIS automation … Read more

Reinventing adversarial machine learning: adversarial ML from scratch

Bear with me! I think this might be a half-decent motivation! I want to explain why I think adversarial ML is so interesting. To give it context, let’s start with a ludicrous party question: is a Pop-Tart a ravioli? … The metaphorical question Let’s unpack why the question makes for a fun debate among friends. … Read more

5 Advanced Tips on Python Functions

Notes from Fluent Python by Luciano Ramalho (Chapters 5–6).
Did you learn to code in Java and then move to Python? If you started with OOP but now work in Python, this post is for you.
(Photo by Michele Purin on Unsplash)
In chapters 5–6 of Fluent Python, Luciano Ramalho discusses how traditional object-oriented paradigms are not … Read more

5 Examples to Learn Date and Time Manipulation with Python Pandas

5. Difference
The difference between two dates or times can be of great importance in some tasks. For instance, we might need to calculate the time between consecutive measurements in a process. The subtraction operation with two datetime objects gives us the difference in days.
df["Diff"] = df["Date2"] - df["Date"]
(image by author)
The data … Read more
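Here is a self-contained sketch of that subtraction. The column names follow the excerpt, while the dates themselves and the extra `DiffDays` column are made up for illustration; it assumes pandas is installed.

```python
import pandas as pd

# Hypothetical sample data with two datetime columns.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2022-01-01", "2022-01-10"]),
    "Date2": pd.to_datetime(["2022-01-05", "2022-02-01"]),
})

# Subtracting two datetime columns yields a timedelta column.
df["Diff"] = df["Date2"] - df["Date"]

# The .dt accessor exposes the whole-day part as plain integers.
df["DiffDays"] = df["Diff"].dt.days

print(df)
```

Keeping the timedelta column is useful when hours or minutes matter, while the integer day count is handy for filtering and arithmetic.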

Introducing Anomaly/Outlier Detection in Python with PyOD

Anomaly detection goes by many names: outlier detection, outlier analysis, anomaly analysis, and novelty detection. Wikipedia concisely describes anomaly detection as follows: “Anomaly detection is the identification of rare items, events or observations which raise suspicions by differing significantly from the majority of the data.” Let’s try to unpack the above statements. … Read more

You Probably Do Not Need that for loop

How a declarative mindset will help you write much better data science code
Whenever I see a for loop in a piece of data science Python code, my first response is “that is probably not needed”. The for loop, however, is just one example of a deeper philosophical difference between a more traditional imperative … Read more

Plotly for Hierarchical Data Visualization: Treemaps and More

The treemap we just created looks good! It seems pretty straightforward and convenient to use Plotly to create a treemap with just a few lines of code! Notice that in the treemap above, some industry sectors are squeezed into really small rectangles and the labels are illegible because of the size of the rectangles. Let’s … Read more

Books for Learning NLP

(Photo by Laura Kapfer on Unsplash)
A former colleague recently reached out and asked for advice on how best to learn natural language processing (NLP) from scratch. “EXCELLENT!” I thought; I had already answered the exact same question for another former colleague so it should have been as simple as dusting off the previous reply, … Read more

A beginner’s guide to OCTIS vol. 2: Optimizing Topic Models

(Photo by Joel Filipe on Unsplash)
In a previous post, I introduced the Python package OCTIS (Optimizing and Comparing Topic Models Is Simple); I demonstrated how to get started and its features. The package allows for simple topic model optimization and comparison (as the name suggests). This post focuses on the first letter of the … Read more

The Fourier Transform (3): Magnitude and phase encoding in complex data

Complex number basics
Later in this series, we’ll get a touch more technical in our treatment of complex numbers. For now, though, we only need to recognize complex numbers as composites of two parts: a real component and an imaginary component. We’ll come back to some of the magical properties of complex numbers (and introduce … Read more
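Python has complex numbers built in, so the two components, as well as the magnitude and phase from the title, can be inspected directly. The value `3 + 4j` is just a convenient example chosen for this sketch.

```python
import cmath

z = complex(3, 4)          # real component 3, imaginary component 4

magnitude = abs(z)         # length of the vector from the origin to z
phase = cmath.phase(z)     # angle in radians, measured from the real axis

# Converting back from polar form recovers the original number
# (up to floating-point rounding).
rebuilt = cmath.rect(magnitude, phase)

print(z.real, z.imag)
print(magnitude, phase)
print(rebuilt)
```

Magnitude and phase describe the same number as the real and imaginary parts, only in polar rather than rectangular coordinates.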

How to Write NumPy Arrays to CSV Files

This post explains how to write NumPy arrays to CSV files. We will look at the syntax for writing different NumPy arrays to CSV, the limitations of writing NumPy arrays to CSV, and alternative ways to save NumPy arrays. Let’s get to it. You can use the np.savetxt() function to save your NumPy array to a … Read more
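A minimal round trip with `np.savetxt()` might look like the sketch below. The array contents, the temporary file location, and the choice of `fmt` are assumptions made for this example; it assumes NumPy is installed.

```python
import os
import tempfile

import numpy as np

arr = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])

# Write to a throwaway directory so the example leaves no files behind
# in the working directory.
path = os.path.join(tempfile.mkdtemp(), "array.csv")

# delimiter="," produces comma-separated values; fmt controls how each
# number is written (two decimal places here).
np.savetxt(path, arr, delimiter=",", fmt="%.2f")

# Read the file back to check that the values survived the round trip.
loaded = np.loadtxt(path, delimiter=",")
print(loaded)
```

Because the values are written as formatted text, precision beyond the chosen format is lost, which is one reason binary formats are sometimes preferred for storing arrays.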

Deploying Docker Containerised ML Models on AWS Elastic Beanstalk

Now that we are all set up, let’s get coding. In general, here are the steps we will take to deploy our model on AWS:
1. Train a RandomForest Classifier.
2. Build a simple Flask app with an exposed API endpoint.
3. Containerise our application using Docker containers.
4. Deploy the containerised application on AWS Elastic Beanstalk.
Serving our model as … Read more