Build A Text Recommendation System with Python

Use NLP semantic similarity to provide the most accurate recommendations. Denise Jans. Natural Language Processing is one of the most exciting fields of Machine Learning. It enables our computers to understand very dense corpora, analyze them, and provide us with the information we are looking for. In this article, we’ll create a recommendation system that acts … Read more

Explore and understand your data with a network of significant associations.

Let’s continue with the Titanic dataset, as it contains a structure that is often seen in real use cases, i.e., the presence of categorical, boolean, and continuous variables per sample. In the previous step we initialized and loaded the Titanic dataset. In this step we will pre-process the 12 input features: typing and one-hot encoding. … Read more

How to do “Limitless” Math in Python

Sounds like a catchy title? Well, what we really meant by that term is arbitrary-precision computation, i.e., breaking away from the restriction of the 32-bit or 64-bit arithmetic that we are normally familiar with. Here is a quick example. This is what you will get as the value for the square root of 2 if you just … Read more
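As a rough illustration of what arbitrary-precision computation means, here is a minimal sketch that compares an ordinary 64-bit float with a higher-precision result. It uses the standard-library decimal module, which is an assumption made for this sketch and not necessarily the library the article goes on to use; the choice of 50 significant digits is arbitrary.

```python
import math
from decimal import Decimal, getcontext

# Ordinary floating point: a 64-bit double carries roughly 15-17 significant digits.
float_root = math.sqrt(2)

# Arbitrary precision: request 50 significant digits (an arbitrary choice for this sketch).
getcontext().prec = 50
precise_root = Decimal(2).sqrt()

print(float_root)
print(precise_root)
```

Raising getcontext().prec gives more digits at the cost of slower arithmetic.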

If You Can Write Functions, You Can Use Dask

I’ve been chatting with many data scientists who’ve heard of Dask, the Python framework for distributed computing, but don’t know where to start. They know that Dask can probably speed up many of their workflows by having them run in parallel across a cluster of machines, but the task of learning a whole new methodology … Read more

Joining Pandas DataFrames

Learn how to merge Pandas DataFrames easily. Photo by CHUTTERSNAP on Unsplash. Very often, your data comes from different sources. To help with your analytics, you often need to combine data from different sources so that you can obtain the data you need. In this article, I will talk about how you can … Read more

Raspberry Pi Gardening: Monitoring a Vegetable Garden using a Raspberry Pi — Part 2: 3D Printing

Part 2 of throwing Raspberry Pis at the pepper plants in my garden: on the topics of 3D printing, more bad solder jobs, I2C, SPI, Python, Go, SQL, and failures in CAD. A journey through the project (by author). In the last iteration of this project, I walked you through my journey of throwing a … Read more

Fast AutoML with FLAML + Ray Tune

Microsoft Researchers have developed FLAML (Fast Lightweight AutoML), which can now utilize Ray Tune for distributed hyperparameter tuning to scale up FLAML’s resource-efficient and easily parallelizable algorithms across a cluster. One of FLAML’s algorithms, CFO, tuning the # of leaves and the # of trees for XGBoost. The two heatmaps show the loss and cost … Read more

Data Visualization In Excel Using Python

Using ExcelWriter for Creating Visualizations in Excel by Python Code. Source: By Author. Excel is widely used for data analysis and has a lot of functionality for analyzing, manipulating, and visualizing data. Using Excel should be one of the main skills required for a Data Analyst, Product Analyst, and Business Analyst. It helps in understanding the … Read more

Hypothesis Testing Made Easy through the easy-ht Python Package

Statistics. Which Hypothesis Test should you use? Pearson or Spearman? T-Test or Z-Test? Chi Square? No problem with easy-ht. Photo by Edge2Edge Media on Unsplash. One of the main difficulties that a new data scientist may encounter concerns the basics of statistics. In particular, it may be difficult for a data scientist to understand which hypothesis test … Read more

Tired of JupyterLab? Try DataSpell — A New Amazing IDE for Data Science

Disclaimer: This is not a sponsored article. I don’t have any affiliation with DataSpell or its creators. The article shows an unbiased overview of the IDE, intending to make data science tools accessible to the broader masses. The data science IDE market isn’t all that saturated. You have Jupyter for maximum interactivity on the one … Read more

The 7 Best Ways to Learn Python Depending On Your Extremely Specific Circumstance

Read this if you’re slightly overwhelmed by the number of Python-learning options out there. Photo by Kamil Zubrzycki from Pexels. Everyone wants to know the best way to learn to code in Python nowadays. It’s a great language, as I’ve written about (extensively) before, with great career prospects and tons of useful features. For as many … Read more

Neural Network for input of variable length using TensorFlow TimeDistributed wrapper

Guide on how to deal with the case in which we have inputs (usually signals) of variable length, using the TensorFlow TimeDistributed wrapper. Contents: Why variable input length?; TensorFlow TimeDistributed Wrapper; Data Generator; References. Have you ever wanted to apply a neural network to your dataset, but the data (signals, time series, texts, etc.) had a … Read more

Python Lambda Functions: Three Practical Examples (Sort, Map, and Apply)

Learn lambda functions through real examples. Photo by Tolga Ulkan on Unsplash. One advanced Python concept is lambda functions, which are anonymous functions defined using the lambda keyword. Lambda functions have the following basic form: lambda params: expression. The keyword lambda signifies that you are defining a lambda function. params refers to the … Read more
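To make the lambda params: expression form concrete, here is a small sketch covering the sort and map cases named in the title. The sample data and variable names are hypothetical and chosen only for illustration; the "apply" case is left out because it depends on pandas.

```python
# A lambda is an anonymous function written as: lambda params: expression
square = lambda x: x * x

# Sort: order (name, age) pairs by age, using a lambda as the key function.
people = [("Ana", 31), ("Bo", 25), ("Cy", 40)]  # hypothetical sample data
by_age = sorted(people, key=lambda person: person[1])

# Map: apply a lambda to every element of an iterable.
doubled = list(map(lambda n: n * 2, [1, 2, 3]))

print(square(4))
print(by_age)
print(doubled)
```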

Creating Charts in Google Slides with Python

Impress your audience leveraging Google’s API and the gslides package. A common gap data scientists run up against is how to programmatically create simple, elegantly formatted, and company-branded visualizations in a slide deck. Leveraging Google APIs and the gslides package, you can easily create charts and tables in Google Slides that will impress your audience, … Read more

Huffman Encoding & Python Implementation

An old but efficient compression technique with Python Implementation. Huffman Encoding is a lossless compression algorithm used to compress data. It was developed by David A. Huffman while he was a Sc.D. student at MIT, and published in the 1952 paper “A Method for the Construction of Minimum-Redundancy Codes”. [1] As it … Read more
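For readers who want a feel for the algorithm before opening the article, the following is a minimal sketch of Huffman coding built on the standard-library heapq module. It is not the article's implementation; the function names huffman_codes, encode, and decode are hypothetical, and the bit strings are kept as Python text for readability instead of being packed into bytes.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return a dict mapping each symbol in `text` to its Huffman bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # Edge case: a single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}

    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The integer tie_breaker keeps tuple comparison from ever reaching the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)

    while len(heap) > 1:
        # Repeatedly merge the two lightest subtrees.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1

    return heap[0][2]


def encode(text, codes):
    """Concatenate the code of every symbol in `text`."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Invert `encode`; this works because Huffman codes are prefix-free."""
    reverse = {code: sym for sym, code in codes.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in reverse:
            out.append(reverse[buffer])
            buffer = ""
    return "".join(out)


message = "abracadabra"
codes = huffman_codes(message)
bits = encode(message, codes)
print(codes)
print(len(bits), "bits instead of", 8 * len(message))
```

Frequent symbols such as "a" end up with shorter codes than rare ones, which is where the compression comes from.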

PyQt & Relational Databases

Easy to use full-featured widget for working with relational database data. Photo by Sigmund on Unsplash. Python is an easy-to-learn and powerful programming language. You can get sophisticated outputs, and compared to other languages, you’ll need to write significantly fewer lines of code. However, when it comes to GUI application development, you can experience difficulties. … Read more

Creating a Community-Specific Reputation Score for Users of Web3 Platform

Using Graph Data Science to compute a reputation score (betweenness centrality) on Mirror user interactions across Twitter, Governance, and Ethereum transactions. This post was first published on, be sure to subscribe there and follow me on Twitter to get my most up-to-date crypto and data science content. Later on, this methodology was used for … Read more

AB testing with Python

1. Designing our experiment. Formulating a hypothesis. First things first, we want to make sure we formulate a hypothesis at the start of our project. This will make sure our interpretation of the results is correct as well as rigorous. Given we don’t know if the new design will perform better or worse (or the … Read more
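Because the excerpt stresses that the new design could perform better or worse, a two-tailed test is the natural fit: the null hypothesis is that both designs convert at the same rate, and the alternative is that the rates differ in either direction. The sketch below shows one common way to evaluate that, a two-proportion z-test written with the standard library only. This is an assumed technique, not necessarily the one the article uses; the function name and the conversion counts are hypothetical.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-tailed z-test for H0: p_a == p_b against Ha: p_a != p_b."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Standard normal CDF evaluated at |z|, then doubled tail area for two tails.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical counts: 120/1000 conversions for control, 150/1000 for the new design.
z, p_value = two_proportion_z_test(120, 1000, 150, 1000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```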

A guide to XGBoost hyperparameters

What is the one machine learning algorithm — if you ask — that consistently gives superior performance in regression and classification? XGBoost it is. It is arguably the most powerful algorithm and is increasingly being used in all industries and in all problem domains —from customer analytics and sales prediction to fraud detection and credit … Read more

Writing your First Distributed Python Application with Ray

Ray makes parallel and distributed computing work more like you would hope (image source). Ray is a fast, simple distributed execution framework that makes it easy to scale your applications and to leverage state-of-the-art machine learning libraries. Using Ray, you can take Python code that runs sequentially and transform it into a … Read more

How to Produce an Animated Bar Plot in Plotly using Python

Wrangle your raw dataset to produce an Animated Bar Plot. Image Courtesy of Author, Stephen Fordham. Plotting Antibiotic prescribing rates in US counties. This tutorial details how to transform raw data into an animated bar plot using the Plotly library in Python. The dataset used in this tutorial is titled: ‘Potentially Avoidable Antibiotic Prescribing observed and … Read more

5 Must-Know SQL Functions For Manipulating Dates

2. Dateadd. The name of this function is even more self-explanatory than the previous one. The dateadd function is used for adding a time or date interval to a date. As always, the syntax is easier to understand with examples. DECLARE @mydate Date; SET @mydate = GETDATE(); SELECT DATEADD(MONTH, 1, @mydate) AS NextMonth; This returns a NextMonth value of 2021-09-22. The first parameter indicates … Read more

Isomap Embedding — An Awesome Approach to Non-linear Dimensionality Reduction

As you can see, Isomap is an Unsupervised Machine Learning technique aimed at Dimensionality Reduction. It differs from a few other techniques in the same category by using a non-linear approach to dimensionality reduction instead of linear mappings used by algorithms such as PCA. We will see how linear vs. non-linear approaches differ in the … Read more

Stop One-Hot Encoding your Time-based Features

Essential guide to feature transformation for cyclic features. Image by Sarah Lötscher from Pixabay. Feature Engineering is an essential component of the data science model development pipeline. A data scientist spends most of their time analyzing and preparing features to train a robust model. A raw dataset consists of various types of features including categorical, … Read more
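One widely used transformation for cyclic features maps each value onto a circle with a sine/cosine pair, so that the end of a cycle sits next to its start. The excerpt does not say which transformation the article recommends, so treat the sketch below as an assumption; the helper name encode_cyclic is hypothetical.

```python
import math


def encode_cyclic(value, period):
    """Map a cyclic value (e.g. hour of day, period=24) to a point on the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


# Hour 23 and hour 0 are neighbours on the clock, and the encoding preserves that:
# their distance equals the distance between hour 0 and hour 1.
late = encode_cyclic(23, 24)
midnight = encode_cyclic(0, 24)
one_am = encode_cyclic(1, 24)

print(math.dist(late, midnight))
print(math.dist(midnight, one_am))
```

With one-hot encoding every pair of hours is equally far apart, so this neighbourhood structure is lost.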

Let’s Talk About Graph Neural Network Python Libraries!

Visualization of Karate Club data with the two factions (Source: Author). Since I want to keep it simple, I will use the popular Zachary’s Karate Club graph dataset. Here, the nodes represent 34 students who were involved in the club, and the links represent 78 different interactions between pairs of members outside the club. There … Read more

Three Tricks on Python Functions that You Should Know

Python Basics. A quick overview of some tips which may improve your programming skills: nested functions, variable parameters, and lambda functions. Photo by Shahadat Rahman on Unsplash. This tutorial covers the following three advanced programming tricks on Python functions: nested functions, variable parameters, and lambda functions. A nested function is a function within another function. Due … Read more
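The three tricks can be sketched in a few lines. The function names below (make_multiplier, describe) are hypothetical and chosen only for illustration; they are not taken from the tutorial.

```python
def make_multiplier(factor):
    """Nested function: `multiply` is defined inside `make_multiplier`."""
    def multiply(value):
        # The inner function can read `factor` from the enclosing scope.
        return value * factor
    return multiply


def describe(*args, **kwargs):
    """Variable parameters: accept any number of positional and keyword arguments."""
    return len(args), sorted(kwargs)


# Lambda function: a one-expression anonymous function bound to a name.
add = lambda a, b: a + b

triple = make_multiplier(3)
print(triple(5))
print(describe(1, 2, y=0, x=0))
print(add(2, 3))
```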

How to Use Monte Carlo Simulation to Help Decision Making

Using Monte Carlo Simulation to Make Real Life Decisions. Recently, I was faced with a very difficult decision to make. I had to choose between various job offers that were all interesting, for different reasons. After a couple of sleepless nights, I realized one thing: why not use the tools at my disposal … Read more

Building a Sentiment Classifier using spaCy 3.0 Transformers

Train the model. The next step is to train the model. In version 3.0, spaCy provides a command-line interface to perform training. For that, we need to download the base configuration file from this site. Before downloading, we need to select textcat under components as this is a classification problem. I selected hardware GPU as I used … Read more

Setting up Conda to Run PyData Stack on your Apple M1 Silicon Machine

The Apple M1 silicon chip is a fantastic innovation: it’s low-power, high-efficiency, has great battery life, and is inexpensive. However, if you’re running the PyData Stack with default builds, you may encounter some strange behaviour. When you install a library written for Intel CPU architecture, the code will be run through the Rosetta emulator. This … Read more

TensorFlow Decision Forests — Train your favorite tree-based models using Keras

Photo by veeterzy on Unsplash. Yes, you read that right: the same API for both Neural Networks and tree-based models! In this article, I will briefly describe what decision forests are and how to train tree-based models (such as Random Forest or Gradient Boosted Trees) using the same Keras API as you would normally … Read more

When You Should Not Use Accuracy to Evaluate Your Machine Learning Model

and what the alternatives are. Photo by Michal Matlon on Unsplash. Creating a machine learning model is an iterative process. You will need to do several iterations to have a robust and decent model. Furthermore, you may need to update a model after it is deployed into production. A significant part of this process is … Read more

Reading Python Encrypted Data in Node.js

Within Node.js we use the ‘crypto’ library. This can be installed globally with npm i -g crypto. With this, there is a range of encryption algorithms available. In this example, we have chosen AES-256-CBC (Cipher Block Chaining) block cipher encryption, a symmetric encryption algorithm, which means that the same key can be used … Read more

An Easy Beginner’s Guide to Git Part 2

Basics of Git for Reviewing Code and Undoing Changes. Photo by Roman Synkevych on Unsplash. Now that you’ve learned the basics of how to use Git from my article An Easy Beginner’s Guide to Git Part 1, let’s do some more exercises to become comfortable with using and navigating Git. You’ve seen how Git can … Read more

Gradient Boosted Decision Trees explained with a real-life example and some Python code

Gradient Boosting algorithms tackle one of the biggest problems in Machine Learning: bias. The Decision Tree is a simple and flexible algorithm, so simple that it can underfit the data. An underfit Decision Tree has low depth, meaning it splits the dataset only a few times in an attempt to separate the data. … Read more

Still using the average?

One of the most straightforward methods of assessing robustness is to compare the so-called standard error (SE) for various estimators. The SE is a measure of how much an estimator varies across random samples from our population. If our estimator varies wildly from sample to sample, it has a high standard error. On the other … Read more
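The idea of a standard error can be checked empirically: draw many random samples, compute the estimator on each, and look at how much the results spread. The sketch below does this for the mean and the median. The standard normal population, the sample size, and the number of repetitions are all assumptions made for illustration, and the function name standard_error is hypothetical.

```python
import random
import statistics


def standard_error(estimator, sample_size=50, n_samples=2000, seed=0):
    """Approximate an estimator's standard error by repeated sampling.

    Draws `n_samples` random samples from a standard normal population
    (an assumed population for this sketch) and returns the standard
    deviation of the estimator's values across those samples.
    """
    rng = random.Random(seed)
    estimates = [
        estimator([rng.gauss(0, 1) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]
    return statistics.stdev(estimates)


se_mean = standard_error(statistics.mean)
se_median = standard_error(statistics.median)
print(f"SE of mean:   {se_mean:.3f}")
print(f"SE of median: {se_median:.3f}")
```

For normally distributed data the mean typically shows the smaller SE; heavier-tailed populations can reverse that ordering, which is why the comparison is worth running on data that resembles your own.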

Create a heatmap from the logs of your activity tracker

How to import the data from your apps and devices and create a heatmap from GPX files with Python. Heatmap (Image by author). I have 7 years’ worth of recorded walking activity on my computer. Over all these years, it has been collected with several devices and apps, from a stand-alone GPS receiver, through SportsTracker, to Garmin. … Read more

Python But It’s Weird

Code snippets that will question your Python skills. Photo by Vinicius “amnx” Amano on Unsplash. We all love Python! After using this language for a long time, if you go deeper into concepts, you will be amazed by the modularity of this language. In my experience, I find it easy to implement most of my … Read more