3 (and Half) Powerful Tricks To Effectively Read CSV Data In Python

The usecols parameter of pandas.read_csv() is extremely useful for loading only specific columns from a CSV data set. Here is a direct comparison of the time taken by read_csv() with and without usecols. (Figure: pandas.read_csv() usecols | Image by Author) Importing a .csv file into a pandas DataFrame using usecols is ⚡️ 2.4X ⚡️ faster than importing … Read more
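
As a rough illustration of the usecols idea in the excerpt above, the sketch below reads the same CSV twice, once in full and once with only two columns. The in-memory CSV text and its column names are hypothetical stand-ins, not the article's data set, and no timing claim is made here.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk.
csv_text = "id,name,price,qty\n1,apple,0.5,10\n2,pear,0.8,4\n"

# Default behaviour: every column is parsed and loaded.
full = pd.read_csv(io.StringIO(csv_text))

# With usecols, only the listed columns are loaded into the DataFrame.
subset = pd.read_csv(io.StringIO(csv_text), usecols=["id", "price"])

print(full.shape)
print(list(subset.columns))
```

On a wide file, restricting the columns this way reduces both parsing work and memory use, which is where a speed-up like the one quoted above would come from.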

Learn Plotly for Advanced Python Visualization: A Use Case Approach

In order to add customizations such as cluster colors, bubble sizes, and hover-over tips, we need to first add three new columns to our data frame that assign these ‘customization parameters’ to each data point. The following code will add a new column called ‘color’ to the data frame. We first define a function called … Read more

Defining the Moving Average Model for Time Series Forecasting in Python

Explore the moving average model and discover how we can use the ACF plot to identify the right MA(q) model for our time series. (Photo by Pawel Czerwinski on Unsplash.) One of the foundational models for time series forecasting is the moving average model, denoted as MA(q). This is one of the basic statistical models … Read more
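
To make the ACF idea concrete, here is a minimal sketch that simulates an MA(1) process and computes its sample autocorrelation by hand. The function names, the coefficient 0.8, and the series length are assumptions chosen for illustration, not values from the article. For an MA(q) process the autocorrelation is non-zero up to lag q and close to zero afterwards, which is what makes the ACF plot useful for choosing q.

```python
import random


def simulate_ma1(n, theta, mu=0.0, seed=0):
    """Simulate y_t = mu + eps_t + theta * eps_{t-1} with standard normal noise."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [mu + eps[t] + theta * eps[t - 1] for t in range(1, n + 1)]


def acf(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var


y = simulate_ma1(n=20000, theta=0.8)

# Lag 1 should be clearly non-zero; lags 2 and above should be near zero.
for lag in range(1, 5):
    print(lag, round(acf(y, lag), 3))
```

For this MA(1) example the theoretical lag-1 autocorrelation is theta / (1 + theta^2), roughly 0.49, and every higher lag is zero in theory.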

Introduction to Applied Linear Algebra: Norms & Distances

(Photo of Yan Krukov from Pexels.) Goal: This article gives an introduction to vector norms, vector distances and their application in the field of data science. Why you should learn it: Vector norms and distances are used to describe attributes of vectors and the relationship of different vectors to each other. It is widely used … Read more
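
A small sketch of what such norms and distances look like in code. The excerpt does not say which norms the article covers, so the choice of the L1, L2 and maximum norms here is an assumption, and the helper names are hypothetical.

```python
import math


def l1_norm(v):
    """Sum of absolute components (Manhattan norm)."""
    return sum(abs(x) for x in v)


def l2_norm(v):
    """Square root of the sum of squared components (Euclidean norm)."""
    return math.sqrt(sum(x * x for x in v))


def max_norm(v):
    """Largest absolute component (infinity norm)."""
    return max(abs(x) for x in v)


def euclidean_distance(a, b):
    """Distance between two vectors: the L2 norm of their difference."""
    return l2_norm([x - y for x, y in zip(a, b)])


a = [3.0, -4.0]
b = [0.0, 0.0]

print(l1_norm(a))                 # 7.0
print(l2_norm(a))                 # 5.0
print(max_norm(a))                # 4.0
print(euclidean_distance(a, b))   # 5.0
```

The distance between two vectors is defined here as the norm of their difference, so each norm induces its own notion of distance.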

Self-Training Classifier: How to Make Any Algorithm Behave Like a Semi-Supervised One

You may think that Self-Training involves some magic or uses a highly complex approach. In reality, though, the idea behind Self-Training is very straightforward and can be explained by the following steps: First, we gather all labeled and unlabeled data, but we only use labeled observations to train our first supervised model. Then we use … Read more

How to Build a Poisson Hidden Markov Model Using Python and Statsmodels

(Figure: Manufacturing strikes in the United States plotted against time. Data source: R data sets. Image by Author.) A step-by-step tutorial to get up and running with the Poisson HMM. A Poisson Hidden Markov Model is a mixture of two regression models: a Poisson regression model, which is visible, and a Markov model, which is ‘hidden’. … Read more

Normalization, Standardization and Normal Distribution

I will start this post with a statement: normalization and standardization will not change the distribution of your data. In other words, if your variable is not normally distributed, it won’t be turned into one with the normalize method. normalize() or StandardScaler() from sklearn won’t change the shape of your data. Standardization Standardization can be … Read more
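
The claim above can be checked numerically. The sketch below uses only the standard library rather than sklearn: it draws a right-skewed sample, applies the z-score formula that standardization uses, and also applies min-max scaling (one common meaning of "normalization", assumed here). Skewness, a measure of distribution shape, stays the same under both. The sample and the helper name are hypothetical.

```python
import random
import statistics


def skewness(values):
    """Population skewness: the mean of cubed z-scores."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / std) ** 3 for v in values])


rng = random.Random(42)
raw = [rng.expovariate(1.0) for _ in range(10_000)]  # right-skewed sample

# Standardization: subtract the mean, divide by the standard deviation.
mean, std = statistics.fmean(raw), statistics.pstdev(raw)
standardized = [(v - mean) / std for v in raw]

# Min-max scaling to the [0, 1] range.
lo, hi = min(raw), max(raw)
min_max = [(v - lo) / (hi - lo) for v in raw]

print(round(skewness(raw), 4))
print(round(skewness(standardized), 4))
print(round(skewness(min_max), 4))
```

All three printed values agree because both transformations are linear: they shift and rescale the axis but leave the shape of the distribution untouched.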

Examples of Multi-Cursor for working with Data

How to save time and nerves when coding for data analysis in VS Code using Multi-Cursor and selection features. (Doing multiple things at once. Photo by Matt Bero on Unsplash.) Working with data can be very dynamic, with repeated forward and backward motions through your code to adjust and copy snippets, introducing new assumptions, … Read more

An Efficient Hybrid Algorithm to Solve Nonlinear Least Squares Problems

When Levenberg-Marquardt meets Quasi-Newton. And yes, we build it from scratch with Python! (Photo by Jeremy Bishop on Unsplash.) In previous articles, we’ve seen the Gradient Descent and Conjugate Gradient algorithms in action, two of the simplest optimization methods there are. We implemented line search for searching the direction to which the objective … Read more

Introducing FugueSQL — SQL for Pandas, Spark, and Dask DataFrames

An End-To-End SQL Interface for Data Science and Analytics. As a data scientist, you might be familiar with both Pandas and SQL. However, there might be some queries and transformations that you feel more comfortable doing in SQL instead of Python. Wouldn’t it be nice if you could query a pandas DataFrame like below: … using SQL? … Read more

A Great Python Library: Great Expectations

The id column should always be unique, and duplicate id values might have severe consequences. We can easily check for the uniqueness of the values in this column.

    df.expect_column_values_to_be_unique(column="id")

    # output
    {
      "meta": {},
      "result": {
        "element_count": 1000,
        "missing_count": 0,
        "missing_percent": 0.0,
        "unexpected_count": 0,
        "unexpected_percent": 0.0,
        "unexpected_percent_total": 0.0,
        "unexpected_percent_nonmissing": 0.0,
        "partial_unexpected_list": []
      },
      "success": true,
      "exception_info": {
        "raised_exception": false,
        "exception_traceback": null,
        "exception_message": null
      }
    }

The functions of the Great Expectations library return a JSON … Read more

Why You Should Use Callbacks in TensorFlow 2

Customize your training of deep neural networks – a practical guide. (Photo by John Schnobrich on Unsplash.) Callbacks are essential when you want to control the training of a model. And you do want to control the training… Callbacks help us prevent overfitting, visualize our training progress, save checkpoints and much more. But why TensorFlow? … Read more

Augmented Assignments in Python

How augmented assignments work with mutable objects: When augmented assignment expressions are used, the most efficient operation is picked automatically. This means that for object types supporting in-place modification, an in-place operation is applied, since it is faster than first creating a copy and then assigning it. If you want … Read more
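
A short sketch of the behaviour described above, using a list (mutable) and a tuple (immutable). The variable names are hypothetical; the point is that += on a list modifies the existing object, while the same operator on a tuple builds a new one.

```python
# Mutable object: += calls list.__iadd__, which extends the list in place.
nums = [1, 2]
alias = nums
nums += [3]
print(alias)          # [1, 2, 3] - the alias sees the change
print(nums is alias)  # True - still the same object

# Plain assignment with + builds a new list and rebinds the name.
nums = nums + [4]
print(alias)          # [1, 2, 3] - the alias is unaffected
print(nums is alias)  # False

# Immutable object: tuples have no __iadd__, so += falls back to __add__
# and rebinds the name to a new tuple.
point = (1, 2)
point_alias = point
point += (3,)
print(point_alias)           # (1, 2)
print(point is point_alias)  # False
```

This difference matters whenever two names refer to the same mutable object, for example a list passed into a function.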

Graph Machine Learning with Python Part 2: Random Graphs and Diffusion Models of CryptoPunks…

In part 1, we discussed Network Basics, Network Connectivity, Network Distance, Network Clustering, and Network Degree Distributions. If you’re just starting out, I’d recommend starting there first: https://towardsdatascience.com/graph-machine-learning-with-python-pt-1-basics-metrics-and-algorithms-cc40972de113 In this second part, I’ll take a deeper dive into how we can reason about networks and start to model them via random graphs, diffusion models, simulations, … Read more

How to Easily Cluster Textual Data in Python

With this method, you’ll never have to manually cluster survey answers ever again. (Photo by Pawel Czerwinski on Unsplash.) Text data is notoriously annoying; I really don’t enjoy working with it. Especially survey data: whose bright idea was it to let people type whatever they want? In most research companies, some poor person will … Read more

Bias-Variance Tradeoff in Time Series

1️⃣ Setup: We will use the classical “airline” dataset [6] to demonstrate this in PyCaret. A Jupyter notebook for this article can be found here and also at the end of the article under the “Resources” section.

    from pycaret.datasets import get_data
    from pycaret.internal.pycaret_experiment import TimeSeriesExperiment

    #### Get data ----
    y = get_data("airline", verbose=False)

    #### Setup Experiment ----
    exp = TimeSeriesExperiment()

    #### … Read more

Features 101: An Introduction To Analyzing Feature-Sets

For this project, I am going to be using the “cars” data-set from VegaDatasets.jl.

    using VegaDatasets
    data = dataset("cars")

We now can wrap this data-set into a data-frame type using the DataFrames.jl module:

    using DataFrames
    df = DataFrame(data)
    show(df)

(image by author) Before we go into looking at these features, let us first consider the different types of … Read more

How To Fix pandas.parser.CParserError: Error tokenizing data

Reproducing the error: First, let’s try to reproduce the error using a small dataset I have prepared as part of this tutorial. (Figure: Example file containing data in inconsistent format. Source: Author.) Now if we attempt to read in the file using read_csv:

    import pandas as pd
    df = pd.read_csv('test.txt')

we are going to get … Read more
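
For readers without the article's test.txt, the error can be reproduced with a few lines of in-memory text. This is a sketch under assumptions: the sample rows are hypothetical, and it assumes a recent pandas release in which the exception is exposed as pandas.errors.ParserError rather than under the older pandas.parser.CParserError name from the title.

```python
import io

import pandas as pd

# Hypothetical data: the header and first row have 3 fields, the last row has 4.
bad_csv = "a,b,c\n1,2,3\n4,5,6,7\n"

try:
    pd.read_csv(io.StringIO(bad_csv))
    error_message = None
except pd.errors.ParserError as exc:
    # The parser gives up when a row has more fields than the header implies.
    error_message = str(exc)

print(error_message)
```

The message names the offending line and the number of fields found, which is usually enough to locate the inconsistent row.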

Introduction to Clustering in Python with PyCaret

A step-by-step, beginner-friendly tutorial for unsupervised clustering tasks in Python using PyCaret. (Photo by Paola Galimberti on Unsplash.) PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that speeds up the experiment cycle exponentially and makes you more productive. … Read more

Data Science in Marketing: A Beginner’s Guide

Marketing Analytics with Python (Datacamp): If you have already taken a beginner-level data science course and have a basic understanding of machine learning models, you can consider taking this Datacamp track. It consists of 7 courses, and takes you through concepts like analyzing marketing campaigns with Python, sentiment analysis, customer churn prediction, market … Read more

Hello! I am PAMI

A new Pattern Mining Python library for Data Science. (Figure 1: Broad classification of learning algorithms across Artificial Intelligence, Data Mining, Machine Learning, and Deep Learning.) Big Data Analytics represents the set of techniques to discover knowledge hidden in large databases. These techniques can be broadly classified into four types: Pattern mining: aims to … Read more

Understand Q-Q plot using simple python

An effective way to visualize data. (Image by Author.) Introduction: Engineers and scientists work with data. Without data, they are not able to draw any conclusions. This is an era in which data is created every day from every aspect of our lives. Some data are random and some are biased. Some may suffer from bias because … Read more

Why I Chose the MacBook Air over the MacBook Pro as a Data Scientist

Note: There are other differences between the Air and Pro that I didn’t include in the table above. Some of them might influence your decision. I’ll discuss those differences later in the article. MacBook Air summary: The cheapest laptop in Apple’s lineup (although starting at $999, it’s still pretty expensive). It’s also the smallest and … Read more

Five Unexpected Behaviours of Python Could Be Surprised

Some cold knowledge about Python you need to know. Every programming language has some interesting facts or mysterious behaviours, and so does Python. In fact, as a dynamic programming language, there are even more interesting behaviours in Python. I would bet most developers may never experience one of these scenarios because most of … Read more

Exploring stacks and queues

Supercharge your programs with two highly useful tools. (Photo by Nathan Dumlao on Unsplash.) In our last post, we covered data structures, or the ways that programming languages store data in memory. We touched upon abstract data types, theoretical entities that are implemented via data structures. The concept of a “vehicle” can be viewed as … Read more
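
As a minimal sketch of the two abstract data types named in the title, the example below implements a stack with a Python list and a queue with collections.deque. These are common choices in Python and are assumed here; the excerpt does not show which data structures the article itself uses.

```python
from collections import deque

# Stack: last in, first out. A list's append/pop work on the same end.
stack = []
stack.append("first")
stack.append("second")
stack.append("third")
top = stack.pop()
print(top)    # third

# Queue: first in, first out. deque pops from the left in constant time.
queue = deque()
queue.append("first")
queue.append("second")
queue.append("third")
front = queue.popleft()
print(front)  # first
```

A plain list can also act as a queue via pop(0), but that shifts every remaining element, which is why deque is the usual choice.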

How to Benefit from the Semi-Supervised Learning with Label Spreading Algorithm

If you are already familiar with the Label Propagation algorithm, you may want to know about the two ways that Label Spreading differs from it. If you are not familiar with Label Propagation, then feel free to skip to the next section. Symmetric normalized Laplacian vs. random walk normalized Laplacian The Label Spreading algorithm uses … Read more

4 Python Pandas Functions That Serve Better With Dictionaries

Pandas is arguably the most popular data analysis and manipulation library in the data science ecosystem. First and foremost, it is easy to learn and offers an intuitive syntax. With that being said, we will focus on a different great feature of Pandas: Flexibility. The capabilities of Pandas functions can be extended by using the … Read more

Adiabatic Quantum Computation 1: Foundations and the Adiabatic Theorem

The lesser-known type of quantum computer that is easier to build, easier to understand, and (maybe) equally powerful. I’ve just completed my thesis for Honours at the Australian National University, which proposed how diamonds could be used as adiabatic quantum computers. During this project, however, I realised that there are few people with … Read more

Bayesian Inference and Markov Chain Monte Carlo Sampling in Python

An introduction to using Bayesian Inference and MCMC sampling methods to predict the distribution of unknown parameters through an in-depth coin-flip example implemented in Python. (Image from Adobe Stock.) This article extrapolates a basic coin-flip example into a larger context in which we can examine the use and power of Bayesian Inference and Markov Chain … Read more
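
To give a feel for what MCMC sampling looks like on a coin-flip problem, here is a minimal random-walk Metropolis sketch. It is not the article's implementation: the uniform prior, the observed counts (14 heads, 6 tails), the step size, and the function names are all assumptions made for illustration.

```python
import math
import random


def log_posterior(p, heads, tails):
    """Log of the unnormalised posterior for coin bias p under a uniform prior."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return heads * math.log(p) + tails * math.log(1.0 - p)


def metropolis(heads, tails, n_samples=20000, step=0.1, seed=0):
    """Random-walk Metropolis sampler for the coin bias."""
    rng = random.Random(seed)
    current = 0.5
    current_lp = log_posterior(current, heads, tails)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, heads, tails)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


heads, tails = 14, 6  # hypothetical observations
samples = metropolis(heads, tails)[2000:]  # drop the first draws as burn-in
posterior_mean = sum(samples) / len(samples)

# With a uniform prior the exact posterior is Beta(heads + 1, tails + 1).
exact_mean = (heads + 1) / (heads + tails + 2)
print(round(posterior_mean, 3), round(exact_mean, 3))
```

The coin-flip case has a closed-form posterior, which makes it a convenient check that the sampler is behaving before applying the same approach to models without one.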

TensorFlow for Computer Vision — How to Implement Pooling From Scratch in Python

Much better — we now had only four pools to work with, and we got rid of half the pixels in height and width. Next, let’s see how to implement the pooling logic from scratch in Python. Now the fun part begins. Let’s start by importing Numpy and declaring the matrix from the previous section: … Read more
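
A minimal sketch of pooling from scratch with Numpy. The excerpt does not show the article's matrix or say which pooling type it uses, so the 4x4 input, the 2x2 window with stride 2, and the choice of max pooling are assumptions. They do match the "four pools" and the halved height and width described above.

```python
import numpy as np


def max_pool2d(matrix, pool_size=2, stride=2):
    """Slide a pool_size x pool_size window over the matrix and keep each window's maximum."""
    height, width = matrix.shape
    out_h = (height - pool_size) // stride + 1
    out_w = (width - pool_size) // stride + 1
    pooled = np.empty((out_h, out_w), dtype=matrix.dtype)
    for i in range(out_h):
        for j in range(out_w):
            row, col = i * stride, j * stride
            window = matrix[row:row + pool_size, col:col + pool_size]
            pooled[i, j] = window.max()
    return pooled


# Hypothetical 4x4 input: four 2x2 pools, output is 2x2.
matrix = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
])

pooled = max_pool2d(matrix)
print(pooled)
```

Swapping window.max() for window.mean() would give average pooling with the same sliding-window logic.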

Introduction to Binary Classification with PyCaret

PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that speeds up the experiment cycle exponentially and makes you more productive. In comparison with the other open-source machine learning libraries, PyCaret is an alternate low-code library that can be … Read more

Speed up Linear Regression with Matrix Math

Use Numpy and Linear Algebra to fit multiple regression models. (Photo by Jan Huber on Unsplash.) Linear Regression is an extremely popular and useful model. It’s used by Excel Gurus and Data Scientists alike, but how can we fit lots of regression models quickly? This article walks through various ways to fit a linear … Read more
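
One way to fit several regressions at once with matrix math is a single least-squares solve with a matrix of targets, sketched below. This may not be the approach the article takes; the synthetic data, the coefficient values, and the variable names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared predictors, with a leading column of ones for the intercept.
X = rng.normal(size=(200, 2))
X_design = np.column_stack([np.ones(len(X)), X])

# Two hypothetical models, one per column: (intercept, slope 1, slope 2).
true_coefs = np.array([
    [1.0, 0.5],
    [2.0, -1.0],
    [-3.0, 4.0],
])
Y = X_design @ true_coefs + rng.normal(scale=0.01, size=(200, 2))

# A single solve fits every target column: coefs has one column per model.
coefs, *_ = np.linalg.lstsq(X_design, Y, rcond=None)
print(coefs.round(2))
```

Because the design matrix is factorised only once, adding more target columns costs far less than looping over separate model fits.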

The Subsets (Powerset) of a Set in Python 3

Looking at recursive, iterative, and other implementations to compare their performance. The first time I thought of this problem was when I worked on testing a component on a work-related project. Back then, I realized that to properly test the component, I should generate what seemed to be 2ⁿ distinct cases (n being the number … Read more
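
The three styles mentioned above can be sketched as follows. These are generic textbook versions, not necessarily the ones the article benchmarks, and the function names are hypothetical.

```python
from itertools import chain, combinations


def powerset_recursive(items):
    """Subsets of items[1:], each taken without and with items[0]."""
    if not items:
        return [[]]
    rest = powerset_recursive(items[1:])
    return rest + [[items[0]] + subset for subset in rest]


def powerset_iterative(items):
    """Start from the empty set and double the collection for each item."""
    subsets = [[]]
    for item in items:
        subsets += [subset + [item] for subset in subsets]
    return subsets


def powerset_itertools(items):
    """Chain together the combinations of every size from 0 to n."""
    sizes = range(len(items) + 1)
    return [list(c) for c in chain.from_iterable(combinations(items, r) for r in sizes)]


items = [1, 2, 3]
for build in (powerset_recursive, powerset_iterative, powerset_itertools):
    print(build.__name__, len(build(items)))
```

Each version returns 2ⁿ subsets for n items, so the output size, not the implementation style, dominates the cost for large inputs.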

A Python Pandas Introduction to Excel Users

Pandas core concepts you need to know before moving from Excel to Python Pandas. (Photo by Bruce Hong on Unsplash.) Pandas is probably the best tool to do real-world data analysis in Python. It allows us to clean data, wrangle data, make visualizations, and more. You can think of Pandas as a supercharged Microsoft Excel. … Read more

Geocoding CSV in Python

An Example of Christmas Markets in Baden Württemberg, Germany 2021. (Photo by Daniels Joffe on Unsplash.) Geocoding is the process of finding the coordinates of a location anywhere on the globe from its address data. It is a very basic step in most GeoData analytical processes. This short tutorial gives you insight into how to geocode in … Read more

Announcement: PyCaret 2.3.5 is here! Learn what’s new?

PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that speeds up the experiment cycle exponentially and makes you more productive. To learn more about PyCaret, you can check the official website or GitHub. This article demonstrates the use … Read more

How to Perform Real-Time Speech Recognition with Python

Real-time Speech-to-Text using the AssemblyAI API. AssemblyAI offers a Speech-To-Text API that is built using advanced Artificial Intelligence methods and facilitates transcription of both video and audio files. In today’s guide we are going to use this API in order to perform speech recognition in real time! Now the first thing we need to do is open a … Read more