Practical Machine Learning Basics

My first exploration of machine learning using the Titanic competition on Kaggle. Louis & Lola, survivors of the Titanic disaster (photo from Library of Congress Prints and Photographs; no known restrictions on publication). This article describes my attempt at the Titanic Machine Learning competition on Kaggle. I have been trying to study Machine Learning but … Read more Practical Machine Learning Basics

Ultimate Pandas Guide — Mastering the Groupby

We can also index with a single column (as opposed to a list): sales_data.groupby('month').agg(sum)['purchase_amount'] In this case, we get a Series object instead of a DataFrame. I tend to prefer working with DataFrames, so I typically go with the first approach. Now that we have the basics down, let’s go through a few of the more … Read more Ultimate Pandas Guide — Mastering the Groupby
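The Series-versus-DataFrame distinction described above can be sketched as follows. This is a minimal illustration, not the article's own code: the sample rows are made up, and only the names sales_data, month and purchase_amount come from the excerpt. The string 'sum' is used in place of the built-in sum, which newer pandas versions warn about.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the excerpt.
sales_data = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb", "Feb"],
    "purchase_amount": [10.0, 20.0, 5.0, 15.0, 30.0],
})

# Aggregate every remaining column per month.
totals = sales_data.groupby("month").agg("sum")

# Indexing with a single label returns a Series ...
as_series = totals["purchase_amount"]

# ... while indexing with a list of labels keeps a DataFrame.
as_frame = totals[["purchase_amount"]]

print(type(as_series).__name__)
print(type(as_frame).__name__)
```

Both objects hold the same monthly totals; the only difference is the container type, which matters when later steps expect DataFrame methods or column names.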

Business Intelligence Visualizations with Python

The installation process is pretty straightforward. Just open your terminal and run the following command: pip install matplotlib A. Line Plot After installing the library, we can jump to plot creation. The first type we’re going to create is a simple Line Plot: # Begin by importing the necessary libraries: import matplotlib.pyplot as plt … Read more Business Intelligence Visualizations with Python

How to Query PostgreSQL using Python (with SSH) in 3 Steps

STEP 3: Query! Now we’re ready to start querying! The defined class only provides a handful of basic functions. Let’s walk through how to use the class and what we can do with it. First, we’ll need to specify our PostgreSQL connection arguments, and SSH arguments (if SSH tunneling is required to access the remote … Read more How to Query PostgreSQL using Python (with SSH) in 3 Steps

Solving a Social Distancing Problem using Genetic Algorithms

“Social distancing” has become very popular these days, but it is not always obvious how the rules can fit our daily life. In this story, we are going to study a social distancing problem and find a solution to it using Genetic Algorithms. After setting the problem and its constraints, I’ll summarize the principles of Genetic … Read more Solving a Social Distancing Problem using Genetic Algorithms

Ultimate Pandas Guide — Joining data with Python

Photo by Laura Woodbury from Pexels Master the difference between “Merge” and “Join” Everyone who works in data knows this: before you build machine learning models or produce stunning visualizations, you have to get your hands dirty with data wrangling. And one of the core skills in data wrangling is learning how to join together … Read more Ultimate Pandas Guide — Joining data with Python

A Summer as a Data Scientist

A retrospective on my summer as a data scientist and how GSI Technology’s summer program breaks the internship status quo. GSI Technology. Reposted with Author’s Permission Data science is a field that can be hard to break into, especially if you are an undergraduate student. My name is Braden Riggs and some of you reading … Read more A Summer as a Data Scientist

Logistic Regression for Binary Classification

Supervised Learning Methods in Machine Learning Image from ¹wikicommons In previous articles, I talked about deep learning and the functions used to predict results. In this article, we will use logistic regression to perform binary classification. Binary classification is named this way because it classifies the data into two results. Simply put, the result will … Read more Logistic Regression for Binary Classification
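To make the idea of classifying into two results concrete, here is a minimal sketch of how a logistic model turns a weighted sum into a probability and then into a 0/1 label. The weights, bias and 0.5 threshold are hypothetical values chosen for illustration; the article's own model and data are not shown in the excerpt.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(features, weights, bias, threshold=0.5):
    """Return (label, probability) for one sample.

    weights, bias and threshold are illustrative assumptions,
    not values taken from the article.
    """
    z = sum(w * x for w, x in zip(weights, features)) + bias
    probability = sigmoid(z)
    label = 1 if probability >= threshold else 0
    return label, probability


label_pos, prob_pos = predict([2.0], weights=[1.0], bias=0.0)
label_neg, prob_neg = predict([-2.0], weights=[1.0], bias=0.0)
print(label_pos, round(prob_pos, 3))
print(label_neg, round(prob_neg, 3))
```

In practice the weights would be learned from labelled training data rather than set by hand.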

Data Wrangling in Pandas : A Downloadable Cheatsheet

DATA ANALYSIS Turn raw data into functional form Following acquisition of raw data, data wrangling is the most essential step to transform raw data into a more functional form for data analysis, model building and data visualization. It involves preprocessing, restructuring and cleaning operations and the end product is a dataset in a readily accessible format, … Read more Data Wrangling in Pandas : A Downloadable Cheatsheet

How to Analyze Emotions and Words of the Lyrics From your Favorite Music Artist

An interesting way of performing Text and Sentiment Analysis on song lyrics using Python. Photo by Gabriel Bassino on Unsplash Music is a powerful language to express our feelings and in many cases is used as a therapy to deal with tough moments in our lives. The different sounds, rhythms, and effects used in music … Read more How to Analyze Emotions and Words of the Lyrics From your Favorite Music Artist

Rewiring Your Brain from Python to Java

Seven conceptual hurdles you might face when learning a new programming language Confession: my personal experience is almost the complete opposite of the title of this article. I actually started with C++ in college, moved to Java to teach AP Computer Science A, and then entered Python territory to work with all of the snazzy … Read more Rewiring Your Brain from Python to Java

Automatic Speech Recognition for the Indian Accent

After you request the dataset, IITM will give you access to their Google Drive links for seven days. Because I needed the data for an extended period, I transferred all the ZIP files to a Google Cloud Bucket. Each ZIP file will have a folder containing the .wav files and the corresponding metadata file named … Read more Automatic Speech Recognition for the Indian Accent

Statistical test for MCAR in python…

Have you wondered if a missing “age” is related to the “Salary” of the respondent in a survey? Have you ever thought of analysing associations between various missing values in a dataset? How can you be sure that the data are absent without any definite pattern? The answer to these questions is fairly straightforward … Read more Statistical test for MCAR in python…

My Favorite Python Servers To Deploy Into Production

Gunicorn3 is the “classic” and industry-standard Python production server. Compared to its competitors, the biggest thing that shapes Gunicorn as a separate entity is its ability to manage workers. It is also incredibly light on resources, and that in tandem with worker management means that you can set the priority of certain endpoints and … Read more My Favorite Python Servers To Deploy Into Production

Writing advanced SQL queries in pandas

We will create a small dataset to use. Let’s assume we have two imaginary people’s trip data from the past 2 years: df = pd.DataFrame({'name': ['Ann', 'Ann', 'Ann', 'Bob', 'Bob'], 'destination': ['Japan', 'Korea', 'Switzerland', 'USA', 'Switzerland'], 'dep_date': ['2019-02-02', '2019-01-01', '2020-01-11', '2019-05-05', '2020-01-11'], 'duration': [7, 21, 14, 10, 14]}), followed by df on its own line to display it. Let’s define dep_date as the departure date … Read more Writing advanced SQL queries in pandas

A Concise Guide of 10+ Awesome Python Editors and How To Choose Which Editor Suits You The Best…

1. Thonny Editor Screenshot By Author The Thonny Integrated Development Environment (IDE) comes pre-installed on Linux and Linux-based platforms. However, you have to install it manually on the Windows platform. My experience with the Thonny editor is mainly on the Raspberry Pi. It is a great development environment and easy for beginners … Read more A Concise Guide of 10+ Awesome Python Editors and How To Choose Which Editor Suits You The Best…

Serverless Alternative: Executing Python Functions using AWS, Terraform, and Github Actions

Automate the deployment and execution of a Python function without worrying about package size, execution time, or portability Photo by Alex Knight on Unsplash What’s better than Serverless? Serverless is all the buzz these days and for good reason. Serverless is a simple, yet powerful cloud resource to execute function calls without worrying about the … Read more Serverless Alternative: Executing Python Functions using AWS, Terraform, and Github Actions

4 Hidden Gems for Idiomatic Pandas Code

Photo Source: Pexels Sharing more Pandas tips to level up your data manipulation My last article, 6 Lesser-Known Yet Awesome Tricks in Pandas, has hopefully given you some flavor of efficient coding in Pandas. Continuing with this topic, let’s explore more cool Pandas tips and tricks in case you don’t know them already. This blog will … Read more 4 Hidden Gems for Idiomatic Pandas Code

A three level sentiment classification task using SVM with an imbalanced Twitter dataset

The following are the results obtained from the Support Vector Machine trained with Term Frequency/Inverse Document Frequency vectors, with various oversampling techniques applied to the minority classes. The figure below shows the results for the Support Vector Machine model, trained on unbalanced training data. The overall accuracy of the model here is 60% but looking at … Read more A three level sentiment classification task using SVM with an imbalanced Twitter dataset

Computer Vision on the Edge

Making it All Work We’ve discussed the complexity of developing computer vision applications as well as deploying them; how can we mitigate these challenges and get CV applications to production as quickly as possible? If you look at a platform like alwaysAI, there are a couple key methodologies utilized to reduce the complexity of this … Read more Computer Vision on the Edge

5 tips for data aggregation in pandas

📍 Tip #1: Use crosstab() for multi-variable counts/percentages You are probably already familiar with this Series method: value_counts(). Running df['day'].value_counts() will give us the counts of unique values in the day variable. If we specify normalize=True inside the method, it will give us proportions instead. This is useful for a single variable, but sometimes we need … Read more 5 tips for data aggregation in pandas
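A minimal sketch of the difference between value_counts() on one variable and crosstab() on two. The rows and the second column (time) are hypothetical; only df and the day column are named in the excerpt. Note that normalize=True returns fractions that sum to 1, so multiply by 100 if you want percentages.

```python
import pandas as pd

# Hypothetical sample; only the 'day' column is named in the excerpt.
df = pd.DataFrame({
    "day": ["Thur", "Fri", "Fri", "Sat", "Sat", "Sat"],
    "time": ["Lunch", "Lunch", "Dinner", "Dinner", "Dinner", "Dinner"],
})

# Counts of each unique value in a single column.
counts = df["day"].value_counts()

# Same, expressed as fractions of the total number of rows.
shares = df["day"].value_counts(normalize=True)

# Counts for every combination of two columns.
table = pd.crosstab(df["day"], df["time"])

print(counts)
print(shares)
print(table)
```

Combinations that never occur in the data show up as 0 in the crosstab result.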

Measuring and Calculating Streamflow at the Plitvice Lakes National Park | by Karlo Leskovar

Calculations of the streamflow To calculate the streamflow at each cross section, a simple Python script was written. I will briefly explain how it works. First, we import dependencies; in this case, we need numpy and pandas. We define the file path to the Excel file with data. As we had several … Read more Measuring and Calculating Streamflow at the Plitvice Lakes National Park | by Karlo Leskovar

Simulating a Turing Machine with Python and executing programs on it

Use Python code to make your computer behave like a Turing Machine In this article, we shall implement a basic version of a Turing Machine in Python, write a few simple programs and execute them on the Turing machine. This article is inspired by the edX / MITx course Paradox and Infinity and few … Read more Simulating a Turing Machine with Python and executing programs on it
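As a rough idea of what a basic Turing Machine in Python can look like, here is a minimal sketch. It is not the article's implementation: the function name, the state names q0 and halt, the blank symbol "_" and the transition-table format are all assumptions made for this example.

```python
from collections import defaultdict


def run_turing_machine(program, tape, start_state="q0", halt_state="halt",
                       blank="_", max_steps=10_000):
    """Run a single-tape Turing machine and return the final tape contents.

    program maps (state, symbol) -> (symbol_to_write, move, next_state),
    where move is "L" or "R". All names here are illustrative assumptions.
    """
    # Sparse tape: any cell that was never written reads as blank.
    cells = defaultdict(lambda: blank, enumerate(tape))
    state, head, steps = start_state, 0, 0

    while state != halt_state:
        if steps >= max_steps:
            raise RuntimeError("step limit reached; the program may not halt")
        symbol = cells[head]
        write, move, state = program[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
        steps += 1

    if not cells:
        return ""
    lo, hi = min(cells), max(cells)
    return "".join(cells[i] for i in range(lo, hi + 1)).strip(blank)


# A tiny program that flips every bit and halts at the first blank cell.
flip_bits = {
    ("q0", "0"): ("1", "R", "q0"),
    ("q0", "1"): ("0", "R", "q0"),
    ("q0", "_"): ("_", "R", "halt"),
}

result = run_turing_machine(flip_bits, "1011")
print(result)
```

The step limit is a practical guard, since nothing in the model itself guarantees that a program halts.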

Use Python to Automate Your Excel Work!

Automate those pesky Excel reports! Photo by Carlos Muza on Unsplash In this post, you’ll learn how to automate some of your common, tedious Excel tasks. Chances are, you can’t escape Excel (and it’s probably good you don’t entirely!). Learning how to make working with the application easier and how to automate repetitive tasks, is … Read more Use Python to Automate Your Excel Work!

10 Crazy Cool Project Ideas for Python Developers

Tennis Match — Photo by Moises Alex on Unsplash Betting is an activity where people predict an outcome and, if they are right, receive a reward in return. There have been many technological advances in Artificial Intelligence and Machine Learning in the past few years. For example, you might have heard … Read more 10 Crazy Cool Project Ideas for Python Developers

This Function Can Make Your Pandas Code Significantly Faster

We’ll need two libraries for the demonstration, and those are Numpy and Pandas. The dataset is completely made up and shows sales for 3 products on a particular date. Dates are split into 3 columns, just to make everything a bit harder and heavier for the computer. Anyway, here’s the dataset: Nothing special here, but … Read more This Function Can Make Your Pandas Code Significantly Faster

Elegant Geographic plots in Python and R using GeoPandas and Leaflet

We can clearly see that countries in Asia have higher population density values, especially India, Bangladesh, Korea and Japan. Import libraries We’ll load in the leaflet library for generating plots and the sf library for reading in shapefiles. Import dataset Just like in Python, we’d import the dataset, skipping the first 4 rows, remove extra … Read more Elegant Geographic plots in Python and R using GeoPandas and Leaflet

10 Normality Tests-Python (Step-By-Step Guide 2020)

A normality test is used to check if a variable or sample has a normal distribution. Image by Author Before talking about normality tests, let’s first discuss the normal distribution and why it is so important. The normal distribution, also known as the Gaussian distribution, is a probability function that describes how the values of … Read more 10 Normality Tests-Python (Step-By-Step Guide 2020)
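To make the normal distribution a little more tangible before any testing, here is a small sketch using only Python's standard library. It does not perform a normality test; it simply evaluates the density and the cumulative probability of a standard normal distribution (mean 0 and standard deviation 1 are assumptions chosen for the example).

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1 (assumed values).
standard = NormalDist(mu=0.0, sigma=1.0)

# Height of the bell curve at its centre.
peak = standard.pdf(0.0)

# Probability that a value falls within one standard deviation of the mean.
within_one_sigma = standard.cdf(1.0) - standard.cdf(-1.0)

print(round(peak, 4))
print(round(within_one_sigma, 4))
```

Roughly 68% of values fall within one standard deviation of the mean, which is the kind of property normality tests check a sample against.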

Scraping a table in a PDF, reliably and then test data quality

How to scrape a table within a PDF in Python, unit test the data for quality and then upload it to S3. Photo by Tim Mossholder on Unsplash Suppose you need to ingest some data into your data warehouse and after further discussions with your stakeholders the source of this data is a PDF document. … Read more Scraping a table in a PDF, reliably and then test data quality

Enrich your train fold with a custom sampler inside an imblearn pipeline

Where to use augmented data in your process Once you have a set of augmented data to enrich your original data set, you will ask yourself how and at which point to merge them. Typically you are using sklearn and its modules to evaluate your estimator or search for optimal hyper-parameters. Popular modules including RandomizedSearchCV … Read more Enrich your train fold with a custom sampler inside an imblearn pipeline

How to embed your Julia code into Python to speed up performance

Before I demonstrate how to embed Julia code into Python to boost performance, I want to convince you that there is value in using Julia, and that it can work much faster than Python when using for loops. So, let’s run a simple for loop in Julia and Python and compare their running times: In … Read more How to embed your Julia code into Python to speed up performance

Fundamentals of NumPy

Photo by Emile Perron on Unsplash Over the course of this article, we shall learn the various features and functions of the Python library NumPy. NumPy is a Python library that supports large, multi-dimensional arrays as well as matrices. It also supports a large collection of mathematical functions to operate on these arrays. … Read more Fundamentals of NumPy

Build and Deploy simple ML Tools by stacking coolest librairies

The first thing to do before going headlong is to prepare your work environment. Virtual Environment Here (as in any project) we will be working with several packages. In order to have total control over the tools you use, it is always recommended to work in a virtual environment. We will use Anaconda, a Python … Read more Build and Deploy simple ML Tools by stacking coolest librairies

How to get Market Data from the NYSE in less than 3 lines (Python).

II. Convert your date to a DateTime format This step is not mandatory if you have already converted your dates. However, if you have not, I recommend using the lines of code below: In the line of code above, Python is converting your date from a string format to a DateTime format. … Read more How to get Market Data from the NYSE in less than 3 lines (Python).
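The article's own conversion code is not included in this excerpt, so the following is only a sketch of the same idea using the standard library. The sample date string and its "%Y-%m-%d" layout are assumptions; adjust the format string to match how your dates are actually written.

```python
from datetime import datetime

# Hypothetical input; the article's actual date value and format are not shown.
raw_date = "2020-09-15"

# Parse the string into a datetime object using an explicit format.
parsed = datetime.strptime(raw_date, "%Y-%m-%d")

print(type(parsed).__name__)
print(parsed.year, parsed.month, parsed.day)
```

If the string does not match the format, strptime raises a ValueError, which is a useful early signal that the date column is not as clean as expected.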

Python Bar Chart Race Animation: COVID-19 Cases

Bar Chart Race GIF COVID-19 has crushed many countries for over eight months. Millions of cases have been confirmed, and the number keeps getting higher every day. Thanks to Google, which has provided the Corona dataset publicly and for free, it is possible for us to do our own analysis related to this pandemic, for … Read more Python Bar Chart Race Animation: COVID-19 Cases

Mapping Census Data

How to access and map population data in Python Photo by Ryan Wilson on Unsplash Today I’m taking a look at the racial composition of Seattle, according to the 2010 Census. Towards this end, I’ll use Integrated Public Use Microdata Series (IPUMS) National Historical Geographic Information System (NHGIS). You can also use data.census.gov, which I … Read more Mapping Census Data

Machine Learning on AWS SageMaker

Before we jump into this, let’s explain what we need to have in place. I’ll be quick, promise! Setup preparation Amazon S3 Amazon S3 is a storage service allowing us to store and protect our data in directories (Buckets). We will need this service to go forward. Buckets: a bucket is a container for objects stored … Read more Machine Learning on AWS SageMaker

How to draw a bar graph for your scientific paper with python

A bar graph 📊 (also known as a bar chart or bar diagram) is a visual tool with which readers can compare data shown by bars across categories. In this story, I try to show how we can draw a clear bar plot with Python. As a student or researcher, you have to publish your efforts … Read more How to draw a bar graph for your scientific paper with python

Learn linear regression using scikit-learn and NBA data: Data science with sports

Data science enables many pretty amazing tasks for its practitioners, and has changed our lives in many ways, small and big. When a business predicts demand for a product, when a company identifies fraudulent transactions online or when a streaming service recommends what to watch, data science is often the oil that enables these innovations. … Read more Learn linear regression using scikit-learn and NBA data: Data science with sports

Text Classification with CNNs in PyTorch

A step-by-step guide to build a text classifier with CNNs implemented in PyTorch. Photo by Shelby Miller on Unsplash “Deep Learning is more than adding layers” The objective of this blog is to develop a text classifier step by step by implementing convolutional neural networks. So, this blog is divided into the following sections: Introduction … Read more Text Classification with CNNs in PyTorch

How to run Recommender Systems in Python

A practical example of Movies Recommendation with Recommender Systems Photo by Pankaj Patel on Unsplash Nowadays, almost every company applies Recommender Systems (RecSys), a subclass of information filtering systems that seek to predict the “rating” or “preference” a user would give to an item. They are primarily used in commercial … Read more How to run Recommender Systems in Python

Algorithmic Trading with RSI using Python

Using talib and yfinance Photo by NASA on Unsplash Machine Learning is computationally intensive, as the algorithm is not deterministic and therefore must be constantly tweaked over time. However, technical indicators are much quicker, as the equations do not change. This therefore improves their ability to be used for real-time trading. To create a program … Read more Algorithmic Trading with RSI using Python
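For readers unfamiliar with the indicator, here is a minimal pure-Python sketch of the Relative Strength Index. It is not the talib implementation the article uses: it applies a plain average over the last `period` price changes instead of Wilder's smoothing, so its values will generally differ from talib's. The function name, the default period of 14 and the sample prices are assumptions for this example.

```python
def simple_rsi(closes, period=14):
    """Simple-average RSI over the last `period` price changes.

    Illustrative variant only; it does not reproduce talib's smoothed RSI.
    """
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closing prices")

    changes = [today - yesterday for yesterday, today in zip(closes, closes[1:])]
    recent = changes[-period:]

    avg_gain = sum(c for c in recent if c > 0) / period
    avg_loss = sum(-c for c in recent if c < 0) / period

    if avg_gain == 0 and avg_loss == 0:
        return 50.0  # flat prices: treated as neutral in this sketch
    if avg_loss == 0:
        return 100.0  # only gains in the window

    relative_strength = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + relative_strength)


print(simple_rsi([1, 2, 1, 2, 1], period=4))
```

Because the formula is fixed, the value can be recomputed cheaply on every new price, which is the speed advantage the paragraph above refers to.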

My Odyssey, Finding The Most Popular Python Function

We all love Python, but how often do we use which mighty functionality? An article about my quest to figure it out. The most mentioned Python functions inside Python repositories, calculated via GitHub commits. Image by Author The other day, while I was running some zip() with some lists through a map(), I couldn’t stop … Read more My Odyssey, Finding The Most Popular Python Function

Getting Started with Python Classes

In computer programming, classes are a convenient way to organize data and functions such that they are easy to reuse and extend later. In this post, we will walk through how to build a basic class in Python. Specifically, we will discuss the example of implementing a class that represents Instagram users. Let’s get started! … Read more Getting Started with Python Classes
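A minimal sketch of what such a class might look like. The attribute and method names below (username, followers, posts, add_post, gain_followers) are hypothetical choices for illustration; the article's actual class is not shown in this excerpt.

```python
class InstagramUser:
    """Bundle one user's data with the functions that operate on it.

    All attribute and method names are illustrative assumptions.
    """

    def __init__(self, username, followers=0):
        self.username = username
        self.followers = followers
        self.posts = []

    def add_post(self, caption):
        """Record a new post by its caption."""
        self.posts.append(caption)

    def gain_followers(self, count):
        """Increase the follower count by a non-negative amount."""
        if count < 0:
            raise ValueError("count must be non-negative")
        self.followers += count

    def __repr__(self):
        return (f"InstagramUser(username={self.username!r}, "
                f"followers={self.followers}, posts={len(self.posts)})")


user = InstagramUser("example_user")
user.add_post("First photo")
user.gain_followers(25)
print(user)
```

Keeping the data and the functions that modify it in one place is what makes the class easy to reuse and extend later, for example by subclassing it for a different kind of account.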

Your ML Algorithm Is Not Performing Well

Photo by Rob Schreckhise on Unsplash How to Detect the Problem We spend so much time developing a machine learning algorithm. But if that algorithm performs poorly after deployment, it becomes frustrating. The question is what the next step is if the algorithm does not work as expected. What went wrong? Was the number of … Read more Your ML Algorithm Is Not Performing Well

Find Highly Correlated Stocks with Python!

Whether you are crafting a portfolio and want to incorporate diversification or trying to find stocks for a Pairs Trading strategy, the ability to calculate the correlation between the movement of two stocks is a must. Having a portfolio of stocks that are not closely correlated allows you to tap into different performing assets that … Read more Find Highly Correlated Stocks with Python!
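As a rough sketch of how the correlation between two stocks' movements can be calculated, the example below computes the Pearson correlation of daily returns using only the standard library. The price series are made up, and the helper names daily_returns and pearson are assumptions for this example rather than the article's code.

```python
from math import sqrt


def daily_returns(prices):
    """Fractional change from each closing price to the next."""
    return [(today - yesterday) / yesterday
            for yesterday, today in zip(prices, prices[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical closing prices for three made-up stocks.
stock_a = [100.0, 102.0, 101.0, 105.0, 107.0]
stock_b = [50.0, 51.0, 50.5, 52.5, 53.5]   # moves in step with stock_a
stock_c = [100.0, 98.0, 99.0, 95.0, 93.0]  # moves against stock_a

corr_ab = pearson(daily_returns(stock_a), daily_returns(stock_b))
corr_ac = pearson(daily_returns(stock_a), daily_returns(stock_c))
print(round(corr_ab, 3), round(corr_ac, 3))
```

Correlating returns rather than raw prices avoids overstating the relationship between two stocks that simply both trend upward over time.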