Python List, NumPy, and Pandas

How to choose the right data structure from Python list, NumPy array, and Pandas DataFrame. There are multiple data structures to work with a sequence of data in Python. The available data structures include lists, NumPy arrays, and Pandas DataFrames. It is often not easy for beginners to choose among these data structures. In …

The Complete Hands-On Machine Learning Crash Course

From linear regression to unsupervised learning, this guide covers everything you need to know to get started in machine learning. Theory and practical exercises are covered for each topic!
Linear regression: theory
Linear regression: practice
Logistic regression: theory
Linear discriminant analysis (LDA): theory
Quadratic discriminant analysis (QDA): theory
Logistic regression, LDA …

Python For Data Science — A Guide to Data Visualization with Plotly

Again, the main objective of this dataset is to study which factors affect the survivability of a person on board the Titanic. The first thing that comes to my mind is to display how many passengers survived the Titanic sinking. Hence, visualizing the Survived column itself will be a good start. In Plotly, we …

PySpark for Data Science Workflows

Chapter 6 of Data Science in Production. Demonstrated experience in PySpark is one of the most desirable competencies that employers look for when building data science teams, because it enables these teams to own live data products. While I’ve previously blogged about PySpark, Parallelization, and UDFs, I wanted to provide a proper …

SequenceMatcher in Python

A human-friendly longest contiguous & junk-free sequence comparator. SequenceMatcher is a class available in the Python module named “difflib”. It can be used for comparing pairs of input sequences. The objective of this article is to explain the SequenceMatcher algorithm through an illustrative example. Due to the limited docs available, I thought to share the concept …
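As a small illustration of the comparison described above, here is a minimal sketch using the standard-library `difflib.SequenceMatcher`. The two input strings are made up for this example and do not come from the article.

```python
from difflib import SequenceMatcher

# Two short, made-up sequences to compare.
a = "abcd"
b = "bcde"

# The first argument is the optional "junk" predicate; None means no element is junk.
matcher = SequenceMatcher(None, a, b)

# Longest contiguous block shared by a[0:len(a)] and b[0:len(b)].
match = matcher.find_longest_match(0, len(a), 0, len(b))
longest = a[match.a:match.a + match.size]

# Similarity ratio: 2 * (number of matched elements) / (total elements in both).
ratio = matcher.ratio()

print(longest, ratio)
```

Here the longest shared block is three characters long and both strings have four characters, so the ratio works out to 2 * 3 / 8.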

Discover the strength of monotonic relation

Finding Spearman’s rank correlation coefficient using Python for IB Diploma Mathematics. In this article, I will show the necessary steps using Python to find Spearman’s rank correlation coefficient. A monotonic function shows the relation between ordered sets. Spearman’s rank correlation is useful to find this relation of ordinal …
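One way to sketch the calculation, using only the standard library: rank both variables (tied values share the mean of their positions) and then take the Pearson correlation of the ranks. The helper names and sample data below are hypothetical, and the article itself may use a library routine instead.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotonic but non-linear relation still gives a perfect rank correlation.
rho_increasing = spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
rho_decreasing = spearman([1, 2, 3], [3, 2, 1])
print(rho_increasing, rho_decreasing)
```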

Artificial Eyeliner on LIVE Feed using Python, OpenCV and Dlib

Briefly, the program first extracts 68 landmark points from each face. Of those 68 points, points 37–42 belong to the left eye and points 43–48 belong to the right eye (see picture below). [Figure: Visualisation of 68 landmark points. Image from pyimagesearch.] Because our goal is to apply eyeliner, we are only interested …

Top 4 Numpy Functions You Don’t Know About (Probably)

The where() function finds the elements of an array that satisfy a certain condition. Let’s explore it with an example. I’ll declare an array of grades of some sort (really arbitrary): You can now use where() to find, let’s say, all grades that are greater than 3: Note how it returns the index position. The …
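The excerpt's own code did not survive, so the following is only a sketch of the behaviour it describes. The grade values are invented for illustration, and the code assumes NumPy is installed.

```python
import numpy as np

# Made-up grades; the article's actual values are not shown in the excerpt.
grades = np.array([1, 3, 4, 2, 5, 5])

# With only a condition, np.where returns a tuple of index arrays
# (one array per dimension), not the matching values themselves.
idx = np.where(grades > 3)

# Index back into the array to recover the matching values.
above = grades[idx]

# With three arguments, np.where picks element-wise between two alternatives.
replaced = np.where(grades > 3, grades, 0)

print(idx[0], above, replaced)
```

The single-argument form is what returns index positions; the three-argument form is a separate, element-wise selection.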

Power up your Python Projects with Visual Studio Code

You should have a directory for every project, and a virtual environment for every directory. This structure does two important things: It keeps your stuff organized appropriately, which makes it easier to keep projects separate, manage dependencies, and keep out things that shouldn’t be there. (Who likes having to undo git commits?) It lets you …

Add this single word to make your Pandas Apply faster

Yes. It does. We get a 2x improvement in run time vs. just using the function as it is. So what exactly is happening here? Source: How increasing data size effects performances for Dask, Pandas and Swifter? Swifter chooses the best way to implement the apply possible for your function by either vectorizing your function …

Predicting Bitcoin Price with Business News (Python)

The above dataset contains a daily summary of prices where the CHANGE column is the percentage change of the last price of the day (PRICE) with respect to the first (OPEN). Goal: To make things simple, we’ll focus on predicting if the price will rise (change > 0) or fall (change ≤ 0) the following …
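The CHANGE definition and the rise/fall target can be sketched in plain Python. The prices below are invented, the function names are hypothetical, and the sketch assumes the truncated sentence refers to the following day.

```python
def percent_change(open_price, last_price):
    """Percentage change of the day's last price relative to its first."""
    return (last_price - open_price) / open_price * 100.0


def direction_label(change):
    """1 if the price rose (change > 0), 0 if it fell or stayed flat (change <= 0)."""
    return 1 if change > 0 else 0


# Made-up (OPEN, PRICE) pairs for three consecutive days.
days = [(100.0, 105.0), (105.0, 103.0), (103.0, 110.0)]
changes = [percent_change(o, p) for o, p in days]

# Assumption: the target for day t is the direction of day t + 1,
# so the last day has no label.
labels = [direction_label(c) for c in changes[1:]]
print(changes, labels)
```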

An Introduction to Discretization in Data Science

Feature Engineering: 4 Discretization Techniques to Learn. Discretization is the process through which we can transform continuous variables, models or functions into a discrete form. We do this by creating a set of contiguous intervals (or bins) that go across the range of our desired variable/model/function. Continuous data is Measured, while Discrete data is Counted. …
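As one possible illustration of contiguous bins spanning a variable's range, here is a minimal equal-width binning sketch in plain Python. The excerpt does not say which four techniques the article covers, so equal-width binning is an assumption, and the function name and ages are made up.

```python
def equal_width_bins(values, n_bins):
    """Split the range of `values` into n_bins contiguous, equal-width intervals.

    Returns the bin edges and, for each value, the index of the bin it falls in.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    labels = []
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        # The maximum value sits on the last edge; keep it in the last bin.
        labels.append(min(idx, n_bins - 1))
    return edges, labels


# Made-up continuous variable (ages) discretized into three bins.
ages = [2, 15, 22, 37, 41, 58, 63, 80]
edges, labels = equal_width_bins(ages, 3)
print(edges, labels)
```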

Build a useful ML Model in hours on GCP to Predict The Beatles’ listeners

AutoML (“Automated Machine Learning”) has been under fire lately from the data science community. I had some criticism of Google Cloud ML published in “Google Cloud’s AutoML first look” (the Google AutoML Team did answer some of my requests; see the **new** things at the end of this article). This post will show it is completely reasonable …

Data Wrangling using Pandas library

In this article, we’ll see some of the most useful techniques used to clean and process data with the Pandas library. Data science involves processing data so that it works well with data algorithms. Data wrangling is the process of processing data, like merging, …

Metrics and Python II

You can download the complete file at https://kreilabs.com/wp-content/uploads/2019/12/Metrics_table.pdf Here we can see the Python implementation of the table metrics in the matrix_metrix routine:

    from sklearn.metrics import confusion_matrix
    import math

    def matrix_metrix(real_values, pred_values, beta):
        # For binary labels, rows are actual classes and columns are predicted classes
        CM = confusion_matrix(real_values, pred_values)
        TN = CM[0][0]
        FN = CM[1][0]
        TP = CM[1][1]
        FP = CM[0][1]
        Population = TN + FN + TP + FP
        # Prevalence is the share of actual positives in the population
        Prevalence = round((TP + FN) / Population, 2)
        Accuracy = round((TP + TN) / Population, 4)
        Precision = round(TP / (TP + FP), 4)
        NPV = round(TN / …

How to Supercharge your Pandas Workflows

When I work with structured data, Pandas is my number one go-to tool. I think that comes as no surprise, as Pandas is the most popular Python library for data manipulation, exploration, and analysis. It offers a lot of functionality out of the box. Additionally, various other modules exist …

Combo Charts with Seaborn and Python

Overlaying two plots to make one chart. Intro: The libraries, code, and visuals are down below, but first I wanted to offer a brief introduction as to why I decided to share this with everyone in this community. If you just want the tutorial, skip the intro. While …

Solving TSP Using Dynamic Programming

While I was conducting research for another post in my transportation series (I, II, stay tuned for III), I was looking for a dynamic programming solution for the Traveling Salesperson Problem (TSP). I did find many resources, but none were to my liking. Either they were too abstract, too theoretical, presented in a long video …
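For readers who want something concrete, below is a compact sketch of the classic Held-Karp dynamic program for the TSP. It is the textbook formulation and not necessarily the approach the article builds. The function name and the 4-city distance matrix are illustrative.

```python
from itertools import combinations


def held_karp(dist):
    """Length of the shortest tour that starts and ends at city 0.

    dist[i][j] is the cost of travelling from city i to city j.
    Runs in O(n^2 * 2^n) time, so it only suits small instances.
    """
    n = len(dist)
    if n == 1:
        return 0
    # best[(mask, j)]: cheapest path that starts at city 0, visits exactly
    # the cities whose bits are set in mask, and ends at city j (j in mask).
    best = {}
    for j in range(1, n):
        best[(1 << j, j)] = dist[0][j]
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev_mask = mask & ~(1 << j)
                best[(mask, j)] = min(
                    best[(prev_mask, k)] + dist[k][j]
                    for k in subset
                    if k != j
                )
    full = (1 << n) - 2  # every city except the start (bit 0)
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))


# Small symmetric example with four cities.
dist = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]
shortest = held_karp(dist)
print(shortest)
```

In this example the best tour is 0, 1, 3, 2, 0 with cost 10 + 25 + 30 + 15.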

Catch Me if You Can: Outlier Detection (Taxi Trajectory Streams)

Predicting Uber demand through historical analysis (Taxi Trajectory Streams). Outlier detection is an interesting data mining task that is used quite extensively to detect anomalies in data. Outliers are points that exhibit significantly different properties than the majority of the points. To this end, outlier detection has very …

The Beginning of Natural Language Processing

Let’s go back to the early stages of human life, when early humans communicated with hand gestures to convey their messages to each other. Today, more than 7000 languages are spoken around the world. It’s quite an achievement for the early …

Why Data Scientists Must Speak the Language of Python

Since 1950, the world has seen the emergence of more than a few programming languages. Be it Java, C, C++, Python or C#, every language was designed to serve a purpose. Over time, people started to communicate with machines in these multiple languages. As a result, plenty of wonderful software applications were born …

Checking Analyzed Laboratory Data for Errors

Tutorial: Automatically Analyzing Laboratory Data to Create a Performance Map. How to Ensure That There are no Errors in Laboratory Data Sets. It’s very common that scientists find themselves with large data sets. Sometimes it comes in the form of gigabytes’ worth of data in a single file. Other times it’s hundreds of files, each …

Down with technical debt! Clean Python for data scientists.

Documenting your project: PEP257 and Sphinx. While PEP8 outlines coding conventions for Python, PEP257 standardises the high level structure, semantics, and conventions of docstrings: what they should contain, and how to say it in a clear way. As with PEP8, these are not hard rules, but they are guidelines which you’d be wise to …
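To make the docstring conventions concrete, here is a small sketch of a multi-line docstring laid out along PEP 257 lines: a one-line summary ending in a period, a blank line, further detail, and the closing quotes on their own line. The function is a made-up example and not taken from the article.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius.

    Arguments:
    temp_f -- temperature in degrees Fahrenheit (int or float)

    Returns the temperature in degrees Celsius as a float.
    """
    return (temp_f - 32) * 5.0 / 9.0


# The summary line is what most tools show first.
summary = fahrenheit_to_celsius.__doc__.splitlines()[0]
print(summary)
print(fahrenheit_to_celsius(212))
```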

Perform very basic Pandas operations on data

In this guide, you’ll learn the most basic operations performed on data using pandas. Data scientists spend a large amount of their time cleaning datasets and getting them down to a form with which they can work. In fact, a lot of data scientists argue that the initial steps of obtaining and cleaning …

Super Easy Way to Get Sentence Embedding using fastText in Python

Now, let me show you how easy it is to use. It can be done in just 4 lines. See? Very easy. You do not need to do any downloading or building: when you use MeanEmbedding for the first time, it downloads the pre-trained facebookresearch/fastText vectors automatically. Currently, this library only supports English and Japanese. …

Interactive Neural Network Fun in Excel

Constructing CPPNs in Excel with PyTorch and PyXLL. After reading “Making deep neural networks paint to understand how they work” by Paras Chopra, I was inspired to delve into some experiments of my own. The images produced were intriguing, and I wanted to play around and get a feel for how they changed in response …

What does Twitter think of the New Tesla Cybertruck? Sentiment Analysis in Python

Recently Elon Musk introduced the Tesla Cybertruck, an all-electric battery-powered commercial vehicle developed by Tesla Inc. The Cybertruck is a sustainable energy alternative to the thousands of fossil fuel powered trucks sold every day. During a recent demonstration of the Cybertruck, Elon told one of the key designers of the truck to throw a small steel …

How Regression Analysis Works

Different forms of regression analysis and their applications. Regression analysis is a machine learning technique that can be used to measure how closely one or more independent variables relate to a dependent variable. A common use of regression analysis is building models on datasets that accurately predict the values of the dependent variable. At the beginning of …

Analyzing One Million Voter Records in Manhattan

A Unique Look into the Composition of Voters in Manhattan. What if you could instantly visualize the political affiliation of an entire city, down to every single apartment and human registered to vote? Somewhat surprisingly, the City of New York made this a reality in early 2019, when the NYC Board of Elections decided to …

Excel with xlwings

Figuring out another Python integration. Recently I wrote about PyXLL and I felt like I went ‘Down a Rabbit Hole’ of discovery and possibilities. Naturally I got to thinking about whether there might be other alternatives out there. ‘xlwings’ came up in the search and of course, I started thinking about ‘Red Bull Gives You …

Understanding Dimensionality Reduction for Machine Learning

Principal Component Analysis (PCA) is one of the most popular dimensionality reduction algorithms out there and can also be used for noise filtering, stock market predictions, data visualization, and much more. PCA reduces the number of features in the dataset by detecting the correlation between them. When the correlation between the features is strong enough, …

Build pipelines with Pandas using “pdpipe”

The second method looks for the string drop in the Price_tag column and drops those rows that match. And finally, the third method removes the Price_tag column, cleaning up the DataFrame. After all, this Price_tag column was only needed temporarily, to tag specific rows, and should be removed after it served its purpose. All of …

Will NumPy become Python?

Well, without NumPy, how can we perform mathematical operations between arrays? How does Python stack up against the other statistical languages of our period? Python’s array iteration is awesome, actually. The zip() function makes it possible to iterate through two lists at the same time.

    array = []
    for f, b in zip(array1, array2):
        res = …
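Since the excerpt's loop body is cut off, here is a self-contained sketch of the same zip() idea. The element-wise addition and multiplication are assumptions chosen for illustration, as are the list contents.

```python
# Two made-up lists of equal length.
array1 = [1, 2, 3]
array2 = [10, 20, 30]

# zip() yields one pair per position, so the loop walks both lists in step.
sums = []
for f, b in zip(array1, array2):
    sums.append(f + b)

# The same pattern written as a list comprehension.
products = [f * b for f, b in zip(array1, array2)]

print(sums, products)
```

Note that zip() stops at the end of the shorter list, so lists of unequal length are silently truncated.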

12 Steps to Production-Quality Data Science Code

There’s a Dilbert comic in which Dilbert tells his boss that he can’t take over a co-worker’s software project until he spends a week bad-mouthing the co-worker’s existing code. If you’ve ever taken over maintaining someone else’s code, you’ll immediately see the truth in this. No one likes taking over maintaining or working on …

How to Build a Restaurant Recommendation System Using Latent Factor Collaborative Filtering

I usually watch YouTube when I am taking a break from my work. I commit to watching YouTube for only 5 to 10 minutes to rest my mind. Here is what usually happens: after I finish watching one video, the next video pops up from YouTube recommendations and I …

Julia Box: Google Colab for Julia

Julia is a great language that is up and coming in the statistical computing space. Julia is actually very commonly used by biologists, medical scientists, and chemists; however, Julia for data science, while not yet used on a large scale, is an idea that becomes more and more feasible every day. Julia certainly has advantages to …

5 Actionable advice for Data Science beginners

Here are 5 tips for everyone getting into data science. “Learn from the experts; you will not live long enough to figure it all out by yourself.” (Brian Tracy) There are myriad ways to learn data science. You can read articles, watch videos, enroll in online …

The Russell Westbrook Effect

For 3 straight seasons, he averaged a triple double in Oklahoma City. While the regular season numbers were stupendous, they didn’t translate to much postseason success. Reunited with his old Thunder buddy James Harden, what will Westbrook’s Effect be in Houston? Oscar Robertson was the first player to average a triple double for a whole …

Advantages and Disadvantages of Artificial Intelligence

Artificial Intelligence is one of the emerging technologies which tries to simulate human reasoning in AI systems. John McCarthy coined the term Artificial Intelligence in 1955. He said, ‘Every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate …

Build a Python Crawler to Get Activity Stream with GitHub API

I want to get activities like the ones below:
ShusenTang starred lyprince/sdtw_pytorch
chizhu starred markus-eberts/spert
Hexagram-King starred BrambleXu/knowledge-graph-learning
Yevgnen starred BrambleXu/knowledge-graph-learning
……
2.1 GitHub API
First, we take a look at the GitHub API documentation. If you haven’t enabled two-factor authentication, you can run the command below to test the API. After entering the password, you should see the response. …

11 Evaluation Metrics Data Scientists should be familiar with— Lessons from A High-rank Kagglers’…

The evaluation metric, the theme of this post, is a concept that ML beginners often confuse with a related but separate concept, the loss function. The two are similar in the sense that they can be the same when we are lucky enough, but that will not happen every time. Evaluation metric is a metric “we want” to minimize …

Increasing Kaggle Revenue: Analyzing user data to recommend the best new product

In this project, we will create recommendations for increasing revenue at Kaggle, an online community for data science professionals. We will analyze a Kaggle customer survey, attempting to learn if there are any indicators of potential revenue growth for the company. To make our recommendations, we will try to learn: Is there market potential for …

Demand Prediction with LSTMs using TensorFlow 2 and Keras in Python

TL;DR Learn how to predict demand using Multivariate Time Series Data. Build a Bidirectional LSTM Neural Network in Keras and TensorFlow 2 and use it to make predictions. One of the most common applications of Time Series models is to predict future values. How is the stock market going to change? How much will 1 …

Time Series Forecasting with LSTMs using TensorFlow 2 and Keras in Python

TL;DR Learn about Time Series and making predictions using Recurrent Neural Networks. Prepare sequence data and use LSTMs to make simple predictions. Often you might have to deal with data that does have a time component. No matter how much you squint your eyes, it will be difficult to make your favorite data independence assumption. …
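Preparing sequence data for a recurrent model usually means cutting the series into fixed-length input windows, each paired with the value that follows it. The sketch below shows that step in plain Python. The function name, window length, and series are assumptions for illustration, not the article's code.

```python
def make_windows(series, time_steps):
    """Turn a 1-D series into (window, next value) training pairs.

    Each input is `time_steps` consecutive values; its target is the value
    that immediately follows the window.
    """
    xs, ys = [], []
    for i in range(len(series) - time_steps):
        xs.append(series[i:i + time_steps])
        ys.append(series[i + time_steps])
    return xs, ys


# Made-up series; three past values are used to predict the next one.
series = [10, 20, 30, 40, 50, 60]
X, y = make_windows(series, 3)
print(X, y)
```

An LSTM layer typically expects each window reshaped to (time_steps, n_features); that reshaping is left out here to keep the sketch dependency-free.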