How I customarily bin data with Pandas

Pandas qcut() To understand how qcut() works, let’s start with histograms: sns.histplot(planets['mass']) Histograms automatically divide an array or list of numbers into several bins, each containing a different number of observations. In seaborn, it is possible to control the number of bins: sns.histplot(planets['mass'], bins=10) Histograms are probably the first example of binning data you have seen. … Read more How I customarily bin data with Pandas
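To make the contrast in the excerpt concrete, here is a minimal sketch comparing equal-width binning (what a histogram does) with quantile binning via pd.qcut. The mass values are invented stand-ins for planets['mass'], not the article's data.

```python
import pandas as pd

# Hypothetical stand-in for the planets['mass'] column from the excerpt.
mass = pd.Series([0.5, 1.2, 2.0, 3.3, 4.8, 7.1, 9.9, 15.0], name="mass")

# pd.cut: equal-width bins, like a histogram, so counts per bin can differ.
equal_width = pd.cut(mass, bins=4)

# pd.qcut: quantile-based bins, so each bin holds roughly the same count.
equal_count = pd.qcut(mass, q=4)

width_counts = equal_width.value_counts(sort=False).tolist()
quantile_counts = equal_count.value_counts(sort=False).tolist()

print("equal-width counts:", width_counts)
print("quantile counts:   ", quantile_counts)
```

With skewed data like this, the equal-width bins fill unevenly while the quantile bins stay balanced.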

MLflow Part 3: Logging Models to a Tracking Server!

MLflow 101 Getting your parameters, metrics, artifacts, and more logged to an MLflow tracking server Hey there, friends, and welcome back to another post in our series on MLflow. If this is the first post you’ve seen and would like to catch up, be sure to check out the previous posts here: As always, if … Read more MLflow Part 3: Logging Models to a Tracking Server!

Building a comprehensive set of Technical Indicators in Python for quantitative trading

Simple Moving Average (SMA) Simple Moving Average is one of the most common technical indicators. SMA calculates the average of prices over a given interval of time and is used to determine the trend of the stock. As defined above, I will create a slow SMA (SMA_15) and a fast SMA (SMA_5). To provide Machine … Read more Building a comprehensive set of Technical Indicators in Python for quantitative trading
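A minimal sketch of the two averages named in the excerpt, assuming daily closing prices in a pandas Series. The price values are placeholders, and the variable names simply mirror SMA_5 and SMA_15.

```python
import pandas as pd

# Hypothetical closing prices; the article's real data is not shown in the excerpt.
close = pd.Series([float(p) for p in range(1, 21)], name="close")

# Fast and slow simple moving averages, named after the excerpt's SMA_5 and SMA_15.
sma_5 = close.rolling(window=5).mean()
sma_15 = close.rolling(window=15).mean()

# The first window-1 entries are NaN because the window is not yet full.
print(sma_5.iloc[4], sma_15.iloc[14])
```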

A quick tutorial to AWS Transcribe with Python

How to add speaker labels? Now, we will read the JSON file in the “TranscriptFileUri.” As we are using Google Colab, I will also demonstrate how to access files inside specific folders. Assuming we already have it in a folder (Colab Notebooks/AWS Transcribe reader), here’s how to access it. from google.colab import drive import sys import os drive.mount('/content/drive/') sys.path.append("/content/drive/My … Read more A quick tutorial to AWS Transcribe with Python
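As a rough sketch of what reading speaker labels from the transcript JSON can look like: the sample dictionary below is hand-written and assumes the usual layout of a Transcribe result with speaker identification enabled (results.speaker_labels.segments and results.items). The helper speaker_turns is hypothetical and not taken from the article.

```python
# Hand-written sample; the field layout is an assumption about the
# transcript JSON, not output copied from a real transcription job.
sample = {
    "results": {
        "speaker_labels": {
            "segments": [
                {"speaker_label": "spk_0", "items": [
                    {"start_time": "0.0", "speaker_label": "spk_0"}]},
                {"speaker_label": "spk_1", "items": [
                    {"start_time": "1.0", "speaker_label": "spk_1"},
                    {"start_time": "1.3", "speaker_label": "spk_1"}]},
            ]
        },
        "items": [
            {"type": "pronunciation", "start_time": "0.0",
             "alternatives": [{"content": "Hello"}]},
            {"type": "punctuation", "alternatives": [{"content": "."}]},
            {"type": "pronunciation", "start_time": "1.0",
             "alternatives": [{"content": "Hi"}]},
            {"type": "pronunciation", "start_time": "1.3",
             "alternatives": [{"content": "there"}]},
        ],
    }
}


def speaker_turns(result):
    """Group transcript words into (speaker, text) turns."""
    # Map each word's start time to the speaker who said it.
    speaker_at = {}
    for segment in result["results"]["speaker_labels"]["segments"]:
        for item in segment["items"]:
            speaker_at[item["start_time"]] = item["speaker_label"]

    turns = []
    for item in result["results"]["items"]:
        word = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation has no timestamp; attach it to the current turn.
            if turns:
                turns[-1][1] += word
            continue
        speaker = speaker_at.get(item["start_time"], "unknown")
        if turns and turns[-1][0] == speaker:
            turns[-1][1] += " " + word
        else:
            turns.append([speaker, word])
    return [(speaker, text) for speaker, text in turns]


for speaker, text in speaker_turns(sample):
    print(f"{speaker}: {text}")
```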

Handling Outliers in Clusters using Silhouette Analysis

Identify and remove outliers in each cluster from K-Means clustering Image by Gerd Altmann from Pixabay The real-world data often has a lot of outlier values. The cause of outliers can be data corruption or failure to record data. The handling of outliers is very important during the data preprocessing pipeline as the presence of … Read more Handling Outliers in Clusters using Silhouette Analysis

NLP Text Preprocessing: Steps, tools, and examples

Part 3: Vectorization and embeddings Text vectorization is the conversion of text into vectors of values that represent its meaning. In earlier days, we had the one-hot encoding method, which uses a vector the size of the vocabulary, with a 1 at the position of the word that appears and 0s elsewhere. Nowadays, we have more advanced methods like spaCy, GloVe, or … Read more NLP Text Preprocessing: Steps, tools, and examples
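The one-hot scheme described in the excerpt can be sketched in a few lines of plain Python. The vocabulary and tokens are made up for illustration, and one_hot is a hypothetical helper name.

```python
def one_hot(tokens, vocabulary):
    """Return one vector per token: 1 at the token's vocabulary index, 0 elsewhere."""
    index = {word: i for i, word in enumerate(vocabulary)}
    vectors = []
    for token in tokens:
        vector = [0] * len(vocabulary)
        vector[index[token]] = 1
        vectors.append(vector)
    return vectors


# Toy vocabulary, invented for this example.
vocabulary = ["cat", "dog", "fish"]
encoded = one_hot(["dog", "cat"], vocabulary)
print(encoded)
```

Each vector is as long as the vocabulary, which is why this representation becomes impractical for large vocabularies and carries no notion of similarity between words.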

Succeed as a Data Scientist in a Hackathon Without Data Being Provided?

Last month I participated in my first Hackathon because I received a random ad E-Mail from my University which promoted a very cool sounding one. I clicked it and saw that they were searching for teams with different skills, also including Data Scientists. Awesome I thought, I’m in! I signed up, applied and got accepted. … Read more Succeed as a Data Scientist in a Hackathon Without Data Being Provided?

Analyzing the chaotic Presidential Debate 2020 with text mining techniques

Thanks to the internet, the whole world now knows about the Presidential Debate 2020 that went out of control. All of the major news stations reported on how the participants were interrupting and sniping at one another. I decided to put together an article that focuses on analyzing the words used in the event and … Read more Analyzing the chaotic Presidential Debate 2020 with text mining techniques

New Study on TikTok’s Algorithms and Trump’s Tulsa Rally

The event illustrates how TikTok’s algorithms can make mass political communication more accessible, but it is still no democratic utopia. Over the summer, I crunched the numbers on about 80,000 TikTok videos pertaining to the prank on Trump’s re-election rally in Tulsa. My main interest was understanding how TikTok’s algorithms may have played a role … Read more New Study on TikTok’s Algorithms and Trump’s Tulsa Rally

Characteristic Based Similarity for New Products Forecasting

Photo by Andre Hunter on Unsplash A way to answer one of the biggest questions retailers face when they add new products to their mix. “How much I’m gonna sell of it?” is the question that every retailer has in mind when they are thinking about adding a new material to their stores and … Read more Characteristic Based Similarity for New Products Forecasting

How to build an encoder decoder translation model using LSTM with Python and Keras.

Follow this step by step guide to build an encoder decoder model and create your own translation model Photo by Michael Dziedzic on Unsplash Prerequisites: to understand this article, previous knowledge of recurrent neural networks (RNNs) and encoder decoder models is valuable. This article is a practical guide on how to develop an encoder decoder model, … Read more How to build an encoder decoder translation model using LSTM with Python and Keras.

3 Steps to Define an Effective Data Science Process

When I ask people who lead data science teams about their data science process, many will describe a data science life cycle (i.e., their data science process workflow — such as first obtaining data, then cleaning the data, and then creating a machine learning model). Others give a vague answer about “working as a team … Read more 3 Steps to Define an Effective Data Science Process

How to estimate the standard error of the median: The Bootstrap Strategy

What is the standard error? According to Wikipedia, the standard error of a statistic is the standard deviation of its sampling distribution or an estimate of that standard deviation. There are several concepts in this sentence that need to be clarified before moving forward: First, a statistic is a sample estimate of a parameter. For … Read more How to estimate the standard error of the median: The Bootstrap Strategy
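A minimal sketch of the bootstrap idea named in the title, assuming the plain resample-with-replacement scheme: draw many resamples of the data, take each one's median, and use the standard deviation of those medians as the standard error estimate. The data values, the function name, and the defaults are illustrative choices, not the article's.

```python
import random
import statistics


def bootstrap_se_median(sample, n_resamples=2000, seed=0):
    """Estimate the standard error of the median by bootstrapping."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    medians = []
    for _ in range(n_resamples):
        # Resample with replacement, keeping the original sample size.
        resample = [rng.choice(sample) for _ in range(n)]
        medians.append(statistics.median(resample))
    # The spread of the resampled medians approximates the standard error.
    return statistics.stdev(medians)


# Made-up observations for illustration.
data = [2.1, 3.4, 3.9, 4.4, 5.0, 5.2, 6.8, 7.5, 9.1, 12.3]
se = bootstrap_se_median(data)
print(f"bootstrap SE of the median: {se:.3f}")
```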

How to do a Custom Sort on Pandas DataFrame

Suppose we have a dataset about a clothing store: df = pd.DataFrame({'cloth_id': [1001, 1002, 1003, 1004, 1005, 1006], 'size': ['S', 'XL', 'M', 'XS', 'L', 'S']}) Data made by author We can see that each cloth has a size value and the data should be sorted in the following order: XS for extra small, S for small … Read more How to do a Custom Sort on Pandas DataFrame
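One common way to get such a custom order is an ordered categorical column. The sketch below reuses the excerpt's DataFrame; the full order XS < S < M < L < XL is an assumption, since the excerpt is cut off after the first two sizes, and this may not be the approach the article itself takes.

```python
import pandas as pd

df = pd.DataFrame({
    "cloth_id": [1001, 1002, 1003, 1004, 1005, 1006],
    "size": ["S", "XL", "M", "XS", "L", "S"],
})

# Assumed size order; only XS and S are visible in the excerpt.
size_order = ["XS", "S", "M", "L", "XL"]

# An ordered categorical sorts by category position instead of alphabetically.
df["size"] = pd.Categorical(df["size"], categories=size_order, ordered=True)

# A stable sort keeps the original row order among equal sizes.
sorted_df = df.sort_values("size", kind="stable")
print(sorted_df)
```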

FACTOR ANALYSIS-MY ML OREO DETECTOR

“Beauty gets the attention but personality gets the heart”. These lines portray the importance of things which lies beyond our vision. What about a Machine Learning algorithm that finds information about the inner beauty like my heart which finds the creamy layer of the Oreo despite the unappetizing outer crunchy biscuits. Factor analysis is one … Read more FACTOR ANALYSIS-MY ML OREO DETECTOR

How to Build a Football Dataset with Web Scraping

To begin with the code, we’ll make our imports and initialize two empty lists, one for dealing with errors, which will be explained later in the article, and the other to store the data of every match we scrape. Within the loop, the URL will be created using the match ID, the driver object will … Read more How to Build a Football Dataset with Web Scraping

Tutorial: Stop Running Jupyter Notebooks from your Command Line!

Run your Jupyter Notebook as a stand alone web app Photo taken by Justin Jairam from @jusspreme (with permission) Jupyter Notebook provides a great platform to produce human-readable documents containing code, equations, analysis, and their descriptions. Some even consider it a powerful development when combining it with NBDev. For such an integral tool, the out … Read more Tutorial: Stop Running Jupyter Notebooks from your Command Line!

This Changes The Way You “See” Quantum Computing

Exploring The Quantum Observer Effect This post is part of the book: Hands-On Quantum Machine Learning With Python Image by author, Frank Zickert A qubit is a two-level quantum system that is in a superposition of the quantum states |0⟩ and |1⟩ unless you observe it. (Here’s more on the qubit state). Once you observe … Read more This Changes The Way You “See” Quantum Computing

5 Reasons why I’m learning Web Development, as a Data Scientist

And, why you should too. Photo by Luke Peters on Unsplash If there’s one thing that frustrates me about the data science process, it would probably be the fact that I could spend hours (and maybe days) building and refining a model only to realize putting it in production is another ton worth of work … Read more 5 Reasons why I’m learning Web Development, as a Data Scientist

How To Approach Problem Definition In Your Next Deep Learning Project

In this section, we’ll take a deeper dive into the guiding questions presented earlier in this article. The applicability and importance of the guiding questions become clearer when exploring scenarios reveals the considerations to weigh when tackling a problem that requires a deep learning-based solution. Photo by Markus Winkler on Unsplash … Read more How To Approach Problem Definition In Your Next Deep Learning Project

Interactive: Visualizing Covid-19 Test Accuracy

Before we get into the tests themselves, let’s understand what “accuracy” means for detecting an infection. There are 2 parts to a test’s accuracy: Sensitivity and Specificity. Sensitivity measures how often tests report “positive” for people who actually have the disease. Specificity measures how often tests report “negative” for people who don’t have the disease. … Read more Interactive: Visualizing Covid-19 Test Accuracy
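The two definitions in the excerpt translate directly into ratios over confusion-matrix counts. A small sketch, with invented counts:

```python
def sensitivity(true_positives, false_negatives):
    """Share of people who have the disease that the test reports as positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Share of people without the disease that the test reports as negative."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical results: 100 infected and 1000 healthy people tested.
sens = sensitivity(true_positives=90, false_negatives=10)
spec = specificity(true_negatives=950, false_positives=50)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```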

Building a Successful Data Initiative

Successful data initiatives use a simple architecture that will scale. Most big data projects fail. Why do data initiatives fail? Photo by Serghei Trofimov on Unsplash The team starts building without clear business goals; the team tries to solve problems it doesn’t understand fully using applications it understands even less; the team chases Big Data … Read more Building a Successful Data Initiative

You know Excel. Time to learn SQL.

Photo by Olenka Sergienko from Pexels 9 core excel functionalities translated into SQL As the world becomes increasingly data-driven, a growing number of professionals are working more closely with data. In many cases, the first introduction to this domain comes in the timeless form of Microsoft Excel. Vlookups, pivot tables, sumifs; maybe a little VBA … Read more You know Excel. Time to learn SQL.

Top 3 Python Visualization Libraries

Bokeh is the plotting library I tend to use the most for its interactivity features. The functionality I particularly enjoy is its ability to extend functionality with the introduction of custom JavaScript, which “can support advanced or specialized cases” that you may have when plotting. Bokeh allows for the creation of plots, dashboards, and applications … Read more Top 3 Python Visualization Libraries

Active Learning — Say Yeah!

Machine learning this, machine learning that! You know the drill. Let’s talk about a topic that people are only whispering about at the moment, Active Learning. Active learning is a sub-field of Artificial Intelligence which is based on the fact that curious algorithms are better learners both in terms of efficiency and expressivity. The core … Read more Active Learning — Say Yeah!

Understanding Logistic Regression

What is logistic regression? Logistic regression is just adapting linear regression to a special case where you can have only 2 outputs: 0 or 1. And this thing is most commonly applied to classification problems where 0 and 1 represent two different classes and we want to distinguish between them. Linear regression outputs a real … Read more Understanding Logistic Regression
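The adaptation the excerpt describes is usually done by passing the linear output through the sigmoid function and thresholding the result. A minimal single-feature sketch, with arbitrary illustrative weight and bias rather than fitted parameters:

```python
import math


def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weight, bias):
    """Linear regression output passed through the sigmoid."""
    return sigmoid(weight * x + bias)


def predict(x, weight, bias, threshold=0.5):
    """Turn the probability into one of the two classes, 0 or 1."""
    return 1 if predict_proba(x, weight, bias) >= threshold else 0


# Arbitrary parameters chosen for illustration only.
print(predict_proba(2.0, weight=1.5, bias=-1.0), predict(2.0, weight=1.5, bias=-1.0))
```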

Want to break into data science? Start building

Landing a job as a data scientist, machine learning engineer, or really any kind of role writing software takes more than just math and programming knowledge. In reality, these roles require you to make hundreds of decisions every day. These might be big decisions, like: How and where should I store my data? Which algorithm(s) … Read more Want to break into data science? Start building

Analysing My Lockdown Sleep Data | Data Science | Medium

The past few months have been a turbulent time for most of us, in one form or another, but has our sleep been changed for better or worse? Photo by Matthew Henry on Unsplash The past few months have been a turbulent time for most of us, in one form or another. Whether that turbulence … Read more Analysing My Lockdown Sleep Data | Data Science | Medium

Interpretability in PyTorch, Integrated Gradient

Interpretability in Neural networks using Captum, Integrated Gradients, and PyTorch Lightning. Gif summarizing the interpretation of the learning process of our Neural Network. Image by Author Neural networks have been taking the world by storm. Not a week passes without great news about how GPT-3 supposedly automates yet another language task. Or how AI is … Read more Interpretability in PyTorch, Integrated Gradient

Credit Card Customer Clustering with Autoencoder and K-means

A Further Dig into Business Intelligence for Customer Marketing with Improved models Img from pixabay via link In a previous article, we created a stacked auto-encoder model for movie rating prediction. But as we know, with its encoder part, an auto-encoder model can also help with feature extraction. So, in this article, we will continue … Read more Credit Card Customer Clustering with Autoencoder and K-means

Auto-Updating Your Github Profile With Python

Showcase Your Skills Through Automating Your Profile Readme Photo by Christina Morillo from Pexels Lately, I have been seeing an increasing number of developers on Github with a profile level README.md, and I wanted to create the same thing. I saw this as another opportunity to communicate what I am about. I also feel like … Read more Auto-Updating Your Github Profile With Python

Emulating a PID Controller with Long Short-term Memory: Part 2

Training a Long Short-term Memory neural network in Keras to emulate a PID controller using the Temperature Control Lab Photo by PilMo Kang on Unsplash Welcome to Part 2 of this exciting project! The results have looked great so far, and now we can get into the meat of what we’re trying to accomplish: emulating … Read more Emulating a PID Controller with Long Short-term Memory: Part 2

Bollinger Bands for stock trading. Theory and practice in Python

Bollinger Bands are a tool introduced by the quantitative trader John Bollinger in the 1980s. They are made by two lines that wrap the price time series in a way that is related to volatility. The higher the volatility, the wider the bands. They are usually drawn in this way: Higher band: a 20-period Simple … Read more Bollinger Bands for stock trading. Theory and practice in Python
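The excerpt's band definitions are cut off, so the sketch below assumes the commonly used convention: a 20-period simple moving average as the middle line, with bands two rolling standard deviations above and below it. The function name, defaults, and prices are illustrative.

```python
import pandas as pd


def bollinger_bands(close, window=20, num_std=2.0):
    """Return lower, middle, and upper bands (assumed 20-period, 2-sigma convention)."""
    middle = close.rolling(window).mean()
    # pandas uses the sample standard deviation (ddof=1) by default.
    std = close.rolling(window).std()
    return pd.DataFrame({
        "lower": middle - num_std * std,
        "middle": middle,
        "upper": middle + num_std * std,
    })


# Hypothetical closing prices oscillating between 100 and 104.
close = pd.Series([100 + (i % 5) for i in range(40)], dtype=float)
bands = bollinger_bands(close)
print(bands.dropna().head())
```

Because the band width scales with the rolling standard deviation, higher volatility widens the bands, as the excerpt notes.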

The Six Types of Data Analysis

Photo by Luke Chesser on Unsplash Data Analysis can be separated and organized into 6 types, arranged in increasing order of difficulty: Descriptive Analysis, Exploratory Analysis, Inferential Analysis, Predictive Analysis, Causal Analysis, and Mechanistic Analysis. Goal: Describe or summarize a set of data. Description: The very first analysis performed. Generates simple summaries about samples … Read more The Six Types of Data Analysis

Using Docopt in python, the most user-friendly command-line parsing library

Now that we have created our docopt.py file, we can create a setup.py file as well as a requirements.txt file to make our project even more user-friendly. A setup.py file is a Python file where you describe your module distribution to the Distutils, so that the various commands that operate on your modules do the … Read more Using Docopt in python, the most user-friendly command-line parsing library

3 Basic Steps of Stock Market Analysis in Python

Analyze Tesla stock in Python, calculate Trading Indicators and plot the OHLC chart. Includes a Jupyter Notebook with code examples. Photo by Chris Liverani on Unsplash Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You … Read more 3 Basic Steps of Stock Market Analysis in Python

Generating Random Data into a Database Using Python

Populating a MySQL Database with Dummy Data using Pandas Photo by aitoff from pixabay In this article we’re going to demonstrate how to generate dummy data and insert it into a MySQL database. Often, we need to create a database to test some software that we’ve built. To do this, we can use … Read more Generating Random Data into a Database Using Python

How to Build a Serverless Application using AWS SAM

Let’s take a look at the App.java file. The sam-init command created a simple Lambda function that returns the JSON body {“message”: “hello world”} and the machine’s IP address when called. We can now change this template and add more code to read news from Google. Now let’s take a look at the template.yml file. … Read more How to Build a Serverless Application using AWS SAM

Deep Learning based Recommender Systems

A gentle introduction to modern movie recommenders Traditionally, recommender systems are based on methods such as clustering, nearest neighbor and matrix factorization. However, in recent years, deep learning has yielded tremendous success across multiple domains, from image recognition to natural language processing. Recommender systems have also benefited from deep learning’s success. In fact, today’s state-of-the-art … Read more Deep Learning based Recommender Systems

Three Ways to Create Dockernized LaTeX Environment

Getting Started with LaTeX + Docker + VSCode Remote Container Photo by Arisa Chattasa on Unsplash Contents: Introduction, Setup, Method 1: tianon/latex, Method 2: Remote-Containers, Method 3: Creating your container, How to switch Remote containers, Opening a PDF, Conclusion, References. We can run a Docker application in any environment, Linux, Windows, or Mac. Docker provides a set of official base images for most used … Read more Three Ways to Create Dockernized LaTeX Environment

Anchors Away! Regex in R | by Drew Seewald

Tutorial | R | Regular Expressions (Regex) Secrets to working with text using advanced regular expressions tools in R Photo by Peter Hansen on Unsplash So you already know the basics of regular expressions, or regex, in R. Things like how to use character sets, meta characters, quantifiers, and capture groups. These are the basic … Read more Anchors Away! Regex in R | by Drew Seewald

Using Data Science Skills Now: Text Scraping

Have a tedious document searching task? Automate it with python in 5 simple steps. Image by Henryk Niestrój from Pixabay We all are given tasks at times that are tedious in nature. They’re manual and annoying. If it’s a once and done project, we grunt through and get it done. Sometimes you know it’s going … Read more Using Data Science Skills Now: Text Scraping

How AI Techniques Made Me a Better Parent for Our Toddler

I used the knowledge and wisdom I gained from my work in Artificial Intelligence to understand and teach my two-year-old son more effectively and regain my sanity. Image via Unsplash and Freepik Overview The knowledge I have gained from building Artificial Intelligence (AI) tech is directly applicable to raising my toddler. Not only does it … Read more How AI Techniques Made Me a Better Parent for Our Toddler

Build a Shiny Dashboard with Elasticsearch

5. Connect to Elasticsearch elasticsearch <- import("elasticsearch") host <- "localhost:9200" es <- elasticsearch$Elasticsearch(hosts = host) There are various ways to connect, since AWS4Auth is not used. You may use an R-only approach as well. 6. Install the necessary packages for the Shiny dashboard including Shiny, shinyWidgets and shinydashboard. Manipulate Data and Build Shiny Dashboard using … Read more Build a Shiny Dashboard with Elasticsearch

How to Clean Text Files at the Command Line

A basic tutorial about cleaning data using command-line tools: tr, grep, sort, uniq, awk, sed, and csvlook Photo by JESHOOTS.COM on Unsplash Cleaning data is like cleaning the walls in your house: you clear any scribble, remove the dust, filter out whatever is unnecessary and makes your walls ugly, and get rid of … Read more How to Clean Text Files at the Command Line

Kaggle’s Titanic Competition in 10 Minutes | Part-I

Machine Learning Tutorials — Part 1 | Part 2 → soon Complete Your First Kaggle Competition in Less Than 20 Lines of Code with Decision Tree Classifier | Machine Learning Tutorials Since you are reading this article, I am sure that we share similar interests and are/will be in similar industries. So let’s connect via … Read more Kaggle’s Titanic Competition in 10 Minutes | Part-I