Ranking algorithms — know your multi-criteria decision solving techniques!

Suppose you have a decision to make, such as buying a house, a car, or even a guitar. You don’t want to choose randomly or be biased by someone’s suggestion; you want to make an educated decision. For this, you gathered some information about the entity you want to buy (let’s say it’s a …

Simple Movie Recommendation System using Python

Photo by Noom Peerapong on Unsplash. Recommendation systems are quite popular now. They are used heavily on movie, shopping and e-commerce, and social media websites. There are three types of recommendation systems: content-based, popularity-based, and collaborative-based. Popularity based is simple and recommends on the …

Streamlit + Heroku = Magic?

Note: I’m assuming you already have Streamlit installed and know a little about how to work with it. Four components are required to launch your Streamlit application on Heroku: setup.sh, requirements.txt, Procfile, and your_application.py. setup.sh (no credentials needed). Note: You do not need to name this file exactly setup.sh; it can …

Unifying remote and local AzureML environments

Microsoft and Python Machine Learning: a modern love story, Part 2 of 2. Microsoft Azure is conquering our hearts as AI practitioners and wooing us with support for open-source frameworks such as PyTorch, TensorFlow and scikit-learn on AzureML. Here we build a workflow around the tools that MS gives us and it is up to …

A Bayesian Approach to Linear Mixed Models (LMM) in R/Python

Implementing these can be simpler than you think. There seems to be a general misconception that Bayesian methods are harder to implement than Frequentist ones. Sometimes this is true, but more often existing R and Python libraries can help simplify the process. Simpler to implement ≠ throw in some data and see what sticks. (We …

The Best Way to Invest in the Market

Empirical analysis comparing lump sum investment vs. dollar cost averaging using Python. Working in finance, I inevitably get asked “What stocks should I buy?” Since I’m well aware that I have no chance of consistently beating the market, I respond with the same three words every time: “Buy the SPY.” For those unfamiliar, the SPY …

Exploring Line Charts for Data Visualization

Exploratory Data Visualization: A Practical Guide to Understand, Visualize, and Interpret Data. Data visualization is a discipline that focuses on the visual representation of data. We as humans possess powerful visual processing capabilities. We tend to find patterns quickly. Unfortunately, when data is represented in a textual or tabular form we are unable to take advantage of …

Twitter Analytics: “WeRateDogs”

Most of the focus of this project was on data wrangling. So what exactly is data wrangling? Data wrangling refers to the process of cleaning, restructuring and enriching raw data into a more usable format. I have used various Python libraries in this project; below are the ones I got started with. import …

Efficient Web Scraping with Scrapy

https://unsplash.com/@markusspiske New features of Scrapy to make your scraping efficient. Scrapy as a framework for web scraping is powerful and scalable. It has an active user base and new features coming out with each update. In this article we will run through some of those features to get the most out of your scraping …

Grabbing Geodata From Your Photos Library Using Python

Katy Mould, www.katymould.com. Use Python to automate sorting through your photo collection. One of the many talents my girlfriend has is photography. However, she has over 100,000 photos that need sorting, primarily by year and city. She mainly does location-based photography and finds writing essays about certain areas of the world easier …

Find Text Similarities with your own Machine Learning Algorithm.

4. TF-IDF algorithm. Having used the above function to clean and filter the data and set up our counts, we proceed with the implementation of the core functionality of our experiment, the TF-IDF algorithm (TF-IDF mainly using dictionaries). Taking advantage of our TF-IDF algorithm, we could now decide to only consider …
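
Since the excerpt only names the technique, here is a minimal sketch of a dictionary-based TF-IDF. It is not the article's code: the function name `tf_idf`, the sample documents, and the exact weighting (term frequency as count divided by document length, inverse document frequency as the natural log of N over document frequency) are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dictionary per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Made-up, already tokenized documents.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
]
weights = tf_idf(docs)
```

With this weighting, a term that occurs in every document scores zero, which is one simple way to decide which terms to stop considering.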

Removing non-linear trends from timeseries data

Sometimes trends need to be removed from timeseries data, either in preparation for later steps or as part of the data cleaning process. If you can identify a trend, simply subtract it from the data; the result is detrended data. If the trend is linear, you can find it via linear regression. But what …

Creating Photo Mosaics Using Python

The Code. This section contains the details of the creation engine; if pseudo-code bores you, scroll to the bottom to see the final product of my demo. The full code is available on GitHub. The Image Directory: The background images are loaded into memory, reshaped, and processed. The processing involves acquiring the average …

What is time series decomposition and how does it work?

A step-by-step procedure for decomposing a time series into trend, seasonal and noise components using Python. There are many decomposition methods available, ranging from simple moving-average-based methods to powerful ones such as STL. In Python, the statsmodels library has a seasonal_decompose() function that lets you decompose a time series into trend, seasonality and …
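
To make the "simple moving average based" idea concrete, here is a hand-rolled sketch of an additive decomposition. It is not statsmodels code and not the article's procedure: the function name `decompose_additive` is hypothetical, only odd periods are handled, and the trend is left undefined (`None`) at the edges where the window does not fit.

```python
def decompose_additive(values, period):
    """Split values into trend, seasonal and residual parts (odd period only)."""
    if period % 2 == 0:
        raise ValueError("this sketch only handles an odd period")
    if len(values) < 2 * period:
        raise ValueError("need at least two full periods of data")
    half = period // 2
    n = len(values)

    # Trend: centered moving average; undefined (None) near the edges.
    trend = [None] * n
    for i in range(half, n - half):
        window = values[i - half:i + half + 1]
        trend[i] = sum(window) / period

    # Seasonal: average detrended value at each position in the cycle.
    buckets = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            buckets[i % period].append(values[i] - trend[i])
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period  # center so the seasonal part sums to zero
    seasonal = [means[i % period] - offset for i in range(n)]

    # Residual: what is left after removing trend and seasonality.
    residual = [
        None if trend[i] is None else values[i] - trend[i] - seasonal[i]
        for i in range(n)
    ]
    return trend, seasonal, residual


# Made-up series: a rising line plus a repeating three-step pattern.
pattern = [1.0, 0.0, -1.0]
series = [float(i) + pattern[i % 3] for i in range(9)]
trend, seasonal, residual = decompose_additive(series, period=3)
```

For real work a library routine such as the seasonal_decompose() function mentioned above is the better choice; the sketch only shows what the three components mean.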

Jupyter Notebook Essential Productivity Hacks

Use Ctrl+Click to create multicursors. Wherever … Or, for example, consider changing the name of a column. Because Jupyter notebooks are cell-based, it is easy to forget variables and what values they hold. However, you can track all the variables (their names, their data types, and the data they store) by calling %whos. …

Generating cooking recipes using TensorFlow and LSTM Recurrent Neural Network

Photo by home_full_of_recipes (Instagram channel). I’ve trained a character-level LSTM (Long short-term memory) RNN (Recurrent Neural Network) on a dataset of ~100k recipes using TensorFlow, and it suggested that I cook “Cream Soda with Onions”, “Puff Pastry Strawberry Soup”, “Zucchini flavor Tea” and “Salmon Mousse of Beef and Stilton Salad with Jalapenos”. Here you may find more …

Properties in Python: Fundamentals for Data Scientists

Understand the basics with a concrete example! Photo by Goran Ivos on Unsplash. In many object-oriented programming languages, data within an object or in a class structure can be explicitly hidden from external access by using specific keywords such as private or protected. This prevents misuse by not allowing access to the data from outside …
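
Python has no private or protected keywords, so controlled access is usually expressed with properties. The sketch below is an illustration only: the `Temperature` class and its validation rule are hypothetical and do not come from the article.

```python
class Temperature:
    """Hypothetical class used only to illustrate properties."""

    def __init__(self, celsius):
        self.celsius = celsius  # goes through the setter below

    @property
    def celsius(self):
        # Read access looks like a plain attribute to the caller.
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        # Write access is validated before the value is stored.
        if value < -273.15:
            raise ValueError("temperature below absolute zero")
        self._celsius = value


reading = Temperature(21.5)
reading.celsius = 25.0
```

The leading underscore in `_celsius` is a naming convention, not enforcement; the property is what keeps invalid values out.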

A simple yet useful data visualization library for your EDA

Exploratory data analysis / univariate analysis / data science. Visualize the relationship between a dependent variable and any feature in a meaningful way. Exploratory data analysis, or EDA, is particularly relevant for projects that involve tabular data. During EDA, the cautious data scientist generally tries to assess the quality of their datasets by looking for outliers or by …

Styling Pandas DataFrames: More Than Just Numbers

To begin, we’ll generate a sample DataFrame. It will have five columns, each with ten rows, all drawn randomly from a normal distribution. To color in a DataFrame according to conditional rules, one needs to create a color map. This exists in the form of a function that takes in a value, returning a string …

What’s New In Python 3.9 Dictionaries?

Dictionaries are a built-in Python data type that holds a mutable, insertion-ordered collection of key-value pairs. Up until Python 3.8, we’d typically use the update method or the ** unpacking operator, but the most trivial way to update a dictionary is by inserting a key-value pair in it, as shown below: d1 = {'a': …
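
A short side-by-side sketch of the approaches named above, plus the merge operators that Python 3.9 added (PEP 584). The dictionary names and values are made up, and it is an assumption that the article goes on to cover `|` and `|=`.

```python
defaults = {"a": 1, "b": 2}
overrides = {"b": 20, "c": 30}

# Insert or overwrite a single key-value pair.
single = dict(defaults)
single["c"] = 3

# Pre-3.9 options: update() mutates in place, ** unpacking builds a new dict.
updated = dict(defaults)
updated.update(overrides)
unpacked = {**defaults, **overrides}

# Python 3.9+: | returns a new dict, |= updates in place.
merged = defaults | overrides
in_place = dict(defaults)
in_place |= overrides
```

In every variant the right-hand dictionary wins when a key appears in both.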

Python: Writing Custom Exceptions is easier than you might think

Python provides a number of built-in exceptions; however, sometimes it makes a lot more sense to make our own. Handling runtime errors in Python is pretty easy. You just have to put the code in a try-except block and handle the exception using one of the many built-in exception types provided by Python. Given below is an …
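
As a rough illustration of how little a custom exception needs, here is a minimal sketch. The names `InsufficientFundsError` and `withdraw` are hypothetical and are not taken from the article.

```python
class InsufficientFundsError(Exception):
    """Hypothetical custom exception carrying extra context."""

    def __init__(self, balance, amount):
        super().__init__(f"cannot withdraw {amount}; balance is {balance}")
        self.balance = balance
        self.amount = amount


def withdraw(balance, amount):
    # Raise the domain-specific error instead of a generic ValueError.
    if amount > balance:
        raise InsufficientFundsError(balance, amount)
    return balance - amount


try:
    withdraw(100, 250)
except InsufficientFundsError as err:
    message = str(err)
```

Callers can now catch this one error type specifically and read `err.balance` and `err.amount` instead of parsing a message string.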

Tafra: A Minimalist Dataframe

A small, pure-Python package with first-class support for types and minimal dependencies focused on usability and performance. David S. Fulford and Derrick W. Turk. June 17, 2020. Image Source: Pixabay. Introduction. Data science requires, quite obviously, data. When we work with data, we must first load the data from a source and into memory. Our …

Build and deploy machine learning web app using PyCaret and Streamlit

A beginner’s guide to deploying a machine learning app on Heroku PaaS. In our last post on deploying a machine learning pipeline in the cloud, we demonstrated how to develop a machine learning pipeline in PyCaret, containerize a Flask app with Docker, and deploy serverless using AWS Fargate. If you haven’t heard about PyCaret before, you …

Line Chart Animation with Plotly on Jupyter

📉 Introduction 📉 Data preparation 📉 All countries line chart 📉 Adding LOG and LINEAR buttons 📉 Changing hovermode 📉 Line chart animation 📉 Scatter and bar chart animations 📉 Conclusion. In this article, I will try to reproduce one of the Our World in Data charts using Plotly animation on Jupyter. ourworldindata.org is an excellent website and I really like their visualizations. …

Learn how to automate the basic steps of Data Analysis

To begin developing your own customised Python package, we need to perform the following steps. STEP 1: Creation of the Python script file. This file will contain the Python code necessary to run the basic data analysis. To demonstrate, let us automate steps such as calculation of: Dimension of …

Text classification of Amazon Fine Food reviews

Photo by Christian Wiediger on Unsplash. The goal here is to classify food reviews based on customers’ text. So the first step would be to download the dataset. It would be fascinating for suppliers to use reviews from their customers to provide better service to them. Reviews include several features like ‘ProductId’, ‘UserId’, ‘Score’, and …

Your Ultimate Python Visualization Cheat-Sheet

Creating a figure lets you specify the graph size: plt.figure(figsize=(horizontal_length, vertical_length)). Seaborn styles can add grids and styles to the graph space. There are five built-in styles in seaborn (darkgrid, whitegrid, dark, white, and ticks), which can be loaded using sns.set_style(name_of_style). Seaborn contexts are built-in presets for how you may want your plot to look, which affect things like the …

Make a Simple NBA Shot Chart with Python

Recently, I stumbled on a wonderful Python package called nba_api that serves as a very simple API client to retrieve stats from www.nba.com. I have personally always wanted to create player shot charts like the ones found online, but gathering the shot location data seemed like a daunting task. This is …

What is a Higher Order Function?

Let’s build simple versions of the popular functions, which are themselves higher-order functions. The filtering function takes two parameters, an array and a test function, and returns a new array with all the elements that pass the test. Python:

    def filtering(arr, test):
        passed = []
        for element in arr:
            if test(element):
                passed.append(element)
        return …
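
For reference, here is a runnable sketch of the filtering function described above together with two calls. The excerpt is cut off at the return statement, so returning `passed` is an assumption; `is_even` and the sample inputs are made up for illustration.

```python
def filtering(arr, test):
    """Return a new list with the elements of arr for which test(...) is truthy."""
    passed = []
    for element in arr:
        if test(element):
            passed.append(element)
    return passed  # assumed: the excerpt is cut off at the return statement


def is_even(n):
    return n % 2 == 0


# The test can be a named function or a lambda.
evens = filtering([1, 2, 3, 4, 5, 6], is_even)
long_words = filtering(["a", "list", "of", "words"], lambda w: len(w) > 2)
```

Passing the test in as an argument is what makes `filtering` a higher-order function; the built-in `filter` works the same way but returns a lazy iterator.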

Polymorphism in Python: Fundamentals For Data Scientists

Understand the basics with a concrete example! Photo by ThisisEngineering RAEng on Unsplash. Polymorphism is another important concept in object-oriented programming. Classes are polymorphic when they contain methods that share a name but have different implementations. In such a case, we can use objects of these polymorphic classes without considering differences across the …
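
A small sketch of that idea, using hypothetical `Circle` and `Rectangle` classes that are not taken from the article: both define an `area` method, and the calling code never checks which class it is dealing with.

```python
import math


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


def total_area(shapes):
    # Calls area() without caring which class each object belongs to.
    return sum(shape.area() for shape in shapes)


total = total_area([Circle(1.0), Rectangle(2.0, 3.0)])
```

Note that the two classes do not need a common base class for this to work in Python; sharing the method name is enough.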

How to Profile Your Code in Python

More Control With Profile and pstats.Stats. Although cProfile.run() is sufficient in most cases, if you need more control over profiling, you should use the Profile class of cProfile. The snippet below is taken from the Profile class documentation.

    import cProfile, pstats, io
    from pstats import SortKey
    pr = cProfile.Profile()
    pr.enable()
    # … do something …
    pr.disable()
    s = io.StringIO()
    sortby …
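
Because the excerpt stops partway through the snippet, here is a self-contained sketch of the same pattern wrapped in a helper. The helper `profile_call` and the workload `busy_work` are hypothetical names introduced for illustration; the cProfile and pstats calls are standard library APIs.

```python
import cProfile
import io
import pstats
from pstats import SortKey


def busy_work(n=50_000):
    """Hypothetical workload; replace with the code you want to profile."""
    return sum(i * i for i in range(n))


def profile_call(func):
    """Profile one call to func and return the stats report as text."""
    pr = cProfile.Profile()
    pr.enable()
    func()
    pr.disable()

    # Write the report into a string buffer instead of stdout.
    stream = io.StringIO()
    stats = pstats.Stats(pr, stream=stream).sort_stats(SortKey.CUMULATIVE)
    stats.print_stats(10)  # limit the report to the top 10 entries
    return stream.getvalue()


report = profile_call(busy_work)
```

Returning the report as a string makes it easy to log or save; call `print(report)` to view it.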

Autodeploy FastAPI App to Heroku via Git in these 5 Easy Steps

The Procfile specifies the commands that are executed by the app upon startup. Create a new file called Procfile (no extension; make sure you use the same casing). This is the Procfile for the Iris classifier:

    web: uvicorn iris.app:app --host=0.0.0.0 --port=${PORT:-5000}

If you are using your own FastAPI project, change the path iris.app:app accordingly. …

A Simple Way to Analyze Student Performance Data with Python

Photo by Stephen Dawson on Unsplash. Explore how to analyze data and build informative graphs in a productive way using Python and Dremio. Data analysis and data visualization are essential components of data science. Actually, before the machine learning era, all data science was about the interpretation and visualization of data with different tools and …

Practical Image Process with OpenCV

Image processing is essentially the process of extracting features from images. It is applied to both images and videos. These procedures are frequently used to make training more successful in deep learning structures. Image processing begins with the recognition of data by computers. Firstly, a matrix …

Inheritance in Python: Fundamentals for Data Scientists

Understand the basics with a concrete example! Photo by Christopher Gower on Unsplash. Class inheritance is an important concept in object-oriented programming. It enables us to extend the abilities of an existing class. To create a new class, we use an available class as a base and extend its functionality according to our needs. Since …
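
The base-and-extend idea can be sketched in a few lines. The `Dataset` and `LabeledDataset` classes below are hypothetical examples, not the ones used in the article.

```python
class Dataset:
    """Hypothetical base class."""

    def __init__(self, rows):
        self.rows = list(rows)

    def size(self):
        return len(self.rows)


class LabeledDataset(Dataset):
    """Extends Dataset with labels while reusing its behaviour."""

    def __init__(self, rows, labels):
        super().__init__(rows)  # let the base class store the rows
        self.labels = list(labels)

    def label_counts(self):
        counts = {}
        for label in self.labels:
            counts[label] = counts.get(label, 0) + 1
        return counts


data = LabeledDataset([[1, 2], [3, 4], [5, 6]], ["a", "b", "a"])
```

The subclass gets `size()` for free and only adds what is new, which is the point of using an available class as a base.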

Defining Goodness in Machine Learning Algorithms: Computational Learning Theory

PAC Bound. Given a hypothesis space of a specific size, we can determine how many training examples are required to learn a probably approximately correct hypothesis. The diagram below illustrates the intuition for the PAC bound. Each point represents a possible m ∈ M examples. Each colored oval represents a set of “misleading” m examples for …
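
The excerpt is cut off before stating the bound, so as an assumption this sketch uses the textbook result for a finite hypothesis space H and a learner that outputs a hypothesis consistent with the training data: m ≥ (1/ε)(ln|H| + ln(1/δ)) examples suffice for error at most ε with probability at least 1 − δ. The function name `pac_sample_bound` and the example numbers are made up.

```python
import math


def pac_sample_bound(hypothesis_count, epsilon, delta):
    """Examples sufficient for a consistent learner over a finite hypothesis space."""
    if hypothesis_count < 1 or not 0 < epsilon < 1 or not 0 < delta < 1:
        raise ValueError("need |H| >= 1 and epsilon, delta in (0, 1)")
    # m >= (1 / epsilon) * (ln|H| + ln(1 / delta)), rounded up to a whole example.
    bound = (math.log(hypothesis_count) + math.log(1 / delta)) / epsilon
    return math.ceil(bound)


# 2**10 hypotheses, 10% error tolerance, 95% confidence.
needed = pac_sample_bound(hypothesis_count=2 ** 10, epsilon=0.1, delta=0.05)
```

The bound grows only logarithmically with the size of the hypothesis space but linearly with 1/ε, so tightening the error tolerance is the expensive part.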

How to Rename Screenshots on Mac with 21 Lines of Python

Using the os and datetime modules, a normally repetitive task now takes one click. Original image created for this project by little sister. Taking screenshots on a Mac is as easy as hitting Command-Shift-3 for a fullscreen capture or Command-Shift-4 for a portion of the screen. You can also set a custom label for …

5 Steps in Pandas to Process Petrophysical Well Logs

In the previous work, we implemented 10 simple steps using pandas to process petrophysical well logs in LAS format. In this project, we will go deeper and use more advanced approaches to process well log data. These 5 steps are: 1) Function Definition, 2) Apply Function, 3) Lambda Function, 4) Cut Function, 5) Visualization. To avoid extra work that we …

A Basic Introduction to TensorFlow Lite

An introduction to TensorFlow Lite Converter, Quantized Optimization, and Interpreter to run TensorFlow Lite models at the Edge. In this article, we will understand the features required to deploy a deep learning model at the Edge, what TensorFlow Lite is, and how the different components of TensorFlow Lite can be used to make an inference …

Predicting the Scores of the Postponed 2020 Rugby Six Nations Matches

Accessing and analysing Six Nations Rugby data, as well as predicting the scores of the remaining postponed matches. The Six Nations Rugby Championship is the annual rugby union competition contested by the national teams of England, France, Ireland, Italy, Scotland, and Wales. The tournament was scheduled to conclude on 14 March 2020; however, due to …

Many ways of Pivoting with Pandas

Data exploration hacks for Pythoneers. Photo by Tekton on Unsplash. Pandas offers a wide range of data exploration tools that are intuitive to use and worth having at your fingertips. This article explores one such tool, pivoting. Often, data is obtained in its raw state as a linear tabular/unpivoted/stacked …

NLP Sentiment Analysis for beginners.

Set up and import libraries

    %matplotlib inline
    import string
    import numpy as np
    import matplotlib
    import matplotlib.pyplot as plt
    matplotlib.rc('xtick', labelsize=14)
    matplotlib.rc('ytick', labelsize=14)

Now, we load in the data and look at the first 10 comments

    with open("sentiment_labelled_sentences/full_set.txt") as f:
        content = f.readlines()
    content[0:10]

    ## Remove leading and trailing white space
    content = [x.strip() for x in content]
    ## Separate the sentences from the …

Popcorn Data — Analysing Cinema Seating Patterns

Photo by Markus Spiske on Unsplash. It was now time to get our hands dirty by cleaning the data and pulling out relevant information. Using pandas, we parsed the JSON, cleaned it, and made a DataFrame with the data to improve readability and filter it easily. Since the seat data took a lot of memory, …

4 Ways To Speed Up Sorting Card Decks

In layman’s terms, computational complexity is the amount of time it takes to run an algorithm. Often, this is represented in what’s called “Big O Notation”, which is a method of writing the limiting (or worst case) behaviour of an algorithm. So for example, if you wanted to count the number of items in …

A modern guide to Spark RDDs

Everyday opportunities to reach the full potential of PySpark. The web is full of Apache Spark tutorials, cheatsheets, tips and tricks. Lately, most of them have been focusing on Spark SQL and DataFrames, because they offer a gentle learning curve, with a familiar SQL syntax, as opposed to the steeper curve required for the older …

Classification with Random Forests in Python

Now let’s also look at the first five rows of data using the ‘.head()’ method: print(df.head()). The attribute information is as follows. We will be predicting the class for mushrooms, where the possible class values are ‘e’ for edible and ‘p’ for poisonous. The next thing we will do is convert each column into …