Learn to Write Functions Others Can Use in Python

Do One Thing at a Time. A common mistake many beginners make is writing functions that are too long and complicated. Design each function to perform one specific task. Small, precise functions are easier to test and debug with modern IDEs and are more flexible. Now, you might be thinking: ‘I …
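To make the "one task per function" advice concrete, here is a minimal sketch. The function names and the sample input are hypothetical, chosen only for illustration; they do not come from the original article.

```python
def parse_scores(raw):
    """Turn a comma-separated string into a list of floats (one job: parsing)."""
    return [float(part) for part in raw.split(",") if part.strip()]


def mean(values):
    """Return the arithmetic mean of a non-empty sequence (one job: arithmetic)."""
    return sum(values) / len(values)


def format_report(average):
    """Build the text shown to the user (one job: formatting)."""
    return f"Average score: {average:.2f}"


# Each step can be tested on its own, then composed.
report = format_report(mean(parse_scores("80, 92.5, 71")))
print(report)
```

Because each function does one thing, a bug in parsing can be found without touching the arithmetic or the formatting.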

10 Neat Python Tricks and Tips Beginners Should Know

Tricks to Become a Unique Beginner. Photo by Cristian Lopez on Unsplash. Python is a powerful general-purpose programming language. It is used in web development, data science, creating software prototypes, and so on. Fortunately for beginners, Python has a simple, easy-to-use syntax, which makes it an excellent language for learning to program. Python is …

How to Navigate Analytics Job Search During COVID-19

This analysis is part of our project in the Summer Data Competition 2020 hosted by Fuqua School of Business. I want to send my special thanks to my teammates, Yaqiong (Juno) Cao and Xinying (Silvia) Sun, for their great contribution. Photo by Annie Spratt on Unsplash. Problem Definition: Are you an analytics master’s student …

How To Benchmark Any Models’ Inference Statistics For Production

Find pyinfer on GitHub: https://github.com/cdpierse/pyinfer (docs: https://pyinfer.readthedocs.io/en/latest/). When developing machine learning models, initial efforts often focus on measuring metrics that reflect how well a model performs on a given task. This step is of course crucially important, but when moving a model to production other factors come into play …

Practical Machine Learning Tutorial: Part.4 (Model Evaluation-2)

Multi-class Classification Problem: Geoscience example (Facies). In this part, we will elaborate on more model evaluation metrics specifically for multi-class classification problems. Learning curves will be discussed as a tool for deciding how to trade off bias and variance when selecting model parameters. ROC curves for all classes in …

Drop Duplicates in Pandas | Dean McGrath

Learn how to drop duplicates from a Pandas DataFrame to improve your data quality. Photo by Samantha Lam on Unsplash. Dropping duplicates from your data sets is a task you will regularly have to do as a Data Analyst. Whilst in some cases duplicates may be valid, frequently they have been created through lax data …

Automate your job search with Python and Github Actions

A real-life example using Scrapy and GitHub Actions. Photo by Marten Newhall on Unsplash. Job hunting is a time-consuming task. A lot of different sites for job searches exist, but there is no “one size fits all”. Job openings are available on job aggregators, LinkedIn, career pages of individual companies, even as tweets or …

Here’s Why You Should Learn Docker as a Data Scientist

You’ll be surprised by how easy it is. To use Docker you’ll need to install it. Download Docker Desktop from this link, install it and open up the application. Now create the following project structure anywhere on your computer. [Image 1: Directory structure for your Python app (image by author)] Let’s start with what …

Testing Streamlit Apps Using SeleniumBase

In the time I’ve worked at Streamlit, I’ve seen hundreds of impressive data apps ranging from computer vision applications to public health tracking of COVID-19 and even simple children’s games. I believe the growing popularity of Streamlit comes from the fast, iterative workflows through the Streamlit “magic” functionality and auto-reloading the front-end upon saving your …

How to Collect Live Feed and Frequently Updated Data Using Cron

Cron allows you to schedule repeating tasks, making it a great tool to run data collection scripts. Photo by Nick Chong on Unsplash. A major concern when collecting time series data is ensuring that all data is collected at equal time intervals. Without equal time intervals, you will be unable to use most methods for …

How to Code Ridge Regression from Scratch

Ridge Regression, like its sibling, Lasso Regression, is a way to “regularize” a linear model. In this context, regularization can be taken as a synonym for preferring a simpler model by penalizing larger coefficients. We can achieve this concretely by adding a measure of the size of our coefficients to our cost function, so that …
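The idea of adding a measure of coefficient size to the cost function can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the penalty is taken to be alpha times the sum of squared weights, the data term is the mean squared error, and the bias is left unpenalized. The excerpt does not confirm these details, and the names ridge_cost and alpha are hypothetical.

```python
def ridge_cost(X, y, weights, bias, alpha):
    """Mean squared error plus an L2 penalty on the weights.

    Assumptions (not confirmed by the excerpt): the penalty is
    alpha * sum(w^2) and the bias term is not penalized.
    """
    n = len(y)
    # Linear model prediction for each row of X.
    predictions = [sum(w * x for w, x in zip(weights, row)) + bias for row in X]
    mse = sum((p - t) ** 2 for p, t in zip(predictions, y)) / n
    penalty = alpha * sum(w ** 2 for w in weights)
    return mse + penalty


# A perfect fit (y = 2x) still pays for the size of its coefficient.
X = [[1.0], [2.0]]
y = [2.0, 4.0]
print(ridge_cost(X, y, weights=[2.0], bias=0.0, alpha=0.5))
```

With alpha set to 0 the function reduces to the ordinary mean squared error, which is one way to see the penalty as the only difference from plain linear regression.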

Create “Interactive Globe + Earthquake Plot” in Python

Here we create an interactive globe, like Google Earth, from topography data using an amazing visualization tool, Plotly. We also plot the global earthquake distribution on this interactive globe. Image by Author. Enjoy it here! By creating this interactive plot, you can get the following ideas: deeper insights and more realistic application examples to …

Linear regression made easy. How does it work and how to use it in Python?

All you need to know about building a Machine Learning model using the linear regression algorithm. Multiple linear regression model. Graph by author. Machine Learning is making huge leaps forward, with an increasing number of algorithms available so we can solve complex real-world problems. This story is part of a deep dive series explaining the …

An Introduction to K-Nearest Neighbours Algorithm

Step 1: Possible k values

    possible_k = [1, 3, 5, 7, 9, 11]

Step 2: Finding Accuracy Score and MSE for each k value. Calculate the accuracy score for each k value and append it to the list “ac_scores”:

    ac_scores = []
    for k in possible_k:
        knn = KNeighborsClassifier(n_neighbors=k, weights='distance', metric='euclidean')
        knn.fit(x_train, y_train)
        y_pred = knn.predict(x_test)
        scores = accuracy_score(y_test, y_pred)
        ac_scores.append(scores)
    print("Accuracy Scores :", ac_scores)

Output:

    Accuracy Scores : [0.8333333333333334, 1.0, 1.0, 0.8333333333333334, 0.8333333333333334, 0.8333333333333334]

Step 3: Calculate Error. Error …

Deploying your Dash App to Heroku — THE MAGICAL GUIDE

(Image by author, inspired by Charlie the Unicorn and Gunicorn) So you have your Dash app running on your local machine and you’re finally ready to share it with the world on a public site. The problem is: words like Git, Flask, Gunicorn and Heroku sound like strange mythical creatures, even after a few …

Don’t Use Recursion In Python Any More

Python Closure: A Pythonic technique you must know. I used to be a programmer who liked recursive functions very much, simply because they are cool and can be used to show off programming skills and intelligence. However, in most circumstances, recursive functions have very high complexity that we should avoid …

Building a Custom Semantic Segmentation Model

Using your own data to create a robust computer vision model. Following on from my previous post here, I wanted to see how feasible it would be to reliably detect and segment a Futoshiki puzzle grid from an image without using a clunky capture grid. It works surprisingly well even when trained on a tiny …

10 Magical facts about Python

Knowing facts is important. Python is a general-purpose programming language. It is very easy to learn; its easy syntax and readability are some of the reasons why developers are switching to Python from other programming languages. We can use Python as an object-oriented language and as a procedure-oriented language as well. It is open source and has tons …

Spark

I have noticed that whenever I talk about Spark, the first thing that comes to listeners’ minds is how similar or different it is from Big Data and Hadoop. So, let’s first understand how Spark is different from Hadoop. Spark is not Hadoop. A common misconception is that Apache Spark is just a component of Hadoop. …

3D Point Cloud processing tutorial by F. Poux

3D Python: The ultimate guide to subsampling 3D point clouds from scratch, with Python. Two efficient methods are shown to import, process, structure as a voxel grid, and visualise LiDAR data. Point cloud sampling results obtained by following the strategies explained in this guide. © F. Poux. In this article, I will give you my two …

The Best Machine Learning Algorithm for Handwritten Digits Recognition

Implementing Machine Learning Classification Algorithms to Recognize Handwritten Digits. (Image from Pixabay) Handwritten Digit Recognition is an interesting machine learning problem in which we have to identify the handwritten digits through various classification algorithms. There are a number of ways and algorithms to recognize handwritten digits, including Deep Learning/CNN, SVM, Gaussian Naive Bayes, KNN, Decision …

Introduction to Plotnine as the Alternative of Data Visualization Package in Python

The grammar of graphics with plotnine. Did you know that plotnine brings the grammar of graphics to Python? Plotnine is an implementation of the R package ggplot2 in Python. It replicates the syntax of ggplot2 and visualizes data using the concept of the grammar of graphics. It creates a visualization based on the …

How to Scrape Dynamic Web pages with Selenium and Beautiful Soup

Beautiful Soup is a great tool for extracting data from web pages, but it works with the source code of the page. Dynamic sites need to be rendered as the web page that would be displayed in the browser; that’s where Selenium comes in. Image by Author. Beautiful Soup is an excellent library for …

Context Managers: a Data Scientist’s View

How Python context managers can clean up your code. This post is part of a series where I will be sharing things I’m learning on the topic of clean Python code. I am a data scientist seeking to level up my Python skills by writing more Pythonic code and finding better ways to structure …

New UC Davis Tool Tracks California’s COVID-19 Cases By Region

Regional tracking of COVID-19 cases aids day-to-day decision making in the UC Davis School of Veterinary Medicine. Animation by Author. Written by Pranav Pandit, Postdoctoral Research Fellow at One Health Institute at UC Davis. The COVID-19 pandemic has highlighted the importance of constant and real-time disease surveillance to better control unprecedented local outbreaks. Early …

4 Lesser-Known Yet Awesome Tips for Pytest

Master Unit Testing in Python with These 4 Tips. Photo by Gayatri Malhotra on Unsplash. As a data scientist, it is important to make sure your functions work as expected. A good practice is to write a small function, then test it with unit tests. Rather than trying to debug a big chunk of …

@Decorators in Python (Advanced)

factorial(10) checks the type of 10, 9, 8, 7, … recursively, which is not necessary. We can solve this elegantly by using inner functions.

    def factorial(n):
        """
        Calculates the factorial of n,
        n => integer and n >= 0.
        """
        def inner_factorial(n):
            if n == 0:
                return 1
            else:
                return n * inner_factorial(n-1)
        if type(n) == int and n >= 0:
            return inner_factorial(n)
        else:
            raise TypeError("n should be an integer and n >= …
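Since the excerpt is cut off mid-statement, here is a runnable sketch of the same inner-function idea: the type check runs once in the outer function, and only the inner function recurses. The error message text is an assumption, because the original message is truncated.

```python
def factorial(n):
    """Calculate the factorial of n, where n is an integer and n >= 0."""

    def inner_factorial(k):
        # No type check here: the argument was validated once by the outer call.
        if k == 0:
            return 1
        return k * inner_factorial(k - 1)

    if type(n) == int and n >= 0:
        return inner_factorial(n)
    # Assumed wording; the original message is cut off in the excerpt.
    raise TypeError("n should be a non-negative integer")


print(factorial(10))
```

The validation cost is paid once per call to factorial rather than once per recursion level.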

4 Top AI/ML Github Repositories in November 2020

The researchers at Facebook have come out with an update to their Pixel-aligned Implicit Function (PIFu) model that aligns pixels of a 2D image with corresponding pixels of a 3D image. Using PIFu, Facebook have made a Deep Learning model (end-to-end) for digitising people, with the ability to infer 3D surface and texture from either …

Data science 101. Beginner steps. Data preprocessing, exploratory analysis

Photo by Franki Chamaki on Unsplash. My story of exploring and analyzing data and trying to build a robust pipeline for a classification task, with almost zero experience in data science. I have a statistical background, though I have never worked with statistics in my life. I got rusty; I mean, short fragments exist …

Python Dictionaries: Everything You Need to Know

This section is the most useful one for practical work. You’ll learn which methods you can call on dictionaries and how they behave. get(): This method is used to grab a value for a particular key. Here’s how to grab the value for City of the weather dictionary: weather.get('City'). As you would expect, “Washington, D.C.” …
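A short sketch of how get() behaves, including the cases where the key is missing. Only the City entry comes from the excerpt; the Temperature entry and the Humidity lookups are hypothetical additions for illustration.

```python
# "City" is taken from the excerpt; "Temperature" is a made-up entry.
weather = {"City": "Washington, D.C.", "Temperature": 12}

city = weather.get("City")                           # key exists: its value is returned
humidity = weather.get("Humidity")                   # key missing: None, no KeyError
humidity_text = weather.get("Humidity", "unknown")   # key missing: the given default

print(city, humidity, humidity_text)
```

Unlike weather["Humidity"], which raises KeyError for a missing key, get() returns None or the default you pass as the second argument.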

5-Minute Machine Learning: Principal Component Analysis

Principal Component Analysis. Image from Unsplash.com by @iamrbn. Principal component analysis (PCA) is one of the most basic and widely used dimension reduction techniques. However, I’d also say it’s one of the least understood from a technical standpoint and may be seen as a “black box”. I’m going to try to clearly, and in 5 minutes, explain …

Introduction to Geopy: Using Your Latitude & Longitude Data in Python

Getting Address, Postal Code, Distance, and More. Photo by delfi de la Rua on Unsplash. Longitude and latitude are great measures for obtaining relatively precise locations of observations, but oftentimes these coordinates alone do not tell enough of the story. How can we easily convert these measures to something more meaningful? We can do many things …

Top 11 Github Repositories to Learn Python

If you’ve ever worked with software, you must be aware of the platform GitHub. For the uninitiated, GitHub is a lot more than just a place to host all your code. It’s a place that lets you collaborate with other developers and manage your code repositories online with a range of specialized tools that are …

Augmenting Training Data For Image Recognition

Expand your machine learning training dataset using the Augmentor library. Image recognition technology has emerged as one of the most popular applications for machine learning in recent years. Image recognition (IR) is often used to detect certain people/places/things inside another, larger image. IR has proven useful for tasks like brand detection, crime prevention, and …

Speed Cubing for Machine Learning

Episode 2: Using GPUs, RAPIDS, CuPy and VisPy libraries. Photo by Michael Dziedzic on Unsplash. Previously on “Speed Cubing for Machine Learning”… In Episode 1 [1], we described how to generate 3D data as fast as possible to feed some Generative Adversarial Networks, using CPUs, multithreading and Cloud resources. We reached a rate of 2 …

Basic Data Visualizations Using Plotly Express

Using Plotly Express to produce basic data visualizations of Amazon’s Alexa reviews dataset from Kaggle. Photo by Edward Howell on Unsplash. Plotly Express is a high-level Python visualization library: it is a wrapper for Plotly.py that exposes a simple syntax for complex charts. It was first introduced in version 4.0.0 and is a high-level …

Using Machine Learning to Find Exoplanets with NASA’s Dataset

MACHINE LEARNING: Learn how to build an algorithm to find planets outside the solar system. Image by Lucas Pezeta. Source: Pexels. A few weeks ago, I wrote an article about using data science in meaningful ways that could help our world become a better place. Now, let’s talk a little bit about other worlds. …

4 Rarely-Used Yet Very Useful Pandas Tricks

The groupby function is commonly used in exploratory data analysis. Combined with the agg function, we are able to apply different aggregation functions to different columns. The NamedAgg method allows us to rename the aggregated columns inside the agg function. Let’s first create a dataframe.

    import numpy as np
    import pandas as pd
    cats = pd.Series(list('abc')*3).sample(n=9).reset_index(drop=True)
    df = …

Deriving Patterns of Fraud from the Enron Dataset

The Enron email and financial datasets are big, messy treasure troves of information, which become much more useful once you know your way around them a bit. Enron’s complete data may be downloaded from this link here, and the refined pickle files may be downloaded from the following Github repository along with the complete code …

Disjoint Set and Tarjan’s Off-line Lowest Common Ancestor Algorithm

Let’s apply the algorithm to an example binary tree. [Figure: Binary Tree (Image by Author)] Iterating through the binary tree in post-order, we first start at node 7. [Figure (Image by Author)] Nothing much happens here: we make a set for node 7, with parent = 7, ancestor = 7, and size = 1, and we mark …

How to Set Up Automated Tasks in Linux Using Cron

GETTING STARTED: A Step-by-Step Guide to Setting Up a Cron Job. Photo by Possessed Photography on Unsplash. Have you ever found yourself doing repetitive tasks on a regular basis? For example, deleting temporary files every week to conserve your disk space, scraping data from a site every week to gather new information or sending recurring …

Fourier Convolutions in PyTorch

Math and code for efficiently computing large convolutions with FFTs. Photo by Faye Cornish on Unsplash. Note: Complete methods for 1D, 2D, and 3D Fourier convolutions are provided in this Github repo. I also provide PyTorch modules for easily adding Fourier convolutions to a trainable model. Convolutions: Convolutions are ubiquitous in data analysis. For decades, …

Fine-Tuning Pre-trained Model VGG-16

After importing the necessary libraries, our train/test set, and preprocessing the data (described here), we dive into modeling:

1. First, import VGG16 and pass the necessary arguments:

    from keras.applications import VGG16
    vgg_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

2. Next, we freeze some layers. I decided to unfreeze the last block so that its weights get …

How to Code Linear Regression from Scratch

Think back to your first algebra class: do you remember the equation for a line? If you said “y = mx + b”, you’re absolutely right. I think it’s also helpful to start in two dimensions, because without using any matrices or vectors, we can already see that given inputs x, and outputs y, we …
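As a two-dimensional illustration of "y = mx + b" without matrices or vectors, the sketch below fits m and b by ordinary least squares using the textbook closed-form formulas. This is an assumption about where the excerpt is heading, not the article's own code, and the name fit_line is hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = m*x + b by ordinary least squares (closed form, pure Python)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - m * mean_x
    return m, b


# Points that lie exactly on y = 2x + 1.
m, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(m, b)
```

The function assumes at least two distinct x values; with a single repeated x the variance is zero and the slope is undefined.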

All the Pandas shift() you should know for data analysis

Pandas shift() shifts the index by the desired number of periods. The simplest call takes the argument periods (it defaults to 1), which represents the number of shifts along the desired axis. By default, it shifts values vertically along axis 0. NaN is filled in for missing values introduced as …

Autonomous Driving Dataset Visualization with Python and VizViewer

Disclosure: The author is involved in VizViewer’s development. As part of a recently published paper and Kaggle competition, Lyft has made public a dataset for building autonomous driving path prediction algorithms. The dataset includes a semantic map, ego vehicle data, and dynamic observational data for moving objects in the vehicle’s vicinity. The challenge presented by …