A Basic Guide to Initial and Exploratory Data Analysis

With a few examples in Python. A data analyst is defined differently in different work setups. In a real scenario, a data analyst might contribute to all kinds of work, including MIS, reporting, data engineering, and database management. It’s not necessarily bad. But here, what we’ll be talking about is the actual job … Read more A Basic Guide to Initial and Exploratory Data Analysis

Fuzzy String Matching in Python

Photo by Romain Vignes on Unsplash. Finding strings that approximately match a pattern in your data using Python. As a data scientist, you are forced to retrieve information from various sources by leveraging publicly available APIs, asking for data, or simply scraping your own data from a web page. All this information is … Read more Fuzzy String Matching in Python
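To make approximate matching concrete, here is a minimal sketch that uses only the standard-library difflib module. This is a swapped-in technique: the excerpt does not say which library the article uses, and the list of choices and the misspelled query are made up for illustration.

```python
from difflib import SequenceMatcher, get_close_matches


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical list of candidate strings to search.
choices = ["New York", "Newark", "Los Angeles", "San Francisco"]

# Candidates with a ratio of at least 0.6, best match first.
matches = get_close_matches("New Yrok", choices, n=2, cutoff=0.6)
print(matches)
print(round(similarity("New Yrok", "New York"), 2))
```

The cutoff decides how forgiving the match is. Raising it trims weak candidates, lowering it lets more typos through.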

Getting Started with NLP for Indic Languages

Word Embedding for Indic Languages. We will now go into the topic by getting word embeddings for Indic languages. Numerically representing words plays a role in any NLP task. We are going to use the Natural Language Toolkit for Indic Languages (iNLTK) library. iNLTK is an open-source Deep Learning library built on top of PyTorch aiming … Read more Getting Started with NLP for Indic Languages

Find Common Words in an Article with Python Module Newspaper and NLTK

A step-by-step guide to extracting information and finding insights from newspapers using newspaper3k and NLTK You want to extract essential information from an interesting article, but find the article is too long to read with your limited amount of time. Before you dive into the whole article and end up disappointed with the irrelevant content, … Read more Find Common Words in an Article with Python Module Newspaper and NLTK

What is Stationarity in Time Series and why should you care

Remember the import you did from the statsmodels library? We’re gonna use it now to test for stationarity. The stattools module contains the adfuller method, to which you can pass your time-series data: Well, the situation is not great. As expected, the time series isn’t stationary, which the p-value confirms (0.99). Let’s explore a method that will differentiate the series … Read more What is Stationarity in Time Series and why should you care
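One common way to remove a trend from a series is differencing, that is, subtracting each value from the one that follows it. The excerpt is cut off before it names its method, so treat the following as a minimal pure-Python sketch under that assumption. The function name and the sample series are hypothetical.

```python
def difference(series, lag=1):
    """Return the lagged differences series[t] - series[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical series with a steady upward trend.
trend = [10, 12, 14, 16, 18, 20]

# Every difference is the same, so the linear trend is gone.
print(difference(trend))
```

The differenced series is shorter than the original by `lag` values, which matters when you later align it with dates.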

Choose stocks to invest in with Python

Input parameters: number of stocks (I); return of each stock for each year (a[t][i]); total investment limit each year (b[t]). Decision variables: whether or not to select a stock each year (x[t][i]). Constraint: total investment each year cannot exceed b[t]. Objective: maximize total return. Save the given information in arrays and matrices: c = [90, … Read more Choose stocks to invest in with Python
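The formulation above can be sketched by brute force for a tiny instance. Every number below is hypothetical, and the per-stock cost list is an assumption (the excerpt is cut off before its data is complete). A real problem of any size would call for an optimization solver rather than enumeration.

```python
from itertools import product

# All numbers below are hypothetical illustration data.
cost = [40, 30, 20]          # assumed cost of buying each stock
a = [[30, 25, 15],           # a[t][i]: return of stock i in year t
     [20, 35, 10]]
b = [60, 70]                 # b[t]: investment limit in year t


def best_selection(returns, cost, limit):
    """Try every 0/1 choice vector; keep the feasible one with the highest return."""
    best_x, best_value = None, float("-inf")
    for x in product([0, 1], repeat=len(cost)):
        spent = sum(c * xi for c, xi in zip(cost, x))
        value = sum(r * xi for r, xi in zip(returns, x))
        if spent <= limit and value > best_value:
            best_x, best_value = x, value
    return best_x, best_value


# Each year has its own limit and no link to other years, so years are solved separately.
plan = [best_selection(a[t], cost, b[t]) for t in range(len(b))]
total_return = sum(value for _, value in plan)
print(plan, total_return)
```

Enumeration checks 2^I selections per year, so it is only useful for checking a solver's answer on small inputs.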

Translate a Pandas data frame using googletrans library

Googletrans is a free Python library that uses the Google Translate API. In this article, we explain how to employ the library to translate strings as well as data frames in Python. Googletrans is a third-party library that we can install by using pip. After installing the library, we import the module googletrans. … Read more Translate a Pandas data frame using googletrans library

Most Effective Way To Implement Radial Basis Function Neural Network for Classification Problem

How to use K-Means Clustering along with Linear regression to classify images Radial Basis Function Neural Network or RBFNN is one of the unusual but extremely fast, effective and intuitive Machine Learning algorithms. The 3-layered network can be used to solve both classification and regression problems. In this article, the implementation of MNIST Handwritten Digits … Read more Most Effective Way To Implement Radial Basis Function Neural Network for Classification Problem

Web Scraping Wikipedia with BeautifulSoup

My roommate and I had a discussion about her observation of the high depression rate in Sweden. We drew the connection between the depression rate and the lack of sunshine. I decided to support my hypothesis by gathering my own data and analyzing it. I use Beautiful Soup, an easy-to-use Python tool for web scraping. … Read more Web Scraping Wikipedia with BeautifulSoup

Achieving Stationarity With Time Series Data

An illustration of the principles of stationarity. Source: BeingDatum. Most time series models work under the assumption that the underlying data is stationary, that is, the mean, variance, and covariance are not time-dependent. More likely than not, your time series will not be stationary, which means that you will have to identify the trends present … Read more Achieving Stationarity With Time Series Data

Breaking Down Goodreads Dataset using Python

We can now delete the last column.
del data['extra']
We now have to convert the data types of the columns back to their original types. pd.to_numeric automatically chooses a float or int data type.
data["average_rating"] = pd.to_numeric(data["average_rating"])
data["ratings_count"] = pd.to_numeric(data["ratings_count"])
data["# num_pages"] = pd.to_numeric(data["# num_pages"])
data["text_reviews_count"] = pd.to_numeric(data["text_reviews_count"])
Filtering data: I want to only keep those books … Read more Breaking Down Goodreads Dataset using Python

Four Useful Functions For Exploring Data in Python

While exploring data, I often find myself repeatedly defining similar Python logic in order to carry out simple analytical tasks. For example, I often calculate the mean and standard deviation of a numerical column for specific categories within data. I also often analyze the frequency of categorical values within the data. In … Read more Four Useful Functions For Exploring Data in Python
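The two tasks mentioned above, per-category mean and standard deviation and category frequencies, can be wrapped in small reusable helpers. This is a minimal standard-library sketch. The function names and the sample rows are hypothetical, and the article itself may well use pandas instead.

```python
from collections import Counter, defaultdict
from statistics import mean, stdev


def summarize_by_category(rows, category_key, value_key):
    """Return {category: (mean, sample standard deviation)} for a numerical column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[category_key]].append(row[value_key])
    # stdev needs at least two values, so single-value groups report 0.0
    return {
        cat: (mean(vals), stdev(vals) if len(vals) > 1 else 0.0)
        for cat, vals in groups.items()
    }


def category_counts(rows, category_key):
    """Return how often each categorical value occurs."""
    return Counter(row[category_key] for row in rows)


# Hypothetical sample data.
rows = [
    {"genre": "rock", "rating": 4.0},
    {"genre": "rock", "rating": 2.0},
    {"genre": "jazz", "rating": 5.0},
]
stats = summarize_by_category(rows, "genre", "rating")
counts = category_counts(rows, "genre")
print(stats)
print(counts.most_common())
```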

How does Python work?

Image credits: https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQAQ5hOZAjAWsKwFbOXNONYWW-Mg4dxL7cWc-1gufkYvviTnvH8SA&s As a Machine Learning Engineer, I have been using Python for more than a year. Recently, I have also started learning C++, for fun. It made me realize how easy and intuitive Python is. I got more curious about how Python differs from other languages and how it works. In this … Read more How does Python work?

How To Manipulate Date And Time In Python Like A Boss

Today’s date and time in different formats. Let’s warm up with the most basic. Below is the code that will print the current year, month, day, hour, minute, second, and microsecond.
In:
from datetime import datetime
d = datetime.now()  # today's datetime
d
Out:
datetime.datetime(2019, 12, 22, 13, 14, 18, 193898)
This is very useful information but often we … Read more How To Manipulate Date And Time In Python Like A Boss
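To show the same moment in different formats, here is a short sketch using strftime and isoformat from the standard library. It uses a fixed timestamp (the one printed in the excerpt) instead of datetime.now() so that the results are reproducible. The particular format strings are just examples.

```python
from datetime import datetime

# Fixed timestamp taken from the excerpt's output, so results are reproducible.
d = datetime(2019, 12, 22, 13, 14, 18, 193898)

# Individual components are available as attributes.
print(d.year, d.month, d.day, d.hour, d.minute, d.second, d.microsecond)

# strftime turns the datetime into text using format codes.
date_only = d.strftime("%Y-%m-%d")           # 2019-12-22
day_first = d.strftime("%d/%m/%Y %H:%M:%S")  # 22/12/2019 13:14:18
iso = d.isoformat()                          # 2019-12-22T13:14:18.193898
print(date_only, day_first, iso)
```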

Run Native Julia Code WITH PYTHON!

Speed up Python by switching languages I frequently discuss the benefits of using Julia, as there are many of them. Julia is a scalable, high-performance, and high-level language that is easy to learn and can get nearly any job done. This is especially true for Data Scientists, as Julia’s center of mass is statistical computing … Read more Run Native Julia Code WITH PYTHON!

Linear Regression with Only Python and Numpy

Writing a machine learning model just with Numpy and Python In this post, we’ll see how to implement linear regression in Python without using any machine learning libraries. In another post, we saw how the linear regression algorithm works in theory. With the rise in popularity of machine learning libraries, anyone can implement ML algorithms … Read more Linear Regression with Only Python and Numpy

LASSO Regression Tutorial

LASSO regression is an example of regularized regression. Regularization is one approach to tackle the problem of overfitting by adding additional information, and thereby shrinking the parameter values of the model to induce a penalty against complexity. The 3 most popular approaches to regularized linear regression are the so-called Ridge Regression, Least Absolute Shrinkage and … Read more LASSO Regression Tutorial

How to Define Custom Exception Classes in Python

Creating a Custom Dictionary that can only store integers and floats as its values Let’s progress now, and demonstrate how custom error classes can easily and usefully be incorporated into our own programs. To begin, I will create a slightly contrived example. In this fictitious example, I will create a custom dictionary, that can only … Read more How to Define Custom Exception Classes in Python
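Here is one way such a dictionary could look. It is a minimal sketch, not the article's own code: the class names NonNumericValueError and NumericDict are hypothetical, and rejecting bool values is a design choice made here.

```python
class NonNumericValueError(TypeError):
    """Raised when a value that is not an int or float is stored."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        super().__init__(
            f"value for key {key!r} must be int or float, got {type(value).__name__}"
        )


class NumericDict(dict):
    """Dictionary that accepts only int and float values on item assignment."""

    def __setitem__(self, key, value):
        # bool is a subclass of int, so it is rejected explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise NonNumericValueError(key, value)
        super().__setitem__(key, value)


prices = NumericDict()
prices["apple"] = 1.5
try:
    prices["pear"] = "cheap"
except NonNumericValueError as err:
    print(err)
```

Note that this sketch only guards item assignment. The dict constructor and update() bypass __setitem__, so a stricter version would override those as well.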

Making A Model Is Like Baking A Cake

Photo by Henry Be on Unsplash. The Types of Cakes Available. As we progress further in our modern era, the advances of Data Science and Technology continue to make marvelous strides in various fields of study and practice. As a result of the vast applicability of Data Science and Technology, various types of models … Read more Making A Model Is Like Baking A Cake

Time Series Forecasting with a SARIMA Model

Predicting daily electricity loads for a building on the UC Berkeley campus Photo credit: Elena Zhukova Hey there! In this article, I’ll run through an example of electricity load forecasting using a SARIMA model. Three years of daily electricity load data was gathered for a building on the UC Berkeley campus to create a model … Read more Time Series Forecasting with a SARIMA Model

AWS and Python: The Boto3 Package

Domo Arigato, AWS Boto It’s 2020 and the world of cloud storage and computing will most likely be the direction of most businesses in the coming decades. The prospect of having scalable storage and computing power without having to purchase physical equipment is very appealing. The three big dogs of the cloud are Amazon Web … Read more AWS and Python: The Boto3 Package

Building a Dynamic data pipeline with Databricks and Azure Data Factory

There is a choice between a high-concurrency cluster in Databricks and, for ephemeral jobs, job cluster allocation. After creating the connection, the next step is to add the component to the workflow. Below we look at utilizing a high-concurrency cluster. Fig. 1: Databricks ADF pipeline component settings. Adjusting base parameter settings here as in fig. 1 … Read more Building a Dynamic data pipeline with Databricks and Azure Data Factory

Build a simple real-life chat app with Python

Development of the chat apps. Before we get to the tutorial, let us understand the history and the reason for the development of the chat apps. Chat applications have been here for quite a long time, be it the ancient Yahoo chat rooms or the advanced and modern chat applications such as WhatsApp, Facebook … Read more Build a simple real-life chat app with Python

Let’s Make a KNN Classifier from Scratch

The time has come to evaluate our algorithm. For simplicity, I’ve decided to use the famous Iris dataset, which can be loaded from Scikit-Learn. Here’s the code to get you going: And here’s how everything should look: If you’re wondering what Bamboolib is, check out this article: Introducing Bamboolib — a GUI for Pandas … Read more Let’s Make a KNN Classifier from Scratch
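The article's own code is not part of this excerpt, so the following is only a rough sketch of what a from-scratch KNN classifier and its evaluation can look like. It uses hypothetical toy points rather than the Iris dataset, and the function names are made up. math.dist requires Python 3.8 or newer.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, point, k=3):
    """Predict the label of `point` by majority vote among its k nearest neighbours."""
    neighbours = sorted(
        zip(train_X, train_y), key=lambda pair: dist(pair[0], point)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Return the fraction of test points whose predicted label is correct."""
    hits = sum(
        knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y)
    )
    return hits / len(test_y)


# Hypothetical toy data with two well-separated groups (not the Iris dataset).
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]
test_X = [(1.1, 1.0), (5.1, 5.0)]
test_y = ["a", "b"]

print(accuracy(train_X, train_y, test_X, test_y))
```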

How to train your Neural Net — Tensors and Autograd

In this blog post, we will implement some of the most commonly used tensor operations and talk a little about the Autograd functionality in PyTorch.
import numpy as np
import torch
from torch.autograd import grad
Create an uninitialized Long tensor.
x = torch.LongTensor(3, 4)
print(x)
################## OUTPUT ##################
tensor([[140124855070912, 140124855070984, 140124855071056, 140124855071128],
[140124855071200, 140124855071272, 140124855068720, 140125080614480],
[140125080521392, 140124855066736, 140124855066800, 140124855066864]])
Float … Read more How to train your Neural Net — Tensors and Autograd

Time Series Forecasting Using a Seasonal ARIMA Model

A Tutorial in Python One of the most widely studied models in time series forecasting is the ARIMA (autoregressive integrated moving average) model. Many variations of the ARIMA model exist, which employ similar concepts but with tweaks. One particular example is the seasonal ARIMA (SARIMA) model. The SARIMA model accounts for seasonality when generating time … Read more Time Series Forecasting Using a Seasonal ARIMA Model

A simulation framework to analyze airplane boarding methods

Developing a Python program to calculate boarding times for various configurations and to visualize the boarding procedure Photo by Suhyeon Choi on Unsplash There are various reasons why people are afraid of the letter ‘C’. Lower academic grades are denoted by C. Complicated, lower-level programming languages are named C. However, the C that scared me … Read more A simulation framework to analyze airplane boarding methods

How to Create Your Own Image Dataset for Deep Learning

Bridging the Gap Between Introductory Learning and Real-World Application Photo by Beata Ratuszniak on Unsplash There are a plethora of MOOCs out there that claim to make you a deep learning/computer vision expert by walking you through the classic MNIST problem. That’s essentially saying that I’d be an expert programmer for knowing how to type: … Read more How to Create Your Own Image Dataset for Deep Learning

30 Python Best Practices, Tips, And Tricks

Improve your Python knowledge and skills Photo by author With the holidays behind us, most of us have returned to our day jobs. For all those hard workers, here are 30 Python best practices, tips, and tricks. I’m sure they’ll help you procrastinate your actual work, and still learn something useful in the process. In … Read more 30 Python Best Practices, Tips, And Tricks

Python One-liner Distributed Acceleration with Wordbatch Apply

Using the Wordbatch Apply-decorator to Process Any Function by Distributed Map-Reduce Parallel and Distributed Python Computing hardware is undergoing a period of rapid evolution due to advances in semiconductor manufacturing processes, with 64-core consumer CPUs and 80-core server CPUs arriving at the market this year. This is bringing about a quantitative leap in CPU performance, … Read more Python One-liner Distributed Acceleration with Wordbatch Apply

Conditional probability with a python example

Use Python to calculate the conditional probability of a student getting an A in math given they missed 10 or more classes. This article has 2 parts: 1. Theory behind conditional probability. 2. Example with Python. For once, Wikipedia has an approachable definition: In probability theory, conditional probability is a measure of the probability of an event … Read more Conditional probability with a python example
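The calculation in the title can be sketched directly from the definition P(A | B) = P(A and B) / P(B). The student records below are invented for illustration, and the function name is hypothetical. The article's own dataset is not shown in this excerpt.

```python
from fractions import Fraction

# Hypothetical records: (math grade, number of missed classes)
students = [
    ("A", 2), ("A", 12), ("B", 11), ("C", 15),
    ("A", 0), ("B", 3), ("C", 10), ("A", 10),
]


def conditional_probability(records, event, condition):
    """P(event | condition) = P(event and condition) / P(condition)."""
    given = [r for r in records if condition(r)]
    if not given:
        raise ValueError("the condition never occurs in the data")
    both = [r for r in given if event(r)]
    return Fraction(len(both), len(given))


# Probability of an A given 10 or more missed classes.
p = conditional_probability(
    students,
    event=lambda r: r[0] == "A",
    condition=lambda r: r[1] >= 10,
)
print(p, float(p))
```

Counting within the rows that satisfy the condition is equivalent to dividing the two probabilities, because the total number of records cancels out.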

The Best New Geospatial Data Science Libraries In 2019

As a Geospatial data scientist, 2019 brought some new tools that made my life easier. In this post, I am sharing the best of these new additions in the Python ecosystem and some resources to get you started. You will find tools that accelerate your Geospatial data science pipelines using GPU, advanced Geospatial Visualization tools … Read more The Best New Geospatial Data Science Libraries In 2019

How I built an Audio-Based Music Genre Predictor using Python and the Spotify Web API

A decision tree, a popular tool in the field of machine learning, uses its tree-like structure to make decisions. Photo by Stephen Milborrow on Wikipedia — “A tree showing survival of passengers on the Titanic (“sibsp” is the number of spouses or siblings aboard). The figures under the leaves show the probability of survival and … Read more How I built an Audio-Based Music Genre Predictor using Python and the Spotify Web API

Discovering Spotify Wrapped with Python — An Extended Data Exploration

The 2010s have been wrapped and unless you’ve been living under a rock, in recent days you most likely came across that personalized visual representation of one’s listening history throughout the year; formally known as Spotify Wrapped. Although some data folks might think that it’s just essentially an output of COUNT, SUM, WHERE and GROUP … Read more Discovering Spotify Wrapped with Python — An Extended Data Exploration

Fine tune Albert with Pre-training on Custom Corpus

Photo by Esperanza Zhang on Unsplash. Pre-train Albert with custom corpus for domain-specific texts and further fine-tune the pre-trained model for application tasks. This post illustrates the simple steps to pre-train the state-of-the-art Albert[1] NLP model on a custom corpus and further fine-tune the pre-trained Albert model on specific downstream tasks. The custom … Read more Fine tune Albert with Pre-training on Custom Corpus

Be brave and go scrape your own data.

A data scientist who uses just prebuilt datasets is like a chef who just microwaves frozen food. Photo by Daan Stevens on Unsplash In the process of developing your data science skills, quite often you dream too big and find yourself disconnected from learning what you will really need in order to do your everyday … Read more Be brave and go scrape your own data.

Common Machine Learning Programming Errors in Python

Common Python Errors in Machine Learning. In this post I will go over some of the most common errors I come across in Python during the model building and development process. For demonstration purposes we will use height/weight data which can be found here on Kaggle. The data contains gender, height in inches and … Read more Common Machine Learning Programming Errors in Python

Send your new year greetings by email and text message using Python

There are actually multiple ways to use Python to send your new year greetings by text message. Method 1. A tweak of the email sending option. As many of you may be aware, many cellphone carriers allow you to send text messages using email. Here is just a short list for major carriers in … Read more Send your new year greetings by email and text message using Python
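Method 1 boils down to emailing an address of the form number@gateway. Below is a minimal sketch of building such a message with the standard library. The sender, phone number, gateway domain, and SMTP host are placeholders, not real carrier values, and the sending step is left commented out because it needs a real server and credentials.

```python
from email.message import EmailMessage


def build_sms_email(sender, phone_number, gateway_domain, text):
    """Build an email addressed to <number>@<gateway> for delivery as a text message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = f"{phone_number}@{gateway_domain}"
    msg["Subject"] = "Happy New Year"
    msg.set_content(text)
    return msg


# Placeholder values: the sender, number, and gateway domain are hypothetical.
msg = build_sms_email(
    "me@example.com", "5551234567", "sms.example.com", "Happy New Year!"
)
print(msg["To"])

# Sending requires a real SMTP server and login, so it is only outlined here:
# import smtplib
# with smtplib.SMTP_SSL("smtp.example.com", 465) as server:
#     server.login("me@example.com", "app-password")
#     server.send_message(msg)
```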

SVD in Machine Learning: Underdetermined Least Squares

Least squares can be described as follows: given the feature matrix X of shape n × p and the target vector y of shape n × 1, we want to find a coefficient vector w’ of shape p × 1 that satisfies w’ = argmin{∥y − Xw∥²}. Intuitively, least squares attempts to approximate the solution … Read more SVD in Machine Learning: Underdetermined Least Squares

6 New Features in Python 3.8 for Python Newbies

Parameters in a Python function can accept two types of arguments: positional arguments, which are passed by position, and keyword arguments, which are supplied by keyword. In the following example, the values of both parameters a and b can be supplied by positional or keyword arguments. Flexible.
def my_func(a, b=1):
    return a + b
my_func(5, 2)  # both positional arguments
my_func(a=5, b=2)  # both … Read more 6 New Features in Python 3.8 for Python Newbies
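The excerpt stops before it reaches the Python 3.8 feature itself. One 3.8 addition that builds on this distinction is the "/" marker for positional-only parameters, so the sketch below assumes that is where the article is heading. The function name positional_only is made up for illustration.

```python
def my_func(a, b=1):
    return a + b


# A regular parameter accepts either calling style.
print(my_func(5, 2))      # both positional
print(my_func(a=5, b=2))  # both keyword
print(my_func(5, b=2))    # mixed


# Python 3.8+: parameters before "/" can only be passed by position.
def positional_only(a, /, b=1):
    return a + b


print(positional_only(5, b=2))
try:
    positional_only(a=5, b=2)
except TypeError as err:
    print("TypeError:", err)
```

Positional-only parameters let a library rename an argument later without breaking callers, since nobody could have passed it by keyword.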