Automatic Portfolio Optimization

Extract optimal asset weights for your portfolio using Python. Introduction: Portfolio optimization is a widely studied topic, especially in academia. The main idea is to maximize a portfolio’s value by finding the most productive combination of assets to yield the highest return. In this article, I will show you how to create your own Python … Read more Automatic Portfolio Optimization

The Many Ways To Call Axes In Matplotlib

In matplotlib terminology, a basic plot starts from one figure and at least one axes (if you are confused about these terms, you may find this post useful). To borrow an analogy from painting, the figure is the canvas and the axes is the artistic composition. A canvas (figure) can have only one type or many different … Read more The Many Ways To Call Axes In Matplotlib

Data Cleaning with Pandas — Avoid this Mistake!

https://unsplash.com/photos/FOsina4f7qM Pandas is an extremely useful data manipulation package in Python. For the most part, functions are intuitive, speedy, and easy to use. But once, I spent hours debugging a pipeline to discover that mixing types in a Pandas column will cause all sorts of problems later in a pipeline. Read more to discover what … Read more Data Cleaning with Pandas — Avoid this Mistake!
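To make the mixed-type problem concrete, here is a minimal sketch. The `price` column and its values are invented for illustration and do not come from the article; coercing with `pd.to_numeric` is one possible remedy, not necessarily the one the author recommends.

```python
import pandas as pd

# Hypothetical column mixing ints, floats, and strings (illustrative data only).
df = pd.DataFrame({"price": [10, "20", 30.5, "n/a"]})

# Mixed types leave the column with the generic "object" dtype,
# so numeric operations further down a pipeline can fail or misbehave.
print(df["price"].dtype)

# One way to normalize: coerce everything to numbers, turning
# unparseable entries into NaN so they can be handled explicitly.
clean = pd.to_numeric(df["price"], errors="coerce")
print(clean.dtype)
print(clean.sum())
```

Checking `dtype` right after loading data is a cheap way to catch this kind of column before it reaches later pipeline stages.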

A Layman’s Guide to Fuzzy Document Deduplication

Why Gensim? Gensim is a Python library popularly used for topic modeling. However, it also has very valuable utilities for deduplication. While there are several efficient ways to calculate cosine similarity in Python, including use of the popular SKLearn library, Gensim’s major advantage comes when your dataset grows very large. When your corpus grows beyond … Read more A Layman’s Guide to Fuzzy Document Deduplication

Integrate JupyterLab with Google Drive

And as a data scientist, your work, insights, and conclusions are of vital importance, whether they are work-related or just something you’ve been doing on the side. Sure, you can always bring a flash drive with you, but that’s also an inconvenient option and, needless to say, flash drives are easy to lose. … Read more Integrate JupyterLab with Google Drive

A (Really) Gentle Introduction to NLP in Python

NLP Made Simple Hold my hand and let’s get started together. I know, it’s not easy. NLP is a thing that everybody talks about, and it seems like everyone is doing it except you, lost and sad in the middle of the crowd. No worries, Natural Language Processing (NLP) is a hard … Read more A (Really) Gentle Introduction to NLP in Python

Maximizing Efficiency in Python — Six Best Practices for Implementing Python3.7 in Production.

3. Always Account For Memory and Efficiency Simple Python programs rarely run into memory issues; however, this topic becomes crucial as scripts grow larger and more complex. Unlike languages with manual memory management, the Python interpreter manages memory in the background, leaving users with little direct control. For more information regarding memory … Read more Maximizing Efficiency in Python — Six Best Practices for Implementing Python3.7 in Production.
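Even though the interpreter manages memory for you, the choice of data structure still affects the footprint. The sketch below compares a fully built list with a lazy generator; it is an illustration of the general point, not necessarily the practice the article goes on to recommend, and the variable names are made up.

```python
import sys

n = 1_000_000

# A list materializes every element up front.
squares_list = [i * i for i in range(n)]

# A generator produces elements lazily, one at a time.
squares_gen = (i * i for i in range(n))

# getsizeof reports the container's own size only (not the ints it references),
# which is already enough to show the difference.
list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)
print(list_bytes, gen_bytes)

# Both yield the same total when consumed.
total = sum(squares_gen)
```

The trade-off is that a generator can be consumed only once, so it suits single-pass processing.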

Ultimate Setup for Your Next Python Project

Starting any project from scratch can be a daunting task… But not if you have this ultimate Python project blueprint! Original image by @sxoxm on Unsplash Whether you are working on some machine learning/AI project, building web apps in Flask or just writing some quick Python script, it’s always useful to have some template for … Read more Ultimate Setup for Your Next Python Project

Should I buy a lottery ticket?

Lottery Ticket Analysis I analyzed past lottery data to decide whether to buy a lottery ticket, using statistics and probability. Photo by dylan nolte on Unsplash I often find myself deciding whether or not to buy a lottery ticket, especially for the Powerball New Year’s Eve draw. The reason why I feel hesitant in those moments … Read more Should I buy a lottery ticket?

The truth about the martingale betting system

I swear by the name of Science that the evidence I shall give shall be the truth, the whole truth, and nothing but the truth. About the simulation:
    from random import *
    def roll():
        # Draw one outcome and record it in the shared results list.
        result = randint(1, 36)
        results.append(result)
    results = []
    for i in range(1000000):
        roll()
The script simulates 1000000 roulette outcomes within a second. At each simulation, a random whole … Read more The truth about the martingale betting system
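The excerpt only shows how outcomes are drawn, so here is a hedged sketch of how a martingale strategy could be layered on top. Everything in it is an assumption for illustration: the function name `play_martingale`, the bankroll and bet sizes, and a European wheel with pockets 0 to 36 (the excerpt's script draws from 1 to 36 instead).

```python
import random

def play_martingale(bankroll, base_bet, rounds, rng):
    """Bet on an even-money outcome, doubling the stake after each loss."""
    bet = base_bet
    for _ in range(rounds):
        if bet > bankroll:          # cannot cover the next bet: stop playing
            break
        # Assumption: 37 pockets (0-36), of which 18 (here 1-18) win.
        win = rng.randint(0, 36) in range(1, 19)
        if win:
            bankroll += bet
            bet = base_bet          # reset the stake after a win
        else:
            bankroll -= bet
            bet *= 2                # double the stake after a loss
    return bankroll

rng = random.Random(42)             # seeded so runs are repeatable
results = [play_martingale(1000, 10, 200, rng) for _ in range(2000)]
print(sum(results) / len(results))  # average final bankroll
print(min(results), max(results))
```

Comparing the minimum, maximum, and average final bankroll shows the pattern the strategy is known for: many small gains offset by occasional large losses.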

Download Email Attachment from Microsoft Exchange Web Services Automatically

Automating The Dull Routine With Python Learn to Handle Email Attachment Using Python Library Exchangelib Photo by Webaroo.com.au on Unsplash Do you need to download email attachments regularly? Do you want to automate this boring process? I know that feel bro. When I first started my job, I was assigned a daily task: download … Read more Download Email Attachment from Microsoft Exchange Web Services Automatically

Hands-on Web Scraping: Building your Twitter dataset with python and scrapy

This assumes that you have some basic knowledge of Python and Scrapy. If you are interested only in generating your dataset, skip this section and go to the sample crawl section on the GitHub repo. Gathering tweet URLs by searching through hashtags. To search for tweets, we will use the legacy Twitter website. Let’s … Read more Hands-on Web Scraping: Building your Twitter dataset with python and scrapy

A Tale of Two Cities — A mystery solved with Pandas

Could Perth really be wetter than Melbourne? Photo by Ricardo Resende on Unsplash Having recently moved from Melbourne to Perth I found it natural to make comparisons between the two cities. Which one has better coffee? OK, that one is easy — Melbourne hands down! Which one has more rain — well to answer that … Read more A Tale of Two Cities — A mystery solved with Pandas

Why Python is better than R for Data Science careers

Most companies require their data scientists to do more than predictive modeling (i.e., machine learning). At the least, you’ll probably be required to maintain the data pipelines that feed your models, and those data pipelines will likely be built in Python. The industry standard for pipelines today is the Python-based Airflow, and at Facebook we … Read more Why Python is better than R for Data Science careers

Pandas tips that will save you hours of head scratching

Making your Data Analysis experiments reproducible saves time for you and others in the long term. Revisiting a problem you’ve worked on in the past and finding out that the code doesn’t work is frustrating. These tips will … Read more Pandas tips that will save you hours of head scratching

Plotly Python: Scatter Plots

    import plotly.graph_objects as go
    # steamdf is the DataFrame prepared earlier in the article.
    fig = go.Figure(data=go.Scatter(
        x=steamdf['price'],
        y=steamdf['average_playtime'],
        mode='markers',
        marker_size=steamdf['ratio'],
        hovertext=steamdf['name'],
        hoverlabel=dict(namelength=0),
        hovertemplate='%{hovertext}<br>Price: %{x:$}<br>Avg. Playtime: %{y:,} min',
        marker=dict(color='rgb(255, 178, 102)', size=10,
                    line=dict(color='DarkSlateGrey', width=1))))
    # Titles, background colours, and font.
    fig.update_layout(
        title='Price vs. Average Playtime',
        xaxis_title='Price (GBP)',
        yaxis_title='Average Playtime (Minutes)',
        plot_bgcolor='white',
        paper_bgcolor='whitesmoke',
        font=dict(family='Verdana', size=16, color='black'))
    # Axis lines, grid, and visible ranges.
    fig.update_xaxes(showline=True, linewidth=2, linecolor='black', mirror=True,
                     showgrid=False, zerolinecolor='black', zerolinewidth=1,
                     range=[-1, 65])
    fig.update_yaxes(showline=True, linewidth=2, linecolor='black', mirror=True,
                     showgrid=True, gridwidth=1, gridcolor='grey',
                     zerolinecolor='black', zerolinewidth=1,
                     range=[-2000, 40000])
    fig.show()
I hope this covers enough to get you feeling confident with creating and customizing scatter plots with Plotly!

Bulk Mapping Attributes to Dataframes using Python Pandas

When it comes to iterating through large volumes of rows in Pandas dataframes, many of us have impatiently waited for our program to finish looping, sometimes row by row for painstakingly long periods of time. This was one of my main struggles when loading high-volume transactional data as a Pandas dataframe, and then enriching the … Read more Bulk Mapping Attributes to Dataframes using Python Pandas

A Basic Guide to Initial and Exploratory Data Analysis

With a few examples in Python A data analyst is defined differently in different work setups. A data analyst might be contributing to all kinds of work — including MIS, reporting, data engineering, database management, etc. in a real scenario. It’s not necessarily bad. But here, what we’ll be talking about is the actual job … Read more A Basic Guide to Initial and Exploratory Data Analysis

Fuzzy String Matching in Python

Photo by Romain Vignes on Unsplash Finding strings that approximately match a pattern in your data using Python. As a data scientist, you are forced to retrieve information from various sources by either leveraging publicly available APIs, asking for data, or by simply scraping your own data from a web page. All this information is … Read more Fuzzy String Matching in Python
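As a rough illustration of approximate matching, the sketch below uses the standard library's `difflib`. This is a swapped-in technique chosen because it needs no installation; the article itself may rely on a different library. The company names are hypothetical.

```python
from difflib import SequenceMatcher, get_close_matches

def similarity(a, b):
    """Return a 0-1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical, messy company names (illustrative data only).
candidates = ["Apple Inc.", "Alphabet Inc.", "Microsoft Corporation"]

# Score one pair directly.
print(similarity("apple inc", "Apple Inc."))

# Pick the single best candidate for a misspelled query.
print(get_close_matches("Microsft Corp", candidates, n=1, cutoff=0.5))
```

The `cutoff` value controls how strict the match is; a suitable threshold depends on how noisy the data is.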

Getting Started with NLP for Indic Languages

Word Embedding for Indic Languages We will now go into the topic by getting word embeddings for Indic languages. Numerically representing words plays a role in any NLP task. We are going to use the Natural Language Toolkit for Indic Languages (iNLTK) library. iNLTK is an open-source Deep Learning library built on top of PyTorch aiming … Read more Getting Started with NLP for Indic Languages

Find Common Words in an Article with Python Module Newspaper and NLTK

A step-by-step guide to extracting information and finding insights from newspapers using newspaper3k and NLTK You want to extract essential information from an interesting article, but find the article is too long to read with your limited amount of time. Before you dive into the whole article and end up disappointed with the irrelevant content, … Read more Find Common Words in an Article with Python Module Newspaper and NLTK

What is Stationarity in Time Series and why should you care

Remember the import you did from the statsmodels library? We’re gonna use it now to test for stationarity. The stattools module contains the adfuller function, to which you can pass your time-series data. Well, the situation is not great. As expected, the time series isn’t stationary, which the p-value confirms (0.99). Let’s explore a method that will difference the series … Read more What is Stationarity in Time Series and why should you care
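The differencing idea mentioned above can be sketched in plain Python. The helper name `difference` and the sample series are assumptions made for illustration; the article presumably works with its own dataset.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical series with a steady upward trend (illustrative data only).
trend = [3 * t + 5 for t in range(10)]

# First-order differencing removes a linear trend, leaving a constant series.
diffed = difference(trend)
print(diffed)
```

A differenced series is one element shorter per lag, which matters when aligning it with the original timestamps.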

Choose stocks to invest in with Python

Input parameters: number of stocks (I); return of each stock for each year (a[t][i]); total investment limit each year (b[t]).
Decision variables: whether or not to select a stock each year (x[t][i]).
Constraint: total investment each year cannot exceed b[t].
Objective: maximize total return.
Save the given information in arrays and matrices c = [90, … Read more Choose stocks to invest in with Python
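To show how the parameters, decision variables, constraint, and objective fit together, here is a small single-year sketch. It uses brute-force enumeration, which is a stand-in for whatever solver the article uses, and the `costs`, `returns`, and `budget` values are hypothetical (the excerpt does not show how the cost of each stock enters the model).

```python
from itertools import product

def best_selection(costs, returns, budget):
    """Try every 0/1 selection and keep the feasible one with the highest return."""
    best_x, best_return = None, float("-inf")
    for x in product([0, 1], repeat=len(costs)):
        spent = sum(c * xi for c, xi in zip(costs, x))
        if spent > budget:          # violates the investment limit
            continue
        total = sum(r * xi for r, xi in zip(returns, x))
        if total > best_return:
            best_x, best_return = x, total
    return best_x, best_return

# Hypothetical single-year data (not the article's numbers).
costs = [40, 30, 50, 20]     # amount required to buy each stock
returns = [12, 8, 15, 5]     # return of each stock for the year
budget = 90                  # investment limit b for the year

x, total_return = best_selection(costs, returns, budget)
print(x, total_return)
```

Enumeration grows as 2^I, so it is only practical for a handful of stocks; larger instances call for an integer programming solver.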

Translate a Pandas data frame using googletrans library

Google translator logo Googletrans is a free python library that uses Google Translate API. In this article, we explain how to employ the library to translate strings as well as data frames in Python. Googletrans is a third-party library that we can install by using pip. After installing the library, we import the module googletrans. … Read more Translate a Pandas data frame using googletrans library

Most Effective Way To Implement Radial Basis Function Neural Network for Classification Problem

How to use K-Means Clustering along with Linear regression to classify images Radial Basis Function Neural Network or RBFNN is one of the unusual but extremely fast, effective and intuitive Machine Learning algorithms. The 3-layered network can be used to solve both classification and regression problems. In this article, the implementation of MNIST Handwritten Digits … Read more Most Effective Way To Implement Radial Basis Function Neural Network for Classification Problem

Web Scraping Wikipedia with BeautifulSoup

My roommate and I had a discussion about her observation of the high depression rate in Sweden. We drew the connection between the depression rate and the lack of sunshine. I decided to support my hypothesis by gathering my own data and analyzing it. I use Beautiful Soup, an easy-to-use Python tool for web scraping. … Read more Web Scraping Wikipedia with BeautifulSoup

Achieving Stationarity With Time Series Data

An illustration of the principles of stationarity, Source: BeingDatum Most time series models work under the assumption that the underlying data is stationary, that is, the mean, variance, and covariance are not time-dependent. More likely than not, your time series will not be stationary, which means that you will have to identify the trends present … Read more Achieving Stationarity With Time Series Data

Breaking Down Goodreads Dataset using Python

We can now delete the last column.
    del data['extra']
We now have to convert the data types of the columns back to their original types. pd.to_numeric automatically configures float or int data types.
    data["average_rating"] = pd.to_numeric(data.average_rating)
    data["ratings_count"] = pd.to_numeric(data.ratings_count)
    # Column names with spaces or symbols need bracket access, not attribute access.
    data["# num_pages"] = pd.to_numeric(data["# num_pages"])
    data["text_reviews_count"] = pd.to_numeric(data["text_reviews_count"])
Filtering data: I want to only keep those books … Read more Breaking Down Goodreads Dataset using Python

Four Useful Functions For Exploring Data in Python

During the process of exploring data I often find myself repeatedly defining similar python logic in order to carry out simple analytical tasks. For example, I often calculate the mean and standard deviation of a numerical column for specific categories within data. I also often analyze the frequency of categorical values within the data. In … Read more Four Useful Functions For Exploring Data in Python

How does Python work?

As a Machine Learning Engineer, I have been using Python for more than a year. Recently, I have also started learning C++, for fun. It made me realize how easy and intuitive Python is. I got more curious about how Python works and how it differs from other languages. In this … Read more How does Python work?

How To Manipulate Date And Time In Python Like A Boss

Today’s date and time in different formats. Let’s warm up with the most basic. Below is the code that will print the current year, month, day, hour, minute, seconds and microseconds.
    In:  from datetime import datetime
         d = datetime.now()  # today's datetime
         d
    Out: datetime.datetime(2019, 12, 22, 13, 14, 18, 193898)
This is very useful information but often we … Read more How To Manipulate Date And Time In Python Like A Boss
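As a small follow-on sketch, the same value can be rendered in other formats with `strftime`. The timestamp is fixed to the one shown in the output above so the results are repeatable; the variable names are made up for illustration.

```python
from datetime import datetime

# Fixed timestamp taken from the output shown above, so results are repeatable.
d = datetime(2019, 12, 22, 13, 14, 18, 193898)

iso_date = d.strftime("%Y-%m-%d")   # year-month-day
clock = d.strftime("%H:%M:%S")      # 24-hour time
millis = d.microsecond // 1000      # microseconds -> whole milliseconds

print(iso_date, clock, millis)
```

The last field of a `datetime` holds microseconds, so integer division by 1000 is needed when milliseconds are wanted.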

Run Native Julia Code WITH PYTHON!

Speed up Python by switching languages I frequently discuss the benefits of using Julia, as there are many of them. Julia is a scalable, high-performance, and high-level language that is easy to learn and can get nearly any job done. This is especially true for Data Scientists, as Julia’s center of mass is statistical computing … Read more Run Native Julia Code WITH PYTHON!

Linear Regression with Only Python and Numpy

Writing a machine learning model just with Numpy and Python In this post, we’ll see how to implement linear regression in Python without using any machine learning libraries. In another post, we saw how the linear regression algorithm works in theory. With the rise in popularity of machine learning libraries, anyone can implement ML algorithms … Read more Linear Regression with Only Python and Numpy
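For orientation, a minimal sketch of linear regression with only NumPy might look like the following. It assumes batch gradient descent on mean squared error, which may differ from the article's exact approach; the function name, learning rate, epoch count, and synthetic data are all assumptions.

```python
import numpy as np

def fit_linear_regression(X, y, lr=0.1, epochs=2000):
    """Fit y ~ X @ w + b by batch gradient descent on mean squared error."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        error = X @ w + b - y
        # Gradients of MSE = mean(error**2) with respect to w and b.
        grad_w = (2.0 / n_samples) * (X.T @ error)
        grad_b = (2.0 / n_samples) * error.sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic, noise-free data generated from y = 3x + 2 (illustrative only).
X = np.linspace(0, 1, 50).reshape(-1, 1)
y = 3 * X[:, 0] + 2

w, b = fit_linear_regression(X, y)
print(w, b)
```

On this noise-free data the fitted slope and intercept should approach 3 and 2; with real data, feature scaling and the learning rate usually need more care.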

LASSO Regression Tutorial

LASSO regression is an example of regularized regression. Regularization is one approach to tackle the problem of overfitting by adding additional information, and thereby shrinking the parameter values of the model to induce a penalty against complexity. The 3 most popular approaches to regularized linear regression are the so-called Ridge Regression, Least Absolute Shrinkage and … Read more LASSO Regression Tutorial
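The penalty idea can be written out for two of the approaches named above. The notation here is an assumption, not taken from the article: n observations (x_i, y_i), p coefficients β_j, and a tuning parameter λ ≥ 0. Scaling conventions for the squared-error term vary between texts and libraries.

```latex
% Ridge: squared (L2) penalty shrinks coefficients toward zero.
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \; \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^2 + \lambda \sum_{j=1}^{p} \beta_j^2

% LASSO: absolute-value (L1) penalty can set coefficients exactly to zero.
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \; \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

Setting λ = 0 recovers ordinary least squares in both cases; larger λ means stronger shrinkage.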

How to Define Custom Exception Classes in Python

Creating a Custom Dictionary that can only store integers and floats as its values Let’s progress now, and demonstrate how custom error classes can easily and usefully be incorporated into our own programs. To begin, I will create a slightly contrived example. In this fictitious example, I will create a custom dictionary, that can only … Read more How to Define Custom Exception Classes in Python
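A minimal sketch of the idea described above follows. The class names `InvalidValueTypeError` and `NumericDict` are hypothetical and may not match the article's; rejecting `bool` is an extra design choice made here, since `bool` is a subclass of `int` in Python.

```python
class InvalidValueTypeError(TypeError):
    """Raised when a value that is not an int or float is stored."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        super().__init__(
            f"value for {key!r} must be int or float, got {type(value).__name__}"
        )


class NumericDict(dict):
    """A dict that accepts only int and float values (hypothetical example)."""

    def __setitem__(self, key, value):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise InvalidValueTypeError(key, value)
        super().__setitem__(key, value)


prices = NumericDict()
prices["apple"] = 1.25
try:
    prices["pear"] = "cheap"
except InvalidValueTypeError as exc:
    print(exc)
```

Note that this sketch guards only item assignment; `dict.update()` and the constructor bypass `__setitem__`, so a fuller version would override those as well.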

Making A Model Is Like Baking A Cake

Photo by Henry Be on Unsplash The Types of Cakes Available As we progress further in our modern era, the advances of Data Science and Technology continue to make marvelous strides in various fields of study and practice. As a result of the vast applicability of Data Science and Technology, various types of models … Read more Making A Model Is Like Baking A Cake

Time Series Forecasting with a SARIMA Model

Predicting daily electricity loads for a building on the UC Berkeley campus Photo credit: Elena Zhukova Hey there! In this article, I’ll run through an example of electricity load forecasting using a SARIMA model. Three years of daily electricity load data was gathered for a building on the UC Berkeley campus to create a model … Read more Time Series Forecasting with a SARIMA Model

AWS and Python: The Boto3 Package

Domo Arigato, AWS Boto It’s 2020 and the world of cloud storage and computing will most likely be the direction of most businesses in the coming decades. The prospect of having scalable storage and computing power without having to purchase physical equipment is very appealing. The three big dogs of the cloud are Amazon Web … Read more AWS and Python: The Boto3 Package

Building a Dynamic data pipeline with Databricks and Azure Data Factory

In Databricks, you can choose a high-concurrency cluster or, for ephemeral jobs, job cluster allocation. After creating the connection, the next step is to add the component to the workflow. Below we look at utilizing a high-concurrency cluster. fig 1 — Databricks ADF pipeline component settings Adjusting base parameter settings here as in fig 1 … Read more Building a Dynamic data pipeline with Databricks and Azure Data Factory

Build a simple real-life chat app with Python

Development of the chat apps Before we get to the tutorial, let us understand the history of chat apps and the reasons for their development. blog.eduonix.com Chat applications have been around for quite a long time, be it the ancient Yahoo chat rooms or the advanced and modern chat applications such as WhatsApp, Facebook … Read more Build a simple real-life chat app with Python

Let’s Make a KNN Classifier from Scratch

The time has come to evaluate our algorithm. For simplicity, I’ve decided to use the famous Iris dataset which can be loaded from Scikit-Learn. Here’s the code to get you going: And here’s what everything should look like: If you’re wondering what Bamboolib is, check out this article: Introducing Bamboolib — a GUI for Pandas … Read more Let’s Make a KNN Classifier from Scratch
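The article's own code is not included in this excerpt, so the following is only a rough sketch of a from-scratch KNN classifier and a simple accuracy check. It assumes Euclidean distance and majority voting, and it uses a tiny made-up dataset as a stand-in for Iris so that it runs without Scikit-Learn.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Tiny hypothetical dataset with two well-separated classes
# (a stand-in for Iris, which the article loads from Scikit-Learn).
train_X = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (5.0, 5.2), (5.1, 4.8), (4.9, 5.0)]
train_y = ["a", "a", "a", "b", "b", "b"]

test_X = [(1.1, 1.0), (5.0, 5.0)]
test_y = ["a", "b"]

# Evaluate: fraction of test points whose predicted label matches the true one.
predictions = [knn_predict(train_X, train_y, q) for q in test_X]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(predictions, accuracy)
```

An odd `k` avoids tied votes in two-class problems; for real data, features should usually be scaled before computing distances.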

How to train your Neural Net — Tensors and Autograd

In this blog post, we will implement some of the most commonly used tensor operations and talk a little about the Autograd functionality in PyTorch.
    import numpy as np
    import torch
    from torch.autograd import grad
Create an uninitialized Long tensor.
    x = torch.LongTensor(3, 4)
    print(x)
    ################## OUTPUT ##################
    tensor([[140124855070912, 140124855070984, 140124855071056, 140124855071128],
            [140124855071200, 140124855071272, 140124855068720, 140125080614480],
            [140125080521392, 140124855066736, 140124855066800, 140124855066864]])
Float … Read more How to train your Neural Net — Tensors and Autograd

Time Series Forecasting Using a Seasonal ARIMA Model

A Tutorial in Python One of the most widely studied models in time series forecasting is the ARIMA (autoregressive integrated moving average) model. Many variations of the ARIMA model exist, which employ similar concepts but with tweaks. One particular example is the seasonal ARIMA (SARIMA) model. The SARIMA model accounts for seasonality when generating time … Read more Time Series Forecasting Using a Seasonal ARIMA Model