Scraping the AAPL Stock Prices using Python.

Setting up a virtual environment: As discussed earlier, we’ll be using some libraries for this project. Libraries are like packages that contain additional functionality for our project. In our case, we’ll use two libraries: Beautiful Soup and Requests. The Requests library allows us to make requests to URLs, and access the data on those HTML … Read more

The Little Known OGrid Function in Numpy

Adding Geometric Shapes One of the coolest and most instructive image transformations with ogrid involves adding geometric shapes to your image. Since most basic geometric shapes have simple mathematical formulas, we can perform functions on the x and y returned by ogrid to recreate these formulas. For example, let’s see some code to create a … Read more
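The excerpt above is cut off before its code, so the following is only a rough sketch of the idea, not the original post's example: a filled circle drawn by applying the circle equation to the coordinate grids returned by np.ogrid. The image size, center, radius, and variable names are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical 100x100 grayscale image, all black (the size is an assumption).
height, width = 100, 100
image = np.zeros((height, width), dtype=np.uint8)

# ogrid returns an "open" grid: y has shape (100, 1) and x has shape (1, 100).
y, x = np.ogrid[:height, :width]

# Circle equation (x - cx)^2 + (y - cy)^2 <= r^2; broadcasting expands the
# two thin arrays into a full (100, 100) boolean mask.
center_y, center_x, radius = 50, 50, 20
circle_mask = (x - center_x) ** 2 + (y - center_y) ** 2 <= radius ** 2

# Paint every pixel inside the circle white.
image[circle_mask] = 255
```

Because ogrid only stores one column and one row of coordinates, the full grid is never materialized until the comparison is broadcast.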

What does Twitter think of Star Wars: The Rise of Skywalker? Sentiment Analysis in Python

Star Wars: The Rise of Skywalker was recently released in theaters. Disney reported $40 million in previews for the new release which makes it the fifth highest preview of all time. It is tracking between $170 and $200 million in ticket sales opening weekend. By those numbers, it is pretty clear that this new release … Read more

On Making A Multilingual Search Engine

In my previous posts, I have been talking theory around semantic search and so I thought, why not also do a starter code for making a multi-lingual search engine — something that understands the semantics of a language and doesn’t need any machine translation engines. Write your query in any language and get the results … Read more

Hands-on End-to-End Automated Machine Learning

A hands-on experience for AutoML coding in a Python environment with AutoML libraries. We start with a basic pipeline approach in Python, which actually has no AutoML in it, and quickly move on to well-known AutoML libraries. We also compare the trending AutoML SaaS solutions from OptiWisdom, such as OptiScorer, with the classical approaches. This is … Read more

Venn Diagrams and Word Clouds in Python

Top Cat and Dog Names I started with a horizontal bar chart to look at the 50 most frequent cat and dog names in Seattle.
    fig, axes = plt.subplots(ncols=2, figsize=(12,12))
    plt.subplots_adjust(wspace=.3)
    axes[0].barh(cat_t50['name'], cat_t50['count'], .5, color='#294AB9', alpha=0.8)
    axes[0].invert_yaxis()
    axes[0].set_title("Frequency of Seattle's Top 50 Cat Names", fontsize=12)
    axes[0].set_xlabel("Number of Cats")
    axes[1].barh(dog_t50['name'], dog_t50['count'], .5, color='#1896ac', alpha=0.8)
    axes[1].invert_yaxis()
    axes[1].set_title("Frequency of Seattle's Top 50 Dog Names", fontsize=12)
    axes[1].set_xlabel("Number of … Read more

Working With Time Series Data

NYC’s daily temperature chart (November 1, 2019 to December 11, 2019) produced with Matplotlib Data scientists study time series data to determine if a time based trend exists. We can analyze hourly subway passengers, daily temperatures, monthly sales, and more to see if there are various types of trends. These trends can then be used … Read more

What are your customers saying? Natural Language Processing (NLP) with Yelp Review Data

Photo by Dlanor S on Unsplash Enough blabbering. Let’s get coding! Before you launch your Jupyter Notebook, make sure to install the following packages to Anaconda: 2.1 SETTING UP YOUR ANACONDA ENVIRONMENT These are the packages I will utilize in this walk-through. Each of these packages can be an article/book in itself. If you want … Read more

Mistakes Data Scientists Make

I’m fascinated by systems that use error to improve. These are Nassim Nicholas Taleb’s antifragile systems, which use error/pain/volatility/mistakes to improve. This is an observable paradox — a sign of a fundamental truth. Examples of antifragile systems include business, evolutionary learning, biological evolution, training neural networks and also learning data science. I’ve made many mistakes … Read more

How to deploy a Docker Container (Python) on Amazon ECS using Amazon ECR

2. Dockerize the Python Application Go to the following link to clone/download this Python app that I’m going to dockerize. (Please ignore the DP 🙂 ) DilanKalpa/PythonDockerAWS As per my repository, … Read more

Three Model Explainability Methods Every Data Scientist Should Know

We saw three different kinds of model explainability outputs: variable importance, PDP, and SHAP. They all provide different-looking outputs. How can we choose one of them when we want explainability? a. SHAP values can be used for anything else. The first point is that SHAP has an advantage in the sense that it provides the most … Read more

American Sign Language Hand Gesture Recognition

Based on these images, it is easy to understand why our neural network has trouble distinguishing between these two signs. In future work, we will use images with higher resolution that allow for more intricate details to be extracted from the images. Hopefully, this will further improve the accuracy of our model. An additional limitation … Read more

Pandas Tips & Tricks: Need For Speed

DATA ANALYTICS LIKE A PYTHON PRO A Personal Favorite 1-Liner Kungfu Panda My last post demonstrated a simple process for evaluating a set of face pairs to determine whether or not the two are blood relatives. Several snippets were breezed over like black boxes. Let us look at one of those snippets, a simple 1-liner: … Read more

Using Python and Robinhood to Create a Simple Buy Low — Sell High Trading Bot

Photo by Ishant Mishra on Unsplash So I have been messing with Robinhood lately and have been trying to understand stocks. I am not a financial advisor or anything, but I wanted to create a simple trading bot so I could understand robin_stocks a little bit more before I create more complex code. For those who … Read more

The Last Matplotlib Tweaking Guide You’ll Ever Need

As with any of my coding-related articles, we’ll start with the imports. You’ll need the Pandas library for loading in and dealing with tabular data, and you will also need Matplotlib (with some stuff that will enable us to do the tweaking): Great. Now onto the dataset. I’m using the International Airline Passengers dataset, mainly … Read more

How To Visualize A Decision Tree In 5 Steps

Step 1: Download and install Anaconda for Windows Depending on your Python and computer versions, choose the right Anaconda package to download. Anaconda is a common Python distribution that is usually allowed to download and install in large corporations. Anaconda Python/R Distribution – Free Download The open-source Anaconda Distribution is the easiest way to perform … Read more

3 Numpy Image Transformations on Baby Yoda

2. Horizontal / Vertical Flip We might also want to flip our baby Yoda horizontally or vertically. Taking a step back, let’s look at how we would reverse just a regular list of numbers. If I have list_of_nums which are [1, 2, 3, 4, 5], running the following: list_of_nums[::-1] would result in the reversed list: … Read more
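To make the slicing idea in this excerpt concrete, here is a minimal sketch. The list comes from the excerpt; the small 2x3 array is a hypothetical stand-in for the image, and the variable names are assumptions rather than the original post's code.

```python
import numpy as np

# The list from the excerpt: a step of -1 walks it from the end to the start.
list_of_nums = [1, 2, 3, 4, 5]
reversed_nums = list_of_nums[::-1]

# Hypothetical stand-in for an image: 2 rows by 3 columns.
image = np.array([[1, 2, 3],
                  [4, 5, 6]])

# Reversing the first axis swaps top and bottom (vertical flip).
vertical_flip = image[::-1]

# Reversing the second axis swaps left and right (horizontal flip).
horizontal_flip = image[:, ::-1]
```

The same two slices work unchanged on a color image of shape (height, width, channels), since the channel axis is left untouched.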

How Fast Numpy Really is and Why?

Let’s create a Python list and a Numpy array of 10K elements and add a scalar to each element of the array. Adding a Scalar to a Python List Adding a Scalar to a Numpy Array When adding a scalar to an array of 10K elements, Numpy is around 5 times faster than a Python list. The following plots illustrate the speed … Read more
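The code behind this comparison is not included in the excerpt, so here is one way such a measurement could be set up with the standard timeit module. The scalar value, repetition count, and function names are assumptions, and the measured ratio will vary by machine and library version.

```python
import timeit

import numpy as np

n = 10_000   # 10K elements, as in the excerpt
scalar = 5   # arbitrary value chosen for illustration

py_list = list(range(n))
np_array = np.arange(n)


def add_to_list():
    # Pure Python: one interpreted addition per element.
    return [value + scalar for value in py_list]


def add_to_array():
    # NumPy: a single vectorized operation over the whole array.
    return np_array + scalar


# Time 200 repetitions of each approach.
list_seconds = timeit.timeit(add_to_list, number=200)
array_seconds = timeit.timeit(add_to_array, number=200)
print(f"list: {list_seconds:.4f}s  array: {array_seconds:.4f}s")
```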

Deploying Machine Learning Models as Data, not Code: omega|ml

The data science community is on a mission to find the optimal approach to deploying machine learning solutions. My open source framework for MLOps, omega|ml implements a novel approach by deploying models as data, increasing speed and flexibility while reducing the complexity of the tool chain. DevOps for machine learning: Not a full match When … Read more

Text Preprocessing for Data Scientist

Code
    # Importing necessary libraries
    import numpy as np
    import pandas as pd
    import re
    import nltk
    import spacy
    import string
    # Reading the dataset
    df = pd.read_csv("sample.csv")
    df.head()
Output It is the most common and simplest text preprocessing technique, applicable to most text mining and NLP problems. The main goal is to convert the text to lowercase so that ‘apple’, ‘Apple’ and … Read more

Gaming on Reddit, Revisited

I recently decided to take some time and run through the project again, not just to collect more data, but also to try implementing some of the improvements I hadn’t gotten around to originally. I have definitely grown as a programmer, so there were plenty of changes I knew I wanted to make even from … Read more

Python and R — Unequivocal Champions of Data Science

Data science is such a broad field that includes several subdivisions such as data engineering, data preparation and transformation, data visualization, machine learning, and deep learning. While there are several skills required for doing data science (Data Science Minimum: 10 Essential Skills You Need to Know to Start Doing Data Science), the two most basic requirements … Read more

Hyperparameter Tuning Explained

#1 Manual tuning With manual tuning, based on the current choice of parameters and their score, we change some of them, train the model again, and check the difference in the score, without automating the selection of which parameters to change or their new values. The advantage of manual tuning is: … Read more

Guide to Building a College Basketball Machine Learning Model in Python

Photo by Martin Sanchez on Unsplash Let’s compare a neural net performance to Vegas line accuracy. Introduction Over the Thanksgiving holiday, I had some free time and stumbled upon a great Python public API created by Robert Clark. The API allows users to pull about any statistic for major American sports very easily from … Read more

Figuring out a Fair Price of a Used Car in a Data Science Way

Introduction What is an ordinary way of figuring out the price for a used car? You search for similar vehicles, estimate the rough baseline price and then fine-tune it depending on the current mileage, color, number of options, etc. You use both domain knowledge and current market state analysis. If you go deeper, you may … Read more

The Best Free Data Science eBooks

An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. Description: This book provides an introduction to statistical learning methods. It is aimed at upper-level undergraduate students, … Read more

Understand the Basics of Deep Learning in a Hurry

by Gordan Johnson on Keras is a deep learning framework that sits on top of backend frameworks like TensorFlow. Keras is excellent because it allows you to experiment with different neural nets with great speed! It sits atop other excellent frameworks like TensorFlow, and lends itself well to the experienced as well as to novice data … Read more

Machine learning 101 & data science: Tips from an industry expert

As a practicing data scientist, you’ll need to become familiar with machine learning concepts. Both data science and machine learning overlap, but in the most basic terms you could say: Data science is used to gain insights and understanding of data. Machine learning is used to produce predictions based on data. That said, the boundary … Read more

Quant’s Guide: Finding Key Metrics & Ratios Using Python

Quick pocket guide to some basic financial metrics and ratios you can find in very few lines with existing Python libraries… Photo Creds: Unsplash We’ll use Yahoo Finance for this example. You can use your data source of choice. First, let’s import the libraries and grab the data used in this article… The Efficient Frontier … Read more

Writing a Python Package

Enough talk, let’s write some code. Let’s write a simple function and package it.
    #
    def heythere():
        print("hey there")
    #
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    from setuptools import setup, find_packages
    setup(
        author="Chinmay Shah",
        author_email='',
        classifiers=[
            'License :: OSI Approved :: MIT License',
            'Programming Language :: Python :: 3.7',
        ],
        description="Says hello",
        license="MIT license",
        include_package_data=True,
        … Read more

How a Data Scientist Buys Extended Warranties

Now back to our problem. Question 1: Given the probabilities of a patch (0.3) and a blowout (0.05), what is the probability that the car dealership fails to make a profit? Before doing any distribution modeling, we need to understand the relationship between patches and blowouts as a function of profit. Earlier, we calculated the expected … Read more

Is that a warbler? Bird classification with Keras CNN in Python

Ever wondered ‘What is that bird?’ I constantly wondered ‘What is that bird?’ when I walked my dog along a park in Boston that was filled with birds at all times of the year: baby ducks during the summer, migratory songbirds in the fall/spring, and waterfowl in the winter. My grandpa (a long-time bird watcher) … Read more

End-To-End Writer Identification : Off-line Vs On-line Approach

Each person has their own distinct handwriting with its own special characteristics and small details that make it different. If we pick two handwriting samples at random written by two people we might notice that the way the pen strokes are constructed, the spacing between letters and other details are different. In this post, we … Read more

3 simple Python efficiency tips

In this post, I’ll be sharing 3 Python tips that may give a performance boost to your code. Let’s get started! Consider a scenario in which we have a list of records and we need to match a subset of IDs to these registers. For example: records = [{“id”: 1, “name”: “John”, “surname”: “Doe”, “country”: … Read more
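The excerpt stops before showing its tip, so the following is not the post's solution. It is one common way to speed up this kind of ID matching: build a dictionary keyed by ID once, then look each wanted ID up directly. The sample records and the wanted_ids list are made up for illustration.

```python
# Hypothetical sample data; only the general record shape follows the excerpt.
records = [
    {"id": 1, "name": "John", "surname": "Doe"},
    {"id": 2, "name": "Jane", "surname": "Roe"},
    {"id": 3, "name": "Ada", "surname": "Lovelace"},
]
wanted_ids = [3, 1]

# Naive approach: scan the whole list once per wanted ID.
matched_slow = [
    record
    for wanted in wanted_ids
    for record in records
    if record["id"] == wanted
]

# Indexed approach: one pass to build the dict, then constant-time lookups.
records_by_id = {record["id"]: record for record in records}
matched_fast = [
    records_by_id[wanted] for wanted in wanted_ids if wanted in records_by_id
]
```

Both versions return the same records in the order of wanted_ids; the indexed version avoids rescanning the list, which matters as the number of records and IDs grows.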

Python List, NumPy, and Pandas

How to choose the right data structure from Python list, Numpy array, and Pandas DataFrame There are multiple data structures to work with a sequence of data in Python. The available data structures include lists, NumPy arrays, and Pandas dataframes. Oftentimes it is not easy for beginners to choose from these data structures. In … Read more

The Complete Hands-On Machine Learning Crash Course

From linear regression to unsupervised learning, this guide covers everything you need to know to get started in machine learning. Theory and practical exercises are covered for each topic! Linear regression — theory Linear regression — practice Logistic regression — theory Linear discriminant analysis (LDA) — theory Quadratic discriminant analysis (QDA)— theory Logistic regression, LDA … Read more

Python For Data Science — A Guide to Data Visualization with Plotly

Again, the main objective of this dataset is to study which factors affect the survivability of a person on board the Titanic. The first thing that comes to my mind is to display how many passengers survived the Titanic crash. Hence, visualizing the Survived column itself will be a good start. In Plotly, we … Read more

PySpark for Data Science Workflows

Image Source Chapter 6 of Data Science in Production Demonstrated experience in PySpark is one of the most desirable competencies that employers are looking for when building data science teams, because it enables these teams to own live data products. While I’ve previously blogged about PySpark, Parallelization, and UDFs, I wanted to provide a proper … Read more

SequenceMatcher in Python

A human-friendly longest contiguous & junk-free sequence comparator SequenceMatcher is a class available in the Python module named “difflib”. It can be used for comparing pairs of input sequences. The objective of this article is to explain the SequenceMatcher algorithm through an illustrative example. Due to the limited docs available, I thought I would share the concept … Read more
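As a small illustration of the class described above (the two input strings are arbitrary examples, not taken from the article), SequenceMatcher can report the longest contiguous matching block and an overall similarity ratio:

```python
from difflib import SequenceMatcher

first = "abcd"
second = "bcde"

# Passing None as the first argument means no elements are treated as junk.
matcher = SequenceMatcher(None, first, second)

# Longest contiguous block shared by first[0:4] and second[0:4].
match = matcher.find_longest_match(0, len(first), 0, len(second))
longest = first[match.a:match.a + match.size]

# ratio() is 2 * (matched elements) / (total elements in both sequences).
similarity = matcher.ratio()

print(longest, similarity)  # bcd 0.75
```

Here three characters match out of eight in total, so the ratio is 2 * 3 / 8 = 0.75.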

Discover the strength of monotonic relation

Finding Spearman’s rank correlation coefficient using Python for IB Diploma Mathematics Photo by Mikael Kristenson on Unsplash In this article, I will show the necessary steps using Python to find Spearman’s rank correlation coefficient. A monotonic function shows the relation between ordered sets. Spearman’s rank correlation is useful to find this relation of ordinal … Read more
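The article's own steps are not visible in this excerpt. As a hedged sketch, Spearman's coefficient can be computed in plain Python as the Pearson correlation of the two rank vectors, with tied values sharing the mean of their ranks. The function names and the sample data below are assumptions for illustration.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions are 0-based
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rank_x) / n, sum(rank_y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    spread_x = sum((a - mean_x) ** 2 for a in rank_x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in rank_y) ** 0.5
    return covariance / (spread_x * spread_y)


# Hypothetical data: scores grow non-linearly but monotonically with hours.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 8, 16, 32]
rho_increasing = spearman_rho(hours, scores)
rho_decreasing = spearman_rho(hours, scores[::-1])
```

Because only the ranks matter, a strictly increasing relation gives a coefficient of 1 even when it is far from linear, and a strictly decreasing one gives -1.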

Artificial Eyeliner on LIVE Feed using Python, OpenCV and Dlib

Briefly explaining — the program first extracts 68 landmark points from each face. Of those 68 points, points 37–42 belong to the left eye and points 43–48 belong to the right eye — see picture below. Visualisation of 68 landmark points. (Image from pyimagesearch) Because our goal is to apply eyeliner, we are only interested … Read more
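To illustrate the indexing described above: if the 68 landmarks are stored in a zero-indexed Python list, the 1-based points 37–42 and 43–48 correspond to the slices [36:42] and [42:48]. The landmark list below is a dummy placeholder rather than real detector output, and the zero-indexed storage is an assumption.

```python
# Hypothetical stand-in for a face landmark detector's output:
# 68 (x, y) tuples stored in a zero-indexed list.
landmarks = [(index, index) for index in range(68)]

# 1-based points 37-42 (left eye, as named in the excerpt) -> slice [36:42].
left_eye = landmarks[36:42]

# 1-based points 43-48 (right eye, as named in the excerpt) -> slice [42:48].
right_eye = landmarks[42:48]
```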

Power up your Python Projects with Visual Studio Code

You should have a directory for every project, and a virtual environment for every directory. This structure does two important things: It keeps your stuff organized appropriately, which makes it easier to keep projects separate, manage dependencies, and keep out things that shouldn’t be there. (Who likes having to undo git commits?) It lets you … Read more