## Automatically Storing Data from Analyzed Data Sets

How to Store Data Analysis Results to Facilitate Later Regression Analysis Figure 1: Example Folder Hierarchy This is the fifth article in a series teaching you how to write programs that automatically analyze scientific data. The first presented the concept and motivation, then laid out the high-level steps. The second taught you how … Read more

## Gaussian Mixture Modelling (GMM)

GMM estimation Figure 3 below illustrates what GMM is doing. It clearly shows three clusters modelled by three different Gaussian distributions. I have used a toy data set here just to illustrate this clearly as it is less clear with the Enron data set. As you can see, compared to Figure 2 modelled using spherical … Read more

## Robot following a walkway with OpenCV and Tensorflow

How to make a self-driving robot with Raspberry Pi, computer vision and a convolutional neural network. After my robot learned how to follow a line, a new challenge appeared. I decided to go outdoors and make the robot move along a walkway. It would be nice if a robot follows the host through a … Read more

## Mapping Locations of Reported Pot Holes in Toronto using Python

Picking a Date Range and Working with API Limit When deciding a date range, it is hard to pick a meaningful time that shows us valuable results. Since our analysis is on pot holes some prior knowledge about the cause of pot holes can be helpful. During the time of deep freezing and thawing results in … Read more

## SQL and Pandas

Where and how should these tools be used? As I mentioned in my previous post, my technical experience has almost exclusively been in SQL. While SQL is awesome and can do some really cool things, it has its limitations — these limitations are in large part why I decided to acquire Data Science superpowers at Lambda School. In … Read more

## From ‘R vs Python’ to ‘R and Python’

Leveraging the best of Both Worlds Could we utilize the statistical prowess of R along with the programming capabilities of Python? Well, when we can easily embed SQL code within either an R or a Python script, why not blend R and Python together? There are basically two approaches by which we can use both Python and R … Read more

## Common Field Calculations using Python in ArcGIS

Field Calculator in ArcGIS Pro The ArcGIS field calculator can save a person time and allow for fairly quick data cleanup IF you can remember how to use it. I don’t know about you, but for me the layout of the field calculator, the separation of the formula field and the code block area and the … Read more

## Artificial Neural Networks Optimization using Genetic Algorithm with Python

Main Project File Implementation The third file is the main file because it connects all functions. It reads the features and the class labels files, filters features based on the standard deviation, creates the ANN architecture, generates the initial solutions, loops through a number of generations by calculating the fitness values for all solutions, selecting … Read more

## Checking Automated Data Analysis for Errors

How to Check for Errors, both Manually and Automatically, when Automating Data Analysis This is the fourth article in a series teaching you how to write programs that automatically analyze scientific data. The first presented the concept and motivation, then laid out the high-level steps. The second taught you how to structure data sets … Read more

## 10 Steps to Set Up Your Python Project for Success

In this guide we’ll walk through adding tests and integrations to speed development and improve code quality and consistency. If you don’t have a basic working Python package, check out my guide to building one and then meet right back here. Cool. Here’s our ten-step plan for this article: Install Black Create .pycache Install pytest Create Tests … Read more

## How to Perform Explainable Machine Learning Classification — Without Any Trees

Credit: Pixabay Strict and clear rules… appear to us as something in the background — hidden in the medium of the understanding. – Ludwig Wittgenstein Decision trees are a popular technique for classification. They’re intuitive, easy to interpret, and often perform well out-of-the-box. Tree models are paths of rules that humans can understand. In certain contexts, being able … Read more

## Master Python through building real-world applications (Part 9)

Endnotes As we all know, we learn from visualizations far better than we learn from raw data. Building visualizations from data is really rewarding, and with the help of external libraries like Bokeh, Python’s visualization game is stronger than ever. In this post, you learned about stock market data, how to download it, what are candlestick … Read more

## A “full-stack” data science project

2. Data exploration The notebook exploring the data is available on GitHub here. Regardless of the data analysis you’re performing, or how well you think you know your data, it is always a good idea to take a look at it and be aware of the various characteristics before starting to work on a specific … Read more

## Machine Learning for Beginners: An Introduction to Neural Networks

A simple explanation of how they work and how to implement one from scratch in Python. Here’s something that might surprise you: neural networks aren’t that complicated! The term “neural network” gets used as a buzzword a lot, but in reality they’re often much simpler than people imagine. This post is intended for complete beginners and … Read more

## Replacing Excel with Python

Importing Excel Files into a Pandas DataFrame The initial step is to import Excel files into a DataFrame so we can perform all our tasks on it. I will be demonstrating the read_excel method of Pandas, which supports xls and xlsx file extensions. Using read_csv is much the same as using read_excel; we won’t go in depth but I will share … Read more

## Python Tutorial: Fuzzy Name Matching Algorithms

How to cope with the variability and complexity of person name variables used as identifiers. This is the fifth article of our journey into the Python data exploration world. A list of the published articles you can find here (and the source code here). So let’s start then. Methods of Name Matching In statistical data sets … Read more

## Data Science With No Data

Building an AI/ML model with no access to a dataset In this article, we will demonstrate how to generate a dataset to build a machine learning model. According to this, Medicare fraud and abuse cost taxpayers \$60 billion per year. AI/ML could significantly help identify and prevent fraud and abuse, but since privacy is of utmost … Read more

## Use Google and Tweepy to Build a Dataset of Twitter Users

With ever-increasing value being placed on the effectiveness of social media in marketing, mining data from social platforms is a critical piece of the ad-tech puzzle. Free developer API access to social data is becoming more and more restrictive, and so easily accessing the right data can be a challenge. Twitter is an exception to … Read more

## Building a Flask API to Automatically Extract Named Entities Using SpaCy

How to use the Named Entity Recognition module in spaCy to identify people, organizations, or locations in text, then deploy a Python API with Flask The overwhelming amount of unstructured text data available today provides a rich source of information if the data can be structured. Named-entity Recognition (NER) (also known as Named-entity Extraction) is one of … Read more

## How to Build a Deep Neural Network Without a Framework

So, for the weighted sum, the function is simply: Simple enough! Now, we build a function to feed the result to an activation function (either ReLU or sigmoid): Now, we want to use the sigmoid function on the last layer, and ReLU on all previous layers. This is specific to this application, because we will … Read more
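The functions this excerpt refers to are not shown here, so the following is only a minimal NumPy sketch of the idea, not the article's own code. The names `weighted_sum`, `relu`, `sigmoid`, and `forward`, and the list-of-`(W, b)` parameter layout, are assumptions made for illustration.

```python
import numpy as np

def weighted_sum(W, A_prev, b):
    """Linear step of one layer: Z = W @ A_prev + b."""
    return W @ A_prev + b

def relu(Z):
    """Element-wise max(0, z)."""
    return np.maximum(0, Z)

def sigmoid(Z):
    """Element-wise logistic function, mapping values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-Z))

def forward(X, params):
    """Forward pass: ReLU on every layer except the last, sigmoid on the last.

    `params` is a list of (W, b) pairs, one per layer (hypothetical layout).
    """
    A = X
    for W, b in params[:-1]:
        A = relu(weighted_sum(W, A, b))
    W, b = params[-1]
    return sigmoid(weighted_sum(W, A, b))

# Tiny example: 3 input features, one hidden layer of 4 units, 1 output unit.
rng = np.random.default_rng(0)
params = [
    (rng.standard_normal((4, 3)), np.zeros((4, 1))),
    (rng.standard_normal((1, 4)), np.zeros((1, 1))),
]
X = rng.standard_normal((3, 5))  # columns are examples
probs = forward(X, params)       # one value per example
```

Putting the sigmoid on the final layer keeps each output between 0 and 1, so it can be read as a probability.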

## Extracting faces using OpenCV Face Detection Neural Network

Recently, I came across the website https://www.pyimagesearch.com/ which has some of the greatest tutorials on OpenCV. While reading through its numerous articles, I found that OpenCV has its own Face Detection Neural Network with really high accuracy. So I decided to work on a project using this Neural Network from OpenCV and extract faces from … Read more

## Real-time face liveness detection with Python, Keras and OpenCV

Most facial recognition algorithms you find on the internet and research papers suffer from photo attacks. These methods work really well at detecting and recognizing faces on images, videos and video streams from webcam. However they can’t distinguish between real life faces and faces on a photo. This inability to recognize faces is due to … Read more

## CASM = Fractals

Using a simple equation, we can see exactly how the iteration occurs. We first substitute a value for x. Solve the equation for y. Then take the value of y and make it our new x. The best way to illustrate this is to actually use real values. Iteration Our first value was 1 for … Read more
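The substitute, solve, and feed-back loop described in this excerpt can be sketched in a few lines of Python. The excerpt does not show the equation itself, so `example_equation` below (y = x² − 1) is purely a stand-in chosen for illustration; only the starting value of 1 comes from the text.

```python
def iterate(f, x0, steps):
    """Repeatedly apply f: each output y becomes the next input x."""
    values = [x0]
    x = x0
    for _ in range(steps):
        y = f(x)          # solve the equation for y
        values.append(y)
        x = y             # y becomes the new x
    return values

def example_equation(x):
    # Hypothetical equation, not taken from the article: y = x^2 - 1
    return x * x - 1

orbit = iterate(example_equation, 1, 4)
print(orbit)  # [1, 0, -1, 0, -1]
```

With this stand-in equation the values settle into a repeating cycle between 0 and −1. Other equations or starting values can behave very differently.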

## Another Stage Of Visualization: Be Reactive with Dash

A gentle invitation to Dash by Plotly Dash is an open source python library which enables us to create web applications with Plotly. It makes it easy to build an interactive visualization with simple reactive decorators like a dropdown, a slider bar, and markdown text data. We can even update the plots according to the input … Read more

## Random thoughts on my first ML deployment

5 things I didn’t know six months ago and that’s better not to forget in the months to come A little bit of context: I’m currently working for a fast growing yet still medium-sized company that after having built a robust and widely used product has decided to start leveraging the data generated during the years … Read more

## Building Blocks: Text Pre-Processing

Morphological Normalization Morphology, in general, is the study of the way words are built up from smaller meaning-bearing units, morphemes. For example, dogs consists of two morphemes: dog and s. Two commonly used techniques for text normalization are: Stemming: The procedure aims to identify the stem of a word and use it in lieu of … Read more

## Finding Lane Lines — Simple Pipeline For Lane Detection.

Identifying lanes of the road is a very common task that a human driver performs. This is important to keep the vehicle within the constraints of the lane. This is also a very critical task for an autonomous vehicle to perform. A very simple Lane Detection pipeline is possible with simple Computer Vision techniques. This article will describe … Read more

## Set Your Jupyter Notebook up Right with this Extension

Solution: The Setup Jupyter Notebook Extension Rather than just complaining about the problem (it’s easy to be a critic but a lot harder to do something positive) I decided to see what could be done with Jupyter Notebook extensions. The result is an extension that on opening a new notebook automatically: Creates a template to … Read more

Investigating Paleoclimate Data with Pandas and Seaborn Some time ago Dr. Ed Hawkins, who happens to be the creator of the Climate Spirals, released to the world the Warming Stripes graph for Annual Global Temperature ranging from 1850–2017. The concept is simple but also very informative: each stripe represents the temperature for a single year and … Read more

## The Python Dreamteam

As a Data Scientist, I code almost entirely in Python. I also get easily scared by configuring stuff. I don’t really know what a PATH is. I have no clue what lies within the /bin directory on my laptop. These are all things that you seemingly have to get familiar with to not have Python … Read more

## Boosting: Is It Always The Best Option?

Gradient boosting has become quite a popular technique in the area of machine learning. Given its reputation for achieving potentially higher accuracy than other models, it has become particularly popular as a “go-to” model for Kaggle competitions. However, use of gradient boosting raises two questions: Does this technique really outperform others consistently irrespective of the … Read more

## How to make your model awesome with Optuna

Example walk-through Jason and the Argonauts source Data I used the 20 newsgroups dataset from Scikit-Learn to prepare the experiment. You can find the data import below: Model It’s a Natural Language Processing problem, and the model’s pipeline contains a feature extraction step and a classifier. The code for the pipeline looks as follows: Optimization … Read more

## Machine Learning for Particle Data When You are Not a Physicist

How an H2O deep learning model can be used to do supervised classification with Python This article introduces Deep Learning with H2O, the open source machine learning package by H2O.ai, and shows how an H2O Deep Learning model can be used to solve a supervised classification problem, that is, use the ATLAS experiment to identify the Higgs … Read more

## How to use Google Speech to Text API to transcribe long audio files?

Credit: Pixabay Speech recognition is a fun task. Many API resources are available on the market today, which makes it easier for users to opt for one or another. However, when it comes to audio files, especially call center data, the task becomes a little challenging. Let’s make an assumption that a call center conversation … Read more

## Rating Sports Teams — Elo vs. Win-Loss

Photo by Ariel Besagar on Unsplash Which is better? Introduction There are many ways to determine who is the best team or player in any sport. You can look at the last 5 games. The last 10 games. You can use score differential. You can rate them on which teams “feel” the best. You can look at … Read more

## Build Your First Open Source Python Project

A step-by-step guide to a working package Every software developer and data scientist should go through the exercise of making a package. You’ll learn so much along the way. Making an open source Python package may sound daunting, but you don’t need to be a grizzled veteran. You also don’t need an elaborate product idea. You … Read more

## A beginner’s guide to Linear Regression in Python with Scikit-Learn

Simple Linear Regression Linear Regression While exploring the Aerial Bombing Operations of World War Two dataset and recalling that the D-Day landings were nearly postponed due to poor weather, I downloaded these weather reports from the period to compare with missions in the bombing operations dataset. You can download the dataset from here. The dataset … Read more

## Deploy ML/DL Models to Production via Panini

What is Panini? Panini is a platform that serves ML/DL models at low latency and cuts ML model deployment to production from a few days to a few minutes. Once deployed in Panini’s server, it will provide you with an API key to infer the model. The Panini query engine is developed in C++, which provides … Read more

## How to Practice Python with Google Colab?

Automatic setting-up, getting help effectively, collaborative programming, and version control. A one-stop solution to the pain points in Python beginners’ practice. Pain Points This semester, I started to teach the course “INFO 5731: Computational Methods for Information Systems” at University of North Texas (UNT), which includes the foundation of Python, Natural Language Processing and Machine … Read more

## Web Scraping Using BeautifulSoup

“Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching and modifying the parse tree. It commonly saves programmers hours or days of work.” You can use the pip package manager to install BeautifulSoup. \$ pip install … Read more

## Jupyter Lab: Evolution of the Jupyter Notebook

8. Extensions JupyterLab has been designed as an essentially extensible environment. The extensions are really powerful tools that can really enhance a person’s productivity. JupyterLab extensions are npm packages (the standard package format in Javascript development). There are many community-developed extensions being built on GitHub. You can search for the GitHub topic jupyterlab-extension to find … Read more

## A Data Science Public Service Announcement

If neither of those options appeals to you, or you want to help out even more, you can provide financial support to open-source projects. Now, you may say my donation can never make a difference, but with the criminally small amounts most of these projects get, (\$3000 for Pandas in 2017, \$1300 for Numpy) even … Read more

## Sentiment Analysis of Anthem Game Launch in Python

Video game launches are plagued by drama. From misleading pre-order bundles, to games that are far from complete at launch, big publishers have quite a bit of risk to manage when it comes to deciding how and when a game launches. I thought it might be a fun project to see just how the sentiment … Read more

## Lot’s of JSON

Getting Started We can use %%bash magic to print a sample of our file: %%bash head ../input/roam_prescription_based_prediction.jsonl {“cms_prescription_counts”: {“CEPHALEXIN”: 23, “AMOXICILLIN”: 52, “HYDROCODONE-ACETAMINOPHEN”: 28},”provider_variables”: {“settlement_type”: “non-urban”, “generic_rx_count”: 103, “specialty”: “General Practice”, “years_practicing”: 7, “gender”: “M”, “region”: “South”, “brand_name_rx_count”: 0}, “npi”: “1992715205”} From this we can see the JSON data looks like a Python dictionary. That’s … Read more
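To illustrate the point that each record looks like a Python dictionary, here is a small sketch that parses the sample line shown above with the standard `json` module. The helper `iter_jsonl` is a hypothetical name introduced here, not something from the article.

```python
import json

# The sample record from the excerpt, written with plain ASCII quotes.
line = (
    '{"cms_prescription_counts": {"CEPHALEXIN": 23, "AMOXICILLIN": 52, '
    '"HYDROCODONE-ACETAMINOPHEN": 28}, '
    '"provider_variables": {"settlement_type": "non-urban", '
    '"generic_rx_count": 103, "specialty": "General Practice", '
    '"years_practicing": 7, "gender": "M", "region": "South", '
    '"brand_name_rx_count": 0}, "npi": "1992715205"}'
)

record = json.loads(line)  # a JSON object becomes a Python dict
specialty = record["provider_variables"]["specialty"]
total_rx = sum(record["cms_prescription_counts"].values())

def iter_jsonl(lines):
    """Yield one dict per non-empty line of JSON Lines text (hypothetical helper)."""
    for raw in lines:
        raw = raw.strip()
        if raw:
            yield json.loads(raw)

# Works on any iterable of strings, e.g. an open file handle.
records = list(iter_jsonl([line, "", line]))
```

Because a `.jsonl` file holds one JSON object per line, it can be read lazily line by line rather than loaded as a single document.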

## Web Scraping Using Selenium

Scrape Image Page Links The following code launches Chrome browser with the provided url using Selenium, scrolls to the bottom of the page (apparently magically), extracts the links for the image display pages and saves them in a csv file. Lines 5–10: import the necessary packages required for this code to work. The selenium webdriver will … Read more

## Understanding Logistic Regression step by step

Logistic Regression is a popular statistical model used for binary classification, that is for predictions of the type this or that, yes or no, A or B, etc. Logistic regression can, however, be used for multiclass classification, but here we will focus on its simplest application. As an example, consider the task of predicting someone’s … Read more

## Intro to Statistics — Looking at Data (1)

There are many free courses and materials on Statistics. Statistics can be effectively used to analyse, estimate, and sometimes predict real-world events. When used correctly, statistics will lead us to make better and safer decisions based on data observations. It is the basic pillar in Data Science and an extremely useful tool in many … Read more

## Modeling Price with Regularized Linear Model & Xgboost

Developing statistical models for predicting individual house prices We would like to model the price of a house. We know that the price depends on the location of the house, its square footage, year built, year renovated, number of bedrooms, number of garages, etc. So those factors contribute to the pattern — premium location would typically … Read more