Basic data analysis techniques every data analyst should know, using Python.

4. Joins: A join combines two dataframes side by side based on a common column. These columns are usually referred to as key columns. The term "join" originates from the database language SQL, where it was needed because SQL databases are mostly designed using relational modelling. … Read more
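As a minimal sketch of the kind of join described above, assuming pandas is installed: the two dataframes and the key column name `customer_id` are made up for illustration and do not come from the article.

```python
import pandas as pd

# Hypothetical example data: both frames share the key column "customer_id".
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
orders = pd.DataFrame({"customer_id": [1, 1, 3, 4], "amount": [20, 35, 15, 50]})

# Inner join: keep only the keys that appear in both frames.
inner = customers.merge(orders, on="customer_id", how="inner")

# Left join: keep every customer; customers without orders get NaN in "amount".
left = customers.merge(orders, on="customer_id", how="left")

print(inner)
print(left)
```

The `how` argument is what selects the join type; `"right"` and `"outer"` follow the same pattern.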

From Scratch: The Game of Life

Source: Domino Art Work Hello everyone and welcome to the second article in the “From Scratch” series. (Previous one: From Scratch: Bayesian Inference, Markov Chain Monte Carlo and Metropolis Hastings, in python) In this article we explain and provide an implementation for “The Game of Life”. I say ‘we’ because this time I am joined by … Read more

Using Regular Expression in Genetics with Python

Regular expressions in Python: Regular expressions (regex) in Python can help us find patterns in genetics. We can exploit regex when we analyse biological sequence data, as very often we are looking for patterns in DNA, RNA or proteins. These sequence data types are just strings and are therefore remarkably amenable to pattern analysis … Read more
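To make the idea of treating sequences as strings concrete, here is a small sketch using only the standard `re` module. The DNA string is invented for illustration, and the open-reading-frame pattern is a deliberately simplified assumption, not a method taken from the article.

```python
import re

# Made-up DNA sequence, for illustration only.
dna = "ATGGAATTCTCGAATTCGGCTTAA"

# Start position of every GAATTC site (the EcoRI recognition sequence).
sites = [m.start() for m in re.finditer("GAATTC", dna)]

# Simplified open-reading-frame pattern: a start codon, then whole codons
# (matched lazily), then the first in-frame stop codon.
orf_pattern = re.compile(r"ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)")
orf = orf_pattern.search(dna)

print(sites)
print(orf.group(0) if orf else None)
```

The lazy quantifier `*?` makes the match stop at the first in-frame stop codon instead of the last one.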

Deep learning to identify Malaria cells using CNN on Kaggle

Deep learning has wide-ranging applications, and its application in the healthcare industry always fascinates me. As a keen learner and a Kaggle noob, I decided to work on the Malaria Cells dataset to get some hands-on experience and learn how to work with Convolutional Neural Networks, Keras and images on the Kaggle platform. One … Read more

An End-to-End Tutorial Running Convolution Neural Network on MCU with uTensor

Get the Demo Code
First things first, clone the repo:
% git clone
% cd simple_cnn_tutorial
Next, set up the environment by running the following commands:
# setup a python virtual environment and activate it
% python2.7 -m virtualenv .venv
% source .venv/bin/activate
# install mbed-cli and setup libraries
% pip install mbed-cli
% mbed deploy
# install uTensor cli … Read more

Tuning-In To NYC’s Music Neighborhoods

Data Pre-Processing Data Cleaning The preliminary dataset was cleaned according to the answers listed in the Exploratory Data Analysis section above. First, venues located in states other than “New York” or “NY” were removed. Entries with “Venue State” equal to “New York” were changed to “NY.” Removing venues not in New York state Entries returned by … Read more

Home Value Prediction

Predicting real estate value using machine learning algorithms How do companies like Zillow offer price estimates for homes that are not for sale? They collect data on the characteristics of each property and use machine learning algorithms to make predictions. In this article, I’ll demonstrate a similar analysis using a data set included in Kaggle’s … Read more

Optimizing a Sort & Match Method in Pandas — Why Data Scientists should Attend Meetups

SECTION II. That’s Easy to Solve (Hold my Beer) Here’s how each of the authors initially approached the solution. METHOD 3: Pd.Merge Solution (Sebastian) Sebastian: My initial intuition was to avoid explicitly iterating over the dataframe. I knew that pandas and numpy built-in functions were implemented in a lower-level language and highly optimized, which is why … Read more

Are you asking the right questions

In one of the episodes of “Brain Games with Jason Silva”, witnesses of a car crash are asked to estimate the speed of the car involved. Some estimated that the speed was about 10–20 mph, while some guessed it was about 40–50 mph. Why do you think the estimates between the two groups were so … Read more

Introduction to Convolutional Neural Networks (CNN) with TensorFlow

Learn the foundations of convolutional neural networks for computer vision and build a CNN with TensorFlow Photo by Stephanie Cook on Unsplash Recent advances in deep learning have made computer vision applications leap forward: from unlocking our mobile phone with our face, to safer self-driving cars. Convolutional neural networks (CNN) are the architecture behind computer vision … Read more

Pandaral·lel — A simple and efficient tool to parallelize your Pandas operations on all your CPUs.

What issue does bother us? With pandas, when you run the following line: You get this CPU usage: Standard Pandas apply — Only 1 CPU is used. Even if your computer has several CPUs, only one is fully dedicated to your calculation. Instead of this CPU usage, we would like a simple way to get something like this: Parallel … Read more

Automatically Storing Results from Analyzed Data Sets

How to Store Data Analysis Results to Facilitate Later Regression Analysis This is the fifth article in a series teaching you how to write programs that automatically analyze scientific data. The first presented the concept and motivation, then laid out the high-level steps. The second taught you how to structure data sets to … Read more

Understanding Python Virtual Environments

No, you don’t need VR glasses to read this article. Just a bag full of attention and excitement is enough. If you’re new to the world of data science and python, a virtual environment might seem like an incredibly complex idea — but the opposite is true. It is straightforward to understand and even simpler to use! … Read more

Will your income be more than $50K/yr? Machine Learning can tell

Machine learning is breaking ground in numerous fields, including finance. What if we could use machine learning models to identify the incomes of individuals? I found just the right dataset for this, called the Census Income Dataset. I used the information in the dataset to predict if someone would earn an income greater than $50K/yr. I collected … Read more

Building a Content Based Recommender System for Hotels in Seattle

Photo Credit: Pixabay How to use the description of a hotel to recommend similar hotels. The cold start problem is a well-known and well-researched problem for recommender systems, in which the system is not able to recommend items to users. It arises in three different situations: for new users, for new products and for new websites. Content-based filtering … Read more

Step-by-Step Guide to Creating R and Python Libraries

R and Python are the bread and butter of today’s machine learning languages. R provides powerful statistics and quick visualizations, while Python offers an intuitive syntax, abundant support, and is the choice interface to today’s major AI frameworks. In this article we’ll look at the steps involved in creating libraries in R and Python. This … Read more

What library can load image in Python and what are their difference?

from skimage import io
img = io.imread(img_dir)
Colour channel: After loading the image, plt.imshow(img) is usually used to plot it. Let’s plot some doge! You may spot that the OpenCV image above looks odd. That is because matplotlib, PIL and skimage represent images in RGB (Red, Green, Blue) order, while OpenCV is in … Read more
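The channel-order mismatch can be illustrated without loading a real file. The sketch below assumes only NumPy and uses a tiny hand-made array as a stand-in for an image that OpenCV would return in BGR order; the pixel values are invented for illustration.

```python
import numpy as np

# Stand-in for an OpenCV-loaded image: a 1x2 image in BGR channel order.
# Pixel 0 is pure blue, pixel 1 is pure red.
img_bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reverse the last (channel) axis to obtain RGB order, which is what
# plt.imshow expects.
img_rgb = img_bgr[..., ::-1]

print(img_rgb[0, 0])  # blue pixel, now written in RGB order
print(img_rgb[0, 1])  # red pixel, now written in RGB order
```

With OpenCV itself available, `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` performs the same conversion.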

Implementing MACD in Python

MACD is a widely used technical indicator in trading stocks, currencies, cryptocurrencies, etc. Image: MACD is popularly used in analyzing charts for stocks, currencies, crypto, and other assets… Credit: Unsplash Basics of MACD: MACD is used and discussed in many different trading circles. Moving Average Convergence Divergence (MACD) is a trend-following indicator. MACD can be calculated very … Read more
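One common way to compute MACD is sketched below in plain Python. The 12/26/9 spans are the conventional defaults and are an assumption here, as are the function names `ema` and `macd`; the article's own implementation may differ.

```python
def ema(values, span):
    """Exponential moving average, seeded with the first value."""
    alpha = 2 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(prices, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line, histogram) for a list of prices."""
    fast_ema = ema(prices, fast)
    slow_ema = ema(prices, slow)
    # MACD line: fast EMA minus slow EMA.
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    # Signal line: EMA of the MACD line itself.
    signal_line = ema(macd_line, signal)
    # Histogram: distance between the MACD line and its signal line.
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


# Hypothetical, steadily rising price series.
prices = [100.0 + i for i in range(40)]
macd_line, signal_line, histogram = macd(prices)
print(macd_line[-1], signal_line[-1], histogram[-1])
```

On a steadily rising series the fast average stays above the slow one, so the MACD line is positive.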

Repetition in Songs: A Python Tutorial

One of Ed Sheeran’s songs as a case study Credit: Unsplash Everyone has heard a song or knows what a song sounds like. I can carelessly say everyone can define a song … in their own words. Just to remove any doubt, a song (according to Wikipedia) is a single work of music that is typically … Read more

Predicting geographic origin of fish samples using Random Forest models

How machine learning concepts can support fishery management The Problem I was trying to show the utility of a type of analysis that groups the origin of fish samples from a particular species given the shape of the fish’s ear bone. The basic concept is that fish in distinct groups for a specific species, say … Read more

Jazz & Bossa Nova: Siblings (?)

1. Importing Libraries
First off, let’s import the required libraries: Matplotlib and Seaborn will be imported for data visualization; Pandas, for data analysis; Bs4 and Requests, for web scraping.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import requests
from bs4 import BeautifulSoup
2. Web Scraping
Now that the libraries have been imported, which … Read more

A step-by-step guide for creating advanced Python data visualizations with Seaborn / Matplotlib

Although there’re tons of great visualization tools in Python, Matplotlib + Seaborn still stands out for its capability to create and customize all sorts of plots. Photo by Jack Anstey on Unsplash In this article, I will go through a few sections first to prepare background knowledge for some readers who are new to Matplotlib: Understand the … Read more

How to dominate MLS Fantasy

Hello old friend Note to all of my non-US readers, please forgive me for referring to this sport as “soccer” and not “football”. Enjoy! Growing up in Kansas City, I cherish many fond memories of attending KC Wizards (now Sporting Kansas City) games with my family. Being a soccer player myself (not a very good one, … Read more

Finding the right model parameters

If you’ve been reading about Data Science and/or Machine Learning, you must have come across articles and projects that work with the MNIST dataset. The dataset includes a set of 70,000 images where each image is a handwritten digit from 0 to 9. I also decided to use the same dataset to understand how fine tuning … Read more

Don’t let them GO!

Using machine learning to detect customer churn. Our example is a fictional company called ‘Sparkify’ that offers paid and free listening services. Customers can switch between the two services and can cancel their subscription at any time. The given customer dataset is huge (12GB), thus the standard tools for analysis and machine learning … Read more

Predicting the ‘Future’ with Facebook’s Prophet

Making the Predictions
Making the dataset ‘Prophet’ compliant. Let’s convert the data into the format desired by Prophet. We shall rename ‘Date’ to ‘ds’ and ‘Views’ to ‘y’:
df.columns = ['ds', 'y']
df.head()
Prophet follows the sklearn model API, wherein an instance of the Prophet class is created and then the fit and predict methods are called. The model … Read more

Evaluating Keras neural network performance using Yellowbrick visualizations

If you have ever used Keras to build a machine learning model, you’ve probably made a plot like this one before: {training, validation} {loss, accuracy} plots from a Keras model training run This is a matrix of training loss, validation loss, training accuracy, and validation accuracy plots, and it’s an essential first step for evaluating the … Read more

Building a Music Recommendation Engine with Probabilistic Matrix Factorization in PyTorch

Recommendation systems are one of the most widespread forms of machine learning in modern society. Whether you are looking for your next show to watch on Netflix or listening to an automated music playlist on Spotify, recommender systems impact almost all aspects of the modern user experience. One of the most common ways to build … Read more

Relationships validated between population health chronic indicators

In the last story, we started looking into a 15-year chronic disease dataset from the U.S. Centers for Disease Control and Prevention, or CDC. The exploratory data analysis began with understanding the columns and rows of data and what was relevant for further analysis. In this post, we are going to … Read more

10-Step guide to schedule your script using cloud services

Scheduling Python/R scripts using Kaggle and PythonAnywhere cloud services. Kaggle account: we will use the kernels to host and run our Python script. PythonAnywhere account: we will use the task scheduling to trigger our script hosted on Kaggle. What do we need to do? Set up a Kaggle account and go to the ‘Kernels’ tab, then ‘New Kernel’. You can … Read more

Rating London Properties by their “Pub Score”: An Alternative Lens on the London Housing Market

I’m looking to buy a house soon. That was the spark for this project. I started looking on various property websites, and whilst the information on them is great, I felt that if I could get hold of the data used to populate such sites, I could approach my house hunt in a more data-driven way. … Read more

Virtual, Headless, and Distributed (Oh My!)

Fearless Web Scraping with Python in DataLab Notebooks This post empowers the Pythonista, with a complete framework to explore the world of data on the internet — all behind randomized proxy servers in a fast parallelized sequence, while protecting your company’s immutable IP from curious eyes, and other potential trolls. With this new outlet, the reader is … Read more

A Complete Exploratory Data Analysis and Visualization for Text Data

How to combine visualization and NLP in order to generate insights in an intuitive way Visually representing the content of a text document is one of the most important tasks in the field of text mining. As a data scientist or NLP specialist, not only do we explore the content of documents from different aspects and … Read more

Optimizing Jupyter Notebook: Tips, Tricks, and nbextensions

nbextensions
The benefits of this extension are that it changes the defaults. To install nbextensions, execute the commands below in Anaconda Prompt:
conda install -c conda-forge jupyter_contrib_nbextensions
conda install -c conda-forge jupyter_nbextensions_configurator
Alternatively, you can also install nbextensions using pip:
pip install jupyter_contrib_nbextensions
Run pip show jupyter_contrib_nbextensions to find where notebook extensions are installed
Run jupyter contrib … Read more

Bayesian Modeling of Pro Overwatch Matches with PyMC3

Photo by AC De Leon on Unsplash Professional eSports are becoming increasingly popular, and the industry is growing rapidly. Many of these professional game leagues are based on games that have two teams that battle it out. Call of Duty, League of Legends, and Overwatch are all examples. Although these are comparable to traditional team sports, … Read more

Six Recommendations for Aspiring Data Scientists

Source: Building experience before landing a job Data science is a field with a huge demand, in part because it seems to require experience as a data scientist to be hired as a data scientist. But many of the best data scientists I’ve worked with have diverse backgrounds ranging from humanities to neuroscience, and it … Read more

Taking Google Sheets to (a) Class.

I am currently building a Flask app for teachers. Since teachers have adopted Google Drive, they also use Google Sheets. One of my app’s features is to let teachers easily copy and paste the sheet link into the app and submit it through a form. It will then convert it … Read more

How to setup the PySpark environment for development, with good software engineering practices

In this article we will discuss how to set up our development environment in order to create good-quality Python code and how to automate some of the tedious tasks to speed up deployments. We will go over the following steps: set up our dependencies in an isolated virtual environment with pipenv; how to set up … Read more

Let’s build an Article Recommender using LDA

Due to keen interest in learning new topics, I decided to work on a project where a Latent Dirichlet Allocation (LDA) model can recommend Wikipedia articles based on a search phrase. This article explains my approach towards building the project in Python. Check out the project on GitHub below. Structure Photo by Ricardo Cruz on Unsplash … Read more

Data Science with no Math

Using AI to Build Mathematical Datasets This is an addendum to my last article, in which I had to add a caveat at the end that I was not a mathematician, and I was new at Python. I added this because I struggled to come up with a mathematical formula to generate patient data that … Read more

Deep Learning — it`s not only about kitties in mobiles, or how we proceeded in locomotive bogies…

A few days ago, the company Aurorai sent its system for recognizing defects and monitoring bogie status on the Ermak locomotive for operational tests. This problem is uncommon and very interesting. The first stage involved evaluating the condition of the brake pads and the bandage width. We managed to solve this task with an accuracy of up to 1 mm at locomotive speeds not exceeding … Read more

Computer Vision for Beginners: Part 1

Computer Vision is one of the hottest topics in artificial intelligence. It is making tremendous advances in self-driving cars, robotics as well as in various photo correction apps. Steady progress in object detection is being made every day. GANs are also something researchers have their eyes on these days. Vision is showing us … Read more

Using Wrappers to Log in Python

Logging in Python can be tedious, especially when you use it to debug. I am not a fan of Conda or PyCharm myself (or any other fancy IDE), but those of you who are will always have the problem of debugging and controlling your code when you have to put it in production. A lot of … Read more

How to Build a Reporting Dashboard using Dash and Plotly

8. Building the First Data Table Figure 5: First Data Table with a Condensed View The first data table in the dashboard presents metrics such as spend (the cost associated with a given advertising product), website sessions, bookings (transactions), and revenue. These metrics are aggregated depending upon the dates selected in the date selector and typically … Read more