## Breaking Down Richard Sutton’s Policy Gradient With PyTorch And Lunar Lander

Theory Behind The Policy Gradient Algorithm: Before we can implement the policy gradient algorithm, we should go over the math involved. The math is straightforward and easy to follow and, for the most part, is reinterpreted from the OpenAI resource mentioned above. First, we define tau to be a trajectory … Read more

## How to Find a Descent Learning Rate using Tensorflow 2

Taken from http://www.merzpraxis.de/index.php/2016/06/13/der-suchende/ When it comes to building and training neural networks, you need to set a massive number of hyper-parameters. Setting those parameters right has a tremendous influence on the success of your net and also on the time you spend heating the air, aka training your model. One of those parameters that you … Read more

## Kepler.GL & Jupyter Notebooks: Geospatial Data Visualization with Uber’s opensource Kepler.GL

Plot geospatial data inside a Jupyter notebook and easily interact with Kepler’s user interface to tweak the visualisation. kepler.gl for Jupyter is an excellent tool for big geospatial data visualisation. It combines a world-class visualisation tool, an easy-to-use user interface (UI), and the flexibility of Python and Jupyter notebooks (3D Visualization GIF below, more in the article). 3D … Read more

## How to manage files in Google Drive with Python

As a Data Analyst, most of the time I need to share my extracted data with my product manager/stakeholder, and Google Drive is always my first choice. One major issue here is that I have to do it on a weekly or even daily basis, which is very boring. All of us hate repetitive tasks, including … Read more

## How To Assess Statistical Significance In Your Data with Permutation Tests

Permuting a color grid means shuffling it! The proportion in the data is the same, but the structure is lost with every new iteration. A permutation test is basically doing what this image is doing, but to our data. We shuffle and mix everything together to get a big pool of data and compare this … Read more
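The shuffle-and-compare idea described in this excerpt can be sketched in a few lines. This is a minimal illustration using only the standard library, not the article's own code: the function name `permutation_test`, the choice of a difference in means as the test statistic, and the sample values are all assumptions made for the example.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means by shuffling."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)  # mix everything into one pool
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # destroy the group structure
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Hypothetical measurements, invented purely for illustration.
control = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
treated = [12.9, 13.1, 12.7, 13.4, 12.8, 13.0]
p_value = permutation_test(control, treated)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value here means that very few random regroupings of the pooled data produce a gap as large as the observed one.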

## Linear Regression with Gradient Descent from Scratch in Numpy

I strongly advise you to read the article linked above. It will set the foundations on the topic, plus some math is already discussed there. To start out, I’ll define my dataset — only three points that are in a linear relationship. I’ve chosen so few points only because the math will be shorter — … Read more
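As a rough sketch of what gradient descent on a three-point dataset can look like in NumPy: the excerpt does not show the article's actual points, so the data below (lying on y = 2x + 1), the function name `fit_line`, and the learning rate and step count are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical dataset: three points on the line y = 2x + 1.
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])


def fit_line(x, y, learning_rate=0.05, n_steps=5000):
    """Fit y = slope * x + intercept by gradient descent on mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(x)
    for _ in range(n_steps):
        error = slope * x + intercept - y
        # Partial derivatives of the mean squared error.
        grad_slope = (2.0 / n) * np.dot(error, x)
        grad_intercept = (2.0 / n) * error.sum()
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


slope, intercept = fit_line(x, y)
print(f"slope={slope:.4f}, intercept={intercept:.4f}")
```

With so few points the updates can also be followed by hand, which is presumably why a tiny dataset suits a from-scratch walkthrough.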

## Fear Tells Us What We Have To Do

My deep learning self-study for 09/30/19–10/07/19 I’m a math lecturer and aspiring data scientist hoping to participate in artificial general intelligence research, and this week I decided to start keeping a weekly blog of what I’ve been doing, both for my own reference and potentially to help others on a similar path, following the advice … Read more

## Selenium Tutorial: Scraping Glassdoor.com in 10 Minutes

I had to scrape jobs data from Glassdoor.com for a project. Let me tell you how I did it… What is Scraping? It’s a method for collecting information from web pages. Why Scraping? Other than the fact that it is fun, Glassdoor’s library provides a limited number of data points. It doesn’t allow you to … Read more

## Models as Serverless Functions

Source: Wikimedia. Chapter 3 of “Data Science in Production”. I recently published Chapter 3 of my book-in-progress on Leanpub. The goal of this chapter is to empower data scientists to leverage managed services to deploy models to production and own more of DevOps. Serverless … Read more

In order to publish an app with GoDaddy hosting, you will need to turn it into something GoDaddy can use. For this, we will have to use a package called wsgiref. There is no need to download anything, as this package ships with Python’s standard library. This is what your end folder structure … Read more
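For readers unfamiliar with wsgiref, here is a minimal sketch of a WSGI application and a way to exercise it without a server. It is not the article's deployment code: the names `application` and `call_app` are hypothetical, and nothing here is specific to GoDaddy, whose configuration the excerpt does not show.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI app: returns a plain-text greeting for any request."""
    body = b"Hello from a WSGI app\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app):
    """Invoke a WSGI app once with a fake request; return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fill in a minimal valid WSGI environ
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_app(application)
print(status, body)
```

To serve the same callable locally, `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` works. On a CGI-style host, `wsgiref.handlers.CGIHandler().run(application)` is the standard-library entry point, though whether that matches the setup in the article is an assumption.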

## Data Science with SQL in Python

Python Application in SQL. Ever hear about the database query language SQL (often pronounced “sequel”)? How can we use Python code to harness the power of SQL databases and retrieve, manipulate and delete the information stored in them? In this article, I plan on giving a thorough beginner’s tutorial on Sequel … Read more

## How to write Web apps using simple Python for Data Scientists?

At the start we said that each time we change any widget, the whole app runs from start to end. This is not feasible when we create apps that will serve deep learning models or complicated machine learning models. Streamlit covers us in this aspect by introducing Caching. 1. Caching In our simple app. We … Read more

Photo by Safar Safarov on Unsplash My activities on Twitter were mind-numbingly repetitive. From what Kirk was doing, it also didn’t exactly seem like he was reading everything that he was posting about. And whenever something is done over and over again, it’s typically a prime candidate for automation. I found tweepy, a Python library … Read more

## 7 things to quickly improve your Data Analysis in Python

The ‘Magics’ of IPython are basically a series of enhancements that IPython has layered on top of the standard Python syntax. Magic commands come in two flavors: line magics, which are denoted by a single % prefix and operate on a single line of input, and cell magics, which are denoted by a double %% prefix … Read more

## An Easier Way to Encode Categorical Features

Photo by Ash Edmonds on Unsplash Using the python category encoder library to handle high cardinality variables in machine learning I have recently been working on a machine learning project which had several categorical features. Many of these features were high cardinality, or in other words, had a high number of unique values. The simplest … Read more

## Playing with object detection

I’ll follow my jupyter notebook to make things easier to show. Feel free to either simply run it or implement the code on your own. Keep in mind that some code snippets use functions implemented in previous snippets, therefore the order of occurrence matters. All mentioned files in this post are available in my GitHub. … Read more

## A Guide to Integrating Text Analytics into Tableau

Credit: Freddie Marriage Data is often dirty and messy. Sometimes, it doesn’t even come in the right form for quick analysis and visualization. While Tableau (and Prep) had several tools to deal with numeric, categorical, and even spatial data, one consistent missing piece was handling unstructured text data. Not anymore. In the latest edition of … Read more

## Using Python To Get SalesForce Data

Photo by Denys Nevozhai on Unsplash I work at a startup that heavily uses Salesforce. When I first started, we would have to log in through the Salesforce site, go to the reports tab, create a report with the necessary fields, download a comma-separated values (CSV) spreadsheet, and do some data cleaning here and there. Mostly … Read more

## PyTorch v1.3 — What’s new?

Support for Android and iOS, Named Tensor, TPU Support, Quantization and more. Facebook just released PyTorch v1.3 and it is packed with some of the most awaited features. The three most attractive ones are: Named Tensor — Something that would make the life of machine learning practitioners much easier. Quantization — For performance critical systems … Read more

## Detecting SET cards using transfer learning

Now that we can classify cards, it’s time for the final step: finding all possible SET combinations. Remember that in order to have a SET, the three cards need to have either all the same or all different values for each attribute. A straightforward solution is to consider all possible triplets, and check if the SET … Read more
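The brute-force approach mentioned here (try every triplet and check the SET rule) can be sketched as follows. The tuple representation of a card and the names `is_set` and `find_sets` are assumptions for illustration; the article's classifier output may look different.

```python
from itertools import combinations


def is_set(a, b, c):
    """Three cards form a SET if no attribute has exactly two distinct values."""
    # One distinct value means all the same; three means all different.
    return all(len({x, y, z}) != 2 for x, y, z in zip(a, b, c))


def find_sets(cards):
    """Return every triplet of cards that forms a SET."""
    return [triplet for triplet in combinations(cards, 3) if is_set(*triplet)]


# Hypothetical cards as (number, colour, shading, shape) tuples.
cards = [
    ("one", "red", "solid", "oval"),
    ("two", "red", "solid", "oval"),
    ("three", "red", "solid", "oval"),
    ("one", "green", "striped", "diamond"),
    ("two", "purple", "empty", "squiggle"),
]

for triplet in find_sets(cards):
    print(triplet)
```

For the usual 12 cards on the table there are only 220 triplets, so the exhaustive check is cheap.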

## Line Detection: Make an Autonomous Car see Road Lines

Step by step you can turn a video stream into a line detector via Computer Vision. Fully self-driving passenger cars are not “just around the corner”. Elon Musk claims that Teslas will have a “full self-driving” capability by the end of 2020. Specifically, he says that Tesla’s hardware is already ready for autonomous driving, and … Read more

## Proper Balancing for Cross Validation

Importing & splitting of the data:

    import pandas as pd
    import numpy as np
    from sklearn import datasets
    from sklearn.model_selection import cross_validate
    from sklearn.metrics import accuracy_score, precision_score
    from sklearn.linear_model import LogisticRegression
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split, StratifiedKFold
    from imblearn.over_sampling import SMOTE
    from imblearn.under_sampling import RandomUnderSampler

    df = pd.read_csv('creditcard.csv').sample(50000, random_state=0)
    train, test = train_test_split(df, test_size=0.3, random_state=0, shuffle=True)
    y_train = np.array(train["Class"])
    y_test = np.array(test["Class"])
    del train["Class"]
    del test["Class"]
    train = train.reset_index(drop=True)

test … Read more

## TDD shouldn’t be TDDious

I still come across the age old “how to test” debate, but can we make it fun to test things? I’ve been working as an engineer for over a decade now, and still come across the age old “how to test” debate. I’m a Lead Engineer and that means working with my team on how … Read more

## The skin in the game heuristic for protection against disasters

Why absence of personal risk puts entire systems in danger. “If a builder builds a house for a man and does not make its construction sound, and a wall cracks, that builder shall strengthen that wall at his own expense.” … Read more

## Vectorisation: How to speed up your Machine Learning algorithm by x78 times faster

Given an equation, we will see step by step how to achieve more efficient code: not only 78 times faster, but using only 3 lines of code! Let’s dive into it… As an interpreted language, Python for loops are inherently slower than their C counterparts. This is a big bottleneck for the … Read more
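To make the loop-versus-vectorised contrast concrete, here is a small sketch. The excerpt does not show the article's equation, so a weighted sum stands in for it; the function names and the random data are assumptions for illustration, and no particular speed-up figure is implied.

```python
import numpy as np


def weighted_sum_loop(weights, features):
    """Accumulate w * x one element at a time in a Python for loop."""
    total = 0.0
    for w, x in zip(weights, features):
        total += w * x
    return total


def weighted_sum_vectorised(weights, features):
    """Same computation as a single call into NumPy's compiled code."""
    return float(np.dot(weights, features))


# Hypothetical data, generated only for this example.
rng = np.random.default_rng(0)
weights = rng.normal(size=100_000)
features = rng.normal(size=100_000)

loop_result = weighted_sum_loop(weights, features)
vec_result = weighted_sum_vectorised(weights, features)
print(loop_result, vec_result)
```

Timing both functions (for example with `timeit`) on your own machine is the simplest way to see how large the gap is for a given array size.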

## Cheat sheet for Python dataframe ↔ R dataframe syntax conversions

A mini-guide for those who’re familiar with data analysis using either Python or R and want to quickly learn the basics of the other language. Photo by Mad Fish Digital on Unsplash In this guide, for Python, all the following commands are based on the ‘pandas’ package. For R, the ‘dplyr’ and ‘tidyr’ packages are … Read more

## Training Yolo for Object Detection in PyTorch with Your Custom Dataset — The Simple Way

In a previous story, I showed how to do object detection and tracking using the pre-trained Yolo network. Now I want to show you how to re-train Yolo with a custom dataset made of your own images. For this story, I’ll use my own example of training an object detector for the DARPA SubT Challenge. … Read more

## Media Bias Detection using Deep Learning Libraries in Python

All data was acquired from the All the News dataset created by Andrew Thomson. It is freely available and you can download it anytime. It is separated into three large CSV files, all containing a table that looks like this: Because we are interested in the content and in the outlet name only, we will focus … Read more

## Deploying a React App on Heroku: the Python perspective

How to deploy a React frontend paired with a Flask backend. Coming from a Python background, Heroku is a fantastic place to deploy. I’ve got a variety of static and Flask-based websites which have been trivial to configure and easy to integrate with GitHub for smooth continuous deployment. This post is not about deploying with … Read more

## Predicting Food Serving Sizes with a Feed-Forward Neural Network

In my immersive program at General Assembly, I had to complete a capstone project, something completely of my own design, without any instructions on what to do or how to go about it, besides meeting a few key criteria. So I decided to concentrate on something I know well and am passionate about — food! … Read more

## eCFR Parsing with BeautifulSoup and ElementTree

Photo by Fabian Irsara on Unsplash At work I was given the task of parsing an eCFR document. I had never parsed an XML file before and started out using the ElementTree library before switching over to BeautifulSoup which I had used once in my data science boot camp. I had to prioritize speed and … Read more

## How To Breakout Data From Databricks-Spark-Hive

Fingers Trying to break out of jail, Pixabay. The easy way. This post is written for scientists who are using Databricks (DB) notebooks and are looking to export their Hive-based datasets by using Pyspark, to an external machine, in order to obtain a more efficient workflow using Pandas. There are many ways to do the … Read more

## Analysing survey data with Python and Jupyter Notebooks

The Notebook environment is perfect for the ad-hoc nature of working with survey data Surveys are the amoeba of the data world. Not because they eat your brain (not literally, anyway) but because they are ever changing in their shape and structure. Even the surveys that are meant to stay the same – the studies … Read more

## Decomposing Signal Using Empirical Mode Decomposition — Algorithm Explanation for Dummy

What kind of ‘beast’ is Empirical Mode Decomposition (EMD)? It’s an algorithm to decompose signals. And when I say signal, what I mean is time-series data. We input a signal to EMD and get back some decomposed signals, a.k.a. the ‘basic ingredients’ of our input signal. It’s similar to the Fast Fourier … Read more

## JupyterLab for complex Python and Scala Spark projects

JupyterLab is an awesome piece of technology for prototyping and self-documenting research. But can you use it for projects that have a big codebase? The notebook workflow was a big improvement for all data scientists around the globe. The ability to directly see the result of each step and not running over and over the … Read more

## Keras data generators and how to use them

You probably encountered a situation where you try to load a dataset but there is not enough memory in your machine. As the field of machine learning progresses, this problem becomes more and more common. Today this is already one of the challenges in the field of vision where large datasets of images and video … Read more

## How to use Selenium as life-saver when dealing with boring tasks?

Automate never-ending repetitive tasks the Selenium way. Photo by elmnet. If you are a developer, then you probably do not need an intro to Selenium. Selenium is a powerful tool built to interact with the web server for processing requests in a programmatic way. It is used in automating a wide variety of tasks involving … Read more

## 8 Useful Pandas Features for Data-Set Handling

This article presents 8 simple but useful Pandas operations which showcase how Python’s Pandas library can be used for data-set exploration. The data-set I will use for this tutorial is entitled ‘International football results from 1872 to 2019’ and can be sourced here, in case any of the code snippet examples presented … Read more

## FastText sentiment analysis for tweets: A straightforward guide.

FastText is an open-source NLP library developed by Facebook AI and initially released in 2016. Its goal is to provide word embedding and text classification efficiently. According to its authors, it is often on par with deep learning classifiers in terms of accuracy, and many orders of magnitude faster for training and evaluation. [1] This … Read more

## Recommender System in Python — Part 2 (Content-Based System)

The process of getting recommendations is now as simple as a function call. The only parameter you need to pass in is the movie title, and it has to match the one present in the dataset exactly; every little spelling mistake will break everything. Feel free to play around with the function to … Read more

## Beginner’s Guide to Python Quirks and Jargon

But while these courses and tutorials can quickly get you up to speed with the basics of the language and the relevant data science libraries (pandas, numpy, matplotlib, and sklearn, to name a few), most barely scratch the intricacies of Python. Despite its simplicity, Python is a vast and rich language, and it … Read more

## How to get started with Data Science : A brief tutorial on using Anaconda, Python, Jupyter…

In this article, I wanted to write about my experience of overcoming the initial hurdle and getting started with learning Data Science. Learning data science is a journey and you will keep learning once you get started. In this article we will go through the following 5 starting steps for getting into the field of learning … Read more

## Python Tips and Tricks, You Haven’t Already Seen, Part 2

Note: This was originally posted at martinheinz.dev. A few weeks ago I posted an article (here) about some not so commonly known Python features and quite a few people seemed to like it, so here comes another round of Python features that you hopefully haven’t seen yet. Using lots of hardcoded index values can quickly become … Read more

## Pedestrian detection using Non Maximum Suppression

A complete pipeline for detecting pedestrians on the road. Pedestrian detection is still an unsolved problem in computer science. While many object detection algorithms like YOLO, SSD, RCNN, Fast R-CNN and Faster R-CNN have been researched extensively and with great success, pedestrian detection in crowded scenes remains an open challenge. In recent years, … Read more

## A closer look into the Spanish railway passenger transportation pricing

As someone who lives and works in a Spanish city 400km away from home, I have found that the most convenient way to travel back and forth is to resort to the train. As a frequent user I have grown baffled by the pricing pattern when buying tickets, moving sometimes along the same levels, … Read more

## A demonstration of carrying data analysis (Crimes in Denver EDA)

This is my second demonstration of carrying out data analysis using Python. My previous article is about New York City Airbnb Open Data. Please have a look and give me your comments and thoughts so I can keep improving. A demonstration of carrying data analysis (New York City Airbnb Open Data) In this article, I will … Read more

## Avengers, resemble!

Finding the ideal costume through facial recognition One of the more interesting and specialized uses of computer vision is for facial detection and recognition. Humans are incredibly adept at recognizing faces, but it is a fairly recent trend that we have been able to train computers to do a close enough job to warrant using … Read more

## 6 of the Best Niche Platforms to Learn SQL and Python

w3schools is a simple, no-frills tool for learning web development skills, including SQL and Python. Depending on your preferences, you will probably either love or hate w3schools’ approach to learning. w3schools claims to be the world’s largest web developer site, so their methods clearly work for many people. Essentially, the method of teaching here is … Read more

## Parquet conversion in AWS using Airflow (Part 2)

In this post, we will deep dive into the custom Airflow operators and see how to easily handle the parquet conversion in Airflow. If you are on AWS there are primarily three ways by which you can convert the data in Redshift/S3 into parquet file format: Using Pyarrow which might take a bit of time … Read more