Writing your first Neural Net in less than 30 lines of code with Keras.

Thinking back to when I first started my journey into AI, I remember all too well how daunting some of the concepts seemed. Reading a simple explanation of what a neural network is can quickly lead to a scientific paper where every second sentence is a formula with symbols you’ve never even seen before. …

Object Detection with Less Than 10 Lines of Code Using Python

Find out what objects are in the image. Want to know what objects are in the image? Or perhaps you want to count the number of apples in an image? In this post, I will show you how to create your own object detection program using Python in less than 10 lines of code. You …

Geocode with Python

How to convert physical addresses to geographic locations (latitude and longitude). Datasets are rarely complete and often require pre-processing. Imagine a dataset that has only an address column, without latitude and longitude columns to represent your data geographically. In that case, you need to convert your data into a geographic format. The process of converting …

Lessons from doing Data Science for Eric Garcetti, Mayor of Los Angeles

My internship required me to put on different hats at work. When I focused on building a working solution, I had the inclination to put my engineer hat on to code, plug in the data, and chug. Sometimes the tasks at hand are straightforward, for instance, producing visualizations and summary statistics of some dataset. But often …

Pandas.Series : A Part of the backbone for Machine Learning in Python

One of the keys to understanding pandas is to understand the data model. At the core of pandas are three data structures: Series: 1D (can be understood as a column of a spreadsheet); DataFrame: 2D (can be understood as a single spreadsheet); Panel: 3D (can be understood as a group of spreadsheets). In …
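
A minimal sketch of the first two structures, using made-up values and hypothetical names (`prices`, `stock`, and the fruit labels are illustrative, not from the article). It assumes a recent pandas release, where Panel is no longer available (it was removed in pandas 0.25), so only Series and DataFrame are shown:

```python
import pandas as pd

# A Series is a 1D labeled array, like a single spreadsheet column.
# The labels and values here are hypothetical.
prices = pd.Series([1.5, 2.0, 3.25], index=["apple", "pear", "kiwi"], name="price")

# A DataFrame is a 2D table; each column is a Series sharing one index.
stock = pd.DataFrame({"price": prices, "quantity": [10, 4, 7]})

# Selecting a single column of a DataFrame gives back a Series.
column = stock["price"]

print(prices.ndim, stock.ndim)
print(type(column).__name__)
```

Thinking of a DataFrame as a collection of aligned Series is usually enough to read most pandas code.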

Scraping the Web: A fast and simple way to scrape Amazon

As a data scientist or data enthusiast, one is always hungry for lots and lots of DATA. I can imagine the heart-eyes when you see lots of data on a website and your desire to grab all the data, perform all sorts of techniques you have learnt, apply statistics, …

Customer Segmentation Using RFM in Apache Spark

RFM segmentation is a great method to divide customers into equal groups depending on three criteria (which compose the RFM acronym): Recency. How much time has elapsed since a customer’s last activity or transaction with the company? Frequency. How often has a customer transacted with the company during a particular period of time? Monetary. …

A Minimalist End-to-End Scrapy Tutorial (Part IV)

Systematic web scraping for beginners. In the previous three parts, you developed a spider that extracts quote information from http://quotes.toscrape.com and stores the data in a local SQLite database. In this part, I will show you how to deploy the …

Throwing dice with maximum entropy principle

“Nobody knows what entropy really is, so in any discussion you will always have an advantage” (von Neumann). Sounds like a good reason to dive into the meaning of entropy. This post is all about dice and maximum entropy. The post has four parts. In the first part, I …

Introduction to Web Scraping with Selenium And Python

A practical tutorial on how to get started with Selenium. Web scraping is a fast, affordable and reliable way to get data when you need it. What is even better, the data is usually up-to-date. Now, bear in mind that when scraping a website, you might be violating its usage policy and can get kicked out …

How to master Python’s main data analysis library in 20 Minutes

Now that we are comfortable with filtering and sorting the data front to back and vice versa, let’s move to some more advanced analytical functionalities. Standard Functions: Like the read functions, there are also a lot of analytical functions implemented in Pandas. I will highlight and explain the ones I …

Mastering the art of web scraping with Selenium and Python [Part 2/2]

Selenium is a powerful tool for advanced interactions with websites: logins, clicks… Let’s use it for web scraping. Alright, let’s do something ‘simple’ here: collect all the artists available on Spotify. That’s a robot scrolling through Spotify’s catalog of artists. ⚠️ Obviously, I need to put a disclaimer here ⚠️ Don’t use this method to resell data …

Overloading Operators in Python

…and a bit on overloading methods (but I’ll try not to overload you). Most of us learning to program in Python run into the concepts behind operator overloading relatively early in our learning path. But, like most aspects of Python (and other languages; and, for that matter, pretty much anything), learning about overloaded …

Take your Python Skills to the Next Level With Fluent Python

The intermediate programmer’s ticket to advanced Python. You’ve been programming in Python for a while, and although you know your way around dicts, lists, tuples, sets, functions, and classes, you have a feeling your Python knowledge is not where it should be. You have heard about “pythonic” code and …

5 Steps to Amazing Visualizations with Matplotlib

Matplotlib sucks. By default. But you can tweak the hell out of it. We’ve all been there. Matplotlib is imported, your dataset is prepared, and you are ready to make some astonishing visualization. Pretty soon, the harsh reality of potato-looking default Matplotlib charts hits you in the face. Damn. A couple of weeks ago I’ve …

Productionizing NLP Models

After we finished the project, we had many common utilities that could be reused in other projects. Innersourcing (figure note: numbers are in % of total project time and can vary between projects). Innersourcing allows an ecosystem of contributors to develop and use reusable components for everyone. We observed that good software engineering takes way …

Attribute Relevance Analysis in Python — IV and WoE

Recently I wrote about Recursive Feature Elimination, one of the feature selection techniques I use most often. Today I will talk about another one: Attribute Relevance Analysis. Unlike RFE, it dives more deeply into individual attributes and tries to tell you which segment of that variable has the strongest connection with the …

How To Create a Plotly Visualization And Embed It On Websites

Plotly is an open-source, simple-to-use charting library for Python. plotly.express was built as a wrapper for plotly.py to make creating interactive visualizations as easy as writing one line of Python ✨ plotly.express is to plotly what seaborn is to matplotlib. There are so many cool interactive visualizations that can be created with …

Testing Serverless Services

Python unit testing of AWS Lambda functions using moto. Having tests is crucial for the success of any software project. If you are developing an application, you want to write tests that check your application’s functionality. With tests, you can add new features or fix bugs and deploy the changed code with some peace of mind. …

Custom object detection with Android and TensorFlow

The Problem: Recently while I was traveling to North-Eastern parts of India, I had to wait for a substantial time for my bag to show up on the airport’s baggage carousel. The area surrounding the carousel was packed with fellow commuters. It was hard to tell my bag apart from the other bags as roughly …

How to Get a Job as a Data Scientist — 7 Actionable Tips

Job hunting can be a challenging task for many people, yet we all need to go through that process in order to build a career. A large proportion of the most desirable jobs on the job market right now are related to analytics, such as data scientist, data engineer, or even data analyst. As …

Decision Trees and Random Forests:

What does min_impurity_decrease do and how should it be used? During my time learning about decision trees and random forests, I have noticed that a lot of the hyper-parameters are widely discussed and used: max_depth, min_samples_leaf, etc., including the hyper-parameters that apply only to random forests. One …

3 Python Tools Data Scientists Can Use for Production-Quality Code

Just because you’re a data scientist doesn’t mean you shouldn’t write good code. My first experience with coding was using S-Plus (a forerunner to R) as an undergraduate statistics student. Our lecturer, a professor with decades of experience, taught us how to fit regression models by literally typing our code one line at a time …

A Minimalist End-to-End Scrapy Tutorial (Part I)

Systematic web scraping for beginners. Web scraping is an important skill for data scientists to have. I have developed a number of ad hoc web scraping projects using Python, BeautifulSoup, and Scrapy in the past few years and read a few …

TensorFlow.JS — Using JavaScript Web Worker to Run ML Predict Function

This post is about machine learning on the client side. I will explain how to run an ML model in a JavaScript Web Worker. The model was trained in TensorFlow/Keras (using Python) to detect the sentiment of a hotel review. I’m a JavaScript developer and I feel great when the machine learning model runs on the client side (in the browser). I will …

Bar Chart Race in Python with Matplotlib

In roughly less than 50 lines of code. [Animation: bar chart race showing the 10 biggest cities in the world.] Bar chart races have been around for a while. This year, they took social media by storm. It began with Matt Navarra’s tweet, which was viewed 10 million times. Then, John Burn-Murdoch created a reproducible notebook …

Python List Comprehension in 3 Minutes and 3 Reasons why you should use it

Let’s create our own animal park to learn how to use list comprehension. List comprehension is a powerful method to create new lists from existing lists. If you are just starting out with Python, list comprehension might look complicated, but you should get familiar with it as fast as you can. You can select specific elements from a …
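
As a minimal sketch of the idea, assuming a hypothetical animal park (the `animals` list and the filter condition are illustrative, not taken from the article), a list comprehension builds a new list from an existing one and can select elements along the way:

```python
# Hypothetical animal park; the names are illustrative only.
animals = ["lion", "zebra", "dolphin", "monkey", "lemur"]

# Transform every element: a new list of upper-cased names.
shouted = [name.upper() for name in animals]

# Select specific elements: keep only names that start with "l".
l_animals = [name for name in animals if name.startswith("l")]

# The equivalent explicit loop, for comparison.
l_animals_loop = []
for name in animals:
    if name.startswith("l"):
        l_animals_loop.append(name)

print(shouted)
print(l_animals)
```

The comprehension and the loop produce the same list; the comprehension simply states the transformation and the filter in one expression.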

Python Data Science & Analytics / Consulting Project Overview

Python is the most dynamic language for data science today, covering everything from backend development to in-depth machine learning and statistical analysis. It is intuitive, flexible, and, perhaps most importantly, widely supported from an open-source perspective. It is fairly easy to learn, and incredibly powerful. It is rivaled perhaps only by R in terms of analytics and …

Basic data structures of xarray

Okay, let’s see some code, starting with the customary imports (import numpy as np; import pandas as pd; import xarray as xr). First, we’ll create some toy temperature data to play with. We generated an array of random temperature values, along with arrays for the coordinates latitude and longitude (2 dimensions). First, let’s see how we can represent this data …

How to Create an Interactive Geographic Map Using Python and Bokeh

Interactive data visualization with choropleth maps. If you are looking for a powerful way to visualize geographic data, then you should learn to use interactive choropleth maps. A choropleth map represents statistical data through various shading patterns or symbols on predetermined geographic areas such as countries, states or counties. Static choropleth maps are useful for …

Avoiding the vanishing gradients problem using gradient noise addition

Neural networks are computational models used to approximate a function that models the relationship between the dataset features x and labels y, i.e. f(x) ≈ y. A neural net achieves this by learning the best parameters θ such that the difference between the prediction f(x; θ) and the label y is minimal. They typically learn …

Turbocharging SVD with JAX

In the previous post, I wrote about the fundamentals of two commonly used dimensionality reduction approaches, singular value decomposition (SVD) and principal component analysis (PCA). I also explained their relationship using numpy. To quickly recap, the singular values (Σ) of a 0-centered matrix X (n samples × m features) equal the square root of its …
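
One standard identity linking SVD and PCA can be checked numerically. The sketch below is not the author's code; it assumes a small random matrix (the shape, seed, and variable names are arbitrary) and uses plain NumPy to confirm that the singular values of a centered X are the square roots of the eigenvalues of XᵀX, which in turn are (n − 1) times the eigenvalues of the sample covariance matrix:

```python
import numpy as np

# Hypothetical data: 50 samples, 4 features, centered column-wise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X = X - X.mean(axis=0)
n = X.shape[0]

# Singular values of X, returned in descending order.
singular_values = np.linalg.svd(X, compute_uv=False)

# Eigenvalues of the symmetric matrix X^T X; eigvalsh returns them
# in ascending order, so reverse to match the singular values.
gram_eigenvalues = np.linalg.eigvalsh(X.T @ X)[::-1]

# Eigenvalues of the sample covariance matrix (divides by n - 1).
cov_eigenvalues = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]

matches_gram = np.allclose(singular_values, np.sqrt(gram_eigenvalues))
matches_cov = np.allclose(singular_values, np.sqrt((n - 1) * cov_eigenvalues))
print(matches_gram, matches_cov)
```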

So you want to be a Data Scientist?

Imports. We’ll start with the imports. Type the following into your notebook. The imports tell the notebook what other modules (collections of features) we will need. Pandas is for data manipulation, numpy for scientific computation, datetime for, you guessed it, datetime-related functionality, and matplotlib and seaborn are for plotting. The …

Linear Programming for Data Scientists

As Data Scientists we become acquainted with the concept of optimization very early in our careers. Optimization lies at the heart of every machine learning model. But our relationship with optimization goes way back; we’ve been [unknowingly] solving optimization problems since before we can remember: the fastest way to get to work, organizing our budget …

Building a Zero Curve with Forward Rate Agreements Using Pandas

In the finance world, if you want to price an instrument and figure out its future value at t(n) from t0 (now), you need to use the spot yield curve. Among professional traders, the spot yield curve is called the zero curve. If you have $1000 now to …

Delivering Data Science Without Delivering Software

Do you always need to deliver complete software? From time to time, debates such as ‘R vs Python’ or ‘Software Skills vs Statistics Skills’ rear their heads in the Data Science world. These debates sometimes appear to have the hidden assumption that the only possible deliverable for a data scientist …

Hypothesis tests with Python

In my previous article, I talked about statistical hypothesis tests. Those are pivotal in Statistics and Data Science since we are always asked to ‘summarize’ the huge amount of data we want to analyze in samples. Once provided with samples, which can be arranged with different techniques, like Bootstrap sampling, the general purpose is …

Analysis of car accidents in Barcelona using Pandas, Matplotlib, and Folium

Open Data Barcelona is Barcelona’s data service, which contains around 400 datasets covering a wide range of topics such as population, business, or housing. This project was born in 2010 with the main objective of maximizing available public resources, allowing companies, citizens, researchers, and other public institutions to make use of the data generated. In …

Feature Selection in Python — Recursive Feature Elimination

Now the fun part can finally begin. You will need to declare two variables, X and target, where the first represents all the features and the second represents the target variable. Then you’ll make an instance of the machine learning algorithm (I’m using RandomForests). In it, you can optionally pass a random state seed for …

Models as Web Endpoints

An excerpt from Data Science in Production. In the second chapter of Data Science in Production, I discuss how to set up predictive models as web endpoints. This is a useful skill, because it enables data scientists to shift from batch model application, such as outputting CSV files, to hosting models that other …

NLP Text Preprocessing: A Practical Guide and Template

Text preprocessing is traditionally an important step for natural language processing (NLP) tasks. It transforms text into a more digestible form so that machine learning algorithms can perform better. To illustrate the importance of text preprocessing, let’s consider a sentiment analysis task for customer reviews. Suppose a customer gave the feedback that “their customer support service …

5 Minute Guide to Plotting with Pandas

Find out how to quickly visualise data with this popular Python tool. Pandas is one of the most popular Python libraries for data science. It features an array of tools for data handling and analysis in Python. Pandas also has visualisation functionality which leverages the matplotlib library in conjunction with its core data structure, …

Scrape multiple pages with Scrapy

In this post I will develop a web crawler that collects the information for each manga available on MyAnimeList. For this purpose, we will iterate over several pages and subpages to create a complete dataset. Scrapy is “An open source and collaborative framework for extracting the data you need from websites”. There are several types of framework …

Analysis on the Light Rail Network using Python (Pandas, Plotly, SodaPy)

Canberra Metro Operations (CMO) operates the first ever light rail in Canberra. The light rail system has been in operation for a few months now and travels between Civic and Gungahlin. You’ve probably heard of people riding the tram back and forth during the first month it opened its doors to the public. The ACT …

Object-Oriented Programming and the magic of Test-Driven Development

Python is one of the most widely used programming languages in Data Science. For some, it is about the language’s flexibility and readability; for others, it’s about its relatively low complexity; and for most, it is about its multifaceted nature. We call Python a multifaceted language because it allows you to code in four different …