Learn Enough Python to be Useful Part 2

How to Use if __name__ == "__main__". This article is one in a series to help you become comfortable in Python scripting land. It’s for data scientists and anyone new to Python programming. if __name__ == "__main__": is one of those things you see in Python scripts that often isn’t explained. You might have …

Community Forums Meets Data Science

Analysis of forum members’ activity, posts, and behavior. Summary: As a community builder and strategist with a passion for data science, I have found that the use of data science techniques has deepened my understanding of the communities I manage, allowing me to make better strategic and operational decisions. In this article, I aim to exemplify how …

Time Series Analysis Tutorial Using Financial Data

VIX predictions from my ARMA (8,2) time window refitting model. For my 2nd project at Metis I created a model that predicted the price of the CBOE volatility index (VIX) using a time series analysis. The VIX is a composite of option prices of popular stocks that indicate how much volatility is in the overall …

Building a Better Profanity Detection Library with scikit-learn

Why existing libraries are uninspiring and how I built a better one. A few months ago, I needed a way to detect profanity in user-submitted text strings. This shouldn’t be that hard, right? I ended up building and releasing my own library for this purpose, called profanity-check. Of course, before I did that, I looked in the …

Blender 2.8 Grease Pencil Scripting and Generative Art

5agado, Feb 4. Quick, Draw!; Flock; Conway’s Game of Life. What: learning the basics of scripting for Blender’s Grease Pencil tool, with a focus on generative art as a concrete playground. Less talking, more code (commented) and many examples. Why: mostly because we can. Also because Blender is a very rich ecosystem, and Grease Pencil in version 2.8 is a powerful …

Python Basics: Mutable vs Immutable Objects

(Source: https://www.quora.com/Can-you-suggest-some-good-books-websites-for-learning-Python-for-a-layman) After reading this blog post you’ll know what an object’s identity, type, and value are, and what mutable and immutable objects are. Introduction (Objects, Values, and Types): All data in a Python program is represented by objects or by relations between objects. Every object has an identity, a type, and a value. Identity: An …

Matplotlib Tutorial: Learn basics of Python’s powerful Plotting library

What is Matplotlib? To draw statistical inferences, it is often necessary to visualize your data, and Matplotlib is one such solution for Python users. It is a very powerful plotting library useful for those working with Python and NumPy. The most used module of Matplotlib is pyplot, which provides an interface like MATLAB but …

Introduction to TWO approaches of Content-based Recommendation System

A complete guide to resolve the confusion. Content-based filtering is one of the common methods for building recommendation systems. While researching the details, I found it interesting that there are 2 approaches that claim to be “Content-based”. Below I will share my findings and hope it can …

Making Programming Easier with Keyboard Macros — Video

A recent video from Linus Tech Tips introduced how one of their editors uses macros for video editing. This got me thinking: can macros be easily created to improve my programming? This video demonstrates how code macros can be created and how useful they can be. Background. Source: Linus Tech Tips - Can your Keyboard do …

Predicting Kickstarter Campaign Success with Gradient Boosted Decision Trees: A Machine Learning…

Fitting the models, evaluating performance, choosing a final model, and predicting on a new (totally real) campaign. Another common step in the data science workflow is trying out multiple models. There are ways to minimize the effort in this stage based on what you want to accomplish or what the dataset or the problem is (you …

Comparing Python Virtual Environment tools

Thanks to Keith Smith, Alexander Mohr, Victor Kirillov and Alain SPAITE for recommending pew, venv and pipenv. I just love the community that we have on Medium. I recently published an article on using Virtual Environments for Python projects. The article was well received and the feedback from readers opened a new view for me. …

PyViz: Simplifying the Data Visualisation process in Python.

Exploring Data with PyViz. In this section, we will see how different libraries bring out different insights from data, and how using them together can really help to analyse data in a better way. Dataset: The dataset being used pertains to the number of cases of measles and pertussis recorded per 100,000 people over time …

4 Machine Learning Techniques with Python

Machine Learning Techniques vs Algorithms. While this tutorial is dedicated to Machine Learning techniques with Python, we will move over to algorithms pretty soon. But before we can begin focussing on techniques and algorithms, let’s find out if they’re the same thing. A technique is a way of solving …

Using Image Data to Determine Text Structure

Painting by Patrick Henry Bruce. Dotting the i’s and following the lines. In my previous article, I discussed how to implement fairly simple image processing techniques in order to detect blobs of text in an image. Realistically, that algorithm did little more than find high-contrast pixel regions in an image. Yet, the simple procedure still laid …

Doing meaningful work with Machine Learning — Classify Disaster Messages

Build models to help disaster organizations save people’s lives. I’m writing this post at 1am in Bucharest, Romania. Hello there again! Welcome to my fourth piece of content about Machine Learning. I’ve recently done a project that I believe to be socially meaningful. I’ll give a brief overview of what this is all about and I’ll dive …

Interactive Data Visualization with Python Using Bokeh

Simple and basic walkthrough example. Recently I came across this library, learned a little about it, tried it, of course, and decided to share my thoughts. From the official website: “Bokeh is an interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of versatile graphics, and to …

Travelling in the BlockChain Ecosystem with Python

First things first, you’ll want to download Anaconda on your local machine, and set up a conda environment with Python 3.5+, then launch a Jupyter Notebook to run the code chunks below. Better yet, if you haven’t already tried it, run the following code in Google Colab for free. Next, we’ll find the number …

What are the Skills Needed to Become a Data Scientist in 2019?

It’s hardly a surprise to anyone in the tech and related industries that “data scientist” is the best job to have in the States. After all, this is what sources like the Harvard Business Review and Glassdoor have reported for what is now four years in a row. And even if we take the base …

Predicting Premier league standings — putting that math to some use

I am a casual fan when it comes to football, but the idea of building a mathematical model that can be applied to a real-world problem seemed exciting enough to have a try at it. (Let’s kick off then, shall we? ⚽️) Breaking down the problem: The rankings in the league table are primarily determined by …

A Dog Detector and Breed Classifier

In a field like physics, things keep getting harder, to the point that it’s very difficult to understand what’s going on at the cutting edge unless it’s in highly simplified terms. In computer science though, and artificial intelligence in particular, knowledge built up slowly over 70+ years by people all over the world is still …

Build a Pipeline for Harvesting Medium Top Author Data

Nuts and Bolts. One key requirement was to make deployment of my Luigi workflow very simple. I wanted to assume only one thing about the deployment environment: that the Docker daemon would be available. With Docker, I wouldn’t need to be concerned with Python version mismatches or other environmental discrepancies. It took me a little while …

Master Python through building real-world applications (Part 7)

Data Collector Web App with PostgreSQL and Flask. Working with databases and queries can be pretty daunting to some, or maybe most of us. Perhaps I have lost 100 readers by now just because there’s PostgreSQL written in the subtitle. But as you are here, I want you to know that it is an important thing …

Mario vs. Wario — round 2: CNNs in PyTorch and Google Colab

For quite some time I had been meaning to play with Google Colab (yes, free access to a GPU…). I think this is a really awesome initiative, which enables people with no GPU on their personal computers to play around with Deep Learning and train models they would not be able to train otherwise. Basically we …

Analytics Building Blocks: Regression

A modularized notebook to tune and compare 11 regression algorithms with minimal coding in a control-panel fashion. This article summarizes and explains key modules of my regression block (one of the simple modularized notebooks I am developing to execute common analysis tasks). The notebook is intended to facilitate quicker experimentation for the users with …

Artificial Neural Network Implementation using NumPy and Classification of the Fruits360 Image…

This tutorial builds an artificial neural network in Python from scratch using NumPy in order to build an image classification application for the Fruits360 dataset. Everything (i.e. images and source code) used in this tutorial, other than the color Fruits360 images, is covered by the exclusive rights of my book, cited as “Ahmed Fawzy Gad ‘Practical Computer Vision Applications …

Quick guide to run your Python scripts on Google Colaboratory

If you are looking for an interactive way to run your Python script, say because you want to start a machine learning project with a couple of friends, look no further: Google Colab is the best solution for you. You can work online and save your code on your Google Drive, and it allows you to …

Monte Carlo Simulations with Python (Part 1)

Monte Carlo simulations can be used to simulate games at a casino (Pic courtesy of Pawel Biernacki). This is the first of a three-part series on learning to do Monte Carlo simulations with Python. This first tutorial will teach you how to do a basic “crude” Monte Carlo, and it will teach you how to …

How to beat Google’s AutoML – Hyperparameter Optimisation with Flair

This is a follow-up to our previous post about State of the Art Text Classification. We explain how to do hyperparameter optimisation using Flair to achieve optimal results in text classification, outperforming Google’s AutoML Natural Language. What is hyperparameter optimisation and why can’t we simply do it by hand? Hyperparameter optimisation (or tuning) is the process …

Python’s Collections Module — High-performance container data types.

Let us now hop over to the actual objective of this article, which is to get to know Python’s collections module. This is just an overview; for detailed explanations and examples please refer to the official Python documentation. Collections Module: collections is a built-in Python module that implements specialized container datatypes providing …

Time Series of Price Anomaly Detection

Photo credit: Pixabay. Anomaly detection detects data points that do not fit well with the rest of the data. Also known as outlier detection, anomaly detection is a data mining process used to determine types of anomalies found in a data set and to determine details about their occurrences. Automatic anomaly detection is critical in …

Tel Aviv artists: build yourself a mapping app

tl;dr: I went from experimenting with mapping libraries to building a reusable mapping app. This is how I did it and how you can re-use it. Intro: As a data scientist, most of my work stays behind the scenes. When training models, the farthest I reach in exposure is deploying a simple Flask web app as REST …

Get Started with Support Vector Machines (SVM)

A hands-on tutorial with 4 examples on how to implement support vector machines for classification. Photo by Randy Fath on Unsplash. In a previous post, I introduced the theory of support vector machines (SVM). Now, I will further explain how SVMs work with four different exercises! The first part will show how to perform classification with …

Scrape Reddit data using Python and Google BigQuery

Let’s get started with data collection from Reddit. Reddit API: While web scraping is one of the famous (or infamous!) ways of collecting data from websites, a lot of websites offer APIs to access the public data that they host. This is to avoid the unnecessary traffic that scraping bots create, often crashing their websites …

Creating AI for GameBoy Part 1: Coding a Controller

Released in 2003, Fire Emblem, The Blazing Sword is a strategy game so successful that its characters are featured in Super Smash Bros and the 15th installment of the series will be released in early 2019. The game is played by selecting characters (aka units), making decisions on where to move them, and then deciding …

Getting Started with Recommender Systems and TensorRec

System Overview: TensorRec is a Python package for building recommender systems. A TensorRec recommender system consumes three pieces of input data: user features, item features, and interactions. Based on the user/item features, the system will predict which items to recommend. The interactions are used when fitting the model: predictions are compared to the interactions and …

3 Methods for Parallelization in Spark

Source: geralt on Pixabay. Scaling data science tasks for speed. Spark is great for scaling up data science tasks and workloads! As long as you’re using Spark data frames and libraries that operate on these data structures, you can scale to massive data sets that distribute across a cluster. However, there are some scenarios where libraries may …

Visualizing Principal Component Analysis with Matrix Transforms

A guide to understanding eigenvalues, eigenvectors, and principal components. Principal Component Analysis (PCA) is a method of decomposing data into uncorrelated components by identifying eigenvalues and eigenvectors. The following is meant to help visualize what these different values represent and how they’re calculated. First I’ll show how matrices can be used to transform data, then …

Flask: An Easy Access Door to API development

Photo by Chris Ried on Unsplash. The world has gone through a huge transition: from separating pieces of code into functions in procedural languages to the development of libraries; from RPC calls to Web Service specifications in Service Oriented Architecture (SOA) like SOAP and REST. This has paved the way to Web APIs and microservices, …

How to deploy your website to a custom domain

This blog documents the steps needed to deploy a website written in Python with the Flask framework to a custom domain using Heroku and NameCheap. Flask is a micro-framework that allows us to use Python in the back-end to interact with our front-end code in HTML/CSS or JavaScript to build websites. People also use other …

Rat City: Visualizing New York City’s Rat Problem

Is Your Neighborhood a Rat Hotspot too? Check out the interactive rat sighting map here: https://nbviewer.jupyter.org/github/lksfr/rats_nyc/blob/master/rats_for_nbviewer_only.ipynb Introduction: If you have ever spent a significant amount of time in New York City, you have very likely come across rats. Regardless of whether you are waiting for the subway or strolling through Washington Square Park, your chances of running …

Startup Funding, Investments, and Acquisitions

Exploratory Data Analysis (EDA). Funding: I am going to just jump straight in and figure out whether we can answer our first question. Well, we can break it down a bit since there are a number of parts to this question. Let’s first look at the average amount funded, total funding and the number of …

Predicting Breast Cancer with Decision Trees

How to implement decision trees with bagging, boosting and random forest to predict breast cancer from routine blood tests. Photo by Hello I’m Nik on Unsplash. In a previous post, I introduced the theory of decision trees and how their performance can be improved using bagging, boosting or random forests. Now, we implement these techniques to predict …

Solving Travelling Salesperson Problems with Python

How to use randomized optimization algorithms to solve travelling salesperson problems with Python’s mlrose package. mlrose provides functionality for implementing some of the most popular randomization and search algorithms, and applying them to a range of different optimization problem domains. In this tutorial, we will discuss what is meant by the travelling salesperson problem and step …

A journey into Convolutional Neural Network visualization

Francesco Saverio Zuppichini. There is one famous urban legend about computer vision. Around the 80s, the US military wanted to use neural networks to automatically detect camouflaged enemy tanks. They took a number of pictures of trees without tanks and then pictures with the same trees with tanks behind them. The results were impressive. So …

Recursive Programming

How to solve a problem by pretending you already have. Despite often being introduced early on in most ventures into programming, the concept of recursion can seem strange and potentially off-putting upon first encountering it. It seems almost paradoxical: how can we find a solution to a problem using the solution to the same problem? Recursion can …

Evaluating A Real-Life Recommender System, Error-Based and Ranking-Based

A recommender system aims to find and suggest items of likely interest based on the users’ preferences. Recommender systems are one of the most valuable applications of machine learning today. Amazon attributes 35% of its revenue to its recommender system. Evaluation is an integral part of researching and developing any recommender system. Depending on your …

Master Python through building real-world applications (Part 6)

Scraping data from FIFA.com using BeautifulSoup. Most people think data science is about cool machine learning algorithms and self-driving cars. Let me tell you something: it’s not. Almost 80% of the time you are searching for and cleaning the data, and if successful, you spend the remaining 20% on the cool stuff you see up front. “Find data and play …

How I used NLP (Spacy) to screen Data Science Resumes

Do the keywords in your Resume aptly represent what type of Data Scientist you are? Source: Pexels. Resume making is very tricky. A candidate has many dilemmas: whether to state a project at length or just mention the bare minimum; whether to mention many skills or just mention his/her core competency; whether …

Interpreting the coefficients of linear regression

Source: Unsplash. Nowadays there is a plethora of machine learning algorithms we can try out to find the best fit for our particular problem. Some of the algorithms have a clear interpretation; others work as a black box, and we can use approaches such as LIME or SHAP to derive some interpretations. In this article I would …