Analysis of US Elections 2020 with Pandas

Number of counties won by each candidate. Let’s check how many counties were won by each candidate. I will show you two different ways to accomplish this task. The first way is to use the groupby function and count the number of “True” values for each candidate:
winner = elections[['candidate', 'won', 'state']].groupby(['candidate', 'won'], as_index=False).count()
winner = winner[winner.won == True]
winner.rename(columns={'state': 'won_county'}, inplace=True)
winner …
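To see what the first approach produces, here is a minimal, self-contained run of the same groupby on a tiny made-up `elections` DataFrame. The column names mirror the snippet above; the state, county, and candidate values are hypothetical placeholders, not election results.

```python
import pandas as pd

# Hypothetical toy data: one row per (county, candidate) with a boolean "won" flag.
elections = pd.DataFrame({
    "state": ["State 1", "State 1", "State 1", "State 1", "State 2", "State 2"],
    "county": ["County A", "County A", "County B", "County B", "County C", "County C"],
    "candidate": ["Candidate A", "Candidate B"] * 3,
    "won": [True, False, True, False, False, True],
})

# Count rows per (candidate, won) pair; "state" is simply a non-null column to count.
winner = (
    elections[["candidate", "won", "state"]]
    .groupby(["candidate", "won"], as_index=False)
    .count()
)
# Keep only the rows where the candidate won, then give the count a clearer name.
winner = winner[winner["won"]]
winner = winner.rename(columns={"state": "won_county"})
print(winner)
```

Each row of the result holds one candidate and the number of counties that candidate won.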

Here’s Why You Should Learn Docker as a Data Scientist

You’ll be surprised by how easy it is. To use Docker, you’ll need to install it. Download Docker Desktop from this link, install it, and open the application. Now create the following project structure anywhere on your computer. Image 1 — Directory structure for your Python app (image by author). Let’s start with what …

Paving the Way to Google!

Want to get hired? Work on your technical skills! Want a promotion? Learn institutional knowledge! Domain Knowledge & Institutional Knowledge: When my past company’s CEO encouraged me to work on Declined Transactions as my first project after joining the team, I was disappointed! Right out of academia, I was hoping for a complex project like …

Bandits, WebAssembly, and IoT

(A part of the Bootstrap Thompson Sampling algorithm; image by author.) An uncommon combination allows efficient sequential learning on the edge. The multi-armed bandit (MAB) problem is a relatively simple (to describe, that is, not to solve) formalization of a sequential learning problem that has many applications. In its canonical form, a gambler faces a …
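The core of Bootstrap Thompson Sampling can be sketched in a few lines of Python. This sketch assumes Bernoulli (0/1) rewards, a fixed number of bootstrap replicates per arm, a weak prior of one success in two trials, and the online "double-or-nothing" bootstrap, in which each replicate absorbs a new observation with weight 2 with probability 1/2. The class and parameter names are illustrative, and this is not the article's WebAssembly implementation.

```python
import random


class BootstrapThompson:
    """Bootstrap Thompson Sampling for Bernoulli (0/1) rewards: a sketch."""

    def __init__(self, n_arms, n_replicates=100, seed=None):
        self.rng = random.Random(seed)
        self.n_replicates = n_replicates
        # Assumed weak prior: every replicate starts at 1 success in 2 trials (mean 0.5).
        self.successes = [[1.0] * n_replicates for _ in range(n_arms)]
        self.trials = [[2.0] * n_replicates for _ in range(n_arms)]

    def select_arm(self):
        # Pick one bootstrap replicate uniformly at random, then act greedily on it.
        j = self.rng.randrange(self.n_replicates)
        estimates = [s[j] / t[j] for s, t in zip(self.successes, self.trials)]
        return max(range(len(estimates)), key=estimates.__getitem__)

    def update(self, arm, reward):
        # Online double-or-nothing bootstrap: each replicate of the pulled arm
        # absorbs the observation with weight 2 with probability 1/2, else ignores it.
        for j in range(self.n_replicates):
            if self.rng.random() < 0.5:
                self.successes[arm][j] += 2 * reward
                self.trials[arm][j] += 2


def simulate(true_probs, rounds=2000, seed=0):
    """Run the agent against simulated Bernoulli arms; return how often each arm was pulled."""
    env = random.Random(seed + 1)
    agent = BootstrapThompson(len(true_probs), seed=seed)
    pulls = [0] * len(true_probs)
    for _ in range(rounds):
        arm = agent.select_arm()
        reward = 1 if env.random() < true_probs[arm] else 0
        agent.update(arm, reward)
        pulls[arm] += 1
    return pulls


pulls = simulate([0.2, 0.5, 0.8])
```

Sampling a single replicate and acting greedily on it stands in for drawing from a posterior in ordinary Thompson Sampling. The update only increments counters, which is plausibly part of what makes the approach attractive on small devices.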

Testing Streamlit Apps Using SeleniumBase

In the time I’ve worked at Streamlit, I’ve seen hundreds of impressive data apps, ranging from computer vision applications to public health tracking of COVID-19 and even simple children’s games. I believe the growing popularity of Streamlit comes from the fast, iterative workflow enabled by Streamlit’s “magic” functionality and the automatic reloading of the front end upon saving your …

Installing Jupyter Notebook for Different Environments in Windows 10

A virtual environment is an isolated location where a particular version of Python and its packages are installed, which makes it possible to keep different versions of Python side by side. Each environment has its own files, directories, and paths. Thus a single system can cater to different projects that demand different Python versions. For Python installation and its basics, …
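As one concrete illustration, Python's standard-library `venv` module can create such an isolated environment programmatically. The article may well use a different tool (conda, for example, can also install different Python versions); the environment name `project_env` is a hypothetical placeholder.

```python
import os
import tempfile
import venv


def make_env(env_dir):
    """Create an isolated virtual environment in env_dir and return the path of its config file."""
    # with_pip=False keeps creation fast and offline; pip can be added later if needed.
    venv.create(env_dir, with_pip=False)
    # Every venv records its base interpreter and settings in pyvenv.cfg at its root.
    return os.path.join(env_dir, "pyvenv.cfg")


with tempfile.TemporaryDirectory() as tmp:
    cfg_path = make_env(os.path.join(tmp, "project_env"))
    cfg_exists = os.path.isfile(cfg_path)
    with open(cfg_path) as fh:
        cfg_text = fh.read()
```

A `venv` environment reuses the interpreter that created it, so keeping several Python versions this way requires those interpreters to be installed already. To make an environment available in Jupyter, one common route is to install `ipykernel` inside it and run `python -m ipykernel install --user --name project_env`.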

Truth-Seeking in the Post-Truth Era: Tutorial at EMNLP 2020

AI for Assistance in Fact-Checking. Now that we have a basic understanding of the problem of fake news, how can we tackle it? Of course, there is always the path of manual fact-checking, such as when news groups fact-check presidential candidates during a debate. At the opposite end is the path of automatic fact-checking, …

Do You Need a Master’s Degree in Data Science?

There appears to be a wide variety of answers to whether you need an advanced degree to get a data science job. Here, I review and explore the educational levels discussed by Robert Half, Burtch Works, and Indeed. As you begin your data science career, Robert Half and Indeed agree that you should have …

Evaluating linear relationships

How to use scatterplots, correlation coefficients, and linear regression effectively. (Photo by Magda Ehlers from Pexels.) One of the most common analyses conducted by data scientists is the evaluation of linear relationships between numeric variables. These relationships can be visualized using scatterplots, and this step should be taken regardless of any further analyses that are …
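The two numeric summaries mentioned above, Pearson's correlation coefficient and the least-squares line, can be computed directly from their definitions. The sketch below uses made-up data; a scatterplot is still worth drawing first, since a high r does not by itself show that a relationship is linear.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simple_linear_fit(x, y):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical data lying close to a straight line.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)
slope, intercept = simple_linear_fit(x, y)
```

Here r is close to 1 and the fitted line is roughly y = 0.05 + 1.99x.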

How to Collect Live Feed and Frequently Updated Data Using Cron

Cron allows you to schedule repeated tasks, making it a great tool for running data collection scripts. (Photo by Nick Chong on Unsplash.) A major concern when collecting time series data is ensuring that all data is collected at equal time intervals. Without equal time intervals, you will be unable to use most methods for …
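A minimal sketch of the kind of script cron could run: each invocation appends one UTC-timestamped observation to a CSV file. `fetch_value` is a hypothetical placeholder for whatever API call or scrape the article performs, and the crontab line and paths in the comment are assumptions.

```python
import csv
import os
from datetime import datetime, timezone

# Example crontab entry (assumed paths) that would run a script like this every 5 minutes:
#   */5 * * * * /usr/bin/python3 /home/user/collect.py >> /home/user/collect.log 2>&1


def fetch_value():
    """Hypothetical placeholder: replace with a real API call or scrape."""
    raise NotImplementedError


def collect(csv_path, fetch=fetch_value):
    """Append one timestamped observation to csv_path, writing a header on the first run."""
    # Record the time in UTC so the spacing between rows is easy to check later.
    timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    value = fetch()
    new_file = not os.path.exists(csv_path)
    with open(csv_path, "a", newline="") as fh:
        writer = csv.writer(fh)
        if new_file:
            writer.writerow(["timestamp_utc", "value"])
        writer.writerow([timestamp, value])
```

On each scheduled run, the script's entry point would call something like `collect("live_feed.csv")`. Cron provides the equal spacing; storing the timestamp alongside each value makes it easy to confirm afterwards that no runs were missed.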

How to Code Ridge Regression from Scratch

Ridge Regression, like its sibling, Lasso Regression, is a way to “regularize” a linear model. In this context, regularization can be taken as a synonym for preferring a simpler model by penalizing larger coefficients. We can achieve this concretely by adding a measure of the size of our coefficients to our cost function, so that …
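In symbols, ridge regression minimizes ||y − Xw||² + λ||w||², where λ ≥ 0 controls how strongly large coefficients are penalized; for centered data the minimizer has the closed form w = (XᵀX + λI)⁻¹Xᵀy. A minimal NumPy sketch, assuming the intercept should not be penalized (so X and y are centered first) and solving the linear system rather than forming the inverse:

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression; returns (weights, intercept). The intercept is not penalized."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centering removes the intercept from the penalty
    n_features = X.shape[1]
    # Solve (Xc^T Xc + lam * I) w = Xc^T yc.
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_features), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


# Synthetic data from y = 3x + 2.
X_demo = np.arange(10, dtype=float).reshape(-1, 1)
y_demo = 3.0 * X_demo[:, 0] + 2.0
w_ols, b_ols = ridge_fit(X_demo, y_demo, lam=0.0)        # no penalty: ordinary least squares
w_ridge, b_ridge = ridge_fit(X_demo, y_demo, lam=100.0)  # coefficient shrunk toward zero
```

With λ = 0 the fit recovers the generating slope and intercept; with λ = 100 the slope is pulled noticeably toward zero, which is the "simpler model" preference described above.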

PokéGSQL: Create your First GSQL Algorithms with a Pokémon Dataset!

How to Write Basic GSQL Commands. (Note: This is Part 3 of a series. Check out the previous blog to load the data into your database!) Now that you have your data loaded, the next big step will be catching them all and extracting relevant information from your graph using queries. This blog will be …

Pokémon Lab Part II: Adding More Data

Adding to the Schema and Adding More Data. (Note: This is Part II of a series. Please check out Part I here: https://towardsdatascience.com/using-api-data-with-tigergraph-ef98cc9293d3) In the last blog, we learned how to upsert API data into a graph for a basic schema. Now, we’re going to go deeper, making a more complex schema and adding …

Creating Interactive Data Tables in Plotly Dash

How to add click actions and live updates to the Dash DataTable. (Photo by Markus Spiske on Unsplash.) Plotly Dash is an incredibly powerful framework that allows you to create fully functional data visualization dashboards. Using Dash, you can create a full front-end experience using only Python. The library does a great job of abstracting away …

Create “Interactive Globe + Earthquake Plot” in Python

Here we create an interactive globe, like Google Earth, from topography data using an amazing visualization tool, Plotly. We also plot the global distribution of earthquakes on this interactive globe. (Image by author.) Enjoy it here! By creating this interactive plot, you can gain the following: – Deeper insights and a more realistic application example to …

Linear regression made easy. How does it work and how to use it in Python?

All you need to know about building a machine learning model using the linear regression algorithm. (Multiple linear regression model; graph by author.) Machine learning is making huge leaps forward, with an increasing number of algorithms available to solve complex real-world problems. This story is part of a deep dive series explaining the …
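Multiple linear regression fits y ≈ b₀ + b₁x₁ + … + bₚxₚ by least squares. A minimal NumPy sketch, assuming a column of ones is prepended for the intercept and `np.linalg.lstsq` does the solving; the data are synthetic, generated from known coefficients so the fit can be checked. The article itself may use a library such as scikit-learn instead.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Least-squares intercept and weights for y ≈ intercept + X @ weights."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # leading column of ones models the intercept
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]


def predict(X, intercept, weights):
    """Apply the fitted linear model to new rows."""
    return intercept + np.asarray(X, dtype=float) @ weights


# Synthetic data generated exactly from y = 1 + 2*x1 + 0.5*x2.
X_demo = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], dtype=float)
y_demo = 1.0 + 2.0 * X_demo[:, 0] + 0.5 * X_demo[:, 1]
intercept, weights = fit_linear_regression(X_demo, y_demo)
```

Because the data contain no noise, the recovered intercept and weights match the generating values (1, 2, and 0.5).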

The Top 5 Data Science Questions I Get Asked

Introduction
Why did you choose Data Science?
What is your everyday work like?
Does Data Science get easier?
Where do you see Data Science in 5 years?
Will you ever leave Data Science?
Summary
References
Data Science has become an incredibly popular field over the past few years. When I started applying to master’s programs …

How to approach AutoML as a data scientist

It doesn’t replace your job; it only makes it a little easier. (Photo by Possessed Photography on Unsplash.) In the past five years, one trend that has made AI more accessible and acted as the driving force behind several companies is automated machine learning (AutoML). Many companies, such as H2O.ai, DataRobot, Google, and SparkCognition, have …

How To Create Differentially Private Synthetic Data

A practical guide to creating differentially private synthetic data with Python and TensorFlow. In this post, we’ll train a synthetic data model on the popular Netflix Prize dataset, using a mathematical technique called differential privacy to protect the identities of anonymized users in the dataset from being discovered via known privacy attacks such as re-identification …
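A widely used way to train a model with differential privacy is DP-SGD: clip each example's gradient to a maximum L2 norm, add Gaussian noise scaled to that bound, then average. TensorFlow Privacy provides optimizers that do this, with parameters named `l2_norm_clip` and `noise_multiplier`. The NumPy sketch below shows only that aggregation step; whether the article's synthetic data model is trained exactly this way is not visible in this excerpt.

```python
import numpy as np


def dp_sgd_gradient(per_example_grads, l2_norm_clip, noise_multiplier, rng):
    """Privatized average gradient for one batch, DP-SGD style (sketch)."""
    grads = np.asarray(per_example_grads, dtype=float)  # shape (batch_size, n_params)
    # 1) Clip each example's gradient so its L2 norm is at most l2_norm_clip.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, l2_norm_clip / np.maximum(norms, 1e-12))
    clipped = grads * scale
    # 2) Sum the clipped gradients and add Gaussian noise with std = noise_multiplier * l2_norm_clip.
    noise = rng.normal(0.0, noise_multiplier * l2_norm_clip, size=grads.shape[1])
    noisy_sum = clipped.sum(axis=0) + noise
    # 3) Average over the batch.
    return noisy_sum / grads.shape[0]


rng = np.random.default_rng(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # hypothetical per-example gradients
avg = dp_sgd_gradient(grads, l2_norm_clip=1.0, noise_multiplier=0.0, rng=rng)    # clipping only
noisy = dp_sgd_gradient(grads, l2_norm_clip=1.0, noise_multiplier=1.1, rng=rng)  # clipping + noise
```

With `noise_multiplier=0.0` the function reduces to clipped averaging: the first gradient (norm 5) is scaled down to norm 1, and the second (norm 0.5) is left alone. The privacy guarantee comes from the noise; the resulting ε also depends on the sampling rate and number of training steps, which this sketch does not compute.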

Data Mesh 2.0

(Photo by Amy-Leigh Barnard on Unsplash.) The era of the monolithic data warehouse/data lake is coming to an end. Long live the decentralized data mesh! Oh, do not despair! All those person-years spent cleaning, transferring, and loading data into your centralized systems haven’t been in vain. With data mesh, you don’t have to start …

Model Compression via Pruning

Pruning Neural Networks. To obtain fast and accurate inference on edge devices, a model has to be optimized for real-time inference. Fine-tuned state-of-the-art models such as VGG16/19 and ResNet50 have 138+ million and 23+ million parameters, respectively, and inference with them is often expensive on resource-constrained devices. Previously, I’ve talked about one model compression technique called “Knowledge Distillation” using …
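A common baseline is unstructured magnitude pruning: zero out the weights with the smallest absolute values and keep a binary mask so they stay zero during any further fine-tuning. A NumPy sketch on a single hypothetical weight matrix (the article may prune differently, for example per layer or with a framework tool):

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of weights; return (pruned, mask)."""
    w = np.asarray(weights, dtype=float)
    k = int(round(sparsity * w.size))  # number of weights to remove
    if k == 0:
        return w.copy(), np.ones_like(w, dtype=bool)
    # Threshold = k-th smallest absolute value; exact ties could prune slightly more than k.
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    mask = np.abs(w) > threshold  # True = weight is kept
    return w * mask, mask


rng = np.random.default_rng(0)
w = rng.normal(size=(100, 100))  # stand-in for one layer's weight matrix
pruned, mask = magnitude_prune(w, sparsity=0.9)
```

At 90% sparsity, 1,000 of the 10,000 weights survive. Actual speed-ups on edge hardware depend on sparse kernels or structured pruning, which this sketch does not address.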

Production-level App With Streamlit and Climacell API in a day

Putting It Together. By recreating this project, you can improve these skills: working with APIs; deploying data apps into production with Streamlit; writing production-level code; scripting; and a little bit of Plotly geospatial visualization. First things first, if you want to run the app on your own machine, you can do so by cloning the …

Google Objectron — A giant leap for 3D object detection

(Photo by Tamara Gak on Unsplash.) Google has just announced the launch of MediaPipe Objectron, its mobile technology for real-time detection of 3D objects, enabling smartphones to recognize the size and orientation of objects. If there is one thing I love in Google’s research, it is image analysis. The automatic suggestions of Google Photos or …

4 use-cases for Sankey Charts

From understanding flow to a quick trick to replace machine learning. (Photo by Solen Feyissa on Unsplash + image by author.) Sankey charts have become one of the important visualisation techniques for advanced analytics in recent times. They have both characteristics of any awesome visualisation: 1. they can look visually stunning; 2. they give …

An Introduction to K-Nearest Neighbours Algorithm

Step 1: Possible k values
possible_k = [1, 3, 5, 7, 9, 11]
Step 2: Finding the accuracy score and MSE for each k value
Calculating the accuracy score for each k value and appending it to the list “ac_scores”:
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

ac_scores = []
for k in possible_k:
    # Distance-weighted k-NN with Euclidean distance
    knn = KNeighborsClassifier(n_neighbors=k, weights='distance', metric='euclidean')
    knn.fit(x_train, y_train)
    y_pred = knn.predict(x_test)
    scores = accuracy_score(y_test, y_pred)
    ac_scores.append(scores)
print("Accuracy Scores :", ac_scores)
Output:
Accuracy Scores : [0.8333333333333334, 1.0, 1.0, 0.8333333333333334, 0.8333333333333334, 0.8333333333333334]
Step 3: Calculate Error
Error …
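For intuition about what `weights='distance'` and `metric='euclidean'` do, here is a small from-scratch distance-weighted k-NN classifier run on made-up 2D points with the same accuracy loop. It is a sketch, not scikit-learn's implementation; it lets exact matches (distance 0) decide the vote on their own, which mirrors how scikit-learn treats zero distances.

```python
from collections import defaultdict
from math import dist


def knn_predict(X_train, y_train, x, k, eps=1e-12):
    """Distance-weighted k-NN vote using Euclidean distance."""
    neighbours = sorted(
        ((dist(p, x), label) for p, label in zip(X_train, y_train)),
        key=lambda pair: pair[0],
    )[:k]
    # If the query coincides with training points, those points alone decide the vote.
    exact = [label for d, label in neighbours if d < eps]
    if exact:
        return max(set(exact), key=exact.count)
    votes = defaultdict(float)
    for d, label in neighbours:
        votes[label] += 1.0 / d  # closer neighbours get larger weights
    return max(votes, key=votes.get)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, well-separated toy data.
X_train = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
y_train = ["a", "a", "a", "b", "b", "b"]
X_test = [(1.5, 1.5), (6.5, 6.5), (2, 2)]
y_test = ["a", "b", "a"]

possible_k = [1, 3, 5]
ac_scores = []
for k in possible_k:
    y_pred = [knn_predict(X_train, y_train, x, k) for x in X_test]
    ac_scores.append(accuracy(y_test, y_pred))
print("Accuracy Scores :", ac_scores)
```

With distance weighting, even when k grows large enough to include points from the other cluster, their small 1/d weights rarely overturn the vote of the close neighbours.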

Google Cloud Healthcare API

Learn how this can accelerate AI solutions to benefit modern medicine. (Baby armadillo; image licensed to author.) Nowhere are data security, accuracy, and time-to-insight more critical than in the world of healthcare. Rapid deployment of AI is enabling healthcare specialists worldwide to deliver a better service. From AI analysis of MRI scans to aid accurate …

Problems in AI Safety Explained with a Pizza Robot

Problems in AI Safety — Deliciously Explained. Who doesn’t love pizza? Nothing beats the intense smoky flavour and the seasoned crust of a perfect pepperoni pizza ordered on a lazy Friday night. If you too share a passion for pizza and a love for machine learning, the idea of training a robot to make pizza …

What programming language should business people learn?

Comparing four programming languages on installation, debugging, tabular data calculation, and more to see which one business people should learn. (Image by unsplash.com.) The most common data in business work is tabular data, such as order records, personnel information, and sales contracts, which is called structured data in professional terms. Excel is the most commonly used …

Image Data Generators in Keras

How to effectively and efficiently use data generators in Keras for computer vision applications of deep learning. I have worked as an academic researcher and am currently working as a research engineer in industry. What my experience in both of these roles has taught me so far is that one cannot overemphasize the importance …
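Keras's `ImageDataGenerator` (for example with `rescale`, `horizontal_flip`, and `flow`) hands the model augmented batches one at a time instead of loading and transforming everything up front. Below is the same idea as a plain Python generator in NumPy, assuming images in `(n, height, width, channels)` layout; it illustrates the concept and is not Keras's API.

```python
import numpy as np


def image_batches(images, labels, batch_size, rng, augment=True):
    """Yield (batch_images, batch_labels) forever, reshuffling at the start of every epoch.

    Assumes images has shape (n, height, width, channels) with pixel values in 0..255.
    """
    images = np.asarray(images)
    labels = np.asarray(labels)
    n = len(images)
    while True:
        order = rng.permutation(n)  # a fresh random order for each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            # Only this batch is converted to float and rescaled to [0, 1].
            batch = images[idx].astype("float32") / 255.0
            if augment:
                # Randomly mirror roughly half of the batch along the width axis.
                flip = rng.random(len(idx)) < 0.5
                batch[flip] = batch[flip][:, :, ::-1, :]
            yield batch, labels[idx]


rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(10, 8, 8, 3))  # ten hypothetical 8x8 RGB images
labels = np.arange(10)
gen = image_batches(images, labels, batch_size=4, rng=rng)
epoch = [next(gen) for _ in range(3)]  # batches of 4 + 4 + 2 cover one epoch
```

Because only one batch is materialized at a time, memory use stays roughly constant regardless of dataset size, which is the main reason to prefer generators for large image collections.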

Named Tuples: A Little Known Machine Learning Helper

Keeping track of multiple variables can become a bit of a nightmare during machine learning development. You will often find that data travels through several functions, each of which may need to know certain settings to do its job correctly. (Photo by Glenn Carstens-Peters on Unsplash.) Often you find that you need to store some information …
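A small illustration of the idea with `collections.namedtuple`: bundle related settings into one immutable object whose fields are read by name. The field names and values here are hypothetical.

```python
from collections import namedtuple

# Hypothetical training settings; `defaults` applies to the rightmost field(s), so seed defaults to 42.
TrainConfig = namedtuple(
    "TrainConfig", ["learning_rate", "batch_size", "epochs", "seed"], defaults=[42]
)


def describe(config):
    """Fields are accessed by name, so this does not depend on argument order."""
    return f"lr={config.learning_rate}, batch={config.batch_size}, epochs={config.epochs}"


config = TrainConfig(learning_rate=1e-3, batch_size=32, epochs=10)
# _replace returns a new tuple; the original config is unchanged (namedtuples are immutable).
bigger_batch = config._replace(batch_size=64)
print(describe(config))
```

Passing a single `config` through several functions keeps their signatures short, and immutability means no function can silently change a setting another function relies on.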

Trading Wind Energy: Wind Energy Forecasting Model based on Deep Learning

Build a profitable wind energy demand forecasting model for energy traders based on deep learning. (Photo by Rabih Shasha on Unsplash.) Creating a steady supply of energy is always vital because modern society depends on it. That is why predictable energy sources, such as fossil fuels or nuclear power, are still favored. However, …

Deploying your Dash App to Heroku — THE MAGICAL GUIDE

(Image by author, inspired by Charlie the Unicorn and Gunicorn.) So you have your Dash app running on your local machine, and you’re finally ready to share it with the world on a public site. The problem is that words like Git, Flask, Gunicorn, and Heroku sound like strange mythical creatures, even after a few …

Transformers & Numeracy (Pt 1): Calculators for natural language

What is numeracy in NLP, and why does it matter? (Original image from [Chen, et al. 2020]; DeepArt image by DeepArt.) We humans never really liked math. It’s why we built computers in the first place. One of my first maths teachers once said: “Science & invention, in most cases, is driven by laziness. We don’t …

Type I and Type II Errors in COVID-19 Serology Testing

Using R to build confusion matrices and functions and to determine the efficacy of serology test results. Currently, the US (and other parts of the world) is experiencing another surge in COVID-19 cases. Since the beginning of the pandemic, testing has been a huge topic of conversation. I personally have not been tested for the virus. I …

How To Deal with Duplicate Entries Using SQL

Duplicates are a recurring problem for any database user. There are several reasons why some duplicates may appear in a data set, and a sanity check is often necessary before any analysis can be conducted properly. This is why I want to share with you some lessons learned from my experience in dealing with duplicates …
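Two common SQL patterns, run here through Python's built-in `sqlite3` on a hypothetical `customers` table: find duplicate combinations with `GROUP BY` plus `HAVING COUNT(*) > 1`, and delete the extras while keeping one row per group. The delete relies on SQLite's implicit `rowid`; other databases typically need a primary key or a window function such as `ROW_NUMBER()` instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (name TEXT, email TEXT);
    INSERT INTO customers VALUES
        ('Ana', 'ana@example.com'),
        ('Ben', 'ben@example.com'),
        ('Ana', 'ana@example.com'),
        ('Ana', 'ana@example.com');
""")

# 1) Find duplicated (name, email) combinations and how often each occurs.
dupes = conn.execute("""
    SELECT name, email, COUNT(*) AS n
    FROM customers
    GROUP BY name, email
    HAVING COUNT(*) > 1
""").fetchall()

# 2) Delete duplicates, keeping the first inserted row (lowest rowid) of each group.
conn.execute("""
    DELETE FROM customers
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM customers GROUP BY name, email
    )
""")
remaining = conn.execute("SELECT name, email FROM customers ORDER BY rowid").fetchall()
```

Running the `SELECT ... HAVING` query before deleting anything is a cheap sanity check that the columns you group by really define what "duplicate" means for your data.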

Using GANs to generate realistic images using Keras and the CIFAR10 Dataset

Small, but nonetheless realistic. (Photo by Cristofer Jeschke on Unsplash.) GANs are one of the most promising new algorithms in the field of machine learning, with uses ranging from detecting glaucomatous images to reconstructing an image of a person’s face after listening to their voice. I wanted to try GANs out for myself, so I …

4 ways Data Scientists fool us

Data Science is great. The idea of analyzing data for decision making has been around for many years, but the popularity of data science has exploded along with the FAANG companies’ growth in recent years. No matter your job title, experience level, or industry, I am confident that you will encounter solutions or products that …

Don’t Use Recursion In Python Any More

Python Closure — A Pythonic technique you must know. I used to be the kind of programmer who loves recursive functions, simply because they are very cool and can be used to show off my programming skills and intelligence. However, in most circumstances, recursive functions have a very high complexity that we should avoid …
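The excerpt doesn't show the author's example, so here is a sketch assuming the familiar Fibonacci case. A closure keeps `a` and `b` in its enclosing scope between calls, so each new number costs a single addition, whereas the naive recursive version makes an exponential number of calls.

```python
def fibonacci_generator():
    """Return a closure that produces the next Fibonacci number on each call."""
    a, b = 0, 1

    def next_fib():
        nonlocal a, b  # the state lives in the enclosing scope between calls
        value = a
        a, b = b, a + b
        return value

    return next_fib


def fib_recursive(n):
    """Naive recursion for comparison: the number of calls grows exponentially with n."""
    return n if n < 2 else fib_recursive(n - 1) + fib_recursive(n - 2)


next_fib = fibonacci_generator()
first_ten = [next_fib() for _ in range(10)]
```

Each call to `fibonacci_generator()` creates an independent counter, so two sequences can be advanced separately without any global state.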

Building a Custom Semantic Segmentation Model

Using your own data to create a robust computer vision model. Following on from my previous post here, I wanted to see how feasible it would be to reliably detect and segment a Futoshiki puzzle grid from an image without using a clunky capture grid. It works surprisingly well even when trained on a tiny …

Introduction to Graph Mining and Analytics

Hi there. In this post, I am going to share some interesting stories/applications about graph mining and analytics. Not everyone, not even every data scientist, works with graph-related problems/tools every day. So the first question one might ask is: what is a graph? Second: why are graph analytics and algorithms important to know? Third: what are some applications …
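To make the first question concrete: a graph is a set of nodes connected by edges. A minimal Python sketch, using a hypothetical four-person friendship network, that stores the graph as an adjacency list and computes two basic quantities, node degree and breadth-first hop distance:

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Undirected graph as an adjacency list: node -> set of neighbours."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def bfs_distances(graph, source):
    """Number of hops from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


edges = [("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Dave"), ("Alice", "Carol")]
g = build_graph(edges)
degrees = {node: len(nbrs) for node, nbrs in g.items()}  # Carol is the best-connected node
hops = bfs_distances(g, "Alice")
```

Degree is the simplest centrality measure, and hop distance underlies many graph analytics such as "friends of friends" recommendations.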

10 Magical facts about Python

Knowing facts is important. Python is a general-purpose programming language. It is very easy to learn; its easy syntax and readability are some of the reasons why developers are switching to Python from other programming languages. We can use Python as both an object-oriented and a procedural language. It is open source and has tons …

Learning to Play CartPole and LunarLander with Proximal Policy Optimization

Implementing PPO from scratch with PyTorch. In this post, we will train an RL agent to play two control-based games, CartPole and LunarLander. Our agent will be trained using an algorithm called Proximal Policy Optimization. We will implement this approach from scratch using PyTorch and OpenAI Gym. This post is based on the following paper: Gym: import …
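The heart of PPO is its clipped surrogate objective: with probability ratio r = π_new(a|s) / π_old(a|s) and advantage estimate A, each sample contributes min(r·A, clip(r, 1 − ε, 1 + ε)·A), and the loss to minimize is the negative batch mean. The article builds this in PyTorch; below is the same formula in NumPy as a sketch (in PyTorch the corresponding operations would be `torch.exp`, `torch.clamp`, and `torch.min`). The ε = 0.2 default is a common choice, not necessarily the article's.

```python
import numpy as np


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Negative PPO-clip surrogate objective averaged over a batch (to be minimized)."""
    # Probability ratio pi_new / pi_old, computed from log-probabilities for stability.
    ratio = np.exp(np.asarray(new_log_probs, dtype=float) - np.asarray(old_log_probs, dtype=float))
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # Pessimistic bound: take the smaller objective per sample, then negate for minimization.
    return -np.mean(np.minimum(unclipped, clipped))


# Positive advantages: ratios 1.2 and 1.8; the second is capped at 1 + eps = 1.2.
loss_pos = ppo_clip_loss(np.log([0.6, 0.9]), np.log([0.5, 0.5]), [1.0, 1.0])
# Negative advantage with ratio 0.2: the clipped term (0.8 * -1) is the smaller one and is kept.
loss_neg = ppo_clip_loss(np.log([0.1]), np.log([0.5]), [-1.0])
```

The clipping removes any incentive to move the new policy's probabilities far from the old ones in a single update, which is what lets PPO take several gradient steps on the same batch of experience.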

Spark

I have noticed that whenever I talk about Spark, the first thing that comes to listeners’ minds is how similar to or different it is from Big Data and Hadoop. So, let’s first understand how Spark is different from Hadoop. Spark is not Hadoop. A common misconception is that Apache Spark is just a component of Hadoop. …

15 Topics to Consider as You Review Code in Data Science

As I host these meetings, I share my screen with the developer’s code, and we walk through the work line by line. For our team, it is common for a project to require multiple pull requests in one or more repositories while it is being worked on. Therefore, these discussions allow us to discuss whether …