Semi-supervised Anomaly Detection using Auto Encoders

A convolutional auto encoder based approach for semi-supervised anomaly detection in images. In this article, I’ll be discussing a paper [1] that proposes an AutoEncoder based approach for the task of semi-supervised anomaly detection. If you want to look at the GitHub repository link, results and conclusion directly, please scroll to the bottom of the … Read more Semi-supervised Anomaly Detection using Auto Encoders

Data Preprocessing with Python Pandas — Part 3 Normalisation

Image by mohamed Hassan from Pixabay This tutorial explains how to preprocess data using the Pandas library. Preprocessing is the process of doing a pre-analysis of data, in order to transform them into a standard and normalised format. Preprocessing involves the following aspects: missing values, data formatting, data normalisation, data standardisation, and data binning. In this … Read more Data Preprocessing with Python Pandas — Part 3 Normalisation

Non-circular PCA and Neural Decoder

References: [1] N. Even-Chen, D.G. Muratore, S.D. Stavisky, et al. Power-saving design opportunities for wireless intracortical brain computer interfaces. (2020) Nat Biomed Eng. [2] X. Li, T. Adali, and M. Anderson. Noncircular principal component analysis and its application to model selection (2011). IEEE Transactions on Signal Processing, 59(10):4516–4528. [3] David A. Markowitz, Yan T. … Read more Non-circular PCA and Neural Decoder

Aesthetics Within the Computation World- Part 2: Turing Patterns

This article is part 2 of a series of articles that aim to highlight the aesthetics that can be found in the computational world and data universes. It is, as Wolfram puts it, “abstract voyages in the computational world”. Patterns in nature and the computational world In this article, we will discuss patterns that … Read more Aesthetics Within the Computation World- Part 2: Turing Patterns

The Dark Side of the Sexiest Job of the 21st Century

What is it like to be a data scientist? In October 2012, a Harvard Business Review article described data scientist as the sexiest job of the 21st century. This article is not the reason why data science is so popular now, but I’m pretty sure it motivated some people to become data scientists. Before … Read more The Dark Side of the Sexiest Job of the 21st Century

My Top 10 Most Fascinating Data Visualizations From 2020

If there’s one thing I know for sure, EVERYONE loves data visualizations, even if they don’t necessarily like data. Think about how popular Spotify’s 2019 Wrapped was to give an example. Wasting no time, I wanted to share with you the ten most fascinating data visualizations that I saw this year and why they’re so … Read more My Top 10 Most Fascinating Data Visualizations From 2020

Why You Should Store Your Survey Data in a Graph Database

Image by Author After having spent many months and possibly millions of dollars collecting data through a survey or census, researchers are grappling with the problem of understanding and cleaning the data. Representing them as a graph in a graph database is a novel approach to achieve this. Here I’ll show the benefits of having … Read more Why You Should Store Your Survey Data in a Graph Database

Wyatt Earp Effect and its Consequences for Data Analytics and Science

What do gunslingers like Wyatt Earp have in common with Data Analytics and Science you might ask yourself — it’s about small probabilities (to stay alive as a gunman) and how it can result in selection bias. Photo by Steve on Pexels What on Earth is the Wyatt Earp Effect? The Wyatt-Earp effect or the … Read more Wyatt Earp Effect and its Consequences for Data Analytics and Science

Harnessing the power of transfer learning for medical image classification

image source — https://unsplash.com/@bacila_vlad This is my first Medium article! (Yay!!!) The project is based on my capstone project for my master’s in data science. I had a lot of fun and I learned so much. I would love to hear your feedback on this project if you have some free time. You can find … Read more Harnessing the power of transfer learning for medical image classification

A Beginners Guide to Feature Engineering with QGIS

Install QGIS QGIS can be painlessly installed from their website here. At the time of writing I’m using version 3.16.1 so expect things to look slightly different in later versions. Ground Yourself! When you first open QGIS you’ll be faced with a blindingly white background with tons of tiny buttons on the top and sides. … Read more A Beginners Guide to Feature Engineering with QGIS

Simple Yet Effective Data Preprocessing Toolbox

Binned Average Monthly Hours (Left: Fixed width cut, Right: quartile cut) Occasionally there are many null values, which might cause errors during one-hot encoding or machine learning operations. Therefore, we might want to remove or impute the null values in the specified features. There are many strategies to impute values such as median, mean, … Read more Simple Yet Effective Data Preprocessing Toolbox

MARS: Multivariate Adaptive Regression Splines — How to Improve on Linear Regression?

Machine Learning A visual explanation of the MARS algorithm with Python examples and comparison to linear regression Model prediction comparison between MARS and Linear Regression. Image by author. Machine Learning is making huge leaps forward, with an increasing number of algorithms enabling us to solve complex real-world problems. This story is part of a deep … Read more MARS: Multivariate Adaptive Regression Splines — How to Improve on Linear Regression?

Machine Learning: Target Feature Label Imbalance Problem and Solutions

Usually, this is something we don’t do. You can sometimes upsample test data anyway just to see if your model works well on minority classes as well. What’s most important to keep in mind is that you don’t want to upsample data and only then do a data split into train and test set. This … Read more Machine Learning: Target Feature Label Imbalance Problem and Solutions
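The ordering described in this excerpt (split first, upsample afterwards) can be sketched in plain Python. This is only an illustration under stated assumptions: the toy rows, the `label` key, the 25% test fraction and the helper name `split_then_upsample` are all hypothetical and do not come from the article.

```python
import random


def split_then_upsample(rows, label_key="label", test_fraction=0.25, seed=0):
    """Split into train/test first, then upsample minority classes in train only."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)

    # 1) Split BEFORE any resampling, so no duplicated row can leak into the test set.
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]

    # 2) Group the training rows by class.
    by_class = {}
    for row in train:
        by_class.setdefault(row[label_key], []).append(row)

    # 3) Resample each class (with replacement) up to the size of the largest class.
    target = max(len(group) for group in by_class.values())
    upsampled = []
    for group in by_class.values():
        upsampled.extend(group)
        upsampled.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(upsampled)
    return upsampled, test


# Hypothetical toy data: 10 minority rows (label 1) and 90 majority rows (label 0).
rows = [{"id": i, "label": 1 if i < 10 else 0} for i in range(100)]
train, test = split_then_upsample(rows)

train_ids = {row["id"] for row in train}
test_ids = {row["id"] for row in test}
print("rows shared between train and test:", len(train_ids & test_ids))
```

Because the duplicates are drawn only from the training portion, the test set keeps the original class balance and never contains a copy of a training row.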

Five Views of AI Risk

Source: Photo by Loic Leray on Unsplash Thirty years from now, will we look back at 2020 as the year when AI discriminated against minority groups, disinformation propagated by special interest groups and aided by AI-based personalization caused political instability, deep fakes and other AI-supported security infringements basically rendered AI untrustworthy and propelled us into … Read more Five Views of AI Risk

Clustering U.S counties by their COVID-19 curves

How can we use unsupervised learning to describe COVID-19 trends in the United States? Analytics has become a part of everyone’s daily routine as a result of the pandemic. Every day we look at curves of new cases, positivity rates, and a range of other metrics that give us insight into our current situation. One … Read more Clustering U.S counties by their COVID-19 curves

Sorting a Dictionary in Python

Using the items() method If we want to get a sorted copy of the entire dictionary, we need to use the dictionary items() method: print(dictionary_of_names.items())  # dict_items([('beth', 37), ('jane', 32), ('john', 41), ('mike', 59)]) Notice how the items() method returns a dict_items object, which looks similar to a list of tuples. This dict_items object is an … Read more Sorting a Dictionary in Python
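As a minimal sketch of where the excerpt is heading, the pairs returned by items() can be passed to sorted() and turned back into a dictionary. The dictionary contents are taken from the printed output above; sorting by value (age) and the name `sorted_by_age` are assumptions, since the excerpt is cut off before it shows the sort itself.

```python
# Dictionary contents as shown in the excerpt's printed output.
dictionary_of_names = {"beth": 37, "jane": 32, "john": 41, "mike": 59}

# items() yields (key, value) pairs; sorted() orders them by the value here.
pairs_by_age = sorted(dictionary_of_names.items(), key=lambda item: item[1])

# dict() keeps insertion order (Python 3.7+), so this is a sorted copy.
sorted_by_age = dict(pairs_by_age)

print(sorted_by_age)
# {'jane': 32, 'beth': 37, 'john': 41, 'mike': 59}
```

The original dictionary is left untouched; passing `reverse=True` to sorted() would give descending order instead.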

How to Train and Deploy Custom AI-Generated Quotes using GPT2, FastAPI, and ReactJS

Generate quotes at will Good quotes help make us stronger. What is truly inspiring about quotes is not their tone or contentedness but how those who share them reflect life experiences that really serve others. I didn’t write the above quote about quotes (Quote-ception; bad pun?), but an AI model I trained did. And it … Read more How to Train and Deploy Custom AI-Generated Quotes using GPT2, FastAPI, and ReactJS

The importance of probabilistic thinking when faced with uncertainty

Why probabilistic thinking can be an extraordinarily powerful and practical tool to approach the unknown and uncontrollable Photo by Loic Leray on Unsplash In a video published in November 2020, physician and philosopher of science Etienne Klein called for engineers to express their opinions and participate in a public debate. As a result, I decided … Read more The importance of probabilistic thinking when faced with uncertainty

The Great British Baking Show: Random Forests Edition

Using a bread vs. cake classifier to deconstruct random forests Photo by Rod Long on Unsplash If only machine learning could be as delicious as it is oftentimes perplexing. While learning data science concepts, I’ve found that it’s a good idea to approach it the same way as learning baking: start simple, grab the basic … Read more The Great British Baking Show: Random Forests Edition

Introducing the Notion API Ruby Gem

The NotionAPI Gem is available for installation here, and the GitHub repository is here. To get started, all you need is to retrieve your token_v2 credentials (open a Notion session in the browser, navigate to cookies, and look for the token_v2 key). After that, you can begin a session with the following code: require "notion_api" @client … Read more Introducing the Notion API Ruby Gem

Building Simulations in Python — A Step by Step Walkthrough

The most basic human population simulator that we could possibly create would be something like this, where the initial population is 50 and we want to see how the population grows to 1,000,000:

    totalPopulation = 50
    growthFactor = 1.00005
    dayCount = 0
    # Every 2 months the population is reported
    while totalPopulation < 1000000:
        totalPopulation *= growthFactor
        # Every 56th …

Read more Building Simulations in Python — A Step by Step Walkthrough
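A runnable version of this loop might look like the sketch below. The starting population, growth factor and target come from the excerpt; the 56-day reporting interval is an assumption read from the truncated comment, and the `reports` list and closed-form check are additions for illustration only.

```python
import math

total_population = 50.0      # initial population (from the excerpt)
growth_factor = 1.00005      # daily growth factor (from the excerpt)
target_population = 1_000_000
report_interval = 56         # assumption: report roughly every 2 months
day_count = 0
reports = []                 # (day, population) snapshots instead of printing each one

while total_population < target_population:
    total_population *= growth_factor
    day_count += 1
    if day_count % report_interval == 0:
        reports.append((day_count, total_population))

# Closed-form check: smallest n with 50 * growth_factor**n >= 1,000,000.
expected_days = math.ceil(
    math.log(target_population / 50.0) / math.log(growth_factor)
)

print(f"Reached {total_population:,.0f} people after {day_count:,} days")
print(f"Closed-form estimate: {expected_days:,} days")
```

Comparing the loop against the closed-form estimate is a cheap sanity check that the simulation compounds growth the way it is meant to.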

Introducing Discrete Random Variable and Expected Value using NASA Data

Now it’s time to address all the questions based on the concept of the random variable presented above. Question 1: “What is the probability he can join the team with a crew size of 4, P(X=4)?” To do so, we can calculate the proportion of times for X = 4 to occur. Such proportion of time … Read more Introducing Discrete Random Variable and Expected Value using NASA Data
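The idea of estimating P(X=4) as a proportion can be sketched as follows. The crew sizes below are made-up placeholder values, not the NASA data used in the article, and the names `crew_sizes`, `pmf` and `expected_value` are hypothetical.

```python
from collections import Counter

# Hypothetical crew sizes for ten missions (placeholder values, NOT the article's data).
crew_sizes = [2, 3, 4, 4, 5, 4, 3, 6, 4, 7]

counts = Counter(crew_sizes)
n_missions = len(crew_sizes)

# Empirical probability mass function: P(X = x) = (missions with x crew) / (all missions).
pmf = {size: count / n_missions for size, count in sorted(counts.items())}

# P(X = 4) is simply the proportion of missions with a crew of four.
p_four = pmf[4]

# Expected value: each outcome weighted by its probability.
expected_value = sum(size * prob for size, prob in pmf.items())

print(f"P(X=4) = {p_four:.2f}")
print(f"E[X]   = {expected_value:.2f}")
```

With these placeholder values, four of the ten missions have a crew of four, so the estimated probability is 0.4.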

Los Angeles Neighborhood Analysis

Location Data Using BeautifulSoup, a Python library used for pulling data out of HTML, we parse the Wikipedia page to get the list of neighborhoods and districts in Los Angeles. Using Google’s Geocoding API, we collect location data such as the latitude and longitude of each neighborhood and store them in a pandas dataframe. DataFrame … Read more Los Angeles Neighborhood Analysis

5 Free Books to Learn Statistics for Data Science

Learn all the statistics you need for data science for free Photo by Daniel Schludi on Unsplash Statistics is a fundamental skill that data scientists use every day. It is the branch of mathematics that allows us to collect, describe, interpret, visualise, and make inferences about data. Data scientists will use it for data analysis, … Read more 5 Free Books to Learn Statistics for Data Science

Open Source Licensing primer for Enterprise AI/ML

Manage licensing risks of Open Source Data Science projects The best AI/ML software today from model development (scikit-learn, TensorFlow, PyTorch) to deployment (Kubeflow, Spark) is Open Source. The below snapshot should give you an idea of the pervasiveness of Open Source Software (OSS) in the Enterprise. OSS enterprise adoption trends While OSS started as a … Read more Open Source Licensing primer for Enterprise AI/ML

NeurIPS 2020 Papers: A Deep Learning Engineer’s Takeaway

MPNet is a hybrid of Masked Language Modeling (MLM) and auto-regressive Permuted Language Modeling (PLM), adopting the strengths of each of its constituents while avoiding their limitations. Masked language modeling, as in BERT-style models, masks out ~15% of the data and tries to predict those masked tokens. As the dependency between the masked tokens is not modeled … Read more NeurIPS 2020 Papers: A Deep Learning Engineer’s Takeaway

Programming Path For Those Failing to Learn Programming

Programming The path from the very beginning to the further high-quality programming practices. Photo by Jefferson Santos on Unsplash Do you want to learn to program like an engineer? From the ground up by focusing on performance and high standards? If you want to learn programming in a month probably this article is not for … Read more Programming Path For Those Failing to Learn Programming

How to Check if our Time Series is Stationary or Not and Why?

To implement the ADF test in Python, we will be using the statsmodels implementation. Statsmodels is a Python module that provides functions and classes for the estimation of many statistical models. The function to perform ADF is called adfuller. First, import the required dependencies. Import the statsmodels module and the adfuller function from the tsa.stattools … Read more How to Check if our Time Series is Stationary or Not and Why?

5 Powerful PyTorch Functions That Every Beginner Should Know

Torch your Data Science world alight with these Tensor related functions “How many Marks did you get in your data science class?” “10 out of tensor” | Photo by Ales Nesetril on Unsplash PyTorch is one of the most sought-after skills when it comes to recruitment for data scientists. For those who don’t know, … Read more 5 Powerful PyTorch Functions That Every Beginner Should Know

Going Serverless: 3-Tier Architectures Made Easy With AWS Lambda

Image by Stephan Seeber on Unsplash A multi-tier architecture is a design pattern that is embraced by millions of developers around the world, mainly because of the way it separates concerns clearly across many layers. The most widely adopted multi-tier application type is the 3-tier architecture. A 3-tier architecture consists mainly of 3 layers. … Read more Going Serverless: 3-Tier Architectures Made Easy With AWS Lambda

Reinforcement Learning Explained Visually (Part 4): Q Learning, step-by-step

The difference, which is the key hallmark of the Q Learning algorithm, is how it updates its estimates. The equation used to make the update in the fourth step is based on the Bellman equation, but if you examine it carefully it uses a slight variation of the formula we had studied earlier. Let’s zoom … Read more Reinforcement Learning Explained Visually (Part 4): Q Learning, step-by-step
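For reference, the textbook form of the Q Learning update is Q(s, a) ← Q(s, a) + α [r + γ max over a' of Q(s', a') − Q(s, a)]. The sketch below implements that standard rule; the article's own notation may differ, and the state names, action names and the `q_update` helper are hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one standard Q Learning update and return the new Q(state, action)."""
    # Target uses the BEST action in the next state, regardless of which
    # action the agent will actually take there.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha towards the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action example; unseen (state, action) pairs default to 0.0.
q = defaultdict(float)
actions = ["left", "right"]
q[("s1", "right")] = 2.0

new_value = q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions)
print(f"Updated Q(s0, right): {new_value:.2f}")
```

Here the target is 1.0 + 0.9 × 2.0 = 2.8, and the estimate moves 10% of the way from 0.0 towards it, giving roughly 0.28.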

Which NBA teams are best at drafting?

Analyzing the last ten NBA drafts to determine who drafts well and who doesn’t Photo by Chensiyuan on Wikimedia Commons Drafting is one of the most crucial elements of a successful long-term franchise. Fueled by a series of great draft picks, the San Antonio Spurs went on to make the playoffs for a record-tying 22 … Read more Which NBA teams are best at drafting?

Keeping Up with Deep Learning — 26 Nov 2020

Deep learning papers, blog posts, Github repos, etc. that I liked this week Photo by Sebastian Pena Lambarri on Unsplash This is the second edition of my weekly update on deep learning. Every Thursday, I’ll release a new batch of research papers, blog posts, Github repos, etc. that I liked over the past week. Links … Read more Keeping Up with Deep Learning — 26 Nov 2020

5 Technical Behaviors I’ve Learned from 2 Years of Data Science and Engineering

Lastly, if these past two years have taught me nothing else, they taught me to take a step back from coding and walk away. When working on an analysis or developing some software or tool, it can be common to keep chugging away on the project. You can often find yourself hours deep into something … Read more 5 Technical Behaviors I’ve Learned from 2 Years of Data Science and Engineering

What to Expect in a Data Analyst Interview

HR interview The first step after your resume passes the initial requirements screen is a call with HR. This call is typically with the recruiter for the position to get a sense of your job experience, provide you details about the position, gauge your interest, and ask about your salary expectations. There are many guides … Read more What to Expect in a Data Analyst Interview

Understanding and Forecasting Customer Lifetime Value (CLTV)

While it is important to measure and track the actual CLTV from the existing customer base, a company also needs to be able to estimate the CLTV for both existing and prospective customers over an extended period. There are a few things you will need to consider while predicting the CLTV: Forecasting Demand When it … Read more Understanding and Forecasting Customer Lifetime Value (CLTV)

A gentle introduction to the mathematics behind A/B testing

Picture by Carlos Muza on Unsplash. A/B testing is a tool for checking whether a certain causal relationship holds. For example, a data scientist working for an e-commerce platform might want to increase the revenue by improving the design of the website. If we assume that the revenue is a function of the proportion of … Read more A gentle introduction to the mathematics behind A/B testing

Interesting AI/ML Articles On Medium This Week (Nov 28)

Read Time: 33 mins Target Audience: Machine Learning Engineers (Mobile focused) Review This article is a transcription of a podcast. For readers who prefer to listen as opposed to read, click here to view the video version. This article details the conversation held between Jeremie Harris and Matthew Stewart on subject matters concerning machine learning … Read more Interesting AI/ML Articles On Medium This Week (Nov 28)

How to combine Alteryx and BigQuery

A short overview of both tools and how to integrate them with each other, explained with three typical use cases. Overview Alteryx is known as a platform that combines analysis, data science and process automation. Integration with BigQuery, Google’s powerful Data Warehouse technology, can be realized very easily and used for many … Read more How to combine Alteryx and BigQuery

Amazon CodeGuru Reviewer announces CodeQuality Detector to help manage technical debt and codebase maintainability

This new detector improves code quality by generating recommendations associated with five types of metrics. Metrics include: (1) method source lines of code, which measures the number of lines of code in a method; (2) method cyclomatic complexity, which measures decisions that are made in a method; (3) method fan out, which measures how many … Read more Amazon CodeGuru Reviewer announces CodeQuality Detector to help manage technical debt and codebase maintainability

AWS Systems Manager Change Calendar integrates with Amazon EventBridge to enable automated actions based on calendar state changes

Change Calendar, a capability of Systems Manager, now publishes an event to Amazon EventBridge when it changes state from open to closed and vice versa. You can use the published state change event to automatically start actions such as disabling promotions through your continuous integration and delivery (CI/CD) pipeline, managing access to your fleet, or … Read more AWS Systems Manager Change Calendar integrates with Amazon EventBridge to enable automated actions based on calendar state changes

Basic Linear Programming in Python with PuLP

PuLP is a Python library which can be used to solve linear programming problems. Linear Programming is used to solve optimization problems and has uses in various industries such as Manufacturing, Transportation, Food Diets, etc. Photo by Emile Perron on Unsplash A basic Linear Programming problem is where we are given multiple equations. The value … Read more Basic Linear Programming in Python with PuLP

How to Exclude the Outliers in Pandas DataFrame

And 3 other Pandas Tricks to Process your Data Efficiently Photo by Lisa Kohnen on Unsplash Pandas is a common library for data scientists. There are different ways to process a Pandas DataFrame, but some ways are more efficient than others. This article will provide you with 4 efficient ways to: Assign new columns to a … Read more How to Exclude the Outliers in Pandas DataFrame

From Logistic Regression to Basis Expansions and Splines

Sometimes Linearity is not sufficient Nick Fewings via Unsplash Logistic Regression is a common method used for fitting a binary or categorical response variable. But did you know that if you are not careful, logistic regression can miss out on important features? In this in-depth article, we will use the South African Heart Disease … Read more From Logistic Regression to Basis Expansions and Splines

Porting Assistant for .NET adds support for .NET 5

Porting Assistant for .NET now helps customers migrate their legacy .NET Framework applications to the newly released .NET 5. .NET 5 is a major release with a broad set of features and improvements. With this updated release of Porting Assistant for .NET, customers can analyze and port their .NET Framework applications to either new … Read more Porting Assistant for .NET adds support for .NET 5

Comparing Coffee using Pattern Recognition

Exploring coffee similarities using cupping grades and flavors After trying a variety of coffees from around the world, I have often wondered if coffee grades or flavors could tell me which coffees were similar to each other and which were different. I am particularly interested in whether such comparisons could be used to help determine the … Read more Comparing Coffee using Pattern Recognition