What is an OCR?

A basic theoretical overview of the workings of an Optical Character Recognition system. Source: investintech.com. The necessity of digitisation is rapidly increasing in the modern era. Due to the growth of information and communication technologies (ICT) and the wide availability of handheld devices, people often prefer digitized content over printed materials, including books and … Read more

Data wrangling and supervised learning in Python: Predicting Ebola outbreaks in Sierra Leone

Supervised learning is one of the most widely used forms of machine learning in the world. This article guides you through some of the most basic steps needed to build a model: importing your data, looking at it, putting it in a consistent format, using a sample from the data to train and test an … Read more

Online shopping gets more personal with Recommendations AI

In addition to making it easier to get started, we’ve also been collaborating with the Google Brain and Research teams to push the boundary of what’s possible for recommendation systems. As a result, our models can scale to support massive catalogs of tens of millions of items and ensure that your customers have the opportunity … Read more

Deep Reinforcement Learning for Video Games Made Easy

Deep Q-Networks have revolutionized the field of Deep Reinforcement Learning, but the technical prerequisites for easy experimentation have barred newcomers until now… (Image: Atari Pong using DQN agent.) In this post, we will investigate how easily we can train a Deep Q-Network (DQN) agent (Mnih et al., 2015) for Atari 2600 games using the Google reinforcement … Read more

Intuition behind Residual Neural Networks

Learn how Residual Networks work and how they are naturally derived. Deeper is better? (Photo by Riccardo Pelati on Unsplash) Deep Neural Networks, “deep” because of their large number of layers, have come a long way in a lot of Machine Learning tasks. But how deep? Let’s see the popular case of Image Classification: AlexNet popularized … Read more

The Surprising Pattern of Food and Service Deserts in Victoria

Access to groceries, banks and other basic services in Victoria varies with income, but not the way you may think. Image by Jon Tyson via Unsplash.com The range and accessibility of basic services such as supermarkets and banks varies along with average income between neighboring postcodes in Victoria, Australia. However, unlike the US where “banking … Read more

GPT-3 Explained in Under 2 Minutes

So, you’ve seen some amazing GPT-3 demos on Twitter (machine-made Op-Eds, poems, articles, even working code). But what’s going on under the hood of this incredible model? Here’s a (brief!) look inside. GPT-3 is a neural-network-powered language model. A language model is a model that predicts the likelihood of a sentence existing in the world. … Read more

Archive Existing RDS Files

Introduction When working on data science projects in R, exporting internal … Read more

Estimating the risks of partying during a pandemic

There is no doubt that, every now and then, one ought to … Read more

Innovate in Azure with confidence

As the world navigates through the pandemic, it’s inspiring to see companies across every industry innovate to rethink their operations, engage with customers in new ways, and keep their employees safe. When it comes to innovating in the cloud, customers tell us that they need a platform that enables them to stand up solutions quickly, … Read more

Enabling customers for success on Azure

The pandemic continues to test business principles, models, and strategies organizations once thought to be bedrock truths of business. The COVID-19 crisis has challenged everything, from leadership principles, financial models, operations, and sales process, to technology decisions and platform strategies. Organizations have been forced to quickly adapt to maintain efficient operations in these difficult times. … Read more

Migrate to the cloud with confidence

Organizations today are changing how they run their businesses to ensure safety and efficiency. As we work closely with our customers, their top priorities include optimizing business costs, scaling for a remote workforce, and ensuring business continuity. As a result, cloud migration remains a priority and partners play a critical role. To support your cloud … Read more

Create a scatter plot with ggplot

Make your first steps with the ggplot2 package to create a scatter … Read more

Quine with R

A quine is a self-reproducing function or a computer program that will … Read more

Detecting data leakage in ML pipelines using NANs and complex numbers

An amazingly simple way to detect data leakage. Data leakage in machine learning pipelines can wreak havoc on your model. In this post, I’m going to share an amazingly simple way to detect data leakage using NaNs and complex numbers while treating your ML pipeline as a black box. I’ll talk very briefly about what … Read more
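The excerpt does not spell out the method, so the following is only a minimal sketch of one plausible reading of the NaN idea: NaN propagates through arithmetic, so if every test value is replaced with NaN and the transformed training data comes back contaminated, test data must have flowed into the training side. The two pipelines and the function `leaks_test_into_train` are hypothetical names invented for this illustration, and the complex-number variant is not shown.

```python
import math

def leaky_pipeline(train, test):
    """Hypothetical pipeline with a bug: centers both sets on the mean of train + test."""
    combined = train + test
    mean = sum(combined) / len(combined)
    return [x - mean for x in train], [x - mean for x in test]

def clean_pipeline(train, test):
    """Hypothetical pipeline that centers both sets on the training mean only."""
    mean = sum(train) / len(train)
    return [x - mean for x in train], [x - mean for x in test]

def leaks_test_into_train(pipeline, train, test):
    """Treat the pipeline as a black box: poison the test set with NaN
    and report whether any transformed training value became NaN."""
    poisoned_test = [math.nan] * len(test)
    train_out, _ = pipeline(train, poisoned_test)
    return any(math.isnan(v) for v in train_out)

train = [1.0, 2.0, 3.0]   # made-up data
test = [10.0, 20.0]

print(leaks_test_into_train(leaky_pipeline, train, test))
print(leaks_test_into_train(clean_pipeline, train, test))
```

A real pipeline may impute or reject NaN values, in which case a probe like this is inconclusive and not proof that the pipeline is clean.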

Modern CI/CD Pipeline: Git Actions with AWS Lambda Serverless Python Functions and API Gateway

Modernizing web application development and deployment Photo by Morning Brew on Unsplash Cloud is here to stay and more and more developers are seeking ways to effectively incorporate the cloud. Whether you are a startup recognizing limitations of your on-premise hardware and local machines or a large enterprise curious about how to slowly offload on-prem … Read more

Train a Neural Network to classify images and OpenVINO CPU inferencing in 10mins!

Minimal Setup / Transfer Learning / Quick Optimization. Teachable Machine 2.0 Image Project to Intel OpenVINO Toolkit. There are tons of resources out there on simplified training and optimized pre-trained inferencing models. However, training something custom to optimize performance on readily available hardware with minimal effort still seemed far-fetched! In this article we will … Read more

7 Underrated Channels to Follow on YouTube

Photo by Sara Kurfeß on Unsplash Krish Naik is a Lead Data Scientist, pioneering in Machine Learning, Deep Learning, and Computer Vision. He is the complete package. He explains each and every topic with many real-world scenarios right from theoretical knowledge to the practical aspect of it. He has a pool of projects in his … Read more

Cross-validation using KNN

This is the 3rd article in the KNN series. In case you haven’t read the first two parts, I suggest you go through them first: Part-1, Part-2. In this article, we will understand what cross-validation is, why it’s needed, and what k-fold cross-validation is. In order to better understand the need for cross-validation, let me … Read more
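As a rough illustration of k-fold cross-validation with a KNN classifier, here is a minimal pure-Python sketch. It is not the article's code: the helper names (`k_fold_indices`, `knn_predict`, `cross_val_accuracy`) and the toy data are assumptions made for this example. It also assumes the data is already in a sensible order and that there are at least as many samples as folds.

```python
from collections import Counter

def k_fold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous, near-equal folds."""
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points nearest to x
    (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def cross_val_accuracy(X, y, k_neighbors, n_folds):
    """Average held-out accuracy: each fold is used once for validation
    while the remaining folds form the training set."""
    scores = []
    for fold in k_fold_indices(len(X), n_folds):
        held_out = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k_neighbors) == y[i] for i in fold
        )
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)

# Made-up data: two well-separated clusters, interleaved so every fold sees both.
X = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9),
     (0.1, 0.3), (4.9, 5.0), (0.3, 0.2), (5.1, 5.3)]
y = ["a", "b", "a", "b", "a", "b", "a", "b"]

print(cross_val_accuracy(X, y, k_neighbors=3, n_folds=4))
```

In practice the samples are usually shuffled before splitting, and the same loop can be repeated for several values of `k_neighbors` to compare them on held-out data.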

Latest cool features of Matplotlib

A much less verbose way to generate subplots, which allows you to visually lay out your axes in a semantic manner, is introduced via subplot_mosaic(). Earlier, you had to use comparatively more verbose methods of subplots() or GridSpec. You can, moreover, name your axes as you like. For example, to generate the grid shown below, you can now … Read more

COVID Fake News Detection with a Very Simple Logistic Regression

Natural Language Processing, NLP, Scikit Learn. This time, we are going to create a simple logistic regression model to classify COVID news as either true or fake, using the data I collected a while ago. The process is surprisingly simple and easy. We will clean and pre-process the text data, perform feature extraction using NLTK … Read more

Predicting Dengue by Barangay in Quezon City, Manila

(Quezon City, Philippines. Image from https://www.zocalopublicsquare.org/2020/05/07/letter-from-quezon-city-philippines-coronavirus-covid-19/ideas/dispatches/) Dengue fever is a longstanding problem in the Philippines. With tropical weather and a long rainy season, the Philippines makes a good breeding ground for mosquitoes that carry the virus. Unfortunately, there is no cure for dengue, and for the unlucky few bitten with a bad strain, it can … Read more

5 Lesser-Known Seaborn Plots Most People Don’t Know

Boxenplots. Boxplots are notoriously bad visualizations because they hide distributions behind a few potentially misleading statistical summaries of the data. While boxplots can be appropriate renderings of data in some cases, the five summary values used to draw them are simply not enough, especially with large distributions. Source: Autodesk Research. Image free to share. … Read more

Optimisation of a Poisson survival model using Optimx in R

In this blog post, we will fit a Poisson regression … Read more

‘I’ve been waiting for a guide to come and take me by the hand’: Ridgeline plots with {ggridges}

I really like ridgeline plots, but only recently have I learned how to … Read more

3 Probabilistic Frameworks You should know | The Bayesian Toolkit

Build better Data Science workflows with probabilistic programming languages and counter the shortcomings of classical ML. The tools to build, train and tune your probabilistic models. Photo by Patryk Grądys on Unsplash. We should always aim to create better Data Science workflows. But in order to achieve that, we should find out what is lacking. Classical … Read more

Supercharging Hyperparameter Tuning with Dask

Hyperparameter tuning is a crucial, and often painful, part of building machine learning models. Squeezing out each bit of performance from your model may mean the difference of millions of dollars in ad revenue, or life-and-death for patients in healthcare models. Even if your model takes one minute to train, you can end up waiting … Read more

Statistical Measures of Central Tendency

In statistics, measures of central tendency are a set of “middle” values representative of the data points. Central tendency describes the distribution of data by focusing on the central location around which all other data are clustered. It is the opposite of dispersion, which measures how far the observations are scattered with respect to the central … Read more
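To make the contrast concrete, here is a small sketch using Python's standard `statistics` module. The sample values and the helper names `central_tendency` and `spread` are made up for illustration. The first function reports three common "middle" values, and the second reports one simple measure of dispersion.

```python
import statistics

def central_tendency(values):
    """Return the mean, median, and mode(s) of a non-empty list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),  # every most-common value
    }

def spread(values):
    """Population standard deviation: how far values scatter around the mean."""
    return statistics.pstdev(values)

sample = [2, 3, 3, 5, 7, 10]  # made-up data
summary = central_tendency(sample)
print(summary)
print(spread(sample))
```

Two datasets can share the same mean and median while having very different spreads, which is why the two kinds of measure are usually reported together.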

Monte Carlo Methods

Exploration-Exploitation Dilemma. In this new post of the “Deep Reinforcement Learning Explained” series, we will introduce the Monte Carlo Methods, another of the classical methods of Reinforcement Learning along with Dynamic Programming introduced in the first part of this series and Temporal Difference Learning that we will introduce in the following post. We will also … Read more

le compte est bon

The Riddler asks how to derive 24 from (1,2,3,8), with … Read more

How Medical Ultrasound May Become the Preferred Imaging Method with AI

Why AI and Ultrasound are a Great Match Photo by Mick Haupt on Unsplash Diagnostic ultrasound is a popular imaging method used for a variety of screening and diagnostic procedures such as pregnancy monitoring, thyroid screening, blood flow evaluation or breast cancer detection. Most of these examinations are performed by highly-trained clinicians specializing in these … Read more

AWS X-Ray .NET Auto-Instrumentation Agent is now available in beta

With the launch of the auto-instrumentation agent, you can trace .NET and .NET Core applications targeting Internet Information Services (IIS) with minimum configuration changes and no code changes to your existing application. Using the .NET auto-instrumentation agent, you can enable tracing support for Entity Framework, Entity Framework Core, and SqlClient for .NET Core applications. Using … Read more

Coronavirus: A Confidence Interval for True Positives

How to use basic statistics to create new mathematics to study important problems Note from the editors: Towards Data Science is a Medium publication primarily based on the study of data science and machine learning. We are not health professionals or epidemiologists, and the opinions of this article should not be interpreted as professional advice. … Read more

Change Detection using Siamese Networks

How do you measure change with the help of CNNs? All things change over time, and being able to understand and quantify this change can be very useful. For example, observing the infrastructural change in a city or a town over years can help measure its economic prosperity, face changes can reveal if you’re growing … Read more

Traffic Director and gRPC—proxyless services for your service mesh

We fully expect that customers will run service meshes that include both deployment models. We’ve even made it possible for a single gRPC client to call some services via the proxyless route and others via a sidecar proxy. When to deploy Traffic Director with proxyless gRPC services: We see three main use cases for the proxyless … Read more

Not spirited away: keeping notes in a distracting working environment

A few life hacks for data scientists to stay productive in a turbulent working environment. Does not include quitting the job. Mainly Jupyter-based, but works with other data science tools, too. I wanted to summarize my experiments with making notes, writing to-do lists, and keeping working diaries to fight distractions. My first serious job happened in a … Read more

Rethinking application modernization for CIOs

The current global crisis has only reinforced what was already true for many IT organizations—that they must increase agility and accelerate innovation to better serve customers and prevent future disruptions. But for many, maintenance of legacy IT systems has inhibited change and consumed disproportionate amounts of budget. In fact, a recent McKinsey study of enterprises … Read more

Tutorial: Better Blog Post Analysis with googleAnalyticsR

In my previous role as a marketing data analyst for a blogging company, one of my most important tasks was to track how blog posts performed. On the surface, it's a fairly straightforward goal. With Google Analytics, you can quickly get just about any metric you need for your blog posts, for any date range.  … Read more

Identifying hidden trends in news stories using hierarchical clustering

Learn how to use a common hierarchical clustering algorithm called agglomerative clustering to find new topic clusters in recent news articles. For data scientists, text analytics on news stories has always been important from both a learning and a practical perspective, since it gives us a bulk data corpus to train text classification, sentiment analysis, … Read more

Patchwork — The Next Generation of ggplots

Extending the versatility of ggplot2 even further. Photo by Nicolas Prieto on Unsplash. For Data Visualization in R, ggplot2 has been the go-to package to generate awesome, publication-quality plots. Its layered approach enables us to start with a simple visual foundation and keep adding embellishments with each layer. Even the most basic plots with … Read more

Can AI Make You a Better Athlete?

Photo by Clint Bustrillos on Unsplash By Dan Quach — 10 min read A couple of months ago, a friend invited me to join him in an online Dungeons and Dragons campaign. Despite my respectable nerd cred, I’d never actually played DnD. Not that I was opposed to it, in fact, it sounded fun and … Read more

Introducing the Microsoft Azure Well-Architected Framework

As the technology requirements of your business or practice grow and change over time, deploying business-critical applications can increase complexity and overhead substantially. To help manage this ever-growing complexity, we are pleased to announce the introduction of the Microsoft Azure Well-Architected Framework. Following industry standards and terms, the Azure Well-Architected Framework provides a set of … Read more

17 Useful Ruby String Methods to Clean and Format Your Data

1. Iterate over each character of a String
Often we need to loop through strings to process the characters. For example, you may want to print all the vowels.

    str = "abcdeU"
    temp = ""
    str.each_char do |char|
      puts char if ['a','e','i','o','u'].include? char.downcase
    end
    # a
    # e
    # U

We can add with_index to get the position of the characters. str … Read more