Animations in the time of Coronavirus

The first four months of 2020 have been dominated by the Coronavirus pandemic (aka COVID-19), which has transformed global life in an unprecedented way. Societies and economies struggle to adapt to the new conditions and necessary constraints. A reassuringly large fraction of governments around the world continue to take evidence-based approaches to this crisis that … Read more Animations in the time of Coronavirus

Talking about Data Science Topics with Business-Minded Executives

“Our company is using cutting edge machine learning technology.” Machine Learning vs. Data Analysis or Statistics From Dan Shewan on wordstream.com One big difference between data analysis and machine learning is the questions they seek to answer. In data analysis, you want to know something like what happened to sales at this point in time … Read more Talking about Data Science Topics with Business-Minded Executives

Serendipity: Accuracy’s unpopular best friend in recommenders

It keeps your customers interested. In the earlier example, the music recommender was very accurate — it only recommended music from artistes you liked and previously listened to. A model solely focused on accuracy would do very well in offline evaluation (of the latest week of data). But in the real world, it would suck. … Read more Serendipity: Accuracy’s unpopular best friend in recommenders

How to deploy ML models using Flask + Gunicorn + Nginx + Docker

A template for configuring Flask + Gunicorn + Nginx + Docker with a detailed explanation, that should bring you a bit closer to working with microservices, building MVPs, and so on. Bo-Yi Wu via flickr It might be tricky to develop a good Machine Learning model, but even if one manages to do that, it’s … Read more How to deploy ML models using Flask + Gunicorn + Nginx + Docker

Why we chose AWS over GCP for machine learning

And why we might’ve gotten it wrong Source: Pexels About 10 months ago, a few of us began working on our model serving infrastructure. We wanted to build a tool that would take a trained model and turn it into a production web service, without us having to write glue code or wrangle AWS/Kubernetes for … Read more Why we chose AWS over GCP for machine learning

Targeting Users In Specific Area Using Geofence API

MyReminderRepository.kt You can think of this class as a local server: it serves us everything we need to perform any action on the map, such as adding a reminder, removing a reminder, and maintaining all reminders. The following code snippet creates an object of GeofencingClient, which helps you manipulate the geofences. private … Read more Targeting Users In Specific Area Using Geofence API

Data preprocessing for Machine Learning in Python

Data preprocessing is a crucial step in machine learning and is very important for the accuracy of the model. Raw data often contains noise and missing values, is incomplete, and is sometimes in a format that machine learning models cannot use directly. But what if we use questionable and dirty data? What … Read more Data preprocessing for Machine Learning in Python

Automated Programmatic Website Screenshots in R with {webshot} [Video Tutorial]

In this video tutorial, we explore the R package {webshot} by Winston Chang. This package internally uses PhantomJS to capture screenshots of web pages / websites, Shiny applications, and RMarkdown documents. {webshot} also lets you take a screenshot of a particular viewport or of a section of a website selected by a CSS selector. YouTube: https://youtu.be/oQKwd1cgiq4 Please … Read more Automated Programmatic Website Screenshots in R with {webshot} [Video Tutorial]

AI vs COVID-19. Does it really work?

Figuring out what we can do with the data available and what we can’t Photo by Alissa Eckert, MS, and Dan Higgins, MAMS, on PHIL Contents: 1. Introduction 2. Making our Chest X-ray COVID-19 classifier (2.1. Data preparation, 2.2. Training, 2.3. Results) 3. Does it really work? (3.1. Further analysis, 3.2. Discussion and takeaways) Today everyone knows about the pandemic. Professionals do their … Read more AI vs COVID-19. Does it really work?

Discover, understand and manage your data with Data Catalog, now GA

Technical metadata vs. business metadata Technical metadata refers to metadata that is available in the source system. Technical metadata for a BigQuery table includes table name, table description, column names, column types, column descriptions, creation date, last modification date, and more. For Pub/Sub, technical metadata refers to Pub/Sub topic names and date created. For Cloud Storage … Read more Discover, understand and manage your data with Data Catalog, now GA

10 tips to make your data science code cleaner and more efficient

Concerning the formatting of the code, the structure of Python requires us to be very meticulous in order to format our code properly. Indeed, the number of lines to be skipped varies according to what is declared in the code (do we declare a class, a method, a variable?). Lists should not be too long, … Read more 10 tips to make your data science code cleaner and more efficient

Churn Prediction: A Case study of Sparkify using Apache Spark

Null Elements: Since some users will not have values for certain features, they were captured as NULL. In practice, these null elements represent zero values, because the features are aggregates. Hence, these null elements were replaced with zeros for all columns. Scaling: The range of values for all the features in the dataset show quite … Read more Churn Prediction: A Case study of Sparkify using Apache Spark

Why R? Webinar – Development pipeline for R production – rZYPAD

[This article was first published on http://r-addict.com, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t. April 30th (8:00pm GMT+2) is another date for a webinar at Why R? … Read more Why R? Webinar – Development pipeline for R production – rZYPAD

How to learn machine learning and improve your health at the same time

Notice how what we’ve gone through can be applied to learning almost anything. The most important takeaway is that instead of focusing on what’s right at any given moment (impossible to predict), you’re concentrating on the trend. You’re building the habit of learning (using courses as a foundation) along with the habit of creating (building your … Read more How to learn machine learning and improve your health at the same time

Using analytics to drive informed intuition

I truly believe that a Data and Analytics function has the mandate to enable better decision making within an organization. Very few practitioners would disagree with this argument; however, for us to truly drive our vision, it is important to understand how people make decisions. The human decision-making process is ambiguous, to the extent of … Read more Using analytics to drive informed intuition

Azure Cost Management + Billing updates – April 2020

Whether you’re a new student, thriving startup, or the largest enterprise, you have financial constraints and you need to know what you’re spending, where, and how to plan for the future. Nobody wants a surprise when it comes to the bill, and this is where Azure Cost Management + Billing comes in. We’re always looking … Read more Azure Cost Management + Billing updates – April 2020

Azure Container Registry: Mitigating data exfiltration with dedicated data endpoints

Azure Container Registry announces dedicated data endpoints, enabling tightly scoped client firewall rules to specific registries, minimizing data exfiltration concerns. Pulling content from a registry involves two endpoints: Registry endpoint, often referred to as the login URL, used for authentication and content discovery. A command like docker pull contoso.azurecr.io/hello-world makes a REST request which authenticates and … Read more Azure Container Registry: Mitigating data exfiltration with dedicated data endpoints

Cross Region Restore (CRR) for Azure Virtual Machines using Azure Backup

Today we’re introducing the preview of Cross Region Restore (CRR) for Microsoft Azure Virtual Machines (VMs) support using Microsoft Azure Backup. Azure Backup uses a Recovery Services vault to hold customers’ backup data, which offers both local and geographic redundancy. To ensure high availability of backed up data, Azure Backup defaults storage settings to geo-redundancy. By … Read more Cross Region Restore (CRR) for Azure Virtual Machines using Azure Backup

How to Consume News More Intelligently Using Bayes’ theorem

Base rates, marginal probabilities, sensitivity, and specificity Photo by Markus Spiske on Unsplash When it comes to updating beliefs and making decisions under uncertainty, Bayes’ theorem is just about the best tool available. And yet it is so often relegated to academic textbooks and machine learning applications when it should be bringing us value in … Read more How to Consume News More Intelligently Using Bayes’ theorem
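As a rough illustration of how the terms in this teaser fit together, here is a minimal Python sketch of Bayes' theorem expressed with a base rate, sensitivity, and specificity. The function name `posterior` and the numbers used are hypothetical and chosen only for illustration; they are not taken from the article.

```python
def posterior(base_rate: float, sensitivity: float, specificity: float) -> float:
    """Probability that a claim is true given a positive report.

    base_rate:   prior probability that the claim is true
    sensitivity: P(positive report | claim is true)
    specificity: P(negative report | claim is false)
    """
    true_positive = sensitivity * base_rate
    false_positive = (1.0 - specificity) * (1.0 - base_rate)
    # Marginal probability of seeing a positive report at all.
    marginal = true_positive + false_positive
    return true_positive / marginal


# Hypothetical numbers: a rare event (1% base rate) reported by a source
# that is right 90% of the time in both directions.
print(round(posterior(base_rate=0.01, sensitivity=0.9, specificity=0.9), 3))  # 0.083
```

Even with a fairly reliable source, the low base rate keeps the posterior near 8% in this made-up case, because false positives outnumber true positives.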

Looking Beyond Feature Importance

Read in and split the Data For this analysis, I’ll be doing a random forest regression using the Boston Housing Dataset in the scikit-learn package. There are 13 features in the Boston Housing Dataset; you can read about them here. After we do some preliminary feature selection I’ll break down what the more important features … Read more Looking Beyond Feature Importance

MAD Over MAPE?

Or which forecast accuracy metrics to use? Source: https://www.arymalabs.com/ Many CPG brands across the world would be focusing on keeping tabs on their sales and demand numbers during the Covid-19 pandemic. In my previous article, I covered points on doing Marketing Mix modeling during these testing times. The brands might have already forecasted … Read more MAD Over MAPE?
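For readers unfamiliar with the two metrics in the title, the sketch below computes both in plain Python. It assumes that MAD here means the mean absolute deviation of the forecast error and MAPE the mean absolute percentage error. The function names and the sample figures are hypothetical and do not come from the article.

```python
def mad(actual, forecast):
    """Mean absolute deviation: average absolute error, in the units of the data."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero, since each error is divided by
    the actual value.
    """
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


# Made-up weekly sales and forecasts, for illustration only.
actual = [100.0, 200.0, 50.0]
forecast = [110.0, 180.0, 50.0]

print(mad(actual, forecast))             # 10.0
print(round(mape(actual, forecast), 2))  # 6.67
```

MAD reports error in the same units as sales, while MAPE is scale-free but breaks down when actual values are zero or close to it.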

Tutorial: Poisson regression with CatBoost

How to use Poisson regression and CatBoost to achieve better accuracy on count-based data… and predict the number of likes that a tweet gets. The concept of count-based data; what Poisson regression is and why it is suitable for count-based data; how to build a Poisson regression model with the CatBoost package; how to predict the … Read more Tutorial: Poisson regression with CatBoost

Deploying Deep Learning Models using TensorFlow Serving with Docker and Flask

Generally, the life-cycle of any data science project is comprised of defining the problem statement, collecting and pre-processing data, followed by data analysis and predictive modelling, but the trickiest part of any data science project is the model deployment where we want our model to be consumed by the end users. There are a lot … Read more Deploying Deep Learning Models using TensorFlow Serving with Docker and Flask

Deploying Panel (Holoviz) dashboards using Heroku Container Registry

Deployment Now comes the crux of this post. This was the part of my journey where I couldn’t find much direct help on the internet. The Panel library (part of Holoviz) provides an excellent toolkit for managing data interactions, setting up pipelines, using widgets and deploying dynamic dashboards. Deployment is as easy as marking your … Read more Deploying Panel (Holoviz) dashboards using Heroku Container Registry

Stop Worrying and Create your Deep Learning Server in 30 minutes

I am assuming that you have an AWS account, and you have access to the AWS Console. If not, you might need to sign up for an Amazon AWS account. 1. First of all, we need to go to the Services tab to access the EC2 dashboard. 2. On the EC2 Dashboard, you can start by … Read more Stop Worrying and Create your Deep Learning Server in 30 minutes

Fighting COVID-19 with Open Access and AI

The CORD-19 resource attempts to accelerate scientific discovery and save lives The network of diseases and chemicals associated with Chloroquine, an example of the kinds of insights that can be extracted from CORD-19; this visualization was produced with the CoViz tool from AI2. The urgent phone call from Michael Kratsios (whose august title … Read more Fighting COVID-19 with Open Access and AI

A Practical Guide for Exploratory Data Analysis

Listen to the data, curiously and carefully! Photo by Emma Frances Logan on Unsplash The fuel of each and every machine learning or deep learning model is data. Without data, the models are useless. Before building and training a model, we should try to explore and understand the data at hand. By understanding, I … Read more A Practical Guide for Exploratory Data Analysis

Vignette: Simulating a minimal SPSS dataset from R

What this is about 📖 I will simulate a minimal labelled survey dataset that can be exported as an SPSS (.SAV) file (with full variable and value labels) in R. I will also attempt to fabricate ‘meaningful patterns’ in the dataset so that it can be more effectively used for creating demo examples. image from … Read more Vignette: Simulating a minimal SPSS dataset from R

Highlights of Hugo Code Highlighting

Thanks to a quite overdue update of Hugo on our build system, our website can now harness the full power of Hugo code highlighting for Markdown-based content. What’s code highlighting apart from the reason behind a tongue-twister in this post title? In this post we shall explain how Hugo’s code highlighter, Chroma, helps you prettify your code … Read more Highlights of Hugo Code Highlighting

Address class imbalance easily with Pytorch

Data augmentation in computer vision. Credits for the picture to fastai. What can you do when your model is overfitting your data? This problem often occurs when we are dealing with an imbalanced dataset. If your dataset represents several classes, one of which is much less represented than the others, then it is difficult to … Read more Address class imbalance easily with Pytorch

Amazon EBS increases concurrent snapshot copy limits to 20 snapshots per destination Region

If you require more concurrent copies per destination region than the new limit, you can submit a limit increase request using the AWS Support Center. If your account has an approved limit that is higher than the new limit, you will continue to have the higher limit. To learn more about Amazon EBS limits, please … Read more Amazon EBS increases concurrent snapshot copy limits to 20 snapshots per destination Region

90 second setup challenge: Jupyter + TensorFlow in Google Cloud

Is it possible for data science beginners to get up and running in under 2 minutes? Data science enthusiasts, how fast can you go from zero to Google Cloud Jupyter notebook? Let’s find out! If you’re in the mood to ultra-customize your setup, Google Cloud gives you dizzying granularity. That’s a fabulous thing … Read more 90 second setup challenge: Jupyter + TensorFlow in Google Cloud

Nina and John Speaking at Why R? Webinar Thursday, May 7, 2020

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. Nina Zumel and John Mount will be speaking on advanced … Read more Nina and John Speaking at Why R? Webinar Thursday, May 7, 2020

Which Face is Real?

A generative model aims to learn and understand a dataset’s true distribution and create new data from it using unsupervised learning. These models (such as StyleGAN) have had mixed success, as it is quite difficult to understand the complexities of certain probability distributions. In order to sidestep these roadblocks, the adversarial nets framework was created … Read more Which Face is Real?

Intuitively, How Many Guys Should You Date Before Finding Your Perfect Partner

What is the best strategy in order to find your perfect match? Is the first guy the best? Do we have to date as many guys as possible to find the best? But you can easily understand that these two strategies are risky and not optimal: in the first case, you can regret it later … Read more Intuitively, How Many Guys Should You Date Before Finding Your Perfect Partner

What Has Changed?

SOLUTIONS FOR MICROSOFT POWER PLATFORM A step-by-step guide on the continuous delivery of AI Models, Power Apps, and Flows with Microsoft’s Power Platform and Azure DevOps. Photo by jesse ramirez on Unsplash In one of my recent stories, I’ve explained how to create a no-code AI prediction model using the Microsoft Power Platform to forecast … Read more What Has Changed?

Email Analytics: More than you ever need to know

Hats off to Vicki for sparking the motivation to write this, I’m always open to ideas and suggestions for topics to write about *hinthint* If you came to me in 2010 and told me that email tracking analytics would be the hot stuff again in 2020, I would’ve thought you were crazy. But here I … Read more Email Analytics: More than you ever need to know

You Need ModelOps To Scale

As companies, particularly large organizations, scale up their models as a part of building an enterprise-wide pipeline, there’s an increasing need to operationalize the model development process. Similar to DevOps, models need to be developed, integrated, deployed and monitored. Often, with Enterprise AI initiatives, there are a host of governance considerations such as data integrity, … Read more You Need ModelOps To Scale

Optimize Dataproc costs using VM machine type

Dataproc is a fast, easy-to-use, fully managed cloud service for running managed open source, such as Apache Spark, Apache Presto, and Apache Hadoop clusters, in a simpler, more cost-efficient way. We hear that enterprises are migrating their big data workloads to the cloud to gain cost advantages with per-second pricing, idle cluster deletion, autoscaling, and … Read more Optimize Dataproc costs using VM machine type

Natural Language Processing — Beginner to Advanced (Part-2)

The NLP Project Basic Lexical Processing — preprocessing steps that are a must for textual data before doing any type of text analytics. Photo by Annie Spratt on Unsplash In this part of the series ‘The NLP Project’, we will understand the various preprocessing steps that must be applied before doing any type of text … Read more Natural Language Processing — Beginner to Advanced (Part-2)

Interactive COVID-19 visualizations using Plotly with 4 lines of code

Doing cool things with data! In this age of technology, data is the new oil. Organizations all over the world are transforming their environments, processes and infrastructures to become more data-driven. A major reason is that data analytics and machine learning give organizations visibility into how to run their business better. The push to remote … Read more Interactive COVID-19 visualizations using Plotly with 4 lines of code

Y is for scale_y

[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers]. Yesterday, I talked about scale_x. Today, I’ll continue on that topic, focusing … Read more Y is for scale_y