AI, Machine Learning and Data Science Roundup: January 2020

A roundup of news about Artificial Intelligence, Machine Learning and Data Science. This is an eclectic collection of interesting blog posts, software announcements and data applications from Microsoft and elsewhere that I’ve noted recently. Open Source AI, ML & Data Science News: Pandas 1.0.0 is released, a milestone for the ubiquitous Python data frame package. … Read more

Too big to deploy: How GPT-2 is breaking production

A look at the bottleneck around deploying massive models to production. The most optimistic of us envision a future in which machine learning is capable of human-level tasks: driving our cars, answering our calls, booking our appointments, responding to our emails. Reality, of course, is different. Modern production machine learning has only effectively tackled very tightly … Read more

The right Electric Vehicle for me: a use case for Conjoint Analysis

The lease details are contained in the fields monthly_cost, upfront_cost and term. The remaining fields concern specifications of the electric vehicle (e.g. range, Sedan or SUV). The relative popularity of the vehicles can be found here: Given a dizzying amount of choice, which EV should you purchase? Conjoint Analysis provides a principled way of … Read more

The role of Process Mining in Digital Transformations

Can you transform what you don’t comprehend? Genchi gembutsu is a Japanese term that translates to “go and see.” These are two words that transformation leaders must never forget. In the context of a digital transformation, what genchi gembutsu means is that without analyzing the place where … Read more

Will Streamlit cause the extinction of Flask?

Maybe for Machine Learning (ML) and Deep Learning (DL). For other full-stack applications, probably not! We have yet to encounter one of our Flask-based ML or DL micro-services that cannot be refactored into a Streamlit service. The challenge is to keep Streamlit micro-services small by replacing only 2 to 3 Flask-based micro-services. Extinction … Read more

AWS Backup is now available for Amazon Elastic File System (Amazon EFS) in 4 additional regions

AWS Backup offers a centralized, managed service to back up data across AWS services in the cloud and on premises using Storage Gateway. AWS Backup serves as a single dashboard for backup, restore, and policy-based retention of different AWS resources, including Amazon EBS volumes, Amazon EC2 instances, Amazon RDS databases, Amazon DynamoDB tables, Amazon EFS … Read more

A Quick Introduction to CMIP6

Climate Data Science: how to easily access the next generation of climate models with Python. The Coupled Model Intercomparison Project (CMIP) is a huge international collaborative effort to improve knowledge about climate change and its impacts on the Earth System and on our society. It has been running since the 90s and today we … Read more

Create Your Free Blog Site

Spoiler alert: GitHub. You have to create the content; many other people will do the likes. In case you missed Jeremy Howard’s tweet: I did a deep-dive into @GitHub Pages, and found it’s possible to create a *really* easy way to host your own blog: no code, no terminal, no template syntax. I made “fast_template” … Read more

Identifying and tracking toil using SRE principles

One of the key measures that Google site reliability engineers (SREs) use to verify our effectiveness is how we spend our time day-to-day. We want ample time available for long-term engineering project work, but we’re also responsible for the continued operation of Google’s services, which sometimes requires doing some manual work. We aim for less … Read more

Understanding AdaBoost for Decision Tree

An implementation with R. Decision Trees are popular Machine Learning algorithms used for both regression and classification tasks. Their popularity mainly arises from their interpretability and representability, as they mimic the way the human brain makes decisions. In my previous article, I introduced some ensemble methods for decision trees, whose aim is that of … Read more
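
To make the boosting idea concrete, here is a minimal sketch in Python (not the article's R implementation) of AdaBoost over one-dimensional decision stumps. The toy data and the function names `best_stump`, `adaboost` and `predict` are hypothetical and chosen for illustration only.

```python
import math

def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best one-split stump.

    A stump predicts `polarity` when x <= threshold and `-polarity` otherwise.
    """
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x <= threshold else -polarity for x in xs]
            error = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def adaboost(xs, ys, rounds=3):
    """Fit an ensemble of weighted stumps; labels must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = best_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)  # vote of this stump
        ensemble.append((alpha, threshold, polarity))
        preds = [polarity if x <= threshold else -polarity for x in xs]
        # Misclassified points gain weight, correctly classified points lose it.
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (pol if x <= thr else -pol) for a, thr, pol in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data that no single stump can separate.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])
```

On this toy set a single stump misclassifies at least three points, while the weighted vote of three stumps fits every training point.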

Are you effectively evangelizing data

You may have heard that data has become a thing. The space has grown quickly, and its popularity and the interest in it have never been greater. However, as in demand as data may be, you may be surprised how much excitement you will need to build for your Data and Analytics (DNA) practice in order to effectively communicate … Read more

50+ Free DataSets for DataScience Projects

Hello All, This is just a short note to specify that the list of FREE datasets is updated for 2020. There are 50+ sites and links to the newly released Google Dataset search engine. So, have fun exploring these data repositories to master programming, create stunning visualizations and build your own unique project portfolios. Some … Read more

rco: Make Your R Code Run Faster Today!

[This article was first published on, and kindly contributed to R-bloggers]. The rco package can optimize R code in a variety of different ways. … Read more

15+ Resources to Get Started with R

[This article was first published on, and kindly contributed to R-bloggers]. R is the second most sought after language in data science behind Python, … Read more

Beginners guide to Bubble Map with Shiny

[This article was first published on, and kindly contributed to R-bloggers]. Map Bubble: a map bubble is a type of map chart where bubble or circle position … Read more

Measuring Quality of Conversations of AI Agents

Artificial Intelligence (AI) agents are everywhere. They are embedded within your smartphone (Apple Siri, Google Assistant), they are in your smart home devices (Amazon Alexa, Google Home), you have probably interacted with some while speaking to a company’s customer service department, and they are embedded in the chat widget for many websites you visit; you … Read more

Time-Series Forecasting in Real Life: Budget forecasting with ARIMA

There are several ways you can model a time series. The most popular are the following. Simple moving average: with this approach, you’re saying the forecast is based on the average of the n previous data points. Exponential smoothing: it exponentially decreases the weight of previous observations, such that increasingly older data points have less impact in … Read more
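
The two approaches named above can be sketched in a few lines of plain Python. This is a minimal illustration, not the article's code: the function names and the sample series are hypothetical, and the smoothing shown is simple (single) exponential smoothing with a smoothing factor `alpha`.

```python
def simple_moving_average(series, n):
    """Forecast the next value as the mean of the last n observations."""
    if n <= 0 or n > len(series):
        raise ValueError("n must be between 1 and len(series)")
    return sum(series[-n:]) / n

def exponential_smoothing(series, alpha):
    """One-step forecast from simple exponential smoothing.

    Each new level is alpha * observation + (1 - alpha) * previous level,
    so the weight of an observation decays geometrically with its age.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

# Hypothetical monthly budget figures, for illustration only.
budget = [10, 12, 14, 16]
print(simple_moving_average(budget, n=2))
print(exponential_smoothing(budget, alpha=0.5))
```

A larger `alpha` makes the forecast react faster to recent points; a larger `n` makes the moving average smoother but slower to follow a trend.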

What is a GAN?

How a weird idea became the foundation of cutting-edge AI. Take any course on machine learning and you’ll invariably encounter Generative Adversarial Networks, or GANs. Understanding them means mastering the surprising power of playing a computer off against itself. It’s around five o’clock and you’ve just finished your homework. ‘I’m done!’ ‘Great! Would you like … Read more

Analyzing Yelp Dataset with Scattertext spaCy

Exploratory data analysis and visualization for text data using NLP, Scattertext and spaCy. One of the most crucial tasks in the text mining field is to present the content of the text data visually. Using natural language processing (NLP), a data scientist can summarize documents, create topics, explore storylines of the content from different angles and … Read more

How Not to Run an A/B Test

Sometimes we might see the true difference early and other times we have to wait for that real change to materialize. After some point, the p-value step function flips and stays flipped. In other words, after some time we become rather certain that there really is a difference, and if we test for significance, our … Read more
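
A small simulation can illustrate why testing for significance at every look is risky. The sketch below is an assumption-laden illustration rather than the article's analysis: it simulates A/A experiments (no true difference), uses a two-proportion z-test, and compares how often "any peek was significant" against "the final look was significant". All names and parameter values are hypothetical.

```python
import math
import random

def two_sided_p_value(successes_a, successes_b, n):
    """Two-proportion z-test p-value, assuming n observations in each arm."""
    pooled = (successes_a + successes_b) / (2 * n)
    if pooled in (0.0, 1.0):
        return 1.0  # no variation observed, nothing to test
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    z = (successes_a / n - successes_b / n) / se
    return math.erfc(abs(z) / math.sqrt(2))

def simulate(n_experiments=300, n_per_arm=1000, peek_every=100,
             rate=0.1, alpha=0.05, seed=0):
    """Return (false positive rate with peeking, rate with one final test)."""
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(n_experiments):
        a = b = 0
        significant_at_any_peek = False
        p = 1.0
        for i in range(1, n_per_arm + 1):
            # Both arms share the same conversion rate: any "win" is noise.
            a += rng.random() < rate
            b += rng.random() < rate
            if i % peek_every == 0:
                p = two_sided_p_value(a, b, i)
                if p < alpha:
                    significant_at_any_peek = True
        peeking_hits += significant_at_any_peek
        fixed_hits += p < alpha  # p now holds the final look
    return peeking_hits / n_experiments, fixed_hits / n_experiments

peeking_rate, fixed_rate = simulate()
print(f"stop at first significant peek: {peeking_rate:.2%}")
print(f"single test at the end:         {fixed_rate:.2%}")
```

Because the final look is one of the peeks, the peeking rate can never be lower than the fixed-horizon rate; in practice it tends to be noticeably higher, which is the argument for fixing the sample size in advance.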

Bayesian Neural Networks with TensorFlow Probability

If you have not installed TensorFlow Probability yet, you can do it with pip, but it might be a good idea to create a virtual environment first. pip install --upgrade tensorflow-probability Open your favorite editor or JupyterLab. Import all necessary libraries. # Load libraries and functions. import pandas as pd import numpy as np import tensorflow as tf tfk … Read more

Comparing Ensembl GTF and cDNA

It seems that most people think Ensembl’s GTF file and cDNA fasta file mean the same transcripts: Watch out! @ensembl‘s Fasta and GTF annotation files available via do not match (there are transcripts in the GTF not found in the Fasta file. Anyone else expected them to match? — K. Vitting-Seerup (@KVittingSeerup) August 13, … Read more

An efficient way to install and load R packages

[This article was first published on R on Stats and R, and kindly contributed to R-bloggers]. Unlike other programs, only fundamental functionalities come by default … Read more

A Shiny App for Tracking Moral Networks

This is a post outlining a ShinyApp that I made for visualising inter-participant agreement on questions relating to Haidt’s Moral Foundations (e.g., Haidt and Joseph 2008). This is part of a line of research on moral judgements, inspired by the DAFINET project, where I aim to investigate the role of agreement with others in the robustness … Read more

another easy Riddler

[This article was first published on R – Xi’an’s Og, and kindly contributed to R-bloggers]. A quick riddle from the Riddler In a two-person game, … Read more

Python and AWS SSM Parameter Store

Now, it’s time to write the script that will retrieve the secret parameter we just stored in SSM parameter store and decrypt it so we can use it in our application. Let’s create a new Python file in the project directory and name it Notice the get_parameter() function’s argument named WithDecryption. In this … Read more
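
As a rough sketch of the retrieval step described above: the boto3 SSM client exposes `get_parameter(Name=..., WithDecryption=...)`, and the decrypted value is found under `response["Parameter"]["Value"]`. The helper name `fetch_secret` and the parameter name in the usage comment are hypothetical, and the client is passed in so the function can be exercised without AWS credentials.

```python
def fetch_secret(ssm_client, name):
    """Return the decrypted value of an SSM parameter.

    `ssm_client` is expected to behave like boto3.client("ssm").
    WithDecryption=True asks SSM to decrypt SecureString values.
    """
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]

# Usage against a real account (requires boto3 and AWS credentials;
# the parameter name below is a placeholder):
#
#   import boto3
#   secret = fetch_secret(boto3.client("ssm"), "/my-app/secret")
```

Passing `WithDecryption=False` instead returns the stored ciphertext for SecureString parameters, which is rarely what an application wants.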

Is Explainable AI (xAI) the Next Step, or Just Hype?

Recent years have seen the expansion of artificial intelligence into an array of industries with varying levels of disruption. Once a horizon technology (perhaps similar to how we now view quantum computing), AI has officially breached everyday life, and informed opinions are no longer reserved for tech enthusiasts and elite data scientists. Now, stakeholders include executives, … Read more

How Data Scientists Can Balance Practicality and Rigor

When building quantitative systems that drive commercial value, pragmatism and innovation are not in conflict with one another. For growing and lean start-ups with challenging research problems and data-focused customers, data science research must yield clear business wins quickly and iteratively. An effective approach to scaling technology in these environments must embody a mix of … Read more

AWS Batch now available in AWS GovCloud (US) Regions

AWS Batch enables developers, scientists, and engineers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS. AWS Batch dynamically provisions the optimal quantity and type of compute resources (e.g., CPU, GPU, or memory-optimized instances) based on the volume and specific resource requirements of the batch jobs submitted. With AWS Batch, … Read more

Windows Server applications, welcome to Google Kubernetes Engine

In the beta release of Windows Server container support in GKE (version 1.16.4), Windows and Linux containers can run side-by-side in the same cluster. This release also includes several other features aimed at helping you meet the security, scalability, integration and management needs of your Windows Server containers. Some highlights include: Private clusters: a security … Read more

Building the R Community in Southern Africa

[This article was first published on R Consortium, and kindly contributed to R-bloggers]. By Heather Turner, Chair of Forwards, the R Foundation taskforce for underrepresented … Read more

12-Hour ML Challenge

Well, Christmas. It used to be the time of the year when I hung out with my wife and puppy on the couch and binge-watched movies and shows. Then, this Christmas. Something changed. For some reason, most of the stuff I found on Netflix or YouTube seemed to be quite boring. Maybe I’ve reached … Read more

An Introduction to Unity ML-Agents

The past few years have witnessed breakthroughs in reinforcement learning (RL). From the first successful use of RL by a deep learning model for learning a policy from pixel input in 2013 to the OpenAI Dexterity program in 2019, we live in an exciting moment in RL research. Consequently, we need, as RL researchers, to … Read more

A tactile guide to Python Collections Final

Python is a powerful programming language with dynamic semantics, guided by 19 principles known as the “Zen of Python”. The principles are listed below: “Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. Complex is better than complicated. Flat is better than nested. Sparse is better … Read more
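
The principles ship with the interpreter in the standard-library `this` module, which stores the text ROT13-encoded in `this.s`. A minimal sketch of reading them programmatically (the variable names are illustrative):

```python
import codecs
import contextlib
import io

# Importing `this` prints the Zen, so silence stdout during the import.
with contextlib.redirect_stdout(io.StringIO()):
    import this

# The module keeps the text ROT13-encoded in `this.s`.
zen = codecs.decode(this.s, "rot13")

# Skip the title line and the blank line that follows it.
aphorisms = [line for line in zen.splitlines()[2:] if line.strip()]

print(len(aphorisms))
print(aphorisms[0])
```

Typing `import this` in an interactive session prints the same text directly.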

Guinea Pig Breed Classification

Step 6. Model Comparison and Selection: The InceptionV3 transfer learning model had the best scores overall. (Figure: Metric Comparison for different Models.) On top of that, it classified ‘Skinny’ very well (high recall), where classical classifiers had generally failed. Notably, none of the models was able to confidently identify ‘Abyssinian’ (low … Read more

P versus NP — The million dollar problem!

On May 24, 2000, the Clay Mathematics Institute announced seven mathematical problems, offering a US $1,000,000 reward for the solution to any one of them. Famously known as the Millennium Problems, only one of the seven has been solved to date. Want to make a million dollars? Try solving … Read more

Hyperledger Fabric on Azure Kubernetes Service Marketplace template

Customers exploring blockchain for their applications and solutions typically start with a prototype or proof of concept effort with a blockchain technology before they get to build, pilot, and production rollout. During the latter stages, apart from the ease of deployment, there is an expectation of flexibility in the configuration in terms of the number … Read more

Building Models with Keras

Keras is a high-level API for building neural networks in Python. The API supports sequential neural networks, recurrent neural networks, and convolutional neural networks. It also allows for easy and fast prototyping due to its modularity, user-friendliness, and extensibility. In this post, we will walk through the process of building sequential neural networks for regression … Read more

Building sensitivity atlases

Researchers, environmental managers, ecologists: we are all looking for a better perspective. Fieldwork, remote sensing and systemic understanding let us piece together knowledge, which in turn can be used to prioritize human interaction with our environment. All this knowledge represents generalizations. The paradox of generalizations is that they are sometimes needed for … Read more