How to Fail The Azure Fundamentals Certification

The amount of time you need to prepare for the exam will vary significantly. If you’re a senior cloud engineer who’s transferring from AWS to Azure, you probably don’t need more than a day, tops. If you’re wondering why Microsoft offers a certification covering those fluffy white things in the sky, give yourself a … Read more How to Fail The Azure Fundamentals Certification

Association Rule Mining: What Frequent Itemsets is all about?

Photo by Kelly Sikkema on Unsplash. Finding the frequency of occurrence of unique combinations of items. To understand frequent itemsets, one first needs to understand “frequent” and “itemsets”. Let us first look at what itemsets mean. Simply put, itemsets are groups of items that appear together in a transaction or record. The size of … Read more Association Rule Mining: What Frequent Itemsets is all about?
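To make the idea of itemsets and their frequency concrete, here is a minimal sketch in plain Python. The transactions, the function names, and the `min_support` threshold are illustrative assumptions rather than anything taken from the article; the sketch counts how often each combination of a given size appears together and keeps those that meet the threshold.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions (assumed for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

def itemset_counts(transactions, size):
    """Count how many transactions contain each itemset of the given size."""
    counts = Counter()
    for basket in transactions:
        # Sorting makes each itemset a canonical, order-independent tuple.
        for combo in combinations(sorted(basket), size):
            counts[combo] += 1
    return counts

def frequent_itemsets(transactions, size, min_support):
    """Return itemsets whose support (fraction of transactions) meets min_support."""
    n = len(transactions)
    counts = itemset_counts(transactions, size)
    return {items: c / n for items, c in counts.items() if c / n >= min_support}

pairs = frequent_itemsets(transactions, size=2, min_support=0.5)
for items, support in sorted(pairs.items()):
    print(items, support)
```

Enumerating every combination like this only scales to small baskets; it is meant to show what is being counted, not how production algorithms prune the search.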

Ultimate Guide to Merging/Joining Data in Pandas

Introduction: The goal of this article is for you to come away with a strong knowledge of combining data in pandas, using precise methods suited to any question you want to ask about your data. With each data science project or dataset, you want to perform several analyses and create plots to find insights. Often, the … Read more Ultimate Guide to Merging/Joining Data in Pandas

Azure Container Instances – Docker integration now in Docker Desktop stable release

We’re happy to announce the new stable release of Docker Desktop includes the Azure Container Instances – Docker integration. Install or update to the latest release and get started deploying containers to Azure Container Instances (ACI) today. Azure Docker integration: The Azure Docker integration enables you to deploy serverless containers to Azure Container Instances (ACI) … Read more Azure Container Instances – Docker integration now in Docker Desktop stable release

Build a scalable security practice with Azure Lighthouse and Azure Sentinel

The Microsoft Azure Lighthouse product group is excited to launch a blog series covering areas in Azure Lighthouse where we are investing to make our service provider partners and enterprise customers successful with Azure. Our first blog in this series covers a top area of consideration for companies worldwide—Security with focus on how Azure Lighthouse … Read more Build a scalable security practice with Azure Lighthouse and Azure Sentinel

Azure NetApp Files cross region replication and new enhancements in preview

As businesses continue to adapt to the realities of the current environment, operational resilience has never been more important. As a result, a growing number of customers have accelerated a move to the cloud, using Microsoft Azure NetApp Files to power critical pieces of their IT infrastructure, like Virtual Desktop Infrastructure, SAP applications, and mission-critical … Read more Azure NetApp Files cross region replication and new enhancements in preview

Distributed Deep Learning Training with Horovod on Kubernetes

You may have noticed that even a powerful machine like the Nvidia DGX is not fast enough to train a deep learning model quickly, not to mention the long wait just to copy data into the DGX. Datasets are getting larger, GPUs are disaggregated from storage, workers with GPUs need to coordinate for model … Read more Distributed Deep Learning Training with Horovod on Kubernetes

Understanding Signals. It’s not that complicated.

Sound is a wave that results from the back and forth vibration of the medium particles through which the sound wave moves. These sound waves consist of a repeating pattern of high-pressure and low-pressure regions. They are also referred to as pressure waves. An example of a sine wave. When we hear something, our brain … Read more Understanding Signals. It’s not that complicated.

Best places for new businesses in Florianópolis, Brazil (North Shore): a Foursquare data analysis

Which opportunities could the data show? Introduction: The Background. Brazil is known for Rio de Janeiro or São Paulo. But one of the southernmost states of Brazil has a peculiar capital on an island. Or most of the capital is on this island. I’m talking about Florianópolis. Different from the idyllic idea of an … Read more Best places for new businesses in Florianópolis, Brazil (North Shore): a Foursquare data analysis

How to Build a Machine Learning Model to Identify Credit Card Fraud in 5 Steps

When starting a new modeling project, it is important to start with EDA in order to understand the dataset. In this case, the credit card fraud dataset from Kaggle contains 284,807 rows with 31 columns. This particular dataset contains no nulls, but note that this may not be the case when dealing with datasets in … Read more How to Build a Machine Learning Model to Identify Credit Card Fraud in 5 Steps

Deutsche Börse Group continues its journey to the cloud (Managing Director, Google Cloud DACH; General Manager, Google Cloud Compute)

The word “transformation” brings many things to mind, like innovation, agility, and change. Consistency and stability are probably not as high on the list of synonyms, but for regulated industries undergoing digital transformation initiatives, those characteristics are just as critical; in fact, they’re critically important for digital transformation to succeed. Deutsche Börse Group, an international financial … Read more Deutsche Börse Group continues its journey to the cloud

Automate Data Preparation using Google Colab: Read and Process Citi Bike Data in Zip File

Photo by Anthony Fomin on Unsplash. From time to time I get requests from colleagues to process some large data files and report some statistics from the data. Since they rely on Excel as their main data processing/analysis tool and don’t use Python, R or SQL, reading and processing data files with more than 1,048,576 … Read more Automate Data Preparation using Google Colab: Read and Process Citi Bike Data in Zip File

Building a Command Line Application to Check For Open Source Vulnerabilities

Vulnerabilities in open source software, programming languages, or projects are a big deal, as a single exploit could cause a lot of chaos and cost big organizations thousands of dollars. A lot of companies have been paying attention to vulnerabilities in the software, dependencies, and languages they use in powering their … Read more Building a Command Line Application to Check For Open Source Vulnerabilities

Active and Semi-Supervised machine learning: Aug 31 — Sep 11

Explainable AI is a big thing nowadays. In ALEX: Active Learning based Enhancement of a Model’s Explainability, the authors use a novel kind of query strategy: prioritization of instances that are “difficult to explain”. (They use the SHAP framework to determine the latter.) Their goal is to arrive at a classifier that is optimized for … Read more Active and Semi-Supervised machine learning: Aug 31 — Sep 11

Predicting Poetic Movements

Analyzing and categorizing poetry to prepare for content-based recommendation. Image by abi ismail on Unsplash. Within written media, poetry is often regarded as enigmatic, frivolous, or too niche. As a result, poems (even by established poets) are often overlooked by larger publishers and literature-focused websites alike. (The anti-capitalist nature of poetry may play a role … Read more Predicting Poetic Movements

4.5 years of a relationship, in Facebook activity

Analysing data from Facebook interactions and messages with my girlfriend. I recently downloaded all the data Facebook has about me. There were many interesting (read: cringeworthy) things I found, but one area I was particularly keen to examine is how the data reflects the progression of my relationship with my girlfriend. In this article, I’m … Read more 4.5 years of a relationship, in Facebook activity

Machine Learning Tasks on Graphs

A graph is an interesting type of data. One might think that we can make predictions and train a model in the same way as with “normal” data. Surprisingly, machine learning tasks are defined quite differently on graphs, and we can categorize them into 4 types: node classification, link prediction, learning over the whole graph, … Read more Machine Learning Tasks on Graphs

Learn to Create a Doodle Draw Game on Android

With PyTorch and Deep Java Library. QuickDraw Dataset (Source: https://github.com/googlecreativelab/quickdraw-dataset/blob/master/preview.jpg). The objective of a doodle draw game is to race to create a drawing of a particular item, like a house or a cat, as fast as you can. While the drawing part is simple, before the advent of deep learning it would have been impossible … Read more Learn to Create a Doodle Draw Game on Android

The Most Efficient Way to Read Code Written by Someone Else

As developers, regardless of our specialty, whether it is data science, front end, or back end, we spend more than 75% of our time reading code written by others. That can be a demanding task. That being said, the ability to read others’ code efficiently is one of the skills that could make … Read more The Most Efficient Way to Read Code Written by Someone Else

How to Explore Data: {DataExplorer} Package

Let’s get started by loading our packages and importing a bit of data.
2.1 Load Packages
    # Core Packages
    library(tidyverse)
    library(tidyquant)
    library(recipes)
    library(rsample)
    library(knitr)
    # Data Cleaning
    library(janitor)
    # EDA
    library(skimr)
    library(DataExplorer)
    # ggplot2 Helpers
    library(scales)
    theme_set(theme_tq())
2.2 Import Data
For our case-study we are using data from the Tidy Tuesday Project archive. Each record … Read more How to Explore Data: {DataExplorer} Package

Predicting pneumonia outcomes: Results (using DataRobot API)

Performance of models: The AUC was high and similar for all models. For this study, it was more important to identify as many as possible of the patients who became worse despite seeing a doctor (i.e. true positives). Identification of these patients with poor outcomes would allow better intervention to be provided to increase their chances of a better clinical evolution. … Read more Predicting pneumonia outcomes: Results (using DataRobot API)

Amazon Route 53 Resolver Now Supports VPC DNS Query Logging in AWS GovCloud (US) Regions

Route 53 Resolver is the Amazon DNS server (also sometimes referred to as “AmazonProvidedDNS” or the “.2 resolver”) that is available by default in all Amazon VPCs. Route 53 Resolver responds to DNS queries from AWS resources within a VPC for public DNS records, Amazon VPC-specific DNS names, and Amazon Route 53 private hosted zones. … Read more Amazon Route 53 Resolver Now Supports VPC DNS Query Logging in AWS GovCloud (US) Regions

Which Optimizer Should I Use in my Machine Learning Project?

The problem with choosing an optimizer is that, due to the no-free-lunch theorem, there is no single optimizer to rule them all; as a matter of fact, the performance of an optimizer is highly dependent on the setting. So, the central question that arises is: Which optimizer suits the characteristics of my project the best? … Read more Which Optimizer Should I Use in my Machine Learning Project?

Improve Quality and Efficiency of your Analysis with this little Python Statement

Image by jeeshots.com on Unsplash. A practical example of how to use assert to take advantage of Test Driven Development in Python and SQL. Ever written a SQL query or manipulated a Pandas DataFrame? Then you know that it’s way too easy to make one small mistake and produce completely wrong results. And it’s even … Read more Improve Quality and Efficiency of your Analysis with this little Python Statement

Create high quality synthetic data in your cloud with Gretel.ai and Python

Create differentially private, synthetic versions of datasets while meeting compliance requirements to keep sensitive data within your approved environment. Whether your concern is HIPAA for healthcare, PCI for the financial industry, or GDPR or CCPA for protecting consumer data, being able to get started building without needing a data processing agreement (DPA) in place to … Read more Create high quality synthetic data in your cloud with Gretel.ai and Python

A beginner’s Guide to Google’s BigQuery GIS

Now that we have added public query datasets, we can query them. Let us see that in the next section. Running GIS Queries with BigQuery: You can now run standard SQL queries to explore public datasets. However, since these datasets are usually large, you can run SELECT statements that limit the number of rows to … Read more A beginner’s Guide to Google’s BigQuery GIS

NFS 4.1 support for Azure Files is now in preview

Azure Files is a distributed cloud file system that serves the SMB and REST protocols and has been generally available since 2015. Customers love how Azure Files enables them to easily lift and shift their legacy workloads to the cloud without any modifications or changes in technology. SMB works great on both Windows and UNIX operating systems for … Read more NFS 4.1 support for Azure Files is now in preview

AI-designed “hyperfoods” can possibly help prevent cancer

We now live longer than ever. Yet we are not necessarily living healthier lives: with a rapidly aging population, people are experiencing a continuous growth of chronic diseases such as cancer and metabolic, neurological, and heart disorders. This drives healthcare costs through the roof and puts a significant strain on public health systems [1]. A … Read more AI-designed “hyperfoods” can possibly help prevent cancer

Preparing for what’s next: Building landing zones for successful cloud migrations

As businesses look to the cloud to ensure business resiliency and to spur innovation, we continue to see customer migrations to Azure accelerate. Increasingly, we’ve heard from business leaders preparing to migrate that they could learn from our best practices and want general help thinking about migration, and we started a blog series to help share … Read more Preparing for what’s next: Building landing zones for successful cloud migrations

Number of Parameters in Feed-Forward Neural Network

Calculating the total number of trainable parameters in the feed-forward neural network by hand. Machine learning is solving such a large number of sophisticated problems today that it seems like magic. But there isn’t any magic in machine learning; rather, it has a strong mathematical and statistical foundation. While trying to understand the important and … Read more Number of Parameters in Feed-Forward Neural Network
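The hand calculation the teaser refers to can be sketched in a few lines of Python. This is a minimal illustration that assumes a plain fully connected network in which every layer has one weight per incoming connection and, optionally, one bias per unit; the function name and the example layer sizes are hypothetical and not taken from the article.

```python
def count_parameters(layer_sizes, use_bias=True):
    """Count trainable parameters of a fully connected feed-forward network.

    layer_sizes lists the number of units per layer, input layer first.
    """
    total = 0
    # Pair each layer with the one that follows it.
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection between the layers
        if use_bias:
            total += n_out     # one bias per unit in the receiving layer
    return total

# Hypothetical network: 3 inputs, one hidden layer of 4 units, 2 outputs.
# By hand: (3*4 + 4) + (4*2 + 2) = 16 + 10 = 26
print(count_parameters([3, 4, 2]))
```

Layers with other structure (convolutions, normalization, shared weights) need their own counting rules and are outside this sketch.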

The Twitch Data Scientist Interview

Image from Pixabay. Introduction: Twitch is a live video streaming platform that allows users to watch and broadcast live streamed or pre-recorded videos of the broadcaster’s video game gameplay. The platform is owned and operated by Twitch Interactive, a subsidiary of Amazon. Founded in 2011 as an offspring of the “stream anything platform”, Justin.tv, its … Read more The Twitch Data Scientist Interview

Understanding Cosine Similarity And Its Application

Cosine similarity has its place in several applications and algorithms. From the world of computer vision to data mining, it is often useful to measure the similarity between two vectors represented in a higher-dimensional space. Let’s go through a couple of scenarios and applications where the cosine similarity measure is leveraged. 1. Document … Read more Understanding Cosine Similarity And Its Application
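As a concrete reference for the measure discussed above, here is a minimal sketch of cosine similarity in plain Python: the dot product of two vectors divided by the product of their Euclidean norms. The function name and the example vectors are assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Hypothetical vectors: the second is the first scaled by 2, so the angle is 0.
print(cosine_similarity([1, 2, 3], [2, 4, 6]))
# Orthogonal vectors have no directional overlap.
print(cosine_similarity([1, 0], [0, 1]))
```

Because the result depends only on direction, scaling either vector leaves it unchanged, which is why the measure suits comparisons where magnitude (for example, document length) should not matter.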

Knowledge Distillation — A Survey Through Time

Through this blog you will review Knowledge Distillation (KD) and six follow-up papers. In 2012, AlexNet outperformed all the existing models on the ImageNet data. Neural networks were about to see major adoption. By 2015, many state-of-the-art results had been surpassed. The trend was to use neural networks on any use case you could … Read more Knowledge Distillation — A Survey Through Time

Confusion Matrix — What is it?

You have a binary classification problem at hand. Let’s denote the two classes in the target variable as ‘Negative’ and ‘Positive’. You have the dataset to be used to develop the classifier, have performed exploratory data analysis and feature engineering, and have decided which model should be trained. You have divided your data … Read more Confusion Matrix — What is it?

Why R? Webinar – Me, Myself and my Rprofile

[This article was first published on http://r-addict.com, and kindly contributed to R-bloggers]. This Thursday, September 17, we are launching another whyr.pl/webinars/ (http://whyr.pl/webinars/) entitled Me, Myself and … Read more Why R? Webinar – Me, Myself and my Rprofile

Performance Metrics: Confusion matrix, Precision, Recall, and F1 Score

Unraveling the confusion behind the confusion matrix. Image by Jon Tyson, from Unsplash. Accuracy as a performance metric can be deceptive when dealing with imbalanced data. In this blog, we will learn about the confusion matrix and its associated terms, which look confusing but are simple. The confusion matrix, precision, recall, and F1 score give better intuition … Read more Performance Metrics: Confusion matrix, Precision, Recall, and F1 Score
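The metrics named above can be computed directly from the four confusion matrix cells. The sketch below uses the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = harmonic mean of the two); the labels, the function name, and the choice to return 0.0 when a denominator is zero are assumptions for illustration, not the article's own code.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return confusion matrix counts plus precision, recall, and F1."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # correctly predicted positive
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # correctly predicted negative
    # Assumed convention: report 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels for eight samples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On imbalanced data these three numbers are more informative than accuracy, because a model that always predicts the majority class can score high accuracy while its recall on the minority class is zero.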

Understanding and Choosing the Right Probability Distributions with Examples

The Poisson distribution helps us predict the probability that a specific event occurs within a time interval. Main Characteristics of a Poisson Distribution: The numbers of changes occurring in nonoverlapping intervals are independent. The probability of exactly one change occurring in a sufficiently short interval of length h is approximately λh, where λ>0. The probability … Read more Understanding and Choosing the Right Probability Distributions with Examples
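The characteristics listed above lead to the standard Poisson probability mass function, P(N = k) = e^(−λt) (λt)^k / k!, for the number of events k in an interval of length t at rate λ. The excerpt is truncated before any formula appears, so the sketch below is an illustration under that standard definition; the function name and the example rate are assumptions.

```python
import math

def poisson_pmf(k, lam, t=1.0):
    """Probability of exactly k events in an interval of length t at rate lam."""
    mu = lam * t  # expected number of events in the interval
    return math.exp(-mu) * mu ** k / math.factorial(k)

lam = 3.0  # hypothetical rate: 3 events per unit of time

# Probability of exactly 2 events in one unit of time.
print(poisson_pmf(2, lam))

# For a very short interval h, P(exactly one event) is close to lam * h.
h = 0.001
print(poisson_pmf(1, lam, t=h), lam * h)
```

The second print shows the short-interval property numerically: the two values agree to several decimal places, and the gap shrinks as h gets smaller.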

4 techniques to enhance your Research in Machine Learning projects

This is the folder layout I tend to use at the beginning of any ML project. This layout is open to extension (such as adding a tests folder, deploy folder, etc.) as soon as the project needs to grow.
project         # project root
├── data        # data files
├── models      # machine learning models
├── notebooks   # … Read more 4 techniques to enhance your Research in Machine Learning projects

Amazon AppFlow now supports new data formats for ingesting files into Amazon S3

Amazon AppFlow, a fully managed integration service that enables customers to securely transfer data between AWS services and software-as-a-service (SaaS) applications, now offers customers the flexibility to choose JSON, comma-separated values (CSV), or Parquet as the file format when transferring data from a source application to Amazon S3. This feature is supported for all source … Read more Amazon AppFlow now supports new data formats for ingesting files into Amazon S3