The Struggle of Modern Day Intrusion Detection Systems

There are four cases when traffic attempts to pass through an IDS. The first two cases are normal traffic passing through and malicious traffic being rejected, but there are two cases where traffic can be misclassified. A false positive is when good traffic is considered malicious and rejected before entering the system, and a false …

Amazon Elastic Inference Now Available In Amazon ECS Tasks

With Amazon Elastic Inference support in ECS, you can now choose the task CPU and memory configuration that is best suited to the needs of your application, and then separately configure the amount of inference acceleration that you need with no code changes. This allows you to use resources efficiently and to reduce the cost …

The Divided States of America — Historical Perspectives

The country has never been as polarized in its modern history as it is today. It is said that in past presidential elections there were many more competitive counties, where the margins between Democratic and Republican candidates were narrow. The following maps depict the trend in the past five presidential elections, and you can find more white …

What to Avoid: Common Mistakes on Data Science Applications

Or what not to do if you want to get noticed in the competitive field of Data Science. Data science and machine learning careers are still relatively new, and I have previously written an article on the problems data scientists often encounter in their jobs because some companies do …

The 5 Classification Evaluation metrics every Data Scientist must know

This is my favorite evaluation metric and I tend to use this a lot in my classification projects. The F1 score is a number between 0 and 1 and is the harmonic mean of precision and recall. Let us start with a binary prediction problem. We are predicting if an asteroid will hit the earth …
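To make the harmonic-mean definition concrete, here is a minimal Python sketch. The function name `f1_score`, the example counts, and the choice to return 0.0 when precision or recall is undefined are all assumptions made for illustration; they are not taken from the article.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1 score: the harmonic mean of precision and recall."""
    predicted_positives = true_positives + false_positives
    actual_positives = true_positives + false_negatives
    # Assumed convention: report 0.0 when precision or recall is undefined.
    if predicted_positives == 0 or actual_positives == 0:
        return 0.0
    precision = true_positives / predicted_positives
    recall = true_positives / actual_positives
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a binary classifier: precision = 0.8, recall = 0.5.
score = f1_score(true_positives=8, false_positives=2, false_negatives=8)
print(round(score, 3))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the score here lands closer to the recall of 0.5 than a plain average of 0.8 and 0.5 would.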

Don’t blame the AI, it’s the humans who are biased.

Bias in AI programming, both conscious and unconscious, is an issue of concern raised by scholars, the public, and the media alike. Given the implications of usage in hiring, credit, social benefits, policing, and legal decisions[1], they have good reason to be. AI bias occurs when a computer algorithm makes prejudiced decisions based on data …

Protecting your GCP infrastructure at scale with Forseti Config Validator

This gets the latest data from the Cloud Storage bucket, and runs all the steps mentioned earlier (create a model from the latest inventory, run the scanners and then the notifiers) in an automated fashion. Once it successfully runs, you can check in Cloud SCC what violations (if any) were found. Since you didn’t add …

HDInsight support in Azure CLI now out of preview

We are pleased to share that support for HDInsight in Azure CLI is now generally available. The addition of the az hdinsight command group allows you to easily manage your HDInsight clusters using simple commands while taking advantage of all that Azure CLI has to offer, such as cross-platform support and tab completion. Key Features …

The Learning Platform for Data-Driven Companies

The nature of work today requires continuous learning and the ability to respond appropriately to new information, including an increasing abundance of data. Companies must ensure that their employees are data fluent; those that do outperform their peers in revenue growth, market share, profitability, and customer and employee satisfaction. Data fluency is the ability to understand data, communicate …

Automatically Analyzing Laboratory Test Data

Tutorial: Automatically Analyzing Laboratory Data to Create a Performance Map. How to write Python programs that perform your data analysis for you. It’s very common that scientists find themselves with large data sets. Sometimes it comes in the form of gigabytes worth of data in a single file. Other times it’s hundreds of files, each …

SAP on Azure Architecture – Designing for security

This blog post was contributed to by Chin Lai The, Technical Specialist, SAP on Azure. This is the first in a four-part blog series on designing a great SAP on Azure Architecture, and will focus on designing for security. Great SAP on Azure Architectures are built on the pillars of security, performance and scalability, availability …

Building a Topic Modeling Pipeline with spaCy and Gensim

Building the pipeline. First things first, let’s import our libraries. A couple of things to note here: if you’re new to data science in general and Python in particular, note that you’ll have to pip install both Gensim and spaCy. spaCy has several different models to choose from. I’m using the large, general-purpose web …

Announcing Azure Private Link

Customers love the scale of Azure that gives them the ability to expand across the globe, while being highly available. With the rapidly growing adoption of Azure, customers’ need to access data and services privately and securely from their networks is growing exponentially. To help with this, we’re announcing the preview of Azure Private Link. …

Retrieving google drive item shares and permissions (in R)

Google Drive is a great tool; specifically, we’ve been using “G Suite” (the equivalent of Google Drive but for businesses) for a long time. Lately I noticed it’s missing an important feature: monitoring file shares and permissions of Google Drive items across an organization is non-trivial (at least in the G Suite Basic subscription). I …

How to predict the success of your marketing campaign

Linear, tree, forest and support vector regression: comparison, source code and ready-to-use app. In this article I am going to walk you through the process of building, training and evaluating a prediction model for the number of ad impressions delivered in a digital marketing campaign. All the techniques can …

Obtaining tokens with AzureAuth inside a Shiny app

[This article was first published on Revolutions, and kindly contributed to R-bloggers]. By Hong Ooi, senior data scientist, Microsoft Azure. As of version 1.2.0 (released …

Hindi and Other Languages in India based on 2001 census

India is the world’s largest democracy and, as it goes, also a highly diverse place. This is my attempt to see how “Hindi” and other languages are spoken in India. In this post, we’ll see how to collect data for this relevant puzzle directly from Wikipedia and how we’re going to visualize it – …

WPP unlocks the power of data and creativity using Google Cloud

Over the past year, our customers have shared many stories with me about how cloud technology is transforming their businesses. One theme that frequently comes up is how the cloud enables marketers to deliver better customer experiences across online and offline campaigns, email, apps, websites, and more. By stitching together these consumer touchpoints via the …

Why the AI community needs to advocate for Universal Basic Income

With AI threatening to automate millions of jobs, is it time we redefine our social constructs? As AI enthusiasts, it is easy for us to get excited over new AI-powered technologies like Tesla Autopilot, Google Duplex and Amazon Go. And we should be! After all, these are some of the most innovative and advanced achievements …

Stop Using Mean to Fill Missing Data

MICE, or Multivariate Imputation by Chained Equations (what a memorable term), is an imputation method which works by filling the missing data multiple times. The chained-equations approach also has the benefit of being able to handle different data types efficiently, such as continuous and binary. To quote statsmodels.org: “The basic idea is to treat …”

How Much Data Engineering Does A Data Scientist Need To Know?

And how much he/she does NOT need to know. “It is a capital mistake to theorize before one has data.” (Sherlock Holmes) Data Science has become a huge buzzword in the last 5 years. Every company wants to do some of this famous Data Science, and the topic remains one of the highest priorities for companies …

Accuracy Performance Measures in Data Science: Confidence Matrix

A brief look into various ways by which you can assess your performance measures using a confusion matrix for data science models. In a previous article, I briefly explained the intricate workings and trappings of a k-NN model; let us now try to look briefly into first implementing the …

Predicting the end: the ROC story cloze task

Can you teach common sense to an NLP model? And can you design a dataset that tests it? In the ROC story cloze task, an NLP model receives a four-sentence story context and must pick the more plausible of two possible story endings. Although both the training and evaluation stories for this task are crowdsourced …

Why Motivation is the Key to Learning Data Science

Set the right goals, create your own curriculum and make a roadmap for learning. During my journey in studying data science outside of formal education, I found that motivation was the key to navigating the complexity of the subject and not getting disheartened by the wealth of information that makes up the well-publicised data scientist …

Simulating an open cohort stepped-wedge trial

[This article was first published on ouR data generation, and kindly contributed to R-bloggers]. In a current multi-site study, we are using a stepped-wedge design …

citecorp: working with open citations

citecorp is a new (hit CRAN in late August) R package for working with data from the OpenCitations Corpus (OCC). OpenCitations, run by David Shotton and Silvio Peroni, houses the OCC, an open repository of scholarly citation data under the very open CC0 license. The I4OC (Initiative for Open Citations) is a collaboration between many parties, with the aim of …

Why We Unapologetically Use Deep Learning in Our Forecasts

Many observing the takeover by machine learning in the world of data science are starting to caution against the misuse and overuse of deep learning algorithms. From concerns around the “black box” nature of deep learning making models hard to interpret or explain, to the large amounts of data required for deep learning algorithms to …

Analyzing Healthcare Data With SaturnCloud.io and BigQuery

Using The Cloud To Make Collaboration Easier. Today we wanted to discuss using cloud tools that are available to everyone to analyze a medical data set. In particular, we will be using the Kaggle data set for Medicare providers. This has information on diagnosis-related groups average costs, hospital …

How Data & AI Will Devour the Game Industry

Patterns. All the patterns forever. The flow of stuff n things which trains the physical and digital “psychopaths” to drive progress from their baser objective outputs. lolwut? Almost comically, our greatest drivers in business, academia, and technology are all obstructed psychopaths that can work amazingly together when they make the correct connections towards a similar …

Introspection on one’s virtual footprint

What do Google and Facebook know about you? You can download and look for yourself. Or, a little data psychoanalysis. Are you worried about the data collected about you online? If you’re reading this post, I figure you’ve heard this before: you’re already aware of how there is a lot of data about you already …

Gradient Descent: Show Me the Math!

Gradient descent is an iterative learning algorithm and the workhorse of neural networks. With the many customizable examples for PyTorch or Keras, building a cookie-cutter neural network can become a trivial exercise. However, when things go awry, a grasp of the foundations can save hours of tedious debugging. In this post we are going …
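As a rough illustration of the iterative idea, the sketch below runs plain gradient descent on a one-dimensional quadratic. The function name `gradient_descent`, the learning rate, the step count, and the example objective f(x) = (x - 3)^2 are all assumptions chosen for illustration; the post's own derivation is not reproduced here.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        # Update rule: move a small distance in the direction of steepest descent.
        x = x - learning_rate * gradient(x)
    return x


# Hypothetical objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(minimum)
```

With these settings each step shrinks the distance to the minimizer x = 3 by a constant factor of 0.8, which is why the loop converges; a learning rate that is too large for the objective would make the iterates diverge instead.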

An Overview of AutoML Libraries Used in Industry

A brief summary of Automated Machine Learning technologies and libraries from PyCon JP 2019. PyCon JP 2019 is held for two days, 2019/9/16 to 2019/9/17. I will publish some posts about the talks I am interested in. As an NLP engineer, I am glad to see some talks related to Machine Learning. This post is …

Announcing user delegation SAS tokens preview for Azure Storage Blobs

Cloud storage often serves as a content source for browser and mobile applications. This is typically achieved using application-issued, pre-authorized URLs which provide time-limited access directly to specific content without requiring a service to proxy this access. Azure Storage supports this pattern through the use of shared access signature tokens (SAS tokens). These tokens grant …

Network Design with Decision Optimization

Watson Studio is a platform offering all that a data scientist needs to create, debug and execute all types of models to solve business problems using Artificial Intelligence (AI). In this post, a common Supply Chain problem, known as Network Design, is used to show how different experiences can be combined to support the complete …

How Zero Trust and Zero Leakage Strategies Enable AI-Machine Learning

Recent high-profile data breaches have stalled AI/ML on the cloud. Here is how Zero Trust & Zero Leakage strategies address these concerns. Evidence is mounting that enterprise AI/ML on the cloud is stalling. The primary culprit: valid and real security concerns. Despite this, the business benefits of AI/ML are overwhelming, and the demand for AI/ML …

A Collection of A/B Testing Learning Resources: Newbie to Master

Section I: Value and “why” to test. Before we talk about anything else, why do we need to run A/B tests? The two resources below explain what running an A/B test entails, what it aims to achieve, and how it helps with modern, digital software and product development. As a one-sentence summary, A/B tests establish …

Business AI for SMB’s: A Management Take On The Challenges of Leveraging Data and AI for Sales Growth and Enhancing the Customer Experience (CX)

[This article was first published on R – Remix Institute, and kindly contributed to R-bloggers]. The modern business world is captivated by data science and …

Label encoding tricks

Most machine learning algorithms cannot work with labeled features. Columns with strings are not easy to convert into numeric values. Let’s consider the following data set which contains cat features:

id  hair    breed
1   naked   sphynx
2   short   siamois
3   naked   sphynx
4   angora  ankara
..  …       …

Both columns hair and breed contain …
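For readers new to the idea, here is a minimal Python sketch of plain label encoding applied to the hair and breed values listed in the excerpt. The helper `label_encode` and its first-appearance ordering are assumptions made for illustration; the article's own tricks are not shown in the excerpt and may well differ.

```python
def label_encode(values):
    """Map each distinct string to an integer, in order of first appearance."""
    mapping = {}
    codes = []
    for value in values:
        if value not in mapping:
            # Assign the next unused integer to a value seen for the first time.
            mapping[value] = len(mapping)
        codes.append(mapping[value])
    return codes, mapping


# The four rows visible in the excerpt's cat data set.
hair = ["naked", "short", "naked", "angora"]
breed = ["sphynx", "siamois", "sphynx", "ankara"]

hair_codes, hair_mapping = label_encode(hair)
breed_codes, breed_mapping = label_encode(breed)
print(hair_codes, hair_mapping)
print(breed_codes, breed_mapping)
```

One caveat with this kind of encoding: the integers imply an ordering that the original categories do not have, which can mislead models that treat the codes as magnitudes.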

Writing your first Neural Net in less than 30 lines of code with Keras.

Reminiscing back to when I first started my journey into AI, I remember all too well how daunting some of the concepts seemed. Reading a simple explanation on what a Neural Network is can quickly lead to a scientific paper where every second sentence is a formula with symbols you’ve never even seen before. …

Object Detection with Less Than 10 Lines of Code Using Python

Find out what objects are in the image. Want to know what objects are in the image? Or perhaps you want to count the number of apples in an image? In this post, I will show you how to create your own object detection program using Python in less than 10 lines of code. You …

Two Stories About Labeling Data by Hand — It Still Works

Sure, it’s a pain in the butt and has its own issues, but human brains are still amazing. I know. Labeling data by hand can be super tedious and mind-numbing. It’s about as far from glamorous and sexy machine learning work as you can get. Aren’t we, as super smart data scientists, supposed to be …

Guide of Choosing Package Management Tool for Data Science Project

Choose suitable tools from pipenv, conda, anaconda project, docker, etc. If you have ever worked on a data science project, you must have asked this question: how do you manage packages, dependencies, and environments? After Googling for a while, you should see some keywords: Conda, Anaconda, Miniconda, Anaconda-project, Pipenv, Jupyter, JupyterLab, Docker … This list can …