The Data Scientist who rules the ‘Data Science for Good’ competitions on Kaggle.

A word of advice for the Data Science aspirants who have just started or wish to start their Data Science journey? Shivam: Data science is all about ideas and experiments. It is all about trying those ideas and experiments, and iterating again and again until a successful stage is reached. It’s about developing a mindset …

Say Hello to Asynchronous Search for PySpark

“Methods that scale with computation are the future of AI” [1], “The two (general purpose) methods that .. scale …are search and learning.” [2] Prof Rich Sutton, Father of Reinforcement Learning, in “The Bitter Lesson”. TLDR; Hopsworks uses PySpark to parallelize machine learning applications across lots of containers, containing GPUs if you need them. PySpark’s stage-based …

Why the Gaussian distribution is a “natural” choice (Part 1)

Common to all scientific theories is the ambition of deriving observable quantities starting from some abstract model. The parameters of the theory are usually assumed to be known, for instance, based on first principles, direct measurement or something more sophisticated like symmetry considerations. On the other hand, in the Big Data era, growing interest is …

Airbnb Price Prediction Using Linear Regression (Scikit-Learn and StatsModels)

Before diving head first into the data and producing large correlation matrices, I always try to think of the question and get a sense of the features. Why am I doing this analysis? What’s the goal? What relationships between features and the target variable make sense?

# Import packages
import pandas as pd
import patsy
import statsmodels.api as …

Amazon Lex Achieves PCI DSS Compliance

Amazon Lex is now a Payment Card Industry Data Security Standard (PCI DSS) compliant service. Customers can now use Amazon Lex to capture, transmit, and retrieve sensitive payment card data for use cases such as payment processing and mobile wallets that are subject to PCI DSS compliance. PCI DSS is a proprietary information security standard …

Quality Control Charts: x-bar chart, s-chart and Process Capability Analysis

The x-bar and R-chart are quality control charts used to monitor the mean and variation of a process based on samples taken in a given time. The control limits on both charts are used to monitor the mean and variation of the process going forward. If a point is out of the control limits, it …

Vignette: Google Trends with the gtrendsR package

Background: Google Trends is a well-known, free tool provided by Google that allows you to analyse the popularity of top search queries on its Google search engine. In market exploration work, we often use Google Trends to get a very quick view of what behaviours, language, and general things are trending in a market. And …

Top three mistakes with K-Means Clustering during data analysis

In this post, we will take a look at a few cases where the KMC algorithm does not perform well or may produce unintuitive results. In particular, we will look at the following scenarios: Our guess on the number of (real) clusters is off. Feature space is high-dimensional. The clusters come in strange or irregular …

Practical Data Science with R 2nd Edition update

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. We are in the last stages of proofing the galleys/typesetting …

How to Build a Chatbot — A Lesson in NLP

NLP Chatbot: Pattern matching is simple and quick to implement but it can only go so far. It needs a lot of pre-generated templates and is useful only for applications which expect a limited number of questions. Enter NLP! NLP is a collection of slightly advanced techniques which can understand a broad range of …

Plus Codes (Open Location Code) and Scripting in Google BigQuery

Open Location Code has a library of open source code in different languages that generate and read Plus Codes. When BigQuery scripting launched, I figured creating something that encodes Plus Codes could be a neat way to introduce this new feature. It allows for easily encoding coordinates in bulk, with BigQuery bringing along a number …

Improve your connectivity to Google Cloud with enhanced hybrid connectivity options

Whatever the requirement—from enterprise-readiness fundamentals like reliability, performance, and security, to innovations for enabling microservices architecture or hybrid and multi-cloud deployments—the Google Cloud networking portfolio has something to offer. At NEXT ’19 in San Francisco, we announced the betas for 100 Gbps Dedicated Interconnect as well as High Availability (HA) VPN. Today, we’re excited to …

Flying high with Vaex: analysis of over 30 years of flight data in Python

Air travel has had a profound effect on our society. It is one of the main drivers of globalisation. Commercialization of air travel made our world a smaller, more connected place. People can easily explore the far corners of the Earth, and build relationships with cultures far removed from their own. Businesses can grow and …

Predicting the Number of Wildfires in the Amazon Rainforest Using Random Forests

We now define a function which specifies the model parameters for the random forest algorithm. This function can be used to optimize the model parameters during testing such that the error is minimized. We do this by changing the N_ESTIMATORS and MAX_DEPTH values until we minimize the error metric. This process is called hyperparameter tuning: …

Solving Sudoku with Convolution Neural Network | Keras

In a typical multi-class classification, the neural network outputs scores for each class. Then we apply the softmax function to the final scores to convert them into probabilities, and the data is classified into the class that has the highest probability value. But in sudoku, the scenario is different. We have …
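The typical multi-class setup described in this excerpt (raw class scores, then softmax, then picking the most probable class) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not code from the article: the function names `softmax` and `predict_class` and the example scores are assumptions made here, and the article's own Keras model is not reproduced.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtract the largest score first; this does not change the result
    # but avoids overflow in exp() for large scores.
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(scores):
    """Return the index of the class with the highest probability."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical scores for three classes (illustrative values only).
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
predicted = predict_class(scores)
print(probs, predicted)
```

Because softmax preserves the ordering of the scores, the predicted class is the same one that had the largest raw score; the probabilities mainly matter for training and for reporting confidence.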

Analyzing Amazon Wildfire Data

As a result of the warming climate, wildfires in the Amazon rain forest have been of increasing concern. Here we will explore and analyze the Fires in Brazil data set provided by the Brazilian Government. The data is available here. Exploring data is commonly the first step in building predictive models in data science. …

Modeling Lunar Cycles in Tweets and Financial Markets using Facebook Prophet

I recently read an article suggesting that crime rates increase on full moons. Curious about the effect, I did a little research and it seems that the question of whether lunar cycles affect human behavior has been hotly debated in academic circles for decades (see Thakur and Sharma, 1984 and Rotton and Kelly, 1985 for …

From Cups to Consciousness (Part 3): Mapping your home with SLAM

“All you need is a plan, the road map, and the courage to press on to your destination” — Earl Nightingale. In the previous part of this series, we talked about how the road to AGI could be divided into perception and control. Within control, navigation and grasping are a crucial part of the roadmap …

Job: Junior Systems Administrator (with a focus on R/Python)

[This article was first published on r – Jumping Rivers, and kindly contributed to R-bloggers]. Jumping Rivers is a data science consultancy company focused on …

rBokeh – Don’t be stopped by missing arguments!

[This article was first published on r-bloggers | STATWORX, and kindly contributed to R-bloggers]. In my last article on the STATWORX blog, I have guided …

Announcing the general availability of larger, more powerful standard file shares for Azure Files

Better scale and more power for IT professionals and developers! We’re excited to announce the general availability of larger, more powerful standard file shares for Azure Files. Azure Files is a secure, fully managed public cloud file storage with a full range of data redundancy options and hybrid capabilities using Azure File Sync. Here is a …

SAP on Azure–Designing for Efficiency and Operations

This is the final blog in our four-part series on Designing A Great SAP on Azure Architecture. Robust SAP on Azure Architectures are built on the pillars of Security, Performance and Scalability, Availability and Recoverability, and Efficiency and Operations. Within this blog we will cover a range of Azure services and a new GitHub …

Repetitive Q: Reading Multiple Files in the Zip Folder

Dear Readers, I keep seeing a recurring question, both sent to me and across various forums, on how to read multiple files in a zip folder that share the same separator or use multiple separators. Again, here, let’s not compromise on speed. The solution is to use the easycsv package in R, which in turn uses the data.table package function “fread”. Find below …

Data Science and Healthcare

I recently went to a Data & Healthcare meetup to see how a renowned cancer research institute, Memorial Sloan Kettering Cancer Center (MSK), applies and uses Data Science. One of the first uses they mentioned was predictive modeling for post-hospital care. Whether it is from being discharged after an acute condition that required a hospital …

Real-time inference at scale on AWS

The primary way to incorporate machine learning into applications is by deploying a trained model as a web API on cloud infrastructure. Running inference at scale requires an understanding of several engineering design principles that may have more to do with DevOps than Data Science. Running machine learning in production is not just about deploying …

Build Face Recognition as a REST API

Let’s define two APIs using the face_recognition package. Compare two faces: upload two images and return True / False for matching. Recognize a face from a known dataset: upload an image and return the name of the person. Face Recognition Functions: Two face recognition functions are defined for the two APIs as util in file …

Look Before You Leap —The Careful Use of Observational Data in Business (Part 1 of 2)

You introduced product recommendations on one section of your e-commerce website a few months ago and want to know if it is ‘working’. One of the simplest comparisons you can do is compare the spending of visitors who clicked on a recommendation with those visitors who didn’t. This is what you find. Website visitors who …

Map coloring: the color scale styles available in the tmap package

This vignette builds on the making maps chapter of the Geocomputation with R book. Its goal is to demonstrate all possible map styles available in the tmap package. Prerequisites: The examples below assume the following packages are attached:

library(spData) # example datasets
library(tmap) # map creation
library(sf) # spatial data reprojection

The world object containing a …

Non-Gaussian forecasting using fable

[This article was first published on R on Rob J Hyndman, and kindly contributed to R-bloggers].

library(tidyverse)
library(tsibble)
library(lubridate)
library(feasts)
library(fable)

In my previous post …

2 Months in 2 Minutes – rOpenSci News, October 2019

rOpenSci HQ: What would you like to hear about in an rOpenSci Community Call? We are soliciting your “votes” and new ideas for Community Call topics and speakers. Find out how you can influence us by checking out our new Community Calls repository. Videos, speaker’s slides, resources and collaborative notes from our Community Call on …

Navigating the Sea of Explainability

Setting the right course and steering responsibly. This article is coauthored by Joy Rimchala and Shir Meir Lador. Rapid adoption of complex machine learning (ML) models in recent years has brought with it a new challenge for today’s companies: how to interpret, understand, and explain the reasoning behind these complex models’ predictions. Treating complex ML …

Advancing Text Mining with R and quanteda

Known categories: Dictionaries. Dictionaries contain lists of words that correspond to different categories. If we apply a dictionary approach, we count how often words that are associated with different categories are represented in each document. These dictionaries help us to classify (or categorize) the speeches based on the frequency of the words that they contain. …
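The dictionary approach described in this excerpt, counting how often the words tied to each category appear in each document, can be illustrated with a small standalone sketch. The article itself works in R with quanteda; the Python below does not use or mirror quanteda's API. The categories, word lists, speeches, and the function name `count_categories` are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical dictionary: each category maps to the words associated with it.
DICTIONARY = {
    "economy": {"tax", "jobs", "budget"},
    "security": {"army", "defense", "border"},
}


def count_categories(text, dictionary):
    """Count how many tokens in `text` belong to each dictionary category."""
    # Lowercase and split into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, words in dictionary.items():
            if token in words:
                counts[category] += 1
    # Report every category, including those with zero matches.
    return {category: counts[category] for category in dictionary}


# Hypothetical documents standing in for the speeches.
speeches = {
    "speech_a": "Lower tax and more jobs: the budget must create jobs.",
    "speech_b": "We will secure the border and fund the army.",
}

results = {name: count_categories(text, DICTIONARY) for name, text in speeches.items()}
print(results)
```

Each document could then be assigned to the category with the highest count, or the counts could be normalised by document length before comparing documents.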

Amazon Chime now supports screen sharing from Mozilla Firefox and Google Chrome without a plug-in or extension

Amazon Chime users can now screen share from Mozilla Firefox or Google Chrome without installing any extensions or downloading a plug-in on Microsoft Windows, macOS, Linux and Chrome OS desktop devices. Chime now leverages the Web APIs available in Mozilla Firefox version 66 and higher or Google Chrome version 72 and higher to deliver native …

Making Convolutional Networks Shift-Invariant Again

What’s wrong with modern convolutional networks and how can we fix them? In April 2019, a new computer vision paper appeared, titled “Making Convolutional Networks Shift-Invariant Again”. Why are modern convolutional networks not shift-invariant anymore, what does this mean, and how do we make them shift-invariant again? Most modern convolutional networks are not robust against …

Chi-Square Test for Feature Selection in Machine learning

Feature selection always plays a key role in machine learning. We always wonder where the Chi-Square test is useful in machine learning and how this test makes a difference. Feature selection is an important problem in machine learning, where we have several candidate features and have to select the best features to …

The Rise of Meta Learning

https://openai.com/blog/solving-rubiks-cube/ Meta-Learning describes the abstraction of designing higher-level components associated with training Deep Neural Networks. The term “Meta-Learning” is thrown around frequently in Deep Learning literature, referencing “AutoML”, “Few-Shot Learning”, or “Neural Architecture Search” when referring to the automated design of neural network architectures. Emerging from comically titled papers such as “Learning to …

Stay in control of your security with new product enhancements in Google Cloud

When it comes to securing your cloud infrastructure, there is no shortage of challenges. You want to retain the visibility and control you had on-premises, while taking advantage of all the benefits the cloud can provide. The adoption of cloud-based services, for example, makes it easier for your development teams to quickly build and push …

Demystifying Convolutional Neural Networks using ScoreCam

Recently, more and more attention is focussed on the interpretability of Machine Learning models, mainly Deep Learning ones, because of their black-box nature. One such important Deep Learning architecture is the Convolutional Neural Network (CNN), which has made a breakthrough in Computer Vision, including image classification, object detection, semantic segmentation, instance segmentation, image captioning etc. …

From Zero to SOTA in Reinforcement Learning

With that said, over the last year or so we’ve spent a considerable amount of time reading, returning to and distilling a favourite field of ours: Reinforcement Learning (RL). For those interested in RL as a branch of AI, we’re open-sourcing an RL course that we built last year as an introduction for engineers & …

How to Create Multiple Worksheets From a List of Column Values and Delete Any Empty Columns…

Once the data frames are created, we can now check if there are any empty columns. First, we should put all the data frames into a list, so we can apply a function over the list (that is, apply the function to all the data frames at once). The function below will get rid of the columns with only …

Trusted Cloud: security, privacy, compliance, resiliency, and IP

Can you trust your cloud provider? That’s a question being asked a lot these days, and with the newest version of our popular white paper Trusted Cloud: Microsoft Azure security, privacy, compliance, resiliency, and protected IP we’ve worked to provide you answers. When we first published Trusted Cloud in 2015, the paper was 13 …

Using bwimge R package to describe patterns in images of natural structures

[This article was first published on R-posts.com, and kindly contributed to R-bloggers]. This tutorial illustrates how to use the bwimge R package (Biagolini-Jr 2019) to …