A R(API)D assessment of travel carbon emissions around the world

As a climate-conscious consumer, I’ve often wondered about the environmental impact of my routine travel from the East Coast to Salt Lake City and back. After seeing one of many recent headlines highlighting air travel’s surprisingly high rates of carbon emissions, I wondered: would taking a train, bus, or driving a car be better for … Read more

You Say You Want a (Data) Revolution

Lessons learned making data a first-class citizen in enterprise Photo by Paul Skorupskas on Unsplash When subscription-based business models started to be all the rage, companies began to realize that being customer-centric is critical for survival — and success. At Gainsight, which helped create and champion the customer success category and where I was the … Read more

Calibration Techniques of Machine Learning Models

CALIBRATION is a post-processing technique to improve error distribution of a predictive model. The evaluation of machine learning (ML) models is a crucial step before deployment. It is essential to assess how well a model will behave for every single case. In many real applications, along with mean error of the model, it is also … Read more

Algorithmic Beauty: An Introduction to Cellular Automata

An overview of simple algorithms that generate complex, life-like results. The famous Rule 30, capable of generating pseudo-random numbers from simple, deterministic rules. Rule 30 was discovered by Stephen Wolfram in ’83. Cellular Automata (CA) are simultaneously one of the simplest and most fascinating ideas I’ve ever encountered. In this post I’ll go over some famous … Read more
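As a rough illustration of how little code Rule 30 needs, here is a minimal sketch in Python. It is not taken from the post. The function names `rule30_step` and `run_rule30` are hypothetical, and treating cells beyond the edges as 0 is an assumption made to keep the example short.

```python
def rule30_step(cells):
    """Apply one Rule 30 update to a list of 0/1 cells.

    Assumption: cells beyond the left and right edges are treated as 0.
    """
    padded = [0] + list(cells) + [0]
    # Rule 30 can be written as: new cell = left XOR (center OR right)
    return [
        padded[i - 1] ^ (padded[i] | padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


def run_rule30(width=31, steps=15):
    """Start from a single live cell in the middle and collect every generation."""
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps):
        row = rule30_step(row)
        rows.append(row)
    return rows
```

Printing each row of `run_rule30()` with `"#"` for 1 and `"."` for 0 should show the familiar irregular triangle growing from the single starting cell.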

AWS Direct Connect Support for AWS Transit Gateway is now Available in Asia Pacific (Sydney) Region

AWS Direct Connect gateway allows you to access any AWS Region (except China) using your AWS Direct Connect connections. You can associate up to three Transit Gateways from any AWS Region with each Direct Connect gateway. AWS Direct Connect is introducing a new type of virtual interface called the transit virtual interface to support connectivity … Read more

How to Start a Data Science Project That Will Help You Stand Out

A practical way of looking for impactful projects to break into the field Someday in the future, I see myself digging into messy data to find answers and discover important strategic insights for a company. For many aspiring Data Analysts like me, the road to that destination is under construction. Knowing that you are competing … Read more

Give some semantic love to your keyword search!

Image source: https://ebiquity.umbc.edu At first, search engines (Google, Bing, Yahoo, etc.) were lexical: the search engine looked for literal matches of the query words, without an understanding of the query’s meaning and only returning links that contained the exact query. But, with the advent of machine learning and new techniques in the field of Natural … Read more

How to Learn Data Science for Free

Technical skills. The first part of the curriculum will focus on technical skills. I recommend learning these first so that you can take a practical-first approach rather than, say, learning the mathematical theory first. Python is by far the most widely used programming language for data science. In the Kaggle Machine Learning and … Read more

Semantic segmentation : visualization of learning progress by TensorBoard

Building and training neural networks is not a straightforward process unless you are playing with the MNIST dataset, a kind of “Hello world” application in the deep learning world. It is very easy to make a mistake and spend days wondering why the network does not perform as you expected. Normally, deep learning libraries have some … Read more

AI is “Smarter” than you, you’re “Better” than it, but you’re maybe not “Smart” enough to know it.

Photo by Rock’n Roll Monkey on Unsplash Provocative Controversy is the best. It’s this neat thing we do internally by feeling some logical truth which translates into some emotional sentiment and we call it “things” and make T-shirts out of it. Brains are funny little squishy meatballs, capable of imagining things bigger than our universe … Read more

3 Essential Python Skills for Data Scientists

Lambda functions are just so powerful. Yeah, you won’t use them when you have to clean multiple columns the same way — but that’s not something that happened to me very often — more often than not, each attribute will require its own logic behind cleaning. Lambda functions allow you to create ‘anonymous’ functions. This … Read more
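To make the idea of an "anonymous" cleaning function concrete, here is a small sketch using only the standard library. The sample values and the names `raw_prices` and `clean_price` are hypothetical and do not come from the post.

```python
# Hypothetical raw values for a single column (not data from the article).
raw_prices = [" $1,200 ", "$950", " $3,075"]

# A lambda holding the cleaning logic for this one attribute:
# trim whitespace, drop the leading dollar sign, remove thousands separators.
clean_price = lambda s: float(s.strip().lstrip("$").replace(",", ""))

prices = [clean_price(p) for p in raw_prices]
```

With pandas, the same lambda could be passed to `Series.apply` on the relevant column, which suits the case where each attribute needs its own cleaning logic.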

Working with VSCode and Jupyter Notebook Style

If you are getting started with machine learning algorithms, you will come across Jupyter Notebook. To maximize efficiency you can integrate its concept with VS Code. As this requires some understanding of how to set up a Python environment, this article provides an introduction. There are a few reasons why it makes sense to develop … Read more

Real-time Mobile Video Object Detection using Tensorflow

Full-stack Data Science. A step-by-step guide to adding object detection to your next mobile app. Photo by GeoHey. With the increasing interest in computer vision use cases like self-driving cars, face recognition, and intelligent transportation systems, people are looking to build custom machine learning models to detect and identify specific objects. However, building a … Read more

Detect and respond to high-risk threats in your logs with Google Cloud

Editor’s Note: This is the fourth blog and video in our six-part series on how to use Cloud Security Command Center. There are links to the three previous blogs and videos at the end of this post. Data breaches aren’t only getting more frequent, they’re getting more expensive. With regulatory and compliance fines, and business resources … Read more

Multi-Label Image Classification with Neural Network | Keras

The only challenge in multi-label classification is data imbalance, and we cannot simply use sampling techniques as we can in multi-class classification. Data imbalance is a well-known problem in machine learning, where some classes in the dataset are more frequent than others and the neural net just learns to predict the frequent classes. For … Read more

Why Data Scientists & Researchers Need To Understand Product Management

A field of Anemones, Ektar 100, Ori Cohen. Reasons for learning product management as a data science researcher. As researchers, our primary job is to understand data. Understanding data can mean a plethora of different things, such as data analysis, feature engineering, algorithm development, model explainability or interpretability, result & error-analysis, etc. Working closely with … Read more

Making a Game for Kids to Learn English and Have Fun with Python

There are a few techniques and then you can learn and create your own game. Game Initialization

    import pygame

    # Game Init
    pygame.init()
    win = pygame.display.set_mode((640, 480))
    pygame.display.set_caption("KidsWord presented by cyda")
    run = True
    while run:
        pygame.time.delay(100)
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                run = False
        pygame.display.update()
    pygame.quit()

To start the game, we need a game window. There are two things to set. Window Size … Read more

Pandas Hacks: read_clipboard()

Well, it’s not a hack, but it saves you a lot of time. We’ve all been there: we’re reading an interesting piece of data on Kaggle, StackOverflow, or some obscure website on the second page of Google (yikes!), and it has enough shimmer to pique our curiosity and lure us into playing with it, because … Read more

EARL London 2019 Conference Recap

I had an awesome time at the Enterprise Applications of the R Language (EARL) Conference held in London in September, 2019. EARL reminded me that it is good to keep showing up at conferences. I entered and the first thing I heard was organisers at the table welcoming me “Damian is that you? Awesome to … Read more

Azure Cosmos DB recommendations keep you on the right track

The tech world is fast-paced, and cloud services like Azure Cosmos DB get frequent updates with new features, capabilities, and improvements. It’s important—but also challenging—to keep up with the latest performance and security updates and assess whether they apply to your applications. To make it easier, we’ve introduced automatic and tailored recommendations for all Azure … Read more

A lightweight machine learning architecture for IoT streams

Running machine learning models on high-frequency streaming data doesn’t have to cost a fortune. By thinking about our real-time requirements we can design efficient architectures that scale more effortlessly. For the past year and a half, my team and I have been trying to predict the movements of buses along public roads and forecast their … Read more

Meetup Recap: Survey and Measure Development in R

[This article was first published on George J. Mount, and kindly contributed to R-bloggers]. Have you ever taken a survey at the doctor or for … Read more

Built-in Jupyter notebooks in Azure Cosmos DB are now available

Earlier this year, we announced a preview of built-in Jupyter notebooks for Azure Cosmos DB. These notebooks, running inside Azure Cosmos DB, are now available. Cosmic notebooks are available for all data models and APIs including Cassandra, MongoDB, SQL (Core), Gremlin, and Spark to enhance the developer experience in Azure Cosmos DB. These notebooks are … Read more

Windows Virtual Desktop is now generally available worldwide

Since we announced the preview of Windows Virtual Desktop in March, thousands of customers have piloted the service, providing valuable feedback and insights for Microsoft to integrate into the service. Today, we are excited to announce the worldwide general availability of Windows Virtual Desktop. It is the only service that delivers simplified management, a multi-session … Read more

How to Manage Machine Learning Products — Part 1

Part I: Why is managing machine learning products so hard? And why should you care? Summary: here’s what I want you to remember about this series of articles: Managing ML products is more challenging than managing normal software products because it involves more uncertainties and requires not only technical but also organizational changes. ML is … Read more

Image denoising by MCMC

Code: The purpose of the code is to recover the original image from the corrupted image. Corrupted image

    import numpy as np
    import cv2
    import random
    import scipy
    from scipy.spatial import distance
    from scipy.stats import multivariate_normal
    import pandas as pd
    from PIL import Image

    data = Image.open('noise_img.png')
    image = np.asarray(data).astype(np.float32)

The image was imported and saved in the form of a 2-D … Read more

Elasticsearch meets BERT: Building Search Engine with Elasticsearch and BERT

In this post, we use a pre-trained BERT model and Elasticsearch to build a search engine. Elasticsearch has recently released text similarity search with vector fields. On the other hand, you can convert text into a fixed-length vector using BERT. So once we convert documents into vectors by BERT and store them into Elasticsearch, we … Read more

Starbucks Offer Optimisation

Promotional offers are quite prevalent these days. Almost every corporate house that sells consumer products runs some kind of promotional offer, whether in response to increased competition, to expand the customer base, or to generate more revenue. Since there is a cost associated with sending these offers, it is of utmost importance to maximize … Read more

Cleaning Anomalies to Reduce Forecast Error by 9% with anomalize

[This article was first published on business-science.io, and kindly contributed to R-bloggers]. In this tutorial, we’ll show how we used clean_anomalies() from the anomalize package … Read more

Design Principles for Big Data Performance

The evolution of the technologies in Big Data in the last 20 years has presented a history of battles with growing data volume. The challenge of big data has not been solved yet, and the effort will certainly continue, with the data volume continuing to grow in the coming years. The original relational database system … Read more

Fall & Winter Workshop Roundup

[This article was first published on RStudio Blog, and kindly contributed to R-bloggers]. Join RStudio at one of our Fall and Winter workshops! We’ll be … Read more

Understanding Bootstrap Confidence Interval Output from the R boot Package

Nuances of Bootstrapping Most applied statisticians and data scientists understand that bootstrapping is a method that mimics repeated sampling by drawing some number of new samples (with replacement) from the original sample in order to perform inference. However, it can be difficult to understand output from the software that carries out the bootstrapping without a … Read more
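The resampling idea described above can be sketched in a few lines of Python. This is a generic percentile bootstrap, not the output or algorithm of the R boot package. The function name `bootstrap_percentile_ci`, its defaults, and the sample values are assumptions chosen for illustration.

```python
import random
import statistics


def bootstrap_percentile_ci(sample, stat=statistics.mean,
                            n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat` (illustrative sketch only)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Draw n values WITH replacement from the original sample, many times,
    # and record the statistic computed on each resample.
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Read the interval endpoints off the sorted bootstrap estimates.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical sample values, not data from the article.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
ci = bootstrap_percentile_ci(sample)
```

The percentile interval is only one of several interval types a bootstrap routine can report, which is part of why software output can be hard to read without context.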

More models, more features: what’s new in ‘parameters’ 0.2.0

[This article was first published on R on easystats, and kindly contributed to R-bloggers]. The easystats project continues to grow, expanding its capabilities and features, … Read more

If Software is Eating the World

What’s Alexa got to do with it? With the advent of Alexa, Google Assistant, Siri, and Alibaba and Baidu killing it in smart speaker adoption in China, consumer voice AI is eating the world, but to what end? Case in point, Alexa echo devices don’t make much of a profit for Amazon on hardware sales. … Read more

bamlss: A Lego Toolbox for Flexible Bayesian Regression

[This article was first published on Achim Zeileis, and kindly contributed to R-bloggers]. Modular R tools for Bayesian regression are provided by bamlss: From classic … Read more

A Brief Introduction to Supervised Learning

Supervised learning is the most common subbranch of machine learning today. Typically, new machine learning practitioners will begin their journey with supervised learning algorithms. Therefore, the first of this three post series will be about supervised learning. Supervised machine learning algorithms are designed to learn by example. The name “supervised” learning originates from the idea … Read more

Visualizing the speeches of world leaders at UNGA

Emmanuel Macron, President of France. Addressing world leaders at the UN General Assembly’s annual high-level debate on Tuesday, French President Emmanuel Macron called for courage, and for politicians to take the risks needed to achieve real solutions to contemporary challenges. Full text here. The President’s speech had a polarity of 0.08, which indicates it was neutral … Read more

Getting started with {golem}

[This article was first published on Rtask, and kindly contributed to R-bloggers]. A little blog post about where to look if you want to get … Read more

Build a Realtime Object Detection Web App in 30 Minutes

Image Credit: https://github.com/tensorflow/models/tree/master/research/object_detection Tensorflow.js is an open-source library enabling us to define, train and run machine learning models in the browser, using Javascript. I will use the Tensorflow.js framework in Angular to build a Web App that detects multiple objects on a webcam video feed. First, we have to select the pre-trained model which we … Read more

How to Install PySpark on a remote machine

The easy way. Spark, Wikipedia. Running PySpark on your remote machine and using it from within Jupyter or Python requires a bit of installation and playing around in your shell. The following method worked for me: I was able to install PySpark and run the demo code from inside Jupyter Lab. So let’s begin. Install … Read more

Data Science and Politics

Best Practices for Statistical Ethics. This is not a situation unique to government agencies. Many companies and organizations face these challenges. How should those of us empowered with data act in order to reinforce good practices and ethics? One place to start is the American Statistical Association’s ethical guidelines. Not surprisingly these include: Choosing methods … Read more

Lessons Learned From The Front Line of Analytics

With the enormous amount of data that the world is currently collecting, alongside the proliferation of AI, Machine Learning, and “Big Data” methodologies, especially in the last several years, many data roles have been invented to use these data and methods to bring real-world value. There are many skillsets that are … Read more