A R(API)D assessment of travel carbon emissions around the world

As a climate-conscious consumer, I’ve often wondered about the environmental impact of my routine travel from the East Coast to Salt Lake City and back. After seeing one of many recent headlines highlighting air travel’s surprisingly high rates of carbon emissions, I wondered: would taking a train, bus, or driving a car be better for …

You Say You Want a (Data) Revolution

Lessons learned making data a first-class citizen in enterprise. When subscription-based business models started to be all the rage, companies began to realize that being customer-centric is critical for survival and success. At Gainsight, which helped create and champion the customer success category and where I was the …

Calibration Techniques of Machine Learning Models

Calibration is a post-processing technique to improve the error distribution of a predictive model. The evaluation of machine learning (ML) models is a crucial step before deployment. It is essential to assess how well a model will behave for every single case. In many real applications, along with the mean error of the model, it is also …

Algorithmic Beauty: An Introduction to Cellular Automata

An overview of simple algorithms that generate complex, life-like results. The famous Rule 30, capable of generating pseudo-random numbers from simple, deterministic rules. Rule 30 was discovered by Stephen Wolfram in ’83. Cellular automata (CA) are simultaneously one of the simplest and most fascinating ideas I’ve ever encountered. In this post I’ll go over some famous …
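As a rough illustration of how simple the rule is, here is a minimal sketch of Rule 30 (not taken from the article). Each new cell is the left neighbour XOR (the centre OR the right neighbour). The function names and the choice to treat cells beyond the edges as 0 are assumptions made for this example.

```python
def rule30_step(cells):
    """Apply one Rule 30 update to a list of 0/1 cells.

    Cells beyond either edge are treated as 0 (an assumption of this sketch).
    """
    padded = [0] + list(cells) + [0]
    # Rule 30: new cell = left XOR (centre OR right)
    return [
        padded[i - 1] ^ (padded[i] | padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


def rule30_rows(width, steps):
    """Return the rows generated from a single live cell in the middle."""
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps):
        row = rule30_step(row)
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Print the familiar irregular triangle, one generation per line.
    for r in rule30_rows(width=31, steps=15):
        print("".join("#" if c else "." for c in r))
```

Running the script prints one generation per line, which is usually enough to see the chaotic pattern emerge from a single live cell.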

AWS Direct Connect Support for AWS Transit Gateway is now Available in Asia Pacific (Sydney) Region

AWS Direct Connect gateway allows you to access any AWS Region (except China) using your AWS Direct Connect connections. You can associate up to three Transit Gateways from any AWS Region with each Direct Connect gateway. AWS Direct Connect is introducing a new type of virtual interface called the transit virtual interface to support connectivity …

How to Start a Data Science Project That Will Help You Stand Out

A practical way of looking for impactful projects to break into the field. Someday in the future, I see myself digging into messy data to find answers and discover important strategic insights for a company. For many aspiring Data Analysts like me, the road to that destination is under construction. Knowing that you are competing …

Give some semantic love to your keyword search!

At first, search engines (Google, Bing, Yahoo, etc.) were lexical: the search engine looked for literal matches of the query words, without an understanding of the query’s meaning and only returning links that contained the exact query. But, with the advent of machine learning and new techniques in the field of Natural …

How to Learn Data Science for Free

Technical skills. The first part of the curriculum will focus on technical skills. I recommend learning these first so that you can take a practical-first approach rather than, say, learning the mathematical theory first. Python is by far the most widely used programming language for data science. In the Kaggle Machine Learning and …

Semantic segmentation : visualization of learning progress by TensorBoard

Building and training neural networks is not a straightforward process unless you play with the MNIST dataset, a kind of “Hello world” application in the deep learning world. It is very easy to make a mistake and spend days wondering why the network does not perform as you expected. Normally, deep learning libraries have some …

AI is “Smarter” than you, you’re“Better” than it, but you’re maybe not “Smart” enough to know it.

Provocative Controversy is the best. It’s this neat thing we do internally by feeling some logical truth which translates into some emotional sentiment and we call it “things” and make T-shirts out of it. Brains are funny little squishy meatballs, capable of imagining things bigger than our universe …

3 Essential Python Skills for Data Scientists

Lambda functions are just so powerful. Yeah, you won’t use them when you have to clean multiple columns the same way (that’s not something that happened to me very often); more often than not, each attribute will require its own cleaning logic. Lambda functions allow you to create ‘anonymous’ functions. This …
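To make the "each attribute needs its own cleaning logic" point concrete, here is a small sketch using only the standard library. The records, field names, and cleaning rules are hypothetical and chosen purely for illustration; the article itself may well use pandas instead.

```python
# Hypothetical records; the field names are made up for this example.
rows = [
    {"name": "  alice ", "price": "$1,200", "city": "new york"},
    {"name": "BOB", "price": "$85", "city": "  boston"},
]

# One anonymous function per attribute, since each needs its own logic.
cleaners = {
    "name": lambda s: s.strip().title(),
    "price": lambda s: float(s.replace("$", "").replace(",", "")),
    "city": lambda s: s.strip().title(),
}

# Apply the matching lambda to every value of every record.
cleaned = [
    {key: cleaners[key](value) for key, value in row.items()}
    for row in rows
]

print(cleaned)
```

Keeping the lambdas in a dictionary keyed by attribute name is just one possible design; it keeps each one-line rule next to the column it belongs to.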

Working with VSCode and Jupyter Notebook Style

If you are getting started with machine learning algorithms, you will come across Jupyter Notebook. To maximize efficiency you can integrate its concept with VS Code. As this requires some understanding of how to set up a Python environment, this article provides an introduction. There are a few reasons why it makes sense to develop …

Real-time Mobile Video Object Detection using Tensorflow

Full-stack Data Science. A step-by-step guide to adding object detection to your next mobile app. With the increasing interest in computer vision use cases like self-driving cars, face recognition, and intelligent transportation systems, people are looking to build custom machine learning models to detect and identify specific objects. However, building a …

Detect and respond to high-risk threats in your logs with Google Cloud

Editor’s Note: This is the fourth blog and video in our six-part series on how to use Cloud Security Command Center. There are links to the three previous blogs and videos at the end of this post. Data breaches aren’t only getting more frequent, they’re getting more expensive. With regulatory and compliance fines, and business resources …

Multi-Label Image Classification with Neural Network | Keras

The only challenge in multi-label classification is data imbalance, and we cannot simply use sampling techniques as we can in multi-class classification. Data imbalance is a well-known problem in machine learning, where some classes in the dataset are more frequent than others and the neural net just learns to predict the frequent classes. For …

Why Data Scientists & Researchers Need To Understand Product Management

A field of Anemones, Ektar 100, Ori Cohen. Reasons for learning product management as a data science researcher. As researchers, our primary job is to understand data. Understanding data can mean a plethora of different things, such as data analysis, feature engineering, algorithm development, model explainability or interpretability, result & error-analysis, etc. Working closely with …

Making a Game for Kids to Learn English and Have Fun with Python

There are a few techniques and then you can learn and create your own game.

Game Initialization

    import pygame

    # Game Init
    pygame.init()
    win = pygame.display.set_mode((640, 480))
    pygame.display.set_caption("KidsWord presented by cyda")
    run = True
    while run:
        pygame.time.delay(100)
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                run = False
        pygame.display.update()
    pygame.quit()

To start the game, we need a game window. There are two things to set. Window Size …

Azure Cosmos DB recommendations keep you on the right track

The tech world is fast-paced, and cloud services like Azure Cosmos DB get frequent updates with new features, capabilities, and improvements. It’s important, but also challenging, to keep up with the latest performance and security updates and assess whether they apply to your applications. To make it easier, we’ve introduced automatic and tailored recommendations for all Azure …

A lightweight machine learning architecture for IoT streams

Running machine learning models on high-frequency streaming data doesn’t have to cost a fortune. By thinking about our real-time requirements we can design efficient architectures that scale more effortlessly. For the past year and a half, my team and I have been trying to predict the movements of buses along public roads and forecast their …

Built-in Jupyter notebooks in Azure Cosmos DB are now available

Earlier this year, we announced a preview of built-in Jupyter notebooks for Azure Cosmos DB. These notebooks, running inside Azure Cosmos DB, are now available. Cosmic notebooks are available for all data models and APIs including Cassandra, MongoDB, SQL (Core), Gremlin, and Spark to enhance the developer experience in Azure Cosmos DB. These notebooks are …

Windows Virtual Desktop is now generally available worldwide

Since we announced the preview of Windows Virtual Desktop in March, thousands of customers have piloted the service, providing valuable feedback and insights for Microsoft to integrate into the service. Today, we are excited to announce the worldwide general availability of Windows Virtual Desktop. It is the only service that delivers simplified management, a multi-session …

How to Manage Machine Learning Products — Part 1

Part I: Why is managing machine learning products so hard? And why should you care? Summary: here’s what I want you to remember about this series of articles: Managing ML products is more challenging than managing normal software products because it involves more uncertainties and requires not only technical but also organizational changes. ML is …

Image denoising by MCMC

Code: The purpose of the code is to recover the original image from the corrupted image.

Corrupted image

    import numpy as np
    import cv2
    import random
    import scipy
    from scipy.spatial import distance
    from scipy.stats import multivariate_normal
    import pandas as pd
    from PIL import Image

    data = Image.open('noise_img.png')
    image = np.asarray(data).astype(np.float32)

The image was imported and saved in the form of a 2-D …

Elasticsearch meets BERT: Building Search Engine with Elasticsearch and BERT

In this post, we use a pre-trained BERT model and Elasticsearch to build a search engine. Elasticsearch has recently released text similarity search with vector fields. On the other hand, you can convert text into a fixed-length vector using BERT. So once we convert documents into vectors with BERT and store them in Elasticsearch, we …

Create An API To Deploy Machine Learning Models Using Flask and Heroku

For easier deployment on Heroku later, you’ll want to create a GitHub repository for this project and clone it for local use. To create a new repository, click on your profile icon in the top right corner, click repositories, and then click new. Give your repository a name, initialize the repository with a README and …

Starbucks Offer Optimisation

Promotional offers are quite prevalent these days. Almost every corporate house that sells consumer products runs some kind of promotional offer, be it due to increased competition, to expand the customer base, or to generate more revenue. Since there is a cost associated with sending these offers, it is of utmost importance to maximize …

Cleaning Anomalies to Reduce Forecast Error by 9% with anomalize

[This article was first published on business-science.io, and kindly contributed to R-bloggers]. In this tutorial, we’ll show how we used clean_anomalies() from the anomalize package …

Dates, Times, Calendars— The Universal Source of Data Science Trauma

With the high points of sanity (what little there is) out of the way, let’s slowly descend into madness. This section is stuff we have the most control over. It’s stuff we, or our colleagues, build. It’s pain of our own making. For the most part, we can minimize the damage when we understand the …

Design Principles for Big Data Performance

The evolution of the technologies in Big Data in the last 20 years has presented a history of battles with growing data volume. The challenge of big data has not been solved yet, and the effort will certainly continue, with the data volume continuing to grow in the coming years. The original relational database system …

More models, more features: what’s new in ‘parameters’ 0.2.0

[This article was first published on R on easystats, and kindly contributed to R-bloggers]. The easystats project continues to grow, expanding its capabilities and features, …

Understanding Bootstrap Confidence Interval Output from the R boot Package

Nuances of Bootstrapping. Most applied statisticians and data scientists understand that bootstrapping is a method that mimics repeated sampling by drawing some number of new samples (with replacement) from the original sample in order to perform inference. However, it can be difficult to understand output from the software that carries out the bootstrapping without a …
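The resampling idea described above can be sketched in a few lines. This is a plain percentile interval written with the Python standard library, not the R boot package the article covers, and it does not reproduce the other interval types that package reports. The function name, the default of 2000 resamples, and the fixed seed are assumptions for illustration.

```python
import random
import statistics


def bootstrap_percentile_ci(sample, stat=statistics.mean,
                            n_resamples=2000, alpha=0.05, seed=0):
    """Approximate percentile bootstrap confidence interval for `stat`."""
    rng = random.Random(seed)
    n = len(sample)
    # Draw n values with replacement, recompute the statistic, repeat.
    estimates = sorted(
        stat([sample[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    # Read the interval off the empirical distribution of the estimates.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    data = [2.1, 3.4, 1.9, 5.6, 4.4, 3.8, 2.7, 4.9, 3.1, 3.6]
    print(bootstrap_percentile_ci(data))
```

The interval endpoints are simply order statistics of the resampled estimates, which is why the percentile method is usually the easiest bootstrap interval to reason about.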

Working On a Databricks Cluster From A Remote Machine

Setting up. Configuring the databricks-connect client is pretty easy: you will need to accept the agreement, enter the URL (including the https://), enter the token, enter the cluster ID, and press Enter twice to accept the default values for the Org ID and Port questions. Do you accept the above agreement? [y/N] y Set new …

Why do we do and how can we benefit from experimental studies?

Who would win the 2018 presidential election if there were no fake news? How many new drivers would sign up in San Francisco if Uber had carried out the alternative incentive plan? Would employees be more efficient at work if companies encouraged a 10-minute coffee break every two hours? These questions are difficult to answer …

bamlss: A Lego Toolbox for Flexible Bayesian Regression

[This article was first published on Achim Zeileis, and kindly contributed to R-bloggers]. Modular R tools for Bayesian regression are provided by bamlss: From classic …

A Brief Introduction to Supervised Learning

Supervised learning is the most common subbranch of machine learning today. Typically, new machine learning practitioners will begin their journey with supervised learning algorithms. Therefore, the first post of this three-post series will be about supervised learning. Supervised machine learning algorithms are designed to learn by example. The name “supervised” learning originates from the idea …

Visualizing the speeches of world leaders at UNGA

Emmanuel Macron, President of France. Addressing world leaders at the UN General Assembly’s annual high-level debate on Tuesday, French President Emmanuel Macron called for courage, and for politicians to take the risks needed to achieve real solutions to contemporary challenges. The President’s speech had a polarity of 0.08, which indicates it was neutral …

Build a Realtime Object Detection Web App in 30 Minutes

TensorFlow.js is an open-source library enabling us to define, train and run machine learning models in the browser, using JavaScript. I will use the TensorFlow.js framework in Angular to build a web app that detects multiple objects on a webcam video feed. First, we have to select the pre-trained model which we …

How to Install PySpark on a remote machine

The easy way. Running PySpark on your remote machine and using it from within Jupyter or Python requires a bit of installation and playing around in your shell. The following method worked for me: I was able to install PySpark and run the demo code from inside Jupyter Lab. So let’s begin. Install …

The Simple Math behind 3 Decision Tree Splitting criterions

According to Wikipedia, Gini impurity is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it was randomly labeled according to the distribution of labels in the subset. In simple terms, Gini impurity is the measure of impurity in a node. Its formula is: where J is …
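The excerpt cuts off before the formula. Assuming the article uses the standard definition, Gini impurity is 1 minus the sum of the squared class proportions, taken over the J classes present in the node. A minimal sketch of that definition follows; the function name and the convention that an empty node has impurity 0 are assumptions for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum(p_i ** 2) over the classes present."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention chosen for this sketch
    # p_i is the fraction of items in the node that belong to class i.
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


if __name__ == "__main__":
    print(gini_impurity(["yes", "yes", "yes"]))       # pure node
    print(gini_impurity(["yes", "no", "yes", "no"]))  # evenly mixed node
```

A pure node scores 0, and a two-class node split evenly scores 0.5, which is the maximum for two classes.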

Python Tips and Trick, You Haven’t Already Seen

Note: This was originally posted at martinheinz.dev There are plenty of articles written about lots of cool features in Python such as variable unpacking, partial functions, enumerating iterables, but there is much more to talk about when it comes to Python, so here I will try to show some of the features I know and …

Data Science and Politics

Best Practices for Statistical Ethics. This is not a situation unique to government agencies. Many companies and organizations face these challenges. How should those of us empowered with data act in order to reinforce good practices and ethics? One place to start is the American Statistical Association’s ethical guidelines. Not surprisingly these include: Choosing methods …

A Framework to Distribute Data Projects Across Teams

The next step is to set up the project and the environment in PyCharm. There are two possible scenarios: starting a new project, or cloning a project from GitLab. Setting up a new project with an existing environment is very straightforward in PyCharm: once you open the initial window you will see the option: + Create a …

Lessons Learned From The Front Line of Analytics

With the enormous amount of data that the world is currently collecting, alongside the proliferation of AI, Machine Learning, and “Big Data” methodologies, especially in the last several years, many data roles have been invented to use these data and methods to bring real-world value. There are many skillsets that are …