Doing Yelp Better —

Using NLP to provide quantitative metrics for restaurants. Welcome to Yelp Report Cards. In this project, it is my aim to produce a business value-add to Yelp’s current business owner services. Using NLTK’s VADER sentiment analysis and combining it with Gensim’s LDA library, I created a quantitative report card for restaurants to determine areas in …

Cloud Run, a managed Knative service, is GA
Director of Product Management, Google Cloud; Vice President of Product & Design

We want to empower developers no matter where their businesses are in their cloud journey, whether that’s on-prem, operating in a managed Kubernetes environment, or running on a fully managed serverless computing platform. Today, we’re announcing that Cloud Run is generally available, helping developers focus on writing high-value code, regardless of where their organizations are …

Shifting gears: How the cloud drives digital transformation in the automotive industry
Managing Director Manufacturing and Transportation, Google Cloud

It’s undeniable that technology has improved many facets of modern life. Transportation and mobility, however, continues to be an area where more can be done. For example, look no further than your daily commute. Research by INRIX shows that time spent in traffic has more than doubled in many major cities around the world since …

How to Use Airflow without Headaches

Data pipelines and/or batch jobs that process and move data on a scheduled basis are well known to all us data folks. The de facto standard tool to orchestrate all that is Apache Airflow. It is a platform to programmatically author, schedule, and monitor workflows. A workflow is a sequence …

Difference Between NFD, NFC, NFKD, and NFKC Explained with Python Code

The difference between Unicode normalization forms. Recently I have been working on an NLP task in Japanese, where one problem is converting special characters to a normalized form. So I did a little research and wrote this post for anyone who has the same need. Japanese contains different forms …
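
As a rough illustration of how the four forms differ, here is a minimal sketch using Python's standard `unicodedata` module. The sample strings and the helper names `normalize_all` and `code_points` are chosen for illustration and are not taken from the original post.

```python
import unicodedata

FORMS = ("NFD", "NFC", "NFKD", "NFKC")

def normalize_all(text):
    """Return the text normalized under each of the four Unicode forms."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}

def code_points(text):
    """Render a string as U+XXXX code points so the forms are easy to compare."""
    return ["U+%04X" % ord(ch) for ch in text]

# Illustrative samples (assumptions, not from the post):
# a precomposed e-acute, and half-width katakana KA followed by
# the half-width voiced sound mark.
samples = {
    "e-acute": "\u00e9",
    "half-width ga": "\uff76\uff9e",
}

for name, text in samples.items():
    for form, result in normalize_all(text).items():
        print(name, form, code_points(result))
```

The canonical forms (NFD, NFC) only split or recombine characters that are canonically equivalent, so the half-width katakana pass through unchanged. The compatibility forms (NFKD, NFKC) additionally map them to their full-width counterparts, which is often what a Japanese text pipeline wants.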

My Favorite Machine-Learning Models ALL Data-Scientists Should Know

These are my subjective favorite models, and the ones that I find myself turning to most often. There is a tool for every job just as there is a model for every job, just as there are features for every job. With time comes the wonderful Data Science Superpower that is knowing the best way …

A 7-step guide for effective survey recruitment on Reddit

A helpful how-to guide for conducting online surveys. Making surveys can be deceptively easy. Platforms like Google Forms and SurveyMonkey, and university-bought services like Qualtrics, make surveys cheap, quick and easy to write. However, doing research online comes with its own potential methodological problems. And doing research on specific platforms comes …

What Online Poker Players Can Teach Us About AI

The Human Side of Things. Poker is considered a good challenge for AI, as it is seen as a combination of mathematical/strategic play and human intuition, especially about the strategies of others. I would consider the game a cross between the two extremes of technical vs. human skill: chess …

Forrester names Microsoft a leader in Wave report for Industrial IoT Software Platforms

As a company, we work every day to empower every person on the planet to achieve more. As part of that, we’re committed to investing in IoT and intelligent edge, two technology trends accelerating ubiquitous computing and bringing unparalleled opportunity for transformation across industries. We’ve been working hard to make our Azure IoT platform more …

How to build globally distributed applications with Azure Cosmos DB and Pulumi

This post was co-authored by Mikhail Shilkov, Software Engineer, Pulumi. Pulumi is reinventing how people build modern cloud applications, with a unique platform that combines deep systems and infrastructure innovation with elegant programming models and developer tools. We live in amazing times when people and businesses on different continents can interact at the speed of …

Democratizing Smart City solutions with Azure IoT Central

One of the most dynamic landscapes embracing Internet of Things (IoT) is the modern city. As urbanization grows, city leaders are under increasing pressure to make cities safer, accessible, sustainable, and prosperous. Underlying all these important goals is the bedrock that makes a city run: infrastructure. Whether it be water, electricity, streets, traffic lights, cities …

Predict figure skating world championship ranking from season performances

Finally, the latent scores for the 2 factors across all skaters can be retrieved at the end in a pandas DataFrame, with the previously-stored skater names added back in as row index, and the factors (0 and 1) as columns:

pd.DataFrame(skater_scores, index=skater_names)
#                             0         1
# Alexander, MAJOROV  10.226857  3.122723
# Javier, FERNANDEZ   16.009246  4.594796
# Misha, GE           11.919684  …

Building Analytics and Data Lake Capabilities with Limited Data

“Nothing ever exists entirely alone. Everything is in relation to everything else.” (Buddha) A very common problem many organizations, especially startups, encounter is having little or no data. Unfortunately, operating blind and without data insights is no longer an option. Operating with data is simply table stakes; all companies leverage data to make strategic shifts …

Random Search vs Grid Search for hyperparameter optimization

I’ll follow my Jupyter notebook to make things easier to show. Feel free to either run it or implement the code on your own. Keep in mind that some code snippets use code implemented in previous snippets, so the order in which they appear matters. All files mentioned in this post are available in my GitHub. Check …

Resampling Methods for Unbalanced Datasets — Fraudulent Transactions

Suppose you are tasked with developing a simple machine learning algorithm, whether supervised (with a clear target variable) or unsupervised (with no predefined outcome variable). But looking at your data, it seems that one class dominates the other. In such a case, your model will have a hard time learning from your data to predict future …
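
One common response to a dominant class is random oversampling of the minority class. The sketch below is a minimal, standard-library illustration of that idea, assuming a toy dataset; the function name `random_oversample` and the data are made up for this example and are not the post's own method.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        # All original rows belonging to this class.
        pool = [row for row, label in zip(rows, labels) if label == cls]
        # Draw (with replacement) as many extra rows as the class is short.
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels

# Toy data (made up): 8 legitimate transactions, 2 fraudulent ones.
rows = [[float(i)] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Resampling like this should be applied to the training split only; duplicating rows before splitting would leak copies of the same example into the evaluation data.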

How to add machine learning-powered text summarization to any project

Have you ever wondered how media organizations are able to produce the raw volume of content they output? How is it that the Associated Press, in addition to all of their other coverage, is able to cover 4,400 quarterly earnings reports each year? How does The Washington Post run such hyperlocal coverage, like covering every …

How Machine Learning Improves Marketing Automation

From Product Pricing to Content Research. Even before the digital age, marketing professionals have been among the most eager and early adopters of emerging technologies. AI and machine learning are already popular tools in digital marketing. This evolution is likely to continue and expand in the next several years as data multiplies, …

Automatic Machine Learning in Fraud Detection Using H2O AutoML

Machine Learning Automation in Finance. Machine learning has many applications in finance such as security, process automation, loan/insurance underwriting, credit scoring, trading, etc. [1][2]. Financial fraud is one of the major concerns in financial security [1][2]. To fight the increasing risk of financial fraud, machine learning has been actively applied to fraud detection [3][4]. There …

AI, Social Data Science and the Climate Crisis

Bridging Social Science and Technology for our Planet. There is still no Wikipedia explanation of Social Data Science. Not that having one would establish it as a field; it is more of a side note in these beginnings. Over the last few days I have been considering how to put together a program that would …

workloopR: Analysis of work loops and other data from muscle physiology experiments in R

Studies of muscle physiology often rely on closed-source, proprietary software for not only recording data but also for data wrangling and analyses. Although specialized software might be necessary to record data from highly-specialized equipment, data wrangling and analyses should be free from this constraint. It’s becoming more common for researchers to provide code along with …

Easily deploy SQL Server Always On solutions using the AWS Launch Wizard for SQL Server

AWS Launch Wizard for SQL Server reduces the time it takes to deploy SQL Server Always On solutions on Amazon EC2, accelerating your journey to the cloud. You simply input your SQL Server requirements including performance, number of nodes, and connectivity on the service console, and AWS Launch Wizard identifies the right AWS resources to …

Exploiting Multi-Categorical Features Using Deep Interest

How can your deep learning model get the most out of features with varying length? Originally published at Taboola Engineering Blog on September 4, 2019. At Taboola, our goal is to predict whether users will click on the ads we present to them. Our models use all kinds of features, yet the most important ones …

Review: Andrew Ng’s Machine Learning Course

Stanford’s Machine Learning course taught by Andrew Ng was released in 2011. Eight years after publication, Andrew Ng’s course is still ranked as one of the top machine learning courses. It has become a staple course on Coursera and, to be honest, in machine learning. As of this …

Generating Modern Arts using Generative Adversarial Network(GAN) on Spell

After initializing both the generator and discriminator models, let’s write a helper function to save the image after a number of iterations.

def save_images(cnt, noise):
    image_array = np.full((PREVIEW_MARGIN + (PREVIEW_ROWS * (IMAGE_SIZE + PREVIEW_MARGIN)),
                           PREVIEW_MARGIN + (PREVIEW_COLS * (IMAGE_SIZE + PREVIEW_MARGIN)), 3),
                          255, dtype=np.uint8)
    generated_images = generator.predict(noise)
    generated_images = 0.5 * generated_images + 0.5
    image_count = 0
    for row in range(PREVIEW_ROWS):
        for col in range(PREVIEW_COLS):
            r = …

“Survey Says…” — Why Information Based on Surveys Is Not Always Trustworthy

There are good reasons why you need to think twice about survey results. As much as I love words, I love numbers. I’m always fascinated by the stories and pictures behind statistics. I am particularly interested in surveys, maybe because I am a big fan of Family Feud. A survey is …

Vector Autoregressions & Vector Error Correction Multivariate Model

A vector autoregression (VAR) model comprises multiple time series and is quite a useful tool for forecasting. It can be considered an extension of the autoregressive model (the AR part of ARIMA). A VAR model involves multiple variables and therefore has more than one equation. Each equation uses as its explanatory variables lags of all the …
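
To make the "one equation per variable" idea concrete, here is the standard textbook form of a two-variable VAR with a single lag, VAR(1). The symbols are notation assumed for this illustration (intercepts $c_i$, coefficients $a_{ij}$, error terms $\varepsilon_{i,t}$) and may differ from the notation in the full post.

```latex
\begin{aligned}
y_{1,t} &= c_1 + a_{11}\, y_{1,t-1} + a_{12}\, y_{2,t-1} + \varepsilon_{1,t} \\
y_{2,t} &= c_2 + a_{21}\, y_{1,t-1} + a_{22}\, y_{2,t-1} + \varepsilon_{2,t}
\end{aligned}
```

Each series is regressed on its own lag and on the lag of the other series, which is what distinguishes a VAR from fitting two separate AR models.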

Announcing Network Intelligence Center: towards proactive network operations
VP of Product Management – Networking

The vision for intelligent and predictive network operations. Adoption of hybrid and multi-cloud is absolutely critical for organizations to remain agile. However, this underscores the need for intelligent and continuous network operations: the promise that the network is doing what it needs to do, in line with business intent. For example, if you have global operations, …

Convolution Vs Correlation

Convolutional Neural Networks, which are the backbone of most computer vision applications such as self-driving cars and facial recognition systems, are a special kind of neural network architecture in which the basic matrix-multiplication operation is replaced by a convolution operation. They specialize in processing data that has a grid-like topology. Examples include time-series data …
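
The distinction named in the title can be shown in a few lines. Below is a minimal pure-Python sketch of 1-D "valid" cross-correlation and convolution; the function names and the sample signal and kernels are assumptions made for this illustration. The only difference between the two operations is that convolution flips the kernel before sliding it over the signal.

```python
def correlate_valid(signal, kernel):
    """Cross-correlation: slide the kernel over the signal as-is."""
    n, k = len(signal), len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(n - k + 1)
    ]

def convolve_valid(signal, kernel):
    """Convolution: same sliding sum, but the kernel is traversed in reverse."""
    n, k = len(signal), len(kernel)
    return [
        sum(signal[i + (k - 1 - j)] * kernel[j] for j in range(k))
        for i in range(n - k + 1)
    ]

signal = [1, 2, 3, 4]
edge_kernel = [1, 0, -1]   # asymmetric: the two operations differ
blur_kernel = [1, 2, 1]    # symmetric: the two operations agree

print(correlate_valid(signal, edge_kernel))
print(convolve_valid(signal, edge_kernel))
print(correlate_valid(signal, blur_kernel) == convolve_valid(signal, blur_kernel))
```

Convolving with a kernel gives the same result as correlating with the reversed kernel, so for a learned kernel the choice mostly affects which orientation of the weights ends up stored.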

Null Hypothesis and the P-Value

Originally published on my personal blog. When you’re starting your machine learning journey, you’ll come across the null hypothesis and the p-value. At a certain point in your journey, it becomes quite important to know what these mean to make meaningful decisions while designing your machine learning models. So in this post, I’ll try to explain …

I Bought a New Computer Just To Try Out CUDA, Was it Worth it?

The CUDA setup is actually quite extensive, as it isn’t as simple as just setting up your drivers. First, you have to pop into your terminal and install the dependencies:

sudo apt-get install freeglut3 freeglut3-dev libxi-dev libxmu-dev

Then you have to go to the “CUDA Zone” and grab yourself some binaries… Or if you’re …

Improving observability of your Kubernetes deployments with Azure Monitor for containers

Over the past few years, we’ve seen significant changes in how an application is thought of and developed, especially with the adoption of containers and the move from traditional monolithic applications to microservices applications. This shift also affects how we think about modern application monitoring, now with greater adoption of open source technologies and the …

Machine Learning in R: Start with an End-to-End Test

[This article was first published on R – David’s blog, and kindly contributed to R-bloggers.] As a data scientist, you will likely be asked one …

Azure Container Registry: preview of repository-scoped permissions

The Azure Container Registry (ACR) team is rolling out the preview of repository-scoped role-based access control (RBAC) permissions, our top-voted item on UserVoice. In this release, we have a command-line interface (CLI) experience for you to try and provide feedback. ACR already supports several authentication options using identities that have role-based access to an entire registry. However, for …

Python Input, Output and Import

In this tutorial, let us understand the input and output built-in functions used in Python. We will also learn how to import libraries and use them in our programs. Before getting started, let us understand what built-in functions are. Any function that is provided as part of a high-level language and can be …

“OK Boomer” escalated quickly — a reddit+BigQuery report

You can now play with the interactive dashboard to find all sorts of patterns within these comments. I used two different sources of data. To extract all of the historical reddit comments, I used this query:

CREATE TABLE `reddit_extracts.201906_all_okboomer`
PARTITION BY fake_date
CLUSTER BY subreddit, ts
AS
SELECT TIMESTAMP_SECONDS(created_utc) …

And The Star of the Show is — PYTHON

Contributions to open-source projects come from every continent, and Asia is at the top, with most of its contributions coming from China. The graph below shows contributions from different continents. The top 50 packages in each language ecosystem have a massive number of dependent projects. The top npm packages …

How To Compute Satellite Image Statistics And use It In Pandas

The Sentinel-2 image of the area (Band 3 only) is shown below. Let us also read the buildings table, which we will use to store the statistical summaries derived from the satellite image. Note that you can use other polygons, such as districts or rectangular grids, instead of the building polygons for this example. We …

Explicit AUC maximization

How to explicitly optimize for maximum area under the ROC curve. I was getting started on the “IEEE-CIS Fraud Detection” Kaggle competition, and something caught my eye: the results are evaluated based on AUC. This makes sense for fraud detection tasks for several reasons: The data sets are often unbalanced, …
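
For readers who want a concrete handle on the metric itself, AUC can be read as the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The sketch below computes it that way in plain Python; the function name `pairwise_auc` and the toy labels and scores are assumptions for illustration. It only evaluates AUC and is not the optimization method the post goes on to describe.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    Tied scores count as half a correct pair."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy example (made up): two legitimate and two fraudulent transactions.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(labels, scores))  # 3 of 4 pairs are ordered correctly -> 0.75
```

Because the value depends only on how positives rank relative to negatives, it is unaffected by how rare the positive class is, which is one reason it suits unbalanced fraud data. The quadratic loop is fine for a demonstration but would be replaced by a rank-based computation on large data sets.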