Getting started – Azure SQL Server Managed Instance

There are a lot of options for data scientists to store data in the Azure cloud. In this blog post I will cover the pros and cons of Azure SQL Server Managed Instance and will provide a few tips so you can hit the ground running if you decide to take it for a test …

Industrial strength Natural Language Processing

I have spent much of my career as a graduate student researcher, and now work as a Data Scientist in industry. One thing I have come to realize is that the vast majority of solutions proposed both in academic research papers and in the workplace are just not meant to ship: they just don’t scale! And …

Building a Sentiment Detection Bot with Google Cloud, a Chat Client, and Ruby.

Introduction: In this series, I’ll explain how to create a chat bot that is capable of detecting sentiment, analyzing images, and finally having the basis of an evolving personality. This is part 1 of that series. The Pieces: Ruby, Sinatra, Google Cloud APIs, Line (a chat client). Since I live in Japan, I’ll be using …

Debugging a Machine Learning model written in TensorFlow and Keras

Things that could go wrong, and how to diagnose if they did. Oct 24, 2018. In this article, you get to look over my shoulder as I go about debugging a TensorFlow model. I did a lot of dumb things, so please don’t judge. Cheat sheet. The numbers refer to sections in this article (https://bit.ly/2PXpzRh) 1 …

Introduction to Linear Regression in Python

Basic concepts and mathematics: There are two kinds of variables in a linear regression model. The input or predictor variable is the variable (or variables) that helps predict the value of the output variable. It is commonly referred to as X. The output variable is the variable that we want to predict. It is commonly referred to …
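To make the predictor/output distinction concrete, here is a minimal sketch of fitting a line to one predictor with ordinary least squares. The excerpt does not show the post's own code, so the function name `fit_line` and the toy data are assumptions made purely for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single predictor.

    Hypothetical helper for illustration; not taken from the post.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data (assumed): X is the predictor, y is the output, with y = 2x + 1.
X = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(X, y)
print(slope, intercept)
```

With noise-free data like this the fitted line recovers the generating slope and intercept; with real data it gives the line that minimises the squared prediction error.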

A line-by-line layman’s guide to Linear Regression using TensorFlow

Computing the Graph: With generate_dataset() and linear_regression(), we are now ready to run the program and begin finding our optimal gradient W and bias b! [line 2, 3] x_batch, y_batch = generate_dataset(); x, y, y_pred, loss = linear_regression() In this run() function, we start off by calling generate_dataset() and linear_regression() to get x_batch, y_batch, x, y, y_pred …

Perplexity Intuition (and Derivation)

The perplexity of a discrete probability distribution is defined as 2^H(p) = 2^(-Σ_x p(x) log2 p(x)) (from https://en.wikipedia.org/wiki/Perplexity), where H(p) is the entropy of the distribution p(x) and x is a random variable over all possible events. In the previous post, we derived H(p) from scratch and intuitively showed why entropy is the average number of bits that we need to …
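As a small illustration of the definition, the sketch below computes perplexity as 2 raised to the entropy in bits. The function names and the two example distributions are assumptions chosen for illustration, not code from the post.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability events contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def perplexity(probs):
    """Perplexity of a discrete distribution: 2 ** H(p)."""
    return 2 ** entropy_bits(probs)


# Assumed example distributions over six outcomes.
fair_die = [1 / 6] * 6
loaded_die = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]

print(perplexity(fair_die))    # close to 6: all outcomes equally likely
print(perplexity(loaded_die))  # smaller: the distribution is less uncertain
```

A uniform distribution over k events has perplexity k, which is why perplexity is often read as an effective number of equally likely choices.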

Telling Apart AI and Humans: #1 Humans VS Androids

ALife 2018 conference, © Lana Sinapayen. Prompted by a video where people thought a human was actually a hyper-realistic robot, I decided to write about how to spot humanoid robots. Here are a few tips! After spending so much time with Alter the android and various hyper-realistic robots, I know a thing or two about …

Neural Nets: From Linear Regression to Deep Nets

Neural networks, especially deep neural networks, have received a lot of attention over the last couple of years. They perform remarkably well on image and speech recognition and form the backbone of the technology used for self-driving cars. What many people find hard to believe is that the mathematics of neural networks have been around …

SQL Server

Columnstore: A columnstore index can provide a very high level of data compression, typically by 10 times, to significantly reduce your data warehouse storage cost. For analytics, a columnstore index offers an order of magnitude better performance than a btree index. Columnstore indexes are the preferred data storage format for data warehousing and analytics workloads. …

The future of data visualization

Tools to shape the future: According to many product announcements from Google, Apple and BMW, more and more data will be overlaid on our physical environments through augmented reality or projection. That means not only will data be visualized more, but the visual reality around us will be turned into data. Data visualization of a new AR …

Waiting for Weekends: Some Insights on How to Select the Best Wine

There is a huge selection of wines on the market, and for a wine lover it is always a quest to select the best one. Numerous varieties from the US, France, Spain, Germany and many other wine countries are easily available in any liquor store. Price can also vary drastically. From my experience, …

The Best Public Datasets for Machine Learning

First, a couple of pointers to keep in mind when searching for datasets. According to Carnegie Mellon University: 1. A high-quality dataset should not be messy, because you do not want to spend a lot of time cleaning data. 2. A high-quality dataset should not have too many rows or columns, so it is easy …