Improving the Stack Overflow search algorithm using Semantic Search and NLP

4. Training Word Embeddings using Word2Vec. In order for our model to understand the raw text data, we need to vectorize it. Bag of Words (BOW) and TF-IDF are very common approaches for vectorizing. However, since I would be using an artificial neural network (an LSTM) as my model, the sparse nature of BOW and TF-IDF would pose …

Autonomous Driving: Intro into SLAM

SLAM is the process by which a robot or vehicle builds a global map of its current environment and uses this map to navigate or deduce its location at any point in time [1–3]. SLAM is commonly used in autonomous navigation, especially to assist navigation in areas where global positioning systems (GPS) fail or in previously unseen areas. …

Searching for Food Deserts in Los Angeles County

For a recent data science project, I collaborated with several other Lambda School students to search for food deserts in L.A. County. A general definition of a food desert is an area that does not have access, within one mile, to a grocery store/market providing fresh, healthy food options, …
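
The one-mile criterion in this excerpt can be sketched in a few lines of Python. This is only an illustration, not the project's actual method: it assumes "within one mile" means straight-line (great-circle) distance, and the function names, the Earth radius constant, and the coordinates in the usage note are all hypothetical.

```python
import math

# Mean Earth radius in miles (an assumed constant for this sketch).
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def is_food_desert(location, stores, radius_miles=1.0):
    """True if no store lies within radius_miles of location.

    location: (lat, lon) tuple; stores: iterable of (lat, lon) tuples.
    Straight-line distance is an assumption; walking or driving
    distance would need a road network.
    """
    lat, lon = location
    return all(
        haversine_miles(lat, lon, s_lat, s_lon) > radius_miles
        for s_lat, s_lon in stores
    )
```

For example, `is_food_desert((34.05, -118.25), [(34.20, -118.25)])` checks a single store roughly ten miles to the north, so the location would count as a food desert under this simplified definition.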

A Detailed, Step-by-Step Guide to Linear Regression using MATLAB

Prediction of Housing Prices. The aim is to draw statistical inferences from the data in Narula and Wellington (1977), “Prediction, Linear Regression and the Minimum Sum of Relative Errors”, Technometrics, using linear regression for prediction purposes. The data give 28 observations for each predictor (11 different predictors) …

Data Science in Production

Building Scalable Model Pipelines with Python. One of my biggest regrets as a data scientist is that I avoided learning Python for too long. I always figured that other languages provided parity in terms of accomplishing data science tasks, but now that I’ve made the leap to Python there is no looking back. …

5 Tips To Create A More Reliable Web Crawler

Boost your web crawler’s efficiency! When I crawl websites, having my crawlers blocked is the most annoying situation I run into. To become really great at web crawling, you should not only be able to write XPath or CSS selectors quickly; how you design your crawlers also matters a …

Sentiment Analysis of Economic Reports Using Logistic Regression

Sentiment analysis is a hot topic in NLP, and this technology is increasingly relevant in the financial markets, which are in large part driven by investor sentiment. With so many reports and economic bulletins being generated on a daily basis, one of the big challenges for policymakers is to extract meaningful information in a …

Why You Should Double Down On Serverless Infrastructure

When you double down on serverless architecture, you begin to reap amazing rewards. Serverless has been around for a few years. It is not a brand new idea, but it is a new way of thinking about building applications. I always tend to think about why I am doing something before I think about how I …

Regression — explained in simple terms!!

In this article, I wish to put forth regression in as simple terms as possible so that you remember it not as a statistical concept but as a more relatable experience. Regression, as fancy as it sounds, can be thought of as a “relationship” between any two things. For example, imagine you stay on …

Scholarly Network Analysis

References [1] Feng Xia, Wei Wang, Teshome Megersa Bekele, and Huan Liu. Big scholarly data: A survey. IEEE Transactions on Big Data, 3(1):18–35, 2017. [2] Tze-Haw Huang and Mao Lin Huang. Analysis and visualization of co-authorship networks for understanding academic collaboration and knowledge domain of individual researchers. In Computer Graphics, Imaging and Visualisation, 2006 International Conference on, …

Dynamic Speed Optimization

REGRESSION MODELING: Modeling Ship Performance Curves to Reduce Fuel Consumption. [Photo: container ships can consume over 350 tons of fuel per day; photo by Anker Crew Insurance.] Total fuel costs for the global commercial maritime shipping industry were approximately $100 billion in 2018. Emissions regulations, imposed by the International Maritime Organization, are expected to increase fuel …

Bayesian Strategy for Modeling Retail Price with PyStan

Statistical modeling, partial pooling, multilevel modeling, hierarchical modeling. Pricing is a common problem faced by any e-commerce business, and one that can be addressed effectively by Bayesian statistical methods. The Mercari Price Suggestion data set from Kaggle seems to be a good candidate for the Bayesian models I wanted to learn. If you remember, the …

Tune: fast hyperparameter tuning at any scale

Let’s now dive into a concrete example that shows how to leverage a state-of-the-art early stopping algorithm (ASHA). We will start by running Tune across all of the cores on your workstation. We’ll then scale out the same experiment on the cloud with about 10 lines of code. We’ll be using PyTorch in this …

NIPS 2018 paper on “Robust Classification of Financial Risk” — Summary

In this short article, I would like to give an overview of a research paper called “Robust Classification of Financial Risk”. The paper was accepted for the NIPS 2018 Workshop on “Challenges and Opportunities for AI in Financial Services” and aims to solve a very interesting and unique problem occurring in credit lending done with deep learning …

Local Model Interpretation: An Introduction

Concept and Theory. LIME (Local Interpretable Model-Agnostic Explanations) is a local model interpretation technique that uses local surrogate models to approximate the predictions of the underlying black-box model. Local surrogate models are interpretable models, such as linear regression or decision trees, that are used to explain individual predictions of a black-box model. LIME trains a surrogate model …
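
The local-surrogate idea in this excerpt can be illustrated with a short NumPy sketch. It is not the API of the `lime` package and not necessarily what the article does: the function name, the Gaussian perturbation, the exponential proximity kernel, and all default parameter values are assumptions chosen for illustration.

```python
import numpy as np


def explain_locally(predict_fn, instance, num_samples=500, scale=0.5,
                    kernel_width=1.0, seed=0):
    """Fit a weighted linear surrogate around one instance of a black-box model.

    predict_fn: maps an (n, d) array to an (n,) array of predictions.
    Returns (intercept, slopes); the slopes are the local feature effects.
    All sampling and kernel choices here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    instance = np.asarray(instance, dtype=float)

    # 1. Perturb the instance to sample its neighbourhood.
    samples = instance + rng.normal(0.0, scale, size=(num_samples, instance.size))

    # 2. Ask the black-box model for predictions on the perturbed points.
    targets = np.asarray(predict_fn(samples), dtype=float)

    # 3. Weight each sample by its proximity to the original instance.
    distances = np.linalg.norm(samples - instance, axis=1)
    weights = np.exp(-(distances ** 2) / (kernel_width ** 2))

    # 4. Solve weighted least squares for an interpretable linear surrogate,
    #    with features centred on the instance so the intercept is the
    #    surrogate's prediction at the instance itself.
    design = np.hstack([np.ones((num_samples, 1)), samples - instance])
    sqrt_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * sqrt_w[:, None], targets * sqrt_w, rcond=None)
    return coef[0], coef[1:]
```

Calling `explain_locally(model_predict, x)` with any vectorised prediction function returns one slope per feature; a large positive slope suggests that increasing that feature raises the prediction near `x`, which is the kind of per-prediction explanation a local surrogate is meant to provide.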

Measures of Proximity in Data Mining & Machine Learning

Moving forward, we are going to talk about Similarity and Dissimilarity between data objects separately. Without further ado, let’s dive into it. Dissimilarities between Data Objects: We begin with a discussion of distances, which are dissimilarities with certain properties. Euclidean Distance: The Euclidean distance, d, between two points, x and y, in one, two, three, or higher- …
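
As a small illustration of the Euclidean distance d between two points x and y, here is a sketch in plain Python. The function name is hypothetical; the formula is the standard one, the square root of the sum of squared coordinate differences.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two points with the same number of dimensions."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    # Square root of the sum of squared per-coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

The same function covers one, two, three, or more dimensions, since it only iterates over paired coordinates: `euclidean_distance([0, 0], [3, 4])` gives 5.0.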

Converting a Deep Learning Model with Multiple Outputs from PyTorch to TensorFlow

Generating and preparing the data. The main difference in the data is that there are now 2 different sets of actual outputs, 1 as a continuous variable and the other in binary form. Also, I defined two functions to generate two different types of outputs for the data. The snippet below illustrates the process of …

Missing Values In Dataframes With Inspectdf

[This article was first published on Alastair Rushworth, and kindly contributed to R-bloggers]. Summarising NA by column in dataframes. Exploring the number of records containing …

How Markets Fool the Models, and Us

Curve Fit Market Data with Caution. Data science used to be the domain of statisticians, scientists and Wall Street quants, but thanks to the ubiquity of data and open source libraries, all of us can now develop powerful, predictive models. Of course, these models also have the power to breed overconfidence, especially in the stock …

Probability of an Approaching AI Winter

This article addresses the question of whether the field of Artificial Intelligence (AI) is approaching another AI winter. Motivation: Both industries and governments have invested significantly in the AI field, with many AI-related startups established in the last 5 years. If another AI winter were to come about, many people could lose …

Sugar, Flower, Fish or Gravel — Now a Kaggle competition

I am very happy to announce the launch of our Kaggle competition “Understanding Clouds from Satellite Images”. This competition is the culmination of literally hundreds of hours of human labor from dozens of scientists. The challenge is to segment satellite images into one of four classes. Typically, when we think about different cloud types we …

6 lessons learned as a new data science lead

If you have already worked in a data science team, you are probably not entirely unfamiliar with uncertainty. Most probably you have worked on some greenfield projects in your past. Maybe you have even led some of them. And some of them might have succeeded, while some others might not have reached …

Simple Linear Regression with Python

In my previous article, I talked about Simple Linear Regression as a statistical model to predict continuous target values. I also showed the optimization strategy the algorithm employs to compute the regression’s coefficients α and β. Here, I’m going to provide a practical explanation of what I’ve been talking about, and I’m going to do …
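
For readers who want to see the coefficients computed, here is a minimal sketch of the closed-form least-squares solution in plain Python. It assumes the model y = α + βx, with α as the intercept and β as the slope; the excerpt does not say which symbol plays which role, and the function name is hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (alpha, beta) minimising the squared error of y = alpha + beta * x.

    Assumes alpha is the intercept and beta is the slope.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    beta = sxy / sxx
    alpha = y_mean - beta * x_mean
    return alpha, beta
```

With points that lie exactly on a line, such as `xs = [0, 1, 2, 3]` and `ys = [1, 3, 5, 7]`, the fit recovers an intercept of 1 and a slope of 2.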

Recency, Frequency, Monetary Model with Python — and how Sephora uses it to optimize their Google…

Last time, we analyzed our online shopper data set using the cohort analysis method. We discovered some interesting observations around our cohort data set. While cohort analysis shows us customer behavior over time and helps us understand retention rates, we also want to be able to segment our data by behavior as well. Today, we …
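
The three RFM quantities named in the title can be computed from a plain transaction list, as sketched below. This is an illustration under assumptions, not the article's code: the input layout of (customer_id, purchase_date, amount) tuples, the function name, and the choice of measuring recency in days before a reference date are all hypothetical.

```python
from datetime import date


def rfm_table(transactions, as_of):
    """Compute recency, frequency and monetary value per customer.

    transactions: iterable of (customer_id, purchase_date, amount) tuples.
    as_of: reference date used to measure recency in days.
    """
    raw = {}
    for customer_id, purchase_date, amount in transactions:
        row = raw.setdefault(
            customer_id,
            {"last_purchase": purchase_date, "frequency": 0, "monetary": 0.0},
        )
        row["last_purchase"] = max(row["last_purchase"], purchase_date)
        row["frequency"] += 1          # number of purchases
        row["monetary"] += amount      # total amount spent
    return {
        customer_id: {
            "recency_days": (as_of - row["last_purchase"]).days,
            "frequency": row["frequency"],
            "monetary": row["monetary"],
        }
        for customer_id, row in raw.items()
    }
```

Each customer ends up with one row of three numbers, which can then be binned or scored to form behavioral segments.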

Advanced analytics is nice, but how about we start with simple analytics?

Advanced analytics! Everyone wants some, but few people need some. Let’s start with a healthy dose of simple analytics first. From simple analytics to advanced analytics: I know, I get it! You’re excited about all your data and you want to hire a data scientist right away to get started on all the advanced analytics …

Understanding the OLS method for Simple Linear Regression

Linear Regression is a family of algorithms employed in supervised machine learning tasks (to learn more about supervised learning, you can read my previous article here). Knowing that supervised ML tasks are normally divided into classification and regression, we can place Linear Regression algorithms in the latter category. It differs from classification because of the …

Visualizing NYC Bike Data on interactive and animated maps with Folium plugins

This plugin helps us animate a path on a map. In this case, we don’t have the exact path each trip follows, so we will create lines from origin to destination. Before starting to work with our data, let’s take a look at what settings this plugin needs. From the live demo, we can see …

CI/CD fbprophet on AWS Lambda using CircleCI

Integrate forecasting into your development flow with CircleCI. About a year ago I was trying to figure out how to get fbprophet forecasting to work on AWS Lambda and eventually got it working. Since then the need for serverless forecasting matured, and more people got interested in making the tech seamless, …

Reinforcement Learning — Generalisation of Off-Policy Learning

The Baird Counter Example. So far, we have extended our reinforcement learning topic from discrete states to continuous states and have elaborated a bit on applying tile coding to on-policy learning, in which the learning process follows the trajectory the agent takes. Now let’s talk about off-policy learning in continuous settings. While in …

Using Deep Learning to Classify Relationship State with DeepConnection

Image Classification of Romantic Couples with PyTorch. [Figure: model scheme of DeepConnection.] If there is a root domain to the recent explosion in deep learning, it’s certainly computer vision, the analysis of image and video data. So it doesn’t come as a huge surprise that you try your luck with some computer vision techniques while studying …

Machine Learning Algorithms for Every Occasion

Make an informed decision about your choice of algorithm. A machine learning algorithm is a method that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Algorithms like linear regression, deep learning, convolutional neural networks and recommendation systems are widely used and explained. It is easy to get …

How To Learn Data Science – My path

The following apps are very helpful, and I use them: Quora, Medium, Blind, Reddit, LinkedIn, Udemy, Coursera, YouTube, Meetup, DataCamp. 1. Reddit: I have subscribed to the following subreddits, which are very helpful: Dataengineering, Dataisbeautiful, Datasets, Learndatascience, Learnprogramming, Learnpython, Machinelearning, Learnmachinelearning, Python, Rstats, Computervision …

Tech Dividends, Part 2

[This article was first published on R Views, and kindly contributed to R-bloggers]. In a previous post, we explored the dividend history of stocks included …

Modern R with the tidyverse is available on Leanpub

Yesterday I released an ebook on Leanpub, called Modern R with the tidyverse, which you can also read for free here. In this blog post, I want to give some context. Modern R with the tidyverse is the second ebook I have released on Leanpub. I released the first one, called Functional programming and unit testing for data munging …

AWS RoboMaker now supports log-based simulation, event driven simulation termination, and simulation event tagging

Log-based simulation allows you to play back pre-recorded log data such as sensor streams for testing robotic functions such as localization, mapping, and object detection. In addition to running a physics-based Gazebo simulation, RoboMaker can be used to test a particular robotic function with log-based simulation. Event driven simulation termination allows you to end a …

AWS RoboMaker now supports configurable timeout in over-the-air (OTA) deployment

AWS RoboMaker extends the most widely used open-source robotics software framework, Robot Operating System (ROS), with connectivity to cloud services. These cloud services provide a robotics development environment for application development, a robotics simulation service to accelerate application testing, and a robotics fleet management service for remote application deployment, update, and management. RoboMaker also provides …

Building a Convolutional Neural Network: Male vs Female

Now let’s start with our modeling process. Step 1: Creating a new notebook. Visit Colab (colab.research.google.com) and click on File, then New Python 3 Notebook. Step 2: Import the dependencies/libraries we are going to use in this demo: import numpy, matplotlib, …

The intuition behind A/B Testing — A Primer for New Product Managers

A/B Testing Basics: A Primer for New Product Managers. Purpose: What is the point of hypothesis tests such as A/B Tests? For that matter, why do we test new things? What are the results of the A/B Tests telling us? How confident should I feel about the result of the A/B Test? How do product …

P-Value In Action: Is It Safe to Say That Parallax Correction Really Improve The Accuracy of…

What really is a p-value? It took me a long time to figure out the concept of this value. From my experience, I believe the best way to understand the p-value is through a real example. That’s why, in this post, I will explain the p-value using a real example that …

Itaú Unibanco: How we built a CI/CD Pipeline for machine learning with online training in Kubeflow

Once a data scientist has a set of well-performing machine learning models, they need to operationalize them for other applications to consume. Depending on the business requirements, predictions are produced either in real time or on a batch basis. For the AVI project, two business requirements were essential: (1) the ability to have multiple models …

Machine learning on categorical variables

How to properly run and evaluate models. At first blush, categorical variables aren’t that different from numerical ones. But once you start digging deeper and implement your machine learning (and preprocessing) ideas in code, you will stop every minute asking questions such as “Do I do feature engineering on both …

Best practices for SAP app server autoscaling on Google Cloud

In most large SAP environments, there is a predictable and well known daily variation in app server workloads. The timing and rate of workload changes are generally consistent and rarely change, making them great candidates to benefit from the elastic nature of cloud infrastructure. Expanding and contracting VMs to match the workload cycle can speed …

AI for Industrial Process Control

Tuning a Process Oven with Reinforcement Learning. Determining optimal control settings for an industrial process can be tough. For instance, controls can interact, where adjusting one setting requires readjustment of other settings. Also, the relationship between a control and its effect can be very complex. Such complications can be challenging for optimizing a process. This …