Serverless Distributed Data Pre-processing using Dask, Amazon ECS and Python (Part 1)

Dec 18, 2018. The quality and accuracy of machine learning models depend on many factors. One of the most critical factors is pre-processing the dataset before feeding it into the machine learning algorithm that learns from the data. Therefore, it is critical that you feed them the right data for the problem you …

Think Machine Learning and AI Won’t Impact Your Networking Product — Think Again!

Using ML and AI as a force multiplier will be a significant competitive advantage for networking product teams. Machine learning and related techniques have seen tremendous advances in the last few years. And while at times it might feel that there’s a lot of hype surrounding the space, it’s clear that …

Friend Recommendation Using Heterogeneous Network Embeddings

Imagine Snoopy without Woodstock, Calvin without Hobbes, Friends without Rachel, Batman without Robin, or Mowgli without Baloo. Social platforms thrive on their members’ ability to find relevant friends to interact with. The network effect is what drives growth, time spent, and daily active users on the application. This is even more …

Multimodal Deep Learning

Fusion of multiple modalities using deep learning. Being highly enthusiastic about research in deep learning, I was always searching for unexplored areas in the field (though it is tough to find one). I had previously worked on maths word problem solving and many related topics. The challenge of using deep neural networks as black boxes …

Docker Without the Hassle

How to Use: Slightly Friendlier Version. First, install Docker. Instructions for your machine can be found here. The Docker getting-started guide is useful for learning how Docker works, although we don’t need the details to use it effectively with repo2docker. Make sure Docker is running. If docker run hello-world shows the message Hello from …

Distributed TensorFlow using Horovod

Reduce training time for deep neural networks by using many GPUs. (This post will be used in my master’s course SA-MIRI at UPC Barcelona Tech with the support of Barcelona Supercomputing Center.) “Methods that scale with computation are the future of Artificial Intelligence” (Rich Sutton, father of reinforcement learning; video 4:49). In …

A journey into supervised machine learning

Some practical examples, tips, and thoughts on supervised ML. Earlier this year, through my MBA program at Cornell Tech, I took a great intro course on Machine Learning with a fantastic professor, Lutz Finger. Lutz’s course inspired me to dig even deeper into ML and AI, so I recently started a hands-on Introduction to Machine …

Training a Neural Network to Detect Gestures with OpenCV in Python

Rethinking the problem. I decided to pivot and try something new. It seemed to me that there was a clear disconnect between the odd look of the training data and the images my model was likely to see in real life, so I decided to try building my own dataset. I had been working with OpenCV, an …

Creating US Immigration Path Map in Tableau with R

Meng Li, Dec 17. Ever seen a destination map in Tableau? It’s usually used to show flight tracks, bus routes, traffic, and so on. There are loads of videos and articles teaching you how to create a destination map from a dataset containing all the information you need; for example, this video uses …

Getting started with mlFlow

What is mlFlow? mlFlow is a framework that supports the machine learning lifecycle. This means that it has components to monitor your model during training and running, store models, load a model in production code, and create a pipeline. The framework introduces three distinct features, each with its own capabilities. mlFlow Tracking: Tracking is …

Deep Learning and Hyper-Personalization

Indeed, true personalization understands customers at a deeper level: their real-time intent, purchasing history, preferences, and complex shopping journeys. It then uses these insights to tailor congruent, one-to-one interactions across channels. So far, most companies rely on machine learning to take all this customer data and build predictive models on it, operating not just on what’s …

Histopathologic Cancer Detector – Finding Cancer Cells with Machine Learning

Let’s take a look at the following diagram, which illustrates the purposes of the specific layers in the CNN. As we can see above, starting from the left, the early layers learn low-level features, and the further right we go, the more specific the learned features become. The idea behind Transfer Learning is to …

Detecting Firms with Intentional Misstatements using Machine Learning

Identifying firms that intentionally distort their financial statements is a challenging and exciting problem for auditors, banks, and investors who rely on financial information to make decisions. Yet it is difficult to flag these firms, because intentional accounting misstatement (“cooking the books”) can take several forms: hiding company losses through other entities, recognizing revenue …

49 Years of Lyrics: Why so Angry?

Data Collection. We’re using three datasets to run this experiment: (1) a dataset we’ll collect ourselves that includes over 3,400 song lyrics from 1970 to 2018; (2) a list of prohibited/restricted words from www.freewebheaders.com that we’ll use to assess the perceived level of profanity in lyrics; and (3) a training dataset from Kaggle (originally used for the …

On the Perils of Automated Face Recognition

For anyone who has been paying attention, it will not have gone unnoticed that the past year has seen a dramatic expansion in the use of face recognition technology, including at schools, border crossings, and interactions with the police. Most recently, Delta announced that some passengers in Atlanta will be able to check in and …

Our Collections

Explore the world of data science, machine learning, and artificial intelligence further. We are on a mission to get the best content relevant to data science, machine learning, and artificial intelligence out there for everyone. One of the challenges with any content platform on the internet is having a dedicated and curated list of resources …

Improve your scientific models with meta-learning and likelihood-free inference

Article jointly written by Arthur Pesah and Antoine Wehenkel. Motivation. There are usually two ways of coming up with a new scientific theory: (1) starting from first principles, deducing the consequent laws, and coming up with experimental predictions to verify the theory; or (2) starting from experiments and inferring the simplest laws that explain your data. …

6 uncommon principles for effective data sciences

How to conceptualize and implement effective data science projects. Results, not hype. Motivation: The more I delve into data science, the more convinced I am that companies and data science practitioners must have a clear view of how to cut through the machine learning and AI hype to implement an effective data science strategy that drives business …

Building a Skin Lesion Classification Web App

Using Keras and TensorFlow.js to classify seven types of skin lesions. Alex Yu, Dec 16. After doing research on convolutional neural networks, I became interested in developing an end-to-end machine learning solution. I decided to use the HAM10000 dataset to build a web app to classify skin lesions. In this article, I’ll provide some background information …

How to Learn Data Science: Staying Motivated.

Learn how to deal with anxiety. When you start researching how to become a data scientist, you will discover an unfortunate fact about the profession: becoming a data scientist requires knowledge of a broad and deep set of tools, technologies, and skills. All of this makes the prospect of becoming a data scientist VERY …

What Kagglers are using for Text Classification

Advanced NLP techniques for deep learning. With image classification more or less solved by deep learning, text classification is the next developing theme in deep learning. For those who don’t know, text classification is a common task in natural language processing that transforms a sequence of text of indefinite length …

Is a Picture Worth A Thousand Words?

Dec 16, 2018. Background: Our project was inspired by Jamie Ryan Kiros, who created a model trained on 14 million romance passages to generate a short romantic story from a single input image. Similarly, the ultimate goal of our project was to output a short story for children. “neural-storyteller is a recurrent neural …

Getting Started with TensorFlow in Google Colaboratory

Opening up a Colab notebook. When using Colab for the first time, you can launch a new notebook here. Once you have created a notebook, it will be saved in your Google Drive (Colab Notebooks folder). You can access it by visiting your Google Drive page and then either double-clicking the file name or right-clicking and then …

Develop a NLP Model in Python & Deploy It with Flask, Step by Step

Flask API, Document Classification, Spam Filter. So far, we have developed many machine learning models, generated numeric predictions on the testing data, and tested the results, all offline. In reality, generating predictions is only part of a machine learning project, although in my opinion it is the most important part. Considering a system …

Logic Theory —Basic Notation

Logic theory starts from the concept of an argument. Most logic textbooks open with a central definition of an argument, one that likely sounds much like the following: an argument contains one or more special statements, called premises, offered as a reason to believe that a further statement, called the conclusion, …

Advanced Queries With SQL That Will Save Your Time

Yes, SQL still exists. Over years of working with telecom data, my folder of code snippets has collected a lot of reusable examples. And I don’t mean “SELECT * FROM Table1”; I am talking about finding and handling or removing duplicate values, selecting the top N values from each group of data within the same table, shuffling …
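As an illustration of the “top N values from each group” pattern mentioned above, here is a minimal sketch, assuming a hypothetical telecom-style table usage(customer, month, mb) and Python’s built-in sqlite3 module backed by SQLite 3.25 or newer (needed for window functions). It numbers the rows inside each group with ROW_NUMBER() and keeps the first N; this is one common technique, not necessarily the one the article uses.

```python
import sqlite3


def top_n_per_group(rows, n):
    """Return the n rows with the largest mb value for each customer.

    Uses the ROW_NUMBER() window function, which requires SQLite 3.25+.
    The table and column names are hypothetical, for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE usage (customer TEXT, month TEXT, mb INTEGER)")
        conn.executemany("INSERT INTO usage VALUES (?, ?, ?)", rows)
        query = """
            SELECT customer, month, mb
            FROM (
                SELECT customer, month, mb,
                       ROW_NUMBER() OVER (
                           PARTITION BY customer  -- restart numbering per group
                           ORDER BY mb DESC       -- largest value gets rank 1
                       ) AS rn
                FROM usage
            )
            WHERE rn <= ?
            ORDER BY customer, mb DESC
        """
        return conn.execute(query, (n,)).fetchall()
    finally:
        conn.close()


# Hypothetical monthly data usage per customer
rows = [
    ("alice", "2018-01", 500), ("alice", "2018-02", 900), ("alice", "2018-03", 700),
    ("bob", "2018-01", 300), ("bob", "2018-02", 100), ("bob", "2018-03", 400),
]
print(top_n_per_group(rows, 2))
# [('alice', '2018-02', 900), ('alice', '2018-03', 700), ('bob', '2018-03', 400), ('bob', '2018-01', 300)]
```

ROW_NUMBER() picks an arbitrary winner among tied values; swapping in RANK() or DENSE_RANK() would keep all tied rows instead.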

Art of Generative Adversarial Networks (GAN)

Dec 16, 2018. Code link for all the work mentioned in the post. We had the pleasure of working on a generative adversarial network project as our final project for Business Data Science in our curriculum. Though we could have chosen any other subject for our final project, we went …

ProGAN: How NVIDIA Generated Images of Unprecedented Quality

Progressively growing GANs enables them to get bigger and more stable. The people in the high-resolution images above may look real, but they are not: they were synthesized by a ProGAN trained on 30,000 high-resolution celebrity images. “ProGAN” is the colloquial term for a type of generative adversarial network that was pioneered at NVIDIA. It …

Ways to Improve a Map Visualization

How to take a map visualization to the next level. First, I will cover two reasons why visualizing data using maps is often compelling to an audience. Then, I will cover three tips that will help you make the transition from good to exceptional when building map visualizations. Why Use a Map for a Data Visualization? …

Robots that Reason

Inorganic knowledge traditions with model-based reinforcement learning. This essay explores the concept of inorganic knowledge traditions capable of sequential improvement using model-based reinforcement learning. Many behavioral economists presently believe that humans use two primary methods for strategic decision-making. One is fast, intuitive, and unconscious, and has been called System 1 thinking. …

Simple House Price Predictor using ML through TensorFlow in Python

The realty profession is moving into the 21st century, and as you can imagine, home listings are flooding the internet. If you have ever looked at buying a home, renting an apartment, or just wanted to see what the most expensive home in town is (we have all been there), then chances are you …

Regression Analysis: Linear Regression

3. Model Building in R. I have used a dataset containing the details of 2,201 flights. The variables are described below. 3.1) Datasets: schedtime: the scheduled time of departure (using the 24-hour clock); carrier: the two-letter code indicating which airline operated the flight; deptime: the actual departure time; dest: the three-letter code …

What’s the fuss about Regularization?

As newcomers to machine learning, most people get excited when their training error starts decreasing. They try harder, it decreases even further, and their excitement knows no bounds. They show their results to Master Oogway (the wise old tortoise in Kung Fu Panda), and he calmly says, well, not a good model you …

Finding Local Events Using Twitter Data

Project by David Chen, Ashwin Gupta, Shruthi Krish, Raghav Prakash, and Wei Wang. Twitter is a social media platform that millions of people use to share updates about their lives. Often, these tweets are about local events happening around the user. Though news agencies report on local events, the time it takes an agency to learn …

Predicting hospital length-of-stay at time of admission

Exploring an important healthcare performance metric. Project Overview: Predictive analytics is an increasingly important tool in the healthcare field, since modern machine learning (ML) methods can use large amounts of available data to predict individual outcomes for patients. For example, ML predictions can help healthcare providers determine likelihoods of disease, …

Using Markov Chain Monte Carlo method for project estimation

Using TensorFlow Probability for Hamiltonian sampling. One type of criticism I received for my previous work on project estimation is that the log-normal distribution has short tails. And this is true, despite all the benefits of the log-normal distribution. The reason is very simple: when fitting the data to the distribution shape …

Processing Time Series Data in Real-Time with InfluxDB and Structured Streaming

This article focuses on how to use the popular open-source database InfluxDB along with Spark Structured Streaming to process, store, and visualize data in real time. Here, we will go over in detail how to set up a single-node instance of InfluxDB, how to extend Spark’s ForeachWriter to use it to …

Modeling tree height and basal area in the Finger Lakes National Forest, NY

I tried my hand at using the R package randomForest to create two regression models for tree height and basal area, based on lidar and field-collected data from the Finger Lakes National Forest, NY. Disclaimer: this project was my first real taste of R. Earlier in the semester I had done some simple learning into …

Text Generation Using RNNs

Generate characters from Alice in Wonderland. Introduction: Text generation is a popular problem in data science and machine learning, and it is a suitable task for recurrent neural nets. This report uses TensorFlow to build an RNN text generator and builds a high-level API in Python 3. The report is inspired by @karpathy (min-char-rnn) and …

Introduction to Interactive Time Series Visualizations with Plotly in Python

Introduction to Plotly. Plotly is a company that makes visualization tools, including a Python API library. (Plotly also makes Dash, a framework for building interactive web-based applications with Python code.) For this article, we’ll stick to working with the plotly Python library in a Jupyter Notebook and touching up images in the online plotly editor. When …

The Ultimate NanoBook to understand Deep Learning based Image Classifier

The first and most important step of our journey. As I said before, we are simply going to ask questions that will guide us in building an image classifier. For the sake of brevity, we will call an image classifier an IC. Now we are ready to start our journey. So let us ask the first question: …

Applying GANs to Super Resolution

SRGAN results from Ledig et al. [3]. Generative adversarial networks (GANs) have found many applications in deep learning. One interesting problem that GANs can solve better is super-resolution, the task of upscaling images from low-resolution sizes such as 90 x 90 into high-resolution sizes such as 360 x 360. In this …

A Comprehensive Guide to Convolutional Neural Networks — the ELI5 way

Artificial intelligence has been witnessing monumental growth in bridging the gap between the capabilities of humans and machines. Researchers and enthusiasts alike work on numerous aspects of the field to make amazing things happen. One of many such areas is computer vision. The agenda for this field is to enable machines …

Dealing With Class Imbalanced Datasets For Classification.

Skewed datasets are not uncommon, and they are tough to handle. Usual classification models and techniques often fail miserably when presented with such a problem. Although your model could reach even 99% accuracy in such cases, if you measure it against a sensible metric such as the ROC AUC score, …
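To make the accuracy trap concrete, here is a minimal sketch on a hypothetical dataset with a 1% positive rate: a “model” that always predicts the majority class scores 99% accuracy, yet its ROC AUC is 0.5, no better than chance. The AUC helper uses the pairwise (Mann-Whitney) definition, computed by brute force for clarity; in practice you would more likely call scikit-learn’s roc_auc_score.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive is scored above a
    random negative (ties count as half a win). O(P * N), fine for a demo."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical skewed dataset: 10 positives, 990 negatives
y_true = [1] * 10 + [0] * 990

# A "model" that always predicts the majority class and gives
# every example the same score, so it cannot rank anything
y_pred = [0] * len(y_true)
scores = [0.0] * len(y_true)

print(accuracy(y_true, y_pred))  # 0.99
print(roc_auc(y_true, scores))   # 0.5
```

Accuracy rewards the majority-class guess, while AUC only rewards the ability to rank positives above negatives, which this constant model lacks.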

Google Landmark Recognition using Transfer Learning

Image classification with 15k classes! Project by Catherine McNabb, Anuraag Mohile, Avani Sharma, Evan David, and Anisha Garg. Dealing with a large number of classes, many of which have very few images, is what makes this task really challenging! The problem comes from a famous Kaggle competition, the Google Landmark Recognition Challenge. The training set contains over 1.2 …

Anime Recommendation engine: From Matrix Factorization to Learning-to-rank

Anime Obsession gone too far!! Otakus. Henry Chang, Joey Chen, Guanhua Zhang, Preetika Srivastava, and Cherry Agarwal. The vast amount of data hosted on the internet today has led to information overflow, and thus there is a constant need to improve the user experience. A recommendation engine is a system that helps support …