Forecasting Stock Prices using XGBoost — A Detailed Walk-Through

A step-by-step walk-through. There are many machine learning techniques in the wild, but extreme gradient boosting (XGBoost) is one of the most popular. Gradient boosting is a process that converts weak learners into strong learners in an iterative fashion. The name XGBoost refers to the engineering goal to push …
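To make the "weak learners to strong learners" idea concrete, here is a minimal sketch of gradient boosting for squared error in plain Python. It is not XGBoost and not the article's code: the function names (`fit_stump`, `fit_boosted`, `predict`), the toy data, and the learning rate are all assumptions chosen for illustration. Each round fits a one-split stump to the current residuals and adds a damped copy of it to the ensemble.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump (a weak learner) to the residuals."""
    best = None
    # Splitting at the largest x would leave the right side empty, so skip it.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # only one distinct x value: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return (xs[0], mean, mean)
    return best[1:]


def stump_value(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.3):
    """Iteratively add stumps, each one fitted to what is still unexplained."""
    base = sum(ys) / len(ys)          # start from the mean prediction
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_value(stump, x)
                 for p, x in zip(preds, xs)]
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}


def predict(model, x):
    total = sum(stump_value(s, x) for s in model["stumps"])
    return model["base"] + model["learning_rate"] * total


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Toy data (hypothetical): y = x^2 on ten points.
xs = list(range(10))
ys = [x * x for x in xs]

baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
one_round = fit_boosted(xs, ys, n_rounds=1)
one_round_mse = mse(ys, [predict(one_round, x) for x in xs])
model = fit_boosted(xs, ys, n_rounds=50)
boosted_mse = mse(ys, [predict(model, x) for x in xs])
```

On this toy data the error falls from the mean-only baseline to the one-round model and again to the 50-round model, which is the iterative improvement the paragraph describes. XGBoost layers regularization, second-order gradient information, and substantial engineering on top of this basic loop.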

Make a Bar Chart About Roman Emperors’ Rise to Power with Python

As an enthusiast of both ancient history and Python programming, when I stumbled upon this data set about Roman emperors, I knew what I had to do… use it to make a data visualization in Python! Browsing the columns, I decided to chart the different ways the emperors rose to power. Sure, you could be …

Visualizing Topic Models with Scatterpies and t-SNE

But not so fast: you may first be wondering how we reduced T topics into an easily visualizable 2-dimensional space. For this, I used t-Distributed Stochastic Neighbor Embedding (or t-SNE). Taking the document-topic matrix output from the GuidedLDA, in Python I ran:
    from sklearn.manifold import TSNE
    tsne_model = TSNE(n_components=2, verbose=1, random_state=7, angle=.99, init='pca')
    # 13-D -> 2-D
    tsne_lda …

On the importance of creativity in Data Analytics

I recently attended a networking event where I spoke to a range of graduates who were looking at prospective careers in the data science and adjacent spaces. While talking to many of them, I would ask them what they felt the most underrated or underappreciated trait in data analytics was. It is a question …

4 Important Things Universities Need to Teach About Data Science/Analytics

It doesn’t matter whether you are a data analyst or data scientist. Your work is close to meaningless if others do not see its value. In school, we are taught to interpret the significance of hypothesis tests through their p-values. It doesn’t matter if you are doing your chi-squared tests, Kruskal-Wallis tests, Wilcoxon signed …

A dummies’ guide to build a Kubeflow Pipeline

Kubeflow provides a layer of abstraction over Kubernetes that makes it easier to run data science and ML pipelines. It allows ML pipelines to become production-ready and to be delivered at scale through a resilient framework for distributed computing (i.e., Kubernetes). We will dive into the details of how to make a machine learning pipeline …

How to Access Twitter’s API using Tweepy

The Twitter API uses OAuth, an open authorization protocol, to authenticate requests. You will need to create and configure your authentication credentials to access the Twitter API. As promised, this is a step-by-step guide, so follow along! Step 0: Open a Twitter account. If you already have a Twitter account, skip this step. Step 1: …

Clear and Creepy Danger of Machine Learning: Hacking Passwords

A white hat study of hacking passwords using ML. Not too long ago, it was considered state-of-the-art research to make a computer distinguish cats from dogs. Now image classification is the ‘Hello World’ of Machine Learning (ML), something one can implement in just a few lines of code using TensorFlow. In fact, in …

Question Answering for Enterprise Use Cases

Answering users’ questions in an enterprise domain remains a challenging proposition. Businesses are increasingly turning to automated chat assistants to handle technical support and customer support interactions. But these tools can only successfully troubleshoot questions they were trained on, exposing a growing challenge for enterprise question answering (QA) techniques today. To address this, IBM Research …

Sharing my personal journey writing for Towards Data Science

People commonly say that we should not bottle up feelings because it is unhealthy for our wellbeing. For me, my passion for data science is overflowing, and I am in need of an outlet. The purpose of this article is to share my exploration journey, just like how we do Transfer Learning for machine learning …

Machine Learning 101: Predicting Drug Use Using Logistic Regression In R

Load, clean, and split the dataset
    library(readr)
    drug_use <- read_csv('drug.csv',col_names=c('ID','Age','Gender','Education','Country','Ethnicity','Nscore','Escore','Oscore','Ascore','Cscore','Impulsive','SS','Alcohol','Amphet','Amyl','Benzos','Caff','Cannabis','Choc','Coke','Crack','Ecstasy','Heroin','Ketamine','Legalh','LSD','Meth','Mushrooms','Nicotine','Semer','VSA'))
    library(dplyr)
    drug_use <- drug_use %>% mutate_at(as.ordered, .vars=vars(Alcohol:VSA))
    drug_use <- drug_use %>%
      mutate(Gender = factor(Gender, labels=c("Male", "Female"))) %>%
      mutate(Ethnicity = factor(Ethnicity, labels=c("Black", "Asian", "White", "Mixed:White/Black", "Other", "Mixed:White/Asian", "Mixed:Black/Asian"))) %>%
      mutate(Country = factor(Country, labels=c("Australia", "Canada", "New Zealand", "Other", "Ireland", "UK", "USA")))
    #create a new factor variable called recent_cannabis_use
    drug_use = drug_use %>% mutate(recent_cannabis_use=as.factor(ifelse(Cannabis>="CL3","Yes","No")))
    #create a new tibble that includes a …

Predicting Startup Failures Using Classification

[This project was done as part of an immersive data science program called Metis. You can find the files for this project at my GitHub and the slides here. The final model is accessible here.] A Bit of Background: Recently I tested a slew of classification algorithms to see if I could predict whether a …

The Next Step Towards Conscious AI Should Be Awareness

The Developer Philosopher. What is the “magic trick” to become conscious? Consciousness did not appear suddenly from a certain level of complexity in our brain but was the product of a long evolutionary process that shaped it. As a consequence, simply building more and more complex AI systems will not make it emerge miraculously, …

Create Hans Rosling’s famous animated bubble chart in a single piped R command

A great little learning exercise that illustrates the range and flexibility of the R language. A few weeks ago I sat down with an hour or so to spare and decided that I’d like to try to create Hans Rosling’s Gapminder bubble chart, made famous by his hugely entertaining lectures and TED talks, from …

The Evolution of Computer Science: from the Static to the Dynamic Paradigm

You can read a lot of things about the evolution of computer science, although it is a relatively young recognized science. The most controversial topics tend to be about general paradigms of thinking: object-oriented vs functional programming, declarative vs imperative programming, RISC vs CISC, SQL vs NoSQL, etc. Of the …

Fine-Grained Analysis of Sentence Embeddings

Representing words or sentences as real-valued vectors in a high-dimensional space has allowed us to incorporate deep learning methods in Natural Language Processing tasks. These embeddings serve as features for a wide variety of machine learning tasks ranging from sequence labelling to information retrieval. Word2Vec (and its variants) have been the go-to …

Recommendation System PART 1 — Use of Collaborative Filtering and Hybrid Collaborative — Content…

Disclaimer: I use “items” and “products” interchangeably in the text and code; they mean the same thing. When you open an online marketplace such as Amazon, you will find recommendations such as frequently bought together, customers who bought this also bought this, similar items, and so on. You …

Using dunder methods to refine data model

Using quaternions as an example. Practically everyone who has ever used Python has come across at least one of the so-called Python magic methods. Dunder methods, as they are also called, are Python’s special methods that allow users to hook into specific actions being performed. Probably the most frequently encountered one is the __init__ method. …
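As an illustration of hooking into built-in actions through dunder methods, here is a small sketch of a quaternion class. The class name, its attribute names, and the choice of which methods to implement are assumptions made for this example, not the article's actual code.

```python
import math


class Quaternion:
    """A hypothetical quaternion w + xi + yj + zk, for illustration only."""

    def __init__(self, w, x, y, z):
        # Called when an instance is created: Quaternion(1, 0, 0, 0)
        self.w, self.x, self.y, self.z = w, x, y, z

    def __repr__(self):
        # Called by repr() and when the object is echoed in the REPL
        return f"Quaternion({self.w}, {self.x}, {self.y}, {self.z})"

    def __eq__(self, other):
        # Called by the == operator
        if not isinstance(other, Quaternion):
            return NotImplemented
        return (self.w, self.x, self.y, self.z) == (other.w, other.x, other.y, other.z)

    def __add__(self, other):
        # Called by the + operator: component-wise addition
        return Quaternion(self.w + other.w, self.x + other.x,
                          self.y + other.y, self.z + other.z)

    def __mul__(self, other):
        # Called by the * operator: the Hamilton product (not commutative)
        a1, b1, c1, d1 = self.w, self.x, self.y, self.z
        a2, b2, c2, d2 = other.w, other.x, other.y, other.z
        return Quaternion(
            a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
        )

    def __abs__(self):
        # Called by abs(): the Euclidean norm
        return math.sqrt(self.w ** 2 + self.x ** 2 + self.y ** 2 + self.z ** 2)


i = Quaternion(0, 1, 0, 0)
j = Quaternion(0, 0, 1, 0)
k = Quaternion(0, 0, 0, 1)

product_ij = i * j   # equals k
product_ji = j * i   # equals -k, showing that order matters
norm = abs(Quaternion(1, 2, 2, 4))
```

With these methods in place, ordinary operators (`+`, `*`, `==`) and built-ins (`abs`, `repr`) work on the custom type without any special call syntax.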

Time Complexity for Data Scientists

Let’s go further with some use cases that you might encounter and the data structures and tricks that can help us along the way. Nearest Point: In geospatial feature engineering tools such as geomancer, we build features for a particular coordinate by answering questions such as “what is the distance to the nearest _____?” …

Hopefully This Bayesian Spam Filter isn’t Too Naive

Our data initially came in with some unimportant columns, which I promptly dropped. Next, looking at the two columns we pulled from the dataframe, label and text: label is a categorical feature that contains “Spam” or “Ham”, so I figure spam is spam, and ham is not spam. So …

Deep Learning for Natural Language Processing Using word2vec-keras

A deep learning approach for NLP by combining Word2Vec with Keras LSTM. Natural language processing (NLP) is a common research subfield shared by many research fields such as linguistics, computer science, information engineering, and artificial intelligence. NLP is concerned with the interactions between computers and human natural languages in general and in particular how …

How Conversational AI is Improving Social Discussions

In the world of Facebook, Twitter, blogs and WhatsApp, you never know what can go wrong. You don’t even know if the content you are sharing is correct or not. But what you need to know is that every shared post creates an impact, positive or negative. …

On the over sensationalism of artificial intelligence news

It’s no longer a secret that artificial intelligence (AI) is here to stay. What once was a puzzling and rather niche area of computer science has suddenly started to take over our lives with its many applications. As a result, due to this mysterious and unknown characteristic of AI and its more prominent child, machine …

5 Adoption Challenges that every Data Science Project Faces

Take these steps to avoid your project’s failure in production. Some years ago, at Gramener, we built a customer churn modeling solution for one of the largest global telecom operators. The machine learning solution predicted which of their customers would leave, one month before they stopped usage. In test pilots, the solution helped reduce customer …

NeurIPS 2019 exploration App with Streamlit

Streamlit NeurIPS 2019 Exploration App: neurips2019exploration.herokuapp.com. Ever since I learned about Streamlit, I have wanted to build something with it. So I ended up building a little app for exploring the papers of NeurIPS 2019, which has a crazy number of accepted papers this year: 1428. You can …

How do we create a sustainable thinking economy?

“Behold, the people is one, and they have all one language; and this they begin to do: and now nothing will be restrained from them, which they have imagined to do.” – Genesis 11:6 KJV. As conveyed in the Tower of Babel origin myth, massive cooperation is so crucial to the self-determination of our species …

How to Approach a Machine Learning Problem?

Machine learning is the process of teaching a computer much as you would teach a child. Just as a child needs to be taught how to understand a problem, draw insights from the given situation and act accordingly, a machine learning model also needs to be taught. In today’s world, data is the most expensive …

The Serious Downsides To The Julia Language In 1.0.3

DataFrames.jl: Though fairly critical and minor, the biggest one that I wanted to mention out of the gate is DataFrames.jl. Now don’t get me wrong, I like DataFrames to some degree… But particularly annoying is the way that DataFrames are displayed to the user, and I don’t really understand the reasons for doing such a …

The effectiveness of programmatic advertising campaigns

In the first part of this series, we compare the effectiveness of a campaign on two different audiences over a limited time horizon to decide if performance is different. We are looking at a banner campaign launched on October 10th. This was started by means of “programmatic advertising” (https://en.wikipedia.org/wiki/Real-time_bidding) for different target groups. For example …

Feature selection evaluation for automated AI

Feature selection algorithms look to effectively and efficiently find an optimal subset of relevant features in the data. This preprocessing step is becoming crucial as the number of features and data set sizes increase. In this article we will present the unique automatic process we developed at TapReason for evaluating the effectiveness of different feature …

All you need to know about Regular Expressions

My go-to cheat sheet. What are Regular Expressions and why should you know about them? Regular expressions are a generalized way to match patterns with sequences of characters. They define a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. “find and replace”-like operations. Let’s say you …
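A short sketch of both uses mentioned above, searching and "find and replace", with Python's standard `re` module. The sample text and both patterns are made up for illustration; the e-mail pattern in particular is deliberately simple and not a complete validator.

```python
import re

text = "Contact alice@example.com or bob@example.org by 2019-12-01."

# Search: find every substring that looks like an e-mail address.
# Simplified pattern for illustration, not RFC-compliant.
email_pattern = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
emails = email_pattern.findall(text)

# Find and replace: rewrite ISO dates (YYYY-MM-DD) as DD/MM/YYYY
# by capturing the three parts and reordering them in the replacement.
reformatted = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)
```

Here `emails` holds the two addresses and `reformatted` ends with `01/12/2019.`; the parentheses in the second pattern are capture groups, which the replacement string refers to as `\1`, `\2`, and `\3`.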

GANs from Zero to Hero: Best Resources for Newcomers

“Generative Adversarial Networks is the most interesting idea in the last 10 years in Machine Learning.” – Yann LeCun, Director of AI Research at Facebook AI. There’s no doubt that GANs are very exciting now: they are the first generative algorithms to give convincingly good results, and these results will probably only get better with time. …

What’s the deal with Accuracy, Precision, Recall and F1?

It often pops up on lists of common interview questions for data science positions: explain the difference between precision and recall, explain what an F1 score is, and say how important accuracy is to a classification model. It’s easy to get confused and mix these terms up with one another so I thought it’d be a good …
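For reference, the four metrics can be computed directly from the confusion-matrix counts. The sketch below uses the standard definitions; the function name and the toy labels are assumptions made for this example.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)

    accuracy = (tp + tn) / len(y_true)
    # Precision: of everything predicted positive, how much really was positive?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything really positive, how much did we find?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (hypothetical): 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

In this toy case there are 3 true positives, 1 false positive, 1 false negative and 5 true negatives, so accuracy is 0.8 while precision, recall and F1 are all 0.75.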

5 Data Science, AI and Machine Learning Podcasts to Listen to Now (updated)

I’m obsessed with podcasts. From a quick-hit refresher presented in a 15-minute format to hour-long deep dives into complex topics, this is my preferred medium to consume data science content. You see, I’m a runner. I spend hours on the weekend in Chicago training and working to build endurance for long-distance …

MachinaNova: A Personal News Recommendation Engine

MachinaNova is an application that finds news I’m interested in and presents it in a format I can read every day. It raises the question… How does it know what I want to read? It’s magic… A magical news recommendation system. Well, it’s actually software I trained to …

Stats for Baseball Fans: The Single Metric for Offense is OPS.

This summer I fell in love with baseball because I merged something I love (statistics) with something that only mildly intrigued me (the Chicago Cubs & Wrigley Field). Now the story of how I found the Cubs can be its own blog post: it’s filled with romance, a move from the South, the Atlanta …

Relentless pursuit of 1%: The roles of an analytics department.

When I talk to marketers who are looking to buy my team’s services, or to build out their own analytics group, I frequently tell them that a proper analytics team is all about installing intelligence in their marketing organization. It’s not meant for every marketing department: some are small and may …

Digital Marketing Made Easy

Search Engine Optimization (SEO) is the practice of optimizing for organic search engine results to enhance both the quality and the quantity of traffic to our website. Marketing Objective & KPI: The marketing objective is to build awareness and interest to get at least 100 visitors to the landing page within 7 days. The primary KPI for …

Using Docker & Kubernetes to Host Machine Learning Models

Chapter 4 excerpt of “Data Science in Production”. Docker is a great tool for deploying ML models in the cloud. If you want to set up a production-grade deployment in the cloud, there are a number of options across AWS and GCP. In chapter 4 of my book in progress, I focus on ECS for …

This Is How Reinforcement Learning Works

(and what will make you build your first AI) In late 2017 Google introduced AlphaZero, an AI system that taught itself from scratch how to master the games of chess, Go and shogi in four hours. The short training time was more than enough for AlphaZero to beat world champion chess programs. …

Using Pandas’ “transform” and “apply” to deal with missing data on a group level

Frequently, when dealing with missing data, the sequencing does not matter, and thus the values used to replace missing values can be based on the entirety of available data. In such cases, you would typically replace the missing values with your best guess (i.e., the mean or median of the available …