Pruned Cross Validation for hyperparameter optimization

Speed benchmarking Photo by chuttersnap on Unsplash The main advantage of pruned cross-validation is faster search. If a hyperparameter set yields poor results, its cross-validation is pruned, saving time and computation resources. Below you can find a comparison between standard grid search and pruned grid search: Search speed benchmarking Grid … Read more
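The pruning idea in this excerpt can be sketched roughly as follows. This is a minimal illustration, not the article's implementation: the function name `pruned_cross_validation`, the `tolerance` and `min_folds` parameters, and the "running mean falls below the best score so far" rule are all assumptions, and scores are assumed to be higher-is-better.

```python
from statistics import mean
from typing import Callable, List, Optional, Tuple


def pruned_cross_validation(
    score_fold: Callable[[int], float],
    n_folds: int,
    best_score: Optional[float],
    tolerance: float = 0.05,
    min_folds: int = 2,
) -> Tuple[float, int, bool]:
    """Score folds one at a time and stop early on clearly poor results.

    score_fold(i) is a hypothetical callback that trains and scores the model
    on fold i. Returns (mean_score, folds_evaluated, was_pruned).
    """
    scores: List[float] = []
    for fold in range(n_folds):
        scores.append(score_fold(fold))
        enough_folds = min_folds <= len(scores) < n_folds
        # Assumed pruning rule: give up once the running mean trails the
        # best score seen so far by more than `tolerance`.
        if best_score is not None and enough_folds:
            if mean(scores) < best_score - tolerance:
                return mean(scores), len(scores), True
    return mean(scores), len(scores), False


# Toy usage: a hyperparameter set that always scores 0.5 is pruned early
# when the best set so far scored 0.9.
result = pruned_cross_validation(lambda fold: 0.5, n_folds=5, best_score=0.9)
print(result)
```

In a grid search, `best_score` would be updated after every hyperparameter set that completes all folds, so later poor candidates are cut off sooner.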

Society Desperately Needs An Alternative Web

Is it too late to steer a path towards an internet intended to free information, preserve our privacy and be accountable to the needs of humanity? Deposit Photos: Leave Me Alone I see a society that is crumbling. The rampant technology is simultaneously capsizing industries that were previously the bread and butter of economic growth. The … Read more

A step-by-step guide for creating advanced Python data visualizations with Seaborn / Matplotlib

Although there are tons of great visualization tools in Python, Matplotlib + Seaborn still stands out for its ability to create and customize all sorts of plots. Photo by Jack Anstey on Unsplash In this article, I will go through a few sections first to prepare background knowledge for some readers who are new to Matplotlib: Understand the … Read more

Data Clustering Using Hamiltonian Dynamics

A brief introduction to a flexible clustering algorithm for data on flat and curved surfaces. Imagine you finally land a data science job, and it entails manually checking, labelling and classifying every new datum added to the dataset. Such a job would be dull and tedious! Furthermore, with the volume of data being … Read more

The data is in: Ethiopia has the best coffee

Building each country’s coffee profile The dataset records each coffee sample’s country of origin, and that allows me to aggregate the grades for each country and build that country’s coffee profile. Figure 1 shows the results for a few countries and one thing that becomes pretty apparent is that there is actually not much variation between … Read more

“GANs” vs “ODEs”: the end of mathematical modeling?

Disentangling neural network representations [source] Hi everyone! In this article, I would like to make a connection between classical mathematical modeling, which we study in school and college, and machine learning, which also models objects and processes around us, but in a totally different manner. While mathematicians create models based on their expertise and understanding of the … Read more

Statistical Learning and Knowledge Engineering All the Way Down

A path to combining machine learning and knowledge bases Photo by Joao Tzanno on Unsplash Doug Lenat, CEO of Cycorp, Inc. and AAAI Fellow, gave an interesting keynote talk at the AAAI Spring Symposium at Stanford University during the AAAI-Make session. Current trends in society contrast with the common perception that people are becoming more and … Read more

Constructivist Machine Learning

A vision towards bringing machine learning closer to humans Photo by The Roaming Platypus on Unsplash Is there a way to re-interpret machine learning in a constructivist way? And more importantly, why should we do it? The answers to both questions are quite straightforward. Yes, we can do it, and the motivation for that may address one … Read more

Feature Selection and Dimensionality Reduction

Remove features with missing values Checking for missing values is a good first step in any machine learning problem. We can then remove columns exceeding a threshold we define. Checking for missing values with train.isnull().any().any() returns False. Unfortunately for our dimensionality reduction efforts, this dataset has zero missing values. Remove features with low variance In sklearn’s feature selection module we … Read more
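The two filters named in this excerpt (dropping columns with too many missing values, then dropping low-variance columns) can be sketched without any libraries. This is only an illustration under stated assumptions: the helper names, the thresholds, and the dict-of-lists data layout are hypothetical, and the article itself works with a pandas DataFrame and sklearn rather than this code.

```python
from statistics import pvariance
from typing import Dict, List, Optional

Column = List[Optional[float]]  # None marks a missing value


def drop_sparse_columns(data: Dict[str, Column], max_missing: float) -> Dict[str, Column]:
    """Keep columns whose fraction of missing values is at most max_missing."""
    return {
        name: col
        for name, col in data.items()
        if sum(v is None for v in col) / len(col) <= max_missing
    }


def drop_low_variance_columns(data: Dict[str, Column], min_variance: float) -> Dict[str, Column]:
    """Keep columns whose population variance (over present values) exceeds min_variance."""
    kept: Dict[str, Column] = {}
    for name, col in data.items():
        present = [v for v in col if v is not None]
        if present and pvariance(present) > min_variance:
            kept[name] = col
    return kept


# Hypothetical toy data: "b" is mostly missing, "c" is constant.
train = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [None, None, None, 1.0],
    "c": [5.0, 5.0, 5.0, 5.0],
}
dense = drop_sparse_columns(train, max_missing=0.5)
informative = drop_low_variance_columns(dense, min_variance=0.0)
print(sorted(informative))
```

A constant column carries no information for a model, which is why a variance threshold of zero is a common starting point.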

Data science productionization: trust

I devote most of the posts in this series to the more technological aspects of productionization, although even those aspects are heavily dependent upon some very human processes. But let’s say our code is all packaged, containerized, and version-controlled; that our workflows have all been automated; that all of the processes have technical and non-technical … Read more

Neural Style Transfer Series : Part 2

TensorFlow and PyTorch Implementation of Neural Style Transfer This article follows from what we discussed in the first article. While we spoke about the intuition and the theory of how Neural Style Transfer works, we will now move onto implementing the original paper. If this is the first article you’ve read of this series, I would … Read more

Human Pose Estimation : Simplified

Take a peek into the world of Human Pose Estimation What is Human Pose Estimation anyway? Human pose estimation is an important problem in the field of Computer Vision. Imagine being able to track a person’s every small movement and do a bio-mechanical analysis in real time. The technology will have huge implications. Applications may … Read more

Automation, Risk and Robust Artificial Intelligence

An interview with Professor Thomas Dietterich on the need for high reliability in socio-technical systems involving AI. Photo by Laurent Perren on Unsplash The ways in which artificial intelligence (AI) is woven into our everyday lives can hardly be overstated. Powerful deep machine-learning algorithms increasingly predict what movies we want to watch, which ads we’ll respond … Read more

Tinkering with Tensors and Other Great Adventures

Motivations Why implement a research paper? And why NLP? Let’s start with the latter. Assuming you want to create beneficial AI for the sustained good of humanity (and I mean, who wouldn’t?), you’d necessarily have to create a system capable of advanced reasoning, and which could preferably explain the contents of its consciousness to a … Read more

Data science effectiveness as a UX problem

We data scientists spend so much of our effort helping you understand your users that… you forget that we are users too. Data scientists are users too. There are many instances where it feels like someone attempted to make a data science tool for data scientists without ever having met a live one. If you take … Read more

Data Science Lunacy

Author’s note: I generally don’t write in first-person, but given this rant, it seemed apt. These opinions are wholly my own and should be evaluated as highly dubious. Howling at the moon. Waning Trust I have a hard time taking LinkedIn seriously. If your feed is anything like mine, half of it is blatant self-promotion, and … Read more

How to dominate MLS Fantasy

Hello old friend Note to all of my non-US readers, please forgive me for referring to this sport as “soccer” and not “football”. Enjoy! Growing up in Kansas City, I cherish many fond memories of attending KC Wizards (now Sporting Kansas City) games with my family. Being a soccer player myself (not a very good one, … Read more

Market Basket Analysis with recommenderlab

My take on Market Basket Analysis — Part 2 of 3 Photo by Victoriano Izquierdo on Unsplash Overview Recently I wanted to learn something new and challenged myself to carry out an end-to-end Market Basket Analysis. To continue to challenge myself, I’ve decided to put the results of my efforts before the eyes of the data science community. This … Read more

Improving PewDiePie’s camera quality with Autoencoders

Let’s take a look at how we can use Deep Learning for Image Super-Resolution with Autoencoders. Comparison of the 480p input (left) to an Autoencoder trained for the task of image super-resolution, with its higher quality output at the same resolution (right). Recently, I have been reading about various image super-resolution techniques that utilize … Read more

Computer Vision for Vaporwave Art

Photo by Sean Foley on Unsplash Using modern tech for artistic pursuits Reading through publications and implementing practical algorithms is exciting, of course. I enjoy finding clever ways to segment images, detect people being jerks, or even help a robot get through a farm. Yet, there are weeks in which I don’t want to use computer vision … Read more

Google Knows What You Are Saying With Only 80 MBs

So what does this have to do with data science? And how are cats involved? Statistical models and machine learning! HMMs, RNN-Ts, CTC, DNNs, LSTMs, CNNs, a brief history and fun with letters! Traditionally, voice dictation has used Hidden Markov Models as a basis to predict output. A hidden Markov model is a statistical model … Read more

Comparing Text Summarization Techniques

Text Summarization is an increasingly popular topic within NLP and, with the recent advancements in modern deep learning, we are consistently seeing newer, more novel approaches. The goal of this article is to compare the results of a few approaches that I experimented with: Sentence Scoring based on Word Frequency TextRank using Universal Sentence Encoder … Read more

Data science productionization: scale

Let’s look at the word-normalizing code (from my previous two posts) one more time. The code we wrote previously works fine for a single word. It would even work fine for a few thousand words. But if we need to normalize millions or billions of words, it will take more time than we probably want … Read more

Finding the right model parameters

If you’ve been reading about Data Science and/or Machine Learning, you must have come across articles and projects that work with the MNIST dataset. The dataset includes a set of 70,000 images where each image is a handwritten digit from 0 to 9. I also decided to use the same dataset to understand how fine-tuning … Read more

Why Norms Matters — Machine Learning

Play with norms: https://www.desmos.com/calculator/wznruz7mxs Evaluation is a crucial step in all modeling and machine learning problems. Since we are often making predictions on entire datasets, providing a single number that summarizes the performance of our model is both simple and effective. There are a number of situations where we need to compress information about a … Read more

The Single Course That Fast-Tracked My Data Science Learning Journey

That boosted my understanding, skills and confidence tremendously when I first started out in data science Before digging into the awesome resources I know and trust, an important disclosure: Some of the links below are affiliate links, which means that if you choose to make a purchase, I will earn a commission. This commission comes at … Read more

The Actual Difference Between Statistics and Machine Learning

Statistical Models vs Machine learning — Linear Regression Example It seems to me that the similarity of methods that are used in statistical modeling and in machine learning has caused people to assume that they are the same thing. This is understandable, but simply not true. The most obvious example is the case of linear regression, which … Read more

De-Googling Bach: Counterpointing Bach’s Rules of the Road With American Populist Music

Counterpointing Bach’s Rules of the Road With American Populist Music “Thinking Outside the Bachs” by Max Harper Ellert I had fun this week playing with the Google Doodle to create Bach harmonies from simple melodies. Looking through some of the articles about the process of melding A.I. and the principles of counterpoint was interesting too, like this … Read more

Corners in Images and Angular Representation of Their Relationships

Corner detection has long been an important subject in image processing because it helps us find the unique features in images. There are several methods for detecting corners in images. The most famous one, I assume, is Harris Corner Detection. After I read about it in the OpenCV documentation, it gave … Read more

Understanding Negative Log Loss

While learning fast.ai, I decided to test out the “3 lines of code” on some dataset other than the ones used in the course. The wiki page of fast.ai has some recommendations and I decided to try out as many as possible. The first recommended dataset under the easy category was Dogs vs. Cats Redux: … Read more

On Retractions in Biomedical Literature

The fierce competition in academia and the rush to publish often lead to flawed results and conclusions in scientific publications. While some of these are honest mistakes, others are deliberate scientific misconduct. According to one study, 76% of retractions were due to scientific misconduct in papers retracted from a specific journal¹. Another study from … Read more

Will Scientific Research be able to avoid Artificial Intelligence pitfalls?

It’s now obvious that AI, Machine Learning and Deep Learning are no longer just buzzwords, as they are increasingly present in every industry. Although the trend was overhyped in 2017, we are now certain that these technologies will be ubiquitous by 2020. Scientific research has not been left behind and AI has been … Read more

Don’t let them GO!

Using machine learning to detect customer churn. We have an example of a virtual company called ‘Sparkify’ that offers paid and free listening services. Customers can switch between the two services and can cancel their subscription at any time. The given customer dataset is huge (12GB), thus the standard tools for analysis and machine learning … Read more

Exploring FIFA

SalRite, Mar 24. ‘The thing about football - the important thing about football - is that it is not just about football.’ ~Sir Terry Pratchett. Soccer, or Association Football, is not just a game; it’s an emotion for many. People follow their favorite clubs no less devotedly than their religion! Great players are celebrated all over the world. But not … Read more

The Deployment Pain

Possible Causes of Deployment Anxiety This article was co-authored with Patrick Slavenburg The data science cycle in magnets: data access, data processing, model training, and deployment In October 2017, I was running the KNIME booth at the ODSC London conference. At the booth, we had the usual conference material to distribute: informative papers, various gadgets, … Read more

Natural Language Processing with Spacy in Node.js

Show Me some Examples Extract Dates Say you want to extract all of the dates from this text: The United States increased diplomatic, military, and economic pressures on the Soviet Union, at a time when the communist state was already suffering from economic stagnation. On 12 June 1982, a million protesters gathered in Central Park, New … Read more

Something You don’t know about data File if you just a Starter in Data Science, Import data File…

To be a master in data science, you have to understand how to manage your data and import it from the web, because approximately 90% of real-world data comes straight from the internet. Data Engineer Life (Source: Agula) If you are new to the Data Science field, then you must be working hard to learn … Read more

DeViSE Zero-shot learning

Let’s take a closer look at the class probabilities an image classifier returns: With a softmax output layer, each picture can belong to only one single category as softmax is designed to assign a high probability to one single class. This means that you should not introduce an additional category “dog” because the network would … Read more
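To make the softmax point in this excerpt concrete, here is a small library-free sketch. The logit values are made up for illustration; the only claim it demonstrates is that softmax outputs always sum to one, so the classes compete for a fixed amount of probability.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# One clearly dominant class: most of the probability goes to index 0.
confident = softmax([4.0, 1.0, 0.0])

# Two classes with equal scores (hypothetical overlapping categories):
# they have to share the probability, so neither can exceed 0.5.
shared = softmax([4.0, 4.0, 0.0])

print(confident)
print(shared)
```

Because the outputs are normalized together, two categories that both fit an image cannot both receive a high probability.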

The complete beginner’s guide to machine learning: simple linear regression in four lines of code!

Even you can build a machine learning model. Seriously! Good data alone doesn’t always tell the whole story. Are you trying to figure out what someone’s salary should be based on their years of experience? Do you need to examine how much you’re spending on advertising in relation to your yearly sales? Linear regression might … Read more
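As a rough illustration of what simple linear regression computes, the sketch below fits a line with the closed-form least-squares formulas. It is not the article's four lines (which are not shown in this excerpt); the function name `fit_line` and the experience/salary numbers are hypothetical.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: years of experience vs. salary (in thousands).
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [42.0, 47.0, 55.0, 58.0, 66.0]

slope, intercept = fit_line(years, salary)
predicted = slope * 6.0 + intercept  # estimate for 6 years of experience
print(slope, intercept, predicted)
```

The slope is the estimated change in salary per additional year of experience, and the intercept is the fitted value at zero years.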

Which Data Science Bootcamp is right for you?

Photo by NESA by Makers on Unsplash If you’re thinking about attending a data science bootcamp but have zero data science experience yourself, you’ll probably not be able to sort the good from the bad. You won’t know which ones focus on the right things, the unnecessary things, the weird edge-case things. And most importantly, you … Read more

Learning Theory: (Agnostic) Probably Approximately Correct Learning

In my previous article, I discussed what Empirical Risk Minimization is and the proof that it yields a satisfactory hypothesis under certain assumptions. Now I want to discuss Probably Approximately Correct Learning (which is quite a mouthful but kinda cool), which is a generalization of ERM. For those who are not familiar with ERM, I … Read more

Everybody has a right to know what’s happening with the planet: towards a global commons

The importance of knowing our environmental history How can we judge today if we don’t know what happened yesterday? For anyone to be able to understand ecosystem services and the value they represent to the environment, they must first have insight into past environmental conditions. Some selected point in the past (often referred to in … Read more