Are Data Engineers Important? Greetings, my fellow readers. I hope this article finds you in a good mood, because I’m about to piss some people off. I’ve been writing a lot of tutorial/guide-based articles in the past few weeks. I want to switch it up a little. We won’t be touching any code this … Read more Why are Data Engineers Equally as Important as Data Scientists
Art curation has been heavily biased towards supporting male representation in the most elite art institutions. In 1985, a group of anonymous American female artists, the Guerrilla Girls, plastered New York City with 30 different posters. In fact, the group came about when the Museum of Modern Art (MoMA) held an exhibition where less than 10% … Read more Where Are All The Women in Modern Art?
By Salim Roukos, IBM Fellow Finding information in a company’s vast trove of documents and knowledge bases to answer users’ questions is never as easy as it should be. The answers may very well exist, but they often remain out of reach for a number of reasons. For starters, unlike the Web, where information is … Read more Advancing Natural Language Processing (NLP) for Enterprise Domains
A guide on how to get theoretically sound explanations from complex deep learning models trained on multivariate time series In this article, we’ll explore a state-of-the-art method of machine learning interpretability and adapt it to multivariate time series data, a use case which it wasn’t previously prepared to work on. You’ll find explanations to core … Read more Interpreting recurrent neural networks on multivariate time series
During a recent NLP project, I came across an article where word clouds were created in the shape of US Presidents using words from their inauguration speeches. Whilst I had used word clouds to visualise the most frequent words in a document, I’d not considered using this with a mask to represent the topic or … Read more Creating word clouds with python
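The sizing in a word cloud is driven by simple word frequencies. A minimal sketch of that counting step in plain Python (rendering and masks would come from the `wordcloud` package; the speech snippet and stopword list below are invented for illustration):

```python
from collections import Counter
import re

def word_frequencies(text, stopwords=None):
    """Tokenize text and count word frequencies, the input a word cloud is built from."""
    stopwords = stopwords or set()
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)

speech = "ask not what your country can do for you ask what you can do for your country"
freqs = word_frequencies(speech, stopwords={"not", "for", "can", "do", "what", "your"})
print(freqs.most_common(2))  # the most frequent words get the largest font in the cloud
```

Feeding these counts into `WordCloud.generate_from_frequencies` with a mask image is what produces the shaped clouds the article describes.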
This tutorial goes into extreme detail about how decision trees work. Decision trees are a popular supervised learning method for a variety of reasons. Benefits of decision trees include that they can be used for both regression and classification, they are easy to interpret and they don’t require feature scaling. They have several flaws including … Read more Understanding Decision Trees for Classification (Python)
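The core mechanic behind decision trees, choosing splits that reduce impurity, can be sketched in a few lines (a toy illustration, not the tutorial's code):

```python
def gini(labels):
    """Gini impurity: the chance of mislabeling a randomly drawn sample."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Find the split threshold minimizing the weighted Gini impurity of the children."""
    best = (None, float("inf"))
    n = len(labels)
    for t in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best

values = [1, 2, 3, 10, 11, 12]
labels = ["a", "a", "a", "b", "b", "b"]
print(best_threshold(values, labels))  # a clean split at 3 gives impurity 0.0
```

A real tree repeats this search over every feature at every node, which is also why no feature scaling is needed: only the ordering of values matters.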
Last month I took part in a debate on the ‘future of technology in retail and whether technology could replace humans….’ Subsequently, I’ve pulled together some thoughts for Inside Retail (also published here). For the past decade, advancements in technology have been disrupting what seems like most facets of the human experience. And with what … Read more Humans vs. machines. What does it mean in retail?
This post is a direct continuation of my previous post about Text Preprocessing. It is a practical implementation of some important text preprocessing steps applied before text is fed to a machine learning model. Instead of using conventional preprocessing and learning methods through coding scripts, I’ve used a tool called Knime. The dataset is … Read more Sentiment Analysis on raw text using Amazon, IMDB, and Yelp!
The Subjective interpretation takes a different approach from the previous two. Instead of concerning itself with frequencies or counts, the Subjective approach posits that probabilities stem from a person’s personal (subjective) degree of belief that a particular event will occur, based on all relevant information available to them. This perspective aligns well with Bayesian statistics, which … Read more What is Probability?
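The subjective, Bayesian reading of probability can be made concrete with a tiny belief-updating example (the rain numbers below are invented for illustration):

```python
def update_belief(prior, likelihood, likelihood_if_not):
    """Bayes' rule: revise a subjective degree of belief after seeing evidence."""
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence

# Subjective prior: a 30% degree of belief that it will rain tomorrow.
# Evidence: the forecast says rain. Suppose the forecast says rain 80% of
# the time when it actually rains, and 20% of the time when it does not.
posterior = update_belief(prior=0.3, likelihood=0.8, likelihood_if_not=0.2)
print(round(posterior, 3))  # the belief rises, but stays well short of certainty
```

The prior here is exactly the personal degree of belief the Subjective interpretation describes; the data only revises it.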
In the first two parts of our overview of the How of XAI, we looked into pre-modelling explainability and explainable modelling methodologies, which focus on explainability at the dataset stage and during model development. Yet these are relatively minor areas of interest compared with explainability after the fact, and post-modelling explainability is where the majority … Read more The How of Explainable AI: Post-modelling Explainability
In the first part of our overview of the How of Explainable AI, we looked at pre-modelling explainability. However, the true scope of explainability is much broader. Explainability can be considered at all stages of AI development, namely, pre-modelling, model development, and post-modelling. The majority of AI explainability literature aims at explaining a black-box model … Read more The How of Explainable AI: Explainable Modelling
AI explainability is a broad and multi-disciplinary domain, being studied in several fields including machine learning, knowledge representation and reasoning, human-computer interaction, and the social sciences. Accordingly, XAI literature includes a large and growing number of methodologies. There are many factors that could contribute to how an AI model operates and makes its predictions, and … Read more The How of Explainable AI: Pre-modelling Explainability
You can’t always change a human’s input to see the output. At Fiddler Labs, we place great emphasis on model explanations being faithful to the model’s behavior. Ideally, feature importance explanations should surface and appropriately quantify all and only those factors that are causally responsible for the prediction. This is especially important if we want … Read more Causality in model explanations and in the real world
Understand principal component analysis (PCA) and clustering methods, and implement each algorithm in two mini projects Unsupervised learning is a set of statistical tools for scenarios in which there is only a set of features and no targets. Therefore, we cannot make predictions, since there are no associated responses to each observation. Instead, we are … Read more The Complete Guide to Unsupervised Learning
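As a taste of the PCA half of the guide, here is a closed-form sketch for 2-D data (a toy illustration with invented points; real projects would use scikit-learn's `PCA`):

```python
import math

def pca_2d(points):
    """First principal component of 2-D data via its 2x2 covariance matrix."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Angle of the leading eigenvector of [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)

# Points lying (noisily) along y = x: the first component should be ~45 degrees.
pts = [(0, 0), (1, 1.1), (2, 1.9), (3, 3.05), (4, 4.0)]
vx, vy = pca_2d(pts)
print(round(vx, 2), round(vy, 2))
```

Projecting onto this direction keeps most of the variance in one coordinate, which is PCA's whole point; higher dimensions replace the closed-form angle with a full eigendecomposition.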
In a previous article introducing Recommendation Systems, we mentioned several times the concept of ‘similarity measures’. Why? Because in Recommendation Systems, both Content-Based filtering and Collaborative filtering algorithms use a specific similarity measure to quantify how alike two user or item vectors are. So in the end, a similarity measure is … Read more Similarity measures in Recommendation Systems
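One of the most common such measures is cosine similarity, sketched here in plain Python (the user names and ratings are invented for illustration):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two rating vectors: 1.0 means identical direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical user ratings over the same three items.
alice = [5, 3, 4]
bob = [10, 6, 8]   # same taste as alice, just a different rating scale
carol = [1, 5, 2]

print(round(cosine_similarity(alice, bob), 3))   # direction-identical vectors score 1.0
print(round(cosine_similarity(alice, carol), 3))
```

Because it compares direction rather than magnitude, cosine similarity treats alice and bob as identical tastes, which is usually what a recommender wants.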
Interpreting tweets containing the hashtag #RickyRenunció using spaCy, Google Cloud, and NLP. The island of Puerto Rico and its people are currently making history. On July 13, Puerto Rico’s Center for Investigative Journalism published a document consisting of 889 pages of Telegram messages exchanged between the governor, Ricardo Rosselló, and inner members of his … Read more What did Puerto Rico say after its governor resigned? A Twitter data analysis
A central problem in machine learning is to learn a complicated probability distribution p(x) with only a limited set of high-dimensional data points x drawn from this distribution. For example, to learn the probability distribution over images of cats we need to define a distribution which can model complex correlations between all pixels which form … Read more Deep Latent Variable Models: Unravel Hidden Structures
Background In May 2013, an extremely viral mobile game called Flappy Bird was released. The premise of the game was extremely simple — a bird was flapping its wings to fly through a platform surrounded by pipes with gaps. Touch any of the pipes and you die; flap through the gaps between the pipes, and you … Read more Flappy Royale— Improving Results via 1000 Rounds of Practice
Theory, Implementation, and Visualization Support Vector Machine (SVM) is probably one of the most popular ML algorithms used by data scientists. SVM is powerful, easy to explain, and generalizes well in many cases. In this article, I’ll explain the rationales behind SVM and show the implementation in Python. For simplicity, I’ll focus on binary classification … Read more Support Vector Machine Explained
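A bare-bones sketch of the idea: minimizing the hinge loss with a per-sample subgradient step yields a linear SVM (a toy illustration with invented data, not the article's implementation):

```python
def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=200):
    """Linear SVM via subgradient descent on the hinge loss plus an L2 penalty."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin: the hinge loss is active
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:           # correctly classified with margin: only shrink w
                w = [wj - lr * lam * wj for wj in w]
    return w, b

# Linearly separable toy data, labels in {-1, +1}.
X = [[1, 1], [2, 2], [-1, -1], [-2, -1]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
preds = [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else -1 for xi in X]
print(preds)
```

The margin test `< 1` is what makes this a maximum-margin method rather than a plain perceptron: points that are correct but too close to the boundary still push the weights.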
Exploring methods for cluster analysis, visualizing clusters through dimensionality reduction, and interpreting clusters by exploring impactful features. Although we have seen a large influx of supervised machine learning techniques being used in organizations, these methods typically suffer from one large issue: a need for labeled data. Fortunately, many unsupervised methods exist for clustering … Read more Cluster Analysis: Create, Visualize and Interpret Customer Segments
Trolls and bots have a huge and often unrecognized influence on social media. They are used to influence conversations for commercial or political reasons. They allow small hidden groups of people to promote information supporting their agenda at a large scale. They can push their content to the top of people’s news feeds, search results, … Read more Trolls and bots are disrupting social media — here’s how AI can stop them (Part 1)
Trolls and bots are widespread across social media, and they influence us in ways we are not always aware of. Trolls can be relatively harmless, just trying to entertain themselves at others’ expense, but they can also be political actors sowing mistrust or discord. While some bots offer helpful information, others can be used to … Read more Identifying trolls and bots on Reddit with machine learning (Part 2)
Wanna be a singer? AI will compose you an album. In the 2004 movie I, Robot, Will Smith asks a robot the rhetorical questions: – Can a robot write a symphony? Can a robot turn a… canvas into a beautiful masterpiece? – CAN YOU? And really, how many of us can succeed in composing something like … Read more Artificial Intelligence and Music: What to Expect?
Optimize your Deep Learning model memory consumption with IBM Large Model Support. Memory management is now a really important topic in Machine Learning. Because of memory constraints, it is becoming quite common to train Deep Learning models using cloud tools such as Kaggle and Google Colab thanks to their free NVIDIA Graphical Processing Unit (GPU) … Read more Deep Learning Analysis Using Large Model Support
We already learned that in the case of Local Differential Privacy we add noise to the input data points. So each individual can add noise to their own data, thus eliminating the need to trust the data owner or curator at all. We can say that in this setting an individual’s privacy is most protected. Global … Read more Probabilistic tools for Privacy in Data Analysis
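A classic local-noise mechanism of this kind is randomized response: each user perturbs their own answer before sharing it, and the aggregate rate is still recoverable (the probabilities and counts below are chosen for illustration):

```python
import random

def randomized_response(truth, p_truth=0.75):
    """Each user flips their own bit with probability 1 - p_truth before sharing it."""
    return truth if random.random() < p_truth else (not truth)

def estimate_true_rate(responses, p_truth=0.75):
    """Invert the noise: E[observed] = p_truth * rate + (1 - p_truth) * (1 - rate)."""
    observed = sum(responses) / len(responses)
    return (observed - (1 - p_truth)) / (2 * p_truth - 1)

random.seed(0)
true_answers = [True] * 300 + [False] * 700   # the real rate is 30%
noisy = [randomized_response(a) for a in true_answers]
print(round(estimate_true_rate(noisy), 2))    # close to 0.3, yet no single answer can be trusted
```

The curator only ever sees the noisy bits, which is exactly the "no trusted data owner" property the excerpt describes.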
According to Gartner, companies struggle to operationalize machine learning models: “The Gartner Data Science Team Survey of January 2018 found that over 60% of models developed with the intention of operationalizing them were never actually operationalized.” Based on our experience working together with various clients, we believe that this inability to operationalize is partly due … Read more The machine learning lifecycle
ConvNet Playground is focused on the task of semantic image search using CNNs. Our (rather simple) approach is implemented in two stages (i.) we extract features from all images in our datasets using a pre-trained CNN (think VGG16, InceptionV3, etc., pre-trained on ImageNet) (ii.) we compute similarity as a measure of the distance between these … Read more ConvNet Playground: An Interactive Visualization Tool for Exploring Convolutional Neural Networks
TL;DR: By pruning, a VGG-16-based classifier is made 3x faster and 4x smaller; we cover different approaches to pruning. Deep Learning models these days require a significant amount of computing, memory, and power, which becomes a bottleneck in the conditions where we need real-time inference or to run models on edge devices and browsers with … Read more Pruning Deep Neural Networks
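The simplest common approach, magnitude pruning, zeroes the smallest weights; a minimal sketch (toy weights, not VGG-16):

```python
def prune_by_magnitude(weights, fraction):
    """Zero out the given fraction of weights with the smallest absolute value."""
    k = int(len(weights) * fraction)
    # The k-th smallest magnitude becomes the cutoff (ties at the cutoff are also pruned).
    threshold = sorted(abs(w) for w in weights)[k - 1] if k > 0 else -1.0
    return [0.0 if abs(w) <= threshold else w for w in weights]

weights = [0.01, -0.5, 0.003, 1.2, -0.02, 0.8, -0.004, 0.3]
pruned = prune_by_magnitude(weights, fraction=0.5)
print(pruned)          # the four smallest-magnitude weights become 0.0
print(pruned.count(0.0))
```

Speed and size gains then come from storing and computing with the surviving weights only (sparse formats, or removing whole channels in structured pruning).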
Use NVIDIA TensorRT to optimize and speed up inference time on GPU, illustrated on an AI-based computer vision task with YOLO. This article is organized as follows: Introduction; What is NVIDIA TensorRT?; Setting up the development environment using Docker; Computer vision application: object detection with the YOLOv3 model; References; Conclusion. This document presents how to use TensorRT to optimize a … Read more Have you Optimized your Deep Learning Model Before Deployment?
Most people aren’t trying to be biased, but bias is inherent — it influences how we view any situation, often unconsciously. When you think of bias, characteristics like race, gender, and religion likely come to mind. But there’s a much broader context of what bias can actually be. Bias comes in many forms. For example, … Read more AI Often Adds To Bias In Recruiting — But There’s A New Approach That Could Change The Game
When it comes to classification, a decision tree classifier is one of the easiest models to use: it is incredibly easy to interpret; it handles missing data and outliers very well, and as such requires far less up-front cleaning; and you get to forgo categorical variable encoding, as decision trees handle categoricals well! Without diving into … Read more Learn Classification with Decision Trees in R
by Monte Zweben & Syed Mahmood of Splice Machine Apache Hadoop emerged on the IT scene in 2006 with the promise to provide organizations with the capability to store an unprecedented volume of data using commodity hardware. This promise not only addressed the size of the data sets but also the type of data, such … Read more What Happened to Hadoop? What Should You Do Now?
AI is transforming how we do business at an unprecedented pace, but the transition to becoming AI-driven is easier than you think. Now is the time to invest and remain at the top of your game. A few weeks ago Artificial Intelligence was thrown into the spotlight as the winners of this year’s Turing award, … Read more How to become an AI-driven company
Learning programming by interpreting numbers that matter to you. Curiosity is a universal human trait. Every single person asks questions. Every single person has interests. Every single person wants to know more about the way the world works — not necessarily for any personal gain, but just to know a bit more about the world … Read more The Fastest Way to Learn to Code? Be Invested in Your Numbers
Over the last few years, most of my spare time has been spent tinkering, learning, and researching machine learning, specifically reinforcement learning and digital actors. Recently I decided to participate in the Obstacle Tower Challenge. To my surprise, my early efforts briefly topped the table, and I placed 2nd in the first round as my … Read more I Placed 4th in my First AI Competition. Takeaways from the Unity Obstacle Tower Competition
Deep learning is becoming the standard way of assessing credit risk and soon will surpass human decision-making. According to Wikipedia, a bank is a “financial institution that accepts deposits from the public and creates credit” which means that one of the two main responsibilities of a bank is to lend money to commercial and corporate … Read more The Future of Lending Money Is Deep Learning
The RIGHT JOIN keyword Just like you’d expect, the RIGHT JOIN is similar to the LEFT JOIN. This join returns all of the rows of the table on the right side of the join and matching rows for the table on the left side of the join. And, for any rows where there are no … Read more SQL JOIN
In this article I will be discussing data-preprocessing techniques, extending my previous series of Tensorflow-for-beginners posts. The first stage of constructing an AI or machine learning model is to preprocess the data to ensure proper representation by the model. This stage is the most critical part, which … Read more Model Tuning & Feature Engineering using XGBoost
Until now, we have some training code that outputs a tested, robust ML model that we now want to somehow persist (or productionalize) and possibly deploy as a service. Further, let’s suppose that the model’s test performance meets our expectations and we conclude the research phase. For this demo, I’ll adopt the training procedure … Read more From Research to Production: Containerized Training Jobs
In this story, IDW-CNN, by Sun Yat-sen University, The Chinese University of Hong Kong, and SenseTime Group (Limited), is briefly reviewed. Segmentation accuracy is increased by learning from the Image Descriptions in the Wild (IDW) dataset. Unlike previous image captioning datasets, where captions were manually and densely annotated, images and their descriptions in IDW are automatically downloaded … Read more Review: IDW-CNN — Learning from Image Descriptions in the Wild Dataset Boosts the Accuracy…
Normally, we face data sets that are fairly linear or can be manipulated into one. But what if the data set that we are examining really should be looked at in a nonlinear way? Step into the world of nonlinear feature engineering. First, we’ll look at examples of nonlinear data. Next, we’ll briefly discuss the … Read more Machine Learning Pipelines: Nonlinear Model Stacking
The vulnerability Machine learning based classifiers are prone to adversarial attacks. This means that visual machine learning classifiers that perceive a certain traffic sign (50 km/h, for instance) cannot correctly deal with all images that are correctly interpreted by a human being. It is possible to intentionally create traffic sign images that will be understood … Read more Fooling real cars with Deep Learning
Data visualization is not itself about insight, but rather about communicating insight. Quantitative insights drawn from churning huge amounts of data are often subtle, surprising, and technically complex. This makes it challenging to communicate them to any audience, especially a business audience who might … Read more Importance of data visualization to derive actionable insights
Image segmentation is an important step in image processing, and it appears everywhere we want to analyze what’s inside an image. For example, if we seek to find whether there is a chair or person inside an indoor image, we may need image segmentation to separate objects and analyze each object individually to check … Read more Introduction to Image Segmentation with K-Means clustering
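For grayscale intensities, the K-Means step reduces to one-dimensional clustering; a minimal sketch of Lloyd's algorithm (toy pixel values, not a real image):

```python
def kmeans_1d(values, k, iters=20):
    """Lloyd's algorithm on scalar pixel intensities."""
    # Spread the initial centers across the value range.
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # Update step: each center moves to its cluster's mean.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers

# Grayscale intensities of a toy "image": dark background and a bright object.
pixels = [10, 12, 9, 11, 200, 205, 198, 202, 13, 201]
centers = kmeans_1d(pixels, k=2)
labels = [min(range(2), key=lambda c: abs(p - centers[c])) for p in pixels]
print([round(c, 1) for c in centers])
print(labels)  # 0 = background pixel, 1 = object pixel
```

On a real image the same loop runs over RGB triples (Euclidean distance instead of `abs`), and the resulting labels carve the image into regions to analyze separately.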
Apache Spark is explained as a ‘fast and general engine for large-scale data processing.’ However, that doesn’t even begin to encapsulate the reason it has become such a prominent player in the big data space. Apache Spark is a distributed computing platform, and its adoption by big data companies has been on the rise at … Read more Getting Started with Apache Spark
One elementary concept in the actuarial profession, tested on the FM (Financial Mathematics) Exam, is evaluating bond prices by discounting bond coupons as well as the redemption value back to the date of issue of the bond. When buying a bond, the investor is essentially giving the government or company a loan, which the government … Read more Actuarial Science and Data Science with Lifelib
What architecture is this? 🤔 A compiled visualisation of the common convolutional neural networks (TL;DR — jump to the illustrations here) How have you been keeping up with the different convolutional neural networks (CNNs)? In recent years, we have witnessed the birth of numerous CNNs. These networks have been getting unforgivingly deeper, so much so that it … Read more Illustrated: 10 CNN Architectures
Now that we know about the different kinds of connections, it may seem counter-intuitive to say that weak ties are the most important. After all, wouldn’t the people who you have the strongest connection with be the most willing to help you? Real-world graphs In order to understand the power of weak ties, we must look … Read more The Power of Weak Ties
In this article I will introduce recent developments in this field. Artificial Intelligence has been attracting a lot of attention as it tries to replicate human intelligence for analyzing complex data around us. The two major subsets of AI, machine learning and deep learning, have created a lot of excitement in the research community for … Read more AI in Bioinformatics
PyTorch Lightning has all of this already coded for you, including tests to guarantee that there are no bugs in that part of the program. This means you can focus on the core of your research and not worry about all the tedious engineering … Read more Supercharge Your AI Research With Pytorch Lightning