By Salim Roukos, IBM Fellow Finding information in a company’s vast trove of documents and knowledge bases to answer users’ questions is never as easy as it should be. The answers may very well exist, but they often remain out of reach for a number of reasons. For starters, unlike the Web, where information is … Read more Advancing Natural Language Processing (NLP) for Enterprise Domains
A guide on how to get theoretically sound explanations from complex deep learning models trained on multivariate time series In this article, we’ll explore a state-of-the-art method of machine learning interpretability and adapt it to multivariate time series data, a use case which it wasn’t previously prepared to work on. You’ll find explanations to core … Read more Interpreting recurrent neural networks on multivariate time series
During a recent NLP project, I came across an article where word clouds were created in the shape of US Presidents using words from their inauguration speeches. Whilst I had used word clouds to visualise the most frequent words in a document, I’d not considered using this with a mask to represent the topic or … Read more Creating word clouds with python
This tutorial goes into extreme detail about how decision trees work. Decision trees are a popular supervised learning method for a variety of reasons. Benefits of decision trees include that they can be used for both regression and classification, they are easy to interpret and they don’t require feature scaling. They have several flaws including … Read more Understanding Decision Trees for Classification (Python)
Last month I took part in a debate on the ‘future of technology in retail and whether technology could replace humans….’ Subsequently, I’ve pulled together some thoughts for Inside Retail (also published here). For the past decade, advancements in technology have been disrupting what seems like most facets of the human experience. And with what … Read more Humans vs. machines. What does it mean in retail?
This post is a direct continuation of my previous post about Text Preprocessing. It is a practical implementation of some important text preprocessing steps applied to text before it is fed to a machine learning model. Instead of using conventional preprocessing and learning methods through coding scripts, I’ve used a tool called KNIME. The dataset is … Read more Sentiment Analysis on raw text using Amazon, IMDB, and Yelp!
The Subjective interpretation takes a different approach from the previous two. Instead of concerning itself with frequencies or counts, the Subjective approach posits that probabilities stem from a person’s personal (subjective) degree of belief that a particular event will occur, based on all relevant information available to them. This perspective aligns well with Bayesian statistics, which … Read more What is Probability?
In the first two parts of our overview of the How of XAI, we looked into pre-modelling explainability and explainable modelling methodologies, which focus on explainability at the dataset stage and during model development. Yet these are relatively minor areas of interest compared with explainability after the fact, and post-modelling explainability is where the majority … Read more The How of Explainable AI: Post-modelling Explainability
In the first part of our overview of the How of Explainable AI, we looked at pre-modelling explainability. However, the true scope of explainability is much broader. Explainability can be considered at all stages of AI development, namely, pre-modelling, model development, and post-modelling. The majority of AI explainability literature aims at explaining a black-box model … Read more The How of Explainable AI: Explainable Modelling
AI explainability is a broad and multi-disciplinary domain, being studied in several fields including machine learning, knowledge representation and reasoning, human-computer interaction, and the social sciences. Accordingly, XAI literature includes a large and growing number of methodologies. There are many factors that could contribute to how an AI model operates and makes its predictions, and … Read more The How of Explainable AI: Pre-modelling Explainability
The Pale Blue Dot “From this distant vantage point, the Earth might not seem of any particular interest. But for us, it’s different. Consider again that dot. That’s here, that’s home, that’s us. On it everyone you love, everyone you know, everyone you ever heard of, every human being who ever was, lived out their … Read more Getting started with Tensorflow, Keras in Python and R
You can’t always change a human’s input to see the output. At Fiddler Labs, we place great emphasis on model explanations being faithful to the model’s behavior. Ideally, feature importance explanations should surface and appropriately quantify all and only those factors that are causally responsible for the prediction. This is especially important if we want … Read more Causality in model explanations and in the real world
Understand principal component analysis (PCA) and clustering methods, and implement each algorithm in two mini projects Unsupervised learning is a set of statistical tools for scenarios in which there is only a set of features and no targets. Therefore, we cannot make predictions, since there are no associated responses to each observation. Instead, we are … Read more The Complete Guide to Unsupervised Learning
In a previous article introducing Recommendation Systems, we mentioned several times the concept of ‘similarity measures’. Why? Because in Recommendation Systems, both Content-Based filtering and Collaborative filtering algorithms use a specific similarity measure to determine how similar two user or item vectors are. So in the end, a similarity measure is … Read more Similarity measures in Recommendation Systems
Interpreting tweets containing the hashtag #RickyRenunció using spaCy, Google Cloud, and NLP. The island of Puerto Rico and its people are currently making history. On July 13, Puerto Rico’s Center for Investigative Journalism published a document consisting of 889 pages of Telegram messages exchanged between the governor, Ricardo Rosselló, and inner members of his … Read more What did Puerto Rico say after its governor resigned? A Twitter data analysis
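As a rough illustration of what a similarity measure does, here is a minimal sketch of cosine similarity, one commonly used measure. The teaser does not say which measures the article covers, so the choice of cosine similarity and the function name `cosine_similarity` are assumptions made for this example only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction; treat it as dissimilar to everything.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical rating vectors for two users over the same three items.
user_a = [5, 3, 0]
user_b = [4, 2, 1]
score = cosine_similarity(user_a, user_b)
```

A score close to 1 means the two vectors point in nearly the same direction, while a score of 0 means they share no common direction.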
A central problem in machine learning is to learn a complicated probability distribution p(x) with only a limited set of high-dimensional data points x drawn from this distribution. For example, to learn the probability distribution over images of cats we need to define a distribution which can model complex correlations between all pixels which form … Read more Deep Latent Variable Models: Unravel Hidden Structures
Background In May 2013, an extremely viral mobile game called Flappy Bird was released. The premise of the game was extremely simple — a bird was flapping its wings to fly through a platform surrounded by pipes with gaps. Touch any of the pipes you die, flap through the gaps between the pipes, and you … Read more Flappy Royale— Improving Results via 1000 Rounds of Practice
Theory, Implementation, and Visualization Support Vector Machine (SVM) is probably one of the most popular ML algorithms used by data scientists. SVM is powerful, easy to explain, and generalizes well in many cases. In this article, I’ll explain the rationales behind SVM and show the implementation in Python. For simplicity, I’ll focus on binary classification … Read more Support Vector Machine Explained
If you’re anything like me, you probably set a lot of goals. Whether it’s to finish a paper by the end of the summer or to spend more time with friends and family, goals are what help motivate us to do something. Goals are also intimately tied to our feelings. You may have had the … Read more Modeling Motivation and Emotion using Feedback Loops
In this post, we seek to develop an intuitive sense of what type I (false-positive) and type II (false-negative) errors represent when comparing metrics in A/B tests, in order to gain an appreciation for “peeking”, one of the major problems plaguing the analysis of A/B tests today. To better understand what “peeking” is, it helps … Read more Validating Type I and II Errors in A/B Tests in R
The mlr-org team is very proud to present the initial release of the mlr3 machine-learning framework for R. mlr3 comes with a clean object-oriented design using the R6 class system. With this, it overcomes the limitations of R’s S3 classes. It is a rewrite of the well-known mlr package which provides a convenient way of accessing many algorithms … Read more mlr3-0.1.0
The effect of covariates on correlations in psychometric networks is assessed with either model-based recursive partitioning (MOB) or conditional inference trees (CTree). Citation Jones PJ, Mair P, Simon T, Zeileis A (2019). “Network Model Trees”, OSF ha4cw, OSF Preprints. doi:10.31219/osf.io/ha4cw Abstract In many areas of psychology, correlation-based network approaches (i.e., psychometric networks) have become a … Read more Network model trees
Exploring methods for cluster analysis, visualizing clusters through dimensionality reduction and interpreting clusters through exploring impactful features. Although we have seen a large influx of supervised machine learning techniques being used in organizations, these methods typically suffer from one large issue: a need for labeled data. Fortunately, many unsupervised methods exist for clustering … Read more Cluster Analysis: Create, Visualize and Interpret Customer Segments
Trolls and bots have a huge and often unrecognized influence on social media. They are used to influence conversations for commercial or political reasons. They allow small hidden groups of people to promote information supporting their agenda at a large scale. They can push their content to the top of people’s news feeds, search results, … Read more Trolls and bots are disrupting social media — here’s how AI can stop them (Part 1)
Trolls and bots are widespread across social media, and they influence us in ways we are not always aware of. Trolls can be relatively harmless, just trying to entertain themselves at others’ expense, but they can also be political actors sowing mistrust or discord. While some bots offer helpful information, others can be used to … Read more Identifying trolls and bots on Reddit with machine learning (Part 2)
Wanna be a singer? AI will compose you an album. In the 2004 movie I, Robot, Will Smith asks a robot the rhetorical questions: – Can a robot write a symphony? Can a robot turn a… canvas into a beautiful masterpiece? – CAN YOU? And really, how many of us can succeed in composing something like … Read more Artificial Intelligence and Music: What to Expect?
Optimize your Deep Learning model memory consumption with IBM Large Model Support. Memory management is now a really important topic in Machine Learning. Because of memory constraints, it is becoming quite common to train Deep Learning models using cloud tools such as Kaggle and Google Colab thanks to their free NVIDIA Graphics Processing Unit (GPU) … Read more Deep Learning Analysis Using Large Model Support
We already learned that in the case of Local Differential Privacy we add noise to the input data points. So, each individual can add noise to their own data, thus eliminating the need to trust the data owner or curator at all. We can say that in this setting individuals’ privacy is most protected. Global … Read more Probabilistic tools for Privacy in Data Analysis
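To make the idea of individuals adding noise to their own answers concrete, here is a minimal sketch using randomized response, a classic coin-flip mechanism for local privacy. The teaser does not state which mechanism the article uses, so the mechanism, the function names, and the simulated population below are all assumptions for illustration.

```python
import random


def randomized_response(truth, rng):
    """Report the true yes/no answer with probability 1/2, otherwise a fair coin flip."""
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(reports):
    """Recover the population rate from noisy reports.

    Under this mechanism E[report] = 0.5 * p + 0.25, so p = 2 * (mean - 0.25).
    """
    observed = sum(reports) / len(reports)
    return 2 * (observed - 0.25)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
true_answers = [i < 3000 for i in range(10000)]  # hypothetical population, 30% "yes"
reports = [randomized_response(t, rng) for t in true_answers]
estimate = estimate_true_rate(reports)
```

No single report reveals an individual's true answer with certainty, yet the aggregate rate can still be estimated, which is why no trusted curator is needed in this setting.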
According to Gartner, companies struggle to operationalize machine learning models: “The Gartner Data Science Team Survey of January 2018 found that over 60% of models developed with the intention of operationalizing them were never actually operationalized.” Based on our experience working together with various clients, we believe that this inability to operationalize is partly due … Read more The machine learning lifecycle
ConvNet Playground is focused on the task of semantic image search using CNNs. Our (rather simple) approach is implemented in two stages: (i.) we extract features from all images in our datasets using a pre-trained CNN (think VGG16, InceptionV3, etc. pre-trained on ImageNet); (ii.) we compute similarity as a measure of the distance between these … Read more ConvNet Playground: An Interactive Visualization Tool for Exploring Convolutional Neural Networks
To leave a comment for the author, please follow the link and comment on their blog: Revolutions. … Read more Microsoft ML Server 9.4 now available
Different approaches of pruning. TL;DR: By pruning, a VGG-16 based classifier is made 3x faster and 4x smaller. Deep Learning models these days require a significant amount of computing, memory, and power which becomes a bottleneck in the conditions where we need real-time inference or to run models on edge devices and browsers with … Read more Pruning Deep Neural Networks
This is a reblog from the “Announcing Dash for R” announcement originally published July 10. Dash, the fastest growing framework for building analytic web applications on top of Python models, is now available for the R programming language. Dash was released in 2017 as the latest evolution in Plotly’s open-source analytics tools. At the time, Plotly was … Read more Dash has gone full R
Use NVIDIA TensorRT to optimize and speed up inference time on GPU. Illustration on an AI-based Computer Vision with YOLO. This article is organized as follows: Introduction; What is NVIDIA TensorRT?; Set up the Development Environment using Docker; Computer Vision Application: Object detection with YOLOv3 model; References; Conclusion. This document presents how to use TensorRT to optimize a … Read more Have you Optimized your Deep Learning Model Before Deployment?
Most people aren’t trying to be biased, but bias is inherent — it influences how we view any situation, often unconsciously. When you think of bias, characteristics like race, gender, and religion likely come to mind. But there’s a much broader context of what bias can actually be. Bias comes in many forms. For example, … Read more AI Often Adds To Bias In Recruiting — But There’s A New Approach That Could Change The Game
This article is also available in PDF form. A while back someone posted on Reddit about the grading policies of their academic department. Specifically, the department chair made a statement claiming that grades should be Normally distributed with a C average. I responded, claiming that no statistician would ever take the idea that grades follow … Read more Grades Aren’t Normal
I was recently asked if Win-Vector LLC would move the R wrapr package from a GPL-3 license to an LGPL license. In the end I decided to move wrapr distribution to a “GPL-2 | GPL-3” license. This means the package is now available under both GPL-2 and GPL-3 licensing, allowing the user to pick which … Read more Some Notes on GNU Licenses in R Packages
When it comes to classification, a decision tree classifier is one of the easiest models to use. It is incredibly easy to interpret. It handles missing data & outliers very well and as such requires far less up front cleaning. You get to forego the categorical variable encoding as decision trees handle categoricals well! Without diving into … Read more Learn Classification with Decision Trees in R
R fans, you have just one more day to get your hands on discounted EARL London 2019 tickets. Our early bird offer gets you £100 off the full price ticket, so it makes persuading your boss easier! Visit the EARL website for more details and see 2018’s highlights. … Read more EARLy bird ticket offer ends tomorrow!
by Monte Zweben & Syed Mahmood of Splice Machine Apache Hadoop emerged on the IT scene in 2006 with the promise to provide organizations with the capability to store an unprecedented volume of data using commodity hardware. This promise not only addressed the size of the data sets but also the type of data, such … Read more What Happened to Hadoop? What Should You Do Now?
AI is transforming how we do business at an unprecedented pace, but the transition to becoming AI-driven is easier than you think. Now is the time to invest and remain at the top of your game. A few weeks ago Artificial Intelligence was thrown into the spotlight as the winners of this year’s Turing award, … Read more How to become an AI-driven company
Learning programming by interpreting numbers that matter to you. Curiosity is a universal human trait. Every single person asks questions. Every single person has interests. Every single person wants to know more about the way the world works — not necessarily for any personal gain, but just to know a bit more about the world … Read more The Fastest Way to Learn to Code? Be Invested in Your Numbers
Conferences like useR! & EARL are the R events to attend every year and, both personally and as a company, I can’t imagine skipping one. It’s an important place to be if you want to be up-to-date with the R technology and build up your presence in the community. Our team have given rave reviews after … Read more useR!2019 Toulouse recap
Over the last few years, most of my spare time has been spent tinkering, learning, and researching machine learning, specifically reinforcement learning and digital actors. Recently I decided to participate in the Obstacle Tower Challenge. To my surprise, my early efforts briefly topped the table, and I placed 2nd in the first round as my … Read more I Placed 4th in my First AI Competition. Takeaways from the Unity Obstacle Tower Competition
Deep learning is becoming the standard way of assessing credit risk and soon will surpass human decision-making. According to Wikipedia, a bank is a “financial institution that accepts deposits from the public and creates credit” which means that one of the two main responsibilities of a bank is to lend money to commercial and corporate … Read more The Future of Lending Money Is Deep Learning
The RIGHT JOIN keyword Just like you’d expect, the RIGHT JOIN is similar to the LEFT JOIN. This join returns all of the rows of the table on the right side of the join and matching rows for the table on the left side of the join. And, for any rows where there are no … Read more SQL JOIN
In this article I will be discussing data-preprocessing techniques, which add on to my previous series of TensorFlow for beginners posts as an extension. The first stage in constructing an AI or machine learning model is to preprocess the data to ensure proper representation by the model. This stage is the most critical part which … Read more Model Tuning & Feature Engineering using XGBoost
Until now, we have had some training code that outputs a tested, robust ML model that we now want to somehow persist (or productionalize) and possibly deploy as a service. Further, let’s suppose that the model’s test performance meets our expectations and we conclude the research phase. For this demo, I’ll adopt the training procedure … Read more From Research to Production: Containerized Training Jobs
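The RIGHT JOIN behaviour described above can be sketched in plain Python. This is only an illustration of the semantics, not how a database executes the join; the function `right_join` and the `orders` and `customers` tables are hypothetical, and `None` stands in for SQL `NULL`. Unlike SQL, this sketch would treat two `None` keys as equal, so it assumes keys are never missing.

```python
def right_join(left_rows, right_rows, key):
    """Pair every right row with each matching left row, or with None if none match."""
    # Index the left table by join key so each lookup is a dictionary access.
    left_index = {}
    for row in left_rows:
        left_index.setdefault(row[key], []).append(row)

    result = []
    for right in right_rows:
        matches = left_index.get(right[key])
        if matches:
            for left in matches:
                result.append((left, right))
        else:
            # No match on the left side: keep the right row anyway.
            result.append((None, right))
    return result


# Hypothetical tables: orders on the left, customers on the right.
orders = [
    {"customer_id": 1, "item": "book"},
    {"customer_id": 3, "item": "pen"},
]
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
joined = right_join(orders, customers, "customer_id")
```

Every customer appears in `joined`, including the one with no order, while the order for customer 3 is dropped because it has no match on the right side.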
In this story, IDW-CNN, by Sun-Yat-sen University, The Chinese University of Hong Kong, SenseTime Group (Limited), is briefly reviewed. Segmentation accuracy is increased by learning from an Image Descriptions in the Wild (IDW) dataset. Unlike previous image captioning datasets, where captions were manually and densely annotated, images and their descriptions in IDW are automatically downloaded … Read more Review: IDW-CNN — Learning from Image Descriptions in the Wild Dataset Boosts the Accuracy…
It was November last year when I seriously started blogging and it is time to share with you some experiences and highlights before the summer break… so read on! The first thing that really surprised me (and still surprises me) is the popularity of my blog – and I say this without false modesty: when … Read more Summer Break: A Look back… and ahead