stringfix: adding transcoder shiny app

Adding quotes to a character list. Often I have to take a character list or column and put it in a vector, which means I first have to add quotes. It takes time. For me and my colleagues, I have created the transcoder shiny app. The main goal is to facilitate formatting a list of strings … Read more

Categories R Tags ExcerptFavorite
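The quoting chore described in the stringfix excerpt above can be sketched in a few lines. This is only an illustration of the idea, not the transcoder app's code: the function name `to_r_vector` and the choice to split on newlines and commas are assumptions made for this example.

```python
def to_r_vector(raw: str) -> str:
    """Turn newline- or comma-separated items into an R character vector literal.

    Hypothetical helper for illustration; not taken from the transcoder app.
    """
    # Split on newlines first, then on commas within each line.
    parts = [part.strip() for line in raw.splitlines() for part in line.split(",")]
    # Drop empty items left over from blank lines or trailing commas.
    items = [part for part in parts if part]
    # Escape backslashes and double quotes so each item stays a valid R string.
    quoted = ['"' + item.replace("\\", "\\\\").replace('"', '\\"') + '"' for item in items]
    return "c(" + ", ".join(quoted) + ")"


print(to_r_vector("apple\nbanana, cherry"))
# prints: c("apple", "banana", "cherry")
```

Pasting a copied column into such a helper gives a vector literal that can be dropped straight into an R script.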

Manipulating strings with the {stringr} package

{stringr} contains functions to manipulate strings. In Chapter 10, I will teach you about regular expressions, but the functions contained in {stringr} already allow you to do a lot of work on strings without needing to be a regular expression expert. I will discuss the most common string operations: detecting, locating, matching, searching and replacing, and extracting/removing strings. … Read more

Categories R Tags ExcerptFavorite

Quick Hit: Speeding Up a Slow/Mundane Task with a Little Rcpp

Over at $DAYJOB’s blog I’ve queued up a post that shows how to use our new opendata package to work with our Open Data portal’s API. I’m not super-sure when it’s going to be posted so keep an RSS reader fixed on if you’re interested in seeing it (I may make a small note … Read more

Categories R Tags ExcerptFavorite

The power of Brain-Computer interface: use your brain to play your video game

Here we show one application of machine learning and signal processing in Neuroscience; translating thoughts into actions with our Game-Based Brain-Computer interface (BCI). Video-Game based BCI. The gaming part increases user engagement and makes it easier to acquire the new skill of controlling the BCI device. BCI enables direct control of brain activity over external … Read more

What A.I. Isn’t

(¹) A lazy leap into the future It isn’t intuitive, creative, inspired, generalized, or conscious. Will it ever be like us? Will it ever think like us? As I study data science I learn a little more about artificial intelligence each day. I practice wielding the tools in my machine learning tool box, and I read … Read more

PySpark in Google Colab

Creating a simple linear regression model with PySpark in Colab. With broadening sources of the data pool, the topic of Big Data has received an increasing amount of attention in the past few years. Besides dealing with the gigantic data of all kinds and shapes, the target turnaround time of the … Read more

Why do Data Visualizations Fail?

Easy … Charts masquerading as useful insights. When a data visualization fails, there can be multiple reasons. The most common reason is that the author didn’t understand the message. Thus the meaning in the data is unclear or even hidden. The author didn’t consider what question the audience was asking. Consider the … Read more

Interpreting Data through Visualization with Python Matplotlib

What Did the IBM Data Visualization Course Teach Me? Matplotlib, even though it is aging, remains one of the most vital tools for data visualization, and this post is about using matplotlib effectively to gain knowledge from a data set. The IBM data science professional certificate program, which I have started taking around a month … Read more

Breaking neural networks with adversarial attacks

Are the machine learning models we use intrinsically flawed? As many of you may know, Deep Neural Networks are highly expressive machine learning networks that have been around for many decades. In 2012, with gains in computing power and improved tooling, a family of these machine learning models called ConvNets started achieving state of the … Read more

Time Series in Python — Exponential Smoothing and ARIMA processes

ARIMA. ARIMA models (which include ARMA, AR and MA models) are a general class of models to forecast stationary time series. ARIMA models are made of three parts: a weighted sum of lagged values of the series (the auto-regressive (AR) part); a weighted sum of lagged forecast errors of the series (the moving-average (MA) part); a difference … Read more

Security & Privacy in Artificial Intelligence & Machine Learning — Part-6: Up close with Privacy

Note: This is part-6 of a series of articles on ‘Security and Privacy in Artificial Intelligence & Machine Learning’. Here are the links to all articles (so far): In the previous article of the series, we looked at the nature and extent of damage that attackers can inflict if they, … Read more

Hyperparameters in Deep Learning

1. Optimizer Hyperparameters. They are related more to the optimization and training process. 1.1 Learning rate: “The single most important hyperparameter and one should always make sure that it has been tuned” (Yoshua Bengio). Good starting point = 0.01. If our learning rate is smaller than the optimal value, then it would take a much longer time (hundreds … Read more

Why The World Needs Trustworthy Chatbots

Trust is such a human trait; to advance we need to learn how to trust bots. The notion of trust underpins so much of society, whether we realise it or not. In modern times, trust is driving the success of new decentralised business models. Trust expert, Rachel Botsman, describes how businesses … Read more

Where the German Companies Are

Last week, the German NGO Open Knowledge Foundation Deutschland e.V. made German Trade Register data available via the project, together with the British NGO opencorporates. While the data from the German Trade Register is publicly available in principle, retrieving the data is a case-by-case activity and is very cumbersome (try for yourself if you … Read more

Categories R Tags ExcerptFavorite

Succeeding as a data scientist in small companies/startups

It’s nothing like at a big mature company. This’ll probably be an unbounded series of posts that spawned from this question that came across the awesome community that is the data-nerd twitter cluster: Some Background I’ve spent almost 12 years now at companies sized between 15–150 wearing various hats of “data analyst, engineer, and occasionally, … Read more

Can you Solve TED’s Frog Riddle? Can TED?

Using Bayes’ Rule to Solve a Controversial Problem As part of its riddle series TED-Ed, the youth and education initiative of TED, released a video called “Can you solve the frog riddle?” The video presents a riddle about conditional probability and solves it in a simple way. But, is the solution correct? Critics argue that … Read more

Set Theory — Functions

Today we’re going to expand on functions within the world of set theory. Similar to previous concepts introduced, the nomenclature for standard functions within sets is slightly different than other branches of math, & therefore requires reviewing. There are quite a few terms to introduce, so let’s jump right in! This first table of function … Read more

Word Level English to Marathi Neural Machine Translation using Seq2Seq Encoder-Decoder LSTM Model

A guide to building sequence-to-sequence models using LSTM. Table of Contents: Introduction; Prerequisites; Encoder-Decoder Architecture; Encoder LSTM; Decoder LSTM (Training mode); Decoder LSTM (Inference mode); Code Walkthrough; Results and Evaluation; Future Work; End Notes; References. 1. Introduction: Recurrent Neural Networks (or more precisely LSTM/GRU) have been found to be very effective in solving complex sequence related … Read more

Natural Language Processing Using Stanford’s CoreNLP

Introduction Analyzing text data using Stanford’s CoreNLP makes text data analysis easy and efficient. With just a few lines of code, CoreNLP allows for the extraction of all kinds of text properties, such as named-entity recognition or part-of-speech tagging. CoreNLP is written in Java and requires Java to be installed on your device but offers … Read more

Predictive Modeling: Picking the best model

Testing out different types of models on the same data Whether you are working on predicting data in an office setting or just competing in a Kaggle competition, it’s important to test out different models to find the best fit for the data you are working with. I recently had the opportunity to compete with some … Read more

Policy Based Reinforcement Learning, the Easy Way

Step-by-step approach to understanding Policy Based methods in Reinforcement Learning. Introduction: Suppose you are in a new town and you have no map nor GPS, and you need to reach downtown. You can try to assess your current position relative to your destination, as well as the effectiveness (value) of each … Read more

Deep Learning & Handwritten Arabic Digits

Using the library to classify the AHCD at 99% accuracy! The ‘hello world’ of deep learning is often the MNIST handwritten number dataset, and I wanted to apply the same techniques to a more interesting application: the Arabic Handwritten Characters Dataset (AHCD), a dataset developed by the American University in Cairo.¹ In … Read more

K-Means Clustering

Data set and Code As I mentioned before, we are going to be using text data and in particular, we will be taking a look at the Enron email data set which is available on Kaggle. For those of you that don’t know the story/scandal surrounding Enron, I would suggest checking out the smartest guys in … Read more

AnzoGraph: A W3C Standards-Based Graph Database

Introduction. In this interview, I’m catching up with Barry Zane, Vice President at Cambridge Semantics. Barry is the creator of AnzoGraph™, a native, massively parallel processing (MPP) distributed graph database. Barry has had quite a journey in the database world. He served as Vice President of Technology of Netezza Corporation from 2000 to 2005, and was responsible … Read more

Real Net Profit: 150% in just 4 Months

Developing a post-commission profitable currency trading model using Pivot Billions and R. Needle, meet haystack. Searching for the right combination of features to make a consistent trading model can be quite difficult and takes many, many iterations. By incorporating Pivot Billions and R into my research process, I was able to dramatically improve the efficiency … Read more

Categories R Tags ExcerptFavorite

Benchmarking cast in R from long data frame to wide matrix

In my daily work I often have to transform a long table to a wide matrix to accommodate some function. At some stage in my life I came across the reshape2 package, and I have been with that philosophy ever since. I find it makes data wrangling easy and straightforward. I particularly like … Read more

Categories R Tags ExcerptFavorite

Deploying an R Shiny App With Docker

If you haven’t heard of Docker, it is a system that allows projects to be split into discrete units (i.e. containers) that each operate within their own virtual environment. Each container has a blueprint written in its Dockerfile that describes all of the operating parameters including operating system and package dependencies/requirements. Docker images are easily … Read more

Categories R Tags ExcerptFavorite

Superhuman “cell-sight” with Deep Learning

Using “in silico labeling” to predict fluorescent labels in unlabeled images and cell morphology, components, and structures. An analysis of the paper In Silico Labeling: Predicting Fluorescent Labels in Unlabeled Images published in Cell. [Image: fluorescently tagged neuronal cell culture.] Take a look at this image, and tell me what you see. Figure 1. Source: Finkbeiner … Read more

NSERC – Discovery Grants Program, over the past 5 years

library(XML)
library(stringr)
url=""
download.file(url,destfile = "GSC.html")
library(XML)
tables=readHTMLTable("GSC.html")
GSC=tables[[1]]$V1
GSC=as.character(GSC[-(1:2)])
namesGSC=tables[[1]]$V2
namesGSC=as.character(namesGSC[-(1:2)])
Correction = function(x) as.numeric(gsub('[$,]', '', x))
YEAR=2013:2018
for(i in 1:length(YEAR)){
… Read more

Categories R Tags ExcerptFavorite

Launching codecentric.AI Bootcamp course!

Today, I am happy to announce the launch of our codecentric.AI Bootcamp! This bootcamp is a free online course for everyone who wants to learn hands-on machine learning and AI techniques, from basic algorithms to deep learning, computer vision and NLP. The course language is German only, but for every chapter I did, you … Read more

Categories R Tags ExcerptFavorite

Liverpool is the Most Popular City in the World (relative to use as password per inhabitant)

The API of is quite remarkable. It not only allows you to fetch the results generally obtained by typing in your e-mail into the browser interface and finding out whether or not you’ve been pwned from the comfort of your shell. It further allows you to very simply check whether a certain password has … Read more

Categories R Tags ExcerptFavorite

Artificial Intelligence and Business Value

Digital technologies are pervasive. Nearly 5 billion people in the world now have a mobile phone connection and more than 7 billion mobile phones are in use (some people have more than one phone). Approximately 2.5 billion of the phones are smartphones. Cell phone penetration is now approaching that of electricity — about 88% of the world’s … Read more

Introducing olsrr

I am pleased to announce the olsrr package, a set of tools for improved output from linear regression models, designed keeping in mind beginner/intermediate R users. The package includes: comprehensive regression output; variable selection procedures; heteroskedasticity, collinearity diagnostics and measures of influence; various plots and underlying data. If you know how to build models using lm(), you … Read more

Categories R Tags ExcerptFavorite

“Correlation is not causation”. So what is?

Machine learning applications have been growing rapidly in volume and scope over the last few years. What is causal inference, how is it different from plain good ole’ ML, and when should you consider using it? In this report I try to give a short and concrete answer by using an example. Imagine we’re tasked by the … Read more

Categories R Tags ExcerptFavorite

NLP Learning Series: Part 2 — Conventional Methods for Text Classification

NLP Learning Series (Part 2) Teaching Machines to Learn Text This is the second post of the NLP Text classification series. To give you a recap, recently I started up with an NLP text classification competition on Kaggle called Quora Question insincerity challenge. And I thought to share the knowledge via a series of blog posts on … Read more

Review: YOLOv3 — You Only Look Once (Object Detection)

Improved YOLOv2, comparable performance with RetinaNet, 3.8× faster! In this story, YOLOv3 (You Only Look Once v3), by the University of Washington, is reviewed. YOLO is a very famous object detector. I think everybody must know it. Below is the demo by the authors. As the author was busy on Twitter and GAN, and also helped out … Read more

Data Science with Optimus. Part 1: Intro.

Breaking down data science with Python, Spark and Optimus. Don’t worry if you don’t know what these logos are, I’ll explain them in the next articles 🙂 Data science has reached new levels of complexity and, of course, awesomeness. I’ve been doing this for years now, and what I want for people is to have a clear and … Read more

Web scraping with Python — A to copy Z

Handling BeautifulSoup, avoiding blocks, enriching with API, storing in a DB and visualizing the data. Introduction: What is web scraping and when would you want to use it? The act of going through web pages and extracting selected text or images. An excellent tool for getting new data or enriching your … Read more

Naive Bayes: Intuition and Implementation

Introduction: What Are Naive Bayes Models? In a broad sense, Naive Bayes models are a special kind of classification machine learning algorithm. They are based on a statistical classification technique called ‘Bayes Theorem’. Naive Bayes models are called ‘naive’ algorithms because they assume that the predictor variables are independent of each other. In other … Read more

Create data visualizations like BBC News with the BBC’s R Cookbook

If you’re looking for a guide to making publication-ready data visualizations in R, check out the BBC Visual and Data Journalism cookbook for R graphics. Announced in a BBC blog post this week, it provides scripts for making line charts, bar charts, and other visualizations like those below used in the BBC’s data journalism. The cookbook … Read more

Categories R Tags ExcerptFavorite

Clustered Globe

Setting Constraints & Variables First, we’re gonna set the boundaries of what detail we are going to cluster. At this stage I want to keep countries separated and only cluster activities within a single country. Therefore, by the nature of clustering, small countries will probably become a single cluster. And although there could be cross-border … Read more


I am stuck at home sick today, so I decided to provide a relational analysis of the Stats Package Wars that have been bubbling away for the past week. True in all its details. If you want something slightly more constructive, consider The Plain Person’s Guide to Plain-Text Social Science. Related To leave a comment … Read more

Categories R Tags ExcerptFavorite

Using NLP to build a search & discovery app for Regulators

Regulations need to be updated constantly in this era of rapid socio-economic and technological change. Regulators spend a substantial amount of time assessing the current stock of Acts to identify inconsistent use of language or markers that don’t support innovation and create a burden for businesses. Given the large number of Acts and their complex … Read more