Working Towards an Advanced, Inclusive, and Socially Responsible AI Framework

Three Lessons from O’Reilly’s AI Conference. Photo: Roger Chen (left), CEO of Computable, and Ben Lorica (right), Chief Data Scientist at O’Reilly Media, introducing keynote speakers. Becoming a DataKind volunteer has given me a unique opportunity to bring together my interests in data science and social impact. For this reason, I was thrilled to be able to …

Two thoughts on the question “Are time series models considered part of Machine Learning or…

This is a very brief note that I decided to write after yet again coming across some form of the question “Is time series analysis part of machine learning?” or “Is time series analysis considered supervised learning?” in a discussion forum. This question is obviously a very broad one, and is to some extent subjective. I got …

Plotting business locations on maps using multiple Plotting libraries in Python

I was browsing through Kaggle and came across a dataset that included locations given as latitudes and longitudes. I hadn’t worked with plotting on maps before, so I decided to take this dataset and explore the various options available. This is a basic guide to what I did and the inferences I drew about …

Neural Style Transfer

As a transition from stringing together premade TensorFlow code and algorithms, I began my journey to becoming a proficient TensorFlow coder by attempting my first implementation of an algorithm described in a paper. This project was my way of incorporating some deep learning into my school work. The original project was an image manipulation project …

What’s new in TensorFlow 2.0?

The machine learning library TensorFlow has had a long history of releases starting from the initial open-source release from the Google Brain team back in November 2015. Initially developed internally under the name DistBelief, TensorFlow quickly rose to become the most widely used machine learning library today. And not without reason. Number of repository stars …

The relationship between Biological and Artificial Intelligence

Intelligence can be defined as a predominantly human ability to accomplish tasks that are generally hard for computers and animals. Artificial Intelligence [AI] is a field attempting to accomplish such tasks with computers. AI is becoming increasingly widespread, as are claims of its relationship with Biological Intelligence. Often these claims are made to imply higher …

Building a Shiny App as a Package

Shiny App as a Package: In a previous post, I introduced the {golem} package, which is an opinionated framework for building production-ready Shiny Applications. This framework starts by creating a package skeleton waiting to be filled. But, in a world where Shiny Applications are mostly created as a series of files, why bother with a …

Machine Learning explained for statistics folk

I’m running a one-day workshop called “From Statistics To Machine Learning” in central London on 28 October, for anyone who learnt some statistics and wants to find out about machine learning methods. I guess you might feel frustrated. There’s a lot of interest and investment in machine learning, but it’s hard to know where to …

How Machine Learning Made Me Fall in Love with the WNBA

Photo: Angel McCoughtry of the Atlanta Dream playing against the Minnesota Lynx. Using k-means clustering to fulfill my fantasy of building a women’s sports dream team. I had one of those daydreams that come to you from out of nowhere. Before my eyes fell the image of an all-star women’s sports team. The greatest female players …

I Have a Lot of Data, I Just Don’t Know Where!

If you can say this about your company OR NOT, you should read this article. Introduction: If you are reading this, it’s probable that you are doing something with your data or want to do something with it. But it’s not that easy, right? Most data-centered projects nowadays (in reality) are complex, expensive, require organizational …

Deep learning in Space

Artificial intelligence is everywhere. Home appliances, automotive, entertainment systems, you name it, they are all packing AI capabilities. The space industry is no exception. In the past few months I have been working on a machine learning application that assists satellite docking from a simple camera video feed. If you want to know how deep …

Spatial data and maps conference – FOSS4G

I’m helping organise a conference on (geo)spatial open source software – FOSS4G. We’re hosting it in the great city of Edinburgh, Scotland in September 2019. Abstract submissions: https://uk.osgeo.org/foss4guk2019/talks_workshops.html We’re very interested in hearing your tales of R, QGIS, Python, GRASS, PostGIS, etc.!

The Rich didn’t earn their Wealth, they just got Lucky

Tomorrow, on the First of May, many countries celebrate the so-called International Workers’ Day (or Labour Day): time to talk about the unequal distribution of wealth again! A few months ago I posted a piece with the title “If wealth had anything to do with intelligence…” where I argued that ability, e.g. intelligence, as …

Google Next 2019 – observations from the Mango viewing deck

The first thing that hit me about this year’s Google Next was the scale. Thousands of people wandering around Moscone Centre in San Fran with their name badges proudly displayed really brought home that this is a platform of which its users are proud. I was told by several people that the size of the …

Zooming in on maps with sf and ggplot2

When working with geo-spatial data in R, I usually use the sf package for manipulating spatial data as Simple Features objects and ggplot2 with geom_sf for visualizing these data. One thing that comes up regularly is “zooming in” on a certain region of interest, i.e. displaying a certain map detail. There are several ways to …

RcppArmadillo 0.9.400.2.0

A new RcppArmadillo release based on the very recent Armadillo upstream release arrived on CRAN earlier today, and will get to Debian shortly. Armadillo is a powerful and expressive C++ template library for linear algebra aiming towards a good balance between speed and ease of use with a syntax deliberately close to Matlab. RcppArmadillo …

Compute R2s and other performance indices for all your models!

Indices of model performance (i.e., model quality, goodness of fit, predictive accuracy etc.) are very important, both for model comparison and model description purposes. However, their computation or extraction for a wide variety of models can be complex. To address this, please let us introduce the performance package! performance: We have recently decided to collaborate …

Dealing with correlation in designed field experiments: part I

When we have recorded two traits in different subjects, we can be interested in describing their joint variability using Pearson’s correlation coefficient. That’s OK, although we have to respect some basic assumptions (e.g. linearity) that have been detailed elsewhere (see here). Problems may arise when we need to test the hypothesis that the correlation …

ViennaR Meetup March – Full Talks Online

The full talks of the ViennaR March Meetup are finally online: a short introduction to ViennaR, Laura Vana introducing R-Ladies Vienna, and Hadley Wickham with a great introduction to tidy(er) data and the new functions pivot_wider() and pivot_longer(). Stay tuned for the next ViennaR Meetups! Introduction: You can download the slides of the …

Analysing the HIV pandemic, Part 1: HIV in sub-Sahara Africa

Andrie de Vries is the author of “R for Dummies” and a Solutions Engineer at RStudio. Introduction: The Human Immunodeficiency Virus (HIV) is the virus that causes acquired immunodeficiency syndrome (AIDS). The virus invades various immune cells, causing loss of immunity, and thus increased susceptibility to infections, including tuberculosis and cancer. In a recent publication …

What matters more in a cluster randomized trial: number or size?

I am involved with a trial of an intervention designed to prevent full-blown opioid use disorder for patients who may have an incipient opioid use problem. Given the nature of the intervention, it was clear the only feasible way to conduct this particular study is to randomize at the physician rather than the patient level. …

Marketing with Machine Learning: Apriori

This is part 1 of an ongoing series, introduced in Detroit Data Lab Presents: Marketing with Machine Learning. Introduction: Apriori, from the Latin “a priori”, means “from the earlier.” As with many of our predictions, we’re learning from the past and applying it toward the future. It’s the “Hello World” of marketing with machine learning! …

Detecting faces with Python and OpenCV Face Detection Neural Network

Cool Kids of Death Off Festival. Now, we all know that Artificial Intelligence is becoming more and more real and it’s filling the gaps between the capabilities of humans and machines day by day. It’s not just a fancy word anymore. It has had many advancements over the years in many fields and one such area …

Separating mixed signals with Independent Component Analysis

Image modified from GarageBand. The world around us is a dynamic mixture of signals from various sources. Just like the colors in the above picture blend into one another, giving rise to new shades and tones, everything we perceive is a fusion of simpler components. Most of the time we are not even aware that the …

Depicting Quranic Lengths with Sentence Drawings Data Art

Shaham, Apr 29. This project is an attempt to visually model groups of chapters in the Quran. Verse translation texts are analyzed using NLP topic-modelling techniques and then visualized using Stefanie Posavec’s Sentence Drawings art concept. The concept of Posavec’s sentence drawing is pretty simple: you draw a line relatively equivalent to …

Investigating the Machine Reading Comprehension Problem with Deep Learning

Teaching machines how to do standardized test-like reading questions. This project and article were jointly done by Yonah Mann, Rohan Menezes and myself. When you think about it, reading comprehension is kind of a miracle of human thinking. That we can take a piece of text and, with little to no context, gain a deep …

Thrive and blossom in the deep learning: FM model for recommendation system

Photo by Paul on Unsplash. Since you opened this article, I would assume you are working as an algorithm engineer in the industry. However, that assumption is probably only 20% right in my experience, because most algorithm engineers have a great diversity of interests; they will be clicking on articles, as …

Reinforcement Learning — A conceptual introduction

A simple guide for understanding Reinforcement Learning. Photo by Alex Knight on Unsplash. As described in a previous post, Reinforcement Learning is the topic in Artificial Intelligence that interests me the most. Even though I am not a computer scientist, it matches really well with my background, and the concept itself is quite fascinating. In this post I …

Parallel and Distributed Deep Learning: A Survey

Introduction: A few years ago, Deep Learning was reduced to a scientific curiosity by the lack of data, on the one hand, and the lack of computing power, on the other. In 2012 when a deep learning model won the ImageNet Large Scale Visual Recognition Challenge¹, a contest that consists in recognizing the …

A Flick of the Wrist: Defining the Next Generation of Human-Computer Interaction

For years, we’ve been enchanted by the idea of magic. The thought that someone, with the wave of a wand, snap of a finger, or some special words, can completely change the world around them in an instant is an idea that has captured minds throughout history. “Any sufficiently advanced technology is indistinguishable from magic.” …

What’s next for mapping apps? On route journey planning.

Picture this: you’re on route and you get a message saying the location you’re meeting at has changed. But you’re already on the train, and you need to find a route to the new location. Then a problem arises: the app doesn’t realise you’re already on the train! Suggested step 1: walk back to the station you just came from… …

Machine Learning to Big Data — Scaling Inverted Indexing with Solr

Motivation: The amount of data is growing exponentially every day and the volume has increased incredibly over the last few years. With this accelerated growth of data, it has now become important to search collections quickly to serve business decisions and customer queries both offline and online. To process large document collections quickly, at the …

Predicting unknown classes with “Visual to Semantic” transfer. Applications for general AI.

How merging various sensory inputs can help you make better predictions and grasp more information in image classification tasks. Machine learning loves big data, especially when it’s labelled. Google and Tencent released their image task datasets consisting of millions and tens of millions of training examples. OpenAI showed that just ramping up dataset and network …

The Data Science Internship Hunt: A Fortune 500 Story

How to go about getting a Data Science Internship in the United States? It’s that time of the year again when all the Master’s students in the United States are either in search of internships or have bagged one. It’s at times confusing and frustrating, but every student has to go through the internship hunting process. …

My book ‘Cricket analytics with cricketr and cricpy’ is now on Amazon

‘Cricket analytics with cricketr and cricpy – Analytics harmony with R and Python’ is now available on Amazon in both paperback ($21.99) and Kindle ($9.99/Rs 449) versions. The book includes analysis of cricketers using both my R package ‘cricketr’ and my Python package ‘cricpy’ for all formats of the game, namely Test, ODI and T20. …

Feature engineering

Feature engineering is the process of transforming raw, unprocessed data into a set of targeted features that best represent your underlying machine learning problem. Engineering thoughtful, optimized data is the vital first step. In general, you can think of data cleaning as a process of subtraction and feature engineering as a process of addition. This …

Why use RTutor for interactive tutorials if there is RStudio’s learnr?

There are nowadays different options to create interactive R tutorials that allow users to type in code, automatically check their solution and get hints if they are stuck. Two options are RStudio’s learnr package and my RTutor package. Obviously, RStudio has a great track record for creating and continuously supporting awesome software and R packages. …

Self-driving cars: bigger road safety, less privacy

Photo by Denys Nevozhai on Unsplash. How can we increase road safety? Bounteous efforts and resources are vital to increase road safety, and the industry has proposed solutions such as big data and autonomous vehicles. Since most car accidents are due to the human factor and not so much technical failure, it sounds logical to replace the …

The easy way to work with CSV, JSON, and XML in Python

Python’s superior flexibility and ease of use are what make it one of the most popular programming languages, especially for Data Scientists. A big part of that is how simple it is to work with large datasets. Every technology company today is building up a data strategy. They’ve all realised that having the right data: …

A step towards general NLP with Dynamic Memory Networks

For the first pass (or first ‘episode’), the question embedding ‘q’ is used to compute attention scores for the sentence embeddings from the input module. The attention score of sentence sᵢ can then be passed through a softmax (so that the attention scores sum to one) or an individual sigmoid to obtain gᵢ. gᵢ is the …
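
As a rough illustration of the gating step described in this excerpt, the sketch below turns one score per sentence into gates gᵢ, using either a softmax or an individual sigmoid. The excerpt does not say how the attention scores themselves are computed, so a plain dot product between the question and each sentence embedding is used here purely as a stand-in; the function and variable names (`attention_gates`, `question`, `sentences`) and the toy vectors are hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Normalise scores so they are positive and sum to one."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Squash a single score into (0, 1), independently of the others."""
    return 1.0 / (1.0 + math.exp(-x))


def attention_gates(question, sentences, use_softmax=True):
    """Return one gate per sentence embedding.

    ASSUMPTION: the score is a plain dot product with the question
    embedding; this is an illustrative stand-in, not the scoring
    function of any particular Dynamic Memory Network implementation.
    """
    scores = [dot(question, s) for s in sentences]
    if use_softmax:
        return softmax(scores)
    return [sigmoid(s) for s in scores]


# Toy embeddings (made up for illustration).
q = [1.0, 0.0]
sentences = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]

gates_softmax = attention_gates(q, sentences)
gates_sigmoid = attention_gates(q, sentences, use_softmax=False)
print(gates_softmax)
print(gates_sigmoid)
```

With the softmax, the gates compete with each other and sum to one; with individual sigmoids, each sentence is gated on its own, so several sentences can receive a high gate at the same time.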