Working Towards an Advanced, Inclusive, and Socially Responsible AI Framework

Three Lessons from O’Reilly’s AI Conference. Photo: Roger Chen (left), CEO of Computable, and Ben Lorica (right), Chief Data Scientist at O’Reilly Media, introducing keynote speakers. Becoming a DataKind volunteer has given me a unique opportunity to bring together my interests in data science and social impact. For this reason, I was thrilled to be able to … Read more

Two thoughts on the question “Are time series models considered part of Machine Learning or…

This is a very brief note that I decided to write after yet again coming across some form of the question “Is time series analysis part of machine learning?/Is time series analysis considered supervised learning?” in a discussion forum. This question is obviously a very broad one, and is to some extent subjective. I got … Read more

Plotting business locations on maps using multiple Plotting libraries in Python

I was browsing through Kaggle and came across a dataset which included locations in latitudes and longitudes. I haven’t worked with plotting on maps so I decided to take this dataset and explore various options available to work through them. This is a basic guide about what I did and the inferences I drew about … Read more

Neural Style Transfer

As a transition from stringing together premade TensorFlow code and algorithms, I began my journey to becoming a proficient TensorFlow coder by attempting my first implementation of an algorithm described in a paper. This project was my way of incorporating some deep learning into my school work. The original project was an image manipulation project … Read more

The road to Erewhon

Artificial Intelligence: an arms race no one cares to control. I fear none of the existing machines; what I fear is the extraordinary rapidity with which they are becoming something very different to what they are at present. No class of beings have in any time past made so rapid a movement forward. Should not that … Read more

What’s new in TensorFlow 2.0?

The machine learning library TensorFlow has had a long history of releases starting from the initial open-source release from the Google Brain team back in November 2015. Initially developed internally under the name DistBelief, TensorFlow quickly rose to become the most widely used machine learning library today. And not without reason. Number of repository stars … Read more

The relationship between Biological and Artificial Intelligence

Intelligence can be defined as a predominantly human ability to accomplish tasks that are generally hard for computers and animals. Artificial Intelligence [AI] is a field attempting to accomplish such tasks with computers. AI is becoming increasingly widespread, as are claims of its relationship with Biological Intelligence. Often these claims are made to imply higher … Read more

Building a Shiny App as a Package

Shiny App as a Package. In a previous post, I introduced the {golem} package, which is an opinionated framework for building production-ready Shiny Applications. This framework starts by creating a package skeleton waiting to be filled. But, in a world where Shiny Applications are mostly created as a series of files, why bother with a … Read more

Categories R Tags ExcerptFavorite

Machine Learning explained for statistics folk

I’m running a one-day workshop called “From Statistics To Machine Learning” in central London on 28 October, for anyone who learnt some statistics and wants to find out about machine learning methods. I guess you might feel frustrated. There’s a lot of interest and investment in machine learning, but it’s hard to know where to … Read more

I Have a Lot of Data, I Just Don’t Know Where!

If you can say this about your company OR NOT, you should read this article. Introduction: If you are reading this, it’s probable that you are doing something with your data or want to do something with it. But it’s not that easy, right? Most data-centered projects nowadays (in reality) are complex, expensive, require organizational … Read more

Deep learning in Space

Artificial intelligence is everywhere. Home appliances, automotive, entertainment systems, you name it, they are all packing AI capabilities. The space industry is no exception. In the past few months I have been working on a machine learning application that assists satellite docking from a simple camera video feed. If you want to know how deep … Read more

Spatial data and maps conference – FOSS4G

I’m helping organise a conference on (geo)spatial open source software – FOSS4G. We’re hosting it in the great city of Edinburgh, Scotland in September 2019. Abstract submissions: We’re very interested in hearing your tales of R, QGIS, Python, GRASS, PostGIS, etc.! … Read more

The Rich didn’t earn their Wealth, they just got Lucky

Tomorrow, on the First of May, many countries celebrate the so-called International Workers’ Day (or Labour Day): time to talk about the unequal distribution of wealth again! A few months ago I posted a piece with the title “If wealth had anything to do with intelligence…” where I argued that ability, e.g. intelligence, as … Read more

Google Next 2019 – observations from the Mango viewing deck

The first thing that hit me about this year’s Google Next was the scale. Thousands of people wandering around Moscone Centre in San Fran with their name badges proudly displayed really brought home that this is a platform of which its users are proud. I was told by several people that the size of the … Read more

Zooming in on maps with sf and ggplot2

When working with geo-spatial data in R, I usually use the sf package for manipulating spatial data as Simple Features objects and ggplot2 with geom_sf for visualizing these data. One thing that comes up regularly is “zooming in” on a certain region of interest, i.e. displaying a certain map detail. There are several ways to … Read more

RcppArmadillo 0.9.400.2.0

A new RcppArmadillo release based on the very recent Armadillo upstream release arrived on CRAN earlier today, and will get to Debian shortly. Armadillo is a powerful and expressive C++ template library for linear algebra aiming towards a good balance between speed and ease of use with a syntax deliberately close to Matlab. RcppArmadillo … Read more

Could not Resist

Also, Practical Data Science with R, 2nd Edition; Zumel, Mount; Manning 2019 is now content complete! It is deep into editing and soon into production! … Read more

Dealing with correlation in designed field experiments: part I

When we have recorded two traits in different subjects, we can be interested in describing their joint variability by using Pearson’s correlation coefficient. That’s ok, although we have to respect some basic assumptions (e.g. linearity) that have been detailed elsewhere (see here). Problems may arise when we need to test the hypothesis that the correlation … Read more
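As a rough illustration of why the linearity assumption mentioned above matters, here is a minimal sketch in plain Python. The function name `pearson_r` and the toy data are assumptions made for this example and do not come from the original post; the formula is the standard sample Pearson correlation.

```python
from math import sqrt


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy data (hypothetical): one trait, and two traits that depend on it.
x = [-2, -1, 0, 1, 2]
linear = [2 * v + 1 for v in x]   # exact linear dependence on x
curved = [v * v for v in x]       # exact, but non-linear, dependence on x

r_linear = pearson_r(x, linear)
r_curved = pearson_r(x, curved)
print(r_linear, r_curved)
```

On this toy data the linear trait gives a coefficient of 1, while the quadratic trait gives 0 even though it is fully determined by `x`. The coefficient only measures linear association.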

RStudio 1.2 Released

We’re excited to announce the official release of RStudio 1.2! What’s new in RStudio 1.2? Over a year in the making, this new release of RStudio includes dozens of new productivity enhancements and capabilities. You’ll now find RStudio a more comfortable workbench for working in SQL, Stan, Python, and D3. Testing your R code is … Read more

ViennaR Meetup March – Full Talks Online

The full talks of the ViennaR March Meetup are finally online: A short Introduction to ViennaR, Laura Vana introducing R-Ladies Vienna and Hadley Wickham with a great introduction to tidy(er) data and the new functions pivot_wider() and pivot_longer(). Stay tuned for the next ViennaR Meetups! Introduction: You can download the slides of the … Read more

Analysing the HIV pandemic, Part 1: HIV in sub-Sahara Africa

Andrie de Vries is the author of “R for Dummies” and a Solutions Engineer at RStudio. Introduction: The Human Immunodeficiency Virus (HIV) is the virus that causes acquired immunodeficiency syndrome (AIDS). The virus invades various immune cells, causing loss of immunity and thus increased susceptibility to infections, including tuberculosis, and to cancer. In a recent publication … Read more

Relaunching the qualtRics package

rOpenSci is one of the first organizations in the R community I ever interacted with, when I participated in the 2016 rOpenSci unconf. I have since reviewed several rOpenSci packages and been so happy to be connected to this community, but I have never submitted or maintained a package myself. All that changed when I … Read more

Compute R2s and other performance indices for all your models!

Indices of model performance (i.e., model quality, goodness of fit, predictive accuracy etc.) are very important, both for model comparison and model description purposes. However, their computation or extraction for a wide variety of models can be complex. To address this, please let us introduce the performance package! performance We have recently decided to collaborate … Read more

What matters more in a cluster randomized trial: number or size?

I am involved with a trial of an intervention designed to prevent full-blown opioid use disorder for patients who may have an incipient opioid use problem. Given the nature of the intervention, it was clear the only feasible way to conduct this particular study is to randomize at the physician rather than the patient level. … Read more

Marketing with Machine Learning: Apriori

This is part 1 of an ongoing series, introduced in Detroit Data Lab Presents: Marketing with Machine Learning. Introduction: Apriori, from the Latin “a priori”, means “from the earlier.” As with many of our predictions, we’re learning from the past and applying it toward the future. It’s the “Hello World” of marketing with machine learning! … Read more

Spark & AI Summit 2019

My review of the latest Spark and AI Summit, hosted in San Francisco on April 24th and 25th 2019. The latest edition of the Spark conference took place last week. It was my first time attending the conference. Here is a breakdown of the different aspects of the conference. The big news: Databricks, organizer … Read more

Depicting Quranic Lengths with Sentence Drawings Data Art

Shaham, Apr 29. This project is an attempt to visually model groups of chapters in the Quran. Verse translation texts are analyzed using NLP topic-modelling techniques and then visualized using a specific art concept, Stefanie Posavec’s Sentence Drawings. The concept of Posavec’s sentence drawing is pretty simple: you draw a line relatively equivalent to … Read more

Investigating the Machine Reading Comprehension Problem with Deep Learning

Teaching machines how to do standardized test-like reading questions. This project and article were jointly done by Yonah Mann, Rohan Menezes and myself. When you think about it, reading comprehension is kind of a miracle of human thinking. That we can take a piece of text and, with little to no context, gain a deep … Read more

Thrive and blossom in the deep learning: FM model for recommendation system

Photo by Paul on Unsplash. Since you opened this article, I would assume you are working as an algorithm engineer in the industry. However, in my experience that assumption is probably only 20% right, because most algorithm engineers have a wide range of interests; they will be clicking the articles, as … Read more

Reinforcement Learning — A conceptual introduction

A simple guide for understanding Reinforcement Learning. Photo by Alex Knight on Unsplash. As described in a previous post, Reinforcement Learning is the topic in Artificial Intelligence that interests me the most. Even though I am not a computer scientist, it matches really well with my background, and the concept itself is quite fascinating. In this post I … Read more

A Flick of the Wrist: Defining the Next Generation of Human-Computer Interaction

For years, we’ve been enchanted by the idea of magic. The thought that someone, with the wave of a wand, snap of a finger, or some special words, can completely change the world around them in an instant is an idea that has captured minds throughout history. Any sufficiently advanced technology is indistinguishable from magic. — … Read more

What’s next for mapping apps? On route journey planning.

Picture this: you’re on route and you get a message that the location you’re meeting at has changed. But you’re already on the train, and you need to find a route to the new location. Then a problem arises: the app doesn’t realise you’re already on the train! Suggested step 1: walk back to the station you just came from… … Read more

Join our Discord Server

Chat: exchange ideas, provide valuable feedback and expand your understanding of data science. We are creating a Discord server for Towards Data Science to better connect our community. By joining, you will be able to discuss key data science and machine learning topics. You could get help on your current project or help others along their … Read more

Machine Learning to Big Data — Scaling Inverted Indexing with Solr

Motivation: The amount of data is growing exponentially every day and the volume has increased incredibly over the last few years. With this accelerated growth of data, it has now become important to search collections quickly to serve business decisions and customer queries both offline and online. To process large document collections quickly, at the … Read more

Predicting unknown classes with “Visual to Semantic” transfer. Applications for general AI.

How merging various sensory inputs can help you make better predictions and grasp more information in image classification tasks. Machine learning loves big data, especially when it’s labelled. Google and Tencent released their image task datasets consisting of millions and tens of millions of training examples. OpenAI showed that just ramping up dataset and network … Read more

My book ‘Cricket analytics with cricketr and cricpy’ is now on Amazon

‘Cricket analytics with cricketr and cricpy – Analytics harmony with R and Python’ is now available on Amazon in both paperback ($21.99) and Kindle ($9.99/Rs 449) versions. The book includes analysis of cricketers using both my R package ‘cricketr’ and my Python package ‘cricpy’ for all formats of the game, namely Test, ODI and T20. … Read more

Getting a Data Science Job is not a Numbers Game!

My First (Non Data Science) Job Search. Let me tell you a story about my first job search. It was 2010, and data science jobs weren’t really a thing yet. I’ll get to that in a minute, but bear with me first because there’s a point to all this. At the time, I was a … Read more

Feature engineering

Feature engineering is the process of transforming raw, unprocessed data into a set of targeted features that best represent your underlying machine learning problem. Engineering thoughtful, optimized data is the vital first step. In general, you can think of data cleaning as a process of subtraction and feature engineering as a process of addition. This … Read more
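The "subtraction versus addition" framing above can be sketched in a few lines of plain Python. Everything in this example is hypothetical and chosen only for illustration: the records, the field names (`signup`, `last_order`, `orders`, `spend`) and the helper names `clean` and `engineer` do not come from the original post.

```python
from datetime import date

# Hypothetical raw customer records; the field names are illustrative only.
raw_rows = [
    {"signup": "2019-04-01", "last_order": "2019-04-29", "orders": 4, "spend": 120.0},
    {"signup": "2019-03-15", "last_order": None, "orders": 0, "spend": 0.0},
    {"signup": "2019-01-10", "last_order": "2019-02-09", "orders": 3, "spend": 45.0},
]


def clean(rows):
    """Cleaning as subtraction: drop records with a missing last_order."""
    return [r for r in rows if r["last_order"] is not None]


def engineer(rows):
    """Feature engineering as addition: derive new fields from existing ones."""
    out = []
    for r in rows:
        signup = date.fromisoformat(r["signup"])
        last_order = date.fromisoformat(r["last_order"])
        features = dict(r)  # copy so the raw record is left unchanged
        features["days_active"] = (last_order - signup).days
        # Guard against division by zero for customers without orders.
        features["spend_per_order"] = r["spend"] / r["orders"] if r["orders"] else 0.0
        out.append(features)
    return out


features = engineer(clean(raw_rows))
for row in features:
    print(row["days_active"], row["spend_per_order"])
```

Cleaning leaves two of the three records, and engineering then adds two derived fields to each without altering the raw values.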

Why use RTutor for interactive tutorials if there is RStudio’s learnr?

There are nowadays different options to create interactive R tutorials that allow users to type in code, automatically check their solution and get hints if they are stuck. Two options are RStudio’s learnr package and my RTutor package. Obviously, RStudio has a great track record for creating and continuously supporting awesome software and R packages. … Read more

Self-driving cars: bigger road safety, less privacy

Photo by Denys Nevozhai on Unsplash. How can we increase road safety? Bounteous efforts and resources are vital to increase road safety, and the industry has proposed solutions such as big data and autonomous vehicles. Since most car accidents are due to the human factor and not so much technical failure, it sounds logical to replace the … Read more

Why data science.

As an actuary, I believe our work and skill set very much overlap with what I understand to be those of a ‘data scientist’. We typically work with (very) large sets of data to build analyses and models to ‘predict’ what’s going to happen. For example, a traditional role in health insurance actuarial sciences is … Read more

Templated output in R

Earo Wang, who is the curator for the We are R-Ladies twitter feed this week (last week of April, 2019), had a really nice tweet about using the whisker package to create a template incorporating text and data in R. Her example created a list of tidyverse packages with descriptions. I really liked the example, … Read more
