My book ‘Deep Learning from first principles: Second Edition’ now on Amazon

The second edition of my book ‘Deep Learning from first principles: Second Edition - In vectorized Python, R and Octave’ is now available on Amazon, in both paperback ($14.99) and Kindle ($9.99/Rs449/-) versions. Since this book is almost 70% code, all functions and code snippets have been formatted to use the fixed-width font ‘Lucida Console’. In addition …

Closing the Sale: Predicting Home Prices via Linear Regression

Imports, Data Cleansing, and EDA: Cleaning and EDA are important for this challenge because the data set contains many ordinal/categorical features that may be important predictors and will need to be converted to numerical values. As a baseline, I imported the following libraries to clean, explore, and model the training data. One of …
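Converting ordinal and categorical features to numbers usually means two different treatments: ordered labels get an integer rank, while unordered labels get one-hot columns. The sketch below illustrates both in plain Python. The quality scale (`Po` < `Fa` < `TA` < `Gd` < `Ex`) and the helper names `encode_ordinal` and `one_hot` are hypothetical and not taken from the post, which may well use pandas or scikit-learn instead.

```python
# Hypothetical ordered quality scale, worst to best (not taken from the post).
QUALITY_ORDER = ["Po", "Fa", "TA", "Gd", "Ex"]
QUALITY_MAP = {label: rank for rank, label in enumerate(QUALITY_ORDER, start=1)}


def encode_ordinal(values, mapping, missing=0):
    """Map ordered labels to integer ranks; unknown or missing labels become `missing`."""
    return [mapping.get(v, missing) for v in values]


def one_hot(values):
    """One-hot encode an unordered feature; returns (sorted categories, 0/1 rows)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


if __name__ == "__main__":
    print(encode_ordinal(["Gd", "TA", None, "Ex"], QUALITY_MAP))  # [4, 3, 0, 5]
    print(one_hot(["RL", "RM", "RL"]))
```

Mapping unknown or missing labels to 0 is only one design choice; imputing the most common level is another.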

Nobody UNDERSTANDS Me … But Soon, Artificial Intelligence Just Might

Our faces and voices can be analyzed for emotion. As I mentioned, biomimicry, or imitating natural design in the things we create, is critical in recreating this human tendency in AI. Our end goal is artificial empathy, which (for now, at least) describes a machine’s ability to recognize and respond to human emotion. Going in line …

Pdftools 2.0: powerful pdf text extraction tools

A new version of pdftools has been released to CRAN. Go get it while it’s hot: install.packages("pdftools"). This version has two major improvements: low-level text extraction and encoding improvements. About PDF textboxes: A PDF document may seem to contain paragraphs or tables in a viewer, but this is not actually true. PDF is a …

Yet another visualization of the Bayesian Beta-Binomial model

The Beta-Binomial model is the “hello world” of Bayesian statistics. That is, it’s the first model you get to run, often before you even know what you are doing. There are many reasons for this: It only has one parameter, the underlying proportion of success, so it’s easy to visualize and reason about. It’s easy …
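Because the Beta prior is conjugate to the Binomial likelihood, the posterior has a closed form: a Beta(a, b) prior combined with k successes in n trials gives a Beta(a + k, b + n - k) posterior. The original post works in R; the following is only a minimal Python sketch of that update, with hypothetical function names, plus a log-gamma based density that can be evaluated on a grid for plotting.

```python
import math


def beta_binomial_update(a, b, successes, trials):
    """Conjugate update: a Beta(a, b) prior plus Binomial data gives a Beta posterior."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    return a + successes, b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def beta_pdf(p, a, b):
    """Beta(a, b) density at p in (0, 1), computed on the log scale for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))


# Uniform Beta(1, 1) prior, then 7 successes in 10 trials.
post_a, post_b = beta_binomial_update(1, 1, successes=7, trials=10)
grid = [(i + 0.5) / 100 for i in range(100)]  # midpoints, avoiding p = 0 and p = 1
density = [beta_pdf(p, post_a, post_b) for p in grid]  # ready to plot against grid
```

Here the posterior is Beta(8, 4), whose mean is 8/12, roughly 0.667: the single parameter can be summarized and plotted with nothing more than two counts.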

Reusable Pipelines in R

Pipelines in R are popular, the most popular one being magrittr as used by dplyr. This note will discuss the advanced re-usable piping systems: rquery/rqdatatable operator trees and wrapr function object pipelines. In each case we have a set of objects designed to extract extra power from the wrapr dot-arrow pipe %.>%. Piping: Piping is …
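The core idea of a re-usable pipeline, building the sequence of steps once as an object and then applying it to many inputs, can be sketched outside R as well. The Python below is not wrapr's or rquery's API; it is a hypothetical `compose` helper that turns a list of stage functions into one callable, which can itself be used as a stage in a larger pipeline. The record format and stage functions are illustrative only.

```python
from functools import reduce


def compose(*stages):
    """Build one reusable callable that applies `stages` left to right."""
    def pipeline(data):
        return reduce(lambda acc, stage: stage(acc), stages, data)
    return pipeline


# Hypothetical stages operating on a list of dict records.
def drop_missing(rows):
    return [r for r in rows if None not in r.values()]


def add_ratio(rows):
    return [{**r, "ratio": r["x"] / r["y"]} for r in rows]


def sort_by_ratio(rows):
    return sorted(rows, key=lambda r: r["ratio"])


prepare = compose(drop_missing, add_ratio)  # reusable on any input
report = compose(prepare, sort_by_ratio)    # pipelines nest as ordinary stages
```

Because `prepare` is an ordinary function, it can be stored, shared, and applied to new data without rebuilding the chain.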

Pandas for data.table Users

R and Python are both great languages for data analysis. While they are remarkably similar in some aspects, they are drastically different in others. In this post, I will focus on the similarities and differences between Pandas and data.table, two of the most prominent data manipulation packages in Python/R. There is already an excellent post …

An introduction to high-dimensional hyper-parameter tuning

If you have ever struggled with tuning Machine Learning (ML) models, you are reading the right piece. Hyper-parameter tuning refers to the problem of finding an optimal set of parameter values for a learning algorithm. Choosing these values is usually time-consuming. Even for simple algorithms like Linear Regression, finding the best …
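One common baseline for searching a high-dimensional hyper-parameter space is random search: sample candidate settings from a distribution over each parameter, score each one, and keep the best. The excerpt does not say which methods the post covers, so treat this as an illustrative sketch only. The `toy_loss` objective and the `alpha`/`l1_ratio` ranges are hypothetical stand-ins for a real validation loss; `alpha` is sampled on a log scale, a typical choice for regularization strengths.

```python
import math
import random

# Hypothetical search space: name -> (low, high, sample_on_log_scale)
SPACE = {
    "alpha": (1e-5, 10.0, True),
    "l1_ratio": (0.0, 1.0, False),
}


def toy_loss(params):
    """Stand-in for a validation loss; smallest near alpha=0.01, l1_ratio=0.3."""
    return (math.log10(params["alpha"]) + 2) ** 2 + (params["l1_ratio"] - 0.3) ** 2


def sample(space, rng):
    """Draw one candidate setting, uniformly or log-uniformly per parameter."""
    params = {}
    for name, (low, high, log_scale) in space.items():
        if log_scale:
            params[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        else:
            params[name] = rng.uniform(low, high)
    return params


def random_search(objective, space, n_trials=100, seed=0):
    """Return the best (params, score) among n_trials random candidates."""
    rng = random.Random(seed)
    best_params, best_score = None, math.inf
    for _ in range(n_trials):
        params = sample(space, rng)
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


best, score = random_search(toy_loss, SPACE, n_trials=200, seed=42)
```

Unlike grid search, the number of trials here is fixed by `n_trials` rather than growing with the number of hyper-parameters, which is one reason random search is often preferred when the space has many dimensions.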

Preprocessing with sklearn: a complete and comprehensive guide

For aspiring data scientists, it can sometimes be difficult to find their way through the forest of preprocessing techniques. Sklearn’s preprocessing library forms a solid foundation to guide you through this important task in the data science pipeline. Although Sklearn has pretty solid documentation, it often lacks a clear thread and intuition connecting the different concepts. …

IcoOmen: Using Machine Learning to Predict ICO Prices

Methodology:
1. Choose inputs and outputs.
2. Collect and aggregate the data.
3. Prepare the data.
4. Explore and attempt to understand the data.
5. Choose a Machine Learning model.
6. Measure the performance of the model.
7. Save the model.
8. Use the model to make predictions.

1. Choosing Inputs and Outputs (Inputs): Choosing the right inputs and outputs (in the case of …

How to Predict Severe Traffic Jams with Python and Recurrent Neural Networks?

An application of a sequence model to mine Waze open data of traffic incidents, using Python and Keras. In this tutorial, I will show you how to use an RNN deep learning model to find patterns in the Waze Traffic Open Data incident reports and predict whether severe traffic jams will happen shortly. Interventions can be taken out …

How to get the most out of Towards Data Science?

Our Readers’ Guide: We have received feedback that some of you find it difficult to efficiently navigate our Medium publication, so we have put together a few bullet points that will hopefully improve your experience on our blog. Subscribe to our publication to receive our Monthly Edition and Weekly Selection directly in your mailbox. Follow us …

Gold-Mining Week 15 (2018)

The post Gold-Mining Week 15 (2018) appeared first on Fantasy Football Analytics.

Vaex: Out of Core Dataframes for Python and Fast Visualization

So… no pandas? There are some issues with pandas that its original author, Wes McKinney, outlines in his insightful blog post “Apache Arrow and the ‘10 Things I Hate About pandas’”. Many of these issues will be tackled in the next version of pandas (pandas2?), building on top of Apache Arrow and other libraries. Vaex starts …

RTutor: Better Incentive Contracts For Road Construction

For about two weeks now, I have faced a long additional traffic jam every morning because of a construction site on the road. When I pass the construction site, often only a few people, or sometimes nobody, seem to be working there. Being an economist, I really wonder how many such traffic jams could be avoided with better …

Day 13 – little helper read_files

We at STATWORX work a lot with R and we often use the same little helper functions within our projects. These functions ease our daily work life by reducing repetitive code parts or by creating overviews of our projects. At first, there was no plan to make a package, but soon I realised that it …

Recreating the NBA lead tracker graphic

For each NBA game, has a really nice graphic which tracks the point differential between the two teams throughout the game. Here is the lead tracker graphic for the game between the LA Clippers and the Phoenix Suns on 10 Dec 2018. I thought it would be cool to try recreating …

Music Genre Classification with Python

Objective: Companies nowadays use music classification either to make recommendations to their customers (such as Spotify and SoundCloud) or simply as a product (for example, Shazam). Determining music genres is the first step in that direction. Machine Learning techniques have proved quite successful in extracting trends and patterns from the large …

Rsampling Fama French

Today we will continue our work on Fama French factor models, but more as a vehicle to explore some of the awesome stuff happening in the world of tidy models. For new readers who want to get familiar with Fama French before diving into this post, see here, where we covered importing and wrangling the data, …

Twins on the up

Are multiple births on the increase? My twin boys turned 5 years old today. Wow, time flies. Life is never dull, because twins are still seen as something of a novelty, so wherever we go, we find ourselves in conversation with strangers who are intrigued by the whole thing. In order to save time if …

Named Entity Recognition (NER), Meeting Industry’s Requirement by Applying state-of-the-art Deep…

We are going to have a quick look at the architecture of four different state-of-the-art approaches by referring to the actual research papers, and then we will move on to implementing the one with the highest accuracy.
1. Bidirectional LSTM-CRF: more details and an implementation in Keras, from the paper “Bidirectional LSTM-CRF Models for Sequence Tagging”.
2. Bidirectional LSTM-CNNs: …

The Importance of Being Recurrent for Modeling Hierarchical Structure

RNNs have inherent performance limitations: For a while, it seemed that RNNs were taking the Natural Language Processing (NLP) world by storm (from about 2014–17). However, we’ve recently started realizing the limitations of RNNs, primarily that they are “inefficient and not scalable”. While there is great promise in overcoming these limitations by using more specialized …

My introductory course on Bayesian statistics

So, after holding workshops introducing Bayes for a couple of years, I finally pulled myself together and completed my DataCamp course: Fundamentals of Bayesian Data Analysis in R! While it’s called a course, it’s more like a 4-hour workshop and, without requiring anything but basic R skills and a vague …

Supervised Machine Learning: Classification

Machine Learning is the science (and art) of programming computers so they can learn from data. “[Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed.” (Arthur Samuel, 1959) A better definition: A computer program is said to learn from experience E with respect to some task …

Teaching and Learning Materials for Data Visualization

Data Visualization: A Practical Introduction will begin shipping next week. I’ve written an R package that contains datasets, functions, and a course packet to go along with the book. The socviz package contains about twenty-five datasets and a number of utility and convenience functions. The datasets range in size from things with just a …

7 Tips to Getting a Data Science Job Faster

Data science is a booming field, but with great publicity comes great difficulty. Breaking into the data science field gets twice as hard each year. The growth of training programs like graduate degrees/certificates and bootcamps far exceeds the growth of new entry-level positions. Prior to 2015, it was a cakewalk to get multiple interviews. Now …

First Mile

The Electric Pulse: [Image: Thomas Parker Electric Car (1895) | Fisker Karma (2012)] Who invented the first electric vehicle is also debated, because many scientists and tinkerers were working with various forms of electric power sources (batteries and electric motors) around the same time. However, there is a prominent name in electric …

The Kernel Trick

We have seen how higher-dimensional transformations can allow us to separate data in order to make classification predictions. It seems that, to train a support vector classifier and optimize our objective function, we would have to perform operations on the higher-dimensional vectors in the transformed feature space. In real …
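The trick is that many feature maps come with a kernel function that returns the inner product in the transformed space while only touching the original vectors. A minimal Python check for the degree-2 homogeneous polynomial kernel on 2-D inputs: the explicit map phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2) gives phi(x) . phi(z) = (x . z)^2, so the kernel never needs to build phi. The function names are illustrative, not from the post.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for a 2-D point: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel (x . z)^2, computed in the original 2-D space."""
    return dot(x, z) ** 2


def gram_matrix(points, kernel=poly_kernel):
    """Pairwise kernel values, which is what a kernelized SVM works with in its dual form."""
    return [[kernel(p, q) for q in points] for p in points]


x, z = (1, 2), (3, 4)
implicit = poly_kernel(x, z)    # never builds phi(x)
explicit = dot(phi(x), phi(z))  # maps to 3-D first, then takes the dot product
```

With x = (1, 2) and z = (3, 4), both routes give 121: the kernel computes 11^2 directly, while the explicit route sums 9 + 48 + 64 in three dimensions.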

Amazon Customer Analysis

User review networks for customer segmentation: Over the past decade or two, Americans have continued to prefer payment methods that are traceable, providing retailers and vendors with a rich source of data on their customers. This data is used by data scientists to help businesses make more informed decisions with respect to inventory, marketing, and …

A Guide for Building Convolutional Neural Networks

Computer Vision is at the forefront of advancements in Artificial Intelligence (AI). It’s moving fast, with new research coming out each and every day, allowing us to do truly amazing things that we couldn’t do before with computers and AI. Convolutional Neural Networks (CNNs) are the driving force behind every advancement in Computer Vision research …

The invisible workers of the AI era

50 ways to label data: There are different ways to get your data labeled. Some firms label their data themselves, although this can be costly, as hiring people simply for these tasks costs firms both money and flexibility. Other companies even find ways to get people to label their data for free. Ever wonder why Google’s reCAPTCHA …

Day 12 – little helper dive

We at STATWORX work a lot with R and we often use the same little helper functions within our projects. These functions ease our daily work life by reducing repetitive code parts or by creating overviews of our projects. At first, there was no plan to make a package, but soon I realised that it …

AI and Machine Learning: Moving from Training to Education

The debate over whether AI will ever achieve capabilities on par with or beyond human intelligence is ongoing. It has certainly intensified with the recent advancements in AI, Machine Learning (ML), and Deep Learning (DL), with some believing that the current technologies are already capable of paving the way for Artificial General Intelligence (AGI). You …

Visualizing Hurricane Data with Shiny

Motivation for Project: Around the time that I was selecting a topic for this project, my parents and my hometown found themselves in the path of a Category 1 hurricane. Thankfully, everyone was ok, and there was only minor damage to their property. But this event made me think about how long it had been …

Scraping the Turkey Accordion

To leave a comment for the author, please follow the link and comment on their blog: R on datawookie.

Towards Ethical Machine Learning

I quit my job to enter an intensive data science bootcamp. I understand the value behind the vast amount of data available that enables us to create predictive machine learning algorithms. In addition to recognizing its value on a professional level, I benefit from these technologies as a consumer. Whenever I find myself in …

Reading List Faster With parallel, doParallel, and pbapply

I have several tables that I would like to load as a single data frame. Functions derived from read.table() have a lot of convenient features, but there seem to be a lot of steps in the implementation that would slow things down. The gain in performance of reading 29 CSV files (about …
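The post does this in R with parallel, doParallel, and pbapply. As a rough analogue only, the Python sketch below reads several CSV files concurrently with the standard library's `concurrent.futures.ThreadPoolExecutor` and stacks the rows into one list of records; `read_many` and `max_workers=4` are hypothetical choices. Threads mostly help when the work is I/O-bound; for CPU-heavy parsing, a process pool (closer in spirit to R's parallel package) may fit better, and any speed-up should be measured rather than assumed.

```python
import csv
from concurrent.futures import ThreadPoolExecutor


def read_csv_rows(path):
    """Read one CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def read_many(paths, max_workers=4):
    """Read CSV files concurrently and stack their rows into one list.

    pool.map returns results in the order of `paths`, so rows stay grouped
    by file in the order given, regardless of which file finishes first.
    """
    combined = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for rows in pool.map(read_csv_rows, paths):
            combined.extend(rows)
    return combined
```

Keeping the combine step sequential preserves file order, which makes the stacked result reproducible across runs.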

Using ggplot2 for functional time series

I spoke yesterday about using ggplot2 for functional data graphics, rather than the custom-built plotting functionality available in the many functional data packages, including my own rainbow package written with Hanlin Shang. It is a much more powerful and flexible way to work, so I thought it would be useful to share some examples. French …

Network Centrality in R: New ways of measuring Centrality

This is the third post of a series on the concept of “network centrality” with applications in R and the package netrankr. The last part introduced the concept of neighborhood-inclusion and its implications for centrality. In this post, we extend the concept to a broader class of dominance relations by deconstructing indices into a series of building blocks and …
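Neighborhood-inclusion, as I read it in this line of work, says that node u is dominated by node v when the open neighborhood of u is contained in the closed neighborhood of v: N(u) ⊆ N[v] = N(v) ∪ {v}. The series works in R with netrankr; the plain-Python sketch below, with hypothetical helper names and a made-up graph, only checks that definition. Degree, for instance, never ranks a dominated u above v: every neighbor of u other than v is also a neighbor of v, and if u and v are adjacent, u itself is an extra neighbor of v.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def dominated_by(adj, u, v):
    """True if u is dominated by v under neighborhood-inclusion: N(u) is a subset of N[v]."""
    return adj[u] <= (adj[v] | {v})


def dominance_pairs(adj):
    """All ordered pairs (u, v), u != v, where u is dominated by v."""
    return {(u, v) for u in adj for v in adj if u != v and dominated_by(adj, u, v)}


# Small made-up graph: a hub (1) with two leaves (2, 3) and a path 1-4-5.
adj = build_adjacency([(1, 2), (1, 3), (1, 4), (4, 5)])
pairs = dominance_pairs(adj)
```

In this graph, nodes 1 and 4 are incomparable: neither neighborhood fits inside the other's closed neighborhood, which is why neighborhood-inclusion yields only a partial ranking.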

Geocomputation with R – the afterword

I am extremely proud to announce that Geocomputation with R is complete. It took Robin, Jannes, and me almost 2 years of collaborative planning, writing, refinement, and deployment to make the book available for anyone interested in open source, command-line approaches for handling geographic data. We’re very happy that it’s now ready to present to the world …

Sharing Modeling Pipelines in R

Reusable modeling pipelines are a practical idea that gets re-developed many times in many contexts. wrapr supplies a particularly powerful pipeline notation, and a pipe-stage re-use system (notes here). We will demonstrate this with the vtreat data preparation system. Our example task is to fit a model on some arbitrary data. Our model will try …

Le Monde puzzle [#1075]

A new Le Monde mathematical puzzle in the digit category: Find the largest number such that each of its internal digits is strictly less than the average of its two neighbours. Same question when all digits differ. For instance, n=96433469 is such a number. When trying pure brute force (with the usual integer2digits function!) le=solz=3 …
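The condition 2*d[i] < d[i-1] + d[i+1] rearranges to d[i+1] - d[i] > d[i] - d[i-1], so it is equivalent to saying that consecutive digit differences strictly increase. That makes it possible to replace pure brute force with a depth-first enumeration that only extends valid prefixes. The post itself uses R and brute force; this Python sketch, with hypothetical function names, is a swapped-in technique, and the `distinct=True` flag handles the all-digits-differ variant.

```python
def is_convex_digits(n):
    """True if every internal digit of n is strictly less than the mean of its neighbours."""
    d = [int(c) for c in str(n)]
    return all(2 * d[i] < d[i - 1] + d[i + 1] for i in range(1, len(d) - 1))


def convex_digit_numbers(distinct=False):
    """Enumerate every number with 2+ digits satisfying the condition.

    Uses the equivalent rule that each new digit must make the latest
    difference strictly larger than the previous one, so only valid
    prefixes are ever extended.
    """
    found = []

    def extend(digits):
        if len(digits) >= 2:
            found.append(int("".join(map(str, digits))))
            lo = 2 * digits[-1] - digits[-2] + 1  # next - last > last - prev
        else:
            lo = 0
        for nxt in range(max(lo, 0), 10):
            if distinct and nxt in digits:
                continue
            extend(digits + [nxt])

    for first in range(1, 10):  # no leading zero
        extend([first])
    return found


largest = max(convex_digit_numbers())
largest_distinct = max(convex_digit_numbers(distinct=True))
```

Because the differences must keep growing, the recursion stops after a handful of digits, so the whole search space is enumerated almost instantly instead of scanning every integer.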

How to give money to the R project

by Mark Niemann-Ross, an author, educator, and writer who teaches about R and Raspberry Pi at LinkedIn Learning. I spend a LOT of time at, in particular the sections for documentation and CRAN. But I hadn’t spent much time in the other areas: R Project, R Foundation, and links. When I recently wandered into the foundation area, …

Parsing XML, Named Entity Recognition in One-Shot

Conditional Random Fields, Sequence Prediction, Sequence Labelling. Parsing XML is a process designed to read XML and give programs a way to use it. An XML parser is the piece of software that reads XML files and makes the information from those files available to applications. While reading an …
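To make the parser's role concrete, here is a small sketch using Python's standard-library `xml.etree.ElementTree`: it parses an XML string and pulls out both the running sentence text and any tagged entities. The `<doc>`/`<sentence>`/`<entity type=...>` schema and the sample sentence are hypothetical; the excerpt does not show the format the post actually parses.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated input; the post's real schema is not shown in the excerpt.
SAMPLE = (
    "<doc>"
    '<sentence id="1">Alice moved to <entity type="LOC">Paris</entity>'
    ' to work for <entity type="ORG">Acme Corp</entity>.</sentence>'
    "</doc>"
)


def extract_entities(xml_text):
    """Return (text, type) for every <entity> element, in document order."""
    root = ET.fromstring(xml_text)
    return [(el.text, el.get("type")) for el in root.iter("entity")]


def sentence_texts(xml_text):
    """Return each sentence's plain text, with the entity markup stripped."""
    root = ET.fromstring(xml_text)
    return ["".join(s.itertext()) for s in root.iter("sentence")]
```

Pairing the plain sentence text with the extracted (text, type) spans is the kind of input a sequence-labelling model such as a CRF is typically trained on.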