htmlunitjars Updated to 2.34.0

The in-dev htmlunit package for javascript-“enabled” web-scraping without the need for Selenium, Splash or headless Chrome relies on the HtmlUnit library and said library just released version 2.34.0 with a wide array of changes that should make it possible to scrape more gnarly javascript-“enabled” sites. The Chrome emulation is now also on-par with Chrome 72 … Read more

Categories R Tags ExcerptFavorite

The Ultimate Guide to Data Cleaning

For these reasons, it was important to have a step-by-step guideline, a cheat sheet, that walks through the quality checks to be applied. But first, what’s the thing we are trying to achieve? What does quality data mean? What are the measures of quality data? Understanding what you are trying to accomplish, your ultimate … Read more

Climate Heatmaps Made Easy

Investigating Paleoclimate Data with Pandas and Seaborn Some time ago Dr. Ed Hawkins, who happens to be the creator of the Climate Spirals, released to the world the Warming Stripes graph for Annual Global Temperature ranging from 1850–2017. The concept is simple but also very informative: each stripe represents the temperature for a single year and … Read more

AI For Everyone: What Andrew Ng wants to convey with this non-technical course, in 30 points

Thank you Andrew Ng! Overall I liked the course, though I wish there had been more for Human Resources professionals who should understand tools like TensorFlow, Keras etc. But once again, it was good to see Andrew Ng back in action. Just a last joke to finish it off! Why are there so many shocking … Read more

Machine Learning for Marketers

Introduction: The marketing function is evolving rapidly with advancements in eCommerce, digital and mobile, and with changing consumer demographics. A recent Forrester study[1] indicated that e-commerce will account for 17.0% of retail sales by 2022, up from a projected 12.9% in 2017. This trend indicates that more and more people are moving online for their purchases … Read more

Markov Chain Monte Carlo

Lifting your understanding of MCMC to an intermediate level When I learned Markov Chain Monte Carlo (MCMC) my instructor told us there were three approaches to explaining MCMC. “Basic: MCMC allows us to leverage computers to do Bayesian statistics. Intermediate: MCMC is a method that can find the posterior distribution of our parameter of interest. … Read more
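
To make the "intermediate" description concrete, here is a minimal sketch of one MCMC algorithm (random-walk Metropolis) approximating a posterior distribution. The coin-flip data (7 heads in 10 tosses), the uniform prior, the step size and the function names are all assumptions made for illustration; they do not come from the original post.

```python
import math
import random


def log_posterior(theta, heads, tosses):
    """Unnormalised log posterior of a coin's bias under a uniform prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero posterior mass outside (0, 1)
    return heads * math.log(theta) + (tosses - heads) * math.log(1.0 - theta)


def metropolis(heads, tosses, n_samples=20000, step=0.2, seed=0):
    """Random-walk Metropolis sampler; returns the full chain of theta values."""
    rng = random.Random(seed)
    theta = 0.5  # arbitrary starting point inside (0, 1)
    current = log_posterior(theta, heads, tosses)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)  # symmetric proposal
        candidate = log_posterior(proposal, heads, tosses)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


# Hypothetical data: 7 heads in 10 tosses.
samples = metropolis(heads=7, tosses=10)
kept = samples[2000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(f"posterior mean of theta is roughly {posterior_mean:.3f}")
```

For this toy problem the exact posterior is Beta(8, 4), with mean 8/12, so the sampled mean can be checked against a known answer.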

The Python Dreamteam

As a Data Scientist, I code almost entirely in Python. I also get easily scared by configuring stuff. I don’t really know what a PATH is. I have no clue what lies within the /bin directory on my laptop. These are all things that you seemingly have to get familiar with to not have Python … Read more

EARL London early bird tickets now on sale

Early bird tickets for the Enterprise Applications of the R Language Conference are now on sale! The EARL Conference is in its sixth year; it’s a cross-sector conference that focuses on the commercial use of the R programming language. Take a look at our highlights from last year. We are busy putting together … Read more

Boosting: Is It Always The Best Option?

Gradient boosting has become quite a popular technique in the area of machine learning. Given its reputation for achieving potentially higher accuracy than other models, it has become particularly popular as a “go-to” model for Kaggle competitions. However, use of gradient boosting raises two questions: Does this technique really outperform others consistently irrespective of the … Read more

drat All The ?! : Enabling Easier Package Discovery and Installation with Your Own CRAN-like Repo for Your Packages

I’ve got a work-in-progress drat-ified CRAN-like repo for (eventually) all my packages over at CINC (“CINC is not CRAN” and it also sounds like “sync”). This is in parallel with a co-location/migration of all my packages to SourceHut (just waiting for the alpha API to be baked) and a self-hosted public Gitea instance. Everything … Read more

Classification Algorithm Using Probability Patterns

A classification algorithm that uses probability kernels (patterns), created by applying a binary matrix approach to the structure of image filtering kernels. Hello to everyone interested in artificial intelligence and the human brain. According to my delusions, I modeled the neurons in the human brain. Based on that model, I created a classification algorithm. The algorithm works pretty well … Read more

One neural network, many uses

Exploring Representations By Building a Four-in-One Network. To fully understand what representations are, let’s build our own deep neural network that does four things: image caption generator (given an image, generate a caption for it); similar words generator (given a word, find other words similar to it); visually similar image search (given an image, find … Read more

How to make your model awesome with Optuna

Example walk-through. Data: I used the 20 newsgroups dataset from Scikit-Learn to prepare the experiment. Model: it’s a Natural Language Processing problem, and the model’s pipeline contains a feature extraction step and a classifier. Optimization … Read more

RStudio Instructor Training

We are pleased to announce the launch of RStudio’s instructor training and certification program. Its goal is to help people apply modern evidence-based teaching practices to teach data science using R and RStudio’s products, and to help people who need such training find the trainers they need. Like the training programs for flight instructors, the … Read more

A wee look at group_map and group_split in dplyr

dplyr 0.8.0 launched recently, which you probably already know, but just in case you missed it: two new functions have been catching my eye, group_map and group_split. The aim of this post is to take a first look at these and try to get a new blog post up on GitHub before February is out. … Read more

CDSBMexico: remember to apply for BioC2019 travel scholarships

This blog post was first published at the CDSBMexico website. #CDSBMexico: remember to apply for BioC2019 travel scholarships!! Due date is March 15th. Let us help you! Here we give you some ideas. We can also give you feedback via Slack ✅ #rstats #bioconductor @Bioconductor #bioc2019 #diversity #LatAm #rstatsES (ComunidadBioInfo, @CDSBMexico, March 1, 2019) … Read more

A Case for Nuclear: Bridging the Route to Renewables with Low-Carbon Energy

Evidence-Based Policy is Bigger than You or Your Feelings — Part III They say that cracking prejudice is harder than cracking atoms. This aphorism is even more true if said prejudice is about cracking atoms. Globally, a mere 38% of people approve of nuclear energy, even lower than the 48% which for some reason or another support coal … Read more

Machine Learning for Particle Data When You are Not a Physicist

How an H2O deep learning model can be used to do supervised classification with Python. This article introduces Deep Learning with H2O, the open source machine learning package, and shows how an H2O Deep Learning model can be used to solve a supervised classification problem, that is, to use the ATLAS experiment to identify the Higgs … Read more

Backpropagation Step by Step

If you are building your own neural network, you will definitely need to understand how to train it. Backpropagation is a commonly used technique for training neural networks. There are many resources explaining the technique, but this post will explain backpropagation with a concrete example in very detailed, colorful steps. Overview: In this post, we … Read more

Deep into End-to-end Neural Coreference Model

Credit to Victor Vasarely. In the previous article about the end-to-end neural coreference model, we saw the results and its application to chatbots. Would you like to dig deeper into how the model works? This article will satisfy your curiosity. This article contains formulas for more details, but I have tried to make the description … Read more

How to use Google Speech to Text API to transcribe long audio files?

Credit: Pixabay. Speech recognition is a fun task. Many API resources are available on the market today, which makes it easy for users to opt for one or another. However, when it comes to audio files, especially call center data, the task becomes a little challenging. Let’s make an assumption that a call center conversation … Read more

DeepMind Combines Logic and Neural Networks to Extract Rules from Noisy Data

In his book “The Master Algorithm”, artificial intelligence researcher Pedro Domingos explores the idea of a single algorithm that can combine the major schools of machine learning. The idea is, without a doubt, extremely ambitious but we are already seeing some iterations of it. Last year, Google published a research paper under the catchy title … Read more

Brains 1:0 AI

Why biological brains are still miles ahead of any AI ever built… or to be built in the next many years. Brains are still way ahead of any AI available today. When Albert Einstein said “Look deep into nature, and then you will understand everything better”, he did not have Artificial Intelligence (AI) in mind. He … Read more

Word Embeddings : Intuition and (some) maths to understand end-to-end Skip-gram model

The Skip-gram model is one of the most popular word embedding models; it aims to encode words given their context. “You shall know a word by the company it keeps” (Firth, J. R. 1957). This quotation from Firth, a 20th-century linguist, perfectly illustrates our concerns. By “the company it keeps” or context, we … Read more

AutoML for predictive modeling

Automating machine learning is a topic of growing importance, as the first results are already being used in practice and bring significant cost reductions. My talk at the ML Prague conference maps state-of-the-art techniques and open source AutoML frameworks, mostly in the field of predictive modeling. I have also presented our research that is being partially … Read more

KDA–Robustness Results

This post will display some robustness results for KDA asset allocation. Ultimately, the two canary instruments fare much better using the original filter weights in Defensive Asset Allocation than in other variants of the weights for the filter. While this isn’t as worrying (the filter most likely was created that way and paired with those … Read more

Some Popular Metrics in Machine Learning

In this article I provide a brief overview of several metrics used to evaluate the performance of models that simulate some behavior. These metrics compare the simulated output to some ground truth. Distribution Comparisons Jensen-Shannon Divergence Jensen-Shannon Divergence (JSD) measures the similarity between two distributions (i.e. the ground truth and the simulation). Another way to … Read more
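
As a rough illustration of the Jensen-Shannon Divergence mentioned above, the sketch below computes it for two discrete distributions using the standard definition (the average of two KL divergences against the midpoint distribution, in bits). The function names and the example probabilities are hypothetical and not taken from the article.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i == 0 add 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), where m = (p + q) / 2."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of outcomes")
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical example: a ground-truth distribution and a simulated one.
ground_truth = [0.5, 0.3, 0.2]
simulation = [0.4, 0.4, 0.2]
jsd = jensen_shannon_divergence(ground_truth, simulation)
print(f"JSD = {jsd:.4f} bits")
```

With base-2 logarithms the value lies between 0 (identical distributions) and 1 (distributions with no overlap), and unlike the KL divergence it is symmetric in its arguments.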

Robust Regressions: Dealing with Outliers in R

It is often the case that a dataset contains significant outliers – or observations that are significantly out of range from the majority of other observations in our dataset. Let us see how we can use robust regressions to deal with this issue. I … Read more

Moral Dilemmas of Self-Driving Cars

How Should Autonomous Machines Decide Who Not To Kill? I would love to have my own self-driving car. I mean, who wouldn’t? But they’re not perfect. If you think about it, self-driving cars have to make decisions like you and me. They don’t eliminate the possibility of collisions (yet)… just decrease the chances of it happening … Read more

handlr: convert among citation formats

Citations are a crucial piece of scholarly work. They hold metadata on each scholarly work, including what people were involved, what year the work was published, where it was published, and more. The links between citations facilitate insight into many questions about scholarly work. Citations come in many different formats including BibTeX, RIS, JATS, and … Read more

“If You Were an R Function, What Function Would You Be?”

We’ve been getting some good uptake on our piping in R article announcement. The article is necessarily a bit technical. But one of its key points comes from the observation that piping into names is a special opportunity to give general objects the following personality quiz: “If you were an R function, what function would … Read more

Applications of Graph Neural Networks

Graphs and their study have long received a lot of attention because they can represent the real world in a way that can be analysed objectively. Indeed, graphs can be used to represent many useful real-world datasets such as social networks, web link data, molecular structures, geographical maps, etc. … Read more

R-Trainings in Hamburg – Register now!

With more than 1,500 satisfied participants, eoda’s R trainings are the leading courses for the programming language in the German-speaking region. In May 2019, we are bringing our popular courses “Introduction to R” and “Introduction to Machine Learning with R” to Hamburg again. What can you look forward to? Our program at a glance: May 14th – 15th | Introduction to … Read more

AirBnB listings in Seattle: A deeper look

Question 3: Variations amongst neighborhoods in Seattle 3.a Where are most listings concentrated There are two attributes provided in the dataset that indicate the location of the listing. One is neighborhood and the other is neighborhood group. The latter splits the city into 17 areas while the former splits the city into 87 areas. Here … Read more

What is a Good Metric?

Photo by rawpixel on Unsplash. In a small business you don’t always know which metrics are key, you might frequently change the activity you analyse, or you don’t consider your company to have any relevant data, so you do not collect it. What makes a good metric? This is a great question for any company that wishes … Read more

Regression: Kernel and Nearest Neighbor Approach

Nadaraya-Watson Kernel-Weighted Average Regression. In the above method, one of the major drawbacks was the equal assignment of weights. This method assigns a weight to each point in a window around the query point, based on a specific kernel. The main intuition is that weights should decrease as distance increases, and more weight should be assigned for … Read more
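
The kernel-weighted average described above can be sketched in a few lines. This version assumes a Gaussian kernel and a hand-picked bandwidth, neither of which is specified in the excerpt; the function names and sample data are likewise hypothetical.

```python
import math


def gaussian_kernel(u):
    """Unnormalised Gaussian kernel: largest at u == 0, decaying with |u|."""
    return math.exp(-0.5 * u * u)


def nadaraya_watson(x_query, xs, ys, bandwidth=1.0):
    """Predict y at x_query as a kernel-weighted average of the observed ys."""
    # Each training point is weighted by its scaled distance to the query.
    weights = [gaussian_kernel((x_query - x) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; try a larger bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


# Hypothetical one-dimensional training data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]
prediction = nadaraya_watson(3.0, xs, ys, bandwidth=0.8)
print(f"prediction at x = 3.0: {prediction:.3f}")
```

A smaller bandwidth makes the estimate follow nearby points more closely, while a larger one smooths towards the overall mean of the observations.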

Neural-Symbolic VQN: Disentangled Reasoning, or The Answer: Disentanglement

An explanation of an interpretable deep learning system. Nearly every technological step forward starts with an example from science fiction. So before I explain what these scientists built, I want you to watch a part of an episode of the classic TV series Star Trek: The Next Generation, “Identity Crisis”. Play the embedded video from … Read more

Customers who bought…

One of the classic examples in data science (called data mining at the time) is the beer and diapers example: when a big supermarket chain started analyzing their sales data they encountered not only trivial patterns, like toothbrushes and toothpaste being bought together, but also quite strange combinations like beer and diapers. Now, the trivial … Read more
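
The kind of pattern described above can be made concrete by counting how often items occur together. The sketch below computes two common association measures, support and confidence, on a made-up set of shopping baskets; the data and function names are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets):
    """Fraction of baskets that contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair one canonical (alphabetical) key.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: count / len(baskets) for pair, count in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


# Hypothetical shopping baskets.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"toothbrush", "toothpaste"},
    {"diapers", "milk"},
    {"beer", "chips"},
]

support = pair_support(baskets)
print(support[("beer", "diapers")])
print(confidence(baskets, "diapers", "beer"))
```

Support tells how common a combination is overall, while confidence tells how often the second item shows up given the first; real market basket analysis usually filters candidate rules on both.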

Using RStudio Jobs for training many models in parallel

Recently, RStudio added the Jobs feature, which allows you to run R scripts in the background. Computations are done in a separate R session that is not interactive, but just runs the script. In the meantime your regular R session stays live so you can do other work while waiting for the Job to complete. … Read more

Data Analysis of 10,000 AI Startups

Extracting insights from AngelList companies Introduction AngelList is a place that connects startups to investors and job candidates looking to work at startups. Their goal is to democratize the investment process, helping startups with both fundraising and talent. Be it to find a job, investors for a startup, or even if just to make connections, … Read more

Making thematic maps for Belgium
