For these reasons, it was important to have a step-by-step guideline, a cheat sheet, that walks through the quality checks to be applied. But first, what are we trying to achieve? What does quality data mean? What are the measures of quality data? Understanding what you are trying to accomplish, your ultimate … Read more: The Ultimate Guide to Data Cleaning
Investigating Paleoclimate Data with Pandas and Seaborn. Some time ago Dr. Ed Hawkins, who happens to be the creator of the Climate Spirals, released to the world the Warming Stripes graph for Annual Global Temperature ranging from 1850–2017. The concept is simple but also very informative: each stripe represents the temperature for a single year and … Read more: Climate Heatmaps Made Easy
Thank you Andrew Ng! Overall I liked the course, though I wish there had been more for Human Resources professionals who should understand tools like TensorFlow, Keras, etc. But once again, it was good to see Andrew Ng back in action. Just a last joke to finish it off! Why are there so many shocking … Read more: AI For Everyone: What Andrew Ng want to convey with this Non Technical Course in 30 points.
Introduction: The marketing function is evolving rapidly with advancements in eCommerce, digital, and mobile, and with changing consumer demographics. A recent Forrester study indicated that e-commerce will account for 17.0% of retail sales by 2022, up from a projected 12.9% in 2017. This trend indicates that more and more people are moving online for their purchases … Read more: Machine Learning for Marketers
Lifting your understanding of MCMC to an intermediate level. When I learned Markov Chain Monte Carlo (MCMC), my instructor told us there were three approaches to explaining MCMC. “Basic: MCMC allows us to leverage computers to do Bayesian statistics. Intermediate: MCMC is a method that can find the posterior distribution of our parameter of interest. … Read more: Markov Chain Monte Carlo
As a Data Scientist, I code almost entirely in Python. I also get easily scared by configuring stuff. I don’t really know what a PATH is. I have no clue what lies within the /bin directory on my laptop. These are all things that you seemingly have to get familiar with to not have Python … Read more: The Python Dreamteam
A pixel art gif I generated with the model. Not too shabby. If there is one thing that people love to talk about in video games, it is the fidelity of their graphics. A measure of gaming progress throughout the years has been the ability of modern-day hardware to bring about the next wave of … Read more: Generative Adversarial Networks: Revitalizing old video game textures
Early bird tickets for the Enterprise Applications of the R Language Conference are now on sale! The EARL Conference is in its sixth year; it’s a cross-sector conference that focuses on the commercial use of the R programming language. Take a look at our highlights from last year. We are busy putting together … Read more: EARL London early bird tickets now on sale
Gradient boosting has become quite a popular technique in the area of machine learning. Given its reputation for achieving potentially higher accuracy than other models, it has become particularly popular as a “go-to” model for Kaggle competitions. However, use of gradient boosting raises two questions: Does this technique really outperform others consistently irrespective of the … Read more: Boosting: Is It Always The Best Option?
I’ve got a work-in-progress drat-ified CRAN-like repo for (eventually) all my packages over at CINC (“CINC is not CRAN” and it also sounds like “sync”). This is in parallel with a co-location/migration of all my packages to SourceHut (just waiting for the sr.ht alpha API to be baked) and a self-hosted public Gitea instance. Everything … Read more: drat All The ?! : Enabling Easier Package Discovery and Installation with Your Own CRAN-like Repo for Your Packages
This classification algorithm uses probability kernels (patterns), which are created by applying a binary matrix approach to the structure of image-filtering kernels. Hello to everyone interested in artificial intelligence and the human brain. According to my delusions, I modeled the neurons in the human brain. Based on that model, I created a classification algorithm. The algorithm works pretty well … Read more: Classification Algorithm Using Probability Patterns
Exploring Representations By Building a Four-in-One Network. To fully understand what representations are, let’s build our own deep neural network that does four things: image caption generator (given an image, generate a caption for it); similar words generator (given a word, find other words similar to it); visually similar image search (given an image, find … Read more: One neural network, many uses
Example walk-through. (Image: Jason and the Argonauts; source.) Data: I used the 20 newsgroups dataset from Scikit-Learn to prepare the experiment. You can find the data import below. Model: It’s a Natural Language Processing problem, and the model’s pipeline contains a feature extraction step and a classifier. The code for the pipeline looks as follows. Optimization … Read more: How to make your model awesome with Optuna
The purpose of this post is to provide a complete and simplified explanation of Principal Component Analysis, and especially to answer how it works step by step, so that everyone can understand it and make use of it, without necessarily having a strong mathematical background. PCA is actually a widely covered method on the web, … Read more: A step by step explanation of Principal Component Analysis
Let’s say I’m locked in a room and given a large batch of Chinese writing. I don’t know any Chinese, neither written nor spoken. I can’t even differentiate the writing from other similar scripts, such as Japanese. Now, I receive a second batch of writing, but this time with a set of instructions in English … Read more: Artificial Intelligence can never be truly intelligent
We are pleased to announce the launch of RStudio’s instructor training and certification program. Its goal is to help people apply modern evidence-based teaching practices to teach data science using R and RStudio’s products, and to help people who need such training find the trainers they need. Like the training programs for flight instructors, the … Read more: RStudio Instructor Training
dplyr 0.8.0 launched recently, which you probably already know, but just in case you missed it: two new functions have been catching my eye, group_map and group_split. The aim of this post is to take a first look at these and try to get a new blog post up on GitHub before February is out. … Read more: A wee look at group_map and group_split in dplyr
This blog post was first published at the CDSBMexico website. #CDSBMexico: remember to apply for BioC2019 travel scholarships!! Due date is March 15th: https://t.co/iegG0qQzwu Let us help you! Here we give you some ideas. We can also give you feedback via Slack ✅ #rstats #bioconductor @Bioconductor #bioc2019 #diversity #LatAm #rstatsES pic.twitter.com/EORg8d2Qxj (ComunidadBioInfo, @CDSBMexico, March 1, 2019) … Read more: CDSBMexico: remember to apply for BioC2019 travel scholarships
Many people begin their machine learning journey using Python and scikit-learn. If you want to work with big data you have to use Apache Spark. It is possible to work with Spark in Python using PySpark. However, since Spark is written in Scala, you will see much better performance by using Scala. There are numerous … Read more: Training Your First Classifier with Spark and Scala
Evidence-Based Policy is Bigger than You or Your Feelings, Part III. They say that cracking prejudice is harder than cracking atoms. This aphorism is even more true if said prejudice is about cracking atoms. Globally, a mere 38% of people approve of nuclear energy, even lower than the 48% who for some reason or another support coal … Read more: A Case for Nuclear: Bridging the Route to Renewables with Low-Carbon Energy
How an H2O deep learning model can be used to do supervised classification with Python. This article introduces Deep Learning with H2O, the open source machine learning package by H2O.ai, and shows how an H2O Deep Learning model can be used to solve a supervised classification problem, namely, using the ATLAS experiment to identify the Higgs … Read more: Machine Learning for Particle Data When You are Not a Physicist
When you want to improve value for your customers, it pays to plan clear goals and a data-driven approach to achieving them. The problem is that it’s a tricky process with hidden traps that can make you feel good even if you’re running your business off the road. In this article I’m sharing the seven … Read more: Seven-steps to set goals and pick metrics for customers
If you are building your own neural network, you will definitely need to understand how to train it. Backpropagation is a commonly used technique for training neural networks. There are many resources explaining the technique, but this post will explain backpropagation with a concrete example in very detailed, colorful steps. Overview: In this post, we … Read more: Backpropagation Step by Step
Credit to Victor Vasarely. In the previous article about the end-to-end neural coreference model, we saw the results and its application in a chatbot. Would you like to dig deeper into how the model works? This article will satisfy your curiosity. This article contains formulas for more details, but I have tried to make the description … Read more: Deep into End-to-end Neural Coreference Model
Credit: Pixabay. Speech recognition is a fun task. Many API resources are available on the market today, which makes it easy for users to opt for one or another. However, when it comes to audio files, especially call center data, the task becomes a little challenging. Let’s assume that a call center conversation … Read more: How to use Google Speech to Text API to transcribe long audio files?
In his book “The Master Algorithm”, artificial intelligence researcher Pedro Domingos explores the idea of a single algorithm that can combine the major schools of machine learning. The idea is, without a doubt, extremely ambitious but we are already seeing some iterations of it. Last year, Google published a research paper under the catchy title … Read more: DeepMind Combines Logic and Neural Networks to Extract Rules from Noisy Data
Why biological brains are still miles ahead of any AI ever built… or to be built in the next many years. Brains are still way ahead of any AI available today. When Albert Einstein said “Look deep into nature, and then you will understand everything better”, he did not have Artificial Intelligence (AI) in mind. He … Read more: Brains 1:0 AI
The Skip-gram model is one of the most popular word embedding models, and it aims to encode words given their context. “You shall know a word by the company it keeps” (Firth, J. R. 1957). This quotation from Firth, a linguist of the 20th century, perfectly illustrates our concerns. By “the company it keeps” or context, we … Read more: Word Embeddings : Intuition and (some) maths to understand end-to-end Skip-gram model
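To make the idea of a word's "company" in the Skip-gram excerpt above concrete, here is a minimal sketch of how (center word, context word) training pairs could be collected from a sentence. The function name `skipgram_pairs` and the `window` parameter are hypothetical names chosen for illustration, not taken from the article, and the sketch covers only pair extraction, not the embedding training itself.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token in `tokens`.

    Hypothetical helper for illustration: the context of a word is taken
    to be the words at most `window` positions to its left or right.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)                # left edge of the window
        hi = min(len(tokens), i + window + 1)  # right edge (exclusive)
        for j in range(lo, hi):
            if j != i:                         # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


sentence = "you shall know a word by the company it keeps".split()
for center, context in skipgram_pairs(sentence, window=1)[:4]:
    print(center, "->", context)
```

A larger `window` yields more pairs per word and a broader notion of context; a Skip-gram model would then be trained to predict the context word from the center word in each pair.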
Automating machine learning is a topic of growing importance, as the first results are being used in practice and are bringing significant cost reductions. My talk at the ML Prague conference maps state-of-the-art techniques and open source AutoML frameworks, mostly in the field of predictive modeling. I have also presented our research that is being partially … Read more: AutoML for predictive modeling
This post will display some robustness results for KDA asset allocation. Ultimately, the two canary instruments fare much better using the original filter weights in Defensive Asset Allocation than in other variants of the weights for the filter. While this isn’t as worrying (the filter most likely was created that way and paired with those … Read more: KDA–Robustness Results
In today’s post I want to help you incorporate your company’s branding into your ggplot graphs. Why should you care about this? I’m glad you asked! Have you ever seen a graph that looks like this? Of course you have! This is the default ggplot theme, and these graphs are everywhere. Now, look–I … Read more: You Need to Start Branding Your Graphs. Here’s How, with ggplot!
In this article I provide a brief overview of several metrics used to evaluate the performance of models that simulate some behavior. These metrics compare the simulated output to some ground truth. Distribution Comparisons: Jensen-Shannon Divergence. Jensen-Shannon Divergence (JSD) measures the similarity between two distributions (i.e. the ground truth and the simulation). Another way to … Read more: Some Popular Metrics in Machine Learning
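As a small illustration of the Jensen-Shannon Divergence mentioned in the excerpt above, the sketch below computes it for two discrete distributions given as lists of probabilities. It uses the standard definition (the average of the two Kullback-Leibler divergences to the midpoint distribution) with base-2 logarithms; the function names and the example distributions are assumptions made for illustration and do not come from the article.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical example: a ground-truth distribution and a simulated one.
ground_truth = [0.5, 0.3, 0.2]
simulation = [0.4, 0.4, 0.2]
print(jensen_shannon_divergence(ground_truth, simulation))
```

With base-2 logarithms the value lies between 0 (identical distributions) and 1 (distributions with no overlap), and unlike the plain KL divergence it is symmetric in its two arguments.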
It is often the case that a dataset contains significant outliers – or observations that are significantly out of range from the majority of other observations in our dataset. Let us see how we can use robust regressions to deal with this issue. I … Read more: Robust Regressions: Dealing with Outliers in R
How Should Autonomous Machines Decide Who Not To Kill? I would love to have my own self-driving car. I mean, who wouldn’t? But they’re not perfect. If you think about it, self-driving cars have to make decisions like you and me. They don’t eliminate the possibility of collisions (yet)… just decrease the chances of it happening … Read more: Moral Dilemmas of Self-Driving Cars
Citations are a crucial piece of scholarly work. They hold metadata on each scholarly work, including what people were involved, what year the work was published, where it was published, and more. The links between citations facilitate insight into many questions about scholarly work. Citations come in many different formats including BibTeX, RIS, JATS, and … Read more: handlr: convert among citation formats
We’ve been getting some good uptake on our piping in R article announcement. The article is necessarily a bit technical. But one of its key points comes from the observation that piping into names is a special opportunity to give general objects the following personality quiz: “If you were an R function, what function would … Read more: “If You Were an R Function, What Function Would You Be?”
Graphs and their study have received a lot of attention for ages due to their ability to represent the real world in a fashion that can be analysed objectively. Indeed, graphs can be used to represent a lot of useful, real-world datasets such as social networks, web link data, molecular structures, geographical maps, etc. … Read more: Applications of Graph Neural Networks
The 7th use of R in Official Statistics conference is the event for all things R in the production and use of government statistics. The 7th installment of this conference will take place from 20 to 21 May 2019 at the National Institute of Statistics in Bucharest, Romania. Keynote Speakers: We are very proud to … Read more: uRos2019: tutorials, keynote speakers, registration and call for papers!
With more than 1,500 satisfied participants, eoda’s R trainings are the leading courses for the programming language in the German-speaking region. In May 2019, we bring our popular courses “Introduction to R” and “Introduction to Machine Learning with R” to Hamburg again. What can you look forward to? Our program at a glance: May 14th – 15th: Introduction to … Read more: R-Trainings in Hamburg – Register now!
The Swing. In our rapidly evolving, digitally centric world, I often fall into the trap of thinking of design as the latest glossy app or shiny consumer electronics product. It’s easy to lose sight of the fact that design is as old as humankind; an ancient and innate part of who we are as a species. … Read more: Machine Thinking, Conveyance, and the Future of Design
Question 3: Variations amongst neighborhoods in Seattle. 3.a: Where are most listings concentrated? There are two attributes provided in the dataset that indicate the location of the listing. One is neighborhood and the other is neighborhood group. The latter splits the city into 17 areas while the former splits the city into 87 areas. Here … Read more: AirBnB listings in Seattle: A deeper look
Photo by rawpixel on Unsplash. In a small business you don’t always know which metrics are key; you might frequently change the activity you analyse, or you might not consider your company to have any relevant data, so you do not collect it. What makes a good metric? This is a great question for any company that wishes … Read more: What is a Good Metric?
Nadaraya-Watson Kernel-Weighted Average Regression. In the above method, one of the major drawbacks was the equal assignment of weights. This method assigns a weight to each point in a window around the query point, based on a specific kernel. The main intuition is that weights should decrease as distance increases, and more weight should be assigned to … Read more: Regression: Kernel and Nearest Neighbor Approach
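The kernel-weighted average described in the Nadaraya-Watson excerpt above can be sketched in a few lines. This is a minimal illustration under stated assumptions: it uses the Epanechnikov kernel as one possible choice of kernel (the excerpt does not say which one the article uses), one-dimensional inputs, and the hypothetical names `nadaraya_watson` and `bandwidth`.

```python
def epanechnikov(u):
    """Epanechnikov kernel: positive inside the window |u| <= 1, zero outside."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def nadaraya_watson(x_query, xs, ys, bandwidth=1.0):
    """Predict y at x_query as a kernel-weighted average of the observed ys.

    Points closer to x_query receive larger weights; points farther away
    than `bandwidth` fall outside the window and receive weight zero.
    """
    weights = [epanechnikov((x_query - x) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no training points inside the kernel window")
    return sum(w * y for w, y in zip(weights, ys)) / total


# Hypothetical toy data lying on the line y = x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 2.0, 3.0]
print(nadaraya_watson(1.0, xs, ys, bandwidth=1.5))
```

The `bandwidth` controls the width of the window: a small value follows the nearest points closely, while a large value averages over more neighbors and gives a smoother fit.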
An explanation of an interpretable deep learning system. Nearly every technological step forward starts with an example from science fiction. So before I explain what these scientists built, I want you to watch part of an episode of the classic TV series Star Trek: The Next Generation, The Identity Crisis. Play the embedded video from … Read more: Neural-Symbolic VQN - Disentagled Reasoning - Or - The answer: disentanglement
One of the classic examples in data science (called data mining at the time) is the beer and diapers example: when a big supermarket chain started analyzing their sales data they encountered not only trivial patterns, like toothbrushes and toothpaste being bought together, but also quite strange combinations like beer and diapers. Now, the trivial … Read more: Customers who bought…
Recently, RStudio added the Jobs feature, which allows you to run R scripts in the background. Computations are done in a separate R session that is not interactive, but just runs the script. In the meantime your regular R session stays live so you can do other work while waiting for the Job to complete. … Read more: Using Rstudio Jobs for training many models in parallel
Extracting insights from AngelList companies. Introduction: AngelList is a place that connects startups to investors and job candidates looking to work at startups. Their goal is to democratize the investment process, helping startups with both fundraising and talent. Be it to find a job, investors for a startup, or even if just to make connections, … Read more: Data Analysis of 10.000 AI Startups
Read more: Making thematic maps for Belgium
Why are video games so good for machine learning? One of the first reasons behind the popularity of video games among AI researchers is the tendency of video games to mimic real life in many ways. This idea is not very straightforward when it comes to older games, as they have arcade-style graphics and physics. … Read more: Video Games as a Perfect Playground for Artificial Intelligence