Forgotten features of R 4.0.0

R version 4.0.0 was released almost two years ago. The change in major version, from 3.x.y to 4.0.0, signalled significant and potentially breaking changes. For an organisation to start using these new features, everyone in the company must have access to that version; otherwise, code isn’t shareable. This naturally slows down adoption. We moved our …

Tackling the Take-Home Challenge

The markdown file is very helpful for getting a feel for the data we’ll be working with. It includes data definitions and a very open-ended instruction. Because there are essentially no constraints, we will use Python and Jupyter notebooks. I have a directory on my computer, coding_interviews, that contains every take-home …

Text Cleaning for NLP in Python

A Critical Step in Natural Language Processing Made Easy! (Photo by Dmitry Ratushny on Unsplash.) One of the most common tasks in Natural Language Processing (NLP) is cleaning text data. To maximize your results, it’s important to distill your text down to the most important root words in the corpus and clean out …
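The excerpt stops before showing any code. Below is a minimal plain-Python sketch of the kind of cleaning it describes: lowercasing, stripping punctuation and digits, and dropping stopwords. It assumes ASCII English text and uses a small hypothetical stopword list. Reducing words to their roots (stemming or lemmatization) is left out; libraries such as NLTK provide PorterStemmer and WordNetLemmatizer for that step.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stopword list for illustration only;
# real pipelines usually use a curated list (e.g. from NLTK or spaCy).
STOPWORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "or", "the", "to"}


def clean_text(text: str) -> List[str]:
    """Lowercase, replace non-letters with spaces, split on whitespace, drop stopwords."""
    text = text.lower()
    # Remove punctuation and digits (assumes ASCII letters only)
    text = re.sub(r"[^a-z\s]", " ", text)
    return [token for token in text.split() if token not in STOPWORDS]


print(clean_text("The cats ARE running, and 3 dogs sleep!"))
# ['cats', 'running', 'dogs', 'sleep']
```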

Image Compression with PCA

Utilizing Images to Beautifully Represent Principal Component Analysis (Photo by Erik Mclean on Unsplash.) Principal Component Analysis, or PCA, is a dimensionality reduction technique for datasets with many continuous (numeric) features or dimensions. It uses linear algebra (an eigendecomposition of the data’s covariance matrix) to find the directions, called principal components, along which the data varies most. After these components have been identified, you can …
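The idea can be seen on a toy 2-D dataset before moving to images. The sketch below is plain Python, not the article's code; image work would normally use NumPy or scikit-learn. It does four things:

1. Computes the sample covariance matrix.
2. Takes its eigenvector with the largest eigenvalue, which is the first principal component.
3. Stores each point as a single score along that direction.
4. Reconstructs the points from those scores.

Keeping fewer components than dimensions is where the compression comes from. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def first_principal_component(points: List[Point]):
    """Return (mean, unit direction of PC1, largest eigenvalue, smallest eigenvalue) for 2-D data."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix [[a, b], [b, c]] of the centered data
    a = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    c = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + radius, half_trace - radius
    # Eigenvector for the largest eigenvalue
    if b != 0:
        vx, vy = lam1 - c, b
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    return (mx, my), (vx / norm, vy / norm), lam1, lam2


def compress_and_reconstruct(points: List[Point], mean: Point, direction: Point):
    """Keep one score per point (its projection on PC1), then map the scores back to 2-D."""
    scores = [(x - mean[0]) * direction[0] + (y - mean[1]) * direction[1] for x, y in points]
    recon = [(mean[0] + s * direction[0], mean[1] + s * direction[1]) for s in scores]
    return scores, recon


pts = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
mean, direction, lam1, lam2 = first_principal_component(pts)
scores, recon = compress_and_reconstruct(pts, mean, direction)
print(direction, lam1, lam2)
```

For points lying exactly on a line, the second eigenvalue is zero and the one-number-per-point reconstruction is lossless. For noisy data, the reconstruction error grows with the discarded (smaller) eigenvalue.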

Inventory Management for Retail — Periodic Review Policy

1. Inventory Management for Retail. As the inventory manager of a mid-size retail chain, you are in charge of setting the replenishment quantities in the ERP. Because your warehouse operations manager is complaining about the order frequency, you start to challenge the replenishment rules implemented in the ERP, especially for the fast runners. Previously we …
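The excerpt does not spell out the replenishment rule. A common textbook formulation of a periodic review (R, S) policy is assumed here rather than taken from the article:

- Stock is reviewed every R days.
- At each review, enough is ordered to raise the inventory position to an order-up-to level S.
- S covers expected demand over the review period plus the lead time, plus safety stock sized for a target cycle service level.

The sketch assumes independent, normally distributed daily demand. All parameter and function names are hypothetical.

```python
import math
from statistics import NormalDist


def order_up_to_level(mean_daily: float, sd_daily: float, review_days: float,
                      lead_days: float, service_level: float) -> float:
    """Order-up-to level S covering demand over the protection interval R + L."""
    protection = review_days + lead_days
    z = NormalDist().inv_cdf(service_level)  # safety factor for the cycle service level
    safety_stock = z * sd_daily * math.sqrt(protection)
    return mean_daily * protection + safety_stock


def order_quantity(level: float, on_hand: float, on_order: float, backorders: float = 0.0) -> float:
    """Quantity to order at a review: raise the inventory position up to the level S."""
    inventory_position = on_hand + on_order - backorders
    return max(0.0, level - inventory_position)


# Hypothetical SKU: 100 units/day on average (sd 20), weekly review, 3-day lead time
S = order_up_to_level(100, 20, review_days=7, lead_days=3, service_level=0.95)
print(round(S, 1), round(order_quantity(S, on_hand=300, on_order=200), 1))
```

In this model, a longer review period means fewer orders, which speaks to the warehouse's complaint about order frequency. The cost is a higher order-up-to level and more safety stock.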

Rating Each Driver’s 2021 Season – 10 – 1

[This article was first published on Sport Data Science, and kindly contributed to R-bloggers]. Hello, welcome to the second part of this look at …

How to Develop an R Shiny Dashboard In 10 Minutes or Less

Developing an R Shiny dashboard from scratch can be a time-consuming process. Luckily for you, you don’t need to start from scratch. In 2021, we released four R Shiny dashboard templates that are open to the public. The best part is that you can use and modify them free of charge! Today we’ll show you …

What’s in a Lambda? — Part 2

Now that you’ve learned about lambda functions in Python, I’ll walk through a data processing example. (Photo by OpenIcons on Pixabay.) This is a follow-up to my earlier article, “What’s in a Lambda?” Be sure to check it out first; I decided to write this follow-up because of the original article’s popularity. In that …

5 Advanced Tips on Python Decorators

Do you want to write concise, readable, and efficient code? Well, Python decorators may help you on your journey. (Photo by Mauricio Muñoz on Unsplash.) In chapter 7 of Fluent Python, Luciano Ramalho discusses decorators and closures. They are not super common in basic data science (DS) work; however, as you start building production models, writing async …
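The excerpt does not list the five tips. The sketch below is a small illustration of how decorators and closures fit together; the decorator name timed and its timings attribute are hypothetical. It wraps a function, keeps per-call durations in a list captured by the closure, and uses functools.wraps so the wrapped function keeps its original name and docstring.

```python
import functools
import time


def timed(func):
    """Decorator recording each call's duration in a list held by the closure."""
    timings = []  # captured by `wrapper`; survives between calls

    @functools.wraps(func)  # copy __name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Record the duration even if func raises
            timings.append(time.perf_counter() - start)

    wrapper.timings = timings  # expose the closure's list for inspection
    return wrapper


@timed
def add(a, b):
    """Add two numbers."""
    return a + b


print(add(2, 3), add.__name__, len(add.timings))
```

Without functools.wraps, add.__name__ would report "wrapper". That matters for logging, debugging, and tools that introspect functions.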

Upgrading R

[This article was first published on R – datawookie, and kindly contributed to R-bloggers]. This is the recipe I use to upgrade R on a …

simstudy update: ordinal data generation that violates proportionality

Version 0.4.0 of simstudy is now available on CRAN and GitHub. This update includes two enhancements (and at least one major bug fix). genOrdCat now includes an argument to generate ordinal data without an assumption of cumulative proportional odds. In addition, two new functions, defRepeat and defRepeatAdd, make it a bit easier to define multiple variables …

DataCamp Competition – Was a website redesign successful?

[This article was first published on Blogs on Adejumo R.S, and kindly contributed to R-bloggers]. “🧑If first you don’t succeed, try two or more times …

A Comparative Review of the R-Instat GUI for R

[This article was first published on R | r4stats.com, and kindly contributed to R-bloggers]. By Robert A. Muenchen. Introduction: R-Instat is a free and open-source …

Building Confidence on Explainability Methods

Explainability must be an integral part of a data scientist’s modeling work. Suppose we were developing a credit scoring model (deciding whether or not to grant someone a loan); explainability could provide many insights, such as verifying that expected features (salary, debt ratio…) have a significant impact or, conversely, understanding why unexpected ones …

How I developed a fully functional Purchasing Application using Python

Enter a purchase order, send it to your supplier, and receive the product in your warehouse. (Photo by Christiann Koepke on Unsplash.) Python is one of the most popular languages for working with data, from data integration to analysis to prediction. Because it is open source, developers keep building new libraries that bring in new capabilities. …

Automatic differentiation in R with Stan Math

Automatic differentiation. Automatic differentiation (AD) refers to the automatic (algorithmic) calculation of derivatives of a function defined as a computer program, by repeated application of the chain rule. AD plays an important role in many statistical computing problems, such as gradient-based optimization of large-scale models, where computing gradients by numeric differentiation (i.e., finite differencing) is …
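To make the chain-rule idea concrete, here is a minimal forward-mode AD sketch in Python using dual numbers. This is a different flavour from Stan Math, which is a C++ library that primarily uses reverse-mode AD. The class and helper names are hypothetical. Each value carries its derivative along with it, and every operation applies the matching derivative rule. One pass therefore gives f(x) and f'(x) exact up to floating point, without the step-size trade-off of finite differencing.

```python
import math


class Dual:
    """Value plus derivative: represents val + dot * eps with eps**2 == 0."""

    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    @staticmethod
    def _lift(other):
        # Constants have derivative zero
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __sub__(self, other):
        other = self._lift(other)
        return Dual(self.val - other.val, self.dot - other.dot)

    def __rsub__(self, other):
        return self._lift(other) - self

    def __mul__(self, other):
        other = self._lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val, self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def sin(x: Dual) -> Dual:
    # Chain rule: (sin u)' = cos(u) * u'
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)


def exp(x: Dual) -> Dual:
    # Chain rule: (exp u)' = exp(u) * u'
    e = math.exp(x.val)
    return Dual(e, e * x.dot)


def derivative(f, x: float):
    """Return (f(x), f'(x)) from one forward pass, seeding dx/dx = 1."""
    out = f(Dual(x, 1.0))
    return out.val, out.dot


def f(x):
    return x * x * sin(x) + exp(x)  # f'(x) = 2x sin x + x^2 cos x + e^x


print(derivative(f, 1.0))
```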

Predicting When Kickers Get Iced with {tidymodels}

Normally, I would do some EDA to better understand the data set, but in the interest of word count I’ll jump right into using tidymodels to predict whether a given field goal attempt will be iced. To make the data work with the XGBoost algorithm, I’ll subset and convert some numeric …

December 2021: “Top 40” New CRAN Packages

[This article was first published on R Views, and kindly contributed to R-bloggers]. One hundred thirty-four new packages made it to CRAN last December. Here …

Introducing scale model in greybox

[This article was first published on R – Open Forecasting, and kindly contributed to R-bloggers]. At the end of June 2021, I released the greybox …

How to build a data lake from scratch — Part 2: Connecting the components

A complete tutorial on using popular technologies to build a data engineering sandbox. In this series of articles, I am guiding you through setting up your very own data lake infrastructure as a data engineering sandbox. In this part, I will show how to connect the various services of our data …

RNN: Recurrent Neural Networks — How to Successfully Model Sequential Data in Python

Setup. We’ll need the following data and libraries: Let’s import all the libraries: The above code prints the package versions used in this example: TensorFlow/Keras 2.7.0, pandas 1.3.4, numpy 1.21.4, sklearn 1.0.1, plotly 5.4.0. Next, we download and ingest Australian weather data (source: Kaggle). We also perform some simple data manipulation and derive a new variable (Median Temperature) for us …

Evaluating Football Dribbling Skill by Utilizing the Elo Algorithm

In my post “Measuring and Visualizing Football Players’ Skills”, I demonstrated a standardized, robust methodology for measuring players’ on-the-ball skills: the Lift Index. Using this approach, we can evaluate any on-the-ball action in football. All that is needed is a big enough sample of actions, their probabilities, and their actual outcomes. Nevertheless, do all action types …
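The excerpt does not show how the author adapts Elo. For reference, here is the standard Elo update. Each side's expected score is a logistic function of the rating difference (base 10, scale 400), and ratings move by K times the gap between the actual and expected result. Two things are assumptions for illustration only: treating a dribble attempt as a "match" between dribbler and defender, and the choice K = 32.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Elo expected score of A against B, between 0 and 1."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return new (rating_a, rating_b); score_a is 1 (A wins), 0.5 (draw) or 0 (A loses)."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Zero-sum: whatever A gains, B loses
    return rating_a + delta, rating_b - delta


# Hypothetical example: a 1500-rated dribbler beats a 1500-rated defender
print(elo_update(1500, 1500, 1.0))  # (1516.0, 1484.0)
```

Because the update is driven by the surprise (actual minus expected), beating a much stronger opponent moves ratings more than beating a weaker one.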

Understanding DBSCAN and Implementation with Python

DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise; it is an unsupervised learning algorithm. DBSCAN is one of the most widely used clustering methods because the clusters it finds can take any shape, which lets it handle some special cases that other methods cannot. One of the most commonly used examples to show …
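Here is a compact, unoptimised sketch of the algorithm in plain Python rather than the scikit-learn implementation such articles typically use; all names are illustrative.

- A point with at least min_pts neighbors within eps (counting itself) is a core point.
- Clusters grow outward from core points.
- Points reachable from a core point but not dense themselves become border points.
- Everything else is labelled noise.

Because clusters grow through chains of dense neighborhoods rather than around a centroid, they can take arbitrary shapes.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1


def region_query(points: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of point i, including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Return a cluster id (0, 1, ...) for each point, or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not visited yet
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep growing
        cluster += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Each region query scans every point, so this version is O(n²). Library implementations use spatial indexes to speed up the neighbor search.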

Plotting Bee Colony Observations and Distributions using {ggbeeswarm} and {geomtextpath}

Setup: loading the R libraries and data set.

```r
# Loading libraries
library(geomtextpath) # For adding text to ggplot2 curves
library(tidytuesdayR) # For loading data set
library(ggbeeswarm)   # For creating a beeswarm plot
library(tidyverse)    # For the ggplot2, dplyr libraries
library(gganimate)    # For plot animation
library(ggthemes)     # For more ggplot2 themes
library(viridis)      # For viridis color scales
```

…

Non-linear model of serial dilutions with Stan

[This article was first published on Posts | Joshua Cook, and kindly contributed to R-bloggers]. In chapter 17 “Parametric nonlinear models” of Bayesian Data Analysis …

Automation Python Scripts: Connecting to Oracle Database using cx_Oracle

I’ve been using the cx_Oracle Python extension module to access my Oracle Database, and it has allowed me to automate most of my small tasks. Oracle provides documentation online that can help you access an Oracle Database directly, but in this article I’ll be demonstrating how I use cx_Oracle and …

Training Neural Networks to Create Text Like a Human

Recurrent neural networks can generate text that is indistinguishable from human writing. Here is an example using Amazon product review data. (Photo by mauRÍCIO SANTOS on Unsplash.) Language modeling uses various techniques to determine the probability of a sequence of words in a sentence in a particular language. It very much draws upon the work …

Deploy Logistics Operational Dashboards using DataPane

You will not build a complete cloud architecture with ETL jobs and advanced visualization tools like Power BI, Tableau, or Google Data Studio. The idea is to extract data from the WMS, process it locally, and deploy reports that operational teams can use. 1. Deploy reporting capabilities with DataPane. This framework gives you the …

Predicting future recessions

[This article was first published on R-posts.com, and kindly contributed to R-bloggers]. Even if this sounds incredible, yes, we can predict future recessions using a …

Detecting multicollinearity — it’s not that easy sometimes

[This article was first published on R-posts.com, and kindly contributed to R-bloggers]. By Huey Fern Tay with Greg Page. When are two variables too related to one …

Using the Local Dialect to Teach R Programming

[This article was first published on R Consortium, and kindly contributed to R-bloggers]. Tell us about yourself. My name is Dattijo Murtala Makama, I am …

Are You a Pandas User Who Is Curious About R for Data Analysis? Here Is a Quick Start

We have just seen how to calculate the number of data points in each group, that is, how to count by group. There are many other aggregations we can use to compare groups, such as the mean, min, max, number of unique values, and so on. We can calculate the total sales quantity for each store as …

What’s in an F-String?

An overview of Python’s method for combining strings and variables, and why you should use it. (Photo by Mohammad Rahmani on Unsplash.) This is the fifth article in my series discussing unique features in Python; be sure to check out the first four on lambdas, list comprehensions, dictionaries, and tuples. If you’re a relatively new …
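For readers new to them, here are a few common f-string patterns: embedded expressions, format specifiers, and the self-documenting = specifier (Python 3.8 and later). The variable names are illustrative. The last lines show the older %-formatting and str.format equivalents for comparison.

```python
name = "Ada"
score = 0.91234
items = 3

greeting = f"Hello, {name}!"                      # 'Hello, Ada!'
percent = f"Score: {score:.2%}"                   # 'Score: 91.23%'
total = f"{items} items cost {items * 2.5:.2f}"   # '3 items cost 7.50'
debug = f"{items=}"                               # 'items=3' (Python 3.8+)

# Older equivalents of the greeting, for comparison
old_percent_style = "Hello, %s!" % name
old_format_style = "Hello, {}!".format(name)

print(greeting, percent, total, debug, sep="\n")
```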

Part of Speech Tagging

Leverage this essential building block of NLP to perform a Voice of Customer analysis. (Photo by Alexandre Pellaes on Unsplash.) Part of speech (POS) is a way to describe the grammatical function of a word. In Natural Language Processing (NLP), POS tagging is an essential building block for language models and for interpreting text. While POS tags …

An End-to-End Machine Learning Project — Heart Failure Prediction Part 2

Model deployment in a web application. Welcome to part two of a series on an end-to-end machine learning project! In the first article, we trained, validated, tuned, and saved a machine learning model that uses patient information to predict heart failure probability. In this article, we will develop a web application through which anyone can …

Exploratory Data Analysis with Python

A structured approach for exploring a new dataset. (Photo by Amanda Sandlin on Unsplash.) EDA, or Exploratory Data Analysis, is the process of examining and understanding the structure of a dataset. It’s a critical part of any machine learning project, and it is the tool in your toolbox that allows you to approach data you’ve …

Deploy Cloud Functions on GCP with Terraform

Cloud Functions are a scalable, “pay-as-you-go” Functions-as-a-Service (FaaS) offering from Google Cloud Platform (GCP) that runs your code with zero server management. Cloud Functions can be written in Node.js, Python, Go, Java, .NET, Ruby, or PHP. For the sake of readability, this tutorial uses a Cloud Function written in Python. But …

rOpenSci News Digest, January 2022

Dear rOpenSci friends, it’s time for our monthly news roundup! You can read this post on our blog. Now let’s dive into the activity at and around rOpenSci! rOpenSci HQ. Co-working events: Join us for social coworking & office hours monthly on 1st Tuesdays! Hosted by Steffi LaZerte and Nick Tierney. Everyone welcome. No RSVP …

Implementing the Expectation-Maximisation Algorithm from Scratch with Python

Demystifying the horrors of the EM algorithm by building one from scratch. (Figure: a mixture model, created using Tableau.) The Expectation-Maximisation (EM) algorithm is a statistical machine learning method for finding the maximum likelihood estimates of models that contain unobserved (latent) variables. I am pretty certain that that sentence will make no sense at all to …
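In the same from-scratch spirit, here is a minimal EM sketch for a two-component, one-dimensional Gaussian mixture in plain Python. It is not the article's code: the data are simulated, and the function names and initialisation are assumptions.

- The E-step computes each component's responsibility for each point, given the current parameters.
- The M-step re-estimates the weights, means, and standard deviations as responsibility-weighted averages.

Each iteration cannot decrease the log-likelihood, which the function records so the property can be checked.

```python
import math
import random
import statistics
from typing import List


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def em_gmm_1d(data: List[float], n_iter: int = 100):
    """Fit a 2-component 1-D Gaussian mixture; return weights, means, sds, log-likelihood history."""
    mu = [min(data), max(data)]          # crude initialisation at the extremes
    sd = [statistics.pstdev(data)] * 2
    w = [0.5, 0.5]
    history = []
    for _ in range(n_iter):
        # E-step: resp[i][k] = P(component k | x_i, current parameters)
        resp, loglik = [], 0.0
        for x in data:
            p = [w[k] * normal_pdf(x, mu[k], sd[k]) for k in range(2)]
            total = sum(p)
            loglik += math.log(total)
            resp.append([pk / total for pk in p])
        history.append(loglik)
        # M-step: responsibility-weighted updates of weights, means, and sds
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            sd[k] = math.sqrt(max(var, 1e-6))  # guard against a collapsing component
    return w, mu, sd, history


# Simulated data: two well-separated clusters (hypothetical example)
rng = random.Random(0)
data = [rng.gauss(0, 1) for _ in range(300)] + [rng.gauss(5, 1) for _ in range(300)]
weights, means, sds, history = em_gmm_1d(data)
print(weights, means, sds)
```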