3 (and Half) Powerful Tricks To Effectively Read CSV Data In Python

The usecols parameter of pandas.read_csv() is extremely useful for loading only specific columns from a CSV data set. Here is a direct comparison of the time taken by read_csv() with and without usecols. (Figure: pandas.read_csv() usecols | Image by Author) Importing a .csv file into a pandas DataFrame using usecols is ⚡️ 2.4X ⚡️ faster than importing … Read more
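As a minimal sketch of the idea in this excerpt, the snippet below reads the same CSV with and without usecols. The in-memory CSV text and its column names are made up for illustration, and no timing is measured here.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk.
csv_text = "id,name,price,stock\n1,apple,0.5,10\n2,pear,0.7,4\n"

# Default behaviour: every column is parsed.
full = pd.read_csv(io.StringIO(csv_text))

# With usecols, only the named columns are parsed and returned.
subset = pd.read_csv(io.StringIO(csv_text), usecols=["id", "price"])

print(full.shape)
print(list(subset.columns))
```

On a wide file, skipping unneeded columns is where any speed-up would come from; the size of the gain depends on the data.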

Learn Plotly for Advanced Python Visualization: A Use Case Approach

In order to add customizations such as cluster colors, bubble sizes, and hover-over tips, we need to first add three new columns to our data frame that assign these ‘customization parameters’ to each data point. The following code will add a new column called ‘color’ to the data frame. We first define a function called … Read more
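The step described above (define a function, then use it to add a 'color' column) could look roughly like the sketch below. The data frame, the function name assign_color, and the colour mapping are all hypothetical, since the excerpt does not show the original code.

```python
import pandas as pd

# Hypothetical data frame with a cluster label per data point.
df = pd.DataFrame({"cluster": [0, 1, 2, 1], "value": [10, 25, 7, 40]})

# Hypothetical mapping from cluster label to a colour name.
CLUSTER_COLORS = {0: "red", 1: "blue", 2: "green"}


def assign_color(cluster):
    """Return the colour for a cluster label, falling back to grey."""
    return CLUSTER_COLORS.get(cluster, "grey")


# Add the new 'color' column by applying the function row by row.
df["color"] = df["cluster"].apply(assign_color)
print(df)
```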

Need to Code a Difficult Pharma Stats Table? The R Tables for Regulatory Submissions (RTRS) Working Group Wants to Know

[This article was first published on R Consortium, and kindly contributed to R-bloggers]. The R Consortium’s R Tables for Regulatory Submissions (RTRS) Working Group has … Read more

Categories: R

Getting Started with Geospatial Analysis

Using geographic data and geospatial images to study the impact of climate change, natural disasters, or human activity. Geographic data includes geospatial data captured using satellite imagery and the Global Positioning System (GPS), as well as other geographic data generally described explicitly in terms of geographic coordinates. Geospatial analysis includes collecting, reporting, plotting, and analyzing this data using … Read more

Defining the Moving Average Model for Time Series Forecasting in Python

Explore the moving average model and discover how we can use the ACF plot to identify the right MA(q) model for our time series. (Photo by Pawel Czerwinski on Unsplash) One of the foundational models for time series forecasting is the moving average model, denoted as MA(q). This is one of the basic statistical models … Read more
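To illustrate why the ACF plot helps pick q, here is a small sketch that simulates an MA(2) process and computes its sample autocorrelations with NumPy. The coefficients 0.9 and 0.3, the series length, and the helper sample_acf are assumptions chosen for the example, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
theta = np.array([0.9, 0.3])   # assumed MA(2) coefficients
eps = rng.normal(size=n + 2)   # white-noise shocks

# MA(2): y_t = eps_t + theta_1 * eps_{t-1} + theta_2 * eps_{t-2}
y = eps[2:] + theta[0] * eps[1:-1] + theta[1] * eps[:-2]


def sample_acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)]
    )


acf = sample_acf(y, 5)
print(np.round(acf, 3))
```

For an MA(q) process the theoretical autocorrelation is zero beyond lag q, so in this sketch lags 1 and 2 should stand out while lags 3 and above stay close to zero.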

How RGB and Grayscale Images Are Represented in NumPy Arrays?

Let’s start with image basics. (Image by author, made with draw.io) Today, you’re going to learn some of the most important and fundamental topics in machine learning and deep learning. I guarantee that today’s content will deliver some of the foundational concepts that are key to start learning deep learning, a subset of machine … Read more
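A minimal sketch of the representation the title refers to: a grayscale image as a 2D array and an RGB image as a 3D array with a trailing channel axis. The tiny 4x6 size and the channel-average conversion are illustrative choices, not taken from the article.

```python
import numpy as np

# Grayscale: a 2D array of shape (height, width), one intensity per pixel.
gray = np.zeros((4, 6), dtype=np.uint8)
gray[1, 2] = 255  # a single white pixel

# RGB: a 3D array of shape (height, width, 3), one value per colour channel.
rgb = np.zeros((4, 6, 3), dtype=np.uint8)
rgb[1, 2] = [255, 0, 0]  # a single pure-red pixel

# One simple way to collapse RGB to grayscale: average the three channels.
gray_from_rgb = rgb.mean(axis=2).astype(np.uint8)

print(gray.shape, rgb.shape, gray_from_rgb.shape)
```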

Will Data Scientists Still Be in Demand in 2022?

I have read many different viewpoints online about data engineering replacing data science as the hottest job of the 21st century. After working closely with both data engineering and data science teams, I have come to the conclusion that both fields are equally valuable. Companies need data engineers. They need people who are able to … Read more

How to Decrease the Carbon Footprint of Digital Communication

(Photo by Mikaela Wiedenhoff on Unsplash) An assessment of the influence of email behaviours on greenhouse gas emissions using System Dynamics. Nowadays, it is quite normal to communicate via digital tools. We have social media platforms to chat with friends, videotelephony services to conduct job interviews and, of course, the good old email. One would think that … Read more

NICE EnginFrame adds AWS HPC cluster management with AWS ParallelCluster

Today we are announcing general availability of NICE EnginFrame 2021.0. NICE EnginFrame is an easy-to-use, web front-end that makes HPC job submission and management easier for customers. With this latest release, customers are able to use NICE EnginFrame across both on-premises and AWS environments using its new AWS HPC Connector feature. Where customers may have … Read more

Categories: AWS

Sentiments of Rome

Because Latin is a low-resource language, it is not easy to acquire an annotated Latin corpus of substantial size. I used one accessible dataset[4][5] from the internet, which consists of 45 Latin sentences extracted from Horace’s Odes and classified into 4 sentiment classes (POSITIVE, NEGATIVE, NEUTRAL, MIXED) (The creators of the dataset have … Read more

Advent of 2021, Day 6 – Setting up IDE

[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers]. Series of Apache Spark posts: Let’s look into the IDE that … Read more

Categories: R

Mastering Histograms in Matplotlib

Details of Making Histograms. The histogram is one of the most popular plots. It is useful for understanding the overall distribution of a continuous variable. So, in almost any data analysis, exploratory data analysis, or machine learning project, you will start with some histograms. In this article, I will explain how to make histograms … Read more
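As a starting point, a basic Matplotlib histogram could be sketched as follows. The synthetic normal data, the bin count of 20, and the labels are assumptions for illustration; the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic continuous variable (assumed data for the sketch).
rng = np.random.default_rng(42)
values = rng.normal(loc=50, scale=10, size=1_000)

fig, ax = plt.subplots()
# hist returns the per-bin counts, the bin edges, and the drawn patches.
counts, bin_edges, _ = ax.hist(values, bins=20, edgecolor="black")
ax.set_xlabel("value")
ax.set_ylabel("frequency")
ax.set_title("Distribution of a continuous variable")

print(counts.sum(), len(bin_edges))
```

Calling fig.savefig with a file name would write the figure to disk.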

Essential guide to Machine Learning Model Monitoring in Production

Techniques to detect data drift. (Image by Mediamodifier from Pixabay) Model monitoring is an important component of the end-to-end data science model development pipeline. The robustness of the model depends not only on the training on feature-engineered data but also on how well the model is monitored after deployment. Typically a machine … Read more
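The excerpt does not say which drift-detection techniques the article covers. As one common option, the sketch below compares a training-time feature with a live feature using SciPy's two-sample Kolmogorov-Smirnov test. The synthetic data, the helper has_drifted, and the 0.01 significance level are all assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Assumed data: a feature as seen at training time and after deployment.
train_feature = rng.normal(loc=0.0, scale=1.0, size=2_000)
live_shifted = rng.normal(loc=0.5, scale=1.0, size=2_000)  # mean has moved


def has_drifted(reference, current, alpha=0.01):
    """Flag drift when the KS test rejects 'same distribution' at level alpha."""
    result = ks_2samp(reference, current)
    return bool(result.pvalue < alpha)


print(has_drifted(train_feature, live_shifted))
```

A per-feature test like this only looks at one marginal distribution at a time, so it would usually be one signal among several in a monitoring setup.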

Enabling keyless authentication from GitHub Actions

GitHub Actions is a third-party CI/CD solution popular among many Google Cloud customers and developers. When a GitHub Actions Workflow needs to read or mutate resources on Google Cloud – such as publishing a container to Artifact Registry or deploying a new service with Cloud Run – it must first authenticate. Traditionally, authenticating from GitHub … Read more

Tutorial on Surface Crack Classification with Visual Explanation (Part 2)

Now, we are going to generate visual explanation heat maps. To generate the heat maps, we are going to use the Grad-CAM[1] algorithm. The heat maps identify the image regions that influence the network’s decision. If we look at a heat map, we can easily understand which image pixels contribute to the network’s decision. To work with Grad-CAM, … Read more

Shiny Weekly: News from the R Shiny Community

Shiny Weekly has officially launched! The Shiny Weekly newsletter is our way of creating a central resource for R Shiny news from the R community. We’re so thankful to those who manage R news aggregators and to those of you who post quality, informative content. We feel that a centralized newsletter for Shiny has been … Read more

Categories: R

Why R? 2021 Jumping Rivers Session

[This article was first published on Why R? Foundation, and kindly contributed to R-bloggers]. This year we are organizing the fifth edition of Why R? … Read more

Categories: R

Webhook vs API — Which One Do You Need?

Even if you are completely unfamiliar with technology, you likely use APIs on a daily basis. Whether ordering from an online store, checking your train schedule, or opening a weather app, we constantly make requests for information, which is retrieved from a system or database we don’t necessarily know anything about. The layer between our … Read more

A 2nd look at vaccination breakthroughs in Switzerland

[This article was first published on Mirai Solutions, and kindly contributed to R-bloggers]. Our Covid19 app provides a global view of the pandemic, but how … Read more

Categories: R

Indexing from zero in R

[This article was first published on R on Tea & Stats, and kindly contributed to R-bloggers]. Everybody knows that R is an inferior programming language, … Read more

Categories: R

9 new books added to Big Book of R

06 December 2021. Every time I update Big Book of R, I’m blown away by how much good stuff is out there! In this release there are 9 new books, which cover the widest range of topics of any release to date :). Thanks to Sivuyile Nzimeni for one of the adds! Thinking Outside The Grid … Read more

Categories: R

Heterogeneous Treatment Effects with Instrumental Variables: A Causal Machine Learning Approach

[This article was first published on YoungStatS, and kindly contributed to R-bloggers]. Problem Setting: In our forthcoming paper in the Annals of Applied Statistics, we propose … Read more

Categories: R

Setting up a Text Summarisation Project (Part 2)

Leveraging zero-shot learning for text summarisation with Hugging Face’s Pipeline API. (Photo by David Dvořáček on Unsplash) This is the second part of a tutorial on setting up a text summarisation project. For more context and an overview of this tutorial, please refer back to the introduction as well as part 1 in which we … Read more

Believe Rationally

How To Update Your Beliefs Based On Evidence. Imagine you took a rapid at-home COVID-19 test. If you test positive, how worried should you be? Alternatively, if you test negative, how safe should you feel? (Photo by Medakit Ltd on Unsplash) This article will arm you with the knowledge and the tools to correctly and … Read more
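The two questions above can be worked through with Bayes' rule. The sketch below is illustrative only: the 5% prior, 80% sensitivity, and 98% specificity are assumed numbers, not figures from the article or from any real test.

```python
def posterior_infected(prior, sensitivity, specificity, positive):
    """Probability of infection after one test result, via Bayes' rule."""
    if positive:
        p_result_if_infected = sensitivity          # true positive rate
        p_result_if_healthy = 1.0 - specificity     # false positive rate
    else:
        p_result_if_infected = 1.0 - sensitivity    # false negative rate
        p_result_if_healthy = specificity           # true negative rate
    evidence = (p_result_if_infected * prior
                + p_result_if_healthy * (1.0 - prior))
    return p_result_if_infected * prior / evidence


# Assumed inputs for illustration.
prior, sensitivity, specificity = 0.05, 0.80, 0.98

after_positive = posterior_infected(prior, sensitivity, specificity, True)
after_negative = posterior_infected(prior, sensitivity, specificity, False)
print(round(after_positive, 3), round(after_negative, 3))
```

Under these assumptions a positive result raises the probability of infection from 5% to roughly 68%, while a negative result lowers it to about 1%.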

Introduction to Applied Linear Algebra: Norms & Distances

(Photo by Yan Krukov from Pexels) Goal: This article gives an introduction to vector norms, vector distances, and their application in the field of data science. Why you should learn it: Vector norms and distances are used to describe attributes of vectors and the relationships between different vectors. They are widely used … Read more
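For a concrete feel, here is a short NumPy sketch of three common vector norms and the Euclidean distance between two vectors. The vectors themselves are arbitrary examples.

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([0.0, 0.0])

l1 = np.linalg.norm(a, ord=1)         # sum of absolute values
l2 = np.linalg.norm(a)                # Euclidean length (default)
linf = np.linalg.norm(a, ord=np.inf)  # largest absolute component

# A distance is the norm of the difference between two vectors.
euclidean_distance = np.linalg.norm(a - b)

print(l1, l2, linf, euclidean_distance)
```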

Self-Training Classifier: How to Make Any Algorithm Behave Like a Semi-Supervised One

You may think that Self-Training involves some magic or uses a highly complex approach. In reality, though, the idea behind Self-Training is very straightforward and can be explained by the following steps: First, we gather all labeled and unlabeled data, but we only use labeled observations to train our first supervised model. Then we use … Read more
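One way to try the idea is scikit-learn's SelfTrainingClassifier, which wraps a supervised estimator and expects unlabeled samples to be marked with -1. The sketch below is an assumption about tooling, since the excerpt does not show the article's code; the synthetic data, the 80% unlabeled share, and the 0.9 confidence threshold are likewise illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Synthetic binary classification data (assumed for the sketch).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Hide about 80% of the labels; scikit-learn marks unlabeled samples with -1.
rng = np.random.default_rng(0)
y_partial = y.copy()
unlabeled = rng.random(len(y)) < 0.8
y_partial[unlabeled] = -1

# The wrapper trains on labeled rows, pseudo-labels confident unlabeled rows,
# and retrains until no more rows pass the threshold or iterations run out.
model = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.9)
model.fit(X, y_partial)

preds = model.predict(X)
print(preds[:10])
```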

Localization of indoor Wi-Fi users by Bayesian statistical modelling

Identifying indoor Wi-Fi users’ locations with a tolerance of uncertainty using PyMC3. (Image: Wi-Fi sensor network) With the help of GPS, outdoor positioning has seen significant development. However, indoor positioning still suffers from substantial inaccuracies. The existence of Wi-Fi networks offers an alternative way to build a localization system, and significant research has … Read more

Advent of 2021, Day 5 – Setting up Spark Cluster

[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers]. Series of Apache Spark posts: We have explored the Spark architecture … Read more

Categories: R

Easy Interpretations of ADF Test in R

############################################################################
# This R function helps to interpret the output of the urca::ur.df function.
# The rules are based on https://stats.stackexchange.com/questions/
# 24072/interpreting-rs-ur-df-dickey-fuller-unit-root-test-results
#
# urdf is the output of the urca::ur.df function
# level is one of c("1pct", "5pct", "10pct")
#
# Author: Hank Roark
# Date: October 2019
############################################################################
interp_urdf <- function(urdf, level="5pct") {
  # urdf = lt.adf.df.trend$LRY
  # level = "5pct"
  if(class(urdf) != "ur.df")
    stop('parameter is not of class ur.df from urca package')
  if(!(level %in% c("1pct", "5pct", "10pct") ) )
    stop('parameter level is not one of 1pct, 5pct, or 10pct')
  #cat("========================================================================\n")
  cat( paste("At the", level, "level:\n") )
    cat("The model is of type none : "); print(urdf@testreg$call$formula)
    tau1_teststat_wi_crit = tau1_teststat > tau1_crit
    if(tau1_teststat_wi_crit) {
      cat("tau1: The null hypothesis is not rejected, unit root is present\n")
    } else {
      cat("tau1: The null hypothesis is rejected, unit root is not present\n")
    }
    #cat("The model is of type drift\n")
    cat("The model is of type drift : "); print(urdf@testreg$call$formula)
    tau2_teststat_wi_crit = tau2_teststat > tau2_crit
    phi1_teststat_wi_crit = phi1_teststat < phi1_crit
    if(tau2_teststat_wi_crit) {
      # Unit root present branch
      cat("tau2: The first null hypothesis is not rejected, unit root is present\n")
      if(phi1_teststat_wi_crit) {
        cat("phi1: The second null hypothesis is not rejected, unit root is present\n")
        cat("      and there is no drift.\n")
      } else {
        cat("phi1: The second null hypothesis is rejected, unit root is present\n")
        cat("      and there is drift.\n")
      }
    } else {
      # Unit root not present branch
      cat("tau2: The first null hypothesis is rejected, unit root is not present\n")
      if(phi1_teststat_wi_crit) {
        cat("phi1: The second null hypothesis is not rejected, unit root is present\n")
        cat("      and there is no drift.\n")
        warning("This is inconsistent with the first null hypothesis.")
      } else {
        cat("phi1: The second null hypothesis is rejected, unit root is not present\n")
        cat("      and there is drift.\n")
      }
… Read more

Categories: R

Writing Functions in R: Working Example One

[This article was first published on R – Mathew Analytics, and kindly contributed to R-bloggers]. A. Background: We usually want to write reusable code that … Read more

Categories: R

How to Build a Poisson Hidden Markov Model Using Python and Statsmodels

(Figure: Manufacturing strikes in the United States plotted against time. Data source: R data sets. Image by Author) A step-by-step tutorial to get up and running with the Poisson HMM. A Poisson Hidden Markov Model is a mixture of two regression models: a Poisson regression model, which is visible, and a Markov model, which is ‘hidden’. … Read more

Scaled Line chart — What are they and why do you absolutely need them

Combining the power of two simple things to get something awesome. (Image by Author) Sometimes when you combine two seemingly simple things, you get something awesome. In this article, you will see the power of combining a line chart and a numeric scaler. Just for the sake of vocabulary: The line chart is very useful … Read more

FuzzyWuzzy — the Before and After

Data Preprocessing: Cleaning the Data Before Analysis. Before we choose our FuzzyWuzzy function and start comparing strings, we want to clean the data to ensure that our results will be as accurate as possible. Cleaning the data means removing irrelevant strings, and thus improving the functions’ performance. For example, let’s assume we compare strings … Read more
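To show why cleaning matters, the sketch below normalises two strings before comparing them. It uses the standard library's difflib.SequenceMatcher as a stand-in scorer rather than FuzzyWuzzy itself, and the helper names, the noise-word list, and the sample strings are all hypothetical.

```python
import re
from difflib import SequenceMatcher


def clean(text, noise_words=("inc", "ltd", "llc")):
    """Lowercase, strip punctuation, and drop irrelevant tokens."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = [t for t in text.split() if t not in noise_words]
    return " ".join(tokens)


def similarity(a, b):
    """Similarity in [0, 1]; a stdlib stand-in for a fuzzy-matching score."""
    return SequenceMatcher(None, a, b).ratio()


raw_a, raw_b = "ACME, Inc.", "Acme LTD"

print(similarity(raw_a, raw_b))
print(similarity(clean(raw_a), clean(raw_b)))
```

After cleaning, both strings reduce to the same token, so the score reflects the company name rather than casing, punctuation, or legal suffixes.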

Paper explained: Unsupervised Learning of Visual Features by Contrasting Cluster Assignments

The secrets of why the SwAV model architecture works so well for self-supervised pre-training. Teaching a neural network to understand the world around it without human supervision has been one of the north stars of the computer vision research community for years. Recently, multiple publications have shown the potential of novel methods to make significant … Read more

Analysing Interactions with SHAP

The mean prediction is the average predicted bonus across all 1000 employees. If you add up all the values in the contribution matrix and add the mean prediction, you will get the model’s actual prediction for that employee. In our case, the mean predicted bonus was $148.93. All the values in the matrix add up … Read more
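The additivity described here can be checked with a few lines of NumPy. Only the mean prediction of 148.93 comes from the text; the 3x3 contribution matrix below is made up for illustration and does not come from a real SHAP run.

```python
import numpy as np

mean_prediction = 148.93  # mean predicted bonus, from the text

# Hypothetical contribution matrix for one employee and three features:
# main effects on the diagonal, interaction effects split symmetrically
# across the off-diagonal cells.
contributions = np.array([
    [20.0,   2.5, -1.0],
    [ 2.5, -10.0,  0.5],
    [-1.0,   0.5,  5.0],
])

# The prediction for this employee is the mean plus every cell of the matrix.
prediction = mean_prediction + contributions.sum()

# Summing each row gives one total contribution per feature.
per_feature = contributions.sum(axis=1)

print(round(prediction, 2), per_feature)
```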

Why R? 2021 Invited Talks

[This article was first published on Why R? Foundation, and kindly contributed to R-bloggers]. This year we are organizing the fifth edition of Why … Read more

Categories: R

Normalization, Standardization and Normal Distribution

I will start this post with a statement: normalization and standardization will not change the distribution of your data. In other words, if your variable is not normally distributed, it won’t be turned into one by the normalize method. normalize() or StandardScaler() from sklearn won’t change the shape of your data. Standardization: Standardization can be … Read more
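The opening statement can be checked numerically. The sketch below applies z-score standardization and min-max scaling by hand with NumPy (rather than through sklearn) to a skewed sample and compares skewness before and after. The exponential data and the skewness helper are assumptions for illustration, and min-max scaling is used here as one common meaning of "normalization".

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=10_000)  # clearly right-skewed data


def skewness(v):
    """Sample skewness: the mean of the cubed z-scores."""
    z = (v - v.mean()) / v.std()
    return float(np.mean(z ** 3))


# Z-score standardization: mean 0 and standard deviation 1.
standardized = (x - x.mean()) / x.std()

# Min-max scaling: values mapped into the range [0, 1].
minmax = (x - x.min()) / (x.max() - x.min())

print(skewness(x), skewness(standardized), skewness(minmax))
```

Both transforms are linear with a positive scale factor, so they move and stretch the values without changing the shape of the distribution; the skewness is the same in all three cases.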

Examples of Multi-Cursor for working with Data

How to save time and nerves when coding for data analysis in VS Code using Multi-Cursor and selection features. (Doing multiple things at once. Photo by Matt Bero on Unsplash) Working with data can be very dynamic with repeated forward and backward motions through your code to adjust and copy snippets, introducing new assumptions, … Read more

Codex — a bridge between clouds?

(Image: Shutterstock) Can Codex translate commands between AWS and Google Cloud? Many businesses need to deal with multiple cloud environments. AWS, Azure, and Google Cloud each have their own sets of commands to carry out cloud actions, such as setting up buckets, defining service accounts, and other administration tasks. It would be great to … Read more