plumber 1.1.0

[This article was first published on RStudio Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t. I am happy to announce that {plumber} v1.1.0 is now on CRAN! … Read more

Categories: R

What does it take to do a t-test?

[This article was first published on R Views, and kindly contributed to R-bloggers]. In this post, I examine the fundamental assumption of independence underlying the … Read more

Categories: R

Introducing popthemes

[This article was first published on HighlandR, and kindly contributed to R-bloggers]. Back once again with the block rocking themes! Yes, having produced a set … Read more

Categories: R

10 Tips and Tricks for Data Scientists Vol.2

[This article was first published on R – Predictive Hacks, and kindly contributed to R-bloggers]. We have started a series of articles on tips and … Read more

Categories: R

10 Tips and Tricks for Data Scientists Vol.1

[This article was first published on R – Predictive Hacks, and kindly contributed to R-bloggers]. Introduction: As data scientists, we love to do our job … Read more

Categories: R

Predicting 2020–21 NBA’s Most Valuable Player using Machine Learning

What do ML models say about the MVP race? At the end of every season, media members across the National Basketball Association (NBA) are asked to decide on the winner of the league’s most sought-after individual regular season award: the Most Valuable Player (MVP). Created in the 1955–56 … Read more

styler 1.4.0

I am happy to announce that styler 1.4.0 is available on CRAN. Since the last release over a year ago, styler has been improved in various ways. Dry and quiet runs: You can run styler without modifying any files with the dry mode enabled. When “on”, the styling is performed without writing back; when “fail”, it … Read more

Categories: R

Data Wrangling Solutions — Working With Dates — Part 1

Reading files containing the dates column. The topic discussed here is a challenge that every aspiring data scientist or analyst stumbles upon at the start of their data science journey. The challenge with this problem is that you will continue to encounter it in some scenario or the other, and … Read more

Use and Enhance this Python Class to Download Excel Workbooks and Prepare them for Analytics

Use the Python class c_download_prep_excel to automatically download Excel spreadsheets from websites and prepare them for use in data analytics projects. In a recent article, I shared a Python class that downloads reports from analytics.usa.gov. The files provide data about how the public accesses roughly 5,700 U.S. government websites. They … Read more

One common misconception about Random Forest and overfitting

Bootstrapping, the majority vote rule, and the paradox of 100% training accuracy. Does 100% training accuracy indicate overfitting? There are numerous suggestions to tune the depth of trees in Random Forest to prevent that from happening. This advice is misguided. The post explains why 100% … Read more

Exploring other {ggplot2} geoms

[This article was first published on %>% dreams, and kindly contributed to R-bloggers]. R users are incredibly fortunate to work in an open source community … Read more

Categories: R

Easily integrate Custom Functions in MATLAB with Python

MATLAB professional-level functions in Python scripts. MATLAB implementations are usually quite reliable because they are developed by professionals, but the advantages of using Python are immense. In this post, I will show how you can integrate your custom MATLAB function into your Python script. Let us make a … Read more

4 Machine learning techniques for outlier detection in Python

Based on the feedback from readers after publishing “Two outlier detection techniques you should know in 2021”, I have decided to write this post, which covers four different machine learning techniques (algorithms) for outlier detection in Python. Here, I will use the I-I (Intuition-Implementation) approach for each technique. That will help you to understand … Read more

7 Ways to Measure the Value of Data

Metrics & Measurement: 7 Ways to Inform Your Data Strategy. In the previous post, I wrote about how you can measure the quality of your data assets. I also noted that you should prioritize your measurement efforts based on the value the data bring to your business since the act … Read more

Interpolating NYC Bike Share Data to Discover Rebalancing Movements

Using Pandas concat to restructure Citi Bike trip data. To ensure there are bikes (and docks) available when needed, Citi Bike, like most bike share systems, rebalances or moves bikes from where there are too many to where they are needed. Citi Bike doesn’t disclose data about where bikes are moved, and most of these … Read more

Two-layered recommender system methodology: a prize-winning solution

Learn how to combine simple algorithms into a powerful recommendation engine. The Cinema Challenge hackathon was held from 14 to 22 November 2020. It was dedicated to creating solutions for the online theatre sweet.tv. Participants could choose one of three projects: Challenge 1: film recommender system; Challenge … Read more

How to solve common problems with GAMs

[This article was first published on Bluecology blog, and kindly contributed to R-bloggers]. Here I’ll provide solutions to some common problems I run into when … Read more

Categories: R

When interpolation doesn’t work with GAMs

[This article was first published on Bluecology blog, and kindly contributed to R-bloggers]. GAMs shouldn’t be used for extrapolation, and they can give strange results … Read more

Categories: R

Spectroscopy Suite Update

My suite of spectroscopy R packages has been updated on CRAN. There are only a few small changes, but they will be important to some of you: ChemoSpecUtils now provides a set of colorblind-friendly colors, see ?colorSymbol. These are available for use in ChemoSpec and ChemoSpec2D. At the request of several folks, readJDX now includes … Read more

Categories: R

Amazon SageMaker now supports private Docker registry authentication

Amazon SageMaker now supports adding authentication to requests for pulling images stored in your private Docker Registry to build containers for real-time inference. Amazon SageMaker makes it easy to deploy your trained models to production with a single click, so you can start generating real-time inferences with low latency. You can bring your own code … Read more

Categories: AWS

Amazon Forecast enables AWS Resource Groups

Amazon Forecast adds support for AWS Resource Groups and AWS Resource Groups Tag Editor. Amazon Forecast uses machine learning (ML) to generate more accurate demand forecasts, without requiring any prior ML experience. Forecast brings the same technology used at Amazon.com to developers as a fully managed service, removing the need to manage resources or rebuild … Read more

Categories: AWS

Object Extraction From Images

Object Extraction Using Skimage. Let’s say you have an image as below, which is exactly the same as the one above, except that I manually added a “white stain” in the middle. Your goal is to extract “0” and “5”, and make them separate images. Using Skimage, you can make it happen in just 2 … Read more

Evaluation Bias; are you inadvertently training on your entire dataset?

We’ve seen that introducing validation splits or folds adds more moving pieces to our workflow, especially if we perform cross-validation. Fortunately, the open source AIQC framework for reproducible deep learning data preparation and batch model tuning can handle this for you! github.com/aiqc/aiqc Here we see how the High-Level API makes splitting and folding stratified data … Read more

Workshop 31.03.21: ‘Bring a Shiny App to Production’

[This article was first published on Mirai Solutions, and kindly contributed to R-bloggers]. A three-hour, hands-on workshop about safe, agile and automated deployment of … Read more

Categories: R

Data Scientist vs Machine Learning Engineers Skill Differences

Overlap between these two popular tech roles is sure to happen, so let’s dive deep into what skills are required for both roles, and what makes them different. In general, data scientists can expect to work on the modeling side more, while machine learning engineers tend to focus … Read more

Data Analysis at Its Finest

An example that will make you like R better. Python and R are the predominant languages in the data science ecosystem. They both provide numerous libraries to perform efficient data wrangling and analysis. When it comes to data analysis, Pandas has always been the first choice for me. … Read more

Check, please! Billing in Cloud Storage
Cloud Developer Advocate

Standard Storage is appropriate for storing data that is frequently accessed, such as serving website content, interactive workloads, or data supporting mobile and gaming applications. For standard storage, the monthly cost is the only cost you need to plan for. However, for the other three storage types, you’ll want to consider the minimum storage duration … Read more

Rethinking ‘rehost, replatform, rearchitect’: Cloud migration for the real world
Consulting & Engineering Manager, Professional Services

When helping customers plan large-scale migrations of applications to the cloud, we here on the Professional Services team sometimes observe them pouring countless hours into the top-down evaluation of their application estate and categorizing them into discrete migration strategies like “rehost”, “replatform”, “refactor” and so on. It’s a well established industry practice in which the … Read more

Using BigQuery Administrator for real-time monitoring
Product Manager, BigQuery; Software Engineer

When doing analytics at scale with BigQuery, understanding what is happening and being able to take action in real-time is critical. To that end, we are happy to announce Resource Charts for BigQuery Administrator. Resource Charts provide a native, out-of-the-box experience for real-time monitoring and troubleshooting of your BigQuery environments. Resource Charts make it easy … Read more

Amazon ElastiCache for Redis now supports highly available clusters on AWS Local Zones

Amazon ElastiCache for Redis now supports running clusters with high availability across multiple AWS Local Zones. AWS Local Zones are an extension of an AWS Region where you can run your latency-sensitive applications using AWS services in geographic proximity to end-users. Previously, Amazon ElastiCache for Redis only supported launching a cluster in a single AWS … Read more

Categories: AWS

Let’s jump on the Poetry bandwagon

Why you should use Poetry for your Python data science projects. Poetry may revolutionize the way current Python projects are created and shared. It is intuitive to use, and solves some critical pain points that Python developers have complained about for years. I came upon Poetry when learning about … Read more

Upping your organizational and reproducibility game for Bayesian analyses with `MCMCvis` 0.15.0

[This article was first published on R – Lynch Lab, and kindly contributed to R-bloggers]. The background: Results from Bayesian analyses are often the result … Read more

Categories: R

Build your own arcpy helper package

Say goodbye to cut-and-paste Python. Esri’s arcpy package for Python is the most comprehensive toolkit available for management, analysis and visualisation of geographic information. This exhaustive set of capabilities comes at some cost to the brevity of your code. Seemingly simple operations can be complicated to implement, and the … Read more

Scale cloud adoption with modular designs for enterprise-scale landing zones

This post was co-authored by Sarah Lean, Senior Content Engineer, Azure. Tailwind Traders is a retail company that is looking to adopt Azure as part of its IT strategy. The IT team is familiar with deploying infrastructure on-premises and is now researching what they need to do in order to run their workloads on Azure. … Read more

4 Principles to Learn SQL for Data Science

Tips for improving your data science skillset. It goes without saying that the ability to write SQL is necessary for landing and succeeding in any data science role. There isn’t any evidence that this is changing anytime soon. SQL is here to stay, so mastering this skill is … Read more

5 reasons Databricks runs best on Azure

For any organization running big data workloads in the cloud, exceptional scale, performance, and optimization are essential. Databricks customers have multiple choices for their cloud destination. Azure Databricks is the only first-party service offering for Databricks, which provides customers with distinct benefits not offered in any other cloud. The first-party integration and our unique strategic … Read more

Strengthen and optimize compliance in Azure Security Center

The Regulatory Compliance dashboard in Azure Security Center is an excellent tool for helping organizations understand their compliance posture relative to industry standards. Reporting on compliance with specific standards is obviously critical for regulated customers, though tracking compliance status is also relevant to many other organizations who want to align with industry-defined best practices. Many … Read more

Create Your Personal Cheat Sheets

[This article was first published on Quantargo Blog, and kindly contributed to R-bloggers]. Cheat Sheets are a handy way to … Read more

Categories: R

Building Small Services, Deploying on Kubernetes, and Integrating with API Gateway

Abstracting Backend API Authentication with Python & Redis. Recently, I was working on the integration of a backend system with the API gateway. The backend system has its own APIs but does not have authentication. Or I should say that it does have authentication, but it only applies to a … Read more

A Structured Approach for Ideating AI Use Cases With AI Discovery

The best way to understand how the modified ideation process works is through a case study. The case study below is from an actual session that I conducted; the company name and sensitive information have been masked for confidentiality purposes. Client’s background: BETTER BRAIN is an institute of higher learning. Their CEO is interested in … Read more