Back by popular demand: Google Cloud products in 4 words or less (2021 edition)
Developer Advocate, Google; Director, Developer Advocacy

The 4 words or less Google Cloud developer’s cheat sheet is a project that describes each Google Cloud product in 4 words or less. If you are just getting started, this resource gives you a quick overview of all that is available to you on Google Cloud. And if you’re more experienced, it can be …

Faster data exploration with DataExplorer

Data exploration is an important part of the modeling process, and it can take up a fair amount of time. The awesome DataExplorer package in R aims to make this process easier. To get started with DataExplorer, install it as follows:

install.packages("DataExplorer")

Let’s use DataExplorer to explore a dataset on diabetes.

# …

Clustering similar spatial patterns

TL;DR: Clustering similar spatial patterns requires one or more raster datasets for the same area. The input data is divided into many sub-areas, and a spatial signature is derived for each sub-area. Next, the distances between the signatures of every pair of sub-areas are calculated and stored in a distance matrix. The distance matrix can be used to create clusters of …
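
The workflow in the excerpt (split into sub-areas, derive signatures, build a distance matrix, cluster) can be sketched in plain Python. This is a minimal illustration, not the article's implementation. It assumes a single categorical raster stored as a list of rows and uses the share of each category in a block as the signature, which is a simplification of richer signatures such as co-occurrence matrices. It also assumes Jensen-Shannon distance between signatures and a hand-written average-linkage agglomerative clustering. All function names are hypothetical.

```python
import math
from itertools import combinations

def split_blocks(raster, size):
    """Divide a 2D raster (list of equal-length rows) into size x size sub-areas, row by row."""
    blocks = []
    for r in range(0, len(raster), size):
        for c in range(0, len(raster[0]), size):
            blocks.append([row[c:c + size] for row in raster[r:r + size]])
    return blocks

def signature(block, categories):
    """Composition signature (an assumption): share of cells in the block for each category."""
    cells = [value for row in block for value in row]
    return [cells.count(cat) / len(cells) for cat in categories]

def jensen_shannon(p, q):
    """Jensen-Shannon distance (square root of the divergence, base-2 logs), in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)
    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))

def distance_matrix(sigs):
    """Symmetric matrix of pairwise distances between signatures."""
    n = len(sigs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = jensen_shannon(sigs[i], sigs[j])
    return d

def average_linkage(d, k):
    """Agglomerative clustering: repeatedly merge the two clusters with the smallest
    average pairwise distance until k clusters remain."""
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [d[i][j] for i in clusters[a] for j in clusters[b]]
            dist = sum(pairs) / len(pairs)
            if best is None or dist < best[0]:
                best = (dist, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return clusters

# Toy 4x4 categorical raster: left half is category 1, right half is category 2
raster = [[1, 1, 2, 2] for _ in range(4)]
sigs = [signature(block, categories=[1, 2]) for block in split_blocks(raster, 2)]
print(average_linkage(distance_matrix(sigs), k=2))  # [[0, 2], [1, 3]]
```

The four 2x2 sub-areas are numbered row by row, so sub-areas 0 and 2 (left half) and sub-areas 1 and 3 (right half) end up in the same clusters.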

2021 R Conferences

[This article was first published on R Views, and kindly contributed to R-bloggers]. It is not yet clear what lasting impact the Covid-19 pandemic will …

stack overload

[This article was first published on R – Xi’an’s Og, and kindly contributed to R-bloggers]. The Riddle this week is rather straightforward to explain: stacking …

Four new digital training offerings for AWS End User Computing

We’re excited to introduce four new digital training offerings that help you learn how to plan, deploy, secure, and manage cloud-based desktops and applications. The offerings are designed for desktop or virtual desktop infrastructure managers, IT administrators, and technical professionals interested in cloud-based virtualization. These free self-paced courses and curriculums include presentations, interactive e-learning modules, …

Imports of the Data Minimization Principle in the Big Data World

Non-compliance costs much

The value of data in the age of big data is ever-increasing. As the data landscape expands every second of every day and data becomes ubiquitous and easier to collect, personal data is being mined and stored on a massive scale for reuse in applications such as location tracking, healthcare, predictive policing, predictive justice, fraud …

Time Series Forecasting with XGBoost and Feature Importance

[This article was first published on DataGeeek, and kindly contributed to R-bloggers]. Those who follow my articles know that trying to predict gold prices has …

ppsr live on CRAN!

Finding predictive patterns in your dataset with one line of code! Today, March 2nd 2021, my first R package was published on the Comprehensive R Archive Network (CRAN). ppsr is the R implementation of the Predictive Power Score (PPS). The PPS is an asymmetric, data-type-agnostic score that can detect linear or non-linear relationships …
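
The asymmetry of the PPS can be illustrated with a simplified score. This sketch is a stand-in, not ppsr's algorithm. ppsr fits a model with cross-validation, whereas here the "model" is an in-sample lookup of the mean target for each distinct feature value. It is compared against a naive baseline that always predicts the median of the target, with score = 1 - model MAE / baseline MAE, floored at 0. The function name `pps_sketch` is hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

def pps_sketch(x, y):
    """Simplified, PPS-like score for how well x predicts y (0 = no better than baseline, 1 = perfect)."""
    # "Model": predict the mean of y observed for each distinct x value (in-sample, no cross-validation)
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    lookup = {xi: mean(ys) for xi, ys in groups.items()}
    model_mae = mean(abs(yi - lookup[xi]) for xi, yi in zip(x, y))
    # Naive baseline: always predict the median of y
    base = median(y)
    naive_mae = mean(abs(yi - base) for yi in y)
    if naive_mae == 0:  # constant target: nothing to predict
        return 0.0
    return max(0.0, 1 - model_mae / naive_mae)

x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]
print(pps_sketch(x, y), pps_sketch(y, x))  # 1.0 0.0
```

Here x determines y = x² exactly, but y cannot recover the sign of x, so the score is 1.0 in one direction and 0.0 in the other. A symmetric measure such as a correlation coefficient cannot express that difference.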

Connecting customers and businesses with Azure Communication Services and Microsoft Teams

Last year, we launched Azure Communication Services to help businesses reach customers through voice, video, text chat, and connections into short message service (SMS) and the public switched telephone network (PSTN) across applications, websites, and mobile platforms. Since launch, we have onboarded hundreds of customers and partners into the preview across multiple industries and look forward to supporting …

Thinks Another: Using Spectrograms to Identify Stage Wiggliness?

Last night I started wondering about ways in which I might be able to use signal processing (Fourier analysis) or symbolic dynamics (e.g. Thinks: Symbolic Dynamics for Categorising Rally Stage Wiggliness?) to help categorise the nature of rally stage twistiness. Over a morning coffee break, I reminded myself of spectrograms, graphical devices that chunk a time …
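
A spectrogram splits a series into windows and computes the frequency content of each window. The sketch below assumes an evenly sampled, synthetic series. For stage wiggliness the input might be something like heading change per unit distance, but that is an assumption, not the post's method. It uses a naive discrete Fourier transform with rectangular, non-overlapping windows. A real analysis would more likely use a library FFT and a tapering window.

```python
import cmath
import math

def dft_magnitudes(chunk):
    """Magnitude of the discrete Fourier transform for bins 0..n/2 (naive O(n^2) DFT)."""
    n = len(chunk)
    return [
        abs(sum(chunk[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def spectrogram(signal, window, hop):
    """One row of DFT magnitudes per window; rectangular (untapered) windows."""
    return [
        dft_magnitudes(signal[start:start + window])
        for start in range(0, len(signal) - window + 1, hop)
    ]

# Synthetic series: a slow oscillation followed by a fast one
# (hypothetical stand-ins for a gentle section and a twisty section)
slow = [math.sin(2 * math.pi * 2 * t / 16) for t in range(32)]
fast = [math.sin(2 * math.pi * 6 * t / 16) for t in range(32)]
frames = spectrogram(slow + fast, window=16, hop=16)

# Dominant frequency bin in each window
print([max(range(len(f)), key=f.__getitem__) for f in frames])  # [2, 2, 6, 6]
```

The dominant bin jumps from 2 to 6 where the series switches from slow to fast oscillation. This shift in frequency content is the kind of signal a spectrogram makes visible over time.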

Our Learnings from 3 Failures over 5 Years to Set Up a Data Catalog

At Atlan, we started out as a data team ourselves, driving data projects for social good with organizations like the United Nations, the Gates Foundation, and the World Bank. We acted as a “data team” for our customers, so we experienced all the chaos and frustration of dealing with large-scale data firsthand. We were awoken with …

Data Discovery Platforms and Their Open Source Solutions

I’ve compiled a high-level comparison based on publicly available information. (Note: this is likely to be incomplete; please reach out if you have additional information!) A few observations: all platforms have free-text search (via Elasticsearch or Solr); only Amundsen (Lyft) and Lexikon (Spotify) include recommendations on the home page; all platforms show basic table …

AWS AppConfig is now available in AWS GovCloud (US) Regions

AWS AppConfig is now available in the AWS GovCloud (US) Regions. AWS AppConfig helps you deploy application configurations in a managed and monitored way just like you do with code deployments, but without the need to change the code if a configuration value changes. AWS AppConfig is intended for users who want to quickly and …

Go faster and cheaper with Memorystore for Memcached, now GA
Product Manager for Cloud Memorystore; Product Manager, Databases

Last year, we announced the beta release of Memorystore for Memcached, a fully managed service compatible with the open-source Memcached protocol. Memorystore for Memcached has delivered speed and cost savings for our customers, and we’re now announcing its general availability. With Memorystore for Memcached, you can leverage the common Memcached in-memory key-value …

Super-FAST EDA in R with DataExplorer

[This article was first published on business-science.io, and kindly contributed to R-bloggers]. This article is part of R-Tips Weekly, a weekly video tutorial that …

Server(shiny)-less dashboards with R, {htmlwidgets} and {crosstalk}

In this blog post, I want to discuss something that I, personally, have never seen discussed: how to create a “serverless” (or “shinyless”, you could say) dashboard using R. I made one dashboard like that, which you can find here. This dashboard is running on a simple, standard web server. No Shiny involved! The idea …

Summer Internships 2021

[This article was first published on RStudio Blog, and kindly contributed to R-bloggers]. Photo by JD Long. We are excited to announce the fourth formal …

SQL A to Z | Part 1

Learning the most important and commonly used SQL statements. As data scientists, we use data to provide insights and recommendations that inform product strategy, growth and marketing, operations, and many other areas of the business. In many cases, data is stored in a relational database, which uses a structure that allows access to data points …
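
Several of the most commonly used statements (CREATE TABLE, INSERT, and SELECT with WHERE, GROUP BY, and ORDER BY) can be tried without a database server. This sketch uses Python's built-in sqlite3 module and an in-memory database. The `orders` table and its rows are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical orders table
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ana", 20.0), ("ben", 5.0), ("ana", 12.5), ("cho", 40.0)],
)
conn.commit()

# Filter rows (WHERE), aggregate per customer (GROUP BY), sort by the aggregate (ORDER BY)
rows = cur.execute(
    """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    WHERE amount > 10
    GROUP BY customer
    ORDER BY total DESC
    """
).fetchall()
print(rows)  # [('cho', 1, 40.0), ('ana', 2, 32.5)]
```

Note that WHERE filters individual rows before grouping, which is why ben's single 5.0 order never reaches the aggregate. To filter on the aggregate itself, you would use HAVING instead.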

Various Types of Cross-Validation Techniques for Machine Learning and Statistical Modeling

Why is cross-validation important for machine learning and statistical modeling? Cross-validation is widely known as a powerful tool for assessing the effectiveness of a machine learning or statistical model, especially for guarding against overfitting and underfitting (Sudhir et al. 2006). Overfitting and underfitting in machine learning and statistical models (Image by …
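
k-fold cross-validation, one of the most common of these techniques, holds out each of k folds once as a test set and trains on the remaining folds. The sketch below is in plain Python and assumes a least-squares line as the model and contiguous, unshuffled folds. In practice the data is usually shuffled before splitting. The helper names are hypothetical.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds; fold sizes differ by at most 1."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

def cross_val_mse(xs, ys, k):
    """Average of the per-fold test mean squared errors."""
    fold_errors = []
    for train, test in k_fold_indices(len(xs), k):
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq = [(slope * xs[i] + intercept - ys[i]) ** 2 for i in test]
        fold_errors.append(sum(sq) / len(sq))
    return sum(fold_errors) / len(fold_errors)

xs = list(range(10))
noisy = [2 * x + 1 + (0.5 if x % 2 else -0.5) for x in xs]
print([test for _, test in k_fold_indices(10, 3)])  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(round(cross_val_mse(xs, noisy, k=3), 3))
```

Every point is scored exactly once by a model that never saw it during training. This gives an estimate of out-of-sample error that a single in-sample fit cannot provide.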

Using OpenAI’s CLIP to Search for Design Patents

Note that using both text and image embeddings for semantic search is not new. For example, you can read about recent developments in these papers, “Dialog-based Interactive Image Retrieval” by Xiaoxiao Guo et al. [6] and “Visual-Semantic Matching by Exploring High-Order Attention and Distraction” by Yongzhi Li, et al. [7]. After experimenting with the system, …
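
CLIP's text and image encoders place text and images in the same vector space. Once they are embedded there, searching design patents with a text query reduces to ranking image embeddings by their cosine similarity to the query embedding. The sketch below skips the model entirely. The three-dimensional vectors and file names are made-up stand-ins for real CLIP embeddings, which have hundreds of dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(query_vec, index, top_k=2):
    """Return the names of the top_k indexed items most similar to the query vector."""
    ranked = sorted(index, key=lambda name: cosine(query_vec, index[name]), reverse=True)
    return ranked[:top_k]

# Hypothetical stand-ins for CLIP image embeddings of patent drawings
image_index = {
    "chair_design.png": [0.9, 0.1, 0.0],
    "lamp_design.png": [0.1, 0.9, 0.2],
    "stool_design.png": [0.8, 0.2, 0.1],
}
# Hypothetical stand-in for the CLIP text embedding of a query such as "a chair"
text_query = [1.0, 0.0, 0.0]
print(search(text_query, image_index))  # ['chair_design.png', 'stool_design.png']
```

For a large patent collection, the linear scan would normally be replaced by an approximate nearest-neighbour index, but the ranking criterion stays the same.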

Brimming With Possibilities: Query zqd & Mine Logs with zq from R

[This article was first published on R – rud.is, and kindly contributed to R-bloggers]. Brim Security maintains a free, Electron-based desktop GUI for exploration of …

Kernel Machine From Scratch

Training a neural network model may be hard; knowing what it has learned is even harder. Back in 1995, Radford M. Neal showed that a neural network with a single hidden layer and random parameters converges to a Gaussian process as the width goes to infinity. In 2018, Lee et al. further generalized the result to infinite …

How to Visualize Decision Tree from a Random Forest Model?

Decision tree model interpretation (Image by Gerd Altmann from Pixabay). Random forest, or random decision forest, is a supervised ensemble machine learning technique for training classification and regression models. The training algorithm of random forest uses bagging (bootstrap aggregation), with decision trees as base learners. Random forest uses randomization to reduce the bias …
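
Bagging combined with randomization can be sketched as a toy forest. This is a simplification, not a full random forest:

- Each base learner is a one-split decision stump rather than a fully grown tree.
- Each stump is trained on a bootstrap sample drawn with replacement.
- Each stump is restricted to a random subset of features, chosen once per stump rather than at every split.
- Predictions are combined by majority vote.

The dataset and function names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y, features):
    """Decision stump: the single threshold split (on one of `features`) with fewest training errors."""
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, thr, l_lab, r_lab)
    if best is None:  # no valid split (all sampled values equal): predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return features[0], float("inf"), majority, majority
    return best[1:]

def predict_stump(stump, row):
    f, thr, l_lab, r_lab = stump
    return l_lab if row[f] <= thr else r_lab

def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Bagging: each stump sees a bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample rows with replacement
        feats = rng.sample(range(len(X[0])), max_features)   # random feature subset for this stump
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    """Aggregate the stumps' predictions by majority vote."""
    return Counter(predict_stump(s, row) for s in forest).most_common(1)[0][0]

X = [[i, 10 - i] for i in range(10)]  # hypothetical data: both features separate the classes
y = [0] * 5 + [1] * 5
forest = fit_forest(X, y)
print(predict_forest(forest, [0, 10]), predict_forest(forest, [9, 1]))
```

Because each stump sees a different bootstrap sample and feature, the individual learners disagree near the class boundary. The majority vote smooths out those individual errors.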