Amazon Redshift launches RA3 in Hong Kong and China Regions (Beijing, Ningxia)

Amazon Redshift RA3 is now available in the Asia Pacific (Hong Kong) Region, China (Beijing) Region, operated by Sinnet, and China (Ningxia) Region, operated by NWCD. Amazon Redshift RA3 instances with managed storage allow you to scale compute and storage independently for fast query performance and lower costs and also enable you to securely and easily … Read more

Categories: AWS

AWS Elemental MediaPackage extends its metadata passthrough capabilities

AWS Elemental MediaPackage now supports timed ID3 metadata passthrough for live and VOD streams in HLS, CMAF, and DASH formats. ID3 metadata tags enable data to be embedded into video streams at specified timecodes and used by downstream systems or clients to enhance the playback experience. By dynamically adding metadata to a stream, you can … Read more

Categories: AWS

Amazon EKS managed node groups now support parallel node upgrades

Amazon Elastic Kubernetes Service (EKS) managed node groups now support upgrading multiple nodes in parallel. EKS managed node groups help make it easy to run a highly available and secure Kubernetes cluster by automating the provisioning and lifecycle management of worker nodes, eliminating the need to select or configure multiple AWS services to add and update … Read more

Categories: AWS

Implement and Train Text Classification Transformer Models — the Easy Way

Learn how to implement and train text classification Transformer models like BERT, DistilBERT and more with only a few lines of code. Text classification is arguably the most common application of NLP. And, as with most NLP applications, Transformer models have dominated the field in recent years. In this article, we’ll discuss … Read more

AWS Glue Studio now provides data previews during visual job authoring

AWS Glue Studio now allows you to preview your data at each step of the visual job authoring process. AWS Glue Studio automatically samples your data, then runs each transform in your job so you can test and debug your transformations without having to save or run the job. Data previews are available for each … Read more

Categories: AWS

AutoML for time series: advanced approaches with FEDOT framework

An example of using FEDOT and other AutoML libraries on real-world data with gaps and non-stationarity. As we noted in our previous post, most modern open-source AutoML frameworks do not cover time series forecasting tasks extensively. In that post, we have … Read more

Building a VAE Playground with Streamlit

Docker is a containerization tool. It allows for reproducibility by taking a snapshot of the current environment and creating an isolated environment to run the application. This allows us to ship our code and run it on any device that has Docker. Besides containerizing applications, we can even use Docker to create configurable and reproducible developer … Read more

Installing Ubuntu 20.04 LTS and running YOLOv4 and YOLOv5 on it.

To run YOLO on a GPU, we need the appropriate driver. Go to “Software & Updates” and install the driver under “Additional Drivers”. If you face any problems, you can also install the driver from the command line. Follow the instructions given here. If installing from here, choose any driver marked (proprietary, tested). Otherwise if … Read more

Introduction to PostgreSQL: Part 3

Time to insert, update, delete… In Part 2, we learned how to create tables in Postgres. We also looked at how to connect to the database, so see Part 2 if you need a refresher. For this part, we are going to look at how to insert, update, and … Read more

How to Calculate Partial Correlation coefficient in R-Quick Guide

[This article was first published on Methods – finnstats, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t. When we want to find the linear … Read more

Categories: R

All Pandas qcut() you should know for binning numerical data based on sample quantiles

df['age_group'].value_counts().sort_index()

Millennial    2
Gen X         2
Boomer        3
Greatest      5
Name: age_group, dtype: int64

There is an argument called retbins to return the bins. If it’s set to True, the result also returns the bin edges. It’s useful when the 2nd argument q is passed as a single number.

result, bins = pd.qcut(df['age'], 5,  # A single number value
                       retbins=True)
# Print … Read more
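
The retbins behaviour described above can be sketched as follows. This is a minimal illustration, assuming pandas is installed; the `ages` series is made-up sample data, not the article's DataFrame, and four bins are used instead of five purely to keep the output short.

```python
import pandas as pd

# Hypothetical sample data, not taken from the article.
ages = pd.Series([5, 12, 19, 23, 31, 38, 44, 52, 60, 67, 73, 85], name="age")

# q=4 asks for four quantile-based bins of (roughly) equal size;
# retbins=True makes qcut return the computed bin edges as a second value.
binned, bins = pd.qcut(ages, 4, retbins=True)

counts = binned.value_counts().sort_index()
print(counts)  # number of rows that fell into each bin
print(bins)    # the bin edges: one more edge than there are bins
```

Returning the edges is mainly useful when q is a single number, since in that case the edges are computed from the data rather than supplied by the caller.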

Effortless Distributed Training of Ultra-Wide GCNs

Figure 1: A depiction of the training pipeline for GIST. sub-GCNs divides the GCN model into multiple sub-GCNs. Every sub-GCN is trained by subTrain using mini-batches constructed with the Cluster operation. Sub-GCN parameters are intermittently aggregated into the global model through the subAgg operation. [Figure created by author.] In this post, I will overview a … Read more

A Discourse on Reinforcement Learning

[ LET’S KNOW SERIES ] #2, Part I: AN EXPANSIVE SETTING. Co-author(s): Sukrit Shashi Shankar. The narrative presents an expansive setting, with multiple paradigms, related around the theme of Reinforcement Learning (RL). We believe that such a setting may help the reader to perceive a broader view of RL, realizing its underlying assumptions, … Read more

Three Tricks to Speed Up and Optimise Your Python

Data Science Discussions. A review of three Python tricks that I discovered in my June readings. Every data scientist needs to stay up to date: every day they should read, read, and read again. Nobody is born educated! One possible strategy for staying up to date is to register on Twitter and … Read more

Comparing Random Forest and Gradient Boosting

Before we dive into the summary of key differences, let’s do a quick refresher. Depending on how we train and regularise a decision tree, it can range from a shallow, simple, underfitted tree (high bias) to a deep, complex, overfitted tree (high variance). In other words, whether the prediction error is mostly due to bias or variance … Read more

Improving a Visualization

I saw this post on Reddit’s r/dataisbeautiful showing this plot of US streaming services market share, comparing 2020 to 2021, and thought it looked like a good candidate for trying out some plot improvement techniques. Yes, that was a reasonably long while ago; this post has taken quite … Read more

Categories: R

rOpenSci at useR!2021 – Presentations from Staff and Community

Are you putting together your useR!2021 conference schedule this weekend? Four rOpenSci staff and lots of community members are giving presentations and there’s something for everyone! Talks by rOpenSci staff: Jeroen Ooms, Lead Infrastructure Engineer, will give a keynote talk about the R-universe project on Friday, July 9, 12:30PM – 1:30PM UTC / 5:30AM … Read more

Categories: R

OpenAI Launches GitHub Copilot: AI Focused On Code Generation. Should We Be Worried Now?

Should You Be Worried as a Data Scientist? Considering its merits and flaws, it is worth asking whether GitHub Copilot will affect developer jobs in the future. When GPT-3 was released, the answer to this question was a tentative, faint yes. However, now that Copilot is out and will be a commercially available product that integrates … Read more


Opinion. And why it is not Python, nor SQL. The 5 things every data analyst should know. Problem Definition: A bias is an inclination for or against an idea. Most of the time, this is totally unconscious; it takes place mainly when our results are exactly how we expect them to be. We are all … Read more

Why You Failed Your Machine Learning Interview

My experience in the ML interview process. 11 exemplary questions and ways to answer them. Not so long ago I left university with a master’s degree in computer science. And I absolutely knew I had to find a job in the realm of Machine Learning (ML) and … Read more

Five Ways to Get Real-Life Data Science Experience Even If You Have No Experience

When I first started developing machine learning models, I found that my lack of Pandas skills was a big limitation to what I could do. Unfortunately, there aren’t many resources on the internet that allow you to practice your Pandas skills, unlike Python and SQL… A few weeks ago, however, I came across this resource … Read more

Creating a unified analytics platform for digital natives

Digital native companies have no shortage of data, which is often spread across different platforms and Software-as-a-service (SaaS) tools. As an increasing amount of data about the business is collected, democratizing access to this information becomes all the more important. While many tools offer in-application statistics and visualizations, centralizing data sources for cross-platform analytics allows … Read more

Rubin Observatory offers first astronomy research platform in the cloud

This week, the Vera C. Rubin Observatory is launching the first preview of its new Rubin Science Platform (RSP) for an initial cohort of astronomers. The observatory, which is located in Chile but managed by the U.S. National Science Foundation’s NOIRLab in Tucson, AZ and SLAC in California, is jointly funded by the NSF and … Read more

Linking Images and Text with OpenAI CLIP

The following sections explain how to set up CLIP in Google Colab, and how to use CLIP for image and text search. Installation: To use CLIP, we first need to install a set of dependencies. To facilitate this, we are going to install them through Conda. Also, Google Colab will be used to make replication … Read more

Significance or Hypothesis Tests with Python

College Statistics with Python. In a series of weekly articles, I will cover some important statistics topics with a twist. The goal is to use Python to help us get intuition on complex concepts, empirically test theoretical proofs, or build algorithms from scratch. In this series, you will find articles covering topics such as random … Read more

Using Distillation to Protect Your Neural Networks

Like this past post on rethinking regularization, distillation provides flatter local minima. Hence, small changes to the input data are less likely to change the predicted values. Why is that important? Attackers can create adversarial examples. These examples include small changes (e.g. changing a few pixels) to real inputs that result in wrong predictions. This … Read more

Choosing the right machine learning approach for your application

The x-axis represents the size of the data set and the y-axis is the error rate. As the size of the data set increases, the error rate drops. But notice something critical about the size of the data set: the x-axis is 2^20, 2^21, 2^22, etc. In other words, each new tick … Read more

New research: Enterprises more confident than ever in cloud security

This is a clear indication that there are fewer reservations around the efficacy of cloud-based security solutions, signaling an increase in trust as organizations invest in cloud-based infrastructure and solutions. We are committed to safe, secure solutions. Google Cloud protects your data, applications, and infrastructure, as well as your customers, from fraudulent activity, spam, and … Read more

AWS Amplify CLI adds support for storing environment variables and secrets accessed by AWS Lambda functions

Amplify CLI now supports storing environment variables and secrets to be used in AWS Lambda functions to help separate environment-specific configurations from business logic. The AWS Amplify CLI is a command line toolchain that helps frontend developers create app backends in the cloud that often include business logic powered by AWS Lambda functions. Customers use … Read more

Categories: AWS

Unit 3 Application) Evolving Neural Network for Time Series Analysis

Evolutionary Computation Course. The culmination of Unit 3, applying our concepts to evolve a Neural Network for predicting a time series problem. Hello and welcome back to this full course on Evolutionary Computation! In this post we will wrap up Unit 3 with the much anticipated application of evolving the weights of a Neural … Read more

Feature Importance in Random Forest

[This article was first published on DataGeeek, and kindly contributed to R-bloggers]. The Turkish president thinks that high interest rates cause inflation, contrary to the … Read more

Categories: R

Spatially weighted averages in R with sf

Spatial joins allow you to augment one spatial dataset with information from another spatial dataset by linking overlapping features. In this post I will provide an example showing how to augment a dataset containing school locations with socioeconomic data of their surrounding statistical region using R and the package sf (Pebesma 2018). This approach has the … Read more

Categories: R

SAP Data Analytics in the Google Cloud

How to combine SAP with the Google Cloud Platform, a provider of powerful data analytics tools such as BigQuery, Data Studio, and more recently Looker, to gain a powerful data analytics platform and valuable insights. If you are interested in … Read more

Reasonable Vehicles Rule the Road

Researchers at BMW have suggested a novel approach for the progression of autonomous vehicles, which uses RDFox, a semantic reasoning engine, in its knowledge-spatial architecture. Semantic reasoning is the “ability to make logical deductions from the information that is explicitly available”. This approach models the road environment, provides a dynamic update mechanism, and has semantic … Read more

Use New Relic One to effortlessly monitor applications in Azure Spring Cloud

Today, we are announcing the integration of New Relic One performance monitoring in Azure Spring Cloud. Over the past 18 months, we worked with many enterprise customers to learn about their scenarios. Many of these customers have thousands of Spring Boot applications running in on-premises data centers. As they migrate these applications to the cloud, … Read more

Basic Concepts of Natural Language Processing (NLP) Models and Python Implementation

As mentioned above, data cleaning is a basic but very important step in NLP. Below are a few ways to clean data. Let’s consider the line below. line = 'Reaching out for HELP. Please meet me in LONDON at 6 a.m #urgent' 1. Remove stopwords: There are a few words which are very commonly used … Read more
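
As a rough illustration of the stopword-removal step, here is a minimal sketch that uses only the Python standard library. The `STOPWORDS` set and the `remove_stopwords` helper are hypothetical names introduced for this example; the article may well rely on a library-provided stopword list instead.

```python
# Hypothetical, deliberately tiny stopword list chosen for this example only.
STOPWORDS = {"out", "for", "me", "in", "at"}


def remove_stopwords(text, stopwords=STOPWORDS):
    """Return text with whitespace-separated tokens in `stopwords` removed.

    Matching is case-insensitive; punctuation is left attached to tokens.
    """
    kept = [token for token in text.split() if token.lower() not in stopwords]
    return " ".join(kept)


line = 'Reaching out for HELP. Please meet me in LONDON at 6 a.m #urgent'
cleaned = remove_stopwords(line)
print(cleaned)  # Reaching HELP. Please meet LONDON 6 a.m #urgent
```

Splitting on whitespace keeps the sketch short; a real pipeline would usually tokenize more carefully so that punctuation such as the period in "HELP." does not stay glued to the word.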

Smartphone for Activity Recognition (Part 2)

In the previous article, we performed classification on the Human Activity Recognition dataset. We know that this dataset has many features (561 to be exact) and that some of them strongly correlate with each other. A Random Forest model can classify human activities with accuracy as high as 94% using this dataset. However, it takes forever … Read more

Word, Subword, and Character-Based Tokenization: Know the Difference

The differences that anyone working on an NLP project should know. Natural Language Processing (NLP) is a branch of Artificial Intelligence (AI) that provides machines (computers) the ability to understand written and spoken human language in the same way as human beings. NLP is almost everywhere and helping people … Read more

Looking for the perfect words? Generate them.

Using an LSTM model to generate poems. Ever try to find the right words to describe how you are feeling? Want to profess your love to your significant other in a poem? Why don’t you just generate one based on thousands of well-known poems? This is what we attempted … Read more

Access Google Drive Using Google Colab Running an R Kernel

This approach builds upon that of a previous Medium article, attached here. It served as an inspiration and initial guide; however, for me, it threw errors, which seem to be Colab versioning artifacts. After working out a few kinks, I was able to successfully access my Google Drive file storage from a Google Colaboratory notebook … Read more

Think of `&&` as a stricter `&`

[This article was first published on Higher Order Functions, and kindly contributed to R-bloggers]. In programming languages, we find logical operators for and and or. … Read more

Categories: R

How to find z score in R-Easy Calculation-Quick Guide

[This article was first published on Methods – finnstats, and kindly contributed to R-bloggers]. A z-score provides how many standard deviations … Read more
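
The idea of measuring a value in standard deviations can be sketched in a few lines. The example below is in Python rather than R and uses only the standard library; the `z_scores` helper and the sample values are hypothetical, and it assumes the population standard deviation (R's `sd()` uses the sample version, which would give slightly different numbers).

```python
import statistics


def z_scores(values):
    """Return (x - mean) / sd for each x, using the population standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # assumption: population SD, not sample SD
    return [(x - mean) / sd for x in values]


# Hypothetical sample data with mean 5 and population SD 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]
scores = z_scores(data)
print(scores[-1])  # 2.0, i.e. the value 9 lies two standard deviations above the mean
```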

Categories: R

Analysis of the Indian education system

Different factors in schooling. Observations: more than 50% of states have at least 92% of schools with drinking water; many more schools have drinking water than electricity; a very low percentage of schools have computer facilities; a higher percentage of schools have girls’ toilets than boys’ toilets. Comparison of different factors among … Read more