Effectively Deploying and Scaling Shiny Apps with ShinyProxy, Traefik and Docker Swarm

[This article was first published on R | databentobox, and kindly contributed to R-bloggers.] If you search for R Shiny apps …

Superspreading and the Gini Coefficient

Abstract: We look at superspreading in infectious disease transmission from a statistical point of view. We characterise heterogeneity in the offspring distribution by the Gini coefficient instead of the usual dispersion parameter of the negative binomial distribution. This allows us to consider more flexible offspring distributions. This work is licensed under a Creative Commons Attribution-ShareAlike …
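
A minimal sketch of the quantity the abstract refers to, assuming the standard mean-absolute-difference definition of the Gini coefficient, G = sum_i sum_j |x_i - x_j| / (2 n^2 mean(x)), applied to a sample of per-case offspring counts (secondary cases caused by each infected individual). The paper's own estimator may differ. A value of 0 means every case infects the same number of people; values near 1 mean a few superspreaders account for most transmission. The function name `gini` and the sample counts are illustrative only.

```python
def gini(counts):
    """Gini coefficient of non-negative offspring counts.

    Uses the mean-absolute-difference form:
        G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean)
    and since n * mean == sum(counts), the denominator is 2 * n * total.
    """
    n = len(counts)
    if n == 0:
        raise ValueError("need at least one observation")
    if any(c < 0 for c in counts):
        raise ValueError("offspring counts must be non-negative")
    total = sum(counts)
    if total == 0:
        # No onward transmission at all: treat as perfectly equal.
        return 0.0
    # O(n^2) pairwise form, kept for clarity rather than speed.
    abs_diff = sum(abs(a - b) for a in counts for b in counts)
    return abs_diff / (2 * n * total)
```

For example, `gini([0, 0, 0, 4])` returns 0.75: one case in four produces all of the offspring. For n observations the largest attainable value is (n - 1)/n, so small samples cannot reach 1.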

Mimic Excel’s Conditional Formatting in R

[This article was first published on triKnowBits, and kindly contributed to R-bloggers.] The DT package is an interface between R and the JavaScript DataTables library …

AWS Storage Gateway consolidates alarms and metrics for simplified console monitoring and management

With this launch, you can monitor CloudWatch Alarms for gateway metrics such as cache status, disk performance, and health notifications for VMware vSphere high availability in the Storage Gateway console. You can also choose to receive alarm notifications by subscribing an alarm to an Amazon SNS topic from the Amazon CloudWatch console. Storage Gateway is a …

Two Different Methods to Apply Some Corey Hoffstein Analysis to your TAA

So, first off: I just finished a Thinkful data science bootcamp in Python in about four months, although it was supposed to take six. I applied all of my capstone projects to volatility trading; long story short, none of the ML techniques worked, and the more complex the technique I tried, the worse it …

AWS Marketplace enables SaaS contract upgrades and renewals

Sellers, whether an independent software vendor (ISV) or consulting partner, can use self-service functionality within Seller Private Offers to easily create an upgrade or renewal offer at any time during a buyer’s active agreement. When creating an upgrade, sellers can grant new entitlements, apply pricing discounts, update payment schedules, or change an end user license …

Performance tuning best practices for Memorystore for Redis (by a Strategic Cloud Engineer, Google Cloud)

Run benchmark: Now that you have your data loaded and your command chosen, you can run the benchmark test. Adjust the number of YCSB processes and instances according to the load you want to generate. To identify performance bottlenecks, you need to look at multiple metrics. Here are the typical indicators to investigate: Latency: YCSB outputs latency …

Confusion Matrix for Your Multi-Class Machine Learning Model

A confusion matrix is a tabular way of visualizing the performance of your prediction model. Each entry in a confusion matrix counts the predictions the model made for one pair of true and predicted classes, so the matrix shows where the model classified each class correctly or incorrectly. Anyone who is already familiar with the confusion matrix knows that most of the time …
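
To make the structure concrete, here is a small sketch in Python (our choice; the article's own code is not reproduced here) that builds a multi-class confusion matrix with rows as true classes and columns as predicted classes. This is the same convention scikit-learn's `confusion_matrix` uses; some texts transpose it. Correct predictions land on the diagonal and every off-diagonal entry is a misclassification. The function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels=None):
    """Return (labels, matrix) where matrix[i][j] counts items whose true
    class is labels[i] and whose predicted class is labels[j]."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        # Every class seen in either list, in sorted order.
        labels = sorted(set(y_true) | set(y_pred))
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return labels, matrix


def accuracy(matrix):
    """Share of predictions on the diagonal (i.e. classified correctly)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total if total else 0.0
```

With `y_true = ["cat", "dog", "bird", "cat", "dog"]` and `y_pred = ["cat", "cat", "bird", "cat", "dog"]`, the labels sort to `["bird", "cat", "dog"]` and the matrix is `[[1, 0, 0], [0, 2, 0], [0, 1, 1]]`: the single off-diagonal entry is a dog predicted as a cat, and the accuracy is 4/5 = 0.8.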

Why building a data science solution is complex but not in the way you think

It is not all about the tech and the algorithms. Data science is complicated, getting value from data is hard, and most data projects fail. We felt all of these pains when we tried to deliver a data science solution. After talking to different people in the industry, the most common challenge faced by …

Several Different Ways to Combine Datasets in SAS

Sometimes you need to combine observations from two or more data sets into a single observation in a new data set according to the values of a common variable. This is called match-merging. Generally speaking, during match-merging, SAS sequentially checks each observation of each data set to see whether the BY values match, and then …
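
The sequential comparison of BY values described above works like the merge step of merge sort. The sketch below is Python rather than SAS and models only the simplest case, which is an assumption: both inputs are already sorted by the BY variable and each BY value is unique within each input (a one-to-one merge). Unmatched records are kept with `None` standing in for SAS missing values, and when both inputs share a column name the right-hand value wins, as the later data set does in a SAS MERGE. SAS's one-to-many behavior (retaining values across a BY group) is not modeled, and `match_merge` is a hypothetical name.

```python
def match_merge(left, right, by):
    """One-to-one match-merge of two lists of dicts, both sorted by key `by`.

    Walks both inputs sequentially: when the BY values match, the records
    are combined; otherwise the record with the smaller BY value is emitted
    alone. Every BY value from either input appears once in the output.
    """
    # Union of all column names, in first-seen order.
    columns = []
    for rec in left + right:
        for col in rec:
            if col not in columns:
                columns.append(col)
    blank = dict.fromkeys(columns)  # None stands in for SAS missing values

    out = []
    i = j = 0
    while i < len(left) or j < len(right):
        if j >= len(right) or (i < len(left) and left[i][by] < right[j][by]):
            out.append({**blank, **left[i]})  # left-only BY value
            i += 1
        elif i >= len(left) or right[j][by] < left[i][by]:
            out.append({**blank, **right[j]})  # right-only BY value
            j += 1
        else:
            # BY values match: right-hand values win on shared column names.
            out.append({**blank, **left[i], **right[j]})
            i += 1
            j += 1
    return out
```

For example, merging `[{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}]` with `[{"id": 2, "score": 90}, {"id": 3, "score": 75}]` by `"id"` yields three records: id 1 with a missing score, id 2 with both fields, and id 3 with a missing name.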

AdaOpt classification on MNIST handwritten digits (without preprocessing)

Last week on this blog, I presented AdaOpt for R, applied to classification of the iris dataset. The week before that, I introduced AdaOpt for Python. AdaOpt is a novel probabilistic classifier based on a mix of multivariable optimization and a nearest neighbors algorithm. More details about the algorithm can be found in this (short) paper. …

pins 0.4: Versioning

A new version of pins is available on CRAN today, which adds support for versioning your datasets and DigitalOcean Spaces boards! As a quick recap, the pins package allows you to cache, discover and share resources. You can use pins in a wide range of situations, from downloading a dataset from a URL to creating …

Mad methods

[This article was first published on R on OSM, and kindly contributed to R-bloggers.] Over the past few weeks, we’ve examined the three major methods …

The robustness of Machine Learning algorithms against missing or abnormal values

The principle behind that test is very simple and is the reverse of the usual process: normally, a sane data scientist would try to impute the missing or abnormal values to improve the model’s accuracy; I chose to do it the other way around! I would start by evaluating the performance of classic machine learning …

Amazon Elasticsearch Service now supports three Availability Zone deployments in Asia Pacific (Mumbai) Region

Amazon Elasticsearch Service now enables you to deploy your instances across three Availability Zones (AZs), providing better availability for your domains. If you enable replicas for your Elasticsearch indices, Amazon Elasticsearch Service distributes the primary and replica shards across nodes in different AZs to maximize availability.

Google Cloud adds smart analytics frameworks for AI Platform Notebooks (by Product Managers, Google Cloud)

Google Cloud is announcing the beta release of smart analytics frameworks for AI Platform Notebooks. Smart analytics frameworks bring the model training and deployment offered by AI Platform closer to the ingestion, preprocessing, and exploration capabilities of our smart analytics platform. With smart analytics frameworks for AI Platform Notebooks, you can run petabyte-scale SQL queries …

Helping veterans build a career path with the Google Cloud certification challenge (by a Certification Operations Manager)

Each year about 200,000 veterans transition out of service but, despite being well-equipped to work in the tech sector, many of these skilled veterans don’t have a clear career path. That’s why Google Cloud is partnering with VetsInTech to help U.S. veterans develop in-demand cloud technology skills through a Google Cloud certification challenge. This six- …

Combining the power of Apache Spark and AI Platform Notebooks with Dataproc Hub (by a Solutions Architect)

7. This should open a page that either shows you a configuration form or redirects you to the JupyterLab interface. If this works, note the URL of the page you opened.
8. Share the URL with the group of data scientists for whom you created the Dataproc Hub instance. Dataproc Hub …

Meeting reliability challenges with SRE principles (by a Site Reliability Engineer)

You’ve built a beautiful, reliable service, and your users love it. After the initial rush from launch is over, realization dawns that this service not only needs to be run, but run by you! At Google, we follow site reliability engineering (SRE) principles to keep services running and users happy. Through years of work using …

RStudio Shortcuts and Tips

Updated: May 2020 by Appsilon Data Science. How to Work Faster in RStudio: In this article, we have compiled many of our favorite RStudio keyboard shortcuts, tips, and tricks to help increase your productivity while working with the RStudio IDE. We’ll also provide information about supplemental tools and techniques that are useful for data scientists …

Optimize for internet traffic with Peering Service and the routing preference option

Last week at the Microsoft Build conference, we announced that Azure Peering Service is now generally available. We also introduced “routing preference,” a new option for our customers to further architect and optimize their traffic to and from Azure over the “public Internet.” Networking is a critical enabler of the cloud. The experience when accessing …

How to Safely Remove a Dynamic Shiny Module

Despite their advantages, dynamic Shiny modules can destabilize the Shiny environment and cause its reactive graph to be rendered multiple times. In this blog post, I show how to remove the leftovers of deleted modules and make sure that the observers in your Shiny reactive graph are rendered just once. While working with advanced Shiny applications, you have most likely encountered …

How Different Metrics Correlate with Winning in the NBA over 30 Years

Which metrics lead to the winningest NBA basketball teams? How has each metric’s correlation with winning changed over the last 30 years? The game has changed tremendously over the past decades and even in the past few seasons, trending toward analytics-driven basketball, with an emphasis on long-distance shooting and faster pacing. This is the …

Attack Pattern Detection and Prediction

Cyber-adversaries are becoming more sophisticated in their efforts to avoid detection, and many modern malware tools are already incorporating new ways to bypass antivirus and other threat detection measures. Because networks and organizations use sophisticated methods to detect and respond to attacks, the response can be so strong that criminals try to respond with something …

The impact of rules on queries

Realising the knowledge in your data. An application’s logic for processing and manipulating data is typically controlled by an application or logic layer that sits between the database and the presentation layer. This layer formulates the requests, which must then comply with the database’s structure. The following diagram represents the classic three-tier architecture on which much …

PCA, Eigenvectors and the Covariance Matrix

Almost every data science course will, at some point (usually sooner rather than later), cover PCA, i.e. Principal Component Analysis. PCA is an important tool used in exploratory data analysis for dimensionality reduction. In this post I want to show you (hopefully in an intuitive way) how PCA works its mathematical magic. Let’s start with a short …
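
To illustrate the eigenvector view of PCA, here is a minimal Python sketch restricted to two-dimensional data, where the eigen-decomposition of the 2x2 sample covariance matrix has a closed form and no linear-algebra library is needed. The first principal component is the unit eigenvector with the largest eigenvalue, and each eigenvalue equals the variance of the data projected onto its eigenvector. The function name `pca_2d` is hypothetical, and this is not the post's own code; for more than two dimensions you would use a general eigen-solver instead.

```python
import math


def pca_2d(points):
    """PCA for 2-D points via the eigen-decomposition of the sample covariance.

    Returns (pc1, (lam1, lam2), explained_ratio, scores), where pc1 is the
    unit eigenvector for the largest eigenvalue lam1 and scores are the
    centered points projected onto pc1.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]

    # Sample covariance matrix [[a, b], [b, c]] (divide by n - 1).
    a = sum(dx * dx for dx, _ in centered) / (n - 1)
    c = sum(dy * dy for _, dy in centered) / (n - 1)
    b = sum(dx * dy for dx, dy in centered) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix, largest first.
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + radius, half_trace - radius

    # (lam1 - c, b) solves (A - lam1 * I) v = 0 whenever b != 0;
    # otherwise the matrix is diagonal and an axis is the eigenvector.
    if b != 0:
        vx, vy = lam1 - c, b
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    pc1 = (vx / norm, vy / norm)

    scores = [dx * pc1[0] + dy * pc1[1] for dx, dy in centered]
    explained = lam1 / (lam1 + lam2) if lam1 + lam2 > 0 else 0.0
    return pc1, (lam1, lam2), explained, scores
```

For the points (0, 0), (1, 1), (2, 2), which lie on the line y = x, the covariance matrix is [[1, 1], [1, 1]], the eigenvalues are 2 and 0, the first component is (1/√2, 1/√2), and it explains all of the variance. Eigenvectors are only defined up to sign, so other tools may report (-1/√2, -1/√2) instead.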

Announcing General Availability of Amplify iOS and Amplify Android, with new authentication, data, and AI/ML support

Compared to the previous AWS Mobile SDKs for iOS and Android, the Amplify iOS and Android libraries are organized by use case and provide declarative programming interfaces that enable mobile developers to easily add capabilities such as Authentication, Analytics, Predictions (for common AI/ML use cases), API (GraphQL and REST), DataStore (for offline and real-time data), …

Critique of “Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period” — Part 1: Reproducing the results

I’ve been looking at the following paper, by researchers at Harvard’s school of public health, which was recently published in Science: Kissler, Tedijanto, Goldstein, Grad, and Lipsitch (2020) Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period (also available here, with supplemental materials here). This is one of the papers referenced in my recent …

How Kaggle solved a spam problem in 8 days using AutoML (by a Staff Developer Advocate and Head of Competitions, Kaggle)

Kaggle is a data science community of nearly 5 million users. In September of 2019, we found ourselves under a sudden siege of spam traffic that threatened to overwhelm visitors to our site. We had to come up with an effective solution, fast. Using AutoML Natural Language on Google Cloud, Kaggle was able to train, test, and …

Sentiment Analysis Using VADER

Interpretation and classification of emotions. Sentiment analysis is a text analysis method that detects polarity (e.g. a positive or negative opinion) within a text, whether a whole document, paragraph, sentence, or clause. Sentiment analysis aims to measure the attitudes, sentiments, evaluations, and emotions of a speaker or writer based on the computational treatment of subjectivity in …

Microsoft and Docker collaborate on new ways to deploy containers on Azure

Now more than ever, developers need agility to meet rapidly increasing demands from customers. Containerization is one key way to increase agility. Containerized applications are built in a more consistent and repeatable way, by way of defining desired infrastructure, dependencies, and configuration as code for all stages of the lifecycle. Applications often start and stop …

Reasons why data science projects are not always successful – Part 1

Data science is one of the most wide-ranging disciplines of the 21st century. Data scientists use a wide variety of methods and tools to generate more knowledge from data and its analysis. Especially in times like today, data and the insights we can draw from them are becoming increasingly important. Almost every business process generates …

The Azure SQL family: Innovation and value in the cloud

How businesses respond in times of uncertainty is as varied as the businesses themselves. Many slow down operations to run more cost-effectively, while others lean into new opportunities that didn’t exist before. Regardless of how you respond, ensuring your organization can cost-effectively adapt and scale to rapidly changing conditions is key. When it comes to …

Streamlining your image building process with Azure Image Builder

Customizing virtual machine (VM) images to meet security and compliance requirements and achieve faster deployment is a strong need for many enterprises, but most don’t enjoy the effort of determining the right tooling, building the right pipeline, and maintaining it continuously. We built the Azure Image Builder service to make building customized images …

Deploy to Azure using GitHub Actions from your favorite tools

Enterprises and teams are adopting DevOps technologies combined with people and processes to deliver high-quality code, with faster release cycles and continuous delivery of value, to achieve higher levels of satisfaction for their own customers. However, it can often be difficult to craft CI/CD pipelines by editing multiple YAML files to stitch your code to cloud automation workflows. …

Sentiment Analysis: VADER or TextBlob?

Both libraries output relatively similar results; however, VADER appears to pick up more of the negative tone in the IMDB review, which TextBlob missed. Both libraries are also highly extensible to many other natural language processing tasks, such as: Part-of-speech tagging: the process of converting a sentence to …

PCA and UMAP with tidymodels and #TidyTuesday cocktail recipes

[This article was first published on rstats | Julia Silge, and kindly contributed to R-bloggers.] Lately I’ve been publishing screencasts demonstrating how to use the tidymodels framework, …

Tidymodels and XGBoost: a few learnings

This post will look at how to fit an XGBoost model using the tidymodels framework rather than using the XGBoost package directly. Tidymodels is a collection of packages that aims to standardise model creation by providing commands that can be applied across different R packages. For example, once the code is written to fit an …