What We Love About Prefect

It’s open source, plays nicely with Kubernetes, and there’s a great community. You can read more about why we were less than impressed with Kubeflow in our previous post. In short, we’re building a reference architecture for machine learning projects. When it’s done, it’ll be a collection of our favorite machine learning tools, … Read more

Demystifying cloud economics

Migrating to the cloud is an evolution, and it’s important to think differently about how you consume resources. As you’re building a business case in your organization, it’s critical to step back and understand the cloud’s key constructs and transform your mindset. It starts by having a conversation about today versus tomorrow and what is possible in … Read more

E-commerce on Azure increases security with Payment Card Industry Three-Domain Secure compliance

More customers than ever are shopping from home in the current health environment, and companies are responding by rapidly deploying cloud-based e-commerce solutions. Azure is helping these companies meet their customers’ needs with robust, customizable, and scalable e-commerce solutions that process transactions quickly and securely.  Security is paramount for both e-commerce providers and customers, and … Read more

C++ Foundations

When I left my old job at Austrian Post, my colleagues got me an amazing parting gift: an Amazon Deep Racer 🙂 While it is quite easy to work with Deep Racer using AWS Sagemaker, I wanted to program Deep Racer directly. Deep Racer runs on Ubuntu Linux with the Robot Operating System (ROS). ROS is … Read more

Simulate your Trading Strategy with Python

I’m typically interested in comparing weekly (5-day) vs monthly (20-day) investment intervals. The dataset is pulled via the Python package yahoofinancials. Backtesting is done by randomly sampling historical prices over the 5-year period since 2016 to construct portfolios. Before getting into the simulation, below are some of the assumptions embedded in the model: Same starting time … Read more

5 Reasons You Should Learn Shiny

[This article was first published on business-science.io, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t. Many data scientists struggle with … Read more

Categories R Tags ExcerptFavorite

Benstats Talks #1: A presentation on CNNs

[This article was first published on r – bensstats, and kindly contributed to R-bloggers]. It’s been a while since I last posted anything on my … Read more

Excess Deaths February Update

[This article was first published on R on kieranhealy.org, and kindly contributed to R-bloggers]. The CDC continues to update its counts of deaths by cause … Read more

Mesmerize Your Readers With Animated Graphs & GIFs in R

[This article was first published on Dylan Anderson, and kindly contributed to R-bloggers]. In my last post, I demonstrated how you can create graphs and … Read more

AWS Network Firewall Deployment Automations for AWS Transit Gateway is Generally Available

We’re excited to announce the launch of AWS Network Firewall Deployment Automations for AWS Transit Gateway, a reference implementation to help customers deploy and configure the AWS resources needed to inspect and filter VPC-to-VPC (East-West) traffic. AWS Network Firewall gives customers granular visibility and control of their network traffic, allowing customers to accomplish network segmentation, … Read more

Categories AWS ExcerptFavorite

folded Normals

[This article was first published on R – Xi’an’s Og, and kindly contributed to R-bloggers]. While having breakfast (after an early morn swim at the … Read more

Three NLP Decoding Methods

From greedy to beam search. One of the often-overlooked parts of sequence generation in natural language processing (NLP) is how we select our output tokens, otherwise known as decoding. You may be thinking: we select a token/word/character based on the probability of each token assigned by our … Read more
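
To make the contrast in the subtitle concrete, here is a minimal sketch of greedy decoding next to beam search. The excerpt stops before the article's own code, so everything below is an assumption for illustration: `TOY_MODEL` is a hypothetical lookup table standing in for a real language model, and the function names are invented.

```python
import math

# Hypothetical toy "model": maps a prefix (tuple of tokens) to
# next-token probabilities. A real model would compute these.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}


def next_token_probs(prefix):
    """Return the next-token distribution for a prefix."""
    return TOY_MODEL[tuple(prefix)]


def greedy_decode(steps):
    """Pick the single most probable token at every step."""
    seq = []
    for _ in range(steps):
        probs = next_token_probs(seq)
        seq.append(max(probs, key=probs.get))
    return seq


def beam_search(steps, beam_width):
    """Keep the `beam_width` best partial sequences by cumulative log-probability."""
    beams = [(0.0, [])]  # (cumulative log-probability, tokens so far)
    for _ in range(steps):
        candidates = []
        for score, seq in beams:
            for token, p in next_token_probs(seq).items():
                candidates.append((score + math.log(p), seq + [token]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][1]


greedy = greedy_decode(2)
beam = beam_search(2, beam_width=2)
print(greedy, beam)
```

In this toy table, greedy decoding commits to "a" first (0.6) and ends with a sequence of probability 0.33, while a beam of width 2 keeps "b" alive and finds the sequence with probability 0.36. That is the usual argument for tracking more than one hypothesis.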

Chatbot: Complete Pycharm App

Chatterbot, Django, Python and Pycharm all unified in this ready-to-go chatbot app. Motivation: Are you looking for a completely ready-to-go chatbot that you can easily adapt to your needs? Look no further if you are willing to use Python, Pycharm, Django and Chatterbot all combined. On top of that, … Read more

Automating quota management with Azure Quota REST API

Enterprises are increasingly defined by the applications they use and build to run their core business processes, including the customer experiences they provide. Across all sectors, we see how companies like challenger banks, online healthcare providers, e-commerce providers, and other startups are winning customers by providing new applications. The continuous need to innovate and deliver … Read more

Multi-Agent Deep Reinforcement Learning in 13 Lines of Code Using PettingZoo

A tutorial on multi-agent deep reinforcement learning for beginners. This tutorial provides a simple introduction to multi-agent reinforcement learning, assuming a little experience in machine learning and knowledge of Python. A Brief Introduction to Reinforcement Learning: Reinforcement learning stems from using machine learning to optimally control an agent in an environment. It works by learning … Read more

Data Visualization with Pandas

It is more than just plain numbers. Pandas is arguably the most popular data analysis and manipulation library. It makes it extremely easy to manipulate data in tabular form. The various functions of Pandas constitute a powerful and versatile data analysis tool. Data visualization is an essential part of … Read more

Add Shiny to Rmarkdown

[This article was first published on business-science.io, and kindly contributed to R-bloggers]. This article is part of R-Tips Weekly, a weekly video tutorial that … Read more

Run data science at scale with Dataproc and Apache Spark

Dataproc Hub feature is now generally available: Secure and scale open source machine learning. Dataproc Hub, a feature now generally available for Dataproc users, provides an easier way to scale processing for common data science libraries and notebooks, govern custom open source clusters, and manage costs so that enterprises can maximize their existing skills and … Read more

Liquid Neural Networks in Computer Vision

In this post, we will discuss the new liquid neural networks and what they might mean for the vision field. Excitement is building in the artificial intelligence community around MIT’s recent release of liquid neural networks. The breakthroughs that Hasani and team have made are incredible. Let’s dive in. … Read more

Quantifying changes of spatial patterns

TL;DR: Quantifying changes of spatial patterns requires two datasets for the same variable in the same area. Both datasets are divided into many sub-areas, and spatial signatures are derived for each sub-area for each dataset. Next, the distance between the two signatures of each sub-area is calculated. Sub-areas with the largest distances represent the largest change. To reproduce … Read more
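
The workflow summarized above can be sketched in a few lines. This is an illustration under stated assumptions, not the original post's R code: the signature used here is simply the share of each category in a square block, the distance is Euclidean, and the grids, category letters, and function names are all hypothetical.

```python
import math
from collections import Counter


def signature(grid, r0, c0, size, categories):
    """Share of each category inside one square sub-area (a simple, assumed signature)."""
    cells = [grid[r][c] for r in range(r0, r0 + size) for c in range(c0, c0 + size)]
    counts = Counter(cells)
    return [counts[k] / len(cells) for k in categories]


def change_map(old, new, size):
    """Distance between the old and new signature of every sub-area.

    Assumes both grids have the same shape and that the shape is a
    multiple of `size`.
    """
    categories = sorted({v for g in (old, new) for row in g for v in row})
    result = {}
    for r0 in range(0, len(old), size):
        for c0 in range(0, len(old[0]), size):
            a = signature(old, r0, c0, size, categories)
            b = signature(new, r0, c0, size, categories)
            result[(r0, c0)] = math.dist(a, b)  # Euclidean distance (assumed choice)
    return result


# Hypothetical land-cover grids: F = forest, W = water, U = urban.
old = [list("FFFF"), list("FFFF"), list("WWFF"), list("WWFF")]
new = [list("UUFF"), list("UFFF"), list("WWFF"), list("WWFF")]

changes = change_map(old, new, size=2)
most_changed = max(changes, key=changes.get)
print(most_changed, round(changes[most_changed], 3))
```

Only the top-left block differs between the two grids, so it is the one with the largest distance. Richer signatures or other distance measures would slot into `signature` and `change_map` without changing the overall flow.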

Getting started with k-means and #TidyTuesday employment status

[This article was first published on rstats | Julia Silge, and kindly contributed to R-bloggers]. This is the latest in my series of screencasts demonstrating … Read more

January 2020: “Top 40” New CRAN Packages

[This article was first published on R Views, and kindly contributed to R-bloggers]. Two hundred thirty new packages made it to CRAN in January. Here … Read more

External Graphics with knitr

This is part three of our four-part series on {knitr} and {rmarkdown}. Part 1: Specifying the correct figure dimension in {knitr}. Part 2: What image format should you use for graphics. Part 3: Including external graphics in your document (this post). Part 4: Optimal {knitr} settings. In this third post, we’ll look at including … Read more

Essential Math for Data Science: Eigenvectors and application to PCA

Understand eigenvectors and eigenvalues and how they relate to Principal Component Analysis (PCA). Matrix decomposition, also called matrix factorization, is the process of splitting a matrix into multiple pieces. In the context of data science, you can for instance use it to select parts of the data, … Read more
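
As a rough illustration of how eigenvectors connect to PCA: the first principal component of a dataset is the eigenvector of its covariance matrix with the largest eigenvalue. The sketch below finds that eigenvector with power iteration in plain Python. Power iteration is a technique swapped in here for brevity, not necessarily the one the article uses, and the data and function names are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length observations."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def dominant_eigenpair(matrix, iterations=200):
    """Approximate the largest eigenvalue and its unit eigenvector by power iteration."""
    d = len(matrix)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v gives the eigenvalue for a unit vector v.
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    return eigenvalue, v


# Hypothetical data lying exactly on the line y = 2x.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
cov = covariance_matrix(data)
eigenvalue, direction = dominant_eigenpair(cov)
print(eigenvalue, direction)
```

Because the points sit on a single line, all of the variance falls along one direction, and the recovered eigenvector points along that line. In practice a linear algebra library would compute the full decomposition.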

Introducing our new Solutions Training for Partners: Sales Best Practices courses

AWS Training and Certification is excited to launch the Solutions Training for Partners: Sales Best Practices digital training series. These 30-minute fundamental courses are designed to introduce AWS Partners to cloud concepts and relevant AWS solutions for services that customers leverage most to innovate and transform their business. AWS Solutions Training for Partners: Analytics on … Read more

Detecting Beeps from Brains

To attempt supervised learning, I had to confront the problems of heterogeneity and the small dataset. Because of these problems, directly applying supervised learning to whole sessions produced poor results. To overcome them, I decided to look only at small time-windows or “snippets” and predict whether a beep was occurring. The price for overcoming heterogeneity … Read more

Gradio vs Streamlit vs Dash vs Flask

Machine learning models are exciting and powerful, but they aren’t very useful by themselves. Once a model is complete, it likely has to be deployed before it can deliver any sort of value. As well, being able to deploy a preliminary model or a prototype to get feedback from other stakeholders is extremely useful. Recently, … Read more

Accelerating Department of Defense mission workloads with Azure

As the Azure engineering team continues to deliver a rapid pace of innovation for defense customers, we’re also continuing to support Department of Defense (DoD) customers and partners in delivering new capabilities to serve mission needs. In many cases, accelerating mission workloads means forging a faster and more secure way for teams to build, ship, … Read more

Migrate SAP systems faster with Cloud Move for Azure by SNP

This blog post has been co-authored by Hiren Shah, Principal PM Manager, SAP on Azure. SAP solutions power the digital enterprise core of many enterprises. As part of their digital transformation journeys, organizations are looking to migrate their mission-critical SAP workloads to Microsoft Azure to take advantage of hyperscale cloud, agility, and business continuity. Migrating mission-critical … Read more

Empirical Economics with R (Part D): Instrumental Variable Estimation and Potential Outcomes

Chapter 5 of my course Empirical Economics with R covers instrumental variable (IV) estimation. Although IV estimation is one of the most popular methods in academic economics papers for estimating causal effects (see e.g. the statistics here), I was not sure whether to introduce it in this Bachelor-level course. My hesitation was due to the … Read more

Complete the Introduction to R Course for Free until March 7

[This article was first published on Quantargo Blog, and kindly contributed to R-bloggers]. Complete the Introduction to R Course for Free until March 7 To … Read more

Kustomize Best Practices

[This article was first published on Open Analytics, and kindly contributed to R-bloggers]. In recent years, Kubernetes has become a renowned solution for orchestrating … Read more

Machine Learning ‘on the rocks’ [Whiskey Dataset] | by Gerasimos Plegas

The Blended Malt takes the lead in rating, with the simple Blended coming next: the former’s Mean is higher by 0.23 points (88.11 vs 87.88). It is noteworthy that the Blended Malt’s Median is well above its Mean, hence more than 50% of the bottles are rated above the average (88%). This is a decent … Read more

All Pandas json_normalize() you should know for flattening JSON

All nested lists are put into a single column, students, and other values are flattened. To flatten the nested list, we can set the argument record_path to ['students']. Notice that not all records have math and physics, and those missing values are shown as NaN. pd.json_normalize(json_list, record_path=['students']) If you would like … Read more
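
A small runnable version of the call above may help. The article's own `json_list` is not shown in this excerpt, so the records below are hypothetical stand-ins shaped to match the description (a nested `students` list in which some entries lack `math` or `physics`). The `class` key and the use of `meta` are likewise additions for illustration.

```python
import pandas as pd

# Hypothetical data; the field names are illustrative, not the article's dataset.
json_list = [
    {
        "class": "Year 1",
        "students": [
            {"name": "Tom", "math": 60},
            {"name": "James", "physics": 70},
        ],
    },
    {
        "class": "Year 2",
        "students": [{"name": "Tony", "math": 80, "physics": 75}],
    },
]

# One row per entry of the nested "students" list; missing keys become NaN.
flat = pd.json_normalize(json_list, record_path=["students"])

# `meta` carries a top-level field along with each flattened row.
flat_with_class = pd.json_normalize(json_list, record_path=["students"], meta=["class"])

print(flat)
print(flat_with_class)
```

Without `meta`, the top-level fields are dropped from the result, which is often the first surprise when using `record_path`.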

At your service! With schedule-based autoscaling, VMs are at the ready

We believe that managing even the most demanding VM-based application in Google Cloud should be easy. For instance, in a Compute Engine environment, managed instance groups (MIGs) offer autoscaling that lets you automatically change an instance group’s capacity based on current load, so you can rightsize your environment—and your costs. Autoscaling adds more virtual machines … Read more

How to Build a Custom Machine Learning Model

Implementing a custom ensemble model with under-sampling for imbalanced data. This post will show you how to implement your own model and make it compliant with scikit-learn’s API. The final result will be a model that can not only be fitted and used for predictions but also be used in combination with other scikit-learn tools … Read more

Amazon Connect now provides disconnect reason for Voice Calls & Tasks

The Amazon Connect CTR (contact trace records) stream now includes the disconnect reason for Voice calls & Tasks. This indicates whether an agent or customer disconnected the call, whether a telecom or network issue caused the call to disconnect, or whether a task was completed by an agent or flow. Disconnect reason is available … Read more

Introducing JumpeR – For Track and Field Data

Ordinarily posts on Swimming + Data Science have focused on swimming, or sometimes diving. Today though we’re going to visit some of our more gravity-afflicted colleagues and do a bit of cross-training. That’s because following what I’m going to call the SwimmeR package’s massive success literally several people reached out to me regarding developing a … Read more

Introducing the RStudio Launcher Plugin SDK

[This article was first published on RStudio Blog, and kindly contributed to R-bloggers]. Improving Interoperability through the RStudio Job Launcher. In a previous blog post … Read more

The Gambler’s Ruin Problem

A sequence of random variables X, each representing a different point in time at discrete intervals, can be referred to as a stochastic process. The first random variable in the process is known as the initial state, and the rest of the random variables in the process define … Read more
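
The excerpt stops before stating the problem itself, so the sketch below assumes the standard formulation: a gambler starts with `start` units, wins one unit with probability `p` or loses one otherwise, and stops on reaching either 0 or `target`. The fortune after each bet is the stochastic process, and the starting fortune is its initial state. Function names and parameter values are hypothetical.

```python
import random


def win_probability(start, target, p):
    """Closed-form probability of reaching `target` before hitting 0."""
    if p == 0.5:
        return start / target
    ratio = (1 - p) / p
    return (1 - ratio ** start) / (1 - ratio ** target)


def simulate(start, target, p, rng):
    """Play one game; return True if the gambler reaches `target`."""
    fortune = start  # initial state of the process
    while 0 < fortune < target:
        fortune += 1 if rng.random() < p else -1
    return fortune == target


rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 20_000
wins = sum(simulate(3, 10, 0.5, rng) for _ in range(trials))
estimate = wins / trials
exact = win_probability(3, 10, 0.5)
print(f"simulated {estimate:.3f} vs exact {exact:.3f}")
```

For a fair game the exact answer is simply `start / target`, so the simulated frequency should settle near 0.3 for these values.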

Creating synthetic time series data

A step-by-step guide to creating high-quality synthetic time-series datasets with Python. In this post, we will create synthetic versions of a time-series dataset using Gretel.ai’s synthetic data library, visualize and analyze the results, and discuss several use cases for synthetic time series data. One of the biggest bottlenecks that we … Read more

A deep dive into serverless applications on Power Apps and Azure

Each month in 2021, we will release a blog post covering the webinar of the month for the low-code application development (LCAD) on Azure solution. LCAD on Azure is a new solution that demonstrates the robust development capabilities of integrating low-code Microsoft Power Apps with the Azure products you may be familiar with. This month’s … Read more

From Teaching to Data Science

How and why I transitioned from teaching elementary school to data science. When I first started browsing career choices, I almost immediately turned away from data science because of the requirements listed in the job postings: coding experience in Python, SQL and/or R; a Master’s or a PhD in computer science, engineering, math, … Read more

Flooding: An Emerging Threat To the Modern Day Coastline

In this post, we will outline a case study analyzing the social and economic impacts of flooding along the eastern seaboard of the United States. We will ask questions like, “What is the relationship between flood losses, migration patterns, and real-estate value?” and, “Are home values likely to decrease where climate change … Read more

distinct() vs dropDuplicates() in Spark

What’s the difference between distinct() and dropDuplicates() in Spark? The Spark DataFrame API comes with two functions that can be used to remove duplicates from a given DataFrame: distinct() and dropDuplicates(). Even though both methods pretty much do the same job, they actually come with … Read more