Amazon Kinesis Data Analytics for Apache Flink introduces custom maintenance windows in preview

Amazon Kinesis Data Analytics for Apache Flink now supports UpdateApplicationMaintenanceConfiguration in preview. Amazon Kinesis Data Analytics periodically patches the underlying infrastructure of applications with OS and container-image security updates to meet AWS compliance and security goals during the default maintenance windows in each region. You can use UpdateApplicationMaintenanceConfiguration via the CLI or the API to choose … Read more
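
For the API route, here is a hedged sketch. It assumes the boto3 `kinesisanalyticsv2` client and the request shape of UpdateApplicationMaintenanceConfiguration: an `ApplicationMaintenanceConfigurationUpdate` that carries a 24-hour `HH:MM` start time, which is assumed here to be UTC. The application name and both helper functions are hypothetical. The feature is in preview, so check the current API reference before relying on this.

```python
import re

# Assumed format of the window start time: 24-hour "HH:MM" (UTC)
_HHMM = re.compile(r"^([01][0-9]|2[0-3]):[0-5][0-9]$")

def build_maintenance_update(app_name, start_time_utc):
    """Build request parameters for UpdateApplicationMaintenanceConfiguration (hypothetical helper)."""
    if not _HHMM.match(start_time_utc):
        raise ValueError(f"expected HH:MM (UTC), got {start_time_utc!r}")
    return {
        "ApplicationName": app_name,
        "ApplicationMaintenanceConfigurationUpdate": {
            "ApplicationMaintenanceWindowStartTimeUpdate": start_time_utc
        },
    }

def set_maintenance_window(app_name, start_time_utc):
    """Send the update via boto3 (requires AWS credentials; hypothetical helper)."""
    # Imported lazily so the request builder above works without boto3 installed
    import boto3
    client = boto3.client("kinesisanalyticsv2")
    return client.update_application_maintenance_configuration(
        **build_maintenance_update(app_name, start_time_utc)
    )
```

Validating the time string locally means a malformed value fails before any network call is made.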

This week’s stories from Google Cloud: April 30, 2021

How to transfer your data to Google Cloud: Any number of factors can motivate your need to move data into Google Cloud, including data center migration, machine learning, content storage and delivery, and backup and archival requirements. When moving data between locations, it’s important to think about reliability, predictability, scalability, security, and manageability. Google Cloud … Read more

3 Python Pandas Tricks for Efficient Data Analysis

Explained with examples. Pandas is one of the predominant data analysis tools and is highly appreciated among data scientists. It provides numerous flexible and versatile functions to perform efficient data analysis. In this article, we will go over 3 pandas tricks that I think will make you a more … Read more

Deep In Singular Value Decomposition

For reducing the dimensionality of features, SVD is easily the most popular method of choice for data scientists, because it is the most versatile and venerable method of decomposition at our disposal. SVD serves as the basis for many models that need to interpret data in very high dimensions. Without the use … Read more
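
Here is a minimal NumPy sketch of SVD-based dimensionality reduction. The function name is hypothetical, and centering the data first is a design choice made here, not necessarily the article's. With centering, the k components are the directions of largest variance, which is the same result PCA gives.

```python
import numpy as np

def truncated_svd(X, k):
    """Reduce X (n_samples x n_features) to k dimensions with a truncated SVD."""
    X = np.asarray(X, dtype=float)
    # Mean-center the columns so the components capture variance
    X_centered = X - X.mean(axis=0)
    # Thin SVD: X_centered = U @ diag(s) @ Vt, singular values sorted descending
    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    # Reduced coordinates; equivalent to X_centered @ Vt[:k].T
    Z = U[:, :k] * s[:k]
    return Z, Vt[:k]
```

Multiplying `Z` by the returned components gives the best rank-k approximation of the centered data in the least-squares sense (the Eckart-Young theorem). The reconstruction error is exactly the energy in the discarded singular values.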

New AWS Solutions Implementation: AWS Blueprints

AWS Blueprints helps AWS Distributors and AWS Solution Providers deploy, manage, and monitor solutions for their small and medium business (SMB) customers in the AWS Cloud. The solution deploys repeatable, scalable AWS Service Catalog portfolios including a mix of AWS services and third-party applications. Customizable and extensible pre-packaged portfolios: AWS Distributors and AWS Solution Providers can … Read more

5 Reasons Why Aspiring Data Scientists Should Join Hackathons in 2021

Even if you have little or no coding skills. I recently participated in my first hackathon, which was a highly positive experience despite my initial worries. I had little to no software engineering or development experience coming into it, and I had always believed hackathons were reserved for tech gurus … Read more

EdTechs transform education with AI and Analytics (Strategic Business Executive, Education & Research, Google Cloud)

Over the last year, COVID-19 presented unforeseen challenges for practically every type of business and organization—including schools, colleges, and universities. For educational institutions, the pandemic was an unapologetic agent of acceleration, shifting one billion learners from in-person to online learning within two months. The rapid transition to online learning exposed many schools’ lack of readiness … Read more

Amazon Redshift announces preview of cross-account data sharing

Amazon Redshift data sharing allows you to share live, transactionally consistent data across different Redshift clusters without the complexity and delays associated with data copies and data movement. The ability to share data across clusters in the same AWS account is already generally available. Now you can preview cross-account data sharing to share data across Redshift … Read more

Five Subtle Pitfalls 99% Of Junior Python Developers Fall Into

#4. Verbose if-else statements: There’s nothing worse in programming than unstructured, verbose conditional branching, especially when you’re using a language as concise and straightforward as Python. Poorly written if-else statements are not only hard to keep track of but can also slow down your program. To put you on the spot, let’s compare the two pieces … Read more
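
The excerpt's own before/after comparison is cut off. As a hypothetical illustration (the function names and prices are invented), here is a verbose if/elif chain next to a dictionary lookup that encodes the same mapping as data.

```python
def shipping_cost_verbose(region):
    # Long if/elif chain: every new region adds another branch to read and test
    if region == "us":
        return 5.0
    elif region == "eu":
        return 7.5
    elif region == "asia":
        return 9.0
    else:
        return 12.0

# Table-driven alternative: the mapping lives in data, not in control flow
SHIPPING_COSTS = {"us": 5.0, "eu": 7.5, "asia": 9.0}

def shipping_cost(region):
    # dict.get supplies the fallback that the final else branch provided
    return SHIPPING_COSTS.get(region, 12.0)
```

The dictionary version performs a single hash lookup instead of testing branches in turn. For a handful of branches, however, the speed difference is usually negligible, and the main gain is readability.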

KNN Algorithm Machine Learning

[This article was first published on Methods – finnstats, and kindly contributed to R-bloggers]. KNN algorithm in machine learning: in this tutorial we are going to … Read more
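
The tutorial itself works in R. As a language-neutral sketch, here is the core of k-nearest-neighbours classification in Python with NumPy: Euclidean distance plus majority vote. The function name and toy data are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_query, k=3):
    """Classify each query point by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    preds = []
    for x in X_query:
        # Euclidean distance from x to every training point
        dists = np.linalg.norm(X_train - x, axis=1)
        # Indices of the k closest training points, nearest first
        nearest = np.argsort(dists)[:k]
        # Majority vote; on a tie, the label whose first occurrence is nearest wins
        votes = Counter(y_train[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds
```

Choosing an odd k for two-class problems avoids most ties.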

Breadth vs Depth

Opinion: Should data scientists be specialists or generalists? I’ve seen pretty conflicting advice about whether it’s better for a data scientist to specialize or be a generalist. The tension is this: to stand out from the crowd, you need to be a specialist in the specific skills companies need. But focusing on a niche … Read more

Lessons from the First Two Data Scientists at a Startup

The pros, cons, and questions to ask before taking the plunge. Data science in startups is notorious for being a memorable ride: from work that pivots on a dime from spreadsheets to customer interviews to CI/CD pipelines, to being handed more responsibility than you likely know what to do … Read more

R tips and tricks – readClipboard

[This article was first published on R – Eran Raviv, and kindly contributed to R-bloggers]. Here is a small utility function to save you some … Read more

Creating, editing, and merging ONNX pipelines

ONNX is an amazingly useful format for storing data science / AI artifacts for version control and deployment. We are happy to share sclblonnx, a Python package that enables easy editing and augmenting of ONNX graphs. Over the last year at Scailable we have been heavily … Read more

Stepping into the magical world of GANs

A step-by-step tutorial on GANs. A generative model can potentially do magic: if trained properly, it may write poetry, generate music, or draw images like an expert. The objective of a GAN is to generate synthetic samples that are very realistic. GAN models learn the trick in an adversarial setting. There are two … Read more
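
In the standard GAN formulation (Goodfellow et al., 2014), a discriminator D outputs the probability that a sample is real, and a generator G is trained to fool it. The sketch below computes the usual binary cross-entropy losses from D's outputs, using the common "non-saturating" generator loss. It is a generic illustration, not the tutorial's code, and the function name is hypothetical.

```python
import numpy as np

def gan_losses(d_real, d_fake, eps=1e-12):
    """Discriminator and generator losses from discriminator outputs.

    d_real: D(x) on a batch of real samples, probabilities in (0, 1)
    d_fake: D(G(z)) on a batch of generated samples, probabilities in (0, 1)
    """
    # Clip to avoid log(0)
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1 - eps)
    # D maximizes log D(x) + log(1 - D(G(z))); we minimize the negative mean
    d_loss = -np.mean(np.log(d_real) + np.log(1 - d_fake))
    # Non-saturating generator loss: G maximizes log D(G(z))
    g_loss = -np.mean(np.log(d_fake))
    return d_loss, g_loss
```

When D cannot tell real from fake (it outputs 0.5 everywhere), the discriminator loss is 2 log 2 and the generator loss is log 2. A confident, correct discriminator drives its own loss down and the generator's loss up.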

Building a serverless, containerized batch prediction model using Google Cloud Run and Terraform

The goal of this post is to set up a serverless infrastructure, managed in code, to serve batch predictions of a machine learning model (or any other lightweight computation) asynchronously: a Google Cloud Run service will listen for new files in a Cloud Storage bucket via a Pub/Sub message topic, trigger a … Read more
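
As a hedged sketch of the first step, here is how a Cloud Run handler might extract the new file's location from a Pub/Sub push request. It assumes the standard push envelope (`{"message": {"data": <base64>, ...}, "subscription": ...}`) and a Cloud Storage notification in the JSON_API_V1 payload format, where `data` decodes to the object resource with `bucket` and `name` fields. The function name is hypothetical, and the post's actual handler may differ.

```python
import base64
import json

def parse_gcs_notification(envelope):
    """Return (bucket, object name) from a Pub/Sub push request body (dict)."""
    message = envelope["message"]
    # message.data is the base64-encoded JSON object resource
    payload = json.loads(base64.b64decode(message["data"]).decode("utf-8"))
    return payload["bucket"], payload["name"]
```

Keeping the parsing in a pure function means it can be tested without running a web server or touching Google Cloud.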

Easily build real-time apps with WebSockets and Azure Web PubSub—now in preview

Real-time application scenarios such as chat for streaming videos, interactive whiteboards for remote education, and IoT dashboards are becoming ever more popular. Businesses are keen to build such applications for enhanced user experiences and real-time interactions with end customers. Today, we are announcing the preview of the Azure Web PubSub service for building real-time web applications … Read more

Microsoft acquires Kinvolk to accelerate container-optimized innovation

The ability to run Kubernetes anywhere, whether in the cloud or on-premises, has been a high priority for Azure customers looking to rapidly innovate, with increasing customer focus on the benefits of container-optimized workloads and operating systems, lean application modernization, easier operations, and platform resiliency. To support this rapid evolution, we’re announcing that Microsoft has acquired … Read more

Data Apps with Python’s Streamlit

Inputs: We’ll use st.slider to help us sample a fraction of the dataset, and st.multiselect to choose which columns we want.

    import streamlit as st
    import pandas as pd

    def explore(df):
        ...  # body elided in this excerpt

    def transform(df):
        # Select sample size (percentage of rows to keep)
        frac = st.slider('Random sample (%)', 1, 100, 100)
        if frac < 100:
            df = df.sample(frac=frac/100)
        # Select columns (all selected by default)
        cols = st.multiselect('Columns', df.columns.tolist(), df.columns.tolist())
        df = df[cols]
        return df

    def …

Read more

Automate Finding and Busting Constraints with Smart Simulations

Using simulations to generate insights about your work processes. Simulations are a traditional, ‘good old fashioned’ AI technique for finding potential optimizations in a system; they can be applied to pretty much any domain and have found widespread adoption in fields like manufacturing and supply chain … Read more
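
As an illustration of using simulation to find a constraint (a generic sketch, not the article's own model), the code below simulates a serial process. It assumes first-in first-out work, exponentially distributed task times, unlimited buffers between stages, and a full backlog at time 0. The stage with the highest utilization is the bottleneck. The function name and stage times are hypothetical.

```python
import random

def simulate_line(mean_times, n_items=10_000, seed=0):
    """Simulate a serial process and return each stage's utilization."""
    rng = random.Random(seed)
    free_at = [0.0] * len(mean_times)  # when each stage next becomes free
    busy = [0.0] * len(mean_times)     # total processing time per stage
    for _ in range(n_items):
        t = 0.0  # every item is waiting at the first stage from time 0
        for s, mean in enumerate(mean_times):
            start = max(t, free_at[s])              # wait for both the item and the stage
            duration = rng.expovariate(1.0 / mean)  # exponential task time with this mean
            t = start + duration                    # item leaves stage s at time t
            free_at[s] = t
            busy[s] += duration
    makespan = free_at[-1]  # the last item leaves the last stage last
    return [b / makespan for b in busy]
```

With mean task times of 1.0, 3.0, and 1.5, the middle stage runs at a utilization close to 1 while the others sit partly idle. That makes it the constraint to "bust" first.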

Recent Advances in Functional Data Analysis

[This article was first published on YoungStatS, and kindly contributed to R-bloggers]. The fourth “One World webinar” organized by YoungStatS will take place on June … Read more

a common confusion between sample and population moments

[This article was first published on R – Xi’an’s Og, and kindly contributed to R-bloggers].

How to Reshape a Pandas DataFrame

I remember playing a lot with modeling clay and bricks when I was little. What I loved the most was not the toys themselves, but the fun of building and shaping things with small parts. I was fascinated by the fact that two bricks only fit together if you put them in the right position. … Read more
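
The article's own examples are not part of this excerpt. As a hypothetical illustration of the two most common reshaping operations in pandas, `DataFrame.melt` turns a wide table into a long one and `DataFrame.pivot` reverses it. The city and sales data is invented.

```python
import pandas as pd

# Wide format: one row per city, one column per year (hypothetical data)
wide = pd.DataFrame({
    "city": ["Paris", "Lyon"],
    "2020": [10, 4],
    "2021": [12, 5],
})

# melt: wide -> long, one row per (city, year) pair
long = wide.melt(id_vars="city", var_name="year", value_name="sales")

# pivot: long -> wide again, with cities as the index and years as columns
back = long.pivot(index="city", columns="year", values="sales")
```

Like the bricks, the pieces only fit back together if each (city, year) pair is unique. Otherwise `pivot` raises an error and an aggregating `pivot_table` is needed instead.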

Top Big AI Trends and Challenges Impacting Media, Advertising & Entertainment Industry

I recently interviewed some of the top data science leaders from Comcast/Freewheel, Condé Nast, ViacomCBS, Audoir, USA Today Network, and Samba TV on the biggest trends, challenges, and opportunities they see for ML & AI in media, advertising, & entertainment — and what the future may hold. Let’s dive in! What are some of the … Read more

Announcing general availability of Amazon Redshift native JSON and semi-structured data support

Amazon Redshift native support for JSON and semi-structured data is now generally available. It is based on the new data type ‘SUPER’ that allows you to ingest and store semi-structured data in your Amazon Redshift data warehouses. Amazon Redshift also includes support for PartiQL for SQL-compatible access to relational, semi-structured and nested data. Using the SUPER … Read more

Machine Learning Model Interpretation

Using Skater to build ML visualizations. Interpreting a machine learning model is a difficult task because we need to understand how the model works under the hood, which parameters it uses, and how it generates its predictions. There are different Python libraries that we can use to create … Read more

ARIMA Model In Python

The next step is to make our data stationary, which also gives us an estimate for the d and D parameters we will use in the model. This is done by differencing, that is, subtracting the previous observation from the current observation:

difference(T) = observation(T) - observation(T-1)

Then, we will test it … Read more
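
Differencing is a one-liner in NumPy. In this sketch (the function name is hypothetical), `lag=1` is ordinary differencing, whose repeat count corresponds to d. A seasonal lag m, for example 12 for monthly data, gives the seasonal differencing counted by D. `lag` must be at least 1.

```python
import numpy as np

def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    series = np.asarray(series, dtype=float)
    # Each output value is the change relative to `lag` steps earlier
    return series[lag:] - series[:-lag]
```

Applying it twice to a quadratic trend such as 1, 4, 9, 16 leaves a constant series, which illustrates why d = 2 removes a quadratic trend.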

Logistic Regression in R – Tutorial

[This article was first published on Methods – finnstats, and kindly contributed to R-bloggers]. Logistic regression in R: in this tutorial we used the student application … Read more
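
The tutorial fits the model in R on student application data. To show what such a fit does under the hood, here is a minimal gradient-descent logistic regression in NumPy on toy data. It does not use the tutorial's dataset and does not mirror R's `glm`, which fits by iteratively reweighted least squares. The function names are hypothetical.

```python
import numpy as np

def _add_intercept(X):
    # Prepend a column of ones so w[0] acts as the intercept
    return np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    Xb = _add_intercept(X)
    y = np.asarray(y, dtype=float)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))    # predicted probabilities
        w -= lr * Xb.T @ (p - y) / len(y)     # gradient of the mean log-loss
    return w

def predict_proba(w, X):
    """Probability of class 1 for each row of X."""
    return 1.0 / (1.0 + np.exp(-_add_intercept(X) @ w))
```

Class predictions come from thresholding `predict_proba` at 0.5.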

The Solution to my Viral Coin Tossing Poll

Some time ago I conducted a poll on LinkedIn that quickly went viral. I asked which of three different coin-tossing sequences was most likely, and I received exactly 1,592 votes! Nearly 48,000 people viewed it, and there are more than 80 comments under the post (you need a LinkedIn account to see it in full here: … Read more

Deep Learning with Keras Cheat Sheet (2021), Python for Data Science

Artificial Neural Network (ANN): a deep learning model used for classification and regression. Binary classification: the code below is an example of an ANN model used for binary classification (identifying a class as 0 or 1). The code adds 3 layers to the model. The first is the input layer and has 12 nodes, the second … Read more

How to plot XGBoost trees in R

In this post, we’re going to cover how to plot XGBoost trees in R. XGBoost is a very popular machine learning algorithm, which is frequently used in Kaggle competitions and has many practical use cases. Let’s start by loading the packages we’ll need. Note that plotting XGBoost trees requires the DiagrammeR package to be installed, … Read more

Set Up Your Package to Foster a Community – Community Call Summary

Last Thursday we held a Community Call discussing how to set up a package to foster a community. The call featured speakers Maëlle Salmon, Hugo Gruson, and Steffi LaZerte, and was moderated by Stefanie Butland. Scientific software development – and with that R packages – is a community effort. While there are often just a … Read more

Amazon FSx File Gateway delivers faster and more efficient on-premises access to fully managed file storage in the cloud

AWS Storage Gateway adds a new gateway type, Amazon FSx File Gateway, providing low-latency on-premises access to fully managed file shares in the cloud. Customers that want to take advantage of fully managed cloud file storage, but require low latency for their users and applications, can now easily extend Amazon FSx for Windows File Server … Read more

Understanding Design Docs Principles for Achieving Data Scientists

Guide for data scientists’ productivity: run your data project effectively with the right design docs. “A good design doc is inseparable from a good data scientist and engineer.” – Vincent Tatan, Google ML Engineer. “In most cases, engineers spent 18 months contemplating and writing documents on how best to serve the customer.” – … Read more

What I learnt from my Data Science job

Important lessons and learnings. I recently started working as a data scientist, and here are my thoughts three months into the job. Apart from these small learnings, there were also many technical aspects I came across, but those are things anyone can learn even after joining. Machine … Read more

How to make Topic Models Interpretable: 3 New Ideas

Three Innovative Techniques for Tuning LDA Topic Model Outputs. Topic modelling is an unsupervised machine learning approach that scans a set of documents, detects word and phrase patterns within them, and automatically clusters word groups and similar expressions that best characterize a set of text responses (or documents). To date, Latent Dirichlet Allocation (LDA) has … Read more

Using SQL for R data.frames with sqldf

[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers]. There are many R packages for querying SQL databases. Recently, I … Read more
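
sqldf works by copying data frames into a temporary SQLite database (by default), running the query there, and returning the result. As an illustrative analogue of that mechanism using only Python's standard library (not sqldf itself, and with a hypothetical function name), the sketch below loads rows into an in-memory SQLite table and queries them.

```python
import sqlite3

def query_rows(table_name, columns, rows, sql):
    """Load rows into a throwaway in-memory SQLite table and run `sql` on it.

    Table and column names are interpolated into SQL, so pass trusted names only.
    """
    con = sqlite3.connect(":memory:")
    try:
        cols = ", ".join(columns)
        placeholders = ", ".join("?" for _ in columns)
        # SQLite allows untyped columns, much like a loosely typed data frame
        con.execute(f"CREATE TABLE {table_name} ({cols})")
        con.executemany(f"INSERT INTO {table_name} VALUES ({placeholders})", rows)
        return con.execute(sql).fetchall()
    finally:
        con.close()
```

Values go through `?` placeholders, so only the table and column names need to be trusted.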

C++ Basics: Moving Resources

When should we write our own move constructor and move assignment operator? When writing a program, you will encounter cases where you need to move (large) resources from one object to another. In C++ we have move semantics, a way to move resources to avoid … Read more

Self-Organizing Maps in R – Supervised vs Unsupervised

[This article was first published on Methods – finnstats, and kindly contributed to R-bloggers]. Self-organizing maps are very useful for clustering and data visualization. Self-organizing … Read more
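
The tutorial uses R. As a language-neutral sketch of the classic, unsupervised Kohonen map (not the tutorial's supervised variant), here is a minimal NumPy implementation. It uses a Gaussian neighbourhood on a 2-D grid, with a learning rate and radius that decay linearly. The function names and default hyperparameters are hypothetical.

```python
import numpy as np

def train_som(data, grid_shape=(5, 5), n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Train a basic self-organizing map; returns one weight vector per grid unit."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    rows, cols = grid_shape
    # Grid coordinates of every map unit, shape (rows * cols, 2)
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize unit weights by sampling data points
    weights = data[rng.integers(0, len(data), rows * cols)].copy()
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1 - frac)              # learning rate decays linearly
        sigma = sigma0 * (1 - frac) + 1e-3  # neighbourhood radius shrinks
        x = data[rng.integers(0, len(data))]
        # Best-matching unit: the unit whose weight vector is closest to x
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        # Gaussian neighbourhood measured on the grid, not in data space
        grid_dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
        h = np.exp(-grid_dist2 / (2 * sigma ** 2))
        # Pull the BMU and its grid neighbours towards x
        weights += lr * h[:, None] * (x - weights)
    return weights

def best_matching_unit(weights, x):
    """Index of the map unit closest to x."""
    return int(np.argmin(np.linalg.norm(weights - np.asarray(x, dtype=float), axis=1)))
```

Clustering then amounts to grouping points by their best-matching unit. Because neighbouring units hold similar weights, the grid can also be plotted as a 2-D map of the data.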

Discover the latest Red Hat on Azure innovations—sign up for the Red Hat Summit

Microsoft is excited to join Red Hat Summit 2021. This is our fifth consecutive year participating, and we look forward to engaging with the Red Hat community. In the April segment (April 27-28), Scott Guthrie, our executive vice president of Cloud + AI, will join Paul Cormier in the keynote “Open hybrid cloud: Changing what’s … Read more