Introducing Streamlit Sharing

Deploy, manage, and share your Streamlit apps for free. Machine learning and data science code is easy to share but hard to use. GitHub overflows with models, algorithms, and datasets. But code is static. Can you play with the models? See the algorithms? Interact with the data? Doing so requires following complex instructions, installing packages, … Read more

Data Observability: How to Prevent Your Data Pipelines from Breaking

The relationship between data downtime, observability, and reliable insights. (Image courtesy of Julia Kozoski on Unsplash.) While the technologies and techniques for analyzing, aggregating, and modeling data have largely kept pace with the demands of the modern data organization, our ability to tackle broken data pipelines has lagged behind. So, how can we identify, remediate, … Read more

Clear Understanding of Depth-First Search Algorithm and Its Python Implementation: Graph Algorithm

Learn with clear visualizations, and learn a common mistake that people make in the depth-first search algorithm. What is a depth-first search? It is one of the most widely used graph search algorithms. To understand it, think of a maze. What do we do when we have to solve a maze? We take a route, … Read more
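
The excerpt stops before the article's own implementation, so the following is only a minimal sketch of an iterative depth-first search, not the author's code. The function name `dfs`, the adjacency-list dictionary `graph`, and the sample nodes are assumptions made for illustration. The `seen` set is what keeps the traversal from looping forever when the graph contains a cycle.

```python
def dfs(graph, start):
    """Return nodes in depth-first order (hypothetical helper, not from the article)."""
    order = []       # nodes in the order they are first visited
    seen = set()     # guards against revisiting nodes, e.g. in cyclic graphs
    stack = [start]  # explicit stack instead of recursion
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push neighbors in reverse so the first listed neighbor is explored first.
        for neighbor in reversed(graph.get(node, [])):
            if neighbor not in seen:
                stack.append(neighbor)
    return order


# Assumed sample graph with a cycle (D points back to A).
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["A"]}
print(dfs(graph, "A"))  # ['A', 'B', 'D', 'C']
```

Like taking one route through the maze until it ends, the traversal follows A, B, D to the dead end before backing up to try C.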

Nuclei segmentation in R with Platypus.

In my previous post I introduced you to my latest project, platypus, an R package for object detection and image segmentation. This time I will go into more detail and show you how to use it on biomedical data. Today we will work on the 2018 Data Science Bowl dataset. You can download images and masks directly … Read more

Private, “Protected” Attributes in Python — Demystified Once and For All

Neither the name-mangling feature nor the skewed appearance of names written as self.__horn is loved by Pythonistas. In public repositories, you’ll notice that developers often prefer to follow the convention of adding just one leading underscore (self._horn) to “protect” them from inadvertent modification. Critics suggest that to prevent attribute clobbering it’s enough to … Read more
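
To make the difference between the two spellings concrete, here is a small sketch. The class name `Unicorn` and the attribute values are assumptions chosen to match the `self.__horn` example in the excerpt; they are not taken from the article.

```python
class Unicorn:  # hypothetical class name, used only for illustration
    def __init__(self):
        self._horn = "one underscore"    # convention only: an ordinary attribute
        self.__horn = "two underscores"  # name-mangled to _Unicorn__horn


u = Unicorn()
print(u._horn)               # one underscore
print(u._Unicorn__horn)      # two underscores (the mangled name is still reachable)
print(hasattr(u, "__horn"))  # False: no attribute is stored under the unmangled name
```

Neither spelling makes the attribute truly private. The single underscore signals intent, while the double underscore only renames the attribute so that subclasses are less likely to clobber it.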

Submitting R package to CRAN

Disclaimer: I have no affiliation with Microsoft Corp. or Revolution Analytics. For the n-th time in x years, submitting an R package to CRAN ended up like comedy. This time for one anecdotal note (a kind of warning), whereas the previously accepted version of ESGtoolkit has had, for 5 years, 12 warnings and notes combined. No error, nothing’s … Read more

Raspberry Pi E-Paper Dashboard with R

[This article was first published on schochastics, and kindly contributed to R-bloggers]. For once, this is not a post on an R package for network … Read more

Data Science — Building a Network Graph using Microsoft Power BI for SQL Relational Data

Network theory is a state-of-the-art theory that is used to represent the complex relationships between entities. Some interesting applications are pandemic diffusion analysis (e.g., COVID-19), social network analysis (e.g., the Facebook network), world trade analysis, etc. A network graph is built upon network theory and can provide dynamic, sometimes mind-blowing, graphs for storytelling. This article … Read more

Fermat’s Riddle

[This article was first published on R – Xi’an’s Og, and kindly contributed to R-bloggers]. A Fermat-like riddle from the Riddler (with enough room … Read more

New Dataproc optional components support Apache Flink and Docker

Strategic Cloud Engineer, Google Cloud

Google Cloud’s Dataproc lets you run native Apache Spark and Hadoop clusters on Google Cloud in a simpler, more cost-effective way. In this blog, we will talk about our newest optional components available in Dataproc’s Component Exchange: Docker and Apache Flink. Docker container on Dataproc: Docker is a widely used container technology. Since it’s now … Read more

Gold-Mining Week 6 (2020)

[This article was first published on R – Fantasy Football Analytics, and kindly contributed to R-bloggers].

10 Most Popular Programming Languages For 2020 and Beyond

10. C: C is a general-purpose, procedural computer programming language supporting structured programming, lexical variable scope, and recursion, with a static type system. By design, C provides constructs that map efficiently to typical machine instructions. Despite being the reason for the existence of most of the programming languages, it still has its niches in a … Read more

Handling BLANK in Power BI

How to cope with blank values in Power BI reports? Check these 3 possible solutions. (Photo by Pixabay on Pexels.com.) While creating reports, I’m sure you have faced situations where you get “(blank)” as a result and you don’t want to display it like this to your end users. Of course, keeping blanks makes sense … Read more

ANOVA vs Multiple Comparisons

[This article was first published on R – Predictive Hacks, and kindly contributed to R-bloggers]. When we run an ANOVA, we analyze the differences among … Read more

Gain real-time insights on Oracle E-Business Suite data with Azure and Incorta

Insights that are frequently sought after in an organization are often locked in business-critical systems. These systems power functions such as sales, marketing, finance, supply chain, and operations. While the data these applications produce can tell the complete business story, organizations struggle to coalesce the data and deliver valuable analytics on top of them. It … Read more

The Shift and Balance Fallacies

[This article was first published on R – Win Vector LLC, and kindly contributed to R-bloggers]. Two related fallacies I see in machine learning practice … Read more

Benford’s law meets IPL, Intl. T20 and ODI cricket

“To grasp how different a million is from a billion, think about it like this: A million seconds is a little under two weeks; a billion seconds is about thirty-two years.” “One of the pleasures of looking at the world through mathematical eyes is that you can see certain patterns that would otherwise be hidden.” Steven … Read more

A Guide to Building Your First Regression Model in Just 8 Lines of Code

Interpreting the Results. (Photo by National Cancer Institute on Unsplash.) With our model fitted, it’s time for us to take a look at what it has established from the data we provided. First, let’s look at the parameters it has evaluated for the data: print(model.coef_) # prints the slope of the line [1]: [[1.13557995]] print(model.intercept_) # prints the intercept … Read more

Multi-domain text classification via Labelur

Labelur is an online service that performs multi-domain text classification. It’s effortless to get started! Simply use any programming language of your choice, such as Python, Java, Node, or PHP (who’s still using PHP?) to send a POST request to Labelur’s API server with your text data. Labelur will reply with the label for the … Read more

Ultimate Pandas Guide — Inspecting Data Like a Pro

(Photo by Laura Woodbury from Pexels.) Whether you’re working on a simple analysis or a complex machine learning model, there’s a lot of value in being able to answer quick, exploratory questions about the nature of your data. Fortunately, Pandas makes this easy. In this post, I’ll walk through several DataFrame attributes and methods that … Read more

Predicting Customer Churn in the Telecommunications Industry

Using feature importance to simplify a problem through dimensionality reduction, and threshold-moving for imbalanced classification. (Image via Shutterstock under license to Leo Siu-Yin.) Why Predict Customer Churn? Getting new customers is much more expensive than retaining existing ones. Some studies have shown that it costs six to seven times more to acquire a new customer … Read more

When Satellites Collide…

A Computational Thinking Story With the Wolfram Language. (Photo by NASA on Unsplash.) Space is big, really big. However, space around Earth is getting more and more crowded. Every time a new spacecraft is launched, little bits and pieces end up in orbit around Earth. The US Space Surveillance Network estimates that there are over 100 … Read more

Benefits of Tableau for a Data Scientist

Introduction; Quick and Simple; SQL, R, Python, and MATLAB; k-means Algorithm; Cross-functional; Summary; References. Tableau [2] is becoming more and more useful for Data Scientists as it incorporates more finely tuned functionality. If you are not familiar with Tableau, it is essentially a tool that is widely used by several different types of people in … Read more

Real Time Data Streaming in Power BI with Azure

Create a seamless real-time Power BI dashboard with Azure. Azure Event Hubs is a big data streaming platform and event ingestion service which can track and process thousands of events per second. Data from an event hub can be transformed and stored using a real-time analytics service. Azure Stream Analytics is a real-time analytics service … Read more

Version Control 101: Definition and benefits

Software engineering is a rapidly changing field. When it comes to software, there are no final versions. All applications and code are continuously under development. One of the essential aspects of software engineering is version control. Version control systems are a special type of software development tool designed to help software … Read more

Emulating a PID Controller with Long Short-term Memory: Part 1

Using the Temperature Control Lab to create Proportional-Integral-Derivative controller data. (Photo by Alexander Popov on Unsplash.) Do you ever just get really excited about an idea? Maybe it’s a new DIY project you’re tackling, or a cool assignment at work. Maybe you’re crazy like me and want to hike the Pacific Crest Trail (as I’m … Read more

Object Detection: Stopping Karens Before They Can Strike using Keras and OpenCV

Above, we see the ROC score, confusion matrix, and loss/accuracy for the mobilenet training. Considering the amount of data we used, these metrics are pretty good. For the validation set, only 13 images were incorrectly classified. As for the loss and accuracy, the loss was able to go below .1 and the accuracy was well … Read more

10 Must-Know Tidyverse Features!

[This article was first published on business-science.io, and kindly contributed to R-bloggers]. Interested in more R tutorials? Learn more R tips: 👉 Register for our … Read more

Credit Risk Analysis with Machine Learning

The features above have missing values that need to be treated. As we can see, they have skewed distributions, which is an indication that we should fill the missing values with the median value for each feature. It’s time to deal with the missing values from the remaining columns. We are filling these values according … Read more

How to Build a Serverless Application using AWS Chalice | Siben Nayak

Next, run the chalice command to create a new project: $ chalice new-project daily-news This will create a daily-news folder in your current directory. You can see that Chalice has created several files in this folder. Let’s take a look at the app.py file: The new-project command created a sample app that … Read more

Fluent Bit supports Amazon S3 as a destination to route container logs

Amazon ECS customers can use the FireLens interface in their task definition to configure Fluent Bit to send logs to Amazon S3. Once you deploy your task definition, it will automatically start routing logs. Customers using containers on Amazon EKS or self-managed Kubernetes clusters can now route container logs to Amazon S3 by installing Fluent … Read more

2 Months in 2 Minutes – rOpenSci News, October 2020

rOpenSci HQ. rOpenSci at R-Ladies: Our community manager, Stefanie Butland, and one of our software review editors, Brooke Anderson, are speaking remotely at an R-Ladies East Lansing meetup on Thursday, October 22nd. They will talk about how to get involved in rOpenSci using our new Contributing Guide as an entry point, and through participating in … Read more

Hotelling’s T^2 in Julia, Python, and R

[This article was first published on R on Data & The World, and kindly contributed to R-bloggers]. The t-test is a common, reliable way to … Read more

QQ-plots and Box-Whisker plots: where do they come from?

QQ-plots and Box-Whisker plots usually become part of the statistical toolbox for students attending my course on ‘Experimental methods in agriculture’. Most of them learn that the QQ-plot can be used to check the basic assumption of Gaussian residuals in linear models and that the Box-Whisker plot can be used to describe the … Read more

My year in R

Learning R for a little over a year now was and still is a great experience. But a year isn’t a lot, so why make a blog post about it? I believe that pausing what one is doing and periodically evaluating if this pursuit is the right direction for him or her – is a … Read more

Explore Public Datasets with Google BigQuery and DataStudio

You can run more complex queries. For example, an aggregated summary of COVID-19 cases (confirmed, deaths, and recovered), grouped by country and ordered by the number of confirmed cases, can be queried with the following SQL script: SELECT country_region, SUM(confirmed) AS confirmed_sum, SUM(deaths) AS deaths_sum, SUM(recovered) AS recovered_sum FROM `bigquery-public-data.covid19_jhu_csse.summary` WHERE date = '2020-10-12' GROUP BY country_region ORDER BY … Read more

Redivis makes research data accessible, experiences collaborative with BigQuery

Understanding the data we collect is essential—it allows us to identify trends and uncover answers about our world. However, stories in our data frequently go untold. Large datasets are hard to share between research communities due to their size, security restraints, and complexity. Even if these datasets are accessible to users, the tools needed to … Read more

How to guide: Set up, Manage & Monitor Spark on Kubernetes (with code examples)

With code examples. Earlier this year at Spark + AI Summit, we had the pleasure of presenting our session on the best practices and pitfalls of running Apache Spark on Kubernetes (K8s). In this post we’d like to expand on that presentation and talk to you about: What is Kubernetes? Why run Spark on Kubernetes? … Read more

AI Adopting in Banking: Get It Right with a Tech Formula Part 1

You’ve heard the buzz: artificial intelligence (AI) is the hot new commodity in finance. But can you just sprinkle some “intelligence” atop your core banking systems and call it a win? Hardly. Formalizing an AI use case and even running a successful pilot is the easy part. Deploying and scaling that AI algorithm is where … Read more

How to make Python faster than Julia

(Source: unsplash.com.) Should I switch to Julia? This question is quickly becoming the new version of the old one, “should I translate production code from Python to C?”. No doubt Julia is increasingly popular among scientific developers dealing with time-consuming algorithms on a daily basis. But before you throw yourself into learning a new language (which is … Read more

An Intuitive Look at Event Prediction

Problem definition: To make it clearer, take an example from the telco domain and think about a network that consists of thousands of devices and a monitoring tool that keeps all alarms that occur on the devices. These alarms can be of various types. Put simply, you are working with a dataset which includes alarm … Read more

Version 0.10.0 of NIMBLE released

[This article was first published on R – NIMBLE, and kindly contributed to R-bloggers]. We’ve released the newest version of NIMBLE on CRAN and on … Read more

Tableau vs. R Shiny: Which Excel Alternative Is Right For You?

tl;dr: As of late 2020, there are many dashboarding/reporting/BI tools out there, with the most popular ones being Tableau, PowerBI, and R Shiny. The question quickly becomes – “How can I determine the right tool for my particular needs?” Today we’ll compare two of the most widely used tools at Fortune 500 companies: Tableau – an intuitive … Read more
