Little useless-useful R function – Full moon finder

[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers]. Another one from the series [1,2,3,4] of useless functions. This one …

An introduction to surrogate modeling, Part I: fundamentals

2.1 The idea. Here is how surrogate modeling does the trick: it constructs a statistical model (the surrogate model) that accurately approximates the simulation output. This trained statistical model can then replace the original computer simulation in sensitivity analysis, optimization, or risk analysis. Fig. 1: Using a surrogate model to replace the expensive …
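A minimal sketch of the idea, assuming a one-dimensional input: run the expensive simulation at a handful of design points, fit a cheap statistical model to those runs, and query the model instead. `expensive_simulation` is a hypothetical stand-in, and a quadratic least-squares fit is simply the smallest possible surrogate; practical work typically uses Gaussian processes or other flexible regressors.

```python
import math


def expensive_simulation(x: float) -> float:
    """Hypothetical stand-in for a slow computer simulation."""
    return math.sin(x) + 0.1 * x * x


def solve_linear(a, b):
    """Solve a small dense system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_quadratic_surrogate(xs, ys):
    """Least-squares fit of y ~ w0 + w1*x + w2*x^2 via the normal equations."""
    n_coeffs = 3  # intercept, linear and quadratic terms
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(n_coeffs)] for i in range(n_coeffs)]
    xty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n_coeffs)]
    w = solve_linear(xtx, xty)
    return lambda x: sum(wi * x ** i for i, wi in enumerate(w))


# Pay for the expensive simulation only at a few design points...
design = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
surrogate = fit_quadratic_surrogate(design, [expensive_simulation(x) for x in design])

# ...then query the cheap surrogate wherever the downstream analysis needs it.
print(round(surrogate(1.25), 3), round(expensive_simulation(1.25), 3))
```

In a sensitivity analysis or optimization loop, every call would go to `surrogate` rather than `expensive_simulation`.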

4 Things To Consider When Hiring a Data Scientist

Top data science candidates receive many opportunities daily. To keep candidates from falling through the cracks, you must consider these four things. As a staffing agency with data science as one of our specialties, we know what to look out for when hiring data science talent. Data science is still one of the buzzwords in …

Rmd-based Reports with R Code Appendices

[This article was first published on TeachR, and kindly contributed to R-bloggers]. The PDF file accompanying this post was created by the attached Rmd file. …

Databases: An Overview

Part 1: Concepts and Installation: MySQL, HBase, MongoDB. In today’s world, we have to handle huge amounts of data and store it in a suitable way. Much of today’s data is generated in huge volumes every day by social media sites such as Facebook and Twitter. Previously, we dealt mostly with structured data, i.e., data that can …

Python vs R: The Basics

An aspiring data scientist’s guide to deciding between two popular languages. Most data scientists refer to either Python or R as their “go-to” programming language. Both have vast software ecosystems and communities, so either language is suitable for almost any data science task. So the question is, which language should an aspiring data …

Python and R – Part 1: Exploring Data with Datatable

[This article was first published on business-science.io, and kindly contributed to R-bloggers]. Interested in more Python and R tutorials? 👉 Register for our blog to …

Forecasting hotel booking demand using giotto-time

We run our analysis in three steps: feature creation (data engineering), feature selection, and modeling. To select the best features and estimate the performance of our models, we use a training set consisting of 85% of the data and keep the remaining 15% for testing. To perform feature engineering and modeling tasks we use giotto-time, a …
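The 85/15 split described above can be sketched as follows. This assumes the split is chronological (train on the earliest 85% of observations, test on the most recent 15%), which is the usual choice for forecasting because a random split would leak future information into training; the excerpt does not say which variant the authors used. `chronological_split` is a hypothetical helper, not part of giotto-time.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(series: Sequence[T], train_fraction: float = 0.85) -> Tuple[List[T], List[T]]:
    """Split time-ordered observations into a leading training block and a trailing test block."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(round(len(series) * train_fraction))
    return list(series[:cut]), list(series[cut:])


# Hypothetical daily booking counts, oldest first.
bookings = list(range(100))
train, test = chronological_split(bookings)
print(len(train), len(test))
```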

2020 Table Contest Deadline Extended

[This article was first published on RStudio Blog, and kindly contributed to R-bloggers]. The original deadline for the 2020 Table Contest was scheduled for October …

Why RStudio Supports Python for Data Science

[This article was first published on RStudio Blog, and kindly contributed to R-bloggers]. As RStudio’s products have increasingly supported Python over the past year, some …

Practical EDA Guide with Pandas

The dataset has 5 categorical features and the scores of 3 different tests. The goal is to check how these features affect the test scores. We can start by checking the distribution of test scores. The plot function of pandas can be used to create a kernel density estimate (KDE) plot: df['reading score'].plot(kind='kde', figsize=(10,6), title='Distribution of Reading Score') …
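To show what `plot(kind='kde')` actually draws, here is a dependency-free Gaussian kernel density estimate. pandas delegates the estimate to `scipy.stats.gaussian_kde`, whose default bandwidth follows Scott's rule; this sketch assumes that rule in one dimension (bandwidth = sample standard deviation × n^(-1/5)). The scores are made-up illustration values.

```python
import math
from statistics import stdev
from typing import Callable, Sequence


def gaussian_kde(samples: Sequence[float]) -> Callable[[float], float]:
    """Return a density function: the average of Gaussian bumps centred on each sample."""
    n = len(samples)
    bandwidth = stdev(samples) * n ** (-1 / 5)  # Scott's rule for one dimension
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x: float) -> float:
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Hypothetical reading scores.
scores = [60.0, 70.0, 80.0, 90.0]
kde = gaussian_kde(scores)

# A proper density integrates to 1; a fine Riemann sum over a wide range should come close.
step = 0.1
area = sum(kde(i * step) for i in range(0, 1501)) * step
print(round(area, 4))
```

The smooth curve pandas plots is this `density` function evaluated on a grid of x values.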

Gold-Mining Week 8

[This article was first published on R – Fantasy Football Analytics, and kindly contributed to R-bloggers].

15 ways to create a Pandas DataFrame

Method 5: From a CSV file, using the read_csv function of the pandas library. This is one of the most common ways to create a DataFrame for EDA. The delimiter (or separator), the header, and the choice of index column are all configurable. By default, the separator is a comma and the header is inferred from the first line if …
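pandas itself is the right tool here. Purely to make those defaults concrete without a pandas dependency, the sketch below imitates them with the standard-library `csv` module: comma separator, column names taken from the first line, blank lines skipped, and an optional index column. `read_csv_like` is a hypothetical helper, not a pandas API, and it performs no type inference (every value stays a string).

```python
import csv
import io
from typing import Dict, List, Optional, TextIO, Tuple


def read_csv_like(
    source: TextIO, sep: str = ",", index_col: Optional[str] = None
) -> Tuple[list, Dict[str, List[str]]]:
    """Hypothetical helper mimicking read_csv defaults: returns (index, columns)."""
    reader = csv.reader(source, delimiter=sep)
    header = next(reader)  # header inferred from the first line
    columns: Dict[str, List[str]] = {name: [] for name in header}
    n_rows = 0
    for row in reader:
        if not row:  # skip blank lines, as pandas does by default
            continue
        n_rows += 1
        for name, value in zip(header, row):
            columns[name].append(value)
    # Default index is 0..n-1 (like pandas' RangeIndex); otherwise promote the chosen column.
    index = columns.pop(index_col) if index_col is not None else list(range(n_rows))
    return index, columns


text = "student,reading,writing\nana,72,74\nbo,90,88\n"
index, cols = read_csv_like(io.StringIO(text), index_col="student")
print(index)  # ['ana', 'bo']
print(cols)   # {'reading': ['72', '90'], 'writing': ['74', '88']}
```

The pandas equivalent of the usage line would pass `sep`, `header` and `index_col` to `read_csv` directly.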

Amazon Neptune now supports Apache TinkerPop 3.4.8 in the latest engine release

Engine release 1.0.4.0 is the default for newly created Neptune clusters. Existing clusters will not be upgraded automatically, but customers can choose to upgrade by following the instructions on the engine release page. Apache TinkerPop 3.4.8 introduces new features and improvements, such as the elementMap() step and improved handling of Map instances. Upgrading …

New Cloud Shell Editor: Get your first cloud-native app running in minutes

As enterprises move their applications and services to the cloud, developers frequently find themselves evaluating and experimenting with new technologies to identify the best solution for their day-to-day problems. This evaluation process could include tasks such as identifying which platform to host or migrate an application to, or learning how to use an API …

Trigger Cloud Run with events from more than 60 Google Cloud sources

Cloud Run (fully managed) lets you create microservice-based applications that are scalable and extensible. But event-based communication between decoupled microservices can be hard to implement, customize, and maintain. Today, we’re announcing Eventarc, new eventing functionality that lets you trigger Cloud Run from more than 60 Google Cloud sources. Now in Preview, Eventarc helps …

Boo! Fight off your scariest cloud monsters with Active Assist

Look, when you’re running your applications in the cloud, there’s a lot you have to keep top of mind: performance, security, agility, cost, and more. One thing you shouldn’t have to worry about? Monsters. That’s right, monsters. And yet, many of you do have monsters running amok in your cloud, just not the kind you see …

How to do bias-variance tradeoff the right way in Machine Learning

Image: Bias-Variance Tradeoff (source: https://unsplash.com/photos/zBsXaPEBSeI). One of the most common decisions that data scientists and machine learning experts face daily is how to validate their models. Ask any data engineer about validation and they will instantly start dropping terms like overfitting, underfitting, and the bias-variance tradeoff. If you are …
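One standard way to validate a model without touching the test set is k-fold cross-validation: each observation is held out exactly once while the model trains on the rest, and comparing training scores with the averaged held-out scores exposes overfitting (high training score, poor validation score) or underfitting (both poor). Below is a minimal index generator, assuming a shuffled split with a fixed seed; it is a sketch of the general technique, not the article's code.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


for fold, (train_idx, val_idx) in enumerate(k_fold_indices(10, k=3)):
    print(fold, len(train_idx), len(val_idx))
```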

SQL in Data Science

Today, SQL is a standard when it comes to manipulating and querying data. One of its key benefits is that it lets users quickly and efficiently insert and retrieve information in relational databases. A relational database is a type of database that stores and provides access to data points that are related to one another …
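To make "data points that are related to one another" concrete, here is a tiny relational example using Python's built-in `sqlite3` module: two tables linked by a foreign key, filled with parameterized inserts and queried with a join. The schema and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()
cur.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL
    );
    """
)
# Input: use ? placeholders rather than string formatting.
cur.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
cur.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 20.0), (1, 35.5), (2, 12.0)],
)
conn.commit()

# Retrieval: the foreign key relates each order to a customer, so a JOIN can aggregate per customer.
rows = cur.execute(
    """
    SELECT c.name, COUNT(o.id) AS n_orders, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(rows)  # [('Ada', 2, 55.5), ('Grace', 1, 12.0)]
conn.close()
```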

PowerBI vs. R Shiny: Two Popular Excel Alternatives Compared

tl;dr Choosing the appropriate dashboarding/reporting/BI tool has never been harder than it is now, as there are plenty of genuinely great options such as R Shiny, PowerBI, and Tableau. Today we’ll compare two tools widely used at Fortune 500 companies: PowerBI, a collection of software services, apps, and connectors that work together to turn unrelated …

Debugging with Dean: My first YouTube screencast

[This article was first published on Dean Attali’s R Blog, and kindly contributed to R-bloggers]. I wanted to solve a real bug in real-time, to …

Microsoft named a Leader in the Gartner 2020 Magic Quadrant for Industrial IoT Platforms

Embracing digital transformation in Industrial IoT requires companies to rethink and shift business models and operations. Doing so, however, has become more difficult in the past six months due to production slowdowns, restrictions on employee movement with social distancing, and rapidly shifting market demands. Yet for many companies, the industry disruptions caused by COVID-19 have …

Ockham’s Spatula

The science and the art of model deployment. (Photo by Lem Odegard, editing by pixelpony, licensed under CC BY-NC-SA 2.0.) Model building is like climbing a mountain. It’s what you spend so much time planning for. It’s what everybody wants to talk about. It’s what gives you that euphoric feeling of accomplishment when you’re finished. …

Creating a Plant Pet Toxicity Classifier

Looking closely at our scientific names (and having iterated through the data a few times), we find that many of the names are outdated synonyms for currently accepted species names, or are misspelled. This will cause issues both during image collection and later on, when training a model to identify identical images that possess different …
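One way to clean such names is a single lookup that covers both problems: map outdated synonyms to the accepted name, and use the standard library's `difflib` to snap near-miss spellings onto the closest known name. The accepted-name list and synonym table below are illustrative only (a real pipeline would pull them from a botanical authority), and the 0.85 similarity cutoff is an assumption to tune.

```python
import difflib
from typing import Dict, List, Optional

# Illustrative data only; a real pipeline would use a maintained taxonomy.
ACCEPTED: List[str] = ["Dracaena trifasciata", "Epipremnum aureum", "Monstera deliciosa"]
SYNONYMS: Dict[str, str] = {
    "sansevieria trifasciata": "Dracaena trifasciata",
    "scindapsus aureus": "Epipremnum aureum",
}


def normalize_name(raw: str, cutoff: float = 0.85) -> Optional[str]:
    """Return the accepted scientific name for raw, or None if nothing is close enough."""
    cleaned = " ".join(raw.split()).lower()  # collapse stray whitespace, ignore case
    lookup = {name.lower(): name for name in ACCEPTED}
    lookup.update(SYNONYMS)  # synonym keys resolve straight to accepted names
    # Exact matches score 1.0, so they win; misspellings fall back to the closest candidate.
    match = difflib.get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
    return lookup[match[0]] if match else None


for raw in ["Sansevieria trifasciata", "Dracena  trifasciata", "Ficus lyrata"]:
    print(raw, "->", normalize_name(raw))
```

Names that return `None` are good candidates for manual review rather than silent dropping.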

Applying a Deep Q Network for OpenAI’s Car Racing Game

Deep Q-Networks (DQNs) are applied throughout reinforcement learning, a large subfield of machine learning. Using CarRacing-v0, a classic 2D car racing environment from OpenAI, alongside a custom modification of that environment, a DQN was created to solve both the classic and the custom environments. The environments …
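At the core of any DQN are two small pieces: the temporal-difference target y = r + γ · max over a′ of Q_target(s′, a′), with the bootstrap term dropped when the episode ends, and an ε-greedy rule for exploration. The sketch below shows only those pieces on plain Python lists; in a real DQN the Q-values come from the neural network, and the γ = 0.99 default is a common choice, not a value taken from the article.

```python
import random
from typing import List, Sequence


def td_targets(
    rewards: Sequence[float],
    next_q_values: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """Bellman targets for a batch: r + gamma * max_a' Q_target(s', a'), or just r if terminal."""
    return [
        r if done else r + gamma * max(q_next)
        for r, q_next, done in zip(rewards, next_q_values, dones)
    ]


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Pick a random action with probability epsilon, otherwise the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


# First transition bootstraps from max Q = 2.0; the second is terminal, so its target is just r.
print(td_targets([1.0, -1.0], [[0.5, 2.0], [3.0, 0.0]], [False, True], gamma=0.5))  # [2.0, -1.0]
print(epsilon_greedy([0.1, 0.7, 0.2], epsilon=0.1, rng=random.Random(0)))
```

The network is then trained to minimise the squared difference between its Q(s, a) predictions and these targets.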

Machine Learning Can Detect Covid-19 In Less Than Five Minutes!

The SARS-CoV-2 virus causes early symptoms similar to those of other respiratory diseases, such as influenza and seasonal human coronaviruses (hCoV); therefore, any Covid-19 detection method must be able to differentiate between similar viruses. The CNN utilised in this research was trained and validated on a dataset that contained four distinct viruses; the training and validation partitions …

Writing your own sklearn transformer: feature scaling, DataFrames and column transformation

Writing your own sklearn functions, part 2. Since scikit-learn added DataFrame support to its API a while ago, it has become much easier to modify and write your own transformers. Many of sklearn’s built-in tools still work with NumPy arrays internally or return arrays, which often makes …
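A custom transformer only has to follow scikit-learn's `fit`/`transform` contract: `fit` learns parameters and stores them in attributes with a trailing underscore, `fit` returns `self` so calls chain, and `transform` applies the learned parameters without mutating its input. To keep this sketch dependency-free it works on a dict of column lists standing in for a DataFrame, and scales only the named columns while passing the rest through. A real implementation would subclass `BaseEstimator` and `TransformerMixin` and operate on pandas objects; `ColumnMinMaxScaler` is a hypothetical name.

```python
from typing import Dict, List, Sequence

Table = Dict[str, List[float]]  # stand-in for a DataFrame: column name -> values


class ColumnMinMaxScaler:
    """Min-max scale selected columns to [0, 1], following sklearn's fit/transform convention."""

    def __init__(self, columns: Sequence[str]):
        self.columns = list(columns)  # the constructor only stores parameters, as sklearn expects

    def fit(self, X: Table, y=None) -> "ColumnMinMaxScaler":
        self.data_min_ = {c: min(X[c]) for c in self.columns}
        self.data_max_ = {c: max(X[c]) for c in self.columns}
        return self

    def transform(self, X: Table) -> Table:
        out = {name: list(values) for name, values in X.items()}  # copy; never mutate the input
        for c in self.columns:
            lo, hi = self.data_min_[c], self.data_max_[c]
            span = (hi - lo) or 1.0  # constant column: avoid dividing by zero
            out[c] = [(v - lo) / span for v in X[c]]
        return out

    def fit_transform(self, X: Table, y=None) -> Table:
        return self.fit(X, y).transform(X)


data = {"age": [20.0, 30.0, 40.0], "id": [1.0, 2.0, 3.0]}
scaled = ColumnMinMaxScaler(columns=["age"]).fit_transform(data)
print(scaled)  # {'age': [0.0, 0.5, 1.0], 'id': [1.0, 2.0, 3.0]}
```

Keeping column names in the output, rather than returning a bare array, is exactly the convenience that DataFrame support brings to custom transformers.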

Amazon Ads Analytics — Extended

Amazon’s global e-commerce sales are projected to grow by 20% to reach $416.8 billion in 2020, and Amazon’s ad revenue growth is keeping pace. In 2020, Amazon’s ad revenue in the United States is projected to reach 12.75 billion U.S. dollars, compared with 10.32 billion U.S. dollars in 2019. With this increase in activity, …
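A quick check of the "keeping pace" claim using the two US figures quoted above: the projected year-over-year growth in Amazon's US ad revenue is

```latex
\frac{12.75 - 10.32}{10.32} = \frac{2.43}{10.32} \approx 0.235 = 23.5\%
```

which is, if anything, slightly ahead of the 20% growth projected for global e-commerce sales.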

Train Conversational AI in 3 lines of code with NeMo and Lightning

Train state-of-the-art speech recognition, NLP and TTS models at scale with NeMo and Lightning. NeMo (Neural Modules) is a powerful framework from NVIDIA, built to make it easy to build, train, and manipulate state-of-the-art conversational AI models. NeMo models can be trained on multiple GPUs and multiple nodes, with or without mixed precision, in just 3 …

PyCaret 2.2 is here — What’s new?

PyCaret 2.2 is now available for download using pip (https://www.pycaret.org). We are excited to announce PyCaret 2.2, the update for October 2020. PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that speeds up the machine …

Track Real-Time Gold Prices using Apache Kafka, Pandas & MatPlotLib

We are in an era where tracking, processing, and analyzing real-time data is becoming a necessity for many businesses. Needless to say, handling streaming data sets is becoming one of the most crucial and sought-after skills for data engineers and scientists. For this article, I assume that you are familiar with Apache Kafka …

An Introduction to Game Theory Using Python

Using Game Theory to Sharpen My Skills in Python and Develop Intuition to Write Functions. What is game theory? In his book “Playing for Real: A Text on Game Theory,” Ken Binmore characterizes it as the study of rational interaction within groups of people. Essentially, whenever you deal with another person, you’re playing a game. …
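A first function in that spirit finds the pure-strategy Nash equilibria of a two-player game: the strategy pairs where neither player can gain by unilaterally switching. The payoffs are the textbook prisoner's dilemma, chosen for illustration; they are not taken from Binmore's book or the article.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# (row move, column move) -> (row player's payoff, column player's payoff)
Payoffs = Dict[Tuple[str, str], Tuple[float, float]]


def pure_nash_equilibria(
    row_moves: Sequence[str], col_moves: Sequence[str], payoffs: Payoffs
) -> List[Tuple[str, str]]:
    """Return every strategy profile in which each move is a best response to the other."""
    equilibria = []
    for r, c in product(row_moves, col_moves):
        row_best = all(payoffs[(r, c)][0] >= payoffs[(alt, c)][0] for alt in row_moves)
        col_best = all(payoffs[(r, c)][1] >= payoffs[(r, alt)][1] for alt in col_moves)
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria


# Prisoner's dilemma: payoffs are negative years in prison.
moves = ["cooperate", "defect"]
prisoners_dilemma = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-3, 0),
    ("defect", "cooperate"): (0, -3),
    ("defect", "defect"): (-2, -2),
}
print(pure_nash_equilibria(moves, moves, prisoners_dilemma))  # [('defect', 'defect')]
```

Mutual defection is the only equilibrium even though mutual cooperation would leave both players better off, which is the point of the dilemma.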

Rick and Morty story generation with GPT2 using Transformers and Streamlit in 57 lines of code

Photo by Benigno Hoyuela on Unsplash. This post will show you how to fine-tune a pre-trained GPT2 model on Rick and Morty transcripts using Hugging Face’s Transformers library, build a demo application, and deploy it using Streamlit Sharing. With the rapid progress in Machine Learning (ML) and Natural Language Processing (NLP), new algorithms are able …

Sentiment Analysis & Entity Extraction with AWS Comprehend

A quick overview of using AWS Lambda, Boto3, and Comprehend for high-level NLP tasks in Python. Amazon Web Services (AWS) has been steadily expanding its machine learning services across various domains. AWS Comprehend is the AWS powerhouse for Natural Language Processing (NLP). Two common NLP projects are sentiment analysis and entity extraction. …
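A sketch of the Lambda side, using the Boto3 Comprehend client's `detect_sentiment` and `detect_entities` calls (both take `Text` and `LanguageCode`). The client is passed in so the logic can be exercised offline with a stub; `boto3` is imported only inside the handler, where the Lambda Python runtime provides it. The event shape (`{"text": ...}`), the summary format and the stub's numbers are assumptions for illustration.

```python
from typing import Any, Dict


def analyze_text(text: str, client: Any, language: str = "en") -> Dict[str, Any]:
    """Run sentiment analysis and entity extraction on one piece of text."""
    sentiment = client.detect_sentiment(Text=text, LanguageCode=language)
    entities = client.detect_entities(Text=text, LanguageCode=language)
    return {
        "sentiment": sentiment["Sentiment"],    # POSITIVE, NEGATIVE, NEUTRAL or MIXED
        "scores": sentiment["SentimentScore"],  # confidence per sentiment class
        "entities": [(e["Text"], e["Type"]) for e in entities["Entities"]],
    }


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Lambda entry point; expects an event like {"text": "..."} (hypothetical shape)."""
    import boto3  # bundled with the AWS Lambda Python runtime

    client = boto3.client("comprehend")
    return analyze_text(event["text"], client)


class _StubComprehend:
    """Offline stand-in returning invented responses shaped like Comprehend's."""

    def detect_sentiment(self, Text: str, LanguageCode: str) -> Dict[str, Any]:
        return {"Sentiment": "POSITIVE",
                "SentimentScore": {"Positive": 0.97, "Negative": 0.01, "Neutral": 0.02, "Mixed": 0.0}}

    def detect_entities(self, Text: str, LanguageCode: str) -> Dict[str, Any]:
        return {"Entities": [{"Text": "Seattle", "Type": "LOCATION", "Score": 0.99,
                              "BeginOffset": 0, "EndOffset": 7}]}


result = analyze_text("Seattle is lovely in the summer.", _StubComprehend())
print(result["sentiment"], result["entities"])
```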

Tune and interpret decision trees for #TidyTuesday wind turbines

[This article was first published on rstats | Julia Silge, and kindly contributed to R-bloggers]. This is the latest in my series of screencasts demonstrating how …

inverse Gaussian trick [or treat?]

[This article was first published on R – Xi’an’s Og, and kindly contributed to R-bloggers]. When preparing my mid-term exam for my undergrad mathematical statistics …

How To Build and Deploy a Machine Learning Model with FastAPI

So, what is FastAPI? According to the official documentation, it’s a modern and fast web framework for building APIs with Python 3.6+. Performance-wise, it’s up there with NodeJS and Go, and that tells you something. It is also easy enough to learn and comes with automatic interactive documentation, but more on that later. This article …

All treats, no tricks with product recommendation reference patterns

In all things technology, change is the only constant. This year alone has brought more uncertainty than ever before, and the IT shadows have felt full of perils. With the onset of the pandemic, the way consumers shop has shifted faster than anyone could have predicted. The move to online shopping vs. brick and mortar …

Amazon Textract announces improvements to reduce average API processing times by up to 20%

As part of this update, we also improved the accuracy of detecting documents that are difficult to read because they were captured at extreme angles. The latest models launched today in all AWS regions where Amazon Textract is available. You will start noticing lower latency and better accuracy for documents captured at extreme …

The Bachelorette Ep. 3 – Bro’s Before – Data and Drama in R

Those who were looking for entertainment last night may not have been satisfied if they decided to watch The Bachelorette. Twitter data make it clear that TV watchers in the US had competing interests: the top hashtags trending with #TheBachelorette reflect the notion that people weren’t necessarily tuned in last …
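The hashtag analysis boils down to keeping the tweets that mention #TheBachelorette and counting which other hashtags appear alongside it. The original analysis is in R; this Python sketch runs on a few invented tweet texts and leaves out data collection from the Twitter API.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

HASHTAG = re.compile(r"#(\w+)")


def top_cooccurring_hashtags(tweets: Iterable[str], anchor: str, k: int = 3) -> List[Tuple[str, int]]:
    """Count hashtags appearing alongside `anchor` (case-insensitive), most common first."""
    anchor = anchor.lower().lstrip("#")
    counts = Counter()
    for text in tweets:
        tags = {t.lower() for t in HASHTAG.findall(text)}  # count each tag once per tweet
        if anchor in tags:
            counts.update(tags - {anchor})
    return counts.most_common(k)


# Invented example tweets.
tweets = [
    "Switching channels tonight #TheBachelorette #OtherShow",
    "#TheBachelorette is slow, watching #OtherShow instead",
    "So much #Drama on #TheBachelorette",
    "Loving #OtherShow",
]
print(top_cooccurring_hashtags(tweets, "#TheBachelorette"))  # [('othershow', 2), ('drama', 1)]
```

The last tweet is ignored because it never mentions the anchor hashtag.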

Effective altruism, AI safety, and learning human preferences from the state of the world

Please find below the transcript for Season 2, Episode 4. Jeremie (00:00): Hey, everyone. Welcome to another episode of the Towards Data Science podcast. My name is Jeremie and, apart from hosting the podcast, I’m also on the team at the SharpestMinds data science mentorship program. I’m really excited about today’s episode because I’ve been thinking …

Everything you need to Learn Python from Zero to Hero

Photo by Danielle MacInnes on Unsplash. It’s never too late to start learning. If you’ve always been curious about how billion-dollar applications like Facebook, Twitter, and Instagram are built, learning to program not only reveals the hidden layers that make an application work, but also gives you the ability to create …