Time Series Forecasting: KNN vs. ARIMA

[This article was first published on DataGeeek, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t. It is always hard to find a proper model to forecast time series … Read more

Categories R Tags ExcerptFavorite

Trader Joe’s Democrats and Walmart Republicans

Modeling US elections using chain stores If your county has more Dollar Tree stores than Starbucks, you’re likely a Republican. If you have even a single Trader Joe’s store in your county, you’re probably a Democrat. Photo by Elliott Stallion on Unsplash The chain stores found in a community can tell us a lot about … Read more

New Datadog integration with Azure offers a seamless configuration experience

This post was co-authored by Sreekanth Thirthala Venkata, Principal Program Manager, Visual Studio and .NET. Microsoft Azure enables customers to migrate and modernize their applications to run in the cloud, in coordination with many partner solutions. One such partner is Datadog, which provides observability and security tools for users to understand the health and performance … Read more

Exciting updates to my top 4 Shiny packages

[This article was first published on Dean Attali’s R Blog, and kindly contributed to R-bloggers]. Building new packages in R is a lot of fun … Read more

Engaging in a European dialogue on customer controls and open cloud solutions

CEO, Google Cloud

At last year’s Europe-focused Google Cloud Next event, we outlined our commitment to European customers, sharing ways Google Cloud is helping European organizations transform their businesses in our cloud and address their strict data security and privacy requirements. This included expanding our existing cloud regions on the continent, growing our ecosystem of local partners, and … Read more

Getting A Data Science Job is Harder Than Ever

Outlandish Job Requirements This seems to be a thread in most discussions I’ve had with Data Science job seekers: nobody feels qualified anymore. Many Data Science job descriptions do not communicate the actual requirements of the role being advertised. One major effect of this is that aspiring Data Scientists who prioritize their personal and … Read more

Five Design-Sheet Methodology Approach to Data Visualisation

Image by author — D3.js visualisation Let’s say we are supposed to visualise our findings and communicate them clearly to our stakeholders, how do we decide on the type of data visualisations to deliver these insights effectively? Do we just explore beginner-friendly data visualisation tools like Tableau or Microsoft Power BI to do the charts? … Read more

Does management stand in your way when it comes to using R?

We recently launched a survey which looks into how business restrictions impact the commercial usage of R. We want to know if your company has concerns around the use of an open source language – many respondents have already replied that management is a key blocker in being able to use R at work. If … Read more

The Deloitte Data Scientist Interview

Deloitte Data Science Interview Questions Image from Pixabay Introduction Headquartered in New York, Deloitte Touche Tohmatsu Limited is a privately-owned, multinational company, offering professional services in audit & assurance, consulting, risk and tax assessments, and financial advisory. Since its inception in 1845, the company has developed into one of the largest accountancy and audit firms … Read more

Building a Product Recommendation System for E-Commerce: Part II — Model Building

Oh NLP with Python How I build a product recommendation system using python Photo by Campaign Creators on Unsplash This blog is a continuation of my previous work¹, in which I talked about how I gathered product reviews and information through web scraping. I will now explain more about how I built the product recommendation … Read more

Building a Product Recommendation System for E-Commerce: Part I — Web Scraping

Web Scraping With Python How I extract data using web scraping with Python during my Data Science Internship at ScoreData Image by noshad ahmed from Pixabay Today, if we think of the most successful and widespread applications of machine learning in business, recommender systems could be one of the first examples people have in mind. … Read more

Site Selection and Best Path using Optimization Techniques — Canada Post Example

Source — https://developers.google.com/maps/solutions/images/storelocator_clothing.png In this article, we’ll focus on the courier delivery services industry (Canada). There is always a need to reduce operational costs in this industry. And to do this, companies often face challenges while selecting the location for a new site and finding the best path for the delivery van to reduce costs … Read more

Understanding Markov Decision Process (MDP)

Towards Training Better Reinforcement Learning Agents Pacman In this article, we’ll discuss the framework with which most Reinforcement Learning (RL) problems can be addressed: a Markov Decision Process (MDP), a mathematical framework for modeling decision-making problems where the outcomes are partly random and partly controllable. We’ll discuss MDPs in greater … Read more

Grid Search for SARIMAX Parameters

An easy way to find optimal parameters for your statsmodels SARIMAX model Image by Author If you’ve landed here, chances are you’re implementing a statsmodels SARIMAX time series model, and you’re looking for an easy way to identify all of the best parameters. And I have some excellent news for you…you’ve landed in the right … Read more

Getting started with TensorFlow Serving

An easy and efficient way to deploy your deep learning model to production Photo by Ian Battaglia on Unsplash TensorFlow Serving is a part of TensorFlow Extended (TFX) that makes deploying your machine learning model to a server more comfortable than ever. Before Google released TensorFlow Serving, your model had to be deployed into production using … Read more

Build & Deploy Diabetes Prediction app using Flask, ML and Heroku

Import the necessary libraries

    # importing libraries
    import numpy as np
    np.random.seed(42)  # so that the output stays the same
    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns
    # keep our plots inside the notebook
    %matplotlib inline
    # models
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    # evaluation
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.model_selection import … Read more


Begin by installing tortus and enabling the appropriate nbextensions.

    $ pip install tortus
    $ jupyter nbextension enable --py widgetsnbextension

After opening a Jupyter Notebook, read your data into a pandas dataframe.

    import pandas as pd
    movie_reviews = pd.read_csv('movie_reviews.csv')

Import the package and create an instance of Tortus.

    from tortus import Tortus
    tortus = Tortus(df, text, num_records=10, id_column=None, … Read more

AWS Launch Wizard now supports SQL Server Always On deployments on Linux

AWS Launch Wizard offers a guided way of sizing, configuring, and deploying AWS resources for third party applications, such as Microsoft SQL Server Always On and HANA-based SAP systems, without the need to manually identify and provision individual AWS resources. Previously, customers could use AWS Launch Wizard to easily perform SQL Server Always On deployments … Read more

Categories AWS ExcerptFavorite

What can we do to help convert leads?

We will also be dropping these columns, which are mainly related to leads:
Lead Quality (Low in Relevance, Might be, Not Sure, Worst)
Lead Origin (Landing Page Submission, Lead Add Form, Lead Import)
Lead Source (Facebook, Google, Olark Chat, Organic Search, etc.)

Training

Here we will train an XGBoost model with 10-fold cross-validation, where the target … Read more

New course on preparation and graphing of biological data in R

[This article was first published on Bluecology blog, and kindly contributed to R-bloggers]. We’re running a short online introduction to data preparation and graphing. The … Read more

Introducing torch for R

[This article was first published on RStudio Blog, and kindly contributed to R-bloggers]. As of this writing, two deep learning frameworks are widely used in … Read more

Permuted block randomization using simstudy

[This article was first published on ouR data generation, and kindly contributed to R-bloggers]. Along with preparing power analyses and statistical analysis plans (SAPs), generating … Read more

Rectifying Hand-Drawn Marks on Maps With the mapscanner Package

mapscanner It is sometimes easy in the midst of the cutting-edge world of unique software development that is rOpenSci to forget that even though our software might be freely available from anywhere in the world, access to adequate hardware is often restricted. Restricted access to hardware is rarely acknowledged as a reason for differing outcomes from the practice of science … Read more

Apply family functions – Part 1

[This article was first published on Posts on, and kindly contributed to R-bloggers]. The apply family functions belong to the R base package, they are … Read more

Who are Data Scientists?

Whilst these single response statistics are interesting, I felt that the best way to define the typical coder was to look at all responses and find the most common combination of all answers. One note, whilst most answers were categorical or grouped values, four of the questions allowed continuous numerical values. In order to process … Read more

How to interact with APIs in Python

To start things off you are going to have to make an account with OpenWeatherMap, which means you must be eligible according to their policies. After making an account you will be brought to the home dashboard. Navigate to the API keys tab. This is the key you will need when we code out the … Read more

Coding Best Practices for Data Science

Avoid common mistakes and level-up your data science coding skills Photo by nesabymakers on Unsplash Many of us start our career in Data Science without a strong software engineering background. While this might not be a problem at first, your notebook will get much slower and you will struggle to maintain it as your code … Read more

Anomaly Detection in Time Series Sensor Data

In this step, I will apply the following learning algorithms to detect anomalies:
Interquartile Range
K-Means clustering
Isolation Forest

Let’s start training with these algorithms.

Interquartile Range

Strategy:
Calculate the IQR, which is the difference between the 75th (Q3) and 25th (Q1) percentiles.
Calculate upper and lower bounds for the outlier.
Filter the data points that fall outside … Read more
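The interquartile-range strategy in this excerpt could be sketched roughly as follows. This is a minimal illustration, not the original article's code: the function name `iqr_outlier_mask`, the sample readings, and the conventional multiplier `k=1.5` are all assumptions, since the excerpt does not say how the bounds are computed.

```python
import numpy as np


def iqr_outlier_mask(values, k=1.5):
    """Return a boolean mask that is True where a value falls outside the IQR bounds.

    k=1.5 is the common rule-of-thumb multiplier (an assumption here,
    not something the excerpt specifies).
    """
    values = np.asarray(values, dtype=float)
    # 25th (Q1) and 75th (Q3) percentiles
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    # Lower and upper bounds for what counts as an outlier
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


# Hypothetical sensor readings with one obvious spike
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 25.0, 10.2]
mask = iqr_outlier_mask(readings)
outliers = np.asarray(readings)[mask]
```

With these made-up readings, only the 25.0 spike falls outside the bounds; a larger `k` would make the filter more tolerant.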

Building Autoencoders on Sparse, One Hot Encoded Data

A hands-on review of loss functions suitable for embedding sparse one-hot-encoded data in PyTorch Since their introduction in 1986 [1], general Autoencoder Neural Networks have permeated into research in most major divisions of modern Machine Learning over the past 3 decades. Having been shown to be exceptionally effective in embedding complex data, Autoencoders offer simple … Read more

The Most Underrated R packages

In my experience as an R user, I’ve come across a lot of different packages and curated lists. Some are in my bookmarks like the great awesome-R list, or the monthly “best of” list curated by RStudio. If you don’t know them, go check them out asap. In this post, I’d like to show … Read more

Tips for How to Succeed in Coding Interviews

Lessons Learned from a Twitter Engineer Photo by thisisengineering on Unsplash A coding interview is a daunting experience. You interview for your dream job, and a random stranger asks you to think on your feet for an hour. You are being put under a microscope, and every comment you make and every piece of code you write … Read more

Microsoft partners with the telecommunications industry to roll out 5G and more

The increasing demand for always-on connectivity, immersive experiences, secure collaboration, and remote human relationships is pushing networks to their limits, while the market is driving down price. The network infrastructure must ensure operators are able to optimize costs and gain efficiencies, while enabling the development of personalized and differentiated services. To address the requirements of … Read more

SQL Order-based Calculations

Image by author The union of field values in SQL is common, such as firstname+lastname and year (birthday). No matter how many fields an expression contains, they come from the same row. We call this intra-row calculation. Correspondingly, there are inter-row calculations. Examples include getting the difference between the result of the champion and the … Read more

Pairs Trading ADR and SPY. The price dynamics of ADR-SPY spreads motivate a mean reversion trading strategy.

Algo Trading ADR price dynamics motivate a trading strategy Looking in the same direction? Photo by SK Yeong on Unsplash Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s … Read more

Central Limit Theorem: Proofs & Actually Working Through the Math

… Not another ‘hand-wavy’ CLT explanation… Let’s actually work through the math Photo by Diego PH on Unsplash For anyone pursuing study in Data Science, Statistics, or Machine Learning, stating that “The Central Limit Theorem (CLT) is important to know” is an understatement. Particularly from a Mathematical Statistics perspective, in most cases the CLT is … Read more

5 Probability Questions to Test Your Skills

With many of you applying to Data Science positions, you can expect to be asked various sorts of probability questions during the technical portion of the interview process. Within this post, I aim to cover 5 different probability questions (increasing in difficulty) which I believe serve as a good blanket to the various types of … Read more

Filter Learning with Unsupervised Learning

An unsupervised learning method for learning filters that can extract meaningful features out of images Data is everything. Especially in deep learning, the amount of data, type of data, and quality of data are the most important factors. Sometimes the amount of labeled data that we have is not enough or the problem domain that … Read more

Practical Machine Learning Basics

My first exploration of Machine learning using the Titanic competition on Kaggle Louis & Lola, survivors of the Titanic disaster (Photo from Library of Congress Prints and Photographs, No known restrictions on publication) This article describes my attempt at the Titanic Machine Learning competition on Kaggle. I have been trying to study Machine Learning but … Read more

XGBoost, LightGBM, and Other Kaggle Competition Favorites

An Intuitive Explanation and Exploration Kaggle is the data scientist’s go-to place for datasets, discussions, and perhaps most famously, competitions with prizes of tens of thousands of dollars to build the best model. With all the flurried research and hype around deep learning, one would expect neural network solutions to dominate the leaderboards. It turns … Read more

3 Ways to Build Neural Networks in TensorFlow with the Keras API | by Orhan Gazi Yalçın | Medium

Building Deep Learning models with Keras in TensorFlow 2.x is possible with the Sequential API, the Functional API, and Model Subclassing Figure 1. The Sequential API, The Functional API, Model Subclassing Methods Side-by-Side If you are going around, checking out different tutorials, doing Google searches, spending a lot of time on Stack Overflow about TensorFlow, … Read more

Ultimate Pandas Guide — Mastering the Groupby

We can also index with a single column (as opposed to a list):

    sales_data.groupby('month').agg(sum)['purchase_amount']

In this case, we get a Series object instead of a DataFrame. I tend to prefer working with DataFrames, so I typically go with the first approach. Now that we have the basics down, let’s go through a few of the more … Read more
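The Series-versus-DataFrame distinction in this excerpt can be checked with a small sketch. The contents of `sales_data` below are invented for illustration, and the string `"sum"` is used in place of the built-in `sum` because recent pandas versions warn when a built-in is passed to `agg`. The "first approach" is presumably indexing with a list of columns, which the truncated excerpt does not show.

```python
import pandas as pd

# Hypothetical sales data; only the column names follow the excerpt
sales_data = pd.DataFrame(
    {
        "month": ["Jan", "Jan", "Feb"],
        "purchase_amount": [10.0, 20.0, 5.0],
        "units": [1, 2, 1],
    }
)

# Indexing with a single column name returns a Series
as_series = sales_data.groupby("month").agg("sum")["purchase_amount"]

# Indexing with a list of column names keeps the result a DataFrame
as_frame = sales_data.groupby("month").agg("sum")[["purchase_amount"]]

print(type(as_series).__name__)
print(type(as_frame).__name__)
```

Both results hold the same numbers; the only difference is the container type, which matters when later steps expect DataFrame methods or column labels.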

Most important IT side skill, Regex

Regex is known as the IT skill that drastically increases productivity in everything you do on a computer! Image by Author I was astounded when I first learned how to search for texts using Regex and process text data in ways unimaginable to me before. Today we can utilize its power inside Python, but it … Read more

Building a Simple Pipeline in R

[This article was first published on R – Mathew Analytics, and kindly contributed to R-bloggers]. A. Introduction Having completed some sort of data analysis, we … Read more
