Strange Attractors: an R experiment about maths, recursivity and creative coding

[This article was first published on R on Coding Club UC3M, and kindly contributed to R-bloggers]. By Antonio Sánchez. Learning to code can be quite …

What are Your Use Cases for rOpenSci Tools and Resources?

We want to know how you use rOpenSci packages and resources so we can give them, their developers, and your examples more visibility. It’s valuable to both users and developers of a package to see how it has been used “in the wild”. This goes a long way to encouraging people to keep up development …

Shiny 1.4.0

[This article was first published on RStudio Blog, and kindly contributed to R-bloggers]. Shiny 1.4.0 has been released! This release mostly focuses on under-the-hood fixes, …

Doing and reporting your first mediation analysis in R

How to provide support for the mediation with statistical procedures: We will provide statistical support for the mediation through a mediation analysis in 4 simple steps. First, we will test the total effect. Here we check whether any change in sepal length affects the DV at all. More on this later. …

AWS Resource Groups and Tag Editor Are Now Available in the AWS GovCloud (US) Regions

Starting today, you can use AWS Resource Groups and Tag Editor in the AWS GovCloud (US) Regions. AWS Resource Groups makes it easier to manage and automate tasks on large numbers of AWS resources. AWS Resource Groups Tag Editor allows you to add tags to, edit, or delete tags on multiple AWS resources at once. …

Live Prediction of Traffic Accident Risks Using Machine Learning and Google Maps

Here, I describe the creation and deployment of an interactive traffic accident predictor using scikit-learn, Google Maps API, Dark Sky API, Flask and PythonAnywhere. Traffic accidents are extremely common. If you live in a sprawling metropolis like I do, chances are that you’ve heard about, witnessed, or even been involved in one. Because of their frequency, …

Understanding Fixup initialization

How to train residual networks without normalization layers. Why should we even care about initialization? Proper initialization of weight matrices is extremely important. According to Jeremy Howard, people for decades could not train neural networks because of improper initialization. In order to see it, we can reproduce one of the experiments from Jeremy’s lectures. Let’s …

Guided Grad-CAM is Broken! Sanity Checks for Saliency Maps

Certain techniques for understanding what a CNN is looking at don’t work. They have no connection to the model’s weights or to the training data, and may be merely acting as edge detectors. In this post we will discuss the NeurIPS 2018 paper “Sanity Checks for Saliency Maps”, which demonstrates that several popular saliency map …

Automatic data types checking in predictive models

The problem: We have data, and we need to create models (xgboost, random forest, regression, etc). Each one of them has its constraints regarding data types. Many strange errors appear when we are creating models just because of data format. The new version of funModeling 1.9.3 (Oct 2019) aimed to provide quick and clean assistance on …

Predicting Taxi fares in NYC using Google Cloud AI Platform (Billion+ rows) Part 1

This project aims to create a machine learning model that estimates taxi fares in New York City, using a dataset of taxi rides hosted in BigQuery. The dataset has more than a billion rows and is 130 GB in size. You can find it here. …

Explain your machine learning with feature importance

Let’s imagine that you’ve landed a consulting gig with a bank that has asked you to identify those who are highly likely to default on next month’s bill. Armed with the machine learning techniques that you’ve learnt and practiced, let’s say you proceed to analyze the data set given by your client and …

Stocks and Bonds are Now Both Right — No Recession in Sight

Expect Stocks to Make New Highs as the Economy Firms Up. Recession fears are everywhere. The US-China trade war, Middle East violence, impeachment in DC, protests in Hong Kong, and Brexit have all weighed on the economy. These fears are likely overblown. Consumer economic data remains robust and the Fed has cut rates twice as …

Azure Monitor adds Worker Service SDK, new ASP.NET core metrics

Application Insights from Azure Monitor empowers developers and IT professionals to observe, debug, diagnose, and improve their distributed services hosted on the cloud, on-premises, and through hybrid solutions. The release of the Application Insights for ASP.NET Core 2.8.0 for web applications and the Application Insights for .NET Core Worker Service 2.8.0 for non-web applications delivers new value to developers including: …

Benchmarking simple models with feature extraction against modern black-box methods

Comparison of Normalized Scores: The results over the different datasets and algorithms have to be normalized to make them comparable to each other. For this purpose we process the data in the following three steps: 1. Remove unstable algorithms: Algorithms which did not converge and yielded results far below dummy performance were excluded. …

Modeling News Coverage with Python. Part 3: Newspaper Coverage and Google Search Trends

Fitting models of Google Search to Search Trends and News Articles. This post integrates data from a limited sample of newspaper coverage with Google Search trends to model interactions between the two. In these examples, the preliminary analysis finds news coverage useful for forecasting search trends but small and mixed results in the other direction. …

How to achieve adoption of Intelligent Automation with maximum ROI and speed?

Source: https://www.burwood.com/blog-archive/automation-vs-orchestration-whats-the-difference Automation will displace 200,000 workers from their jobs in the banking industry in the next ten years. Almost half of the wage-paying jobs around the world could theoretically be at risk of automation using technologies already at hand. In the next 12 years, 1 out of 3 American workers is at risk …

Models as Serverless Functions

Chapter 3 of “Data Science in Production”. I recently published Chapter 3 of my book-in-progress on Leanpub. The goal of this chapter is to empower data scientists to leverage managed services to deploy models to production and own more of DevOps. Serverless …

Word Clouds Are Lame

Exploring the limitations of the word cloud as a data visualization. Author: Shelby Temple; Made with Tableau. Word clouds have recently become a staple of data visualization. They are especially popular when analyzing text. According to Google Trends, it seems that the rise in popularity started around 2009 with search term interest currently just under …

Five Tips for Contributing to Open Source Software

A data scientist’s perspective. Contributing to Open-Source Software (OSS) can be a rewarding endeavor, especially for new data scientists. It helps improve skills, provides invaluable experience when collaborating on projects, and gives you a chance to showcase your code. However, many data scientists do not consider themselves to be …

A Data Visualization Adventure

From raw data to the #1 spot on DataIsBeautiful. Is data visualization art or science? Does the clarity of bar charts and line graphs always trump data viz that is unusual and/or beautiful? These are some polarizing questions in the data viz community. Some of you just screamed out loud “Science! Clarity!” while others would …

Build Your First Computer Vision Project — Dog Breed Classification

Get started building your first computer vision project in less than 30 minutes. For us humans, it is pretty easy to tell one dog breed from another. That is, if you are talking about 10–20 popular dog breeds. When we are talking about more than 100 kinds of dogs, …

Data Science with SQL in Python

Python Application in SQL. Ever heard of the database query language SQL (often pronounced “sequel”)? How can we use Python to harness the power of SQL databases and retrieve, manipulate, and delete the information stored in them? In this article, I plan on giving a thorough beginner’s tutorial on SQL …

Malware Classification using Machine Learning

Takeaways from implementing the Microsoft Malware Classification Challenge (BIG). If you love to explore large and challenging data sets, then you should probably give Microsoft Malware Classification a try. Before diving deep into the problem, let’s go over a few points on what you can expect to learn from this: How to …

[NLP] SpaCy Classifier with pre-train token2vec VS. One without pre-train

We see quite satisfactory results from the classifier without a pre-trained language model. However, let’s experiment and see how much it improves when we apply one. First, we need to run spaCy pretrain on the documents and save the token2vec model. But before we start pre-training, we need to …

First step towards Data Science: Journey to the Home for Data Science

Getting started with Kaggle competitions. Source: https://miro.medium.com/max/1053/1*gO6yZ3Z855MW26FuEiQjKw.png Kaggle is an AirBnB for data scientists: this is where they spend their nights and weekends. It’s a crowd-sourced platform to attract, nurture, train and challenge data scientists from all around the world to solve data science, machine learning and …

Did Russia Use Manafort’s Polling Data in 2016 Election?

Introduction: On August 2, 2016, then Trump campaign manager Paul Manafort gave polling data to Konstantin Kilimnik, a Russian widely assumed to be a spy. Before then, Manafort had ordered his protégé, Rick Gates, to share polling data with Kilimnik. Gates periodically did so starting in April or May. The Mueller Report stated it did not know …

Rename Columns | R

Often the data you’re working with has abstract column names, such as (x1, x2, x3…). Typically, the first step I take when renaming columns in R is opening my web browser. For some reason, no matter how many times I do this, it’s just one of those things. (Hoping that writing about it will change that.) …

How to write Web apps using simple Python for Data Scientists?

At the start, we said that each time we change any widget, the whole app runs from start to end. This is not feasible when we create apps that will serve deep learning models or complicated machine learning models. Streamlit covers us here by introducing caching. 1. Caching: In our simple app. We …

Data Management Strategy: Part 2

Data Quality & Architecture. This is part 2 of the series of articles on carrying out and implementing a successful Data Management Strategy within an aspiring Digital Organization. You can find the introduction to this series here. In this article we will focus on the following topics: Data Quality, Data Architecture, and Data Integration. These …

My Learning Plan for Getting Into Data Science from Scratch

I started when I was in college and still continue to this day! My decision to get into data science started way back when I was still in college in early 2015. I actually didn’t plan to become a data scientist originally, but a quant, someone who is essentially a financial analyst that …

easyMTS: My First R Package (Story, and Results)

This weekend I decided to create my first R package… it’s here! https://github.com/NicoleRadziwill/easyMTS Although I’ve been using R for 15 years, developing a package has been the one thing slightly out of reach for me. Now that I’ve been through the process once, with a package that’s not completely done (but at least has a …

Hyper-Parameter Optimization of General Regression Neural Networks

A major advantage of General Regression Neural Networks (GRNN) over other types of neural networks is that there is only a single hyper-parameter, namely the sigma. In the previous post (https://statcompute.wordpress.com/2019/07/06/latin-hypercube-sampling-in-hyper-parameter-optimization), I’ve shown how to use the random search strategy to find a close-to-optimal value of the sigma by using various random number generators, including …

“This is CS50”: A Pleasant Way to Kick Off Your Data Science Education

What is CS50? It is the introductory course on computer science taught at Harvard University by Professor David J. Malan. It is the largest class at Harvard with 800 students, 102 staff, and a professional production team. It offers both an on-campus and an online course. I’ve taken the online one, but it’s already THE …

Locate V-beat in Electrocardiogram (ECG)

Using machine learning and image processing. A health technology company gave me a challenge: Given a collection of ECG strip images, find the location of V-beat in each image. The ECG plot records a V-beat during a premature ventricular contraction in the heartbeat. This article explains what I did to train a machine learning model to …

My First R Package (Part 2)

In Part 1, I set up RStudio with usethis, and created my first Minimum Viable R Package (MVRP?) which was then pushed to GitHub to create a new repository. I added a README:
> use_readme_rmd()
✔ Writing ‘README.Rmd’
✔ Adding ‘^README\\.Rmd$’ to ‘.Rbuildignore’
● Modify ‘README.Rmd’
✔ Writing ‘.git/hooks/pre-commit’
Things were moving along just fine, …

Discovering football anthems through data analysis

In one of my prior blogs around the FIFA World Cup, I wrote about how music and football are inseparable. Music is something that is part of football’s culture and vice versa. Music unites the armies behind clubs all over the world and enhances the atmosphere prior to (and during) games. Everyone probably has his …