A Multitask Music Model with BERT, Transformer-XL and Seq2Seq

Buzzwordy clickbait title intended, but still a simple concept. This is Part III of the “Building An A.I. Music Generator” series. I’ll be covering the basics of Multitask training with Music Models, which we’ll use to do really cool things like harmonization, melody generation, and song remixing. We’ll be building off of Part I …

Insurance data science : Pictures

At the Summer School of the Swiss Association of Actuaries, in Lausanne, following the session by Jean-Philippe Boucher (UQAM) on telematic data, I will start talking about pictures this Wednesday. Slides are available online. Ewen Gallic (AMSE) will present a tutorial on satellite pictures, and a simple classification problem related to Alzheimer detection. We will …

The Symmetry and Asymmetry of Baseball’s Graph

To derive insights about baseball, many analysts use a Markov chain model to describe the game. While a modeler could pose such a chain in a myriad of different ways, a frequently seen and comparatively simple choice describes a half-inning with 25 states, and seemingly countless possible transitions among these states. These states (also called …
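The excerpt is cut off before it defines the 25 states, so the following is only a sketch under an assumption: the common construction pairs each of the 8 base-occupancy patterns with 0, 1, or 2 outs (24 states) and adds one absorbing three-outs state. The function name and the tuple encoding are hypothetical, chosen for illustration.

```python
from itertools import product


def half_inning_states():
    """Enumerate the states of an assumed 25-state half-inning chain."""
    # Each of the three bases is empty (0) or occupied (1): 2**3 = 8 patterns.
    bases = list(product((0, 1), repeat=3))
    # Pair every occupancy pattern with 0, 1, or 2 outs: 8 * 3 = 24 states.
    states = [(outs, pattern) for outs in range(3) for pattern in bases]
    # One absorbing state marks the end of the half-inning (three outs).
    states.append((3, None))
    return states


states = half_inning_states()
print(len(states))  # 25
```

A transition matrix for such a chain would then be 25 by 25, with the last row sending the three-outs state to itself.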

Practical Tips for Training a Music Model

This is Part II of the “Building An A.I. Music Generator” series. We’ll be taking a deeper dive into building the music model introduced in Part I. Here’s a quick outline: Data Encoding and how to handle: Polyphony, Note pitch/duration. Training best practices: Data Augmentation, Positional Encoding, Teacher Forcing, TransformerXL Architecture. Note: This is a …

Creating a Pop Music Generator with the Transformer

TL;DR: Train a Deep Learning model to generate pop music. You can compose music with our pre-trained model here: http://musicautobot.com. Source code is available here: https://github.com/bearpelican/musicautobot. In this post, I’m going to explain how to train a deep learning model to generate pop music. This is Part I of the “Building An A.I. …

Speaking at BARUG

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. We will be speaking at the Tuesday, September 3, 2019 …

Data Visualization For Everyone Pt 2

Part 2: Creating & Curating Visualization. My background is in the AEC (Architecture, Engineering, & Construction) industry, one which has historically lagged behind nearly every other industry in adoption and application of new and transformative technologies. That challenge still presents itself today with the modern shift towards data-driven corporate frameworks. …

Data Visualization For Everyone Pt 1

Part 1: Collection, Storage, and Versioning. My background is in the AEC (Architecture, Engineering, & Construction) industry, one which has historically lagged behind nearly every other industry in adoption and application of new and transformative technologies. That challenge still presents itself today with the modern shift towards data-driven corporate frameworks. …

You Must Know Constrained Least Squares

Spicing up multi-objective least squares to introduce hard problem constraints. We talked about least squares before; even in the case of multiple objective functions, it was quite simple to reduce the problem to the classical formulation. Now, some problems have constraints on their solutions and are not that straightforward to solve by simply taking the …

Built too frail: Why your data governance initiative wasn’t going to work anyway

Everyone knows data governance is crucial. Everyone knows data quality must get better. So why are they all sitting around pointing fingers at IT? Ninety-five percent of professionals think data governance is important. Fifty-two percent say it’s critical. But very few plan to do much about it. Sixty-three percent of organizations still don’t have a …

Be careful of NA/NaN/Inf values when using base R’s plotting functions!

I was recently working on a supervised learning problem (i.e. building a model using some features to predict some response variable) with a fairly large dataset. I used base R’s plot and hist functions for exploratory data analysis and all looked well. However, when I started building my models, I began to run into errors. …

Everything You Need To Know About Saving Weights In PyTorch

We will first see how to write the syntax for state_dict. It’s pretty easy. It’s just a Python ordered dictionary. But printing it would result in chaos, so we won’t print the state_dict for the entire model here, but I encourage you to go ahead and print it out on your screens! I …

NBA Data Analytics: Changing the Game

The 3 Point Shot and Data Visualization Tools. “Analytics are part and parcel of virtually everything we do now” (NBA Commissioner Adam Silver). If you’re a fan of the NBA, you’re well aware that the NBA is undergoing a massive revolution right now. The game itself is changing as teams are discovering new ways …

Crypto Trading 2019 Half Year Review: 17 Advanced + 15 Neural Net strategies tested [Part 9]

This is Part 9 in a multi-part series: Part 1: Basic strategies, introduction, setup and testing vs June-July market. Part 2: Advanced strategies and where to find them, testing vs June-July market. Part 3: Basic and Advanced strategies testing vs August market. Part 4: Neural Network strategies description and backtests against September market. Part 5: Neural …

Should you explain your predictions with SHAP or IG?

Some of the most accurate predictive models today are black box models, meaning it is hard to really understand how they work. To address this problem, techniques have arisen to understand feature importance: for a given prediction, how important is each input feature value to that prediction? Two well-known techniques are SHapley Additive exPlanations (SHAP) …

Your single source for Azure best practices

Optimizing your Azure workloads can feel like a time-consuming task. With so many services that are constantly evolving it’s challenging to stay on top of, let alone implement, the latest best practices and ensure you’re operating in a cost-efficient manner that delivers security, performance, and reliability. Many Azure services offer best practices and advice. Examples …

Cyclists – London Ride 100 – Analysis for riders and clubs using Shiny/R

Introduction. The Prudential Ride London is an annual summer cycling weekend, and within that I will focus on the Ride London-Surrey 100, a 100-mile route open to the public, starting at the Stratford Olympic Park in East London and finishing in front of Buckingham Palace. I wrote an initial analysis using R in 2016 …

How to Automate EDA with DataExplorer in R

EDA (Exploratory Data Analysis) is one of the key steps in any Data Science project. The better the EDA, the better the feature engineering can be. From modelling to communication, EDA has many more hidden benefits that aren’t often emphasised when teaching Data Science to beginners. The Problem That …

Do you love Data Science? I mean, the Data part in it

Last week, we talked all about Artificial Intelligence (also Artificial Stupidity), which led me to think about the foundation of Data Science: the Data itself. I think Data is the least appreciated entity in the Data Science value chain. You might agree with me if you do Data Science outside competitive platforms like Kaggle …

The Martian Chronicles — When Deep Learning meets Global Collaboration

What you see above are 2 gray-scale photos of the surface of Mars, captured in the vicinity of the landing site of the Spirit, which was a robotic rover built by NASA, active from 2004 to 2010. If you look closely, you will find that the image to the right of the arrow is exactly …

Applying product methodologies in data science

What makes a great data-driven product? Fancy models? Groundbreaking ideas? The truth is that the secret sauce usually rests in successfully implementing a product methodology. In this post I carry out a retro on a recent hackathon experience, using lean and agile methodology concepts of Minimum Viable Product, Risky Assumptions, and Spikes. I …

Plant Disease Detection Web Application using Fastai

Achieving state-of-the-art results with fast.ai. Creating an AI web application that detects diseases in plants using FastAi, which is built on top of Facebook’s deep learning platform, PyTorch. According to the Food and Agriculture Organization of the United Nations (UN), transboundary plant pests and diseases affect food crops, causing significant losses to …

Shopping with Your Camera: How Visual Search Is Transforming eCommerce

Visual search is taking the retail world by storm. When customers perform a visual search, they look for a product with an image instead of keywords. Shoppers can take a photo of something they want to buy (such as a pair of sneakers on a passerby) and upload it to the visual search engine of …

Biological Data Science and Why Domain Expertise and Context is King in

If I showed you a picture of a cat and told you it was a pink panda bear, would you believe me? How would you go about validating that what I was telling …

Effective Way for Finding Deep Learning Papers

Recently, I came across a great video in which Prof. Andrew Ng explains to a CS class at Stanford how one can excel in the field of artificial intelligence. I will rephrase his words below. Deep learning is evolving fast enough …

Statistical Sentiment Analysis for Survey Data using Python

4. FREQUENT WORDS. Frequently we want to know which words are the most common in surveys, since we are looking for patterns. Given the data set, we can find the k most frequent words with Natural Language Processing (NLP) using Python. In natural language processing, useless words (data) are referred to as …
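As a rough sketch of finding the k most frequent words, the example below uses only the Python standard library. The `STOP_WORDS` set, the `top_k_words` helper, and the sample responses are all hypothetical and are not taken from the article; low-information words filtered this way are commonly called stop words in NLP, and a real project would likely use a fuller list from an NLP library.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "to", "of", "it", "i"}


def top_k_words(responses, k):
    """Return the k most common non-stop-words across all responses."""
    words = []
    for text in responses:
        # Lowercase and keep runs of letters/apostrophes as words.
        words.extend(re.findall(r"[a-z']+", text.lower()))
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(k)


# Made-up survey answers, not real data.
responses = [
    "The service was great and the staff was friendly",
    "Great staff, great food",
    "Food was cold",
]
print(top_k_words(responses, 2))
```

Lowercasing before counting keeps "Great" and "great" from being tallied separately, which matters for short free-text answers.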

Synthesizing population time-series data from the USA Long Term Ecological Research Network

Introduction. The availability of large quantities of freely available data is revolutionizing the world of ecological research. Open data maximizes the opportunities to perform comparative analyses and meta-analyses. Such synthesis efforts will increasingly exploit “population data”, which we define here as time series of population abundance. Such population data plays a central role in testing …

Local randomness in R

[This article was first published on rstats on QuestionFlow, and kindly contributed to R-bloggers]. Prologue. Let’s say we have a deterministic (non-random) problem for which …

Plumber Logging

[This article was first published on R Views, and kindly contributed to R-bloggers]. The plumber R package is used to expose R functions as API …

What to do when “the model doesn’t work”?

Your team has worked for months to gather data, build a predictive model, create a user interface, and deploy a new machine learning product with some early customers. But instead of celebrating victory, you’re now hearing grumbling from the account managers for those early adopter customers that they’re not happy …

Support Vector Machine Python Example

Support Vector Machine (SVM) is a supervised machine learning algorithm capable of performing classification, regression and even outlier detection. The linear SVM classifier works by drawing a straight line between two classes. All the data points that fall on one side of the line will be labeled as one class and all the points that …
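To make the "which side of the line" idea concrete, here is a minimal sketch of the decision rule of a linear classifier only; it does not train an SVM. The weights `w`, the bias `b`, and the function name `linear_decision` are hypothetical values picked for illustration, not taken from the article.

```python
def linear_decision(w, b, x):
    """Label x as +1 or -1 by the side of the line w.x + b = 0 it falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical separating line x1 = x2 (w = (1, -1), b = 0).
w, b = (1.0, -1.0), 0.0
print(linear_decision(w, b, (2.0, 1.0)))  # 1
print(linear_decision(w, b, (1.0, 3.0)))  # -1
```

Training an SVM amounts to choosing `w` and `b` so that this line separates the classes with the widest possible margin; the sketch assumes those values are already known.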

10 Steps to your very own Corporate A.I project

A non-technical guide for managers, leaders, thinkers and dreamers. Much like the rest of the world, A.I has a 1% problem. While (very) large corporations benefit from the vast amount of data available to them, as well as a combination of business, technological and regulatory expertise, most SMEs have no such luck. This should however …

Anisotropic, Dynamic, Spectral and Multiscale Filters Defined on Graphs

As part of the “Tutorial on Graph Neural Networks for Computer Vision and Beyond” I’m presenting an overview of important Graph Neural Network works, by distilling key ideas and explaining simple intuition behind milestone methods using Python and PyTorch. This post continues the first part of my tutorial. In the “Graph of Graph Neural Network …

A Few Examples of Why The Data Revolution Fills Me with Both Wonder and Fear

60 Minutes: The Oracle of AI (July 14th, 2019). If “The Human Face of Big Data” shined a light on the potential benefits of AI, a profile of China’s current efforts in AI-based surveillance spurs some of those overriding concerns regarding a real-life “1984” coming. Kai-Fu Lee, known as the “Oracle of AI,” was the …

Proximal Policy Optimization Tutorial (Part 2/2: GAE and PPO loss)

Let’s code an RL football agent from scratch! Part 1 link: Proximal Policy Optimization Tutorial (Part 1: Actor-Critic Method). Welcome to the second part of the Reinforcement Learning math and code tutorial series. In the first part of this series, we saw how to set up the Google Football Environment and then implemented an Actor-Critic model …

The ultimate guide to A/B testing. Part 2: Data distributions

A/B testing is a very popular technique for checking granular changes in a product without mistakenly taking into account changes that were caused by outside factors. In this series of articles, I will try to give an easy hands-on manual on how to design, run and estimate results of A/B tests, so you are ready …

AI, Machine Learning and Data Science Roundup: July/August 2019

A mostly monthly roundup of news about Artificial Intelligence, Machine Learning and Data Science. This is an eclectic collection of interesting blog posts, software announcements and data applications from Microsoft and elsewhere that I’ve noted over the past month or so. Open Source AI, ML & Data Science News: StanfordNLP: a pure-Python package for grammatical …

Serverless Recommendation System using PySpark and GCP

I will call the factor matrix for users “X” and the factor matrix for movies “Y”. By the way, we don’t know what the latent features represent, and we don’t have to. We just need to figure out the values of the latent features; then determining unknown user ratings for each movie is just the …

Machine Learning: Making binary annotations a little less boring

For a university project, I’m developing a music recommendation classifier based on the Spotify API. The broad idea is to recommend new music to the user based on songs they personally like or dislike and on the musical components of each song (speed, tonality, instrumentality and many more). The preparation of the dataset usually is …

Python and R for Data Wrangling: Examples for Both, Including Speed-Up Considerations.

Skill up by becoming a bilingual data scientist. Learn speed-up code tips. Write bilingual notebooks with interoperable Python and R cells. A couple of years back, you would write your data analysis program exclusively in one of these two languages: Python or R. Both languages offer great functionality from data exploration to modeling and …

Six ways we’re making Azure reservations even more powerful

New Azure reservations features can help you save more on your Azure costs, easily manage reservations, and create internal reports. Based on your feedback, we’ve added the following features to reservations: Azure Databricks pre-purchase plan; App Service Isolated Stamp Fee reservations; ability to automatically renew reservations; ability to scope reservations to resource group; enhanced usage …

Frawd detection using Benford’s Law (Python Code)

For this article I chose two particular datasets that came to public attention from recent elections. The first one is from the American Presidential Election 2016 and the second one is from the Russian Presidential Elections 2018. For my first project I got my data from Impractical Python Project. I took into consideration only the votes for Donald Trump …
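For orientation, Benford's Law says the leading digit d of many naturally occurring numbers appears with probability log10(1 + 1/d). The sketch below computes those expected shares and the observed first-digit shares of a list of positive integers. The function names and the sample counts are hypothetical; they are not the election data or the code from the article.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1..9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_shares(values):
    """Observed share of each leading digit among positive integers."""
    digits = [int(str(v)[0]) for v in values if v > 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


# Made-up counts for illustration only.
sample = [120, 15, 230, 19, 3]
expected = benford_expected()
observed = first_digit_shares(sample)
for d in range(1, 10):
    print(d, round(expected[d], 3), round(observed[d], 3))
```

Comparing the two columns, for example with a chi-squared statistic, is one common way to flag data whose leading digits deviate from the expected pattern; a deviation alone is a prompt for closer inspection, not proof of manipulation.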

Can we use a neural network to generate Shiny code?

Many news reports scare us with machines taking over our jobs in the not too distant future. Common examples of take-over targets include professions like truck drivers, lawyers and accountants. In this article we will explore how far machines are from replacing us (R programmers) in writing Shiny code. Spoiler alert: you should not be …

Vectors and Functions

[This article was first published on R-exercises, and kindly contributed to R-bloggers]. In the previous set we started with arithmetic operations on vectors. We’ll take …

How I went from Automation Engineer to Data Scientist

Advice and lessons learned from my journey into the world of Data Science and Machine Learning. In this post, I would like to share some valuable insights I learned throughout my 2.5-year career transition journey, in hopes that it will help any reader who is contemplating making a …

A new handwritten digits dataset in ML town: Kannada-MNIST

Figure: Class-wise mean images of the 10 handwritten digits in the Kannada-MNIST dataset. I am disseminating 2 datasets: the Kannada-MNIST dataset (28×28 grayscale images: 60k train | 10k test) and Dig-MNIST (28×28 grayscale images: 10240 (1024×10)). Putting the ‘Dig’ in Dig-MNIST. The Kannada-MNIST dataset is meant to be a drop-in replacement for the MNIST …

Deploy your First Analytics Project

Definitive Guide for Data Professionals (Data Analytics). Deploy Lazada’s Web Scraping Dashboard with Heroku. For data scientists, it is very important to deploy your application to the cloud, to make it accessible for any technical or non-technical users. From my working experience, the expectations that data scientists create machine learning models while …