How To Train A GAN On 128 GPUs Using PyTorch

PyTorch Lightning: To train this system on 128 GPUs, we’re going to use a lightweight wrapper on top of PyTorch called PyTorch-Lightning, which automates everything else we haven’t discussed here (training loop, validation, etc.). The beauty of this library is that the only thing you need to define is the system in a LightningModule interface …

Using the lpSolve package in R to optimise an electricity system

Reducing carbon emissions is maybe the world’s most pressing challenge at the moment. One obvious avenue for action is the reduction of carbon emissions from electricity generation, which are a significant contributor to global carbon emissions overall. This is particularly true if trends now in place continue undisturbed, with the world relying on electricity to …

What is vtreat?

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. vtreat is a DataFrame processor/conditioner that prepares real-world data for …

Using Python and R to visualize and summarize my Foursquare’s Swarm check-ins

Reliving my visit to Singapore through data: On July 4, 2019, I visited Singapore. The trip, which initially was supposed to last four days, was extended due to how magnificent, shiny, and out-of-this-world this city-state was. Now, a couple of weeks after, I found myself reminiscing about my days walking at the Gardens by the …

Legal Certainty and the Possibility of Computer Decision Making in the Courtroom

Written by Viviane Lindenbergh (2018), as a Law Bachelor’s thesis at VU University Amsterdam. Will computers take over our jobs? During the industrial revolution, many feared that automation would result in mass unemployment in industries that relied on manual labour. The development of more sophisticated robotics and artificial intelligence brings about a similar discussion, only …

The five discrete distributions every Statistician should know

And here I will generate the PMFs of the discrete distributions we just discussed above using Python’s built-in functions. For more details on the upper function, please see my previous post, Create basic graph visualizations with SeaBorn. Also, take a look at the documentation guide for the below functions. # Binomial: from scipy.stats import …
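As a rough illustration of what generating a PMF involves, here is a minimal sketch of the binomial case. It deliberately uses only the Python standard library rather than the `scipy.stats` import the excerpt mentions, so the function name `binomial_pmf` and the parameters `n = 10`, `p = 0.5` are assumptions chosen for the example, not the post's own code.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical parameters: 10 fair coin flips.
n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(f"P(X = {k}) = {prob:.4f}")
```

With SciPy installed, `scipy.stats.binom.pmf(k, n, p)` should give the same values.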

Better together, synergistic results from digital transformation

Intelligent manufacturing transformation can bring great changes, such as connecting the sales organization with field services. Moving to the cloud also provides benefits such as an intelligent supply chain and innovations enabled by connected products. As such, digital transformation is the goal of many, as it can mean finding a competitive advantage. The Azure platform …

Geo Zone Redundant Storage in Azure now in preview

Announcing the preview of Geo Zone Redundant Storage in Azure. Geo Zone Redundant Storage provides a great balance of high performance, high availability, and disaster recovery and is beneficial when building highly available applications or services in Azure. Geo Zone Redundant Storage helps achieve higher data resiliency by doing the following: Synchronously writing three replicas …

Improving Azure Virtual Machines resiliency with Project Tardigrade

“Our goal is to empower organizations to run their workloads reliably on Azure. With this as our guiding principle, we are continuously investing in evolving the Azure platform to become fault resilient, not only to boost business productivity but also to provide a seamless customer experience. Last month I published a blog post highlighting several …

The Public Cloud Did Not Kill Hadoop — But Complexity Could

By Monte Zweben. 2019 has been a rocky year for the big three Hadoop distributors. From the internal optimism and external skepticism regarding the Cloudera/Hortonworks merger completing in January to MapR’s letter of impending doom in May and subsequent purchase by HPE, to Cloudera’s very bad Wednesday in June which saw a stock price collapse …

Short Stories: A Collection of Quick Analytics

Scenario: A company is interested in getting into the ‘ville market. They have been monitoring the market for a while and are trying to predict their own potential sales. All the products in the database are competitors (they are all ‘ville products). They believe that they will be able to achieve the median or the …

Particle Filter : A hero in the world of Non-Linearity and Non-Gaussian

The superiority of particle filter technology in nonlinear and non-Gaussian systems determines its wide range of applications. In addition, the multi-modal processing capability of the particle filter is one of the reasons why it is widely used. Internationally, particle filtering has been applied in various fields. In the field of economics, it is used in …

Using linear models with binary dependent variables, a simulation study

This blog post is an excerpt of my ebook Modern R with the tidyverse that you can read for free here. This is taken from Chapter 8, in which I discuss advanced functional programming methods for modeling. As written just above (note: as written above in the book), map() simply applies a function to a list of inputs, and …

(Bootstrapping) Follow-Up Contrasts for Within-Subject ANOVAs (part 2)

A while back I wrote a post demonstrating how to bootstrap follow-up contrasts for repeated-measure ANOVAs for cases where your data violates some / any assumptions. Here is a demo of how to conduct the same bootstrap analysis, more simply (no need to make your data wide!) 1. Fit your repeated-measures model with lmer library(lme4) data(obk.long, …

Graph Analytics — Introduction and Concepts of Centrality

Eigen Vector Centrality: The last flavor of centrality that we will be exploring is known as the Eigen Vector Centrality. This metric measures the importance of a node in a graph as a function of the importance of its neighbors. If a node is connected to highly important nodes, it will have a higher Eigen …
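The idea that a node's importance depends on the importance of its neighbors can be sketched with a simple power iteration. This is a minimal illustration, not the article's implementation: the four-node graph, the function name `eigenvector_centrality`, and the choice to add each node's own score at every step (a common trick to keep the iteration from oscillating) are all assumptions made for the example.

```python
from math import sqrt


def eigenvector_centrality(graph, iterations=100, tol=1e-10):
    """Approximate eigenvector centrality of an undirected graph given as
    {node: [neighbors]} by repeatedly summing neighbor scores."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        # New score = own score + sum of neighbor scores (power iteration on A + I).
        new = {
            node: scores[node] + sum(scores[nb] for nb in graph[node])
            for node in graph
        }
        # Normalize so the scores stay bounded.
        norm = sqrt(sum(value * value for value in new.values()))
        new = {node: value / norm for node, value in new.items()}
        if max(abs(new[node] - scores[node]) for node in graph) < tol:
            return new
        scores = new
    return scores


# Hypothetical graph: "a" is linked to every other node; "d" only to "a".
graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}
centrality = eigenvector_centrality(graph)
for node, score in sorted(centrality.items(), key=lambda item: -item[1]):
    print(node, round(score, 3))
```

In this toy graph, "a" ranks highest because it neighbors every node, while "d" ranks lowest even though its single neighbor is the most important node.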

A Multitask Music Model with BERT, Transformer-XL and Seq2Seq

Buzzwordy clickbait title intended, but still a simple concept. This is Part III of the “Building An A.I. Music Generator” series. I’ll be covering the basics of Multitask training with Music Models, which we’ll use to do really cool things like harmonization, melody generation, and song remixing. We’ll be building off of Part I …

Insurance data science : Pictures

At the Summer School of the Swiss Association of Actuaries, in Lausanne, following the part of Jean-Philippe Boucher (UQAM) on telematic data, I will start talking about pictures this Wednesday. Slides are available online. Ewen Gallic (AMSE) will present a tutorial on satellite pictures, and a simple classification problem, related to Alzheimer detection. We will …

The Symmetry and Asymmetry of Baseball’s Graph

To derive insights about baseball, many analysts use a Markov chain model to describe the game. While a modeler could pose such a chain in a myriad of different ways, a frequently seen and comparatively simple choice describes a half-inning with 25 states, and seemingly countless possible transitions among these states. These states (also called …

Practical Tips for Training a Music Model

This is Part II of the “Building An A.I. Music Generator” series. We’ll be taking a deeper dive into building the music model introduced in Part I. Here’s a quick outline: Data Encoding and how to handle: Polyphony, Note pitch/duration. Training best practices: Data Augmentation, Positional Encoding, Teacher Forcing, TransformerXL Architecture. Note: This is a …

Creating a Pop Music Generator with the Transformer

TLDR; Train a Deep Learning model to generate pop music. You can compose music with our pre-trained model here: http://musicautobot.com. Source code is available here: https://github.com/bearpelican/musicautobot. In this post, I’m going to explain how to train a deep learning model to generate pop music. This is Part I of the “Building An A.I. …

Speaking at BARUG

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. We will be speaking at the Tuesday, September 3, 2019 …

Data Visualization For Everyone Pt 2

Part 2: Creating & Curating Visualization. My background is in the AEC (Architecture, Engineering, & Construction) industry, one which has historically lagged behind most every other industry in adoption and application of new and transformative technologies. That challenge presents itself today still, with the modern shift towards data-driven corporate frameworks. …

Data Visualization For Everyone Pt 1

Part 1: Collection, Storage, and Versioning. My background is in the AEC (Architecture, Engineering, & Construction) industry, one which has historically lagged behind most every other industry in adoption and application of new and transformative technologies. That challenge presents itself today still, with the modern shift towards data-driven corporate frameworks. …

You Must Know Constrained Least Squares

Spicing up multi-objective least squares to introduce hard problem constraints. We talked about least squares before; even in the case of multiple objective functions, it was quite simple to reduce the problem to the classical formulation. Now, some problems have constraints on their solutions and are not that straightforward to solve by simply taking the …

Be careful of NA/NaN/Inf values when using base R’s plotting functions!

I was recently working on a supervised learning problem (i.e. building a model using some features to predict some response variable) with a fairly large dataset. I used base R’s plot and hist functions for exploratory data analysis and all looked well. However, when I started building my models, I began to run into errors. …

NBA Data Analytics: Changing the Game

The 3 Point Shot and Data Visualization Tools. “Analytics are part and parcel of virtually everything we do now” (NBA Commissioner Adam Silver). If you’re a fan of the NBA, you’re well aware that the NBA is undergoing a massive revolution right now. The game itself is changing as teams are discovering new ways …

Should you explain your predictions with SHAP or IG?

Some of the most accurate predictive models today are black box models, meaning it is hard to really understand how they work. To address this problem, techniques have arisen to understand feature importance: for a given prediction, how important is each input feature value to that prediction? Two well-known techniques are SHapley Additive exPlanations (SHAP) …

Your single source for Azure best practices

Optimizing your Azure workloads can feel like a time-consuming task. With so many services that are constantly evolving it’s challenging to stay on top of, let alone implement, the latest best practices and ensure you’re operating in a cost-efficient manner that delivers security, performance, and reliability. Many Azure services offer best practices and advice. Examples …

Cyclists – London Ride 100 – Analysis for riders and clubs using Shiny/R

Introduction: The Prudential Ride London is an annual summer cycling weekend and within that I will focus on the Ride London-Surrey 100, a 100-mile route open to the public starting at the Stratford Olympic Park in East London and finishing in front of Buckingham Palace. I wrote an initial analysis using R in 2016 …

How to Automate EDA with DataExplorer in R

EDA (Exploratory Data Analysis) is one of the key steps in any Data Science Project. The better the EDA, the better the Feature Engineering can be. From Modelling to Communication, EDA has many more hidden benefits that aren’t often emphasised when teaching Data Science to beginners. The Problem That …

Do you love Data Science? I mean, the Data part in it

Last week, we talked all about Artificial Intelligence (also Artificial Stupidity), which led me to think about the foundation of Data Science: the Data itself. I think Data is the least appreciated entity in the Data Science value chain. You might agree with me if you do Data Science outside Competitive Platforms like Kaggle …

The Martian Chronicles — When Deep Learning meets Global Collaboration

What you see above are 2 gray-scale photos of the surface of Mars, captured in the vicinity of the landing site of the Spirit, which was a robotic rover built by NASA, active from 2004 to 2010. If you look closely, you will find that the image to the right of the arrow is exactly …

Applying product methodologies in data science

What makes a great data driven product? Fancy models? Ground breaking ideas? The truth is that the secret sauce usually rests in successfully implementing a product methodology. In this post I carry out a retro on a recent hackathon experience, using lean and agile methodology concepts of Minimum Viable Product, Risky Assumptions, and Spikes. I …

Plant Disease Detection Web Application using Fastai

Achieving state-of-the-art results with fast.ai: creating an AI web application that detects diseases in plants using FastAi, which is built on top of Facebook’s deep learning platform, PyTorch. According to the Food and Agriculture Organization of the United Nations (UN), transboundary plant pests and diseases affect food crops, causing significant losses to …

Shopping with Your Camera: How Visual Search Is Transforming eCommerce

Visual search is taking the retail world by storm. When customers perform a visual search, they look for a product with an image instead of keywords. Shoppers can take a photo of something they want to buy (such as a pair of sneakers on a passerby) and upload it to the visual search engine of …

Biological Data Science and Why Domain Expertise and Context is King in

If I showed you a picture of a cat and told you it was a pink panda bear, would you believe me? How would you go about validating that what I was telling …

Effective Way for Finding Deep Learning Papers

Recently, I came across a great video of Prof. Andrew Ng, who explains to a CS class at Stanford how one can excel in the field of artificial intelligence. I will rephrase his words below. Deep learning is evolving fast enough …

Statistical Sentiment Analysis for Survey Data using Python

4. FREQUENT WORDS: We often want to know which words are the most common in surveys, since we are looking for patterns. Given the data set, we can find the k most frequent words with Natural Language Processing (NLP) using Python. In natural language processing, useless words (data) are referred to as …
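Finding the k most frequent words can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: the sample responses, the small `IGNORED_WORDS` set of filler words to drop, and the function name `top_k_words` are hypothetical, and the original article may well use a dedicated NLP library instead.

```python
import re
from collections import Counter

# Hypothetical list of filler words to drop before counting.
IGNORED_WORDS = {"the", "a", "is", "and", "was", "to", "of", "it"}


def top_k_words(responses, k):
    """Return the k most common words across all responses,
    ignoring case and the filler words above."""
    words = []
    for text in responses:
        tokens = re.findall(r"[a-z']+", text.lower())
        words.extend(token for token in tokens if token not in IGNORED_WORDS)
    return Counter(words).most_common(k)


# Hypothetical survey answers.
responses = [
    "The service was great",
    "Great staff and great food",
    "Food was cold",
]
top = top_k_words(responses, 2)
print(top)
```

`Counter.most_common(k)` returns `(word, count)` pairs ordered from most to least frequent, which is usually enough for a first look at survey text.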

Synthesizing population time-series data from the USA Long Term Ecological Research Network

Introduction: The availability of large quantities of freely available data is revolutionizing the world of ecological research. Open data maximizes the opportunities to perform comparative analyses and meta-analyses. Such synthesis efforts will increasingly exploit “population data”, which we define here as time series of population abundance. Such population data plays a central role in testing …

Plumber Logging

[This article was first published on R Views, and kindly contributed to R-bloggers]. The plumber R package is used to expose R functions as API …

Local randomness in R

[This article was first published on rstats on QuestionFlow, and kindly contributed to R-bloggers]. Prologue: Let’s say we have a deterministic (non-random) problem for which …

What to do when “the model doesn’t work”?

Your team has worked for months to gather data, build a predictive model, create a user interface, and deploy a new machine learning product with some early customers. But instead of celebrating victory, you’re now hearing grumbling from the account managers for those early adopter customers that they’re not happy …

Support Vector Machine Python Example

Support Vector Machine (SVM) is a supervised machine learning algorithm capable of performing classification, regression and even outlier detection. The linear SVM classifier works by drawing a straight line between two classes. All the data points that fall on one side of the line will be labeled as one class and all the points that …

10 Steps to your very own Corporate A.I project

A non-technical guide for managers, leaders, thinkers and dreamers. Much like the rest of the world, A.I has a 1% problem. While (very) large corporations benefit from the vast amount of data available to them, as well as a combination of business, technological and regulatory expertise, most SMEs have no such luck. This should however …