Dow Jones Stock Market Index (3/4): Log Returns GARCH Model

In this third post, I am going to build an ARMA-GARCH model for Dow Jones Industrial Average (DJIA) daily log-returns. You can read the first and second parts, which I published previously. Packages: the packages used in this post series are listed here. suppressPackageStartupMessages(library(lubridate)) … Read more
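
The excerpt stops before the model code, and the series itself works in R. The following is a minimal sketch in Python showing the two ingredients by name: daily log-returns r_t = ln(P_t / P_{t-1}) and the GARCH(1,1) conditional-variance recursion σ²_t = ω + α r²_{t-1} + β σ²_{t-1}. It assumes a zero-mean return (no ARMA term), and it uses hand-picked parameters rather than estimated ones. The function names are hypothetical.

```python
import math

def log_returns(prices):
    """Daily log-returns r_t = ln(P_t / P_{t-1})."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

def garch11_variance(returns, omega, alpha, beta):
    """Conditional variance path of a zero-mean GARCH(1,1) (hypothetical helper).

    sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]
    The recursion starts at the unconditional variance omega / (1 - alpha - beta),
    which only exists when alpha + beta < 1.
    """
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1 for a stationary GARCH(1,1)")
    sigma2 = [omega / (1 - alpha - beta)]
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2  # sigma2[t] is the variance of returns[t] given data up to t-1

rets = log_returns([100.0, 101.0, 99.5, 100.2])
print(garch11_variance(rets, omega=1e-6, alpha=0.1, beta=0.85))
```

Estimating ω, α, β and any ARMA mean terms by maximum likelihood is the job of a dedicated fitting package. This sketch only evaluates the recursion for parameter values you supply.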

InfluxDB Data Retention

InfluxDB configuration: InfluxDB does not listen for collectd input by default. To allow a collectd agent to submit data, the InfluxDB server must be configured to listen for collectd connections. This section describes how to configure collectd on a RHEL/CentOS system. The first step is to create a database on the … Read more

Don’t reinvent the wheel: making use of shiny extension packages. Join MünsteR for our next meetup!

In our next MünsteR R-user group meetup on Tuesday, February 5th, 2019, titled “Don’t reinvent the wheel: making use of shiny extension packages”, Suthira Owlarn will introduce the shiny package and show how she used it to build an interactive web app for her sequencing datasets. You can RSVP here: http://meetu.ps/e/Gg5th/w54bW/f Shiny is a popular … Read more

Continuing to Grow Community Together at ozunconf, 2018

In late November 2018, we ran the third annual rOpenSci ozunconf. This is the sibling rOpenSci unconference, held in Australia. We ran the first ozunconf in Brisbane in 2016, and the second in Melbourne in 2017. (Photos taken by Ajay from Fotoholics.) As usual, before the unconf, we started discussion on GitHub issue threads, and the … Read more

RStudio Server Pro is now available on Microsoft Azure Marketplace

RStudio is excited to announce the availability of its flagship, enterprise-ready, integrated development environment for R in Azure Marketplace. RStudio Server Pro for Azure is an on-demand, commercially-licensed integrated development environment (IDE) for R on the Microsoft Azure Cloud. It offers all of the capabilities found in the popular RStudio open source IDE, plus turnkey … Read more

Looking back at 2018 and plans for 2019

At the end of every year I plan to write about the highlights of the current year and set plans for the future. First, let’s talk about my work in 2018. Research-wise, my scientometrics paper “Is predatory publishing a real threat? Evidence from a large database study” was featured in many news outlets. Its … Read more

BH 1.69.0-1 on CRAN

The BH package provides a sizeable portion of the Boost C++ libraries as a set of template headers for use by R. It is quite popular, and frequently used together with Rcpp. The BH CRAN page shows e.g. that it is used by rstan, dplyr as well as a few other packages. The current count … Read more

Neural Networks and Philosophy of Language

Why Wittgenstein’s theories are the basis of all modern NLP. Word embeddings are probably one of the most beautiful and romantic ideas in the history of artificial intelligence. If Philosophy of Language is the branch of philosophy that explores the relationship between language and reality, and how we are able to have meaningful conversations understanding one … Read more

Explainable AI vs Explaining AI — Part 2: Statistical Intuitive vs. Symbolic Reasoning Systems

Statistical Intuitive vs. Symbolic Reasoning Systems. In the early 1900s, the horse Clever Hans showed a remarkable ability to answer arithmetic questions. Hans tapped numbers or letters with his hoof to answer questions. This behavior drew the world’s attention to him as the first intelligent animal. However, after some experimentation, it appeared that … Read more

Understanding Generative Adversarial Networks (GANs)

Generative Adversarial Networks. The “indirect” training method: the “direct” approach presented above compares the generated distribution directly to the true one when training the generative network. The brilliant idea behind GANs is to replace this direct comparison with an indirect one that takes the form of a downstream task over these two distributions. The training … Read more
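
For reference, the “downstream task” is usually written as the two-player minimax objective from the original GAN formulation. In that formulation, a discriminator D tries to tell real samples x from generated samples G(z). The post’s own notation may differ.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

For a fixed generator, the best discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)). Plugging it back in gives 2 · JSD(p_data ‖ p_g) − log 4. This is why the classification task indirectly measures how far the generated distribution is from the true one.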

Long-awaited updates to htmlTable

Let’s celebrate 2019 with some updates to my most popular package ever, htmlTable. (Image: CC by Thomas Hawk.) One of the most pleasant surprises has been the popularity of my htmlTable package, with more than 100k downloads per month. This is all thanks to more popular packages relying on it and the web … Read more

My First Adventures in NLP

A journey in exploring sentiment analysis as a “black box” and breaking it down to find insights. November 15, 2018: I set out on a mission to create a drug-review sentiment analysis model. It began as most ideas do… with a problem. For the past 24 months my wife has struggled through a major depressive … Read more

Deconstructing BERT, Part 2: Visualizing the Inner Workings of Attention

A new visualization tool shows how BERT forms its distinctive attention patterns. In Part 1 of this series, I described 6 key patterns that appear in BERT’s self-attention layers. For example, one pattern focuses nearly all of the attention on the next word in the sequence; another focuses on the previous word (see illustration below). … Read more

Mask R-CNN for Ship Detection & Segmentation

(Figure: model predicting mask segmentations and bounding boxes for ships in a satellite image.) In this post we’ll use Mask R-CNN to build a model that takes satellite images as input and outputs a bounding box and a mask that segments each ship instance in the image. We’ll use the train and dev datasets provided by … Read more

Marketing analytics with greybox

One of the reasons why I started the greybox package is to use it for marketing research and marketing analytics. The common problem that I face when working with these courses is analysing data measured on different scales. While R handles numeric scales natively, working with categorical ones is not satisfactory. Yes, I … Read more

Tutorial: An app in R shiny visualizing biopsy data —  in a pharmaceutical company

Learn how to build a shiny app for the visualization of clustering results. The app helps to better identify patient data samples, e.g. during a clinical study. This tutorial is a joint effort. It was presented by Olaf Menzer in a workshop … Read more

Machine Learning — Probability & Statistics

Essential Probability & Statistics for Machine Learning. Machine Learning is an interdisciplinary field that uses statistics, probability, and algorithms to learn from data and provide insights that can be used to build intelligent applications. In this article, we will discuss some of the key concepts widely used in machine learning. Probability and statistics are related areas of … Read more

Review: InstanceFCN — Instance-Sensitive Score Maps (Instance Segmentation)

Fully Convolutional Network (FCN) with Instance-Sensitive Score Maps: better than DeepMask, competitive with MNC. In this story, InstanceFCN (Instance-sensitive Fully Convolutional Networks), by Microsoft Research, Tsinghua University, and University of Science and Technology of China, is briefly reviewed. By using a Fully Convolutional Network (FCN), Instance-Sensitive Score Maps are introduced and all Fully Connected (FC) layers are … Read more

RTest: pretty testing of R packages

The specflow and cucumber.io for R: enabling non-coders to interpret test reports for R packages, and even to create test cases. A step towards simple R package validation. (Photo by startupstockphotos, http://startupstockphotos.com/post/143841899156) Why RTest? Testing in R seems simple. Start by using usethis::use_test("name") and off you go by coding your tests in testthat with … Read more

PyTorch Autograd: Understanding the heart of PyTorch’s magic

(Comic source: http://bumpybrains.com/comics.php?comic=34) Let’s just agree: we are all bad at calculus when it comes to large neural networks. It is impractical to calculate gradients of such large composite functions by explicitly solving mathematical equations, especially because these curves exist in a large number of dimensions and are impossible to fathom. To deal with hyper-planes in … Read more
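
Autograd avoids solving gradients by hand. It records each operation in a graph and then applies the chain rule backwards from the output. The toy sketch below illustrates that reverse-mode idea for scalar addition and multiplication. It uses a hypothetical `Value` class in plain Python; this is not PyTorch’s API.

```python
class Value:
    """A scalar that remembers how it was computed (hypothetical toy, not torch.Tensor)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = None  # pushes self.grad back to the parents

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def backward_fn():
            # d(a + b)/da = d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def backward_fn():
            # d(a * b)/da = b, d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule from the output back.
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            if node._backward_fn is not None:
                node._backward_fn()

x, y = Value(3.0), Value(4.0)
z = x * y + x
z.backward()
print(x.grad, y.grad)  # dz/dx = y + 1 = 5.0, dz/dy = x = 3.0
```

Gradients accumulate with `+=` because a value used twice (like `x` above) receives a contribution along each path. PyTorch likewise accumulates into `.grad`.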

7 Reasons for Policy Professionals to Get Pumped About R Programming in 2019

Note: A version of this article was also published via LinkedIn here. With the rise of ‘Big Data’, ‘Machine Learning’ and the ‘Data Scientist’ has come an explosion in the popularity of using open-source programming tools for data analysis. This article provides a short summary of some of the evidence of these tools overtaking commercial alternatives and … Read more

Part 2, further comments on OfS grade-inflation report

Update, 2019-01-07: I am pleased to say that the online media article that I complained about in Sec 1 below has now been amended by its author(s), to correct the false attributions. I am grateful to Chris Parr for helping to sort this out. In my post a few days ago (which I’ll now call … Read more

From raw images to real-time predictions with Deep Learning

In my opinion, one of the most exciting fields in Artificial Intelligence is computer vision. I find it very interesting how we can now automatically extract knowledge from complex raw data structures such as images. The goal of this article is to explore a complete example of a computer vision application: building a face expression … Read more

Manifolds in Data Science — A Brief Overview

What is this thing? Data science requires an insightful understanding of data. As more and more data accumulates, it becomes harder to answer the following question: How do I spatially represent my data in an accurate and meaningful way? I claim that a super useful step in answering this question is understanding what a manifold is. … Read more

Explained: Multilingual Sentence Embeddings for Zero-Shot Transfer

Applying a Single Model on 93 Languages. Language models and transfer learning have become cornerstones of NLP in recent years. Phenomenal results were achieved by first building a model of words or even characters, and then using that model to solve other tasks such as sentiment analysis, question answering and others. While … Read more

RcppStreams 0.1.2

A maintenance release of RcppStreams arrived on CRAN this afternoon. RcppStreams brings the excellent Streamulus C++ template library for event stream processing to R. Streamulus, written by Irit Katriel, uses very clever template meta-programming (via Boost Fusion) to implement an embedded domain-specific event language created specifically for event stream processing. This release provides a minor … Read more

February 21st & 22nd: End-2-End from a Keras/TensorFlow model to production

Registration is now open for my 1.5-day workshop on how to develop end-2-end from a Keras/TensorFlow model to production. It will take place on February 21st & 22nd in Berlin, Germany. Please register by sending an email to [email protected] with the following information: name, company/institute/affiliation, address for invoice, phone number, and a reference to this blog. The … Read more

Announcing the 1st Shiny Contest

Shiny apps are a great way to communicate your data science insights with striking, dynamic, interactive visualizations and reports. Over the years, we have loved interacting with the Shiny community and loved seeing and sharing all the exciting apps, dashboards, and interactive documents Shiny developers have produced. We also love seeing Shiny developers openly sharing … Read more

Reverse Geocoding in R

Free, without the Google or Bing API. (Photo by Lonely Planet on Unsplash.) As I continue to work on my dissertation, I have come across a few glitches in executing what should be easy scripts from various packages in R and Python. This weekend, I gave myself the task of reverse geocoding ~1 million latitude and longitude … Read more

Timing the Same Algorithm in R, Python, and C++

While developing the RcppDynProg R package I took a little extra time to port the core algorithm from C++ to both R and Python. This means I can time the exact same algorithm implemented nearly identically in each of these three languages. So I can extract some comparative “apples to apples” timings. Please read on … Read more

Know your enemy

Projected Gradient Descent (PGD). The PGD attack is a white-box attack, which means the attacker has access to the model gradients, i.e. the attacker has a copy of your model’s weights. This threat model gives the attacker much more power than black-box attacks, as they can specifically craft their attack to fool your model without … Read more
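
PGD repeatedly takes a signed-gradient step that increases the model’s loss on an input. After each step, it projects the perturbed input back into a small L∞ ball of radius ε around the clean input. The minimal sketch below uses a logistic-regression “model” so that the input gradient has the closed form (p − y)·w. The model choice and function names are assumptions for illustration, not the post’s code. The random start inside the ball, which many PGD implementations use, is omitted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient(w, b, x, y):
    """Gradient of binary cross-entropy w.r.t. the input x for p = sigmoid(w.x + b): (p - y) * w."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def pgd_attack(w, b, x, y, eps, step, n_steps):
    """L-infinity PGD against a (hypothetical) logistic-regression model."""
    x_adv = list(x)
    for _ in range(n_steps):
        g = input_gradient(w, b, x_adv, y)
        # Signed gradient ascent: move each feature in the direction that increases the loss.
        x_adv = [xa + step * (1 if gi > 0 else -1 if gi < 0 else 0)
                 for xa, gi in zip(x_adv, g)]
        # Projection: clip back into [x - eps, x + eps] around the clean input.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

print(pgd_attack(w=[2.0, -1.0], b=0.0, x=[0.5, 0.2], y=1, eps=0.1, step=0.05, n_steps=5))
```

Because the attacker computes `input_gradient` from the model’s own weights, this is exactly the white-box setting described above.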

2018 Volatility Recap

2018 brought more volatility to the markets, which so far has spilled into 2019. Let’s take a look at the long-term history of volatility using the Dow Jones Industrial Average: Indeed, 2018 was the most volatile year since 2011. Relatively speaking, however, the volatility is on the low end for a bear market, which … Read more
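
The excerpt does not say how volatility was measured. One common convention, assumed here, is the sample standard deviation of daily log-returns scaled by √252 trading days. The sketch below computes that figure per calendar year so that years can be compared. The function names and the 252-day constant are assumptions.

```python
import math
import statistics

def annualized_volatility(daily_log_returns, trading_days=252):
    """Sample standard deviation of daily log-returns, scaled by sqrt(trading_days)."""
    return statistics.stdev(daily_log_returns) * math.sqrt(trading_days)

def volatility_by_year(dated_returns, trading_days=252):
    """Group (year, daily_log_return) pairs by year and annualize each year's volatility.

    Years with fewer than two returns are skipped because stdev needs two points.
    """
    by_year = {}
    for year, r in dated_returns:
        by_year.setdefault(year, []).append(r)
    return {year: annualized_volatility(rs, trading_days)
            for year, rs in sorted(by_year.items()) if len(rs) > 1}
```

With real DJIA data, `dated_returns` would be built from the daily closes. The result would then let you check a claim such as “the most volatile year since 2011”.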

The programming language that rules data-intensive (Big Data + Fast Data) frameworks

Back in the good old days, companies primarily used one main programming language (e.g. C, C++, Java, C#, …), one type of database (SQL) and two data exchange formats (XML, JSON). The situation has changed since the beginning of the 21st century with the rise of the internet and mobile devices. Every year, the volume and speed of data … Read more

Marketing Analytics through Markov Chain

(Image source: http://setosa.io/ev/markov-chains/) Imagine you are a company selling a fast-moving consumer good in the market. Let’s assume that the customer follows the journey below to make the final purchase: These are the states the customer could be in at any point in the purchase journey. Now, how do we find out at which … Read more
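
A customer journey like this can be modelled as a Markov chain. Each row of a transition matrix P gives the probabilities of moving from one state to the others, and the distribution over states after n steps is v_n = v_{n−1} P. The states and probabilities below are hypothetical, since the post’s journey figure is not reproduced here. “Purchase” and “lost” are made absorbing states.

```python
def step(distribution, transition):
    """One Markov-chain step: new[j] = sum_i distribution[i] * transition[i][j]."""
    n = len(transition)
    return [sum(distribution[i] * transition[i][j] for i in range(n)) for j in range(n)]

def distribution_after(start, transition, n_steps):
    """State distribution after n_steps transitions from the start distribution."""
    dist = list(start)
    for _ in range(n_steps):
        dist = step(dist, transition)
    return dist

# Hypothetical journey states and transition probabilities (each row sums to 1).
STATES = ["awareness", "consideration", "purchase", "lost"]
P = [
    [0.5, 0.3, 0.0, 0.2],  # awareness
    [0.0, 0.4, 0.4, 0.2],  # consideration
    [0.0, 0.0, 1.0, 0.0],  # purchase (absorbing)
    [0.0, 0.0, 0.0, 1.0],  # lost (absorbing)
]

print(distribution_after([1.0, 0.0, 0.0, 0.0], P, 2))
```

With these made-up numbers, a customer who starts in “awareness” eventually purchases with probability 0.4. Solving c = 0.4c + 0.4 gives c = 2/3 from “consideration”, and then a = 0.5a + 0.3c gives a = 0.4. Running the chain for many steps converges to that value.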

Deep Learning with Magnetic Resonance and Computed Tomography Images

Getting started with applying deep learning to magnetic resonance (MR) or computed tomography (CT) images is not straightforward; finding appropriate data sets, preprocessing the data, and creating the data loader structures necessary to do the work is a pain to figure out. In this post I hope to alleviate some of that pain for newcomers. … Read more

Survival Analysis: Intuition & Implementation in Python

Survival analysis is a set of statistical tools that addresses questions such as ‘how long will it be before a particular event occurs?’; in other words, it can also be called ‘time to event’ analysis. This … Read more
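
The Kaplan-Meier estimate, one of the tools this post covers, is a running product: S(t) = ∏_{t_i ≤ t} (1 − d_i / n_i). Here d_i is the number of events at time t_i, and n_i is the number of subjects still at risk just before t_i. Censored subjects leave the risk set without counting as events. The sketch below is plain Python with a hypothetical function name; the post itself may use a library.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival estimate (hypothetical helper).

    durations: time to event or censoring for each subject.
    observed:  True if the event occurred, False if the subject was censored.
    Returns [(t, S(t))] at each distinct event time.
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival, curve = 1.0, []
    for t in event_times:
        # Subjects censored exactly at t are still counted as at risk at t.
        n_at_risk = sum(1 for d in durations if d >= t)
        n_events = sum(1 for d, e in zip(durations, observed) if e and d == t)
        survival *= 1.0 - n_events / n_at_risk
        curve.append((t, survival))
    return curve

print(kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False]))
```

For the toy data, the factors are 4/5, then 3/4, then 1/2. This gives survival of roughly 0.8, 0.6 and 0.3 at times 1, 2 and 3.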

Deep Multi-Task Learning — 3 Lessons Learned

For the past year, my team and I have been working on a personalized user experience in the Taboola feed. We used Multi-Task Learning (MTL) to predict multiple Key Performance Indicators (KPIs) on the same set of input features, and implemented a Deep Learning (DL) model in TensorFlow to do so. Back when we started, … Read more

Scaling H2O analytics with AWS and p(f)urrr (Part 1)

In these short tutorials, to follow over the next 3 weeks, I go through the steps of using an AWS AMI RStudio instance to run a toy machine learning example on a large AWS instance. I have to admit that you have to know a little bit about AWS to follow the next couple of … Read more

Multi-Layer Perceptron using FastAI and PyTorch

In this blog, I am going to show you how to build a neural network (multilayer perceptron) using FastAI v1 and PyTorch and successfully train it to recognize digits in images. PyTorch is a very popular deep learning framework released by Facebook, and FastAI v1 is a library which simplifies training fast and accurate neural … Read more

Concept Learning and Feature Spaces

How can we teach a machine to grasp an idea? In the summer of 2004, millions of cinema-goers across the world experienced I, Robot, a movie depicting a vision of the near future in which humanoid machine servants are an integral part of society. Inspired by Asimov’s collection of short stories printed under the same name, the … Read more

Co-integration and Mean Reverting Portfolio

In the previous post https://statcompute.wordpress.com/2018/07/29/co-integration-and-pairs-trading, it was shown how to identify two co-integrated stocks in the pairs trade. In the example below, I will show how to form a mean-reverting portfolio with three or more stocks, e.g. stocks with co-integration, and also how to find the linear combination that is stationary for these stocks. … Read more

Demystifying Maths of SVM

Deriving the optimization objective of the Support Vector Machine for a linearly separable dataset, with a detailed discussion of each step. So, three days into SVM, I was 40% frustrated, 30% restless, 20% irritated and 100% inefficient in terms of getting my work done. I was stuck on the maths part of Support Vector Machine. I … Read more
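
The standard hard-margin result that such a derivation arrives at can be summarized in three steps. This sketch assumes labels y_i ∈ {−1, +1} and a separating hyperplane w^T x + b = 0. The post’s own notation and intermediate steps may differ.

```latex
\begin{aligned}
&\text{distance from } x_i \text{ to the hyperplane } w^\top x + b = 0:
  && \frac{\lvert w^\top x_i + b \rvert}{\lVert w \rVert} \\
&\text{rescale } (w, b) \text{ so that } \min_i \, y_i (w^\top x_i + b) = 1:
  && \text{margin} = \frac{2}{\lVert w \rVert} \\
&\text{maximizing the margin} \iff
  && \min_{w, b} \; \tfrac{1}{2} \lVert w \rVert^2
     \ \text{ s.t. } \ y_i (w^\top x_i + b) \ge 1, \quad i = 1, \dots, n
\end{aligned}
```

Minimizing ½‖w‖² instead of maximizing 2/‖w‖ gives the same optimum. It also turns the problem into a convex quadratic program with linear constraints.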

Optimize Data Science Models with Feature Engineering: Cluster Analysis, Metrics Development, and…

Cluster Analysis, Metrics Development, and PCA with Baby Names Data. While baby name articles are mandatory reading for soon-to-be parents, the U.S. Social Security Administration’s (SSA’s) Baby Names data set should be required reading for budding data scientists. The data set can be sliced and diced in many different ways, including language and time based … Read more

Hot Dog or Not a Hot Dog: Using Metaprogramming to Write UI Tests

mvndy, Jan 5: Today, I take an exciting next step in a personal exploration of Kotlin, the next frontier in modern metaprogramming. In an epic quest to use metaprogramming to write UI tests for TornadoFX, the idea was to upload a TornadoFX project into TornadoFX-Suite, which in turn reads the project, detects UI components, and … Read more

Has Global Violence Declined? A Look at the Data

The Drivers of the Decline in Violence. For this section, we will look at Steven Pinker’s ideas as outlined in The Better Angels of Our Nature. (This work is the best look at potential reasons why violence has declined, partly because others refuse to concede this point.) Pinker explains five forces intended to account for the decline … Read more

How BERT leverages the attention mechanism and the transformer to learn contextual word relations

After ELMo (Embeddings from Language Models) and OpenAI GPT (Generative Pre-trained Transformer), Google released a new state-of-the-art NLP paper. They call this approach BERT (Bidirectional Encoder Representations from Transformers). Both OpenAI GPT and BERT use the transformer architecture to learn text representations. One of the differences is that BERT uses bidirectional … Read more
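
The core operation inside the transformer layers used by both models is scaled dot-product attention, softmax(QKᵀ / √d_k) V. The minimal plain-Python sketch below has hypothetical function names; it is not BERT’s implementation. It applies no mask, so every position can attend to every other position, which matches the bidirectional setting. GPT’s left-to-right model instead masks out future positions.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V on lists of row vectors (no mask).

    Q: n_queries x d_k, K: n_keys x d_k, V: n_keys x d_v.
    Returns (outputs, weights); weights[i][j] is how strongly query i attends to key j.
    """
    d_k = len(K[0])
    weights, outputs = [], []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Each output is the attention-weighted average of the value vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return outputs, weights
```

In a real transformer, Q, K and V are learned linear projections of the token embeddings, and several such “heads” run in parallel. The sketch shows only the single-head computation.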

Regularization in Gradient Point of View [ Manual Back Propagation in Tensorflow ]

Results: train/test accuracy. When we view the accuracy plots for both training and testing data, we can observe that the highest training accuracy is achieved when adding sqrt(θ²)/θ, the highest testing accuracy is achieved when adding -tanh(θ), and the lowest performance is achieved when adding θ. When we add the term θ, the derivative value to each weight just becomes one, and … Read more
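
The three terms compared in the post are extra quantities added to each weight’s gradient. Some context on each:

- Adding θ corresponds to the derivative of θ²/2, the L2 penalty.
- sqrt(θ²)/θ equals sign(θ) for θ ≠ 0, the derivative of |θ| (L1).
- −tanh(θ) is the derivative of −log(cosh θ).

The plain-Python sketch below shows where such a term enters a gradient-descent update. The post does this with manual backpropagation in TensorFlow. The `reg_strength` scale factor and the function names are hypothetical additions.

```python
import math

# Terms added to each weight's gradient, keyed by the expressions used in the post.
REGULARIZER_GRADS = {
    "theta": lambda t: t,                                                      # d/dθ of θ²/2 (L2)
    "sqrt(theta^2)/theta": lambda t: math.sqrt(t * t) / t if t != 0 else 0.0,  # sign(θ), d/dθ of |θ| (L1)
    "-tanh(theta)": lambda t: -math.tanh(t),                                   # d/dθ of -log(cosh θ)
}

def sgd_step(weights, grads, lr, reg_name, reg_strength=1.0):
    """One gradient-descent step with a regularization term added to each gradient."""
    reg = REGULARIZER_GRADS[reg_name]
    return [w - lr * (g + reg_strength * reg(w)) for w, g in zip(weights, grads)]
```

The signs explain the different behaviours. The θ and sign(θ) terms pull weights toward zero; L2 pulls in proportion to the weight’s size, while L1 pulls with a constant step. The −tanh(θ) term pushes weights away from zero.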

gganimation for the nation

gganimate hits CRAN – At the NHS_R conference back in October, I showed a few ways of building animations using gifski – I also wrote up the method I used in an earlier post right here. And, the source code for all this stuff is available from my github here. However – the gganimate file … Read more

An Introduction to Docker for R Users

A quick introduction to using Docker for reproducibility in R. Disclaimer: this blog post is an introduction to Docker for beginners, and will take some shortcuts. What is Docker? Docker is “a computer program that performs operating-system-level virtualization, also known as ‘containerization’” (Wikipedia). As with any first line of a Wikipedia article about tech, this sentence is obscure to anyone … Read more
