Data Science Austria

Survival analysis and the stratified sample

Part II: Case-control sampling and regression strategy. Due to resource constraints, it is unrealistic to perform logistic regression on data sets with millions of observations and dozens (or even hundreds) of explanatory variables. Luckily, there are proven methods of data compression that allow for accurate, unbiased model generation. Traditional logistic …

Understanding Bayesian Inference with a simple example in R!

Hi there! Last summer, the Royal Botanical Garden (Madrid, Spain) hosted the first edition of MadPhylo, a workshop about Bayesian Inference in phylogeny using RevBayes. It was a pleasure for me to be part of the organization staff with John Huelsenbeck, Brian Moore, Sebastian Hoena, Mike May, Isabel Sanmartin and …

Batch Processing of Monotonic Binning

In my GitHub repository, multiple R functions have been developed to implement monotonic binning using either iterative discretization or isotonic regression. With these functions, we can run the monotonic binning for one independent variable at a time. However, in a real-world production environment, we would often want …

historical word embeddings & lexical semantic change

I have developed a GitHub guide that demonstrates a simple workflow for sampling Google n-gram data and building historical word embeddings with the aim of investigating lexical semantic change. Here, we build on this workflow and unpack some methods presented in Hamilton, Leskovec, and Jurafsky (2016) & Li et …