Data Science Austria

Batch Processing of Monotonic Binning

In my GitHub repository (https://github.com/statcompute/MonotonicBinning), multiple R functions have been developed to implement monotonic binning using either iterative discretization or isotonic regression. With these functions, we can run the monotonic binning for one independent variable at a time. However, in a real-world production environment, we often would want …

historical word embeddings & lexical semantic change

I have developed a GitHub guide that demonstrates a simple workflow for sampling Google n-gram data and building historical word embeddings, with the aim of investigating lexical semantic change. Here, we build on this workflow and unpack some methods presented in Hamilton, Leskovec, and Jurafsky (2016) & Li et …

How to easily automate R analysis, modeling and development work using CI/CD, with working examples

Automating the execution, testing and deployment of R work is a powerful way to ensure the reproducibility, quality and overall robustness of the code that we are building, be it for data analysis and modeling, developing R packages or even blogging. Modern tools also provide a free an …

Big Data: On RDDs, Dataframes, Hive QL with Pyspark and SparkR-Part 3

Out[90]: [['Runs', 'Mins', 'BF', '4s', '6s', 'SR', 'Pos', 'Dismissal', 'Inns', 'Opposition', 'Ground', 'Start Date'], ['15', '28', '24', '2', '0', '62.5', '6', 'bowled', '2', 'v Pakistan', 'Karachi', '15-Nov-89'], ['DNB', '-', '-', '-', '-', '-', '-', '-', '4', 'v Pakistan', 'Karachi', '15-Nov-89'], ['59', '254', '172', '4', '0', '34.3', '6', 'lbw', '1', 'v …