Diving into K-Means…
Sep 9, 2018 We have completed our first basic supervised learning model i.e. Linear Regression model in the last post here. Thus in this post we get started with the … Read more
Reading and processing data for statistical and quantitative analysis in trading Sep 8, 2018 Anyone interested in the statistical analysis of financial markets has the need to process historical data. Historical … Read more
https://academy.microsoft.com/en-us/professional-program/tracks/big-data/ Block 1 – Data Fundamentals Learn data science basics. Explore topics like data queries, data analysis, data visualization and how statistics informs data science practices. Please choose from Course … Read more
When we do time series analysis, we are usually interested either in uncovering causal relationships (Does \(X_t\) influence \(Y_{t+1}\)?) or in getting the most accurate forecast possible. Especially in the … Read more
Don’t make decisions based on the weights of an ML model Aug 31, 2018 As I see our customers fall in love with BigQuery ML, an old problem rears its head: I … Read more
Aug 29, 2018 Picture taken from Pixabay In this post and the next, we will look at one of the trickiest and most critical problems in Machine Learning (ML): Hyper-parameter tuning. … Read more
I had my first contact with stochastic control theory in one of my Master’s courses about Continuous Time Finance. I found the subject really interesting and decided to write my … Read more
Aug 28, 2018 Image quality is a notion that highly depends on observers. Generally, it is linked to the conditions in which it is viewed; therefore, it is a highly subjective … Read more
Neural Processes (NPs) caught my attention as they essentially are a neural network (NN) based probabilistic model which can represent a distribution over stochastic processes. So NPs combine elements from two worlds:
Deep Learning – neural networks are flexible non-linear functions which are straightforward to train
Gaussian Processes – GPs offer a probabilistic framework for learning a distribution over a wide class of non-linear functions
Despite huge progress in machine learning over the past decade, building production-ready machine learning systems is still hard. Three years ago when we set out to build machine learning capabilities into the Salesforce platform, we learned that building enterprise-scale machine learning systems is even harder.
Can we teach computers to write code? This is the question that brings out an entire branch of research specialized in program synthesis. Programming is a demanding task that requires extensive knowledge, experience and not a frivolous degree of creativity.
Probability and statistics are everywhere: from finance and demographic projections to casino games, these disciplines help us make sense of the world. They also underlie much of the machine learning … Read more
Sounds cool, but … what is it? As I’ve started to pay more attention to machine learning, differentiable rendering is one topic that caught my attention and has been popping … Read more
While looking for some interesting geographical data to work with, I came across the Road Safety Data published by the UK government. This is a very comprehensive road accident data … Read more
A new update of my sjstats-package just arrived at CRAN. This blog post demonstrates those functions of the sjstats-package that deal especially with Bayesian models. The update contains some new and some revised functions to compute summary statistics of Bayesian models, which are now described in more detail.
TDAstats is an R pipeline for topological data analysis, specifically, the use of persistent homology in Vietoris-Rips simplicial complexes to study the shape of data.
Auto-Keras is an open source software library for automated machine learning (AutoML). It is developed by DATA Lab at Texas A&M University and community contributors. The ultimate goal of AutoML … Read more
Recommender Systems support the decision making processes of customers with personalized suggestions. They are widely used and influence the daily life of almost everyone in different domains like e-commerce, social … Read more
A Boltzmann machine defines a probability distribution over binary-valued patterns. What makes Boltzmann machine models different from other deep learning models is that they’re undirected and don’t have an output … Read more
Over the last few months I set out to build a news and event aggregator. You can see the work in progress here: data-science-austria.at WordPress Plugins Here is … Read more
It is not enough to just stand up a web service that can make predictions. Aug 13, 2018 Original Image Source — Meme overlay by Imgflip In a 2017 SAS survey, 83% of … Read more
4. Class weighted / cost sensitive learning Without resampling the data, one can also make the classifier aware of the imbalanced data by incorporating the weights of the classes into … Read more
The nature of the problem: medical fraud and abuse The U.S. Department of Health and Human Services, in a pamphlet Avoiding Medicare Fraud and Abuse: A Roadmap for Physicians, states … Read more
There are a multitude of options when it comes to storing and processing data. In this post I want to give you a brief overview of Azure SQL Data Warehouse, Microsoft’s … Read more
Aug 2, 2018 Photo by JESHOOTS.COM on Unsplash Look at this equation: Value function of Reinforcement Learning If it does not intimidate you, then you are mathematically savvy and there … Read more
Recently I came across this cooking recipes data set in Kaggle, and it inspired me to combine 2 of my main interests in life. Food and machine learning. What makes … Read more
So you’ve seen the recent news about how artificial intelligence (AI) is changing everything. However, the idea of AI has been around for a long time. Machines that think and … Read more
Estimators were introduced in version 1.3 of the TensorFlow API, and are used to abstract and simplify training, evaluation and prediction. If you haven’t worked with Estimators before I suggest … Read more
Jul 19, 2018 Hypothesis analysis is a widely known concept and is used extensively by researchers, statisticians and quantitative analysts. It allows them to follow a set of formal steps … Read more
Docker is a tool which helps developers build and ship high quality applications, faster, anywhere. Source Why Docker With Docker, developers can build any app in any language using any … Read more
Jul 8, 2018 In this tutorial we will discuss integrating PySpark and XGBoost using a standard machine learning pipeline. We will use data from the Titanic: Machine learning from … Read more
In the previous post I covered the basics you need to know to work with SQL Server. In this post, I want to show you some more advanced techniques that … Read more
DIY Noise-Cancellation System prototype made with TensorFlow. Jun 25, 2018 Image by TheDigitalArtist on Pixabay In this post I describe how I built an active noise cancellation system by means of … Read more
After getting the scrum.org PSM I, I wanted to capture the relevant content. The complete guide can be downloaded here: scrumguides.org 1. What is Scrum? Scrum is a framework for … Read more
Introduction Learning rate might be the most important hyperparameter in deep learning, as it determines how much of the back-propagated gradient is applied in each update. This in turn decides by how … Read more
I’ve been involved in building several different types of recommendation systems, and one thing I’ve noticed is that each use case is different from the next, as each aims to … Read more
Many data professionals are strict on the language to be used for ANN models limiting their dev. environment exclusively to Python. I decided to test performance of Python vs. R … Read more
Jun 14, 2018 This post is about implementing a simple linear regression model, step by step and with detailed explanations, for ML beginners. If you are new to machine learning, … Read more
Using MQTT protocol, we will get captured data from sensors, logging them to an IoT service, ThingSpeak.com and to a mobile App, Thingsview. 1. Introduction In my previous article, MicroPython … Read more
When you are using Google’s Colaboratory (Colab) for running your Deep Learning models the most obvious way to access the large datasets is by storing them on Google Drive and … Read more
Since R is mostly a functional language and data science work lends itself to being expressed in a functional form, you can get by just fine without learning about object-oriented … Read more
Over the past few decades, four key change initiatives have been taking place in the organizations: strategic planning, re-engineering, total quality management and downsizing. The aim of these initiatives was … Read more
Measuring the effect of an intervention on some metric is an important problem in many areas of business and academia. Imagine you want to know the effect of a recently … Read more
One of the things I particularly like about working in data science, is the science part: Figuring out the right questions to ask, how to frame a problem correctly and … Read more
data.table is an awesome R package, but there are a few things you need to watch out for when using it. R usually does not modify objects in place (e.g. by … Read more
When you work for a large corporation you often have little choice in picking a specific operating system for your company laptop. This post is a collection of random problems … Read more
This post attempts to consolidate information on tree algorithms and their implementations in Scikit-learn and Spark. In particular, it was written to provide clarification on how feature importance is calculated. … Read more
Here is a great tutorial on how to host Hugo on Netlify Other examples using the exact same theme: Creating the Hugo site In order to create a new Hugo … Read more
SQL is not the sexiest language on the block and many/most data scientists I know prefer to stick to R and/or Python. Some common complaints I hear about SQL are: … Read more
Creating an R package is as easy as typing: package.skeleton(name = "YourPackageName") As you might have guessed, this function creates the basic file and folder structure you need to create … Read more