Organize Why R? 2019 pre-meeting in your city

[This article was first published on http://r-addict.com, and kindly contributed to R-bloggers]. Why R? pre-meetings are R meetups that support local R groups. They promote …

Probability Distributions in Data Science

An introduction to some of the most commonly used Probability Distributions in Data Science with real-life examples. Having a sound statistical background can be greatly beneficial in the daily life of a Data Scientist. Every time we start exploring a new dataset, we need to first do an Exploratory Data Analysis (EDA) in order to …

Top Down View at Reinforcement Learning

Stitch together the different parts and branches of Reinforcement Learning. When you are new to Reinforcement Learning you will no doubt be bombarded with weird terms, like Model-Based, Model-Free, On Policy, Off Policy, etc. Soon you will find it exhausting to keep track of this terminology that seems to appear all over the place, without …

The Data Processing Error in the Most Prominent Fair Machine Learning Dataset (short version)

While ProPublica’s COMPAS data are used in an increasing number of studies, researchers have generally taken the datasets created by ProPublica as they are and do not appear to have scrutinized them for data processing issues. Instead of testing a novel fairness definition or procedure, I take a closer look at the actual datasets put …

Firebase Unity Solutions: Update game behavior without deploying with Remote Config

Last June we announced Firebase Unity Solutions, an open-source GitHub repository with sample projects and scripts to help you add cloud-based features to your games being built on Unity. Our debut project, Firebase_Leaderboard, utilized Firebase Realtime Database to create and manage a cross-platform high score leaderboard. Today, we’re introducing the second solution incorporating Firebase services …

Working as a Data Scientist in Cybersport

Data science for Dota team. (Photo by Fredrick Tendong on Unsplash.) In football, there have long been teams that use big data analysis for their transfers and for analyzing their players’ games. There is even a football club that for years bought players only by looking at their statistics and what data scientists …

‘mRpostman’ – IMAP Tools for R in a Tidy Way

mRpostman is an R package that helps you easily connect to your IMAP (Internet Message Access Protocol) server and execute commands, such as listing mailboxes and searching and fetching messages, in a tidy way. It calls ‘curl’ in the background when issuing the IMAP commands (all credit to Jeroen Ooms and Daniel Stenberg). So far, I …

Amazon ElastiCache announces online vertical scaling for Redis Cluster mode and improves scaling non-Redis Cluster mode

The Redis Cluster mode in Amazon ElastiCache provides superior availability and scalability, supporting up to 170 TB of in-memory capacity. With this release, you can resize your sharded Redis Cluster both vertically (by scaling up or down) and horizontally (previously available, by scaling in and out to add or remove shards) while keeping the cluster …

Very Short Introduction to Data Science Terminology

(Photo by Franki Chamaki on Unsplash.) An attribute is a property of an object. An attribute is also known as a variable or feature. A collection of attributes describes an object. An object is also known as a sample, entity, or instance. Data can often be represented or abstracted as an n×d data matrix. “n” rows correspond to …

The geometric interpretation of 3D lines and planes

Linear Algebra is the branch of mathematics whose objects live beyond ℝ. Those objects might be coordinates in space (hence points) or combinations of points in the form of multivariate equations. Every time we work with more than 3 dimensions, it is not physically possible to visualize our objects. Hence, in this article I’m …

Six Challenges Every Data Scientist Will Face and How to Overcome Them

Supplements for thought? Image made by author. The age of information has bestowed upon humanity one of the biggest explosions of tech-focused jobs ever. While an abundance of the power behind big successful companies such as Uber, Facebook, AirBnB, and Amazon is their ingenuity and convenience for consumers, their success can also be attributed to …

Five ways ML helps broadcasters achieve new efficiencies and reinvent CX

(Source: unsplash.com.) With broadcasting behemoths like Netflix and Hulu dominating the market, winning eyeballs and making the audience stay tuned to your video content is no walk in the park. But not impossible, either. AI and ML development experts recommend creating a coherent marketing strategy and using a winning combination of video and image analysis …

Quick Hit: A new 64-bit Swift 5 RSwitch App

At the bottom of the R for macOS Developer’s Page there’s mention of an “other binary” called “RSwitch” that is “a small GUI that allows you to switch between R versions quickly (if you have multiple versions of R framework installed).” Said switching requires you to use the “tar.gz” versions of R from the R …

How to Build Analytics Platforms – Part 2: Intelligent User and Role Concept

What does a modern analytics platform need to offer companies real added value? Why is the administration of user and role rights a factor not to be underestimated when using analytics platforms? In the previous article, we showed how important an intuitive user interface and an open user group concept are for the company-wide use …

LineFlow: Simple NLP Dataset Handler for PyTorch or Any Framework

Smaller Code, Less Pain. For an NLP task, you might need to tokenize text or build the vocabulary during pre-processing. And you have probably found that the pre-processing code is as messy as your desk. Forgive me if your desk is clean 🙂 I have had that experience too. That’s why I created LineFlow to …

A High-Level Guide to Autoencoders

An autoencoder toolbox from most basic to most fancy. In the wonderful world of machine learning and artificial intelligence, there exists this structure called an autoencoder. Autoencoders are a type of neural network that is part of unsupervised learning (or, to some, semi-unsupervised learning). There are many different types of autoencoders used for many purposes, some …

Sensing the Air Quality

A low-cost IoT Air Quality Monitor based on Raspberry Pi 4. (Santiago, Chile during a winter environmental emergency.) I have the privilege of living in one of the most beautiful countries in the world, but unfortunately, not “all are flowers”. During the winter season, Chile suffers a lot from air contamination, mainly due to particulate materials as …

Support for Windows Shadow Copies is Now Extended to All Amazon FSx File Systems

Amazon FSx for Windows File Server, a service that provides fully managed native Microsoft Windows file systems, has made Windows shadow copies available to all file systems. Since the launch of this feature on July 31, 2019, it had been available only on newly created Amazon FSx file systems. Now, customers can use shadow copies on any existing …

The case against the Jupyter notebook

Joel Grus on the TDS podcast. Editor’s note: This is the first episode of the Towards Data Science podcast’s “Climbing the Data Science Ladder” series, hosted by Jeremie Harris, Edouard Harris and Russell Pollari. Together, they run a data science mentorship startup called SharpestMinds. You can listen to the podcast below: To most data scientists, …

Helping a Reader with Python Web Scraping Refactored.

Bhargava Reddy Morampalli, a microbiologist from India, read my first post on web scraping from my old blog. If you didn’t get a chance to check out that post you can read it here: Python Web Scraping Refactored. My first article on my old blog was on a web scraping example. Web scraping is one …

View from the Top: 3 Takeaways from the Chief Data Officer Symposium

What the World’s Most Innovative CDOs are Doing Today. (Photo by Skye Studios on Unsplash.) Earlier this month, I had the pleasure of attending the Chief Data Officer Symposium at MIT for the first time. More than 60 CDOs were there, hailing from across the United States, Canada, Germany, the Netherlands, and more. There was representation …

How to do Topic Extraction from Customer Reviews in R

Topic Extraction is an integral part of IE (Information Extraction) from a corpus of text, used to understand the key things the corpus is talking about. While this can be achieved naively using unigrams and bigrams, a more intelligent way of doing it, with an algorithm called RAKE, is what we’re going to see …

AI Powered Search for Extra-terrestrial Intelligence — Signal Classification with Deep Learning

AI FOR SOCIAL GOOD SERIES – PART 2.2: Classifying Radio-Telescope Signals from SETI with Deep Learning. Welcome (or welcome back!) to the AI for social good series! In the second part of this two-part series of articles, we will look at how artificial intelligence (AI) coupled with the power of open-source tools and techniques like …

Learn faster with smarter data labeling

So, besides what transfer learning offers, can we further reduce the amount of labeling work? Actually, the answer is yes, and there are a couple of techniques that exist. One of the most well studied is active learning. The principle is straightforward: only label what is useful for your current model. Formally, the active learning …

DR for cloud: Architecting Microsoft SQL Server with GCP

Database disaster recovery (DR) planning is an important component of a bigger DR plan, and for enterprises using Microsoft SQL Server on Compute Engine, it often involves critical data. When you’re architecting a disaster recovery solution with Microsoft SQL Server running on Google Cloud Platform (GCP), you have some decisions to make to build an …

Data Scientists, The five Graph Algorithms that you should know

A graph with 3 connected components. We all know how clustering works. You can think of Connected Components, in layman’s terms, as a sort of hard clustering algorithm that finds clusters/islands in related/connected data. As a concrete example: say you have data about roads joining any two cities in the world. And you …
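
The roads-and-cities picture in the excerpt above can be made concrete with a minimal sketch. It is not the article's own code: the `roads` list and the `connected_components` helper are hypothetical names chosen for illustration, and breadth-first search is just one common way to find the components.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Group nodes into connected components using breadth-first search."""
    # Build an undirected adjacency map from the edge list.
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        # Explore everything reachable from this unvisited node.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Hypothetical data: each pair is a road joining two cities.
roads = [
    ("Mumbai", "Delhi"),
    ("Delhi", "Kolkata"),
    ("Lisbon", "Madrid"),
    ("Sydney", "Melbourne"),
]
islands = connected_components(roads)
```

Each set in `islands` is one "island" of cities that can reach each other by road; cities in different sets have no road path between them.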

How I Got Started With Kaggle Competitions (It’s Not That Hard)

Most people in the data science community know Kaggle as a place to learn and grow your skills. One popular way for practitioners to improve is to compete in prediction challenges. For newcomers, it can be overwhelming to jump in and compete on the site in an actual challenge. At least, that’s how I always …

Bayesian Basketball: were the Toronto Raptors really the best team during the NBA 2019 season?

Let’s go back in time and see if we can end up with a different winner for the NBA 2019 title. How? By using Bayesian simulations. (Credit: NYTimes.) [This article was inspired by the work of Baio and Blangiardo (2010), Daniel Weitzenfeld’s great blog article, and Peadar Coyle’s tutorial on Hierarchical models.] Bayesian simulation …

Rank the Features, now rank again

Machine Learning and Biomarkers: 40 ways to rank your features and my experience selecting biomarker candidates (features) using few observations. Here I discuss methods to rank features in 40 ways and a difficult case with an unstable model, caused by the data having many more variables than samples. As a bonus, you will know a …

How Data Science can help solve Climate Change

Data-driven solutions will lead the Transition to Clean Energy. (Photo by Bogdan Pasca on Unsplash.) Climate Change is real. And even though many scientists agree on the fact that we are already too late, people are just becoming conscious about this problem. And with the people comes politics, and with politics comes the money. That’s …

A Text Analytics Primer: Key Factors in a Text Analytics Strategy

A Venn diagram of the subfields of text analytics and how they relate (Miner, 2012). Introduction: About 90% of all the data in the world was created in the last 24 months, averaging 2.5 quintillion bytes per day, and about 90% of that is unstructured data, which is things like texts, Tweets, …

EARL London – speaker interview, Johannes Tang Kristensen

We sent Johannes Tang Kristensen from Arla Foods a few questions about his upcoming talk at EARL London – ‘How much milk do our cows produce? Lessons learned from putting our first R model into production’. How did the need for your project come about? The project started out as part of a larger initiative …

Which Factors Influence Gas Prices? Do Gas Companies Narratives Hold True?

Like the data hunter he is, my STATWORX colleague Jakob came across a rich data source regarding gas station prices. While his focus has been on checking very common myths about gas prices (check out his blogpost!), he did a fantastic job at cleaning and preparing the raw data to get it in a usable …

Maximum Likelihood Estimation Explained – Normal Distribution

Wikipedia defines Maximum Likelihood Estimation (MLE) as follows: “A method of estimating the parameters of a distribution by maximizing a likelihood function, so that under the assumed statistical model the observed data is most probable.” To get a handle on this definition, let’s look at a simple example. Let’s say we have some data and …
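
To illustrate the definition quoted above, here is a minimal sketch of MLE for a normal distribution. It is an assumption-laden illustration rather than the article's own example: the `data` values are made up, and the helper names `normal_log_likelihood` and `normal_mle` are hypothetical. It relies on the standard closed-form result that the likelihood is maximized by the sample mean and by the standard deviation computed with a divisor of n (not n − 1).

```python
import math


def normal_log_likelihood(data, mu, sigma):
    """Log-likelihood of i.i.d. data under a Normal(mu, sigma^2) model."""
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - squared_error / (2 * sigma ** 2)


def normal_mle(data):
    """Closed-form maximum likelihood estimates of mu and sigma."""
    n = len(data)
    mu_hat = sum(data) / n
    # The MLE of the variance divides by n, not n - 1.
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)
    return mu_hat, sigma_hat


# Made-up observations, for illustration only.
data = [4.0, 5.0, 6.0, 5.0, 5.0]
mu_hat, sigma_hat = normal_mle(data)

# Any other parameter choice should give a lower log-likelihood.
best = normal_log_likelihood(data, mu_hat, sigma_hat)
shifted_mean = normal_log_likelihood(data, mu_hat + 0.5, sigma_hat)
wider_spread = normal_log_likelihood(data, mu_hat, 1.0)
```

Comparing `best` against `shifted_mean` and `wider_spread` shows what "most probable under the assumed model" means in practice: moving either parameter away from its estimate lowers the log-likelihood of the observed data.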

Why R? 2019 – Agenda Released + Regular Registration Ends Aug 31st!

[This article was first published on http://r-addict.com, and kindly contributed to R-bloggers]. A month ago we closed the Call for Papers for the Why R? 2019 Conference. …

Visualizing Soccer with StatsBomb Data and R, Part 1: Simple xG and Pass Partner Plots!

[This article was first published on R by R(yo), and kindly contributed to R-bloggers]. This will be Part 1 of what I hope to be …

Reduce Dimensions for Single Cell

Compare dimension reductions for single cell genomics. (Image from Becht et al., Nature Biotechnology 2019.) This is the eighth article in the column Mathematical Statistics and Machine Learning for Life Sciences where I try to cover analytical techniques common for Bioinformatics, Biomedicine, Genetics, Evolutionary Science, etc. Today we are going to talk about dimension …

One-tailed or two-tailed test, that is the question

(Source: pixabay.) Learn the difference between two variants of statistical tests and how to implement them in Python. In data science/econometrics we see statistical tests in many places: correlation analysis, ANOVA, A/B testing, linear regression results, etc. Therefore, for practitioners, it is very important to thoroughly understand their meaning and know why a given …

Reinforcement Learning (DDPG and TD3) for News Recommendation

In the next section, we will try to compare and, primarily, evaluate different reinforcement learning algorithms. But how do we tell if the results are good or not? The critic network assigns values to our actions; however, how can you be sure that those values are meaningful? Well, they are based on critic loss. If critic …

AI is transforming politics — for both good and bad

BIG TECH, BIG DATA, BIG MONEY: Big Data powering Big Money, the return of direct democracy, and the tyranny of the minority. (Source: Pixabay.) Nowadays, artificial intelligence (AI) is one of the most widely discussed phenomena. AI is poised to fundamentally alter almost every dimension of human life, from healthcare and social interactions to …

Advanced Topics in Neural Networks

As you have likely come to realize from your own adventures with neural networks, and possibly from other articles and research literature, the learning rate is a very important part of neural network training. The learning rate essentially determines how ‘fast’ the network will learn: it determines the step size of the movement. A higher …
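
The role of the learning rate as a step size can be seen in a small sketch. This is not taken from the article: it uses plain gradient descent on the toy function f(w) = w², and the function name `gradient_descent`, the starting point, and the three learning rates are all assumptions chosen for illustration.

```python
def gradient_descent(learning_rate, steps=50, start=1.0):
    """Minimize f(w) = w**2 with plain gradient descent and return the final w."""
    w = start
    for _ in range(steps):
        gradient = 2 * w               # derivative of w**2
        w = w - learning_rate * gradient  # step size is set by the learning rate
    return w


# Three illustrative learning rates (values are assumptions, not from the article).
w_small = gradient_descent(learning_rate=0.01)   # slow progress toward 0
w_medium = gradient_descent(learning_rate=0.1)   # much faster progress toward 0
w_large = gradient_descent(learning_rate=1.1)    # overshoots and diverges
```

On this toy problem, a larger learning rate moves toward the minimum at 0 more quickly, but once the step is too large each update overshoots and the iterates grow instead of shrinking.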

Amazon EMR announces support for runtime installation of external libraries with EMR Notebooks

You can now install external Python libraries on EMR clusters at runtime using EMR Notebooks. Before this feature, you had to use a bootstrap action or use a custom AMI to install additional libraries not packaged with the AMI before you launched the EMR cluster. This feature allows you to import your preferred libraries and …

Perceptron Algorithms for Linear Classification

Learn how the perceptron algorithms work and the intuition behind them. The basic perceptron algorithm was first introduced by Ref 1 in the late 1950s. It is a binary linear classifier for supervised learning. The idea behind the binary linear classifier can be described as follows. where x is the feature vector, θ is the …
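
Since the excerpt's formula is not shown, the following is a minimal sketch of the standard textbook perceptron rather than the article's exact formulation. It assumes labels in {−1, +1}, a decision rule of the form sign(θ·x + θ₀), and the usual mistake-driven update; the names `train_perceptron`, `predict`, `theta_0`, and the toy data are hypothetical.

```python
def predict(theta, theta_0, x):
    """Classify x as +1 or -1 using the sign of the linear score theta . x + theta_0."""
    score = sum(t * xi for t, xi in zip(theta, x)) + theta_0
    return 1 if score > 0 else -1


def train_perceptron(features, labels, epochs=10):
    """Textbook perceptron: update the parameters only on misclassified points."""
    theta = [0.0] * len(features[0])
    theta_0 = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            score = sum(t * xi for t, xi in zip(theta, x)) + theta_0
            if y * score <= 0:  # mistake, or point exactly on the boundary
                theta = [t + y * xi for t, xi in zip(theta, x)]
                theta_0 += y
    return theta, theta_0


# Toy, linearly separable data (made up for illustration).
features = [(2.0, 1.0), (1.0, 3.0), (-1.0, -2.0), (-2.0, -1.0)]
labels = [1, 1, -1, -1]
theta, theta_0 = train_perceptron(features, labels)
predictions = [predict(theta, theta_0, x) for x in features]
```

On linearly separable data such as this toy set, the updates stop once every point is on the correct side of the boundary, so `predictions` matches `labels`.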