The finalfit tables gallery has all the variations you could possibly want

The new finalfit tables gallery vignette is an excellent reference and quick tutorial describing the variety of table outputs available. It focuses on crosstables and regression tables, and demonstrates how to easily generate results in R and export them to Word, PDF or HTML. … Read more

Categories R Tags ExcerptFavorite

Advancing Open Domain Dialog Systems Through Alexa Prize

Building an open-domain dialogue system is one of the most challenging tasks. Almost all of the tasks related to open-domain dialogue systems are believed to be “AI-complete”. In other words, solving the problems of open-domain dialogue systems would need “true intelligence” or “human intelligence”. Open-domain dialogue systems require an understanding of natural language. The absence … Read more

Learning Data Science: Modelling Basics

Data Science is all about building good models, so let us start by building a very simple model: we want to predict monthly income from age (in a later post we will see that age is indeed a good predictor for income). For illustrative purposes we just make up some numbers for age and income, … Read more

Categories R Tags ExcerptFavorite

Zooming In and Zooming Out

A Note on Qualitative Sample Sizes. Zooming in or zooming out? How close a picture you need will impact your sample size. (Photos from @ryoji__iwata and @13on.) This article covers: why small sample sizes are acceptable in qualitative research; what the overall goals of qualitative research are; how to determine sample size, and what … Read more

Understanding Encoder-Decoder Sequence to Sequence Model

In this article, I will try to give a short and concise explanation of sequence-to-sequence models, which have recently achieved significant results on pretty complex tasks like machine translation, video captioning, question answering, etc. Prerequisites: the reader should already be familiar with neural networks and, in particular, recurrent neural networks (RNNs). In … Read more

What is a Recurrent NNs and Gated Recurrent Unit (GRUS)

Photo by Tom Grimbert on Unsplash. Recurrent Neural Networks (RNNs) are popular models that have shown great promise on many sequential-data tasks and are used by, among others, Apple’s Siri and Google’s Voice Search. Their great advantage is that the algorithm remembers its input, thanks to an internal memory. But despite their recent popularity there exists a … Read more

Using the uniform sum distribution to introduce probability

I’ve never taught an intro probability/statistics course. If I ever did, I would certainly want to bring the underlying wonder of the subject to life. I’ve always found it almost magical the way mathematical formulation can be mirrored by computer simulation, the way proof can be guided by observed data generation processes, and the way … Read more

Categories R Tags ExcerptFavorite

Spectral clustering

The intuition and math behind how it works! What is clustering? Clustering is a widely used unsupervised learning method. The grouping is such that points in a cluster are similar to each other, and less similar to points in other clusters. Thus, it is up to the algorithm to find patterns in the data and group … Read more

Machine Learning for Anyone who Took Math in 8th Grade

Explaining modern “AI” with easy math, pop-culture references, and oversimplified analogies. I usually see Artificial Intelligence explained in one of two ways: through the increasingly sensationalist perspective of the media, or through dense scientific literature riddled with superfluous language and field-specific terms. There’s a less publicized area in between these extremes where I think … Read more

NLP Kaggle Competition

Class Imbalance. As we saw above, we have a class imbalance problem. Imbalanced classes are a common problem in machine learning classification, where there is a disproportionate ratio of observations in each class. (In this post I explore methods for dealing with class imbalance.) With just 6.6% of our dataset belonging to the target class, … Read more

Predicting the Frequency of Asteroid Impacts with a Poisson Processes

Simulating Asteroid Impacts. Our objective is to determine the probability distribution of the number of expected impacts in each size category, which means we need a time range. To keep things in perspective, we’ll start with 100 years, about the lifespan of a human. This means our distribution will show the probabilities for the number of impacts … Read more

Semantic Segmentation of Aerial images Using Deep Learning

What is semantic segmentation? What are its practical applications? Semantic segmentation of drone images to classify different attributes is quite a challenging job because the variations are very large; you can’t expect the places to be the same. And doing manual segmentation of these images to use them in different applications is a challenge and a … Read more

Brandeis and Hugo discuss people of color and under-represented groups in data science.

Hugo Bowne-Anderson, the host of DataFramed, the DataCamp podcast, recently interviewed Brandeis Marshall, Associate Professor of Computer Science in the Computer and Information Sciences Department at Spelman College. Here is the podcast link. Hugo: Hi there, Brandeis, and welcome to DataFramed. Brandeis: Well, thank you. Wonderful to be here. Hugo: It’s such a pleasure to … Read more

Categories R Tags ExcerptFavorite

Organizing R Research Projects: CPAT, A Case Study

Months ago, I asked a question to the community: how should I organize my R research projects? After writing that post, doing some reading, then putting a plan in practice, I now have my own answer. First, some background. In the early months of 2016 I began a research project with my current Ph.D. advisor … Read more

Categories R Tags ExcerptFavorite

February Edition: Data Visualization

8 of the best articles on visualizing data. Data visualization is an essential step in any data science process. It’s the final bridge between the data scientist and end users. It communicates, validates, confronts and educates. And when done correctly, it opens up the insights from a data science project to a wider audience. Great … Read more

Building a Better Profanity Detection Library with scikit-learn

Why existing libraries are uninspiring and how I built a better one. A few months ago, I needed a way to detect profanity in user-submitted text strings: This shouldn’t be that hard, right? I ended up building and releasing my own library for this purpose called profanity-check: Of course, before I did that, I looked in the … Read more

ML Algorithms: One SD (σ)- Regression

An intro to machine learning regression algorithms. The obvious questions to ask when facing a wide variety of machine learning algorithms are “which algorithm is better for a specific task, and which one should I use?” The answers to these questions vary depending on several factors, including: (1) The size, quality, and nature of data; (2) The … Read more

An overview of the NLP ecosystem in R (#nlproc #textasdata)

At BNOSAC, R is used a lot to perform text analytics, as it is an excellent tool that provides anything a data scientist needs to perform data analysis on text in a business setting. For users unfamiliar with all the possibilities that the wealth of R packages offers regarding text analytics, we’ve made this small … Read more

Categories R Tags ExcerptFavorite

A Guide to Data Visualisation in R for Beginners

Visualisation libraries in R. R comes equipped with sophisticated visualisation libraries having great capabilities. Let us have a closer look at some of the commonly used ones. In this section, we will use the built-in mtcars dataset to show the uses of the various libraries. This dataset has been extracted from the 1974 Motor Trend US … Read more

Blender 2.8 Grease Pencil Scripting and Generative Art

5agado, Feb 4. Quick, Draw! / Flock / Conway’s Game of Life. What: learning the basics of scripting for Blender Grease-Pencil tool, with focus on generative art as a concrete playground. Less talking, more code (commented) and many examples. Why: mostly because we can. Also because Blender is a very rich ecosystem, and Grease-Pencil in version 2.8 is a powerful … Read more

Dashboard for Sales Trends in Retail

Overview. Retail is probably the most talked-about industry when it comes to disruption these days. Empty malls are a common blog topic, and an unusually high number of bankruptcies spans all subsectors. Some of the familiar names that filed for bankruptcy in the last few years range from the well-known Sears, ToysRUs, Limited Brands to … Read more

Categories R Tags ExcerptFavorite

WooCommerce Image Gallery | Step by Step, Automate with R

Setting up a WooCommerce image gallery for your shop is a grueling process if you use the online forms. Thankfully, you can import goods and set up an image gallery using a simple CSV file. Now, if you have a few products and a few images for each product, preparing the CSV for bulk import using … Read more

Categories R Tags ExcerptFavorite

Sobol Sequence vs. Uniform Random in Hyper-Parameter Optimization

Tuning hyper-parameters might be the most tedious yet crucial task in various machine learning algorithms, such as neural networks, SVM, or boosting. The configuration of hyper-parameters not only impacts the computational efficiency of a learning algorithm but also determines its prediction accuracy. Thus far, manual tuning and grid searching are still the most prevalent strategies. In … Read more

Categories R Tags ExcerptFavorite

The Face of (Dis)Agreement – Intraclass Correlations

I was recently introduced to Google Dataset Search, an extension that searches for open access datasets. There I stumbled upon this dataset on children’s and adults’ ratings of facial expressions. The data comes from a published article by Vesker et al. (2018). Briefly, this study involved having adults and 9-year-old children rate a series of … Read more

Categories R Tags ExcerptFavorite

Is the #10YearChallenge A Sign of the AI Apocalypse?

Viral social media “challenges,” memes, and gimmicks have taken over our feeds in recent years. The term “challenge” is used loosely though since these viral sensations aren’t so much challenging as they are just unique ways to spice up your social media presence. But are they also signs of the impending AI apocalypse? Let’s look … Read more

How It Feels to Learn Data Science in 2019

Seeing the (Random) Forest Through the (Decision) Trees The following is inspired by the article How it Feels to Learn JavaScript in 2016. Do not take this too seriously. This piece is just an opinion, much like people’s definition of data science. I heard you are the one to go to. Thank you for meeting … Read more

Maximum Likelihood Estimation

Coin Flip MLE Let’s derive the MLE estimator for our coin flip model from before. I’ll cover the MLE estimator for our linear model in a later post on linear regression. Recall that we’re modeling the outcome of a coin flip by a Bernoulli distribution, where the parameter p represents the probability of getting a heads. … Read more

Review: DRRN — Deep Recursive Residual Network (Super Resolution)

Up to 52 Convolutional Layers, With Global and Local Residual Learnings, Outperforms SRCNN, FSRCNN, ESPCN, VDSR, DRCN, and RED-Net. Digital Image Enlargement, The Need of Super Resolution In this story, DRRN (Deep Recursive Residual Network) is reviewed. With Global Residual Learning (GRL) and Multi-path mode Local Residual Learning (LRL), plus the recursive learning to control the … Read more

Python Basics: Mutable vs Immutable Objects

After reading this blog post you’ll know: what an object’s identity, type, and value are; what mutable and immutable objects are. Introduction (Objects, Values, and Types). All data in a Python program is represented by objects or by relations between objects. Every object has an identity, a type, and a value. Identity: An … Read more

Tweets Data Visualization with Circles and User Interaction

Adding Interactivity: Tweet Info by Click. After plotting and packing all the circles, we can make each circle work like a button. To achieve this, we can use the function fig.canvas.mpl_connect. The function takes two arguments; the first one is a string that corresponds to the type of interaction (in our case … Read more

A little trick for debugging Shiny

This is gonna be a short post about a little trick I’ve been using while developing Shiny Apps. (Spoiler: nothing revolutionary) A browser anywhere, anytime The first thing to do is to insert an action button, and a browser() in the observeEvent() watching this button. This is a standard approach: at any time, you just … Read more

Categories R Tags ExcerptFavorite

Send UDP Probes (with payloads) and Receive/Process Responses in R

We worked pretty hard over at $DAYJOB on helping to quantify and remediate a fairly significant configuration weakness in Ubiquiti network gear attached to the internet. Ubiquiti network gear (routers, switches, wireless access points, etc.) consists of enterprise-grade components and is a joy to work with. Our home network is liberally populated … Read more

Categories R Tags ExcerptFavorite

Function Objects and Pipelines in R

Composing functions and sequencing operations are core programming concepts. Some notable realizations of sequencing or pipelining operations include: The idea is: many important calculations can be considered as a sequence of transforms applied to a data set. Each step may be a function taking many arguments. It is often the case that only one of … Read more

Categories R Tags ExcerptFavorite

Retail Data Visualization with R and Shiny

Introduction. Because of my marketing background, finding information hiding within a marketing dataset is always an interesting topic to me. It gives me a sense of accomplishment when I clean up a very messy, large dataset and finally discover some insights from it. Therefore, I’ve decided to practice my skills of data cleaning and … Read more

Categories R Tags ExcerptFavorite

Understanding Studies of Racial Demarcations

Studies of racial demarcations are typically implemented in the context of what are referred to as regression analyses. Simply put, a regression enables assessments of relations between some variable of interest, say students’ test scores, and variables that define said students, such as race, family income, parents’ professions, parents’ education etc. Pictorially, with x’s denoting variables … Read more

Learning aggregate functions

Machine Learning with relational data This article is inspired by the Kaggle competition . While I did not participate in the competition, I used the data to explore another problem that often arises working with realistic data. All machine learning algorithms work great with the tabular data, but in reality a lot of data … Read more

These are the Easiest Data Augmentation Techniques in Natural Language Processing you can think of…

Augmentation operations for NLP proposed in [this paper]. SR=synonym replacement, RI=random insertion, RS=random swap, RD=random deletion. The Github repository for these techniques can be found [here]. Data augmentation is commonly used in computer vision. In vision, you can almost certainly flip, rotate, or mirror an image without risk of changing the original label. However, in natural … Read more

Building Our Own Open Source Supercomputer with R and AWS

How to build a scalable computing cluster on AWS and run hundreds or thousands of models in a short amount of time. We will completely rely on R and open source R packages. This is post 1 out of 2. Introduction. An ever-increasing number of businesses is moving to the cloud and using platforms such as Amazon Web … Read more

Categories R Tags ExcerptFavorite

Transfer Learning using ELMO Embedding

Last year, the major developments in “Natural Language Processing” were about Transfer Learning. Basically, Transfer Learning is the process of training a model on a large-scale dataset and then using that pre-trained model to process learning for another target task. Transfer Learning became popular in the field of NLP thanks to the state-of-the-art performance of … Read more

Model-Free Prediction: Reinforcement Learning

Part 4: Model-Free Predictions with Monte-Carlo Learning, Temporal-Difference Learning and TD(λ). Previously, we looked at planning by dynamic programming to solve a known MDP. In this post, we will use model-free prediction to estimate the value function of an unknown MDP, i.e., we will look at policy evaluation of an unknown MDP. This series of … Read more

Matplotlib Tutorial: Learn basics of Python’s powerful Plotting library

What is Matplotlib? To make statistical inferences, it becomes necessary to visualize your data, and Matplotlib is one such solution for Python users. It is a very powerful plotting library useful for those working with Python and NumPy. The most used module of Matplotlib is Pyplot, which provides an interface like MATLAB but … Read more

Introduction to TWO approaches of Content-based Recommendation System

A complete guide to resolve the confusion. Content-based filtering is one of the common methods in building recommendation systems. While I tried to do some research to understand the details, it was interesting to see that there are 2 approaches that claim to be “Content-based”. Below I will share my findings and hope it can … Read more

R Package Update: urlscan

The urlscan package (an interface to the API) is now at version 0.2.0 and supports the service’s authentication requirement when submitting a link for analysis. The service is handy if you want to learn about the details (all the gory technical details) for a website. For instance, say you wanted to check on … Read more

Categories R Tags ExcerptFavorite

Synthesising Multiple Linked Data Sets and Sequences in R

In my last post I looked at generating synthetic data sets with the ‘synthpop’ package, some of the challenges and neat things the package can do. It is simple to use which is great when you have a single data set with independent features. This post will build on the last post by tackling other … Read more

Categories R Tags ExcerptFavorite