Dichotomy Of A Data Scientist

When I started working at a startup, my behaviour changed completely. My CEO would ask us how we could do this cool thing called x, and I would ask him to focus instead on this less cool thing called y, because it would be low-risk and highly rewarding. As a recently joined data scientist …

A Closer Look At Dataset Columns

A look at a dataset’s columns and other related key terms. Researchers and data scientists work with datasets. Datasets are the raw material. When we apply analytical techniques to this raw material, we produce summaries, tabulations, estimates, and other output. I’ve previously explained that a dataset consists of rows and columns. In the course of …

Juno Makes Writing Julia Awesome

Julia (http://julialang.org/) is a high-level, multi-paradigm statistical language. Julia is also open-source and offers many advantages over other statistical languages such as R. Although Julia is largely functional, like Lisp it can also be used for general-purpose programming. Julia is relatively fast for a language with such high-level syntax. Speeds …

Real-Time Sentiment Analytics and Visualization via ElectionTweetBoard

This post explores various forms of, and considerations in, data analytics, visualization, and machine learning by taking a closer look at a product I have been developing recently called ElectionTweetBoard (https://www.electiontweetboard.com/). The “Why” behind ElectionTweetBoard: every data science or AI-related project needs a powerful driving force behind it. I developed ElectionTweetBoard through a …

rstudio::conf 2020 Slides on Futures

[This article was first published on JottR on R, and kindly contributed to R-bloggers]. Design: Dan LaBar. I presented Future: Simple Async, Parallel & Distributed …

All Names are not Treated Equally

Exploring How Bias Encoded in Word Embeddings Affects Resume Recommendation. As machine learning becomes increasingly popular, more and more tasks are being automated. While this can have many benefits, like freeing up humans’ time for more challenging tasks, there can be unintended consequences. Any machine learning algorithm is trained using data humans created that …

Primitive Functions List

[This article was first published on Random R Ramblings, and kindly contributed to R-bloggers]. Ever wondered which R functions are actually passed to internal C …

Get and Set List Elements with magrittr

[This article was first published on Random R Ramblings, and kindly contributed to R-bloggers]. Did you know that the magrittr pipe, %>%, can be used …

A guide to encoding categorical features using R

In this article, we will look at various options for encoding categorical features. We will also present R code for each of the encoding techniques. Categorical feature encoding is an important data processing step required for using these features in many statistical modelling and machine learning algorithms. The material in the article is heavily borrowed …
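
The article's own code is in R; as a rough illustration in Python (not the article's code), the sketch below shows two of the most common encodings written from scratch. The helper names `label_encode` and `one_hot_encode` and the sample colours are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def label_encode(values: Sequence[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each distinct category to an integer code (sorted for reproducibility)."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values: Sequence[str]) -> Tuple[List[List[int]], List[str]]:
    """Return one 0/1 indicator column per category, plus the column order."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


colours = ["red", "green", "blue", "green"]  # hypothetical categorical feature
codes, mapping = label_encode(colours)
matrix, columns = one_hot_encode(colours)
print(mapping)  # {'blue': 0, 'green': 1, 'red': 2}
print(codes)    # [2, 1, 0, 1]
print(columns)  # ['blue', 'green', 'red']
print(matrix)   # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Label encoding is compact but imposes an artificial order on the categories; one-hot encoding avoids that at the cost of one extra column per category.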

Transform Reality with Pandas

Re-shape data with Transpose, Melt, Merge, and more. Pure Python is a beautifully clear language. But pandas really dumbs it down. Simplifying the Python scripting language makes it easier to do even more complex feats of programming. DataFrame operations make math, science, exploration, art, and magic simple and intuitive. …
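
As a hedged illustration of the operations the post names, this sketch applies `DataFrame.melt`, `.T` (transpose) and `DataFrame.merge` to two small tables. The tables, column names and values are made up for the example.

```python
import pandas as pd

# Hypothetical wide table: one row per city, one column per year.
wide = pd.DataFrame({
    "city": ["Oslo", "Lima"],
    "2019": [10, 20],
    "2020": [11, 21],
})

# melt: wide -> long, one row per (city, year) pair.
long = wide.melt(id_vars="city", var_name="year", value_name="value")
print(long)

# transpose: cities become columns, years become rows.
print(wide.set_index("city").T)

# merge: left-join the long table with a second hypothetical lookup table on "city".
countries = pd.DataFrame({"city": ["Oslo", "Lima"], "country": ["Norway", "Peru"]})
merged = long.merge(countries, on="city", how="left")
print(merged)
```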

Herding Model in Financial Markets

This article is devoted to the herding mechanism in stocks. I base it on two articles: (1) “Estimation of Agent-Based Models: the case of an Asymmetric Herding Model”, Alfarano et al. (2005); (2) “Ants, Rationality, and Recruitment”, Kirman (1993). In the first part, I present the theoretical model as outlined in the articles. Next, I apply …
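
To give a feel for the recruitment mechanism, here is a minimal simulation of Kirman's two-state "ants" process, using the transition probabilities as they are commonly stated for that model. It is a sketch, not the article's implementation: it does not cover the Alfarano et al. estimation, and `epsilon`, `delta`, the population size and the seed are arbitrary choices.

```python
import random
from typing import List


def simulate_kirman(n_agents: int = 100, epsilon: float = 0.002,
                    delta: float = 0.01, steps: int = 50_000,
                    seed: int = 42) -> List[float]:
    """Simulate a two-state recruitment ("ants") process in discrete time.

    epsilon: probability that a chosen agent switches state on its own.
    delta:   probability that meeting an agent of the other state does NOT convert it.
    Returns the share of agents in state 1 after every step.
    """
    rng = random.Random(seed)
    n, k = n_agents, n_agents // 2  # k = number of agents currently in state 1
    shares = []
    for _ in range(steps):
        # One agent moves 0 -> 1: a state-0 agent is picked and either switches
        # on its own or is recruited by a randomly met state-1 agent.
        p_up = (1 - k / n) * (epsilon + (1 - delta) * k / (n - 1))
        # One agent moves 1 -> 0, symmetrically.
        p_down = (k / n) * (epsilon + (1 - delta) * (n - k) / (n - 1))
        u = rng.random()
        if u < p_up:
            k += 1
        elif u < p_up + p_down:
            k -= 1
        shares.append(k / n)
    return shares


path = simulate_kirman()
print(f"share in state 1 ranged from {min(path):.2f} to {max(path):.2f}")
```

With `epsilon` small relative to the recruitment term, the share tends to spend long stretches near 0 or 1 before flipping, which is the herding behaviour; a larger `epsilon` tends to keep it closer to one half.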

Sending Automated Emails from Blackbaud CRM (BBEC) Using Python (Part 1)

Basic Setup for Connecting to BBEC. My new job uses Blackbaud CRM as the backbone of its data infrastructure, and my experience is mostly in Python. To my surprise, there aren’t many resources out there about connecting to a Blackbaud Enterprise CRM (BBEC) using Python. Recently, I needed to pull a list of …

Industry 4.0: The Fourth Industrial Revolution is Now

Industry 4.0 is revolutionising the way we live, work, and interact with our environment and with each other to create a better world for all. Much of what would have been considered science fiction two decades ago is driving the world toward the Fourth Industrial Revolution. Industry 4.0 will completely change the landscape of business, …

Meena: Google’s New Chatbot

A more human-like and versatile chatbot. Google recently published a paper on its new chatbot, Meena. Google has struck all the right chords in terms of its design and approach. While the underlying techniques are not entirely new, it seems to be the right direction for building chatbots that are truly versatile …

Using Stringr and Regex to Extract Features from Textual, Alphanumeric and Punctuation Data in R

A Feature Extraction Tutorial Using the Titanic Dataset. Large datasets are often awash with data that is difficult to decipher. One may think of textual data, names, unique identifiers and other sorts of codes. Frequently, people analyzing these datasets are quick to discard these variables. However, sometimes there might be valuable information in this …
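
The tutorial itself uses R's stringr; a rough Python equivalent with the standard `re` module is sketched below. It assumes names follow the Titanic "Surname, Title. Given names" format and that tickets may carry a prefix before a trailing number. The helper names and patterns are illustrative, not the tutorial's.

```python
import re
from typing import Optional

TITLE_RE = re.compile(r",\s*([^.]+)\.")           # text between the comma and the first period
TICKET_PREFIX_RE = re.compile(r"^(.*\S)\s+\d+$")  # everything before a trailing ticket number


def extract_title(name: str) -> Optional[str]:
    """Return the honorific (e.g. "Mr") or None if the name has no such part."""
    match = TITLE_RE.search(name)
    return match.group(1).strip() if match else None


def ticket_prefix(ticket: str) -> Optional[str]:
    """Return the alphanumeric prefix of a ticket, or None for purely numeric tickets."""
    match = TICKET_PREFIX_RE.match(ticket)
    return match.group(1) if match else None


names = ["Braund, Mr. Owen Harris",
         "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
         "Heikkinen, Miss. Laina"]
print([extract_title(n) for n in names])              # ['Mr', 'Mrs', 'Miss']
print(ticket_prefix("A/5 21171"), ticket_prefix("113803"))  # A/5 None
```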

Tradable Patterns Hiding in Plain Sight

There are longer-term tradable patterns in the stock market that you don’t need to be a professional trader or statistics PhD to figure out. I’ve spent the last 14 years in sales and business development for investment research firms and capital market data vendors. I am now on a journey to pivot my career from …

Topic Modeling with Latent Dirichlet Allocation By Example

A practical tutorial on modeling tweet topics. In this tutorial, we are going to apply Latent Dirichlet Allocation (LDA) to a set of tweets and split them into topics. Let’s get started! We will be working with tweets from the @realDonaldTrump Twitter account. The dataset …
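
The excerpt does not show which library the tutorial uses. To illustrate what LDA actually fits, here is a compact collapsed Gibbs sampler, one standard inference method for LDA and not necessarily the tutorial's, run on a few made-up token lists. `alpha`, `beta`, the iteration count and the seed are arbitrary choices.

```python
import random
from typing import Dict, List, Tuple


def lda_gibbs(docs: List[List[str]], n_topics: int, alpha: float = 0.1,
              beta: float = 0.01, iterations: int = 200, seed: int = 0
              ) -> Tuple[List[List[float]], List[List[str]]]:
    """Collapsed Gibbs sampling for LDA.

    Returns per-document topic proportions (theta) and the top 3 words per topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id: Dict[str, int] = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    topics = range(n_topics)
    doc_topic = [[0] * n_topics for _ in docs]             # tokens of topic k in doc d
    topic_word = [[0] * n_vocab for _ in topics]           # count of word w in topic k
    topic_total = [0] * n_topics                           # tokens assigned to topic k
    assignments: List[List[int]] = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = assignments[d][i], word_id[w]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [(doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                           / (topic_total[t] + n_vocab * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    theta = [[(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    top_words = [[vocab[v] for v in sorted(range(n_vocab), key=lambda v: -topic_word[k][v])[:3]]
                 for k in topics]
    return theta, top_words


# Made-up, already tokenized "tweets"; not the tutorial's dataset.
docs = [
    ["tax", "jobs", "economy", "jobs"],
    ["border", "wall", "security", "border"],
    ["economy", "tax", "market", "jobs"],
    ["wall", "border", "security", "immigration"],
]
theta, top_words = lda_gibbs(docs, n_topics=2)
print(theta)
print(top_words)
```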

Economic Indicators with Python

Retrieve and Plot Economic Indicators using Python and Plotly. Economic indicators are often used by economists and financial analysts to predict the economic cycle. This analysis is very important before making investment decisions. In this article, we are going to automate the extraction of economic indicator data with Python. All that we will …

Deploying BERT using Kubernetes

Google Cloud Platform, Docker, NLP, Micro-services. BERT is a widely used NLP model at Google. 2019 is considered the year of BERT, which replaced the majority of production systems. BERT is built on self-attention: the model consists of stacked Transformer encoder blocks, each combining a multi-head self-attention layer with a feed-forward layer. BERT demonstrates the use of …
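
To make "self-attention" concrete, here is a single-head scaled dot-product attention sketch in plain Python, computing softmax(QKᵀ/√d_k)·V. Real BERT layers use several heads, learned projection matrices and far larger dimensions; the identity projections and toy embeddings below are placeholders.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix]:
    """Return (output, attention weights) for one attention head."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(k[0])
    # scores[i][j]: how strongly token i attends to token j.
    scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three hypothetical 2-d token embeddings
identity = [[1.0, 0.0], [0.0, 1.0]]        # placeholder for learned projections
out, weights = self_attention(x, identity, identity, identity)
print(weights)
```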

Monsters

[This article was first published on R – Fronkonstin, and kindly contributed to R-bloggers]. Ooh, see the fire is sweepin’ / Our very street today / Burns like a …

Creating a basic pie chart using Matplotlib

After obtaining the dataset, the next step is to import the data visualization package for analysis. Importing the data visualization package: the package imported to build pie charts is Matplotlib, and it was imported with the code below:

```python
## Import data visualization packages
import matplotlib.pyplot as plt

# IPython/Jupyter magic (notebooks only): render figures inline below the cell
%matplotlib inline
```

…
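
A minimal end-to-end pie chart with Matplotlib, assuming made-up labels and counts since the excerpt does not show the post's dataset. The `Agg` backend and `savefig` call let it run as a plain script; in a notebook using `%matplotlib inline`, both can be dropped and the figure appears below the cell.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs outside Jupyter
import matplotlib.pyplot as plt

# Hypothetical category counts.
labels = ["A", "B", "C"]
sizes = [45, 30, 25]

fig, ax = plt.subplots()
# autopct labels each wedge with its percentage share.
patches, texts, autotexts = ax.pie(sizes, labels=labels, autopct="%1.1f%%", startangle=90)
ax.axis("equal")  # equal aspect ratio so the pie is drawn as a circle
ax.set_title("Share by category")
fig.savefig("pie_chart.png")
```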

Microsoft Excel in the era of big data

We never learned this at school… It’s a tool we all already know and use in our everyday lives. To build reports, create charts or even schedule projects, we use Microsoft Excel for just about everything. But we never really learned to use it. While it is not a major concern for most …

How I Outsmarted a FiveThirtyEight Forecasting Algorithm

This year, Nate Silver’s FiveThirtyEight challenged readers to predict NFL game results better than its forecasting algorithm. The result? Less than 2% of the 20,352 readers who participated bested FiveThirtyEight’s Elo algorithm. I was one of them. Using a combination of data sources and 30 different projection models, I ranked 190th out of 20,352 participants. …
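
For readers unfamiliar with Elo, the core of the rating system is an expected-score formula plus a zero-sum update. This sketch shows only that core: FiveThirtyEight's NFL version adds further adjustments (for example for home field and margin of victory) that are omitted here, and `k = 20` is an assumed value.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Elo expected score for team A (its win probability, counting ties as half)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 20.0) -> Tuple[float, float]:
    """Return updated ratings; score_a is 1 (A wins), 0.5 (tie) or 0 (A loses).

    The update is zero-sum: whatever A gains, B loses.
    """
    shift = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + shift, rating_b - shift


print(round(expected_score(1600, 1400), 3))  # 0.76
print(elo_update(1500, 1500, 1))             # (1510.0, 1490.0)
```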

The significance of the region on the salary in Sweden, a comparison between different occupational groups

In my last post, I found that the region has a significant impact on the salary of engineers. Is the significance of the region unique to engineers, or are there similar correlations in other occupational groups? Statistics Sweden uses NUTS (Nomenclature des Unités Territoriales Statistiques), the EU’s hierarchical regional division, to specify the …
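
The excerpt does not say which test the post uses. One standard way to check whether a categorical factor such as region shifts a group's mean salary is a one-way ANOVA; the sketch below computes its F statistic from scratch on hypothetical numbers (not Statistics Sweden data). `scipy.stats.f_oneway` returns the same F together with a p-value.

```python
from statistics import mean
from typing import Dict, List


def one_way_anova_f(groups: Dict[str, List[float]]) -> float:
    """F statistic for H0: every group has the same mean."""
    values = [x for g in groups.values() for x in g]
    grand_mean = mean(values)
    group_means = {name: mean(g) for name, g in groups.items()}
    k, n = len(groups), len(values)
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (group_means[name] - grand_mean) ** 2
                     for name, g in groups.items())
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - group_means[name]) ** 2
                    for name, g in groups.items() for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical monthly salaries (thousand SEK) for two made-up regions.
salaries = {"Region A": [30, 32, 34], "Region B": [40, 42, 44]}
print(one_way_anova_f(salaries))  # 37.5
```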
