Advent of Code: Most Popular Languages

You might have heard of the Advent of Code, a 25-day challenge involving a programming puzzle a day, to be solved with the language of your choice. I’ve noted the popularity of this activity in my Twitter timeline but also in my GitHub timeline, where I’ve seen the creation of a few advent-of-code (or similarly named) repositories. AoC is largely …

Learning R: A gentle introduction to higher-order functions

Have you ever thought about why the definition of a function in R is different from many other programming languages? The part that causes the biggest difficulties (especially for R beginners) is that you state the name of the function at the beginning and use the assignment operator – as if functions were like …

Because it’s Friday: CGI you never knew was CGI

Computer-generated imagery in movies has gotten so good these days that much of the time you don’t even realize it’s there. You probably never noticed how Michael Cera’s physique had been altered, or how Lost in Translation used motion capture technology from the future. That’s all from the blog team for this week. Have …

Anime Recommendation engine: From Matrix Factorization to Learning-to-rank

Anime Obsession gone too far!! Otakus: Henry Chang, Joey Chen, Guanhua Zhang, Preetika Srivastava and Cherry Agarwal. The vast amount of data hosted on the internet today has led to information overload, and thus there is a constant need to improve the user experience. A recommendation engine is a system that helps support …

Introduction to Web Scraping with BeautifulSoup

Find specific elements in the page. The created BeautifulSoup object can now be used to find elements in the HTML. When we inspected the website, we saw that every list item in the content section has a class that starts with tocsection-, and we can use BeautifulSoup’s find_all method to find all list items with that …
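
BeautifulSoup is a third-party package, so the minimal sketch below reproduces the same find-by-class-prefix idea with the standard library's `html.parser` instead; it is a stand-in for illustration, not BeautifulSoup's API. The helper name `find_items_with_class_prefix` and the sample HTML are hypothetical, and the sketch assumes the matching list items are not nested inside one another. With BeautifulSoup itself, a prefix match is commonly written by passing a compiled regular expression, e.g. `soup.find_all("li", class_=re.compile("^tocsection-"))`.

```python
from html.parser import HTMLParser


class _PrefixClassCollector(HTMLParser):
    """Collects the text of <li> elements having a CSS class that starts with `prefix`."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.inside_match = False  # True while parsing the body of a matching <li>
        self.items = []            # one text entry per matching <li>

    def handle_starttag(self, tag, attrs):
        if tag != "li":
            return
        # attrs is a list of (name, value) pairs; a bare attribute has value None
        classes = (dict(attrs).get("class") or "").split()
        if any(name.startswith(self.prefix) for name in classes):
            self.inside_match = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self.inside_match = False

    def handle_data(self, data):
        if self.inside_match:
            self.items[-1] += data


def find_items_with_class_prefix(html, prefix):
    """Hypothetical helper: text of every <li> whose class list has a value starting with prefix."""
    parser = _PrefixClassCollector(prefix)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.items]


sample = (
    '<ul>'
    '<li class="toclevel-1 tocsection-1">History</li>'
    '<li class="sidebar">Not a section</li>'
    '<li class="toclevel-1 tocsection-2">Usage</li>'
    '</ul>'
)
print(find_items_with_class_prefix(sample, "tocsection-"))  # ['History', 'Usage']
```

Because an element can carry several classes (`toclevel-1 tocsection-1`), the class attribute is split and each value is checked separately, which is also how BeautifulSoup treats the multi-valued `class` attribute.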

Home Remodeling Analysis Turned Excel Data Handling in Python

Why cleaning data is the most important step
Original Project Mission: Find interesting insights to see where the remodeling market is headed.
Project Mission (Twist): How to handle well-manicured Excel data in Python (spoiler: neat is a deceptive word).
Timeline: One week (I tell you, it’s not enough!)
Project Findings for the Original Goal: …

The complete guide for topics extraction with LDA (Latent Dirichlet Allocation) in Python

A recurring subject in NLP is to understand large corpora of text through topic extraction. Whether you analyze users’ online reviews, product descriptions, or text entered in search bars, understanding key topics will always come in handy. Before going into the LDA method, let me remind you that not reinventing the …

Exploratory Data Analysis, Feature Engineering and Modelling using Supermarket Sales Data. Part 1.

The first thing I like to do when doing EDA on a dataset with a reasonable number of numeric columns is to check the relationship between my target variable and these numeric features. One quick way to do this is to use the seaborn heatmap plot. This seaborn heatmap takes the correlation matrix calculated on …
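
A seaborn heatmap colors each cell of a correlation matrix (with pandas this is typically `sns.heatmap(df.corr())`). The dependency-free sketch below computes only the part that matters for this first check: the Pearson correlation of each numeric feature with the target, sorted by strength. The column names and values are hypothetical stand-ins, not the supermarket dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def correlations_with_target(columns, target):
    """(feature, r) pairs for every non-target column, strongest |r| first."""
    pairs = [
        (name, pearson(values, columns[target]))
        for name, values in columns.items()
        if name != target
    ]
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical numeric columns; "total" plays the role of the target variable.
columns = {
    "quantity": [1, 2, 3, 4],
    "discount": [4, 3, 2, 1],
    "rating": [1, 3, 2, 4],
    "total": [2, 4, 6, 8],
}
for name, r in correlations_with_target(columns, "total"):
    print(f"{name:>8}: {r:+.2f}")
```

Sorting by the absolute value keeps strong negative relationships (here `discount`) near the top, which is the same information the darkest cells of a diverging heatmap convey.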

Want to Cluster Text? Try Custom Word-Embeddings!

Tf-idf vectors with word-embeddings are analyzed for clustering effectiveness. The text corpus examples considered here indicate that custom word-embeddings can help improve the clusterability of the corpus. That is welcome news after our ho-hum results for text classification when using word-embeddings. In the context of classification we concluded that keeping it simple with naive Bayes and tf-idf …

In case you missed it: November 2018 roundup

To leave a comment for the author, please follow the link and comment on their blog: Revolutions.

Weekly Selection — Dec 14, 2018

3rd Wave Data Visualization, by Elijah Meeks. Imagine what it was like to do data visualization 30 years ago. It’s 1988 and you’re using Excel 2.0 for simple charts like pie charts and line charts, or maybe something like SPSS for more complicated exploration and Arc/Info for geospatial data visualization.

CSV Analysis with Amazon Athena

Executing standard SQL queries on your Amazon S3 bucket files (Dec 14, 2018). “What’s Amazon Athena?” I hear you ask. Good question. It’s one of Amazon Web Services’ amenities for architecture in the cloud. More specifically, Athena allows us to query data we hold in another service called Amazon Simple Storage Service (S3) using standard SQL …

Using Analysis of Variance with Experimentation Data

“The one-way analysis of variance (ANOVA) is used to determine whether there are any statistically significant differences between the means of two or more independent (unrelated) groups (although you tend to only see it used when there are a minimum of three, rather than two groups)”. Having entered the world of digital analytics from a …
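
The statistic behind a one-way ANOVA is the F ratio: the between-group mean square divided by the within-group mean square, each sum of squares scaled by its degrees of freedom (k - 1 and N - k for k groups and N observations). A minimal sketch with hypothetical per-variant metric values; in practice `scipy.stats.f_oneway` returns the same F together with a p-value.

```python
def one_way_anova_f(*groups):
    """F statistic of a one-way ANOVA for two or more groups of observations."""
    values = [x for group in groups for x in group]
    n_total, k = len(values), len(groups)
    grand_mean = sum(values) / n_total
    means = [sum(group) / len(group) for group in groups]

    # Between-group sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)        # df_between = k - 1
    ms_within = ss_within / (n_total - k)    # df_within = N - k
    return ms_between / ms_within


# Hypothetical metric values for three experiment variants
control = [1, 2, 3]
variant_a = [4, 5, 6]
variant_b = [7, 8, 9]
print(one_way_anova_f(control, variant_a, variant_b))  # 27.0
```

A large F means the group means differ by much more than the noise inside each group would explain; turning F into a significance decision still requires the F distribution with (k - 1, N - k) degrees of freedom.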

Get a glimpse of future using time series forecasting using Auto-ARIMA and Artificial Intelligence

Time Series Forecasting using Auto-ARIMA in Python. AI and the future. Currently, there is a lot of development going on in Artificial Intelligence research to get an accurate glimpse of the future. When a mathematical model predicts future values from the time-ordered history of a series alone, the task is called time series forecasting. There are many machine learning and …

The Naive Bayes Classifier

By Joseph Catanzarite. The Naïve Bayes Classifier is perhaps the simplest machine learning classifier to build, train, and predict with. This post will show how and why it works. Part 1 reveals that the much-celebrated Bayes Rule is just a simple statement about joint and conditional probabilities. But its blandness belies astonishing power, as we’ll see …
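
To make the "joint and conditional probabilities" point concrete, here is a minimal sketch with a hypothetical joint distribution for a toy spam filter (label spam/ham, one word present or absent). It computes the posterior two ways, directly from the definition of conditional probability and via Bayes' rule, and exact fractions show they agree. All numbers are invented for illustration.

```python
from fractions import Fraction

# Hypothetical joint probabilities P(label, word_present); they sum to 1.
joint = {
    ("spam", True): Fraction(3, 10),
    ("spam", False): Fraction(1, 10),
    ("ham", True): Fraction(1, 10),
    ("ham", False): Fraction(5, 10),
}


def marginal_label(label):
    """P(label): sum the joint over both word outcomes."""
    return sum(p for (lab, _), p in joint.items() if lab == label)


def marginal_word(present):
    """P(word_present): sum the joint over both labels."""
    return sum(p for (_, word), p in joint.items() if word == present)


def posterior_from_joint(label, present):
    # Definition of conditional probability: P(label | word) = P(label, word) / P(word)
    return joint[(label, present)] / marginal_word(present)


def posterior_from_bayes(label, present):
    # Bayes' rule: P(label | word) = P(word | label) * P(label) / P(word)
    likelihood = joint[(label, present)] / marginal_label(label)
    return likelihood * marginal_label(label) / marginal_word(present)


print(posterior_from_joint("spam", True))   # 3/4
print(posterior_from_bayes("spam", True))   # 3/4
```

The two functions return the same value because Bayes' rule is just the joint probability factored two different ways; a Naive Bayes classifier adds the assumption that several such words are conditionally independent given the label.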

Kuzushiji-MNIST – Japanese Literature Alternative Dataset for Deep Learning Tasks

Plus our VGG-ResNet ensemble model with state-of-the-art results. MNIST, a dataset with 70,000 labeled images of handwritten digits, has been one of the most popular datasets for image processing and classification for over twenty years. Despite its popularity, contemporary deep learning algorithms handle it easily, often exceeding 99.5% accuracy. A new paper …

My book ‘Deep Learning from first principles:Second Edition’ now on Amazon

The second edition of my book ‘Deep Learning from first principles: Second Edition - In vectorized Python, R and Octave’ is now available on Amazon, in both paperback ($14.99) and Kindle ($9.99/Rs449/-) versions. Since this book is almost 70% code, all functions and code snippets have been formatted to use the fixed-width font ‘Lucida Console’. In addition …

Learning Data Science on Generic Datasets is Useless

Alright, it’s most definitely not useless. But it is far more useless than it needs to be. This article will outline some of the potential roles that data plays in learning data science, with an argument against using generic (and, for that matter, static) datasets. All too often we see machine learning topics taught on …

Closing the Sale: Predicting Home Prices via Linear Regression

Imports, Data Cleansing, and EDA. Cleaning and EDA are important for this challenge, as this data set contains many ordinal/categorical features that may be important in categorization and will need to be converted to numerical values. As a baseline, I imported the following libraries to clean, explore and model the training data. One of …
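
Converting ordinal categories to numbers works best with an explicit order rather than arbitrary codes. A minimal sketch assuming a hypothetical five-level quality scale (`Po` < `Fa` < `TA` < `Gd` < `Ex`, the kind of rating that appears in housing data); missing values map to 0 and unexpected labels raise an error instead of being silently encoded. The scale and helper name are illustrative, not taken from the article.

```python
# Hypothetical ordinal scale: poor < fair < typical/average < good < excellent
QUALITY_ORDER = ["Po", "Fa", "TA", "Gd", "Ex"]
QUALITY_CODES = {label: rank for rank, label in enumerate(QUALITY_ORDER, start=1)}


def encode_ordinal(values, codes, missing=0):
    """Map ordinal labels to ranks; None becomes `missing`, unknown labels raise."""
    encoded = []
    for value in values:
        if value is None:
            encoded.append(missing)        # absent feature treated as the lowest rank
        elif value in codes:
            encoded.append(codes[value])
        else:
            raise ValueError(f"unknown category: {value!r}")
    return encoded


print(encode_ordinal(["Gd", "TA", None, "Ex"], QUALITY_CODES))  # [4, 3, 0, 5]
```

Keeping the rank order (rather than one-hot encoding) lets a linear model treat "better quality" as a single direction, at the cost of assuming equal spacing between levels.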

Nobody UNDERSTANDS Me … But Soon, Artificial Intelligence Just Might

Our faces and voices can be analyzed for emotion. As I mentioned, biomimicry, or imitating natural design in the things we create, is critical in recreating this human tendency in AI. Our end goal is artificial empathy, which (for now, at least) describes a machine’s ability to recognize and respond to human emotion. Going in line …

Pdftools 2.0: powerful pdf text extraction tools

A new version of pdftools has been released to CRAN. Go get it while it’s hot: install.packages("pdftools"). This version has two major improvements: low-level text extraction and encoding improvements. About PDF textboxes: a PDF document may seem to contain paragraphs or tables in a viewer, but this is not actually true. PDF is a …

Easy CI/CD of GPU applications on Google Cloud including bare-metal using Gitlab and Kubernetes

Summary: Are you a data scientist who only wants to focus on modelling and coding and not on setting up a GPU cluster? Then this blog post might interest you. We developed an automated pipeline using GitLab and Kubernetes that is able to run code in two GPU environments, GCP and bare-metal; no need …

Yet another visualization of the Bayesian Beta-Binomial model

The Beta-Binomial model is the “hello world” of Bayesian statistics. That is, it’s the first model you get to run, often before you even know what you are doing. There are many reasons for this: It only has one parameter, the underlying proportion of success, so it’s easy to visualize and reason about. It’s easy …
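
Part of what makes the model easy to reason about is conjugacy: a Beta(α, β) prior combined with s successes in n trials gives a Beta(α + s, β + n - s) posterior. A minimal sketch with a hypothetical uniform Beta(1, 1) prior and made-up data (7 successes in 10 trials); `beta_pdf` evaluates the posterior density on a grid, which is what a plot of the model would draw.

```python
from fractions import Fraction
from math import exp, lgamma, log


def update(alpha, beta, successes, trials):
    """Conjugate update: Beta(alpha, beta) prior + binomial data -> Beta posterior."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return alpha + successes, beta + (trials - successes)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, as an exact fraction."""
    return Fraction(alpha, alpha + beta)


def beta_pdf(x, alpha, beta):
    """Density of Beta(alpha, beta) at 0 < x < 1, computed in log space for stability."""
    log_norm = lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
    return exp(log_norm + (alpha - 1) * log(x) + (beta - 1) * log(1 - x))


# Hypothetical data: uniform Beta(1, 1) prior, 7 successes out of 10 trials
post_alpha, post_beta = update(1, 1, successes=7, trials=10)
print(post_alpha, post_beta)               # 8 4
print(beta_mean(post_alpha, post_beta))    # 2/3

# Coarse grid of posterior density values, e.g. for plotting
grid = [i / 10 for i in range(1, 10)]
densities = [beta_pdf(x, post_alpha, post_beta) for x in grid]
```

The posterior peaks at the observed proportion 0.7, while its mean (2/3) is pulled slightly toward the prior's 1/2; with more trials the two converge.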

Reusable Pipelines in R

Pipelines in R are popular, the most popular one being magrittr as used by dplyr. This note will discuss the advanced re-usable piping systems: rquery/rqdatatable operator trees and wrapr function object pipelines. In each case we have a set of objects designed to extract extra power from the wrapr dot-arrow pipe %.>%. Piping. Piping is …

Pandas for data.table Users

R and Python are both great languages for data analysis. While they are remarkably similar in some aspects, they are drastically different in others. In this post, I will focus on the similarities and differences between Pandas and data.table, two of the most prominent data manipulation packages in Python/R. There is already an excellent post …

An introduction to high-dimensional hyper-parameter tuning

If you have ever struggled with tuning Machine Learning (ML) models, you are reading the right piece. Hyper-parameter tuning refers to the problem of finding an optimal set of parameter values for a learning algorithm. Usually, the process of choosing these values is a time-consuming task. Even for simple algorithms like Linear Regression, finding the best …
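
One common baseline for this search (not necessarily the approach the article takes) is random search: sample candidate settings from predefined ranges, score each one, and keep the best. A minimal sketch in which the search space and the `validation_loss` function are hypothetical stand-ins for training and evaluating a real model.

```python
import math
import random


def validation_loss(learning_rate, l2_penalty):
    """Hypothetical stand-in for 'train the model, return its validation loss'."""
    return (math.log10(learning_rate) + 2) ** 2 + (math.log10(l2_penalty) + 3) ** 2


def random_search(objective, space, n_trials=50, seed=0):
    """Sample each hyper-parameter log-uniformly from its (low, high) range; keep the best."""
    rng = random.Random(seed)  # seeded so the search is reproducible
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {
            name: 10 ** rng.uniform(math.log10(low), math.log10(high))
            for name, (low, high) in space.items()
        }
        loss = objective(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


space = {"learning_rate": (1e-4, 1.0), "l2_penalty": (1e-6, 1e-1)}
params, loss = random_search(validation_loss, space, n_trials=200)
print(params, loss)
```

Sampling on a log scale spreads trials evenly across orders of magnitude, which suits parameters like learning rates; as the number of hyper-parameters grows, random search also avoids the exponential blow-up of a full grid.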

R community update: announcing sessions for useR Delhi December meetup

As referenced in my last blog post, useR Delhi NCR is all set to host our second meetup on 15th December, i.e. this coming Saturday. We’ve finalized two exciting speaker sessions for it. They’re as follows:
Basics of Shiny and geospatial visualizations by Sean Angiolillo
Up in the air: cloud storage based workflows in R …

Preprocessing with sklearn: a complete and comprehensive guide

For aspiring data scientists, it might sometimes be difficult to find their way through the forest of preprocessing techniques. Sklearn’s preprocessing library forms a solid foundation to guide you through this important task in the data science pipeline. Although Sklearn has pretty solid documentation, it often lacks a streamlined, intuitive connection between different concepts. …
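
As one example of what these preprocessing steps compute, standardization (what scikit-learn's `StandardScaler` does by default) subtracts the mean and divides by the population standard deviation learned on the training data. The dependency-free sketch below separates fitting from transforming so test data is scaled with training statistics; the function names are hypothetical, not scikit-learn's API.

```python
from math import sqrt


def fit_standardizer(train):
    """Learn mean and population standard deviation (ddof=0) from training values."""
    n = len(train)
    mean = sum(train) / n
    std = sqrt(sum((x - mean) ** 2 for x in train) / n)
    # A constant feature would divide by zero; leave its scale at 1 instead
    return mean, (std if std > 0 else 1.0)


def transform(values, mean, std):
    """Apply previously learned statistics; never refit on test data."""
    return [(x - mean) / std for x in values]


train = [2, 4, 4, 4, 5, 5, 7, 9]
mean, std = fit_standardizer(train)
print(mean, std)                        # 5.0 2.0
print(transform([5, 9, 1], mean, std))  # [0.0, 2.0, -2.0]
```

Fitting only on training data mirrors the `fit` / `transform` split in scikit-learn and prevents information from the test set leaking into the model.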

IcoOmen: Using Machine Learning to Predict ICO Prices

Methodology
1. Choose inputs and outputs.
2. Collect and aggregate the data.
3. Prepare the data.
4. Explore and attempt to understand the data.
5. Choose a Machine Learning Model.
6. Measure the performance of the Model.
7. Save the Model.
8. Use the Model to make predictions.

1. Choosing Inputs and Outputs
Inputs
Choosing the right inputs and outputs (in the case of …

How to Predict Severe Traffic Jams with Python and Recurrent Neural Networks?

An application of a sequence model to mine Waze Open Data of traffic incidents, using Python and Keras. In this tutorial, I will show you how to use an RNN deep learning model to find patterns in the Waze Traffic Open Data of Incident Reports and predict whether severe traffic jams will happen shortly. Interventions can be taken out …

How to get the most out of Towards Data Science?

Our Readers’ Guide. We have received feedback that some of you find it difficult to efficiently navigate our Medium publication. So we have put together a few bullet points that will hopefully aid your experience on our blog. Subscribe to our publication to receive our Monthly Edition and Weekly Selection directly in your mailbox. Follow us …

Gold-Mining Week 15 (2018)

The post Gold-Mining Week 15 (2018) appeared first on Fantasy Football Analytics.

Vaex: Out of Core Dataframes for Python and Fast Visualization

So… no pandas?? There are some issues with pandas that the original author, Wes McKinney, outlines in his insightful blog post “Apache Arrow and the ‘10 Things I Hate About pandas’”. Many of these issues will be tackled in the next version of pandas (pandas2?), building on top of Apache Arrow and other libraries. Vaex starts …

RTutor: Better Incentive Contracts For Road Construction

For about two weeks now, I have faced a large additional traffic jam every morning due to a construction site on the road. When passing the construction site, I often see only a few people, or sometimes nobody, working there. Being an economist, I really wonder how much of such traffic jams could be avoided with better …

Recreating the NBA lead tracker graphic

For each NBA game, nba.com has a really nice graphic which tracks the point differential between the two teams throughout the game. Here is the lead tracker graphic for the game between the LA Clippers and the Phoenix Suns on 10 Dec 2018 (taken from https://www.nba.com/games/20181210/LACPHX#/matchup). I thought it would be cool to try recreating …
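
The quantity the graphic plots is simple to compute: the running score difference after every scoring play. A minimal sketch with a hypothetical list of `(seconds_elapsed, team, points)` events; the team codes match the game URL, but the plays are invented, not the real play-by-play.

```python
def lead_series(events, team_a, team_b):
    """Running lead of team_a over team_b (positive = team_a ahead) after each score."""
    lead = 0
    series = [(0, 0)]  # tip-off: tied game
    for seconds, team, points in sorted(events):
        if team == team_a:
            lead += points
        elif team == team_b:
            lead -= points
        else:
            raise ValueError(f"unexpected team: {team!r}")
        series.append((seconds, lead))
    return series


# Hypothetical scoring plays: (seconds elapsed, team, points)
plays = [(30, "LAC", 2), (55, "PHX", 3), (90, "LAC", 3), (121, "LAC", 1)]
series = lead_series(plays, "LAC", "PHX")
print(series)  # [(0, 0), (30, 2), (55, -1), (90, 2), (121, 3)]
largest_lead = max(series, key=lambda point: abs(point[1]))
```

Plotting `series` as a step chart, with positive values shaded in one team's color and negative values in the other's, gives the basic shape of the lead tracker.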

Music Genre Classification with Python

Objective. Companies nowadays use music classification either to make recommendations to their customers (e.g., Spotify, SoundCloud) or as a product in itself (e.g., Shazam). Determining music genres is the first step in that direction. Machine Learning techniques have proved to be quite successful in extracting trends and patterns from the large …

Rsampling Fama French

Today we will continue our work on Fama French factor models, but more as a vehicle to explore some of the awesome stuff happening in the world of tidy models. For new readers who want to get familiar with Fama French before diving into this post, see here where we covered importing and wrangling the data, …

Twins on the up

Are multiple births on the increase? My twin boys turned 5 years old today. Wow, time flies. Life is never dull, because twins are still seen as something of a novelty, so wherever we go, we find ourselves in conversation with strangers, who are intrigued by the whole thing. In order to save time if …

Named Entity Recognition (NER), Meeting Industry’s Requirement by Applying state-of-the-art Deep…

We are going to have a quick look at the architecture of four different state-of-the-art approaches by referring to the actual research papers, and then we will move on to implement the one with the highest accuracy.
1. Bidirectional LSTM-CRF: More details and implementation in Keras. From the paper (Bidirectional LSTM-CRF Models for Sequence Tagging).
2. Bidirectional LSTM-CNNs: …

The Importance of Being Recurrent for Modeling Hierarchical Structure

RNNs have inherent performance limitations. For a while, it seemed that RNNs were taking the Natural Language Processing (NLP) world by storm (from about 2014–17). However, we’ve recently started realizing the limitations of RNNs, primarily that they are “inefficient and not scalable”. While there is great promise in overcoming these limitations by using more specialized …

Text Summarization on the Books of Harry Potter

Hermione interrupted them. “Aren’t you two ever going to read Hogwarts, A History?” How many times throughout the Harry Potter series does Hermione bug Harry and Ron to read the enormous tome Hogwarts, A History? Hint: it’s a lot. How many nights do the three of them spend in the library, reading through every book …