Economics for Tech People — Supply (Part 2)

Load Libraries: To work through the project, we will need the readxl and tidyverse packages. If you do not have them installed in your R environment, simply remove the “#” sign before the “install.packages…” lines of code [1]. Once they are installed, you will not need to install them again on your machine. Here’s the …

PySnpTools

Reading and Manipulating Genomic Data in Python. (Photo by National Cancer Institute on Unsplash.) PySnpTools is a Python library for reading and manipulating genomic data. It allows users to efficiently select and reorder individuals (rows) and SNP locations (columns). It then reads only the data selected. Originally developed to support FaST-LMM, a …

R/Pharma October 2020

[This article was first published on R Consortium, and kindly contributed to R-bloggers]. The R/Pharma virtual conference this year was held October 13-15th, 2020. R/Pharma …

Spark vs Pandas, part 3 — Scala vs Python

Of course programming languages play an important role, although their relevance is often misunderstood. Having the right programming language in your CV may eventually be one of the deciding factors for getting a specific job or project. This is a good example where the relevance of programming languages might be misunderstood, especially in the context …

Multi-Layer Perceptron & Backpropagation — Implemented from scratch

Writing a custom implementation of a popular algorithm can be compared to playing a musical standard. As long as the code reflects the equations, the functionality remains unchanged. It is, indeed, just like playing from notes. However, it lets you master your tools and practice your ability to hear and think. In this …

Modify RStudio prompt to show current git branch

[This article was first published on Rtask, and kindly contributed to R-bloggers]. At the last Raddicts Paris Meetup, Romain Francois (to be followed on twitter …

Use Numpy for Statistics and Arithmetic Operations in 2020

As you can see, a list uses more than double the memory compared to a NumPy array. To install NumPy, run the following command: pip install numpy. To import the NumPy library, use the following code: import numpy as np. The ‘as np’ is not necessary. It allows you to use ‘np’ instead of …

Talking to Python from Javascript: Flask and the fetch API

Now that we have a working example, we can expand it to include actual data. In reality, this could involve accessing a database, decrypting some information or filtering a table. For the purpose of this tutorial we create a data array from which we index elements: # Example data, in sets of 3: data = list(range(1,300,3)) print …
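The indexing idea in this excerpt can be sketched in a few lines of plain Python. This is a minimal illustration only: the data expression is taken from the excerpt, while the helper name get_item and its out-of-range behaviour are assumptions made here and are not from the original article (which is truncated before it shows how the data is served).

```python
# Example data, in steps of 3 (the same expression as in the excerpt).
data = list(range(1, 300, 3))


def get_item(index):
    """Hypothetical helper: return the element at `index`, or None if out of range."""
    if 0 <= index < len(data):
        return data[index]
    return None


first = get_item(0)      # first element of the data array
last = get_item(99)      # last valid index for this range
missing = get_item(100)  # out of range, so the helper returns None
```

Returning None for a bad index, rather than raising, is one possible design choice for a handler that answers requests from a browser; the original tutorial may well do something different.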

Little useless-useful R function – Psychedelic Square root with x11()

[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers]. Yes, the name of today’s function is wacky, because it gives …

5 types of plots that will help you with time series analysis

And how to quickly create them using Python. (Photo by Isaac Smith on Unsplash.) When starting any project related to time series (and not only those), one of the very first steps is to visualize the data. We do so to inspect the data we are dealing with and learn something about it, for example: are …

Cheat sheet for implementing 7 methods for selecting the optimal number of clusters in Python

Select the optimal number of clusters based on multiple clustering validation metrics like Gap Statistic, Silhouette Coefficient, Calinski-Harabasz Index, etc. (Photo by Mehrshad Rajabi on Unsplash.) Segmentation provides a data-driven angle for examining meaningful segments that executives can use to take targeted actions and improve business outcomes. Many executives run the risk of making …

Rapid Internationalization of Shiny Apps: shiny.i18n Version 0.2

Rapid Multi-Language Support for Shiny Apps. Have you ever created a multilingual Shiny application? Chances are the answer is yes, if you are a big fan of Appsilon’s blog and have read our first article on the topic. If that’s not the case, fear not: we’ll cover everything you need to know today about …

How much of your Neural Network’s Prediction can be Attributed to each Input Feature?

Neural networks are known to be black box predictors where the data scientist does not usually know which particular input feature influenced the prediction the most. This can be rather limiting if we want to get some understanding of what the model actually learned. Having this kind of understanding may allow us to find bugs …

3 Python Tricks to Read, Create, and Run Multiple Files Automatically

Automate Boring Stuff with Python and Bash For Loop. (Photo by Sincerely Media on Unsplash.) When putting your code into production, you will most likely need to deal with organizing the files of your code. It can be really time-consuming to read, create, and run many files of data. This article will show you how …

Visualizing COVID-19 Vulnerability With Plot.ly For Python

The first thing we are going to need to visualize our US counties is a set of JSON data that will tell Plot.ly the areas of a map that correspond to the correct FIPS. We can get this from the Plot.ly Github at: https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json We could either wget this data and then read it in …

Build a LIME explainer dashboard with the fewest lines of code

In an earlier post, I described how to explain a fine-grained sentiment classifier’s results using LIME (Local Interpretable Model-agnostic Explanations). To recap, the following six models were used to make fine-grained sentiment class predictions on the Stanford Sentiment Treebank (SST-5) dataset. Rule-based models: TextBlob and VADER. Feature-based models: Logistic regression and Support Vector Machine …

Gold-Mining Week 7 (2020)

[This article was first published on R – Fantasy Football Analytics, and kindly contributed to R-bloggers].

Eye-catching animated maps in R — a simple introduction

The data that we are going to display is connected to a current and important topic: case data for the ongoing COVID-19 pandemic. The World Health Organisation publishes daily case statistics for all countries in csv format which we can readily access. WHO Covid Dataset (Image by Author) The dataset provides us with daily statistics …

Vice Presidential and Presidential Debate Analysis using Data Science

Debate Analysis using Data Science: Using YouTube Comments to find the true intent of voters. I believe Data Science allows me to express my curiosity in ways I’d never imagine. The coolest thing in Data Science is that I see data not as numbers but as an opportunity (business problem), insights (predictive modeling, stats, …

Quirky Keras: Custom and Asymmetric Loss Functions for Keras in R

[Image by MontyLov on Unsplash] TL;DR: this tutorial shows you how to use wrapper functions to construct custom loss functions that take arguments other than y_pred and y_true for Keras in R. See example code for linear exponential error (LINEXE) and weighted least squared error (WLSE). In statistical learning, the loss function is a …

Little useless-useful R function – R-jobs title generator

[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers]. Another post in the series of useless-useful R functions. This time …

Query and analyze Stripe data in Python

MRR and Churn calculations. (Image source: https://unsplash.com/photos/ZVprbBmT8QA) Stripe is an online payment company that offers software and APIs for processing payments and business management. I love that Stripe has different APIs for different languages, which makes people’s lives a lot easier. I primarily use the Stripe Python API. To install: pip install --upgrade stripe You can …

Analysing Company Earning Calls with Python

Building a Python script to analyse company earnings calls. In this post, we will extract the key points on the company’s recent and future performance as stated by management in the latest earnings call. The code will be very simple. We will pass the ticker of any company of interest and the outcome …

Riinu Pius – R for Health Data Science – from clinicians who code to Shiny interventions

[This article was first published on http://r-addict.com, and kindly contributed to R-bloggers]. A few weeks ago we finished the Why R? 2020 conference. We had an honour …

Approaches to Time Series Data with Weak Seasonality

[This article was first published on DataGeeek, and kindly contributed to R-bloggers]. In the previous article, we have tried to model the gold price in …

The Evolution of Distributed Programming in R

Both R and distributed programming rank highly on my list of “good things”, so imagine my delight when two new packages used for distributed programming in R were released: ddR (https://github.com/vertica/ddR) and multidplyr (https://github.com/hadley/multidplyr). Distributed programming is normally taken up for a variety of reasons: to speed up a process or piece of code; to scale up …

Create interactive maps for Instagram with Python

Now we get to some more customisation. First, we’ll load in an image to use as a custom icon for the markers on the map. Then we’ll load in another geographic boundary, this time the Oxford City boundary. As you can imagine, there is a higher density of pubs in Oxford city centre, compared …

How To Make A Twitter Bot For Free

15% of Twitter users might be, in reality, bots. Businesses, brands, and influencers use bots to manage their Twitter accounts. Even I made my own not long ago. If you’re here, I guess you want to make one yourself. This article will show you how. You’ll learn about the different options available to make a …

How to carry column metadata in pivot_longer

Pivoting data can be a pain point in bioinformatics workflows. Much bioinformatics software is tied to the wide format, with data spread out among multiple columns, while the whole tidyverse/ggplot system requires long data with as few columns as possible. Becoming proficient at switching your data to long format has several benefits. …

Displaying increasing U.S. eligible voter diversity with a slopegraph in R

I maintain a small package, CGPfunctions, on Github as well as CRAN. I know a few people use it because occasionally I get feedback (usually accompanied by feature requests 🙂 ). I am absolutely sure that it’s in no danger of “going viral” but the discipline of maintaining does me some good at least, and it’s nice to know a few …

parking riddle

[This article was first published on R – Xi’an’s Og, and kindly contributed to R-bloggers]. The Riddler of this week had a quick riddle: if …

Julia Silge – Data visualization for machine learning practitioners

[This article was first published on http://r-addict.com, and kindly contributed to R-bloggers]. A few weeks ago we finished the Why R? 2020 conference. We had an honour …

Xception from scratch using Tensorflow — Even better than Inception

Figure 1. Xception architecture (Source: Image from the original paper). Convolutional Neural Networks (CNN) have come a long way, from the LeNet-style, AlexNet, VGG models, which used simple stacks of convolutional layers for feature extraction and max-pooling layers for spatial sub-sampling, stacked one after the other, to Inception and ResNet networks …

Types of Samplings in PySpark 3

Sampling is the process of determining a representative subgroup from the dataset for a specified case study. Sampling underpins crucial research and business decisions. For this reason, it is essential to use the most appropriate and useful sampling methods with the provided technology. This article is mainly for data scientists and data engineers …

10 Of My Favorite Python Libraries For Data Analysis

To start us off, I am going with a library for data visualization that is pretty well-known, but some might have never heard of. Plot.ly is a graphing library that takes interactivity to a whole new level. I would genuinely advise using Plot.ly over something like Matplotlib or Seaborn. This is because Plot.ly comes with …

Python Comprehensions With Implementation

List | Dictionary | Set comprehensions. (Photo by: Debby | Unsplash.com.) Introduction: Python is a popular language that allows programmers to write elegant code that is easy to write and reads like plain English. A distinctive feature of Python is its comprehensions. In Python, there are three types of comprehensions, viz. list, dictionary and set. By …
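As a brief illustration of the three comprehension types named in this excerpt, here is a minimal sketch. The sample list and the variable names are made up for this example and do not come from the original article.

```python
# Sample input, invented for this illustration.
words = ["list", "dictionary", "set", "list"]

# List comprehension: one result per input item, order and duplicates kept.
lengths = [len(w) for w in words]

# Dictionary comprehension: maps each distinct word to its length.
length_by_word = {w: len(w) for w in words}

# Set comprehension: keeps only the unique lengths.
unique_lengths = {len(w) for w in words}

print(lengths)         # [4, 10, 3, 4]
print(length_by_word)  # {'list': 4, 'dictionary': 10, 'set': 3}
print(unique_lengths)  # a set containing 3, 4 and 10 (display order may vary)
```

The three forms differ only in their brackets and, for dictionaries, the key: value pair before the for clause.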

How to Visualize Time Series Data: Tidy Forecasting in R

[This article was first published on business-science.io, and kindly contributed to R-bloggers]. Interested in more time series tutorials? Learn more R tips: 👉 Register for …

Haskell is fast

Updated title: Haskell is fast, but Julia is faster (see updates at the end). My R package ‘HypergeoMat’ provides an Rcpp implementation of Koev & Edelman’s algorithm for the evaluation of the hypergeometric function of a matrix argument. I also implemented this algorithm in Julia and in Haskell. So let us benchmark now. Here is the hypergeometric function of a matrix …

The Bachelorette Ep. 2 – Petal to the Metal – Data and Drama in R

Week two brought us some new Clare drama. We decided to “strip” the data down to its essentials and try to avoid “dodging” any tough questions. In case you missed last week’s recap, you can find it here. Since only one man was eliminated this week (for his failure to adequately compliment Clare), this week’s …

Jan Vitek – R MELTS BRAINS – or How I Learned to Love Failing at Compiling R

[This article was first published on http://r-addict.com, and kindly contributed to R-bloggers]. A few weeks ago we finished the Why R? 2020 conference. We had an honour …

Missing Value Imputation with Python and K-Nearest Neighbors

This housing dataset is aimed towards predictive modeling with regression algorithms, as the target variable is continuous (MEDV). It means we can train many predictive models where missing values are imputed with different values for K and see which one performs the best. But first, the imports. We need a couple of things from Scikit-Learn …

Your (imaginary) first day as a Data Analyst

Finish your first project with the help of your prior online course knowledge. Have you ever asked yourself how a successful first day as a data analyst looks? To get my Data Science Nanodegree, I will show you a simple scenario. You will learn: what cross-selling is; how to analyze your customers; how to use …