Thinking Of Switching Careers To A Developer?

I Have The Answers. But How? I know what you’re wondering: how do I even have the answers? Well, I could say from experience, but as an aspiring data scientist I would rather demonstrate how data science can make any decision-making process easier and help you make the correct decision. I’ll be using data from the 2018 … Read more

Unmaking Graphs

This is how things usually go when I first create any graph: Imagine I just got my hands on a juicy new dataset and I’m doing some exploratory data analysis — hunched over the keyboard with a magnifying glass looking for correlations and analyzing clues. I decide to conjure up some graphs to visualize the data because … Read more

Book review: Beyond Spreadsheets with R

Disclaimer: Manning Publications gave me the ebook version of Beyond Spreadsheets with R – A beginner’s guide to R and RStudio by Dr. Jonathan Carroll free of charge. Beyond Spreadsheets with R shows you how to take raw data and transform it for use in computations, tables, graphs, and more. You’ll build on simple programming techniques … Read more

Categories: R

Announcing new software peer review editors: Melina Vidoni and Brooke Anderson

We are pleased to welcome Brooke Anderson and Melina Vidoni to our team of Associate Editors for rOpenSci Software Peer Review. They join Scott Chamberlain, Anna Krystalli, Lincoln Mullen, Karthik Ram, Noam Ross and Maëlle Salmon. With the addition of Brooke and Melina, our editorial board now includes four women and four men, located in … Read more

Categories: R

Using Data Science to read 10 years of Luxembourgish newspapers from the 19th century

I have been playing around with historical newspaper data (see here and here). I have extracted the data from the largest archive available, as described in the previous blog post, and now created a shiny dashboard where it is possible to visualize the most common words per article, as well as read a summary of each article. The summary was made … Read more

Categories: R

missing digit in a 114 digit number [a Riddler’s riddle]

A puzzling riddle from The Riddler (as Le Monde had a painful geometry riddle this week): this number with 114 digits 530,131,801,762,787,739,802,889,792,754,109,70?,139,358,547,710,066,257,652,050,346,294,484,433,323,974,747,960,297,803,292,989,236,183,040,000,000,000 is missing one digit and is a product of some of the integers between 2 and 99. By comparison, 76! and 77! have 112 and 114 digits, respectively, while 99! has 156 digits. … Read more
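The digit counts quoted for 76!, 77! and 99! are easy to verify. A minimal sketch in Python, using exact integer arithmetic; the helper name `digit_count` is ours and does not come from the original post, and the sketch only checks the quoted counts, it does not solve the riddle:

```python
import math


def digit_count(n: int) -> int:
    """Return the number of decimal digits in n! (computed exactly)."""
    return len(str(math.factorial(n)))


# Print the digit counts for the three factorials mentioned in the excerpt.
for n in (76, 77, 99):
    print(f"{n}! has {digit_count(n)} digits")
```

Since 77! already has 114 digits, any product that reaches 114 digits must use a comparable amount of "size" from the integers between 2 and 99.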

Categories: R

The Unsung Heroes of Modern Software Development

Open Source Foundation Leaders I’ll highlight six open source foundations that are key to many important projects. For each foundation I’ll give a brief bio, provide the number of projects being supported as of early 2019, and highlight some well-known projects. Note that these groups fall under various IRS classifications for charitable and trade organizations — not … Read more

Introducing the AI Project Canvas

AI Project Canvas Imagine the following scenario: You have a brilliant idea for a new AI project. To make it happen, you need to convince management to fund your idea. You need to pitch your AI project idea to stakeholders and management. Yuck. This is the first step where the AI Project Canvas comes into play. … Read more

The Grass Really is Greener on the Other Side: Buying Local and its Shortcomings

Evidence-Based Policy is Bigger than You or Your Feelings — Part II Just because your vegetables travel thousands of kilometers to your kitchen table doesn’t mean they can’t be better for the environment than produce from your local farmer’s market. There. I’ve said it. As unpopular opinions go, this one is somewhere between ‘pineapple on pizza’ and ‘healthcare … Read more

ML Algorithms: One SD (σ)

The obvious questions to ask when facing a wide variety of machine learning algorithms are “which algorithm is better for a specific task, and which one should I use?” The answers to these questions vary depending on several factors, including: (1) The size, quality, and nature of data; (2) The available computational time; (3) The urgency of … Read more

How to Pace the London Marathon: Fuelled by Data

Chris is a current MSc Computer Science student at the University of Warwick, UK. He is also the co-founder of Sustain Investing. Before that, Chris worked at Citi Ventures and at Citi Markets. It started with an excuse Hi, I’m Chris. While building Sustain with my cofounders Andre, Nick Foden and Sylwia Zieba I’ve been studying … Read more

The Blockchain Scalability Problem & the Race for Visa-Like Transaction Speed

Yes, blockchain has a scalability problem. Here’s what it is, and here’s what people are doing to solve it. The battle for a scalable solution is the blockchain’s moon race. Bitcoin processes 4.6 transactions per second. Visa does around 1,700 transactions per second on average (based on a calculation derived from the official claim of … Read more

Making Sense of Startup Valuations with Data Science

The following is a condensed and slightly modified version of a Radicle working paper on the startup economy in which we explore post-money valuations by venture capital stage classifications. We find that valuations have interesting distributional properties and then go on to describe a statistical model for estimating an undisclosed valuation with considerable ease. In … Read more

Value Investing with Machine Learning

Your favourite holding period doesn’t have to be forever… The Oracle of Omaha once said: “Price is what you pay, value is what you get.” Warren Buffett But how can you be certain that you are paying a fair price for an investment? How can you make the most of a fair or unfair situation? This … Read more

Introducing Snorkel

How this Tiny Project Solves One of the Major Problems in Real World Machine Learning Solutions Building high quality training datasets is one of the most difficult challenges of machine learning solutions in the real world. Disciplines like deep learning have helped us to build more accurate models but, to do so, they require vastly … Read more

Fast Static Maps Built with R

Luke Whyte posted an article (apologies for a Medium link) over on Towards Data Science showing how to use a command line workflow involving curl, node and various D3 libraries and JavaScript source files to build a series of SVG static maps. It’s well written and you should give it a read, especially since he … Read more

Categories: R

Cross Validation — Why & How

So, you have been working on an imbalanced data set for a few days now, trying out different machine learning models, training them on part of your data set and testing their accuracy, and you are ecstatic to see the score going above 0.95 every time. Do you really think you have achieved 95% accuracy … Read more

Price’s Protein Puzzle: 2019 update

Chains of amino acids strung together make up proteins and since each amino acid has a 1-letter abbreviation, we can find words (English and otherwise) in protein sequences. I imagine this pursuit began as soon as proteins were first sequenced, but the first reference to protein word-finding as a sport is, to my knowledge, “Price’s … Read more

Categories: R

R Markdown Template for Business Reports

In this post I’d like to introduce the R Markdown template for business reports by INWTlab. It’s been my aim to have a nice and clean template that is easy to customize in colors, cover and logo. I know there are quite a few templates available, but I was missing one to be used in … Read more

Categories: R

Quick Hit: Using seymour to Subscribe to your Git[la|hu]b Repo Issues in Feedly

The seymour Feedly API package has been updated to support subscribing to RSS/Atom feeds. Previously the package was intended to just treat your Feedly as a data source, but there was a compelling use case for enabling subscription support: subscribing to code repository issues. Sure, there’s already email notice integration for repository issues on most … Read more

Categories: R

Using Tensorflow Serving GRPC

How to write a GRPC Client for the wrapped model Once you have your Tensorflow or Keras based model trained, you need to think about how to use it and deploy it in production. You may want to Dockerize it as a micro-service, implementing a custom GRPC (or REST, or not) interface. Then deploy this to server … Read more

A Dog Detector and Breed Classifier

In a field like physics, things keep getting harder, to the point that it’s very difficult to understand what’s going on at the cutting edge unless it’s in highly simplified terms. In computer science though, and artificial intelligence in particular, knowledge built up slowly over 70+ years by people all over the world is still … Read more

Build a Pipeline for Harvesting Medium Top Author Data

Nuts and Bolts One key requirement was to make deployment of my Luigi workflow very simple. I wanted to assume only one thing about the deployment environment: that the Docker daemon would be available. With Docker, I wouldn’t need to be concerned with Python version mismatches or other environmental discrepancies. It took me a little while … Read more

New R package: load and chart oceanic storms

Mapping historical storms data is now a little bit easier. Off the back of this blog, I have authored an R package (available at basilesimon/noaastorms) that downloads, cleans and parses NOAA IBTrACS data for you. The National Oceanic and Atmospheric Administration releases datasets known as International Best Track Archive for Climate Stewardship. These datasets are … Read more

Categories: R

Time Travel with RStudio Package Manager 1.0.4

We all love packages. We don’t love when broken package environments prevent us from reproducing our work. In version 1.0.4 of RStudio Package Manager, individuals and teams can navigate through repository checkpoints, making it easy to recreate environments and reproduce work. The new release also adds important security updates, improvements for Git sources, further access to retired packages, and beta … Read more

Categories: R

December 2018: “Top 40” New CRAN Packages

By my count, 157 new packages stuck to CRAN in December. Below are my “Top 40” picks in ten categories: Computational Methods, Data, Finance, Machine Learning, Medicine, Science, Statistics, Time Series, Utilities and Visualization. This is the first time I have used the Medicine category. I am pleased that a few packages that appear to … Read more

Categories: R

Using custom scales with the ‘scales’ package

Maybe you have already heard of the package “scales” – and if you haven’t heard about it, you might have used it without knowing (e.g., in the context of ggplot2 graphs). I want to show you a few of the functionalities of the “scales” package. I will also show you how to create your own scales. … Read more

Categories: R

Power BI

Using Power BI and R Tutorial here: Run R scripts in Power BI Desktop. The only twist that I want to add is an idea on how to enable users without admin access to run R code. This can be achieved by storing a portable R installation on a mountable file storage. R Download the … Read more

Pix2Pix

Shocking result of Edges-to-Photo Image-to-Image translation using the Pix2Pix GAN Algorithm This article will explain the fundamental mechanisms of a popular paper on Image-to-Image translation with Conditional GANs, Pix2Pix. Following is a link to the paper: Article Outline I. Introduction II. Dual Objective Function with Adversarial and L1 Loss III. U-Net Generator IV. PatchGAN Discriminator … Read more

Probability — Fundamentals of Machine Learning (Part 1)

The Mathematics of Probability In the beginning, I suggested that probability theory is a mathematical framework. As with any mathematical framework there is some vocabulary and important axioms needed to fully leverage the theory as a tool for machine learning. Probability is all about the possibility of various outcomes. The set of all possible outcomes … Read more

Why visual literacy is essential to good data visualization

We know data literacy matters. But visual literacy matters too. Here’s why. Photo by Markus Spiske on Unsplash Data is all around us, and the way people work has changed because of it. Companies are now investing more in roles like Chief Data Officer, building their data science teams and talking about things like “data literacy” in … Read more

Correlation analysis of cyclically adjusted valuation measures and subsequent returns

In this post we’ll test three different cyclically-adjusted valuation measures: CAPE (earnings), CAPD (dividends) and CAPB (book value). CAPE is calculated like the P/E ratio, but by dividing the current real price by the last ten years’ average inflation-adjusted earnings. CAPD uses dividends instead of earnings, and CAPB uses book value. We’ll test the optimal … Read more
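The CAPE definition above amounts to a single division. A minimal sketch in Python, assuming that CAPD and CAPB use the same trailing-average construction with dividends and book value in place of earnings; the function name and the sample figures are hypothetical and are not taken from the original post:

```python
from statistics import mean


def cyclically_adjusted_ratio(real_price, real_fundamentals):
    """Current real price divided by the average of the trailing
    inflation-adjusted fundamentals (earnings for CAPE; by assumption,
    dividends for CAPD and book value for CAPB)."""
    if not real_fundamentals:
        raise ValueError("need at least one year of fundamentals")
    return real_price / mean(real_fundamentals)


# Hypothetical inputs: ten years of inflation-adjusted earnings per share
# and a current real price of 100.
real_earnings = [4.0, 4.5, 5.0, 5.5, 6.0, 5.0, 4.5, 5.0, 5.5, 5.0]
cape = cyclically_adjusted_ratio(100.0, real_earnings)
print(f"CAPE = {cape:.1f}")
```

Averaging over ten years smooths out the swings of any single year's earnings, which is the point of the cyclical adjustment.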

Categories: R

My journey applying AI to horse racing

My journey into machine learning began in the summer of 2016. It all started at a barbecue party at the home of my fiancée’s aunt and uncle in northern Stockholm. I was sitting outside at a garden table together with the older men of her family. These are old and tough Finnish men, her granddad … Read more

EARL London 2019 announcement

EARL London is back for 2019! We are thrilled to announce that the Enterprise Applications of the R Language Conference will be returning to the Tower Hotel from 10-12 September 2019. If you’d like to see what you can expect during 3 days of EARL, then check out our highlights from last year’s conference. We are pleased … Read more

Categories: R

Using Travis-CI to Create R-bloggers for Taiwan

R-bloggers.com is a great platform for R users, but I sometimes feel awkward publishing posts on R-bloggers when I have things to share that are only relevant to users in Taiwan. Inspired by R-bloggers, I thought maybe I could use Travis-CI and GitHub to create a blog that automatically updates its posts by retrieving … Read more

Categories: R

A.I enhanced molecular discovery and optimization

Awesome! But how do we get there? Researchers at the forefront of their fields have been trying to use the existing tools we have on hand to solve this problem. There is a pattern in the modus operandi of current research, and the same general process applies to any A.I based science project. Researchers are carpenters, … Read more

Creating a word cloud on R-bloggers posts

This post will go through how to create a word cloud of article titles scraped from the awesome R-bloggers. Our goal will be to use R’s rvest package to search through 50 successive pages on the site for article titles. The stringr and tm packages will be used for string cleaning and for creating a … Read more

Categories: R

Hash Me If You Can

We are living in the era of Big Data but the problem of course is that the bigger our data sets become the slower even simple search operations get. I will now show you a trick that is the next best thing to magic: building a search function that practically doesn’t slow down even for … Read more
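The excerpt is cut off before the technique is shown, but the title points at hashing. A minimal sketch in Python, assuming the trick is a hash-table index; the original post is about R, so this illustrates the general idea and is not the author's code, and `records`, `linear_search` and `build_index` are hypothetical names:

```python
def linear_search(records, key):
    """Scan every (key, value) pair until the key is found: O(n) per lookup."""
    for k, v in records:
        if k == key:
            return v
    return None


def build_index(records):
    """Build a hash table once; later lookups take roughly constant time."""
    return dict(records)


# Hypothetical data set of 100,000 keyed records.
records = [(f"id{i}", i * i) for i in range(100_000)]
index = build_index(records)

# Both approaches return the same value; only the lookup cost differs.
print(linear_search(records, "id99999") == index["id99999"])
```

The scan gets slower as the data grows, while the dictionary lookup stays roughly flat, at the cost of building and storing the index up front.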

Categories: R

benchmarkme: new version

When discussing how to speed up slow R code, my first question is what is your computer spec? It’s always surprised me that people are wondering why analysing big data is slow, yet they are using a five-year-old cheap laptop. Spending a few thousand pounds would often make their problems disappear. To quantify the impact … Read more

Categories: R

Web Scraping Google Sheets with RSelenium

Photo by freestocks.org on Unsplash I love to learn new things and one of the ways I learn best is by doing. Also, it’s been said that you never fully understand a topic until you are able to explain it, and I think blogging is a low barrier to explaining things. Someone I met at a local data … Read more

Categories: R

Everything you need to know about Scatter Plots for Data Visualisation

If you’re a Data Scientist there’s no doubt that you’ve worked with scatter plots before. Despite their simplicity, scatter plots are a powerful tool for visualising data. There are a lot of options, flexibility, and representational power that come with the simple change of a few parameters like color, size, shape, and regression plotting. Here you’ll … Read more

Making Music with Machine Learning

Image from https://www.maxpixel.net/Circle-Structure-Music-Points-Clef-Pattern-Heart-1790837 Music is not just an art, music is an expression of the human condition. When an artist is making a song you can often hear the emotions, experiences, and energy they have in that moment. Music connects people all over the world and is shared across cultures. So there is no way … Read more

Exploration of the Social News TV: The Communication Behavior of #ajnewsgrid

What is NewsGrid? NewsGrid is a young news program broadcast globally by Al Jazeera since 2016. It is Al Jazeera’s first interactive news hour. The show is produced in three parts: top stories of the day presented by one presenter, stories that create a huge social reaction on Twitter presented by a social media presenter, and the … Read more

Hey, Who Moved the Goalposts?

Part of the “10 reasons why Software Development projects fail” series. The most successful software development projects have a timeline and a series of milestones to accomplish that project within a set period of time. Those milestones are critical, because they help to divide a large project into a series of much smaller projects, and they … Read more