Economics for Tech People — Supply (Part 2)

Load Libraries: To work through the project, we will need the readxl and tidyverse packages. If you do not have them installed in your R environment, simply remove the “#” sign before the “install.packages…” lines of code [1]. Once they are installed, you will not need to install them again on your machine. Here’s the …

Analysis of Wave Power

(Figure: wave power analysis from data recorded on offshore buoys located in New Hampshire and Rhode Island. Image by author.) Although most people don’t think of places like New Hampshire and Rhode Island as popular surfing destinations, wave energy data obtained from the National Data Buoy Center shows that on occasion there is significant wave …
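
A minimal sketch of how buoy measurements can be turned into wave power, using the standard deep-water wave energy flux formula P = ρ g² H² T / (64π), where H is the significant wave height (m) and T the energy period (s). Assumptions: deep-water conditions, a seawater density of 1025 kg/m³, and that the period passed in is the energy period (NDBC records typically report dominant and average periods, which are related to but not the same as the energy period). The function name is hypothetical, not from the article.

```python
import math

RHO_SEAWATER = 1025.0  # kg/m^3, assumed typical seawater density
G = 9.81               # m/s^2, gravitational acceleration


def wave_power_kw_per_m(hs_m: float, te_s: float) -> float:
    """Deep-water wave energy flux in kW per metre of wave crest.

    hs_m: significant wave height in metres
    te_s: wave energy period in seconds (assumption: buoys often report
          dominant or average period instead, which would need converting)
    """
    watts_per_m = RHO_SEAWATER * G**2 * hs_m**2 * te_s / (64 * math.pi)
    return watts_per_m / 1000.0


if __name__ == "__main__":
    # Hypothetical sea state: 2 m significant wave height, 8 s period
    print(round(wave_power_kw_per_m(2.0, 8.0), 1))
```

For the hypothetical 2 m, 8 s sea state this gives roughly 15.7 kW per metre of wave crest. Because power grows with the square of wave height, doubling H quadruples P, which is why occasional large-wave events dominate a site's energy budget.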

The nocode revolution is coming — are you ready for it?

Predicting a paradigm shift in the (programming) world, and how not to get left behind in its wake. Lego! (Photo by Kelly Sikkema on Unsplash.) Are you a Lego fan? Lego was my first true obsession. Although I’m a recovering legoholic these days, I once had quite the collection that filled a gargantuan sack …

94% Perfect: the Surprising Solution to the $200 Billion Inventory Problem

Many companies still use the traditional Target Stock Level (TSL) model to drive stock decisions. This is a complex way of saying “sell one, get one”, the literal meaning of the word replenishment. In addition, the highly uncertain nature of demand forces inventory managers to operate on much higher safety stock margins than would otherwise be necessary. …
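
The TSL logic can be sketched as an order-up-to rule: at each review, order enough to bring the inventory position back up to a target equal to expected demand over the lead time plus review period, plus a safety stock. This is a minimal sketch under stated assumptions (independent, roughly normal daily demand and the common safety-stock formula z·σ·√(L+R)); the function names are hypothetical and this is not the article's model.

```python
import math


def target_stock_level(mean_daily_demand, sd_daily_demand,
                       lead_time_days, review_days, z=1.65):
    """Order-up-to target = cycle stock + safety stock.

    Assumes independent daily demand; z = 1.65 aims at roughly a 95%
    cycle service level if demand is approximately normal.
    """
    horizon = lead_time_days + review_days
    cycle_stock = mean_daily_demand * horizon
    safety_stock = z * sd_daily_demand * math.sqrt(horizon)
    return cycle_stock + safety_stock


def replenishment_order(target, on_hand, on_order):
    """Order enough to bring the inventory position back up to the target."""
    inventory_position = on_hand + on_order
    return max(0.0, target - inventory_position)


if __name__ == "__main__":
    # Hypothetical item: 10 units/day on average, sd 3, 4-day lead time, weekly-ish review
    target = target_stock_level(10, 3, 4, 5)
    print(round(target, 2), round(replenishment_order(target, 60, 20), 2))
```

With these made-up numbers the target is 104.85 units. If five more units are sold and nothing else changes, the next order grows by exactly five, which is the “sell one, get one” behaviour; the safety-stock term is where demand uncertainty inflates inventory.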

PySnpTools

Reading and Manipulating Genomic Data in Python. (Photo by National Cancer Institute on Unsplash.) PySnpTools is a Python library for reading and manipulating genomic data. It allows users to efficiently select and reorder individuals (rows) and SNP locations (columns), and then reads only the selected data. Originally developed to support FaST-LMM, a …

The question of single unit semantics in deep networks

Do we have reason to be a little less bewildered by so-called grandmother cells in the context of deep neural networks? A familiar hypothesis in deep learning is that a single higher-layer unit of a network may correspond to a complex semantic entity, such as a person or a specific type of dog [2, 9]. …

BERT Explained: What it is and how does it work?

If you follow the NLP world or the ML news even slightly, you have most likely come across Google’s BERT model or one of its relatives. If you haven’t and have still somehow stumbled across this article, let me have the honor of introducing you to BERT, the powerful NLP beast. BERT stands …

A Quick Experiment with the CARTO BigQuery Tiler!

We created a couple of demos for this presentation, and we wanted to share them as a bit of a “hello, world” exercise for the tiler! We have loaded the data and the queries/code into a GitHub repository, and you’re welcome to give it a whirl. When building these for the first time, we found …

How Medium Helped Me Land My First Job In Data Science

(Photo by Hunters Race on Unsplash.) Interviews are horrifying: the nervousness, the pressure, and the nagging voices in our heads asking whether the job we are interviewing for will turn out in our favor, or whether it will be yet another cursed day that pushes our goal of an offer letter further away from our …

Predicting HDB Housing Prices Using Neural Networks

Ever wondered how much your HDB is worth, besides asking your housing agent? Or wondered which features have the greatest impact on your HDB’s value? Seeing that over 92.2% of Singaporeans own an HDB, I believe these are common questions that most Singaporeans have had, and it would be good to see if …

Animating Yourself as a Disney Character with AI

In 2018, NVIDIA published a groundbreaking paper titled “A Style-Based Generator Architecture for Generative Adversarial Networks” that manages to generate high-quality images (1024×1024). One of its novelties is that it disentangles the latent space, which allows us to control attributes at different levels. For example, the lower layers would be able to control …

Machine learning advancements in Arabic NLP

A discussion of Arabic natural language processing (NLP) for social media text, with code examples and in-depth analysis of the cutting-edge technology driving the most recent advancements. (Figure: multilingual word cloud from tweets about the Beirut explosion, August 2020. Image by Author.) Natural language processing (NLP) is not a new discipline; its roots date back to …

R/Pharma October 2020

[This article was first published on R Consortium, and kindly contributed to R-bloggers]. The R/Pharma virtual conference this year was held October 13–15, 2020. R/Pharma …

Creating a Chess Engine with Deep Learning

and understanding how neural networks can be used to solve problems indirectly. (Photo by Hassan Pasha on Unsplash.) In most chess engines, a search algorithm along with a heuristic function gives the chess AI its main insight into the best moves to play. The bulk of the programming and most of the “brains” behind this …
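
The search half of that picture is classically minimax with alpha-beta pruning. Below is a minimal sketch on a hand-built toy game tree whose leaf numbers stand in for heuristic evaluations; the nested-list tree representation is an assumption for illustration, and this is a generic textbook search, not the article's engine.

```python
import math
from typing import Union

# A game tree is either a leaf score (from the maximizing player's view)
# or a list of child subtrees. This toy representation is an assumption.
Tree = Union[float, list]


def alphabeta(node: Tree, maximizing: bool,
              alpha: float = -math.inf, beta: float = math.inf) -> float:
    """Minimax value of `node`, skipping branches that cannot matter."""
    if not isinstance(node, list):  # leaf: heuristic evaluation
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:  # the minimizer will never allow this line
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:  # the maximizer already has something better
            break
    return value


if __name__ == "__main__":
    # Max to move; each inner list is a reply by the minimizing side
    print(alphabeta([[3, 5], [6, 9], [1, 2]], True))
```

On this tree the root value is 6, and the last subtree is cut off after its first leaf (1), since the maximizer already has 6 available. In a real engine the leaves would be positions scored by the heuristic (or, in a learned approach, by a model).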

Spark vs Pandas, part 3 — Scala vs Python

Of course, programming languages play an important role, although their relevance is often misunderstood. Having the right programming language on your CV may eventually be one of the deciding factors for getting a specific job or project. This is a good example of where the relevance of programming languages might be misunderstood, especially in the context …

Why you should be plotting learning curves in your next machine learning project

The bias-variance dilemma is a widely known problem in the field of machine learning. It is so important that if you don’t get the trade-off right, it won’t matter how many hours or how much money you throw at your model. In the illustration above, you can get a feel for what bias and variance …
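
For squared-error loss, the trade-off has a standard closed form. Assuming y = f(x) + ε with zero-mean noise of variance σ², independent of the training set, the expected error of a fitted model \hat{f} at a point x (expectation taken over training sets and noise) decomposes as:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

In learning-curve terms, a high-bias model shows training and validation errors that converge quickly to a high value, while a high-variance model shows a large gap between them that narrows as the training set grows.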

Pandas vs SQL – Explained with Examples

How to do typical tasks in data analysis using both. (Photo by Coffee Geek on Unsplash.) Pandas is a Python library for data analysis and manipulation. SQL is a programming language that is used to communicate with a database. Most relational database management systems (RDBMS) use SQL to operate on tables stored in a database. …
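
To make the comparison concrete, here is one typical task, an average per group, expressed in SQL against an in-memory SQLite database via Python's built-in sqlite3 module. The table and rows are made up for illustration; the pandas counterpart would be along the lines of df.groupby("city")["price"].mean().

```python
import sqlite3

# Hypothetical toy table, created in memory for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, price REAL)")
conn.executemany(
    "INSERT INTO sales (city, price) VALUES (?, ?)",
    [("Paris", 10.0), ("Paris", 20.0), ("Rome", 5.0)],
)

# SQL version of a group-by aggregation
# (pandas equivalent: df.groupby("city")["price"].mean())
rows = conn.execute(
    "SELECT city, AVG(price) AS avg_price FROM sales GROUP BY city ORDER BY city"
).fetchall()
print(rows)  # [('Paris', 15.0), ('Rome', 5.0)]
conn.close()
```

The ORDER BY is there because SQL makes no ordering guarantee for GROUP BY output, whereas pandas sorts group keys by default.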

Multi-Layer Perceptron & Backpropagation — Implemented from scratch

Writing a custom implementation of a popular algorithm can be compared to playing a musical standard. As long as the code reflects the equations, the functionality remains unchanged. It is, indeed, just like playing from notes. However, it lets you master your tools and practice your ability to hear and think. In this …
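
In that spirit, here is a minimal from-scratch sketch (not the article's implementation): a network with one tanh hidden layer, a linear scalar output, and squared-error loss, with backpropagation written out by hand and checked against central finite differences. The layer sizes, activation, loss, and data are assumptions chosen for brevity.

```python
import math
import random


def forward(params, x):
    """One hidden tanh layer followed by a linear scalar output."""
    W1, b1, w2, b2 = params
    z = [sum(W1[j][i] * x[i] for i in range(len(x))) + b1[j] for j in range(len(b1))]
    h = [math.tanh(zj) for zj in z]
    y = sum(w2[j] * h[j] for j in range(len(h))) + b2
    return h, y


def loss(params, x, t):
    """Squared error 0.5 * (y - t)^2 for a single example."""
    _, y = forward(params, x)
    return 0.5 * (y - t) ** 2


def backprop(params, x, t):
    """Gradients of the loss w.r.t. every parameter, via the chain rule."""
    _, _, w2, _ = params
    h, y = forward(params, x)
    dy = y - t                                   # dL/dy
    dw2 = [dy * hj for hj in h]                  # dL/dw2_j = dy * h_j
    db2 = dy                                     # dL/db2 = dy
    # Back through the output weights and tanh'(z) = 1 - tanh(z)^2
    dz = [dy * w2[j] * (1.0 - h[j] ** 2) for j in range(len(h))]
    dW1 = [[dz[j] * x[i] for i in range(len(x))] for j in range(len(h))]
    db1 = list(dz)
    return dW1, db1, dw2, db2


def numeric_grad_W1(params, x, t, j, i, eps=1e-6):
    """Central finite-difference estimate of dL/dW1[j][i]."""
    W1 = params[0]
    original = W1[j][i]
    W1[j][i] = original + eps
    up = loss(params, x, t)
    W1[j][i] = original - eps
    down = loss(params, x, t)
    W1[j][i] = original                          # restore the weight
    return (up - down) / (2 * eps)


# Gradient check on random parameters (sizes and data are arbitrary)
random.seed(0)
D, H = 3, 4
params = (
    [[random.uniform(-1, 1) for _ in range(D)] for _ in range(H)],  # W1: H x D
    [random.uniform(-1, 1) for _ in range(H)],                      # b1
    [random.uniform(-1, 1) for _ in range(H)],                      # w2
    random.uniform(-1, 1),                                          # b2
)
x, t = [0.5, -1.0, 2.0], 0.3
dW1, db1, dw2, db2 = backprop(params, x, t)
max_err = max(
    abs(dW1[j][i] - numeric_grad_W1(params, x, t, j, i))
    for j in range(H) for i in range(D)
)
print(f"max |analytic - numeric| = {max_err:.2e}")
```

A gradient check like this is the usual way to confirm that hand-derived backpropagation "reflects the equations" before training with it.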

Modify RStudio prompt to show current git branch

[This article was first published on Rtask, and kindly contributed to R-bloggers]. At the last Raddicts Paris Meetup, Romain Francois (to be followed on twitter …

Microsoft unlocks the full potential of the smart building ecosystem

It is impossible to overstate the profound impact the events of 2020 have had on the real estate industry. Buildings of all kinds, from commercial offices to retail, hospitals, manufacturing plants, and more, remain in need of transformative solutions that will enable employees to return to work safely. In times of crisis, it is innovation …

Archiving and Logging Your Use of Public Data

Dealing with the impermanence of public data sets. (Credit: Ula Kuźma.) One worry that I always have when downloading data sets off the internet is their impermanence. Links die, data changes, ashes to ashes, dust to dust. That’s why I’ve been introducing the Wayback Machine into my workflow. But even then, it’s tough to be …

Three Often Overlooked Sources of Data for your Next Passion Project

APK Code Data. (Photo by Kevin Ku on Unsplash.) From malware detection to AI, code data is an often overlooked form of data you might want to consider for a project or challenge. But what is code data, and how do we work with it? Code data comes in a variety of forms in line …

Use Numpy for Statistics and Arithmetic Operations in 2020

As you can see, a list uses more than double the memory compared to a NumPy array. To install NumPy, type the following command:
pip install numpy
To import the NumPy library, type the following code:
import numpy as np
The ‘as np’ alias is not necessary. It allows you to use ‘np’ instead of …

Unsupervised NLP : Methods and Intuitions behind working with unstructured texts

tl;dr: this is a primer on unsupervised techniques in NLP and their applications. It begins with the intuition behind word vectors, their use, and their advancements. This leads into the central discussion of language models in detail: their introduction, active use in industry, and possible applications for different use cases. In the fledgling, …

Principal Component Analysis (PCA) with Scikit-learn

Hi everyone! This is the second unsupervised machine learning algorithm that I’m discussing here. This time, the topic is Principal Component Analysis (PCA). At the very beginning of the tutorial, I’ll explain the dimensionality of a dataset, what dimensionality reduction means, the main approaches to dimensionality reduction, the reasons for dimensionality reduction, and what PCA means. Then, …
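
To make “direction of maximum variance” concrete, here is a minimal dependency-free sketch of the idea behind PCA: center the data, form the sample covariance matrix, and find its top eigenvector by power iteration. This is an illustration only, not scikit-learn's implementation (sklearn.decomposition.PCA works via a singular value decomposition); the function name and sample points are made up.

```python
import math


def first_principal_component(points, iters=200):
    """Unit vector of maximum variance, via power iteration on the covariance matrix."""
    n, d = len(points), len(points[0])
    means = [sum(p[k] for p in points) / n for k in range(d)]
    centered = [[p[k] - means[k] for k in range(d)] for p in points]
    # Sample covariance matrix (d x d)
    cov = [[sum(row[a] * row[b] for row in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # starting vector, assumed not orthogonal to the answer
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


if __name__ == "__main__":
    # Made-up 2-D points lying close to the line y = 2x
    pts = [(0, 0.1), (1, 1.9), (2, 4.1), (3, 5.9), (4, 8.0)]
    print(first_principal_component(pts))
```

For these points the component comes out close to (1, 2)/√5, the direction of the line. Projecting the centered points onto it reduces the data from two dimensions to one while keeping almost all of the variance.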

Deep learning to perform quantum chemical calculations.

The overall picture of Fermi Net is as follows. The output is a wave function ψ for several nuclei (position R) and electrons (position r). Its characteristic feature is that each electron-to-all-nuclei interaction and each electron-electron interaction is propagated in a corresponding (position-wise) manner. (Figure: the overall picture of Fermi Net.) The …

A data team’s product is decisions

Strategic decisions are luxury, bespoke, and artisanal decisions: decisions that are made infrequently but have a large impact. There is a large amount of uncertainty, and a frequent feedback loop is unlikely. To augment these decisions, one would benefit from a business mindset, communication skills, and an ability to extract …

Talking to Python from Javascript: Flask and the fetch API

Now that we have a working example, we can expand it to include actual data. In reality, this could involve accessing a database, decrypting some information, or filtering a table. For the purpose of this tutorial, we create a data array from which we index elements:
######## Example data, in sets of 3 ############
data = list(range(1,300,3))
print …
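
Before wiring it into a Flask route, the indexing logic itself can be sketched with the standard library alone. The function name and JSON shape below are assumptions, not the tutorial's; in a Flask app, the same dictionary could be returned from a route handler via flask.jsonify, and the JavaScript side would parse it with response.json() after a fetch() call.

```python
import json

# Example data, in sets of 3 (as in the tutorial): 1, 4, 7, ..., 298
data = list(range(1, 300, 3))


def get_element(index: int) -> str:
    """Return the element at `index` as a JSON string (hypothetical payload shape)."""
    if not 0 <= index < len(data):
        return json.dumps({"error": "index out of range"})
    return json.dumps({"index": index, "value": data[index]})


if __name__ == "__main__":
    print(get_element(5))
```

The bounds check matters once the index comes from the browser: a negative index would otherwise silently count from the end of the Python list.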

Little useless-useful R function – Psychedelic Square root with x11()

[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers]. Yes, the name of today’s function is wacky, because it gives …

Introduction to MCMC

What are Monte Carlo approximations, and how does the Metropolis algorithm work? (Figure: illustration of the Metropolis algorithm. Image by Author.) Probabilistic modelling is all the rage these days, but there was one thing that always bugged me when I first learned about it. Many Bayesian modelling methods require the computation of integrals, and any worked …
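
A minimal random-walk Metropolis sketch, assuming a one-dimensional target density known only up to a normalizing constant (here an unnormalized standard normal, chosen purely as an example). Once samples are drawn, the Monte Carlo approximation of an integral such as E[x²] is simply an average over them. Function and parameter names are illustrative, not the article's.

```python
import math
import random


def metropolis(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D unnormalized log-density."""
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)      # symmetric Gaussian proposal
        logp_new = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(x));
        # the unknown normalizing constant cancels in this ratio.
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
        samples.append(x)                        # on rejection, x is repeated
    return samples


# Example: standard normal target, known only up to its normalizing constant
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_samples=50_000)
mean_x = sum(samples) / len(samples)                  # true value 0
mean_x2 = sum(s * s for s in samples) / len(samples)  # Monte Carlo estimate of E[x^2], true value 1
print(mean_x, mean_x2)
```

The step size trades off acceptance rate against how far each move travels; successive samples are correlated, so the effective sample size is smaller than n_samples.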

The essence behind an award-winning photo — an AI approach

It is possible to visualize both the filters and the feature maps. Each filter is itself an image that depicts a particular feature, and applying those filters produces the feature maps. In essence, the shallower the layer, the more its feature maps look like the original input. In this article, I want to focus …
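
Applying a filter to produce a feature map is a sliding dot product. Here is a minimal pure-Python sketch, assuming stride 1, no padding, and a hand-picked vertical-edge filter (all illustrative choices, not the article's network): the feature map responds only where the toy image changes from dark to bright.

```python
def feature_map(image, kernel):
    """Valid 2-D cross-correlation (what CNN layers call convolution), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


if __name__ == "__main__":
    # Toy 4x4 image: dark left half (0), bright right half (1)
    image = [[0, 0, 1, 1] for _ in range(4)]
    vertical_edge = [[-1, 1],
                     [-1, 1]]
    for row in feature_map(image, vertical_edge):
        print(row)
```

The result is a 3x3 map whose middle column holds 2 and whose other entries are 0, marking the edge. In a trained CNN the filter weights are learned rather than hand-picked, and deeper layers apply filters to earlier feature maps, which is why their outputs resemble the input less and less.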

5 types of plots that will help you with time series analysis

And how to quickly create them using Python. (Photo by Isaac Smith on Unsplash.) When starting any project related to time series (and not only time series), one of the very first steps is to visualize the data. We do so to inspect the data we are dealing with and learn something about it, for example: are …

Cheat sheet for implementing 7 methods for selecting the optimal number of clusters in Python

Select the optimal number of clusters based on multiple clustering validation metrics, such as the Gap Statistic, Silhouette Coefficient, and Calinski-Harabasz Index. (Photo by Mehrshad Rajabi on Unsplash.) Segmentation provides a data-driven angle for examining meaningful segments that executives can use to take targeted actions and improve business outcomes. Many executives run the risk of making …
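
As one example of those validation metrics, here is a minimal pure-Python sketch of the Silhouette Coefficient. It is a standalone illustration, not the cheat sheet's code; for real work, scikit-learn provides sklearn.metrics.silhouette_score. To pick k, you would cluster the data for each candidate k and keep the k with the highest mean score.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient s = (b - a) / max(a, b) over all points.

    a: mean distance to the other members of the point's own cluster
    b: smallest mean distance to the members of any other cluster
    Points in singleton clusters get s = 0 (the usual convention).
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # The distance from p to itself is 0, so divide by len(own) - 1
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(silhouette_score(pts, [0, 0, 1, 1]))  # sensible clustering: close to 1
    print(silhouette_score(pts, [0, 1, 0, 1]))  # poor clustering: negative
```

Scores near 1 mean points sit much closer to their own cluster than to the nearest other one; negative scores indicate points that would fit better elsewhere.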

Start using Linux commands to quick analyze structured Data, not Pandas

Column-wise analysis: Columns are important in structured data, and you often cut out (extract) particular columns from the data and analyze them. We can use the cut command to extract particular columns:
cut -d'<delim>' -f<col_num> <file>
Here I am cutting columns 5 and 6 (sex and age) and then running head over the result. You can also provide a column …
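
To pin down what -d and -f do, here is a rough Python analogue of cut, as a sketch only: fields are 1-based and are emitted in file order regardless of the order given, as with cut. One deliberate difference, stated as an assumption of this sketch: it parses with the csv module, so quoted fields containing the delimiter are handled, which plain cut does not do. The sample rows are made up.

```python
import csv


def cut_fields(lines, delim, fields):
    """Rough analogue of `cut -d<delim> -f<fields>` (fields are 1-based).

    Like cut, selected fields come out in their original column order,
    and fields missing from a short row are simply skipped.
    """
    wanted = sorted(set(fields))
    reader = csv.reader(lines, delimiter=delim)
    return [[row[f - 1] for f in wanted if f <= len(row)] for row in reader]


if __name__ == "__main__":
    # Made-up rows; columns 5 and 6 are sex and age
    sample = [
        "id,name,class,fare,sex,age",
        "1,Ann,1,72.5,female,29",
        "2,Bob,3,7.9,male,35",
    ]
    for row in cut_fields(sample, ",", [5, 6]):
        print(",".join(row))
```

This mirrors cut -d',' -f5,6 on the same rows. For a real file, an open file handle (opened with newline="") can be passed in place of the list.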

(Deep) Learning from Kaggle Competitions

I started using Kaggle seriously a couple of months ago when I joined the SIIM-ISIC Melanoma Classification Competition. The initial reason, I think, was that I wanted a serious way to test my Machine Learning (ML) and Deep Learning (DL) skills. At the time, I was studying for the Coursera AI4Medicine Specialization and I was …

Read this to learn what it means to be a data scientist. Learn what a data scientist does. Advice and tips for the data science job interview,

A retrospective of my first two years (8–10 fiscal quarters and beyond) of data science. In a million ways, my data science career has not gone as planned. I have shared some of my biggest mistakes, and I have shared some of my favorite days. This article is for anyone interested in understanding what …

What does it take to secure a job as a Data Scientist?

A practical guide based on actual job posting data. (Photo by Markus Winkler on Unsplash.) Are you aspiring to be a data scientist but not sure what it takes to secure a job? In this article, I am going to outline what recruiters expect from potential candidates in terms of Tools specific …

Generating Short Star Wars Text With LSTM’s

A demonstration of the working model can be seen below. (Image by author.) To replicate this model, you need to download the code from here. Then, if you are using Linux, install all needed dependencies (it is highly recommended to create a new virtual environment and install all packages in it) with: pip …

Why VAE are likelihood-based generative models

Maximum Likelihood Estimation (MLE) is a fundamental tool for parameter estimation. From Bayes’ Theorem (figure: Bayes’ Theorem, from https://www.saedsayad.com/naive_bayesian.htm), we see that the likelihood is identified as P(X|C), but what does it mean? Strictly speaking, it represents a conditional probability, that is, the probability of X given that C is true. But what happens when C is …
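
Written out, Bayes’ Theorem and the MLE objective read as follows. In the second line, C plays the role of the model parameters θ, and the likelihood P(X | θ) is read as a function of θ with the data X held fixed; splitting it into a sum of log terms over observations x_1, …, x_n assumes they are independent and identically distributed, an assumption added here for illustration.

```latex
P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)}

\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} P(X \mid \theta)
  = \arg\max_{\theta} \sum_{i=1}^{n} \log p(x_i \mid \theta)
```

Because the logarithm is monotonic, maximizing the log-likelihood gives the same estimate as maximizing the likelihood itself, while turning a product of small probabilities into a numerically stable sum.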

Rapid Internationalization of Shiny Apps: shiny.i18n Version 0.2

Rapid Multi-Language Support for Shiny Apps. Have you ever created a multilingual Shiny application? If you are a big fan of Appsilon’s blog and have read our first article on the topic, chances are the answer is yes. If that’s not the case, fear not: we’ll cover everything you need to know today about …

Linear Regressions for the Survey of Consumer Finances

How to deal with the SCF’s multiple datasets. (Credit: Michael Longmire.) Last month, the Federal Reserve released its triennial survey on the state of household finances in the U.S. in 2019: the Survey of Consumer Finances (SCF). Although it provided a good summary of what’s changed since 2016, what is on everyone’s mind now is …
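
Assuming the “multiple datasets” are the SCF’s multiply imputed replicates (the public SCF files provide five implicates per household), a common approach is to run the same regression on each implicate and pool the results with Rubin’s rules. Below is a minimal sketch of the pooling step for a single coefficient; the function name and the input numbers are made up for illustration, and this is not necessarily the article’s method.

```python
import math
from statistics import mean, variance


def combine_implicates(estimates, std_errors):
    """Pool one coefficient estimated separately on each implicate (Rubin's rules).

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    q_bar = mean(estimates)                       # pooled point estimate
    within = mean(se ** 2 for se in std_errors)   # average within-implicate variance
    between = variance(estimates)                 # between-implicate variance (n - 1 denominator)
    total = within + (1 + 1 / m) * between        # total variance
    return q_bar, math.sqrt(total)


if __name__ == "__main__":
    # Hypothetical coefficient from five regressions, one per implicate
    est, se = combine_implicates([1.0, 1.2, 0.8, 1.1, 0.9], [0.1] * 5)
    print(est, se)
```

With these made-up inputs the pooled estimate is 1.0 and the pooled standard error is 0.2, twice the per-implicate value. Simply averaging the five standard errors would understate the uncertainty introduced by imputation.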

Presidential Elections Forecast

My forecast relies on historical popular vote data for every state. Thanks to Asma Barakat for helping me gather the needed data for this research! Many factors affect election results, such as COVID-19, the impeachment, the economy, the unemployment rate, the response to natural disasters, climate change, foreign policy, people’s loyalty to their party, debates, …

Struggling with data imbalance? Semi-supervised & Self-supervised learning help!

To begin with, I would like to summarize the main contribution of this article in one sentence: we have verified both theoretically and empirically that, for learning problems with imbalanced data (categories), using semi-supervised learning, that is, using more unlabeled data; or self-supervised learning, that is, without using any extra data, just …

How much of your Neural Network’s Prediction can be Attributed to each Input Feature?

Neural networks are known to be black-box predictors: the data scientist does not usually know which particular input feature influenced the prediction the most. This can be rather limiting if we want to understand what the model actually learned. Having this kind of understanding may allow us to find bugs …
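
One widely used answer to this question is integrated gradients, which accumulates the model’s gradient along a straight path from a baseline input to the actual input. A useful property is that the attributions sum to the difference between the two predictions. The excerpt does not say which method the article uses, so treat this as an illustration. In this minimal sketch, a toy function stands in for the neural network, and central finite differences stand in for the autodiff gradients a real framework would provide.

```python
import math


def integrated_gradients(f, x, baseline, steps=200, eps=1e-5):
    """Integrated-gradients attributions for a scalar function f of a list x.

    Approximates IG_i = (x_i - x'_i) * integral_0^1 df/dx_i(x' + a(x - x')) da
    with a midpoint Riemann sum; gradients use central finite differences.
    """
    n = len(x)
    totals = [0.0] * n
    for k in range(steps):
        a = (k + 0.5) / steps                    # midpoint of the k-th sub-interval
        point = [b + a * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(n):
            up, down = point[:], point[:]
            up[i] += eps
            down[i] -= eps
            totals[i] += (f(up) - f(down)) / (2 * eps)
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


def toy_model(v):
    """Hypothetical stand-in for a trained network."""
    return math.tanh(2 * v[0] + v[1]) + v[2] ** 2


x = [1.0, 0.5, -1.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(toy_model, x, baseline)
print(attributions, sum(attributions), toy_model(x) - toy_model(baseline))
```

Here the third feature receives an attribution of about 1.0, its full contribution through the squared term. The first two split the tanh contribution in a 4:1 ratio, reflecting both their weights and their input values.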