Load Libraries To work through the project, we will need the readxl and tidyverse packages. If you do not have them installed in your R environment, simply remove the “#” sign before the “install.packages…” lines of code. Once they are installed, you will not need to install them again on your machine. Here’s the … Read more Economics for Tech People — Supply (Part 2)
Wave power analysis from data recorded on offshore buoys located in New Hampshire and Rhode Island. Image by author Although most people don’t think of locations like New Hampshire and Rhode Island as popular destinations for surfing, wave energy data obtained from the National Data Buoy Center shows that on occasion there is significant wave … Read more Analysis of Wave Power
Predicting a paradigm shift in the (programming) world, and how not to get left behind in its wake. Lego! (Photo by Kelly Sikkema on Unsplash) Are you a Lego fan? Lego was my first true obsession. Although I’m a recovering legoholic these days, I once had quite the collection that filled a gargantuan sack … Read more The nocode revolution is coming — are you ready for it?
Many companies still use the traditional Target Stock Level (TSL) model to drive stock decisions. This is a complex way of saying “sell one, get one”, the literal meaning of the word replenishment. And the highly uncertain nature of demand requires inventory managers to operate on much higher safety stock margins than otherwise necessary. … Read more 94% Perfect: the Surprising Solution to the $200 Billion Inventory Problem
Reading and Manipulating Genomic Data in Python Photo by National Cancer Institute on Unsplash PySnpTools is a Python library for reading and manipulating genomic data. It allows users to efficiently select and reorder individuals (rows) and SNP locations (columns). It then reads only the data selected. Originally developed to support FaST-LMM — a … Read more PySnpTools
Do we have reason to be a little less bewildered by so-called grandmother cells in the context of deep neural networks? A familiar hypothesis in deep learning is that a single higher-layer unit of a network may correspond to a complex semantic entity, such as a person or a specific type of dog [2, 9]. … Read more The question of single unit semantics in deep networks
If you even slightly follow the NLP world, or even the ML news, you have most likely come across Google’s BERT model or one of its relatives. If you haven’t and still somehow have stumbled across this article, let me have the honor of introducing you to BERT, the powerful NLP beast. BERT stands … Read more BERT Explained: What it is and how does it work?
We created a couple of demos for this presentation, and we wanted to share them as a bit of a “hello, world” exercise for the tiler! We have loaded the data and the queries/code onto a GitHub repository and you’re welcome to give it a whirl. When building these for the first time, we found … Read more A Quick Experiment with the CARTO BigQuery Tiler!
Photo by Hunters Race on Unsplash Interviews are horrifying. The nervousness, the ability to deal with pressure, and the nagging voices in our heads asking whether the job for which we are appearing will turn out in our favor or be yet another cursed day that pushes our goal of getting that offer letter far away from our … Read more How Medium Helped Me Land My First Job In Data Science
Ever wondered how much your HDB is worth besides asking your housing agent? Or wondered what the main features are that have a great impact on your HDB? Seeing that over 92.2% of Singaporeans own an HDB, I believe these are common questions that most Singaporeans have had, and it would be good to see if … Read more Predicting HDB Housing Prices Using Neural Networks
In 2018, NVIDIA published a groundbreaking paper titled “A Style-Based Generator Architecture for Generative Adversarial Networks” that manages to generate high-quality images (1024×1024). One of its novelties is that it disentangles the latent space, which allows us to control the attributes at different levels. For example, the lower layer would be able to control … Read more Animating Yourself as a Disney Character with AI
A discussion of Arabic natural language processing (NLP) for social media text, with code examples and in-depth analysis of the cutting-edge technology driving the most recent advancements. Multi-lingual word cloud from tweets about the Beirut explosion (August 2020). Image by Author. Natural language processing (NLP) is not a new discipline; its roots date back to … Read more Machine learning advancements in Arabic NLP
[This article was first published on R – Win Vector LLC, and kindly contributed to R-bloggers]. For classification problems I argue one of the biggest … Read more Your Lopsided Model is Out to Get You
[This article was first published on R Consortium, and kindly contributed to R-bloggers]. The R/Pharma virtual conference this year was held October 13-15th, 2020. R/Pharma … Read more R/Pharma October 2020
and understanding how neural networks can be used to indirectly solve problems Photo by Hassan Pasha on Unsplash In most chess engines, a searching algorithm along with a heuristic function gives the chess AI the main insight into the best moves to play. The bulk of the programming and most of the “brains” behind this … Read more Creating a Chess Engine with Deep Learning
Of course programming languages play an important role, although their relevance is often misunderstood. Having the right programming language in your CV may eventually be one of the deciding factors for getting a specific job or project. This is a good example where the relevance of programming languages might be misunderstood, especially in the context … Read more Spark vs Pandas, part 3 — Scala vs Python
The bias-variance dilemma is a widely known problem in the field of machine learning. Its importance is such that if you don’t get the trade-off right, it won’t matter how many hours or how much money you throw at your model. In the illustration above, you can get a feel for what bias and variance … Read more Why you should be plotting learning curves in your next machine learning project
How to do typical tasks in data analysis using both Photo by Coffee Geek on Unsplash Pandas is a Python library for data analysis and manipulation. SQL is a programming language that is used to communicate with a database. Most relational database management systems (RDBMS) use SQL to operate on tables stored in a database. … Read more Pandas vs SQL – Explained with Examples
Writing a custom implementation of a popular algorithm can be compared to playing a musical standard. As long as the code reflects the equations, the functionality remains unchanged. It is, indeed, just like playing from notes. However, it lets you master your tools and practice your ability to hear and think. In this … Read more Multi-Layer Perceptron & Backpropagation — Implemented from scratch
[This article was first published on Rtask, and kindly contributed to R-bloggers]. At the last Raddicts Paris Meetup, Romain Francois (to be followed on twitter … Read more Modify RStudio prompt to show current git branch
It is impossible to overstate the profound impacts the events of 2020 have had on the real estate industry. Buildings of all kinds, from commercial offices to retail, hospitals, manufacturing plants, and more, remain in need of transformative solutions that will enable employees to return to work safely. In times of crisis, it is innovation … Read more Microsoft unlocks the full potential of the smart building ecosystem
Dealing with the impermanence of public data sets Credit: Ula Kuźma One worry that I always have when downloading data sets off the internet is their impermanence. Links die, data changes, ashes to ashes, dust to dust. That’s why I’ve been introducing the Wayback Machine into my workflow. But even then, it’s tough to be … Read more Archiving and Logging Your Use of Public Data
An economist’s combination of skills produces a unique type of data scientist. Photo by Stephen Cook on Unsplash Starting out as a data scientist, I struggled to understand the value economics brings. Now that I understand that data science is far … Read more An Economist’s Value in Data Science
APK Code Data Photo by Kevin Ku on Unsplash From malware detection to AI, code data is an often overlooked form of data you might want to consider for a project or challenge. But what is code data and how do we work with it? Code data comes in a variety of forms in line … Read more Three Often Overlooked Sources of Data for your Next Passion Project
As you can see, a list uses more than double the memory compared to a NumPy array. To install NumPy, type the following command: pip install numpy. To import the NumPy library, type the following code: import numpy as np. The ‘as np’ is not necessary. It allows you to use ‘np’ instead of … Read more Use Numpy for Statistics and Arithmetic Operations in 2020
TL;DR: this is a primer in the domain of unsupervised techniques in NLP and their applications. It begins with the intuition behind word vectors, their use and advancements. This evolves into the center-stage discussion of language models in detail: introduction, active use in industry and possible applications for different use-cases. In the fledgling, … Read more Unsupervised NLP : Methods and Intuitions behind working with unstructured texts
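The memory claim in the excerpt above can be checked with a minimal sketch like the one below. It assumes CPython and 64-bit integers; the variable names and the element count of 1000 are illustrative choices, not taken from the article, and exact byte counts vary between Python versions.

```python
import sys

import numpy as np

n = 1000  # illustrative size, not from the article

values = list(range(n))
array = np.arange(n, dtype=np.int64)

# A list holds pointers to separate int objects, so count both the
# list container and every element it refers to.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# A NumPy array stores the raw values in one contiguous buffer.
array_bytes = array.nbytes

print(f"list:  {list_bytes} bytes")
print(f"array: {array_bytes} bytes")
print(f"ratio: {list_bytes / array_bytes:.1f}x")
```

The comparison counts only the array's data buffer, so the small fixed overhead of the array object itself is left out.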
Hi everyone! This is the second unsupervised machine learning algorithm that I’m discussing here. This time, the topic is Principal Component Analysis (PCA). At the very beginning of the tutorial, I’ll explain the dimensionality of a dataset, what dimensionality reduction means, main approaches to dimensionality reduction, reasons for dimensionality reduction and what PCA means. Then, … Read more Principal Component Analysis (PCA) with Scikit-learn
The overall picture of Fermi Net is as follows. The output is a wave function ψ for several nuclei (position R) and electrons (position r). The characteristic feature is that each electron–all-nucleus interaction and each electron–electron interaction is propagated in a corresponding (position-wise) manner. The overall picture of Fermi Net. The … Read more Deep learning to perform quantum chemical calculations.
Strategic decisions are luxury, bespoke, and artisanal decisions; decisions that are made infrequently but have a large impact. There is a large amount of uncertainty and it’s unlikely there is going to be a frequent feedback loop. For augmenting these decisions one would benefit from a business mindset, communication skills, and an ability to extract … Read more A data team’s product is decisions
[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers]. Yes, the name of today’s function is wacky, because it gives … Read more Little useless-useful R function – Psychedelic Square root with x11()
What are Monte Carlo approximations and how does the Metropolis algorithm work? Illustration of the Metropolis algorithm. Image by Author Probabilistic modelling is all the rage these days but there was always one thing that bugged me when I first learned about it. Many Bayesian modelling methods require the computation of integrals and any worked … Read more Introduction to MCMC
It is possible to visualize both the filters and the feature maps. The filters are also images that depict particular features. Applying those filters leads to the feature maps. In essence, the shallower the layer is, the more the feature map looks like the original input. In this article, I want to focus … Read more The essence behind an award-winning photo — an AI approach
Photo by Isaac Smith on Unsplash And how to quickly create them using Python While starting any project related to time series (and not only), one of the very first steps is to visualize the data. We do so to inspect the data we are dealing with and learn something about it, for example: are … Read more 5 types of plots that will help you with time series analysis
A technical guide on SQL and 10 problems to check out Image by Alexandra Acea on Unsplash When you hear “data scientist” you think of modeling, insightful analysis, machine learning, and other cool buzzwords. So, let’s not beat around the bush: databases and SQL are not the most “fun” parts of being a data scientist. … Read more Data Science Interviews: SQL
Select the optimal number of clusters based on multiple clustering validation metrics like Gap Statistic, Silhouette Coefficient, Calinski-Harabasz Index etc. Photo by Mehrshad Rajabi on Unsplash Segmentation provides a data-driven angle for examining meaningful segments that executives can use to take targeted actions and improve business outcomes. Many executives run the risk of making … Read more Cheat sheet for implementing 7 methods for selecting the optimal number of clusters in Python
Column-wise analysis Columns are important in structured data, and you often cut/extract particular columns from the data and analyze them. We can use the cut command to extract particular columns: cut -d'<delim>' -f<col_num> <file> Here I am cutting columns 5 and 6 (sex and age) and then running head over the result. You can also provide a column … Read more Start using Linux commands to quick analyze structured Data, not Pandas
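To make clear what the column extraction in the excerpt above does, here is a rough Python equivalent of selecting fields by delimiter and 1-based column number. The helper name `cut_columns` and the sample rows are hypothetical, invented for illustration; only the column positions (5 and 6 for sex and age) come from the excerpt. It is a sketch of the idea, not a replacement for the command itself.

```python
import csv
import io


def cut_columns(lines, delimiter, columns):
    """Yield the requested 1-based columns from each delimited line."""
    for row in csv.reader(lines, delimiter=delimiter):
        # Skip column numbers that fall outside a short row.
        yield [row[i - 1] for i in columns if i - 1 < len(row)]


# Hypothetical sample data; columns 5 and 6 hold sex and age.
sample = io.StringIO(
    "id,name,city,class,sex,age\n"
    "1,Ann,Oslo,A,F,34\n"
    "2,Bob,Rome,B,M,41\n"
)

result = list(cut_columns(sample, ",", [5, 6]))
for row in result:
    print(",".join(row))
```

One difference worth noting: `csv.reader` understands quoted fields that contain the delimiter, whereas cut splits on every delimiter character it sees.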
I started using Kaggle seriously a couple of months ago when I joined the SIIM-ISIC Melanoma Classification Competition. The initial reason, I think, was that I wanted a serious way to test my Machine Learning (ML) and Deep Learning (DL) skills. At the time, I was studying for the Coursera AI4Medicine Specialization and I was … Read more (Deep) Learning from Kaggle Competitions
A retrospective of my first two years (8–10 fiscal quarters and beyond) of data science In a million ways my data science career has not gone as planned. I have shared some of my biggest mistakes. And I have shared some of my favorite days. This article is for anyone interested in understanding what … Read more Read this to learn what it means to be a data scientist. Learn what a data scientist does. Advice and tips for the data science job interview,
[This article was first published on HighlandR, and kindly contributed to R-bloggers]. Network plots can be hit or miss. However, the visNetwork package greatly simplifies … Read more Getting started with network plots
[This article was first published on R on Andres’ Blog, and kindly contributed to R-bloggers]. For reasonably experienced R users this simple topic might not … Read more Customizing your package-library location
A practical guide based on the actual job posting data Photo by Markus Winkler on Unsplash Are you aspiring to be a data scientist but not sure what it takes to secure a job? In this article, I am going to outline what recruiters expect from potential candidates in terms of Tools specific … Read more What does it take to secure a job as a Data Scientist?
The next thing I look for is text fields. When I use text fields, I either use them as categories, discussed next, or as plain text that will be displayed or used for additional information. But what happens if you are using your text fields for modeling? You may need to consider different types of … Read more 25 Questions to Ask as You Clean Data
A demonstration of the working model can be seen below. Image by author In order to replicate this model, you need to download the code from here. Then, install all needed dependencies (it is highly recommended to create a new virtual environment and install all packages in it) if you are using Linux with: pip … Read more Generating Short Star Wars Text With LSTM’s
Maximum Likelihood Estimation (MLE) is a fundamental tool for parameter estimation. Bayes Theorem, from https://www.saedsayad.com/naive_bayesian.htm From Bayes Theorem we see that the Likelihood is identified as P(X|C), but what does it mean? Strictly speaking, it represents a conditional probability: what is the probability of X given that C is true? But what happens when C is … Read more Why VAE are likelihood-based generative models
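For reference, the Bayes Theorem mentioned in the excerpt above can be written out as follows, with the likelihood P(X|C) in the numerator. This is the standard textbook form, using the excerpt's symbols X and C; it is not a reproduction of the figure in the article.

```latex
\[
\underbrace{P(C \mid X)}_{\text{posterior}}
  = \frac{\overbrace{P(X \mid C)}^{\text{likelihood}} \;
          \overbrace{P(C)}^{\text{prior}}}
         {\underbrace{P(X)}_{\text{evidence}}}
\]
```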
Rapid Multi-Language Support for Shiny Apps Have you ever created a multilingual Shiny application? Chances are the answer is yes, if you are a big fan of Appsilon‘s blog and have read our first article on the topic. If that’s not the case – fear not – we’ll cover everything you need to know today about … Read more Rapid Internationalization of Shiny Apps: shiny.i18n Version 0.2
How to deal with the SCF’s multiple datasets Credit: Michael Longmire Last month the Federal Reserve released their triennial survey on the state of household finances in the U.S. in 2019: the Survey of Consumer Finances (SCF). Although they provided a good summary of what’s changed since 2016, what is on everyone’s mind now is … Read more Linear Regressions for the Survey of Consumer Finances
My forecast relies on the historical data of the popular data in every state Thanks to Asma Barakat for helping me in gathering the needed data for this research! Many factors affect the election results, such as COVID-19, impeachment, economy, unemployment rate, natural disaster response, climate change, foreign policy, people’s loyalty to their party, debates, … Read more Presidential Elections Forecast
To begin with, I would like to first summarize the main contribution of this article in one sentence: We have verified both theoretically and empirically that, for learning problems with imbalanced data (categories), using Semi-supervised learning — that is, using more unlabeled data; or, Self-supervised learning — that is, without using any extra data, just … Read more Struggling with data imbalance? Semi-supervised & Self-supervised learning help!
Neural networks are known to be black box predictors where the data scientist does not usually know which particular input feature influenced the prediction the most. This can be rather limiting if we want to get some understanding of what the model actually learned. Having this kind of understanding may allow us to find bugs … Read more How much of your Neural Network’s Prediction can be Attributed to each Input Feature?