Optimisation of a Weibull survival model using Optimx() in R

[This article was first published on R | Joshua Entrop, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t. In this blog post we will optimise a Weibull regression … Read more

mapping congressional roll calls

[This article was first published on Jason Timm, and kindly contributed to R-bloggers]. Introduction A bit of a depot for things-/methods- mapping with R & … Read more

Please allow me to introduce myself: Torch for R

[This article was first published on RStudio AI Blog, and kindly contributed to R-bloggers].

Domain Expertise: What deep learning needs for better COVID-19 detection

The world probably doesn’t need another neural network, but it needs a coffee chat with those on the front lines. By now, you’ve probably seen a few, if not many, articles on how deep learning could help detect COVID-19. In particular, convolutional neural networks (CNNs) have been studied as a faster and cheaper alternative to … Read more

Learning to Rank for Information Retrieval: A Deep Dive into RankNet.

Machine Learning and Artificial Intelligence are currently driving innovation in the field of Computer Science, and they are being applied to a multitude of fields across disciplines. However, traditional ML models can still be broadly categorized into solutions of two types of problems. Classification — Which aims at labelling a particular instance of data … Read more

Building the Ultimate AI Agent for Doom using Dueling Double Deep Q-Learning

A Reinforcement Learning Implementation in PyTorch. Over the last few articles, we’ve discussed and implemented various value-learning architectures for the VizDoom environment, and examined their performance in maximizing reward. To summarize, these include: Overall, vanilla Deep Q-learning is a highly flexible and responsive online reinforcement learning approach that utilizes rapid intra-episodic updates to its estimations … Read more

Evolutionary Decision Trees: When Machine Learning draws its Inspiration from Biology

2.5. Mutation Mutations refer to small random changes made to individuals of a population. They are essential in ensuring genetic diversity and enabling the genetic algorithm to search a broader space. In the context of Decision Trees, a mutation can be implemented by randomly changing the attribute and the split value of a randomly selected node. … Read more

Model Lifecycle: From ideas to value

Value scoping, discovery, delivery, and stewardship Created by Authors based on Youtube video Monarch Butterfly Metamorphosis time-lapse FYV 1080 HD In Part 1 of this series we examined the key differences between software and models; in Part 2 we explored the twelve traps of conflating models with software; and in Part 3 we looked at … Read more

Business Intelligence Visualizations with Python

The installation process is pretty straightforward. Just open your terminal and insert the following command: pip install matplotlib A. Line Plot After having installed the library, we can jump on to plot creation. The first type we’re going to create is a simple Line Plot: # Begin by importing the necessary libraries: import matplotlib.pyplot as plt … Read more

Data Processing Example using Python

Just some of the steps involved in prepping a dataset for analysis and machine learning. Forbes’s survey found that the least enjoyable part of a data scientist’s job encompasses 80% of their time. 20% is spent collecting data and another 60% is spent cleaning and organizing data sets. Personally, … Read more

MLflow Part 1: Getting Started with MLflow!

Helping you take your first step into the machine learning lifecycle flow with this handy tool Hello again friends! We’re back here with another quick tip, and because I do attempt to keep these posts quick, this is actually going to be part one in a series of tips related to MLflow. In the spirit … Read more

But What is a Model?

A Wittgensteinian Approach to Data Science Planetarium from 1766, Photo by Sage Ross, Creative Commons The term model gets thrown around a lot. The word is ubiquitous to the point of lost meaning. The Wikipedia page alone shows the variety of usage of the word model, including statistics, astronomy, biology, product design, art, as well … Read more

Progress bars for Python with tqdm

Not long after I began working on machine learning projects in Python, I ran into computationally-intensive tasks that just took a long time to run. Usually this was associated with some kind of iterable process. A couple that immediately come to mind are (1) running a grid search on p, d, and q orders to … Read more

Training Better Deep Learning Models for Structured Data using Semi-supervised Learning

Deep learning is known to work well when applied to unstructured data like text, audio, or images but can sometimes lag behind other machine learning approaches like gradient boosting when applied to structured or tabular data. In this post, we will use semi-supervised learning to improve the performance of deep neural models when applied to structured … Read more

Latent Dirichlet Allocation: Intuition, math, implementation and visualisation

TL;DR — Latent Dirichlet Allocation (LDA, sometimes LDirA/LDiA) is one of the most popular and interpretable generative models for finding topics in text data. I’ve provided an example notebook based on web-scraped job description data. Although running LDA on a canonical dataset like 20Newsgroups would’ve provided clearer topics, it’s important to witness how difficult … Read more

Machine Learning Model Explanation using Shapley Values

Learn how to interpret a black box model using SHAP (SHapley Additive exPlanations). Photo by Frank Vessia on Unsplash. Article Outline: Why SHAP (SHapley Additive exPlanations); About Dataset; Loading Dataset; Model Fitting; Shapley values estimation; Variable Importance plot; Summary plot; Dependence Plot; Force Plot; Tutorial DataSet. Why SHAP (SHapley Additive exPlanations)? The very common problem … Read more

Understanding Apache Parquet

Data Warehousing | Data Lake | Parquet. Understand why Parquet should be used for warehouse/lake storage. "Apache Parquet is a columnar storage format available to any project […], regardless of the choice of data processing framework, data model or programming language." (https://parquet.apache.org/) This description is a good summary of this format. This post will talk … Read more

path.chain: Concise Structure for Chainable Paths

[This article was first published on krzjoa, and kindly contributed to R-bloggers]. The path.chain package provides an intuitive and easy-to-use system of nested objects, which represents different … Read more

Slicing the onion 3 ways- Toy problems in R, python, and Julia

Between writing up my thesis, applying to jobs (hire me! I’m quite good at programming), and the ongoing pandemic, I don’t really have time to write full blogposts. I have however decided to brush up my python skills and dive headfirst into Julia. As such, I like to answer the toy problems posted at fivethirtyeight’s … Read more

Building apps with {shinipsum} and {golem}

In my previous blog post I showed you how I set up my own Shiny server using a Raspberry Pi 4B. If you visited the following link you’ll be connecting to my Raspberry Pi and can play around with a Shiny app that I called golemDemo. It’s been quite a few months that I wanted to discuss this app: … Read more

100 Time Series Data Mining Questions – Part 5

In the last post we managed to find similar patterns between two time series. For the next question, we will still be using the datasets available at https://github.com/matrix-profile-foundation/mpf-datasets so you can try this at home. The original code (MATLAB) and data are here. Now let’s start: If you had to summarize this long time series … Read more

What is AI? A straight-forward introduction

Artificial Intelligence (AI) is a part of our daily lives — from language translation to medical diagnostics and driverless cars to facial recognition — it’s making more of an impact on industry and society every day. But what exactly is AI? Simply put, AI is a technology that replicates human intelligence through computers, systems or … Read more

Generalized Linear Models and Plots with edgeR – Advanced Differential Expression Analysis

[This article was first published on R – Myscape, and kindly contributed to R-bloggers]. Generalized linear models (GLM) are a classic method for analyzing RNA-seq … Read more

On Demand Materialized Views: A Scalable Solution for Graphs, Analysis or Machine Learning

Let’s create a simple example with some mock data. In this example we will aggregate generic posts and determine how many posts each profile has, then we will aggregate comments. If you are using the code snippets to follow this article, you will want to create a few data points following the style below. However … Read more

National Weekly Death Rates

library(tidyverse) library(covdata) rate_rank <- … Read more

Solving a Social Distancing Problem using Genetic Algorithms

“Social distancing” has become very popular these days, but it is not always obvious how the rules can fit our daily life. In this story, we are going to study a social distancing problem and find a solution to it using Genetic Algorithms. After setting the problem and its constraints, I’ll summarize the principles of Genetic … Read more

Why and How to use Cross Entropy

Working out the cross entropies of each observation shows that when the model incorrectly predicted 1 with a low probability, there was a smaller loss than when the model incorrectly predicted 0 with a high probability. Minimizing this loss function will prevent high probabilities from being assigned to incorrect predictions. To demonstrate why cross entropy … Read more
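
The comparison described in this excerpt can be sketched in a few lines of Python. This is a minimal illustration, not the article's own code: the function name `binary_cross_entropy` and the probabilities 0.6 and 0.1 are assumptions chosen only to show a low-confidence mistake next to a high-confidence one.

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Cross entropy of a single observation.

    y_true is the actual label (0 or 1); p_pred is the predicted
    probability of class 1. Probabilities are clipped to avoid log(0).
    """
    p = min(max(p_pred, eps), 1 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

# Hypothetical case 1: true label is 0, the model leans towards 1
# with low confidence (p = 0.6).
loss_weak_wrong = binary_cross_entropy(0, 0.6)

# Hypothetical case 2: true label is 1, the model predicts 0
# with high confidence (p(1) = 0.1, i.e. p(0) = 0.9).
loss_confident_wrong = binary_cross_entropy(1, 0.1)

print(loss_weak_wrong, loss_confident_wrong)
```

Under these assumed numbers the confident mistake costs more than the hesitant one, which is the behaviour the excerpt describes.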

Convolution Neural Network Maths Intuition

Forward Pass X is the input image, say a (3*3 matrix), and Filter is a (2*2 matrix). The two will be convolved to give the output XX (2*2 matrix). Now, XX will be flattened and fed to a fully connected network with w (1*4 matrix) as weights, which will give an output. Finally, … Read more
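
The shapes in this forward pass can be traced with a small pure-Python sketch. It is an illustration under stated assumptions rather than the article's implementation: the values of `X`, `F` and `w` are made up, `conv2d_valid` is a hypothetical helper, and the "convolution" is computed as a sliding dot product with stride 1 and no padding or kernel flip, as is usual in CNN libraries.

```python
def conv2d_valid(x, k):
    """Slide kernel k over x (stride 1, no padding) and sum the products."""
    out_h = len(x) - len(k) + 1
    out_w = len(x[0]) - len(k[0]) + 1
    return [
        [
            sum(x[i + a][j + b] * k[a][b]
                for a in range(len(k))
                for b in range(len(k[0])))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Assumed example values: a 3*3 input and a 2*2 filter.
X = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
F = [[1, 0],
     [0, 1]]

XX = conv2d_valid(X, F)                 # 2*2 feature map
flat = [v for row in XX for v in row]   # flattened to 4 values

# Assumed 1*4 weight vector of the fully connected layer.
w = [0.1, 0.2, 0.3, 0.4]
output = sum(wi * xi for wi, xi in zip(w, flat))

print(XX, flat, output)
```

A 3*3 input with a 2*2 filter leaves (3 - 2 + 1) = 2 valid positions per axis, which is why XX is 2*2 and the weight vector needs four entries.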

Vectorizing code matters

I come from the world of MATLAB and numerical computing, where for loops are shorn and vectors are king. During my PhD at UVM, Professor Lakoba’s Numerical Analysis class was one of the most challenging courses I took and the deep knowledge of numerical code still sticks with me. My favorite example of a vectorization … Read more

Pytest for Data Scientists

A Comprehensive Guide to Pytest for your Data Science Projects Photo by Startup Stock Photos from Pexels It is fun to apply different python code to process your data in your notebook, but in order to make your code reproducible, you need to put them into functions and classes. When you put your code in … Read more

How to Plan and Organize a Data Science Project? | by Yin Zhang

Conducting a data science/analytics project always takes time and has never been easy. A successful and comprehensive analytics project is way beyond coding. Instead, it involves sophisticated planning and a large amount of communication. Photo by Octavian Dan on Unsplash What is the Life Cycle of an Analytics Project? To complete a data science/analytics project, … Read more

How to make your deep learning experiments reproducible and your code extendible

Lessons learned from building an open-source deep learning for time series framework. Photo by author (taken while hiking at the Cutler Coast Preserve in Machias ME) Note this is roughly based on a presentation I made back in February at the Boston Data Science Meetup Group. You can find the full slide deck here. I … Read more

Ultimate Pandas Guide — Joining data with Python

Photo by Laura Woodbury from Pexels Master the difference between “Merge” and “Join” Everyone who works in data knows this: before you build machine learning models or produce stunning visualizations, you have to get your hands dirty with data wrangling. And one of the core skills in data wrangling is learning how to join together … Read more

A Summer as a Data Scientist

A retrospective on my summer as a data scientist and how GSI Technology’s summer program breaks the internship status quo. GSI Technology. Reposted with Author’s Permission Data science is a field that can be hard to break into, especially if you are an undergraduate student. My name is Braden Riggs and some of you reading … Read more

Convolutional Neural Network: How is it different from the other networks? | by YANG Xiaozhou | Sep, 2020 | Towards Data Science

Roughly speaking, there are two important operations that make a neural network: 1. Forward propagation 2. Backpropagation. Forward propagation This is the prediction step. The network reads the input data, computes its values across the network, and gives a final output value. But how does the network compute an output value? Let’s see what happens in a … Read more

Introducing TMS: a Trading Market Simulator

An easy-to-use trading simulator to test trading (ML/AI) algorithms and strategies in Python. Simulation of AAPL on September 9th, 2020, using TMS. Some time ago, I wrote an article on how to download stock market data for free using Alpaca, a trading broker and API. I published this article because I had worked on … Read more

A gentle intro to Clojure

👋 It’s easiest to generate a Personal Access Token to use the API. At this point, it’s best to try and access the GitHub API to make sure our Authentication and URLs are correct. Adjust the def’s below to your own info/settings and paste them into the REPL: Promises The HttpKit (http/get) function returns a … Read more

Now it’s even easier to connect JetBrains IDEs to Amazon RDS or Redshift Databases

Customers can use database features with DataGrip and other premium JetBrains IDEs such as IntelliJ IDEA Ultimate, PyCharm Professional, WebStorm and Rider. The AWS Toolkit for JetBrains is an open-source plugin that lets you leverage the integrated development environment (IDE) for the creation, debugging, and deployment of software applications on Amazon Web Services. This new feature … Read more

How to do more with less data ?— Active learning

It goes without saying that choosing an evaluation set is the most important step in any machine learning process. This becomes even more crucial when it comes to active learning since this will be our measure of how well our model performance improves during our iterative labelling process. Furthermore, it also helps us decide when … Read more

AWS Copilot CLI launches v0.4 focused on autoscaling and operations

Today, the AWS Copilot CLI for Amazon Elastic Container Service (ECS) launched version 0.4.0. Starting with this release, you can enable autoscaling for services based on average CPU and memory utilization and provide a maximum and minimum number of tasks. AWS Copilot will also retain the service’s desired count after autoscaling occurred, so that if … Read more

Become an Expert at the Technical Interview — Part I

If you’re reading this, then you’re most likely going through the grind of preparing for technical interviews (software engineers, data scientists, etc). By now you should know that tech interviews are not like regular ‘old school’ interviews — we can’t just woo the hiring manager with our charm and talk out of our ass — … Read more

Logistic Regression for Binary Classification

Supervised Learning Methods in Machine Learning. In previous articles, I talked about deep learning and the functions used to predict results. In this article, we will use logistic regression to perform binary classification. Binary classification is named this way because it classifies the data into two results. Simply put, the result will … Read more

Free workshop on Deep Learning with Keras and TensorFlow

[This article was first published on Shirin’s playgRound, and kindly contributed to R-bloggers]. Workshop announcement Because this year’s UseR 2020 in Munich couldn’t happen as … Read more

Running an R Script on a Schedule: Overview

There are lots of rstats tutorials about creating beautiful plots, setting up shiny applications and even a few on setting up plumber APIs (but we could use more). However, a lot of work consists of running a script without any interaction. This is an overview page for the tutorials I’ve created so far. This overview … Read more

Kmeans Clustering of Penguins

[This article was first published on r on Joel Soroos, and kindly contributed to R-bloggers]. In today’s blog, I explore k-means clustering capabilities in R … Read more

AWS Launch Wizard now supports SAP deployments with Red Hat Enterprise Linux Version 8.1

AWS Launch Wizard offers a guided way of sizing, configuring, and deploying AWS resources for SAP HANA and SAP HANA-based Netweaver systems with a purpose-built, easy-to-use wizard. On-boarding a new operating system version for an SAP application involves close collaboration between SAP and Operating System (OS) teams, who analyze SAP notes, … Read more
