My workflow for open and reproducible science as an academic researcher in biomedicine

RMarkdown and R projects. The output from snakemake is usually not the end but generates tables or files that are ready for further processing with R. When starting a new project, I start off by making a new *Rproj and structuring my folder like this:

$ cd ~/tutorial
$ tree
tutorial
├── .git
│   └── HEAD
├── .gitignore
├── code
│   ├── … Read more

Highlights from Data + AI Summit NA 2021

One of the biggest conferences in the data field, Data + AI Summit North America 2021, took place last week. This time I didn’t give a talk of my own, but I enjoyed the sessions all the more as a listener. In this short report, I want to summarize my notes related to the new features … Read more

Neural Network Inference on FPGAs

11. Continue to the instance selection step. Choose the m5.xlarge instance type and click the “Next: Configure instance details” button.
12. Choose the created “FpgaDevRole” for the IAM role entry. Click “Add storage”.
13. Delete the additional EBS volume and raise the size of the root partition to 200GB.
14. Click “Review and Launch” and … Read more

Host Shiny Apps with Docker

[This article was first published on R – Hosting Data Apps, and kindly contributed to R-bloggers]. You learned about Shiny, Docker, how to dockerize Shiny … Read more

Debugging Tips for Neural Networks

Oftentimes the bottleneck in a neural network-based project isn’t the network implementation. Rather, after you’ve written all the code and tried a whole bunch of hyperparameter configurations, sometimes the network will just not work. I’ve been there before. After some time dealing with finicky networks, I’ve collected a few methods that have helped me … Read more

2021-01 Accessing ‘grid’ from ‘ggplot2’

[This article was first published on R – Stat Tech, and kindly contributed to R-bloggers]. This report describes the ‘gggrid’ package, which provides a convenient … Read more

Explainable Deep Neural Networks

The emerging subject of deep learning mathematical analysis [1] has been tasked with explaining some “mysterious” facts that appear to be inexplicable using traditional mathematical methodologies. Researchers in the field are attempting to comprehend what a neural network actually does. Deep Neural Networks (DNN) transform data at each layer, producing a new representation as output. A DNN attempts to … Read more

3 Data-Backed Ways To Significantly Speed Up Your MySQL Bulk Inserts

In Data Science projects, a common last step in the data pipeline is to persist the result into a database (e.g. MySQL). The result of this data pipeline is usually big, so optimizing the writes into the database is important for achieving acceptable pipeline latency. I’ve benchmarked and analyzed many MySQL bulk insert setups … Read more

ivreg: Two-stage least-squares regression with diagnostics

[This article was first published on Achim Zeileis, and kindly contributed to R-bloggers]. The ivreg function for instrumental variables regression had first been introduced in … Read more

Write Your Own Julia

Every time Julia is instantiated, it runs a series of files that are loaded into your new Julia environment. A great example of one of these files is the Base.jl file. All of the methods and types that are available in Julia’s base are stored here, and whenever you start Julia, they are automatically there … Read more

Questions I’ve Been Asked By Machine Learning Clients

“Less than you think.” Clients were always surprised and happy to hear this. Note that we weren’t building models to detect financial fraud or predict the weather. Problems were often simple (e.g., pricing a new product or estimating the risk of X), and the data to solve them readily available. I’ve seen classification and regression … Read more

Smartphone for Activity Recognition

A modern smartphone is equipped with sensors such as an accelerometer and gyroscope to give advanced capabilities and facilitate a better user experience. The accelerometer in a smartphone is used to detect the orientation of the phone. The gyroscope adds another dimension to the information supplied by the accelerometer by tracking rotation or twist. … Read more

An Introduction To Linear Algebra In Julia

Before we get into creating algebraic expressions in Julia, we first need to go over the different types that we might see when doing linear algebra and the differences between them. The first on this list of types is a vector. A vector in Julia is the same exact concept as a list in Python, … Read more

Data science in cities: a story

Question: How can data science be used to help us understand developing cities better? Hello Medium! Data science is an interdisciplinary field, with most of its applications still undeveloped. One of the areas data science can help improve is urban planning and development. For instance, I have always been interested … Read more

Tuning XGBoost with XGBoost: Writing your own Hyper Parameters Optimization engine

As you probably know if you are familiar with Data Science, Machine Learning, Towards Data Science or my previous post on the subject, fine-tuning your model is crucial for getting the best performance. You simply cannot rely on default values. As Satyam Kumar states in his last article, several methods exist to perform this optimization. … Read more

more air for MCMC

[This article was first published on R – Xi’an’s Og, and kindly contributed to R-bloggers]. Aki Vehtari, Andrew Gelman, Dan Simpson, Bob Carpenter, and Paul-Christian … Read more

Clustering on numerical and categorical features.

Using Gower Distance in Python. During the last year, I have been working on projects related to Customer Experience (CX). In these projects, Machine Learning (ML) and data analysis techniques are carried out on customer data to improve the company’s knowledge of its customers. Recently, I have focused my … Read more

Keep your code clean using Black & Pylint & Git Hooks & Pre-commit

Git hooks are scripts that run when a particular git action occurs. There are two types of hooks. Client-side hooks are triggered by local operations such as committing and merging. Server-side hooks run on network operations, for example when pushed commits are received. In this article we will focus on client-side ones, whose workflow can be described by the … Read more

Tips And Tricks For Data Scientists Vol.8

[This article was first published on R – Predictive Hacks, and kindly contributed to R-bloggers]. We have started a series of articles on tips and … Read more

Build a Conversational Assistant with Rasa

Have you ever wished to have a personal assistant to answer repetitive messages? You might be hesitant to do so because you don’t know where to start. It turns out that creating a conversational assistant doesn’t need to be difficult. In this article, I will show you how to create one for yourself using Rasa. … Read more

Rock Containerized GPU Machine Learning Development with VS Code

Never struggle with broken infrastructure again. Running machine learning algorithms on GPUs is a common practice. Although there are cloud ML services like Paperspace and Colab, the most convenient/flexible way to prototype is still a local machine. Since the beginning of machine learning libraries (e.g., TensorFlow, Torch and Caffe), … Read more

GooglyPlusPlus2021 is now fully interactive!!!

GooglyPlusPlus2021 is now fully interactive. Please read the below post carefully to see the different ways you can interact with the data in the plots. There are 2 main updates in this latest version of GooglyPlusPlus2021 a) GooglyPlusPlus gets all ‘touchy, feely‘ with the data and now you can interact with the plot/chart to get … Read more

Mario Kart 64 World Records

[This article was first published on R, and kindly contributed to R-bloggers]. (Tidy Tuesday is a project to supply weekly data sets for R users … Read more

5 Must-Know Operations on Python Sets

2. Updating a set

Updating a set with another set means adding the elements in the second set to the first set. Consider the following two sets.

myset = set([0, 1, 2, 3, 5])
myotherset = set([3, 4, 5, 6, 7])

We can update “myset” with “myotherset” as follows:

myset.update(myotherset)
print(myset)
{0, 1, 2, 3, 4, 5, 6, … Read more
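The excerpt above cuts off mid-example, so here is a minimal sketch of the same idea. It reuses the two sets from the excerpt; the names `result`, `left`, and `combined` are added purely for illustration. The sketch contrasts `update`, which changes the first set in place, with the `|` operator, which builds a new set.

```python
# The two sets from the excerpt, written as set literals.
myset = {0, 1, 2, 3, 5}
myotherset = {3, 4, 5, 6, 7}

# update() adds the elements of myotherset to myset in place.
# It returns None, so assigning its result is a common mistake.
result = myset.update(myotherset)

# The union operator leaves both operands untouched and returns a new set.
left = {0, 1, 2, 3, 5}       # hypothetical copy of the original first set
combined = left | myotherset
```

Use `update` when the original set is no longer needed, and `|` (or `union`) when both inputs must stay unchanged.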

Tips For Creating A Data Culture

This should be obvious to many; however, sometimes (often!) people are lazy. The real goal here is to drive out this kind of laziness. If there’s a (relatively) simple way to automate a process, then it needs to be done. Putting the onus on Mike gives him responsibility over his own data and instantly sends … Read more

Amazon RDS for Oracle now supports April 2021 Patch Set Update (PSU) for Oracle Database 12.1

Amazon RDS for Oracle now supports the April 2021 Patch Set Update (PSU) for Oracle Database 12.1. The April 2021 Release Updates (RU) for Oracle Database 12.2, 18c, and 19c have already been launched. Oracle PSUs contain bug fixes and other critical security updates. Beginning with Oracle Database version 12.2.0.1, Amazon RDS for Oracle supports Release … Read more

Announcing Amazon CloudWatch Resource Health

Amazon CloudWatch Resource Health is a new feature that enables you to automatically discover, manage, and visualize the health and performance of Amazon Elastic Compute Cloud (Amazon EC2) hosts across your applications in a single view. With Resource Health, you can visualize the health of your Amazon EC2 hosts in a map (or list) view … Read more

Amazon CloudWatch Logs announces Dimension support for Metric Filters

Amazon CloudWatch Logs announces Dimension support for Metric Filters. CloudWatch Logs Metric Filters allow you to create filter patterns to search for and match terms, phrases, or values in your CloudWatch Logs log events, and turn these into metrics that you can graph in CloudWatch Metrics or use to create a CloudWatch Alarm. Now with … Read more

Tips for the Data Science Job Search

Job hunting is tough. Here are some tips to make it a bit easier. Getting a data science job can be tough. At the junior level, there is a ton of competition due to the meteoric rise in the popularity of data science as a field. I’ve mentored and … Read more

Introducing AWS App Runner Integration in the AWS Toolkit for JetBrains IDEs

The AWS Toolkit for JetBrains now provides developers with convenient IDE functionality to create and manage deployments from their code or image repositories using AWS App Runner. AWS App Runner, which was recently announced, is a new service that makes it easy for customers without any prior container or infrastructure experience to build, deploy, and run containerized … Read more

Anthos 101 learning series: All the videos in one place

Do you need to develop, run and secure applications across your hybrid and multicloud environments? Look no further than Anthos, our managed application platform that extends Google Cloud services and engineering practices to your environments so you can modernize apps faster and establish operational consistency across them. To help you get started, we created the … Read more

Bioconductor Asia Membership Increasing Due to Going Virtual

[This article was first published on R Consortium, and kindly contributed to R-bloggers]. R Consortium talked to Bioconductor Asia co-organizer Matt Ritchie about the upcoming … Read more

6 businesses transforming with SAP on Google Cloud

Rodan + Fields: Achieving business continuity for retail workloads. Since its founding in 2002, Rodan + Fields, one of the leading skincare brands in the U.S., has been delighting customers worldwide with its innovative product portfolio. Recently, however, after taking stock of its pre-existing IT infrastructure, Rodan + Fields realized it needed a more modern, … Read more

Working with files and folders in R-Ultimate Guide

[This article was first published on Methods – finnstats, and kindly contributed to R-bloggers]. Working with files and folders in R, In this tutorial, we … Read more

Alzheimer Diagnosis with Deep Learning: Model Implementation

Recommendations. After a thorough analysis of the results and comparison with other important work of recent years, a series of recommendations can be made to data scientists who are beginning to work with AI-based Alzheimer’s diagnostic systems. First, it should be noted that the use of convolutional neural networks should be the immediate option, since they … Read more

Tooltips with Python’s Matplotlib

Tooltips. Great! We know how to add and modify elements in our plot and detect the movement of the cursor. We’ll create a blank annotation and check if the mouse position is over one of the plotted points. When it is, we change the text, position, and visibility of the annotation accordingly.

plt.close('all')
fig, ax = … Read more

Using {pagedown} in Docker

[This article was first published on R | datawookie, and kindly contributed to R-bloggers]. I’m building an automated reporting system which generates PDF reports. My … Read more

A comparison of terra and raster packages

[This article was first published on Bluecology blog, and kindly contributed to R-bloggers]. The terra package looks designed to replace an old favourite raster. It … Read more

A forecasting tool (API) with examples in curl, R, Python

This post is about a predictive analytics tool (in beta version) I started building during the first lockdown in March 2020. I used R and Python for this purpose, and more specifically Flask and rpy2. It’s been a pretty cool and instructive experience for me, having both R and Python interacting in a web … Read more

Understanding *args and **kwargs in Python

Learn how to pass a variable number of arguments to functions in Python. If you are a beginning Python programmer, you might come across function declarations with parameters that look like this:

def do_something(num1, num2, *args, **kwargs):

The * and the ** operators above allow you to pass in variable … Read more
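To make the declaration in the excerpt concrete, here is a small sketch. The signature comes from the excerpt, but the function body and the argument values are made up for illustration. It shows how Python binds extra positional arguments to a tuple (`args`) and extra keyword arguments to a dict (`kwargs`).

```python
def do_something(num1, num2, *args, **kwargs):
    """Return the arguments grouped the way Python binds them.

    The body is a hypothetical illustration; only the signature
    appears in the excerpt.
    """
    return {
        "required": (num1, num2),   # the two named positional parameters
        "extra_positional": args,   # any further positionals, as a tuple
        "extra_keyword": kwargs,    # any keyword arguments, as a dict
    }


# Two required values, two extra positionals, two keyword arguments.
summary = do_something(1, 2, 3, 4, unit="cm", verbose=True)

# Calling with only the required arguments leaves both collections empty.
minimal = do_something(1, 2)
```

The names `args` and `kwargs` are only a convention; the `*` and `**` prefixes are what tell Python to collect the extra arguments.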

Making Decisions with Data Science

A simple example of how you can optimize business decisions using an algorithm developed for use in chemical physics. By turning a real-world business decision into a function to minimise, we can apply data science right where it will have the biggest impact. Just beware of local minima! … Read more
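The warning about local minima can be illustrated with a toy sketch. The excerpt does not name the article's algorithm, so this uses plain gradient descent as a stand-in, and the cost function is a hypothetical curve with two valleys rather than anything from the article. Starting the same minimiser from two different points lands it in two different valleys.

```python
def cost(x):
    """Hypothetical cost curve with two valleys; not taken from the article."""
    return x**4 - 3 * x**2 + x


def slope(x):
    """Derivative of cost(x)."""
    return 4 * x**3 - 6 * x + 1


def descend(start, rate=0.01, steps=500):
    """Plain gradient descent: repeatedly take a small step downhill."""
    x = start
    for _ in range(steps):
        x -= rate * slope(x)
    return x


from_left = descend(-2.0)   # settles in the deeper valley
from_right = descend(2.0)   # settles in the shallower valley
```

Comparing `cost(from_left)` with `cost(from_right)` shows that the run started on the right stops at a worse solution, which is why minimisers are often restarted from several starting points.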

Integrating Eventarc and Workflows

One limitation of Eventarc is that it currently only supports Cloud Run as a target. This will change in the future with more supported event targets. It’d be nice to have a future Eventarc trigger to route events from different sources to Workflows directly. In the absence of such a Workflows-enabled trigger today, you need to … Read more

The State & Local Government tech tightrope: Balancing COVID-19 impacts and the road ahead

State and local government (SLG) agencies are reeling from a combination of unbudgeted COVID-related expenses and reduced tax revenue caused by unemployment and business closures. Any way you look at it, the situation is challenging. To understand how SLG agencies are coping, Google Cloud collaborated with MeriTalk to survey 200 SLG IT and program managers, … Read more

Batch Norm Explained Visually — Why does it work

HANDS-ON TUTORIALS, INTUITIVE DEEP LEARNING SERIES. A Gentle Guide to the reasons for the Batch Norm layer’s success in making training converge faster, in Plain English. The Batch Norm layer is frequently used in deep learning models in association with a Convolutional or Linear layer. Many state-of-the-art Computer Vision architectures … Read more