Cloud solutions rule the world of modern computing. Even the biggest players use services provided by Amazon Web Services (AWS), Google or other cloud providers instead of building their own infrastructure. This saves time (and money), and the range of tasks that can be outsourced goes well beyond renting external servers. In fact, we can use various serverless solutions to deploy our application (e.g. Google App Engine), analyse real-time streaming data (e.g. Amazon Kinesis) and solve many other problems.
What is AWS Lambda?
Big tasks are not the only ones that can be done without creating instances, buckets, containers, etc.; small but frequent jobs can also be run without setting up an instance and keeping it “alive” while it waits for requests. For such cases, Amazon created Lambda. AWS Lambda is a serverless service for performing short tasks (up to 15 minutes each) that can occur very frequently. Lambda can be triggered by almost any event in an AWS service (e.g. new data uploaded to an S3 bucket) and its result can be used in almost any AWS service (e.g. you can load results into an Amazon Redshift data warehouse). But the most impressive feature of AWS Lambda is that you do not have to care about the number of events! It allocates as many resources as needed and no more, so you pay only for the resources you actually use. And if you just want to test Lambda, Amazon provides 1 million free requests and 400,000 GB-seconds of compute time per month.
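To get a feel for what the free allowance covers, here is a rough back-of-the-envelope sketch in R. A GB-second is the memory allocated to a function multiplied by its run time; the memory size and duration below are hypothetical values chosen only for illustration, and rounding of the billed duration is ignored.

```r
# Monthly free allowance quoted above
free_requests   <- 1e6
free_gb_seconds <- 400000

# Hypothetical function configuration (assumptions for illustration only)
memory_gb  <- 0.5  # 512 MB allocated to the function
duration_s <- 2    # average run time of one invocation, in seconds

# Compute time consumed by a single invocation, in GB-seconds
gb_seconds_per_call <- memory_gb * duration_s

# Invocations covered by each allowance; the smaller one is the binding limit
calls_by_compute <- free_gb_seconds / gb_seconds_per_call
free_calls <- min(free_requests, calls_by_compute)
```

With these assumed numbers, the compute allowance runs out before the request allowance does, so for longer or more memory-hungry functions it is the GB-seconds figure that matters.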
Lambda also has some restrictions. As mentioned previously, the time for a single task is limited to 15 minutes. Memory is also limited, to 3 GB at the time of writing. Finally, AWS Lambda by default supports only a limited number of languages, including Java, Python, Go, Ruby, C#, Node.js and PowerShell. Fortunately, this last restriction is no longer an issue: in December 2018, Amazon introduced custom runtimes for AWS Lambda, which allow you to use almost any programming language, including R.
Let’s say you have an ML model with hundreds of users, each one requesting predictions and uploading data. Of course, you cannot just feed the raw data into the model; you have to pre-process it. You might have to normalize the values and/or take care of missing data points. Using R for pre-processing is a good option. With an AWS Lambda + R combination, you don’t have to worry about the number of requests: each request is pre-processed separately and instantly, in parallel. It’s like going to a supermarket where every customer gets to walk straight to the counter, with no waiting. And you don’t have to worry about creating and supporting the infrastructure, which means one less job for you.
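As a minimal sketch of what such a pre-processing step might look like in base R, the function below fills missing data points with the mean of the observed values and then rescales everything to the range from 0 to 1. The function name preprocess and the choice of mean imputation and min-max scaling are assumptions made for illustration; your model may need something different.

```r
# Hypothetical pre-processing step: impute missing values, then normalize
preprocess <- function(values) {
  values <- as.numeric(values)

  # Nothing sensible can be done without at least one observed value
  if (length(values) == 0 || all(is.na(values))) {
    stop("at least one non-missing value is required")
  }

  # Take care of missing data points: replace them with the observed mean
  values[is.na(values)] <- mean(values, na.rm = TRUE)

  # Min-max normalization; constant input is mapped to zeros
  rng <- range(values)
  if (rng[1] == rng[2]) {
    return(rep(0, length(values)))
  }
  (values - rng[1]) / (rng[2] - rng[1])
}

# Example call with one missing data point
scaled <- preprocess(c(10, NA, 30))
```

Because the function depends only on its input, every request can be processed independently, which is what makes this kind of job a natural fit for Lambda.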
How to use R in AWS Lambda
There are two ways to include your custom runtime (an environment with the language you would like to use) in AWS Lambda. You can add it to your function code or you can provide it as a layer. We will focus on the latter. A layer is a ZIP archive to which you can add all kinds of dependencies, including runtimes. You can attach up to 5 layers to your Lambda function and, probably most importantly, you can reuse your layers in different Lambdas.
The key idea behind layers is that the Lambda mechanism looks for required dependencies layer by layer, in the order in which the layers are provided. This behaviour allows you to use separate layers for different parts of the environment required by your function. As the first layer, you should choose the one that contains the basic language environment – in this particular case, we will need a layer with R (and all its dependencies). If you need some additional packages that are not included in the base layer, you can add them in an additional layer. Sounds easy, right?
Layers in R
The biggest concern you may have now is how to get the required layers. Fortunately, reusability of layers extends beyond a single user sandbox. You can share your layers on various levels, allow access for a restricted group of people (e.g. your team) or make a layer public and accessible by the whole community. There are various custom runtimes for different frameworks that you can use in your project. At Appsilon Data Science Consulting, we have also created layers that can be accessed by the community, and – guess what – they contain R.
You can check the list of our R layers in the GitHub repository. But how can you use them? It’s extremely simple and takes only a few steps:
- First, create a new Lambda function. Remember to choose “custom runtime”!
- Second, in the code editor, remove the bootstrap and hello.sh files. The bootstrap file is one of the elements of the runtime and we will provide it in our R layer; hello.sh contains a Lambda “Hello world” function in the shell, but you want to use R, don’t you?
- Third, create a new script file and name it. Remember to give it the ‘.R’ file extension! Put your R code inside. It must be written as a function!
- Fourth, in the Handler field, provide the script name (without extension) followed by a dot and the function name (e.g. my_script.my_function).
- Fifth, you will have to change your function timeout. The default of 3 seconds is too short for R.
- Finally, add layers with an R runtime. Adding them is pretty simple if you know the ARN of a layer.
For example, the base R layer created by Appsilon has an ARN in the eu-central-1 region (Frankfurt); the exact ARNs are given in the list of layers in the repository mentioned above. Be aware that each layer is available only in a single region. You can provide the ARN in the layers section. If you want to add more than one layer, remember that the one with the runtime has to be first!
That’s it. Congratulations, you have created an R function in AWS Lambda. To be sure that it works properly, you can use the included test tool. Just provide the proper function arguments in JSON format.
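To make the steps above more concrete, a minimal script could look like the sketch below. The file name my_script.R and the function name my_function follow the Handler example; the arguments x and y are hypothetical. The sketch also assumes that the runtime passes the fields of the JSON event to the function as named arguments, which is worth checking against the documentation of the layer you use.

```r
# my_script.R: matches the Handler value "my_script.my_function"

# The code must be wrapped in a function. The arguments x and y are
# hypothetical; they are assumed to arrive as fields of the JSON event.
my_function <- function(x, y = 1) {
  list(sum = x + y)
}

# Local check that mimics a test event such as {"x": 2, "y": 3}.
# do.call() passes the list elements to the function as named arguments.
event  <- list(x = 2, y = 3)
result <- do.call(my_function, event)
```

Under the same assumption, the matching event in the Lambda test tool would be the JSON object {"x": 2, "y": 3}. Running the function locally like this first is a cheap way to catch mistakes before uploading anything.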
Create Your Own Environment
Building your own environment for a language like R, which needs a custom runtime, takes several steps and may be quite difficult, especially if it is not part of your everyday routine. To make our lives easier, at Appsilon we decided to create a unified workflow for that, and we want to share it with the community. With the workflow you can easily create your own runtime with your choice of R version and included packages. How to do that? You can find detailed instructions in our repository, but here is a general overview.
The only thing that you have to do is configure your credentials in the AWS CLI tool and create a cryptographic key so that you can connect to the EC2 instances used in the process of preparing layers. After that, our scripts will do the work for you, and you will only have to publish your layers.
To build a custom runtime for AWS Lambda, you have to use a specific instance image, but don’t worry, our workflow will choose it for you. If you want to create a base R layer, just provide a path to your key and the script will set up an instance, install R and download an archive with the R installation. Another script will add some additional required files (remember the bootstrap file?) and the last thing to do is to publish the layer.
Creating a layer with R packages is even simpler. To do that, you don’t need to install R because we have done that for you! Just use our AMI with R preinstalled (AMI id: ami-0a1147e8e86aa6175) and add your packages. How to do that? Again, provide the AMI ID and the names of the packages to our script and it will do the work for you. It will create the instance, install the packages and download the archive. And again, just publish it as a layer. Of course, you may want to create your own AMI with a specific version of R, and you probably will not be surprised that you can do that with our scripts too.
Give It a Try!
As you can see, using R in AWS Lambda can be extremely simple. Maybe you should consider adding it to your pipeline.