DevOps is a close sibling of the cloud computing paradigm: it helps you take your applications from the developers' sandbox to the cloud-based production environment at the click of a button. At a high level, the DevOps process covers CI/CD pipelines, security assessments, version management, release and deployment, performance and reliability mechanisms, and other non-functional aspects of operating an application.
Lambda is a serverless offering from AWS that follows the Function-as-a-Service (FaaS) paradigm. It natively supports multiple programming languages and runtimes, such as Node.js, Python, Ruby, Java, Go, and .NET, and it also accepts custom runtimes.
From a development team's perspective, there are significant reasons to adopt AWS Lambda, such as consistent performance, on-demand scalability, low cost, and the ability to absorb spikes in resource demand. However, you also need to consider the perspective of the DevOps team that will operate your serverless offerings for internal and external users.
Version and Release Management
The biggest struggle for teams using AWS Lambda is keeping track of which version of a function is in development and which needs to be deployed to production. Thankfully, Lambda provides two powerful concepts that together help manage versions accurately: Lambda versions and aliases. Lambda versions are immutable snapshots of the deployed function. Aliases are named references to a Lambda version. Helpfully, more than one alias can point to the same version.
How does it all work together? When a developer commits code to a source code repository, such as Git, the pipeline pushes the change to Lambda, for example through the AWS CLI. The push updates $LATEST, the mutable, unpublished copy of the function; publishing it creates a new immutable, numbered version. Dev, QA, and Prod are aliases that each point to an existing version. When a new version is available, the relevant alias is simply updated to point to it. The change is completely transparent to users, who continue to invoke the alias assigned to them. When the new version is ready for production, the Prod alias is updated to point to it and end users start using it.
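The relationship between versions and aliases can be sketched with a small in-memory model. This is only a toy illustration, not the Lambda API: the `FunctionModel` class and its method names are hypothetical, and the Dev, QA, and Prod alias names are taken from the text.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionModel:
    """Toy stand-in for one Lambda function's versions and aliases."""

    latest_code: str = ""                                    # the mutable $LATEST copy
    versions: dict[int, str] = field(default_factory=dict)   # version number -> frozen code
    aliases: dict[str, int] = field(default_factory=dict)    # alias name -> version number

    def deploy(self, code: str) -> None:
        """Overwrite $LATEST, as a code push would."""
        self.latest_code = code

    def publish(self) -> int:
        """Freeze the current $LATEST as the next numbered version."""
        number = len(self.versions) + 1
        self.versions[number] = self.latest_code
        return number

    def point(self, alias: str, version: int) -> None:
        """Create or move an alias; the version must already be published."""
        if version not in self.versions:
            raise KeyError(f"version {version} has not been published")
        self.aliases[alias] = version

    def resolve(self, alias: str) -> str:
        """Return the code a caller of this alias would run."""
        return self.versions[self.aliases[alias]]


fn = FunctionModel()
fn.deploy("handler v1")
v1 = fn.publish()
for name in ("Dev", "QA", "Prod"):
    fn.point(name, v1)      # several aliases may share one version

fn.deploy("handler v2")
v2 = fn.publish()
fn.point("Dev", v2)         # Dev moves first; Prod callers are unaffected
```

Callers of the Prod alias keep running the first version until the alias itself is moved, which is what makes the switch transparent to them.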
A well-crafted CI/CD pipeline takes care of pushing the code from the repository to AWS Lambda, updating the aliases at the respective stages of verification and testing, and making the final release by updating the Prod alias. All of this happens automatically.
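A release step of such a pipeline might look roughly like the sketch below. It assumes a client object that behaves like `boto3.client("lambda")`, using only its `publish_version` and `update_alias` calls, and it assumes the three aliases already exist. The `release` function, the `checks_pass` callback, the `FakeLambdaClient` double, and the `orders-handler` function name are all hypothetical and exist only so the example runs without AWS access.

```python
from typing import Callable

STAGES = ("Dev", "QA", "Prod")  # alias names taken from the text


def release(
    client,
    function_name: str,
    checks_pass: Callable[[str, str], bool],
) -> tuple[str, list[str]]:
    """Publish $LATEST once, then walk that version through the stage aliases.

    Promotion stops at the first stage whose checks fail, so Prod is only
    updated after Dev and QA have both passed.
    """
    version = client.publish_version(FunctionName=function_name)["Version"]
    promoted = []
    for alias in STAGES:
        # update_alias expects the alias to exist already.
        client.update_alias(
            FunctionName=function_name, Name=alias, FunctionVersion=version
        )
        promoted.append(alias)
        if alias != "Prod" and not checks_pass(alias, version):
            break
    return version, promoted


class FakeLambdaClient:
    """Hypothetical in-memory double for the two client calls used above."""

    def __init__(self):
        self.next_version = 1
        self.aliases = {}

    def publish_version(self, FunctionName):
        version = str(self.next_version)
        self.next_version += 1
        return {"FunctionName": FunctionName, "Version": version}

    def update_alias(self, FunctionName, Name, FunctionVersion):
        self.aliases[(FunctionName, Name)] = FunctionVersion
        return {"Name": Name, "FunctionVersion": FunctionVersion}


client = FakeLambdaClient()
version, promoted = release(client, "orders-handler", lambda alias, v: True)
```

In a real pipeline, `checks_pass` would wrap whatever verification runs at each stage, and the fake client would be replaced by a real one.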
Performance Monitoring and Reliability Management
Performance and reliability are two sides of the same coin. While performance focuses on maintaining velocity, concurrency, and throughput, reliability focuses on the availability and correctness of the functions. The DevOps team needs to continuously monitor key metrics to ensure the desired performance and high reliability of Lambda functions. Some of these metrics are available for easy monitoring through Amazon CloudWatch, while others need to be captured using a custom-built DevOps monitoring solution.
For ensuring high performance, it is important to monitor all or most of the following metrics.
- Duration: It is the time taken to execute a single invocation of a Lambda function.
- Resource consumption: It is important to capture the minimum and maximum amount of memory consumed, CPU cycles utilized, and thread-pool sizes.
- Invocations: It is the number of times the Lambda function was invoked by the users.
- Concurrent execution: It is the number of executions of the function in progress at a given point in time.
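As a rough sketch of how the CloudWatch-backed metrics in this list could be requested, the helper below builds the keyword arguments for CloudWatch's `get_metric_statistics` call in the `AWS/Lambda` namespace. The `build_queries` helper, the chosen statistics, the one-hour window, and the `orders-handler` function name are assumptions made for illustration. Resource consumption is left out because it is not among these CloudWatch metrics.

```python
from datetime import datetime, timedelta, timezone

# Metric names in the AWS/Lambda namespace, with the statistics this
# sketch chooses to request for each (the choice is an assumption).
PERFORMANCE_METRICS = {
    "Duration": ["Average", "Maximum"],
    "Invocations": ["Sum"],
    "ConcurrentExecutions": ["Maximum"],
}


def build_queries(function_name, end, window=timedelta(hours=1), period_seconds=300):
    """Return one get_metric_statistics argument dict per performance metric."""
    start = end - window
    return [
        {
            "Namespace": "AWS/Lambda",
            "MetricName": metric,
            "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
            "StartTime": start,
            "EndTime": end,
            "Period": period_seconds,
            "Statistics": statistics,
        }
        for metric, statistics in PERFORMANCE_METRICS.items()
    ]


queries = build_queries("orders-handler", datetime.now(timezone.utc))
for query in queries:
    print(query["MetricName"], query["Statistics"])
```

Each dictionary could then be passed to a CloudWatch client as `cloudwatch.get_metric_statistics(**query)`.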
For ensuring good availability and reliability, the following metrics can be monitored.
- Throttles: It is the number of invocation requests rejected because the function had reached its concurrency limit, set either by the account-level limit or by the function's reserved concurrency.
- Timeouts: It is the number of times a function aborted execution midway because it ran longer than its configured timeout, often while waiting on an unavailable resource.
- Errors: It is the count of invocations that failed because of unhandled exceptions and runtime errors in the code.
- Dead-letter Errors: It is the number of times Lambda failed to send an unprocessed event to the dead-letter queue. It directly indicates how many events the Lambda function has lost.
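One way to turn these counts into something actionable is to convert them into rates and compare them with thresholds. The sketch below is illustrative only: the `ReliabilityCounts` type, the `reliability_alerts` function, and the 1% and 5% thresholds are assumptions, not recommended values. It assumes throttled requests are not included in the invocation count, so they are added back when computing the throttle rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReliabilityCounts:
    """Counts for one function over one monitoring window."""

    invocations: int
    errors: int
    throttles: int
    dead_letter_errors: int


def reliability_alerts(counts, max_error_rate=0.01, max_throttle_rate=0.05):
    """Return a list of human-readable alerts; an empty list means healthy."""
    alerts = []

    # Errors are measured against invocations that actually ran.
    if counts.invocations and counts.errors / counts.invocations > max_error_rate:
        alerts.append("error rate above threshold")

    # Throttled requests never ran, so count them as extra attempts.
    attempts = counts.invocations + counts.throttles
    if attempts and counts.throttles / attempts > max_throttle_rate:
        alerts.append("throttle rate above threshold")

    # Any dead-letter delivery failure means an event was lost.
    if counts.dead_letter_errors > 0:
        alerts.append("events lost: dead-letter delivery failed")

    return alerts


window = ReliabilityCounts(invocations=1000, errors=50, throttles=100, dead_letter_errors=2)
for alert in reliability_alerts(window):
    print(alert)
```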
While most of these metrics are provided out of the box by Amazon CloudWatch, it is important to use alternate sources such as application logs and application events to capture metrics such as resource consumption and errors. The DevOps team needs to continuously analyze these metrics to tune the performance and reliability of the functions. APM tools are also available for improving the observability of Lambda functions.
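As an example of pulling resource consumption out of logs, the sketch below parses the `REPORT` line that the Lambda runtime writes at the end of an invocation. It assumes the usual layout of that line (request ID, duration, billed duration, memory size, and maximum memory used, separated by whitespace). The sample line, the `parse_report` and `peak_memory_ratio` helpers, and the field names are made up for illustration.

```python
import re

REPORT_PATTERN = re.compile(
    r"REPORT RequestId: (?P<request_id>\S+)\s+"
    r"Duration: (?P<duration_ms>[\d.]+) ms\s+"
    r"Billed Duration: (?P<billed_ms>[\d.]+) ms\s+"
    r"Memory Size: (?P<memory_size_mb>\d+) MB\s+"
    r"Max Memory Used: (?P<max_memory_used_mb>\d+) MB"
)


def parse_report(line):
    """Extract duration and memory figures from one REPORT line, else None."""
    match = REPORT_PATTERN.search(line)
    if match is None:
        return None
    return {
        "request_id": match["request_id"],
        "duration_ms": float(match["duration_ms"]),
        "billed_ms": float(match["billed_ms"]),
        "memory_size_mb": int(match["memory_size_mb"]),
        "max_memory_used_mb": int(match["max_memory_used_mb"]),
    }


def peak_memory_ratio(lines):
    """Highest used/configured memory ratio across the REPORT lines, else None."""
    reports = [report for report in map(parse_report, lines) if report]
    if not reports:
        return None
    return max(r["max_memory_used_mb"] / r["memory_size_mb"] for r in reports)


# Illustrative log line, not taken from a real function.
sample = (
    "REPORT RequestId: abc-123\tDuration: 15.74 ms\tBilled Duration: 16 ms\t"
    "Memory Size: 128 MB\tMax Memory Used: 64 MB"
)
print(parse_report(sample))
```

A ratio that stays close to 1.0 suggests the function is running near its configured memory, while a consistently low ratio suggests it may be over-provisioned.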
Security Assessment and Vulnerabilities
Lambda inherits multiple security pitfalls from the serverless architecture. They fall into the broad categories of access management, vulnerability and penetration assessment, and compliance.
Lambda integrates closely with the AWS IAM service, which handles both resource-based and role-based access policies. Under the shared responsibility model, customers configure these policies according to their specific business needs, and AWS enforces them on the Lambda functions.
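As a sketch of the customer's side of that model, the helper below builds a narrowly scoped policy document for a function's execution role, granting only the permissions needed to write logs to CloudWatch Logs. The `logging_only_policy` helper, the region, the account ID placeholder, and the function name are assumptions. A real function would typically need further statements for the services it actually calls.

```python
import json


def logging_only_policy(region, account_id, function_name):
    """Build an IAM policy document that only allows writing the function's logs."""
    account_logs = f"arn:aws:logs:{region}:{account_id}"
    log_group = f"{account_logs}:log-group:/aws/lambda/{function_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets the function create its log group on first run.
                "Effect": "Allow",
                "Action": "logs:CreateLogGroup",
                "Resource": f"{account_logs}:*",
            },
            {
                # Restricts log writes to this function's own log group.
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": [f"{log_group}:*"],
            },
        ],
    }


# Placeholder values for illustration only.
policy = logging_only_policy("us-east-1", "123456789012", "orders-handler")
print(json.dumps(policy, indent=2))
```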
With regard to vulnerability assessment and penetration testing, the biggest pitfall is a public event source that triggers the execution of Lambda functions. This can expose the function to brute-force attacks and an uncontrolled number of invocations, executions, and throttles.
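A simple configuration check can catch one form of this pitfall: a resource-based policy that lets anyone invoke the function. The sketch below scans a policy document for `Allow` statements with a wildcard principal and no `Condition`. The `public_statements` helper and the sample policy are hypothetical, and treating any statement with a condition as restricted is a deliberate simplification.

```python
def is_wildcard_principal(principal):
    """True when the principal is "*" or lists "*" under the "AWS" key."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        values = principal.get("AWS", [])
        if isinstance(values, str):
            values = [values]
        return "*" in values
    return False


def public_statements(policy):
    """Return the Sid of each Allow statement open to any caller."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may not be wrapped in a list
        statements = [statements]
    return [
        statement.get("Sid", "<no Sid>")
        for statement in statements
        if statement.get("Effect") == "Allow"
        and is_wildcard_principal(statement.get("Principal"))
        and not statement.get("Condition")  # simplification: any condition counts as a restriction
    ]


# Illustrative policy, not taken from a real function.
sample_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "open",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "lambda:InvokeFunction",
            "Resource": "*",
        },
        {
            "Sid": "s3-only",
            "Effect": "Allow",
            "Principal": {"Service": "s3.amazonaws.com"},
            "Action": "lambda:InvokeFunction",
            "Resource": "*",
        },
    ],
}
print(public_statements(sample_policy))
```

A check like this could run in the pipeline before each release, alongside the other security tooling.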
The DevOps team needs to configure standard code review tools, static and dynamic code analysis tools, dependency vulnerability checkers, and security testing tools to clearly assess the possible threats and security issues in both the code and the configuration of the Lambda functions. This might be a one-time activity for each new version of the Lambda function that is ready for release to the production environment.