Cloud Security

How Serverless Functions Work – Using AWS Lambda as an Example

By Xiao Sun

[This post is Part Two of a three-part series. Part One covered Why Serverless, and Part Three will cover Creating and Securing Serverless.]


In this post, using the AWS Lambda service as a basis, we will investigate what is behind serverless, how functions are invoked, and what resources are available to build a powerful service.

First, understand that the serverless runtime environment is not magic. Your function still runs on a machine, namely a highly customized micro-VM. Depending on your function's programming language, AWS builds a minimal runtime designed for that language only. The micro-VM used by AWS is called Firecracker. You can check it out at https://firecracker-microvm.github.io/.

From the diagram provided by Firecracker above, we can see it is a KVM-based VM, just like the other hypervisor systems we use. It stores your function in the /var/runtime/bin folder and waits for the function to be called.

But how does the function actually run? Let's look at the processes running in the micro-VM after your function is invoked. Let's say your function is a Python function.

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
496          1  0.0  1.1 305612  7488 ?        Ssl  22:52   0:00 /var/rapid/init --bootstrap /var/runtime/bootstrap
496          8 30.0  4.1 232276 26332 ?        S    22:52   0:00 /var/lang/bin/python3.7 /var/runtime/bootstrap

So the main process is the bootstrap. At its core, the bootstrap gets the user input, passes it to the handler, and then returns the result, like below:

while True:
    event_request = lambda_runtime_client.wait_next_invocation()   # get the user input

    _GLOBAL_AWS_REQUEST_ID = event_request.invoke_id

    update_xray_env_variable(event_request.x_amzn_trace_id)

    handle_event_request(lambda_runtime_client,                    # process the user input
        request_handler,
        event_request.invoke_id,
        event_request.event_body,
        event_request.content_type,
        event_request.client_context,
        event_request.cognito_identity,
        event_request.invoked_function_arn,
        event_request.deadline_time_in_ms)

The basic job of the bootstrap is to get the event, which contains the user input, and pass it to the request_handler for processing. The request_handler is the handler you named for your function, and it handles the event.
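On the other end of that loop, the handler is just an ordinary function that receives the event and a context object. Here is a minimal sketch of what such a handler could look like; the event field used here is an assumption for illustration, not part of the runtime itself:

import json

def lambda_handler(event, context):
    # 'event' is the user input passed in by the bootstrap loop above;
    # 'context' carries metadata such as the request ID and remaining time.
    name = event.get("name", "world")          # assumed input field, for illustration only
    print(f"request_id={context.aws_request_id}")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello {name}"}),
    }

The return value is handed back to the bootstrap, which marshals it into the response sent to the caller.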

If we check the contents of the runtime folder, we can see the bootstrap, the basic libraries needed to run your function, and the bin folder that contains your function:


bin
bootstrap
bootstrap.py
boto3
boto3-1.9.221.dist-info
botocore
botocore-1.12.221.dist-info
dateutil
docutils
docutils-0.15.2.dist-info
jmespath
jmespath-0.9.4.dist-info
lambda_runtime_client.py
lambda_runtime_exception.py
lambda_runtime_marshaller.py
__pycache__
python_dateutil-2.8.0.dist-info
runtime-release
s3transfer
s3transfer-0.2.1.dist-info
six-1.12.0.dist-info
six.py
test_bootstrap.py
test_lambda_runtime_client.py
test_lambda_runtime_marshaller.py
urllib3

As can be seen from the example above, there is nothing special about invoking this function. A Firecracker micro-VM matching your function's language is linked to your function. When a trigger fires, the bootstrap fetches the user input and invokes your function to process it. After the function has completed, it returns the response. All resources are spun up when the user issues a request and stopped once the response is returned. What makes this special is that AWS provides the infrastructure to schedule all of the required resources for you, along with a large number of supporting components that make your function powerful.

One valuable component AWS provides is the set of services that can act as triggers for invoking your function. These triggers give your function a range of input sources for collecting and processing user data. For the trigger types below, here is how AWS describes them:

For synchronous invocation, the other service waits for the response from your function and might retry on errors.

Services That Invoke Lambda Functions Synchronously

For asynchronous invocation, Lambda queues the event before passing it to your function. The other service gets a success response as soon as the event is queued and isn’t aware of what happens afterwards. If an error occurs, Lambda handles retries, and can send failed events to a dead-letter queue that you configure.

Services That Invoke Lambda Functions Asynchronously

Once a trigger has been set, your function is invoked whenever the trigger fires. The Firecracker micro-VM launches, calls the bootstrap, and then passes the input to the request_handler (which is your function).
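To see the two invocation types from the caller's side, here is a minimal boto3 sketch; the function name "my-function" and the payload are placeholders:

import json
import boto3

client = boto3.client("lambda")
payload = json.dumps({"name": "serverless"}).encode("utf-8")

# Synchronous: the caller waits for the function's response.
sync_resp = client.invoke(
    FunctionName="my-function",            # placeholder function name
    InvocationType="RequestResponse",
    Payload=payload,
)
print(json.loads(sync_resp["Payload"].read()))

# Asynchronous: Lambda queues the event and returns immediately;
# retries and dead-letter handling happen behind the scenes.
async_resp = client.invoke(
    FunctionName="my-function",
    InvocationType="Event",
    Payload=payload,
)
print(async_resp["StatusCode"])            # 202 when the event has been queued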

Now, what can your function do with the input? Below is a list of the resources a Lambda function can operate on (a short sketch follows the list). For example, you can perform a DynamoDB operation; create, modify, or delete an EC2 instance; trigger another API call; or use Kinesis to start a streaming data process.

AWS CloudFormation

AWS IoT

AWS Key Management Service

AWS Lambda

AWS Secrets Manager

AWS X-Ray

Alexa for Business

Amazon API Gateway

Amazon CloudWatch

Amazon CloudWatch Logs

Amazon Cognito Identity

Amazon Cognito Sync

Amazon DynamoDB

Amazon EC2

Amazon EventBridge

Amazon Kinesis

Amazon Resource Group Tagging API

Amazon S3

Amazon SNS

Amazon SQS

Identity And Access Management

Manage Amazon API Gateway
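As an example of operating on one of these resources, here is a hedged sketch of a handler that writes the incoming event to a DynamoDB table; the table name "orders" and the event fields are assumptions for illustration:

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("orders")           # assumed table name

def lambda_handler(event, context):
    # Persist the incoming event; 'order_id' and 'detail' are assumed fields.
    table.put_item(Item={
        "order_id": event["order_id"],
        "detail": event.get("detail", {}),
    })
    return {"status": "stored", "order_id": event["order_id"]}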

As stated in Part One, you are charged based on the number of invocations per month. The free tier is 1 million calls per month. Of course, if your function uses other resources such as S3 storage or databases, those services are charged separately.

When your function finishes executing, what happens next? For simple functions you may just need to return the response to the user. But if you need further processing, AWS provides destinations so you can invoke another Lambda function, or post an event to SQS, SNS, or EventBridge for asynchronous processing. Or you can stream the results to DynamoDB, Kinesis, and so on.

As a best practice, a function should do one thing, and one thing only. Don't interrupt the function in the middle of processing to perform an asynchronous action such as calling another Lambda function or making a request to an outside resource. Do it at the end, as a next step: use the output and a destination to trigger another function to do it.
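Destinations for asynchronous invocations can be configured without touching the function code. Here is a minimal boto3 sketch; the function name and both queue ARNs are placeholders:

import boto3

client = boto3.client("lambda")

# Route successful async results to one SQS queue and failures to another.
# The ARNs below are placeholders for illustration only.
client.put_function_event_invoke_config(
    FunctionName="my-function",
    DestinationConfig={
        "OnSuccess": {"Destination": "arn:aws:sqs:us-east-1:123456789012:results-queue"},
        "OnFailure": {"Destination": "arn:aws:sqs:us-east-1:123456789012:failures-queue"},
    },
)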

What if the service you provide is complicated and can't fit in one function? If you have many functions that need to form a logical workflow, the answer is Step Functions.

Below is a call center example from AWS. You can see that multiple functions are defined to handle different aspects of a call, with AWS Step Functions linking them together to form a process flow. The step from one function call to another is called a state transition, and you are charged based on the number of state transitions. The free tier limit is 4,000 state transitions per month, and after that the price is $0.025 per 1,000.
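To give a flavor of how Step Functions chain Lambda functions together, here is a hedged sketch that defines a two-state machine in the Amazon States Language and creates it with boto3; the state names, function ARNs, and role ARN are placeholders:

import json
import boto3

sfn = boto3.client("stepfunctions")

# Two Lambda tasks chained together; each move between states is a billable
# state transition. All ARNs below are placeholders.
definition = {
    "StartAt": "OpenCase",
    "States": {
        "OpenCase": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:open-case",
            "Next": "CloseCase",
        },
        "CloseCase": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:close-case",
            "End": True,
        },
    },
}

sfn.create_state_machine(
    name="call-center-flow",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/step-functions-role",
)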

These are the basic concepts of creating a function. There are more advanced topics that enhance the overall capability of functions, which are listed below.

Use API Gateway to support canary releases

https://docs.aws.amazon.com/en_pv/apigateway/latest/developerguide/canary-release.html

Use CloudWatch or X-Ray to monitor your functions and events

https://aws.amazon.com/cloudwatch/

https://aws.amazon.com/xray/

Add IAM roles to bound the function's capabilities, following the best practice of least privilege

https://aws.amazon.com/iam/

Use Cognito to provide user sign-in, sign-up, and access control

https://aws.amazon.com/cognito/

Build a layer for shared libraries/code and add it to your function

https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html

Access a VPC from your function, or use SNS/SQS to process results

Of course, it is also possible to fine-tune function configuration parameters such as runtime, timeout, concurrency, memory limit, and environment variables (a quick sketch follows the link below).

https://docs.aws.amazon.com/lambda/latest/dg/resource-model.html
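As a small illustration of that last point, configuration parameters can be adjusted with a couple of API calls. A hedged boto3 sketch, where the function name and values are placeholders:

import boto3

client = boto3.client("lambda")

# Tune memory, timeout, and environment variables for an existing function;
# the function name and values below are placeholders.
client.update_function_configuration(
    FunctionName="my-function",
    MemorySize=512,                     # MB
    Timeout=30,                         # seconds
    Environment={"Variables": {"STAGE": "dev"}},
)

# Reserved concurrency is set with a separate call.
client.put_function_concurrency(
    FunctionName="my-function",
    ReservedConcurrentExecutions=10,
)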

There are many more resources available to use in a function. Serverless is easy to get started with, but it becomes more difficult as functions grow deeper and wider. And every resource you require carries a price tag for usage. By the time you notice all the charges, you may already be locked into a vendor, which is the downside of serverless.

To sum up what we have covered so far:

  • The serverless user’s responsibility is only to provide the function; all of the required infrastructure is managed by the cloud provider.
  • A function is invoked only when its trigger fires, and a micro-VM is created for each invocation and released when the function finishes.
  • There is no persistent machine or system to worry about, and no resources remain active after execution completes.

This can dramatically decrease the effort required and lets you focus on the business logic of your function. Of course, there is still a lot to learn about the infrastructure AWS provides: you will need to configure resources correctly and minimize costs, so what is actually required goes beyond just building functions and writing code.

In the next post we will discuss serverless security and what development tools are available to develop serverless functions using best practices.

About the Author

Xiao Sun

Xiao Sun is a software engineer at NeuVector.
