Channel: Engineers @ The LEGO Group - Medium

AWS Step Function Map State: Fetching & Merging PDFs

Emmet catching his waffles from a toaster
The LEGO® MOVIE 2™: Step 9 of Emmet’s morning routine

Are you trying to fetch and merge PDFs? Are you using AWS? Are you interested in Serverless? Do you want to use AWS Step Function? Are you interested in using Map State? Then here is a walkthrough of my Serverless architecture for fetching PDFs and then merging them into a single PDF, to be sent to a user.

Disclosure: This blog focuses on the Step Function architecture and not the underlying code. In addition, the architecture below has been simplified but the Step Function functionality represents a high-level model of what was done in our final service.

Scenario

We needed to build a backend service that accepts user input from a frontend form and makes an HTTP request to a 3rd party vendor, which generates a PDF file based on that input. One of the user’s inputs is how many generated PDFs they need (max 5). This means we may need to make up to 5 unique requests to the 3rd party vendor, but we only want to display a single PDF to the user.

Summary of Solution

We decided to use Step Function in AWS to create a Serverless architecture and the Step Function Map State to concurrently fetch data from the 3rd party vendor. The responses from the Map iterations are then merged together.

TLDR — Analogy of Architecture

  1. Mother shouts at me for eating all the ice-creams.
  2. So I write a shopping list of my favourite treats.
  3. I use Shadow-clone Jutsu to duplicate myself (max 5 clones), and each version of myself will go find the item on my shopping list in the store.
  4. We meet at the checkout point and put our treats in a single basket to pay.
  5. Walk home happily eating ice cream as a treat.
A diagram of my step function architecture. Starting from a HTTP request to a Lambda to generate a payload, then a MAP state that fetches PDFs and the final will merge the PDFs together.
Fetch & Merging PDFs Architecture (Simplified)

The flow of the architecture

  1. The front-end user fills in a form that asks them how many PDFs they need and other inputs that customise the PDF file. Once the form is submitted, an HTTP Request will be made that triggers the AWS Step Function.
  2. The first Lambda (GeneratePayload) accepts the user’s input and transforms their input into the 3rd party vendor’s expected payload. An array of payloads will then be passed as the input of the next task in the Step Function (and the array size is the number of PDFs the user has requested).
  3. The Map State iterates through the input array and invokes the second Lambda (FetchPDF) once per element, with the current element as the Lambda’s event input. FetchPDF sends that payload to the 3rd party vendor, which returns the PDF data. When all the Lambdas have finished executing, the Map State passes their responses in a single array to the next task in the Step Function.
  4. Finally, the MergePDF Lambda uses the npm package pdf-merger-js to merge the array of PDF data into a single PDF. This combined PDF is then put into an S3 bucket.
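To make step 2 more concrete, here is a minimal sketch of what a GeneratePayload Lambda could look like. It is not our production code: the field names (quantity, title, colour, documentNumber) are made up for illustration and the real vendor payload is different. The only things it relies on from the flow above are the maximum of 5 PDFs and the fact that the output needs to contain an array for the Map State to iterate over.

```javascript
// Hypothetical sketch of GeneratePayload. Field names are illustrative only,
// they are NOT the real vendor contract.
const MAX_PDFS = 5; // limit stated in the scenario

function buildPayloads(formInput) {
  const quantity = Number(formInput.quantity);
  if (!Number.isInteger(quantity) || quantity < 1 || quantity > MAX_PDFS) {
    throw new Error(`quantity must be an integer between 1 and ${MAX_PDFS}`);
  }
  // One payload per requested PDF; the Map State iterates over this array.
  return Array.from({ length: quantity }, (_, index) => ({
    documentNumber: index + 1,
    title: formInput.title,
    colour: formInput.colour,
  }));
}

// Lambda entry point. The returned object becomes the input of the next state,
// so the `payload` key lines up with `ItemsPath: $.payload` on the Map State.
// In a real Lambda this would be exported, e.g. `exports.handler = handler`.
async function handler(event) {
  return { payload: buildPayloads(event) };
}
```

Rejecting an out-of-range quantity here, before the Map State runs, means a bad form submission fails once rather than in every iteration.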

A simplified example of my serverless.yml

Note: I am using the Serverless framework to deploy our services. Find out more about serverless.yml.

stepFunctions:
  stateMachines:
    fetch-pdfs-stepfn:
      name: fetch-pdfs-stepfn
      events: # step function entry event
      definition:
        StartAt: GeneratePayload
        States:
          GeneratePayload: ... # Task step
          MergePDFs: ... # Task Step
          FetchPDFs:
            Type: Map
            ItemsPath: $.payload # Array of data
            ResultPath: $.responses # Array of PDF files
            MaxConcurrency: 5 # Max number of parallel invocations
            Iterator:
              StartAt: FetchPDF
              States:
                FetchPDF:
                  Type: Task
                  Resource:
                    Fn::GetAtt:
                      - fetchPDF # Name of Lambda
                      - Arn
                  End: true
            Next: MergePDFs

The example above only shows a simplified version of the Step Function from the architecture design and I’ve only included configuration which I consider to be vital.

Map State Params

“Iterator” expects a state machine object, which is run for each value in the input array.

“ItemsPath” defines the path to the array within the state’s input when an object is passed in. The path must resolve to a JSON array so that the Map State can iterate through its values. The length of the array determines how many iterations (and therefore Lambda invocations) there will be.

“MaxConcurrency” can affect the cost and performance of your architecture. If you do not define a value (the default is 0), the Map State places no limit on concurrency and invokes as many iterations as possible in parallel.
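If it helps to picture what a concurrency cap does, the sketch below imitates it in plain JavaScript. This is only an analogy I wrote for this post, not how Step Functions is implemented: mapWithConcurrency and its lane approach are hypothetical, and treating 0 as "no limit" simply mirrors the Map State default.

```javascript
// Analogy only: runs `worker` over `items` with at most `maxConcurrency`
// calls in flight, and returns the results in input order.
async function mapWithConcurrency(items, maxConcurrency, worker) {
  const results = new Array(items.length);
  let nextIndex = 0;

  // Each lane repeatedly claims the next unprocessed index until none remain.
  async function runLane() {
    while (nextIndex < items.length) {
      const index = nextIndex++; // claimed synchronously, before any await
      results[index] = await worker(items[index], index);
    }
  }

  // 0 mirrors the Map State default of "no limit": everything starts at once.
  const laneCount = maxConcurrency > 0
    ? Math.min(maxConcurrency, items.length)
    : items.length;
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results;
}
```

With a limit of 5 and 5 payloads every request starts immediately, which is why the value in the serverless.yml matches the maximum number of PDFs a user can ask for.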

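“ResultPath” specifies where in the state’s input the Map State’s result (the array of iteration outputs) is placed. In the example above, the array of PDF data is added under $.responses alongside the original input, and the combined object is passed to the next state.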
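The interplay between “ItemsPath” and “ResultPath” is easier to see with data. The snippet below is a deliberately simplified model of the FetchPDFs state from the serverless.yml: simulateMapState and topLevelKey are hypothetical helpers, they only understand top-level paths such as $.payload (real Step Functions paths are far richer), and the iterator is a stand-in for the FetchPDF Lambda.

```javascript
// Simplified model of a Map State. Only top-level paths like "$.payload"
// are supported in this sketch.
function topLevelKey(path) {
  const match = /^\$\.([A-Za-z_][A-Za-z0-9_]*)$/.exec(path);
  if (!match) throw new Error(`unsupported path in this sketch: ${path}`);
  return match[1];
}

function simulateMapState(stateInput, { itemsPath, resultPath }, iterator) {
  const items = stateInput[topLevelKey(itemsPath)];
  if (!Array.isArray(items)) throw new Error('ItemsPath must point at an array');
  // One iteration per array element, results kept in input order.
  const results = items.map((item) => iterator(item));
  // ResultPath adds the results to a copy of the input rather than replacing it.
  return { ...stateInput, [topLevelKey(resultPath)]: results };
}

const output = simulateMapState(
  { payload: [{ documentNumber: 1 }, { documentNumber: 2 }] },
  { itemsPath: '$.payload', resultPath: '$.responses' },
  (item) => `pdf-data-for-${item.documentNumber}` // stand-in for FetchPDF
);
console.log(JSON.stringify(output, null, 2));
```

The object handed to the next state therefore still contains payload, with the new responses array sitting next to it.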

“ResultSelector” expects a collection of key-value pairs, which are used to build a new object from the result before “ResultPath” is applied.

“Retry” allows you to define “ErrorEquals” values that specify which errors should trigger a retry of the “Iterator” definition. “MaxAttempts” can be defined to restrict the number of retries, and “IntervalSeconds” to set the number of seconds to wait before the first retry.
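As a rough worked example of how the retry timing adds up, the helper below computes the wait before each retry. It brings in one field not covered above, “BackoffRate”, which in the Amazon States Language multiplies the wait after every attempt. The defaults in the code (1 second, 3 attempts, rate 2.0) are the documented defaults as I understand them, so double check them against the AWS documentation; retryDelays itself is just a hypothetical helper for the arithmetic.

```javascript
// Wait (in seconds) before each retry: IntervalSeconds * BackoffRate^(retry - 1).
// Defaults are assumed to match the Amazon States Language defaults.
function retryDelays({ IntervalSeconds = 1, MaxAttempts = 3, BackoffRate = 2.0 } = {}) {
  return Array.from(
    { length: MaxAttempts },
    (_, retryIndex) => IntervalSeconds * BackoffRate ** retryIndex
  );
}

console.log(retryDelays({ IntervalSeconds: 2, MaxAttempts: 3 })); // [ 2, 4, 8 ]
```

In that example a task that keeps failing would spend 14 seconds waiting in total, on top of its own run time, before the error is passed on to any “Catch” you have defined.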

“Catch” allows you to define “ErrorEquals” values for the specific errors you want to catch. The “Next” step can then point to an error-handling state rather than continuing on the happy flow.

More descriptions about the params and other params can be found in the AWS documentation for Map State.

Alternative approaches

Instead of the Map State, you could use a single Lambda to handle multiple requests asynchronously and stitch the results together once all the requests have finished. For example, in JavaScript, you can use a Promise.all approach to wait for all the HTTP Requests to respond and then merge the responses.
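A minimal sketch of that single-Lambda alternative is shown below. fetchPdf and mergePdfs are hypothetical helpers passed in by the caller (one would call the vendor, the other would wrap something like pdf-merger-js), so the snippet only demonstrates the Promise.all pattern and not a real integration.

```javascript
// Single-Lambda alternative: fire all requests at once, then merge.
// `fetchPdf` and `mergePdfs` are hypothetical helpers supplied by the caller.
async function fetchAndMerge(payloads, fetchPdf, mergePdfs) {
  // Promise.all keeps the results in input order and rejects as soon as
  // any single request rejects, so one failure fails the whole batch.
  const pdfs = await Promise.all(payloads.map((payload) => fetchPdf(payload)));
  return mergePdfs(pdfs);
}
```

Note that with this approach any retry logic for an individual request has to be written by hand inside fetchPdf, whereas the Map State lets you declare it in the state machine definition.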

Step Function Map state VS Alternative approach

The following pros and cons compare my solution for the given scenario to the alternative approach described above. They are just a few comparisons that came to mind when I was developing my solution. Hopefully, my brain dump can help you decide what approach is best for your own requirements.

Pros:

  • You can configure the Map State to run the Lambdas concurrently, which can help with the overall Step Function run time.
  • Using Map State allows you to modularise complicated code and run it in smaller reusable Lambdas.
  • Ability to define “Retry” params for Step Function Tasks within the Map State definition, making error handling simpler.
  • Map State allows you to bring together the results of the Task that was run within the Map State.

Cons:

  • Cold starts if the Lambda is not invoked often. If your Lambda is not invoked frequently but you still want the run time to be low, there may be occasions when the Map State lags (however, this could be mitigated by adding provisioned concurrency to your Lambda).
  • It is not possible to pass a large dataset as input into a Step Function task, as the max input size for each Step Function task is 256 KB (262,144 bytes), see quotas related to task executions. It is possible to get around this limit by persisting data in a DynamoDB table or an S3 bucket and passing a reference to it instead.
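If you are unsure whether your data will fit, one option is to measure it before returning it from a Lambda. The check below is a sketch under a simple assumption: that the limit applies to the UTF-8 size of the JSON-serialised state data. fitsInStatePayload is a hypothetical helper, not part of any AWS SDK.

```javascript
// Assumed limit: 256 KB of JSON state data, measured as UTF-8 bytes.
const MAX_STATE_PAYLOAD_BYTES = 256 * 1024; // 262,144 bytes

function fitsInStatePayload(value) {
  const bytes = Buffer.byteLength(JSON.stringify(value), 'utf8');
  return { bytes, fits: bytes <= MAX_STATE_PAYLOAD_BYTES };
}

// Example: a small payload fits, a ~300 KB string does not.
console.log(fitsInStatePayload({ documentNumber: 1 }).fits);
console.log(fitsInStatePayload({ pdf: 'x'.repeat(300000) }).fits);
```

Keep in mind that the Map State output contains every iteration's result, so it is the combined size of all the PDF data that has to fit, not just one response.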

Conclusion

If you are building a service that requires a fast response time and has repetitive functionality, then using Step Functions with Map state could be for you.

The architecture is highly scalable because each key function has been separated into an independent Lambda. Using the Step Function Map State also allows us to perform repetitive tasks concurrently and output a compilation of all the results. The compiled results can then be passed into the next step and manipulated to meet the requirements of the service.

Hopefully, you have found something interesting in my blog about my Map State use case.

finito.

Khoa Phan is an Application Engineer at the LEGO Group in the Cart/Checkout squad for LEGO.com. He supports the team with planning, developing, and maintaining features that ensure users can check out smoothly.


AWS Step Function Map State: Fetching & Merging PDFs was originally published in Engineers @ The LEGO Group on Medium, where people are continuing the conversation by highlighting and responding to this story.

