Most HTML-to-PDF conversion libraries have system dependencies. In Lambda we only supply the code, while Lambda itself takes care of the underlying infrastructure, so we cannot install those dependencies ourselves. There is, however, a solution to this.
By using Lambda Layers, an environment can be created for an executable that can interact with the system itself. The well-known library wkhtmltopdf has a zip archive for a Lambda Layer available.
The Lambda Layer will be consumed by a Lambda function which will call the binary inside the Lambda Layer in order to do the actual conversion.
All of this can be set up in a few simple steps.
Step 1 – Set up Lambda Layer
From the official wkhtmltopdf website download the Amazon Linux 2 (lambda zip) for x86_64.
Log in to the AWS console, navigate to Lambda and click Layers under Additional Resources. This shows the Lambda Layers overview. Click the “Create Layer” button at the top right of the page.
Fill out the following data:
- Name
- Description
- Upload the ZIP file that you just downloaded from the wkhtmltopdf website
- Select x86_64 architecture
Click “Create” to upload the archive and create the Lambda Layer.
This redirects to the detail page, which shows the ARN. The ARN has the following format:
arn:aws:lambda:[region]:[accountId]:layer:wkhtmltopdf-layer:1
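Since the ARN is copied by hand into configuration later on, it can help to check that it has the expected shape. The following is a minimal sketch, not part of any AWS library: `parseLayerArn` is a hypothetical helper, and the region and account ID in the example are placeholders.

```javascript
// Hypothetical helper: split a Lambda Layer version ARN into its parts.
// The pattern follows the format shown above:
// arn:<partition>:lambda:<region>:<accountId>:layer:<name>:<version>
function parseLayerArn(arn) {
  const pattern = /^arn:(aws[a-z-]*):lambda:([a-z0-9-]+):(\d{12}):layer:([A-Za-z0-9_-]+):(\d+)$/;
  const match = pattern.exec(arn);
  if (!match) {
    throw new Error(`Not a Lambda Layer version ARN: ${arn}`);
  }
  const [, partition, region, accountId, layerName, version] = match;
  return { partition, region, accountId, layerName, version: Number(version) };
}

// Example with placeholder values (not a real account).
const parsed = parseLayerArn(
  "arn:aws:lambda:eu-west-1:123456789012:layer:wkhtmltopdf-layer:1"
);
console.log(parsed.layerName, parsed.version);
```

Note that the trailing number is the layer version. Uploading a new archive to the same layer creates a new version with a new ARN.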
Step 2 – Create Lambda Function
In order to call the Lambda Layer, a Lambda function needs to be created.
There are different ways to call a Lambda function and initiate a conversion from HTML to PDF.
Personally I use the following flow:
- Subscribe the PDF conversion Lambda to an SNS topic.
- Save the HTML to convert into an S3 bucket.
- From anywhere in the application, publish a message to the SNS topic, with the Bucket & Filename of the HTML file to convert.
- The PDF conversion Lambda gets triggered on a publish and fetches the HTML.
- The Lambda spawns a wkhtmltopdf process in the Lambda Layer to do the HTML to PDF conversion.
- The resulting PDF is stored into the S3 bucket and can be downloaded by the calling function.
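The trigger side of this flow can be sketched as follows. When SNS invokes a Lambda, each published message arrives as a record with the message body in `Records[i].Sns.Message`. The JSON payload with `bucket` and `key` fields is an assumption made for this example, as is the helper name `extractConversionJobs`. Use whatever field names your publisher actually sends.

```javascript
// Hypothetical helper: pull the bucket and filename out of an SNS-triggered
// Lambda event. Assumes each message body is JSON such as
// {"bucket": "...", "key": "..."} (field names chosen for this sketch).
function extractConversionJobs(event) {
  return (event.Records || []).map(record => {
    const { bucket, key } = JSON.parse(record.Sns.Message);
    if (!bucket || !key) {
      throw new Error("SNS message must contain both bucket and key");
    }
    return { bucket, key };
  });
}

// Trimmed-down example event with placeholder values.
const sampleEvent = {
  Records: [
    {
      EventSource: "aws:sns",
      Sns: {
        Message: JSON.stringify({ bucket: "example-bucket", key: "invoices/42.html" })
      }
    }
  ]
};

const jobs = extractConversionJobs(sampleEvent);
console.log(jobs);
```

Each returned job tells the Lambda which HTML file to fetch before starting the conversion.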
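This is just one example of how an implementation could look. To keep this post short, only the actual conversion code is shown below, which is step 5 in the flow above.
const { spawn } = require("child_process");

// HTML to convert; in the flow above this would be fetched from S3.
const htmlString = "<strong>Example HTML to PDF Conversion</strong>";
// Extra wkhtmltopdf command-line options, if any.
const options = [];
// Chunks of PDF data read from stdout.
const bufs = [];
// The two "-" arguments make wkhtmltopdf read HTML from stdin and write the PDF to stdout.
const proc = spawn("/bin/sh", ["-o", "pipefail", "-c", `wkhtmltopdf ${options.join(" ")} - - | cat`]);
proc.on("error", error => {
  callback(error, null);
}).on("close", code => {
  // "close" fires after the stdio streams have ended, so bufs is complete here.
  if (code) {
    const error = new Error(`wkhtmltopdf process exited with code ${code}`);
    callback(error, null);
  } else {
    /* This buffer holds PDF data.
     * Save/return it with ContentType application/pdf
     */
    const buffer = Buffer.concat(bufs);
    callback(null, "pdf done");
  }
});
// Send the HTML to the process and close stdin so the conversion can start.
proc.stdin.end(htmlString);
proc.stdout.on("data", data => {
  bufs.push(data);
}).on("error", error => {
  callback(error, null);
});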
This snippet can be easily integrated inside your own flow. The HTML content that needs to be converted should be loaded into the htmlString variable.
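If the surrounding handler is written with async/await, the same idea can be wrapped in a Promise. The sketch below is one possible shape, not the only way to do it. `runWithStdin` and `htmlToPdf` are hypothetical helper names, and stderr is passed through to the parent process so that any wkhtmltopdf messages end up in the function's log output.

```javascript
const { spawn } = require("child_process");

// Hypothetical helper: run a command, write `input` to its stdin and
// resolve with everything it wrote to stdout as a single Buffer.
function runWithStdin(command, args, input) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    // stdin and stdout are pipes; stderr goes straight to the parent's stderr.
    const proc = spawn(command, args, { stdio: ["pipe", "pipe", "inherit"] });
    proc.on("error", reject);
    proc.stdin.on("error", reject);
    proc.stdout.on("data", chunk => chunks.push(chunk));
    // "close" is emitted once the process has ended and its streams are closed.
    proc.on("close", code => {
      if (code !== 0) {
        reject(new Error(`${command} exited with code ${code}`));
      } else {
        resolve(Buffer.concat(chunks));
      }
    });
    proc.stdin.end(input);
  });
}

// Same shell pipeline as in the snippet above, returned as a Promise.
// Options are joined into a shell command, so they must not contain untrusted input.
function htmlToPdf(html, options = []) {
  const script = `wkhtmltopdf ${options.join(" ")} - - | cat`;
  return runWithStdin("/bin/sh", ["-o", "pipefail", "-c", script], html);
}
```

Inside an async handler this could then be called as `const pdf = await htmlToPdf(htmlString, ["--page-size", "A4"]);`, with errors surfacing as a rejected Promise instead of a callback argument.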
The output ends up in the buffer variable. This data needs to be saved or returned with the content type “application/pdf”.
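For the S3 variant of the flow, saving the buffer comes down to passing the right parameters to a PutObject call. The sketch below only builds that parameter object (`Bucket`, `Key`, `Body` and `ContentType` are the S3 PutObject parameter names). The helper names and the convention of swapping the `.html` extension for `.pdf` are assumptions made for this example.

```javascript
// Hypothetical helper: derive the PDF filename from the HTML filename,
// e.g. "invoices/42.html" becomes "invoices/42.pdf".
function pdfKeyFor(htmlKey) {
  return htmlKey.replace(/\.html?$/i, "") + ".pdf";
}

// Hypothetical helper: build the parameters for an S3 PutObject call
// that stores the converted PDF with the correct content type.
function buildPdfUploadParams(bucket, htmlKey, pdfBuffer) {
  return {
    Bucket: bucket,
    Key: pdfKeyFor(htmlKey),
    Body: pdfBuffer,
    ContentType: "application/pdf"
  };
}

// Placeholder values; the Buffer stands in for real PDF data.
const uploadParams = buildPdfUploadParams(
  "example-bucket",
  "invoices/42.html",
  Buffer.from("%PDF-")
);
console.log(uploadParams.Key, uploadParams.ContentType);
```

The resulting object can be handed to the S3 client of whichever AWS SDK version the function uses.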
Step 3 – Connect Lambda with Lambda Layer
Before we can use the Lambda Layer inside our Lambda, it needs to be connected.
There are several ways to do this. If you use the Serverless Framework, add the following lines to the “serverless.yml” file, under the function that should use the Lambda Layer.
layers:
- arn:aws:lambda:[region]:[accountId]:layer:wkhtmltopdf-layer:1
The ARN should be the ARN which was generated in Step 1 when creating the Lambda Layer.
The other way is to use the AWS console. Go to Lambda and search for the function that you want to connect. At the bottom of the Code tab you will see the Layers configuration.
Click the “Add a layer” button.
- Select Custom layers as Layer source
- In the Custom layers dropdown, select the layer that was created in step 1
- Alternatively, select “Specify an ARN” and copy/paste the ARN from step 1
- Click “Add” to add the Layer to Lambda.
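Once the layer is attached, its contents are extracted under /opt inside the function's environment, and /opt/bin is on the default PATH. A quick way to confirm that the function can actually see the binary is to look it up on the PATH at startup. This is a minimal sketch that assumes the layer archive places wkhtmltopdf in a bin directory. `findExecutable` is a hypothetical helper.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: return the full path of the first executable file
// called `name` in the given PATH-style list, or null if there is none.
function findExecutable(name, searchPath = process.env.PATH || "") {
  for (const dir of searchPath.split(path.delimiter)) {
    if (!dir) continue;
    const candidate = path.join(dir, name);
    try {
      // Throws if the file is missing or not executable.
      fs.accessSync(candidate, fs.constants.X_OK);
      return candidate;
    } catch (err) {
      // Not in this directory; keep looking.
    }
  }
  return null;
}

const binary = findExecutable("wkhtmltopdf");
console.log(
  binary
    ? `wkhtmltopdf found at ${binary}`
    : "wkhtmltopdf not found; is the Lambda Layer attached?"
);
```

Logging this once makes a missing or misconfigured layer show up as a clear message rather than as a failed spawn later on.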
Conclusion
With just a few steps it is possible to implement HTML to PDF conversion using Lambda with NodeJS.