
Serverless Search Engine Optimization

José Matías Rivero
March 21, 2023

Context

When working with SPAs (like React or Angular apps), hosting them via serverless solutions like a CDN is a great way to ensure high scalability at a low cost. At Ensolvers, we have several apps that are served this way on AWS using CloudFront. How do we do this? By following the rules below.

  1. The deployment process runs the bundling scripts that the framework provides and then uploads the content to an S3 bucket. The bucket is linked to a CloudFront distribution, which the script itself invalidates, triggering a resource refresh in all edge locations globally.
  2. The CloudFront distribution is configured to return a 200 OK with the contents of /index.html whenever a 404 is found.

So, let's assume we have an app written in React that is bundled and served from https://mydomain.com and has a /home path. When a user tries to access https://mydomain.com/home, the following happens:

  1. There is no /home resource in the CDN distribution, but because of rule 2 above, the contents of /index.html are returned (serving the bundled app).
  2. The browser loads the app.
  3. After loading, the app's Router component checks that the current route is /home, which is linked to a React component that renders the view.
  4. The view is rendered and the user sees what they expect.
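The fallback rule that makes this flow work can be modeled in a few lines. What follows is a minimal sketch, not CloudFront itself. `CdnFallback` and its in-memory object map are hypothetical stand-ins for the S3 bucket behind the distribution. In CloudFront, the equivalent is usually configured as a custom error response that maps the error code to /index.html with a 200 response code. Depending on bucket permissions, S3 may report a missing key as 403 rather than 404.

```java
import java.util.Map;

/**
 * Minimal model of the CDN fallback rule (hypothetical; not CloudFront's API).
 * Keys are request paths, values are the uploaded file contents.
 */
public class CdnFallback {
    /** A served response: HTTP status plus body. */
    public record Response(int status, String body) {}

    private final Map<String, String> objects;

    public CdnFallback(Map<String, String> objects) {
        this.objects = objects;
    }

    /** Serves an existing object as-is; any missing path gets 200 OK with /index.html. */
    public Response serve(String path) {
        String body = objects.get(path);
        if (body != null) {
            return new Response(200, body);
        }
        return new Response(200, objects.get("/index.html"));
    }

    public static void main(String[] args) {
        CdnFallback cdn = new CdnFallback(Map.of(
                "/index.html", "<html><head><title>My App</title></head><body><div id=\"root\"></div></body></html>",
                "/static/js/main.js", "/* bundled app */"));
        // The bundle exists, so it is served directly.
        System.out.println(cdn.serve("/static/js/main.js").body());
        // /home does not exist as a file: the app shell is returned with status 200.
        Response home = cdn.serve("/home");
        System.out.println(home.status() + " " + home.body());
    }
}
```

Every route that is not a real file, /home included, receives byte-identical HTML. That is exactly what causes the SEO problem described next.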

That sounds like a great solution, but the inherent high availability and low cost come with some drawbacks. Let's talk about one of them: SEO.

The problem

As we said previously, we serve the same /index.html content for any requested path that is not found. This means the same metadata is served for every page in the app. For SEO, however, we want each particular path/page to have its own <title> and OpenGraph <meta> tags, among other features. How do we address this issue?

The solution(s)

There is no single solution for this. We've implemented several, depending on the requirements of each customer. Most of them involve generating an optimized version of the index.html page in different ways.

Solution 1: Page post-generation within the deployment process

If we have a well-known, reduced set of pages that need to be SEO-optimized, we can run a script that takes index.html and applies the corresponding transformations. We do this quite simply by using the Jsoup Java library for DOM manipulation. The script is a basic app that runs after bundling but before upload, and it generates the optimized version for a particular route. For instance, running it for the /home route takes index.html and generates an optimized version called /home. It does this by reading optimizations.json, a spec file written in an optimization DSL that we designed. This file contains the HTML tags that need to be added or modified.
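The post-generation step can be sketched with the Java standard library alone. This is an illustration under assumptions, not the script we actually use:

- It swaps Jsoup's DOM manipulation for plain string and regex operations.
- `RouteSpec` is a hypothetical simplification that does not reproduce the real optimizations.json DSL.

For each route, the sketch replaces the `<title>`, injects OpenGraph `<meta>` tags before `</head>`, and writes the result next to index.html under a key with no extension (for example `home`).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Stdlib stand-in for the post-generation script (the real one uses Jsoup).
 * RouteSpec is hypothetical and does not mirror the actual optimizations.json DSL.
 */
public class PostGenerate {
    /** Per-route spec: the page title plus OpenGraph property -> content pairs. */
    public record RouteSpec(String title, Map<String, String> ogTags) {}

    private static final Pattern TITLE = Pattern.compile("<title>.*?</title>", Pattern.DOTALL);

    /** Escapes characters that are unsafe in HTML text and double-quoted attributes. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Replaces the first <title> (if any) and injects OpenGraph tags right before </head>. */
    public static String optimize(String indexHtml, RouteSpec spec) {
        String title = "<title>" + escape(spec.title()) + "</title>";
        // quoteReplacement keeps characters like '$' in titles from being read as group refs.
        String html = TITLE.matcher(indexHtml).replaceFirst(Matcher.quoteReplacement(title));
        StringBuilder tags = new StringBuilder();
        // Tag order follows the map's iteration order; pass a LinkedHashMap if order matters.
        spec.ogTags().forEach((property, content) -> tags.append("<meta property=\"")
                .append(escape(property)).append("\" content=\"")
                .append(escape(content)).append("\">"));
        int headEnd = html.indexOf("</head>");
        if (headEnd < 0) {
            throw new IllegalArgumentException("index.html has no </head>");
        }
        return html.substring(0, headEnd) + tags + html.substring(headEnd);
    }

    /** Writes one optimized file per route: "/home" -> <buildDir>/home (no extension). */
    public static void generate(Path buildDir, Map<String, RouteSpec> specs) throws IOException {
        String index = Files.readString(buildDir.resolve("index.html"), StandardCharsets.UTF_8);
        for (Map.Entry<String, RouteSpec> e : specs.entrySet()) {
            Path target = buildDir.resolve(e.getKey().replaceFirst("^/", ""));
            Files.createDirectories(target.getParent()); // supports nested routes such as /blog/intro
            Files.writeString(target, optimize(index, e.getValue()), StandardCharsets.UTF_8);
        }
    }
}
```

Regex-based editing is fragile for arbitrary HTML. It is tolerable here only because the bundler emits a small, predictable index.html, which is why a real DOM parser such as Jsoup is the safer choice in practice.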

Solution 2: Generating the optimized version on the fly

Page post-processing and/or pre-rendering is a pretty common technique in content management apps. In this case, we apply the same code that we used in Solution 1, but dynamically: whenever some domain object changes, we re-optimize the particular path in which it is shown. This consists of:

  1. Fetching the index.html page
  2. Applying the optimizations needed, depending on the requirements of the particular app
  3. Writing the optimized version directly to S3 under the expected path (in this case, /home)
  4. Invalidating the CloudFront cache for the involved paths so they are refreshed in all edge locations
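The four steps above can be sketched as follows. `ObjectStore` and `CdnCache` are hypothetical interfaces introduced for illustration. A production version would back them with S3 and CloudFront, for instance through `S3Client.putObject` and `CloudFrontClient.createInvalidation` in the AWS SDK for Java v2. The `optimize` function stands in for whatever app-specific transformations apply.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/**
 * On-the-fly regeneration sketch. ObjectStore and CdnCache are hypothetical
 * interfaces; in production they would wrap S3 and CloudFront.
 */
public class OnTheFlyOptimizer {
    public interface ObjectStore {
        String get(String key);
        void put(String key, String body, String contentType);
    }

    public interface CdnCache {
        void invalidate(List<String> paths);
    }

    private final ObjectStore store;
    private final CdnCache cdn;

    public OnTheFlyOptimizer(ObjectStore store, CdnCache cdn) {
        this.store = store;
        this.cdn = cdn;
    }

    /** Called when a domain object changes; `optimize` stands in for app-specific transformations. */
    public void regenerate(String path, UnaryOperator<String> optimize) {
        String index = store.get("index.html");                // 1. fetch index.html
        String optimized = optimize.apply(index);              // 2. apply the optimizations
        String key = path.startsWith("/") ? path.substring(1) : path;
        store.put(key, optimized, "text/html; charset=utf-8"); // 3. write under the route path
        cdn.invalidate(List.of(path));                         // 4. invalidate that path at the edges
    }

    /** In-memory fake of the object store, useful for tests. */
    public static class MemoryStore implements ObjectStore {
        public final Map<String, String> objects = new HashMap<>();
        public final Map<String, String> contentTypes = new HashMap<>();

        public String get(String key) {
            return objects.get(key);
        }

        public void put(String key, String body, String contentType) {
            objects.put(key, body);
            contentTypes.put(key, contentType);
        }
    }

    /** Fake CDN that only records which paths were invalidated. */
    public static class RecordingCdn implements CdnCache {
        public final List<String> invalidated = new ArrayList<>();

        public void invalidate(List<String> paths) {
            invalidated.addAll(paths);
        }
    }
}
```

A key such as `home` has no file extension, so the sketch sets the content type explicitly. Otherwise S3 would fall back to a generic binary type, and browsers might download the page instead of rendering it. CloudFront invalidation paths start with `/`, which is why the route, not the S3 key, is passed to `invalidate`.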

This approach requires a bit of simplification in the index.html structure to avoid regenerating all pages whenever the index.html content or structure changes, but that is a topic for a separate post.

Solution 3: Using Lambda@Edge

AWS CloudFront provides a very powerful feature: the ability to run dynamic code via Lambda@Edge. While this has some limits (for instance, the function package can be no larger than 1 MB for functions triggered by viewer events), it is very useful for introducing lightweight behavior.

In this case, we've implemented two approaches so far:

  1. Proxying (depicted in Figure 1): when the transformations are complex, we implement a specific endpoint or microservice that performs the optimizations, and we proxy requests to it via a Lambda function.
  2. On-the-fly: if transformations are simple (for instance, string replacements or insertions), we can apply them directly in the Lambda without involving external microservices. We just need to ensure that the function has access to all the resources (S3, DynamoDB, etc.) needed to obtain the index.html to be optimized, as well as the optimization specs.
Figure 1. Proxy class used in Lambda@Edge to implement a dynamic request proxy for particular paths
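The per-request decision behind the two approaches can be sketched as below. Lambda@Edge functions run on Node.js or Python, so this Java version only illustrates the logic and would need porting before deployment. The following are hypothetical and not taken from our implementation:

- the prefix set
- the `__TITLE__`-style placeholders
- the spec map
- the optimizer URL

```java
import java.util.Map;
import java.util.Set;

/**
 * Decision logic for the two Lambda@Edge approaches (hypothetical names throughout).
 * Shown in Java for illustration only; a deployable Lambda@Edge function uses Node.js or Python.
 */
public class EdgeSeoRouter {
    /** What the edge function should do with a request. */
    public interface Action {}

    /** Approach 1 (proxying): forward to a microservice that performs complex optimizations. */
    public record Proxy(String upstreamUrl) implements Action {}

    /** Approach 2 (on-the-fly): serve index.html with simple string replacements applied. */
    public record Rewrite(String html) implements Action {}

    /** No SEO handling needed: let CloudFront serve the request as usual. */
    public record PassThrough() implements Action {}

    private final Set<String> proxiedPrefixes;                  // routes needing complex transformations
    private final Map<String, Map<String, String>> simpleSpecs; // route -> (placeholder -> replacement)
    private final String optimizerBaseUrl;                      // hypothetical microservice base URL

    public EdgeSeoRouter(Set<String> proxiedPrefixes,
                         Map<String, Map<String, String>> simpleSpecs,
                         String optimizerBaseUrl) {
        this.proxiedPrefixes = proxiedPrefixes;
        this.simpleSpecs = simpleSpecs;
        this.optimizerBaseUrl = optimizerBaseUrl;
    }

    /** Decides how to handle a request for `uri`, given the current index.html contents. */
    public Action decide(String uri, String indexHtml) {
        for (String prefix : proxiedPrefixes) {
            if (uri.startsWith(prefix)) {
                return new Proxy(optimizerBaseUrl + uri);
            }
        }
        Map<String, String> replacements = simpleSpecs.get(uri);
        if (replacements != null) {
            String html = indexHtml;
            // Replacement values are assumed to be HTML-escaped already.
            for (Map.Entry<String, String> r : replacements.entrySet()) {
                html = html.replace(r.getKey(), r.getValue());
            }
            return new Rewrite(html);
        }
        return new PassThrough();
    }
}
```

In an origin-request function, the three results would roughly translate as follows:

- A `Proxy` result becomes fetching the page from the microservice.
- A `Rewrite` result becomes a generated 200 response.
- `PassThrough` lets the request continue to the S3 origin, where the /index.html fallback still applies.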

Conclusion

In this note we've described different ways in which we can support SEO while preserving all the advantages of serving frontend apps serverlessly. The techniques described here (used in different projects at Ensolvers) can be applied to other areas as well, such as request queueing, caching optimization, etc.
