Kevin W. McConnell

A fast(er) de-referer service with CloudFront Functions

Yesterday I read an interesting article about building a fast de-referer service on AWS. I hadn’t come across de-referers before, but the article does a great job of explaining what they are, and why someone might want one.

The solution described in the article is really quite nice. It uses AWS Global Accelerator paired with several nano-sized EC2 instances deployed to multiple AWS regions. Since Global Accelerator will route traffic to the nearest instance for each request, it results in really good performance, no matter where the requests are coming from.

But as I read the article, I found myself wondering if something similar could be built more simply with CloudFront, and still have equally good performance.

I gave it a shot, and it turns out to work quite well.

Why CloudFront can help

Although the Global Accelerator-based design has a lot going for it, it sounds like there are a couple of downsides as well. None of these are significant problems, but they do present some areas to try to improve on, which is always a fun thing to do :)

If we’re able to use CloudFront to run the service instead, we ought to be able to address those downsides.

CloudFront Functions (not to be confused with Lambda@Edge) is a simple, and very restricted, programming model for CloudFront. It allows you to write JavaScript functions that inspect and modify the requests and responses of CloudFront traffic. If your responses are simple enough, this can even include generating the entire response from the function, avoiding the need for CloudFront to fetch anything from an origin server or its cache.

These functions run inside the many regional CloudFront endpoints, so they can respond quickly by running close to the user. They also have no cold starts, run in sub-millisecond time, and can scale to millions of requests per second (if you can afford it…)

The main downside to CloudFront Functions is the set of restrictions it places on your code. In particular, functions must complete within 1ms, and they can’t access the request body.

Despite these limitations, there are still a lot of cases where they are useful.

How the service works

The crux of a de-referer service is that instead of linking directly to a 3rd-party site, you’d link to an intermediary page hosted by the service. This intermediary page includes a refresh directive that instructs the browser to navigate to the original URL.

To the end user, not much has changed; but as far as the 3rd-party service is concerned, the referer will now be the de-referer service, rather than the page containing the link that was clicked.

The simplest de-referer, then, is just a service that returns the correct refresh directive.

Luckily that is something we can do within the restrictive model of CloudFront Functions.
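Concretely, the entire response only needs a status line and a Refresh header, along these lines (with example.com standing in for the real target URL):

```http
HTTP/1.1 200 OK
Refresh: 0; url='https://example.com/'
```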

A minimal de-referer function

Here’s a very simple implementation of a de-referer service as a CloudFront function:

```js
function handler(event) {
  var dest = event.request.querystring['to'];
  if (dest) {
    var response = {
      statusCode: 200,
      statusDescription: 'OK',
      headers: {
        refresh: { value: "0; url='" + decodeURIComponent(dest.value) + "'" },
      },
    };
    return response;
  }
  return event.request;
}
```

The function checks whether the request has a `to` query param, and if it does, it returns a tiny response with a `Refresh` header containing the value of that param.

If the request doesn’t have a `to` param, we let CloudFront handle it as normal. This lets us host both the service and its homepage from the same CloudFront distribution.
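Since the function is dependency-free JavaScript, both paths are easy to sanity-check locally with a mocked event. The mock below only includes the fields the handler actually reads; real viewer-request events carry more properties:

```js
// The handler from above, unchanged.
function handler(event) {
  var dest = event.request.querystring['to'];
  if (dest) {
    return {
      statusCode: 200,
      statusDescription: 'OK',
      headers: {
        refresh: { value: "0; url='" + decodeURIComponent(dest.value) + "'" },
      },
    };
  }
  return event.request;
}

// Mocked viewer-request event with a `to` param.
var withTo = {
  request: { uri: '/', querystring: { to: { value: 'https%3A%2F%2Fexample.com%2F' } } },
};
console.log(handler(withTo).headers.refresh.value); // "0; url='https://example.com/'"

// Without a `to` param, the request is returned untouched and
// CloudFront carries on to the origin as usual.
var withoutTo = { request: { uri: '/', querystring: {} } };
console.log(handler(withoutTo) === withoutTo.request); // true
```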

A production service would probably do a little more, like validating the format of the URL in the to param. But this version is close enough to test with.
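As a sketch, such validation might only accept absolute http(s) URLs and reject characters that could break out of the quoted url='…' value in the Refresh header. The `isSafeUrl` helper here is purely illustrative, not part of any CloudFront API:

```js
// Hypothetical validation helper: accept only absolute http(s) URLs,
// and reject quotes and whitespace, which could mangle the quoted
// url='...' value in the Refresh header.
function isSafeUrl(raw) {
  try {
    var url = decodeURIComponent(raw);
    return /^https?:\/\/[^'"\s]+$/.test(url);
  } catch (e) {
    return false; // malformed percent-encoding
  }
}
```

The handler would then fall through to normal CloudFront handling for anything unsafe, e.g. `if (dest && isSafeUrl(dest.value)) { ... }`.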

We can deploy everything we need to try this out with a little bit of CDK code:

```typescript
import * as cdk from '@aws-cdk/core';
import * as cloudfront from '@aws-cdk/aws-cloudfront';
import * as origins from '@aws-cdk/aws-cloudfront-origins';
import * as s3 from '@aws-cdk/aws-s3';
import * as deploy from '@aws-cdk/aws-s3-deployment';

class Stack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string) {
    super(scope, id);

    const bucket = new s3.Bucket(this, 'Assets');

    new deploy.BucketDeployment(this, 'AssetDeployment', {
      destinationBucket: bucket,
      sources: [deploy.Source.asset('static')],
    });

    const cfFunction = new cloudfront.Function(this, 'Function', {
      code: cloudfront.FunctionCode.fromFile({ filePath: 'handler.js' }),
    });

    new cloudfront.Distribution(this, 'ServiceDistribution', {
      defaultRootObject: 'index.html',
      defaultBehavior: {
        origin: new origins.S3Origin(bucket),
        functionAssociations: [
          {
            function: cfFunction,
            eventType: cloudfront.FunctionEventType.VIEWER_REQUEST,
          },
        ],
      },
    });
  }
}

const app = new cdk.App();
new Stack(app, 'Dereferer');
```

This CDK code deploys a CloudFront distribution, backed by an S3 bucket which contains some static content (for the homepage of the service). It also attaches our service function to the distribution as a VIEWER_REQUEST type, meaning it will run at the start of every request.

(If you’re interested in trying it out, I’ve put all the code in a repo so you can easily deploy it from there.)

A “real” deployment of a service like this would also need a custom domain; I haven’t added that here since it doesn’t affect the testing. You’d probably also want to send the access logs to an S3 bucket in order to do analytics on the traffic.

Performance

We should expect performance to be good, since the function is adding at most 1ms to a standard CloudFront request. And CloudFront is generally pretty fast.

In my testing I’ve found the p95 response time to be around 150ms, with an average closer to 30ms.

For example, running from my location (Edinburgh):

Response time histogram:
0.017 [1] |
0.032 [909] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.047 [27] |■
0.062 [2] |
0.077 [1] |
0.092 [0] |
0.107 [3] |
0.122 [10] |
0.138 [12] |■
0.153 [19] |■
0.168 [16] |■
Latency distribution:
10% in 0.0190 secs
25% in 0.0200 secs
50% in 0.0207 secs
75% in 0.0220 secs
90% in 0.0286 secs
95% in 0.1189 secs
99% in 0.1652 secs

I’m quite pleased with the performance of this, especially given the design is so simple and easy to operate.

Running costs

The dominant cost in running this will be CloudFront’s request pricing.

There will also be the cost of each function invocation, and some bandwidth charges, although both of those are likely to be quite small (especially since the size of each response is so small).

The original article mentions serving 80 million requests per month. So if we use that as a basis for some estimates, and assume for the sake of argument that requests are predominantly from the US, it would look something like:

| Item | Price | Total |
| --- | --- | --- |
| 80,000,000 HTTPS requests | $0.01 per 10,000 | $80.00 |
| 80,000,000 function invocations | $0.10 per million | $8.00 |
| 80,000,000 500-byte responses (~40GB) | $0.085 per GB | $3.40 |

So around $90/month to serve that load.
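The arithmetic behind those line items is simple to check:

```js
// Back-of-envelope check of the table above, using the quoted US prices.
var requests = 80e6;

var httpsCost      = (requests / 10000) * 0.01; // $0.01 per 10,000 requests
var invocationCost = (requests / 1e6) * 0.10;   // $0.10 per million invocations
var bandwidthGB    = (requests * 500) / 1e9;    // 500 bytes per response, ~40GB
var bandwidthCost  = bandwidthGB * 0.085;       // $0.085 per GB

var total = httpsCost + invocationCost + bandwidthCost;
console.log(total.toFixed(2)); // "91.40"
```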

For low-ish traffic this would be a very cheap way to run the service, since there are no fixed monthly costs. With a lot of traffic, that per-request cost is certainly going to add up. That said, this sort of serverless approach has the nice benefit of being almost completely managed for you, so if it saves you some operational time, that can definitely be worth paying a little extra for.

Posted November 10, 2021.