Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Adds consistent redirects handling for cf workers. #3895

Merged
merged 1 commit into from
Sep 10, 2024

Conversation

danyx23
Copy link
Contributor

@danyx23 danyx23 commented Aug 22, 2024

Cloudflare has support for redirects but only up to a limited number (I think 2000?). We have a lot of chart redirects in the chart_redirects table (about 1600 or so). Because everything under the /grapher/ route on our website has to run through a CF function anyways (to update the thumbnail image to include the correct query string), we decided to bake these redirects into a json file and do custom redirects handling in the CF function.

We did this for html pages already but not for thumbnails, but now that we also have .config.json requests to worry about and would like to have consistent redirects handling that doesn't overly complicate the logic of our handlers.

This PR makes it so that both requests for the html pages at /grapher/[slug] and requests for /grapher/[slug].config.json will be redirected according to the _grapherRedirects.json file using a catch error handler. This is done as a catch handler that checks for 404 pages so that the common, happy path does not have to fetch the redirects file.

@owidbot
Copy link
Contributor

owidbot commented Aug 26, 2024

Quick links (staging server):

Site Admin Wizard

Login: ssh owid@staging-site-cf-function-consistent-redirects

SVG tester:

Number of differences (default views): 0 ✅
Number of differences (all views): 0 ✅

Edited: 2024-09-10 08:28:01 UTC
Execution time: 1.19 seconds

@danyx23 danyx23 force-pushed the fetch-grapher-config-from-r2 branch from 7a9ac89 to 797db83 Compare August 28, 2024 15:50
@danyx23 danyx23 force-pushed the cf-function-consistent-redirects branch from af2f857 to 08a9354 Compare August 28, 2024 15:50
Copy link
Member

@marcelgerber marcelgerber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done!

I did find a bunch more cases that we need to cover, or may want to cover in a nicer way.

Comment on lines 31 to 33
"Access-Control-Allow-Origin": "*",
},
})
}

export const onRequestGet: PagesFunction = async (context) => {
// Makes it so that if there's an error, we will just deliver the original page before the HTML rewrite.
// Only caveat is that redirects will not be taken into account for some reason; but on the other hand the worker is so simple that it's unlikely to fail.
context.passThroughOnException()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should really try to keep this in as the first line of business.

It allows us that if any unhandled exception happens in this handler, we're still failing open and are sending over the unmodified /grapher/... response as-is from the origin server. Which is a good fallback (and will work for graphers unless there's a redirect - but never for .json, .png and .svg, obviously.)

See https://developers.cloudflare.com/workers/runtime-apis/context/#passthroughonexception.

Comment on lines +56 to +62
const allExtensions = Object.values(extensions)
.map((ext) => ext.replace(".", "\\.")) // for the regex make sure we match only a single dot, not any character
.join("|")
const regexForKnownExtensions = new RegExp(
`^(?<slug>.*?)(?<extension>${allExtensions})?$`
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code will be simpler by assuming that a slug can't contain a dot, i.e. something like /^(?<slug>[^\s\.]+)(?<extension>\.\S+)?$/.

In theory we could then check after the fact that the extension is valid (and throw a fitting error), but it's also fine if we're not doing that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah true but I think I like the idea of using a list of known extensions as a communication device that is added to further up in the stack. I'm inclined to leave it as is for now.

Comment on lines +63 to +66
const matchResult = fullslug.match(regexForKnownExtensions)
const slug = matchResult?.groups?.slug ?? fullslug
const extension = matchResult?.groups?.extension ?? ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, what this code doesn't currently seem to handle is non-lowercase slugs.
Our rule is that a slug needs to be all-lowercase, and if it isn't we want to redirect it.

In the old code, I tried to be clever about it so we never need to run through two redirects, avoiding the scenario REDIR-SOURCE -> redir-source -> redir-target. I don't think we need to take care of that again, which means we would just do:

if (slug.toLowerCase() !== slug)
  return redir(302, slug.toLowerCase())

@@ -215,6 +217,13 @@ export async function fetchGrapherConfig(
): Promise<FetchGrapherConfigResult> {
const fetchResponse = await fetchUnparsedGrapherConfig(slug, env, etag)

if (fetchResponse.status === 404) {
// we throw 404 errors instad of returning a 404 response so that the router
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// we throw 404 errors instad of returning a 404 response so that the router
// we throw 404 errors instead of returning a 404 response so that the router

@danyx23 danyx23 force-pushed the cf-function-consistent-redirects branch from c7d5f53 to 546f675 Compare September 5, 2024 12:14
@danyx23 danyx23 force-pushed the cf-function-consistent-redirects branch from 546f675 to ecffac1 Compare September 5, 2024 12:31
@danyx23 danyx23 force-pushed the cf-function-consistent-redirects branch from ecffac1 to 2b761ca Compare September 5, 2024 18:37
@danyx23 danyx23 force-pushed the cf-function-consistent-redirects branch from 2b761ca to ade6496 Compare September 9, 2024 09:33
@danyx23 danyx23 force-pushed the cf-function-consistent-redirects branch from ade6496 to a8553c5 Compare September 9, 2024 11:53
@danyx23 danyx23 force-pushed the cf-function-consistent-redirects branch from a8553c5 to 563e74b Compare September 9, 2024 14:44
@danyx23 danyx23 force-pushed the cf-function-consistent-redirects branch from 563e74b to 275fa56 Compare September 10, 2024 07:59
@danyx23 danyx23 force-pushed the cf-function-consistent-redirects branch from 275fa56 to c902103 Compare September 10, 2024 08:06
@danyx23 danyx23 force-pushed the cf-function-consistent-redirects branch from c902103 to b0cd9d5 Compare September 10, 2024 09:55
@danyx23 danyx23 force-pushed the cf-function-consistent-redirects branch from b0cd9d5 to 2f44d84 Compare September 10, 2024 10:24
@danyx23 danyx23 force-pushed the cf-function-consistent-redirects branch from 2f44d84 to 3589dce Compare September 10, 2024 10:35
@danyx23 danyx23 changed the base branch from fetch-grapher-config-from-r2 to graphite-base/3895 September 10, 2024 10:54
This means that both request for the html pages at /grapher/[slug] and
requests for /grapher/[slug].config.json will be redirected according
to the _grapherRedirects.json file. This is done as a catch handler
that checks for 404 pages so that the common, happy path does not
have to fetch the redirects file.
@danyx23 danyx23 force-pushed the cf-function-consistent-redirects branch from 3589dce to d25b1ef Compare September 10, 2024 10:54
@danyx23 danyx23 changed the base branch from graphite-base/3895 to master September 10, 2024 10:55
Copy link
Contributor Author

danyx23 commented Sep 10, 2024

Merge activity

  • Sep 10, 7:13 AM EDT: @danyx23 started a stack merge that includes this pull request via Graphite.
  • Sep 10, 7:14 AM EDT: @danyx23 merged this pull request with Graphite.

@danyx23 danyx23 merged commit 3a153b9 into master Sep 10, 2024
21 checks passed
@danyx23 danyx23 deleted the cf-function-consistent-redirects branch September 10, 2024 11:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants