-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: handle non ASCII characters in cache-tag headers #2645
Conversation
📊 Package size report 0%↑
Unchanged files
🤖 This report was automatically generated by pkg-size-action |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether we might need to expand the encoding? AFAIK, encodeURL
will leave reserved characters unencoded, which means a URL like http://foo.com/resource,123
would emit a cache tag with a comma in it. Should we use encodeURIComponent
instead?
I don't think it's that simple - I was running some tests here and it seems to me that comma case need to be handled on different level than current change - current change only applies to page router, because app router already actually handles it fine (I did add test cases in last commit here for it) The comma problem however applies right now to both Pages and App router in similar way, so just swapping currently added Also |
I was looking into comma case, and this is getting quite not straight forward with our usage of it basically looks like we can't really encode this specifically because we can't distinguish between between comma being a separator and being actual part of one of the items. It does feel like we kind of have to apply same handling and actually produce 2 (or more) cache tag if actual tag contains comma and instead make sure the way we interact with tag manifest blobs and purge cdn API also need to split on it - this might mean accidentally purging more content than desired but because we don't handle entire pipeline at least for app router - we just can't make it work well in this scenario in which case it would make sense to arrive and similar handling for both app and pages router? |
I was doing much more tests with comma and this getting super confusing and annoying ... cache tags for app router are generated differently when prerendered and when rendering on demand later ... in prerendered comma does get encoded: {
path: '/product/事前レンダリングされていない,test'
responseCacheTags: [
'_N_T_/layout',
'_N_T_/product/layout',
'_N_T_/product/[slug]/layout',
'_N_T_/product/[slug]/page',
// notice '%2Ctest' which is ',test'
'_N_T_/product/%E4%BA%8B%E5%89%8D%E3%83%AC%E3%83%B3%E3%83%80%E3%83%AA%E3%83%B3%E3%82%B0%2Ctest'
]
} while non-prerendered gets this: {
path: '/product/事前レンダリングされていない,test',
responseCacheTags: [
'_N_T_/layout',
'_N_T_/product/layout',
'_N_T_/product/[slug]/layout',
'_N_T_/product/[slug]/page',
// in here comma was not encoded so our splitting on ',' made 'test' into separate cache tag
'_N_T_/product/%E4%BA%8B%E5%89%8D%E3%83%AC%E3%83%B3%E3%83%80%E3%83%AA%E3%83%B3%E3%82%B0%E3%81%95%E3%82%8C%E3%81%A6%E3%81%84%E3%81%AA%E3%81%84',
'test'
]
} I'm thinking that maybe we should split on both |
Maybe we could make the splitting a bit smarter and continue to split on comma, but only if the comma is followed by |
fetch cache tags are not prepended with |
true, but the fetch cache tags are always prepended in the |
If that's true, then it probably would work, but this feels way more complex than it should be and also more fragile to next.js changes I think. I'll give it a try regardless just to see how messy setup would have to be for it |
Ughhh. There is another layer here with comma in fetch cache tags ... sigh
and
Will produce same thing (both prerendered and generated on demand) - this would still work with current code in this PR (albeit it would always treat both of above as the last one in practice), but smart splitting idea (which I hoped would result in more exact purges) wouldn't help here - how should:
those work when you can't distinguish them? the only thing that does work is splitting on comma again - but then how do you handle
? because we don't know which one ( Now I'm even more in favor of current approach that just works with both of those cases (it might purge more than needed if there is overlap with different cache tag after splitting on both comma and encoded comma, but this feels better than not being able to invalidate something at all) |
I think you're right - the current solution is the best because the impact of any problems is limited - the worst scenario I can think of is that a URL has a comma in it and, after splitting, the first part matches a layout, so a whole bunch of unintended pages get purged, but that's an edge case and why would you use a URL scheme like that anyway?! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, as discussed
Description
Fixes https://linear.app/netlify/issue/FRB-1352/handle-non-ascii-characters-in-cache-tag-headers
Handles non ASCII characters in cache-tag headers by URI encoding them (does not encode / or valid characters just the non ASCII ones)
Documentation
Tests
Added e2e test case for cacheable page router pages with non-ascii paths
Relevant links (GitHub issues, etc.) or a picture of cute animal