Resolve redirects #16

Open
Treora opened this issue Aug 19, 2017 · 1 comment
Labels
snapshot quality Improving fidelity/size/durability/etc of the output

Comments

Treora (Contributor) commented Aug 19, 2017

Too many links are nowadays obscured by link shorteners and tracker URLs. For example, on Twitter, a link would point to https://t.co/1PT68A6LEt when the author meant to refer to https://voice.mozilla.org/. Learning the intended link target requires querying the shortener service, and thus depends on that external service still existing and being reachable. Not so nice.

We could therefore consider href values of such links to be resources that belong to the document, and should thus be fetched and stored. It may be tough to decide when a link is an undesired redirect, and when it is a 'legit' redirect that should be retained. One approach is to always resolve all redirects. The original URL would of course be kept as an extra attribute.
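A minimal sketch of that rewrite, assuming each link is handled as a plain object of its attributes rather than a DOM element, and using data-original-href as a hypothetical attribute name for the kept URL (freeze-dry does not necessarily use this name):

```javascript
// Hypothetical attribute name for preserving the href as it appeared in the page.
const ORIGINAL_HREF_ATTR = 'data-original-href';

// Return a copy of a link's attributes with href replaced by the resolved
// target. The original href is kept under ORIGINAL_HREF_ATTR; if there is
// no href, or resolution did not change it, the attributes are copied as-is.
function withResolvedHref(attributes, resolvedUrl) {
  const original = attributes.href;
  if (!original || resolvedUrl === original) {
    return { ...attributes };
  }
  return { ...attributes, href: resolvedUrl, [ORIGINAL_HREF_ATTR]: original };
}

const link = { href: 'https://t.co/1PT68A6LEt', rel: 'nofollow' };
console.log(withResolvedHref(link, 'https://voice.mozilla.org/'));
```

On a real anchor element the same idea comes down to reading its href attribute and calling setAttribute twice, once for the kept original and once for the resolved target.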

A question is still whether we can actually obtain the redirection location. fetch(url, {method: 'head'}) sounds appropriate, but judging by the fetch specification (here), it might hide all redirection information for security reasons.

Zegnat commented Jun 10, 2018

“It may be tough to decide when a link is an undesired redirect, and when it is a 'legit' redirect that should be retained.”

From the perspective of a user, I think freeze-dry should get the final resolved URL, after all redirects. That is the only way to answer the question: what did this link point to at the time the snapshot was created?

From the perspective of a developer, I would say it is fine to resolve any and all permanent redirects (301 and 308) and leave the temporary ones (302, 303 and 307) in place.
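A sketch of that policy, following redirects one hop at a time and only across 301 and 308 responses. It assumes a fetch that exposes 3xx responses and their Location header when called with redirect: 'manual'; this holds for Node's built-in fetch, but browsers instead return an opaque redirect response with status 0 and no readable headers. The function name, the fetchImpl test hook and the hop limit are illustrative choices, not part of freeze-dry. It also uses HEAD requests, which some servers handle poorly.

```javascript
// Redirect status codes treated as permanent, per the policy above.
const PERMANENT_REDIRECTS = new Set([301, 308]);

// Follow only permanent redirects, one hop at a time. Stops (and returns the
// current URL) at the first temporary redirect, non-redirect response, or
// redirect without a Location header. fetchImpl is a hypothetical injection
// point so the function can be exercised without network access.
async function resolvePermanentRedirects(url, fetchImpl = fetch, maxHops = 10) {
  let current = url;
  for (let hop = 0; hop < maxHops; hop++) {
    const response = await fetchImpl(current, { method: 'HEAD', redirect: 'manual' });
    const location = response.headers.get('location');
    if (!PERMANENT_REDIRECTS.has(response.status) || !location) {
      return current;
    }
    // Location may be relative, so resolve it against the current URL.
    current = new URL(location, current).href;
  }
  return current; // give up after maxHops to avoid redirect loops
}
```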

But you are right that the nature of fetch() may make all of this hard to do. That said, isn’t the final URL available? Per spec (emphasis mine):

“Except for the last response URL, if any, a response’s url list cannot be exposed to script.”
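If that holds, the final URL can be read from response.url after letting fetch follow every redirect. A minimal sketch, assuming the code runs where the cross-origin request is allowed (for instance Node, or an extension with host permissions); an opaque no-cors response reports an empty url, hence the fallback to the input. The fetchImpl parameter is a hypothetical hook for testing without network access.

```javascript
// Resolve a link to the URL it finally points to. With redirect: 'follow',
// response.url is the last URL in the response's URL list, which the Fetch
// spec lets script read. Falls back to the input if url is empty.
async function resolveFinalUrl(url, fetchImpl = fetch) {
  const response = await fetchImpl(url, { method: 'HEAD', redirect: 'follow' });
  return response.url || url;
}

// Example (needs network access):
// resolveFinalUrl('https://t.co/1PT68A6LEt').then(console.log);
```

Unlike the hop-by-hop approach, this resolves temporary and permanent redirects alike, which matches the "final resolved URL" view from a user's perspective.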

Treora added the snapshot quality label on Apr 6, 2019