Add scripts/CI to check for 404 URLs after a big move #1726

Open
mrjones-plip opened this issue Dec 4, 2024 · 1 comment · May be fixed by #1729

Comments

@mrjones-plip
Contributor

Hugo does a great job of ensuring that pages linking to each other internally don't 404. However, after a large move like the one we did recently with forms, we may 404 a number of inbound links from other sources, or bookmarks folks have. To ensure these don't break, we should generate a list of all known URLs on main, do the big move, and then check that every known URL still safely redirects.

Two scripts were already written, which we may choose to repurpose, but likely this should be:

  • rewritten in node
  • run in CI, blocking a merge if it fails
  • runnable locally, so users don't have to wait for CI
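
A rough sketch of how one Node script could serve both CI and local runs: collect a result per URL, print the failures, and return a non-zero exit code so CI can block the merge. The `summarize` name and the `{ url, ok, reason }` result shape are assumptions made for illustration, not anything from the existing scripts.

```javascript
// Hypothetical helper: turn per-URL results into a printable report and an exit code.
// Assumed result shape: { url: string, ok: boolean, reason?: string }.
function summarize(results) {
  const failures = results.filter((r) => !r.ok);
  // One line per failing URL, then a final tally line.
  const lines = failures.map((f) => `FAIL ${f.url}: ${f.reason ?? 'unknown'}`);
  lines.push(`${results.length - failures.length}/${results.length} URLs OK`);
  // Exit code 1 on any failure, 0 otherwise.
  return { exitCode: failures.length === 0 ? 0 : 1, report: lines.join('\n') };
}
```

Locally or in CI, the caller would print `report` and set `process.exitCode = exitCode`, so the same command behaves identically in both places.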
@mrjones-plip
Contributor Author

mrjones-plip commented Dec 9, 2024

Ok! I did some exploratory research and here's what I think the rough structure is - open to input though! For every PR that wants to merge to main, CI will:

  1. build a version of the site based off the branch - see how we do this already for a weekly link check
  2. get every current URL by downloading the site map from production
  3. using a curl equivalent from npm, download every one of those pages from the branch build running on the CI hugo server
  4. check the response and HTML for each:
    • 200 response - check whether the HTML has an http-equiv="refresh" and, if so, that its target in turn returns a 200 (recursive 'til no meta refresh?)
    • 404 response - note that the page 404s and should instead have an alias (meta refresh)
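
Step 4 could look roughly like the sketch below. It assumes the alias pages carry a `<meta http-equiv="refresh" content="0; url=...">` tag, and it takes the page fetcher as a parameter so the logic can be exercised without a running server. `metaRefreshTarget`, `checkUrl`, `fetchPage` and the `maxHops` limit are all hypothetical names and choices, and the regex is a simplification rather than a real HTML parser.

```javascript
// Return the target of a <meta http-equiv="refresh" content="0; url=..."> tag, or null.
function metaRefreshTarget(html) {
  const tag = html.match(/<meta[^>]*http-equiv=["']?refresh["']?[^>]*>/i);
  if (!tag) return null;
  const url = tag[0].match(/url\s*=\s*['"]?([^'"\s>]+)/i);
  return url ? url[1] : null;
}

// Follow meta refreshes from `url` until a page without one is reached.
// fetchPage is injected (assumed): (url) => Promise<{ status: number, body: string }>.
async function checkUrl(url, fetchPage, maxHops = 5) {
  let current = url;
  for (let hop = 0; hop <= maxHops; hop++) {
    const { status, body } = await fetchPage(current);
    if (status !== 200) {
      return { url, ok: false, reason: `${current} returned ${status}` };
    }
    const target = metaRefreshTarget(body);
    if (target === null) return { url, ok: true, finalUrl: current };
    // Resolve relative targets against the page that declared them.
    current = new URL(target, current).href;
  }
  return { url, ok: false, reason: `more than ${maxHops} meta refresh hops` };
}
```

The hop limit is there to answer the "recursive 'til no meta refresh?" question without risking an endless loop if two aliases point at each other.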

The site map saves us quite a bit of recursion, since we don't have to crawl the site to discover URLs!
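
For illustration, getting the URL list out of the site map and pointing it at the branch build might be as small as this. It assumes a flat sitemap with plain `<loc>` entries (no sitemap index, no XML entities to decode), and `sitemapUrls`, `rebase`, the `docs.example.org` host and the `localhost:1313` base are placeholders, not the real production or CI addresses.

```javascript
// Pull every <loc> value out of a sitemap XML string.
// A regex is enough for a flat sitemap; a nested sitemap index would need more.
function sitemapUrls(xml) {
  return [...xml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/g)].map((m) => m[1]);
}

// Map a production URL onto the branch build by keeping only its path.
function rebase(prodUrl, localBase) {
  const { pathname } = new URL(prodUrl);
  return new URL(pathname, localBase).href;
}

// Example with placeholder hosts:
const xml = `<urlset>
  <url><loc>https://docs.example.org/building/forms/</loc></url>
  <url><loc>https://docs.example.org/hosting/</loc></url>
</urlset>`;
const localUrls = sitemapUrls(xml).map((u) => rebase(u, 'http://localhost:1313'));
console.log(localUrls);
```

Each rebased URL can then be fetched from the CI hugo server and checked, with no crawling needed to discover pages.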
