
Replace broken links on import + add a script to download links #64

Merged
merged 4 commits into main from grants-gov-get-better
Nov 22, 2024

Conversation


@pcraig3 pcraig3 commented Nov 22, 2024

Summary

This PR does 3 things:

  1. adds a function to auto-replace known broken links on import
  2. adds a script that we can use to grab all links across all NOFOs
  3. removes an old view that would download a list of links from an individual NOFO

1. Auto-replace known broken links on import

We noticed that some links show a "Page not found" page but still return a 200-level status code. This is very annoying, and there isn't much we can do about it, since we only control our own website. The upshot is that our external link checker can't do anything with these links: if we check one of these URLs, it reports 200, all good, even though a user following the link actually sees the "Page not found" message.
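For illustration, this is roughly what a naive link check sees (a minimal sketch using the Python `requests` library; the URL is one of the bad links in the table below):

```python
import requests

# One of the known-broken grants.gov URLs from the table below.
# It renders a "Page not found" page, but the server still responds with 200,
# so checking the status code alone won't catch it.
response = requests.get("https://www.grants.gov/web/grants/search-grants.html")
print(response.status_code)  # prints 200 even though the page is effectively a 404
```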

However, we can find/replace content as we import it, so once we know which URLs behave this way, we can fix them up at import time. Right now there are 3 links we have identified:

| bad link | good link |
|--------|--------|
| [www.grants.gov/web/grants/search-grants.html](https://www.grants.gov/web/grants/search-grants.html) | [grants.gov/search-grants](https://grants.gov/search-grants) |
| [www.grants.gov/web/grants/forms/sf-424-family.html](https://www.grants.gov/web/grants/forms/sf-424-family.html) | [grants.gov/forms/forms-repository/sf-424-family](https://grants.gov/forms/forms-repository/sf-424-family) |
| [www.cdc.gov/grants/dictionary/index.html](https://www.cdc.gov/grants/dictionary/index.html) | [www.cdc.gov/grants/dictionary-of-terms/](https://www.cdc.gov/grants/dictionary-of-terms/) |

So that's super and we love it.
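As a rough sketch (not the exact code in this PR — the `KNOWN_BROKEN_LINKS` and `replace_broken_links` names are made up for illustration), the find/replace can be as simple as a dictionary of known-bad URLs applied to the imported HTML:

```python
# Hypothetical sketch of the find/replace step on import; the mapping and
# function names are illustrative, not the PR's actual implementation.
KNOWN_BROKEN_LINKS = {
    "https://www.grants.gov/web/grants/search-grants.html": "https://grants.gov/search-grants",
    "https://www.grants.gov/web/grants/forms/sf-424-family.html": "https://grants.gov/forms/forms-repository/sf-424-family",
    "https://www.cdc.gov/grants/dictionary/index.html": "https://www.cdc.gov/grants/dictionary-of-terms/",
}


def replace_broken_links(html: str) -> str:
    """Swap each known-broken URL for its working replacement in imported HTML."""
    for bad_url, good_url in KNOWN_BROKEN_LINKS.items():
        html = html.replace(bad_url, good_url)
    return html
```

The nice part of a plain mapping like this is that adding the next bad URL we discover is a one-line change.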

2. Adds a script that we can use to grab all links across all NOFOs

Pretty simple: this adds a script that loops through all non-archived NOFOs and pulls out all of their external links. That lets us analyze the links whenever we need to. (In this case, our reason was finding bad links to replace manually.)
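A minimal sketch of what such a script can look like, assuming NOFO content is available as HTML strings (the helper names and the `(nofo_id, html)` input shape below are assumptions for illustration, not the PR's actual code):

```python
# Hypothetical sketch of the link-gathering script. How NOFO HTML is fetched
# (here, a plain iterable of (nofo_id, html) pairs) is an assumption about
# the project, not the PR's actual implementation.
import csv
import sys

from bs4 import BeautifulSoup


def external_links(html):
    """Yield (url, link_text) for every external link in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a", href=True):
        if a["href"].startswith(("http://", "https://")):
            yield a["href"], a.get_text(strip=True)


def dump_links(nofo_htmls, out=sys.stdout):
    """Write one CSV row per external link across all given (nofo_id, html) pairs."""
    writer = csv.writer(out)
    writer.writerow(["nofo_id", "url", "link_text"])
    for nofo_id, html in nofo_htmls:
        for url, text in external_links(html):
            writer.writerow([nofo_id, url, text])
```

Feeding the rendered HTML of every non-archived NOFO into something like `dump_links` gives a CSV we can sort, dedupe, and grep when questions about links come up.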

3. Removes an old view that would download a list of links from an individual NOFO

We don't need it! We weren't using it (it was a one-off), and the new script handles the same functionality. So let's axe it.

The best code is no code at all.™️

@pcraig3 pcraig3 merged commit 10c73d4 into main Nov 22, 2024
4 checks passed
@pcraig3 pcraig3 deleted the grants-gov-get-better branch November 22, 2024 21:06