Waystation
Waystation is a GitHub Action that makes it easy to archive your repository's GitHub Pages site automatically in the Internet Archive's Wayback Machine.
Many projects use GitHub Pages for documentation and other purposes. GitHub Pages are wonderful, but they are not archived. To help ensure long-term access to your GitHub Pages, you may want to preserve them in the Internet Archive's Wayback Machine. That's the purpose of this GitHub Action.
Waystation (Wayback site archiving automation) automates the task of sending your project's GitHub Pages URL to the Wayback Machine. It's intended to be triggered on software releases in your repository and uses the Wayback Machine GitHub Action to send your repository's configured GitHub Pages URL to the Wayback Machine, thereby ensuring that the latest copy of your site is archived. You can change the trigger condition if needed.
GitHub is incredibly popular today, but content hosted there is not guaranteed to be permanent; moreover, GitHub has in the past changed the URLs and policies surrounding GitHub Pages, and may do so again in the future. The Wayback Machine is a free digital archive of the World Wide Web founded by the Internet Archive. Web pages saved in the Wayback Machine continue to exist even after the original project repository changes or is removed from the web, and they can be searched for, shared, and linked to normally. You can also view previous versions of a site if they were archived.
This action is available from the GitHub Marketplace. Once you find the page in the GitHub Marketplace, do the following:
- In the main branch of your repository, create a `.github/workflows` directory if this directory does not already exist.
- In the `.github/workflows` directory, create a file named `archive-github-pages.yml`.
- Paste the following content into the file:

      on:
        release:
          types: [published]

      jobs:
        Workflow:
          runs-on: ubuntu-latest
          steps:
            - uses: caltechlibrary/waystation@main
              with:
                GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

- Save the file, add it to your git repository, and commit the changes.
- (If you did the steps above outside of GitHub) Push your repository changes to GitHub.
Refer to the next section for more information.
The trigger condition that causes Waystation to run is determined by the `on` statement in your `archive-github-pages.yml` workflow file.
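As an illustration of changing the trigger, the sketch below replaces the release trigger with two other standard GitHub Actions events: a push to the `main` branch and a weekly schedule. The branch name and the cron expression are assumptions chosen for the example, not values recommended by Waystation; the `jobs` section is the same as in the installation example.

```yaml
# .github/workflows/archive-github-pages.yml
# Sketch only: the branch name and cron schedule are illustrative assumptions.
on:
  push:
    branches: [main]        # run whenever commits land on main
  schedule:
    - cron: '0 6 * * 1'     # also run every Monday at 06:00 UTC

jobs:
  Workflow:
    runs-on: ubuntu-latest
    steps:
      - uses: caltechlibrary/waystation@main
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

Scheduled workflows run only from the repository's default branch, so the file needs to be committed there for the `schedule` trigger to take effect.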
Several parameters control the behavior of this GitHub Action; they are described below.
Setting the parameter `dry_run` to `true` will cause the action to execute without sending the URL to the Wayback Machine. This is useful during testing, especially if you want to try different trigger conditions.
Here is an example workflow definition using `dry_run`:

    # .github/workflows/archive-github-pages.yml
    on:
      release:
        types: [published]

    jobs:
      Workflow:
        runs-on: ubuntu-latest
        steps:
          - uses: caltechlibrary/waystation@main
            with:
              GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
              dry_run: true

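For testing, one option is to pair `dry_run` with a trigger you can fire by hand. The sketch below assumes you temporarily swap the release trigger for GitHub's standard `workflow_dispatch` event, which lets you start the workflow manually from the repository's Actions tab. This pairing is a suggestion for experimentation, not something Waystation requires.

```yaml
# .github/workflows/archive-github-pages.yml
# Sketch only: manual trigger for test runs; nothing is sent to the Wayback Machine.
on:
  workflow_dispatch:          # start the workflow manually from the Actions tab

jobs:
  Workflow:
    runs-on: ubuntu-latest
    steps:
      - uses: caltechlibrary/waystation@main
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          dry_run: true       # skip the actual submission while testing
```

Once the workflow behaves as expected, restore the trigger you want and remove the `dry_run` line so that the URL is actually submitted.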
Setting the parameter `debug` to `true` will cause the action to print the values of the input variables and the GitHub context. This is useful for debugging the workflow.
Here is an example workflow definition using `debug`:

    on:
      release:
        types: [published]

    jobs:
      Workflow:
        runs-on: ubuntu-latest
        steps:
          - uses: caltechlibrary/waystation@main
            with:
              GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
              dry_run: true
              debug: true

This parameter corresponds to the `saveErrors` parameter in the Wayback Machine GitHub Action. A value of `true` makes the action tell the Wayback Machine to save web pages that return an HTTP status code in the 4xx or 5xx range. The default is `false`.
This parameter corresponds to the `saveOutlinks` parameter in the Wayback Machine GitHub Action. A value of `true` makes the action tell the Wayback Machine to archive external pages that are linked to from your GitHub Pages. The default in Waystation is `true` (unlike the default in the Wayback Machine GitHub Action) because the author finds this useful in producing a more complete archive of a GitHub Pages site.
This parameter corresponds to the `saveScreenshot` parameter in the Wayback Machine GitHub Action. A value of `true` makes the action tell the Wayback Machine to save a screenshot of the page located at the GitHub Pages URL. The default in Waystation is `true` (unlike the default in the Wayback Machine GitHub Action) because the author finds this useful in producing a more complete archive of a GitHub Pages site.
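The three archiving options might be set together as in the following sketch. The input names `save_errors`, `save_outlinks`, and `save_screenshot` are assumptions: they are snake_case forms of the Wayback Machine GitHub Action's parameter names, chosen by analogy with `dry_run`, so confirm them against the action's metadata file before relying on this. The values shown simply invert each documented default to make the effect visible.

```yaml
# .github/workflows/archive-github-pages.yml
# Sketch only: save_errors, save_outlinks and save_screenshot are ASSUMED input names.
on:
  release:
    types: [published]

jobs:
  Workflow:
    runs-on: ubuntu-latest
    steps:
      - uses: caltechlibrary/waystation@main
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          save_errors: true        # assumed name; also save pages returning 4xx/5xx
          save_outlinks: false     # assumed name; do not archive externally linked pages
          save_screenshot: false   # assumed name; do not save a screenshot
```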
If you find an issue, please submit it in the GitHub issue tracker for this repository.
Your help and participation in enhancing Waystation are welcome! Please visit the guidelines for contributing for some tips on getting started.
Software produced by the Caltech Library is Copyright © 2022 California Institute of Technology. This software is freely distributed under a BSD-style license. Please see the LICENSE file for more information.
This work was funded by the California Institute of Technology Library.
Waystation makes use of the excellent Wayback Machine GitHub Action by Jamie Magee.