
GitHub Action

Waystation

v1.2

Archive a repository's GitHub Pages in the Wayback Machine

Installation

Copy and paste the following snippet into your .yml file.

- name: Waystation
  uses: caltechlibrary/waystation@v1.2

Learn more about this action in caltechlibrary/waystation

Waystation is a GitHub Action that makes it easy to archive your repository's GitHub Pages site automatically in the Internet Archive's Wayback Machine.

Introduction

Many projects use GitHub Pages for documentation and other purposes. GitHub Pages are wonderful, but GitHub does not archive them. To help ensure long-term access to your GitHub Pages, you may want to preserve them in the Internet Archive's Wayback Machine. That's the purpose of this GitHub Action.

How does Waystation work?

Waystation (Wayback site archiving automation) automates the task of sending your project's GitHub Pages URL to the Wayback Machine. It is intended to be triggered by software releases in your repository. When it runs, it uses the Wayback Machine GitHub Action to send your repository's configured GitHub Pages URL to the Wayback Machine, so that the latest copy of your site is archived. You can change the trigger condition if needed.

Why would you want to use it?

GitHub is incredibly popular today, but content hosted there is not guaranteed to be permanent; moreover, GitHub has in the past changed the URLs and policies surrounding GitHub Pages, and may do so again in the future. The Wayback Machine is a free digital archive of the World Wide Web founded by the Internet Archive. Web pages saved in the Wayback Machine continue to exist even after the original project repository changes or is removed from the web, and they can be searched for, shared, and linked to normally. You can also view previous versions of a site if they were archived.

Installation

This action is available from the GitHub Marketplace. Once you have found its page in the GitHub Marketplace, do the following:

  1. In the main branch of your repository, create a .github/workflows directory if this directory does not already exist.
  2. In the .github/workflows directory, create a file named archive-github-pages.yml.
  3. Paste the following content into the file:
    on:
      release:
        types: [published]
    jobs:
      Workflow:
        runs-on: ubuntu-latest
        steps:
          - uses: caltechlibrary/waystation@main
            with:
              GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  4. Save the file, add it to your git repository, and commit the changes.
  5. If you did the steps above outside of GitHub, push your repository changes to GitHub.

Refer to the next section for more information.

Usage

The trigger condition that causes Waystation to run is determined by the on statement in your archive-github-pages.yml workflow file.
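For example, to be able to archive the site on demand as well as on every published release, you could add GitHub's standard workflow_dispatch trigger, which adds a "Run workflow" button for this workflow in the repository's Actions tab. This is a sketch based on general GitHub Actions workflow syntax, not a configuration taken from Waystation's own documentation:

```yaml
# .github/workflows/archive-github-pages.yml
on:
  # Run whenever a release is published (the default shown above).
  release:
    types: [published]
  # Standard GitHub Actions trigger: lets you start the workflow
  # manually from the Actions tab, e.g. after a docs-only update.
  workflow_dispatch:
jobs:
  Workflow:
    runs-on: ubuntu-latest
    steps:
      - uses: caltechlibrary/waystation@main
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

Pairing a new trigger with the dry_run parameter described below is a convenient way to check when the workflow fires without sending anything to the Wayback Machine.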

Several parameters control the behavior of this GitHub Action; they are described below.

dry_run (default: false)

Setting the parameter dry_run to true will cause the action to execute without sending the URL to the Wayback Machine. This is useful during testing, especially if you want to try different trigger conditions.

Here is an example workflow definition using dry_run:

# .github/workflows/archive-github-pages.yml
on:
  release:
    types: [published]
jobs:
  Workflow:
    runs-on: ubuntu-latest
    steps:
      - uses: caltechlibrary/waystation@main
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          dry_run: true

debug (default: false)

Setting the parameter debug to true will cause the action to print the values of the input variables and the GitHub context. This is useful for debugging the workflow.

Here is an example workflow definition using debug:

on:
  release:
    types: [published]
jobs:
  Workflow:
    runs-on: ubuntu-latest
    steps:
      - uses: caltechlibrary/waystation@main
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          dry_run: true
          debug: true

save_errors (default: false)

This corresponds to the parameter saveErrors in the Wayback Machine GitHub Action. A value of true will make the action tell the Wayback Machine to save web pages even if they return an HTTP error status code (4xx or 5xx). The default is false.
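A workflow that turns this option on might look like the following. It is a sketch that follows the pattern of the dry_run example above; only the save_errors line differs, and the value shown is an illustration rather than a recommendation:

```yaml
# .github/workflows/archive-github-pages.yml
on:
  release:
    types: [published]
jobs:
  Workflow:
    runs-on: ubuntu-latest
    steps:
      - uses: caltechlibrary/waystation@main
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          # Ask the Wayback Machine to save pages even when they
          # return a 4xx or 5xx HTTP status code.
          save_errors: true
```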

save_outlinks (default: true)

This corresponds to the parameter saveOutlinks in the Wayback Machine GitHub Action. A value of true will make the action tell the Wayback Machine to archive external pages that are linked to from your GitHub Pages. The default in Waystation is true (unlike the default in the Wayback Machine GitHub Action) because the author finds this useful in producing a more complete archive of a GitHub Pages site.
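If you would rather archive only your own pages and not the external pages they link to, you could turn this option off. A sketch, following the same pattern as the dry_run example above:

```yaml
# .github/workflows/archive-github-pages.yml
on:
  release:
    types: [published]
jobs:
  Workflow:
    runs-on: ubuntu-latest
    steps:
      - uses: caltechlibrary/waystation@main
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          # Override Waystation's default (true): do not ask the
          # Wayback Machine to archive externally linked pages.
          save_outlinks: false
```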

save_screenshot (default: true)

This corresponds to the parameter saveScreenshot in the Wayback Machine GitHub Action. A value of true will make the action tell the Wayback Machine to save a screenshot of the page located at the GitHub Pages URL. The default in Waystation is true (unlike the default in the Wayback Machine GitHub Action) because the author finds this useful in producing a more complete archive of a GitHub Pages site.
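If you do not need a screenshot of the site, a workflow along these lines should turn this option off. This sketch follows the same pattern as the dry_run example above:

```yaml
# .github/workflows/archive-github-pages.yml
on:
  release:
    types: [published]
jobs:
  Workflow:
    runs-on: ubuntu-latest
    steps:
      - uses: caltechlibrary/waystation@main
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          # Override Waystation's default (true): skip saving a
          # screenshot of the GitHub Pages URL.
          save_screenshot: false
```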

Getting help

If you find an issue, please submit it in the GitHub issue tracker for this repository.

Contributing

Your help and participation in enhancing Waystation are welcome! Please visit the guidelines for contributing for some tips on getting started.

License

Software produced by the Caltech Library is Copyright © 2022 California Institute of Technology. This software is freely distributed under a BSD-style license. Please see the LICENSE file for more information.

Acknowledgments

This work was funded by the California Institute of Technology Library.

Waystation makes use of the excellent Wayback Machine GitHub Action by Jamie Magee.