Skip to content

Add CI for checking for broken links manually, weekly and in PRs #5

Add CI for checking for broken links manually, weekly and in PRs

Add CI for checking for broken links manually, weekly and in PRs #5

Workflow file for this run

name: Check URLs
on:
workflow_dispatch:
schedule:
- cron: '17 5 * * 0' # 5:17 AM every Sunday
pull_request:
branches: [ main ]
env:
ignore_url_patterns: |
http://localhost:4000
https://preview.bssw.io
https://github.com/<your-github-handle>
ignore_file_patterns: |
docs/
images/
utils/
Events/
jobs:
check-urls:
runs-on: ubuntu-latest
steps:
- name: Reformat environment variables
id: setup_vars
run: |
tmp=$(echo "${{ env.ignore_url_patterns }}" | tr '\n' ',' | sed -e 's/,,$//')
echo "ignore_url_patterns=$tmp" >> $GITHUB_OUTPUT
tmp=$(echo "${{ env.ignore_file_patterns }}" | tr '\n' ',' | sed -e 's/,,$//')
echo "ignore_file_patterns=$tmp" >> $GITHUB_OUTPUT
- name: Checkout Repository
uses: actions/checkout@v4
- name: Get Changed Files (for PRs)
if: ${{ github.event_name == 'pull_request' }}
id: changed-files
uses: tj-actions/changed-files@v42
with:
separator: ','
- name: Generate lists of files to check and ignore
id: file_list
run: |
if [ "${{ github.event_name }}" = "pull_request" ]; then
echo "files=${{ steps.changed-files.outputs.all_changed_files }}" >> $GITHUB_OUTPUT
echo "ignore_file_patterns=" >> $GITHUB_OUTPUT
else
echo "files=" >> $GITHUB_OUTPUT
echo "ignore_file_patterns= ${{ steps.setup_vars.outputs.ignore_file_patterns }}" >> $GITHUB_OUTPUT
fi
- name: Check URLs in selected files
uses: urlstechie/urlchecker-action@update/0.0.35
with:
# Work only on markdown files
file_types: .md
# Choose whether to include file with no URLs in the prints.
print_all: false
# More verbose summary at the end of a run
verbose: true
# How many times to retry a failed request (defaults to 1)
retry_count: 3
# Google Forms is having enormous timeouts
timeout: 10
# Ignore certificate issues
no_check_certs: True
# Exclude these patterns from the checker
exclude_patterns: ${{ steps.setup_vars.outputs.ignore_url_patterns }}
# Exclude these dirs and files
exclude_files: "${{ steps.file_list.outputs.ignore_file_patterns }}"
# Operate only on the specific list of files selected above
include_files: "${{ steps.file_list.outputs.files }}"
#
# Description:
#
# Triggers in one of three ways; 1) manually, 2) scheduled weekly Sunday's 5:17 AM
# or 3) pull request
#
# Stores ignore pattern cases in env. variables and then reformats those (because
# they have newlines) into a comma-separated single line string that can be digested
# as inputs to other actions.
#
# For PRs, uses changed-files action to get list of changed files and passes this
# to urlchecker via `include_files param. Also, ignore file patterns is set to
# empty string for PRs because we think URLs anywhere in PRs should be checked.
#
# For scheduled or manual triggers, uses fact that empty `include_files` param
# causes urlchecker to process *all* files that match in `file_type` param but do
# not match any `exclude_files` patterns. These file patterns for exclude work
# more or less like file globs. So, specifying the initial part of the string
# for a file (path) name is sufficient to ignore the file.
#
# We include Events in file patterns to ignore because of all content we host,
# we suspect Event URLs are the most likely to go stale rather quickly **and** because
# the URL validness is important only during the short window prior to the event.
# That said, we don't want to ignore Events in PRs and we do not as per above.
#