This extension provides maintenance scripts that automate updates to an external database (a Google Sheets document) listing all of the wiki's external links and related data, in particular status data about each link's "health".
- Download the extension
- Run `composer install --no-dev` in the extension's directory
- Add `wfLoadExtension( 'KZBrokenLinks' );` to `LocalSettings.php` or your custom PHP config file
Main key | Sub-key | Default | Description |
---|---|---|---|
`$wgKZBrokenLinksGoogleConfig` | `keyPath` | empty | Local path to the Google client authentication key (JSON file) |
`$wgKZBrokenLinksGoogleConfig` | `sheetId` | empty | ID of the Google Sheets document to sync to |
`$wgKZBrokenLinksGoogleConfig` | `rateLimit` | 60 | Maximum number of Google API callouts per minute |
`$wgKZBrokenLinksHttpConfig` | `proxy` | empty | Optional proxy configuration for HTTP callouts |
`$wgKZBrokenLinksHttpConfig` | `timeout` | 30 | Timeout in seconds for HTTP callouts |
`$wgKZBrokenLinksHttpConfig` | `agent` | Kol-Zchut Broken Links HealthCheckLinks | User agent string for HTTP callouts |
`$wgKZBrokenLinksHttpConfig` | `followRedirects` | true | Whether HTTP redirects should be followed |
`$wgKZBrokenLinksHttpConfig` | `excludedProtocols` | empty | Array of protocols to exclude from link health checks (e.g., `ftp`) |
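For illustration, a `LocalSettings.php` fragment that sets some of the options above might look like the following. The key path and sheet ID are placeholders, protocols are given as bare names such as `ftp` (following the table's example), and the fragment assumes that sub-keys you leave unset keep the defaults listed in the table; if that is not the case in your version, set every sub-key explicitly.

```php
<?php
// Place alongside wfLoadExtension( 'KZBrokenLinks' ); in LocalSettings.php.
// Both values below are placeholders: substitute your own key file and sheet ID.
$wgKZBrokenLinksGoogleConfig['keyPath'] = '/path/to/google-client-key.json';
$wgKZBrokenLinksGoogleConfig['sheetId'] = 'your-google-sheet-id';
// Allow at most 60 Google API callouts per minute (the default).
$wgKZBrokenLinksGoogleConfig['rateLimit'] = 60;

// Settings for the HTTP callouts made during link health checks.
$wgKZBrokenLinksHttpConfig['timeout'] = 30;
$wgKZBrokenLinksHttpConfig['followRedirects'] = true;
// Skip FTP links entirely (assumed format: bare protocol names).
$wgKZBrokenLinksHttpConfig['excludedProtocols'] = [ 'ftp' ];
```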
The appropriate Google Sheet can be created by uploading the included `google-sheets-template.xslx` file. Make sure it is converted to a native Google Sheet; otherwise the extension won't be able to use it.
Usage: `php extensions/KZBrokenLinks/maintenance/SyncLinksSheet.php --chunksize={chunk_size} --maxlinks={maxlinks}`
Parameter | Type | Description |
---|---|---|
chunksize | Integer | Maximum number of external links to sync from MediaWiki to Google Sheets per API call (default 500) |
maxlinks | Integer | Maximum number of external links to sync before exiting (default unlimited) |
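How `--chunksize` and `--maxlinks` interact can be illustrated with a minimal PHP sketch. It is not the extension's code: `syncInChunks()` and the `$sendChunk` callback (which stands in for one Google Sheets API call) are hypothetical names, and the sketch assumes links are sent in order and that `--maxlinks` simply truncates the list.

```php
<?php
/**
 * Hypothetical sketch (not the extension's code): send links to the sheet
 * in chunks of at most $chunkSize, stopping after $maxLinks links.
 *
 * @param string[] $links     external link URLs, in sync order
 * @param int      $chunkSize mirrors --chunksize (must be >= 1)
 * @param int|null $maxLinks  mirrors --maxlinks; null means unlimited
 * @param callable $sendChunk stands in for one Google Sheets API call
 * @return int number of links synced
 */
function syncInChunks( array $links, int $chunkSize, ?int $maxLinks, callable $sendChunk ): int {
	if ( $maxLinks !== null ) {
		// Drop everything past the --maxlinks limit.
		$links = array_slice( $links, 0, $maxLinks );
	}
	$synced = 0;
	// array_chunk() yields groups of $chunkSize; the last group may be smaller.
	foreach ( array_chunk( $links, $chunkSize ) as $chunk ) {
		$sendChunk( $chunk );
		$synced += count( $chunk );
	}
	return $synced;
}

// Example: 1,200 links with the default chunk size of 500 and no --maxlinks.
$links = [];
for ( $i = 1; $i <= 1200; $i++ ) {
	$links[] = "https://example.org/page$i";
}
$callSizes = [];
$total = syncInChunks( $links, 500, null, function ( array $chunk ) use ( &$callSizes ) {
	$callSizes[] = count( $chunk ); // record the size of each simulated API call
} );
```

With these inputs the sketch makes three calls carrying 500, 500 and 200 links; passing `700` as the limit instead would stop after 700 links (one call of 500, one of 200).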
Usage: `php extensions/KZBrokenLinks/maintenance/HealthCheckLinks.php --runtime={runtime} --maxlinks={maxlinks} --batchsize={batchsize} --querysize={querysize}`
Parameter | Type | Description |
---|---|---|
runtime | Integer | Maximum number of seconds to execute before exiting (default 300) |
maxlinks | Integer | Maximum number of links to process before exiting (default unlimited) |
batchsize | Integer | Maximum number of link status rows per callout to the Google Sheets batch update API (default 20) |
querysize | Integer | Maximum number of rows to query per callout to the Google Sheets get API (default 1000) |
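The way these four options could bound a single run is sketched below in PHP. This is a hypothetical illustration, not the extension's implementation: `runHealthCheck()` is an invented name, its three callbacks stand in for the Google Sheets get call, the HTTP health check and the batch update call, and the order of checks (time budget and link count tested before each link) is an assumption.

```php
<?php
/**
 * Hypothetical sketch (not the extension's code) of how the HealthCheckLinks
 * options could bound a single run.
 *
 * @param callable $fetchRows  fn( int $offset, int $limit ): string[]  ("get" call)
 * @param callable $checkLink  fn( string $url ): string                 (status value)
 * @param callable $writeBatch fn( array $rows ): void                   ("batch update" call)
 * @return int number of links checked
 */
function runHealthCheck(
	callable $fetchRows,
	callable $checkLink,
	callable $writeBatch,
	int $runtime = 300,
	?int $maxLinks = null,
	int $batchSize = 20,
	int $querySize = 1000
): int {
	$deadline = microtime( true ) + $runtime;
	$checked = 0;
	$offset = 0;
	$pending = [];
	while ( true ) {
		// Read at most $querySize rows per "get" call.
		$rows = $fetchRows( $offset, $querySize );
		if ( $rows === [] ) {
			break; // no rows left
		}
		$offset += count( $rows );
		foreach ( $rows as $url ) {
			// Stop once the --runtime budget or the --maxlinks count is used up.
			if ( microtime( true ) >= $deadline || ( $maxLinks !== null && $checked >= $maxLinks ) ) {
				break 2;
			}
			$pending[] = [ $url, $checkLink( $url ) ];
			$checked++;
			// Write at most $batchSize status rows per "batch update" call.
			if ( count( $pending ) >= $batchSize ) {
				$writeBatch( $pending );
				$pending = [];
			}
		}
	}
	// Flush any remaining status rows before exiting.
	if ( $pending !== [] ) {
		$writeBatch( $pending );
	}
	return $checked;
}

// Example: a 45-row sheet, every link reported as "OK", default options.
$sheet = [];
for ( $i = 1; $i <= 45; $i++ ) {
	$sheet[] = "https://example.org/$i";
}
$batchSizes = [];
$checked = runHealthCheck(
	fn ( int $offset, int $limit ) => array_slice( $sheet, $offset, $limit ),
	fn ( string $url ) => 'OK',
	function ( array $rows ) use ( &$batchSizes ) {
		$batchSizes[] = count( $rows );
	}
);
```

Here the sketch checks all 45 links and writes their status rows in three batch updates of 20, 20 and 5 rows. Flushing the partial batch at exit means a run cut short by `--runtime` or `--maxlinks` does not lose results it has already computed.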