Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Generate links from the page dump instead of using the links dump.
This lets us ignore noisy links from footer sections (See Also, References, etc.) and templates, and it simplifies the pipeline. Extracting links in Go takes ~160 CPU minutes and <8GB RAM. The 14GB multistream page dump consists of many independent bzip2 streams, so it can be processed in parallel in <30 wall-clock minutes on a 4C/8T workstation, and potentially less on a large VM instance.
Showing 5 changed files with 446 additions and 213 deletions.
This file was deleted.