wat-benchmark

This repository acts as a Hello World for working with WARC files.

Its subfolders contain implementations that fetch a WARC file and search all captures from .com domains for a regex that detects YouTube links.
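As a rough illustration of the task described above, here is a minimal Python sketch that uses only the standard library. It is not taken from any of the subfolders. The regex `YOUTUBE_RE`, the hand-rolled reader `iter_warc_records`, and the helper `make_record` are all assumptions made for this example. The sketch reads an uncompressed WARC stream that is already in memory and skips the fetch step entirely.

```python
import io
import re
from urllib.parse import urlsplit

# Hypothetical pattern for illustration; the regex used in this repository may differ.
YOUTUBE_RE = re.compile(
    rb"https?://(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)[\w-]{11}"
)


def iter_warc_records(stream):
    """Yield (headers, body) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"unexpected line: {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # an empty line ends the header block
            name, _, value = line.partition(b":")
            key = name.strip().decode("ascii").lower()
            headers[key] = value.strip().decode("utf-8", "replace")
        # The record block is exactly Content-Length bytes long.
        body = stream.read(int(headers["content-length"]))
        yield headers, body


def find_youtube_links(stream):
    """Return (target URI, link) pairs for response records from .com hosts."""
    hits = []
    for headers, body in iter_warc_records(stream):
        if headers.get("warc-type") != "response":
            continue
        target = headers.get("warc-target-uri", "")
        host = urlsplit(target).hostname or ""
        if not host.endswith(".com"):
            continue
        for match in YOUTUBE_RE.finditer(body):
            hits.append((target, match.group().decode("ascii")))
    return hits


def make_record(uri, payload, warc_type="response"):
    """Build a tiny WARC record for the demo (hypothetical helper)."""
    head = (
        f"WARC/1.0\r\nWARC-Type: {warc_type}\r\n"
        f"WARC-Target-URI: {uri}\r\nContent-Length: {len(payload)}\r\n\r\n"
    ).encode("utf-8")
    return head + payload + b"\r\n\r\n"


sample = make_record(
    "http://example.com/a",
    b"<a href='https://www.youtube.com/watch?v=dQw4w9WgXcQ'>video</a>",
)
print(find_youtube_links(io.BytesIO(sample)))
```

Real WARC files are usually gzip-compressed, so in practice the stream would first be wrapped with something like `gzip.open(path, "rb")`, or read with a dedicated WARC library instead of the hand-rolled reader shown here.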

See also the blog post.

This is not bulletproof, production-ready code: I/O retries, resource cleanup and robust character decoding are omitted to focus on the WARC aspect of the code.