This repository has been archived by the owner on Dec 18, 2024. It is now read-only.

Rerun 2022_05_01 #44

Closed · 9 tasks done

rviscomi opened this issue May 10, 2022 · 7 comments
Comments

rviscomi (Member) commented May 10, 2022

  • Make a copy of the first crawl's data on GCS so it isn't overwritten @giancarloaf (see the sketch at the end of this comment)
  • Make a copy of the first crawl's data on BigQuery @giancarloaf (covered in the same sketch below)
  • Enable crawling 1 level of secondary pages @pmeenan
  • Add the ability to distinguish between primary/secondary pages in the Dataflow pipeline @giancarloaf (waiting on "Add the ability to test secondary pages" #12)
  • Add metadata to identify the original test URL to the HAR @pmeenan
  • Update the Dataflow pipeline to parse the page url, pageid, and requestid fields from the metadata above @giancarloaf
  • Flush the Pub/Sub queue before starting the crawl @giancarloaf
  • Restart the summary pipeline @giancarloaf
  • Start the second crawl using the same URLs as before @pmeenan

Anything else?
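For reference, a minimal sketch of the two backup steps using the Google Cloud Python clients. The bucket, prefix, and table IDs below are assumptions for illustration, not the real HTTP Archive resource names.

```python
# Sketch only: bucket, prefix, and table IDs are assumed, not the real names.
from google.cloud import bigquery, storage

storage_client = storage.Client()
src_bucket = storage_client.bucket("httparchive")         # assumed crawl bucket
dst_bucket = storage_client.bucket("httparchive-backup")  # assumed backup bucket

# 1. Copy the first crawl's HAR files on GCS so the rerun can't overwrite them.
for blob in storage_client.list_blobs(src_bucket, prefix="crawls/chrome-May_1_2022/"):
    src_bucket.copy_blob(blob, dst_bucket, new_name=blob.name)

# 2. Copy the first crawl's summary tables on BigQuery.
bq_client = bigquery.Client()
bq_client.copy_table(
    "httparchive.summary_pages.2022_05_01_desktop",         # assumed source table
    "httparchive.backup.2022_05_01_desktop_summary_pages",  # assumed destination
).result()  # block until the copy job finishes
```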

pmeenan (Member) commented May 10, 2022

The metadata has a tested_url field that carries the page URL independent of anything the agent might do (the fixes to report the URL correctly are also in place, but the metadata is the safest source).
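A hedged sketch of how the Dataflow pipeline could prefer that field when resolving the page URL; exactly where the metadata object sits inside the HAR is an assumption here:

```python
import json


def resolve_page_url(har_bytes: bytes) -> str | None:
    """Prefer the injected metadata's tested_url over the agent-reported URL.

    The lookup path below (log.pages[0]._metadata.tested_url) is an assumption
    for illustration; adjust it to the real HAR layout.
    """
    har = json.loads(har_bytes)
    pages = har.get("log", {}).get("pages", [])
    page = pages[0] if pages else {}
    metadata = page.get("_metadata", {})
    # Fall back to the agent-reported page entry only if the metadata is missing.
    return metadata.get("tested_url") or page.get("_URL") or page.get("title")
```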

rviscomi (Member, Author) commented:

Lighthouse is updating to 9.6 today. Is it possible to update the test agents before the crawl reruns?

pmeenan (Member) commented May 11, 2022 via email

rviscomi (Member, Author) commented May 12, 2022

@pmeenan: the crawl should be ready to restart when you see this in the morning (Thursday the 12th).

@giancarloaf and I went through the remaining TODO items at the top of this issue and we should be good to go. I left the "flush Pub/Sub queue" item unchecked because we were still seeing some lingering messages coming through from the GCS backup of the first May crawl. @giancarloaf will be monitoring the Pub/Sub messages tonight to ensure that the queue is completely flushed by morning. (If not, at worst we'll have some summary data from both crawls in BQ, which we can clear out in SQL as needed.)
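One way to flush the lingering messages, assuming they don't need to be reprocessed, is to seek the subscription to the current time, which marks everything published before that instant as acknowledged. A sketch with assumed project and subscription names:

```python
# Sketch: project and subscription names are assumptions.
import datetime

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription = subscriber.subscription_path("httparchive", "har-summary-sub")

# Seeking to "now" acknowledges every message published before this instant,
# effectively draining the backlog without processing it.
subscriber.seek(
    request={
        "subscription": subscription,
        "time": datetime.datetime.now(datetime.timezone.utc),
    }
)
```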

Update: the dashboard is still showing many messages coming through:

[screenshot: Pub/Sub dashboard showing messages still streaming in]

Update: still going strong as of 7am... I don't think we're able to start the crawl until that settles down :(

Update: a rogue process kept moving HAR files between crawls/ subdirectories and triggering Pub/Sub messages. @giancarloaf killed the process and the noise has subsided. We should be good to start the crawl.
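If summary rows from both crawls do end up in BigQuery (the contingency mentioned above), the cleanup could look something like the sketch below; the table ID and the column used to tell the two crawls apart are assumptions:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Assumed table ID and crawl-identifying column; adjust to the real schema.
query = """
    DELETE FROM `httparchive.summary_pages.2022_05_12_desktop`
    WHERE date < '2022-05-12'
"""
client.query(query).result()  # wait for the DELETE to complete
```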

pmeenan (Member) commented May 12, 2022

@rviscomi @giancarloaf, it looks like the android and chrome May 1 crawls/ directories have tests in them (from the rogue process moving things around?). Do they need to be moved into the backup folder first?

rviscomi mentioned this issue May 12, 2022
giancarloaf (Collaborator) commented:

> @rviscomi @giancarloaf, it looks like the android and chrome May 1 crawls/ directories have tests in them (from the rogue process moving things around?). Do they need to be moved into the backup folder first?

Yep, this is currently in progress using the worker VM. Rick is seeing a very slow transfer rate (~100K files per hour) and has decided it would be best to start a new crawl under a different name, to be renamed later.
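For reference, a sketch of the kind of move involved, parallelized with a thread pool since a sequential copy was the bottleneck; bucket names and prefixes are assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("httparchive")  # assumed bucket name


def move_to_backup(blob: storage.Blob) -> None:
    # Server-side copy into an assumed backup prefix, then delete the original.
    new_name = blob.name.replace("crawls/", "crawls_backup/", 1)
    bucket.copy_blob(blob, bucket, new_name=new_name)
    blob.delete()


blobs = list(client.list_blobs(bucket, prefix="crawls/chrome-May_1_2022/"))
with ThreadPoolExecutor(max_workers=32) as pool:
    # Consuming the iterator surfaces any per-object errors.
    list(pool.map(move_to_backup, blobs))
```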

I will also be restarting the streaming pipeline to incorporate changes from #49 merged earlier today.

rviscomi (Member, Author) commented:

Closing this out. We're rerunning the crawl with today's date to avoid overwriting any of the previous data.
