"How does it work?" diagrams (WIP)
Matteo Cargnelutti edited this page Mar 22, 2023 · 4 revisions
```mermaid
flowchart LR
    A[Scoop]
    B[Playwright]
    C[Chromium]
    D[Website]
    E[HTTP Proxy]
    A <--> |Controls| B
    B <--> C
    C <--> D
    A <-.-> |Capture| E <-.-> C
```
Unless specified otherwise, everything the browser "sees" is captured via the HTTP proxy, which allows for the enforcement of time and size constraints while preserving partial responses.
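To make the "constraints while preserving partial responses" idea concrete, here is a minimal sketch of a capped read. It is not Scoop's code: the `capture_body` function, its parameters, and the chunked-input model are assumptions made for illustration. The point it shows is that when a time or size limit is hit, the bytes received so far are kept rather than discarded.

```python
import time
from typing import Iterable, Tuple


def capture_body(chunks: Iterable[bytes], max_bytes: int, deadline: float) -> Tuple[bytes, bool]:
    """Collect chunks until the stream ends or a limit is hit (hypothetical helper).

    Returns (body, complete). `complete` is False when a limit cut the
    response short; the bytes received up to that point are still returned.
    """
    body = bytearray()
    for chunk in chunks:
        if time.monotonic() >= deadline:
            return bytes(body), False  # time limit reached: keep the partial body
        remaining = max_bytes - len(body)
        if len(chunk) > remaining:
            body += chunk[:remaining]  # size limit reached: keep what fits
            return bytes(body), False
        body += chunk
    return bytes(body), True


# Three 4-byte chunks against a 10-byte budget: the third chunk is truncated.
partial, complete = capture_body(
    [b"abcd", b"efgh", b"ijkl"], max_bytes=10, deadline=time.monotonic() + 5
)
print(partial, complete)
```

A proxy sits at the one place where every response passes through, which is why a check like this can be applied uniformly there instead of in each capture step.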
```mermaid
flowchart LR
    A[Scoop]
    B[curl, yt-dlp ...]
    C[Resource]
    D[HTTP Proxy]
    A <--> |Controls| B
    B <--> C
    A <-.-> |Capture| D <-.-> B
```
Unless specified otherwise, everything captured "out of band" goes through the HTTP proxy.
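One plausible way to route an external tool through a capture proxy is to pass it the proxy address on the command line. The sketch below only builds the argument lists; the proxy address and the helper names are hypothetical, and this is not necessarily how Scoop invokes these tools. It relies on the `--proxy` option that both curl and yt-dlp provide.

```python
from typing import List

# Hypothetical address of the local capture proxy (not taken from Scoop).
PROXY_URL = "http://127.0.0.1:9000"


def curl_via_proxy(url: str, proxy_url: str = PROXY_URL) -> List[str]:
    """Build a curl command whose traffic is routed through the proxy."""
    return ["curl", "--silent", "--proxy", proxy_url, url]


def ytdlp_via_proxy(url: str, proxy_url: str = PROXY_URL) -> List[str]:
    """Build a yt-dlp command whose traffic is routed through the proxy."""
    return ["yt-dlp", "--proxy", proxy_url, url]


print(curl_via_proxy("https://example.com/file.pdf"))
```

The resulting lists can be handed to `subprocess.run` as-is, which avoids shell quoting issues with URLs.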
```mermaid
flowchart TD
    A(Url)
    B(Options)
    A-->C
    B-->C
    C[Scoop class]
    C-->D
    D([Filter options])
    D-->E
    E([Filter url])
    E-->F
    F{{Ready to capture}}
```
- Filter options: Defaults are used for options that are not explicitly provided.
- Filter url: The URL must be well-formed and must not match the blocklist.
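The two filtering steps above can be sketched as follows. This is an illustration under stated assumptions, not Scoop's implementation: the option names, the default values, and the blocklist patterns are all hypothetical, and the blocklist is matched against the hostname only.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

# Hypothetical defaults and blocklist, for illustration only.
DEFAULTS = {"screenshot": True, "dom_snapshot": False, "capture_timeout_ms": 60_000}
BLOCKLIST = ["localhost", "127.*", "*.internal"]


def filter_options(options: dict) -> dict:
    """Use the default for every option that was not explicitly provided."""
    return {**DEFAULTS, **options}


def filter_url(url: str) -> str:
    """Reject URLs that are malformed or whose host matches the blocklist."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"Invalid URL: {url!r}")
    if any(fnmatch(parsed.hostname, pattern) for pattern in BLOCKLIST):
        raise ValueError(f"Blocked URL: {url!r}")
    return url


# Same order as the diagram: options first, then the URL.
options = filter_options({"screenshot": False})
url = filter_url("https://example.com/")
print(options, url)
```

Failing early at this stage means no browser or proxy is started for a request that could never be captured.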
```mermaid
stateDiagram-v2
    state "Start Intercepter" as intercepter
    state "Detect non-web resource" as nonwebdetect
    state "Capture of non-web resource" as nonwebcapture
    state "Initial page load" as pageload
    state "Capture page info" as pageinfo
    state "Browser scripts" as browserscripts
    state "Network idle" as networkidle
    state "Scroll up" as scrollup
    state "Screenshot" as screenshot
    state "DOM snapshot*" as domsnapshot
    state "PDF snapshot*" as pdfsnapshot
    state "Capture video(s) as attachments" as capturevideo
    state "Detect noarchive directive" as detectnoarchive
    state "Capture of certificates" as certscapture
    state "Gather Provenance Info" as provenanceinfo
    state "Teardown" as teardown
    [*] --> intercepter
    intercepter --> nonwebdetect
    nonwebdetect --> nonwebcapture
    nonwebdetect --> pageload
    nonwebcapture --> certscapture
    pageload --> pageinfo
    pageinfo --> browserscripts
    browserscripts --> networkidle
    networkidle --> scrollup
    scrollup --> screenshot
    screenshot --> domsnapshot
    domsnapshot --> pdfsnapshot
    pdfsnapshot --> capturevideo
    capturevideo --> detectnoarchive
    detectnoarchive --> certscapture
    certscapture --> provenanceinfo
    provenanceinfo --> teardown
    teardown --> [*]
```
- Steps marked with `*` are deactivated by default.
- Unless specified otherwise
- Capture state is used to determine whether the next step should run.
- Each step counts towards the overall capture time and size limits, unless specified otherwise via an option flag. See options list for details.
- At the end of this capture process, Scoop holds everything it captured in memory, as state of the Scoop class.
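The notes above describe a pipeline in which a shared capture state gates each step and every step draws on a common time budget. A minimal sketch of that pattern follows; the `State` values, the `run_steps` helper, and the dictionary used to hold results in memory are hypothetical names chosen for illustration, not Scoop's API.

```python
import time
from enum import Enum, auto
from typing import Callable, List, Tuple


class State(Enum):
    CAPTURE = auto()  # steps keep running
    PARTIAL = auto()  # a limit was reached: remaining steps are skipped


def run_steps(steps: List[Tuple[str, Callable[[dict], None]]], timeout_s: float) -> dict:
    """Run steps in order; the shared state decides whether the next one runs."""
    capture = {"state": State.CAPTURE, "ran": [], "skipped": []}
    deadline = time.monotonic() + timeout_s  # one budget shared by all steps
    for name, step in steps:
        if capture["state"] is State.CAPTURE and time.monotonic() >= deadline:
            capture["state"] = State.PARTIAL
        if capture["state"] is not State.CAPTURE:
            capture["skipped"].append(name)
            continue
        step(capture)
        capture["ran"].append(name)
    return capture  # everything gathered stays in memory on this object


def hit_size_limit(capture: dict) -> None:
    # Simulates a step during which the size limit is reached.
    capture["state"] = State.PARTIAL


result = run_steps(
    [
        ("Initial page load", lambda c: None),
        ("Browser scripts", hit_size_limit),
        ("Screenshot", lambda c: None),
    ],
    timeout_s=30,
)
print(result["ran"], result["skipped"])
```

Here the second step flips the state, so the third step is recorded as skipped instead of being executed.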
🚧🚧🚧