Make principal web archive capture optional? #25

Open
matteocargnelutti opened this issue Oct 28, 2022 · 5 comments

Comments

@matteocargnelutti
Collaborator

matteocargnelutti commented Oct 28, 2022

Should it be possible to skip the web capture step?

Potential use case: capturing only the provenance summary, screenshot, PDF snapshot, and video extraction for a given web page.

@matteocargnelutti matteocargnelutti changed the title Feature | Make principal web archive capture optional (?) Make principal web archive capture optional (?) Feb 19, 2023
@matteocargnelutti matteocargnelutti changed the title Make principal web archive capture optional (?) Make principal web archive capture optional? Mar 8, 2023
@edsu

edsu commented Jan 25, 2024

Is the idea that it would cut down on the amount of storage?

@mdellabitta

I can't address your question, but wanted to say: Nice to see you here, @edsu!

@matteocargnelutti
Collaborator Author

Hi @edsu!

> Is the idea that it would cut down on the amount of storage?

It is more to account for use cases that do not revolve around capturing HTTP exchanges in a WARC.
For example, some users might just want to make a PDF capture or screenshot of a web page using Scoop, and only care about that artifact.

@edsu

edsu commented Jan 25, 2024

But don't you need to do the HTTP exchanges to generate the screenshot?

@matteocargnelutti
Collaborator Author

@edsu Yes and no.

  • Yes: the HTTP exchanges still pass through the proxy as Scoop navigates to the page to take the screenshot.
  • No: if I am only interested in the screenshot, I don't need to record these HTTP exchanges, and I can also skip some intermediate steps, such as some of the browser behaviors.
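
To make the "yes and no" distinction concrete, here is a minimal sketch in Python. It is not Scoop's code or API: `CaptureOptions`, `PassThroughProxy`, `record_exchanges`, and `capture` are hypothetical names invented for illustration, and the proxy is a toy that handles in-memory bytes rather than real network traffic. The point it shows is that forwarding traffic (needed to render the page) and recording traffic (needed only for the web archive) can be separated behind a flag.

```python
from dataclasses import dataclass, field


@dataclass
class CaptureOptions:
    """Hypothetical options; these names are NOT Scoop's actual API."""
    record_exchanges: bool = True  # keep HTTP exchanges for the web archive
    screenshot: bool = True        # produce a screenshot artifact


@dataclass
class PassThroughProxy:
    """Toy stand-in for an intercepting proxy (no real networking)."""
    options: CaptureOptions
    recorded: list = field(default_factory=list)
    forwarded: int = 0

    def handle(self, url: str, body: bytes) -> bytes:
        # Traffic always flows through, so the browser can render the page.
        self.forwarded += 1
        # Recording is the optional part.
        if self.options.record_exchanges:
            self.recorded.append((url, body))
        return body


def capture(urls, options):
    """Simulate a capture and return the proxy plus the artifact names."""
    proxy = PassThroughProxy(options)
    for url in urls:
        proxy.handle(url, b"<html></html>")
    artifacts = []
    if options.screenshot:
        artifacts.append("screenshot.png")
    if options.record_exchanges:
        artifacts.append("archive.warc")
    return proxy, artifacts
```

With `record_exchanges=False`, every request is still forwarded, but nothing is stored and only the screenshot artifact is listed. Skipping intermediate steps such as browser behaviors would be a separate decision that this sketch does not model.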
