
[Bug]: a list of different QA issues #2336

Open
tuehlarsen opened this issue Jan 23, 2025 · 0 comments
Labels
bug Something isn't working

tuehlarsen commented Jan 23, 2025

Browsertrix Version

v1.13.2-a21b2ff

What did you expect to happen? What happened instead?

First of all - the QA beta setup is fantastic. There are, however, some issues, scaling problems, and new wishes.
Thanks for your great work! I have not split these up into separate issues - sorry for that...

QA of the tv2.dk front page with 1 hop out, using a browser profile with an accepted cookie banner (the crawl took 16 minutes with 6 crawler windows).
The QA analysis took 48 minutes for 183 pages and 3-4 GB.

1) The replay of the missing video hangs forever. I don't know why, and I couldn't reproduce it for most of the other pages. (see screenshot)

2) It does not save when I label a page as "not OK" in the left column: (see screenshot)
But it has saved it here: (see screenshot)

3) This page is OK even though the screen comparison says 60% differs: (see screenshot)

4) If you click page refresh, all labels are lost: (see screenshot)
But if you go back to Crawls and click QA again, they are back: (see screenshot)

5) This one is actually OK - you can replay the page, and only one page has this issue: (see screenshot)

6) Here the bad screenshot score seems to be caused by different load positions in the screen comparison: (see screenshot)
There are a lot of examples like this.

Why has the time changed by 1 hour, and why is the text missing during the text analysis? (see screenshot)
How can I fix or update the missing parts?

  1. I'm most interested in the biggest issues and how to fix them. I need some way to focus my QA on those, and some GUI support for it.

  2. I also need the possibility to update or label many pages in one step.

  3. How can I set a crawl to be automatically QA-analysed after it has been crawled? (A possible workaround is sketched after this list.)

  4. I would also like to run QA analysis on big jobs with many hundreds of GB and thousands of pages - I don't think that is possible today without reserving our local platform for days of QA analysis. Will that be possible in the future? Any ideas on how we can separate the I/O and CPU usage from the crawling platform?
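
For item 3, here is a minimal sketch of one way QA analysis might be triggered automatically once a crawl finishes, by polling the Browsertrix API from a small script. The base URL, token, ids, and especially the `/qa/start` endpoint path and response fields are assumptions on my part, not confirmed Browsertrix API - they would need to be checked against the API docs of the actual deployment.

```python
# Hypothetical sketch: start QA analysis as soon as a crawl finishes.
# The /qa/start endpoint and the "state" values are assumptions, not a
# confirmed Browsertrix API - verify against your deployment's API docs.
import time
import requests

API = "https://browsertrix.example.org"  # assumed base URL of the deployment
TOKEN = "..."                            # API token (placeholder)
ORG_ID = "..."                           # organization id (placeholder)
CRAWL_ID = "..."                         # id of the crawl to QA (placeholder)

headers = {"Authorization": f"Bearer {TOKEN}"}

# Poll until the crawl reports a terminal state.
while True:
    resp = requests.get(f"{API}/api/orgs/{ORG_ID}/crawls/{CRAWL_ID}", headers=headers)
    resp.raise_for_status()
    state = resp.json().get("state", "")
    if state in ("complete", "stopped", "failed"):
        break
    time.sleep(60)

# Kick off the QA analysis run for the finished crawl (assumed endpoint).
if state == "complete":
    resp = requests.post(f"{API}/api/orgs/{ORG_ID}/crawls/{CRAWL_ID}/qa/start", headers=headers)
    resp.raise_for_status()
    print("QA run started:", resp.json())
```

Run from cron or a small scheduled job, something like this could approximate "auto-QA after crawl" until it is supported natively.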

Here are some further tests with a bigger tv2.dk crawl (>100 GB); see the attached PDF:

screencapture-kb-dk-atlassian-net-wiki-spaces-NARK-pages-599589388-Considerations-regarding-automatic-and-manual-QA-of-deeper-tv2-dk-harvesting-2025-02-12-15_24_51.pdf

Reproduction instructions

see above

Screenshots / Video

No response

Environment

No response

Additional details

No response
