
[Bug]: a list of different QA issues #2336

Open
tuehlarsen opened this issue Jan 23, 2025 · 0 comments
Labels
bug Something isn't working

tuehlarsen commented Jan 23, 2025

Browsertrix Version

v1.13.2-a21b2ff

What did you expect to happen? What happened instead?

First of all - the QA beta setup is fantastic. There are, however, some issues, scaling problems, and new wishes.
Thanks for your great work! I have not split these up into separate issues - sorry for that...

QA of the tv2.dk front page with 1 hop out, using a browser profile with an accepted cookie banner (the crawl took 16 minutes with 6 crawler windows).
The QA analysis took 48 minutes for 183 pages and 3-4 GB.

1) The replay of the missing video hangs forever. I don't know why, and I couldn't reproduce it for most of the other pages. (see screenshot)

2) It does not save when I label a page as "not OK" in the left column: (see screenshot)
But it has saved it here: (see screenshot)

3) This page is OK even though the screen comparison says 60% differs: (see screenshot)

4) If you click page refresh, all labels are lost: (see screenshot)
But if you go back to Crawls and click QA again, they are back: (see screenshot)

5) This one is actually OK - you can replay the page, and only one page has this issue: (see screenshot)

6) Here the bad screenshot score seems to be caused by different load positions in the screen comparison: (see screenshot)
There are a lot of examples like this.

Why has the time changed by 1 hour, and why is the text missing during the text analysis? (see screenshot)
How can I fix or update the missing parts?

  1. I'm most interested in the biggest issues and how to fix them. I need some way to focus my QA on those, and some GUI support for it.

  2. I also need the possibility to update or label many pages in one step.

  3. How can I set a crawl to be automatically QA-analysed after it has been crawled? (A possible workaround is sketched after this list.)

  4. I would also like to run QA analysis on big jobs with many hundreds of GB and thousands of pages - I don't think that is possible today without reserving our local platform for days of QA analysis. Will that be possible in the future? Any ideas on how we can separate the I/O and CPU usage from the crawling platform?
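
For item 3, here is a minimal sketch of one way QA analysis might be triggered automatically once a crawl finishes, by polling the Browsertrix API from a small script. The base URL, token, ids, and especially the `/qa/start` endpoint path and response fields are assumptions on my part, not confirmed Browsertrix API - they would need to be checked against the API docs of the actual deployment.

```python
# Hypothetical sketch: start QA analysis as soon as a crawl finishes.
# The /qa/start endpoint and the "state" values are assumptions, not a
# confirmed Browsertrix API - verify against your deployment's API docs.
import time
import requests

API = "https://browsertrix.example.org"  # assumed base URL of the deployment
TOKEN = "..."                            # API token (placeholder)
ORG_ID = "..."                           # organization id (placeholder)
CRAWL_ID = "..."                         # id of the crawl to QA (placeholder)

headers = {"Authorization": f"Bearer {TOKEN}"}

# Poll until the crawl reports a terminal state.
while True:
    resp = requests.get(f"{API}/api/orgs/{ORG_ID}/crawls/{CRAWL_ID}", headers=headers)
    resp.raise_for_status()
    state = resp.json().get("state", "")
    if state in ("complete", "stopped", "failed"):
        break
    time.sleep(60)

# Kick off the QA analysis run for the finished crawl (assumed endpoint).
if state == "complete":
    resp = requests.post(f"{API}/api/orgs/{ORG_ID}/crawls/{CRAWL_ID}/qa/start", headers=headers)
    resp.raise_for_status()
    print("QA run started:", resp.json())
```

Run from cron or a small scheduled job, something like this could approximate "auto-QA after crawl" until it is supported natively.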

Here are some further tests with a bigger tv2.dk crawl (>100 GB); see the attached PDF:

screencapture-kb-dk-atlassian-net-wiki-spaces-NARK-pages-599589388-Considerations-regarding-automatic-and-manual-QA-of-deeper-tv2-dk-harvesting-2025-02-12-15_24_51.pdf

Reproduction instructions

see above

Screenshots / Video

No response

Environment

No response

Additional details

No response
