
[Benchmarking Framework] Establish e2e benchmarking framework #255

Closed
zdeveloper opened this issue Sep 26, 2024 · 2 comments · Fixed by #318
Comments

@zdeveloper
Collaborator

zdeveloper commented Sep 26, 2024

Establish an e2e benchmarking framework using the existing reportvision-dataset-1.
We want to automate the measurement of accuracy and speed when running that dataset through the OCR pipeline.

Acceptance Criteria

  • Write automated tests that measure the e2e accuracy and run time of ReportVision OCR against reportvision-dataset-1.
  • Also measure confidence accuracy, i.e. at which confidence level the results stop being correct.
  • Create the templates manually or through the frontend; template creation should not be part of the benchmark.
  • Store the benchmark, with all required data and scripts, in benchmark/reportvision-dataset-1, with clear documentation on how to run it.
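
As a rough illustration of the first two criteria, the sketch below times a full pass over a labelled dataset, scores per-field accuracy, and groups results by confidence so you can see where correctness drops off. It is a minimal sketch under stated assumptions, not the ReportVision implementation: the `ocr` callable, its `{field: (text, confidence)}` return shape, the `labels` dictionary, `run_benchmark`, `BenchmarkSummary`, and the use of exact string match as "correct" are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical OCR output shape: field name -> (predicted text, confidence in [0, 1]).
OcrResult = dict[str, tuple[str, float]]


@dataclass
class BenchmarkSummary:
    total_fields: int = 0
    correct_fields: int = 0
    elapsed_seconds: float = 0.0
    # Bucket lower bound (e.g. 0.9 covers [0.9, 1.0]) -> [correct, total].
    by_confidence: dict[float, list[int]] = field(default_factory=dict)

    @property
    def accuracy(self) -> float:
        return self.correct_fields / self.total_fields if self.total_fields else 0.0


def run_benchmark(
    ocr: Callable[[str], OcrResult],
    labels: dict[str, dict[str, str]],
    n_buckets: int = 10,
) -> BenchmarkSummary:
    """Time `ocr` over a labelled dataset and score exact-match accuracy per field.

    `labels` maps each image path to its ground-truth {field: text} pairs
    (an assumed format, not necessarily the real labels.json layout).
    """
    summary = BenchmarkSummary()
    start = time.perf_counter()
    for image_path, expected_fields in labels.items():
        predicted = ocr(image_path)
        for name, expected in expected_fields.items():
            # A field the OCR did not return counts as wrong with zero confidence.
            text, confidence = predicted.get(name, ("", 0.0))
            ok = text.strip() == expected.strip()
            summary.total_fields += 1
            summary.correct_fields += int(ok)
            # Bucket by confidence to see below which level results go wrong.
            idx = min(int(confidence * n_buckets), n_buckets - 1)
            bucket = summary.by_confidence.setdefault(idx / n_buckets, [0, 0])
            bucket[0] += int(ok)
            bucket[1] += 1
    summary.elapsed_seconds = time.perf_counter() - start
    return summary


# Usage with a stubbed OCR function standing in for the real pipeline.
fake = {"a.png": {"name": ("JOHN DOE", 0.97), "dob": ("01/02/1990", 0.42)}}
summary = run_benchmark(lambda p: fake[p], {"a.png": {"name": "JOHN DOE", "dob": "01/02/1980"}})
print(summary.accuracy, summary.by_confidence)  # 0.5 {0.9: [1, 1], 0.4: [0, 1]}
```

Passing the OCR step in as a callable keeps the scoring and timing logic independent of how the pipeline is invoked, which also makes it easy to swap in a stub for testing.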

Additional context

  • For Python example tests, see the OCR model benchmark tests.
@zdeveloper zdeveloper added the OCR label Sep 26, 2024
@bora-skylight bora-skylight changed the title Establish e2e benchmarking framework [Benchmarking Framework] Establish e2e benchmarking framework Sep 26, 2024
@zdeveloper
Collaborator Author

Per @schreiaj, please save the segmentation templates so they are also standardized across every run.

@arinkulshi-skylight
Collaborator

arinkulshi-skylight commented Oct 15, 2024

Steps taken so far to address this ticket:

  • Created segmentation templates for the datasets and updated the output of the labels.json file
  • Created and implemented a batch segmentation and OCR processing script
  • Added time measurement to the OCR processing script
  • Created batch metrics for the OCR pipeline
  • Created a CSV of Levenshtein distance ("leven") and confidence for values where the Levenshtein distance is greater than 1
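
The last step above could look roughly like the following sketch. It assumes "leven" refers to Levenshtein edit distance and uses a pure-Python dynamic-programming implementation rather than any particular library; the row shape, column names, and the `levenshtein` and `write_error_csv` helpers are hypothetical and not taken from the actual script. Only the "greater than 1" threshold comes from the comment.

```python
import csv
import io
from typing import Iterable, TextIO


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if chars match)
            ))
        previous = current
    return previous[-1]


# Hypothetical row shape: (image, field, expected text, predicted text, confidence).
Row = tuple[str, str, str, str, float]


def write_error_csv(rows: Iterable[Row], out: TextIO, threshold: int = 1) -> int:
    """Write rows whose Levenshtein distance exceeds `threshold`; return the count written."""
    writer = csv.writer(out)
    writer.writerow(["image", "field", "expected", "predicted", "levenshtein", "confidence"])
    written = 0
    for image, name, expected, predicted, confidence in rows:
        distance = levenshtein(expected, predicted)
        if distance > threshold:
            writer.writerow([image, name, expected, predicted, distance, confidence])
            written += 1
    return written


# Usage: only the "Atlanta" -> "Atl" row (distance 4) exceeds the threshold of 1.
rows = [
    ("a.png", "name", "JOHN DOE", "JOHN DOE", 0.97),
    ("a.png", "dob", "01/02/1980", "01/02/1990", 0.42),
    ("b.png", "city", "Atlanta", "Atl", 0.35),
]
buf = io.StringIO()
count = write_error_csv(rows, buf)
print(count)
print(buf.getvalue())
```

Keeping the confidence next to each distance in the CSV makes it straightforward to check whether high edit distances cluster at low confidence values.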

3 participants