improve progress in normalize and load steps #1567

Open
rudolfix opened this issue Jul 9, 2024 · 0 comments
Labels
community This issue came from slack community workspace tech-debt Leftovers from previous sprint that should be fixed over time

Comments

Collaborator

rudolfix commented Jul 9, 2024

Background
Progress reporting in the normalize and load steps is far from perfect.

  1. In normalize, we report progress at the file level, but it is only updated when a worker process finishes.
  2. In load, the reported metrics do not survive restarts (see implement LoadInfo and ExtractInfo missing tracing #853).

Tasks
Step 1. Fix normalize:

  • Use the metrics collected in extract (per job and resource) to correctly report processed rows per resource (where we also have the total number of records).
  • Right now there is no communication between the worker and the main process, but we need to start reporting metrics back, so we need to update
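
A minimal sketch of how a worker could report row counts back while it is still running, rather than only when it finishes. This is not dlt code: `normalize_file`, `drain_progress`, `format_progress` and the `(resource, rows)` message shape are all hypothetical. The queue is assumed to be something like `multiprocessing.Manager().Queue()` passed to pool workers; a plain `queue.Queue` has the same `put`/`get_nowait` interface, which keeps the sketch easy to try in one process.

```python
import queue
from collections import Counter
from typing import Dict, Iterable


def normalize_file(resource: str, rows: Iterable[dict], progress_q, report_every: int = 1000) -> int:
    """Worker side (hypothetical): report processed rows in batches."""
    pending = 0
    total = 0
    for _row in rows:
        # ... the actual normalization of the row would happen here ...
        pending += 1
        total += 1
        if pending >= report_every:
            progress_q.put((resource, pending))
            pending = 0
    if pending:
        # flush the remainder so the final count is exact
        progress_q.put((resource, pending))
    return total


def drain_progress(progress_q, processed: Counter) -> Counter:
    """Main-process side (hypothetical): fold queued messages into per-resource counters."""
    while True:
        try:
            resource, count = progress_q.get_nowait()
        except queue.Empty:
            return processed
        processed[resource] += count


def format_progress(processed: Counter, expected: Dict[str, int]) -> Dict[str, str]:
    """Show 'done/total' when the total is known from extract, otherwise only 'done'."""
    result = {}
    for resource, done in processed.items():
        total = expected.get(resource)
        result[resource] = f"{done}/{total}" if total is not None else str(done)
    return result
```

The main process would call `drain_progress` periodically (for example each time it polls the pool) and pass the result to whatever progress collector is in use.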

Step 2. Fix load:

  • See implement LoadInfo and ExtractInfo missing tracing #853. Use the package state to track the elapsed times (job created, job started, job stopped).
  • We are interested in displaying the following metrics: jobs processed, average elapsed time, and average lag (from job created to job started).
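
The three metrics above can be derived from per-job timestamps roughly as follows. This is only a sketch: `JobTimes` and `summarize_load` are hypothetical names, and it assumes the timestamps were stored (for example in the package state) as seconds since the epoch.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, Optional


@dataclass
class JobTimes:
    """Hypothetical per-job record; started_at/finished_at stay None until known."""
    created_at: float
    started_at: Optional[float] = None
    finished_at: Optional[float] = None


def summarize_load(jobs: Iterable[JobTimes]) -> Dict[str, Optional[float]]:
    jobs = list(jobs)
    started = [j for j in jobs if j.started_at is not None]
    done = [j for j in started if j.finished_at is not None]
    return {
        # a job counts as processed once it has both a start and a stop time
        "jobs_processed": len(done),
        # elapsed: start -> stop, averaged over finished jobs
        "avg_elapsed": mean(j.finished_at - j.started_at for j in done) if done else None,
        # lag: created -> started, averaged over all jobs that have started
        "avg_lag": mean(j.started_at - j.created_at for j in started) if started else None,
    }
```

Because the summary is computed from stored timestamps rather than in-memory counters, it can be rebuilt after a restart as long as the timestamps themselves were persisted.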

Implementation

  1. You'll need to use the package state to store extract metrics (ExtractInfo) and normalize metrics.
  2. If those elements are not present in the state, you must fall back gracefully, i.e. report only the progress of the files. The job processing must be plain: if there are files, they will be processed even if the state is not present.
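
One possible shape for the graceful fallback, as a sketch only. The `extract_metrics` key and both function names are assumptions made for illustration, not the actual package state layout: when per-resource row totals are missing or malformed, the label degrades to file-level progress and processing is never blocked.

```python
from typing import Any, Dict, Optional


def expected_rows(package_state: Optional[Dict[str, Any]]) -> Optional[Dict[str, int]]:
    """Return per-resource row totals, or None if the state has nothing usable."""
    metrics = (package_state or {}).get("extract_metrics")  # hypothetical key
    if not isinstance(metrics, dict) or not metrics:
        return None
    try:
        return {str(name): int(count) for name, count in metrics.items()}
    except (TypeError, ValueError):
        # malformed metrics are treated the same as missing metrics
        return None


def progress_label(files_done: int, files_total: int, rows_done: int,
                   package_state: Optional[Dict[str, Any]]) -> str:
    """File-level progress is always reported; row-level only when totals exist."""
    label = f"files {files_done}/{files_total}"
    totals = expected_rows(package_state)
    if totals is None:
        return label
    return f"{label}, rows {rows_done}/{sum(totals.values())}"
```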

ADDITIONAL THOUGHTS (@IlyaFaer ):
There are two different cases:

  • We extract and then normalize the data: in this case we can take the row counts from ExtractInfo.
  • We normalize data that was extracted earlier.
@rudolfix rudolfix added community This issue came from slack community workspace tech-debt Leftovers from previous sprint that should be fixed over time labels Jul 9, 2024
@IlyaFaer IlyaFaer self-assigned this Jul 25, 2024
Projects
Status: Todo
Development

No branches or pull requests

2 participants