tablemetadata: track more job metrics and update job progress while running #131426
Conversation
Branch force-pushed from 229373d to d396826.
Reviewed 1 of 6 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angles-n-daemons, @kyle-a-wong, and @xinhaoz)
-- commits
line 3 at r1:
It's helpful to have a commit message even on first review so I can compare your stated intent to the code.
pkg/sql/tablemetadatacache/table_metadata_updater.go
line 49 at r1 (raw file):
) (updated int, err error) {
	it := newTableMetadataBatchIterator(u.ie)
	estimatedRowsToUpdate, err := u.getRowsToUpdateCount(ctx)
is there no way to infer the estimate from the iterator?
pkg/sql/tablemetadatacache/update_table_metadata_cache_job.go
line 185 at r1 (raw file):
}
resumer.updateProgress(ctx, .01)
why is this necessary?
pkg/sql/tablemetadatacache/update_table_metadata_cache_job.go
line 197 at r1 (raw file):
return func(ctx context.Context, totalRowsToUpdate int, rowsUpdated int) {
	batchNum++
	// Only update progress if the update will happen in more than 10 batches.
is there a cost to doing lots of updates that's not worth paying? I think explaining why you have this shortcut here is helpful.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angles-n-daemons, @kyle-a-wong, and @xinhaoz)
pkg/sql/tablemetadatacache/update_table_metadata_cache_job_test.go
line 100 at r1 (raw file):
}

func TestUpdateTableMetadataCacheProgressUpdate(t *testing.T) {
This is a pretty heavy e2e test. I assume we already have coverage for the job directly; could we, for instance, pass a fake resumer to the real callback and test it that way?
Branch force-pushed from d396826 to b6ea213.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angles-n-daemons, @dhartunian, and @xinhaoz)
pkg/sql/tablemetadatacache/table_metadata_updater.go
line 49 at r1 (raw file):
Previously, dhartunian (David Hartunian) wrote…
is there no way to infer the estimate from the iterator?
As it stands, there isn't a way to infer this. The iterator is essentially just using cursor-based pagination and continues querying until no more results are found.
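To illustrate the shape being described, here is a minimal, self-contained Go sketch (made-up data, not the PR's actual iterator): the loop only discovers it is done when a page comes back empty, so any total used for progress reporting has to come from a separate count query up front.

```go
package main

import "fmt"

// Hedged sketch of cursor-based pagination: the iterator cannot report a
// total up front, so the estimate comes from a separate "count" step
// (a stand-in for getRowsToUpdateCount).
func main() {
	table := []int{1, 2, 3, 4, 5, 6, 7}

	// Upfront estimate from a separate count query.
	estimated := len(table)

	const batchSize = 3
	cursor, updated := 0, 0
	for {
		end := cursor + batchSize
		if end > len(table) {
			end = len(table)
		}
		batch := table[cursor:end] // next page after the cursor
		if len(batch) == 0 {
			break // the iterator only learns it is done when a page is empty
		}
		cursor = end
		updated += len(batch)
		fmt.Printf("updated %d of ~%d rows\n", updated, estimated)
	}
}
```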
pkg/sql/tablemetadatacache/update_table_metadata_cache_job.go
line 185 at r1 (raw file):
Previously, dhartunian (David Hartunian) wrote…
why is this necessary?
I guess it's not necessary; I think it's a nice-to-have to show that the job has done something.
pkg/sql/tablemetadatacache/update_table_metadata_cache_job.go
line 197 at r1 (raw file):
Previously, dhartunian (David Hartunian) wrote…
is there a cost to doing lots of updates that's not worth paying? I think explaining why you have this shortcut here is helpful.
Updated to provide a comment explaining why. Ultimately, it feels unnecessary to constantly update the progress at such a granular level if customers aren't going to be checking progress at such a high frequency (think <10ms).
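For readers skimming the thread, a rough sketch of the every-nth-batch throttle under discussion; `batchesPerProgressUpdate` is taken from the commit message below, everything else (names, threshold value) is illustrative rather than the PR's exact code.

```go
package main

import "fmt"

// Persist progress only every batchesPerProgressUpdate batches, since each
// persisted update is a write to the jobs table and individual batches
// finish quickly.
const batchesPerProgressUpdate = 10

func makeOnBatchUpdated(updateProgress func(fraction float64)) func(total, updated int) {
	batchNum := 0
	return func(totalRowsToUpdate, rowsUpdated int) {
		batchNum++
		if batchNum%batchesPerProgressUpdate != 0 {
			return // skip the write; progress is checkpointed every nth batch
		}
		updateProgress(float64(rowsUpdated) / float64(totalRowsToUpdate))
	}
}

func main() {
	onBatch := makeOnBatchUpdated(func(f float64) { fmt.Printf("progress: %.2f\n", f) })
	for i := 1; i <= 40; i++ {
		onBatch(400, i*10)
	}
}
```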
pkg/sql/tablemetadatacache/update_table_metadata_cache_job_test.go
line 100 at r1 (raw file):
Previously, dhartunian (David Hartunian) wrote…
This is a pretty heavy e2e test. I assume we already have coverage for the job directly; could we, for instance, pass a fake resumer to the real callback and test it that way?
Rewrote the test to not require full e2e testing, as you suggested.
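A rough illustration of what that lighter test shape can look like, driving the progress callback directly with a recording fake instead of a full end-to-end job run; identifiers and threshold are hypothetical, not copied from the PR's actual test.

```go
package main

import "testing"

// Exercise the throttled progress callback in isolation with a fake
// progress sink rather than spinning up a cluster.
func TestProgressCallbackThrottles(t *testing.T) {
	const batchesPerProgressUpdate = 10
	var writes []float64
	updateProgress := func(f float64) { writes = append(writes, f) } // fake resumer hook

	batchNum := 0
	onBatchUpdated := func(total, updated int) {
		batchNum++
		if batchNum%batchesPerProgressUpdate == 0 {
			updateProgress(float64(updated) / float64(total))
		}
	}

	for i := 1; i <= 25; i++ {
		onBatchUpdated(250, i*10)
	}
	if len(writes) != 2 { // only batches 10 and 20 should persist progress
		t.Fatalf("expected 2 progress writes, got %d", len(writes))
	}
}
```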
Branch force-pushed from b6ea213 to 9b22fd0.
Branch force-pushed from bc58ef4 to 1a7b2c8.
First commit.
Branch force-pushed from 1a7b2c8 to f3ec2b1.
// newTableMetadataUpdater creates a new iTableMetadataUpdater.
var newTableMetadataUpdater = func(job *jobs.Job, metrics *TableMetadataUpdateJobMetrics, ie isql.Executor) iTableMetadataUpdater {
Can we continue to return the concrete type here? I believe it's typically preferred over returning the interface type.
Done.
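As a side note on the "return concrete types" point, a small sketch of the idiom with simplified stand-in types (not the PR's actual ones): the constructor returns the concrete type, a compile-time assertion proves it satisfies the interface, and callers that only want the interface can still take it.

```go
package main

import (
	"context"
	"fmt"
)

type updater interface {
	RunUpdater(ctx context.Context) error
}

type tableUpdater struct{ name string }

func (u *tableUpdater) RunUpdater(ctx context.Context) error {
	fmt.Println("updating", u.name)
	return nil
}

// Return the concrete *tableUpdater rather than the updater interface.
func newTableUpdater(name string) *tableUpdater {
	return &tableUpdater{name: name}
}

// Compile-time check that *tableUpdater implements updater.
var _ updater = (*tableUpdater)(nil)

func main() {
	var u updater = newTableUpdater("system.table_metadata")
	_ = u.RunUpdater(context.Background())
}
```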
pkg/sql/tablemetadatacache/update_table_metadata_cache_job_test.go (outdated comment thread, resolved)
Branch force-pushed from d60359c to fe3bc2f.
I don't think any of these comments are blocking, feel free to merge after addressing. Thanks for working on this!
// Create a new job so that we don't have to wait for the existing one be claimed
nit:
// Create a new job so that we don't have to wait for the existing one to be claimed.
Just wondering - do we need to disable the creation of the job at startup with this approach? Wondering if we should just disable the automatic startup of this job for the test since we are calling the updater directly.
// CreateTestingKnobs creates a testing knob in the unit tests.
//
// Note: The table metadata update jopb uses follower read (AS OF SYSTEM TIME
nit: jopb -> job
Done
// registered migrations and creates the stats system tables. By using follower
// read, we shift the transaction read timestamp far enough to the past. This
// means it is possible in the unit tests, the read timestamp would be chosen to
// be before the creation of the stats table. This can cause 'descriptor not
I think we just mean system table here instead of stats table.
Done
return &tableMetadataUpdater{ie: ie, upsertQuery: newTableMetadataBatchUpsertQuery(tableBatchSize)}

var _ tablemetadatacacheutil.ITableMetadataUpdater = &tableMetadataUpdater{}

// newTableMetadataUpdater creates a new iTableMetadataUpdater.
Can we update the comment here to remove ITableMetadataUpdater.
Done
	}
}

// RunUpdater prunes then updates table_metadata and records metrics for the run.
Instead we can add a comment here referencing that RunUpdater implements ITableMetadataUpdater.
Done
import "context"

// ITableMetadataUpdater is an interface that exposes a RunUpdater
Why is this defined in the util pkg?
This is needed in util because it is used in TestingKnobs, which is also defined in util and would otherwise cause a circular dependency. I tried moving TestingKnobs into tablemetadatacache, but that also introduces a circular dependency of tablemetadatacache -> sql -> tablemetadatacache 😞. It'd be nice to not need to depend on the sql package, but it looks like it is needed in order to get JobExecContext.
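For readers following the dependency discussion, a rough sketch of the layering the reply describes: the interface and the knobs that reference it live in a small leaf package, so neither the cache package nor the sql package has to import the other to share it. The package and type names come from the thread; the RunUpdater signature and the knob field are assumptions.

```go
// Hedged sketch of the leaf package; the method signature and knob field
// below are illustrative, not copied from the PR.
package tablemetadatacacheutil

import "context"

// ITableMetadataUpdater is the minimal surface the testing knobs need; the
// concrete updater in tablemetadatacache implements it.
type ITableMetadataUpdater interface {
	RunUpdater(ctx context.Context) error
}

// TestingKnobs can mention the interface without importing the
// implementation package, which is what breaks the import cycle.
type TestingKnobs struct {
	// OnUpdateCompleted, if set, is called after RunUpdater finishes.
	// (Field name is illustrative.)
	OnUpdateCompleted func()
}
```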
updates the job progress value of the update table metadata job as it processes batches of tables. The progress will not be updated on every successful batch update, but will instead be updated every nth batch, where n is defined by `batchesPerProgressUpdate`. This is done because each batch executes relatively quickly, and it is unnecessary to provide such granular updates to the progress, each of which results in a write to the database.

Adds additional metrics to the update table metadata job:

* UpdatedTables - The total number of table rows written to system.table_metadata
* Errors - The total number of errors emitted from job runs
* Duration - The time spent executing the job

Part of: cockroachdb#130249
Epic: CRDB-37558
Release note: None
Branch force-pushed from fe3bc2f to efc635a.
Tftr! bors r+
tablemetadatacache: update progress of update tmj

updates the job progress value of the update table metadata job as it processes batches of tables. The progress will not be updated on every successful batch update, but will instead be updated every nth batch, where n is defined by `batchesPerProgressUpdate`. This is done because each batch executes relatively quickly, and it is unnecessary to provide such granular updates to the progress, each of which results in a write to the database.

Part of: #130249
Epic: CRDB-37558
Release note: None
tablemetadatacache: add metrics to tmj

Adds additional metrics to the update table metadata job:

* UpdatedTables - The total number of table rows written to system.table_metadata
* Errors - The total number of errors emitted from job runs
* Duration - The time spent executing the job

Part of: #130249
Epic: CRDB-37558
Release note: None
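To make the metric list concrete, a plain-Go sketch of what tracking those three values around a single run could look like. The types and method here are illustrative only; the real change registers these with the job's metrics machinery, which this sketch does not attempt to reproduce.

```go
package main

import (
	"fmt"
	"time"
)

// Simplified stand-in for the job metrics described above: table rows
// written, errors, and time spent per run.
type updateJobMetrics struct {
	UpdatedTables int64
	Errors        int64
	Duration      time.Duration
}

// recordRun wraps a single job run and accumulates the metrics around it.
func (m *updateJobMetrics) recordRun(run func() (updated int, err error)) {
	start := time.Now()
	updated, err := run()
	m.Duration += time.Since(start)
	m.UpdatedTables += int64(updated)
	if err != nil {
		m.Errors++
	}
}

func main() {
	var m updateJobMetrics
	m.recordRun(func() (int, error) { return 42, nil })
	fmt.Printf("%+v\n", m)
}
```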