Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tablemetadata: track more job metrics and update job progress while running #131426

Merged
merged 1 commit into from
Oct 4, 2024

Conversation

kyle-a-wong
Copy link
Contributor

@kyle-a-wong kyle-a-wong commented Sep 26, 2024

tablemetadatacache: update progress of update tmj

updates the job progress value of the
update table metadata job as it processes
batches of tables. The progress will not be
updated on every successful batch update, but
will instead be updated every nth batch, where
n is defined by batchesPerProgressUpdate. This
is done because each batch executes relatively
quickly, and it is unnecessary to provide such
granular updates to the progress, each of which
results in a write to the database.

Part of: #130249
Epic: CRDB-37558
Release note: None


tablemetadatacache: add metrics to tmj

Adds additional metrics to the update table metadata
job:

  • UpdatedTables - The total number of table rows
    written to system.table_metadata
  • Errors - The total number of errors emitted
    from job runs
  • Duration - The time spent executing the job

Part of: #130249
Epic: CRDB-37558
Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@kyle-a-wong kyle-a-wong marked this pull request as ready for review September 26, 2024 17:33
@kyle-a-wong kyle-a-wong requested review from a team as code owners September 26, 2024 17:33
@kyle-a-wong kyle-a-wong removed the request for review from a team September 26, 2024 17:33
Copy link
Collaborator

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 6 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angles-n-daemons, @kyle-a-wong, and @xinhaoz)


-- commits line 3 at r1:
It's helpful to have a commit message even on first review so I can compare your stated intent to the code.


pkg/sql/tablemetadatacache/table_metadata_updater.go line 49 at r1 (raw file):

) (updated int, err error) {
	it := newTableMetadataBatchIterator(u.ie)
	estimatedRowsToUpdate, err := u.getRowsToUpdateCount(ctx)

is there no way to infer the estimate from the iterator?


pkg/sql/tablemetadatacache/update_table_metadata_cache_job.go line 185 at r1 (raw file):

	}

	resumer.updateProgress(ctx, .01)

why is this necessary?


pkg/sql/tablemetadatacache/update_table_metadata_cache_job.go line 197 at r1 (raw file):

	return func(ctx context.Context, totalRowsToUpdate int, rowsUpdated int) {
		batchNum++
		// Only update progress if the update will happen in more than 10 batches.

is there a cost to doing lots of updates that's not worth paying? I think explaining why you have this shortcut here is helpful.

Copy link
Collaborator

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angles-n-daemons, @kyle-a-wong, and @xinhaoz)


pkg/sql/tablemetadatacache/update_table_metadata_cache_job_test.go line 100 at r1 (raw file):

}

func TestUpdateTableMetadataCacheProgressUpdate(t *testing.T) {

This is a pretty heavy e2e test. I assume we already have coverage for the job directly, could we for instance pass a fake resumer to the real callback and test it that way?

Copy link
Contributor Author

@kyle-a-wong kyle-a-wong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angles-n-daemons, @dhartunian, and @xinhaoz)


pkg/sql/tablemetadatacache/table_metadata_updater.go line 49 at r1 (raw file):

Previously, dhartunian (David Hartunian) wrote…

is there no way to infer the estimate from the iterator?

As it stands, there isn't a way to infer this. The iterator is essentially just using cursor based pagination and continues querying until no more results are found.


pkg/sql/tablemetadatacache/update_table_metadata_cache_job.go line 185 at r1 (raw file):

Previously, dhartunian (David Hartunian) wrote…

why is this necessary?

I guess its not necessary, I think its a nice to have to show that the job has done something


pkg/sql/tablemetadatacache/update_table_metadata_cache_job.go line 197 at r1 (raw file):

Previously, dhartunian (David Hartunian) wrote…

is there a cost to doing lots of updates that's not worth paying? I think explaining why you have this shortcut here is helpful.

Updated to provide a comment explaining why. Ultimately, it feels unnecessary to constantly update the progress at such a granular level if customer's aren't going to be checking progress at such a high frequency (think <10ms)


pkg/sql/tablemetadatacache/update_table_metadata_cache_job_test.go line 100 at r1 (raw file):

Previously, dhartunian (David Hartunian) wrote…

This is a pretty heavy e2e test. I assume we already have coverage for the job directly, could we for instance pass a fake resumer to the real callback and test it that way?

Rewrote the test to not require full e2e testing as you suggested

@kyle-a-wong kyle-a-wong changed the title tablemetadatacache: update progress of update tmj tablemetadata: track more job metrics and update job progress while running Sep 27, 2024
@kyle-a-wong kyle-a-wong force-pushed the tmj_progress branch 2 times, most recently from bc58ef4 to 1a7b2c8 Compare September 30, 2024 14:18
Copy link
Member

@xinhaoz xinhaoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First commit.

Comment on lines 48 to 49
// newTableMetadataUpdater creates a new iTableMetadataUpdater.
var newTableMetadataUpdater = func(job *jobs.Job, metrics *TableMetadataUpdateJobMetrics, ie isql.Executor) iTableMetadataUpdater {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we continue to return the concrete type here? I believe it's typically preferred over returning the interface type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

pkg/sql/tablemetadatacache/table_metadata_updater.go Outdated Show resolved Hide resolved
@kyle-a-wong kyle-a-wong force-pushed the tmj_progress branch 6 times, most recently from d60359c to fe3bc2f Compare October 3, 2024 18:33
Copy link
Member

@xinhaoz xinhaoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think any of these comments are blocking, feel free to merge after addressing. Thanks for working on this!


// Create a new job so that we don't have to wait for the existing one be claimed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

// Create a new job so that we don't have to wait for the existing one to be claimed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering - do we need to disable the creation of the job at startup with this approach? Wondering if we should just disable the automatic startup of this job for the test since we are calling the updater directly.


// CreateTestingKnobs creates a testing knob in the unit tests.
//
// Note: The table metadata update jopb uses follower read (AS OF SYSTEM TIME
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: jopb -> job

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// registered migrations and creates the stats system tables. By using follower
// read, we shift the transaction read timestamp far enough to the past. This
// means it is possible in the unit tests, the read timestamp would be chosen to
// be before the creation of the stats table. This can cause 'descriptor not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we just mean system table here instead of stats table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return &tableMetadataUpdater{ie: ie, upsertQuery: newTableMetadataBatchUpsertQuery(tableBatchSize)}
var _ tablemetadatacacheutil.ITableMetadataUpdater = &tableMetadataUpdater{}

// newTableMetadataUpdater creates a new iTableMetadataUpdater.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we update the comment here to remove ITableMetadataUpdater.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
}

// RunUpdater prunes then updates table_metadata and records metrics for the run.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead we can add a comment here referencing that RunUpdater implements ITableMetadataUpdater.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


import "context"

// ITableMetadataUpdater is an interface that exposes a RunUpdater
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this defined in the util pkg?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed in util because it used in TestingKnobs which is also defined in util and would cause a circular dependency otherwise. I tried moving TestingKnobs into tablemetadatacache, but that also introduces a circular dependency of tablemetadatacache -> sql -> tablemetadatacache 😞. It'd be nice to not need to depend on the sql package, but it looks like it is needed in order to get JobExecContext

updates the job progress value of the
update table metadata job as it processes
batches of tables. The progress will not be
updated on every successful batch update, but
will instead be updated every nth batch, where
n is defined by `batchesPerProgressUpdate`. This
is done because each batch executes relatively
quickly, and it is unnecessary to provide such
granular updates to the progress, each of which
results in a write to the database.

Adds additional metrics to the update table metadata
job:
  * UpdatedTables - The total number of table rows
                    written to system.table_metadata
  * Errors        - The total number of errors emitted
                    from job runs
  * Duration      - The time spent executing the job

Part of: cockroachdb#130249
Epic: CRDB-37558
Release note: None
@kyle-a-wong
Copy link
Contributor Author

Tftr

bors r+

@craig craig bot merged commit 74af8a5 into cockroachdb:master Oct 4, 2024
23 checks passed
@kyle-a-wong kyle-a-wong deleted the tmj_progress branch October 4, 2024 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants