Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/stats: more fully undo addOuterBuckets in stripOuterBuckets #135310

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

michae2
Copy link
Collaborator

@michae2 michae2 commented Nov 15, 2024

Near the end of stats.(*histogram).adjustCounts, if there is leftover DistinctCount after adjusting each histogram bucket, we call addOuterBuckets. This function creates two "outer" buckets that fully cover the lower and upper ends of the domain with the remaining DistinctCount.

These "outer" buckets interfere with partial stats, so we remove them with stripOuterBuckets when merging partial stats. stripOuterBuckets was simply slicing off the two buckets, but this could potentially leave non-zero DistinctCount in the first bucket (which is added by addOuterBuckets). stripOuterBuckets needs to also zero the DistinctCount and NumRange of the first bucket to fully undo the actions of addOuterBuckets.

We were only catching this when trying to forecast using the merged histogram, which would sometimes produce a forecasted histogram with non-zero DistinctRange in the first bucket, leading to the infamous "histogram had first bucket with non-zero NumRange or DistinctRange" Sentry failure.

Unfortunately, I don't think this bug fully explains all of the Sentry failures that look like #93892 because many of those predate the introduction of partial stats. I believe this specific failure started with v24.3.0-alpha.0.

Informs: #93892
Fixes: #134031

(No release note because automatic collection of partial stats is not yet enabled in any available release.)

Release note: None

@michae2 michae2 requested review from rytaft, yuzefovich and a team November 15, 2024 18:49
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@michae2 michae2 added the backport-24.3.x Flags PRs that need to be backported to 24.3 label Nov 15, 2024
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find! :lgtm:

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @rytaft)


pkg/sql/stats/merge.go line 109 at r1 (raw file):

// [{1, 0, 0, 2}, {1, 0, 0, 3}, {1, 0, 0, 4}]
//
// Partial Statistic: {row, 8, dist: 4, null: 0, size: 1}

nit: s/row, 8/row: 8/ 😃

Near the end of `stats.(*histogram).adjustCounts`, if there is leftover
DistinctCount after adjusting each histogram bucket, we call
`addOuterBuckets`. This function creates two "outer" buckets that fully
cover the lower and upper ends of the domain with the remaining
DistinctCount.

These "outer" buckets interfere with partial stats, so we remove them
with `stripOuterBuckets` when merging partial stats. `stripOuterBuckets`
was simply slicing off the two buckets, but this could potentially leave
non-zero DistinctCount in the first bucket (which is added by
`addOuterBuckets`). `stripOuterBuckets` needs to also zero the
DistinctCount and NumRange of the first bucket to fully undo the actions
of `addOuterBuckets`.

We were only catching this when trying to forecast using the merged
histogram, which would sometimes produce a forecasted histogram with
non-zero DistinctRange in the first bucket, leading to the infamous
"histogram had first bucket with non-zero NumRange or DistinctRange"
Sentry failure.

Unfortunately, I don't think this bug fully explains all of the Sentry
failures that look like cockroachdb#93892 because many of those predate the
introduction of partial stats. I believe this specific failure started
with v24.3.0-alpha.0.

Informs: cockroachdb#93892
Fixes: cockroachdb#134031

(No release note because automatic collection of partial stats is not yet
enabled in any available release.)

Release note: None
Copy link
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r=yuzefovich

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft and @yuzefovich)


pkg/sql/stats/merge.go line 109 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: s/row, 8/row: 8/ 😃

lol, how did you see this? nice. done.

craig bot pushed a commit that referenced this pull request Nov 16, 2024
135310: sql/stats: more fully undo addOuterBuckets in stripOuterBuckets r=yuzefovich a=michae2

Near the end of `stats.(*histogram).adjustCounts`, if there is leftover DistinctCount after adjusting each histogram bucket, we call `addOuterBuckets`. This function creates two "outer" buckets that fully cover the lower and upper ends of the domain with the remaining DistinctCount.

These "outer" buckets interfere with partial stats, so we remove them with `stripOuterBuckets` when merging partial stats. `stripOuterBuckets` was simply slicing off the two buckets, but this could potentially leave non-zero DistinctCount in the first bucket (which is added by `addOuterBuckets`). `stripOuterBuckets` needs to also zero the DistinctCount and NumRange of the first bucket to fully undo the actions of `addOuterBuckets`.

We were only catching this when trying to forecast using the merged histogram, which would sometimes produce a forecasted histogram with non-zero DistinctRange in the first bucket, leading to the infamous "histogram had first bucket with non-zero NumRange or DistinctRange" Sentry failure.

Unfortunately, I don't think this bug fully explains all of the Sentry failures that look like #93892 because many of those predate the introduction of partial stats. I believe this specific failure started with v24.3.0-alpha.0.

Informs: #93892
Fixes: #134031

(No release note because automatic collection of partial stats is not yet enabled in any available release.)

Release note: None

Co-authored-by: Michael Erickson <[email protected]>
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @michae2 and @rytaft)


pkg/sql/stats/merge.go line 109 at r1 (raw file):

Previously, michae2 (Michael Erickson) wrote…

lol, how did you see this? nice. done.

Was reading through the comments to understand what's happening in the code :)

@craig
Copy link
Contributor

craig bot commented Nov 16, 2024

Build failed:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-24.3.x Flags PRs that need to be backported to 24.3
Projects
None yet
3 participants