DEVPROD-11215 Create benchmarks for TS bucket-level optimizations #1262
Open
mattBoros wants to merge 9 commits into master from DEVPROD-11215
+337 −10
Commits (9):
- ecbb338 three new benchmarks for ts (mattBoros)
- 4137a3a Merge branch 'master' into DEVPROD-11215 (mattBoros)
- 4770ceb benchmark for in extended range (mattBoros)
- f83bf1b cleanup (mattBoros)
- e33bff0 auto-tasks-local (mattBoros)
- 6c5a95b generate docs (mattBoros)
- 6d291a2 update benchmarks (mattBoros)
- 9b100c3 remove extra benchmark (mattBoros)
- d0f9394 Change owner to QI (mattBoros)
New file (163 added lines):
SchemaVersion: 2018-07-01
Owner: Query Integration
Description: |
  This workload runs queries on time-series collections with data before the Unix epoch (extended range).
  Some optimizations can only be made on post-1970 data. This benchmark is intended to give us an idea
  of how much performance we lose on extended-range data.

Keywords:
  - timeseries
  - aggregate

GlobalDefaults:
  Database: &database test
  Collection: &collection Collection0
  DocumentCount: &documentCount 1e7
  Repeat: &repeat 200
  Threads: &threads 1
  MaxPhases: &maxPhases 6
  MetaCount: &metaCount 10

Clients:
  Default:
    QueryOptions:
      maxPoolSize: 400

Actors:
  # Clear any pre-existing collection state.
  - Name: ClearCollection
    Type: CrudActor
    Database: *database
    Threads: 1
    Phases:
      OnlyActiveInPhases:
        Active: [0]
        NopInPhasesUpTo: *maxPhases
        PhaseConfig:
          Repeat: 1
          Threads: 1
          Collection: *collection
          Operations:
            - OperationName: drop

  - Name: CreateTimeseriesCollection
    Type: RunCommand
    Threads: 1
    Phases:
      OnlyActiveInPhases:
        Active: [1]
        NopInPhasesUpTo: *maxPhases
        PhaseConfig:
          Repeat: 1
          Database: *database
          Operation:
            OperationMetricsName: CreateTimeseriesCollection
            OperationName: RunCommand
            OperationCommand:
              {
                create: *collection,
                timeseries:
                  {
                    timeField: "time",
                    metaField: "meta",
                    granularity: "seconds",
                  },
              }

  - Name: InsertData
    Type: Loader
    Threads: 1
    Phases:
      OnlyActiveInPhases:
        Active: [2]
        NopInPhasesUpTo: *maxPhases
        PhaseConfig:
          Repeat: 1
          Threads: 1
          Database: *database
          CollectionCount: 1
          DocumentCount: *documentCount
          BatchSize: 1000
          Document:
            time:
              ^IncDate:
                start: 1960-01-01
                # A 100ms step ensures a full bucket of 1000 documents under the "seconds" granularity.
                step: 100
            meta:
              ^Cycle:
                ofLength: *metaCount
                fromGenerator:
                  ^RandomString:
                    length: 6
                    alphabet: "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

  # Ensure all data is synced to disk before each query phase.
  - Name: Quiesce
    Type: QuiesceActor
    Threads: 1
    Database: *database
    Phases:
      OnlyActiveInPhases:
        Active: [3, 5]
        NopInPhasesUpTo: *maxPhases
        PhaseConfig:
          Repeat: 1
          Threads: 1

  # The control.min.time field can be used as an accurate bucket minimum if it's not an object or
  # an array.
  - Name: BlockProcessingExtendedRangeMinTime
    Type: CrudActor
    Database: *database
    Threads: *threads
    Phases:
      OnlyActiveInPhases:
        Active: [4]
        NopInPhasesUpTo: *maxPhases
        PhaseConfig:
          Repeat: *repeat
          Database: *database
          Collection: *collection
          Operations:
            - OperationMetricsName: TsBlockExtendedRangeMinTime
              OperationName: aggregate
              OperationCommand:
                Pipeline:
                  [
                    {$project: {time: 1, meta: 1}},
                    {$group: {_id: "$meta", gb: {$min: "$time"}}}
                  ]

  # The control.max.time field can only be used as an accurate bucket maximum if it's after 1970.
  - Name: BlockProcessingExtendedRangeMaxTime
    Type: CrudActor
    Database: *database
    Threads: *threads
    Phases:
      OnlyActiveInPhases:
        Active: [6]
        NopInPhasesUpTo: *maxPhases
        PhaseConfig:
          Repeat: *repeat
          Database: *database
          Collection: *collection
          Operations:
            - OperationMetricsName: TsBlockExtendedRangeMaxTime
              OperationName: aggregate
              OperationCommand:
                Pipeline:
                  [
                    {$project: {time: 1, meta: 1}},
                    {$group: {_id: "$meta", gb: {$max: "$time"}}}
                  ]

AutoRun:
  - When:
      mongodb_setup:
        $eq:
          - replica
          - replica-80-feature-flags
          - replica-all-feature-flags
      branch_name:
        $gte: v8.0
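A quick sanity check on the workload's data-generation parameters, as a stdlib-only Python sketch. The 3600-second figure is an assumption taken from MongoDB's documented mapping of the "seconds" granularity to bucketMaxSpanSeconds; everything else follows from the YAML above.

```python
from datetime import datetime, timezone

# The ^IncDate generator starts at 1960-01-01, which is before the Unix
# epoch, so the generated timestamps have negative epoch seconds. This is
# what makes the collection "extended range".
start = datetime(1960, 1, 1, tzinfo=timezone.utc)
epoch_seconds = start.timestamp()
print(epoch_seconds)  # -315619200.0

# With a 100ms step, 1000 consecutive documents span only 100 seconds,
# comfortably inside the 3600s bucketMaxSpanSeconds that (per the MongoDB
# docs) corresponds to the "seconds" granularity, so buckets can fill to
# capacity rather than being closed early for time-span reasons.
step_ms = 100
docs_per_bucket = 1000
span_seconds = step_ms * docs_per_bucket / 1000
print(span_seconds)  # 100.0
```

This is why the comment in the workload calls a 100ms step enough for a "full bucket of 1000 documents": the whole batch fits well inside one bucket's maximum time span.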
Neither extended-range nor non-extended-range collections will be eligible for the $group rewrite with $min on the timeField, because control.min.time is a rounded-down value, right? So we shouldn't be losing performance here between extended range and not. I'm not saying this isn't relevant, but it should be about the same for extended-range and normal time-series collections.
Yeah! That's true. I was thinking it's probably best to cover both $min and $max instead of just $max while we're writing these benchmarks. That way, if we run into a similar $min bug in the future, we'll already have this benchmark and its perf history.
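The rounding behavior discussed above can be illustrated with a small model. This is an illustrative sketch, not the server's code: it assumes, per MongoDB's documented granularity table, that "seconds" granularity rounds bucket minimums down to the nearest minute (bucketRoundingSeconds = 60); the `round_down_to_minute` helper is hypothetical.

```python
from datetime import datetime, timezone

def round_down_to_minute(ts: datetime) -> datetime:
    """Hypothetical model of bucket-minimum rounding: with "seconds"
    granularity, control.min.time is rounded down to the minute."""
    return ts.replace(second=0, microsecond=0)

# Suppose the earliest measurement actually inserted into a bucket is:
actual_min = datetime(1960, 1, 1, 12, 34, 56, tzinfo=timezone.utc)
control_min = round_down_to_minute(actual_min)

# The stored control.min.time is below the true minimum of the bucket, so a
# $min-on-timeField rewrite can't read it directly. That holds whether or
# not the data is extended range, which is the reviewer's point.
print(control_min)               # 1960-01-01 12:34:00+00:00
print(control_min < actual_min)  # True
```

By contrast, control.max.time is not rounded, which is why the $max query is the one where extended-range data can actually cost performance.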