DEVPROD-11215 Create benchmarks for TS bucket-level optimizations #1262
Changes from 8 commits
ecbb338
4137a3a
4770ceb
f83bf1b
e33bff0
6c5a95b
6d291a2
9b100c3
d0f9394
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,163 @@ | ||
SchemaVersion: 2018-07-01 | ||
Owner: Query Execution | ||
Description: | | ||
This workload runs queries on time-series collections with data before the unix epoch (extended range). | ||
Some optimizations can only be made on post-1970 data. This benchmark is intended to give us an idea | ||
of how much performance we lose on extended range data. | ||
|
||
Keywords: | ||
- timeseries | ||
- aggregate | ||
|
||
GlobalDefaults: | ||
Database: &database test | ||
Collection: &collection Collection0 | ||
DocumentCount: &documentCount 1e7 | ||
Repeat: &repeat 200 | ||
Threads: &threads 1 | ||
MaxPhases: &maxPhases 6 | ||
MetaCount: &metaCount 10 | ||
|
||
Clients: | ||
Default: | ||
QueryOptions: | ||
maxPoolSize: 400 | ||
|
||
Actors: | ||
# Clear any pre-existing collection state. | ||
- Name: ClearCollection | ||
Type: CrudActor | ||
Database: *database | ||
Threads: 1 | ||
Phases: | ||
OnlyActiveInPhases: | ||
Active: [0] | ||
NopInPhasesUpTo: *maxPhases | ||
PhaseConfig: | ||
Repeat: 1 | ||
Threads: 1 | ||
Collection: *collection | ||
Operations: | ||
- OperationName: drop | ||
|
||
- Name: CreateTimeseriesCollection | ||
Type: RunCommand | ||
Threads: 1 | ||
Phases: | ||
OnlyActiveInPhases: | ||
Active: [1] | ||
NopInPhasesUpTo: *maxPhases | ||
PhaseConfig: | ||
Repeat: 1 | ||
Database: *database | ||
Operation: | ||
OperationMetricsName: CreateTimeseriesCollection | ||
OperationName: RunCommand | ||
OperationCommand: | ||
{ | ||
create: *collection, | ||
timeseries: | ||
{ | ||
timeField: "time", | ||
metaField: "meta", | ||
granularity: "seconds", | ||
}, | ||
} | ||
|
||
- Name: InsertData | ||
Type: Loader | ||
Threads: 1 | ||
Phases: | ||
OnlyActiveInPhases: | ||
Active: [2] | ||
NopInPhasesUpTo: *maxPhases | ||
PhaseConfig: | ||
Repeat: 1 | ||
Threads: 1 | ||
Database: *database | ||
CollectionCount: 1 | ||
DocumentCount: *documentCount | ||
BatchSize: 1000 | ||
Document: | ||
time: | ||
^IncDate: | ||
start: 1960-01-01 | ||
# 100ms step ensures full bucket of 1000 documents under the "seconds" granularity. | ||
step: 100 | ||
meta: | ||
^Cycle: | ||
ofLength: *metaCount | ||
fromGenerator: | ||
^RandomString: | ||
length: 6 | ||
alphabet: "ABCDEFGHIJKLMNOPQRSTUVWXYZ" | ||
|
||
# Phases 3 and 5: Ensure all data is synced to disk. | ||
- Name: Quiesce | ||
Type: QuiesceActor | ||
Threads: 1 | ||
Database: *database | ||
Phases: | ||
OnlyActiveInPhases: | ||
Active: [3, 5] | ||
NopInPhasesUpTo: *maxPhases | ||
PhaseConfig: | ||
Repeat: 1 | ||
Threads: 1 | ||
|
||
# The control.min.time field can be used as an accurate bucket minimum if it's not an object or | ||
# an array. | ||
- Name: BlockProcessingExtendedRangeMinTime | ||
Type: CrudActor | ||
Database: *database | ||
Threads: *threads | ||
Phases: | ||
OnlyActiveInPhases: | ||
Active: [4] | ||
NopInPhasesUpTo: *maxPhases | ||
PhaseConfig: | ||
Repeat: *repeat | ||
Database: *database | ||
Collection: *collection | ||
Operations: | ||
- OperationMetricsName: TsBlockExtendedRangeMinTime | ||
OperationName: aggregate | ||
OperationCommand: | ||
Pipeline: | ||
[ | ||
{$project: {time: 1, meta: 1}}, | ||
{$group: {_id: "$meta", gb: {$min: "$time"}}} | ||
Neither extended-range nor regular collections are eligible for $group rewrites with $min on the timeField, because it's a rounded-down value, right? So we shouldn't be losing performance here between extended range and not. I'm not saying this isn't relevant here, but it should be about the same for extended-range and normal time-series collections.
Yea! That's true. I was thinking it's probably best to cover both $min/$max instead of just $max while we're writing these benchmarks. If we run into a similar $min bug in the future, we'll already have this benchmark and its perf history.
||
] | ||
|
||
# The control.max.time field can only be used as an accurate bucket maximum if it's after 1970. | ||
- Name: BlockProcessingExtendedRangeMaxTime | ||
Type: CrudActor | ||
Database: *database | ||
Threads: *threads | ||
Phases: | ||
OnlyActiveInPhases: | ||
Active: [6] | ||
NopInPhasesUpTo: *maxPhases | ||
PhaseConfig: | ||
Repeat: *repeat | ||
Database: *database | ||
Collection: *collection | ||
Operations: | ||
- OperationMetricsName: TsBlockExtendedRangeMaxTime | ||
OperationName: aggregate | ||
OperationCommand: | ||
Pipeline: | ||
[ | ||
{$project: {time: 1, meta: 1}}, | ||
{$group: {_id: "$meta", gb: {$max: "$time"}}} | ||
] | ||
|
||
AutoRun: | ||
- When: | ||
mongodb_setup: | ||
$eq: | ||
- replica | ||
- replica-80-feature-flags | ||
- replica-all-feature-flags | ||
branch_name: | ||
$gte: v8.0 |
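As a quick sanity check on the InsertData settings above, the sketch below works out the time range the loader produces. It assumes that `^IncDate` advances by `step` milliseconds per generated document (as the comment in the workload suggests), that `^Cycle` hands out the 10 meta values round-robin, and that a bucket holds at most 1000 measurements. The helper `summarize_load` and its parameter names are hypothetical and are not part of Genny.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def summarize_load(start, step_ms, doc_count, meta_count, bucket_max_docs=1000):
    """Estimate the shape of the loaded data (hypothetical helper, not a Genny API)."""
    # Timestamp of the final document, assuming one step per document.
    last = start + timedelta(milliseconds=step_ms * (doc_count - 1))
    # With round-robin meta values, each series sees every meta_count-th document.
    per_series_gap = timedelta(milliseconds=step_ms * meta_count)
    # Time covered by a bucket that fills up to the assumed measurement limit.
    full_bucket_span = per_series_gap * (bucket_max_docs - 1)
    return {
        "last_timestamp": last,
        "all_before_epoch": last < EPOCH,
        "per_series_gap_s": per_series_gap.total_seconds(),
        "full_bucket_span_s": full_bucket_span.total_seconds(),
        "approx_buckets": doc_count // bucket_max_docs,
    }


summary = summarize_load(
    start=datetime(1960, 1, 1, tzinfo=timezone.utc),
    step_ms=100,
    doc_count=10_000_000,
    meta_count=10,
)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Under these assumptions every document lands before 1970, each series receives one measurement per second, and a bucket reaches 1000 measurements after about 999 seconds, which is well inside one hour.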
Do you think QE should own these? I feel like QI will probably re-enable the optimization, but I'm not sure
Oh yea, QI will probably be more involved in this benchmark in the future. I'll change it
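The review thread on the `$min` pipeline points out that the bucket-level minimum time is a rounded-down value. A minimal sketch of why that prevents answering `$min` from bucket metadata alone is shown below. It assumes a 60-second rounding interval for the "seconds" granularity and plain floor rounding for pre-1970 dates; how the server actually rounds extended-range values is not stated in this PR. The function `round_down` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def round_down(ts, rounding_seconds=60):
    """Floor a timestamp to a multiple of rounding_seconds (assumed behavior)."""
    interval = timedelta(seconds=rounding_seconds)
    # timedelta // timedelta is floor division, so pre-epoch (negative)
    # offsets also round toward the earlier boundary.
    return EPOCH + ((ts - EPOCH) // interval) * interval


measurements = [
    datetime(1960, 1, 1, 0, 0, 30, 500000, tzinfo=timezone.utc),
    datetime(1960, 1, 1, 0, 1, 10, tzinfo=timezone.utc),
]
true_min = min(measurements)
bucket_min = round_down(true_min)

print("true minimum:   ", true_min.isoformat())
print("rounded minimum:", bucket_min.isoformat())
```

Because the rounded value can be earlier than any real measurement, a query that needs the exact minimum still has to look inside the bucket, regardless of whether the data is before or after 1970.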