DEVPROD-11215 Create benchmarks for TS bucket-level optimizations #1262

Open: wants to merge 9 commits into master
Changes from 8 commits
20 changes: 20 additions & 0 deletions docs/generated/workloads.md
@@ -3864,6 +3864,26 @@ The queries in this workload exercise group stage that uses an enum like field f
timeseries, aggregate, group


## [TimeseriesExtendedRange](https://www.github.com/mongodb/genny/blob/master/src/workloads/query/TimeseriesExtendedRange.yml)
### Owner
Query Execution


### Support Channel
[#query-execution](https://mongodb.enterprise.slack.com/archives/CKABWR2CT)

Contributor: Do you think QE should own these? I feel like QI will probably re-enable the optimization, but I'm not sure.

Contributor Author: Oh yeah, QI will probably be more involved in this benchmark in the future. I'll change it.



### Description
This workload runs queries on time-series collections with data before the unix epoch (extended range).
Some optimizations can only be made on post-1970 data. This benchmark is intended to give us an idea
of how much performance we lose on extended range data.



### Keywords
timeseries, aggregate


## [TimeseriesFixedBucketing](https://www.github.com/mongodb/genny/blob/master/src/workloads/query/TimeseriesFixedBucketing.yml)
### Owner
Query Integration
11 changes: 11 additions & 0 deletions evergreen/system_perf/5.0/genny_tasks.yml
@@ -2015,6 +2015,17 @@ tasks:
      test_control: timeseries_enum
  name: timeseries_enum
  priority: 5
- commands:
  - command: timeout.update
    params:
      exec_timeout_secs: 86400
      timeout_secs: 86400
  - func: f_run_dsi_workload
    vars:
      auto_workload_path: src/genny/src/workloads/query/TimeseriesExtendedRange.yml
      test_control: timeseries_extended_range
  name: timeseries_extended_range
  priority: 5
- commands:
  - command: timeout.update
    params:
11 changes: 11 additions & 0 deletions evergreen/system_perf/6.0/genny_tasks.yml
@@ -2015,6 +2015,17 @@ tasks:
      test_control: timeseries_enum
  name: timeseries_enum
  priority: 5
- commands:
  - command: timeout.update
    params:
      exec_timeout_secs: 86400
      timeout_secs: 86400
  - func: f_run_dsi_workload
    vars:
      auto_workload_path: src/genny/src/workloads/query/TimeseriesExtendedRange.yml
      test_control: timeseries_extended_range
  name: timeseries_extended_range
  priority: 5
- commands:
  - command: timeout.update
    params:
11 changes: 11 additions & 0 deletions evergreen/system_perf/7.0/genny_tasks.yml
@@ -2015,6 +2015,17 @@ tasks:
      test_control: timeseries_enum
  name: timeseries_enum
  priority: 5
- commands:
  - command: timeout.update
    params:
      exec_timeout_secs: 86400
      timeout_secs: 86400
  - func: f_run_dsi_workload
    vars:
      auto_workload_path: src/genny/src/workloads/query/TimeseriesExtendedRange.yml
      test_control: timeseries_extended_range
  name: timeseries_extended_range
  priority: 5
- commands:
  - command: timeout.update
    params:
11 changes: 11 additions & 0 deletions evergreen/system_perf/7.3/genny_tasks.yml
@@ -2015,6 +2015,17 @@ tasks:
      test_control: timeseries_enum
  name: timeseries_enum
  priority: 5
- commands:
  - command: timeout.update
    params:
      exec_timeout_secs: 86400
      timeout_secs: 86400
  - func: f_run_dsi_workload
    vars:
      auto_workload_path: src/genny/src/workloads/query/TimeseriesExtendedRange.yml
      test_control: timeseries_extended_range
  name: timeseries_extended_range
  priority: 5
- commands:
  - command: timeout.update
    params:
11 changes: 11 additions & 0 deletions evergreen/system_perf/8.0/genny_tasks.yml
@@ -2080,6 +2080,17 @@ tasks:
      test_control: timeseries_enum
  name: timeseries_enum
  priority: 5
- commands:
  - command: timeout.update
    params:
      exec_timeout_secs: 86400
      timeout_secs: 86400
  - func: f_run_dsi_workload
    vars:
      auto_workload_path: src/genny/src/workloads/query/TimeseriesExtendedRange.yml
      test_control: timeseries_extended_range
  name: timeseries_extended_range
  priority: 5
- commands:
  - command: timeout.update
    params:
14 changes: 14 additions & 0 deletions evergreen/system_perf/master/genny_tasks.yml
@@ -204,6 +204,7 @@ buildvariants:
  - name: timeseries_block_processing
  - name: timeseries_count
  - name: timeseries_enum
  - name: timeseries_extended_range
  - name: timeseries_stress_unpacking
  - name: union_with
  - name: unwind_group
@@ -409,6 +410,7 @@ buildvariants:
  - name: timeseries_block_processing
  - name: timeseries_count
  - name: timeseries_enum
  - name: timeseries_extended_range
  - name: timeseries_stress_unpacking
  - name: union_with
  - name: unwind_group
@@ -963,6 +965,7 @@ buildvariants:
  - name: timeseries_block_processing
  - name: timeseries_count
  - name: timeseries_enum
  - name: timeseries_extended_range
  - name: timeseries_stress_unpacking
  - name: union_with
  - name: unwind_group
@@ -2769,6 +2772,17 @@ tasks:
      test_control: timeseries_enum
  name: timeseries_enum
  priority: 5
- commands:
  - command: timeout.update
    params:
      exec_timeout_secs: 86400
      timeout_secs: 86400
  - func: f_run_dsi_workload
    vars:
      auto_workload_path: src/genny/src/workloads/query/TimeseriesExtendedRange.yml
      test_control: timeseries_extended_range
  name: timeseries_extended_range
  priority: 5
- commands:
  - command: timeout.update
    params:
163 changes: 163 additions & 0 deletions src/workloads/query/TimeseriesExtendedRange.yml
@@ -0,0 +1,163 @@
SchemaVersion: 2018-07-01
Owner: Query Execution
Description: |
  This workload runs queries on time-series collections with data before the unix epoch (extended range).
  Some optimizations can only be made on post-1970 data. This benchmark is intended to give us an idea
  of how much performance we lose on extended range data.

Keywords:
  - timeseries
  - aggregate

GlobalDefaults:
  Database: &database test
  Collection: &collection Collection0
  DocumentCount: &documentCount 1e7
  Repeat: &repeat 200
  Threads: &threads 1
  MaxPhases: &maxPhases 6
  MetaCount: &metaCount 10

Clients:
  Default:
    QueryOptions:
      maxPoolSize: 400

Actors:
  # Clear any pre-existing collection state.
  - Name: ClearCollection
    Type: CrudActor
    Database: *database
    Threads: 1
    Phases:
      OnlyActiveInPhases:
        Active: [0]
        NopInPhasesUpTo: *maxPhases
        PhaseConfig:
          Repeat: 1
          Threads: 1
          Collection: *collection
          Operations:
            - OperationName: drop

  - Name: CreateTimeseriesCollection
    Type: RunCommand
    Threads: 1
    Phases:
      OnlyActiveInPhases:
        Active: [1]
        NopInPhasesUpTo: *maxPhases
        PhaseConfig:
          Repeat: 1
          Database: *database
          Operation:
            OperationMetricsName: CreateTimeseriesCollection
            OperationName: RunCommand
            OperationCommand:
              {
                create: *collection,
                timeseries:
                  {
                    timeField: "time",
                    metaField: "meta",
                    granularity: "seconds",
                  },
              }

  - Name: InsertData
    Type: Loader
    Threads: 1
    Phases:
      OnlyActiveInPhases:
        Active: [2]
        NopInPhasesUpTo: *maxPhases
        PhaseConfig:
          Repeat: 1
          Threads: 1
          Database: *database
          CollectionCount: 1
          DocumentCount: *documentCount
          BatchSize: 1000
          Document:
            time:
              ^IncDate:
                start: 1960-01-01
                # A 100ms step ensures full buckets of 1000 documents under the "seconds" granularity.
                step: 100
            meta:
              ^Cycle:
                ofLength: *metaCount
                fromGenerator:
                  ^RandomString:
                    length: 6
                    alphabet: "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

  # Phases 3 and 5: Ensure all data is synced to disk before each query phase.
  - Name: Quiesce
    Type: QuiesceActor
    Threads: 1
    Database: *database
    Phases:
      OnlyActiveInPhases:
        Active: [3, 5]
        NopInPhasesUpTo: *maxPhases
        PhaseConfig:
          Repeat: 1
          Threads: 1

  # The control.min.time field can be used as an accurate bucket minimum if it's not an object or
  # an array.
  - Name: BlockProcessingExtendedRangeMinTime
    Type: CrudActor
    Database: *database
    Threads: *threads
    Phases:
      OnlyActiveInPhases:
        Active: [4]
        NopInPhasesUpTo: *maxPhases
        PhaseConfig:
          Repeat: *repeat
          Database: *database
          Collection: *collection
          Operations:
            - OperationMetricsName: TsBlockExtendedRangeMinTime
              OperationName: aggregate
              OperationCommand:
                Pipeline:
                  [
                    {$project: {time: 1, meta: 1}},
                    {$group: {_id: "$meta", gb: {$min: "$time"}}}
                  ]

Contributor: Neither extended range nor non-extended range collections will be eligible for $group rewrites with $min on the timeField, because it's a rounded-down value, right? So we shouldn't be losing performance here between extended range and not. I'm not saying this isn't relevant, but it should be about the same for extended range and normal time-series collections.

Contributor Author: Yeah, that's true. I was thinking it's probably best to cover both $min and $max instead of just $max while we're writing these benchmarks. If we run into a similar $min bug in the future, we'll already have this benchmark and its perf history.

  # The control.max.time field can only be used as an accurate bucket maximum if it's after 1970.
  - Name: BlockProcessingExtendedRangeMaxTime
    Type: CrudActor
    Database: *database
    Threads: *threads
    Phases:
      OnlyActiveInPhases:
        Active: [6]
        NopInPhasesUpTo: *maxPhases
        PhaseConfig:
          Repeat: *repeat
          Database: *database
          Collection: *collection
          Operations:
            - OperationMetricsName: TsBlockExtendedRangeMaxTime
              OperationName: aggregate
              OperationCommand:
                Pipeline:
                  [
                    {$project: {time: 1, meta: 1}},
                    {$group: {_id: "$meta", gb: {$max: "$time"}}}
                  ]

AutoRun:
  - When:
      mongodb_setup:
        $eq:
          - replica
          - replica-80-feature-flags
          - replica-all-feature-flags
      branch_name:
        $gte: v8.0
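A quick sanity check of the data layout this workload produces. It is a standalone Python sketch and not part of the PR. It replays the Loader's ^IncDate arithmetic (start 1960-01-01, 100 ms step, 1e7 documents) and estimates how buckets fill. It assumes three things: ^Cycle hands out its 10 meta values round-robin, buckets close at 1000 measurements, and the "seconds" granularity caps a bucket's span at 3600 s. The two server-side limits are assumptions here, not values set by the workload.

```python
from datetime import datetime, timedelta

# Values copied from TimeseriesExtendedRange.yml.
START = datetime(1960, 1, 1)   # ^IncDate start
STEP_MS = 100                  # ^IncDate step, in milliseconds
DOCUMENT_COUNT = 10_000_000    # DocumentCount: 1e7
META_COUNT = 10                # MetaCount, used as ^Cycle ofLength

# Assumed server-side limits (not set by the workload): buckets close at
# 1000 measurements, and "seconds" granularity caps a bucket at 3600 s.
BUCKET_MAX_COUNT = 1000
MAX_SPAN_SECONDS = 3600
EPOCH = datetime(1970, 1, 1)


def doc_time(i: int) -> datetime:
    """Timestamp assigned to the i-th inserted document by ^IncDate."""
    return START + timedelta(milliseconds=STEP_MS * i)


def summarize() -> dict:
    # Timestamps only increase, so checking the last one covers all documents.
    last = doc_time(DOCUMENT_COUNT - 1)
    # Assumes ^Cycle assigns meta value i % META_COUNT, so consecutive
    # documents in the same series are META_COUNT steps apart.
    series_step_s = STEP_MS * META_COUNT / 1000
    # Time between the first and last measurement of a full bucket.
    bucket_span_s = series_step_s * (BUCKET_MAX_COUNT - 1)
    return {
        "last_time": last,
        "all_pre_epoch": last < EPOCH,
        "bucket_span_s": bucket_span_s,
        "fills_by_count": bucket_span_s < MAX_SPAN_SECONDS,
        # Assumes every bucket fills completely before a new one opens.
        "bucket_count": DOCUMENT_COUNT // BUCKET_MAX_COUNT,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

Under these assumptions, every document falls in January 1960, so the whole collection is extended range. Each meta series fills a 1000-measurement bucket in about 999 seconds, well inside the span limit. That supports the Loader comment about full buckets.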