
chore: carry metrics in flight metadata from datanode to frontend #3113

Merged: 7 commits merged into GreptimeTeam:main on Jan 17, 2024

Conversation

@shuiyisong (Contributor) commented Jan 7, 2024

I hereby agree to the terms of the GreptimeDB CLA

What's changed and what's your intention?

This PR mainly supports carrying metrics from datanode to frontend using the metrics in flight metadata.

We use `RecordBatchMetrics` in JSON format for now. A better representation is expected after #2374.

Discussion: is it possible to reduce the number of stream wrappers?
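
For readers skimming the thread, here is a minimal sketch of the datanode-side encoding described above. It assumes `RecordBatchMetrics` implements serde::Serialize and that `FlightMetadata`/`Metrics` are the prost-generated greptime-proto types quoted later in this conversation; function and field names are illustrative.

    use prost::Message;

    // Hedged sketch: serialize the collected RecordBatchMetrics as JSON bytes
    // and wrap them in FlightMetadata, which later becomes the FlightData
    // app_metadata payload. JSON is used "for now"; a more compact encoding
    // is expected after #2374.
    fn metrics_to_app_metadata(metrics: &RecordBatchMetrics) -> Vec<u8> {
        let json = serde_json::to_string(metrics).unwrap_or_default();
        FlightMetadata {
            affected_rows: None,
            metrics: Some(Metrics {
                metrics: json.into_bytes(),
            }),
        }
        .encode_to_vec() // these bytes ride in FlightData::app_metadata
    }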

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR does not require documentation updates.

Refer to a related PR or issue link (optional)

@github-actions bot added the docs-not-required (This change does not impact docs.) and Size: M labels Jan 7, 2024
Cargo.toml (review thread, outdated, resolved)

codecov bot commented Jan 7, 2024

Codecov Report

Attention: 13 lines in your changes are missing coverage. Please review.

Comparison is base (c2edaff) 85.63% compared to head (bda604f) 85.13%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3113      +/-   ##
==========================================
- Coverage   85.63%   85.13%   -0.51%     
==========================================
  Files         829      829              
  Lines      135773   135866      +93     
==========================================
- Hits       116273   115668     -605     
- Misses      19500    20198     +698     

@waynexia (Member) commented Jan 8, 2024

cc @NiwakaDev you might be interested in this

@MichaelScofield (Collaborator) commented:

Why not put the metrics in app_metadata in message FlightData?

https://arrow.apache.org/docs/format/Flight.html#protocol-buffer-definitions

@shuiyisong (Contributor, Author) commented:

Why not put the metrics in app_metadata in message FlightData?

https://arrow.apache.org/docs/format/Flight.html#protocol-buffer-definitions

            FlightMessage::Metrics(s) => {
                let metadata = FlightMetadata {
                    affected_rows: None,
                    metrics: Some(Metrics {
                        metrics: s.as_bytes().to_vec(),
                    }),
                }
                .encode_to_vec();
                FlightData {
                    flight_descriptor: None,
                    data_header: build_none_flight_msg().into(),
                    app_metadata: metadata.into(),
                    data_body: ProstBytes::default(),
                }
            }

It's now put in FlightMetadata next to affected_rows, which is later placed in app_metadata. It is sent as a separate packet at the end of a record batch stream, since we need to finish executing the plan before the metrics can be collected.
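
For completeness, a hedged sketch of what the receiving side could look like on the frontend, with the same assumed greptime-proto types and a serde-deserializable `RecordBatchMetrics` (names are illustrative, not the exact functions in this PR):

    use prost::Message;

    // Hypothetical frontend-side decoding: recover FlightMetadata from
    // FlightData::app_metadata and deserialize the JSON-encoded metrics,
    // if the trailing metrics packet carried any.
    fn metrics_from_flight_data(data: &FlightData) -> Option<RecordBatchMetrics> {
        let metadata = FlightMetadata::decode(data.app_metadata.as_ref()).ok()?;
        let bytes = metadata.metrics?.metrics;
        serde_json::from_slice(&bytes).ok()
    }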

@MichaelScofield (Collaborator) commented:

@shuiyisong Are the metrics mainly for the frontend to deal with RCUs? If so, is it possible to break the whole metrics down into each record batch's metadata, instead of putting them at the end of a record batch stream? This has two advantages:

  1. An obvious storage place for the metrics -- we can simply add a field in RecordBatch to store them.
  2. We can accumulate the metrics on the fly (sketched below). Should there be a big query (e.g. select * from a table with 1 million rows) that would exceed the RCUs, we can stop it in the middle, with no need to wait for the last metric message.
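
A purely illustrative sketch of this per-batch accumulation idea, with hypothetical types and fields; the replies below explain why plan-level metrics cannot actually be emitted per batch.

    // Hypothetical running total the frontend could fold per-batch metrics
    // into, aborting the stream once a CU budget is exceeded instead of
    // waiting for a trailing metrics message.
    #[derive(Default)]
    struct RunningMetrics {
        rows_read: usize,
    }

    impl RunningMetrics {
        /// Returns false when the budget is exceeded, signalling that the
        /// stream can be stopped mid-query.
        fn accumulate(&mut self, batch_rows: usize, budget_rows: usize) -> bool {
            self.rows_read += batch_rows;
            self.rows_read <= budget_rows
        }
    }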

@waynexia (Member) commented Jan 9, 2024

Are the metrics mainly for the Frontend to deal with RCUs?

No. This feature aims to provide a way to pass execution metrics through a distributed plan tree. A simple extension over ExecutionPlan::metrics.

Obvious storage place for metrics -- we can simply add a field in Recordbatch to store the metrics.

As said above, this is derived from ExecutionPlan::metrics, which gives the plan-level execution metrics. Obviously, these are not available before a plan is finished.

We can accumulate the metrics on the fly. Should there be a big query (e.g. select * from a table with 1 million rows) that would exceed the RCUs, we can stop it in the middle, no need to wait for the last metric message.

I doubt this is viable. Do we have, or can we have, such a real-time CU administration system?
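
For reference, a rough sketch of how plan-level metrics are read from DataFusion's `ExecutionPlan::metrics`, assuming the `MetricsSet` aggregate accessors (`output_rows`, `elapsed_compute`) available at the time of writing; this is illustrative, not code from the PR.

    use std::sync::Arc;
    use datafusion::physical_plan::ExecutionPlan;

    // Plan-level metrics are only populated after the stream returned by
    // ExecutionPlan::execute has been fully driven, which is why the PR
    // sends them as a trailing flight message rather than per record batch.
    fn log_plan_metrics(plan: &Arc<dyn ExecutionPlan>) {
        if let Some(metrics) = plan.metrics() {
            println!(
                "output rows: {:?}, elapsed compute (ns): {:?}",
                metrics.output_rows(),
                metrics.elapsed_compute()
            );
        }
    }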

@fengys1996 fengys1996 self-requested a review January 9, 2024 09:58
if let Some(m) = recordbatches.metrics() {
    let metrics = FlightMessage::Metrics(m);
    let _ = tx.send(Ok(metrics)).await;
}
A Collaborator commented:

cc @waynexia
If the transmission of metrics fails, I guess it might be better to inform the frontend about it, since the frontend would otherwise continue to wait for the metrics corresponding to each distributed query when executing a distributed ANALYZE plan.

A Member replied:

What do you mean by "inform"? The current implementation puts metrics at the end of each stream. This is because the metric is only available when a corresponding plan is finished.

For the distributed analyze scenario, what do you think of discarding the data and making the metric the first and only result?

@waynexia (Member) left a comment:

LGTM

src/common/recordbatch/src/adapter.rs (review thread, resolved)
src/common/recordbatch/src/adapter.rs (review thread, resolved)
@fengjiachun (Collaborator) left a comment:

LGTM

@fengjiachun fengjiachun added this pull request to the merge queue Jan 17, 2024
Merged via the queue into GreptimeTeam:main with commit a29b9f7 Jan 17, 2024
16 checks passed
@shuiyisong shuiyisong deleted the chore/metadata_metrics branch January 18, 2024 02:15