Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export gossip with macro #73

Draft
wants to merge 47 commits into
base: prometheus
Choose a base branch
from
Draft

Conversation

enriquefynn
Copy link
Member

Derive ExportPrometheus into GossipStats, it creates a function called write_prometheus(.) that writes all fields from the struct that derives it.
Currently, we only support Counter metrics used in GossipStats, but we probably could extend it in the future.

Also, submit_gossip_stats clears the gossip stats, that's not very good for our metrics, so we could just accumulate it instead

When referring to our node, we use simply `node` as opposed to
`cluster`.
Instead of propagating `rpc_processor`, use the `block_commitment_cache`
structure to get different bank confirmation levels
As ruuda pointed out, there is a deadlock if we acquire a read lock for
both `block_commitment_cache` and `bank_forks`, in ruuda words:
- We take the read lock on block_commitment_cache.
- Some other thread takes a write lock on bank_forks.
- We want to take the read lock on bank_forks, but this blocks because
  the other thread holds a write lock.
- The other thread wants to take a write lock on block_commitment_cache,
  but that blocks because we hold the read lock.
- Deadlock!

Instead, we guard the `block_commitment_cache` read lock so we don't
hold two read locks concurrently. This might lead, as ruuda also
pointed out, to an inconsistency that's known to Solana and dealt
already in the code the same way it's dealt in `rpc/src/rpc.rs`.
Should be "counter", not "count"
enriquefynn and others added 16 commits July 27, 2022 11:25
Down the rabbit hole from the arguments, propagating it to the
prometheus part, we add a flag so we can track multiple vote accounts.
Modify how metrics are written to account for when there's nothing to
report.
Add metrics about votes: last voted and vote balance
Change variable and function names, and add information about vote
credits
Slight change on how metrics are recorded
Co-authored-by: Ruud van Asseldonk <[email protected]>
Co-authored-by: Ruud van Asseldonk <[email protected]>
Co-authored-by: Ruud van Asseldonk <[email protected]>
With the epoch start slots and the number of slots in the epoch (and the
current slot, which we already had), we can infer/estimate:

 * Epoch progress percentage
 * Slots left until the next epoch
 * Time left until the next epoch (from slot height increase)

These are useful metrics to have about the network.
Derive a macro for `GossipStats`, a less invasive and more concise way
of writing all metrics from it.
Currently, the macro only accepts `Counter` type and will fail if used
otherwise, but we can probably expand it in the future to work with more
data types.
This function clears the gossip stats.
@enriquefynn
Copy link
Member Author

enriquefynn commented Aug 6, 2022

I'm not so sure about this PR, I like the procedural macro business and all but I don't know if we'd make sense of these gossip data, we'd need to also stop these metrics from clearing out, so I just commented the code that does it.

Edit: Some metrics do look interesting, like process_gossip_packets_time.
But we need to further understand what these metrics mean, e.g. get_votes and get_votes_count. 😬

Since we commented the previous line, doesn't make sense to update this
variable.
@enriquefynn enriquefynn marked this pull request as draft August 6, 2022 03:08
@ruuda
Copy link

ruuda commented Aug 10, 2022

But we need to further understand what these metrics mean

Yep, it doesn’t really make sense to monitor something when we have no clue what it means. (Although sometimes if there is an outage and you see one of the metrics behaving in an unusual way around that time, you can learn something. But I prefer the thoughtful approach over the dump-all-the-data-approach.)

Copy link

@ruuda ruuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea to do the macro. I have some suggestions.

submit_gossip_stats(&self.stats, &self.gossip, &stakes);
*last_print = Instant::now();
// submit_gossip_stats(&self.stats, &self.gossip, &stakes);
// *last_print = Instant::now();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So ”submit” actually resets the counters?

I can think of two ways to make this work upstream.

  1. Make logging metrics and Prometheus metrics mutually exclusive, and put the choice in a CLI flag. Then if Prometheus metrics are enabled, we would skip the log + reset here.
  2. If GossipStats holds only counters, we could implement std::ops::Sub for it, and keep two instances of it: the current stats, and the last logged stats. Then when it is time to log, we log current - last_logged, and then set last_logged = current. That way the metrics remain increasing for Prometheus, and the log output remains unchanged.

I think option 2 is kind of neat. It’s a bit more tedious in the GossipStats, but it may be simpler in the end than threading a CLI flag through everything to this point. And it’s nicer to not have to choose.

out,
&solana_prometheus_utils::MetricFamily {
name: &format!("solana_gossip_{}", stringify!(#idents)),
help: "Auto generated with Prometheus macro",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to get the doc comment from the field and put it here? Doc comments are secretly attributes, but I don’t know if syn parses them that way.

Or maybe alternatively, we should require a #[metric(help = "blah", type = "counter")] attribute on the fields that we want to export. It’s a more invasive change, but also more explicit about what it does.

@@ -0,0 +1,45 @@
extern crate proc_macro2;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool idea to make a macro for this! It’s a nice approach on the definition side, and the macro itself looks pretty simple too, nice.

ruuda pushed a commit that referenced this pull request Jun 13, 2024
…abs#86)

windows: Use vcpkg for openssl dep (#73)

(cherry picked from commit b78c070)

Co-authored-by: Jon C <[email protected]>
ruuda pushed a commit that referenced this pull request Jun 14, 2024
…abs#86)

windows: Use vcpkg for openssl dep (#73)

(cherry picked from commit b78c070)

Co-authored-by: Jon C <[email protected]>
ruuda pushed a commit that referenced this pull request Jul 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants