
Goal: Validator Monitoring #443

Open
4 tasks
outerlook opened this issue Aug 6, 2024 · 17 comments

Comments

@outerlook
Contributor

outerlook commented Aug 6, 2024

Objective

Assess the participation of validators (node operators) in the network during normal operations.

Description

For example, a visual representation could be a graph showing how many blocks per day a given operator was supporting the network.

The mechanism relies on each node's signature being validated during consensus, after which the indexer exposes the signer's public key as part of the block information.

The internal CometBFT endpoint probably already exposes this: https://github.com/cometbft/cometbft/blob/v0.38.x/spec/core/data_structures.md

If so, what is the easiest path to expose this data from the kwil indexer? Should we expose a node's CometBFT API endpoint?
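
For illustration, a minimal sketch assuming direct access to a node's CometBFT RPC on the default port (26657); the RPC URL and height are assumptions, not a decided design:

```python
# Sketch: list which validators signed a given block, read straight from a
# node's CometBFT RPC /commit endpoint. URL and height are assumptions.
import requests

RPC = "http://localhost:26657"

def block_signers(height: int) -> list[str]:
    """Addresses of validators whose signature is present in the block's commit."""
    resp = requests.get(f"{RPC}/commit", params={"height": height}).json()
    sigs = resp["result"]["signed_header"]["commit"]["signatures"]
    # block_id_flag 2 = committed; 1 = absent; 3 = voted nil
    return [s["validator_address"] for s in sigs if s["block_id_flag"] == 2]

print(block_signers(1000))
```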

To Do

Define the scope of this goal:

  • just providing an API to get the data
  • creating the metrics consumption pipeline
  • visualizing the data

Problems

Blocked By

Instrumentation

Validator monitoring

@outerlook changed the title from "Validator Monitoring" to "Goal: Validator Monitoring" on Aug 6, 2024
@outerlook
Contributor Author

Hey, @zolotokrylin, can you also verify and include this goal with the correct priority on the roadmap?

Please share if you think we need more information to evaluate the business value for it correctly 🙏

@rsoury

rsoury commented Aug 7, 2024

@brennanjl - Can you confirm that CometBFT block data is available via the Indexer?

@brennanjl
Collaborator

> For example, a visual representation could be a graph showing how many blocks per day a given operator was supporting the network.

> The mechanism relies on each node's signature being validated during consensus, after which the indexer exposes the signer's public key as part of the block information.

I'm actually not sure what this means. It is mostly presumed that a node operator is supporting the network 100% of the time; if at any point 1/3 or more of the validating power is offline, the network will halt. Are you simply looking to track how long a certain validator has been a validator?

> Can you confirm that CometBFT block data is available via the Indexer?

The full block data is not (it can be read from a node directly), but indexed block metadata, such as the proposer, can be queried from the indexer.

@outerlook
Contributor Author

> Are you simply looking to track how long a certain validator has been a validator?

If I'm correctly aligned, the scenario is that there will soon be 12 node operators running TSN. Even if all of them are registered as validators, if 2 of them (less than 1/3) stay disconnected for days in a month or are otherwise inconsistent, we should have an easy way to track it.

@rsoury

rsoury commented Aug 8, 2024

@brennanjl -

> I'm actually not sure what this means

@outerlook basically clarified it above.

We want to index the CometBFT blocks to determine which validators are participating in each block.
The validating-power threshold only counts public keys that are registered as validators.
We want to determine which public keys and signatures appear in each block from CometBFT and check whether they match our count of partner node operators.
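
A rough sketch of that check for a single block, under the same assumption of direct CometBFT RPC access as above (the /validators endpoint gives the expected set, /commit gives who actually signed):

```python
# Sketch: for one height, compare the registered validator set against the
# commit signatures to see which validators did not sign. RPC URL is assumed.
import requests

RPC = "http://localhost:26657"

def participation(height: int) -> tuple[set[str], set[str]]:
    vals = requests.get(f"{RPC}/validators",
                        params={"height": height, "per_page": 100}).json()
    expected = {v["address"] for v in vals["result"]["validators"]}

    commit = requests.get(f"{RPC}/commit", params={"height": height}).json()
    sigs = commit["result"]["signed_header"]["commit"]["signatures"]
    signed = {s["validator_address"] for s in sigs if s["block_id_flag"] == 2}

    return signed, expected - signed  # (who signed, who missed)
```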

> this can be read from a node directly

In this case, we could create (or use an existing) CometBFT indexer, correct?

@rsoury mentioned this issue Aug 14, 2024
@rsoury

rsoury commented Aug 15, 2024

Confirmed by @brennanjl

The indexer does not support this.
We can make it index this quite easily.

How high a priority is this? @zolotokrylin to determine the priority, especially as it relates to #415.

@zolotokrylin
Contributor

@rsoury, if you are not busy with anything else, please define the Spec document for this Goal.
While Raffael and Mic are working on the other goals, this goal can be spec'd.

@rsoury

rsoury commented Aug 16, 2024

@zolotokrylin - The Spec for this has been merged with #415, and is already established.

The idea of observability, whether for internal or external analysis, is essentially a single goal.
This specific issue covers a single data source for that overarching Reliability goal.

@zolotokrylin
Contributor

@rsoury, could you please remove (or merge into the Specs doc if still relevant) everything from the description of this task and attach the relevant Spec file here?
Is there a clear separation between this and that goal in the spec doc?

@rsoury

rsoury commented Aug 17, 2024

Yes, it's referenced under the Validator Monitoring heading: https://docs.google.com/document/d/1-yxCyunqLhIHqLGJrIqScqRduo_lB3Ee6LGyeyY4B3A/edit#heading=h.bjrhx35jayz0

It's distinguished quite clearly: where blockchain consensus data is used as a source for observability and reliability, the spec covers what this issue calls Validator Monitoring.

@markholdex
Collaborator

markholdex commented Sep 17, 2024

@outerlook is this goal a duplicate of

@zolotokrylin
Contributor

@markholdex no. This Goal is about understanding validators' performance.

@markholdex
Collaborator

> @markholdex no. This Goal is about understanding validators' performance.

@zolotokrylin but in the Reliability goal, there are problems and specs around performance and penalties for validators that perform poorly. So it's confusing me, or maybe I'm missing something.

@outerlook
Contributor Author

outerlook commented Sep 19, 2024

@markholdex, along the way, per #443 (comment) I see it was merged in the process. I previously saw #415 as an individual-level reliability issue (are our nodes operating well? are they contributing to the network?) and this goal as network-level monitoring (which nodes aren't contributing?).

They are closely related and overlap in some ways. We could:

  • split the validator monitoring part of Goal: 99.9% TSN Reliability #415 back into this issue, but that would go against the initial decision to merge them (which I believe had a reason)
  • close this goal as a duplicate, merging any remaining aspects into the other
  • keep this goal as network-level monitoring, leaving Goal: 99.9% TSN Reliability #415 focused on our own nodes' data, and simpler

@zolotokrylin
Contributor

@markholdex, feel free to optimise the naming if you need it.

@markholdex
Collaborator

@outerlook I believe that:

  • within Goal: 99.9% TSN Reliability #415 you will gather information on which nodes are not contributing and about their operation. Right?
  • Then we should only keep this goal if there is anything extra we would like to do for the analysis of validator performance. I don't see anything for now and am considering closing this Goal as a duplicate.

@outerlook
Contributor Author

outerlook commented Sep 30, 2024

@markholdex

> you will gather information on which nodes are not contributing

Partially. I initially thought it would be a simpler step for the #415 goal to have each node emit its own data about contribution (already available).

That goal, to stay simpler, would answer "how is my node contributing to the network?" and have alarms for it, since it's our own responsibility to maintain it and to know when we messed up. That seems simpler (almost free) compared with the next question:

"How are all nodes contributing to the network?" is what I had in mind for this (#443) goal. It needs a little more setup because it requires indexer-like behavior: collecting blocks, getting the list of nodes that were supposed to contribute, and emitting a metric for each one indicating whether it contributed.

Maybe this will get easier or less relevant after #415.

But again, this is how I understood it, and it made sense. I'm OK with re-evaluating the need after #415 -- if it's really easy to assess other nodes' contributions within those tasks, I'd gladly do that to avoid more effort here.

Another point of view: would #443 already cover what we're asking about validation in #415? Yes, but #443 is harder, while #415 just needs what CometBFT's already-available metrics provide.
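
For reference, those built-in metrics come from CometBFT's instrumentation settings in config.toml; a sketch with the documented values (exact metric names, e.g. consensus_validator_missed_blocks, should be verified against the running CometBFT version):

```toml
# config.toml (CometBFT) — enable the built-in Prometheus metrics.
# prometheus defaults to false; listen address and namespace are the defaults.
[instrumentation]
prometheus = true                  # expose node metrics for scraping
prometheus_listen_addr = ":26660"  # scrape endpoint exposed by the node
namespace = "cometbft"             # metric prefix, e.g. cometbft_consensus_*
```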
