Add uptime to the relay monitoring schema #34

Open
taylorjdawson opened this issue Nov 4, 2022 · 2 comments

Comments

@taylorjdawson

It would be nice to have a standard way to measure a relay's uptime. Currently we only have the /eth/v1/builder/status endpoint, which is inadequate: there is no way to tell whether a relay is creating artificial uptime by returning a static page.

The relay monitor has the /monitor/v1/faults endpoint.
It would be nice to either:
a) rename /monitor/v1/faults to /monitor/v1/stats and include { faults: {...} } as part of the payload
b) add a new /monitor/v1/stats endpoint that includes an uptime stat along with other relevant metrics
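
To make the proposed payload concrete, here is a minimal sketch of what a /monitor/v1/stats response could look like with the existing faults data nested under a `faults` key. The `uptime` field name, its 0.0 to 1.0 range, the per-relay keying, and the contents of `faults` are all assumptions for illustration; none of them come from the relay monitor itself.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class RelayStats:
    # Hypothetical shape: the issue only proposes a "faults" object plus an uptime stat.
    faults: Dict[str, Any] = field(default_factory=dict)
    uptime: float = 0.0  # assumed to be the fraction of successful liveness checks


def stats_payload(per_relay: Dict[str, RelayStats]) -> str:
    """Serialize a stats response keyed by a relay identifier (an assumed convention)."""
    body = {relay: asdict(stats) for relay, stats in per_relay.items()}
    return json.dumps(body, sort_keys=True)


if __name__ == "__main__":
    print(stats_payload({"relay-a": RelayStats(faults={"total": 2}, uptime=0.5)}))
```

Nesting the faults object unchanged would let existing consumers of /monitor/v1/faults read the same data from the new endpoint.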

@ralexstokes
Owner

uptime would be neat to see

my only concern is that getting a super precise signal is hard, but if we are ok w/ some lossiness then I think the relay monitor could support this

do you have anything particular in mind?

I would think to start w/ a simple poll of /eth/v1/builder/status although you raise a good point about static or cached assets making this endpoint a little less meaningful
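
A simple poll along these lines could be sketched as below. This is only an illustration under assumptions: the names `http_status_ok`, `UptimeTracker`, and `poll_once` are hypothetical, an HTTP 200 is treated as "up" and everything else as "down", and the probe cannot tell a live relay from a static or cached page, which is exactly the weakness raised above.

```python
import http.client
import urllib.request
from typing import Callable, Optional

STATUS_PATH = "/eth/v1/builder/status"


def http_status_ok(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the relay answers the status endpoint with HTTP 200."""
    url = base_url.rstrip("/") + STATUS_PATH
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # Connection errors, timeouts, non-2xx responses and malformed URLs all count as down.
        return False


class UptimeTracker:
    """Lossy uptime estimate: the fraction of polls that succeeded."""

    def __init__(self) -> None:
        self.checks = 0
        self.successes = 0

    def record(self, ok: bool) -> None:
        self.checks += 1
        if ok:
            self.successes += 1

    @property
    def uptime(self) -> Optional[float]:
        # None means no polls have been recorded yet.
        return self.successes / self.checks if self.checks else None


def poll_once(tracker: UptimeTracker, base_url: str,
              probe: Callable[[str], bool] = http_status_ok) -> bool:
    """Run one probe against the relay and record the result."""
    ok = probe(base_url)
    tracker.record(ok)
    return ok
```

Calling `poll_once` on a fixed interval (for example once per slot) gives a coarse signal; the `probe` parameter is injectable so a stronger liveness check could replace the status poll later.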

I don't really want to maintain a set of routines per relay to do some arbitrary liveness check though...

another thought I had was to encourage an ecosystem norm that relays expose metrics, although I'm not sure how to do this in a DoS-resistant way; another option is to encourage a norm that relays just expose a liveness check for this purpose, with reputation backing the claim that it is a reliable signal and not cached in some way

what do you think?

@metachris
Collaborator

metachris commented Nov 8, 2022

What's the goal you want to accomplish?

I'm not sure uptime is an important metric for relays. They can be down and it has no impact if they don't submit any bids. Therefore I'm not sure this is a relevant task that should be added to the relay monitor's responsibilities.

For reference, there's also the discussion here about having the relay status endpoint return the latest slot to prove it's not just a static page. Alternatively, you could just call getHeader on every slot and see the actual latency and uptime based on that? (although not every relay would provide a bid for every slot)
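
As a rough sketch of the getHeader idea, the snippet below summarizes per-slot probe results into uptime, bid rate, and latency. It assumes the probes were collected elsewhere, and it treats HTTP 200 (a bid was returned) and 204 (no bid for that slot) as "relay is up" and anything else, including timeouts, as "down". The `HeaderProbe` and `summarize` names and that classification are assumptions for illustration, not something the relay monitor defines.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class HeaderProbe:
    slot: int
    status: Optional[int]        # HTTP status code, or None if the request failed or timed out
    latency_ms: Optional[float]  # round-trip time, or None if there was no response


def summarize(probes: List[HeaderProbe]) -> Dict[str, Optional[float]]:
    """Aggregate per-slot getHeader probes into coarse liveness metrics."""
    if not probes:
        return {"uptime": None, "bid_rate": None, "mean_latency_ms": None}
    # Assumption: 200 (bid) and 204 (no bid) both show the relay handled the request.
    live = [p for p in probes if p.status in (200, 204)]
    bids = [p for p in live if p.status == 200]
    latencies = [p.latency_ms for p in live if p.latency_ms is not None]
    return {
        "uptime": len(live) / len(probes),
        "bid_rate": len(bids) / len(probes),
        "mean_latency_ms": mean(latencies) if latencies else None,
    }
```

Keeping the bid rate separate from uptime reflects the caveat that not every relay provides a bid for every slot: a relay that answers with no bid is still reachable.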

3 participants