feat(traces): OTeL Traces implementation(duty flow) #1980

Draft
wants to merge 17 commits into
base: stage

Conversation

oleg-ssvlabs
Contributor

@oleg-ssvlabs oleg-ssvlabs commented Jan 14, 2025

Description/Questions/Suggestions

  • Are both logs and events needed to record the same type of event, or should it be one or the other? Example.

  • Is Duty ID necessary as a trace attribute? Currently both Slot and Epoch attributes are added separately. Would it be better to just add a Committee attribute instead?

  • Duty flows are separated into Committee versus everything else (at the method/function level). This is reflected in different span names. Should we keep separating them at the span level, or should we use the same names with attributes that help differentiate duties (like ssv.runner.role, etc.)?
    Span name examples:

    • ssv.validator.execute_committee_duty
    • ssv.validator.execute_duty
    • ssv.validator.start_committee_duty
    • ssv.validator.start_duty
  • There are three statuses for spans: Ok, Error, and Unset. Ideally, all spans should have their status explicitly set to either Ok or Error; Unset is not expected. Please use the Grafana UI to verify whether we receive any spans with Unset status.

  • Look into namespaces for metrics, traces, and attributes. Namespaces should be consistent across all observability "primitives". There is a chance we have some inconsistencies.
    Example: ssv.validator.duty vs. ssv.duty. If Duty belongs to Validator, use ssv.validator.duty everywhere.

  • The OpenTelemetry specification explains why the Ok status is used without a message ("Description MUST be IGNORED for StatusCode Ok & Unset values."). Even if a message is set for an Ok status, it will be ignored by OTeL and not displayed in Grafana (yeah, SDKs could have been better here).

  • Some libraries need to be updated for proper context propagation, especially for methods that perform I/O (e.g., HTTP calls). Example: p2p.Broadcast().

  • Some enums in the libraries used by SSV Node lack a String() method, which complicates logging and tracing.
    Example: types.PartialSigMsgType (ssv-spec lib). This should potentially be implemented in these libraries (we own the source code).

  • Should these enums be moved to the spec types package instead?

@oleg-ssvlabs oleg-ssvlabs changed the title from "feat(traces): OTeL Traces implementation" to "feat(traces): OTeL Traces implementation(duty flow)" Jan 14, 2025

codecov bot commented Jan 14, 2025

Codecov Report

Attention: Patch coverage is 60.18349% with 434 lines in your changes missing coverage. Please review.

Project coverage is 47.7%. Comparing base (6574447) to head (e6b0b98).
Report is 95 commits behind head on stage.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| operator/validator/controller.go | 12.5% | 84 Missing ⚠️ |
| observability/attributes.go | 0.0% | 80 Missing ⚠️ |
| cli/operator/node.go | 0.0% | 56 Missing ⚠️ |
| ibft/storage/store.go | 71.5% | 31 Missing and 10 partials ⚠️ |
| registry/storage/validatorstore.go | 77.0% | 29 Missing and 10 partials ⚠️ |
| observability/observability.go | 0.0% | 35 Missing ⚠️ |
| operator/validator/metadata/syncer.go | 88.8% | 20 Missing and 6 partials ⚠️ |
| api/handlers/exporter.go | 0.0% | 20 Missing ⚠️ |
| operator/duties/scheduler.go | 58.6% | 12 Missing ⚠️ |
| eth/executionclient/execution_client.go | 47.0% | 9 Missing ⚠️ |
| ... and 8 more | | |

@oleg-ssvlabs oleg-ssvlabs force-pushed the traces branch 6 times, most recently from 34dc244 to 819f619 Compare January 16, 2025 10:25
@oleg-ssvlabs oleg-ssvlabs marked this pull request as ready for review January 16, 2025 10:35
@oleg-ssvlabs oleg-ssvlabs force-pushed the traces branch 2 times, most recently from c79a8cc to f63cc5e Compare January 16, 2025 11:59
y0sher and others added 13 commits January 20, 2025 11:20
* chore(networkconfig): add new SSV Labs bootnodes

* Update mainnet.go

* Update holesky.go
…tivation (#1689)

* fix: (EventHandler) update non-committee shares upon liquidation/reactivation

---------

Co-authored-by: olegshmuelov <[email protected]>
…adata sync (#1805)

* rename setupEventHandling to syncContractEvents

* refactor beaconprotocol.UpdateValidatorsMetadata to MetadataFetcher

* don't pass logger to operatorNode.Start()

* don't pass logger to reportOperators

* don't pass logger to p2pNetwork Setup and Start

* fix TestSetupValidatorsExporter

* reduce errors text in fetchAndUpdateValidatorsMetadata

* extract a package for metadata updating; start it before p2p setup

* minor cleanup

* pass context

* get rid of update metadata loop in validator controller

* remove unused code from validator controller

* fix tests for StartValidators

* initialize metadata updater before validator controller

* various fixes in metadata updater

* sharesStorage -> shareStorage

* remove redundant comment

* avoid blocking on channel send

* return shares instead of nil on timeout

* fix TODO's; add tests

* fix linter

* review comments and some code improvements

* review comments [2]

* minor improvements

* move metadata updater inside validator

* review comments [3]

* add comments

* add another comment

* network/p2p: extract logger changes to another PR

* network/p2p: revert leftovers

* resolve a busy loop

* remove logic with indices diff

* wrap context in reportIndicesChange

* review comments

* ValidatorSyncer -> Syncer

* fix comment

* rename receiver

* get rid of fetcher

* fix TestUpdateValidatorMetadata

* NewValidatorSyncer -> NewSyncer

* minor renames

* add a comment in HandleMetadataUpdates

* revert removal of active index comparison

* add self subnets logic missed on merge conflicts

* fix leftovers

* apply changes from #1969

* filter shares by own subnets

* use fixed subnets

* improve the last batch sleep comment

* minor rename

* comments

* comment

* comment

* logs

---------

Co-authored-by: zippoxer <[email protected]>
  cmd.Parent().Short,
  cmd.Parent().Version,
- observability.WithMetrics())
+ observability.WithMetrics(),
+ observability.WithTraces("stage-alloy.alloy.svc:4317", true))
Contributor

Despite this being a stale PR that we'll soon revive, this should be configured via env vars.
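One way to act on this suggestion is to read the collector endpoint from the environment instead of hard-coding it. The sketch below uses only the standard library. The variable names OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_INSECURE are borrowed from the OpenTelemetry exporter conventions; whether this project would adopt them is an assumption, as are the tracesConfig type and loadTracesConfig helper, and the reading of the second WithTraces argument as an "insecure" flag.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// tracesConfig is a hypothetical holder for the two values hard-coded in the diff.
type tracesConfig struct {
	Endpoint string // collector address, e.g. "host:4317"
	Insecure bool   // assumed meaning of the boolean passed to WithTraces
}

// loadTracesConfig reads the settings through getenv (os.Getenv in production),
// which keeps the function easy to test with a fake environment.
func loadTracesConfig(getenv func(string) string) (tracesConfig, error) {
	cfg := tracesConfig{Endpoint: getenv("OTEL_EXPORTER_OTLP_ENDPOINT")}
	if raw := getenv("OTEL_EXPORTER_OTLP_INSECURE"); raw != "" {
		insecure, err := strconv.ParseBool(raw)
		if err != nil {
			return tracesConfig{}, fmt.Errorf("parse OTEL_EXPORTER_OTLP_INSECURE: %w", err)
		}
		cfg.Insecure = insecure
	}
	return cfg, nil
}

func main() {
	cfg, err := loadTracesConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if cfg.Endpoint == "" {
		// No endpoint configured: leave tracing off rather than guess one.
		fmt.Println("traces disabled: no endpoint configured")
		return
	}
	fmt.Printf("traces endpoint=%s insecure=%t\n", cfg.Endpoint, cfg.Insecure)
}
```

The resulting values could then be passed to observability.WithTraces in place of the literals, so that each environment supplies its own collector address.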

Contributor Author

I'll take it into consideration.

@oleg-ssvlabs oleg-ssvlabs marked this pull request as draft March 5, 2025 15:03
7 participants