Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate to Prometheus lib 1.x #7880

Merged
merged 28 commits into from
Nov 27, 2024
Merged

Conversation

fab-10
Copy link
Contributor

@fab-10 fab-10 commented Nov 13, 2024

PR description

Prometheus Java Metrics library development now focus on version 1.x, while we are still using the simple client version 0.16.
Since version 1.x introduced many changes and it is not compatible with 0.16, the upgrade is not straightforward and some refactoring is needed to migrate to the latest version, so I took the opportunity to also make the Prometheus code more OO.
The Prometheus HTTP server is not used instead of our custom implementation, so there is less effort to keep it updated, and we get by default improvement, like new export formats like protobuf.

See the CHANGELOG for the breaking changes introduced by this upgrade.

Some JVM metrics have changed name to adhere to the OTEL standard (see the table below), Besu Full Grafana dashboard is updated to support both names

Fixed Issue(s)

Thanks for sending a pull request! Have you done the following?

  • Checked out our contribution guidelines?
  • Considered documentation and added the doc-change-required label to this PR if updates are required.
  • Considered the changelog and included an update if required.
  • For database changes (e.g. KeyValueSegmentIdentifier) considered compatibility and performed forwards and backwards compatibility tests

Locally, you can run these tests to catch failures early:

  • unit tests: ./gradlew build
  • acceptance tests: ./gradlew acceptanceTest
  • integration tests: ./gradlew integrationTest
  • reference tests: ./gradlew ethereum:referenceTests:referenceTests

@fab-10 fab-10 force-pushed the upgrade-prometheus-1.x branch 2 times, most recently from 52258c5 to f5c4ebf Compare November 13, 2024 19:41
@fab-10 fab-10 self-assigned this Nov 13, 2024
@fab-10 fab-10 force-pushed the upgrade-prometheus-1.x branch 4 times, most recently from 8c3c5ba to 3f13acf Compare November 19, 2024 14:49
Signed-off-by: Fabio Di Fabio <[email protected]>
Signed-off-by: Fabio Di Fabio <[email protected]>
Signed-off-by: Fabio Di Fabio <[email protected]>

# Conflicts:
#	metrics/core/build.gradle
#	metrics/core/src/main/java/org/hyperledger/besu/metrics/noop/NoOpMetricsSystem.java
#	metrics/core/src/main/java/org/hyperledger/besu/metrics/opentelemetry/OpenTelemetrySystem.java
#	metrics/core/src/main/java/org/hyperledger/besu/metrics/prometheus/PrometheusMetricsSystem.java
#	metrics/rocksdb/src/main/java/org/hyperledger/besu/metrics/rocksdb/RocksDBStats.java
#	plugin-api/build.gradle
#	plugin-api/src/main/java/org/hyperledger/besu/plugin/services/MetricsSystem.java
# Conflicts:
#	ethereum/core/src/main/java/org/hyperledger/besu/ethereum/chain/DefaultBlockchain.java
#	metrics/core/build.gradle
#	metrics/core/src/main/java/org/hyperledger/besu/metrics/prometheus/PrometheusMetricsSystem.java
# Conflicts:
#	CHANGELOG.md
#	metrics/core/src/main/java/org/hyperledger/besu/metrics/prometheus/PrometheusMetricsSystem.java
#	metrics/core/src/main/java/org/hyperledger/besu/metrics/prometheus/PrometheusSuppliedValueCollector.java
@fab-10 fab-10 force-pushed the upgrade-prometheus-1.x branch 2 times, most recently from 591a123 to 553b8ee Compare November 20, 2024 18:19
# Conflicts:
#	CHANGELOG.md
#	metrics/core/src/main/java/org/hyperledger/besu/metrics/prometheus/PrometheusMetricsSystem.java
#	metrics/core/src/main/java/org/hyperledger/besu/metrics/prometheus/PrometheusSuppliedValueCollector.java
Signed-off-by: Fabio Di Fabio <[email protected]>
Signed-off-by: Fabio Di Fabio <[email protected]>
@fab-10 fab-10 marked this pull request as ready for review November 22, 2024 15:00
Copy link
Contributor

@Matilda-Clerke Matilda-Clerke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but leaving unapproved as I'm not an expert on Promethius. Please consider my review comments


import java.util.function.Supplier;

/** The interface Labelled supplied summary. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of this kind of javadoc comment... it doesn't tell us anything the class name doesn't already tell us

Comment on lines +22 to +25
* Labels.
*
* @param summarySupplier the summary supplier
* @param labelValues the label values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This javadoc also fails to convey any useful information that the method signature doesn't already convey

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that, and I also found these comments just redundant, but since the build process requires us to put comments regardless the need of it, also in cases like this, where the code is self documenting, idk what it is better to do, wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest writing javadoc comments such that someone completely new to the code could read it and understand what the code is aiming to do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewritten the javadoc to be more useful and removed it where the code is self-explanatory.
Classes in this package are not public and using it requires at least some base knowledge of metrics and Prometheus, so the doc should take that for granted in this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general the javadoc requirement feels like an unnecessary barrier to me. While it is good to write something more meaningful rather than generated, I don't believe the forced comments are providing any actual value. Developers already add adhoc javadoc if it has value.

# Conflicts:
#	plugin-api/build.gradle
# Conflicts:
#	gradle/verification-metadata.xml
@fab-10
Copy link
Contributor Author

fab-10 commented Nov 25, 2024

@Matilda-Clerke I update the PR to improve the javadoc, please check if it is clearer now

This reverts commit 66eaa69.

Signed-off-by: Fabio Di Fabio <[email protected]>
Signed-off-by: Fabio Di Fabio <[email protected]>
Signed-off-by: Fabio Di Fabio <[email protected]>
@fab-10 fab-10 force-pushed the upgrade-prometheus-1.x branch 2 times, most recently from 5fc91c4 to 74ae047 Compare November 25, 2024 21:36
Signed-off-by: Fabio Di Fabio <[email protected]>
@fab-10 fab-10 force-pushed the upgrade-prometheus-1.x branch 2 times, most recently from 8144419 to c8b8d1f Compare November 26, 2024 17:10
Copy link
Contributor

@macfarla macfarla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks ok, I'm assuming you have run up nodes etc and tested the metrics work as expected

CHANGELOG.md Outdated Show resolved Hide resolved
@@ -4,6 +4,21 @@

### Breaking Changes
- Removed Retesteth rpc service and commands [#7833](https://github.com/hyperledger/besu/pull/7783)
- With the upgrade of the Prometheus Java Metrics library, there are the following changes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are some projects (eg teku) that use metrics core modules - need to communicate breaking changes to them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, will support them for the upgrade

CHANGELOG.md Outdated
Comment on lines 8 to 9
- Gauge names are not allowed to end with `total`, therefore the metric `besu_blockchain_difficulty_total` is losing the `_total` suffix
- The `_created` timestamps are not returned by default, you can set the env var `BESU_OPTS="-Dio.prometheus.exporter.includeCreatedTimestamps=true"` to enabled them
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are either of these metrics used on the Besu full dashboard?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, they are not used by our official dashboard

- With the upgrade of the Prometheus Java Metrics library, there are the following changes:
- Gauge names are not allowed to end with `total`, therefore the metric `besu_blockchain_difficulty_total` is losing the `_total` suffix
- The `_created` timestamps are not returned by default, you can set the env var `BESU_OPTS="-Dio.prometheus.exporter.includeCreatedTimestamps=true"` to enabled them
- Some JVM metrics have changed name to adhere to the OTEL standard (see the table below), [Besu Full Grafana dashboard](https://grafana.com/grafana/dashboards/16455-besu-full/) is updated to support both names
Copy link
Contributor

@siladu siladu Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's great there's a dashboard update to go along with this 👍 . For completeness and extra visibility can you also mention this in the PR desc please?
I think we should also call it out in the release notes and when we announce on socials.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the description

Comment on lines +22 to +25
* Labels.
*
* @param summarySupplier the summary supplier
* @param labelValues the label values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general the javadoc requirement feels like an unnecessary barrier to me. While it is good to write something more meaningful rather than generated, I don't believe the forced comments are providing any actual value. Developers already add adhoc javadoc if it has value.

Copy link
Contributor

@siladu siladu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your efforts on this one 👍

It would be good to call out in the PR desc what testing has been done - I assume it has been tested with the updated dashboard at least?

Has the push gateway functionality also been tested?

It seems there are quite extensive changes to the Timer code and I think the sync duration metrics were relying on some esoteric besu-specific code - so might be worth @pinges taking a look unless you're confident this is all still working as expected @fab-10?

Does these (or any other) docs need updating as well? https://besu.hyperledger.org/public-networks/how-to/monitor/metrics

Co-authored-by: Sally MacFarlane <[email protected]>
Signed-off-by: Fabio Di Fabio <[email protected]>
@fab-10
Copy link
Contributor Author

fab-10 commented Nov 27, 2024

@macfarla @siladu I have tested that our official dashboard works fine after this update, and timers are working fine, I have also verified that OpenTelemetry is working, but haven't yet tried the push gateway functionality, so I will test it before merging.

For the final user, apart the changes described in the changelog, there is no impact, but will review the doc in case it is referring to some of the breaking change.

# Conflicts:
#	plugin-api/build.gradle
@fab-10
Copy link
Contributor Author

fab-10 commented Nov 27, 2024

@siladu verified that also the Prometheus push gateway works

@fab-10 fab-10 enabled auto-merge (squash) November 27, 2024 10:55
Copy link
Contributor

@siladu siladu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fab-10

@fab-10 fab-10 merged commit 5b2da5a into hyperledger:main Nov 27, 2024
43 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants