Experiment with metrics and prometheus exporter with otel4s #353
Conversation
Force-pushed dd9bba4 to c464484
There is a new otel4s snapshot.
thanks a lot @iRevive, I'll bump it later. btw there is an issue with the Prometheus exporter's built-in server when I deploy using
I think the Prometheus exporter tries to spawn a new process, which is not allowed here. Do you have any idea how to bypass this?
Hm, by default, the Prometheus server is created as:

```scala
val routes = PrometheusHttpRoutes.routes[F](exporter, writerConfig)

EmberServerBuilder
  .default[F]
  .withHost(host)
  .withPort(port)
  .withHttpApp(Router("metrics" -> routes).orNotFound)
  .build
```

You can change the default host/port via environment variables or system properties. Here is the complete set of available Prometheus exporter settings.
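As a hedged sketch of the system-property route: the property names below follow the OpenTelemetry SDK autoconfiguration convention and should be verified against the settings page mentioned above; `PrometheusBindConfig` and its defaults are hypothetical.

```scala
// Hedged sketch: override where the built-in Prometheus server binds, via
// system properties read by the autoconfigured SDK. The property names follow
// the OpenTelemetry autoconfiguration convention (equivalent environment
// variables would be OTEL_EXPORTER_PROMETHEUS_HOST / OTEL_EXPORTER_PROMETHEUS_PORT);
// confirm them against the otel4s exporter settings.
object PrometheusBindConfig:
  def applyDefaults(host: String = "0.0.0.0", port: Int = 9464): Unit =
    sys.props.getOrElseUpdate("otel.exporter.prometheus.host", host)
    sys.props.getOrElseUpdate("otel.exporter.prometheus.port", port.toString)
    ()
```

Call `PrometheusBindConfig.applyDefaults()` early in `main`, before the SDK is configured, so autoconfiguration can see the properties.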
thanks, I'll try it later today.
```scala
countDuration
  .recordDuration(TimeUnit.MILLISECONDS, Attribute(indexAttributeKey, q.index(query).value))
  .surround:
    client
      .execute(q.countDef(query))
      .flatMap(toResult)
      .map(_.count)
      .onError(_ => countErrorCounter.inc(Attribute(indexAttributeKey, q.index(query).value)))
```
Everything below doesn't necessarily apply to your service and infrastructure, but it might be useful in the future.
You can use attributes to distinguish errored and succeeded actions; the OpenTelemetry Semantic Conventions encourage this approach.
For example, `count.duration` indicates how long it takes to execute the `count` query. If you add an `error.type` attribute, you can track successful and unsuccessful queries within the same metric.

In Grafana, you can query the data as:

```promql
count_duration_count{error_type!=""} # number of failed queries
count_duration_count{error_type=""}  # number of succeeded queries
```
Code:

```scala
def withErrorType(static: Attributes)(ec: Resource.ExitCase) = ec match
  case Resource.ExitCase.Succeeded =>
    static
  case Resource.ExitCase.Errored(e) =>
    static.added(Attribute("error.type", e.getClass.getName))
  case Resource.ExitCase.Canceled =>
    static.added(Attribute("error.type", "canceled"))
```
```scala
countDuration
  .recordDuration(
    TimeUnit.MILLISECONDS,
    withErrorType(Attributes(Attribute(indexAttributeKey, q.index(query).value)))
  )
  .surround:
    client
      .execute(q.countDef(query))
      .flatMap(toResult)
      .map(_.count)
```
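The exit-case mapping can be exercised in isolation. In this dependency-free sketch, the `ExitCase` enum and `Map`-based attributes are hypothetical stand-ins for cats-effect's `Resource.ExitCase` and otel4s's `Attributes`:

```scala
// Stand-in for cats-effect's Resource.ExitCase, just to test the mapping alone.
enum ExitCase:
  case Succeeded
  case Errored(e: Throwable)
  case Canceled

// Same logic as the withErrorType above, with Map[String, String] standing in
// for otel4s Attributes: succeeded runs keep only the static attributes,
// failed runs gain an error.type attribute.
def withErrorType(static: Map[String, String])(ec: ExitCase): Map[String, String] =
  ec match
    case ExitCase.Succeeded  => static
    case ExitCase.Errored(e) => static + ("error.type" -> e.getClass.getName)
    case ExitCase.Canceled   => static + ("error.type" -> "canceled")
```

Because a succeeded run records no `error.type`, the `error_type=""` matcher in the queries above selects exactly the successful operations.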
If we take it one step further, we can follow the OTel specification:
```scala
opDuration <- meter.histogram[Double]("db.client.operation.duration").withUnit("s").create
```
```scala
def search[A](query: A, from: From, size: Size)(using q: Queryable[A]): F[List[Id]] =
  opDuration
    .recordDuration(
      TimeUnit.SECONDS,
      withErrorType(Attributes(Attribute("db.operation.name", "search"), Attribute(indexAttributeKey, q.index(query).value)))
    )
    .surround:
      ...

def count[A](query: A)(using q: Queryable[A]): F[Long] =
  opDuration
    .recordDuration(
      TimeUnit.SECONDS,
      withErrorType(Attributes(Attribute("db.operation.name", "count"), Attribute(indexAttributeKey, q.index(query).value)))
    )
    .surround:
      ...
```
And the queries:

```promql
db_client_operation_duration_count{db_operation_name="search", error_type=""}
db_client_operation_duration_count{db_operation_name="search", error_type!=""}
db_client_operation_duration_count{db_operation_name="count", error_type=""}
db_client_operation_duration_count{db_operation_name="count", error_type!=""}
```
oh this is extremely useful, I think I could apply it right now. Thanks a lot @iRevive.

Also, do you mind bumping the `otel4s-experimental-metric` version to match otel4s `0.11-8e1f500-SNAPSHOT`? It is somehow binary incompatible with the previous one.
Here is a new build: `0.4.0-6-8c1230f-SNAPSHOT`.
Thanks @iRevive: #353 (comment)
Force-pushed 229bf3f to 68881fe
Auto config server doesn't work because: `Failed at step EXEC spawning lila-search-ingestor/bin/app: Permission denied`
Thanks @iRevive: #353 (comment)
Does everything work? I can release a stable version of otel4s then.
yes, it just works! I've run them in prod for a few days: the built-in server for the ingestor tool, and the Prometheus exporter hooked up manually in the server. Both work; here is a screenshot of our Grafana. The only problem is I don't know much about metrics/Grafana and how to create a meaningful dashboard 😂. Thanks again @iRevive for your help. Please take a look and add comments if you have time, much appreciated!
A few ideas for panels:
- Average time per ES operation (query + legend)
- Heap memory usage (Query A + legend, Query B + legend)
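For the "average time per ES operation" idea, one common PromQL shape is dividing the histogram's sum by its count over a window. This is a hedged sketch, not the original suggestion; the metric name assumes the `db.client.operation.duration` histogram discussed earlier, and `{{db_operation_name}}` is Grafana's legend template syntax:

```promql
# average duration per operation over the last 5 minutes
rate(db_client_operation_duration_sum[5m])
  / rate(db_client_operation_duration_count[5m])
```

With legend `{{db_operation_name}}`, each operation (`search`, `count`, ...) gets its own series in the panel.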
thanks @iRevive!