feat(dotnet): export opentelemetry metrics #398
| `http.client.request.duration` | `int` | The total request time for FGA requests | |
| `http.server.request.duration` | `int` | The amount of time the FGA server took to internally process and evaluate the request | |
Worth mentioning that these shouldn't be tags. Most SaaS distributed tracing vendors charge for custom metrics ingest based on the unique combinations of a metric's tags, so a latency distribution of 50-150ms would result in "100 custom metrics" being charged, times the number of unique combinations of the other tags.
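To make the billing arithmetic concrete, here is a minimal sketch. The tag names and per-tag counts are made-up illustrative numbers (only the "roughly 100 distinct durations" figure comes from the comment above); the point is just that the number of billable series is bounded by the product of the distinct values of each tag.

```csharp
using System;
using System.Linq;

// Hypothetical tags on a single metric, with the number of distinct values
// each one takes. Only the ~100 figure comes from the review comment.
var distinctValuesPerTag = new (string Tag, long DistinctValues)[]
{
    ("http.request.method", 2),
    ("http.response.status_code", 4),
    // Roughly one distinct value per millisecond across a 50-150ms spread.
    ("http.client.request.duration", 100),
};

// Upper bound on the number of series: the product of per-tag cardinalities.
static long MaxSeries((string Tag, long DistinctValues)[] tags) =>
    tags.Aggregate(1L, (product, tag) => product * tag.DistinctValues);

long withDuration = MaxSeries(distinctValuesPerTag);
long withoutDuration = MaxSeries(
    distinctValuesPerTag
        .Where(tag => tag.Tag != "http.client.request.duration")
        .ToArray());

Console.WriteLine($"with duration tag: {withDuration}");       // 800
Console.WriteLine($"without duration tag: {withoutDuration}"); // 8
```

Dropping the one duration tag shrinks the worst case by two orders of magnitude in this example, which is why a raw latency value is a poor fit for a tag.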
Thanks for the heads up @Hawxy!
We removed these tags for now, but we're thinking of converting them to a bucket, something like:
0-5ms
5-15ms
15-50ms
50-150ms
150ms-500ms
500ms+
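As a rough sketch of what such bucketing could look like, the function below maps a duration to one of the six labels listed above. The function name and the half-open `[lower, upper)` boundary convention are assumptions for illustration; the proposal does not specify either.

```csharp
using System;

// Hypothetical helper: maps a duration in milliseconds to one of the
// proposed bucket labels. A value on a boundary goes to the higher bucket
// (e.g. exactly 5ms lands in "5-15ms"); that convention is an assumption.
static string DurationBucket(double milliseconds) => milliseconds switch
{
    < 0 => throw new ArgumentOutOfRangeException(nameof(milliseconds)),
    < 5 => "0-5ms",
    < 15 => "5-15ms",
    < 50 => "15-50ms",
    < 150 => "50-150ms",
    < 500 => "150ms-500ms",
    _ => "500ms+",
};

Console.WriteLine(DurationBucket(3.2));   // 0-5ms
Console.WriteLine(DurationBucket(72));    // 50-150ms
Console.WriteLine(DurationBucket(1200));  // 500ms+
```

With this shape the tag can take at most six values, regardless of how latencies are distributed.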
But your question raises a few other problems for the other attributes/tags:
- `fga-client.user` will be a problem for anyone with more than a couple of users
- Model ID, Store ID, URL (Full) will be a problem for anyone who is working across many stores/models
We are thinking of making them Opt-in, do you have feedback on how this is normally approached?
We can add config on our side to Opt-in to them. But we were wondering if OTEL has standard tooling/config to opt-in/out of specific tags.
https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/docs/metrics/customizing-the-sdk/README.md#select-specific-tags seems to indicate that end users can choose to drop specific tags they don't need
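For reference, the end-user side of that approach would look roughly like the sketch below, which uses the `AddView` and `MetricStreamConfiguration.TagKeys` APIs from the OpenTelemetry .NET SDK (it needs the `OpenTelemetry` NuGet package). The meter name `OpenFga.Sdk` and the kept tag key are assumptions for illustration, not confirmed names from this PR.

```csharp
using OpenTelemetry;
using OpenTelemetry.Metrics;

// Sketch only: the meter name and the tag key below are assumed, not taken
// from the SDK. TagKeys is an allow-list, so any tag not listed here
// (for example fga-client.user) is dropped from the metric stream.
using var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter("OpenFga.Sdk")
    .AddView(
        instrumentName: "fga-client.request.duration",
        metricStreamConfiguration: new MetricStreamConfiguration
        {
            TagKeys = new[] { "http.request.method" },
        })
    .Build();
```

Note that a view is configured per instrument, so users would need one `AddView` call for each metric whose tags they want to trim.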
> but we're thinking of converting them to a bucket, something like:

The question to ask is "are users going to slice metrics by these tags?", which is the whole point of tags to begin with. Keep in mind that these aren't tracing tags, where you'd typically go ham with a pile of metadata to bake into individual traces. Does it make sense for a user to filter a metric graph of `fga-client.request.duration` by `http.client.request.duration`? I don't really think so, so it's not a valuable tag to include. Is slicing by `http.request.resend_count` something people are going to do?

> We are thinking of making them Opt-in, do you have feedback on how this is normally approached?

Some projects add a configuration option to enable verbose tagging/metrics. Delegating to OTEL filtering is fine, albeit a bit clunky.
After thinking about this a bit more, we're considering adding "fine-grained" config for the attributes/tags.
Something along the lines of:
    var configuration = new ClientConfiguration() {
        ApiUrl = "http://localhost:8080",
        StoreId = "...",
        Credentials = new Credentials() { ... },
        Telemetry = new OpenFgaTelemetryConfig {
            // Per-metric opt-in: only the attributes listed here are attached.
            Metrics = new() {
                [TelemetryHistograms.RequestDuration] = new() {
                    Attributes = new() { Attributes.AttributeRequestMethod, Attributes.AttributeRequestStoreId }
                },
                [TelemetryCounters.TokenExchangeCountKey] = new() {
                    Attributes = new() { Attributes.AttributeRequestModelId, Attributes.AttributeRequestClientId }
                },
            }
        }
    };
    var fgaClient = new OpenFgaClient(configuration);
If not set, we would enable a base set of metrics with minimal attributes; if configured, we would follow whatever is configured. We will couple that with warnings in the OTEL config documentation about which attributes could be cost-prohibitive.
This keeps folks from enabling these attributes by accident (they'd have to manually opt in), while still giving them visibility into things like:
- Whether a client id is sending a disproportionate number of calls or token requests (could be an indication that it was misconfigured, e.g. they are initializing the SDK multiple times, causing a credential request per call)
- Whether their new model is causing significantly more latency than the old one
- Whether slow requests are due to retries (tracing helps here, but usually traces are sampled and people might miss this)
- The ratio of successes vs. bad requests vs. rate limits by model id, store id or client id, so folks can understand whether a particular client is being called incorrectly or a particular model is problematic
- Understanding whether they still have old clients running that they need to upgrade, and how that correlates with the errors they are getting (through the user agent)
How does that sound to you?
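The opt-in behaviour described in this comment (a minimal base set by default, the configured set otherwise) can be sketched with the built-in `System.Diagnostics.Metrics` types. Everything named here is hypothetical: the meter name, the attribute keys, and the `FilterAttributes` helper are illustrative and are not the SDK's actual identifiers or implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;
using System.Linq;

// Hypothetical names throughout; this is not the SDK's real configuration.
const string RequestDuration = "fga-client.request.duration";

// Base set used when the caller configures nothing for a metric.
var defaultAttributes = new HashSet<string> { "http.request.method" };

// Explicit opt-in: metric name -> attributes the caller wants recorded.
var configured = new Dictionary<string, HashSet<string>>
{
    [RequestDuration] = new() { "http.request.method", "fga-client.request.store_id" },
};

// Keeps only the attributes enabled for the given metric.
TagList FilterAttributes(string metric, IEnumerable<KeyValuePair<string, object?>> candidates)
{
    if (!configured.TryGetValue(metric, out var allowed))
    {
        allowed = defaultAttributes;
    }

    var kept = new TagList();
    foreach (var candidate in candidates.Where(c => allowed.Contains(c.Key)))
    {
        kept.Add(candidate);
    }
    return kept;
}

using var meter = new Meter("Example.OpenFga");
var histogram = meter.CreateHistogram<double>(RequestDuration, unit: "ms");

var recorded = FilterAttributes(RequestDuration, new KeyValuePair<string, object?>[]
{
    new("http.request.method", "POST"),
    new("fga-client.request.store_id", "store-1"),
    new("fga-client.user", "user:anne"), // high cardinality, not opted in
});

histogram.Record(42.0, recorded);
Console.WriteLine(recorded.Count); // 2
```

Filtering before `Record` means a high-cardinality attribute never reaches the exporter unless the caller listed it, so nothing has to be dropped downstream.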
Yep, that sounds pretty good.
@Hawxy FYI - this has just been released in v0.5.1 - we're backporting those to the SDK Generator soon
Docs: https://github.com/openfga/dotnet-sdk/blob/main/OpenTelemetry.md
0fcb161 to cae5eb9
LGTM
Note: These metrics have never been released in the .NET SDK
This makes it consistent with Proto naming
Considering that we already have histograms for them, and the cost impact of having attributes with high variance, we're dropping the following two attributes: `http.client.request.duration` and `http.server.request.duration`
cae5eb9 to fbbe656
Description
Adds initial OpenTelemetry metrics to the .NET SDK
References
Generates: openfga/dotnet-sdk#69