Request more fine-grained control for remote sampling #6127

Open
garrettlish opened this issue Sep 19, 2024 · 5 comments

@garrettlish

Problem Statement

The PerOperationSampler provides operation-level customized sampling probabilities but lacks support for more fine-grained control, such as adjusting sampling probabilities based on specific tag key-value pairs (see sampling.proto).

For instance, in a real-world scenario, we might want to enforce sampling for a particular user session by customizing sampling probabilities based on specific tag key-value pairs.

Proposed Solution

Introduce a tag key-value pair in OperationSamplingStrategy to enable fine-grained control for remote sampling.

@dmathieu
Member

cc @yurishkuro as codeowner of the jaegerremote sampler.

@garrettlish
Author

Thanks @dmathieu! @yurishkuro, the proposed schema change is shown below. You're right that a schema change alone is insufficient; we can also make corresponding adjustments to the OTEL SDKs to enable fine-grained control for remote sampling. What are your thoughts on this?

message Tag {
  string key = 1;
  string value = 2;
}

message TagBasedSamplingStrategy {
  repeated Tag matchingTags = 1;
  ProbabilisticSamplingStrategy probabilisticSampling = 2;
}

message OperationSamplingStrategy {
  string operation = 1;

  // Default sampling probability for the operation.
  ProbabilisticSamplingStrategy defaultSampling = 2;

  // Tag-based sampling customization, which overrides default sampling when matched.
  repeated TagBasedSamplingStrategy tagBasedSampling = 3;
}
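
To make the intended behavior concrete, here is a minimal Go sketch of how an SDK might evaluate the schema above. It is not generated protobuf code and not part of any existing sampler: the struct names mirror the proposed messages, `ProbabilisticSamplingStrategy` is flattened to a plain `float64` rate, and the helpers `matchesAll` and `rateFor` are hypothetical. It also assumes that all `matchingTags` must match (AND semantics) and that the first matching entry wins, neither of which the proposal specifies.

```go
package main

import "fmt"

// Tag mirrors the proposed Tag message.
type Tag struct {
	Key   string
	Value string
}

// TagBasedSamplingStrategy mirrors the proposed message. SamplingRate stands
// in for the nested ProbabilisticSamplingStrategy (an assumption).
type TagBasedSamplingStrategy struct {
	MatchingTags []Tag
	SamplingRate float64
}

// OperationSamplingStrategy mirrors the proposed message.
type OperationSamplingStrategy struct {
	Operation        string
	DefaultRate      float64
	TagBasedSampling []TagBasedSamplingStrategy
}

// matchesAll reports whether every wanted tag is present in attrs with the
// same value (assumed AND semantics). An empty list matches any span.
func matchesAll(want []Tag, attrs map[string]string) bool {
	for _, t := range want {
		if v, ok := attrs[t.Key]; !ok || v != t.Value {
			return false
		}
	}
	return true
}

// rateFor returns the rate of the first tag-based entry that matches, or the
// operation's default rate when none matches (assumed first-match-wins).
func rateFor(s OperationSamplingStrategy, attrs map[string]string) float64 {
	for _, tb := range s.TagBasedSampling {
		if matchesAll(tb.MatchingTags, attrs) {
			return tb.SamplingRate
		}
	}
	return s.DefaultRate
}

func main() {
	s := OperationSamplingStrategy{
		Operation:   "GET /cart",
		DefaultRate: 0.01,
		TagBasedSampling: []TagBasedSamplingStrategy{
			// Force sampling for one user session (illustrative tag name).
			{MatchingTags: []Tag{{Key: "session.id", Value: "abc123"}}, SamplingRate: 1.0},
		},
	}
	fmt.Println(rateFor(s, map[string]string{"session.id": "abc123"}))
	fmt.Println(rateFor(s, map[string]string{"session.id": "other"}))
}
```

The override only applies inside the stratum selected by `operation`, so a rule that should hold for every endpoint would have to be repeated in each `OperationSamplingStrategy`.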

@yurishkuro
Member

yurishkuro commented Sep 28, 2024

The current implementation performs stratified sampling by dividing all requests into strata where each stratum corresponds to one of the endpoints of the service. In order to make sampling sensitive to tags the strata need to be redefined carefully, such that the overall space remains deterministically partitioned.
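
As a rough illustration of that stratification (a simplified sketch, not the jaegerremote code), each operation name can be thought of as selecting exactly one stratum with its own probability, and the decision within the stratum is derived deterministically from the trace ID. The type `perOperationSampler`, its fields, and the 63-bit threshold comparison are assumptions made for this example.

```go
package main

import "fmt"

// maxID masks trace IDs to 63 bits so the threshold arithmetic cannot overflow.
const maxID = uint64(1)<<63 - 1

// perOperationSampler is a hypothetical one-level map: operation -> probability.
type perOperationSampler struct {
	rates       map[string]float64
	defaultRate float64
}

// rateFor picks the stratum: every span falls into exactly one of them,
// either a named operation or the default.
func (s perOperationSampler) rateFor(operation string) float64 {
	if r, ok := s.rates[operation]; ok {
		return r
	}
	return s.defaultRate
}

// shouldSample makes a deterministic decision: the same operation and trace ID
// always produce the same answer.
func (s perOperationSampler) shouldSample(operation string, traceID uint64) bool {
	boundary := uint64(s.rateFor(operation) * float64(maxID))
	return traceID&maxID < boundary
}

func main() {
	s := perOperationSampler{
		rates:       map[string]float64{"GET /health": 0, "POST /checkout": 1},
		defaultRate: 0.5,
	}
	fmt.Println(s.shouldSample("GET /health", 42))
	fmt.Println(s.shouldSample("POST /checkout", 42))
	fmt.Println(s.shouldSample("GET /unknown", 42))
}
```

Making sampling tag-aware means replacing the single map lookup in `rateFor` with something that still assigns each span to exactly one stratum.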

I don't agree that the proposed schema change is the best partitioning, because it only does sub-partitioning within existing strata by the endpoint. But I can easily imagine someone wanting to say "sample all errors 100%" regardless of the endpoint, which is not really possible via this schema.

Fundamentally, I think this requires a large re-design where we treat all dimensions of a span equally. There is technically nothing special about the operation; we can consider it as yet another attribute. Then the sampling expression language becomes more uniform. It requires more complex house-keeping by the sampler, but I think the same complexity would be introduced by the proposed change anyway (going from one-level maps to two-level maps), only it will be a lot more rigid.
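
One way to picture such a uniform model is an ordered list of rules over span attributes, where the operation name is simply one more key. The Go sketch below is purely hypothetical: `rule`, `ruleSet`, the attribute keys `operation` and `error`, and the first-match-wins ordering are assumptions for illustration, not an existing OTEL or Jaeger data model. Ordered evaluation is used here because it keeps the partitioning deterministic: each span lands in exactly one stratum, identified by the index of the rule that matched.

```go
package main

import "fmt"

// rule matches a span when every key in match equals the span's attribute.
type rule struct {
	match map[string]string
	rate  float64
}

// ruleSet is an ordered list of rules plus a fallback rate.
type ruleSet struct {
	rules       []rule
	defaultRate float64
}

// stratumFor returns the index of the first matching rule and its rate,
// or -1 and the default rate when no rule matches.
func (rs ruleSet) stratumFor(attrs map[string]string) (int, float64) {
	for i, r := range rs.rules {
		matched := true
		for k, v := range r.match {
			if got, ok := attrs[k]; !ok || got != v {
				matched = false
				break
			}
		}
		if matched {
			return i, r.rate
		}
	}
	return -1, rs.defaultRate
}

func main() {
	rs := ruleSet{
		rules: []rule{
			// "Sample all errors 100%" regardless of endpoint.
			{match: map[string]string{"error": "true"}, rate: 1.0},
			// The operation is treated like any other attribute.
			{match: map[string]string{"operation": "GET /health"}, rate: 0.0},
		},
		defaultRate: 0.01,
	}
	fmt.Println(rs.stratumFor(map[string]string{"operation": "GET /health", "error": "true"}))
	fmt.Println(rs.stratumFor(map[string]string{"operation": "GET /health"}))
	fmt.Println(rs.stratumFor(map[string]string{"operation": "GET /cart"}))
}
```

Any backend that computes rates adaptively would have to track the same strata as the SDK, which under this sketch means agreeing on the rule list and its order.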

There are a couple of proposals in the OTEL spec (I don't have time to look for them) that propose more generic configuration for samplers. We need to align with those proposals, not build something bespoke. Especially because changing the sampling definition schema means that Jaeger's adaptive sampling would also need to be changed accordingly (keeping track of the same strata as the SDK), and right now it's pretty much hard-coded with service/operation partitioning.

@garrettlish
Author

Thanks @yurishkuro for your reply. Your concerns are well-founded. Relying solely on endpoint-based partitioning does limit the flexibility needed for more advanced use cases. As you pointed out, redesigning the sampling approach to treat all dimensions of a span equally could enable fine-grained control in remote sampling.

While there are several proposals in the OTEL spec, most don’t directly address remote sampling strategies. Do you think it's worth iterating on our current approach to introduce a v3 version that supports fine-grained control in remote sampling?

@yurishkuro
Member

Yes, I would be supportive of designing a more flexible sampling strategy data model and gradually implementing it.
