
A Sampling Configuration proposal #240

Closed · wants to merge 13 commits

Conversation


@PeterF778 PeterF778 commented Nov 8, 2023

Presents a configuration model for describing built-in and third-party Samplers. This is written to support a class of future samplers, as described in the OTEP.

(description written by @jmacd)

@PeterF778 PeterF778 requested a review from a team November 8, 2023 03:57
```yaml
...
- SAMPLERn
```
The AnyOf sampler takes a list of Samplers as the argument. When making a sampling decision, it goes through the list to find a sampler that decides to sample. If found, this sampling decision is final. If none of the samplers from the list wants to sample the span, the AnyOf sampler drops the span.
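The decision logic described above can be sketched as follows (a minimal illustration; `any_of` and the child samplers are hypothetical names, not part of any SDK):

```python
from typing import Callable, Dict, List

# A sampler here is just: span attributes -> sample (True) or drop (False).
Sampler = Callable[[Dict], bool]

def any_of(samplers: List[Sampler]) -> Sampler:
    """Compose samplers: the first child that decides to sample makes the
    decision final; if no child decides to sample, the span is dropped."""
    def decide(span: Dict) -> bool:
        return any(s(span) for s in samplers)
    return decide

# Hypothetical child samplers for illustration.
always_on: Sampler = lambda span: True
always_off: Sampler = lambda span: False
```

For example, `any_of([always_off, always_on])` samples every span, while `any_of([always_off, always_off])` drops every span.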
Member commented:

I don't believe we have such a sampler type in the spec currently. The challenge with it is not introducing a configuration, but defining statistically meaningful behavior and sample weight calculation.

Author replied:

No, we don't. This proposal does not call for adding such a sampler to the spec; it can remain an internal gadget, though adding it to the spec is also a possibility.
I admit I do not understand your second sentence.

Member replied:

> I admit I do not understand your second sentence.

There was a lot of effort put into the spec to develop sampling strategies that allow calculating summary statistics by capturing the weight of each sample (simplest example: a probabilistic 1-in-10 sampler gives each sample/span a weight=10). When you have a sequential composite sampler like the one here, how will the weight of a sample be calculated?
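To make this concrete, a worked numeric sketch (assuming, purely for illustration, two independent 1-in-10 probabilistic children under an OR-style composite):

```python
# Two independent 1-in-10 child samplers, each weight=10 in isolation.
p1, p2 = 0.1, 0.1

# Probability that an OR-style composite samples the span:
p_or = 1 - (1 - p1) * (1 - p2)   # 0.19

# The statistically valid weight (adjusted count) for the composite:
weight = 1 / p_or                # ~5.26, not 10

# Neither child's standalone weight is valid for spans the composite keeps,
# so the composite must compute its own inclusion probability.
```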

Author replied:

Consistent Probability Sampling users have their own composite samplers, which properly handle logical-or and logical-and operations on consistent probability samplers, so they do not need the AnyOf or AllOf samplers proposed here, and they should not use them.
Generally, composing consistent probability samplers with non-probabilistic samplers leads to undefined (or incorrect) adjusted counts anyway.

@PeterF778 PeterF778 changed the title Initial Sampling Configuration proposal A Sampling Configuration proposal Nov 13, 2023
text/0240-Sampling_Configuration.md
A SAMPLER is described by a structure with two fields:

```yaml
samplerType: TYPE
```

Contributor commented:

Rather than a scalar value, it may be valuable to support this property being a map type, acting as a tagged union: https://jsontypedef.com/docs/jtd-in-5-minutes/#discriminator-schemas

Parseability and extensibility can be greatly enhanced by tagged unions. If everything's a string, then you must invent some kind of URI-type format to refer to things.

```yaml
- PARAMn
```

The mechanism to map the sampler TYPE to the sampler implementation will be platform specific. For Java, the TYPE can be a simple class name, or a part of class name, and the implementing class can be found using reflection. Specifying a fully qualified class name should be an option.
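As a sketch of that lookup, here is a Python analogue of the reflection idea described for Java (all names here, including the default module, are hypothetical and not part of any OTel SDK):

```python
import importlib

def resolve_sampler_type(type_name: str, default_module: str = "my_samplers"):
    """Map a config TYPE string to an implementing class.

    A fully qualified name ("pkg.mod.ClassName") is imported as-is;
    a bare name is looked up in a hypothetical default module.
    """
    if "." in type_name:
        module_name, _, class_name = type_name.rpartition(".")
    else:
        module_name, class_name = default_module, type_name
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```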
Contributor commented:

> semantics of TYPE are platform specific

It's worth emphasizing this. If these configs aren't portable ("Here's a sampler config for a JVM workload; there's one for Node.js workload", e.g.), then that limits the ability of a central team to author and manage a small set of configs that many downstream workloads adopt.

Perhaps if we at least define behavior for when a sampler is not usable by a given platform, that'd be ok? E.g., if a Node.js workload has a sampler config that directs it to use a sampler with a type like

```yaml
- kind: jvm.classpath
  package: org.something.or.other
  class: CoolSampler
```

then the Node.js OTel SDK could say "Ah, I don't support jvm.classpath samplers. In this case the sampler is inactive". TBD precisely what "inactive" means for consistent probability sampling.


### Rule based sampling

For rule-based sampling (e.g. the decision depends on Span attributes), we need a RuleBasedSampler, which will take a list of Rules, an optional Span Kind, and a fallback Sampler as arguments. Each rule consists of a Predicate and a Sampler. For a sampling decision, if the Span Kind matches the optionally specified kind, the list is worked through in the declared order. If a Predicate holds, the corresponding Sampler makes the final sampling decision. If the Span Kind does not match, the final decision is not to sample. If the Span Kind matches but none of the Predicates evaluates to True, the fallback Sampler makes the final decision.
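A minimal sketch of that evaluation order (names are illustrative, not the proposal's concrete schema):

```python
from typing import Callable, Dict, List, Optional, Tuple

Predicate = Callable[[Dict], bool]
Sampler = Callable[[Dict], bool]

def rule_based(rules: List[Tuple[Predicate, Sampler]],
               fallback: Sampler,
               span_kind: Optional[str] = None) -> Sampler:
    def decide(span: Dict) -> bool:
        # If a Span Kind is configured and does not match, do not sample.
        if span_kind is not None and span.get("kind") != span_kind:
            return False
        # Work through the rules in declared order; the first predicate
        # that holds hands the final decision to its sampler.
        for predicate, sampler in rules:
            if predicate(span):
                return sampler(span)
        # No predicate held: the fallback sampler decides.
        return fallback(span)
    return decide
```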
Contributor commented:

RuleBased here reminds me of a consistent probability sampler I defined last year, called `first`:

```yaml
# 'first' is a composite sampler, which takes a sequence of child
# samplers and uses the first active child.
kind: first
children:
  - condition: 'trace.service == "A"'
    kind: probability
    adjusted_count: 20

  - condition: 'trace.service == "B"'
    kind: probability
    adjusted_count: 50

  - kind: probability
    adjusted_count: 1000
```

The root sampler's p-value is that of the first active child sampler (child whose predicate is true). This is the evaluation model that both X-Ray and Refinery use, and expressiveness-wise it's a superset of Jaeger's config (a mapping from the pair (service name, span name) -> sampler). So I think it's a strong starting place for this conversation.

Author replied:

@spencerwilson Do you have any references for it? I'll be happy to include them as prior art.

Contributor replied:

No, I never pushed any of the actual config speccing to the internet (prior to these comments, anyway).

For X-Ray and Refinery, their documentation is the primary source. I probably linked to the relevant parts in my notes in #213.

```yaml
- 1000 # spans/second
```

The above example uses a plain-text representation of predicates; the actual representation is TBD. The example also assumes that a hypothetical `RateLimitingSampler` is available.
Contributor commented:

> Actual representation for predicates is TBD

Here are some possible directions, for reference:

```yaml
- SAMPLERn
```

The AnyOf sampler takes a list of Samplers as the argument. When making a sampling decision, it goes through the list to find a sampler that decides to sample. If found, this sampling decision is final. If none of the samplers from the list wants to sample the span, the AnyOf sampler drops the span.
Contributor commented:

Paraphrasing @yurishkuro in another thread:

> The challenge with [an AnyOf / OR sampler] is not introducing [its configuration schema], but defining statistically meaningful behavior and sample weight calculation.

I spent a lot of time thinking about this last year. The following is adapted from notes that I never quite edited enough to put up in a PR, but I did present to the Sampling SIG in some biweekly meeting (though I can't find any notes/YouTube recording, unfortunately).

I'll also note that I focused entirely on consistent probability samplers: I'm enthusiastic enough about validly estimating span counts that I limited my analysis to a world where every sampler is a consistent probability sampler. For the OTel-instrumented projects I work on, I will always lobby hard for keeping our spans countable in a statistically valid manner.


**Evaluating a tree of consistent probability samplers**

When I was thinking about how to keep the weight/p-value/etc. well-defined in a tree of consistent probability samplers, I arrived at the following design:

Definitions:

- Sampler: Any object that, given a span, produces a p in (0, 1] indicating the probability with which to sample the span.
- A sampler that considers the p-values of other samplers is called a composite sampler. A non-composite sampler may be called a leaf sampler.

First, start with a sampling policy: config specifying a tree of (predicate, sampler) pairs. Then, to make a sample/drop decision for a span you evaluate the tree:

1. Evaluate every predicate. If a sampler's predicate is true, call that sampler 'active'.
2. Disregard any samplers whose predicate is false. Also discard any composite samplers with 0 active children. Call the resulting subtree the 'tree of active samplers'.
3. Evaluate the active sampler tree:
   1. Ask the root sampler for its p-value. If the sampler is composite, it will calculate its p-value using the p-values of its children, and so on, recursively.
   2. Finally, perform a single Bernoulli trial with the single resulting p. SAMPLE or DROP as appropriate.

(update p-value -> r-value or your preferred 'weight' notion as necessary)
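The evaluation procedure above can be sketched as follows (the OR/AND combination rules assume independent child decisions; the node structure and names are illustrative only):

```python
import random

def p_value(node) -> float:
    """Recursively compute the p-value of a (sub)tree of active samplers."""
    kind = node["kind"]
    if kind == "leaf":
        return node["p"]
    child_ps = [p_value(c) for c in node["children"]]
    if kind == "or":          # sample if any child would sample
        keep = 1.0
        for p in child_ps:
            keep *= (1.0 - p)
        return 1.0 - keep
    if kind == "and":         # sample only if all children would sample
        keep = 1.0
        for p in child_ps:
            keep *= p
        return keep
    raise ValueError(f"unknown sampler kind: {kind}")

def decide(tree, rng=random.random) -> bool:
    # A single Bernoulli trial with the single resulting p.
    return rng() < p_value(tree)
```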

**Complication: Samplers with side effects**

The above works well for stateless, no-side-effects samplers. But some samplers have side effects. For example, a sampler that collaborates with samplers in other processes in order to stay under a system-wide throughput limit does I/O to communicate with those processes. Or even simpler: a sampler that maintains some internal statistics and adapts its selectivity based only on what spans it has "seen". In what circumstances should it write updates to those statistics?

A policy containing a single effectful sampler that's activated on all traces has clear semantics: do all effects as normal. The semantics are less obvious when an effectful sampler only activates for some traces, or is composed with other samplers. In the 'tree-of-conditional-samplers' model, in what circumstances should a sampler perform its effects (vs. electing not to perform them at all)?

A sampler with side effects might have unconditional effects (effects triggered for every trace submitted to the policy), but it may also want effects that are conditional on things like the following:

- Am I in the active sampler tree? If not, was it my own predicate or that of an ancestor (or both?) that caused me to be excluded?
- If the sampler did make it to the active sampler tree,
  - What decision was made using the p-value I provided?
  - Was my decision "used"? This concept is akin to short-circuiting in logical operators in programming languages.
    - A sampler is not used if its parent (if it has one) is not used.
    - OR's children 2..n are not used if its first child decides to SAMPLE.
    - AND's children 2..n are not used if its first child decides to DROP.
    - FIRST's children 2..n are never used.
  - If my decision was used, was it also "pivotal" to the overall decision? That is, if my decision were inverted (and all other decisions stayed the same), would the final decision have been inverted?
- What was the final decision for the trace?

You could imagine that samplers interested in knowing these facts could provide a callback which, if defined, the tree evaluator calls after a final decision has been made. It could invoke the callback with 'receipt' data that conveys 'Here are a bunch of facts about a decision just made, and your relation to it / impact on it.' The sampler could then do with that information what it will.
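A sketch of that callback shape (the `Receipt` fields and the sampler class are invented for illustration, not part of any spec):

```python
from dataclasses import dataclass

@dataclass
class Receipt:
    active: bool          # was this sampler in the active tree?
    used: bool            # did its decision feed into the final one?
    pivotal: bool         # would inverting its decision flip the outcome?
    final_decision: bool  # SAMPLE (True) or DROP (False) for the trace

class AdaptiveSampler:
    """A hypothetical stateful sampler that updates its statistics
    only when its decision was actually used."""
    def __init__(self):
        self.spans_seen = 0

    def on_decision(self, receipt: Receipt) -> None:
        # The tree evaluator would call this after the final decision.
        if receipt.used:
            self.spans_seen += 1
```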

**What systems confront and answer this question today?**

- Refinery's rule-based policy calls an instance of the secondary sampler only if it's used. It then unconditionally updates its state.

Hopefully the above inspires some ideas! I'm not sure how much of this complexity must be addressed at this stage in order to avoid boxing ourselves in. I suspect much of it is 'severable', able to be punted on. I figured I'd raise it now though in case someone smarter than me sees something critical.

Author replied:

There are challenges with stateful samplers (or samplers with other side effects), as well as with samplers that modify the tracestate. We've made some effort to provide a clear framework for these samplers to operate in. We want to see how far we can go with useful enhancements to the samplers library without having to change the spec for the head-sampler API, but that means we cannot provide support for your "used" and "pivotal" concepts.

@mtwo (Member) commented Nov 20, 2023

@carlosalberto @jack-berg can we mark this as triaged with priority p1?

- issue [173](https://github.com/open-telemetry/opentelemetry-specification/issues/173): Way to ignore healthcheck traces when using automatic tracer across all languages?
- issue [679](https://github.com/open-telemetry/opentelemetry-specification/issues/679): Configuring a sampler from an environment variable
- issue [1060](https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/1060): Exclude URLs from Tracing
- issue [1844](https://github.com/open-telemetry/opentelemetry-specification/issues/1844): Composite Sampler
Member commented:

This issue is very relevant to my comment here. If we had a spec'd component for a composable rule-based sampler, we could define its configuration across languages in opentelemetry-configuration, and allow users to specify complex rule-based sampling configuration in a YAML syntax with file configuration.

Contributor commented:

I like the way this OTEP enables us to experiment with sampler configuration generally, because (as I think other commenters, including @spencerwilson, have shown) this is complicated. After we have integrated the new configuration model with the new tracestate information in #235, and demonstrated prototypes compatible with head sampling, tail sampling, and Jaeger (for example), I think we can work on specifying a standard rule-based sampler configuration for OTel.

Adding more references.
@PeterF778 (Author) commented:

Thank you all for your questions, comments and sharing your ideas.
I'll work on a modification of this proposal and introduce a possibility to declare a sampler. I hope this will improve:

- readability, as the sampler arguments will no longer have to be purely positional
- applicability for non-Java platforms
- error checking and messaging

and will bring the proposal one step closer to becoming a true meta-sampler.

@djaglowski (Member) commented:

Can we clarify that this proposal is for sampling traces only? Similarly, should the proposed configuration structs be nested under a traces section?

@PeterF778 (Author) replied:

> Can we clarify that this proposal is for sampling traces only? Similarly, should the proposed configuration structs be nested under a traces section?

Added the clarification. The proposal complements, but does not replace, the file-based configuration defined by OTEP 225, so there's no need to specify traces.

@jmacd (Contributor) left a comment:

Adding support to this proposal, which deserves extra attention from folks familiar with Jaeger remote sampling and from the OTel Configuration working group.

I appreciate that this proposal includes support for custom samplers, as I think this is an important area for experimentation. Sampler configuration is similar to Exemplar configuration in the OTel Trace SDK spec, where we are making similar allowances for custom samplers to be installed.

@jack-berg (Member) commented:

@jmacd the goals of this should be compatible with what we're working on in the configuration group. Our working principle is to include all spec'd components as types in the configuration schema, but to ensure we support referencing / configuring custom extension components, like samplers, exporters, processors, propagators, metric readers, etc. This means that the schema directly specifies all the built-in samplers. In terms of configuring custom components, we're definitely still in the prototyping phase (here's a steel thread demo I've been using as a guiding use case in Java), so we need to work out the details, but it's a key feature we're committed to supporting.

IMO, a concrete task we can take out of this OTEP is for the sampling WG to work out the details of any required rule based samplers (I think I've also heard these called deterministic samplers) and add the definitions to the specification. After they're in the spec, the configuration SIG can add them to the schema.

@jmacd (Contributor) commented Jan 4, 2024

🎉 @PeterF778 this is looking great!


### Rule based sampling

For rule-based sampling (e.g. decision depends on Span attributes), we need a `rule_based` sampler, which will take a list of Rules as an argument, and an optional Span Kind. Each rule will consist of a Predicate and a Sampler. For a sampling decision, if the Span kind matches the optionally specified kind, the list will be worked through in the declared order. If a Predicate holds, the corresponding Sampler will be called to make the final sampling decision. If the Span Kind does not match, or none of the Predicates evaluates to True, the final decision is not to sample.
@brettmc commented Jan 17, 2024:

Could we break this idea of a rule-based sampler out into its own proposal for a new Sampler type? I'm working on something very similar at the moment, with a working name of AttributeSampler. It seems like it could be a first-class sampler alongside parent-based, traceidratio & friends and therefore could/should live in the SDK instead of contrib?

Author replied:

Yes, after receiving some feedback I see that this OTEP is too large. I'm going to close it and re-open new ones with tailored scopes - easier to read, and hopefully easier to approve. I intend to work on them soon, but right now I'm really tied up in other things.

```yaml
sampler: always_on
- predicate: true # works as a fallback sampler
sampler: trace_id_ratio_based
ratio: 0.25
```
Member commented:

this is not valid yaml

Author replied:

Yes, thanks for pointing it out.

@jmacd (Contributor) commented Oct 10, 2024

@PeterF778 Would you say we can close this in favor of #250, or at least until there is progress with #250?

@PeterF778 (Author) replied:

> @PeterF778 Would you say we can close this in favor of #250, or at least until there is progress with #250?

Yes, we can close it, and revisit the ideas later.

@PeterF778 PeterF778 closed this Oct 10, 2024