Change recording to create a span of type "record_root" #1703

Open
wants to merge 62 commits into base: main

@sfc-gh-gtokernliang (Contributor) commented Dec 19, 2024

Description

Updated the app context manager to be able to create a span of type "record_root"

Other details good to know for developers

For reference, this is the current set of spans as produced in the notebook:

{
    "name": "nested2",
    "context": {
        "trace_id": "0xac43c20656ef1ed158f02f4e62c97a04",
        "span_id": "0x424b5fe0f1f1a867",
        "trace_state": "[]"
    },
    "kind": "SpanKind.INTERNAL",
    "parent_id": "0xe197ec362bed5183",
    "start_time": "2024-12-19T17:17:48.214895Z",
    "end_time": "2024-12-19T17:17:48.828808Z",
    "status": {
        "status_code": "UNSET"
    },
    "attributes": {
        "name": "nested2",
        "kind": "SPAN_KIND_TRULENS",
        "parent_span_id": 4777017249493002343,
        "trulens.record_id": "bbd1b4dc-75fb-4d93-993d-e989e161330a",
        "nested2_ret": "nested2: test",
        "nested2_args[0]": "test",
        "status": "STATUS_CODE_UNSET"
    },
    "events": [],
    "links": [],
    "resource": {
        "attributes": {
            "telemetry.sdk.language": "python",
            "telemetry.sdk.name": "opentelemetry",
            "telemetry.sdk.version": "1.25.0",
            "service.name": "trulens"
        },
        "schema_url": ""
    }
}
{
    "name": "nested",
    "context": {
        "trace_id": "0xac43c20656ef1ed158f02f4e62c97a04",
        "span_id": "0xe197ec362bed5183",
        "trace_state": "[]"
    },
    "kind": "SpanKind.INTERNAL",
    "parent_id": "0x90ebf55e6bebc620",
    "start_time": "2024-12-19T17:17:47.576714Z",
    "end_time": "2024-12-19T17:17:49.875737Z",
    "status": {
        "status_code": "UNSET"
    },
    "attributes": {
        "name": "nested",
        "kind": "SPAN_KIND_TRULENS",
        "parent_span_id": 16255721097426456963,
        "trulens.record_id": "bbd1b4dc-75fb-4d93-993d-e989e161330a",
        "nested_attr1": "value1",
        "status": "STATUS_CODE_UNSET"
    },
    "events": [],
    "links": [],
    "resource": {
        "attributes": {
            "telemetry.sdk.language": "python",
            "telemetry.sdk.name": "opentelemetry",
            "telemetry.sdk.version": "1.25.0",
            "service.name": "trulens"
        },
        "schema_url": ""
    }
}
{
    "name": "respond_to_query",
    "context": {
        "trace_id": "0xac43c20656ef1ed158f02f4e62c97a04",
        "span_id": "0x90ebf55e6bebc620",
        "trace_state": "[]"
    },
    "kind": "SpanKind.INTERNAL",
    "parent_id": "0x47c29031b69b4971",
    "start_time": "2024-12-19T17:17:46.523158Z",
    "end_time": "2024-12-19T17:17:50.869185Z",
    "status": {
        "status_code": "UNSET"
    },
    "attributes": {
        "name": "respond_to_query",
        "kind": "SPAN_KIND_TRULENS",
        "parent_span_id": 10442709946874971680,
        "trulens.record_id": "bbd1b4dc-75fb-4d93-993d-e989e161330a",
        "status": "STATUS_CODE_UNSET"
    },
    "events": [],
    "links": [],
    "resource": {
        "attributes": {
            "telemetry.sdk.language": "python",
            "telemetry.sdk.name": "opentelemetry",
            "telemetry.sdk.version": "1.25.0",
            "service.name": "trulens"
        },
        "schema_url": ""
    }
}
{
    "name": "root",
    "context": {
        "trace_id": "0xac43c20656ef1ed158f02f4e62c97a04",
        "span_id": "0x47c29031b69b4971",
        "trace_state": "[]"
    },
    "kind": "SpanKind.INTERNAL",
    "parent_id": null,
    "start_time": "2024-12-19T17:17:42.002877Z",
    "end_time": "2024-12-19T17:17:51.852615Z",
    "status": {
        "status_code": "UNSET"
    },
    "attributes": {
        "kind": "SPAN_KIND_TRULENS",
        "name": "root",
        "trulens.span_type": "record_root",
        "trulens.record_root.app_name": "default_app",
        "trulens.record_root.app_version": "base",
        "trulens.record_root.app_id": "app_hash_baf7b2cb6402e84fa3b0b3a028d4bf65",
        "trulens.record_root.record_id": "bbd1b4dc-75fb-4d93-993d-e989e161330a"
    },
    "events": [],
    "links": [],
    "resource": {
        "attributes": {
            "telemetry.sdk.language": "python",
            "telemetry.sdk.name": "opentelemetry",
            "telemetry.sdk.version": "1.25.0",
            "service.name": "trulens"
        },
        "schema_url": ""
    }
}
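
A note for reading the dump above: `span_id` and `parent_id` are hex strings, while the `parent_span_id` attribute is a decimal integer. The sketch below converts between the two so they can be compared. The `spans` list is copied by hand from the dump, and the `attr_parent_span_id` key and `hex_id_to_int` helper are hypothetical names used only for this illustration.

```python
# Values copied by hand from the span dump above.
spans = [
    {"name": "nested2", "span_id": "0x424b5fe0f1f1a867",
     "parent_id": "0xe197ec362bed5183",
     "attr_parent_span_id": 4777017249493002343},
    {"name": "nested", "span_id": "0xe197ec362bed5183",
     "parent_id": "0x90ebf55e6bebc620",
     "attr_parent_span_id": 16255721097426456963},
    {"name": "respond_to_query", "span_id": "0x90ebf55e6bebc620",
     "parent_id": "0x47c29031b69b4971",
     "attr_parent_span_id": 10442709946874971680},
]


def hex_id_to_int(hex_id: str) -> int:
    """Convert an OTEL-style hex id such as '0x424b...' to an integer."""
    return int(hex_id, 16)


for span in spans:
    attr = span["attr_parent_span_id"]
    print(
        span["name"],
        "| equals own span_id:", attr == hex_id_to_int(span["span_id"]),
        "| equals parent_id:", attr == hex_id_to_int(span["parent_id"]),
    )
```

For `nested2`, the attribute value equals the span's own `span_id` rather than its `parent_id`, which may be worth double-checking in the instrumentation.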

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to
    not work as expected)
  • New Tests
  • This change includes re-generated golden test results
  • This change requires a documentation update

Important

Add record_root span type to app context manager for enhanced tracing with OpenTelemetry.

  • Behavior:
    • Add span type record_root to app context manager in app.py.
    • Update App class in instrument.py to manage record_root spans with OpenTelemetry.
  • Attributes:
    • Add SPAN_TYPE and RECORD_ID attributes in trace.py for span identification.
  • Imports:
    • Change import paths from trulens.experimental.otel_tracing.core.app to trulens.experimental.otel_tracing.core.instrument in app.py.

This description was created by Ellipsis for 14869e1. It will automatically update as commits are pushed.

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Dec 19, 2024

examples/experimental/otel_exporter.ipynb (resolved)
src/otel/semconv/trulens/otel/semconv/trace.py (outdated, resolved)

tracer = trace.get_tracer_provider().get_tracer(TRULENS_SERVICE_NAME)

# Calling set_baggage does not actually add the baggage to the current context, but returns a new one
# To avoid issues with remembering to add/remove the baggage, we attach it to the runtime context.
Contributor

s/To/to

Contributor

I also don't really get this token business, can you enlighten me?

Contributor Author

From what I can tell, to set the baggage you need to:

  1. Call set_baggage, which returns a new, updated context rather than modifying the current one.
  2. Make that context current with context_api.attach. This returns a token that can later be used to undo the attach.
  3. Once the record is over, remove the baggage by passing that token to context_api.detach(token), which restores the previous context.

root_span.set_attribute(SpanAttributes.RECORD_ROOT.APP_ID, self.app_id)
root_span.set_attribute(
    SpanAttributes.RECORD_ROOT.RECORD_ID, otel_record_id
)
Contributor

The semantic convention that Piotr laid out also has MAIN_INPUT/MAIN_OUTPUT/MAIN_ERROR, but I don't see how we can get those even during __exit__, so I'm inclined to say we can remove them. That does raise the question of how the UI will know which input and output to display. It's so close to the end of 2024 that I don't want to investigate how this is currently determined, but I can next year haha.

There's also TOTAL_COST but I'm not as worried about that.

Contributor Author

@sfc-gh-gtokernliang Dec 23, 2024

Yeah, we should chat about this. One idea I had while typing this up was to omit the record_root span type, and instead:

  1. Attributes that were initially proposed to be stored in record_root should now be stored in the baggage (name/version/id/record_id).
  2. Every trace in the record should track all of the attributes above.
  3. The root of the record is essentially the span with no parent.

Contributor

We still need the root span type to denote when we start a record, though, since if you create an OTEL trace before you start calling your app a bunch of times, they'll all be part of the same trace, but we want them to be separate records.

But more importantly, I'm confused, how does this help us determine the MAIN_INPUT/MAIN_OUTPUT/MAIN_ERROR?
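
The concern about an enclosing trace can be sketched with plain dictionaries. The spans below are hypothetical and hand-written (the ids and the `span_type` key are made up for the example): one outer span wraps two separate app invocations, so only one span has no parent even though there are two records.

```python
from typing import Dict, List, Optional

# Hypothetical spans: an outer OTEL span started by the user wraps two app calls.
spans: List[Dict[str, Optional[str]]] = [
    {"name": "outer", "span_id": "a", "parent_id": None, "span_type": None},
    {"name": "root", "span_id": "b", "parent_id": "a", "span_type": "record_root"},
    {"name": "respond_to_query", "span_id": "c", "parent_id": "b", "span_type": None},
    {"name": "root", "span_id": "d", "parent_id": "a", "span_type": "record_root"},
    {"name": "respond_to_query", "span_id": "e", "parent_id": "d", "span_type": None},
]

# Rule 1: "the root of the record is the span with no parent".
parentless = [s["span_id"] for s in spans if s["parent_id"] is None]

# Rule 2: "the root of the record is the span typed record_root".
record_roots = [s["span_id"] for s in spans if s["span_type"] == "record_root"]

print(parentless)    # ['a']
print(record_roots)  # ['b', 'd']
```

Under these assumptions the parentless rule finds a single root that belongs to neither record, while the explicit span type finds one root per record.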

Contributor Author

> if you create a otel trace before you start calling your app a bunch of times they'll all be part of the same trace but we want them to be separate records.

I know we discussed this before, but my memory's failing me. What does the pseudocode for that look like? Is it:

with tru_app as recording:
  tru_app.respond_to_query(query)
  tru_app.do_something_else(query)

just so I can experiment with it a little more :)

> But more importantly, I'm confused, how does this help us determine the MAIN_INPUT/MAIN_OUTPUT/MAIN_ERROR?

We aren't doing this in code yet, but I'm thinking that we should track the input/output/error for every span, so that semantically we would determine the main input/output/error by:

  1. Finding the span with no parent.
  2. Using its input/output/error as the main input/output/error for the record.
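
The two steps above could look roughly like the sketch below. It assumes every span carries `input`, `output`, and `error` keys, which is not implemented in this PR; the helper name and the sample spans are hypothetical.

```python
from typing import Any, Dict, List, Optional

Span = Dict[str, Any]


def main_io_for_record(spans: List[Span]) -> Dict[str, Optional[Any]]:
    """Pick the parentless span and reuse its input/output/error for the record."""
    roots = [s for s in spans if s.get("parent_id") is None]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one parentless span, found {len(roots)}")
    root = roots[0]
    return {key: root.get(key) for key in ("input", "output", "error")}


# Hypothetical spans for a single record.
record_spans: List[Span] = [
    {"name": "respond_to_query", "parent_id": None,
     "input": "test", "output": "answer: test", "error": None},
    {"name": "nested", "parent_id": "0x90ebf55e6bebc620",
     "input": "test", "output": "nested: test", "error": None},
]

print(main_io_for_record(record_spans))
```

The explicit error on zero or multiple parentless spans is a design choice for the sketch: it surfaces the ambiguous case instead of silently picking one span.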

@@ -51,6 +58,10 @@ def wrapper(*args, **kwargs):
span.set_attribute(
    "parent_span_id", parent_span.get_span_context().span_id
)
span.set_attribute(
Contributor

We want the app/run id in the baggage as well, I guess, but given that it's not totally clear yet, there's no point in doing it now I suppose.

Contributor Author

I'll put the app id in baggage for now I guess since that seems pretty good to me. For run_id it's a little more ambiguous what the shape of the API will look like so I'll leave that out for now.

Base automatically changed from garett/SNOW-1854278 to main December 23, 2024 19:31
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Dec 23, 2024
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Dec 24, 2024
@@ -421,6 +422,18 @@ def tru(self) -> core_connector.DBConnector:
pydantic.PrivateAttr(default_factory=dict)
)

tokens: list[object] = []
Contributor

I think list[object] is worse than List[object], since the former doesn't work in Python 3.8, which we do support.

Contributor

also, is the type of the token varying or something? Why object?

Contributor Author

Thanks! I didn't know about the difference between the two before.

object is chosen because that's the return type in the signature of the attach function in the OTEL context API.

Screenshot 2024-12-24 at 10 03 17 AM

Contributor Author

Screenshot 2024-12-24 at 10 03 59 AM
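
The list[object] versus List[object] point in this thread can be checked directly. Subscripting the built-in `list` at runtime requires Python 3.9 or later (PEP 585), while `typing.List` works on 3.8 as well. The helper name below is hypothetical and exists only for this sketch.

```python
import sys
from typing import List

# Works on Python 3.8 and later: typing.List is always subscriptable.
tokens: List[object] = []
tokens.append(object())


def builtin_generics_supported() -> bool:
    """Return True if `list[object]` can be evaluated on this interpreter."""
    try:
        list[object]  # raises TypeError on Python 3.8
        return True
    except TypeError:
        return False


print(sys.version_info[:2], builtin_generics_supported())
```

Annotations on module-level variables and class attributes are evaluated at definition time unless postponed, so a `list[object]` annotation there fails on 3.8 at import.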

@@ -84,6 +90,20 @@ def wrapper(*args, **kwargs):
# It's on the user to deal with None as a return value.
func_exception = e

span.set_attribute("name", func.__name__)
Contributor

is this something that the event table expects or something?

# See https://github.com/open-telemetry/opentelemetry-python/issues/2432#issuecomment-1593458684
context_api.detach(self.tokens.pop())

if self.span_context:
Contributor

would this ever not be true?

Contributor Author

I doubt it, but I didn't think it would hurt to include it, if nothing else, at least for the type checking haha
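
The `detach(self.tokens.pop())` pattern in the diff context of this thread keeps a stack of tokens so that nested recordings unwind in reverse order. The sketch below shows the same idea with the standard library's `contextvars` as a stand-in for the OTEL context API. The `App` class and `_record_id` variable are hypothetical and are not the TruLens implementation.

```python
import contextvars
import uuid
from typing import List

# Stand-in for the value that would live in OTEL baggage.
_record_id = contextvars.ContextVar("record_id", default=None)


class App:
    def __init__(self) -> None:
        # One token per active recording, most recent last.
        self.tokens: List[contextvars.Token] = []

    def __enter__(self) -> "App":
        # Setting the variable returns a token, much like context_api.attach.
        self.tokens.append(_record_id.set(str(uuid.uuid4())))
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Popping the most recent token restores the previous value,
        # much like context_api.detach(self.tokens.pop()).
        _record_id.reset(self.tokens.pop())


app = App()
with app:
    outer_id = _record_id.get()
    with app:
        inner_id = _record_id.get()
    restored_id = _record_id.get()
final_id = _record_id.get()
```

Because the tokens are popped last-in first-out, leaving the inner recording restores the outer record id, and leaving the outer recording restores the default.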
