Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add/update attributes in event schema #634

Open
6 of 7 tasks
Tracked by #638
cortadocodes opened this issue Mar 11, 2024 · 10 comments
Open
6 of 7 tasks
Tracked by #638

Add/update attributes in event schema #634

cortadocodes opened this issue Mar 11, 2024 · 10 comments
Assignees

Comments

@cortadocodes
Copy link
Member

cortadocodes commented Mar 11, 2024

Feature request

Use Case

  • Add sender attribute to make it easier to find the SRUID of the child from a message
  • Add uuid or id
  • Add recipient
  • Add originator (question originator)
  • Rename message_number to order
  • Rename version to sender_sdk_version?
  • Add no_store flag?
@cortadocodes cortadocodes changed the title Add more attributes to events Add/update attributes in event schema Mar 14, 2024
@thclark
Copy link
Contributor

thclark commented Mar 20, 2024

What about a no-store flag which if true (default false) would prevent things from being streamed to any long term storage? Essentially this would allow the originator to request that the chain of events isn’t stored anywhere - great for use cases like testing against live services.

@thclark
Copy link
Contributor

thclark commented Mar 20, 2024

in terms of the ID didn't we establish that messages have an ID already?

@thclark
Copy link
Contributor

thclark commented Mar 20, 2024

Also should the sender be split into sender-namespace, sender-name, sender-version allowing refined filtering to a whole service or just a revision.

(This has made me think that the namespace is probably common to the whole workspace, making it redundant for this purpose)

@cortadocodes
Copy link
Member Author

What about a no-store flag which if true

Good idea. There's a slightly related forward_logs attribute too that's maybe a bit outdated. Maybe we should remove that and add no_store

in terms of the ID didn't we establish that messages have an ID already?

There is but it's pub/sub specific so won't be there for other backends and won't be consistent with the ID we add for them.

Also should the sender be split into sender-namespace, sender-name, sender-version allowing refined filtering to a whole service or just a revision.

I think we should add the namespace as we may have more than one namespace in a workspace. What's the use case for splitting the SRUID into its three parts instead of keeping it whole?

@cortadocodes
Copy link
Member Author

cortadocodes commented Mar 21, 2024

How would the no_store attribute relate with the save_diagnostics attribute? Does the latter respect the former or are they independent?

@cortadocodes cortadocodes self-assigned this Mar 21, 2024
@thclark
Copy link
Contributor

thclark commented Mar 21, 2024

What about a no-store flag which if true

Good idea. There's a slightly related forward_logs attribute too that's maybe a bit outdated. Maybe we should remove that and add no_store

Now that we have a store, if we wanted not to store logs in general, we would probably store them but clean up after 7 days (or whatever period).

Of course if you don't want to see logs, but don't care about service performance (or maybe want them stored for posterity), you can still stream them to the store and simply filter out the log event kind in your subscription or query.

However, the forward_logs attribute would still cover the narrow use case of accelerating communication (by allowing you to avoid incurring the expense of sending them TO pubsub in the first place). I've no idea how valuable this is; but it's certainly something we've never actually used.

I would leave it alone, since it's still a legitimate feature albeit unused to date... unless it's getting in the way somehow in which case feel free to steamroller it.

in terms of the ID didn't we establish that messages have an ID already?
There is but it's pub/sub specific so won't be there for other backends and won't be consistent with the ID we add for them.

ah, ok

Also should the sender be split into sender-namespace, sender-name, sender-version allowing refined filtering to a whole service or just a revision.
I think we should add the namespace as we may have more than one namespace in a workspace.
I'd never intended that to be a possibility. Surely if you were doing services all in one project/workspace they'd be namespaced together. That said, leaving the namespace there would keep open the option for later (also keep openthe option for doing things like bridging across tables in different namespaces/workspaces to merge events from multiple services... so I'm fine wiht keeping it.

What's the use case for splitting the SRUID into its three parts instead of keeping it whole?

You want to be able to filter on:

  • all events from a namespace
  • all events from a service
  • all events from a service revision

My understanding of the filters is that they use equality match (if attribute == value).
If you have an attribute sruid:

  • you get service revision events by doing if sruid == 'octue/ourservice/0.1.9'
  • but how would you get all events from octue/ourservice?

Whereas if you had attributes namespace service version:

  • you get service revision events with if namespace=='octue' && service == 'ourservice' && version == '0.1.9
  • you get all service events with if namespace=='octue' && service == 'ourservice'

@cortadocodes
Copy link
Member Author

cortadocodes commented Mar 27, 2024

but how would you get all events from octue/ourservice?

You could actually do this with the filter hasPrefix(attributes.sruid, "octue/ourservice:") - see here but it wouldn't work if you wanted, for example, all services with the name ourservice on any namespace.

If we're splitting the SRUID like this, I'll have to do it for three fields, not just one:

  • Originator
  • Sender
  • Recipient

Is that ok or too much? It's still well within the maximum number of attributes we can have (100).

@thclark
Copy link
Contributor

thclark commented Mar 28, 2024

That's actually too much; if there's a hasPrefix filter that totally meets our use case without overcompensating things.

(I thought you could only filter on match)

I can't imagine a clear use case where we'd want the later example you have (all services named x from different namespaces) so let's not worry about that.

(TBH I'm still not convinced that namespace would ever change at all in these tables anyhow, rendering it moot)

@cortadocodes
Copy link
Member Author

Oh no I just added and tested all of those

@cortadocodes
Copy link
Member Author

cortadocodes commented Mar 28, 2024

How would the no_store attribute relate with the save_diagnostics attribute? Does the latter respect the former or are they independent?

Still wondering about this.

Save diagnostics

When save_diagnostics=True for a question asked to a child, these things are uploaded to a cloud storage bucket for each question:

  • Configuration values
  • Configuration manifest and datasets
  • Input values
  • Input manifests and datasets
  • The events received from any children the child asked its own questions to as part of the question

When save_diagnostics=False, none of this is uploaded.

Store events

In contrast, when no_store=False, these things are saved in the event store:

  • The question event, which includes:
    • Input values
    • Input manifest (which can be used to get the input datasets)
  • All events sent by the child back to the parent, including the result event which includes:
    • Output values
    • Output manifest (which can be used to get the output datasets)

When no_store=True, none of these will be stored.

The difference

The two have a partially overlapping use case but have some key differences.

Property save_diagnostics=True no_store=False
Stores configuration values/manifest True False
Stores all events relating to immediate question False True
Stores response events relating to first layer of subquestions True Possibly

I think:

  • The event store could replace the current save_diagnostics implementation if the configuration used to answer each question was emitted as an event
  • We should ultimately remove the save_diagnostics attribute while merging its functionality into the event store
  • We can't remove it yet

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

No branches or pull requests

2 participants