Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability for ComposeDescriptor to accept arbitrary kwargs to create descriptor #286

Closed

Conversation

rosesyrett
Copy link

@rosesyrett rosesyrett commented Aug 23, 2023

Description

Currently the DAQ-Core team at diamond are in need of being able to pass metadata to descriptor documents, primarily for nexus file writing.

Motivation and Context

In our case, we want descriptor documents from streams to be easily identifiable and match up with the Application definitions for the experiments they represent (e.g. for tomography, detectors require an image key DataKey field).

How Has This Been Tested?

I've written a test confirming that extra kwargs are passed to the resulting descriptor document.

@rosesyrett
Copy link
Author

At the moment I'm seeing mypy errors:

event_model/__init__.py:2427: error: Unsupported type "dict[str, Any]" for ** expansion in TypedDict  [typeddict-item]
Found 1 error in 1 file (checked 4 source files)

This is because I'm trying to unpack kwargs into the EventDescriptor, which is a TypedDict. The JSON schema suggests we can have any arbitrary fields, but the python TypedDict is obviously restrictive in what it can handle. Is there any nice way to deal with this?

Copy link
Contributor

@evalott100 evalott100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I don't think there'll be a mypy compliment way to add any possible key from kwargs into the __init__ of the typed dict, I'd suggest updating the dict after it's been initialised, but I'm open to whatever

@@ -2402,6 +2402,7 @@ def __call__(
time=None,
uid=None,
validate=True,
**kwargs,
Copy link
Contributor

@evalott100 evalott100 Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should handle this the same way as RunStart (without the unpacking in the __call__ kwargs)

metadata: Optional[Dict] = None,

I'm happy to change it to unpacking if others prefer, but I think they should be consistent

@evalott100
Copy link
Contributor

I deleted some previous comments on this, they also result in mypy errors down the line... Should maybe discuss solutions at the meeting

@evalott100
Copy link
Contributor

This passes mypy on event-model:

# =================== __init__.py
class ComposeDescriptor:
    start: RunStart
    streams: dict
    event_counters: Dict[str, int]

    def __call__(
        self,
        name,
        data_keys,
        hints=None,
        configuration=None,
        object_keys=None,
        time=None,
        uid=None,
        validate=True,
        metadata=None,
    ) -> ComposeDescriptorBundle:
        if time is None:
            time = ttime.time()
        if uid is None:
            uid = str(uuid.uuid4())
        if hints is None:
            hints = {}
        if configuration is None:
            configuration = {}
        if object_keys is None:
            object_keys = {}

        doc = EventDescriptor(
            configuration=configuration,
            data_keys=data_keys,
            name=name,
            object_keys=object_keys,
            run_start=self.start["uid"],
            time=time,
            uid=uid,
            hints=hints,
        )
        if metadata:
            doc.update(metadata)

        if validate:
            if name in self.streams and self.streams[name] != set(data_keys):
                raise EventModelValidationError(
                    "A descriptor with the name {} has already been composed with "
                    "data_keys {}. The requested data_keys were {}. All "
                    "descriptors in a given stream must have the same "
                    "data_keys.".format(name, self.streams[name], set(data_keys))
                )
            schema_validators[DocumentNames.descriptor].validate(doc)

        if name not in self.streams:
            self.streams[name] = set(data_keys)
            self.event_counters[name] = 1

        return ComposeDescriptorBundle(
            descriptor_doc=doc,
            compose_event=ComposeEvent(
                descriptor=doc, event_counters=self.event_counters
            ),
            compose_event_page=ComposeEventPage(
                descriptor=doc, event_counters=self.event_counters
            ),
        )

# =================== test.py
run_doc, compose_descriptor, compose_resource, compose_stop = compose_run()

descriptor_doc, compose_event, compose_event_page = compose_descriptor(
    "name", {}, metadata={"a": "b"}, validate=True
)


descriptor_doc["data_keys"]
descriptor_doc["run_start"]
descriptor_doc["a"]


def foo(descriptor_doc: EventDescriptor):
    ...

foo(descriptor_doc)

@DiamondJoseph
Copy link
Contributor

Decided to place this metadata on StartDocument, with some mapping object. Assumption is that enough information available at run start to resolve onto any eventual stream creations (even if creating streams proceedurally).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants