Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a TODO to start implementation of HED support in annotations #13059

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

VisLab
Copy link

@VisLab VisLab commented Jan 13, 2025

In response to #11519

Added a TODO to annotations.py as agreed to get started on implementation of basic-level HED (Hierarchical Event Descriptors) support in mne annotations to allow HED tags to be used in epoching.

Copy link

welcome bot commented Jan 13, 2025

Hello! 👋 Thanks for opening your first pull request here! ❤️ We will try to get back to you soon. 🚴

ch_names=None,
):
hed = _soft_import("hedtools", "validation of HED tags in annotations") # noqa
# TODO is some sort of initialization of the HED cache directory necessary?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default, the users cache directory is in $USERHOME/.hedtools/hed_cache. The hedtools distribution has in hed/schema/schema_data the most recent HED schemas and when this directory is created, it is initialized with this data. If the user accesses a version that is not cached, it goes to the web to fetch it and caches it.

The default location can be overridden if this would be better for MNE-Python, but it isn't necessary for HED.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default, the users cache directory is in $USERHOME/.hedtools/hed_cache.

yeah, and I noticed that directory was empty for me after installing/importing/fiddling with the package, which is why I asked. Sounds like that wasn't a problem.

orig_time=orig_time,
ch_names=ch_names,
)
# TODO validate the HED version the user claims to be using.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the user tries to load the schema version --- if it can't find it, the load fails. This happens at the beginning of validation. For now I think that is sufficient.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so in other words, when the hed_strings are passed to the validator, the schema version is also passed in (and validated). That makes sense. So no need to do a separate validation of the schema version here.

f"Number of HED tags ({len(hed_tags)}) must match the number of "
f"annotations ({len(self)})."
)
# TODO insert validation of HED tags here
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might need clarification here. My understanding is that this is at the stage of annotations for the continuous data before epoching. We would have the message "Number of HED strings..." since HED tags refer to the individual tags rather than the comma-separated list. Yes this is where it would occur. The validator would take strings in. If there are validation errors, how would they be reported?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable hed_tags is a list of strings. I'm assuming you would pass that to a validation func in your hedtools library, and it would raise an error if the strings represent tags that aren't in the hierarchy? Is that not how things work?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validator returns a list of dicts with the issues. There is a function to get printable strings out of this. For your use case, we would probably want to validate the entire list. The validators take a ErrorHandler which manages the context so that it can identify which element had the error (and which file if this is applicable).

We could wrap this to raise an error. For the current situation are you validating each sublist individually or are you doing it by event?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Answering more thoroughly now that I'm at my desk instead of my phone):

HED tags refer to the individual tags rather than the comma-separated list

OK, we should change the variable name from hed_tags to hed_strings then

My understanding is that this is at the stage of annotations for the continuous data before epoching.

yes. Here I'm assuming that each annotated segment will have a single string (containing comma-separated tags) associated with it (i.e., it's neither necessary nor allowed to associate a list of HED strings with a single annotation). Correct me if I'm wrong about that please.

The validator would take strings in [...] The validator returns a list of dicts with the issues.

OK, so maybe something like:

hed_results = func_that_validates_list_of_HED_strings(hed_strings)
# or if the validator takes in single strings instead of a list, then maybe
# hed_results = list(map(func_that_validates_one_HED_string, hed_strings))
if any(map(len, hed_results)):
    err_strings = list(map(func_that_gets_printable_strings, hed_results))
    raise ValueError(
        "Some HED strings in your annotations failed to validate:\n"
        "\n  - ".join(err_strings)
    )

For the current situation are you validating each sublist individually or are you doing it by event?

not sure what "sublist" is here, do you mean "list of HED strings, of same length as the number of annotated events"? We can structure it so that each hed_string is passed individually to a validator function, or so that the list of HED strings (one per annotation span) is passed to the validator all at once. On our end it doesn't matter, it's determined by what your API is designed to take in. If your API can handle either case, then I'd decide based on which option yields cleaner, clearer, simpler code for us. Can you point me to the validator function(s) that we're talking about here, so I can look at their params/returns?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. Here I'm assuming that each annotated segment will have a single string (containing comma-separated tags) associated with it (i.e., it's neither necessary nor allowed to associate a list of HED strings with a single annotation). Correct me if I'm wrong about that please.

Yes there will be a single string but I need clarification on what an annotated segment is. A HED annotation would ordinarily be associated with a single time marker. Assuming that you are not using the Onset, Inset, Offset mechanism, but rather sticking to Duration this would be fine.

NOTE: If you have a table of events (with onset and HED annotations), you can also compute the (Event-context, (tags representing ongoing event processes)). I assume we would not be using that -- but that MNE would take care of that.

Copy link
Author

@VisLab VisLab Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point me to the validator function(s) that we're talking about here, so I can look at their params/returns?

For individual strings it is the HedValidator class. The input string would be converted to a HedString before validation. Here is a rough example (you only need one HedValidator for all of the strings).

error_handler = ErrorHandler(check_for_warnings=False)
validator = HedValidator(schema)
hed_obj = HedString(mystring, schema)
issues = validator.validate(hed_obj, False, error_handler=error_handler)
issue_str = get_printable_issue_string(issues)

If you want to validate an entire column (with header "HED"), you would create a TabularInput and call its validate method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need clarification on what an annotated segment is. A HED annotation would ordinarily be associated with a single time marker. Assuming that you are not using the Onset, Inset, Offset mechanism, but rather sticking to Duration this would be fine.

in MNE Annotations (and by extension, HEDAnnotations), each "annotation span" has attributes onset, duration, description (and for HEDAnnotations, also hed_string). Duration may be zero.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have a table of events (with onset and HED annotations), you can also compute the (Event-context, (tags representing ongoing event processes)). I assume we would not be using that -- but that MNE would take care of that.

I think that is something we can handle later in a separate PR, once the HEDAnnotation class is in place. IIUC, that would be useful when creating epochs (i.e., create epochs around button press events, but only within an event context of "response window is open")

Comment on lines +876 to +896
def append(self, onset, duration, description, ch_names=None):
"""TODO."""
pass

def count(self):
"""TODO. Unlike Annotations.count, keys should be HED tags not descriptions."""
pass

def crop(
self, tmin=None, tmax=None, emit_warning=False, use_orig_time=True, verbose=None
):
"""TODO."""
pass

def delete(self, idx):
"""TODO."""
pass

def to_data_frame(self, time_format="datetime"):
"""TODO."""
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VisLab these TODOs are for me. So as you can see some things aren't going to work yet, but we're already at least able to do:

$ ipython
In [1]: import mne
In [2]: foo = mne.HEDAnnotations([0, 1], [0.5, 1.2], ['foo', 'bar'], ['hed/foo', 'hed/
   ...: bar'])
In [3]: foo
Out[3]: <HEDAnnotations | 2 segments: hed/bar (1), hed/foo (1)>

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not completely sure what these do, but would be willing to help as needed. Would the get_annotations_per_epoch then have an additional list for HED annotations in the list of lists?

Thanks @drammock

Copy link
Member

@drammock drammock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated MWE:

$ ipython
In [1]: import mne
In [2]: foo = mne.HEDAnnotations([0, 1], [0.5, 1.2], ['foo', 'bar'], ['sensory-eve
   ...: nt, visual-presentation, (blue, square)', 'agent-action, (push, (left, mou
   ...: se-button))'])
In [3]: foo
Out[3]: <HEDAnnotations | 2 segments: agent-action, (push, (left, mouse-button)) ...>

@VisLab the HED strings are a bit longer than I expected, so in the HEDAnnotations object repr (last line of output above) we only see one before the ellipsis kicks in. Does your library have a built-in way to show a "compact" representation of a HED String? If so I'd like to use it in the repr

duration,
description,
hed_strings,
hed_version="8.3.0", # TODO @VisLab what is a sensible default here?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VisLab what is a sensible default for schema version?

return (
super().__eq__(self, other)
and np.array_equal(self.hed_strings, other.hed_strings)
and self.hed_version == other.hed_version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VisLab if we want to compare equality of two HEDAnnotations objects, and we know already that their HED Strings are equivalent, should we care that they were validated with different HED schema versions?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tags are the same, but it should be re-validated using the latest version of the schema.

RE: Although once a tag is in the schema, it is always there (unless there is a major version change which we don't anticipate and even then -- every effort would be made to keep tags). This being said, the schema tags have attributes which may affect how they are validated -- also they might also have a different path in the hierarchy as upper level tags are added. (That is why the annotations should use the short form as much as possible and use tools to expand if needed.)

In other words -- if you have two datasets and they have different versions of the schema then I think it should work if you revalidate using the latest of the two versions of the schema. (Am I correctly understanding that within a given dataset the files would use a single version of HED?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rephrasing my question will help: if I have 2 HED Strings, and as strings they are identical (i.e., they both say "sensory-event, visual-presentation, (blue, square)"), does it even make sense to say "these aren't equal" simply because one was validated against schema version X and another was validated against schema version X+1? When I phrase the question that way, the answer seems obvious to me: schema version doesn't matter when comparing equality of the strings (and thus there is no point to doing the extra computation of re-validating the strings against the newer schema version when testing their equality). But I'm still not clear on whether you'd agree with that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this might also help: MNE-Python does not deal with datasets. That is the job of MNE-BIDS. Within MNE-BIDS, I think it is safe to assert/require that only one schema version is used to do all validation of annotations within that dataset. So the question I'm asking is really about the collection of HED strings attached to a single recording and what counts as "the same" when talking about those HED strings)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants