output MMIF format #1

Open
keighrim opened this issue Jun 7, 2024 · 2 comments

Comments

@keighrim
Member

keighrim commented Jun 7, 2024

This thread is for discussing the output representation of R-F bindings in MMIF syntax and vocab.

@wricketts
Contributor

@keighrim - I had some floating questions about RFB and general MMIF structure

  1. When calling mmif[<annotation_id>] on an MMIF object, I noticed that it only works with the annotation's .long_id. Using the regular .id gives a KeyError. Is this intentional?

  2. Maybe I'm answering my own question, but I also noticed that the .id is not globally unique, only unique within a view. As in, v_1 and v_2 can both have a TextDocument td_1 in them. Is there an implicit assumption that this td_1 should correspond to the same document across views? If so, are there any built-in enforcements/guards for that assumption, or are CLAMS app developers supposed to write logic that complies with it?

  3. The RFB is implemented to return an empty CSV if no roles/fillers are identified in the input (due to noise), or if the parser fails. @haydenmccormick suggested that we add a runtime parameter to control whether or not the app should generate an annotation when the CSV content is empty. I thought this sounded reasonable, but had a concern related to the above 2 Q's.

    If, for example, docTR's td_1 was too noisy and the user opts to have RFB omit empty CSVs, then RFB's td_1 could correspond to docTR's td_2 (or a higher number), which is not super intuitive. Ultimately, the number mismatch won't prevent us from tracing the relation because we have alignments, but it could be less "user-friendly" to not have a global 1-to-1 mapping between id and document.

  • Do we care about this?
  • If we do, should we leave out this runtime param and have RFB always return an annotation for each OCR TextDocument?

@keighrim
Member Author

Regarding q1, I have started a new issue to make id unambiguous: clamsproject/mmif#228. The problem is that if we start to force long_id everywhere, that'll break future apps reading past MMIFs (or past apps that generate past MMIFs).

You are right that annotation ids without a view-id prefix are implicitly "scoped" to the view they reside in. That said, an annotation id can be re-used to refer to different objects as long as the "scope" is different. Thus, having v1:td2 aligned to v2:td1 is totally fine and we don't care.
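
To make the scoping concrete, here is a rough toy model. It uses plain dicts standing in for views and is not the mmif-python API; the names `views`, `resolve`, and `views_containing` are made up for illustration, and the `view_id:annotation_id` form just follows the v1:td2 notation used here.

```python
# Toy model only: plain dicts standing in for MMIF views (not mmif-python).
views = {
    "v_1": {"td_1": "docTR text A", "td_2": "docTR text B"},
    "v_2": {"td_1": "RFB CSV derived from v_1:td_2"},
}


def resolve(long_id):
    """Look up an annotation by its view-prefixed id, e.g. 'v_1:td_2'."""
    view_id, sep, short_id = long_id.partition(":")
    if not sep:
        # A bare 'td_1' could live in several views, so refuse to guess.
        raise KeyError(f"{long_id!r} has no view prefix and is ambiguous")
    return views[view_id][short_id]


def views_containing(short_id):
    """List every view in which the short id is defined."""
    return sorted(v for v, anns in views.items() if short_id in anns)


# The alignment, not the numbering, records which documents correspond.
alignments = [("v_1:td_2", "v_2:td_1")]
```

In this sketch `views_containing("td_1")` finds the id in both views, which is why a bare short id cannot be resolved on its own, while the alignment pair still ties v_2's td_1 back to v_1's td_2.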

All that put together, I don't think it's a good idea to produce an "empty" text document when the RFB parsing fails. It doesn't add any information, and it adds space and time complexity to handling the MMIF outputs (storage-wise and json.load-wise).
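
A minimal sketch of what "skip empty outputs" could look like, assuming a plain-dict stand-in for the OCR view. `bind_roles`, `toy_parse`, and the `v_2` view id are hypothetical names for illustration, not the actual RFB code or the mmif-python API.

```python
def bind_roles(ocr_docs, parse, view_id="v_2"):
    """Emit one output document per OCR text that yields a non-empty CSV.

    ocr_docs: {long_id: text}; parse: text -> CSV string ('' on failure).
    Returns (new documents keyed by short id, alignment pairs of long ids).
    """
    out_docs, alignments = {}, []
    for source_id, text in ocr_docs.items():
        csv_text = parse(text)
        if not csv_text.strip():
            continue  # nothing parsed: add no document and no alignment
        new_id = f"td_{len(out_docs) + 1}"  # numbering is local to this view
        out_docs[new_id] = csv_text
        alignments.append((source_id, f"{view_id}:{new_id}"))
    return out_docs, alignments


def toy_parse(text):
    """Stand-in parser: treats 'role: filler' lines as bindings."""
    rows = [line.split(":", 1) for line in text.splitlines() if ":" in line]
    return "\n".join(f"{role.strip()},{filler.strip()}" for role, filler in rows)


ocr = {"v_1:td_1": "#### ~~ noise", "v_1:td_2": "Director: Jane Doe"}
docs, links = bind_roles(ocr, toy_parse)
```

Here the noisy v_1:td_1 produces nothing, so the only output document is td_1 in the new view, aligned to v_1:td_2. The short ids no longer line up, but the alignment carries the mapping.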
