make use of paragraph milestones in original XML in the styling #6
Comments
@jtauber can you provide more context or a link for "Reference Model: C2" above? This seems like it will be another "unit" of chunking, right?
Reference Model: C2 is just the broad classification of features from https://github.com/deep-philology/DeepReader/wiki/A-Reference-Model-for-Capabilities-of-Online-Readers. It might not be a chunking type (although it is in MorphGNT; see https://github.com/jtauber/vocabulary-tools/tree/master/gnt_data ). It could just be visual styling, e.g. indentation or margins.
In other words, it's rare for people to use paragraphs as a citation scheme, but they are very common just as a way to visually break up the text. Obviously the fact that they aren't generally used as a citation scheme doesn't mean they can't be :-) Beowulf and Homer both have paragraphs marked up, but I don't think anyone would ever say "in paragraph 53...". A nice rendering of either Beowulf or Homer, though, might want to have vertical space between paragraphs or something. That said, for prose, paragraphs might be more useful for citation / addressing.
Thanks for the context. I guess what I was getting at with "chunking" was "grouping", so even if you don't "reference" (as a human) paragraph 53 or we don't handle a "query" (from the frontend) for a particular paragraph, we're doing something within the data layer to annotate that the token with the value
I'd still argue there's a difference between structure (which could be used for all sorts of things, including visual treatment, citation, pagination, etc.) and mere visual treatment. Note also that often paragraph breaks are what is marked, not the overall structural unit of a paragraph. One could choose to map a paragraph break to All this said, I don't think there's any harm in having the notion of a paragraph reference on a token. In ReadBeowulf I have:
(unlike https://github.com/jtauber/vocabulary-tools/tree/master/gnt_data where I have the various chunking schemes defined individually and mapped to token numbers, but you can obviously easily switch between one representation and the other). Note in the Beowulf token model I have a
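To make the point about switching between the two representations concrete, here is a minimal Python sketch. It is not code from ReadBeowulf or vocabulary-tools: the field names `n` and `para`, the placeholder tokens, and all three function names are assumptions made up for illustration. It also assumes tokens are in reading order and that each paragraph is a contiguous run of tokens.

```python
from itertools import groupby

# Hypothetical token records with a per-token paragraph reference.
# Field names ("n", "para") and the placeholder text are illustrative only.
tokens = [
    {"n": 1, "text": "a", "para": 1},
    {"n": 2, "text": "b", "para": 1},
    {"n": 3, "text": "c", "para": 2},
    {"n": 4, "text": "d", "para": 2},
    {"n": 5, "text": "e", "para": 3},
]


def tokens_to_ranges(tokens):
    """Collapse per-token paragraph refs into {para: (first_n, last_n)}."""
    ranges = {}
    for para, group in groupby(tokens, key=lambda t: t["para"]):
        numbers = [t["n"] for t in group]
        ranges[para] = (numbers[0], numbers[-1])
    return ranges


def ranges_to_refs(ranges):
    """Expand {para: (first_n, last_n)} into {token_number: para}."""
    refs = {}
    for para, (first, last) in ranges.items():
        for n in range(first, last + 1):
            refs[n] = para
    return refs


def breaks_to_refs(token_numbers, paragraph_starts):
    """Turn paragraph-break milestones into per-token paragraph refs.

    paragraph_starts is the set of token numbers that open a new paragraph;
    the first token always opens paragraph 1, whether or not it is listed.
    """
    refs = {}
    para = 0
    for n in token_numbers:
        if para == 0 or n in paragraph_starts:
            para += 1
        refs[n] = para
    return refs


ranges = tokens_to_ranges(tokens)
refs_from_ranges = ranges_to_refs(ranges)
refs_from_breaks = breaks_to_refs([t["n"] for t in tokens], {1, 3, 5})
```

Under these assumptions, the range map, the per-token reference, and the break milestones all carry the same information, so which one to store is mostly a question of what the styling or citation code finds most convenient to read.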
Reference Model: C2