make use of paragraph milestones in original XML in the styling #6

Open
jtauber opened this issue Apr 5, 2019 · 5 comments
Labels
backend-api New API or change to existing API

Comments

@jtauber
Member

jtauber commented Apr 5, 2019

Reference Model: C2

@jtauber jtauber added the backend-api New API or change to existing API label Apr 5, 2019
@jacobwegner
Contributor

@jtauber can you provide more context or a link for "Reference Model: C2" above? This seems like it will be another "unit" of chunking, right?

@jtauber
Member Author

jtauber commented May 28, 2019

Reference Model: C2 is just the broad classification of features from https://github.com/deep-philology/DeepReader/wiki/A-Reference-Model-for-Capabilities-of-Online-Readers

It might not be a chunking type (although it is in MorphGNT, see https://github.com/jtauber/vocabulary-tools/tree/master/gnt_data ). It could just be visual styling, e.g. indentation or margins.

@jtauber
Member Author

jtauber commented May 28, 2019

In other words, it's rare for people to use paragraphs as a citation scheme, but they are very common just as a way to visually break up the text. Obviously the fact that they aren't generally used as a citation scheme doesn't mean they can't be :-)

Beowulf and Homer both have paragraphs marked up but I don't think anyone would ever say "in paragraph 53...". A nice rendering of either Beowulf or Homer, though, might want to have vertical space between paragraphs or something.

That said, for prose it might be more useful for citation / addressing.

@jacobwegner
Contributor

Thanks for the context.

I guess what I was getting at with "chunking" was "grouping", so even if you don't "reference" (as a human) paragraph 53 or we don't handle a "query" (from the frontend) for a particular paragraph, we're doing something within the data layer to annotate that the token with the value μῆνιν (which is idx 0 for the whole work, position 1 within urn:cts:greekLit:tlg0012.tlg001.perseus-grc2:1.1) is part of a larger "paragraph 1".

@jtauber
Member Author

jtauber commented May 28, 2019

I'd still argue there's a difference between structure (which could be used for all sorts of things, including visual treatment, citation, pagination, etc.) and mere visual treatment.

Note also that often it is the paragraph breaks that are marked, not the overall structural unit of a paragraph. One could choose to map a paragraph break to <br/><br/><br/> (ugh!), which might achieve the desired visual effect while having zero to say about structure / grouping / chunking.

All this said, I don't think there's any harm in having the notion of a paragraph reference on a token.
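
As a rough sketch of what that could look like, here is one way break milestones might be turned into a per-token paragraph reference. The `PARA_BREAK` marker and the `assign_paragraphs` helper are hypothetical names made up for illustration, and the token strings are placeholders rather than real text:

```python
# Hypothetical stand-in for a paragraph-break milestone in the token stream.
PARA_BREAK = object()


def assign_paragraphs(stream):
    """Yield (token, para_id, para_first) for every token in `stream`.

    `stream` mixes tokens with PARA_BREAK markers. Leading or repeated
    breaks do not create empty paragraphs: a new paragraph id is only
    allocated once a token actually follows a break.
    """
    para_id = 0
    at_start = True  # the next token opens a new paragraph
    for item in stream:
        if item is PARA_BREAK:
            at_start = True
            continue
        if at_start:
            para_id += 1
        yield item, para_id, at_start
        at_start = False


stream = ["t0", "t1", PARA_BREAK, "t2"]
for token, para_id, para_first in assign_paragraphs(stream):
    print(token, para_id, para_first)
```

The breaks themselves carry no structure here; the grouping only appears once they are folded into a running `para_id` on each token.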

In ReadBeowulf I have:

fitt_id = models.IntegerField(db_index=True)
para_id = models.IntegerField(db_index=True)
para_first = models.BooleanField()
line_id = models.IntegerField(db_index=True)
half_line = models.CharField(max_length=1)
token_offset = models.IntegerField()

(Unlike https://github.com/jtauber/vocabulary-tools/tree/master/gnt_data, where I have the various chunking schemes defined individually and mapped to token numbers. But you can obviously switch easily between one representation and the other.)
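
To illustrate switching between the two representations, here is a minimal sketch. It assumes a chunking scheme is stored as `(para_id, first_idx, last_idx)` triples over contiguous, zero-based token indexes; that layout and both function names are assumptions for the example and may not match the actual gnt_data files:

```python
from itertools import groupby


def para_ids_to_ranges(para_ids):
    """Collapse a per-token list of paragraph ids into
    (para_id, first_idx, last_idx) triples with inclusive bounds."""
    ranges = []
    idx = 0
    for para_id, group in groupby(para_ids):
        count = len(list(group))
        ranges.append((para_id, idx, idx + count - 1))
        idx += count
    return ranges


def ranges_to_para_ids(ranges):
    """Expand (para_id, first_idx, last_idx) triples back into one
    paragraph id per token. Assumes the ranges are ordered and contiguous."""
    para_ids = []
    for para_id, first, last in ranges:
        para_ids.extend([para_id] * (last - first + 1))
    return para_ids


per_token = [1, 1, 1, 2, 2, 3]
as_ranges = para_ids_to_ranges(per_token)
print(as_ranges)                      # [(1, 0, 2), (2, 3, 4), (3, 5, 5)]
print(ranges_to_para_ids(as_ranges))  # [1, 1, 1, 2, 2, 3]
```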

Note that in the Beowulf token model I have a para_first boolean, which indicates "this is the first token in a new paragraph". It could be, for example, that this flag triggers the visual treatment, rather than an actual para_id.
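
A minimal sketch of that idea, where the renderer looks only at a `para_first` flag and never at a paragraph id. The `render_plain` function and the `(text, para_first)` pair shape are hypothetical, chosen just to keep the example self-contained:

```python
def render_plain(tokens):
    """Join (text, para_first) pairs into plain text, putting a blank
    line before every token that starts a new paragraph (except the
    very first token) and a single space elsewhere."""
    parts = []
    for i, (text, para_first) in enumerate(tokens):
        if i > 0:
            parts.append("\n\n" if para_first else " ")
        parts.append(text)
    return "".join(parts)


tokens = [("t0", True), ("t1", False), ("t2", True), ("t3", False)]
print(render_plain(tokens))
```

An HTML renderer could do the same thing with a CSS class or margin on the flagged token instead of a blank line.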

2 participants