make use of paragraph milestones in original XML in the styling #6

jtauber · 2019-04-05T15:17:01Z

Reference Model: C2

jacobwegner · 2019-05-28T16:36:50Z

@jtauber can you provide more context or a link for "Reference Model: C2" above? This seems like it will be another "unit" of chunking, right?

jtauber · 2019-05-28T17:06:17Z

Reference Model: C2 is just the broad classification of features from https://github.com/deep-philology/DeepReader/wiki/A-Reference-Model-for-Capabilities-of-Online-Readers

It might not be a chunking type (although it is in MorphGNT, see https://github.com/jtauber/vocabulary-tools/tree/master/gnt_data ). It could just be visual styling, e.g. indentation or margins.

jtauber · 2019-05-28T17:09:38Z

In other words, it's rare for people to use paragraphs as a citation scheme, but they are very common just as a way to visually break up the text. Obviously, the fact that they aren't generally used as a citation scheme doesn't mean they can't be :-)

Beowulf and Homer both have paragraphs marked up but I don't think anyone would ever say "in paragraph 53...". A nice rendering of either Beowulf or Homer, though, might want to have vertical space between paragraphs or something.

That said, for prose it might be more useful for citation / addressing.

jacobwegner · 2019-05-28T17:15:22Z

Thanks for the context.

I guess what I was getting at with "chunking" was "grouping", so even if you don't "reference" (as a human) paragraph 53 or we don't handle a "query" (from the frontend) for a particular paragraph, we're doing something within the data layer to annotate that the token with the value μῆνιν (which is idx 0 for the whole work, position 1 within urn:cts:greekLit:tlg0012.tlg001.perseus-grc2:1.1) is part of a larger "paragraph 1".

jtauber · 2019-05-28T17:29:21Z

I'd still argue there's a difference between structure (which could be used for all sorts of things including visual treatment, citation, pagination, etc) and mere visual treatment.

Note also that often it is the paragraph breaks that are marked, not the overall structural unit of a paragraph. One could choose to map a paragraph break to <br/><br/><br/> (ugh!), which might achieve the desired visual effect while having zero to say about structure / grouping / chunking.

All this said, I don't think there's any harm in having the notion of a paragraph reference on a token.
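As a rough illustration of how break milestones could become a paragraph reference on each token, here is a minimal sketch. It assumes the XML has already been flattened into a stream of words and milestone markers; `PARA_BREAK`, `Token`, and `assign_paragraphs` are hypothetical names, not part of any existing codebase.

```python
from dataclasses import dataclass

# Hypothetical sentinel standing in for a paragraph milestone found in the XML.
PARA_BREAK = object()


@dataclass
class Token:
    value: str
    para_id: int
    para_first: bool


def assign_paragraphs(stream):
    """Turn a flat stream of words and break milestones into annotated tokens."""
    tokens = []
    para_id = 1
    at_start = True  # the first token of the work opens paragraph 1
    for item in stream:
        if item is PARA_BREAK:
            # A milestone only marks a boundary; the next token opens a new
            # paragraph. Leading or repeated milestones do not create empty ones.
            if not at_start:
                para_id += 1
                at_start = True
            continue
        tokens.append(Token(item, para_id, at_start))
        at_start = False
    return tokens


tokens = assign_paragraphs(["μῆνιν", "ἄειδε", PARA_BREAK, "θεὰ"])
```

Each token ends up carrying both a `para_id` (usable for grouping or addressing) and a `para_first` flag (usable purely for visual treatment), so either interpretation of the milestone stays available.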

In ReadBeowulf I have:

fitt_id = models.IntegerField(db_index=True)
para_id = models.IntegerField(db_index=True)
para_first = models.BooleanField()
line_id = models.IntegerField(db_index=True)
half_line = models.CharField(max_length=1)
token_offset = models.IntegerField()

(unlike https://github.com/jtauber/vocabulary-tools/tree/master/gnt_data where I have the various chunking schemes defined individually and mapped to token numbers but you can obviously easily switch between one representation and the other)
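Switching between the two representations could look roughly like the sketch below. It is a generic illustration, not the actual gnt_data file format: it assumes 0-based token numbers, inclusive ranges, and contiguous paragraphs, and both function names are hypothetical.

```python
def token_ids_to_ranges(para_ids):
    """Per-token paragraph ids -> {para_id: (first_token, last_token)}."""
    ranges = {}
    for idx, pid in enumerate(para_ids):
        # Keep the first token number seen for this paragraph, extend the last.
        first, _ = ranges.get(pid, (idx, idx))
        ranges[pid] = (first, idx)
    return ranges


def ranges_to_token_ids(ranges, token_count):
    """{para_id: (first_token, last_token)} -> per-token paragraph ids."""
    para_ids = [None] * token_count
    for pid, (first, last) in ranges.items():
        for idx in range(first, last + 1):
            para_ids[idx] = pid
    return para_ids


per_token = [1, 1, 2, 2, 2, 3]
ranges = token_ids_to_ranges(per_token)
round_trip = ranges_to_token_ids(ranges, len(per_token))
```

The round trip only preserves the data when every token belongs to exactly one paragraph and paragraphs do not interleave.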

Note that in the Beowulf token model I have a para_first boolean which indicates "this is the first token in a new paragraph". For example, that flag, rather than an actual para_id, could be what triggers the visual treatment.
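A minimal sketch of a renderer driven only by such a flag, assuming tokens arrive as (text, para_first) pairs; `render` is a hypothetical helper and ignores line and half-line layout entirely.

```python
def render(tokens):
    """Join tokens into plain text, with a blank line before each new paragraph."""
    paragraphs = []
    for text, para_first in tokens:
        # Open a new paragraph on the flag, or if nothing has been opened yet.
        if para_first or not paragraphs:
            paragraphs.append([])
        paragraphs[-1].append(text)
    return "\n\n".join(" ".join(words) for words in paragraphs)


sample = [
    ("Hwæt", True),
    ("we", False),
    ("Gardena", False),
    ("Oft", True),
    ("Scyld", False),
]
text = render(sample)
```

Nothing here needs a para_id: the boolean alone is enough to produce the vertical space, which is the "mere visual treatment" case.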

jtauber added the backend-api label ("New API or change to existing API") on Apr 5, 2019
