Sortinfo #25

yfaria · 2021-07-02T17:47:30Z

This PR solves #23 by creating the notion of "sort info" across all semantic representations. It represents the morphosemantic information that is extracted when analyzing a sentence. It also tackles the naming problem in #21. Now, the it exchanged

Of the three semantic representations right now, only two have an XML serialization. The DMRS DTD calls this information "sortinfo", but the MRS DTD (which is referenced here through a newer description of the XML serialization of MRSs, which includes ICONS) treats it as "extra pairs". The name of the resource can be changed later, or specific names can be used in each representation.

This PR also puts more info on the first level of verbosity, addressing #17. Right now, we have

$ delphin profile-to-rdf -v --to dmrs ../erg/trunk/tsdb/gold/mrs/
WARNING:delphin.cli.profile_to_rdf:Converting 107 analysis of 107 sentences from ../erg/trunk/tsdb/gold/mrs/
WARNING:delphin.cli.profile_to_rdf:Loading the profile
WARNING:delphin.cli.profile_to_rdf:Converting the profile
WARNING:delphin.cli.profile_to_rdf:Result 0 of sentence 951 is not well formed
WARNING:delphin.cli.profile_to_rdf:Serializing results to output.ttl
WARNING:delphin.cli.profile_to_rdf:DONE

The level of these log messages (WARNING) will be changed later.
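One possible way to move such progress messages off the WARNING level is to map the number of -v flags to a standard logging level. The sketch below is only an illustration with Python's logging module: the helper names (level_for_verbosity, configure_logger) and the logger name are hypothetical and are not taken from this PR or from the delphin CLI.

```python
import logging


def level_for_verbosity(verbosity: int) -> int:
    """Map a count of -v flags to a logging level (hypothetical helper)."""
    if verbosity <= 0:
        return logging.WARNING  # default: only problems are reported
    if verbosity == 1:
        return logging.INFO     # -v: progress messages
    return logging.DEBUG        # -vv and above: everything


def configure_logger(name: str, verbosity: int) -> logging.Logger:
    """Return a named logger whose level follows the verbosity count."""
    logger = logging.getLogger(name)
    logger.setLevel(level_for_verbosity(verbosity))
    return logger


# Install a default handler on the root logger so records are printed.
logging.basicConfig()

# Illustrative logger name, not the one used by the project.
logger = configure_logger("example.profile_to_rdf", verbosity=1)
logger.info("Loading the profile")                               # progress: INFO
logger.warning("Result 0 of sentence 951 is not well formed")    # real warning
```

With this mapping, progress lines would carry INFO and only genuine problems, such as an ill-formed result, would remain at WARNING.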

arademaker · 2021-07-02T20:19:07Z

In the discussion #21 (comment) my second alternative, the one that I prefer, was:

http://ibm.com/sick/b/33/4
http://ibm.com/sick/b/33/4#node-10012
http://ibm.com/sick/b/33/4#link-8
http://ibm.com/sick/b/33/4#predicate-10012

This would mean that 4 is a 'document' in collection 33, which is part of a bigger collection b in the sick collection. Note that what follows the # is not a type but an identifier of a part of the prefix http://ibm.com/sick/b/33/4. Adding an extra identifier for the whole 'document' may work, but it is strange: it is like a section with just one paragraph, or a paragraph with just one sentence. http://ibm.com/sick/b/33/4#dmrs would say that we are talking about the DMRS #dmrs from 4, which is itself the DMRS we are talking about. If the DMRS is #dmrs, we also lose the pattern of identifiers after the hash: it would be a single #dmrs not followed by any -XXXX, where XXXX is a number. We could make it http://ibm.com/sick/b/33/4#dmrs-0, but that is a little misleading since we will never have a #dmrs-1.

The only justification that I can think of for having a hash URI for the DMRS node itself is that, when we implement #24, we will need an extra URI for the named graph that will hold the triples from that DMRS. But in that case, with named graphs, we can eventually remove the URI and the hasNode and hasLink edges, since all the remaining triples will already be grouped into a single named graph called http://ibm.com/sick/b/33/4.
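To make the preferred scheme concrete, here is a minimal sketch of how such hash URIs could be assembled and split again. The function part_uri is a hypothetical helper written for this comment, not code from the repository; only urllib.parse.urldefrag is a real standard-library call.

```python
from urllib.parse import urldefrag


def part_uri(document_uri: str, kind: str, ident: int) -> str:
    """Build '<document>#<kind>-<id>' (hypothetical helper).

    The document URI identifies one result (e.g. result 4 of item 33);
    the fragment identifies a part of it, not a type.
    """
    if "#" in document_uri:
        raise ValueError("document URI must not already contain a fragment")
    return f"{document_uri}#{kind}-{ident}"


doc = "http://ibm.com/sick/b/33/4"
node = part_uri(doc, "node", 10012)
link = part_uri(doc, "link", 8)
predicate = part_uri(doc, "predicate", 10012)

# Splitting a part URI gives back the document and the part identifier.
base, fragment = urldefrag(node)
```

Here base is the document URI again, which is also the name the named graph would take under the proposal above.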

A possible alternative would be to consider 4 itself as a part of the 'document' 33:

http://ibm.com/sick/b/33#4

and all other parts of 4 would have to be adapted to:

http://ibm.com/sick/b/33#4-node-10012
http://ibm.com/sick/b/33#4-link-8
http://ibm.com/sick/b/33#4-predicate-10012

or

http://ibm.com/sick/b/33#node-4-10012
http://ibm.com/sick/b/33#link-4-8
http://ibm.com/sick/b/33#predicate-4-10012
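The two item-level variants differ only in where the result number sits inside the fragment. A small sketch, assuming a hypothetical helper item_level_uri that is not part of the code base:

```python
def item_level_uri(item_uri: str, result: int, kind: str, ident: int,
                   result_first: bool = True) -> str:
    """Build a part URI when the item (e.g. 33) is the 'document'.

    result_first=True  gives '<item>#<result>-<kind>-<id>'
    result_first=False gives '<item>#<kind>-<result>-<id>'
    """
    if result_first:
        return f"{item_uri}#{result}-{kind}-{ident}"
    return f"{item_uri}#{kind}-{result}-{ident}"


item = "http://ibm.com/sick/b/33"
variant_a = item_level_uri(item, 4, "node", 10012)
variant_b = item_level_uri(item, 4, "node", 10012, result_first=False)
```

In both variants every part of every reading shares one document URI, which is where the size concern for sentences with many readings comes from.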

Of course, one can also think that b itself is a document... Two arguments for making each result a document, instead of the item/profile itself, are: 1) readings are not related except for being readings of the same sentence; 2) size: a profile can be very big, and a sentence may have thousands of readings.

We don't have # after a /.

arademaker · 2021-07-02T20:22:09Z

Since all nodes have only one predicate, we don't need #node-10012-predicate, #predicate-10012 is enough. The predicates will borrow the ids from their nodes.

yfaria · 2021-07-02T21:09:21Z

The # after / was a typo, even though it would compose a valid URI; see, e.g., https://www.w3.org/TR/n-quads/#simple-triples.

The #dmrs is not really necessary, even thinking about graphs. One can interpret it as meaning that the resource being represented is the DMRS created from result 4 of item 33 of the specific profile, since other information is also generated from the profile, such as the derivation tree. It would also let us store different representations in the same place without worrying about conflicting node names, even though that is not something we are looking for.

arademaker · 2021-07-03T00:48:08Z

version was not updated to 1.0.2

yfaria added 7 commits July 2, 2021 13:25

changing logging for more info on verbosity

f7c0a7a

changing URI formation to flatten it

6e4a150

adding sortinfo node in transformations

2d03e5c

correcting typo

7bceaf2

adding notion of sortinfo in the rdf schema

4e5eecd

correcting a comment

374bb02

making a comment more precise and fixing a typo

2855ab4

removing redundancy in predicate and sortinfo URIs

219e607

arademaker merged commit cc27e6b into master Jul 3, 2021

arademaker deleted the sortinfo branch July 3, 2021 00:47
