Sortinfo #25

Merged
merged 8 commits into from
Jul 3, 2021

@yfaria (Contributor) commented Jul 2, 2021

This PR solves #23 by creating the notion of "sort info" across all semantic representations. Sort info represents the morphosemantic information that is extracted when analyzing a sentence. The PR also tackles the naming problem in #21. The URIs now change as follows:

  1. http://ibm.com/sick/b/33/4/dmrsi#dmrs to http://ibm.com/sick/b/result-33-4#mrs
  2. http://ibm.com/sick/b/33/4/nodes/10012 to http://ibm.com/sick/b/33/4#node-10012
  3. http://ibm.com/sick/b/33/4/links/8 to http://ibm.com/sick/b/33/4#link-8
  4. http://ibm.com/sick/b/33/4/nodes/10014#predicate to http://ibm.com/sick/b/33/4#node-10012-predicate
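
Rewrites 2 and 3 above can be sketched as a small migration helper. This is only an illustration, assuming the old URIs always end in `/nodes/<id>` or `/links/<id>`; the name `to_hash_uri` is hypothetical and is not part of this PR.

```python
import re

# Old layout: <base>/nodes/<id> or <base>/links/<id> (assumed pattern).
_OLD_PART = re.compile(r"^(?P<base>.+)/(?P<kind>node|link)s/(?P<id>\d+)$")


def to_hash_uri(old_uri: str) -> str:
    """Rewrite an old slash-style part URI into the new hash-style URI.

    URIs that do not match the assumed old layout are returned unchanged.
    """
    m = _OLD_PART.match(old_uri)
    if m is None:
        return old_uri
    return f"{m['base']}#{m['kind']}-{m['id']}"


print(to_hash_uri("http://ibm.com/sick/b/33/4/nodes/10012"))
print(to_hash_uri("http://ibm.com/sick/b/33/4/links/8"))
```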

Of the three semantic representations we currently have, only two have an XML serialization. The DMRS DTD calls this information "sortinfo", but the MRS DTD (which is referenced here with a newer description of the XML serialization of MRSs, which includes ICONS) treats it as "extra pairs". The name of the resource can be changed later, or specific names can be used in each representation.

This PR also adds more information at the first level of verbosity, addressing #17. Right now, we have:

$ delphin profile-to-rdf -v --to dmrs ../erg/trunk/tsdb/gold/mrs/
WARNING:delphin.cli.profile_to_rdf:Converting 107 analysis of 107 sentences from ../erg/trunk/tsdb/gold/mrs/
WARNING:delphin.cli.profile_to_rdf:Loading the profile
WARNING:delphin.cli.profile_to_rdf:Converting the profile
WARNING:delphin.cli.profile_to_rdf:Result 0 of sentence 951 is not well formed
WARNING:delphin.cli.profile_to_rdf:Serializing results to output.ttl
WARNING:delphin.cli.profile_to_rdf:DONE

The level of these log messages (WARNING) will be changed later.
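
One possible way to move these progress messages off WARNING is to derive the level from the number of `-v` flags. The sketch below uses only the standard `logging` module; the helper `level_for` and the logger name are hypothetical, and this is not necessarily how the CLI ends up doing it.

```python
import logging


def level_for(verbosity: int) -> int:
    """Map a -v count to a logging level (hypothetical helper).

    0 -> WARNING, 1 -> INFO, 2 or more -> DEBUG.
    """
    return max(logging.DEBUG, logging.WARNING - 10 * verbosity)


# With a single -v, progress messages can be logged at INFO instead of WARNING.
logging.basicConfig(level=level_for(1))
logging.getLogger("profile_to_rdf_sketch").info("Loading the profile")
```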

@arademaker (Member)

In the discussion #21 (comment) my second alternative, the one that I prefer, was:

http://ibm.com/sick/b/33/4
http://ibm.com/sick/b/33/4#node-10012
http://ibm.com/sick/b/33/4#link-8
http://ibm.com/sick/b/33/4#predicate-10012

This would mean that 4 is a 'document' in collection 33, which is part of a bigger collection b in the sick collection. Note that after the # we don't have a type, but an identifier of a part of the prefix http://ibm.com/sick/b/33/4. Adding an extra identifier for the whole 'document' may work, but it is strange. It is like a section with just one paragraph or a paragraph with just one sentence. The URI http://ibm.com/sick/b/33/4#dmrs would say we are talking about the DMRS #dmrs from 4, which is itself the DMRS we are talking about. If the DMRS is #dmrs, we also lose the pattern of identifiers after the hash: this would be a single #dmrs not followed by any -XXXX, where XXXX is a number. We could use http://ibm.com/sick/b/33/4#dmrs-0, but that is a little misleading since we will never have a #dmrs-1.

The only justification that I can think of for having a hash URI for the DMRS node itself is that when we implement #24, we will need an extra URI for the named graph that will hold the triples from that DMRS. But in that case, with named graphs, we can eventually remove the URI and the hasNode and hasLink edges, since all the remaining triples will already be grouped into a single named graph called http://ibm.com/sick/b/33/4.
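
To illustrate why the 'document' URI can double as the named graph, here is a minimal sketch using only the standard library: dropping the fragment of any part URI yields the URI of the result it belongs to. The helper name `graph_uri` is hypothetical.

```python
from urllib.parse import urldefrag


def graph_uri(part_uri: str) -> str:
    """Return the 'document' URI of a hash URI by dropping its fragment."""
    return urldefrag(part_uri).url


parts = [
    "http://ibm.com/sick/b/33/4#node-10012",
    "http://ibm.com/sick/b/33/4#link-8",
    "http://ibm.com/sick/b/33/4#predicate-10012",
]

# Every part of result 4 maps to the same candidate named graph.
print({graph_uri(p) for p in parts})
```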

A possible alternative would be to consider 4 itself as a part of the 'document' 33:

http://ibm.com/sick/b/33#4

and all other parts of 4 would have to be adapted to:

http://ibm.com/sick/b/33#4-node-10012
http://ibm.com/sick/b/33#4-link-8
http://ibm.com/sick/b/33#4-predicate-10012

or

http://ibm.com/sick/b/33#node-4-10012
http://ibm.com/sick/b/33#link-4-8
http://ibm.com/sick/b/33#predicate-4-10012
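
The three layouts discussed so far can be compared side by side with a short sketch. The function names are hypothetical and exist only for this illustration; the base URI is the one used in the examples above.

```python
BASE = "http://ibm.com/sick/b"


def result_as_document(item: int, result: int, kind: str, ident: int) -> str:
    """Each result is a 'document'; parts are fragments of the result URI."""
    return f"{BASE}/{item}/{result}#{kind}-{ident}"


def item_as_document_result_first(item: int, result: int, kind: str, ident: int) -> str:
    """The item is the 'document'; the result number leads the fragment."""
    return f"{BASE}/{item}#{result}-{kind}-{ident}"


def item_as_document_kind_first(item: int, result: int, kind: str, ident: int) -> str:
    """The item is the 'document'; the part kind leads the fragment."""
    return f"{BASE}/{item}#{kind}-{result}-{ident}"


for build in (result_as_document,
              item_as_document_result_first,
              item_as_document_kind_first):
    print(build(33, 4, "node", 10012))
```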

Of course, one can also think that b itself is a document... Two arguments for making each result a document, instead of the item or the profile itself, are: 1) readings are not related except for being readings of the same sentence; 2) size: a profile can be very big and a sentence may have thousands of readings.

We don't have # after a /.

@arademaker (Member)

Since all nodes have only one predicate, we don't need #node-10012-predicate, #predicate-10012 is enough. The predicates will borrow the ids from their nodes.
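
As a sketch of predicates borrowing their ids from nodes, the predicate URI can be derived directly from the node URI. The helper `predicate_uri` is hypothetical and assumes node fragments always start with `node-`.

```python
def predicate_uri(node_uri: str) -> str:
    """Derive the predicate URI that shares the id of the given node URI."""
    base, sep, fragment = node_uri.partition("#")
    if not sep or not fragment.startswith("node-"):
        raise ValueError(f"not a node URI: {node_uri}")
    node_id = fragment[len("node-"):]
    return f"{base}#predicate-{node_id}"


print(predicate_uri("http://ibm.com/sick/b/33/4#node-10012"))
```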

@yfaria (Contributor, Author) commented Jul 2, 2021

The # after / was a typo, even though it would still compose a valid URI (see, e.g., https://www.w3.org/TR/n-quads/#simple-triples).

The #dmrs is not really necessary, even when thinking about graphs. One can interpret it as meaning that the resource being represented is the DMRS created from result 4 of item 33 of the specific profile, since other information is also generated from the profile, such as the derivation tree. It would also let us store different representations in the same place without worrying about conflicting node names, even though that is not something we are aiming for.

@arademaker arademaker merged commit cc27e6b into master Jul 3, 2021
@arademaker arademaker deleted the sortinfo branch July 3, 2021 00:47
@arademaker (Member)

The version was not updated to 1.0.2.
