Broken Markdown generation with pyLODE examples.ttl #155

Closed
orieg opened this issue Nov 10, 2021 · 6 comments
orieg commented Nov 10, 2021

pyLODE version: 2.12.0
Issue Reproduction Turtle File: examples.ttl

Two problems occur when generating documentation from the pyLODE examples.ttl file:

  1. In both HTML and Markdown output, dcterms:creator and dcterms:publisher render as "None" when the schema.org prefix uses https (`@prefix sdo: <https://schema.org/> .`). A workaround is to switch to `@prefix sdo: <http://schema.org/> .`, which is not correct according to #121. This workaround also stops sdo:codeRepository from working, and rendering then fails completely with the error IndexError: list index out of range. To get around that, you need to define yet another schema.org prefix using https.
  2. The Markdown rendering of the Classes section starts with HTML <li> tags, which breaks the rest of the rendering. This does not occur in the Object properties section that follows.

Sample output of incorrect rendering

* **Publisher(s)**
  * None
* **Creators(s)**
  * [None](http://orcid.org/0000-0002-8742-7730)
    [[0000-0002-8742-7730](http://orcid.org/0000-0002-8742-7730)]
## Classes
<li>[Creature](#Creature)</li>
<li>[Fish](#Fish)</li>
<li>[Fish food](#Fishfood)</li>
<li>[Food](#Food)</li>

Expected output

* **Publisher(s)**
  * [https://linked.data.gov.au/org/surround](https://linked.data.gov.au/org/surround)
* **Creators(s)**
  * [Nicholas J. Car](http://orcid.org/0000-0002-8742-7730)
    [[0000-0002-8742-7730](http://orcid.org/0000-0002-8742-7730)]
## Classes
[Creature](#Creature),
[Fish](#Fish),
[Fish food](#Fishfood),
[Food](#Food),
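
The difference between the two outputs above comes down to which list syntax is emitted for each output format. The following is a minimal sketch of a format-aware class index, not pyLODE's actual code: `render_class_index` and its `(label, anchor)` input are hypothetical names chosen for illustration.

```python
def render_class_index(classes, output_format):
    """Render (label, anchor) pairs as a class index.

    Hypothetical helper: name and signature are illustrative only.
    """
    if output_format == "md":
        # Markdown: one comma-terminated link per line, as in the expected output.
        return "\n".join(f"[{label}](#{anchor})," for label, anchor in classes)
    if output_format == "html":
        # HTML: list items wrapping real anchor elements (no escaping, sketch only).
        return "\n".join(
            f'<li><a href="#{anchor}">{label}</a></li>' for label, anchor in classes
        )
    raise ValueError(f"unsupported output format: {output_format!r}")


classes = [
    ("Creature", "Creature"),
    ("Fish", "Fish"),
    ("Fish food", "Fishfood"),
    ("Food", "Food"),
]
md_index = render_class_index(classes, "md")
html_index = render_class_index(classes, "html")
print(md_index)
```

Keeping the format branch in one place makes it harder for HTML markup to leak into the Markdown output.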

Sample code to reproduce:

With the latest examples.ttl from the official repo, the code below reproduces both problems.

from pylode import MakeDocco  # pyLODE 2.x; import added for completeness

f = "examples.ttl"

# Generate the HTML documentation
h = MakeDocco(input_data_file=f)
h.document(destination=f.replace(".ttl", ".html"))

# Generate the Markdown documentation
h = MakeDocco(input_data_file=f, outputformat="md")
h.document(destination=f.replace(".ttl", ".md"))

nicholascar added a commit that referenced this issue Nov 23, 2021
nicholascar (Member) commented

Thanks for picking this up @orieg. I think I've fixed this issue: a Publisher, or any other Agent, will now have a name generated from its URL if no actual name is given. So the HTML for the examples.ttl example should now contain <a href="https://linked.data.gov.au/org/surround">surround</a>. It's not perfect, but if an Agent URI is supplied with no label, then this is better than either None or <a href="https://linked.data.gov.au/org/surround">[surround](https://linked.data.gov.au/org/surround)</a>!
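
The fallback described here can be sketched roughly as follows. This is an illustration under assumptions, not the committed fix: `label_from_iri` and `agent_link_html` are hypothetical helpers, and the rule used (fragment first, then last path segment, then host) is one plausible way to derive a name from a URL.

```python
from urllib.parse import urlparse


def label_from_iri(iri):
    """Return a fallback display label for an agent IRI that has no name.

    Hypothetical helper: prefers the fragment, then the last non-empty
    path segment, then the host name.
    """
    parts = urlparse(iri)
    if parts.fragment:
        return parts.fragment
    segments = [s for s in parts.path.split("/") if s]
    if segments:
        return segments[-1]
    return parts.netloc


def agent_link_html(iri, name=None):
    """Build an HTML link; the fallback label is used only when no name is given."""
    label = name if name else label_from_iri(iri)
    # No HTML escaping here: this is a sketch, not production code.
    return f'<a href="{iri}">{label}</a>'


publisher_html = agent_link_html("https://linked.data.gov.au/org/surround")
print(publisher_html)
```

An agent that does carry a name keeps it: the URL-derived label is only a last resort.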

Let me know if you see this working in your copy of master and, if so, I'll close this issue.

hdelva commented Nov 30, 2021

I think there's a different root issue @nicholascar: rdflib seems to have switched from using https://schema.org to http://schema.org with version 6.0. That in itself is a dubious change, but it is something that should be raised in the rdflib repo itself.

Unfortunately, this also means that ontologies that were formatted just fine with previous versions are now broken. All the checks in OntDoc._make_agent, such as

    def _make_agent(self, agent_node):
        name = None
        url = None
        for p, o in self.G.predicate_objects(subject=agent_node):
            if p in [FOAF.homepage, SDO.identifier]:
                url = str(o)
            elif p in [FOAF.name, SDO.name]:
                name = str(o)

are now trying to match the http variants of the URIs.
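
The mismatch is easy to reproduce without rdflib, because predicate matching is an exact comparison of IRIs. The sketch below uses plain tuples of strings as a stand-in for the graph; `make_agent`, the `sdo` parameter, and the sample triples are assumptions made for illustration and are not copied from pyLODE or examples.ttl.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
SDO_HTTP = "http://schema.org/"
SDO_HTTPS = "https://schema.org/"


def make_agent(triples, agent, sdo):
    """Collect (name, url) for an agent, mirroring the checks quoted above.

    `triples` is a list of (subject, predicate, object) strings standing in
    for an RDF graph; `sdo` is the schema.org base IRI the code matches on.
    """
    name = None
    url = None
    for s, p, o in triples:
        if s != agent:
            continue
        if p in (FOAF + "homepage", sdo + "identifier"):
            url = o
        elif p in (FOAF + "name", sdo + "name"):
            name = o
    return name, url


# Illustrative data: an ontology that declares the https schema.org prefix.
agent = "http://orcid.org/0000-0002-8742-7730"
triples = [
    (agent, SDO_HTTPS + "name", "Nicholas J. Car"),
    (agent, SDO_HTTPS + "identifier", agent),
]

matched = make_agent(triples, agent, SDO_HTTPS)   # code and data agree on https
unmatched = make_agent(triples, agent, SDO_HTTP)  # code expects http: no match
print(matched)
print(unmatched)
```

When the code matches on the http base, neither predicate is recognised and the name stays None, which is consistent with the "None" seen in the rendered output.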

I think we could add some more statements to _expand_graph that add the http variants of all schema.org identifiers, so that ontologies can use either.
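
One way to picture this suggestion is as a pass that adds, for every schema.org predicate, a twin triple using the other scheme. The sketch below works on plain tuples rather than an rdflib graph and only handles predicates; `swap_scheme`, `add_scheme_variants`, and the sample data are hypothetical and not part of pyLODE.

```python
HTTP_BASE = "http://schema.org/"
HTTPS_BASE = "https://schema.org/"


def swap_scheme(term):
    """Return the other-scheme twin of a schema.org IRI, or None otherwise."""
    if term.startswith(HTTPS_BASE):
        return HTTP_BASE + term[len(HTTPS_BASE):]
    if term.startswith(HTTP_BASE):
        return HTTPS_BASE + term[len(HTTP_BASE):]
    return None


def add_scheme_variants(triples):
    """Return the triples plus a twin for every schema.org predicate."""
    expanded = set(triples)
    for s, p, o in triples:
        twin = swap_scheme(p)
        if twin is not None:
            expanded.add((s, twin, o))
    return expanded


# Made-up data for illustration only.
agent = "https://example.org/agent/1"
triples = {(agent, HTTPS_BASE + "name", "Example Org")}
expanded = add_scheme_variants(triples)
print(sorted(expanded))
```

After the pass, a lookup on either `http://schema.org/name` or `https://schema.org/name` finds the same value, at the cost of duplicating those statements in the working graph.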

orieg commented Nov 30, 2021

I haven't tested @nicholascar's update yet. However, I agree with @hdelva that the root problem seems to go a bit deeper. For example:

Note that base.py in pyLODE imports SDO from RDFLib (which uses the http namespace) and then redefines SDO in BaseProfile._make_schemaorg_metadata using the https namespace.

nicholascar (Member) commented

It does look like there's a problem with upstream RDFlib. It should indeed be https://schema.org, not http://schema.org, so I will put in a PR now to update RDFlib and then pyLODE will pick that up.

nicholascar (Member) commented

The schema.org HTTPS issue was fixed in RDFlib the other day and I've fixed the Markdown list issue. I've updated the example file that showed the issues too.

nicholascar (Member) commented

So it's fixed in the master branch but not in a release yet. I will make a release in a day or two, after fixing a few other things.
