RDF parser randomly skips components #123

Open
b-gehrke opened this issue Feb 18, 2025 · 2 comments
Labels
API This concerns horned as a library bug Something isn't working

Comments

@b-gehrke
Contributor

When parsing the OpenEnergyOntology (https://openenergy-platform.org/ontology/oeo), the RDF parser skips about 80 components and returns a different set of components each time. That is, the following test fails, and the diff is different on each run. Unfortunately, I haven't been able to make out a pattern yet.

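        use std::collections::HashSet;
        use std::fs::File;
        use std::io::{BufReader, Seek, SeekFrom};
        // `read`, `SetOntology` and `RcStr` come from horned-owl (imports omitted).

        let file = File::open("oeo.owl").unwrap();
        let mut f = BufReader::new(file);

        // First parse of the file.
        let o1 = match read(&mut f, Default::default()) {
            Ok((o, _)) => SetOntology::<RcStr>::from(o),
            Err(_) => panic!("first parse failed"),
        }.into_iter().collect::<HashSet<_>>();

        // Rewind and parse the same file a second time.
        f.seek(SeekFrom::Start(0)).unwrap();

        let o2 = match read(&mut f, Default::default()) {
            Ok((o, _)) => SetOntology::<RcStr>::from(o),
            Err(_) => panic!("second parse failed"),
        }.into_iter().collect::<HashSet<_>>();

        // Components present in the first parse but absent from the second.
        let mut diff: Vec<_> = o1.difference(&o2).collect();
        diff.sort();

        for x in &diff {
            dbg!(x);
        }

        assert!(diff.is_empty());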

The error does seem to be related to the OEO; other (large) ontologies in RDF/XML do not produce this error. Also, other parsers don't produce this error.
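
Note that `o1.difference(&o2)` only reports components that are in the first parse and absent from the second; components that appear only in the second parse go unnoticed. A minimal sketch of a two-way comparison, using only the standard library, might look like the following. The helper name `two_way_diff` is hypothetical and not part of horned-owl; it works on any hashable, orderable element type.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical helper: splits two parse results into the components
/// unique to each side, sorted so that repeated runs are easy to compare.
pub fn two_way_diff<'a, T: Eq + Hash + Ord>(
    first: &'a HashSet<T>,
    second: &'a HashSet<T>,
) -> (Vec<&'a T>, Vec<&'a T>) {
    // Present in `first`, absent from `second`.
    let mut only_first: Vec<&T> = first.difference(second).collect();
    // Present in `second`, absent from `first`.
    let mut only_second: Vec<&T> = second.difference(first).collect();
    only_first.sort();
    only_second.sort();
    (only_first, only_second)
}
```

In the test above this would be called as `two_way_diff(&o1, &o2)`, asserting that both returned vectors are empty.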

@filippodebortoli filippodebortoli added bug Something isn't working API This concerns horned as a library labels Feb 18, 2025
@filippodebortoli
Collaborator

@b-gehrke are the skipped components always the same, or do they also differ from run to run? If they are always the same, do they belong to a particular class of components, e.g. all annotations or all axioms?

Looking through OEO, I see for example that line 51873 contains the following axiom:

    <rdf:Description>
        <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#AllDisjointClasses"/>
        <owl:members rdf:parseType="Collection">
            <rdf:Description rdf:about="http://openenergy-platform.org/ontology/oeo/OEO_00000074"/>
            <rdf:Description rdf:about="http://openenergy-platform.org/ontology/oeo/OEO_00000263"/>
            <rdf:Description rdf:about="http://openenergy-platform.org/ontology/oeo/OEO_00000292"/>
            <rdf:Description rdf:about="http://openenergy-platform.org/ontology/oeo/OEO_00010215"/>
            <rdf:Description rdf:about="http://openenergy-platform.org/ontology/oeo/OEO_00140160"/>
        </owl:members>
    </rdf:Description>

Could you verify whether this axiom gets skipped or if it is parsed properly?
My hypothesis is that the parser is not handling axioms declared in this way correctly.

@b-gehrke
Contributor Author

The skipped axioms are not always the same (e.g. the SubClassAxiom in line 9217 is sometimes parsed, sometimes not). I attached lists from three different runs, each containing the axioms missing in that run (compared to the XML version).
The DisjointClasses axioms you mentioned are indeed always missing. Additionally, EquivalentClasses axioms and some trivial declarations are often not included (e.g. line 341, the declaration of annotation property "IAO:0000115").

run 1.txt
run 2.txt
run 3.txt
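
To separate the axioms that are missing in every run from those that are only sometimes missing, the three lists could be tallied with something like the sketch below. It assumes each run file lists one missing component per line, which is a guess about the attachment format; `parse_run`, `missing_counts` and `always_missing` are hypothetical helper names, not horned-owl functions.

```rust
use std::collections::{BTreeMap, HashSet};

/// Assumption: one missing component per line; blank lines are ignored.
pub fn parse_run(text: &str) -> HashSet<String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect()
}

/// Counts, for every component, in how many runs it was reported missing.
pub fn missing_counts(runs: &[HashSet<String>]) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for run in runs {
        for component in run {
            *counts.entry(component.clone()).or_insert(0) += 1;
        }
    }
    counts
}

/// Components that were reported missing in every single run.
pub fn always_missing(runs: &[HashSet<String>]) -> Vec<String> {
    missing_counts(runs)
        .into_iter()
        .filter(|(_, n)| *n == runs.len())
        .map(|(component, _)| component)
        .collect()
}
```

Components with a count below the number of runs are the non-deterministic ones; those with a full count (such as the DisjointClasses axioms) point to a construct the parser never handles.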
