Bug in hdt_search #6

Open
GavinMendelGleason opened this issue Jul 24, 2018 · 13 comments

@GavinMendelGleason

I have a reproducible fatal error in the hdt interface. For some reason, searching with a predicate and an object returns more results than searching for the object alone (in this case), and eventually raises a bounds-check error.

The turtle file to reproduce this is:

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .                                            
@prefix dcog: <https://datachemist.net/ontology/dcog#> .
@prefix dcogrel: <https://datachemist.net/ontology/dcogrel#> .
@prefix prov: <http://www.w3.org/ns/prov#> .                                                       
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .                                       
@prefix ipg: <https://datachemist.net/ontology/ipg#> .
@prefix note: <https://datachemist.net/uks/annotation/> .
@prefix inst: <https://datachemist.net/uks/candidate/> .
@prefix uksupply: <https://datachemist.net/uks/ontology/uk_supply_chain#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

uksupply:Supplier a owl:Class;
  rdfs:comment "A supplier relationship"@en ;
  rdfs:subClassOf dcogrel:Relationship.  

uksupply:IndustryType a owl:Class; 
  rdfs:subClassOf dcog:Entity ;
  rdfs:comment "A type of industry"@en .

uksupply:supplier_source a owl:ObjectProperty ; 
  rdfs:comment "Source of a supplier relationship"@en ;
  rdfs:domain uksupply:Supplier ;
  rdfs:range uksupply:IndustryType . 

uksupply:supplier_target a owl:ObjectProperty ; 
  rdfs:comment "Target of a supplier relationship"@en ;
  rdfs:domain uksupply:Supplier ;
  rdfs:range uksupply:IndustryType . 

uksupply:supplier_value a owl:DatatypeProperty ;
  rdfs:comment "Value of a supplier relationship"@en ;
  rdfs:range  xsd:float ;
  rdfs:domain uksupply:Supplier .

And the following interaction demonstrates the problem:

?- hdt_open(HDT,'/home/me/mo.hdt',[]), hdt_search_id(HDT,X,Y,1).
HDT = <hdt>(0x28d5860),
X = Y, Y = 4 ;
HDT = <hdt>(0x28d5860),
X = 5,
Y = 4 ;
false.

?- hdt_open(HDT,'/home/me/mo.hdt',[]), hdt_search_id(HDT,X,3,1).
HDT = <hdt>(0x2969930),
X = 4 ;
HDT = <hdt>(0x2969930),
X = 5 ;
HDT = <hdt>(0x2969930),
X = 2 ;
HDT = <hdt>(0x2969930),
X = 4 ;
HDT = <hdt>(0x2969930),
X = 5 ;
HDT = <hdt>(0x2969930),
X = 6 ;
HDT = <hdt>(0x2969930),
X = 3 ;
HDT = <hdt>(0x2969930),
X = 1 ;
HDT = <hdt>(0x2969930),
X = 4 ;
HDT = <hdt>(0x2969930),
X = 5 ;
HDT = <hdt>(0x2969930),
X = 6 ;
HDT = <hdt>(0x2969930),
X = 6 ;
HDT = <hdt>(0x2969930),
X = 1 ;
HDT = <hdt>(0x2969930),
X = 3 ;
HDT = <hdt>(0x2969930),
X = 6 ;
HDT = <hdt>(0x2969930),
X = 4 ;
HDT = <hdt>(0x2969930),
X = 5 ;
HDT = <hdt>(0x2969930),
X = 1 ;
HDT = <hdt>(0x2969930),
X = 3 ;
terminate called after throwing an instance of 'std::runtime_error'
  what():  Trying to get an element bigger than the array.

Process prolog aborted (core dumped)

The bug can also be tickled using hdt_search instead of searching by id:

?- hdt_open(HDT,'/home/me/mo.hdt',[]), hdt:hdt_search(HDT,X, 'http://www.w3.org/2000/01/rdf-schema#domain','https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType').
HDT = <hdt>(0x2103250),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_source' ;
HDT = <hdt>(0x2103250),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_target' ;
HDT = <hdt>(0x2103250),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#Nonsense' ;
HDT = <hdt>(0x2103250),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_source' ;
HDT = <hdt>(0x2103250),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_target' ;
HDT = <hdt>(0x2103250),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_value' ;
HDT = <hdt>(0x2103250),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#Supplier' ;
HDT = <hdt>(0x2103250),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType' ;
HDT = <hdt>(0x2103250),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_source' ;
HDT = <hdt>(0x2103250),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_target' ;
HDT = <hdt>(0x2103250),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_value' ;
HDT = <hdt>(0x2103250),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_value' ;
HDT = <hdt>(0x2103250),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType' ;
HDT = <hdt>(0x2103250),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#Supplier' ;
HDT = <hdt>(0x2103250),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_value' ;
HDT = <hdt>(0x2103250),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_source' ;
HDT = <hdt>(0x2103250),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_target' ;
HDT = <hdt>(0x2103250),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType' ;
HDT = <hdt>(0x2103250),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#Supplier' ;
terminate called after throwing an instance of 'std::runtime_error'
  what():  Trying to get an element bigger than the array.

Process prolog aborted (core dumped)

I've used the Prolog hdt library on multi-gigabyte databases with millions of nodes without any problem, so this one has me stumped. I was quite surprised to find a failure on such a small example.

@JanWielemaker
Owner

Thanks. I think this is more likely to be an issue with the HDT library itself than with the interface. Can you try updating the HDT submodule?

@wouterbeek
Collaborator

wouterbeek commented Jul 24, 2018

I cannot reproduce this: the query ?s http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType gives no results when I run it (which is the correct answer for the above data file).

I also notice that some solutions contain terms that do not appear in the data file at all, e.g., https://datachemist.net/uks/ontology/uk_supply_chain#Nonsense.

@GavinMendelGleason
Author

Sorry, I had added the triple uksupply:Nonsense uksupply:Nonsense uksupply:Nonsense after I posted the ontology, but both the original and the altered version crash in much the same way. I've also replicated this on two machines (both running Ubuntu).

I'm updating the hdt library and testing again.

@GavinMendelGleason
Author

Using the exact ontology that I gave above:

$ swipl
Welcome to SWI-Prolog (threaded, 64 bits, version 7.7.15-9-g67e40a7)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software.
Please run ?- license. for legal details.

For online help and background, visit http://www.swi-prolog.org
For built-in help, use ?- help(Topic). or ?- apropos(Word).

?- use_module(library(hdt)).
true.

?- hdt_open(HDT,'/home/me/mo.hdt',[]), hdt_search(HDT,X, 'http://www.w3.org/2000/01/rdf-schema#domain','https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType').
HDT = <hdt>(0x2380500),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_source' ;
HDT = <hdt>(0x2380500),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_target' ;
HDT = <hdt>(0x2380500),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_source' ;
HDT = <hdt>(0x2380500),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_target' ;
HDT = <hdt>(0x2380500),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_value' ;
HDT = <hdt>(0x2380500),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#Supplier' ;
HDT = <hdt>(0x2380500),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType' ;
HDT = <hdt>(0x2380500),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType' ;
HDT = <hdt>(0x2380500),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_source' ;
HDT = <hdt>(0x2380500),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_source' ;
HDT = <hdt>(0x2380500),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_target' ;
HDT = <hdt>(0x2380500),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_value' ;
HDT = <hdt>(0x2380500),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#Supplier' ;
HDT = <hdt>(0x2380500),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_target' ;
HDT = <hdt>(0x2380500),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_value' ;
HDT = <hdt>(0x2380500),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_value' ;
HDT = <hdt>(0x2380500),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType' ;
HDT = <hdt>(0x2380500),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#Supplier' ;
HDT = <hdt>(0x2380500),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_value' ;
HDT = <hdt>(0x2380500),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_source' ;
HDT = <hdt>(0x2380500),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#supplier_target' ;
HDT = <hdt>(0x2380500),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType' ;
HDT = <hdt>(0x2380500),
X = 'https://datachemist.net/uks/ontology/uk_supply_chain#Supplier' ;
terminate called after throwing an instance of 'std::runtime_error'
  what():  Trying to get an element bigger than the array.
Aborted (core dumped)

The version # given by hdtSearch in the source tree for the prolog hdt library:

$ cd ~/lib/swipl/pack/hdt/hdt-cpp/hdt-lib/tools
$ ./hdtSearch -V
v1.1.2

Note that when I build hdt separately and use its hdtSearch to run the query, I get the expected result:

$ hdtSearch ~/mo.hdt
>> ? http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
0 results in 145 us

The version # given by hdtSearch compiled separately:

$ hdtSearch -V
v1.1.2

If I use the version in the pack I get:

$ ~/lib/swipl/pack/hdt/hdt-cpp/hdt-lib/tools/hdtSearch mo.hdt
>> ? http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
https://datachemist.net/uks/ontology/uk_supply_chain#supplier_source http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
https://datachemist.net/uks/ontology/uk_supply_chain#supplier_target http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
https://datachemist.net/uks/ontology/uk_supply_chain#supplier_source http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
https://datachemist.net/uks/ontology/uk_supply_chain#supplier_target http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
https://datachemist.net/uks/ontology/uk_supply_chain#supplier_value http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
https://datachemist.net/uks/ontology/uk_supply_chain#Supplier http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
https://datachemist.net/uks/ontology/uk_supply_chain#supplier_source http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
https://datachemist.net/uks/ontology/uk_supply_chain#supplier_source http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
https://datachemist.net/uks/ontology/uk_supply_chain#supplier_target http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
https://datachemist.net/uks/ontology/uk_supply_chain#supplier_value http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
https://datachemist.net/uks/ontology/uk_supply_chain#Supplier http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
https://datachemist.net/uks/ontology/uk_supply_chain#supplier_target http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
https://datachemist.net/uks/ontology/uk_supply_chain#supplier_value http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
https://datachemist.net/uks/ontology/uk_supply_chain#supplier_value http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
https://datachemist.net/uks/ontology/uk_supply_chain#Supplier http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
https://datachemist.net/uks/ontology/uk_supply_chain#supplier_value http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
https://datachemist.net/uks/ontology/uk_supply_chain#supplier_source http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
https://datachemist.net/uks/ontology/uk_supply_chain#supplier_target http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
https://datachemist.net/uks/ontology/uk_supply_chain#Supplier http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
Trying to get an element bigger than the array.

When I create an hdt file using the rdf2hdt from the swipl pack, I get a different binary file than when I use a separately compiled rdf2hdt. However, the externally compiled hdtSearch returns correct results on the hdt files created by both the pack and the external rdf2hdt.

Similarly, search fails with the pack-compiled version on the files created by both versions of rdf2hdt.

For this reason, it doesn't seem to be merely a version incompatibility but something going wrong with the version compiled in the pack. Could it have to do with -fPIC or something along those lines? I can have another go at a clean recompile, but I wonder whether there are any suggestions about what I might try.

@GavinMendelGleason
Author

Sure enough, compiling the hdt pack without -fPIC results in an hdtSearch binary which works correctly (but obviously you can't use the pack).

Any ideas on how to solve this?

@JanWielemaker
Owner

What do you mean with correctly? In the log above I see "Trying to get an element bigger than the array.", which seems to be the same exception that Prolog gets. The only difference is that Prolog doesn't catch the runtime error and crashes. Sure, we can make it produce a normal Prolog exception, but this still seems to be a bug in HDT, no? I can reproduce the above with the hdtSearch compiled as part of the pack.

If this is all correct, the bug is in the core HDT rather than the Prolog interface. Does it depend on -fPIC? Didn't try. It shouldn't, unless there is a bug in the C++ toolchain or a bug in the C++ code that causes undefined behavior which happens to work fine in one case and not in the other.

I'd first consider updating hdt-lib as used by the pack. I'm a bit unsure which version to pick though. @wouterbeek is more knowledgeable here.

@GavinMendelGleason
Author

What do you mean with correctly?

$ hdtSearch ~/mo.hdt
>> ? http://www.w3.org/2000/01/rdf-schema#domain https://datachemist.net/uks/ontology/uk_supply_chain#IndustryType
0 results in 145 us

The correct result is no results. If I compile hdt separately, using hdtSearch I get no results, rather than a series of incorrect results followed by a crash.

I have now definitely established that -fPIC changes the behaviour to make it incorrect, mimicking the incorrect behaviour I was getting in prolog. If I remove -fPIC the tools work correctly (though obviously not the prolog interface), and if I put it back they do not. I've reported this to hdt-lib.

Presumably this means I can write the package to be statically compiled against swipl as a workaround?

@wouterbeek
Collaborator

Thanks for the additional information and MWE. I hope I can take a detailed look at it this evening.

@JanWielemaker
Copy link
Owner

Thanks. I think there are some sensible things to try: (1) a different version of hdt-lib, possibly the bug is already fixed and anyway, developers like fixing the latest version, (2) run the broken hdtSearch under valgrind, (3) try different optimization flags for C++ and (4) check the C++ toolchain.

I did (2). Doesn't show anything suspicious. For (4) I use gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0 from Ubuntu 18.04.

Using a non-relocatable version of libhdt is hard and only works on some platforms, depending on CPU and OS. It should work on AMD64/Linux. You'd have to remove the -fPIC, change the pack build to produce a static library, build Prolog using configure --enable-onefile (recent versions), add the static HDT library to the libraries used for linking Prolog, and call the installation function either as part of the (modified) Prolog startup or from a very simple .so file that merely calls the install function. You should be quite desperate before going this way.

I'd start trying (1).

@GavinMendelGleason
Author

(1) I've pulled the latest from the devel branch and it still occurs.

(2) valgrind shows memory leaks for the -fPIC version and none for the version without.

(3) I've played around with a few combinations here. I'll try more systematically.

(4) My gcc version is 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)

I guess if you guys can't reproduce it, I'd better try to upgrade gcc.

@JanWielemaker
Copy link
Owner

Thanks for all the searching! (2) is interesting and might give a hint about what is going wrong. As I can reproduce on Ubuntu 18.04 using gcc 7.3.0, upgrading GCC isn't promising. In theory, there should simply not be an observable difference between -fPIC and not using this. If there is (as here), there are two options:

  • The GCC toolchain is somewhere broken
  • The application is broken in the sense that it exhibits undefined (e.g., random) behaviour that
    happens to trigger now, for example due to a different memory layout.

Not sure how to proceed. As a work-around you could still try different -O levels as well as other GCC flags that may affect this. The HDT people can possibly use the valgrind differences (which shouldn't be there) to locate the issue.

Oh, you could also consider compiling with clang. That at least gives us a new data point. SWI-Prolog compiles fine with clang, as it is the default compiler on macOS. In theory, compiling only the HDT library with clang should work.

@GavinMendelGleason
Copy link
Author

Sorry, it turns out I was checking out an old commit. The latest develop branch does fix the problem. The structure of the develop branch tree has changed somewhat significantly, so some changes needed to be made to the build process to make the swipl hdt pack compile.

I've forked the repository and it now seems to build cleanly in my fork.

https://github.com/GavinMendelGleason/hdt

I'm probably not using best practices, but it's working. Maybe you can fold the changes back in appropriately.

@GavinMendelGleason
Copy link
Author

And thanks to both of you for all the help!
