Implementation notes #113
Comments
+1 |
I support this, yes. |
kg-construct/rml-io#41 is again such a specific thing, looks more to me like an implementation note than an actual test-case? |
I agree. Or only rely on Postgres for that test case |
Given that we are a CG, everything is a draft. That being said, I do not disagree with specifying a few reference formulations in more detail, but these can be just examples of what potential reference formulations may look like. |
I don't agree that the description of reference formulations should be just examples. I think that clearly defining the reference formulations is essential. In r2rml the only reference formulation was SQL, and r2rml defined several aspects of it.
These aspects should be clearly defined for any reference formulation that is introduced for RML. IMHO it would therefore be best to have a note per reference formulation where these can be described. |
Well, we can decide with the entire CG whether we agree on limiting the Reference Formulations that RML can accept. In my opinion, and given how RML was designed so far: RML deliberately left the Reference Formulations unspecified so anyone can define their own Reference Formulation. In this sense, if we now specify some Reference Formulations, they should be examples of such Reference Formulations, and RML should not be restricted to them. One should be able to define any Reference Formulation desired. |
I think there is a bit of a misunderstanding. There is no intention to limit the accepted reference formulations, only to clearly define those that are already mentioned in the specs and test cases, and for which we have definitions in the ontology.
Agreed. This issue has no intention to change that. The intention of this issue is to define what needs to be defined and described for any reference formulation to be properly handled, and to have a place where we put those definitions. Otherwise we risk that every implementation handles the same reference formulation in a different way. Since we already have several reference formulations that are broadly in use, the proposal is to define each of these as a note. |
My 2 cents here: we already 'enforce' specific behavior for the reference formulations we use in the spec, ontology, and test cases. In the test cases we have already implicitly 'defined' what happens with a certain source + reference formulation. This note (or set of notes) is meant to make that implicit behavior explicit, so developers do not have to read other implementations and interpret the output of each test case to know how a given reference formulation behaves. |
These are indeed examples of reference formulations. We can include a few but we cannot produce Notes. From the W3C types of documents: "A W3C Draft Note is a document produced by a W3C Working Group, a W3C Interest Group, the Advisory Board (AB), or the W3C Technical Architecture Group (TAG)." As a CG we publish a report. If specific Reference Formulations are included in the report and we use this report for the WG, then the Reference Formulations will become part of the candidate recommendation. Even if we include them as notes, these are notes for RML-IO and not RML-core, as RML-core is independent of reference formulations. @DylanVanAssche how do we do that? The test cases in RML-core are independent of reference formulations. In RML-core we are in a situation where we have already retrieved the data and we deal with key-value pairs. There is not (or should not be) anything in RML-core that is reference-formulation-dependent. |
'Note' is maybe not the right wording given the W3C definitions. What is meant by 'note' here is a document with examples and a description of how a reference formulation is supposed to work. See it as a set of guidelines for implementations: not a hard requirement they MUST follow, but more like a SHOULD, as in good practice. The same assumptions about the reference formulations documented in there are also made for the output of the test cases.
That's definitely not the case currently: we depend on CSV (more abstractly: tabular data) there (if we move all other data formats out, as proposed in an issue), and we still require implementations to interpret a CSV reference formulation as iterating over each row to correctly generate the triples/quads. If RML-Core were truly independent, no Logical Source could appear in its test cases, but then the test cases would no longer be the integration tests they are now. We cannot use an 'abstract' reference in RML-Core's test cases because at this point a reference is always tied to some data source defined by the Logical Source. For now, this 'iterate over each row' behavior is only implicitly defined through test cases that assume it. In R2RML this is explicitly defined in the spec:
So this is actually mentioned for R2RML implementations, but not RML implementations. R2RML implementations now know they have to follow a row-based iteration model for RDBs as it is clearly mentioned in the spec. Where we put our 'guidelines' on this matter is of course a point of discussion.
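To make the 'iterate over each row' assumption concrete, here is a minimal Python sketch of what the test cases implicitly expect from a CSV reference formulation: every row becomes one key-value record, and the mapping is applied once per record. The function names, the sample data, and the `http://example.com/{ID}` template are hypothetical and only for illustration; this is not taken from any spec or implementation, and IRI escaping is deliberately left out.

```python
import csv
import io


def iterate_rows(csv_text):
    """Yield one key-value record per CSV row; header names are the keys."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        yield dict(row)


def expand_template(template, record):
    """Replace each {key} in the template with the record's value.

    Hypothetical helper: no IRI-safe escaping is applied here.
    """
    out = template
    for key, value in record.items():
        out = out.replace("{" + key + "}", value)
    return out


# Hypothetical sample data, not one of the actual test cases.
SAMPLE = "ID,Name\n10,Venus\n20,Mars\n"

records = list(iterate_rows(SAMPLE))
subjects = [expand_template("http://example.com/{ID}", r) for r in records]
```

Once the rows are turned into records, nothing after that point needs to know the data came from CSV. The row-iteration step itself is the part that is currently only defined through the expected test-case output.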
That's what RML-Core is supposed to be yes, but the test cases do not reflect this. How to improve this is a hard question as the references in |
IMO these descriptions will be more than a SHOULD, so maybe these should also be reports, and the W3C process can decide how to label them later on. As for the test cases: I see the test cases as functional tests, not as unit tests. My proposal would be to:
|
Yeah you can also do more 'enforcement' here, I first want to have a proper agreement on the rest before deciding the level of enforcement.
That's possible as well, but that contradicts 'key-value' pairs IMO.
I prefer a tabular data source here for Core because it aligns better with R2RML making the transition less difficult. If we want adoption from R2RML implementations, Core should be as easy as possible to implement.
+1 |
just a comment, we also noticed that TC0002a-JSON fails because the "expected" result is not respecting natural data type mapping |
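As an illustration of the kind of behavior such a per-reference-formulation document would have to pin down, here is a rough Python sketch of a natural datatype mapping for JSON values. The mapping table is an assumption modeled on R2RML's natural mapping of SQL datatypes (integers to `xsd:integer`, floating-point numbers to `xsd:double`, booleans to `xsd:boolean`, strings left as plain literals); the function name and sample record are hypothetical, and the actual rules for JSON are exactly what would need to be agreed on.

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"


def natural_datatype(value):
    """Return an XSD datatype IRI for a parsed JSON value, or None for a plain literal.

    Assumed mapping, for illustration only.
    """
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return XSD + "boolean"
    if isinstance(value, int):
        return XSD + "integer"
    if isinstance(value, float):
        return XSD + "double"
    return None  # strings (and anything else) get no explicit datatype here


# Hypothetical sample record, not the content of TC0002a-JSON.
record = json.loads('{"id": 10, "score": 1.5, "active": true, "name": "Venus"}')
datatypes = {key: natural_datatype(value) for key, value in record.items()}
```

If the expected output of a test case types `id` as a plain string while implementations apply a mapping like this one, the test fails even though both sides iterate over the data in the same way.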
It's done right? @DylanVanAssche, any action here? |
The rml-io-registry repo has been created, but the actual task of creating these documents has not happened yet. |
But then is this an rml-io-registry issue? |
Yes, but that didn't exist back then when the issue was created. This is one of the issues that triggered the creation of the registry. |
When fixing RML test cases across modules, we noticed that most modules need some additional notes regarding implementation details. The modules properly describe what something is and what it looks like. However, implementers do not know how to use it. Examples of implementation details we can describe in a separate note:
CC: @pmaria @chrdebru
Actions:
Let us know what you think!