Input stream as source #2
Comments
Naive solution proposed here: RMLio/rmlmapper-java#34. However, my suggestion is to "shift" the problem to the RML processor, allowing the user to define custom logic to process the rml:source of the logical sources defined in the mapping file and to build an abstraction that lets the processor access the records. In the case of the RMLMapper, for example, this means defining a custom AccessFactory and/or Access. In Chimera, we implemented an alternative AccessFactory that can access InputStreams but can also bypass the defined rml:source and apply the mappings to the entire message body (https://github.com/cefriel/chimera/tree/master/chimera-rml/src/main/java/com/cefriel/chimera/rml). Generally speaking, I think it is important to let users define mappings that are not bound to specific sources (e.g. to a specific file name). |
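A minimal sketch of the "shift it to the processor" idea, assuming a processor that resolves every rml:source through a pluggable factory. The names `SourceAccess`, `SourceAccessFactory`, `FileAccessFactory` and `MessageBodyAccessFactory` are hypothetical and are not the RMLMapper's actual `AccessFactory`/`Access` API; the point is only that a custom factory can ignore the file name in the mapping and hand back the message body instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CustomAccessSketch {

    /** Hypothetical: how the processor obtains the raw records of one rml:source. */
    interface SourceAccess {
        InputStream open() throws IOException;
    }

    /** Hypothetical: maps the rml:source value found in the mapping to an access. */
    interface SourceAccessFactory {
        SourceAccess forSource(String rmlSource);
    }

    /** Default behaviour: treat the rml:source value as a local file path. */
    static final class FileAccessFactory implements SourceAccessFactory {
        @Override
        public SourceAccess forSource(String rmlSource) {
            return () -> Files.newInputStream(Path.of(rmlSource));
        }
    }

    /** Custom behaviour: ignore rml:source and always return the in-memory message body. */
    static final class MessageBodyAccessFactory implements SourceAccessFactory {
        private final byte[] body;

        MessageBodyAccessFactory(byte[] body) {
            this.body = body.clone();
        }

        @Override
        public SourceAccess forSource(String rmlSource) {
            return () -> new ByteArrayInputStream(body);
        }
    }

    /** Reads everything an access provides, as UTF-8 text. */
    static String readAll(SourceAccess access) throws IOException {
        try (InputStream in = access.open()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "id,name\n1,Alice\n".getBytes(StandardCharsets.UTF_8);
        SourceAccessFactory factory = new MessageBodyAccessFactory(body);
        // The file name in the mapping is irrelevant here: the message body is returned.
        System.out.print(readAll(factory.forSource("people.csv")));
    }
}
```

With this split, the mapping itself stays source-agnostic and the decision of where the bytes come from lives entirely in the factory chosen at runtime.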
@DylanVanAssche is this already allowed with the LogicalSource spec? |
FYI: CARML supports this with an extension of |
Spec is now available at https://github.com/kg-construct/rml-target-source-spec.
Great! This would be allowed in the Sources & Targets spec because CARML's extension is just a different part for … My suggestion: we need to discuss an extension to something like the DataIO spec that defines what a stream should look like in the mapping rules. |
Relevant ontology: https://github.com/streamreasoning/vois
Which stream sources do we want to support?
Characteristics to consider:
Please comment here on which of these sources & characteristics definitely need to be supported. |
I have been investigating this further and I think we can just 'extend' the Web of Things description :) They already have binding templates for:
My suggestion is to re-use Web of Things descriptions and define binding templates for other streams we may want, as listed above. This way, we don't need to define this vocabulary ourselves, we re-use standardized specifications, and we remain open to future extensions. |
Could you give an example of what this would look like for a simple input stream/pipe of data passed at runtime? |
There are currently no binding templates for a pipe, but I don't see any requirements here that are specific to a pipe compared to the others, so we don't need a binding template. We only need a proper URL with a scheme defining a pipe such as
At least this works fine under Linux, where everything is a file; I'm curious how these things are handled on Windows & Mac. |
That honestly looks quite complex just to specify the location of a source. Don't you think?
Right, usually you would use an input stream for file IO, but it can also be used as an intermediary stream from one internally running process to another, in which case we're not dealing with files. So it would indeed be good to have some way of saying that the user will provide the source to the engine running the mapping as @marioscrock suggested in #2 (comment) |
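As a rough illustration of the non-file case, the sketch below connects two threads inside one JVM with `java.io.PipedInputStream`/`PipedOutputStream`: a producer writes CSV into the pipe, and the consumer (a hypothetical `consume` method standing in for the mapping engine) only ever sees an `InputStream`, with no path that an rml:source in the mapping could point to. Threads are an assumption here; separate OS processes would use other plumbing.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class PipedSourceSketch {

    /** Starts a producer thread that writes CSV into the pipe and returns the reading end. */
    static InputStream startProducer(String csv) throws IOException {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in); // connects both ends of the pipe
        Thread producer = new Thread(() -> {
            // Closing the writer signals end-of-stream to the reader.
            try (OutputStream o = out) {
                o.write(csv.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        producer.start();
        return in;
    }

    /** Hypothetical stand-in for the mapping engine: it receives a stream, not a file name. */
    static String consume(InputStream source) throws IOException {
        try (InputStream in = source) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream source = startProducer("id,name\n1,Alice\n");
        System.out.print(consume(source));
    }
}
```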
Hmmmmm, you're right. WoT also allows security to be integrated there, but that is not applicable here.
Ah you mean this Java
If we want something 'simpler' we would need to define an ontology first (if we cannot re-use something existing). |
Am I right in understanding that, with your suggested change, directly using a (Java) InputStream is currently not possible, @DylanVanAssche? Of course, I guess this is very implementation-specific. @pmaria, in your experience, do you think there's another way to specify this? I guess it's tricky to support |
@DylanVanAssche I'm currently looking into this again. It is still unclear to me how we would now write a source for this. |
Could you please specify what is unclear to you here? I'm not sure we're on the same page. |
The use case is that you somehow obtain an input stream (I'm using Java terminology here, but in Python an example would be something like BytesIO) and you want to be able to use that as a source. In Java, most data processing libraries used for the reference formulations accept input streams as a source of input. It is also relatively easy to serialize some object into a form that matches a supported source type and convert that to an input stream. This makes it possible to easily integrate an RML processor as part of a pipeline using any type of ETL approach. This is where I see a lot of CARML users using https://github.com/carml/carml#input-stream-extension, for lack of a standard approach. |
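A small sketch of the serialize-and-wrap step described above, assuming CSV as the target format and a hypothetical `Person` record (records need Java 16+). It relies only on the JDK (`ByteArrayInputStream`), and the CSV writer is deliberately naive: it does no quoting or escaping, so values must not contain commas, quotes or newlines.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class SerializeToStreamSketch {

    /** Hypothetical in-memory object produced by an earlier pipeline step. */
    record Person(int id, String name) {}

    /** Naive CSV serialization: header row, then one line per record. */
    static String toCsv(List<Person> people) {
        StringBuilder sb = new StringBuilder("id,name\n");
        for (Person p : people) {
            sb.append(p.id()).append(',').append(p.name()).append('\n');
        }
        return sb.toString();
    }

    /** Wraps the serialized form as an InputStream that an RML processor could consume. */
    static InputStream asInputStream(String serialized) {
        return new ByteArrayInputStream(serialized.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        InputStream source = asInputStream(toCsv(List.of(new Person(1, "Alice"), new Person(2, "Bob"))));
        System.out.print(new String(source.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```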
IMO, you then only need to state that input stream A is linked to Logical Source X, with a Target that points to input stream A. Isn't that enough then?
When you call your RML engine in your code and pass in the input streams, you have a Map that says … Or am I missing something crucial here? |
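One way to read this proposal, as a sketch only: the engine receives a map from a stream name to an `InputStream` and resolves each logical source by that name. `LogicalSource`, `streamName` and `run` are hypothetical names for illustration, not the API of the RMLMapper, CARML, or any other engine, and "resolving" here just means reading the bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StreamMapSketch {

    /** Hypothetical, stripped-down logical source: its IRI and the name of its stream. */
    record LogicalSource(String iri, String streamName) {}

    /** Hypothetical engine entry point: each logical source is looked up in the supplied map. */
    static Map<String, String> run(List<LogicalSource> sources, Map<String, InputStream> streams)
            throws IOException {
        Map<String, String> contentPerSource = new LinkedHashMap<>();
        for (LogicalSource ls : sources) {
            InputStream in = streams.get(ls.streamName());
            if (in == null) {
                throw new IllegalArgumentException("No input stream provided for " + ls.streamName());
            }
            try (in) {
                // A real engine would iterate and map records; the sketch only reads the bytes.
                contentPerSource.put(ls.iri(), new String(in.readAllBytes(), StandardCharsets.UTF_8));
            }
        }
        return contentPerSource;
    }

    public static void main(String[] args) throws IOException {
        Map<String, InputStream> streams = Map.of(
                "inputstreamA", new ByteArrayInputStream("id\n1\n".getBytes(StandardCharsets.UTF_8)));
        List<LogicalSource> sources = List.of(new LogicalSource("ex:LogicalSourceX", "inputstreamA"));
        System.out.print(run(sources, streams).get("ex:LogicalSourceX"));
    }
}
```

Failing fast on a missing entry keeps a typo in the mapping's stream name from silently producing an empty graph.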
Well I am still familiarizing myself with the
Does not really seem to fit this use case. Furthermore, I don't know if we can consider |
The W3C Web of Things descriptions are made for streams, but the ontology seems a bit challenging to grasp, like this here.
Well, I also consider this InputStream yet another data format, which needs to be described in an RML IO Registry description. We could also introduce another class, like we did for relative paths, but then we need to decide which approach we take and describe it. RML IO itself, in an abstract form, has no issues with streams IMO. |
Thinking a bit further on this, I think what we really need here is just a way to indicate that a source is provided to the engine programmatically, similar to what @marioscrock describes in #2 (comment). This could be very basic; maybe all we need in addition is the ability to name the provided source. So something like:

    [] rml:logicalSource [
        rml:source [
            a rml:ProvidedSource ;
            rml:name "some identifying name" ; # optional?
        ] ;
    ] .

An engine can then expose an API to provide this source. Furthermore, this opens up the possibility of supporting not only input-stream-like sources but also other objects, e.g. an already deserialized JSON or XML node. |
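A possible engine-side counterpart to the `rml:ProvidedSource` idea, sketched under assumptions: `ProvidedSources`, `provide`, `provideUnnamed`, `resolve` and `describe` are hypothetical names; a source is looked up by its `rml:name` value, falling back to a single unnamed source when `rml:name` is omitted; and a `java.util.Map` stands in for an already deserialized JSON/XML node.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class ProvidedSourceSketch {

    /** Hypothetical engine-side registry for sources typed rml:ProvidedSource. */
    static final class ProvidedSources {
        private final Map<String, Object> byName = new HashMap<>();
        private Object unnamed; // used when the mapping omits rml:name

        void provide(String name, Object source) {
            byName.put(name, source);
        }

        void provideUnnamed(Object source) {
            unnamed = source;
        }

        /** Resolves an rml:name value; a null name selects the unnamed source. */
        Optional<Object> resolve(String name) {
            return Optional.ofNullable(name == null ? unnamed : byName.get(name));
        }
    }

    /** Dispatches on what was provided: a byte stream or an already deserialized object. */
    static String describe(Object source) {
        if (source instanceof InputStream) {
            try (InputStream in = (InputStream) source) {
                return "stream: " + new String(in.readAllBytes(), StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        // Anything else (e.g. a parsed JSON or XML node) would be handed to the iterator as-is.
        return "object: " + source;
    }

    public static void main(String[] args) {
        ProvidedSources sources = new ProvidedSources();
        sources.provide("some identifying name",
                new ByteArrayInputStream("id,name\n1,Alice".getBytes(StandardCharsets.UTF_8)));
        sources.provideUnnamed(Map.of("id", 1)); // stand-in for a deserialized JSON node
        System.out.println(describe(sources.resolve("some identifying name").orElseThrow()));
        System.out.println(describe(sources.resolve(null).orElseThrow()));
    }
}
```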
Well that's what I wanted to try with the example above. We're on the same page it seems :)
+1 |
OK. I think the hypermedia/WoT vocabs don't really fit this purpose. So how do you feel about |
+1 for drafting something up :) |
Ok, I will make a PR |
issue: not possible to describe an input stream as a source
suggestion: add support for describing input streams. This will facilitate the usage of RML in transformation pipelines.