Use Case 6: discussion #7
I agree--if nothing else, this use case needs clarification and/or narrowing. There is a narrow sense in which it might be useful: for WCS, or possibly other data-reduction uses, it would be possible to embed simple instructions for sequences of transformations and arithmetic functions to perform on some data in the file. But as currently written, this use case reads to me like stored procedures, as in a database, and that I think we want to avoid.
No, the intention here was not to embed any executable or compiled code; it's basically what Erik wrote above. The idea was that some restricted set of notation/instructions would be adopted in the standard so that some parts of the data could be algorithmically described (and generated). Libraries, regardless of their actual implementation language, would have to support parsing and executing the instruction set.
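To make that concrete, here is a minimal sketch of what "algorithmically described data" could look like. All key names and the `generate()` helper are invented for illustration; nothing here is from any draft standard, and `eval()` stands in for a real parser of the restricted grammar:

```python
import numpy as np

# Hypothetical dataset fragment: a derived array described by a
# restricted arithmetic expression instead of stored values.
derived = {
    "shape": [1000],
    "expr": "scale * index + offset",   # arithmetic on 'index' only
    "params": {"scale": 0.5, "offset": 10.0},
}

def generate(desc):
    """Materialise a derived array from its declarative description.

    eval() is used purely for brevity; a conforming reader would run
    a proper parser/evaluator for the restricted instruction set.
    """
    index = np.arange(desc["shape"][0])
    namespace = {"index": index, **desc["params"]}
    return eval(desc["expr"], {"__builtins__": {}}, namespace)

values = generate(derived)   # array([10. , 10.5, 11. , ...])
```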
This one does need to be handled with care, though. If we allow mathematical transformations on image data, why not allow, say, virtual tables created from joins of other tables, or other such database-like operations? I don't think we should have such a requirement, but why do we privilege one type of embedded data transformation over another? I'm not sure how to write this use case in a way that addresses that slippery slope.
Hi,

On Mon, Jan 12, 2015 at 02:20:47PM -0800, Erik Bray wrote:

I'd maintain it's a question of the type of machine required to […]

Now, for use cases like the specification of generalised transforms […]

Loops and function definitions are an entirely different beast. The […]

The bottom line is that if we come up with a spec on this, we should […]

Cheers,
This approach, as Brian glosses it, is similar to the approach used by the AST library. That library provides general WCS support not by listing a number of algorithms and parameters, in the style of FITS-WCS, but by implementing a collection of general transformations which can be composed to provide complicated transformations on the data (and several of which are precomposed to provide the standard WCS mappings). That manifestly works in that case, and it's easy to see how it might work for a more general case of data transformations. The transformations are specified within NDF files in (if I recall correctly; it's been a while) a not terribly readable form. One could imagine a little language which articulated them in a more naturally editable form.
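A tiny sketch of the composition idea (emphatically not AST's real API; the transform names are invented), just to show how a small set of primitive mappings enumerated by a standard could be chained:

```python
import numpy as np

class Transform:
    """Minimal composable 1-D mapping, loosely in the spirit of
    AST's composable mappings (not its actual interface)."""
    def __init__(self, func):
        self.func = func

    def __call__(self, x):
        return self.func(x)

    def then(self, other):
        """Compose: apply self first, then other."""
        return Transform(lambda x: other.func(self.func(x)))

# Primitive transforms a standard could enumerate...
def shift(dx):
    return Transform(lambda x: x + dx)

def scale(s):
    return Transform(lambda x: x * s)

log10 = Transform(np.log10)

# ...composed into, say, a pixel -> log-flux mapping:
pixel_to_logflux = shift(-100.0).then(scale(0.01)).then(log10)
print(pixel_to_logflux(np.array([200.0, 1100.0])))   # [0. 1.]
```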
I've tried to rewrite this use case a little based on the discussion here. I've dropped the idea of generating theory datasets from the use case and focused it more on tabular and image transformation/value generation. Please take a look and give feedback. I expect we'll still need to iterate.
In my view, use case #6 still reads more like a feature in search of a use case than a use case. It would be helpful to understand the reasons why such a feature would be important, and why it must be part of a storage file format.

I understand why descriptions of coordinate transformations are essential: they allow mapping between logical and physical coordinates without the problems that come with resampling the data. It could be done with a fixed lookup table (and HST had a history of that in some cases), but being able to tweak the knobs of the transformation has proven very useful.

I'm not as sold on the reasons why algorithmically generated data must be specified in the file format, rather than as an adjunct tool or extension for that purpose. Particularly given that the file format will support the storage of structured metadata, one could store a procedure in the file that could be understood by some domain-specific tool in the future. I don't think the file format should require anything like this, as it adds significantly to the implementation burden and has the potential to create many more security holes where otherwise there would be few.
There appears to be some confusion here still. Another attempt to explain: allowing simple mathematical formulae to describe the data is a good thing, if only from the standpoint of compressing large datasets. It also promotes long-term understanding of the data, since you can see succinctly what the underlying formula (and perhaps the scientific principle, as applicable) is behind that portion of the dataset.

I'd be all for more complex generation of data sitting in an (optional, outside of the spec) plugin containing compiled code. Where the line falls between "simple mathematical formulae" and "complex generation" is another matter.
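As a concrete illustration of the compression point (the key names below are invented, not from any proposal): a regularly sampled wavelength axis can be stored as four numbers plus a named rule rather than as millions of explicit values, and the generating rule stays human-readable in the file:

```python
import numpy as np

# Hypothetical header fragment describing a wavelength axis by rule.
wavelength_axis = {
    "kind": "linear",   # restricted to a few named, simple forms
    "start": 3500.0,    # Angstrom
    "step": 0.25,
    "n": 4_000_000,
}

def expand(axis):
    """Reader-side expansion: ~32 MB of float64 recovered from
    four numbers."""
    if axis["kind"] == "linear":
        return axis["start"] + axis["step"] * np.arange(axis["n"])
    raise ValueError(f"unsupported axis kind: {axis['kind']}")

wl = expand(wavelength_axis)
```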
Unless I misunderstand this use case, it proposes to allow embedding some sort
of executable code into the data format.
If that is true, I believe the use case should be dropped, for at least the
following reasons:
(1) Security concerns: Even if you "sandbox" whatever code is executing
(which, of course, makes it more likely that the format's execution
facilities in the end will be too slow or restricted to be generally
useful), it's still going to be hard to control what apparently
innocuous files actually do (see Adobe's pain with Javascript in PDF).
(2) Ease of implementation: If we allow something like this, all
conforming implementations will have to include an interpreter for
whatever code this turns out to be. This will typically be a major
effort (or at least dependency) that's going to hurt adoption (not to
mention security concerns again). On the other hand, I've always wanted
to write a FORTH machine...
(3) Complexity considerations: As file formats are always at the "edges"
of computer systems, it's great if they are "verifiable" in some
sense (e.g., checking validity with a context free grammar). This
feature is deep, deep within the land of Turing-complete languages with
all the related problems ("will this image halt?"). That's a fairly
fundamental flaw for something that sounds like an exotic
application that would probably be better solved by local convention (a
pipeline manual might state: "look for the chunk labeled
'foo-execute', check for foo's signature via the foo-signature chunk,
and then just do it").