Require explicit file format version in each manifest #253

Merged
merged 1 commit into from
Jun 4, 2013


stoicflame
Member

The file format spec https://github.com/FamilySearch/gedcomx/blob/master/specifications/file-format-specification.md says:

  • X-DC-conformsTo, used to identify the specification(s) this file conforms to. Conformance to http://gedcomx.org/file/v1 is assumed for all GEDCOM X files.

Rather than making that assumption, I'd think that the standard should require that the file format version be specified in every manifest. Otherwise rolling out v2, etc, will be hard, since no one will be checking version numbers when they open files.

Side note: visiting the url http://gedcomx.org/file/v1 fails for me after redirecting to an https version, with "SSL connection error". I'm not sure exactly what is expected to be there, and when.

@mikkelee

+1 to requiring version; if backwards incompatible changes are introduced at a later date, it will cause much less damage if implementations already reject versions that are higher than they implement. Perhaps some language similar to "it is RECOMMENDED that implementations conforming to this specification reject (or just warn about?) files with higher version numbers".

Regarding the URI, I believe that's covered by #87

@stoicflame
Member

I'm okay with requiring a conformance statement. Proposed changes are attached. See 381db7c and let me know if that's what you were thinking.

@nealmcb
Author

nealmcb commented May 28, 2013

Thanks - that's better.
I have a notion that it would also be good to provide advice for implementations about how to deal with backwards-compatible updates to the standard, and perhaps have a versioning practice to identify them. I haven't looked for good advice in that regard but I bet it is out there and may be at someone's fingertips. E.g. something like saying that implementations can expect that they should be able to handle versions with only a change to the minor version number.
Alternatively, or in addition, it may make sense to do like what many standards do and define a flag or some metadata to flag certain elements/attributes/who-knows-what as something that an implementation MUST be able to understand, and allow them to just ignore other elements/etc which they don't understand.

E.g. the X.509 public key certificate standard includes a "critical" indication for each extension, as described by Wikipedia (http://en.wikipedia.org/wiki/X.509):

Each extension has its own id, expressed as Object identifier, which is a set of values, together with either a critical or non-critical indication. A certificate-using system MUST reject the certificate if it encounters a critical extension that it does not recognize, or a critical extension that contains information that it cannot process. A non-critical extension MAY be ignored if it is not recognized, but MUST be processed if it is recognized. [1]
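The critical/non-critical rule quoted above could be sketched roughly as follows. This is only an illustration of the general idea, not code from X.509 or GEDCOM X: the `Extension` class, the `process` function, and the handler table are all hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Extension:
    # Hypothetical container: an id, a critical flag, and an opaque value.
    ext_id: str
    critical: bool
    value: str


def process(extensions, handlers):
    """Apply known handlers; fail on unrecognized critical extensions."""
    results = {}
    for ext in extensions:
        handler = handlers.get(ext.ext_id)
        if handler is None:
            if ext.critical:
                # Unrecognized and critical: the whole item is rejected.
                raise ValueError(f"unrecognized critical extension: {ext.ext_id}")
            # Unrecognized and non-critical: may be ignored.
            continue
        # Recognized extensions are always processed.
        results[ext.ext_id] = handler(ext.value)
    return results
```

A reader built this way keeps working when new optional data appears, and fails loudly only when the producer has marked something as essential.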

@jdsumsion
Contributor

The closest thing that comes to mind is semver.org.

But those guidelines would probably need to be adapted for how they apply
to parsability of data at rest.

John...

@stoicflame
Member

Yeah, there's also SOAP's mustUnderstand header.

I've seen these kinds of things attempted a handful of times in the industry, but my sense is that they've never been very useful. It seems like the standards that are most successful get entrenched pretty deeply and nobody wants to do anything that might break other implementations, so they stick to the pieces that are well-known. And the standards that are not very successful generally get replaced by a different (more popular) thing altogether, i.e. nobody wants to go create a "version 2".

SOAP seems to be a reasonable example. Those who use SOAP want to stick with the "core" and go out of their way to not use the mustUnderstand header. Those who hate SOAP have no interest in seeing version 2, and instead go with REST or some other architecture.

In summary, I'd rather not bother with semantic versioning for this case. Let's just be good with the required header, suggesting implementations fail if it's not there.

@mikkelee

In summary, I'd rather not bother with semantic versioning for this case. Let's just be good with the required header, suggesting implementations fail if it's not there.

+1, though I'd like a recommendation that implementations explicitly reject versions they don't know, just to be safe.

Edit: A confirmation-requiring user-facing warning might be okay if it's clear that data loss is inevitable, but I'd hate for someone to lose data because an app didn't warn them.

@stoicflame
Member

though I'd like a recommendation that implementations explicitly reject versions they don't know, just to be safe.

Ummm... What do you mean? Are you saying that if there is any conformsTo header that processors do not understand that they should fail processing? If that's what you mean, then I would object. I have no problem with a file conforming to some (other) specification, as long as it conforms to the file-v1 specification too.

The problems come when files do not conform to the file-v1 specification, in which case the REQUIRED conformsTo header will be missing and processors should fail.

I don't think any other language is needed.
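The behavior described above (fail when the required v1 conformsTo header is missing, tolerate any additional conformsTo values) might look something like the sketch below. It assumes the manifest is available as text made of simple `Name: value` lines and skips details such as continuation lines; the function names are hypothetical and not part of any GEDCOM X library.

```python
FILE_V1 = "http://gedcomx.org/file/v1"
HEADER = "X-DC-conformsTo"


def conforms_to(manifest_text):
    """Collect every X-DC-conformsTo value from simple 'Name: value' lines."""
    values = []
    for line in manifest_text.splitlines():
        # Split on the first colon only, so the URI's own colons survive.
        name, sep, value = line.partition(":")
        if sep and name.strip() == HEADER:
            values.append(value.strip())
    return values


def check_file_version(manifest_text):
    """Return the declared specs, or fail if file/v1 is not among them."""
    specs = conforms_to(manifest_text)
    if FILE_V1 not in specs:
        raise ValueError("manifest does not declare conformance to " + FILE_V1)
    return specs
```

Other declared specifications are returned untouched, so a caller can decide separately what to do about the ones it does not recognize.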

@stoicflame
Member

Woops. I missed your edit. Sorry about that.

A confirmation-requiring user-facing warning might be okay if it's clear that data loss is inevitable, but I'd hate for someone to lose data because an app didn't warn them.

That's fine if implementations want to put a warning in a user's face, but I don't have any interest in suggesting that in the formal spec. If implementations want to risk data loss by conforming to specifications outside the scope of the core specification set, then let the data exporters take that risk and throw a warning in the face of a user as needed.

@nealmcb
Author

nealmcb commented May 31, 2013

I certainly hope the standard evolves over time as the surrounding ecosystem and technology evolves. Even if the spec stakeholders (see #196) don't come out with future specs, we can expect that implementors will want to extend it. So there should be ways to indicate that.

Different use cases will have different requirements in terms of version handling and notices to users, so the spec shouldn't have implementation requirements for warnings etc.

A related question is what an implementation should do if it encounters an element which is not defined in the indicated spec. It could just ignore it, sort of like HTML, or it could throw an error. Is that what you're talking about in the previous comment? I'm not sure exactly what you mean by "data exporters", since it seems that the importer is the one who should know whether data will be lost.

To support implementations and users, I still think that defining how version numbers will work and trying to clarify what a minor version update vs a major one would mean should be done at some point. We may not know now or want to put a stake in the ground yet, but both users and implementers will benefit from it at least by the time any future releases or extensions arrive, and really would benefit from it as they decide how to handle and report version numbers and errors or warnings with their first implementation.

@stoicflame
Member

So there should be ways to indicate that.

Agreed. I think the conformsTo header is adequate for those purposes.

I still think that defining how version numbers will work and trying to clarify what a minor version update vs a major one would mean should be done at some point.

We've got support for "major version". It's the conformsTo header. These changes make a v1 conformsTo header REQUIRED, so if the major version isn't what implementers are expecting, they'll throw an error.

The only thing I think you're asking for (that we don't have right now) is support for providing a "minor version". What I need to know is why do we need it? Just for informational purposes? To communicate to implementers that they may want to update their libraries? Help me out; I just don't see it yet.

@nealmcb
Author

nealmcb commented May 31, 2013

I'm suggesting perhaps naming the initial standard something like "http://gedcomx.org/file/v1.0" ("major"."minor") and clarifying that a minor number change might, e.g., only add new elements.
Then, if an implementation written for 1.0 runs across a file that conforms to "http://gedcomx.org/file/v1.1", I'm suggesting it could expect to parse it fine and might just find stuff it doesn't know how to deal with (e.g. genetic SNP data), and if so it could tell the user that some information might be lost in transit. But if it sees elements conformant with 1.0, it might assume it could handle them just fine.
If it runs across a http://gedcomx.org/file/v2.0 file, it might refuse to read it, assuming major things have changed.
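As a rough illustration of this suggestion only, the major/minor decision could be sketched as below. The `vMAJOR.MINOR` URI shape is the hypothetical naming proposed in this comment, not something the specification defines, and `classify` and its return values are invented for the example.

```python
import re

# Hypothetical URI shape from the suggestion above: .../file/v<major>[.<minor>]
VERSION_URI = re.compile(r"^http://gedcomx\.org/file/v(\d+)(?:\.(\d+))?$")


def classify(uri, supported_major=1, supported_minor=0):
    """Return 'ok', 'warn' (possible data loss), or 'reject'."""
    match = VERSION_URI.match(uri)
    if match is None:
        return "reject"  # not a file-format version URI we can interpret
    major = int(match.group(1))
    minor = int(match.group(2) or 0)  # a bare "v1" is treated as minor 0
    if major != supported_major:
        return "reject"  # major changes are assumed to be incompatible
    if minor > supported_minor:
        return "warn"  # newer minor: parse, but some data may be unknown
    return "ok"
```

For an implementation written against 1.0, `classify("http://gedcomx.org/file/v1.1")` gives `"warn"` and `classify("http://gedcomx.org/file/v2.0")` gives `"reject"`.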

@stoicflame
Member

I don't like implementing it by changing the identifier (URI). It's so much easier to just provide a different header (e.g. "X-minorVersion" or something) than it is to explain how to parse a URI in such a way that you can "parse out" the minor version. In the end, you just want to get a minor version. Why not make it explicit?

And I think we could do that, but I still don't appreciate the value in it. You mention that a minor version might hint that an implementation could hint to a user that some data might be lost (e.g. genetic SNP data). But I'm suggesting that an implementation could do the same thing by noticing that the file "conforms to" specifications that it doesn't recognize.

For example, headers for a file that contains genetic SNP data might look like this:

...
X-DC-conformsTo: http://gedcomx.org/file/v1
X-DC-conformsTo: http://gedcomx.org/snp/v1
...

This implies that the file conforms to the v1 file format AND it conforms to some future genetic SNP specification.
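A minimal sketch of how an implementation might notice conformsTo values it does not recognize, assuming the header values have already been read into a list. `KNOWN_SPECS` and `unrecognized_specs` are hypothetical names, and whether to warn the user is left to the implementation.

```python
# Specifications this hypothetical implementation knows how to process.
KNOWN_SPECS = {"http://gedcomx.org/file/v1"}


def unrecognized_specs(declared, known=KNOWN_SPECS):
    """Return the declared spec URIs that are not in the known set."""
    return [uri for uri in declared if uri not in known]


declared = [
    "http://gedcomx.org/file/v1",
    "http://gedcomx.org/snp/v1",
]
unknown = unrecognized_specs(declared)
if unknown:
    # The file is still readable as file/v1; only the extra data is at risk.
    print("Some data may not be understood:", ", ".join(unknown))
```

No version parsing is needed here: each identifier is compared as an opaque string.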

@stoicflame stoicflame merged commit 381db7c into master Jun 4, 2013
@stoicflame stoicflame deleted the conformance-header-required branch June 4, 2013 19:28
@nealmcb
Author

nealmcb commented Jun 5, 2013

Thanks for clarifying all that. After reading and thinking some more I realize I should really probably be focused more on evolution and extensibility of the Serialization Format than of the file format. The file format already allows any mime type to be present, and defaults to the mime type application/x-gedcomx-v1+xml

In turn, that MIME type is defined by the GEDCOM X XML Serialization Format

It is a bit unclear to me how the questions I've raised here about evolution and extensions and new elements are dealt with by that standard.

In particular, what must/should an implementation do if it encounters an element which is not defined in the indicated XML spec? E.g. it could just ignore it, sort of like HTML, or it could throw an error. How would XML related to a new Data Type for genetic traits be incorporated, either by someone extending the conceptual model or by the stakeholders wanting to evolve the model in that direction?

And would a new MIME type be necessary?

(Perhaps these questions would be better in a new issue, and I can create one if you like.)

@stoicflame
Member

You have fair questions. To that end, we provided the Extensibility section of the conceptual model and corresponding sections in the XML and JSON specifications.

We're open to suggestions on providing extra clarity there as needed. I'd suggest handling that as a separate issue (or issues).

@nealmcb
Author

nealmcb commented Jun 5, 2013

Cool! Just what I was originally looking for. I just hadn't read far enough.

I think it would be useful to have an "Extensibility" section of the file format which both notes that any MIME type can be included, and also notes these other extensibility sections you've pointed out. Given all that, I would join you in hoping that the file spec itself is flexible enough to handle lots of requirements for a while, and the main action will be in the data type specs.
