What exactly is "Attribution" for, and what Classes need one? #192
In particular, the proof argument role has been taken over by Conclusion::Document::AnalysisDocument. So Attribution now looks like
Contributor and confidence-level are self-explanatory (though how confidence-level values like "probably" or "possibly" make sense in any context other than some subclasses of conclusion is questionable). "Justification" replaces "proofArgument" and is what Thad has likened to a commit note. I'd like FS's thinking on that expanded upon a bit. I'm in favor of tagging everything with a contributor and a date. I've no objection to versioning some things, and for those things having a short string explaining the change is useful. The list of what gets versioned, though, and how to transfer history, is highly debatable. I also wonder whether it makes sense to include a versioning scheme inside of GedcomX or if exposing an uncompressed serialization to an external version control system would make more sense. |
If it really is like revision control, does there need to be a history of attribution objects? Right now it appears that only the latest change is kept. If a person edits an object attributed to someone else, is it now attributed to them? Perhaps contributor needs to be a list that can grow to track everyone who edits the object? |
Not keep, transfer: It's easy to imagine a use case where a group of researchers are passing GedcomX files around to keep each other in sync, and each researcher's local program is responsible for taking a record with a changed
I don't like the idea of having a list of authors/revisers without also tracking who made what revisions. Suppose I took a conclusion you had written and re-wrote it in a way that you disagree with. You wouldn't want your name to still be on the conclusion, would you? |
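To make the distinction concrete, here is a minimal sketch of a per-object revision log that records who made which revision, rather than a bare list of names. It is not part of any GedcomX proposal; the `Revision` and `Conclusion` classes and their field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    """One entry in a hypothetical revision log (names are assumptions)."""
    contributor: str      # identifier of the agent who made this revision
    modified: datetime    # when the revision was made
    justification: str    # short "commit note" explaining the change
    text: str             # the conclusion text as of this revision


@dataclass
class Conclusion:
    id: str
    revisions: list = field(default_factory=list)

    def revise(self, contributor, justification, text, when=None):
        """Append a new revision instead of overwriting the attribution."""
        when = when or datetime.now(timezone.utc)
        self.revisions.append(Revision(contributor, when, justification, text))

    def current(self):
        """The latest revision, i.e. what 'keep the latest only' would retain."""
        return self.revisions[-1]

    def contributors(self):
        """Everyone who has revised the object, in order of first appearance."""
        seen = []
        for rev in self.revisions:
            if rev.contributor not in seen:
                seen.append(rev.contributor)
        return seen
```

With this shape, a rewrite by a second researcher is attributed to that researcher in `current()`, while the original author's text stays recoverable from `revisions[0]`.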
Clarification: I'm in favor of tagging every top-level object, i.e., everything that gets its own MIME part or ZIP pseudo-file. Tagging every element of every object would be tiresome and create bloat. |
I'm not ... I believe (maybe wrongly) that the vast majority of GEDCOM files are owned/used/created by a single researcher. They know who they are and don't need to keep recording that fact. If they reference other researchers' work then this is just like any other source which (rightly) has its own attribution. The GEDCOMX file is in itself a source and therefore will have its own Attribution. |
We're discussing GedcomX here, not Gedcom, and GedcomX has a web services mission as well as a single-researcher file import/export one. In the online family tree use case, the whole point is collaboration with multiple researchers, and who did what when is important metadata. |
I can see a few use cases in the single-user world where time-stamps on everything would be useful. Example: You find a mistake in the analysis of a source. You fix the mistake and the source analysis time-stamp is updated. Now everything dependent on that source analysis with a time-stamp earlier than the source analysis time-stamp is invalid until reviewed by the researcher. Re-approval by the researcher would re-time-stamp the dependent resource. |
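The time-stamp check described above could be sketched roughly as follows. This is only an illustration under assumptions: the `Record` class, its `depends_on` field, and the `stale_records` helper are hypothetical names, not anything defined by GedcomX.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Record:
    """Hypothetical time-stamped record; field names are assumptions."""
    id: str
    modified: datetime
    depends_on: list = field(default_factory=list)  # ids of records this one relies on


def stale_records(records):
    """Return ids of records older than something they depend on."""
    by_id = {r.id: r for r in records}
    stale = []
    for r in records:
        deps = (by_id[d] for d in r.depends_on if d in by_id)
        if any(dep.modified > r.modified for dep in deps):
            stale.append(r.id)
    return stale


analysis = Record("analysis-1", datetime(2012, 6, 2))
conclusion = Record("conclusion-1", datetime(2012, 6, 1), depends_on=["analysis-1"])

# The analysis was fixed after the conclusion was last reviewed.
needs_review = stale_records([analysis, conclusion])

# Re-approval re-time-stamps the dependent record, clearing the flag.
conclusion.modified = datetime(2012, 6, 3)
after_review = stale_records([analysis, conclusion])
```

Here `needs_review` contains only the conclusion, and `after_review` is empty once the researcher has re-approved it.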
What we seem to be coming down to here and in #198 is that for you guys every object and property can optionally have an id, a datestamp, and "Attribution" (ie ownership) - and possibly other meta-data as yet to be identified. The trouble is that the more granular you make everything the more complex it becomes. I can see something like a "Change" object coming around (datestamp and Attribution) ... and guess what .. might we then need an ID for it? Where do we stop? I don't believe we are trying to re-create a source/change control system here. That might well be wanted but would be better served by other existing software. |
Not necessarily create, just facilitate. But maybe a better approach would be to have another spec for version-controlled GedcomX where the Attribution is part of the changeset metadata rather than part of GedcomX Object (meta)data. |
Why re-invent the wheel? |
What reinvention? Is it already specified somewhere how GedcomX would interact with a version control system? |
Why does it need to be? |
I agree with you in this case. But what if someone just spell checked, grammatically corrected, or formatted my conclusion? Or just added another supporting point... Semantically, is the case you describe ("Suppose I took a conclusion you had written and re-wrote it in a way that you disagree with.") still the same conclusion, or should it be an entirely new conclusion, in which case you should mark my conclusion incorrect and add your own new conclusion? Semantically, someone else performing minor cleanups leaves it really the same conclusion. Does it make sense to remove the primary contributor and replace them with the person just doing the cleanups? Do the lifetime and identity semantics of a conclusion need to be defined in the specification? Perhaps:
|
I'm going to ask again ... why do we need to create a version control system rather than allowing those who want one to overlay it onto whatever system they are using? |
+1 Version control should be outside the scope of GEDCOM X, except maybe for making sure the file format works well with general version control systems. |
Because existing VCS are designed for diffing human-edited text files. They mark reordering as a change. They mark whitespace changes. How many times have you spent n times longer reviewing a changeset because while making a 10-line code change the submitter did something like untabbify or delete-trailing-whitespace? Those systems do not work well with machine-generated serializations because output order is usually not deterministic unless the underlying objects make it so: A linked list or an array will always come out in the same order because it's easy and natural to start at the beginning and go to the end. Hashmaps and trees can be iterated a bunch of different ways, and a single insertion or deletion can make a big difference in iteration order for most of them. If anyone knows about an existing semantically-aware VCS, I'd love to hear about it. I agree that building one is outside the scope of GedcomX, but not that it isn't a requirement for collaborative genealogy. |
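The ordering problem described above can be illustrated with a small sketch. It uses JSON purely as a stand-in serialization and sorts map keys on output; the `canonical` and `diff` helper names are hypothetical, and nothing here is a GedcomX-specified format.

```python
import difflib
import json


def canonical(obj):
    """Serialize with sorted keys so output order does not depend on insertion order."""
    return json.dumps(obj, sort_keys=True, indent=2, ensure_ascii=False)


def diff(old, new):
    """Line-based unified diff of the canonical forms, as a text VCS would see it."""
    return list(difflib.unified_diff(
        canonical(old).splitlines(),
        canonical(new).splitlines(),
        lineterm="",
    ))


# Same data, different insertion order.
a = {"name": "John", "birth": "1850"}
b = {"birth": "1850", "name": "John"}

naive_differs = json.dumps(a) != json.dumps(b)   # naive output reflects insertion order
no_spurious_diff = diff(a, b) == []              # canonical output is identical

# A real change shows up as a minimal diff.
c = {"birth": "1851", "name": "John"}
changed_lines = [
    line for line in diff(a, c)
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
```

Sorting keys only addresses maps. Lists keep their order under `sort_keys`, so a collection whose order carries no meaning would still need an agreed sort rule before a text-oriented VCS could produce minimal diffs.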
Is putting "lipstick on a pig". Your list of "what abouts" that preceded it are all handled nicely by a VCS that shows who changed what and when. Only the formatting & spelling is handled correctly by a list of contributors. However,
Offers an interesting alternative: Immutability. Add a |
The file order should be consistent then at least. |
Agreed Remove the contributor attribute from the attribution? Push that out of scope to a revision control system?
I have used systems with application specific and integrated VCS (e.g. Team based UML design tools), but never a generic semantically-aware VCS.
This is basically adding revision control to GedcomX. I think we really need the requirements / use cases for collaborative genealogy support in GEDCOM X. What information does the GedcomX need to communicate? Assuming the Attribution object is primarily to support this.
To support option 2 there could be a defined patch/diff format that includes just the changed records with external references to the records changed in the original gedcomX. |
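As a rough illustration of the patch/diff idea, the sketch below treats a GedcomX data set as a mapping from record id to record and builds a patch that carries only the changed records, referring to everything else by id. The `make_patch` and `apply_patch` functions and the `changed`/`removed` layout are assumptions made for this example, not a proposed format.

```python
def make_patch(base, revised):
    """Collect only records that were added, modified, or removed relative to base."""
    return {
        # Added or modified records, carried in full and keyed by id.
        "changed": {rid: rec for rid, rec in revised.items() if base.get(rid) != rec},
        # Records present in the base but absent from the revision, referenced by id only.
        "removed": [rid for rid in base if rid not in revised],
    }


def apply_patch(base, patch):
    """Return a new data set with the patch applied; base is left untouched."""
    result = dict(base)
    result.update(patch["changed"])
    for rid in patch["removed"]:
        result.pop(rid, None)
    return result


base = {
    "p1": {"name": "John Smith", "birth": "1850"},
    "p2": {"name": "Mary Smith", "birth": "1852"},
}
revised = {
    "p1": {"name": "John Smith", "birth": "1851"},   # modified
    "p3": {"name": "Ann Smith", "birth": "1875"},    # added; p2 was removed
}

patch = make_patch(base, revised)
merged = apply_patch(base, patch)
```

A receiving application would apply the patch against its own copy of the original file, which is why the patch needs stable record ids to refer to.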
+1 Thad? |
+1 My take on this (which I know John will disagree with) is that GEDCOMX (or rather the cohesive set of conclusions in a GEDCOM X file) is in itself a Source. Therefore it must have the same form of ownership/attribution or whatever as a source has (which should include author(s), editor(s), publisher(s), publication/edit date etc etc). If a group of people are working collaboratively then they must collaborate and hence form a cohesive conclusion which they agree upon. If not, then they are just a bunch of people over-writing each other's data. If I were reading a research text by a team of people I would not expect to have each guy pop up on each paragraph to put their own stamp on it. It would make the whole thing extremely difficult to read and understand if they constantly contradicted or corrected one another. Hence I vote against having attribution objects scattered around willy-nilly. My assumption here is that GEDCOM X is (or should be) still primarily focused on import/export (see #141) and not on prescribing the be-all-and-end-all data-store for all applications to use. Hence it is up to each application to work out how it manages version control. John says the order doesn't matter .... well in some applications it might matter. Ditto other 'trivial' changes - any editor will tell you that small changes can change the context and meaning of what is written. It is not for GEDCOM-X to say which of these is or is not important. It is for GEDCOM X to provide a base standard of data which 'good' genealogical applications should be able to import/export. Leave change control to the application and ensure that the GEDCOM-X file is in itself a Source with all the bells and whistles we demand of any other SourceDescription. |
I said nothing of the sort. I said that standard serialization routines for certain types of containers don't produce output in a way that lends itself to minimal (and therefore meaningful) diffs with standard programming-language-centric version control systems, and so if GedcomX is to be version-controlled then we need to specify how to make that work.
Which is why I suggested that it should be a separate spec. |
You're not advocating GedcomX as a source there, you're advocating it as a publishing medium. That's certainly a use to which Gedcom5 has been put, but I don't think that it was a primary design use-case for Gedcom and I don't think that it should be for GedcomX. |
Apologies - I evidently misinterpreted your negative comment about change control systems "They mark reordering as a change" - I assumed you were implying that reordering was not a significant change ergo I assumed order was not important to you.
Both .. being published whilst being a work-in-progress ... isn't that in-line with your collaborative working requirement? |
Because you're using "source" to mean different things. In the context of the GPS and GedcomX, a "source" is something that can be cited and contributes evidence towards a conclusion. The usual term for "creating sources" is "making stuff up", and I don't think that's what you mean when you say "I create branches (sources)". I think you mean that you create different sets of conclusions and write proof arguments for each, and use those proof arguments as guidance for further research, recursing down each until you run out of new evidence and have one hypothesis (conclusion set) that fits that evidence better than the others. |
A source is just something which can be used in evidence ... it could be a book, an archive document, an email, a letter, a photograph etc etc etc .... A source is by definition "different things" ... a GEDCOM-X file is just another type of thing/source. |
That's just the credibility of the source ... an old baptism record could have been "just made up" by the priest who forgot to record it on the day and had to try to reconstruct it to send it to his superior. This is why we always seek multiple sources to prove/add evidence. |
That too but what do I put them in? A source - that way I can use the source in other projects/conclusions etc as per normal without having to make it something else to cite from it/use it as evidence. |
You guys are awesome. Very good stuff here.
It's a timestamp.
It kind of depends on how smart the application is. If the application can tell (or have the user say) that the change is just a whitespace edit or something of insignificant semantic value, then I'd probably not change the contributor. Otherwise, I probably would.
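One very simple way an application might decide this is sketched below: treat an edit as insignificant if the text is unchanged once whitespace is normalized, and only then keep the existing contributor. The function names are hypothetical, and a real application would likely need a richer notion of significance (spelling fixes, for instance, are not covered here).

```python
def normalize(text):
    """Collapse runs of whitespace and trim the ends."""
    return " ".join(text.split())


def is_significant(old, new):
    """An edit is significant if it changes more than whitespace."""
    return normalize(old) != normalize(new)


def contributor_after_edit(current_contributor, editor, old, new):
    """Keep the existing contributor for whitespace-only edits; otherwise attribute to the editor."""
    return editor if is_significant(old, new) else current_contributor


original = "John was born in  1850.\n"
reformatted = "John was born in 1850."
rewritten = "John was born in 1851."

kept = contributor_after_edit("alice", "bob", original, reformatted)
changed = contributor_after_edit("alice", "bob", original, rewritten)
```

In this example `kept` stays with the original contributor and `changed` moves to the editor.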
Indeed. What if we removed If we wanted, we could add a separate section that describes how to handle the
Agreed. But I agree with @jralls that it might be appropriate as a separate specification or as an extension project. I know that a "change history" will be part of the FamilySearch Platform APIs, so it might be useful to have such a notion for import/export, too.
Me too! Sounds like a fascinating research project, if you ask me.
I think we need to hash those out here. As I mentioned earlier, FamilySearch has their own ideas about the level of granularity for I guess I thought that the requirements could start at just being able to track who edited and committed each top-level entity. Anyway, thanks again for your comments. I'm going to put together a set of changes and attach them to this issue as we hash through this. I'll start with the changes I proposed above. |
…n for attribution to the conceptual model
Changes have been attached to this issue, awaiting your review, summarized as follows:
|
Hmm. It's easy to understand |
Same definition applies: attributed to the agent who made the latest significant change to the person. I would presume that modifying a name conclusion on a person would be considered significant, yes. Of course if the application wanted to keep track of changes at the level of the |
I still don't see a use-case for it. I think at this point I agree with Sarah: Take it out. |
This issue arises specifically from #144, where I proposed that having an Attribution property in SourceReference is redundant because Conclusion, which will carry the bulk of the SourceReference objects, already has one.
The original intention of Attribution was to carry a proof argument, but the purpose seems to have morphed a bit with the major architectural changes proposed in #182. Thad ( @thomast73 ) has described it as more of a version-control structure, but also said that it needs its own issue.
This is that issue.