please come up with "overlay" versioning scheme (e.g. in ds 201) #9

yarikoptic · 2016-09-15T02:06:19Z

https://openfmri.org/dataset/ds000201/
release 1.0.1 is a bugfix release and a new tarball was provided for "Metadata, demographics, survey, questionnaire, eye tracking, and non-imaging data (387 MB)". But the other tarballs were not uploaded for release 1.0.1, which is somewhat logical since they didn't change.

My concern is how I (well, software) could decide whether the next version is an overlay (i.e. take the old files and replace only the updated ones) or a new complete release, in which some similarly named tarballs might indeed have been removed. One possibility: releases that change only the last component (e.g. here from .0 to .1) are overlay releases, and I should assume that whatever tarball was present in the previous release is either still present or replaced in the current one.
Not sure if I have coded such logic already, and not sure there would be no glitches (e.g. in some datasets even a minor inconsistency in filenames could introduce "difficulties", e.g. hypothetically having "ds201_R1.0.0_dwi.tar" and "ds201_R1.0.1_dwis.tar").

But it would be nice to come up with some consistent and "standard" convention (should also be explained somewhere on the website)
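
To make the question concrete, here is a minimal sketch of the rule being proposed: parse the version out of a tarball name and call a release an overlay only when the last component alone increased. The regular expression, the function names, and the choice to treat unversioned files as 1.0.0 are all assumptions for illustration, not existing openfmri or datalad code.

```python
import re

# Hypothetical pattern for names like "ds201_R1.0.1_dwi.tar" or "ds001_raw.tgz".
TARBALL_RE = re.compile(
    r"^(?P<ds>ds\d+[A-Z]?)_(?:R(?P<version>\d+\.\d+\.\d+)_)?"
    r"(?P<suffix>.+?)\.(?:tgz|tar(?:\.gz)?)$"
)


def parse_tarball(name):
    """Return (dataset, version tuple, suffix) for a tarball file name.

    Assumption: a name without an R<version> marker is treated as 1.0.0.
    """
    m = TARBALL_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized tarball name: {name}")
    version = m.group("version") or "1.0.0"
    return m.group("ds"), tuple(int(p) for p in version.split(".")), m.group("suffix")


def is_overlay(previous, current):
    """True if only the last (3rd) version component increased."""
    return previous[:2] == current[:2] and current[2] > previous[2]
```

With this, the hypothetical "dwi" vs "dwis" glitch shows up as two different suffixes, which is exactly the case where the file name alone cannot say whether one tarball replaces the other.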

vsoch · 2016-09-15T03:40:07Z

I'm not sure exactly how the workflow is on the backend, but given that the tarballs are served on s3 that limits the versioning to the (current) file naming method, which of course also matches BIDS. Maybe an idea is to integrate some form of "real" version control, because you could easily use tags and releases for a simple file with the md5 sum and file name of the tarballs for the release. That could (eventually) be a sort of quasi "openfmri-data" hub, where the user would start with the version defined in the file, and then use some base path (in this case the folder on aws) to retrieve the correct files for the version. Just a thought! :)
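
As a rough illustration of the "simple file with the md5 sum and file name" idea, the sketch below builds such a manifest for one release and writes it as JSON, so it could be committed and tagged. The manifest layout (a "version" key plus a "files" mapping) and the function names are assumptions made up for this example.

```python
import hashlib
import json
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Compute the md5 of a file in chunks, so large tarballs fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(version, tarballs):
    """Return a dict listing every tarball that belongs to the release."""
    return {
        "version": version,
        "files": {Path(p).name: md5sum(p) for p in sorted(map(str, tarballs))},
    }


def write_manifest(manifest, out_path):
    """Write the manifest as stable, diff-friendly JSON."""
    Path(out_path).write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n")
```

Because the manifest names every file of the release, including unchanged ones, a reader never has to guess whether an absent tarball was dropped or carried over.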

yarikoptic · 2016-09-15T03:40:35Z

ah... I also see that 157 also has the same situation (1.0.2 which updated only some tarballs).
So -- should I consider only a 3rd digit increment as an 'overlay'? I see only ds 3 which had an increment in the 2nd number, but there it was obvious since only one tarball went from _raw to a versioned one.

yarikoptic · 2016-09-15T03:42:43Z

as for versioning -- just give a try to git-annex/datalad ;) (see e.g. http://datasets.datalad.org/?dir=/openfmri ; datalad install ///openfmri/ds000001 for a try).
in my current case I just want to reach some 'standard' agreement on which versions will be 'overlays' and which will come as 'complete', so that any disappearing (not listed) tarball would be 'intentional'

vsoch · 2016-09-15T03:45:32Z

imho I don't think the version number on the file name is enough given that "old version" files are brought forward with new versions - this scheme would only work given that all "old version" files are copied and provided with the new version, so there is never any doubt. If you do my suggestion above, you would also likely want:

a dev command line tool to make it easy to select included files and generate the log, and then push to a github branch for a PR
the continuous integration testing would need to check the md5 sum against the files and verify they exist at some base
the tool should then make it easy to update the site
the user should then be able to select some version (based on the github release) and download the appropriate files
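
The continuous integration step from the list above could look roughly like the following sketch. It assumes a manifest shaped as {"files": {name: md5}} and checks it against a local directory standing in for the "base"; checking an S3 location would need a different file accessor. Both the manifest shape and the function name are hypothetical.

```python
import hashlib
from pathlib import Path


def verify_manifest(manifest, base_dir, chunk_size=1 << 20):
    """Return a list of problems; an empty list means the release is consistent."""
    problems = []
    for name, expected in sorted(manifest["files"].items()):
        path = Path(base_dir) / name
        if not path.is_file():
            problems.append(f"missing: {name}")
            continue
        digest = hashlib.md5()
        with open(path, "rb") as f:
            # Read in chunks so multi-gigabyte tarballs are not loaded at once.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected:
            problems.append(f"md5 mismatch: {name}")
    return problems
```

A CI job would fail the pull request whenever the returned list is non-empty.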

I've never used datalab but it looks cool!

yarikoptic · 2016-09-15T03:47:33Z

datalad (just like git, but a good one, i.e. a lad) ;)

I guess I will reread your answers tomorrow morning with a cleared mind to see if we aren't talking past each other ;) Thanks and Cheers!

yarikoptic · 2016-09-15T13:53:53Z

"simple file with the md5 sum and file name of the tarballs for the release" -- yes, that would be nice, since it would define the release unambiguously. But it would require the user to look into that file, or, ideally, the openfmri web frontend to use it to present a view of "files of current release" or something like that. So all in all it would require some development...
But I think "not all is lost" with how things are already set up, and I am just asking for some formalization of the workflow. I will have a brief look later at what tarballs are provided:

$> git submodule foreach git ls-tree incoming
Entering 'ds000001'
040000 tree 53e962d5ab62ec6106e91291767cda58ec0d1caf    .datalad
100644 blob a6009f4392d2433656aa83ba285c90fa25821a31    .gitattributes
100644 blob 7ac454fbdca3c28cc50afe5440f40838810dae9b    changelog.txt
120000 blob 912e3153ddddfb8791b571c4a0175511a5707243    ds001_R1.1.0_raw.tgz
120000 blob 44b106c72a44ec8700e687d342dc3ea5b8dd1252    ds001_R2.0.0_raw.tgz
120000 blob 3465a356d86b4901015ec2814cddb402d9c61092    ds001_raw.tgz
Entering 'ds000002'
040000 tree e88517df6de36565c0ee8fca48e19335acc308d7    .datalad
100644 blob a6009f4392d2433656aa83ba285c90fa25821a31    .gitattributes
100644 blob eb41b053fcb4922422fcf16d893a943fc8907f14    changelog.txt
120000 blob 3692d181d7727475055860d32e9d73027c58b61a    ds002_raw.tgz
Entering 'ds000003'
040000 tree 7f1b959b15078502ca8cdeaff07dbf4c99799988    .datalad
100644 blob a6009f4392d2433656aa83ba285c90fa25821a31    .gitattributes
100644 blob a41909cf5a65af564af5301ec0e71c13c5166948    changelog.txt
120000 blob a79f6976293288eb2dc2794d6a341ef2b6a73fa7    ds003_R1.1.0_raw.tgz
120000 blob b8d579660c742b04c99e53b7933d1b58515d458e    ds003_raw.tgz
...

(full list is at http://www.onerussian.com/tmp/openfmri-incoming-20160915.txt, hopefully matches what is avail from s3 archives/ didn't check atm)

yarikoptic · 2016-09-15T15:57:41Z

somewhat of a demo example question: ds 17A -- at first there were ds017A_models.tgz and ds017A_raw.tgz, and then ds017A_R1.1.0_raw.tgz. Assume that the "non-versioned" release was a 1.0.0 release (I think we agreed on that some time ago), so I have a 2nd digit change. But are the _models still applicable to the 1.1.0 release or not? I guess not, since I think (I can only guess, since the changelog file includes dates but not release numbers) that 1.1.0 was the one fixing orientation in the nifti files, thus invalidating the majority if not all of the derived data as well. So it seems that not doing an overlay for this one (and it was a 2nd digit bump) would be the correct behavior, right?

poldrack · 2016-09-15T15:59:52Z

IIRC, models were solely based on pre-bids data


yarikoptic · 2016-09-15T16:06:50Z

and 1.1.0 for that is also pre-BIDS -- I am just dealing with datasets already released on openfmri

Re versioning -- an easy to grasp (semantically) naming convention would be "major.minor.patch", where a ".patch" (3rd level) release is assumed to be an overlay patching the previous "major.minor.(patch-1)" state. The overlay applies at the level of tarball names: if there are 1.1.0_blah.tar.gz and 1.1.1_blah.tar.gz, I would simply not consider the 1.1.0 tarball for the "1.1.1" release, so if any file got removed within 1.1.1_blah.tar.gz, it is also removed from the 1.1.1 release. So it is not an overlay in the sense of "I extract all the tarballs on top of each other and just extend the content"... sorry for the confusing language, but maybe you get my point.
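
A small sketch of that tarball-name-level overlay, to show it is a selection of whole tarballs and not an extraction on top of each other. The input structure (a dict keyed by version tuple and suffix) and the function name are assumptions for illustration; the "other" tarball in the usage note is made up.

```python
def resolve_release(available, target):
    """Pick, per suffix, the newest tarball within the target's major.minor.

    available: dict mapping (version tuple, suffix) -> file name.
    target: version tuple such as (1, 1, 1).
    Tarballs from a different major.minor, or newer than target, are ignored.
    """
    chosen = {}
    for (version, suffix), filename in available.items():
        if version[:2] != target[:2] or version > target:
            continue
        if suffix not in chosen or version > chosen[suffix][0]:
            chosen[suffix] = (version, filename)
    return {suffix: filename for suffix, (_, filename) in chosen.items()}
```

For example, given 1.1.0_blah.tar.gz, 1.1.1_blah.tar.gz and a hypothetical 1.1.0_other.tar.gz, resolving (1, 1, 1) selects 1.1.1_blah.tar.gz and 1.1.0_other.tar.gz, and the 1.1.0 "blah" tarball is never extracted.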

yarikoptic · 2016-09-17T19:05:14Z

another 'interesting' dataset is ds 9:

$> ls -lL
total 24427496
-rw-r--r-- 1 yoh datalad        207 Mar 31 01:01 changelog.txt
-r--r--r-- 1 yoh datalad 6003651730 Feb 21  2016 ds009_R1.1.0_raw.tgz
-r--r--r-- 1 yoh datalad 4007257776 Mar 25 22:39 ds009_R2.0.0_01-17.tgz
-r--r--r-- 1 yoh datalad 2364753676 Mar 25 22:51 ds009_R2.0.0_18-29.tgz
-r--r--r-- 1 yoh datalad  124894379 Mar 25 22:52 ds009_R2.0.0_toplevel_metadata.tgz
-r--r--r-- 1 yoh datalad 4007258231 May  8 15:49 ds009_R2.0.1_01-17.tgz
-r--r--r-- 1 yoh datalad 2364753689 May  8 15:51 ds009_R2.0.1_18-29.tgz
-r--r--r-- 1 yoh datalad  124894223 May  8 15:52 ds009_R2.0.1_metadata_derivatives.tgz
-r--r--r-- 1 yoh datalad 6016271853 Apr  9  2015 ds009_raw.tgz

where the suffix has changed from toplevel_metadata to metadata_derivatives, so it is not clear (just by looking at filenames) whether it is a new file added on top of the previous release (2.0.0), which provides a differently named file, or an update of that differently named previous file.
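
Software cannot resolve this case from names alone, but it can at least flag it. The sketch below compares the suffixes of a base release and a patch release and reports when some suffix disappeared while another appeared, which is the ds 9 situation. The function names and the returned keys are assumptions for illustration.

```python
def suffix_changes(base_suffixes, patch_suffixes):
    """Classify suffixes by whether they appear in the base, the patch, or both."""
    base, patch = set(base_suffixes), set(patch_suffixes)
    return {
        "replaced": sorted(base & patch),
        "only_in_base": sorted(base - patch),
        "only_in_patch": sorted(patch - base),
    }


def is_ambiguous(changes):
    """A rename cannot be told apart from an addition when both sides have leftovers."""
    return bool(changes["only_in_base"]) and bool(changes["only_in_patch"])


# Suffixes taken from the ds009 2.0.0 and 2.0.1 tarballs listed above.
changes = suffix_changes(
    ["01-17", "18-29", "toplevel_metadata"],
    ["01-17", "18-29", "metadata_derivatives"],
)
```

Here is_ambiguous(changes) is true, so a tool could stop and ask for a human decision instead of silently guessing.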

yarikoptic · 2016-09-17T19:09:06Z

BTW -- looking at the API (which is nice!) https://openfmri.org/dataset/api/ds000201/ - 1.0.1 release lists only the "overlay/patch" (changed) file and not any other files which still apply to 1.0.1 release from 1.0.0 release @chrisfilo ?

chrisgorgo · 2016-09-17T19:29:24Z

This is the wrong repo to report those bugs - in the future please use https://github.com/poldracklab/open_fmri
@suyashdb @jbwexler could you look into this?

yarikoptic · 2016-09-17T19:31:32Z

oh, ok -- then please move #10 there as well.

jbwexler · 2016-09-18T04:32:49Z

We are currently trying to decide on practices for the workflow that will be used consistently, which will hopefully help with machine readability. One change we are planning to make is, for each new revision, to make copies of all the unaltered files from the previous revision, and rename these according to the new revision. See ds117: https://openfmri.org/dataset/ds000117/
Would this solve the issue?

Though even if this solves the problem from now on, there is still the issue with ds201, ds157, and others that had revisions added before we decided on this practice. I'm not sure we want to alter old revisions--what do people think?
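
The renaming part of that practice can be sketched as follows. This only computes the new names for unaltered files; the actual copying is left out. The "_R<version>_" marker pattern is inferred from the file names in this thread, and the function name is hypothetical.

```python
import re

# Matches the revision marker seen in names like "ds009_R2.0.0_01-17.tgz".
VERSION_RE = re.compile(r"_R\d+\.\d+\.\d+_")


def carry_forward_names(previous_files, altered, new_version):
    """Map each unaltered file of the previous revision to its new-revision name.

    Files listed in `altered` are skipped because they get fresh uploads.
    Names without a revision marker are returned unchanged.
    """
    renamed = {}
    for name in previous_files:
        if name in altered:
            continue
        renamed[name] = VERSION_RE.sub(f"_R{new_version}_", name, count=1)
    return renamed
```

With every revision listing a full set of files under its own version, a missing tarball can then be read as intentionally removed.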

poldrack · 2016-09-18T14:42:43Z

cc’ing Chris on this to make sure he is following this thread…


suyashdb · 2016-09-18T16:35:42Z

Hello Joe,
Let's get the versioning finalized and documented. One other thought I have: if we choose not to update the versioning of old datasets, we should document all previously used flavors of versioning somewhere on the website for users to understand. What do you think?

-suyash


poldrack · 2016-09-18T16:50:53Z

agreed
rp


yarikoptic mentioned this issue Sep 15, 2016

ds000201 -- missing , after a name in dataset_description.json #6

Closed

yarikoptic mentioned this issue Sep 15, 2016

need to implement overlays in handling versioned files datalad/datalad#831

Closed

yarikoptic mentioned this issue Sep 17, 2016

investigate and make use of openfmri API datalad/datalad#849

Closed

2 tasks

vsoch mentioned this issue Sep 18, 2016

please come up with "overlay" versioning scheme (e.g. in ds 201) #9 poldracklab/open_fmri#26

Open

yarikoptic mentioned this issue Jan 11, 2018

Flag incorrect dataset contents datalad/datalad#2069

Closed
