AUDR analysis updates RFC #91
It would be good to describe that updated files are not really updated: instead, the old file is deleted and an independent new file is created, and to explain why. Also note that this is a short-term compromise that simplifies implementation at the expense of a more complex data model.
It appears this is no longer the case and the figure just needs to be updated.
For an updated analysis, the figure tries to show the possible scenarios of a) not changing a file, b) updating a file, c) adding a file, and d) deleting a file. But it sounds like this isn't easy to see, so I should improve the diagram?
On the content: yes, on further consideration, deleting and re-adding all the other files seems a worse idea, since it would require some special-case work and I'm not sure it ultimately helps analysis much. This is a change from what was discussed, so I need feedback from @rexwangcc and @samanehsan.
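For reference, the four scenarios the figure tries to show fall out of a simple diff between the old and new bundle manifests. A minimal sketch, assuming a manifest is just a mapping from file name to content hash (the function name and sample data below are hypothetical, not part of the RFC):

```python
def classify_files(old, new):
    """Classify files between two bundle versions into the four scenarios:
    unchanged, updated, added, deleted.

    `old` and `new` map file name -> content hash.
    """
    return {
        "unchanged": sorted(n for n in old if n in new and old[n] == new[n]),
        "updated":   sorted(n for n in old if n in new and old[n] != new[n]),
        "added":     sorted(n for n in new if n not in old),
        "deleted":   sorted(n for n in old if n not in new),
    }

old = {"a.bam": "h1", "b.json": "h2", "c.txt": "h3"}
new = {"a.bam": "h1", "b.json": "h9", "d.csv": "h4"}
classify_files(old, new)
# {'unchanged': ['a.bam'], 'updated': ['b.json'], 'added': ['d.csv'], 'deleted': ['c.txt']}
```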
@justincc, can you clarify why the old files need to be deleted in order to add analysis files that have new UUIDs?
@samanehsan Because everything in the previous version of that bundle automatically carries forward; unless the old files are explicitly deleted, they would appear in the new version alongside the new files.
@justincc I don't think the scenario is unrealistic, because it could happen if a primary submitter makes and then updates a submission even an hour or two apart. Looking at this production workflow as an example: after analysis creates a submission envelope in ingest, it takes 2 hours to upload the files and confirm the submission. So if an update came in less than 2 hours later, we would have this "simultaneous" submission problem.
Also, in this case the confirm-submission step only took 4 minutes, but the workflow can wait up to 2 hours for the submission to validate. And then there is additional time for the submission to go through ingest and appear as a results bundle in the data store. So in the worst-case scenario, "quick succession" actually means 2+ hours.
@justincc, this is the concern that @rexwangcc and I mentioned in tech-arch today ^
Yes, I think the answer here is that ingest may need to do complex engineering to ensure non-interference between submissions. Does that address the issue?
Can you clarify what that will look like in ingest? Is the idea that analysis would still specify which files we think are new/updated/deleted and then ingest confirms? That seems like we would be duplicating effort.
At the moment there's no plan for ingest to confirm anything; it just does what you say, or some known interpretation thereof. If it's an update, then the new version (which is given as the whole file, not a delta) replaces whatever the old version was. If it's an addition of a file that already exists, then ingest could just replace what is already there, if that is the easiest behaviour. And if it's a deletion of something that's already deleted, then we could ignore that, if that is the easiest behaviour.
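That "do what you say" interpretation can be sketched as a small idempotent apply step. This is only an illustration of the semantics described above, assuming a bundle is just a mapping from file name to content (the function and file names are hypothetical):

```python
def apply_operation(bundle, op, name, content=None):
    """Apply one submission operation to a bundle (dict of name -> content).

    Semantics from the comment above:
    - "update" replaces the old version wholesale (the file, not a delta)
    - "add" of a file that already exists also just replaces it
    - "delete" of an already-deleted file is silently ignored
    """
    if op in ("add", "update"):
        bundle[name] = content       # replace whatever was there, if anything
    elif op == "delete":
        bundle.pop(name, None)       # no-op if the file is already gone
    else:
        raise ValueError(f"unknown operation: {op}")
    return bundle

bundle = {"results.bam": "v1"}
apply_operation(bundle, "update", "results.bam", "v2")   # replaces v1
apply_operation(bundle, "add", "results.bam", "v3")      # add-of-existing: replaces v2
apply_operation(bundle, "delete", "metadata.json")       # delete-of-missing: ignored
```

Since each operation overwrites or removes state rather than asserting what was there before, replaying the same operation twice leaves the bundle unchanged, which is what makes this interpretation forgiving of out-of-date submitter assumptions.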