Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AUDR analysis updates RFC #91

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

justincc
Copy link

@justincc justincc commented Jul 26, 2019

August 12: Last call for community review
August 23: Last day for oversight review

During community review, reviewers

  • Discussed requirements around updating/deleting files in envelopes through ingest api as a part of the process of updating analysis bundles
  • Small updates to wording and titles

Copy link
Contributor

@diekhans diekhans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking really good; clear and to the point. images are great. just minor updates requested.

rfcs/text/0000-audr-analysis-updates.md Show resolved Hide resolved
6. For files that are in both envelopes but don’t need updating, do nothing.

![](../images/0000-update-file-relationships.png)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be good to describe that updated files are not really updated but deleted and then an independent new file is created and why. Also that this is a short term compromise to simplify implementation at the expense of a more complex data mode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it appears this is no longer the case and the figure just needs updated.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For an updated analysis the figure tries to show the possible scenarios of a) not changing a file, b) updating a file, c) adding a file, d) deleting a file. But it sounds like this isn't easy to see and I should improve the diagram?

On the content, yes on further consideration deleting and re-adding all the other files seems a worse idea since it means some special case work and I'm not sure it ultimately helps analysis much. This is a change from what was discussed so I need feedback from @rexwangcc and @samanehsan.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justincc can you clarify why the old files need to be deleted in order to add analysis files that have new uuids?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samanehsan Because everything in the previous version of that bundle will automatically carry forward.

Copy link

@samanehsan samanehsan Aug 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you confirm that 2 analyses triggering in quick succession is a scenario that's not sufficiently theoretical for us to ignore temporarily?

@justincc I don't think the scenario is unrealistic because it could happen if a primary submitter makes/updates a submission even an hour or two apart. Looking at this production workflow as an example, after analysis creates a submission envelope in ingest it takes 2 hours to upload the files and confirm the submission. So if an update came in less than 2 hours later, we would have this "simultaneous" submission problem.

Also in this case, the confirm submission step only took 4 minutes but the workflow can wait up to 2 hours for the submission to validate. And then there is additional time for the submission to go through ingest and appear as a results bundle in the data store. So in the worst-case scenario, "quick succession" actually means 2+ hours.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justincc, this is the concern that @rexwangcc and I mentioned in tech-arch today ^

Copy link
Author

@justincc justincc Aug 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think the answer here is that ingest may need to do complex engineering to ensure non-interference between submissions. Does that address the issue?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify what that will look like in ingest? Is the idea that analysis would still specify which files we think are new/updated/deleted and then ingest confirms? That seems like we would be duplicating effort.

Copy link
Author

@justincc justincc Aug 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment there's no plan for ingest to confirm anything, it just does what you say or some known interpretation thereof. If it's an update then the new update (which is given as the file, not a delta) will replace whatever the old version was. If it's addition of a file that already exists then it could just replace what is already there if this is the easiest behaviour. Or if it's a deletion of something that's already deleted then we could ignore that if this is the easiest behaviour.

@parthshahva parthshahva self-requested a review July 26, 2019 20:28
rfcs/text/0000-audr-analysis-updates.md Outdated Show resolved Hide resolved
rfcs/text/0000-audr-analysis-updates.md Outdated Show resolved Hide resolved
mweiden added a commit that referenced this pull request Jul 30, 2019
Many people are confused about usage here:
* #91 (comment)
* #90 (comment)
* ... have observed others
mweiden added a commit that referenced this pull request Jul 31, 2019
* Hint that the title name should be replaced

Many people are confused about usage here:
* #91 (comment)
* #90 (comment)
* ... have observed others

* Update rfc-template.md

* Update rfc-template.md
rfcs/text/0000-audr-analysis-updates.md Outdated Show resolved Hide resolved
6. For files that are in both envelopes but don’t need updating, do nothing.

![](../images/0000-update-file-relationships.png)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justincc I'm a little confused as to why the old files from the original analysis bundle need to be deleted. Why would you copy forward files from the old analysis bundle to the new analysis bundle? Shouldn't the copy forward be from the new primary bundle to the new analysis bundle?

Copy link
Contributor

@diekhans diekhans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great work!

diekhans pushed a commit to barkasn/dcp-community that referenced this pull request Aug 16, 2019
* Hint that the title name should be replaced

Many people are confused about usage here:
* HumanCellAtlas#91 (comment)
* HumanCellAtlas#90 (comment)
* ... have observed others

* Update rfc-template.md

* Update rfc-template.md
@samanehsan
Copy link

@justincc there's an "rfc-paused" label you can add as well

diekhans pushed a commit to diekhans/dcp-community that referenced this pull request Oct 31, 2019
* Hint that the title name should be replaced

Many people are confused about usage here:
* HumanCellAtlas#91 (comment)
* HumanCellAtlas#90 (comment)
* ... have observed others

* Update rfc-template.md

* Update rfc-template.md
@hannes-ucsc hannes-ucsc removed their request for review May 4, 2021 00:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants