Allowing multiple values for Damage #67
Comments
Ayshin and I discussed this today in the Poseidon meeting. We considered making the |
Well, if you can't merge across libraries, then the logical conclusion would be that you end up with separate Poseidon_IDs anyway, right? Like "Ind001_ds" and "Ind001_ss" or so. If you have libraries that you can't merge, you can't put them into the same genotype calls, which is what ultimately determines the granularity in Poseidon, right? I would avoid making Damage a list field now, as we indeed cannot go back and change this for the entire PCA. Also, I think if we want to make Damage a list field per library, that argument holds for a lot of other fields too. What about genetic sex? That is theoretically a measure per library. And what about contamination? That is currently a list field, not for listing per-library values, but for values from different tools. That, too, could theoretically be a per-library measure. So I don't really know where to stop |
But in this case (paper) the genotypes were merged, between ds and ss libraries. |
Ah OK. Hmm, OK, then there is indeed no well-defined single damage rate (if the alignments were merged, one could simply recompute the damage rate of the merged alignment, but if the merge happens at the level of genotypes, that is not an option). In this case I think the conservative choice would perhaps be the minimum. One could also argue for the average... not sure. Do the two libraries have different UDG treatments? |
The combinations are: |
Oh boy, OK... Well, then honestly I think there is no blueprint how to deal with this and I think the way you proposed (using the minimum) is perhaps fine in this case. I leave it to you, @AyGhal, to close this issue if you feel it's sufficiently clear. |
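To make the two options discussed above concrete (minimum versus average), here is a minimal Python sketch. It is not part of any Poseidon tool: the function name `summarize_damage`, the library labels, and the damage rates are hypothetical and only illustrate how per-library values could be collapsed into the single value the current schema expects.

```python
from statistics import mean


def summarize_damage(per_library, strategy="min"):
    """Collapse per-library damage rates into one value (hypothetical helper).

    per_library: dict mapping a library label to its damage rate.
    strategy: "min" (the conservative option) or "mean" (the average).
    """
    if not per_library:
        raise ValueError("no damage values given")
    values = list(per_library.values())
    if strategy == "min":
        return min(values)
    if strategy == "mean":
        return mean(values)
    raise ValueError(f"unknown strategy: {strategy}")


# Illustrative numbers only, not taken from any real sample.
libraries = {"Ind001_ds": 0.12, "Ind001_ss": 0.30}

conservative = summarize_damage(libraries, "min")
average = summarize_damage(libraries, "mean")
```

With the minimum, the reported value never overstates the damage of any contributing library, which is why it reads as the conservative choice here.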
Just a note from our Poseidon Spring Cleaning: We could really use damage values per library. Eager reports damage per library, so at least at the MPI, people don't have the merged damage information. So in hindsight I'm now leaning towards @AyGhal's position here. |
I know that we wanted to keep our schema based on individuals instead of libraries. Currently, in the case of multiple libraries, we report a value from the merged read alignment.
However, we are now getting cases where there are both single- and double-stranded libraries for the same individual. We do not merge these libraries, so maybe we could allow multiple entries for this column too.
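As a rough sketch of what "multiple entries for this column" could look like on the reading side, assuming values are separated by `;` as in other Poseidon list columns and that `n/a` marks a missing value (both are assumptions here, and `parse_damage_field` is a hypothetical name):

```python
def parse_damage_field(cell, sep=";"):
    """Parse a Damage cell holding one or several values (hypothetical helper).

    Assumes ';' separates entries and 'n/a' or an empty cell means missing.
    """
    cell = cell.strip()
    if cell in ("", "n/a"):
        return []
    return [float(part) for part in cell.split(sep)]


single = parse_damage_field("0.25")        # one merged value, as today
multiple = parse_damage_field("0.12;0.30")  # one value per library
```

A cell with a single value parses the same way as before, so existing entries would not need to change under this assumption.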