Allowing multiple values for Damage #67

Open
AyGhal opened this issue Dec 6, 2023 · 7 comments

@AyGhal
Contributor

AyGhal commented Dec 6, 2023

I know that we wanted to keep our schema based on individuals instead of libraries. Currently, in the case of multiple libraries, we report a value from the merged read alignment.
However, we are now getting cases where there are both single- and double-stranded libraries for the same individual. We do not merge these libraries, so maybe we could allow multiple entries for this column too.

@nevrome
Member

nevrome commented Dec 8, 2023

Ayshin and I discussed this today in the Poseidon meeting. We considered making the Damage field a list column, but the individual libraries and the damage values could then only be matched by order across Library_Names and Damage. Multiple packages in the PCA feature damage values, but no library names. Alternatively, we considered listing the minimum damage value in case of multiple libraries, or avoiding the Damage field completely in case of multiple libraries.
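To illustrate the matching-by-order problem, here is a minimal sketch. It assumes both fields would be stored as `;`-separated strings; the function name and the example values are hypothetical and not part of any Poseidon tool.

```python
def pair_libraries_with_damage(library_names: str, damage: str, sep: str = ";") -> dict:
    """Pair library names with damage values purely by position.

    Assumption: both fields are separator-delimited strings. The pairing
    is only meaningful if both lists have the same length and order.
    """
    names = [n.strip() for n in library_names.split(sep)]
    values = [float(v) for v in damage.split(sep)]
    if len(names) != len(values):
        # Typical failure case: damage values present, library names missing.
        raise ValueError(
            f"{len(names)} library name(s) but {len(values)} damage value(s)"
        )
    return dict(zip(names, values))


# Hypothetical example values, for illustration only.
paired = pair_libraries_with_damage("Lib1_ds;Lib1_ss", "0.12;0.31")
```

A package that lists damage values but no library names would fail the length check, which is the situation described for several packages in the PCA.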

@stschiff
Member

stschiff commented Dec 8, 2023

Well, if you can't merge across libraries, then the logical conclusion would be that you would end up with separate Poseidon_IDs anyway, right? Like "Ind001_ds" and "Ind001_ss" or so.

If you have libraries that you can't merge, you can't put them into the same genotype calls, which ultimately determines the granularity in Poseidon, right?

I would avoid making Damage a list field now, as we indeed cannot go back and change this for the entire PCA. Also, I think if we want to make Damage a list field per library, that argument holds for a lot of other fields too. What about genetic sex? That is theoretically a measure per library. Also, what about contamination? That is currently a list field, though not for listing per-library values but for values from different tools. That, too, could theoretically be a per-library measure. So I don't really know where to stop.

@AyGhal
Contributor Author

AyGhal commented Dec 8, 2023

But in this case (paper) the genotypes were merged, between ds and ss libraries.
I agree with not changing the Damage field, both to avoid making changes to the PCA and because we decided to base our fields on the individual rather than the library.
That's why I suggested reporting the minimum. We have the damage as an authentication of ancient DNA reads, right? So if we have to choose one value, I would go with the minimum. What do you think?

@stschiff
Member

Ah OK. Hmm, OK, then there is no well-defined single damage rate, indeed (if the alignments were merged, one could simply recompute the damage rate of the merged alignment, but if it gets merged at the level of genotypes, that is not an option).

In this case I think indeed perhaps the conservative case would be the minimum. One could also argue for the average... not sure. Do the two libraries have different UDG-treatment?
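The two options (minimum versus average) can be compared with a small sketch. The function name and the per-library values are hypothetical; this only shows how the two summaries differ when genotypes, not alignments, were merged.

```python
from statistics import mean
from typing import Optional, Sequence


def summarise_damage(per_library_damage: Sequence[float], how: str = "min") -> Optional[float]:
    """Collapse per-library damage rates into one value per individual.

    "min" is the conservative choice discussed above; "mean" is the
    alternative. Returns None if no per-library values are available.
    """
    if not per_library_damage:
        return None
    if how == "min":
        return min(per_library_damage)
    if how == "mean":
        return mean(per_library_damage)
    raise ValueError(f"unknown summary method: {how!r}")


# Hypothetical damage rates for a ds and an ss library of one individual.
conservative = summarise_damage([0.12, 0.31], how="min")
averaged = summarise_damage([0.12, 0.31], how="mean")
```

With the minimum, the reported value can never overstate the damage of the least damaged library, which matters if the field is read as evidence for authenticity.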

@AyGhal
Contributor Author

AyGhal commented Dec 14, 2023

The combinations are:
ds_nonUDG,ss_nonUDG
ds_nonUDG,ss_halfUDG
ds_halfUDG,ss_noUDG

@stschiff
Member

stschiff commented Jan 9, 2024

Oh boy, OK... Well, then honestly I think there is no blueprint for how to deal with this, and I think the way you proposed (using the minimum) is perhaps fine in this case. I leave it to you, @AyGhal, to close this issue if you feel it's sufficiently clear.

@stschiff
Member

Just a note from our Poseidon Spring Cleaning: We could really use damage values per library. Eager reports it per library, so at least at the MPI, people don't have the merged damage information. So in hindsight I'm now leaning towards @AyGhal's position here.
