Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Rbiom unifrac implementation instead of runUnifrac #523

Merged
merged 62 commits into from
Jun 10, 2024

Conversation

thpral
Copy link
Collaborator

@thpral thpral commented Apr 18, 2024

This PR is related to issue #336.
It replaces the function runUnifrac with the unifrac implementation from rbiom.

The rbiom implementation does not require to aggregate the community matrix based on nodeLab, thus the calculateUnifrac method for TreeSummarizedExperiment becomes simpler.

I have deleted the arguments transposed, BPPARAM, and normalized for now because they were used by runUnifrac and they are not handled by rbiom::unifrac. It may be possible to handle transposed while using the rbiom implementation though.

When running the following code on branch devel and this branch, I obtain the same unifrac values.

data("esophagus")
data("GlobalPatterns")
calculateUnifrac(esophagus)
calculateUnifrac(GlobalPatterns)

@antagomir
Copy link
Member

Great! It might be good to summarize also in OMA the basis for selecting this particular implementation with just 1-3 sentences, when you update relevant part there.

@thpral
Copy link
Collaborator Author

thpral commented Apr 19, 2024

Without nodeLab parameter, calculateUnifrac could not handle TreeSummarizedExperiments with tip.label values differing from their rownames. Even though the function worked well on GlobalPatterns and esophagus TSEs, it did not work for aggregated TSEs for instance.

I selected the lines from the runUnifrac function that modify the community matrix and the tree with nodeLab parameter and moved them to calculateUnifrac. Now calculateUnifrac uses nodeLab parameter and rbiom::unifrac instead of using runUnifrac.

@TuomasBorman
Copy link
Contributor

Looks nice! However, I think it would be simpler to modify runUnifrac function --> keep the beginning of the function and replace the end with rbiom::unifrac

Is that possible?

@TuomasBorman
Copy link
Contributor

Tests fail because weighted rbiom::unifrac gives different result every time (<1e-5). Check why and add tolerance to tests

@TuomasBorman
Copy link
Contributor

It seems that rownames does not atch with tip labels. They should match and can be matched. Check if that is the case by debugging the lines just before rbiom::unifrac call

@TuomasBorman
Copy link
Contributor

Also, check what is wrong with the other error (that is not even related to this PR?)

@TuomasBorman
Copy link
Contributor

Alright, now the tests should be ok. I believe the stochastic nature comes from multithreading and complex tree (loops); distances between tips can be calculated multiple ways which slightly differs. After pruning, the results are the same every time.

@TuomasBorman TuomasBorman merged commit ba42574 into devel Jun 10, 2024
3 checks passed
@TuomasBorman TuomasBorman deleted the rbiom_unifrac branch June 10, 2024 08:18
@thpral thpral mentioned this pull request Jul 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants