Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

massively sped up combineTCR with Rcpp #276

Merged
merged 7 commits into from
Nov 10, 2023
Merged

Conversation

Qile0317
Copy link
Collaborator

@Qile0317 Qile0317 commented Nov 8, 2023

combineTCR is one of the most important functions but it's kinda slow, running originally in ~ 24 seconds for contig_list on my device. I used the profvis profiling library to see the speed bottlenecks, and see that .parseTCR is the main time consumer. Looking deeper into the function, this PR solves one of them - finding the indicies of barcodes of Con.df$barcode of length N that are in data2$barcode of length M, by constructing a barcode index in O(N+M) time in Rcpp. This has cut the runtime down to ~17 seconds. I'm working on fully re-implementing .parseTCR in Rcpp atm to see if theres another time increase.

@Qile0317
Copy link
Collaborator Author

Qile0317 commented Nov 8, 2023

a new rcppConstructBarcodeIndex function was the main addition, which def should be able to be used by combineBCR as well but that needs some testing first before the transition to ensure nothing breaks.

@Qile0317 Qile0317 closed this Nov 9, 2023
@Qile0317
Copy link
Collaborator Author

Qile0317 commented Nov 9, 2023

Just implemented the parseTCR step fully in Rcpp but there are some critical errors I'm trying to fix. Although running the combineContigs test file seems faster now so thats a good sign

@Qile0317
Copy link
Collaborator Author

Qile0317 commented Nov 9, 2023

Ok, with the latest commit the segmentation fault cause has been figured out. I've added one small testcase but for the most part I believe the implementation is completely correct as past testcases on combineTCR are still passing. And the runtime for combineTCR has been massively reduced from 17 seconds to about 0.5 seconds! This can also be pretty easily implemented to combineBCR but testdata is needed.

@Qile0317 Qile0317 reopened this Nov 9, 2023
@Qile0317 Qile0317 changed the title sped up combineTCR by modifying .parseTCR massively sped up combineTCR with Rcpp Nov 9, 2023
@Qile0317
Copy link
Collaborator Author

Qile0317 commented Nov 9, 2023

With the latest commit 50ac737 I just re-implemented the barcode indexing for parseBCR temporarily, but testing is definetely needed, and the testdata should be generated with the old version. If you think this is unsound feel free to undo all the changes related to .parseBCR but otherwise I still advocate for the combineTCR changes

@ncborcherding ncborcherding merged commit f88e4d9 into BorchLab:v2 Nov 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants