Merge pull request #51 from cmatKhan/develop
adding citations and updating changelog
Showing 2 changed files with 49 additions and 0 deletions.
@@ -0,0 +1,32 @@
# 0.3.0

This is the result of the 2023 hackathon. Current features:

1. The [Coordinate submodule](src/isocomp/Coordinates/) creates windows based
on overlapping transcripts in the concatenated set of input sequences.
2. The [Compare submodule](src/isocomp/Compare/) outputs unique transcripts.
It checks, in order: whether a transcript is the only one in a given overlap
bin; whether transcripts have exactly the same start/end points; and, for
transcripts that do share the same start/end, a pairwise comparison.
3. The command line tools function. On a 16-core machine on DNAnexus, runtime
is ~15 minutes with less than 7 GB.

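The windowing and comparison steps listed above can be sketched roughly as follows. This is a minimal illustration, not the isocomp implementation: the `Transcript` record and the functions `build_windows` and `unique_transcripts` are hypothetical names, coordinates are assumed to be closed intervals (as in GTF), and exact string equality stands in for the pairwise sequence comparison (the project lists edlib as a dependency, which could be used at that step instead).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical record; the field names are assumptions for this sketch."""
    tx_id: str
    chrom: str
    start: int  # closed coordinates, as in GTF
    end: int
    seq: str


def build_windows(transcripts):
    """Group transcripts into windows of chained overlaps, per chromosome."""
    windows = []
    current, current_chrom, current_end = [], None, None
    for t in sorted(transcripts, key=lambda t: (t.chrom, t.start, t.end)):
        if current and t.chrom == current_chrom and t.start <= current_end:
            # Overlaps the running window: extend it.
            current.append(t)
            current_end = max(current_end, t.end)
        else:
            # Gap or new chromosome: close the window and start a new one.
            if current:
                windows.append(current)
            current, current_chrom, current_end = [t], t.chrom, t.end
    if current:
        windows.append(current)
    return windows


def unique_transcripts(window):
    """Apply the three checks in order; return transcripts judged unique."""
    # Check 1: a transcript alone in its window has nothing to match.
    if len(window) == 1:
        return list(window)
    # Check 2: bucket transcripts by exact (start, end).
    by_coords = {}
    for t in window:
        by_coords.setdefault((t.start, t.end), []).append(t)
    unique = []
    for group in by_coords.values():
        if len(group) == 1:
            unique.extend(group)
            continue
        # Check 3: pairwise comparison among transcripts sharing start/end.
        # Assumption: exact sequence equality stands in for an alignment.
        for t in group:
            if not any(t.seq == other.seq for other in group if other is not t):
                unique.append(t)
    return unique
```

Sorting by chromosome and start lets the windows be built in a single pass, and bucketing by `(start, end)` keeps the pairwise step limited to transcripts that already agree on their endpoints.
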
## Caveats

1. The output should be considered an intermediate result. It is unparsed and
not immediately useful as is. However, there is good information there.

2. We are not conducting exon-level coordinate matching on the transcripts. We
are therefore doing sequence comparison on transcripts which are not actually
the same (e.g., transcripts from the same individual with different exon usage),
and we are not reporting the wealth of information that we could extract from
the interval data alone.

## Future directions

1. The Coordinate submodule should create an interval tree structure from the
input GTF files using exon coordinates. Exons should be labelled with the
transcript and gene IDs.
2. The interval tree can then be used to compare intervals more finely and to
label different TSS/TTS, exon usage, intron retention, etc.
3. The output format(s) must be refined.

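The interval tree proposed above might look something like the sketch below. It is only an illustration of the idea under stated assumptions, not planned isocomp code: `IntervalNode`, `exons_from_gtf`, and `build_trees` are hypothetical names, the tree is a simple centered interval tree written in plain Python, and the parser assumes standard nine-column, tab-separated GTF lines with `gene_id` and `transcript_id` attributes.

```python
import re
from collections import defaultdict

# GTF attributes look like: gene_id "G1"; transcript_id "T1";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


class IntervalNode:
    """Centered interval tree over closed (start, end, label) intervals."""

    def __init__(self, intervals):
        points = sorted(p for s, e, _ in intervals for p in (s, e))
        self.center = points[len(points) // 2]
        # Intervals containing the center stay at this node.
        self.here = [iv for iv in intervals if iv[0] <= self.center <= iv[1]]
        left = [iv for iv in intervals if iv[1] < self.center]
        right = [iv for iv in intervals if iv[0] > self.center]
        self.left = IntervalNode(left) if left else None
        self.right = IntervalNode(right) if right else None

    def overlapping(self, start, end):
        """Return all stored intervals that overlap the closed range."""
        hits = [iv for iv in self.here if iv[0] <= end and start <= iv[1]]
        if self.left and start < self.center:
            hits += self.left.overlapping(start, end)
        if self.right and end > self.center:
            hits += self.right.overlapping(start, end)
        return hits


def exons_from_gtf(lines):
    """Collect exon intervals per chromosome, labelled (gene_id, transcript_id)."""
    by_chrom = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        label = (attrs.get("gene_id"), attrs.get("transcript_id"))
        by_chrom[fields[0]].append((int(fields[3]), int(fields[4]), label))
    return by_chrom


def build_trees(lines):
    """Build one interval tree per chromosome from GTF exon records."""
    return {chrom: IntervalNode(ivs) for chrom, ivs in exons_from_gtf(lines).items()}
```

Querying a region with `trees[chrom].overlapping(start, end)` returns the labelled exons that overlap it. A region covered by an exon of one transcript but falling in an intron of another is the kind of signal that could feed exon-usage or intron-retention labels.
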
@@ -0,0 +1,17 @@
# If you use this repo, please cite:

> Qiu, Y., Liew, C. S., Mateusiak, C., Kesharwani, R., Gu, B., Raza, M. S., Biederstedt, E., Yaman, U., Al Nahid, A., Tat, T., Modha, S., & Kubica, J. (2023). Isocomp. Carnegie Mellon, University of Nebraska-Lincoln, Washington University, Baylor College of Medicine, University of Southern California, Beijing Institute of Genomics, Chinese Academy of Sciences/China National Center for Bioinformation, HMS, UK Dementia Research Institute, University College London, Shahjalal University of Science and Technology, Houston Methodist Research Institute, Theolytics Limited, University of Warsaw. https://github.com/collaborativebioinformatics/isocomp

## Significant dependencies

### BioPython

> Cock, P.J.A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 2009 Jun 1; 25(11) 1422-3 https://doi.org/10.1093/bioinformatics/btp163 pmid:19304878

### edlib

> Martin Šošić, Mile Šikić; Edlib: a C/C++ library for fast, exact sequence alignment using edit distance. Bioinformatics 2017 btw753. https://doi.org/10.1093/bioinformatics/btw753

### PyRanges

> Endre Bakken Stovner, Pål Sætrom, PyRanges: efficient comparison of genomic intervals in Python, Bioinformatics, Volume 36, Issue 3, February 2020, Pages 918–919, https://doi.org/10.1093/bioinformatics/btz615