Commit
Merge pull request #51 from cmatKhan/develop
adding citations and updating changelog
cmatKhan authored Sep 1, 2023
2 parents 76e01d5 + 3735594 commit 172ffdf
Showing 2 changed files with 49 additions and 0 deletions.
32 changes: 32 additions & 0 deletions CHANGELOG
@@ -0,0 +1,32 @@
# 0.3.0

This is the result of the 2023 hackathon. Current features:

1. The [Coordinate submodule](src/isocomp/Coordinates/) creates windows based
on overlapping transcripts in the concatenated set of input sequences
2. The [Compare submodule](src/isocomp/Compare/) outputs unique transcripts.
It first checks whether a transcript is the only transcript in a given overlap
bin, next whether transcripts have exactly the same start/end points, and
finally pair-wise compares those transcripts which do share the same start/end
3. The command line tools function. On a 16-core machine on DNAnexus, runtime
is ~15 minutes, using less than 7GB of memory
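
The windowing and comparison steps listed above can be sketched roughly as follows. This is a minimal illustration and not the isocomp implementation: the `Transcript` record, `make_windows`, and `unique_transcripts` are hypothetical names, coordinates are assumed to be closed intervals, "unique" is assumed to mean "no other transcript in the window has the same start/end and sequence", and exact string equality stands in for the real pair-wise sequence comparison.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Transcript:
    # Hypothetical record; field names are illustrative, not isocomp's.
    sample: str
    chrom: str
    start: int
    end: int
    seq: str


def make_windows(transcripts):
    """Group transcripts into windows of chained overlaps, per chromosome."""
    windows = []
    ordered = sorted(transcripts, key=lambda t: (t.chrom, t.start, t.end))
    for _, group in groupby(ordered, key=lambda t: t.chrom):
        current, current_end = [], None
        for t in group:
            # Close the window when this transcript starts past its end.
            if current and t.start > current_end:
                windows.append(current)
                current, current_end = [], None
            current.append(t)
            current_end = t.end if current_end is None else max(current_end, t.end)
        if current:
            windows.append(current)
    return windows


def unique_transcripts(window):
    """Apply the three-step cascade to one window."""
    # Step 1: a transcript alone in its window is unique.
    if len(window) == 1:
        return list(window)
    unique = []
    for t in window:
        # Step 2: only transcripts with identical start/end are candidates.
        candidates = [o for o in window
                      if o is not t and (o.start, o.end) == (t.start, t.end)]
        # Step 3: pair-wise comparison (exact equality is a stand-in here).
        if not any(o.seq == t.seq for o in candidates):
            unique.append(t)
    return unique
```

Calling `make_windows` on the concatenated transcripts and then `unique_transcripts` on each window yields the per-window unique set. Swapping the equality test in step 3 for an edit-distance comparison would be the natural refinement.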

## Caveats

1. The output should be considered an intermediate result. It is unparsed and
not immediately useful to anyone. However, there is good information there.

2. We are not conducting exon level coordinate matching on the transcripts. We
are therefore doing sequence comparison on transcripts which are not actually
the same (e.g., transcripts from the same individual with different exon usage),
and we are not reporting the wealth of information that we could report using
the interval data alone.

## Future directions

1. The Coordinate submodule should create an interval tree structure from the
input GTF files using exon coordinates. Exons should be labelled with the
transcript and gene IDs
2. The interval tree can then be used to more finely compare intervals and
label different TSS/TTS, exon usage, intron retention, etc.
3. The output format(s) must be refined
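
As a rough illustration of the exon-level labelling proposed above, the sketch below compares two transcripts given as sorted lists of exon `(start, end)` pairs. It is only a sketch under stated assumptions: it uses plain list comparison instead of an interval tree, assumes closed coordinates and forward-strand transcripts (so the first exon start is the TSS), and the function names and label strings are hypothetical.

```python
def introns(exons):
    """Return the gaps between consecutive exons as (start, end) pairs."""
    return [(left[1], right[0]) for left, right in zip(exons, exons[1:])]


def overlaps(a, b):
    """True if closed intervals a and b share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def label_differences(exons_a, exons_b):
    """Label how two forward-strand transcripts differ at the exon level."""
    labels = set()
    if exons_a[0][0] != exons_b[0][0]:
        labels.add("different_TSS")
    if exons_a[-1][1] != exons_b[-1][1]:
        labels.add("different_TTS")
    for first, second in ((exons_a, exons_b), (exons_b, exons_a)):
        # An exon spanning a whole intron of the other transcript
        # suggests intron retention.
        if any(e[0] <= i[0] and e[1] >= i[1]
               for e in first for i in introns(second)):
            labels.add("intron_retention")
        # An exon with no overlapping partner suggests different exon usage.
        if any(not any(overlaps(e, o) for o in second) for e in first):
            labels.add("different_exon_usage")
    return labels
```

An empty set means the two exon chains raised none of these labels. An interval tree would matter once each transcript has to be compared against many others, where the pair-wise scan here becomes the bottleneck.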
17 changes: 17 additions & 0 deletions CITATIONS.md
@@ -0,0 +1,17 @@
# If you use this repo, please cite:

> Qiu, Y., Liew, C. S., Mateusiak, C., Kesharwani, R., Gu, B., Raza, M. S., Biederstedt, E., Yaman, U., Al Nahid, A., Tat, T., Modha, S., & Kubica, J. (2023). Isocomp. Carnegie Mellon, University of Nebraska-Lincoln, Washington University, Baylor College of Medicine, University of Southern California, Beijing Institute of Genomics, Chinese Academy of Sciences/China National Center for Bioinformation, HMS, UK Dementia Research Institute, University College London, Shahjalal University of Science and Technology, Houston Methodist Research Institute, Theolytics Limited, University of Warsaw. https://github.com/collaborativebioinformatics/isocomp

## Significant dependencies

### BioPython

> Cock, P.J.A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 2009 Jun 1; 25(11) 1422-3 https://doi.org/10.1093/bioinformatics/btp163 pmid:19304878

### edlib

> Martin Šošić, Mile Šikić; Edlib: a C/C++ library for fast, exact sequence alignment using edit distance. Bioinformatics 2017 btw753. doi: 10.1093/bioinformatics/btw753

### PyRanges

> Endre Bakken Stovner, Pål Sætrom, PyRanges: efficient comparison of genomic intervals in Python, Bioinformatics, Volume 36, Issue 3, February 2020, Pages 918–919, https://doi.org/10.1093/bioinformatics/btz615
