Annotate gene symbol and exon number #23

Open
komalsrathi opened this issue Jun 10, 2019 · 6 comments

@komalsrathi

Hi,

I wanted to know if there is any way to annotate gene_name and exon_number in addition to the transcript_id from the GTF file:

GTF file:

##description: evidence-based annotation of the human genome (GRCh37), version 19 (Ensembl 74)
##provider: GENCODE
##contact: [email protected]
##format: gtf
##date: 2013-12-05
##modified by GTEx_Collapsed_Gene_Model.py
1       HAVANA  gene    11869   14362   .       +       .       gene_id "ENSG00000223972.4"; transcript_id "ENSG00000223972.4"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1"; level 2; havana_gene "OTTHUMG00000000961.2";
1       HAVANA  transcript      11869   14362   .       +       .       gene_id "ENSG00000223972.4"; transcript_id "ENSG00000223972.4"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1"; level 2; havana_gene "OTTHUMG00000000961.2";
1       HAVANA  exon    11869   12227   .       +       .       gene_id "ENSG00000223972.4"; transcript_id "ENSG00000223972.4"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1"; level 2; havana_gene "OTTHUMG00000000961.2"; exon_id "ENSG00000223972.4_1; exon_number 1";
1       HAVANA  exon    12595   12721   .       +       .       gene_id "ENSG00000223972.4"; transcript_id "ENSG00000223972.4"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1"; level 2; havana_gene "OTTHUMG00000000961.2"; exon_id "ENSG00000223972.4_2; exon_number 2";
1       HAVANA  exon    12975   13052   .       +       .       gene_id "ENSG00000223972.4"; transcript_id "ENSG00000223972.4"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1"; level 2; havana_gene "OTTHUMG00000000961.2"; exon_id "ENSG00000223972.4_3; exon_number 3";

Sashimi Plot:

[Image: Screen Shot 2019-06-10 at 1 33 24 PM]

Thanks
Komal

@dgarrimar
Collaborator

Hi @komalsrathi, currently it is not possible with ggsashimi to annotate the gene name/exon number, but we will take into account your suggestion for future developments. Thanks!

@PedroBarbosa

I would like to reopen this issue just to highlight how useful and important this feature would be.

Best,
Pedro

@stale

stale bot commented Jan 21, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale (bot) added the "stale" label (Issues with no recent activity) on Jan 21, 2021
@dgarrimar added the "help wanted" label and removed the "stale" label on Jan 22, 2021

@luxeredias

I had a similar need: I wanted to show the "transcript_name" instead of the Ensembl "transcript_id" for each transcript in the annotation track of the plot. What I did was change lines 291 and 293 of the original code from:

    try:
        transcript_id = re.findall('transcript_id ("[^"]+")', tags)[0]
    except KeyError:
        print("ERROR: 'transcript_id' attribute is missing in the GTF file.")
        exit(1)

to:

    try:
        transcript_id = re.findall('transcript_name ("[^"]+")', tags)[0]
    except KeyError:
        print("ERROR: 'transcript_name' attribute is missing in the GTF file.")
        exit(1)

Maybe if you do the same you'll have at least an indication of the gene you are plotting without the need to add anything else to the plot later!
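
For anyone adapting this workaround, here is a minimal sketch of a more forgiving attribute lookup. It is not part of ggsashimi; the helper names `get_attribute` and `get_exon_number` are hypothetical. One thing to be aware of: `re.findall(...)[0]` raises `IndexError` (not `KeyError`) when the attribute is absent, so a missing `transcript_name` would not reach the error message in the snippets above. The sketch returns `None` instead and can fall back to another attribute. It also assumes you want the value without the surrounding quotes, unlike the original pattern, which captures them.

```python
import re


def get_attribute(tags, key, fallback_key=None):
    """Return the unquoted value of a GTF attribute.

    Tries `key` first, then `fallback_key`; returns None if neither is present.
    (Hypothetical helper, not a ggsashimi function.)
    """
    for name in (key, fallback_key):
        if name is None:
            continue
        # \b keeps 'gene_name' from matching inside a longer attribute name
        match = re.search(r'\b%s "([^"]+)"' % re.escape(name), tags)
        if match:
            return match.group(1)
    return None


def get_exon_number(tags):
    """Return exon_number as an int, or None if the attribute is missing.

    Accepts both `exon_number 1;` and `exon_number "1";`, and also the form
    seen in the GTF above, where exon_number sits inside the exon_id string.
    """
    match = re.search(r'\bexon_number "?(\d+)', tags)
    return int(match.group(1)) if match else None


# Attribute column of the first exon line from the GTF in this issue (shortened)
tags = ('gene_id "ENSG00000223972.4"; transcript_id "ENSG00000223972.4"; '
        'gene_name "DDX11L1"; transcript_name "DDX11L1"; '
        'exon_id "ENSG00000223972.4_1; exon_number 1";')

label = get_attribute(tags, "transcript_name", fallback_key="transcript_id")
gene = get_attribute(tags, "gene_name")
exon = get_exon_number(tags)
```

With a fallback like this, transcripts that lack `transcript_name` would still be labelled with their `transcript_id` rather than aborting the plot.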

@dgarrimar
Collaborator

Thanks @luxeredias, that is indeed a possibility for alternative transcript names. With respect to gene names, as pointed out originally by @komalsrathi and @PedroBarbosa, the question is where to add the gene name. For instance, when plotting a region that contains several genes, it can be difficult to place the gene label in the right place. Suggestions and PRs are welcome! As for exon numbers, I think they could be added in a relatively straightforward way if users are interested.
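
To make the two points above concrete, here is a rough sketch of one possible approach. It is not how ggsashimi works: both functions are hypothetical, and the placement rule (centre the label on the part of the gene that is visible in the plotted region) is just one assumption about what "the right place" could mean. Exon numbers are derived from coordinates and strand, which could serve as a fallback when the GTF has no usable `exon_number` attribute.

```python
def number_exons(exons, strand):
    """Number a transcript's exons 5'->3'.

    exons:  list of (start, end) tuples for one transcript
    strand: '+' or '-'
    Returns a list of (start, end, exon_number) in transcription order.
    """
    # On the minus strand the first exon is the one with the largest coordinates
    ordered = sorted(exons, reverse=(strand == "-"))
    return [(start, end, i) for i, (start, end) in enumerate(ordered, start=1)]


def gene_label_position(gene_start, gene_end, region_start, region_end):
    """Return the x coordinate for a gene label, or None if the gene is not visible.

    The label is centred on the overlap between the gene and the plotted region,
    so genes that extend past the plot edges still get a label inside the plot.
    """
    visible_start = max(gene_start, region_start)
    visible_end = min(gene_end, region_end)
    if visible_start > visible_end:
        return None
    return (visible_start + visible_end) / 2.0


# Exons of DDX11L1 from the GTF in this issue, given out of order on purpose
exons = [(12595, 12721), (11869, 12227), (12975, 13052)]
numbered_plus = number_exons(exons, "+")
numbered_minus = number_exons(exons, "-")

# Gene spans 11869-14362; suppose the plotted region is 12000-13000
label_x = gene_label_position(11869, 14362, 12000, 13000)
```

With several genes in one region, each gene would get its own label position this way, although overlapping genes would still need some vertical offset to stay readable.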

@bdgsilva

Hi @dgarrimar,
I just wanted to say that I would be very interested in having exon numbers in the annotation as well. I think it would be a great addition, not least because, to the best of my knowledge, no other tool offers that feature so far.
