Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reversed exon numbers for the reverse strand #23

Open
TomekGa opened this issue Mar 5, 2024 · 0 comments
Open

Reversed exon numbers for the reverse strand #23

TomekGa opened this issue Mar 5, 2024 · 0 comments

Comments

@TomekGa
Copy link

TomekGa commented Mar 5, 2024

Hi,

I noticed that the exon numbering for the reverse strand genes does not follow the commonly used exon numbering (5' to 3'). Instead, it forces exon numbering according to the forward strand, which can lead to misunderstandings. As an example, the exon numbering of the "FBgn0010909" gene shown in Figure 3, Chapter 6 of the vignette is reversed compared to ensemble.

I have investigated this issue and found two elements of the pipeline that cause this:

  1. GenomicFeatures::exonicParts(txdb, linked.to.single.gene.only=TRUE) numbers exons solely according to the forward strand. I managed to manually reorder them with the simple self-written function shown below:
reorder.exons <- function(x){ #changing the order of the exons from the "-" strand
  output <- x  
  temp_tab <- as.data.frame(x)  
  for(i in unique(temp_tab$gene_id)){  
    if(unique(temp_tab$strand[temp_tab$gene_id == i])=="-"){  
      indexes <- which(temp_tab$gene_id == i)  
     output@elementMetadata@listData$exonic_part[indexes] <- rev(output@elementMetadata@listData$exonic_part[indexes])    
    }  
  }  
  return(output)  
}
  1. Although I reordered it manually, your DEXSeqDataSetFromSE() wrapper tries to change it back in the further steps, causing an error. I used the DEXSeqDataSet() function directly as shown below and it seems to work fine now.
DEXSeqDataSet(countData = assay(se),
                            sampleData = as.data.frame(colData(se)), 
                            design = ~sample+exon+TREATMENT:exon, 
                            featureID = sprintf("E%3.3d", mcols(se)$exonic_part), 
                            groupID =  as.character(mcols(se)$gene_id),
                            featureRanges = rowRanges(se),
                            transcripts = as.list(mcols(se)$tx_name))

Based on the definition of the DEXSeqDataSetFromSE() function, I assume that the exon numbering according to the forward strand is intentional. Is there a reason for this? Please consider implementing reverse strand support in the next version of the package.

Kind regards,
Tomek

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant