Commit

Add Representative Sequence article
ramiromagno committed Jul 4, 2024
1 parent d442811 commit e872c9d
Showing 7 changed files with 95 additions and 7 deletions.
10 changes: 9 additions & 1 deletion R/report_source.R
@@ -42,8 +42,16 @@ find_report_last_modified <- function(file) {

#' Report last modification date
#'
#' @description
#'
#' [report_last_modified()] returns the last modified date and time of the
#' report source (local file or remote file).
#' report source: local file or remote file. If a local file, the modification
#' date will be that indicated by the file system; if a remote file, the date
#' of last update is that provided by HTTP header `"last-modified"`.
#'
#' MGI updates its reports weekly, every Thursday. However, not all reports are
#' updated each week. The return value of this function is the closest you will
#' get to a versioning of MGI report files.
#'
#' @param tbl Report data as a [tibble][tibble::tibble-package].
#'
9 changes: 8 additions & 1 deletion README.Rmd
@@ -53,7 +53,11 @@ Use `read_report()` to read any supported MGI report into R, e.g. to
read `MRK_List1.rpt`:

```{r}
read_report("marker_list1", n_max = 10L)
(markers <- read_report("marker_list1", n_max = 10L))
# Report file source
report_source(markers)
# Report file last modification date
report_last_modified(markers)
```

## Code of Conduct
@@ -72,6 +76,9 @@ package thoroughly before relying on it in critical applications. The authors
disclaim all liability for any damage or loss resulting from the use of this
package. Use of the `{mgi.report.reader}` package is at the user's own risk.

Support for reports is an ongoing process, but we welcome pull requests for
quicker coverage.

## Citing this package

- Firstly, if you use this package please do not forget to start by citing the
17 changes: 16 additions & 1 deletion README.md
@@ -60,7 +60,7 @@ Use `read_report()` to read any supported MGI report into R, e.g. to
read `MRK_List1.rpt`:

``` r
read_report("marker_list1", n_max = 10L)
(markers <- read_report("marker_list1", n_max = 10L))
#> # A tibble: 10 × 15
#> marker_status marker_type marker_id marker_symbol marker_name feature_type
#> <fct> <fct> <chr> <chr> <chr> <fct>
@@ -79,6 +79,18 @@ read_report("marker_list1", n_max = 10L)
#> # marker_symbol_now <chr>, note <chr>
```

``` r
# Report file source
report_source(markers)
#> [1] "https://www.informatics.jax.org/downloads/reports/MRK_List1.rpt"
```

``` r
# Report file last modification date
report_last_modified(markers)
#> [1] "2024-07-01 11:51:02 GMT"
```
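
For a remote report, the date shown above is taken from the HTTP `last-modified` response header. The following is a minimal sketch of how such a header could be read with base R. It is not necessarily how `{mgi.report.reader}` does it, and `parse_last_modified()` is a hypothetical helper introduced only for illustration. Parsing the month abbreviation with `%b` assumes an English or C locale.

```r
# Hypothetical helper (not part of {mgi.report.reader}): extract the
# "last-modified" value from a character vector of raw HTTP header lines,
# such as the one returned by base R's curlGetHeaders(url).
parse_last_modified <- function(headers) {
  # HTTP header names are case-insensitive
  line <- grep("^last-modified:", headers, ignore.case = TRUE, value = TRUE)
  if (length(line) == 0L) {
    return(as.POSIXct(NA))
  }
  # Keep the value after the first colon; use the last match in case
  # redirects produced several header blocks
  value <- trimws(sub("^[^:]*:", "", line[length(line)]))
  # Drop the leading weekday, e.g. "Mon, "
  value <- sub("^[A-Za-z]+,\\s*", "", value)
  # Assumes an English/C locale for the month abbreviation (%b)
  as.POSIXct(value, format = "%d %b %Y %H:%M:%S", tz = "GMT")
}

# Example with literal header lines (no network access needed)
headers <- c(
  "HTTP/1.1 200 OK\r\n",
  "Last-Modified: Mon, 01 Jul 2024 11:51:02 GMT\r\n"
)
parse_last_modified(headers)
```

With network access, the same helper could be applied to `curlGetHeaders()` called on the URL returned by `report_source(markers)`.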

## Code of Conduct

Please note that the `{mgi.report.reader}` project is released with a
@@ -97,6 +109,9 @@ before relying on it in critical applications. The authors disclaim all
liability for any damage or loss resulting from the use of this package.
Use of the `{mgi.report.reader}` package is at the user’s own risk.

Support for reports is an ongoing process, but we welcome pull requests
for quicker coverage.

## Citing this package

- Firstly, if you use this package please do not forget to start by
2 changes: 2 additions & 0 deletions _pkgdown.yml
@@ -14,6 +14,8 @@ navbar:
menu:
- text: Genetic Marker
href: articles/genetic_marker.html
- text: Representative Genomic Sequence
href: articles/representative_sequence.html
articles:
text: Reports
menu:
8 changes: 7 additions & 1 deletion man/report_last_modified.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 5 additions & 3 deletions vignettes/articles/biotype_conflicts.Rmd
@@ -23,9 +23,11 @@ source of the classification is indicated in the `database` variable. The

## MGI Representative Gene Model

The variable `is_mgi_rep` stands for _is MGI representative_ and is encoded as a
logical vector that indicates whether the corresponding `gene_id` and `biotype`
values are the ones adopted by MGI as representative for the genetic marker.
The variable `is_mgi_rep` stands for _is the MGI representative genomic
sequence_ and is encoded as a logical vector that indicates whether the
corresponding `gene_id` and `biotype` values are the ones associated with the
MGI representative sequence. See `vignette("representative_sequence")` for more
details.

```{r}
biotype_conflicts |>
48 changes: 48 additions & 0 deletions vignettes/articles/representative_sequence.Rmd
@@ -0,0 +1,48 @@
---
title: "Representative Genomic Sequence"
---

```{r setup, echo=FALSE}
library(mgi.report.reader)
```

In MGI, selecting a representative genomic sequence is crucial, as it influences the
representative transcript and protein sequences. Priorities for selecting
representative genomic sequences include gene model sequences from Ensembl,
NCBI, and VISTA annotations. In MGI, gene model sequences define the genomic
region using `start` and `end` coordinates from providers (`source`), including
regions defined by regulatory feature providers.

## MGI representative genomic sequence

For protein-coding genes, noncoding RNA genes, and pseudogenes, the
representative genomic sequence is typically chosen from Ensembl or NCBI gene
models. If both providers (`source`) offer gene models for a feature, the
shorter model is selected to avoid extended read-through transcripts. In the
absence of gene models, the longest associated GenBank genomic sequence is
chosen. For regulatory regions, the gene model from Ensembl, NCBI, or VISTA is
selected, with NCBI models preferred for enhancers when available.
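
The selection rule for genes and pseudogenes described above can be sketched in R as follows. This is only an illustration under stated assumptions, not MGI's actual procedure nor part of `{mgi.report.reader}`: `pick_representative_genomic()` and the `candidates` data frame (columns `source`, `start`, `end`) are hypothetical, and the special handling of regulatory regions is left out.

```r
# Hypothetical helper: choose a representative genomic sequence among
# candidate sequences for one feature.
# `candidates` is an assumed data frame with one row per candidate:
#   source: provider name, e.g. "Ensembl", "NCBI" or "GenBank"
#   start, end: genomic coordinates of the candidate
pick_representative_genomic <- function(candidates) {
  len <- candidates$end - candidates$start + 1
  is_gene_model <- candidates$source %in% c("Ensembl", "NCBI")
  if (any(is_gene_model)) {
    # Gene models available: prefer the shorter one, to avoid
    # extended read-through transcripts
    idx <- which(is_gene_model)
    chosen <- idx[which.min(len[idx])]
  } else {
    # No gene models: fall back to the longest GenBank genomic sequence
    idx <- which(candidates$source == "GenBank")
    chosen <- idx[which.max(len[idx])]
  }
  candidates[chosen, , drop = FALSE]
}

# Made-up coordinates: the NCBI model is shorter, so it is selected
candidates <- data.frame(
  source = c("Ensembl", "NCBI"),
  start = c(100, 120),
  end = c(900, 700)
)
pick_representative_genomic(candidates)
```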

Whether a sequence is considered representative is indicated by the variable
`is_mgi_rep`. For an example of its use with the MGI_BioTypeConflict.rpt
report, see `vignette("biotype_conflicts")`.

## MGI representative transcript and protein sequences

Representative transcript and protein sequences are selected algorithmically
based on the representative genomic sequence. If the genomic sequence is from
Ensembl, the longest Ensembl protein and corresponding transcript are chosen. If
it is not from Ensembl, the longest transcript from the genomic gene model
provider is selected, and, if coding, the longest associated protein from a
provider hierarchy is chosen. If the representative genomic sequence is not a
gene model from an annotation provider, both transcript and protein sequences
(if coding) are selected from provider (`source`) hierarchies:

- **Transcript hierarchy**: Longest of NM RefSeq > NR RefSeq > GenBank non-EST RNA >
XM RefSeq > XR RefSeq > GenBank EST RNA.

- **Protein hierarchy**: Longest of SWISS-PROT > RefSeq NP > TrEMBL > RefSeq XP.
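
One way to read these hierarchies is: take the highest-ranked provider that has any sequence, then pick the longest sequence from that provider. The sketch below implements that reading in R. It is an interpretation for illustration only, and `pick_by_hierarchy()` and the `sequences` data frame (columns `id`, `source`, `length`) are hypothetical names, not part of `{mgi.report.reader}`.

```r
# Provider hierarchies as listed above, highest priority first
transcript_hierarchy <- c(
  "NM RefSeq", "NR RefSeq", "GenBank non-EST RNA",
  "XM RefSeq", "XR RefSeq", "GenBank EST RNA"
)
protein_hierarchy <- c("SWISS-PROT", "RefSeq NP", "TrEMBL", "RefSeq XP")

# Hypothetical helper: pick the longest sequence from the highest-ranked
# provider present in `sequences` (assumed columns: id, source, length).
pick_by_hierarchy <- function(sequences, hierarchy) {
  rank <- match(sequences$source, hierarchy)
  # Ignore sequences whose provider is not in the hierarchy
  keep <- !is.na(rank)
  sequences <- sequences[keep, , drop = FALSE]
  rank <- rank[keep]
  if (nrow(sequences) == 0L) {
    return(sequences)
  }
  # Sort by provider rank first, then by decreasing length
  ord <- order(rank, -sequences$length)
  sequences[ord[1L], , drop = FALSE]
}

# Made-up identifiers and lengths: "tx3" is the longest NM RefSeq transcript
transcripts <- data.frame(
  id = c("tx1", "tx2", "tx3"),
  source = c("XM RefSeq", "NM RefSeq", "NM RefSeq"),
  length = c(5000, 1200, 1500)
)
pick_by_hierarchy(transcripts, transcript_hierarchy)
```

Note that under this reading the longer XM RefSeq transcript loses to a shorter NM RefSeq one, because provider rank takes precedence over length.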

## References

- Richard M Baldarelli, Cynthia L Smith, Martin Ringwald, Joel E Richardson, Carol J Bult, Mouse Genome Informatics Group, Mouse Genome Informatics: an integrated knowledgebase system for the laboratory mouse, Genetics, Volume 227, Issue 1, May 2024, iyae031. [doi:10.1093/genetics/iyae031](https://doi.org/10.1093/genetics/iyae031).
