v0.4.3
Zilong-Li committed Mar 26, 2024
1 parent 84a500b commit 80b2285
Showing 4 changed files with 61 additions and 23 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: vcfppR
Title: Rapid Manipulation of the Variant Call Format (VCF)
Version: 0.4.2
Version: 0.4.3
Authors@R: c(
person("Zilong", "Li", , "[email protected]", role = c("aut", "cre"),
comment = c(ORCID = "0000-0001-5859-2078")),
35 changes: 26 additions & 9 deletions README.Rmd
@@ -32,11 +32,20 @@ The vcfppR package implements various useful functions for rapidly manipulating
remotes::install_github("Zilong-Li/vcfppR") ## from latest github
```

## vcftable: read VCF as tabular data
## Cite the work

If you find it useful, please cite the [paper](https://doi.org/10.1093/bioinformatics/btae049)

``` r
library(vcfppR)
citation("vcfppR")
```

## `vcftable`: read VCF as tabular data

`vcftable` gives you fine control over what you want to extract from VCF/BCF files.

Read only SNP variants:
**Read only SNP variants**

```r
library(vcfppR)
@@ -45,42 +54,50 @@ res <- vcftable(vcffile, "chr21:1-5100000", vartype = "snps")
str(res)
```

Read only SNP variants with PL format and drop the INFO column in the VCF/BCF:
**Read only SNP variants with PL format and drop the INFO column in the VCF/BCF**

```r
vcffile <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_raw_GT_with_annot/20201028_CCDG_14151_B01_GRM_WGS_2020-08-05_chr21.recalibrated_variants.vcf.gz"
res <- vcftable(vcffile, "chr21:1-5100000", vartype = "snps", format = "PL", info = FALSE)
str(res)
```

Read only indels variants with DP format in the VCF/BCF:
**Read only INDEL variants with DP format in the VCF/BCF**

```r
vcffile <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_raw_GT_with_annot/20201028_CCDG_14151_B01_GRM_WGS_2020-08-05_chr21.recalibrated_variants.vcf.gz"
res <- vcftable(vcffile, "chr21:1-5100000", vartype = "indels", format = "DP")
str(res)
```
## vcfcomp: compare two VCF files and report concordance statistics
## `vcfcomp`: compare two VCF files and report concordance

Want to investigate the concordance between two VCF files? `vcfcomp` is the utility function you need!

**Genotype correlation**

```r
vcffile <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/1kGP_high_coverage_Illumina.chr21.filtered.SNV_INDEL_SV_phased_panel.vcf.gz"
res <- vcfcomp(test = vcffile, truth = vcffile, region = "chr21:1-5100000", stats = "r2", format = c('GT','GT'))
as.data.frame(res)
res <- vcfcomp(test = vcffile, truth = vcffile, region = "chr21:1-5100000", stats = "r2", formats = c('GT','GT'))
str(res)
```
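
As a rough illustration of the statistic that `stats = "r2"` refers to, the sketch below computes a squared Pearson correlation in base R on made-up dosage vectors. The vectors `truth` and `test` are hypothetical, and the exact computation inside `vcfcomp` may differ.

```r
# Hypothetical ALT-allele dosages (0, 1 or 2) for six samples at one site.
# These values are invented for illustration and do not come from any VCF.
truth <- c(0, 1, 2, 1, 0, 2)
test  <- c(0, 1, 2, 2, 0, 1)

# Squared Pearson correlation between the test and truth dosages.
r2 <- cor(test, truth)^2
print(r2)  # 0.5625
```

Two identical vectors give a value of 1, which is what this sketch would return when the test and truth genotypes agree everywhere.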

**Genotype F1 score**

```r
vcffile <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/1kGP_high_coverage_Illumina.chr21.filtered.SNV_INDEL_SV_phased_panel.vcf.gz"
res <- vcfcomp(test = vcffile, truth = vcffile, region = "chr21:1-5100000", stats = "f1", format = c('GT','GT'))
res <- vcfcomp(test = vcffile, truth = vcffile, region = "chr21:1-5100000", stats = "f1")
str(res)
```

**Genotype Non-Reference Concordance**

```r
vcffile <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/1kGP_high_coverage_Illumina.chr21.filtered.SNV_INDEL_SV_phased_panel.vcf.gz"
res <- vcfcomp(test = vcffile, truth = vcffile, region = "chr21:1-5100000", stats = "nrc")
str(res)
```
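
For orientation, non-reference concordance is commonly defined as the number of matching non-reference genotypes divided by that number plus all mismatches, so matching homozygous-reference calls are ignored. The sketch below follows that common definition on made-up genotypes. The helper `nrc_sketch` and its inputs are hypothetical, and `vcfcomp` may compute the statistic differently.

```r
# Hypothetical helper, not part of vcfppR: non-reference concordance
# under the common definition (matching 0/0 calls are excluded).
nrc_sketch <- function(test, truth) {
  stopifnot(length(test) == length(truth))
  mismatches    <- sum(test != truth)              # any discordant genotype
  nonref_agree  <- sum(test == truth & truth > 0)  # concordant het or hom-alt
  nonref_agree / (mismatches + nonref_agree)       # NaN if both counts are 0
}

# Invented genotypes coded as ALT-allele counts (0, 1 or 2).
truth <- c(0, 1, 2, 1, 0, 2)
test  <- c(0, 1, 2, 2, 0, 1)

nrc_value <- nrc_sketch(test, truth)
print(nrc_value)  # 0.5
```

Here two non-reference genotypes agree and two genotypes disagree, giving 2 / (2 + 2).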

## vcfsummary: variants characterization
## `vcfsummary`: variants characterization

Want to summarize variants discovered by genotype caller e.g. GATK? `vcfsummary` is the utility function you need!

42 changes: 30 additions & 12 deletions README.md
@@ -24,12 +24,22 @@ manipulating VCF/BCF files in R using the C++ API of
remotes::install_github("Zilong-Li/vcfppR") ## from latest github
```

## vcftable: read VCF as tabular data
## Cite the work

If you find it useful, please cite the
[paper](https://doi.org/10.1093/bioinformatics/btae049)

``` r
library(vcfppR)
citation("vcfppR")
```

## `vcftable`: read VCF as tabular data

`vcftable` gives you fine control over what you want to extract from
VCF/BCF files.

Read only SNP variants:
**Read only SNP variants**

``` r
library(vcfppR)
@@ -38,48 +48,56 @@ res <- vcftable(vcffile, "chr21:1-5100000", vartype = "snps")
str(res)
```

Read only SNP variants with PL format and drop the INFO column in the
VCF/BCF:
**Read only SNP variants with PL format and drop the INFO column in the
VCF/BCF**

``` r
vcffile <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_raw_GT_with_annot/20201028_CCDG_14151_B01_GRM_WGS_2020-08-05_chr21.recalibrated_variants.vcf.gz"
res <- vcftable(vcffile, "chr21:1-5100000", vartype = "snps", format = "PL", info = FALSE)
str(res)
```

Read only indels variants with DP format in the VCF/BCF:
**Read only INDEL variants with DP format in the VCF/BCF**

``` r
vcffile <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_raw_GT_with_annot/20201028_CCDG_14151_B01_GRM_WGS_2020-08-05_chr21.recalibrated_variants.vcf.gz"
res <- vcftable(vcffile, "chr21:1-5100000", vartype = "indels", format = "DP")
str(res)
```

## vcfcomp: compare two VCF files and report concordance statistics
## `vcfcomp`: compare two VCF files and report concordance

Want to investigate the concordance between two VCF files? `vcfcomp` is
the utility function you need\!
the utility function you need!

**Genotype correlation**

``` r
vcffile <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/1kGP_high_coverage_Illumina.chr21.filtered.SNV_INDEL_SV_phased_panel.vcf.gz"
res <- vcfcomp(test = vcffile, truth = vcffile, region = "chr21:1-5100000", stats = "r2", format = c('GT','GT'))
as.data.frame(res)
res <- vcfcomp(test = vcffile, truth = vcffile, region = "chr21:1-5100000", stats = "r2", formats = c('GT','GT'))
str(res)
```

**Genotype F1 score**

``` r
vcffile <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/1kGP_high_coverage_Illumina.chr21.filtered.SNV_INDEL_SV_phased_panel.vcf.gz"
res <- vcfcomp(test = vcffile, truth = vcffile, region = "chr21:1-5100000", stats = "f1", format = c('GT','GT'))
res <- vcfcomp(test = vcffile, truth = vcffile, region = "chr21:1-5100000", stats = "f1")
str(res)
```

**Genotype Non-Reference Concordance**

``` r
vcffile <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/1kGP_high_coverage_Illumina.chr21.filtered.SNV_INDEL_SV_phased_panel.vcf.gz"
res <- vcfcomp(test = vcffile, truth = vcffile, region = "chr21:1-5100000", stats = "nrc")
str(res)
```

## vcfsummary: variants characterization
## `vcfsummary`: variants characterization

Want to summarize variants discovered by genotype caller e.g. GATK?
`vcfsummary` is the utility function you need\!
`vcfsummary` is the utility function you need!

**Small variants**

5 changes: 4 additions & 1 deletion cran-comments.md
@@ -1 +1,4 @@
address clang-UBSAN issue by including the latest htslib-1.19.1 (https://github.com/samtools/htslib/releases/tag/1.19)
1. address clang-UBSAN issue by including the latest htslib-1.19.1 (https://github.com/samtools/htslib/releases/tag/1.19)
2. reduce size of package
3. add copyrights and authors of htslib
4. new function `vcfcomp`
