Skip to content

Commit

Permalink
add vcfwriter$copyHeader and update readme
Browse files Browse the repository at this point in the history
  • Loading branch information
Zilong-Li committed Jan 8, 2024
1 parent f9628a8 commit a167537
Show file tree
Hide file tree
Showing 7 changed files with 50 additions and 20 deletions.
2 changes: 2 additions & 0 deletions R/RcppExports.R
Original file line number Diff line number Diff line change
Expand Up @@ -177,6 +177,8 @@ tableFormat <- function(vcffile, region, samples, format, ids, qualval, pass, IN
#' \itemize{ \item Parameter: str - A string for a SAMPLE name }
#' @field addLine Add a line in the header of the vcf
#' \itemize{ \item Parameter: str - A string for a line in the header of VCF }
#' @field copyHeader Copy header of given VCF into the output VCF
#' \itemize{ \item Parameter: vcffile - A string of the VCF path}
#' @field writeline Write a variant record given a line
#' \itemize{ \item Parameter: line - A string for a line in the variant of VCF. Not ended with "newline" }
#' @field close Close and save the vcf file
Expand Down
13 changes: 6 additions & 7 deletions README.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -45,23 +45,23 @@ In addtion to two classes for R-bindings of vcfpp.h, there are some useful funct

## Read VCF as tabular data in R

This example shows how to read only SNP variants with genotypes in the VCF/BCF into R tables:
This example shows how to read only SNP variants with genotypes in the VCF/BCF into R list:
```r
library(vcfppR)
vcffile <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/1kGP_high_coverage_Illumina.chr21.filtered.SNV_INDEL_SV_phased_panel.vcf.gz"
res <- vcftable(vcffile, "chr21:1-5100000", vartype = "snps")
str(res)
```

This example shows how to read only SNP variants with PL format and drop the INFO column in the VCF/BCF into R tables:
This example shows how to read only SNP variants with PL format into R list and drop the INFO column in the VCF/BCF:

```r
vcffile <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_raw_GT_with_annot/20201028_CCDG_14151_B01_GRM_WGS_2020-08-05_chr21.recalibrated_variants.vcf.gz"
res <- vcftable(vcffile, "chr21:1-5100000", vartype = "snps", format = "PL", info = FALSE)
str(res)
```

This example shows how to read only indels variants with DP format in the VCF/BCF into R tables:
This example shows how to read only indels variants with DP format in the VCF/BCF into R list:

```r
vcffile <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_raw_GT_with_annot/20201028_CCDG_14151_B01_GRM_WGS_2020-08-05_chr21.recalibrated_variants.vcf.gz"
Expand All @@ -75,7 +75,7 @@ This example shows how to summarize small variants discovered by GATK. The data

```r
vcffile <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_raw_GT_with_annot/20201028_CCDG_14151_B01_GRM_WGS_2020-08-05_chr21.recalibrated_variants.vcf.gz"
region <- "chr21:10000000-10500000"
region <- "chr21:10000000-10010000"
res <- vcfsummary(vcffile, region)
str(res)
# get labels and do plottiing
Expand All @@ -94,10 +94,9 @@ This example shows how to summarize complex structure variants discovered by GAT

```r
svfile <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20210124.SV_Illumina_Integration/1KGP_3202.gatksv_svtools_novelins.freeze_V3.wAF.vcf.gz"
sv <- vcfsummary(svfile, svtype = TRUE)
sv <- vcfsummary(svfile, svtype = TRUE, region = "chr20")
str(sv)
allsvs <- sv$summary[-1]
bar <- barplot(allsvs, ylim = c(0, 1.1*max(allsvs)),
main = "Variant Count (all SVs)")
text(bar, allsvs+4500, paste("n: ", allsvs, sep=""))
main = "Variant Counts on chr20 (all SVs)")
```
15 changes: 7 additions & 8 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,7 @@ depend on your connection to the servers.
## Read VCF as tabular data in R

This example shows how to read only SNP variants with genotypes in the
VCF/BCF into R tables:
VCF/BCF into R list:

``` r
library(vcfppR)
Expand All @@ -54,8 +54,8 @@ res <- vcftable(vcffile, "chr21:1-5100000", vartype = "snps")
str(res)
```

This example shows how to read only SNP variants with PL format and drop
the INFO column in the VCF/BCF into R tables:
This example shows how to read only SNP variants with PL format into R
list and drop the INFO column in the VCF/BCF:

``` r
vcffile <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_raw_GT_with_annot/20201028_CCDG_14151_B01_GRM_WGS_2020-08-05_chr21.recalibrated_variants.vcf.gz"
Expand All @@ -64,7 +64,7 @@ str(res)
```

This example shows how to read only indels variants with DP format in
the VCF/BCF into R tables:
the VCF/BCF into R list:

``` r
vcffile <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_raw_GT_with_annot/20201028_CCDG_14151_B01_GRM_WGS_2020-08-05_chr21.recalibrated_variants.vcf.gz"
Expand All @@ -79,7 +79,7 @@ The data is from the 1000 Genome Project.

``` r
vcffile <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_raw_GT_with_annot/20201028_CCDG_14151_B01_GRM_WGS_2020-08-05_chr21.recalibrated_variants.vcf.gz"
region <- "chr21:10000000-10500000"
region <- "chr21:10000000-10010000"
res <- vcfsummary(vcffile, region)
str(res)
# get labels and do plottiing
Expand All @@ -99,10 +99,9 @@ discovered by GATK-SV.

``` r
svfile <- "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20210124.SV_Illumina_Integration/1KGP_3202.gatksv_svtools_novelins.freeze_V3.wAF.vcf.gz"
sv <- vcfsummary(svfile, svtype = TRUE)
sv <- vcfsummary(svfile, svtype = TRUE, region = "chr20")
str(sv)
allsvs <- sv$summary[-1]
bar <- barplot(allsvs, ylim = c(0, 1.1*max(allsvs)),
main = "Variant Count (all SVs)")
text(bar, allsvs+4500, paste("n: ", allsvs, sep=""))
main = "Variant Counts on chr20 (all SVs)")
```
3 changes: 3 additions & 0 deletions man/vcfwriter.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

15 changes: 12 additions & 3 deletions src/vcf-writer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,8 @@ using namespace std;
//' \itemize{ \item Parameter: str - A string for a SAMPLE name }
//' @field addLine Add a line in the header of the vcf
//' \itemize{ \item Parameter: str - A string for a line in the header of VCF }
//' @field copyHeader Copy header of given VCF into the output VCF
//' \itemize{ \item Parameter: vcffile - A string of the VCF path}
//' @field writeline Write a variant record given a line
//' \itemize{ \item Parameter: line - A string for a line in the variant of VCF. Not ended with "newline" }
//' @field close Close and save the vcf file
Expand Down Expand Up @@ -67,12 +69,18 @@ class vcfwriter {
const std::string& desc) {
bw.header.addFORMAT(id, number, type, desc);
}
inline void addSample(const std::string & str) { bw.header.addSample(str); }
inline void addLine(const std::string & str) { bw.header.addLine(str); }
inline void writeline(const std::string & line) { bw.writeLine(line); }
inline void addSample(const std::string& str) { bw.header.addSample(str); }
inline void addLine(const std::string& str) { bw.header.addLine(str); }
inline void copyHeader(const std::string& vcffile) {
br.open(vcffile);
bw.initalHeader(br.header);
br.close();
}
inline void writeline(const std::string& line) { bw.writeLine(line); }

private:
vcfpp::BcfWriter bw;
vcfpp::BcfReader br;
};

RCPP_EXPOSED_CLASS(vcfwriter)
Expand All @@ -82,6 +90,7 @@ RCPP_MODULE(vcfwriter) {
.constructor<std::string, std::string>("construct vcfwriter given vcf file and the version")
.method("close", &vcfwriter::close)
.method("writeline", &vcfwriter::writeline)
.method("copyHeader", &vcfwriter::copyHeader)
.method("addLine", &vcfwriter::addLine)
.method("addSample", &vcfwriter::addSample)
.method("addContig", &vcfwriter::addContig)
Expand Down
6 changes: 6 additions & 0 deletions tests/testthat/test-modify-vcf.R
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,12 @@ test_that("modify the genotypes", {
(g2 <- br$genotypes(F))
expect_false(isTRUE(all.equal(g0, g2)))
expect_identical(g1, g2)
br$close()
## the original vcf can not be modified
br <- vcfreader$new(outvcf)
br$variant() ## get a variant record
(g3 <- br$genotypes(F))
expect_identical(g0, g3)
})

test_that("modify item in FORMAT", {
Expand Down
16 changes: 14 additions & 2 deletions tests/testthat/test-vcf-writer.R
Original file line number Diff line number Diff line change
Expand Up @@ -23,5 +23,17 @@ test_that("vcfwriter: writing variant works", {
expect_identical(s1, s2)
})



test_that("vcfwriter: copy header from another vcf", {
outvcf <- paste0(tempfile(), ".vcf.gz")
vcffile <- system.file("extdata", "raw.gt.vcf.gz", package="vcfppR")
bw <- vcfwriter$new(outvcf, "VCF4.3")
bw$copyHeader(vcffile)
bw$close()
br <- vcfreader$new(outvcf)
h1 <- br$header()
br$close()
br <- vcfreader$new(vcffile)
h2 <- br$header()
br$close()
expect_identical(h1, h2)
})

0 comments on commit a167537

Please sign in to comment.