diff --git a/DESCRIPTION b/DESCRIPTION
index ed098c5..31a1e68 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,10 +1,13 @@
 Package: vcfppR
-Title: Rapid Manipulation of the Variant Call Format (VCF) in R
+Title: Rapid Manipulation of the Variant Call Format (VCF)
 Version: 0.3.5
-Authors@R:
-    person("Zilong", "Li", , "zilong.dk@gmail.com", role = c("aut", "cre", "cph"),
-           comment = c(ORCID = "0000-0001-5859-2078"))
-Description: The is an easy-to-use C++ API of htslib, offering the full functionalities as the htslib to manipulate the VCF/BCF file. Thus, this package is built upon the vcfpp.h to provide rapid variant processing with the variant call format in R.
+Authors@R: c(
+    person("Zilong", "Li", , "zilong.dk@gmail.com", role = c("aut", "cre"),
+           comment = c(ORCID = "0000-0001-5859-2078")),
+    person("Bonfield, James K and Marshall, John and Danecek, Petr and Li, Heng", role = "cph",
+           comment = "Authors of included htslib library")
+    )
+Description: The is an easy-to-use 'C++' 'API' of 'htslib', offering the full functionalities as the 'htslib' to manipulate the variant call format (VCF) file. Thus, this package is built upon the 'vcfpp.h' for rapid variant processing of the compressed or uncompressed VCF/BCF file.
 Encoding: UTF-8
 Depends: R (>= 3.6.0)
 Roxygen: list(markdown = TRUE)
diff --git a/R/RcppExports.R b/R/RcppExports.R
index d8cb73b..33d8e95 100644
--- a/R/RcppExports.R
+++ b/R/RcppExports.R
@@ -2,8 +2,9 @@
 # Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393
 
 #' @name vcfreader
-#' @title API for reading the VCF/BCF.
+#' @title API for manipulating the VCF/BCF.
 #' @description Type the name of the class to see its methods
+#' @return A class with many methods for manipulating the VCF/BCF
 #' @field new Constructor given a vcf file \itemize{
 #' \item Parameter: vcffile - The path of a vcf file
 #' }
@@ -16,7 +17,7 @@
 #' \item Parameter: region - The region to be constrained
 #' \item Parameter: samples - The samples to be constrained. Comma separated list of samples to include (or exclude with "^" prefix).
 #' }
-#' @field variant Try to get next variant record. Return false if there are no more variants or hit the end of file, otherwise return true.
+#' @field variant Try to get next variant record. @return FALSE if there are no more variants or hit the end of file, otherwise TRUE.
 #' @field chr Return the CHROM field of current variant
 #' @field pos Return the POS field of current variant
 #' @field id Return the CHROM field of current variant
@@ -68,11 +69,23 @@
 #' @field setVariant Modify current variant by adding a vcf line
 #' @field addINFO Add a INFO in the header of the vcf
 #' @field addFORMAT Add a FORMAT in the header of the vcf
+#' @examples
+#' vcffile <- system.file("extdata", "raw.gt.vcf.gz", package="vcfppR")
+#' br <- vcfreader$new(vcffile)
+#' res <- rep(0L, br$nsamples())
+#' while(br$variant()) {
+#'   if(br$isSNP()) {
+#'     gt <- br$genotypes(TRUE) == 1
+#'     gt[is.na(gt)] <- FALSE
+#'     res <- res + gt
+#'   }
+#' }
 NULL
 
 #' @name vcfwriter
 #' @title API for writing the VCF/BCF.
 #' @description Type the name of the class to see its methods
+#' @return A class with many methods for outputting the VCF/BCF
 #' @field new Constructor given a vcf file \itemize{
 #' \item Parameter: vcffile - The path of a vcf file. don't start with "~"
 #' \item Parameter: version - The version of VCF specification
@@ -85,6 +98,16 @@ NULL
 #' @field addLine Add a line in the header of the vcf
 #' @field writeline Write a variant record given a line
 #' @field close Close and save the vcf file
+#' @examples
+#' outvcf <- paste0(tempfile(), ".vcf.gz")
+#' bw <- vcfwriter$new(outvcf, "VCF4.3")
+#' bw$addContig("chr20")
+#' bw$addINFO("AF", "A", "Float", "Estimated allele frequency in the range (0,1)");
+#' bw$addFORMAT("GT", "1", "String", "Genotype");
+#' bw$addSample("NA12878")
+#' s1 <- "chr20\t2006060\trs146931526\tG\tC\t100\tPASS\tAF=0.000998403\tGT\t1|0"
+#' bw$writeline(s1)
+#' bw$close()
 NULL
 
 #' calculate the number of heterozygous SNPs for each sample
diff --git a/R/vcf-tables.R b/R/vcf-tables.R
index d646293..dd463a5 100644
--- a/R/vcf-tables.R
+++ b/R/vcf-tables.R
@@ -36,7 +36,7 @@
 #' If the FORMAT to extract is not "GT", then with collapse=TRUE it will try to turn a list of the extracted vector into a matrix.
 #' However, this raises issues when one variant is mutliallelic resulting in more vaules than others.
 #'
-#' @return \code{vcftable} a list containing the following components:
+#' @return Return a list containing the following components:
 #'\describe{
 #'\item{samples}{: character vector; \cr
 #' the samples ids in the VCF file after subsetting
diff --git a/man/vcfppR-package.Rd b/man/vcfppR-package.Rd
index 1c3fdd3..509044e 100644
--- a/man/vcfppR-package.Rd
+++ b/man/vcfppR-package.Rd
@@ -4,9 +4,9 @@
 \name{vcfppR-package}
 \alias{vcfppR}
 \alias{vcfppR-package}
-\title{vcfppR: Rapid Manipulation of the Variant Call Format (VCF) in R}
+\title{vcfppR: Rapid Manipulation of the Variant Call Format (VCF)}
 \description{
-The \url{https://github.com/Zilong-Li/vcfpp} is an easy-to-use C++ API of htslib, offering the full functionalities as the htslib to manipulate the VCF/BCF file. Thus, this package is built upon the vcfpp.h to provide rapid variant processing with the variant call format in R.
+The \url{https://github.com/Zilong-Li/vcfpp} is an easy-to-use 'C++' 'API' of 'htslib', offering the full functionalities as the 'htslib' to manipulate the variant call format (VCF) file. Thus, this package is built upon the 'vcfpp.h' for rapid variant processing of the compressed or uncompressed VCF/BCF file.
 }
 \seealso{
 Useful links:
@@ -17,6 +17,11 @@ Useful links:
 
 }
 \author{
-\strong{Maintainer}: Zilong Li \email{zilong.dk@gmail.com} (\href{https://orcid.org/0000-0001-5859-2078}{ORCID}) [copyright holder]
+\strong{Maintainer}: Zilong Li \email{zilong.dk@gmail.com} (\href{https://orcid.org/0000-0001-5859-2078}{ORCID})
+
+Other contributors:
+\itemize{
+  \item Bonfield, James K and Marshall, John and Danecek, Petr and Li, Heng (Authors of included htslib library) [copyright holder]
+}
 
 }
diff --git a/man/vcfreader.Rd b/man/vcfreader.Rd
index 2ca2d99..7c3eaab 100644
--- a/man/vcfreader.Rd
+++ b/man/vcfreader.Rd
@@ -2,7 +2,10 @@
 % Please edit documentation in R/RcppExports.R
 \name{vcfreader}
 \alias{vcfreader}
-\title{API for reading the VCF/BCF.}
+\title{API for manipulating the VCF/BCF.}
+\value{
+A class with many methods for manipulating the VCF/BCF
+}
 \description{
 Type the name of the class to see its methods
 }
@@ -24,7 +27,7 @@ Type the name of the class to see its methods
 \item Parameter: samples - The samples to be constrained. Comma separated list of samples to include (or exclude with "^" prefix).
 }}
 
-\item{\code{variant}}{Try to get next variant record. Return false if there are no more variants or hit the end of file, otherwise return true.}
+\item{\code{variant}}{Try to get next variant record. @return FALSE if there are no more variants or hit the end of file, otherwise TRUE.}
 
 \item{\code{chr}}{Return the CHROM field of current variant}
 
@@ -129,3 +132,15 @@ Type the name of the class to see its methods
 
 \item{\code{addFORMAT}}{Add a FORMAT in the header of the vcf}
 }}
+\examples{
+vcffile <- system.file("extdata", "raw.gt.vcf.gz", package="vcfppR")
+br <- vcfreader$new(vcffile)
+res <- rep(0L, br$nsamples())
+while(br$variant()) {
+  if(br$isSNP()) {
+    gt <- br$genotypes(TRUE) == 1
+    gt[is.na(gt)] <- FALSE
+    res <- res + gt
+  }
+}
+}
diff --git a/man/vcftable.Rd b/man/vcftable.Rd
index 751364a..5bd98d8 100644
--- a/man/vcftable.Rd
+++ b/man/vcftable.Rd
@@ -45,7 +45,7 @@ If the FORMAT to extract is not "GT", then with collapse=TRUE it will try to tur
 However, this raises issues when one variant is mutliallelic resulting in more vaules than others.}
 }
 \value{
-\code{vcftable} a list containing the following components:
+Return a list containing the following components:
 \describe{
 \item{samples}{: character vector; \cr
 the samples ids in the VCF file after subsetting
diff --git a/man/vcfwriter.Rd b/man/vcfwriter.Rd
index 5cde25b..5605255 100644
--- a/man/vcfwriter.Rd
+++ b/man/vcfwriter.Rd
@@ -3,6 +3,9 @@
 \name{vcfwriter}
 \alias{vcfwriter}
 \title{API for writing the VCF/BCF.}
+\value{
+A class with many methods for outputting the VCF/BCF
+}
 \description{
 Type the name of the class to see its methods
 }
@@ -31,3 +34,14 @@ Type the name of the class to see its methods
 
 \item{\code{close}}{Close and save the vcf file}
 }}
+\examples{
+outvcf <- paste0(tempfile(), ".vcf.gz")
+bw <- vcfwriter$new(outvcf, "VCF4.3")
+bw$addContig("chr20")
+bw$addINFO("AF", "A", "Float", "Estimated allele frequency in the range (0,1)");
+bw$addFORMAT("GT", "1", "String", "Genotype");
+bw$addSample("NA12878")
+s1 <- "chr20\t2006060\trs146931526\tG\tC\t100\tPASS\tAF=0.000998403\tGT\t1|0"
+bw$writeline(s1)
+bw$close()
+}
diff --git a/src/vcf-reader.cpp b/src/vcf-reader.cpp
index 6369ef7..3287569 100644
--- a/src/vcf-reader.cpp
+++ b/src/vcf-reader.cpp
@@ -4,8 +4,9 @@ using namespace std;
 
 
 //' @name vcfreader
-//' @title API for reading the VCF/BCF.
+//' @title API for manipulating the VCF/BCF.
 //' @description Type the name of the class to see its methods
+//' @return A class with many methods for manipulating the VCF/BCF
 //' @field new Constructor given a vcf file \itemize{
 //' \item Parameter: vcffile - The path of a vcf file
 //' }
@@ -18,7 +19,7 @@ using namespace std;
 //' \item Parameter: region - The region to be constrained
 //' \item Parameter: samples - The samples to be constrained. Comma separated list of samples to include (or exclude with "^" prefix).
 //' }
-//' @field variant Try to get next variant record. Return false if there are no more variants or hit the end of file, otherwise return true.
+//' @field variant Try to get next variant record. @return FALSE if there are no more variants or hit the end of file, otherwise TRUE.
 //' @field chr Return the CHROM field of current variant
 //' @field pos Return the POS field of current variant
 //' @field id Return the CHROM field of current variant
@@ -70,6 +71,17 @@ using namespace std;
 //' @field setVariant Modify current variant by adding a vcf line
 //' @field addINFO Add a INFO in the header of the vcf
 //' @field addFORMAT Add a FORMAT in the header of the vcf
+//' @examples
+//' vcffile <- system.file("extdata", "raw.gt.vcf.gz", package="vcfppR")
+//' br <- vcfreader$new(vcffile)
+//' res <- rep(0L, br$nsamples())
+//' while(br$variant()) {
+//'   if(br$isSNP()) {
+//'     gt <- br$genotypes(TRUE) == 1
+//'     gt[is.na(gt)] <- FALSE
+//'     res <- res + gt
+//'   }
+//' }
 class vcfreader {
 public:
 vcfreader(const std::string& vcffile) {
diff --git a/src/vcf-writer.cpp b/src/vcf-writer.cpp
index 1cec940..3839dc5 100644
--- a/src/vcf-writer.cpp
+++ b/src/vcf-writer.cpp
@@ -6,6 +6,7 @@ using namespace std;
 //' @name vcfwriter
 //' @title API for writing the VCF/BCF.
 //' @description Type the name of the class to see its methods
+//' @return A class with many methods for outputting the VCF/BCF
 //' @field new Constructor given a vcf file \itemize{
 //' \item Parameter: vcffile - The path of a vcf file. don't start with "~"
 //' \item Parameter: version - The version of VCF specification
@@ -18,6 +19,16 @@ using namespace std;
 //' @field addLine Add a line in the header of the vcf
 //' @field writeline Write a variant record given a line
 //' @field close Close and save the vcf file
+//' @examples
+//' outvcf <- paste0(tempfile(), ".vcf.gz")
+//' bw <- vcfwriter$new(outvcf, "VCF4.3")
+//' bw$addContig("chr20")
+//' bw$addINFO("AF", "A", "Float", "Estimated allele frequency in the range (0,1)");
+//' bw$addFORMAT("GT", "1", "String", "Genotype");
+//' bw$addSample("NA12878")
+//' s1 <- "chr20\t2006060\trs146931526\tG\tC\t100\tPASS\tAF=0.000998403\tGT\t1|0"
+//' bw$writeline(s1)
+//' bw$close()
 class vcfwriter {
 public:
 vcfwriter(std::string vcffile, std::string version) : bw(vcffile, version) {}