Merge pull request #50 from sbslee/0.29.0-dev
0.29.0 dev
sbslee authored Dec 19, 2021
2 parents 54c07e2 + ab764b0 commit f4eb5f6
Showing 9 changed files with 329 additions and 110 deletions.
14 changes: 14 additions & 0 deletions CHANGELOG.rst
@@ -1,6 +1,20 @@
Changelog
*********

0.29.0 (2021-12-19)
-------------------

* Add new property ``pyvcf.VcfFrame.phased``.
* Update :meth:`pyvcf.VcfFrame.slice` method to automatically handle the 'chr' string.
* Add new argument ``--thread`` to :command:`ngs-hc` command. This argument will be used to set ``--native-pair-hmm-threads`` for GATK's :command:`HaplotypeCaller` command, ``--reader-threads`` for GATK's :command:`GenomicsDBImport` command, and ``-XX:ParallelGCThreads`` and ``-XX:ConcGCThreads`` for Java.
* Add new argument ``--batch`` to :command:`ngs-hc` command. This argument will be used to set ``--batch-size`` for GATK's :command:`GenomicsDBImport` command.
* Update :command:`ngs-bam2fq` command to fix the SGE issue that outputs an error like ``Unable to run job: denied: "XXXXX" is not a valid object name (cannot start with a digit)``.
* Update :command:`ngs-hc` command so that when ``--posix`` is set, it will use ``--genomicsdb-shared-posixfs-optimizations`` argument from GATK's :command:`GenomicsDBImport` command in addition to exporting relevant shell variable (i.e. ``export TILEDB_DISABLE_FILE_LOCKING=1``).
* Add new argument ``--job`` to :command:`ngs-fq2bam` command.
* Update :command:`ngs-fq2bam` command so that BAM creation step and BAM processing step are now in one step.
* Update :command:`ngs-fq2bam` command so that ``--thread`` is now also used to set ``-XX:ParallelGCThreads`` and ``-XX:ConcGCThreads`` for Java.
* Add new method :meth:`common.parse_list_or_file`.
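
The ``--thread``, ``--batch``, and ``--posix`` entries above describe how :command:`ngs-hc` arguments map to GATK and Java options. A minimal sketch of that mapping follows. The helper ``build_hc_options`` and its return layout are hypothetical names used only for illustration and are not fuc's implementation; passing the same thread count to both garbage-collection options is also an assumption.

```python
def build_hc_options(thread=1, batch=0, posix=False):
    """Hypothetical helper: collect the options named in the changelog."""
    # Java GC options (assumption: both receive the same thread count).
    java = [
        f'-XX:ParallelGCThreads={thread}',
        f'-XX:ConcGCThreads={thread}',
    ]
    # HaplotypeCaller option set from --thread.
    haplotype_caller = ['--native-pair-hmm-threads', str(thread)]
    # GenomicsDBImport options set from --thread and --batch.
    genomicsdb_import = [
        '--reader-threads', str(thread),
        '--batch-size', str(batch),
    ]
    shell = []
    if posix:
        # --posix adds the GATK flag and exports the shell variable.
        genomicsdb_import.append('--genomicsdb-shared-posixfs-optimizations')
        shell.append('export TILEDB_DISABLE_FILE_LOCKING=1')
    return {
        'java': java,
        'HaplotypeCaller': haplotype_caller,
        'GenomicsDBImport': genomicsdb_import,
        'shell': shell,
    }


options = build_hc_options(thread=10, batch=50, posix=True)
print(' '.join(options['GenomicsDBImport']))
```

With ``batch=0`` the sketch still emits ``--batch-size 0``, which matches the documented meaning of 0 as "no batching".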

0.28.0 (2021-12-05)
-------------------

34 changes: 20 additions & 14 deletions docs/cli.rst
@@ -210,7 +210,7 @@ bam-slice
provide a BED file (compressed or uncompressed) to specify
regions. Note that the 'chr' prefix in contig names (e.g.
'chr1' vs. '1') will be automatically added or removed as
necessary to match the input VCF's contig names.
necessary to match the input BED's contig names.
Optional arguments:
-h, --help Show this help message and exit.
Expand Down Expand Up @@ -773,8 +773,8 @@ ngs-fq2bam
$ fuc ngs-fq2bam -h
usage: fuc ngs-fq2bam [-h] [--bed PATH] [--thread INT] [--platform TEXT]
[--force] [--keep]
manifest fasta output qsub1 qsub2 java vcf [vcf ...]
[--job TEXT] [--force] [--keep]
manifest fasta output qsub java vcf [vcf ...]
Pipeline for converting FASTQ files to analysis-ready BAM files.
@@ -798,12 +798,7 @@ ngs-fq2bam
manifest Sample manifest CSV file.
fasta Reference FASTA file.
output Output directory.
qsub1 SGE resource to request with qsub for read alignment
and sorting. Since both tasks support multithreading,
it is recommended to specify a parallel environment (PE)
to speed up the process (also see --thread).
qsub2 SGE resource to request with qsub for the rest of the
tasks, which do not support multithreading.
qsub SGE resource to request for qsub.
java Java resource to request for GATK.
vcf One or more reference VCF files containing known variant
sites (e.g. 1000 Genomes Project).
@@ -813,6 +808,7 @@ ngs-fq2bam
--bed PATH BED file.
--thread INT Number of threads to use (default: 1).
--platform TEXT Sequencing platform (default: 'Illumina').
--job TEXT Job submission ID for SGE.
--force Overwrite the output directory if it already exists.
--keep Keep temporary files.
@@ -822,7 +818,6 @@ ngs-fq2bam
ref.fa \
output_dir \
"-q queue_name -pe pe_name 10" \
"-q queue_name" \
"-Xmx15g -Xms15g" \
1.vcf 2.vcf 3.vcf \
--thread 10
@@ -833,7 +828,6 @@ ngs-fq2bam
ref.fa \
output_dir \
"-l h='node_A|node_B' -pe pe_name 10" \
"-l h='node_A|node_B'" \
"-Xmx15g -Xms15g" \
1.vcf 2.vcf 3.vcf \
--thread 10
@@ -844,8 +838,8 @@ ngs-hc
.. code-block:: text
$ fuc ngs-hc -h
usage: fuc ngs-hc [-h] [--bed PATH] [--dbsnp PATH] [--job TEXT] [--force]
[--keep] [--posix]
usage: fuc ngs-hc [-h] [--bed PATH] [--dbsnp PATH] [--thread INT]
[--batch INT] [--job TEXT] [--force] [--keep] [--posix]
manifest fasta output qsub java1 java2
Pipeline for germline short variant discovery.
@@ -869,10 +863,22 @@ ngs-hc
-h, --help Show this help message and exit.
--bed PATH BED file.
--dbsnp PATH VCF file from dbSNP.
--thread INT Number of threads to use (default: 1).
--batch INT Batch size used for GenomicsDBImport (default: 0). This
controls the number of samples for which readers are
open at once and therefore provides a way to minimize
memory consumption. The size of 0 means no batching (i.e.
readers for all samples will be opened at once).
--job TEXT Job submission ID for SGE.
--force Overwrite the output directory if it already exists.
--keep Keep temporary files.
--posix Optimize for a POSIX filesystem.
--posix Set GenomicsDBImport to allow for optimizations to improve
the usability and performance for shared Posix Filesystems
(e.g. NFS, Lustre). If set, file level locking is disabled
and file system writes are minimized by keeping a higher
number of file descriptors open for longer periods of time.
Use with --batch if keeping a large number of file
descriptors open is an issue.
[Example] Specify queue:
$ fuc ngs-hc \
70 changes: 66 additions & 4 deletions fuc/api/common.py
@@ -1333,6 +1333,9 @@ def update_chr_prefix(regions, mode='remove'):
"""
Add or remove the (annoying) 'chr' string from specified regions.
The method will automatically detect regions that don't need to be
updated and will return them unchanged.
Parameters
----------
regions : str or list
@@ -1349,10 +1352,18 @@ def update_chr_prefix(regions, mode='remove'):
-------
>>> from fuc import common
>>> common.update_chr_prefix(['chr1:100-200', '1:300-400'], mode='remove')
['1:100-200', '1:300-400']
>>> common.update_chr_prefix(['chr1:100-200', '1:300-400'], mode='add')
['chr1:100-200', 'chr1:300-400']
>>> common.update_chr_prefix(['chr1:100-200', '2:300-400'], mode='remove')
['1:100-200', '2:300-400']
>>> common.update_chr_prefix(['chr1:100-200', '2:300-400'], mode='add')
['chr1:100-200', 'chr2:300-400']
>>> common.update_chr_prefix('chr1:100-200', mode='remove')
'1:100-200'
>>> common.update_chr_prefix('chr1:100-200', mode='add')
'chr1:100-200'
>>> common.update_chr_prefix('2:300-400', mode='add')
'chr2:300-400'
>>> common.update_chr_prefix('2:300-400', mode='remove')
'2:300-400'
"""
def remove(x):
return x.replace('chr', '')
@@ -1368,3 +1379,54 @@ def add(x):
return modes[mode](regions)

return [modes[mode](x) for x in regions]
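
The updated docstring says regions that already match the requested form come back unchanged. The sketch below reproduces that behavior in a standalone function so the docstring examples can be checked without installing fuc. The name ``sketch_update_chr_prefix`` is hypothetical, and the body of ``add`` is an assumption because the diff does not show it; only ``remove`` is taken from the source.

```python
def sketch_update_chr_prefix(regions, mode='remove'):
    """Standalone approximation of the behavior described in the docstring."""
    def remove(x):
        # Shown in the diff: drop the 'chr' string.
        return x.replace('chr', '')

    def add(x):
        # Assumption: prepend 'chr' only when it is not already present.
        return x if x.startswith('chr') else 'chr' + x

    modes = {'remove': remove, 'add': add}

    # A single region string is handled directly; a list is mapped item by item.
    if isinstance(regions, str):
        return modes[mode](regions)

    return [modes[mode](x) for x in regions]


print(sketch_update_chr_prefix(['chr1:100-200', '2:300-400'], mode='add'))
```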

def parse_list_or_file(obj, extensions=['txt', 'tsv', 'csv', 'list']):
"""
Parse the input variable and then return a list of items.
This method is useful when parsing a command line argument that accepts
either a list of items or a text file containing one item per line.
Parameters
----------
obj : str or list
Object to be parsed. Must be non-empty.
extensions : list, default: ['txt', 'tsv', 'csv', 'list']
Recognized file extensions.
Returns
-------
list
List of items.
Examples
--------
>>> from fuc import common
>>> common.parse_list_or_file(['A', 'B', 'C'])
['A', 'B', 'C']
>>> common.parse_list_or_file('A')
['A']
>>> common.parse_list_or_file('example.txt')
['A', 'B', 'C']
>>> common.parse_list_or_file(['example.txt'])
['A', 'B', 'C']
"""
if not isinstance(obj, str) and not isinstance(obj, list):
raise TypeError(
f'Input must be str or list, not {type(obj).__name__}')

if not obj:
raise ValueError('Input is empty')

if isinstance(obj, str):
obj = [obj]

if len(obj) > 1:
return obj

for extension in extensions:
if obj[0].endswith(f'.{extension}'):
return convert_file2list(obj[0])

return obj
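
The dispatch rule in ``parse_list_or_file`` is easy to miss: only a single item ending in a recognized extension is read as a file, while any input with two or more items is returned as-is, even if the items look like file names. The sketch below exercises that rule with a temporary file. ``read_lines`` is a hypothetical stand-in for ``convert_file2list``, whose body is not part of this diff, and ``parse_list_or_file_sketch`` is an illustrative copy of the logic, not the library function itself.

```python
import os
import tempfile


def read_lines(path):
    """Hypothetical stand-in for convert_file2list: one item per line."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def parse_list_or_file_sketch(obj, extensions=('txt', 'tsv', 'csv', 'list')):
    if not isinstance(obj, (str, list)):
        raise TypeError(
            f'Input must be str or list, not {type(obj).__name__}')
    if not obj:
        raise ValueError('Input is empty')
    if isinstance(obj, str):
        obj = [obj]
    # Two or more items are always treated as a literal list.
    if len(obj) > 1:
        return obj
    # A single item is read as a file only if its extension is recognized.
    if any(obj[0].endswith(f'.{ext}') for ext in extensions):
        return read_lines(obj[0])
    return obj


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'example.txt')
        with open(path, 'w') as f:
            f.write('A\nB\nC\n')
        from_file = parse_list_or_file_sketch(path)
        literal = parse_list_or_file_sketch(['x.txt', 'y.txt'])
    return from_file, literal


print(demo())
```

One consequence of this design is that a single sample whose name happens to end in ``.txt`` would be treated as a file path, so callers with such names need to pass at least two items or rename the sample.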