Merge pull request #31 from sbslee/0.17.0-dev
0.17.0 dev
sbslee authored Jul 8, 2021
2 parents d551451 + b14bbd0 commit eb45ac5
Showing 15 changed files with 929 additions and 110 deletions.
15 changes: 14 additions & 1 deletion CHANGELOG.rst
@@ -1,6 +1,19 @@
Changelog
*********

0.17.0 (2021-07-08)
-------------------

* Add :meth:`pymaf.MafFrame.plot_lollipop` method.
* :issue:`30`: Add :meth:`pymaf.MafFrame.plot_rainfall` method.
* :issue:`30`: Add :meth:`pyvcf.VcfFrame.plot_rainfall` method.
* Update :meth:`pymaf.MafFrame.to_vcf` method to output sorted VCF.
* Add :meth:`pymaf.MafFrame.matrix_prevalence` method.
* Add :meth:`pymaf.MafFrame.plot_regplot` method.
* Add ``samples`` argument to :meth:`pymaf.MafFrame.plot_snvclss` method.
* Add :meth:`pymaf.MafFrame.plot_evolution` method.
* Add new submodule ``pygff``.

0.16.0 (2021-07-02)
-------------------

@@ -22,7 +35,7 @@ Changelog
* Add :meth:`pyvcf.VcfFrame.plot_snvclsp` method (simply wraps :meth:`pymaf.MafFrame.plot_snvclsp` method).
* Add :meth:`pyvcf.VcfFrame.plot_snvclss` method (simply wraps :meth:`pymaf.MafFrame.plot_snvclss` method).
* Add :meth:`pyvcf.VcfFrame.plot_titv` method (simply wraps :meth:`pymaf.MafFrame.plot_titv` method).
* Update :meth:`pymaf.MafFrame.from_vcf` method to handle unannotated VCF data.
* :issue:`28`: Update :meth:`pymaf.MafFrame.from_vcf` method to handle unannotated VCF data.

0.15.0 (2021-06-24)
-------------------
11 changes: 7 additions & 4 deletions README.rst
@@ -40,6 +40,8 @@ Currently, fuc can be used to analyze, summarize, visualize, and manipulate the
- Browser Extensible Data (BED)
- FASTQ
- FASTA
- General Feature Format (GFF)
- Gene Transfer Format (GTF)
- delimiter-separated values format (e.g. comma-separated values or CSV format)

Additionally, fuc can be used to parse output data from the following programs:
@@ -150,6 +152,7 @@ Below is the list of submodules available in the fuc API:
- **pybed** : The pybed submodule is designed for working with BED files. It implements ``pybed.BedFrame`` which stores BED data as ``pandas.DataFrame`` via the `pyranges <https://github.com/biocore-ntnu/pyranges>`_ package to allow fast computation and easy manipulation. The submodule strictly adheres to the standard `BED specification <https://genome.ucsc.edu/FAQ/FAQformat.html>`_.
- **pycov** : The pycov submodule is designed for working with depth of coverage data from sequence alignment files (SAM/BAM/CRAM). It implements ``pycov.CovFrame`` which stores read depth data as ``pandas.DataFrame`` via the `pysam <https://pysam.readthedocs.io/en/latest/api.html>`_ package to allow fast computation and easy manipulation.
- **pyfq** : The pyfq submodule is designed for working with FASTQ files. It implements ``pyfq.FqFrame`` which stores FASTQ data as ``pandas.DataFrame`` to allow fast computation and easy manipulation.
- **pygff** : The pygff submodule is designed for working with GFF/GTF files. It implements ``pygff.GffFrame`` which stores GFF/GTF data as ``pandas.DataFrame`` to allow fast computation and easy manipulation. The submodule strictly adheres to the standard `GFF specification <https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md>`_.
- **pymaf** : The pymaf submodule is designed for working with MAF files. It implements ``pymaf.MafFrame`` which stores MAF data as ``pandas.DataFrame`` to allow fast computation and easy manipulation. The ``pymaf.MafFrame`` class also contains many useful plotting methods such as ``MafFrame.plot_oncoplot`` and ``MafFrame.plot_summary``. The submodule strictly adheres to the standard `MAF specification <https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format/>`_.
- **pysnpeff** : The pysnpeff submodule is designed for parsing VCF annotation data from the `SnpEff <https://pcingola.github.io/SnpEff/>`_ program. It should be used with ``pyvcf.VcfFrame``.
- **pyvcf** : The pyvcf submodule is designed for working with VCF files. It implements ``pyvcf.VcfFrame`` which stores VCF data as ``pandas.DataFrame`` to allow fast computation and easy manipulation. The ``pyvcf.VcfFrame`` class also contains many useful plotting methods such as ``VcfFrame.plot_comparison`` and ``VcfFrame.plot_tmb``. The submodule strictly adheres to the standard `VCF specification <https://samtools.github.io/hts-specs/VCFv4.3.pdf>`_.
@@ -322,8 +325,8 @@ To create an oncoplot with a MAF file:
>>> from fuc import common, pymaf
>>> common.load_dataset('tcga-laml')
>>> f = '~/fuc-data/tcga-laml/tcga_laml.maf.gz'
>>> mf = pymaf.MafFrame.from_file(f)
>>> maf_file = '~/fuc-data/tcga-laml/tcga_laml.maf.gz'
>>> mf = pymaf.MafFrame.from_file(maf_file)
>>> mf.plot_oncoplot()
.. image:: https://raw.githubusercontent.com/sbslee/fuc-data/main/images/oncoplot.png
@@ -338,8 +341,8 @@ To create a summary figure for a MAF file:
>>> from fuc import common, pymaf
>>> common.load_dataset('tcga-laml')
>>> f = '~/fuc-data/tcga-laml/tcga_laml.maf.gz'
>>> mf = pymaf.MafFrame.from_file(f)
>>> maf_file = '~/fuc-data/tcga-laml/tcga_laml.maf.gz'
>>> mf = pymaf.MafFrame.from_file(maf_file)
>>> mf.plot_summary()
.. image:: https://raw.githubusercontent.com/sbslee/fuc-data/main/images/maf_summary-2.png
33 changes: 33 additions & 0 deletions data/gff/fasta.gff
@@ -0,0 +1,33 @@
##gff-version 3.1.26
##sequence-region ctg123 1 1497228
ctg123 . gene 1000 9000 . + . ID=gene00001;Name=EDEN
ctg123 . TF_binding_site 1000 1012 . + . ID=tfbs00001;Parent=gene00001
ctg123 . mRNA 1050 9000 . + . ID=mRNA00001;Parent=gene00001;Name=EDEN.1
ctg123 . five_prime_UTR 1050 1200 . + . Parent=mRNA00001
ctg123 . CDS 1201 1500 . + 0 ID=cds00001;Parent=mRNA00001
ctg123 . CDS 3000 3902 . + 0 ID=cds00001;Parent=mRNA00001
ctg123 . CDS 5000 5500 . + 0 ID=cds00001;Parent=mRNA00001
ctg123 . CDS 7000 7600 . + 0 ID=cds00001;Parent=mRNA00001
ctg123 . three_prime_UTR 7601 9000 . + . Parent=mRNA00001
ctg123 . cDNA_match 1050 1500 5.80E-42 + . ID=match00001;Target=cdna0123+12+462
ctg123 . cDNA_match 5000 5500 8.10E-43 + . ID=match00001;Target=cdna0123+463+963
ctg123 . cDNA_match 7000 9000 1.40E-40 + . ID=match00001;Target=cdna0123+964+2964
##FASTA
>ctg123
cttctgggcgtacccgattctcggagaacttgccgcaccattccgccttg
tgttcattgctgcctgcatgttcattgtctacctcggctacgtgtggcta
tctttcctcggtgccctcgtgcacggagtcgagaaaccaaagaacaaaaa
aagaaattaaaatatttattttgctgtggtttttgatgtgtgttttttat
aatgatttttgatgtgaccaattgtacttttcctttaaatgaaatgtaat
cttaaatgtatttccgacgaattcgaggcctgaaaagtgtgacgccattc
gtatttgatttgggtttactatcgaataatgagaattttcaggcttaggc
ttaggcttaggcttaggcttaggcttaggcttaggcttaggcttaggctt
aggcttaggcttaggcttaggcttaggcttaggcttaggcttaggcttag
aatctagctagctatccgaaattcgaggcctgaaaagtgtgacgccattc
>cnda0123
ttcaagtgctcagtcaatgtgattcacagtatgtcaccaaatattttggc
agctttctcaagggatcaaaattatggatcattatggaatacctcggtgg
aggctcagcgctcgatttaactaaaagtggaaagctggacgaaagtcata
tcgctgtgattcttcgcgaaattttgaaaggtctcgagtatctgcatagt
gaaagaaaaatccacagagatattaaaggagccaacgttttgttggaccg
tcaaacagcggctgtaaaaatttgtgattatggttaaagg
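The fixture above follows the GFF3 layout: nine tab-separated columns per feature, `key=value` attributes joined by semicolons, and an optional `##FASTA` directive after which only sequence data follows. The sketch below shows one way such a file could be read with the standard library alone. It is not the `pygff.GffFrame` implementation, which this page does not show; the names `parse_gff3_line` and `read_gff3_features` are hypothetical, and the input lines are assumed to be tab-delimited as in a real GFF3 file.

```python
from urllib.parse import unquote

# Column names from the GFF3 specification.
GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)

def parse_gff3_line(line):
    """Parse one tab-delimited GFF3 feature line into a dict (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Column 9 holds 'key=value' pairs separated by semicolons; '.' means empty.
    attributes = {}
    if record["attributes"] != ".":
        for pair in record["attributes"].split(";"):
            if not pair:
                continue
            key, _, value = pair.partition("=")
            attributes[unquote(key)] = unquote(value)
    record["attributes"] = attributes
    return record

def read_gff3_features(lines):
    """Yield feature records, stopping at the ##FASTA directive (hypothetical helper)."""
    for line in lines:
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence data
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        yield parse_gff3_line(line)

example = [
    "##gff-version 3.1.26\n",
    "ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN\n",
    "ctg123\t.\tCDS\t1201\t1500\t.\t+\t0\tID=cds00001;Parent=mRNA00001\n",
    "##FASTA\n",
    ">ctg123\n",
    "cttctgggcg\n",
]
features = list(read_gff3_features(example))
```

Stopping at `##FASTA` keeps the sequence records out of the feature table, which is why a fixture that mixes both sections is a useful test case.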
7 changes: 7 additions & 0 deletions docs/api.rst
@@ -16,6 +16,7 @@ Below is the list of submodules available in the fuc API:
- **pybed** : The pybed submodule is designed for working with BED files. It implements ``pybed.BedFrame`` which stores BED data as ``pandas.DataFrame`` via the `pyranges <https://github.com/biocore-ntnu/pyranges>`_ package to allow fast computation and easy manipulation. The submodule strictly adheres to the standard `BED specification <https://genome.ucsc.edu/FAQ/FAQformat.html>`_.
- **pycov** : The pycov submodule is designed for working with depth of coverage data from sequence alignment files (SAM/BAM/CRAM). It implements ``pycov.CovFrame`` which stores read depth data as ``pandas.DataFrame`` via the `pysam <https://pysam.readthedocs.io/en/latest/api.html>`_ package to allow fast computation and easy manipulation.
- **pyfq** : The pyfq submodule is designed for working with FASTQ files. It implements ``pyfq.FqFrame`` which stores FASTQ data as ``pandas.DataFrame`` to allow fast computation and easy manipulation.
- **pygff** : The pygff submodule is designed for working with GFF/GTF files. It implements ``pygff.GffFrame`` which stores GFF/GTF data as ``pandas.DataFrame`` to allow fast computation and easy manipulation. The submodule strictly adheres to the standard `GFF specification <https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md>`_.
- **pymaf** : The pymaf submodule is designed for working with MAF files. It implements ``pymaf.MafFrame`` which stores MAF data as ``pandas.DataFrame`` to allow fast computation and easy manipulation. The ``pymaf.MafFrame`` class also contains many useful plotting methods such as ``MafFrame.plot_oncoplot`` and ``MafFrame.plot_summary``. The submodule strictly adheres to the standard `MAF specification <https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format/>`_.
- **pysnpeff** : The pysnpeff submodule is designed for parsing VCF annotation data from the `SnpEff <https://pcingola.github.io/SnpEff/>`_ program. It should be used with ``pyvcf.VcfFrame``.
- **pyvcf** : The pyvcf submodule is designed for working with VCF files. It implements ``pyvcf.VcfFrame`` which stores VCF data as ``pandas.DataFrame`` to allow fast computation and easy manipulation. The ``pyvcf.VcfFrame`` class also contains many useful plotting methods such as ``VcfFrame.plot_comparison`` and ``VcfFrame.plot_tmb``. The submodule strictly adheres to the standard `VCF specification <https://samtools.github.io/hts-specs/VCFv4.3.pdf>`_.
@@ -58,6 +59,12 @@ fuc.api.pyfq
.. automodule:: fuc.api.pyfq
:members:

fuc.api.pygff
=============

.. automodule:: fuc.api.pygff
:members:

fuc.api.pymaf
=============

4 changes: 2 additions & 2 deletions docs/cli.rst
@@ -331,7 +331,7 @@ maf-maf2vcf
[--cols TEXT [TEXT ...]] [--names TEXT [TEXT ...]]
maf
This command will convert a MAF file to a VCF file.
This command will convert a MAF file to a sorted VCF file.
In order to handle INDELs the command makes use of a reference assembly (i.e. FASTA file). If SNVs are your only concern, then you do not need a FASTA file and can just use the '--ignore_indels' flag.
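The help text above now promises a sorted VCF. The diff does not show how fuc orders records, so the following is only a sketch of one common convention: numeric contigs first in numeric order, then X, Y and MT, then any other contig alphabetically, with ties broken by position. The helper names `contig_sort_key` and `sort_records` are hypothetical.

```python
import re

def contig_sort_key(chrom):
    """Rank a contig name: 1..N numerically, then X, Y, M/MT, then the rest by name."""
    name = re.sub(r"^chr", "", chrom)  # tolerate an optional 'chr' prefix
    if name.isdigit():
        return (0, int(name), "")
    special = {"X": 1, "Y": 2, "M": 3, "MT": 3}
    if name in special:
        return (1, special[name], "")
    return (2, 0, name)

def sort_records(records):
    """Sort (chrom, pos, ref, alt) tuples by contig rank, then by position."""
    return sorted(records, key=lambda r: (contig_sort_key(r[0]), r[1]))

records = [
    ("X", 100, "A", "T"),
    ("10", 500, "G", "C"),
    ("2", 900, "C", "T"),
    ("2", 30, "T", "G"),
]
sorted_records = sort_records(records)
```

A plain string sort would place contig 10 before contig 2, which is the usual reason to use an explicit key like this one.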
@@ -347,7 +347,7 @@ maf-maf2vcf
$ fuc maf-maf2vcf in.maf --fasta hs37d5.fa --cols i_TumorVAF_WU --names AF > out.vcf
Positional arguments:
maf MAF file.
maf MAF file (zipped or unzipped).
Optional arguments:
-h, --help Show this help message and exit.
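The positional argument is now documented as accepting a zipped or unzipped MAF file. How fuc detects compression is not visible in this diff; one way to accept both, sketched below under that caveat, is to check for the two-byte gzip magic number before choosing how to open the file. The function name `open_maybe_gzipped` is hypothetical.

```python
import gzip
import os
import tempfile

def open_maybe_gzipped(path):
    """Open a text file for reading, transparently handling gzip compression."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":  # gzip files always start with these two bytes
        return gzip.open(path, "rt")
    return open(path, "r")

# Demonstration with throwaway files; the header text is illustrative only.
header = "Hugo_Symbol\tChromosome\n"
with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "in.maf")
    zipped_path = os.path.join(tmp, "in.maf.gz")
    with open(plain_path, "w") as f:
        f.write(header)
    with gzip.open(zipped_path, "wt") as f:
        f.write(header)
    with open_maybe_gzipped(plain_path) as f:
        plain_header = f.readline()
    with open_maybe_gzipped(zipped_path) as f:
        zipped_header = f.readline()
```

Checking the content rather than the file extension also handles compressed files that were saved without a `.gz` suffix.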
10 changes: 6 additions & 4 deletions docs/create.py
@@ -68,6 +68,8 @@
- Browser Extensible Data (BED)
- FASTQ
- FASTA
- General Feature Format (GFF)
- Gene Transfer Format (GTF)
- delimiter-separated values format (e.g. comma-separated values or CSV format)
Additionally, fuc can be used to parse output data from the following programs:
@@ -310,8 +312,8 @@
>>> from fuc import common, pymaf
>>> common.load_dataset('tcga-laml')
>>> f = '~/fuc-data/tcga-laml/tcga_laml.maf.gz'
>>> mf = pymaf.MafFrame.from_file(f)
>>> maf_file = '~/fuc-data/tcga-laml/tcga_laml.maf.gz'
>>> mf = pymaf.MafFrame.from_file(maf_file)
>>> mf.plot_oncoplot()
.. image:: https://raw.githubusercontent.com/sbslee/fuc-data/main/images/oncoplot.png
@@ -326,8 +328,8 @@
>>> from fuc import common, pymaf
>>> common.load_dataset('tcga-laml')
>>> f = '~/fuc-data/tcga-laml/tcga_laml.maf.gz'
>>> mf = pymaf.MafFrame.from_file(f)
>>> maf_file = '~/fuc-data/tcga-laml/tcga_laml.maf.gz'
>>> mf = pymaf.MafFrame.from_file(maf_file)
>>> mf.plot_summary()
.. image:: https://raw.githubusercontent.com/sbslee/fuc-data/main/images/maf_summary-2.png
10 changes: 9 additions & 1 deletion docs/glossary.rst
@@ -4,7 +4,7 @@ Glossary
SNV classes
===========

Considering the pyrimidines of the Watson-Crick base pairs, there are only six different possible substitutions: C>A, C>G, C>T, T>A, T>C, and T>G.
Considering the pyrimidines of the Watson-Crick base pairs, there are only six different possible substitutions: C>A, C>G, C>T, T>A, T>C, T>G.

References:

@@ -15,6 +15,14 @@ Transitions (Ti) and transversions (Tv)

DNA substitution mutations are of two types. Transitions are interchanges of two-ring purines (A↔G) or of one-ring pyrimidines (C↔T): they therefore involve bases of similar shape. Transversions are interchanges of purine for pyrimidine bases, which therefore involve exchange of one-ring and two-ring structures.

+------+--------------------+
| Type | SNV classes |
+======+====================+
| Ti | C>T, T>C |
+------+--------------------+
| Tv | C>A, C>G, T>A, T>G |
+------+--------------------+
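
The six SNV classes and the Ti/Tv table above can be expressed as a small lookup. This is a sketch, not the code fuc uses: a substitution whose reference base is a purine (A or G) is rewritten on the complementary strand so that the reference is always a pyrimidine, and the resulting class is then checked against the two transition classes. The names `snv_class` and `titv` are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
TRANSITIONS = {"C>T", "T>C"}  # the 'Ti' row of the table; all other classes are 'Tv'

def snv_class(ref, alt):
    """Collapse a single-base substitution into one of the six pyrimidine-based classes."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    if ref in "AG":
        # Purine reference: report the change as seen on the complementary strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"

def titv(ref, alt):
    """Return 'Ti' for transitions and 'Tv' for transversions."""
    return "Ti" if snv_class(ref, alt) in TRANSITIONS else "Tv"

all_classes = sorted(
    {snv_class(r, a) for r in "ACGT" for a in "ACGT" if r != a}
)
```

For example, G>A is the same event as C>T on the opposite strand, so it counts as a transition, while A>C collapses to T>G and counts as a transversion.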

References:

- `Transitions vs. Transversions <https://www.mun.ca/biology/scarr/Transitions_vs_Transversions.html>`__
4 changes: 4 additions & 0 deletions fuc/api/common.py
@@ -95,6 +95,10 @@ def load_dataset(name, force=False):
'tcga_laml.vcf',
'tcga_laml_vep.vcf',
],
'brca': [
'brca.maf.gz',
'brca.vcf',
],
'pyvcf': [
'plot_comparison.vcf',
'normal-tumor.vcf',
5 changes: 5 additions & 0 deletions fuc/api/pycov.py
@@ -246,10 +246,15 @@ def plot_region(
df = df.set_index('Position')
if kwargs is None:
kwargs = {}

# Determine which matplotlib axes to plot on.
if ax is None:
fig, ax = plt.subplots(figsize=figsize)

sns.lineplot(data=df, ax=ax, **kwargs)

ax.set_ylabel('Depth')

return ax

def slice(self, region):