
How to create a new IO plugin

Tom Eulenfeld edited this page Nov 5, 2024 · 2 revisions

Create a sequence plugin in the sugar package

Let's assume you want to create an IO plugin for the sequence format fancy.

First, take a look at other plugins in the sugar/_io folder.

  1. Fork the repository to your account and create a new branch.
  2. Create a new module or package inside the sugar/_io folder, e.g. fancymodule.py.
  3. Use the following template to provide read and/or write functionality for your format:
"""
My Fancy Seq Plugin

This is an example sequence file format. The layout is as follows:
#MyFancySeqFormat
seq1 AAATTGGGCCC
seq2 ATGGCT
"""

# To add read support you must define either an iter_fancy or read_fancy function, or both.
# To add write support you must define either an append_fancy or write_fancy function, or both.

from sugar import BioBasket, BioSeq
from sugar._io.util import _add_fmt_doc

# Use the following flag to indicate, that your file format is binary rather than text-based,
# the passed file handlers will be opened in binary mode.
#binary_fmt = True

# optional, filename extensions for automatic detection of file format
# when writing
filename_extensions = ['fancy']

def is_fancy(f, **kw):
    """
    Function is optional, used for auto-detection of format when reading

    It should return True if the format is detected,
    otherwise it may raise any exception or return False.
    """
    content = f.read(50)
    return content.strip().lower().startswith('#myfancyseqformat')

# The function decorators are used to automatically add a warning
# to the docstring, that this function should be called via the main
# iter_ or read functions.
@_add_fmt_doc('read')
def iter_fancy(f, optional_argument=None):
    """
    The iter_fancy function expects a file handler and has to yield BioSeq objects.

    You can define optional arguments.
    """
    for line in f:
        if line.strip() != '' and not line.startswith('#'):
            seqid, data = line.split()
            yield BioSeq(data, id=seqid)

@_add_fmt_doc('read')
def read_fancy(f, **kw):
    """
    The read_fancy function expects a file handler and has to return a BioBasket object
    """
    # We are lazy here and reuse iter_fancy
    return BioBasket(list(iter_fancy(f, **kw)))

@_add_fmt_doc('write')
def append_fancy(seq, f, **kw):
    """
    Write a single seq to file handler
    """
    f.write(f'{seq.id} {seq.data}\n')

@_add_fmt_doc('write')
def write_fancy(seqs, f, **kw):
    """
    Write a BioBasket object to file handler
    """
    f.write('#MyFancySeqFormat 3.14159\n')
    for seq in seqs:
        # be lazy again
        append_fancy(seq, f, **kw)
  4. Add your plugin to the FMTS list in sugar/_io/util.py.
  5. Register the plugin in the pyproject.toml file:
```toml
[project.entry-points."sugar.io"]
fancy = "sugar._io.fancymodule"
```
  6. Write some tests in a new file, sugar/tests/test_io_fancy.py.
  7. Re-install your branch of sugar and check that everything works:
```python
from sugar import read
seqs = read('example.fancy')
print(seqs)
```
  8. Run your tests.
  9. Create a pull request to get your plugin into the main repository.
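Because the plugin functions only receive file handles, the format logic itself can be exercised without sugar, which may help while writing the tests mentioned above. The sketch below mirrors the template's is_fancy, iter_fancy and write_fancy against an in-memory io.StringIO. Seq is a hypothetical stand-in for BioSeq (only .id and .data are used); in the real tests you would import the plugin module and use sugar's classes instead.

```python
"""Stand-alone check of the fancy format logic (no sugar required).

Seq is a hypothetical stand-in for sugar's BioSeq; only .id and .data are used.
"""
import io
from dataclasses import dataclass


@dataclass
class Seq:
    id: str
    data: str


def is_fancy(f):
    # Same detection rule as the template: header in the first 50 characters
    return f.read(50).strip().lower().startswith('#myfancyseqformat')


def iter_fancy(f):
    # Skip blank lines and header/comment lines, split "id data" pairs
    for line in f:
        if line.strip() != '' and not line.startswith('#'):
            seqid, data = line.split()
            yield Seq(seqid, data)


def write_fancy(seqs, f):
    f.write('#MyFancySeqFormat 3.14159\n')
    for seq in seqs:
        f.write(f'{seq.id} {seq.data}\n')


def roundtrip(seqs):
    """Write seqs to an in-memory text file, detect the format, read them back."""
    buf = io.StringIO()
    write_fancy(seqs, buf)
    buf.seek(0)
    detected = is_fancy(buf)
    buf.seek(0)  # detection consumed part of the stream, so rewind in this sketch
    return detected, list(iter_fancy(buf))


if __name__ == '__main__':
    seqs = [Seq('seq1', 'AAATTGGGCCC'), Seq('seq2', 'ATGGCT')]
    detected, back = roundtrip(seqs)
    print(detected, back == seqs)  # True True
```

A round trip like this catches the most common mistakes early: a header that the detection function does not recognize, or a writer and reader that disagree on the line layout.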

Create a sequence plugin in an external package that can be used with sugar

Create your own package by following only step 3 above. Register the plugin in the pyproject.toml of your own project:

```toml
[project.entry-points."sugar.io"]
fancy = "myfancypackage.fancymodule"
```

Once your package is installed, you can read fancy files with the same commands as in step 7 above.
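To check that the entry point is actually visible after installing your package, you can query the sugar.io group with the standard library's importlib.metadata. This is a generic sketch: it only lists what is registered under the group and does not claim to reproduce how sugar itself loads its plugins. The helper name registered_plugins is hypothetical.

```python
"""List plugins registered under an entry-point group (sketch)."""
from importlib.metadata import entry_points


def registered_plugins(group='sugar.io'):
    """Return {plugin name: module path} for all entry points in group."""
    try:
        eps = entry_points(group=group)  # Python 3.10+
    except TypeError:
        # Python 3.8/3.9: entry_points() takes no arguments, returns a dict of groups
        eps = entry_points().get(group, [])
    return {ep.name: ep.value for ep in eps}


if __name__ == '__main__':
    # With myfancypackage installed, the result should contain
    # 'fancy': 'myfancypackage.fancymodule'
    print(registered_plugins())
```

If your plugin is missing from the result, the package was most likely not (re-)installed after editing pyproject.toml, since entry points are recorded in the package metadata at install time.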

Create a features plugin in the sugar package or in an external package

This is analogous to the sequence plugin. It can even be located in the same file as the sequence plugin.

Instead, use the following variables and function definitions:

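```python
from sugar.core.fts import FeatureList, Location, Feature
from sugar._io.util import _add_fmt_doc

# Set this flag if the features format is binary
# (the file handlers are then opened in binary mode)
#binary_fmt_fts = True

# Optional: filename extensions for auto-detection of the features format when writing
filename_extensions_fts = ['fancy']  # example value

def is_fts_fancy(f, **kw):
    # Optional: return True if f contains the features format
    ...

@_add_fmt_doc('read_fts')
def read_fts_fancy(f, **kw):
    fts = []
    # ... parse f and append Feature objects to fts
    return FeatureList(fts)

@_add_fmt_doc('write_fts')
def write_fts_fancy(fts, f, **kw):
    for ft in fts:
        ...  # f.write(...) one record per feature
```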
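The placeholders above can be filled in much like the sequence template. The following sketch assumes a hypothetical text layout (a header line followed by whitespace-separated "type start stop" records) and uses a hypothetical Ft dataclass in place of sugar's Feature, whose constructor is not documented on this page. It shows the read/write logic only; in the real plugin, read_fts_fancy would build Feature objects and return FeatureList(fts).

```python
"""Sketch of the logic behind read_fts_fancy / write_fts_fancy.

Assumptions (hypothetical, not part of sugar): the format is a header line
followed by 'type start stop' records, and Ft stands in for sugar's Feature.
"""
import io
from dataclasses import dataclass


@dataclass
class Ft:
    type: str
    start: int
    stop: int


HEADER = '#MyFancyFtsFormat'


def is_fts_fancy(f, **kw):
    # Detect the format by its header line
    return f.read(50).strip().startswith(HEADER)


def read_fts_fancy(f, **kw):
    fts = []
    for line in f:
        if line.strip() and not line.startswith('#'):
            type_, start, stop = line.split()
            fts.append(Ft(type_, int(start), int(stop)))
    # In the real plugin: create Feature objects and return FeatureList(fts)
    return fts


def write_fts_fancy(fts, f, **kw):
    f.write(HEADER + '\n')
    for ft in fts:
        # one record per feature
        f.write(f'{ft.type} {ft.start} {ft.stop}\n')


if __name__ == '__main__':
    buf = io.StringIO()
    write_fts_fancy([Ft('CDS', 10, 250), Ft('gene', 0, 300)], buf)
    buf.seek(0)
    print(read_fts_fancy(buf))
```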