a python package for finding tandem repeats from genomic sequences

lmdu/pytrf

Pytrf

a fast Python package for finding tandem repeat sequences

Introduction

A tandem repeat (TR) in a genomic sequence is a short DNA motif repeated consecutively, with the copies directly adjacent to one another. pytrf is a lightweight Python C extension for identifying tandem repeats. It can rapidly identify exact, or perfect, simple sequence repeats (SSRs). It can also find generic tandem repeats with motifs of any size, for example up to a maximum motif length of 100 bp. In addition, it can find approximate, or imperfect, tandem repeats. Finally, pytrf can be used not only as a Python package but also through a command-line interface, which makes it easy to identify tandem repeats directly from sequence files.

Note: pytrf is not a Python binding to the commonly used tool TRF (Tandem Repeats Finder).
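To make the idea of a perfect tandem repeat concrete, here is a naive pure-Python sketch that scans a sequence for motifs repeated back-to-back. It only illustrates the concept and is not pytrf's C implementation; the `find_perfect_repeats` function, the `Repeat` tuple and the per-motif-length minimum copy counts are hypothetical names and parameters chosen for this example.

```python
from typing import Dict, Iterator, NamedTuple

# Hypothetical illustration of perfect tandem repeat detection;
# this is NOT the algorithm pytrf implements in C.


class Repeat(NamedTuple):
    start: int    # 1-based start position
    end: int      # 1-based inclusive end position
    motif: str    # repeated unit
    repeats: int  # number of complete back-to-back copies


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit ('AT', not 'ATAT')."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_perfect_repeats(seq: str, min_repeats: Dict[int, int]) -> Iterator[Repeat]:
    """Greedy left-to-right scan for perfect tandem repeats.

    min_repeats maps a motif length to the minimum number of copies required.
    """
    seq = seq.upper()
    i = 0
    while i < len(seq):
        found = None
        for k in sorted(min_repeats):
            motif = seq[i:i + k]
            if len(motif) < k:
                break  # too close to the end for this (and any longer) motif
            if set(motif) - set("ACGT") or not is_primitive(motif):
                continue  # skip ambiguous bases and non-primitive motifs
            # extend while the next k bases repeat the motif exactly
            j = i + k
            while seq[j:j + k] == motif:
                j += k
            copies = (j - i) // k
            if copies >= min_repeats[k]:
                found = Repeat(i + 1, j, motif, copies)
                break
        if found:
            yield found
            i = found.end  # resume right after the repeat (0-based index)
        else:
            i += 1
```

For example, `find_perfect_repeats(seq, {1: 12, 2: 7, 3: 5})` would report mono-, di- and trinucleotide repeats with at least 12, 7 and 5 copies respectively (illustrative thresholds). The greedy scan and the primitive-motif check keep the sketch short; it makes no attempt to find approximate repeats.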

Usage

pytrf can be used as a Python package. It requires pyfastx to parse FASTA or FASTQ files.

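>>> import pytrf
>>> import pyfastx
>>> # read sequences from a FASTA file, converting bases to uppercase
>>> fa = pyfastx.Fastx('test.fa', uppercase=True)
>>> for name, seq in fa:
...     # scan each sequence for exact short tandem repeats
...     for ssr in pytrf.STRFinder(name, seq):
...         print(ssr.as_string())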
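In the loop above, `STRFinder` is given a sequence name and a sequence string. As a hedged sketch, assuming plain uncompressed FASTA with one `>` header per record, a minimal reader such as the hypothetical `read_fasta` below could produce the same kind of `(name, seq)` pairs; it does not handle gzip or FASTQ input.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical minimal FASTA reader; pyfastx remains the documented parser.


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, uppercase sequence) pairs from uncompressed FASTA lines.

    The name is the first whitespace-separated word of each '>' header.
    """
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks).upper()
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        else:
            chunks.append(line)  # sequence may wrap over several lines
    if name is not None:
        yield name, "".join(chunks).upper()
```

Because an open file handle iterates over its lines, `for name, seq in read_fasta(open('test.fa')):` could feed each pair to `pytrf.STRFinder(name, seq)` in the same way as the pyfastx loop.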

Command line

pytrf also provides command-line tools for finding tandem repeats in a given FASTA or FASTQ file.

pytrf -h

usage: pytrf command [options] fastx

a python package for finding tandem repeats from genomic sequences

options:
  -h, --help     show this help message and exit
  -v, --version  show program's version number and exit

commands:

    findstr      find exact or perfect short tandem repeats
    findgtr      find exact or perfect generic tandem repeats
    findatr      find approximate or imperfect tandem repeats
    extract      get tandem repeat sequence and flanking sequence

For example:

pytrf findstr test.fa
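To drive the command-line tool from a Python script, one option is to wrap it with `subprocess`. The sketch below only uses the subcommands and the `pytrf command [options] fastx` layout shown in the help text; `build_pytrf_command` and `run_pytrf` are hypothetical helpers, and the sketch assumes results are printed to standard output when no output-file option is passed.

```python
import shutil
import subprocess
from typing import List

# Subcommands listed in the `pytrf -h` output above
COMMANDS = {"findstr", "findgtr", "findatr", "extract"}


def build_pytrf_command(command: str, fastx: str, *options: str) -> List[str]:
    """Assemble argv in the `pytrf command [options] fastx` order."""
    if command not in COMMANDS:
        raise ValueError(f"unknown pytrf command: {command!r}")
    return ["pytrf", command, *options, fastx]


def run_pytrf(command: str, fastx: str, *options: str) -> str:
    """Run the pytrf CLI and return its standard output as text.

    Assumption: results go to stdout when no output file is specified.
    """
    if shutil.which("pytrf") is None:
        raise FileNotFoundError("pytrf executable not found on PATH")
    result = subprocess.run(
        build_pytrf_command(command, fastx, *options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

For example, `print(run_pytrf("findstr", "test.fa"))` would run the same search as the command above. Validating the subcommand up front gives a clear Python error instead of a CLI usage message.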

Documentation

For more detailed usage, see our manual: https://pytrf.readthedocs.io