Note: This package is in active development and functionality might change or not work correctly (yet)!

PeakSQL

Dynamic machine learning database for genomics. Supports common BED-like data formats such as .bed, .narrowPeak, and .bedGraph, as well as the binary bigWig format.

Installation

PeakSQL can be installed through pip:

pip install peaksql

Or installed with Conda (hosted on Bioconda):

conda install peaksql

And finally, installed from source:

git clone https://github.com/vanheeringen-lab/peaksql
cd peaksql
pip install .

Getting started

import peaksql

# paths to our files
db_file = "peakSQL.sqlite"  # where to store our database
assembly = "/path/to/hg38.fa"
data = "binding_sites.bed"

# load data into database
db = peaksql.database.DataBase(db_file)
db.add_assembly(assembly, assembly="hg38", species="human")
db.add_data(data, assembly="hg38")

# now load as dataset
dataset = peaksql.BedDataSet(db_file, seq_length=101, stride=200)

# use the dataset in your application
for seq, label in dataset:
    ...
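
To build some intuition for what seq_length and stride mean, the following is a minimal pure-Python sketch of sliding-window labelling over BED-style intervals. It is not PeakSQL's implementation, and PeakSQL's own windowing and labelling rules may differ. The helper names (sliding_windows, overlaps), the toy chromosome length, and the peak coordinates are all hypothetical and chosen for illustration only. Intervals are assumed to be 0-based and half-open, as in BED files.

```python
from typing import Iterator, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end), as in BED files


def sliding_windows(chrom_length: int, seq_length: int, stride: int) -> Iterator[Interval]:
    """Yield (start, end) windows of seq_length bases, stepping by stride."""
    for start in range(0, chrom_length - seq_length + 1, stride):
        yield start, start + seq_length


def overlaps(window: Interval, peaks: List[Interval]) -> bool:
    """Return True if the window shares at least one base with any peak."""
    w_start, w_end = window
    return any(p_start < w_end and w_start < p_end for p_start, p_end in peaks)


# Hypothetical toy data: a 1,000 bp chromosome with two peaks.
peaks = [(150, 260), (790, 810)]
labelled = [
    (window, overlaps(window, peaks))
    for window in sliding_windows(chrom_length=1000, seq_length=101, stride=200)
]

for window, label in labelled:
    print(window, label)
```

In this sketch a stride larger than seq_length leaves gaps between consecutive windows, while a stride smaller than seq_length makes them overlap.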