CASP Secondary Structure Property Datasets

ETL pipelines that produce Critical Assessment of Protein Structure Prediction (CASP) secondary structure feature datasets for AI models and analysis.

Getting started

You can either download the datasets or run the project yourself. The current datasets include only PDB entries from the FM (free modelling) category. To build datasets that include additional classifications, edit the class_filter setting in the config files.
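To illustrate what a class filter does, here is a minimal sketch in Python. It is not the project's implementation: the `Domain` record, the `filter_by_class` function, and the target IDs are hypothetical, and only the `class_filter` name and the FM category come from this README. The other labels (TBM, FM/TBM) are standard CASP classifications used here as sample values.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Domain:
    """Hypothetical record for one CASP domain (not the project's own type)."""
    target: str          # placeholder domain identifier
    classification: str  # CASP category label, e.g. "FM", "TBM", "FM/TBM"


def filter_by_class(domains: Iterable[Domain], class_filter: Iterable[str]) -> List[Domain]:
    """Keep only the domains whose classification is listed in class_filter."""
    allowed = {label.upper() for label in class_filter}
    return [d for d in domains if d.classification.upper() in allowed]


# Placeholder entries, for illustration only.
domains = [
    Domain("T0001-D1", "FM"),
    Domain("T0002-D1", "TBM"),
    Domain("T0003-D1", "FM/TBM"),
]

fm_only = filter_by_class(domains, ["FM"])
fm_and_mixed = filter_by_class(domains, ["FM", "FM/TBM"])
print([d.target for d in fm_only])
print([d.target for d in fm_and_mixed])
```

Widening the list passed as `class_filter` is the sketch's equivalent of adding classifications in a config file; check the shipped config files for the exact key format the project expects.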

Requirements

Python 3.8+

Installation

Clone the repository and create a virtual environment inside it.

$ git clone <repository>
$ python -m venv venv

Activate the virtual environment.

$ source venv/bin/activate

Install the dependencies and the casp package.

$ python setup.py install
$ pip install -r requirements.txt

Run the desired configuration.

$ casp run -c config/casp14.yml
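If you want to build the dataset for every competition in one go, the run step can be scripted. The sketch below assumes that each competition has its own `.yml` file in the config folder, following the pattern of `config/casp14.yml`; that naming is an assumption, and `build_commands` and `run_all` are hypothetical helpers. The only command used is the `casp run -c` invocation shown above.

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands(config_dir: str = "config") -> List[List[str]]:
    """Return one `casp run -c <file>` command per .yml file in config_dir."""
    configs = sorted(Path(config_dir).glob("*.yml"))
    return [["casp", "run", "-c", str(cfg)] for cfg in configs]


def run_all(config_dir: str = "config") -> None:
    """Run every configuration in turn, stopping at the first failure."""
    for cmd in build_commands(config_dir):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

Call `run_all()` from the repository root with the virtual environment activated, so that the `casp` command is on the path.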

CASP Datasets

The datasets are in the data folder, organized by competition number. Each folder contains a single dataset that merges the FM-domain CASP proteins for that competition. Each folder also contains the DSSP files, a domain summary, and the FASTA files for the entries.

CASP14, CASP13, CASP12, CASP11, CASP10
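The FASTA files in each folder can be read without extra dependencies. The following is a minimal sketch of a plain FASTA parser; `parse_fasta` is a hypothetical helper, not part of the casp package, and the sample records are made up for illustration. The format of the merged dataset itself is not described here, so the sketch covers only the FASTA files.

```python
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: store the previous one, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; join them back together.
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Made-up sample input, for illustration only.
sample = ">entry_a\nMKVL\nAAGT\n>entry_b\nGSHM\n"
sequences = parse_fasta(sample)
print(sequences)
```

To use it on a real file, read the file's text with `pathlib.Path(...).read_text()` and pass it to `parse_fasta`; look in the data folder for the actual file names.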


Eryk96/CASP-Datasets
