The CGAT Code Collection has two components. The first component
is a collection of scripts in this repository, which are located here
and can be run using the cgat command. This repository also contains a
number of utility modules that help with handling various file formats
in Python. These are located here.
The second component is a collection of pipelines that utilise the functionality of the scripts and can be accessed here.
For questions, please open a discussion on the GitHub issue page.
Documentation of CGAT tools is available here.
The preferred method to install the CGAT code collection is using the installation script, which uses Conda.
Here are the steps:
    # download installation script:
    curl -O https://raw.githubusercontent.com/CGATOxford/cgat/master/install-CGAT-tools.sh

    # see help:
    bash install-CGAT-tools.sh

    # install set of production scripts (well tested):
    bash install-CGAT-tools.sh --production [--location </full/path/to/folder/without/trailing/slash>]

    # or go for the latest development version:
    bash install-CGAT-tools.sh --devel [--location </full/path/to/folder/without/trailing/slash>]

    # enable the conda environment as requested by the installation script:
    source </full/path/to/folder/without/trailing/slash>/conda-install/bin/activate cgat-s

    # finally, please run the cgat command-line tool to check the installation:
    cgat --help
The installation script will put everything under the specified location. The aim of the script is to provide a portable installation that does not interfere with existing software on your system. The result is a Conda environment containing the CGAT scripts, which you can activate on demand.
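After activating the environment, it can be useful to confirm that the cgat command is on your PATH before running anything else. The following is a minimal sketch of such a check. The helper names `have_command` and `check_tool` are hypothetical and are not part of the CGAT installation script.

```shell
# Hypothetical helper (not part of CGAT): succeed if a command is on PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper (not part of CGAT): print a status line for a tool.
# Defaults to checking "cgat" when no argument is given.
check_tool() {
    tool="${1:-cgat}"
    if have_command "$tool"; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: not found; activate the conda environment first"
        return 1
    fi
}
```

Calling `check_tool` with no arguments checks for cgat. A non-zero exit status suggests that the environment has not been activated in the current shell.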
You can also use pip to install the CGAT scripts. To go down this route, please type:
pip install cgat
However, CGAT depends on numerous other Python packages, which themselves might require manual intervention. Therefore, our preferred method of installation is through Conda.
Run the cgat --help command to see what scripts are available and how to use them.
For example, to strip sequence and quality information from a bam file, type:
cgat bam2bam --strip=sequence < in.bam > out.bam
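To apply the same command to several files, one option is a small shell loop. The sketch below reuses the command exactly as given above. The function name `strip_bams`, the dry-run switch, and the `.stripped.bam` output suffix are assumptions made for illustration and are not CGAT conventions.

```shell
# Hypothetical wrapper (not part of CGAT): run the bam2bam example above
# on each input file. The first argument is "yes" to only print the
# commands (dry run), or anything else to execute them.
# Output naming (<name>.stripped.bam) is an assumption for this sketch.
strip_bams() {
    dry_run="$1"
    shift
    for bam in "$@"; do
        out="${bam%.bam}.stripped.bam"
        if [ "$dry_run" = "yes" ]; then
            echo "cgat bam2bam --strip=sequence < $bam > $out"
        else
            cgat bam2bam --strip=sequence < "$bam" > "$out"
        fi
    done
}
```

For example, `strip_bams yes sample1.bam sample2.bam` prints the two commands without running them, which lets you check the output file names first.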
For more extensive examples, please refer to the documentation here.