Genome assembly sequence and GFF/GTF file analyzer
GeneScoPy is a Python-based standalone graphical user interface (GUI) tool for working with genome assembly sequence files (FASTA) and genome annotation files (GTF/GFF). It provides a platform for:
- Inspecting and managing genome assembly files.
- Viewing genome annotation details.
- Performing basic analyses such as computing assembly statistics (e.g., N50, GC content, scaffold sizes).
- Searching and navigating annotation files efficiently.
- Highlighting the region of interest in the FASTA sequence (selection-based).
- File Compatibility: Supports FASTA and GTF/GFF file formats.
- Assembly Details: Displays total assembly length, scaffold counts, largest and smallest scaffolds, N50, and GC content.
- Annotation Table: Presents GTF/GFF data in an easy-to-navigate table with fields like scaffold, source, feature, start and end positions, strand, frame, product, and gene name.
- Sequence Viewer: Allows users to view scaffold sequences in a text editor.
- Search Functionality: Provides tools for searching and navigating annotation records by keywords.
- Highlight Functionality: Highlights the sequence region of interest based on the annotation selection.
- User-Friendly Interface: Built with a modern and intuitive GUI.
- Clone this repository:
git clone https://github.com/SamakshSingh99/GeneScoPy/
cd GeneScoPy
- Ensure you have Python installed (version 3.7 or higher).
- Install required dependencies:
pip install tk
- Run the tool:
python ./Script/Script.py
- Launch the application.
- Use the File menu to open a FASTA file or GTF/GFF file.
- After loading a FASTA file, the "Assembly Details" section will display information about:
- Total assembly length.
- Number of scaffolds.
- Largest and smallest scaffolds.
- N50 value.
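The statistics listed above, plus the GC content mentioned among the features, can be computed with a few lines of standard-library Python. The following is a minimal sketch, not GeneScoPy's own code: the function names `read_fasta` and `assembly_stats` are hypothetical, N50 is taken as the length of the scaffold at which the running total of lengths (sorted longest first) reaches half the assembly, and GC content is assumed to be a percentage of all bases, including any `N`.

```python
from io import StringIO


def read_fasta(handle):
    """Return {scaffold_name: sequence} from an open FASTA text handle."""
    sequences = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                sequences[name] = "".join(chunks)
            # Assumption: the scaffold name is the first token of the header.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def assembly_stats(sequences):
    """Compute basic assembly statistics for a {name: sequence} mapping."""
    lengths = sorted((len(seq) for seq in sequences.values()), reverse=True)
    total = sum(lengths)
    if total == 0:
        raise ValueError("no sequence data found")

    # N50: walk scaffolds from longest to shortest until half the assembly
    # length is covered; the scaffold that crosses the midpoint gives N50.
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    gc_bases = sum(
        seq.upper().count("G") + seq.upper().count("C")
        for seq in sequences.values()
    )
    return {
        "total_length": total,
        "scaffold_count": len(lengths),
        "largest": lengths[0],
        "smallest": lengths[-1],
        "n50": n50,
        "gc_percent": 100.0 * gc_bases / total,
    }


# Tiny in-memory example; a real file would be opened with open(path).
EXAMPLE_FASTA = ">scaf1 example\nGGCC\nAATT\n>scaf2\nGCGC\n>scaf3\nAT\n"
stats = assembly_stats(read_fasta(StringIO(EXAMPLE_FASTA)))
print(stats)
```

Reading the whole assembly into a dictionary keeps the sketch short; for very large genomes, accumulating lengths and GC counts per scaffold while streaming the file would use less memory.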
- Load a GTF/GFF file to populate the annotation table.
- Use the table columns to explore scaffold details, gene annotations, and other metadata.
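To illustrate where the table columns come from, here is a rough sketch of how one GTF/GFF record could be split into its nine tab-separated fields. It is an illustration under assumptions, not the tool's actual parser: the helper names are hypothetical, and the attribute keys used for the product and gene name columns (`product`, `gene_name`, `gene`, `Name`) are guesses at common conventions.

```python
def parse_attributes(text):
    """Parse the ninth column in either GFF3 (key=value) or GTF (key "value") style."""
    attrs = {}
    for part in text.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        if "=" in part and '"' not in part.split("=", 1)[0]:
            key, value = part.split("=", 1)        # GFF3 style
        else:
            key, _, value = part.partition(" ")    # GTF style
        attrs[key.strip()] = value.strip().strip('"')
    return attrs


def parse_annotation_line(line):
    """Return a dict for one annotation record, or None for comments and blanks."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    scaffold, source, feature, start, end, _score, strand, frame, raw = fields
    attrs = parse_attributes(raw)
    return {
        "scaffold": scaffold,
        "source": source,
        "feature": feature,
        "start": int(start),   # 1-based, inclusive
        "end": int(end),       # 1-based, inclusive
        "strand": strand,
        "frame": frame,
        # Assumed attribute keys; real files may use different ones.
        "product": attrs.get("product", ""),
        "gene_name": attrs.get("gene_name") or attrs.get("gene") or attrs.get("Name", ""),
    }


gff_record = parse_annotation_line(
    "scaf1\tprokka\tCDS\t10\t90\t.\t+\t0\t"
    "ID=cds1;Name=dnaA;product=chromosomal replication initiator"
)
gtf_record = parse_annotation_line(
    'scaf2\tensembl\texon\t5\t20\t.\t-\t.\tgene_id "G1"; gene_name "abc1";'
)
print(gff_record["gene_name"], gtf_record["gene_name"])
```

This sketch does not decode URL-escaped characters in GFF3 attributes and will mis-split GTF values that contain a semicolon inside quotes.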
- Use the search bar to find specific entries in the annotation table.
- Navigate through results using the Previous and Next buttons.
- Check the sequence box to see the region highlighted for the selected record.
- Reset the search to view the entire table again.
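The search workflow above (find, step through matches, reset) can be modelled without any GUI code. The sketch below is one possible way to do it and is not taken from GeneScoPy: the class name `SearchNavigator` is hypothetical, matching is assumed to be a case-insensitive substring test over every column, and Previous/Next are assumed to wrap around at the ends of the result list.

```python
class SearchNavigator:
    """Track keyword matches in a table and step through them."""

    def __init__(self, rows):
        self.rows = rows        # sequence of row tuples
        self.matches = []       # indices of matching rows
        self.position = -1      # index into self.matches

    def search(self, keyword):
        """Find rows containing the keyword; return the first matching row index."""
        needle = keyword.strip().lower()
        if not needle:
            self.reset()
            return None
        self.matches = [
            index
            for index, row in enumerate(self.rows)
            if any(needle in str(value).lower() for value in row)
        ]
        self.position = 0 if self.matches else -1
        return self.current()

    def current(self):
        return self.matches[self.position] if self.matches else None

    def next(self):
        if self.matches:
            # Assumption: wrap to the first match after the last one.
            self.position = (self.position + 1) % len(self.matches)
        return self.current()

    def previous(self):
        if self.matches:
            self.position = (self.position - 1) % len(self.matches)
        return self.current()

    def reset(self):
        """Clear the search so the whole table is shown again."""
        self.matches = []
        self.position = -1


rows = [
    ("scaf1", "CDS", "dnaA"),
    ("scaf1", "gene", "dnaN"),
    ("scaf2", "CDS", "gyrB"),
    ("scaf2", "tRNA", "trnA"),
]
nav = SearchNavigator(rows)
first = nav.search("cds")
second = nav.next()
wrapped = nav.next()
back = nav.previous()
print(first, second, wrapped, back)
```

In a GUI, the row index returned by each call would be used to select and scroll to the corresponding table row.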
- Scaffold sequences can be selected from the list and displayed in the sequence viewer for detailed inspection.
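Highlighting an annotated region in the viewer comes down to converting annotation coordinates into offsets within the scaffold sequence. GTF/GFF positions are 1-based and inclusive, while Python slices are 0-based and end-exclusive. The sketch below shows that conversion; the function names are hypothetical, and the Tk index strings assume the sequence was inserted into a `tkinter.Text` widget as one unbroken string starting at index `1.0`, which may not match how GeneScoPy lays out its viewer.

```python
def feature_slice(start, end, sequence_length):
    """Convert 1-based inclusive coordinates to a 0-based, end-exclusive slice."""
    if not 1 <= start <= end <= sequence_length:
        raise ValueError(
            f"coordinates {start}-{end} fall outside a sequence of length {sequence_length}"
        )
    return start - 1, end


def tk_text_range(start, end, sequence_length):
    """Build Tk text indices for the region (assumes unwrapped text from 1.0)."""
    begin, stop = feature_slice(start, end, sequence_length)
    return f"1.0+{begin}c", f"1.0+{stop}c"


sequence = "AAACCCGGGTTT"
begin, stop = feature_slice(4, 6, len(sequence))
region = sequence[begin:stop]
tk_range = tk_text_range(4, 6, len(sequence))
print(region, tk_range)
```

With a `Text` widget, the two index strings could be passed to `tag_add` together with a tag configured with a background colour. If the viewer inserts line breaks into the sequence, the offsets would need to account for them.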
Contributions are welcome! If you'd like to enhance the tool or fix any bugs:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Submit a pull request with a detailed description.
This project is licensed under the MIT License. See the LICENSE file for details.
Special thanks to the bioinformatics community for inspiring this project.
For questions or support, please open an issue on the GitHub repository.