Project Description
Song is a robust metadata validation and tracking system designed to streamline the management of genomics metadata across multiple cloud storage systems. With the increasing complexity and scale of genomics research, managing metadata manually using spreadsheets and text files can be time-consuming and error-prone. Song facilitates the process of metadata submission and automates validation and tracking, enabling users to create high-quality and reliable metadata repositories with minimal human intervention. Song is one of many microservices provided by Overture and is entirely open-source and accessible for everyone to use.
Song's metadata validation process ensures that all data submissions adhere to user-defined standards and structure.
When users submit a data payload through the Song and Score clients, the metadata gets recorded into a JSON file called the analysis file. The analysis file structures the metadata into name-value pairs that, upon submission, are validated against a base schema and a user-defined JSON schema. This process ensures that all metadata is valid and consistent, minimizing errors and safeguarding the quality of the metadata repository.
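The two-stage validation described above can be sketched as follows. This is a minimal illustration, not Song's implementation: the field names (`studyId`, `analysisType`, `files`, `experiment`), the schema dictionaries, and the `validate_analysis` helper are all assumptions made for the example, and a real deployment would validate against full JSON Schema documents rather than this simplified type check.

```python
import json

# Hypothetical schemas: these field names are illustrative only and are
# not taken from Song's actual base schema.
BASE_SCHEMA = {"required": {"studyId": str, "analysisType": dict, "files": list}}
USER_SCHEMA = {"required": {"experiment": dict}}


def validate(payload, schema):
    """Return a list of error strings; an empty list means the payload passed."""
    errors = []
    for name, expected_type in schema["required"].items():
        if name not in payload:
            errors.append(f"missing required field: {name}")
        elif not isinstance(payload[name], expected_type):
            errors.append(f"field {name} should be of type {expected_type.__name__}")
    return errors


def validate_analysis(payload):
    """Check the payload against the base schema, then the user-defined schema."""
    return validate(payload, BASE_SCHEMA) + validate(payload, USER_SCHEMA)


if __name__ == "__main__":
    # An analysis file is JSON, so it is parsed into name-value pairs first.
    analysis = json.loads('{"studyId": "EXAMPLE", "analysisType": {}, "files": []}')
    for problem in validate_analysis(analysis):
        print(problem)
```

Collecting every error instead of stopping at the first one lets a submitter fix the whole payload in a single pass.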
Song's metadata tracking system uses global identifiers to track all metadata within Song repositories together with the associated molecular files uploaded to object storage through Score. With metadata tracking, all data is accessible to downstream services, enabling researchers to search, locate, and download the data they need.
Song's access state system enables users to manage downstream access to metadata across three states. The Unpublished state keeps the metadata within the Song repository and inaccessible to downstream indexing services such as Maestro, ensuring data security and privacy. The Published state makes the assigned metadata accessible for search and download, enabling users to share their data with others. Finally, the Suppressed state is used when the data is no longer relevant, making it unavailable for search and download. Song's state control capabilities enable researchers to maintain control over their data.
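The visibility rule behind the three states can be modelled in a few lines. This is a sketch under stated assumptions: the `AccessState` enum, the `is_visible_downstream` and `indexable` helpers, and the dictionary shape of each analysis are hypothetical names chosen for illustration, not part of Song's API. The sketch covers only which states are exposed downstream and makes no claim about which state transitions Song permits.

```python
from enum import Enum


class AccessState(Enum):
    """The three access states described above."""
    UNPUBLISHED = "UNPUBLISHED"
    PUBLISHED = "PUBLISHED"
    SUPPRESSED = "SUPPRESSED"


def is_visible_downstream(state):
    """Only published metadata is available for search and download."""
    return state is AccessState.PUBLISHED


def indexable(analyses):
    """Keep the analyses a downstream indexer would be allowed to see."""
    return [a for a in analyses if is_visible_downstream(a["state"])]


if __name__ == "__main__":
    # Hypothetical analyses; the "id" and "state" keys are assumptions.
    analyses = [
        {"id": "a1", "state": AccessState.UNPUBLISHED},
        {"id": "a2", "state": AccessState.PUBLISHED},
        {"id": "a3", "state": AccessState.SUPPRESSED},
    ]
    print([a["id"] for a in indexable(analyses)])
```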
Song is a Java application written using Spring Boot.
The following table lists the primary technologies and the versions used for development and testing. External dependencies such as databases may work with versions other than those listed, but these are untested.
Technology | Version | Description |
---|---|---|
Java | 11 | Primary programming language |
Spring Boot | 2.6.6 | Application framework |
PostgreSQL | 11.1 | Database |
Kafka | 2.2* | Optional. Messaging system to inform other applications of updates to Song Analysis data. (*Likely supports newer versions without issues.) |
Song interacts with a required companion application, Score, which manages file uploads and downloads. Score also integrates with SAMtools, allowing users to download portions of genomic files through BAM slicing.
As part of the Overture genomics toolkit, Song can be used with additional integrations, including:
- Event streaming: Built-in support for Apache Kafka event streaming.
- Maestro: Song is built to natively integrate with Maestro, which can index multiple Song repositories into a single Elasticsearch index.
The following table outlines the core services within the Overture genomics toolkit:
Product | Description |
---|---|
Ego | An authorization and user management service |
Ego UI | A UI for managing Ego authentication and authorization services |
Score | Transfer data quickly and easily to and from any cloud-based storage system |
Song | Catalog and manage metadata of genomics data spread across cloud storage systems |
Maestro | Organize your distributed data into a centralized Elasticsearch index |
Arranger | Organize an intuitive data search interface, complete with customizable components, tables, and search terms |
Are we missing anything? Found a typo? Let us know!