Project Description

Mitchell Shiell edited this page May 31, 2023 · 5 revisions

Song is a robust metadata validation and tracking system designed to streamline the management of genomics metadata across multiple cloud storage systems. As genomics research grows in complexity and scale, managing metadata manually with spreadsheets and text files becomes time-consuming and error-prone. Song simplifies metadata submission and automates validation and tracking, enabling users to build high-quality, reliable metadata repositories with minimal human intervention. Song is one of several microservices provided by Overture; it is fully open source and free for anyone to use.

Features

Metadata Validation

Song's metadata validation process ensures that all data submissions adhere to user-defined standards and structure.

When users submit a data payload through the Song and Score clients, the metadata is recorded in a JSON document called the analysis file. The analysis file structures the metadata as name-value pairs which, upon submission, are validated against both a base schema and a user-defined JSON schema. This process ensures that all accepted metadata is valid and consistent, minimizing errors and safeguarding the quality of the metadata repository.
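The two-layer check described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Song's implementation: real Song schemas are JSON Schema documents, whereas here each schema is simplified to a map from required field names to expected Java types, and the field names (`studyId`, `analysisType`, `samples`, `files`, `experiment`) are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each "schema" is a map of required field name -> expected type.
// Song itself validates with JSON Schema; this only illustrates the two-layer idea.
class AnalysisValidator {

    // Assumed base-schema fields shared by every analysis (illustrative only).
    static final Map<String, Class<?>> BASE_SCHEMA = new LinkedHashMap<>();
    static {
        BASE_SCHEMA.put("studyId", String.class);
        BASE_SCHEMA.put("analysisType", String.class);
        BASE_SCHEMA.put("samples", List.class);
        BASE_SCHEMA.put("files", List.class);
    }

    // Checks one schema layer and records a message for each missing or mistyped field.
    static void checkLayer(Map<String, Object> analysis, Map<String, Class<?>> schema,
                           String layerName, List<String> errors) {
        for (Map.Entry<String, Class<?>> field : schema.entrySet()) {
            Object value = analysis.get(field.getKey());
            if (value == null) {
                errors.add(layerName + ": missing field '" + field.getKey() + "'");
            } else if (!field.getValue().isInstance(value)) {
                errors.add(layerName + ": field '" + field.getKey() + "' should be "
                        + field.getValue().getSimpleName());
            }
        }
    }

    // Applies the base schema and the user-defined schema; an empty list means valid.
    static List<String> validate(Map<String, Object> analysis, Map<String, Class<?>> userSchema) {
        List<String> errors = new ArrayList<>();
        checkLayer(analysis, BASE_SCHEMA, "base schema", errors);
        checkLayer(analysis, userSchema, "user schema", errors);
        return errors;
    }

    public static void main(String[] args) {
        // A user-defined schema that additionally requires an "experiment" object.
        Map<String, Class<?>> userSchema = Map.of("experiment", Map.class);

        Map<String, Object> analysis = new LinkedHashMap<>();
        analysis.put("studyId", "ABC123");
        analysis.put("analysisType", "sequencingReads");
        analysis.put("samples", List.of());
        analysis.put("files", List.of());
        // "experiment" is omitted, so only the user-schema layer reports an error.

        // prints: user schema: missing field 'experiment'
        validate(analysis, userSchema).forEach(System.out::println);
    }
}
```

Keeping the base and user layers separate mirrors the split described above: every analysis must satisfy the shared base structure, while each project can add its own required fields on top.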

Metadata Tracking

Song's metadata tracking system uses global identifiers to link all metadata in Song repositories to the associated molecular files uploaded to object storage through Score. Because every record is tracked this way, all data is accessible to downstream services, enabling researchers to search for, locate, and download the data they need.
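The idea of linking metadata to stored files through global identifiers can be sketched with `java.util.UUID`. This is a hypothetical, in-memory model for illustration only; the class name `MetadataRegistry` and its methods are assumptions, not part of Song's API or data model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Hypothetical in-memory registry linking analyses to files stored through Score.
// Not Song's data model; it only illustrates tracking by global identifiers.
class MetadataRegistry {

    // objectId (file identifier in object storage) -> owning analysisId
    private final Map<UUID, UUID> fileToAnalysis = new HashMap<>();
    // analysisId -> objectIds of the files it describes, in registration order
    private final Map<UUID, List<UUID>> analysisToFiles = new HashMap<>();

    // Registers a new analysis and returns its globally unique identifier.
    UUID registerAnalysis() {
        UUID analysisId = UUID.randomUUID();
        analysisToFiles.put(analysisId, new ArrayList<>());
        return analysisId;
    }

    // Links a new file to an existing analysis and returns the file's identifier.
    UUID registerFile(UUID analysisId) {
        List<UUID> files = analysisToFiles.get(analysisId);
        if (files == null) {
            throw new IllegalArgumentException("Unknown analysis: " + analysisId);
        }
        UUID objectId = UUID.randomUUID();
        files.add(objectId);
        fileToAnalysis.put(objectId, analysisId);
        return objectId;
    }

    // Finds the analysis (and therefore the metadata) that describes a stored file.
    Optional<UUID> analysisForFile(UUID objectId) {
        return Optional.ofNullable(fileToAnalysis.get(objectId));
    }

    // Returns an immutable copy of the file identifiers recorded for an analysis.
    List<UUID> filesForAnalysis(UUID analysisId) {
        return List.copyOf(analysisToFiles.getOrDefault(analysisId, List.of()));
    }
}
```

Lookups work in both directions: a downstream service holding a file identifier can find the describing metadata, and a metadata record can enumerate its files.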

State Controls

Song's access state system enables users to manage downstream access to metadata through three states. The Unpublished state keeps metadata within the Song repository and hidden from downstream indexing services such as Maestro, protecting data security and privacy. The Published state makes the metadata accessible for search and download, enabling users to share their data with others. Finally, the Suppressed state is used when data is no longer relevant; it makes the data unavailable for search and download. Together, these state controls let researchers maintain control over their data.
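The three states can be modeled as a small state machine. This is a hedged sketch: the enum name `AnalysisState` and the transition rules (publish and unpublish freely, suppress as a one-way step) are assumptions made for illustration, not a description of Song's exact rules.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the three access states. The allowed transitions are an
// assumption for illustration: publish, unpublish, and a one-way suppress.
enum AnalysisState {
    UNPUBLISHED, PUBLISHED, SUPPRESSED;

    // Only published metadata is exposed to downstream services such as Maestro.
    boolean visibleDownstream() {
        return this == PUBLISHED;
    }

    // States this state may move to under the assumed rules.
    Set<AnalysisState> allowedNext() {
        switch (this) {
            case UNPUBLISHED: return EnumSet.of(PUBLISHED, SUPPRESSED);
            case PUBLISHED:   return EnumSet.of(UNPUBLISHED, SUPPRESSED);
            default:          return EnumSet.noneOf(AnalysisState.class); // suppression is final here
        }
    }

    // Returns the requested state, or throws if the assumed rules forbid the move.
    AnalysisState transitionTo(AnalysisState next) {
        if (!allowedNext().contains(next)) {
            throw new IllegalStateException(this + " -> " + next + " is not allowed");
        }
        return next;
    }
}
```

Centralizing the visibility check in `visibleDownstream()` keeps the rule "only Published metadata is searchable" in one place rather than scattered across services.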

Technologies Used

Song is a Java application built with Spring Boot.

The following table lists the primary technologies and the versions used for development and testing. External dependencies such as databases may work with other versions, but those versions may be untested.

Technology    Version   Description
Java          11        Primary programming language
Spring Boot   2.6.6     Application framework
PostgreSQL    11.1      Database
Kafka         2.2*      (Optional) Messaging system that informs other applications of updates to Song analysis data

* Likely supports newer versions without issues.

Related Services

Song requires a companion application, Score, which manages file uploads and downloads. Score also integrates with SAMtools, allowing users to download portions of genomic files through BAM slicing.

As part of the Overture genomics toolkit, Song can be used with additional integrations, including:

  • Event streaming: Built-in support for Apache Kafka event streaming.
  • Maestro: Song is built to integrate natively with Maestro, which can index multiple Song repositories into a single Elasticsearch index.
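To make the indexing relationship concrete, the sketch below shows how an indexer in the style of Maestro might merge state-change events from several Song repositories into one index. Everything here is hypothetical: the event fields (`repositoryId`, `analysisId`, `state`) and the class `CombinedIndex` are assumptions, not Song's Kafka message format or Maestro's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of an indexer that merges events from several Song repositories
// into one index. Event fields and behavior are assumptions, not real message formats.
class CombinedIndex {

    // A simplified analysis state-change event, such as might arrive over Kafka.
    static final class AnalysisEvent {
        final String repositoryId;
        final String analysisId;
        final String state; // "PUBLISHED", "UNPUBLISHED", or "SUPPRESSED"

        AnalysisEvent(String repositoryId, String analysisId, String state) {
            this.repositoryId = repositoryId;
            this.analysisId = analysisId;
            this.state = state;
        }
    }

    // One index for all repositories, keyed by "repositoryId/analysisId" so that
    // entries from different repositories never overwrite each other.
    private final Map<String, AnalysisEvent> documents = new TreeMap<>();

    // Published analyses are added or updated; any other state removes them from search.
    void onEvent(AnalysisEvent event) {
        String key = event.repositoryId + "/" + event.analysisId;
        if ("PUBLISHED".equals(event.state)) {
            documents.put(key, event);
        } else {
            documents.remove(key);
        }
    }

    // Returns the currently searchable keys in sorted order.
    List<String> searchableKeys() {
        return new ArrayList<>(documents.keySet());
    }
}
```

Treating every non-published state as a removal keeps the index consistent with the state controls described earlier: unpublishing or suppressing an analysis in any repository takes it out of search.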

The following table outlines the core services within the Overture genomics toolkit:

Product    Description
Ego        An authorization and user management service
Ego UI     A UI for managing Ego authentication and authorization services
Score      Transfers data quickly and easily to and from any cloud-based storage system
Song       Catalogs and manages metadata of genomics data spread across cloud storage systems
Maestro    Organizes distributed data into a centralized Elasticsearch index
Arranger   Builds an intuitive data search interface, complete with customizable components, tables, and search terms