Storage Command Line

Nacho edited this page Dec 26, 2015 · 16 revisions

Overview

OpenCGA Storage implements a command line interface (CLI) to allow users to interact with the storage.

New CLI Commands

The new Storage command line interface implements two levels: commands and subcommands. The main commands are alignment, variant, feature and server, each with its own subcommands:

  • alignment
    • index
    • query (old fetch-alignments)
    • analysis: QC, variant call, …
    • benchmark
  • variant
    • index
    • query (old fetch-variants)
    • annotate
    • stats
    • sample: sample aggregation queries
    • ops: remove samples, …
    • benchmark
  • feature
    • index: GFF/BED files are indexed using tabix by default; some plugins could override this and index in MongoDB or HBase
    • query: executes region-based queries
  • server
    • rest: a RESTful server using Jetty, starts at port 9090 by default
    • grpc: a high-performance server using Protocol Buffers 3, starts at port 9091 by default
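
The two-level layout listed above can be sketched with Python's argparse. This is only an illustration of the command/subcommand structure, not the actual OpenCGA implementation: the program name opencga-storage and the helper build_parser are assumptions, and no real options are modelled.

```python
import argparse

# Commands and subcommands as listed on this page (no options are modelled).
COMMANDS = {
    "alignment": ["index", "query", "analysis", "benchmark"],
    "variant": ["index", "query", "annotate", "stats", "sample", "ops", "benchmark"],
    "feature": ["index", "query"],
    "server": ["rest", "grpc"],
}


def build_parser(prog="opencga-storage"):  # program name is an assumption
    """Build a two-level parser of the form: <command> <subcommand>."""
    parser = argparse.ArgumentParser(prog=prog)
    commands = parser.add_subparsers(dest="command", required=True)
    for command, subcommands in COMMANDS.items():
        command_parser = commands.add_parser(command)
        nested = command_parser.add_subparsers(dest="subcommand", required=True)
        for name in subcommands:
            nested.add_parser(name)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["variant", "index"])
    print(args.command, args.subcommand)
```

A subcommand that does not belong to the chosen command (for example feature stats) is rejected by the parser, which mirrors the grouping in the list.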

CLI Commands

The Storage command line interface defines this set of commands:

  • index-variants: Index variants file
  • fetch-variants: Search over indexed variants
  • annotate-variants: Create and load variant annotations into the database
  • stats-variants: Create and load stats into a database
  • create-accessions: Create accession IDs for an input file
  • index-alignments: Index alignment file
  • fetch-alignments: Search over indexed alignments

Dynamic parameters

Dynamic parameters are not predefined command line options; they change internal configuration parameters. Depending on the biotype (alignment or variant) and the selected storage engine, these parameters are added to the options field of the configuration read from the configuration file.

-D<configuration-parameter-name>=<value>
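
As a rough sketch of how such parameters could be handled, the Python snippet below separates -D<name>=<value> tokens from the other arguments and merges them into an options mapping. The function names and the parameter name example.batch.size are hypothetical; the real OpenCGA parsing may differ.

```python
def split_dynamic_params(argv):
    """Separate -D<name>=<value> tokens from the remaining arguments."""
    dynamic, remaining = {}, []
    for token in argv:
        if token.startswith("-D") and "=" in token:
            # Split on the first "=" only, so values may contain "=".
            name, value = token[2:].split("=", 1)
            dynamic[name] = value
        else:
            remaining.append(token)
    return dynamic, remaining


def apply_dynamic_params(options, dynamic):
    """Return a copy of the 'options' field with the dynamic parameters applied."""
    merged = dict(options)
    merged.update(dynamic)  # dynamic parameters override configured values
    return merged


if __name__ == "__main__":
    # "example.batch.size" is a made-up parameter name used for illustration.
    dynamic, rest = split_dynamic_params(["index-variants", "-Dexample.batch.size=200"])
    print(rest, apply_dynamic_params({"example.batch.size": "100"}, dynamic))
```

Returning a merged copy instead of mutating the configured options keeps the values read from file available for comparison.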

Storage Configuration file

The file storage-configuration.yml should be placed at $OPENCGA_HOME/conf/ and contains all the configuration needed by OpenCGA-Storage. There are three main blocks: storageEngines, server and cellbase.

  • Storage configuration
    • Storage Engine configuration
      • Variant
      • Alignment
    • Server configuration
    • CellBase configuration

Storage Engines configuration

This block defines a set of configuration options for each installed storage engine (mongodb, hadoop, ...). Each engine contains a section for every supported biotype, currently alignment and variant.
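
To make the nesting concrete, here is a hypothetical in-memory view of the configuration, written in Python. Only the block names (storageEngines, server, cellbase), the two biotypes, the options field and the default ports come from this page; the id key, the example option and the helper engine_options are assumptions made for illustration.

```python
# Hypothetical view of storage-configuration.yml after loading.
# Keys other than the block names, biotypes and "options" are assumptions.
CONFIG = {
    "storageEngines": [
        {
            "id": "mongodb",
            "alignment": {"options": {}},
            "variant": {"options": {"example.option": "value"}},
        },
        {
            "id": "hadoop",
            "alignment": {"options": {}},
            "variant": {"options": {}},
        },
    ],
    "server": {"rest": 9090, "grpc": 9091},
    "cellbase": {},
}

BIOTYPES = ("alignment", "variant")


def engine_options(config, engine_id, biotype):
    """Look up the options of one biotype section of one storage engine."""
    if biotype not in BIOTYPES:
        raise ValueError(f"unsupported biotype: {biotype}")
    for engine in config["storageEngines"]:
        if engine["id"] == engine_id:
            return engine[biotype]["options"]
    raise KeyError(engine_id)


if __name__ == "__main__":
    print(engine_options(CONFIG, "mongodb", "variant"))
```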

Variant ETL configuration

Options common to all storage engines for variants are defined in VariantStorageManager::Options.

Alignment ETL configuration

Options common to all storage engines for alignments are defined in AlignmentStorageManager::Options.

Server configuration

CellBase configuration