carnival-data/carnival

License: GPL v3 Build Status Coverage Status

Carnival

It's a party of information!

Carnival is a data unification technology that aggregates and semantically enriches (encodes the meaning of) data from disparate sources into a unified property graph resource and provides tools to reason over and interact with that resource. Carnival has a robust architecture for tracking the provenance of data and providing evidence chains for conclusions or reasoning made on that data.

Inspired by Open Biological and Biomedical Ontology (OBO) Foundry ontologies, the carnival-clinical extension of the Carnival data model supports the execution of common investigatory tasks including harmonizing complex patient, specimen and healthcare information, patient cohort identification, case-control matching, and the production of data sets for scientific analysis.

Contents

  1. Framework Overview
  2. Package Overview
  3. Graph Schema
  4. Getting Started

Framework Overview

Carnival uses objects called vines to connect to external data sources, while reapers encode the domain knowledge specific to each data source. Vines can connect to sources such as MySQL or Oracle databases, REDCap projects, and CSV files. Some vine features include:

  • Parameterized SQL queries
  • Utilities to compose iterative SQL from lists of identifiers and codes
  • Caching of query results
  • Incremental caching of long-running query result data
  • Monitor thread to estimate time to completion of long-running queries
  • Automatic re-establishment of dropped connections
  • API layer for REDCap
  • H2 database wrapper for CSV data
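
Several of the features above (parameterized queries, composing SQL iteratively from a list of identifiers, and caching results) can be illustrated together. The following is a minimal, language-neutral sketch in Python using an in-memory SQLite database. It is not Carnival's vine API: the `CachingQueryRunner` class, the `encounter` table, and its columns are all hypothetical names chosen for illustration.

```python
import sqlite3
from typing import Dict, Iterator, List, Sequence, Tuple


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


class CachingQueryRunner:
    """Hypothetical helper (not a Carnival class): runs one parameterized
    query per chunk of identifiers and caches each chunk's rows."""

    def __init__(self, conn: sqlite3.Connection, chunk_size: int = 2) -> None:
        self.conn = conn
        self.chunk_size = chunk_size
        self.cache: Dict[Tuple[str, ...], List[tuple]] = {}
        self.queries_run = 0

    def encounters_for(self, patient_ids: Sequence[str]) -> List[tuple]:
        rows: List[tuple] = []
        for chunk in chunked(list(patient_ids), self.chunk_size):
            key = tuple(chunk)
            if key not in self.cache:
                # One "?" placeholder per identifier; the values are bound by
                # the driver and never interpolated into the SQL string.
                placeholders = ", ".join("?" for _ in chunk)
                sql = (
                    "SELECT patient_id, code FROM encounter "
                    f"WHERE patient_id IN ({placeholders}) "
                    "ORDER BY patient_id, code"
                )
                self.cache[key] = self.conn.execute(sql, key).fetchall()
                self.queries_run += 1
            rows.extend(self.cache[key])
        return rows


def demo() -> Tuple[List[tuple], int]:
    """Build a tiny made-up table, query it twice, and report query count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE encounter (patient_id TEXT, code TEXT)")
    conn.executemany(
        "INSERT INTO encounter VALUES (?, ?)",
        [("p1", "E11.9"), ("p2", "I10"), ("p3", "J45"), ("p4", "I10")],
    )
    runner = CachingQueryRunner(conn, chunk_size=2)
    first = runner.encounters_for(["p1", "p2", "p3"])
    second = runner.encounters_for(["p1", "p2", "p3"])  # served from cache
    assert first == second
    return first, runner.queries_run
```

Splitting the identifier list into chunks keeps each `IN (...)` clause below any database limit on bound parameters, and keying the cache on the chunk means a repeated request issues no new queries. A real vine would presumably also persist the cache and handle reconnection, which this sketch omits.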

Carnival’s property graph database:

  • Is inherently schema-less, enabling the incorporation of new data without restructuring resident data
  • Follows data instantiation patterns built for computational efficiency and inspired by OBO Foundry ontologies
  • Has a query engine capable of executing queries of arbitrary complexity
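
To make the schema-less point concrete, here is a toy in-memory property graph in Python. It is only a sketch of the general idea and does not reflect Carnival's graph classes or its query engine. The `PropertyGraph` class, the vertex labels, and the edge labels are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


class PropertyGraph:
    """Toy in-memory property graph; illustrative only, not Carnival's API."""

    def __init__(self) -> None:
        # Each vertex is a free-form dict of properties: no fixed schema.
        self.vertices: Dict[int, Dict[str, Any]] = {}
        self.out_edges: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
        self._next_id = 0

    def add_vertex(self, label: str, **props: Any) -> int:
        vid = self._next_id
        self._next_id += 1
        self.vertices[vid] = {"label": label, **props}
        return vid

    def add_edge(self, src: int, label: str, dst: int) -> None:
        self.out_edges[src].append((label, dst))

    def out(self, vid: int, edge_label: str) -> List[int]:
        """Return ids of vertices reached from `vid` over `edge_label` edges."""
        return [dst for lbl, dst in self.out_edges[vid] if lbl == edge_label]


def demo_graph() -> Tuple[PropertyGraph, List[str]]:
    g = PropertyGraph()
    patient = g.add_vertex("Patient", mrn="p1")
    encounter = g.add_vertex("Encounter", date="2020-01-15")
    g.add_edge(patient, "participated_in", encounter)

    # A new kind of data arrives later. It is added as new vertices and
    # edges; the existing Patient and Encounter vertices are left untouched.
    specimen = g.add_vertex("Specimen", tissue="blood", volume_ml=2.5)
    g.add_edge(encounter, "produced", specimen)

    # Two-hop traversal: patient -> encounters -> specimens.
    tissues = [
        g.vertices[s]["tissue"]
        for e in g.out(patient, "participated_in")
        for s in g.out(e, "produced")
    ]
    return g, tissues
```

Adding the `Specimen` vertex required no migration of existing records, which is the property the first bullet describes. A production property graph would add indexing and a full traversal language on top of this basic shape.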

Package Overview

Core Framework Packages

  • carnival-graph - Framework for defining carnival graph schemas (vertex and edge definitions). Contains the basic vertex, edge, and property classes.
  • carnival-gremlin-dsl - Gremlin DSL support for traversing carnival property graphs.
  • carnival-util - Contains utility and helper classes such as MappedDataTable, FeatureReport and SqlUtils.
  • carnival-core - Basic carnival framework. Implements the core framework classes (vines, reapers, reasoners, etc.) and defines the basic carnival graph schema (processes, databases).

Application Packages

  • carnival-clinical - Extension of carnival-core for clinical data. Contains graph schema extensions for concepts such as patients, patient cohorts and healthcare encounters. Implements methods for case-control matching for patient cohorts.
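
For readers unfamiliar with case-control matching, the sketch below shows one simple variant in Python: greedy 1:1 matching on exact sex and nearest age within a tolerance. This is an assumption-laden illustration of the general technique, not the algorithm implemented in carnival-clinical. The `Patient` record, the `match_controls` function, and the `max_age_gap` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Patient:
    pid: str
    sex: str
    age: int


def match_controls(
    cases: List[Patient],
    controls: List[Patient],
    max_age_gap: int = 5,
) -> Dict[str, Optional[str]]:
    """Greedy 1:1 matching (hypothetical, not Carnival's method).

    Each case takes the unused control of the same sex whose age is closest,
    provided the gap is at most `max_age_gap` years; otherwise it maps to None.
    """
    available = list(controls)
    matches: Dict[str, Optional[str]] = {}
    for case in cases:
        candidates = [
            c for c in available
            if c.sex == case.sex and abs(c.age - case.age) <= max_age_gap
        ]
        if not candidates:
            matches[case.pid] = None
            continue
        # Ties on age gap are broken by id so the result is deterministic.
        best = min(candidates, key=lambda c: (abs(c.age - case.age), c.pid))
        matches[case.pid] = best.pid
        available.remove(best)  # each control is used at most once
    return matches


def demo_matching() -> Dict[str, Optional[str]]:
    cases = [Patient("c1", "F", 60), Patient("c2", "M", 45), Patient("c3", "F", 30)]
    controls = [
        Patient("k1", "F", 62),
        Patient("k2", "F", 59),
        Patient("k3", "M", 50),
        Patient("k4", "M", 70),
    ]
    return match_controls(cases, controls)
```

Greedy matching depends on the order in which cases are processed and is not guaranteed to maximize the number of matched pairs; optimal matching approaches exist but are beyond the scope of this sketch.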

Graph Schema

Getting Started

See developer setup for full documentation on how to set up a development environment, and a tutorial for getting started.