Library of useful tools for building pipelines for a research cloud datalake

ncsa/dagster-ncsa

dagster-ncsa

A Python library of Dagster components for building academic research cloud data lakes at the National Center for Supercomputing Applications (NCSA).

Overview

dagster-ncsa extends Dagster with tools for academic research workflows and data management at scale. It provides abstractions and utilities that simplify building, managing, and monitoring data pipelines in research-oriented cloud data lake environments.

Components

  • S3ResourceNCSA: Extends the Dagster S3 resource with helper functions for working with S3 objects in a research data pipeline.
  • AirTableCatalogResource: A resource that uses Airtable tables as a catalog of the data assets in a research data pipeline.
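
To illustrate the general idea behind a resource that "extends" a storage resource with helpers, here is a minimal sketch in plain Python. It does not use Dagster or the actual dagster-ncsa API: the classes `InMemoryS3Resource` and `S3ResourceWithHelpers`, and the helpers `split_s3_uri`, `list_keys`, and `read_text`, are hypothetical names chosen for this example, and an in-memory dictionary stands in for a real bucket so the snippet runs without credentials.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple
from urllib.parse import urlparse


@dataclass
class InMemoryS3Resource:
    """Hypothetical stand-in for an S3 resource; objects live in a dict."""

    # Maps (bucket, key) to the stored bytes.
    objects: Dict[Tuple[str, str], bytes] = field(default_factory=dict)

    def put_object(self, bucket: str, key: str, body: bytes) -> None:
        self.objects[(bucket, key)] = body

    def get_object(self, bucket: str, key: str) -> bytes:
        return self.objects[(bucket, key)]


@dataclass
class S3ResourceWithHelpers(InMemoryS3Resource):
    """Hypothetical subclass that layers convenience helpers on the base."""

    @staticmethod
    def split_s3_uri(uri: str) -> Tuple[str, str]:
        """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
        parsed = urlparse(uri)
        if parsed.scheme != "s3" or not parsed.netloc:
            raise ValueError(f"not an S3 URI: {uri!r}")
        return parsed.netloc, parsed.path.lstrip("/")

    def list_keys(self, bucket: str, prefix: str = "") -> List[str]:
        """Return the sorted keys in a bucket that start with the prefix."""
        return sorted(
            key
            for (obj_bucket, key) in self.objects
            if obj_bucket == bucket and key.startswith(prefix)
        )

    def read_text(self, uri: str, encoding: str = "utf-8") -> str:
        """Fetch an object by URI and decode it as text."""
        bucket, key = self.split_s3_uri(uri)
        return self.get_object(bucket, key).decode(encoding)


if __name__ == "__main__":
    s3 = S3ResourceWithHelpers()
    s3.put_object("lake", "raw/a.csv", b"x,y\n1,2\n")
    s3.put_object("lake", "curated/c.csv", b"")
    print(s3.list_keys("lake", prefix="raw/"))
    print(s3.read_text("s3://lake/raw/a.csv"))
```

In a real pipeline the base class would be Dagster's S3 resource rather than this stand-in, and the helpers actually provided by S3ResourceNCSA may differ from the ones sketched here.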

Installation

Basic Installation

pip install dagster-ncsa

Development Installation

pip install -e ".[dev]"

Development

Setup Development Environment

# Clone the repository
git clone https://github.com/ncsa/dagster-ncsa.git
cd dagster-ncsa

# Install development dependencies
pip install -e ".[dev]"
