LSDB

LSDB - Large Survey Database

Definitions

HATS - Hierarchical Adaptive Tiling Scheme

HATS is a directory structure and metadata for spatially arranging large catalog survey data. It was originally motivated by a desire to perform spatial cross-matching between surveys at large scale, but is applicable to a range of spatial analyses and algorithms.

We use HEALPix pixels at various orders to divide the sky into partitions, so that each partition holds roughly the same number of objects rather than covering an equal area. Because each partition is roughly the same size on disk, we can expect reasonable performance of parallel operations on each partition.
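To make the adaptive tiling idea concrete, here is a minimal sketch, not the HATS implementation. It assumes each object already has a HEALPix pixel index in the nested numbering scheme at some maximum order (where the parent of a pixel is found by dropping its two lowest bits), and it splits a pixel into its four children whenever the pixel holds more than a chosen number of objects. The function name `adaptive_partitions` and the `threshold` parameter are hypothetical, chosen only for illustration.

```python
from collections import Counter


def adaptive_partitions(pix_at_max, max_order, threshold):
    """Return {(order, pixel): object_count} for an adaptive tiling.

    pix_at_max: nested-scheme HEALPix pixel index of each object at max_order.
    threshold:  hypothetical cap on objects per partition (illustration only).
    """
    # Histogram of objects per pixel at the finest order.
    counts = Counter(pix_at_max)
    partitions = {}

    def count_in(order, pixel):
        # In the nested scheme, the ancestor of a pixel at a coarser order
        # is obtained by shifting off two bits per order of difference.
        shift = 2 * (max_order - order)
        return sum(c for p, c in counts.items() if p >> shift == pixel)

    def split(order, pixel):
        n = count_in(order, pixel)
        if n == 0:
            return  # empty regions of sky get no partition
        if n <= threshold or order == max_order:
            partitions[(order, pixel)] = n
            return
        # Too many objects: recurse into the four child pixels.
        for child in range(4 * pixel, 4 * pixel + 4):
            split(order + 1, child)

    # HEALPix has 12 base pixels at order 0.
    for base in range(12):
        split(0, base)
    return partitions


if __name__ == "__main__":
    # Ten objects crowded into one fine pixel, two isolated objects elsewhere.
    print(adaptive_partitions([0] * 10 + [5, 100], max_order=2, threshold=5))
```

Dense regions end up in small, high-order pixels and sparse regions in large, low-order pixels, which is what keeps the per-partition object counts comparable.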

We use parquet as the underlying storage format, as it provides efficient storage and retrieval of tabular data.

NB: This was previously named HiPSCat - Hierarchical Partitioned Survey Catalog - Storage for scalable catalog cross-matching

LSDB - Large Survey Database - A framework for scalable spatial analysis

LSDB is an analytics framework built on top of the HATS format. It is a Python library that can read and interpret HATS-formatted catalogs and perform parallel operations on the underlying partitioned data.

This provides a reference implementation of scaled cross-matching using the HATS structure. In addition, we intend to support analysis and filtering on survey data, both before and after cross-matching.
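As a rough illustration of why spatial partitioning helps cross-matching, the sketch below matches two catalogs one partition at a time. It is not LSDB's API or algorithm: the function names, the `(id, ra, dec)` tuple layout, and the dictionaries keyed by `(order, pixel)` are all assumptions made for this example. It uses a brute-force nearest-neighbor search within each partition, assumes both catalogs share the same partitioning, and ignores objects near partition edges whose best match lies in a neighboring partition.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0


def crossmatch_partition(left, right, radius_arcsec):
    """Nearest right-hand neighbor within the radius for each left-hand object.

    left, right: lists of (id, ra_deg, dec_deg) tuples (hypothetical layout).
    Returns a list of (left_id, right_id, separation_arcsec).
    """
    matches = []
    for lid, lra, ldec in left:
        best = None
        for rid, rra, rdec in right:
            sep = angular_separation_arcsec(lra, ldec, rra, rdec)
            if sep <= radius_arcsec and (best is None or sep < best[1]):
                best = (rid, sep)
        if best is not None:
            matches.append((lid, best[0], best[1]))
    return matches


def crossmatch(left_parts, right_parts, radius_arcsec):
    """Match only partitions present in both catalogs.

    left_parts, right_parts: dicts mapping (order, pixel) -> list of objects.
    Each partition pair is independent, so the loop body could be handed to
    a parallel scheduler; here it simply runs serially.
    """
    results = []
    for key in sorted(left_parts.keys() & right_parts.keys()):
        results.extend(
            crossmatch_partition(left_parts[key], right_parts[key], radius_arcsec)
        )
    return results


if __name__ == "__main__":
    left = {(0, 0): [("a", 10.0, 20.0), ("b", 11.0, 20.0)]}
    right = {(0, 0): [("x", 10.0, 20.0 + 1 / 3600), ("y", 50.0, 50.0)]}
    print(crossmatch(left, right, radius_arcsec=2.0))
```

Because objects are only compared with objects in the same region of sky, the work scales with partition size rather than with the product of the two full catalog sizes.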

What LSDB is NOT

LSDB is NOT a full relational database; it focuses instead on spatial operations and full-survey analytics. At this time, we do not support updates of survey data. We do not provide heavy optimization for non-spatial queries and filtering.

Status: Active development

Working Group:

Design doc access through: https://groups.google.com/g/hipscat-wg
Github maintenance through: https://github.com/orgs/astronomy-commons/teams/hipscat-friends

Code Repositories:

We've implemented a HATS library in Python to read and interpret metadata about catalogs (but not to interact with the partitioned parquet files). (hipscat)

We've implemented LSDB in python, using dask as the supporting parallelization framework. (lsdb)

The LSDB library depends on the HATS library for reading catalog metadata.

In addition, we've implemented a hats-import tool which reads survey data from a variety of existing formats, and writes them in HATS format. (hats-import)

Issue Tracker: Joined issue search

Contributing LINCC Frameworks Team Members: Melissa DeLucchi, Mario Juric, Max West, Sam Wyatt, Sean McGuire, Sandro Campos, Konstantin Malanchev

Plans

Current / Recent Efforts:

2023 Q4 goals

hipscat/hipscat-import documentation beta testing
timedomain MVP (joint effort with LINCC Frameworks TAPE)
margin productionization
hipscat (data) feature freeze
ADASS 23: hipscat/LSDB tutorial; get more feedback from IVOA
LSDB v0.1 for alpha testing

Past Milestones

v0.1 alpha (mid-July 2023)
- hipscat-import
  - Map-reduce-based pipeline for creating the basic file structure from the original survey format (e.g. take a few large CSVs and convert them into structured parquet with root-level metadata)
  - User guide for getting started with your own datasets.
- HiPSCat
  - Read catalog metadata, as written by hipscat-import.
  - Perform basic spatial filtering (e.g. cone search) and list relevant parquet files.
  - Simple mollweide visualization of partitions.
- LSDB
  - This is NOT scheduled to have a public release at this time.
- Based on feedback from alpha users, we will revisit our features and priorities for a v0.2 alpha round.

Q1 2023
- HiPSCat Library MVP (minimum viable product)
- LSDB MVP

Q4 2022
- HiPSCat format prototype
- Converted Gaia into HiPSCat format
- Prototyped cross matching with dask dataframes

To keep up to date on the effort, request membership in the working group: https://groups.google.com/g/hipscat-wg
