Danny Price edited this page Jan 27, 2015 · 3 revisions

Parallel I/O: distributed data analysis

Some astronomy applications require that data be written to global storage from multiple parallel input sources. In addition, data reduction in many astronomy applications benefits from processing on machines ranging from multi-core workstations to high-performance supercomputers and GPU clusters, in order to analyze the huge volumes of data generated by surveys. The data format should therefore provide high-performance parallel read/write access to a shared data file by multiple processes. The interface used to perform I/O on these files should be standardized, to ensure portability across major computing platforms.

A specific use case for parallel write arises in radio astronomy: an FX correlator splits the cross-correlation into frequency subbands that are distributed over several compute nodes. To reconstruct the full spectrum, each compute node needs to write its subband to a single file (or file-like object).
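
The write pattern described above can be illustrated with a minimal sketch. It is not a real correlator interface: the names `N_SUBBANDS`, `CHANS_PER_SUBBAND`, `make_subband`, `write_subband`, `write_spectrum` and `read_spectrum` are hypothetical, the file is a flat array of little-endian 32-bit floats, and threads on one machine stand in for the compute nodes. A production system would more likely rely on a standard parallel I/O layer; the point here is only that each writer owns a disjoint byte range of one shared file.

```python
import struct
from concurrent.futures import ThreadPoolExecutor

N_SUBBANDS = 4          # hypothetical number of compute nodes (one subband each)
CHANS_PER_SUBBAND = 8   # hypothetical number of channels per subband
ITEM = struct.calcsize("<f")  # size of one little-endian float32 sample


def make_subband(index):
    """Stand-in for correlator output: channel numbers stored as floats."""
    start = index * CHANS_PER_SUBBAND
    return [float(start + c) for c in range(CHANS_PER_SUBBAND)]


def write_subband(path, index):
    """Write one subband into its own byte range of the shared file."""
    payload = struct.pack(f"<{CHANS_PER_SUBBAND}f", *make_subband(index))
    offset = index * CHANS_PER_SUBBAND * ITEM
    # Each writer opens its own handle, so seek positions are not shared.
    with open(path, "r+b") as fh:
        fh.seek(offset)
        fh.write(payload)
    return offset


def write_spectrum(path):
    """Preallocate the file, then let all writers fill it concurrently."""
    total = N_SUBBANDS * CHANS_PER_SUBBAND * ITEM
    with open(path, "wb") as fh:
        fh.write(b"\0" * total)
    with ThreadPoolExecutor(max_workers=N_SUBBANDS) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(lambda i: write_subband(path, i), range(N_SUBBANDS)))


def read_spectrum(path):
    """Read the reassembled full spectrum back as a list of floats."""
    with open(path, "rb") as fh:
        raw = fh.read()
    return list(struct.unpack(f"<{len(raw) // ITEM}f", raw))
```

Because the byte ranges never overlap, the writers need no locking in this sketch, and the order in which they finish does not affect the resulting file.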

Similarly, for parallel read: a user wishes to image several subbands of a wide-bandwidth visibility dataset produced by a correlator. Data access should be parallelizable over both time and frequency, so that multiple parallel data reduction pipelines can be run at once on the same dataset.
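
As a rough illustration of read access that parallelizes over both time and frequency, the sketch below lets several readers fetch independent (time range, channel range) tiles from one file. Everything in it is an assumption made for the example: the time-major layout of little-endian 32-bit floats, the sizes `N_TIME` and `N_CHAN`, and the helper names `write_dataset`, `read_block` and `read_tiles`. Threads stand in for separate reduction pipelines.

```python
import struct
from concurrent.futures import ThreadPoolExecutor

N_TIME = 6   # hypothetical number of time samples (rows)
N_CHAN = 8   # hypothetical number of frequency channels (columns)
ITEM = struct.calcsize("<f")  # size of one little-endian float32 sample


def write_dataset(path):
    """Create a toy time-major dataset where sample (t, c) = 100 * t + c."""
    with open(path, "wb") as fh:
        for t in range(N_TIME):
            row = [float(t * 100 + c) for c in range(N_CHAN)]
            fh.write(struct.pack(f"<{N_CHAN}f", *row))


def read_block(path, t0, t1, c0, c1):
    """Read times [t0, t1) and channels [c0, c1) without touching other data."""
    width = c1 - c0
    block = []
    # Each reader opens its own handle, so readers do not share a seek position.
    with open(path, "rb") as fh:
        for t in range(t0, t1):
            fh.seek((t * N_CHAN + c0) * ITEM)
            raw = fh.read(width * ITEM)
            block.append(list(struct.unpack(f"<{width}f", raw)))
    return block


def read_tiles(path, tiles):
    """Fetch several (t0, t1, c0, c1) tiles concurrently, in the given order."""
    with ThreadPoolExecutor(max_workers=len(tiles)) as pool:
        return list(pool.map(lambda tile: read_block(path, *tile), tiles))
```

Splitting the dataset into four tiles, for example `[(0, 3, 0, 4), (0, 3, 4, 8), (3, 6, 0, 4), (3, 6, 4, 8)]`, would give each pipeline one half of the time axis and one half of the band. With this time-major layout a frequency slice costs one seek per time sample, which is the kind of trade-off a real format would address with chunking.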
