more memo updates
bhazelton committed Dec 13, 2023
1 parent d9a0389 commit 7ef5159
Showing 2 changed files with 79 additions and 49 deletions.
7 changes: 7 additions & 0 deletions .gitignore
@@ -108,3 +108,10 @@ docs/_static/
docs/_templates/
docs/index.rst
docs/skymodel.rst

# Latex
docs/references/*.aux
docs/references/*.out
docs/references/*.toc
docs/references/_minted*/*.pygtex
docs/references/_minted*/*.pygstyle
121 changes: 72 additions & 49 deletions docs/references/skyh5_memo.tex
@@ -25,18 +25,18 @@
\section{Introduction}
\label{sec:intro}

This memo introduces a new HDF5\footnote{\url{https://www.hdfgroup.org/}}-based
file format of a SkyModel object in \verb+pyradiosky+\footnote{\url{https://github.com/RadioAstronomySoftwareGroup/pyradiosky}},
This memo introduces a new HDF5\footnote{\url{https://www.hdfgroup.org/}} based
file format of a SkyModel object in \texttt{pyradiosky}\footnote{\url{https://github.com/RadioAstronomySoftwareGroup/pyradiosky}},
a python package that provides objects and interfaces for representing diffuse, extended and compact astrophysical radio sources.
Here, we describe the required and optional elements and the structure of this file format, called \textit{SkyH5}.

We assume that the user has a working knowledge of HDF5 and the associated
python bindings in the package \verb+h5py+\footnote{\url{https://www.h5py.org/}}, as
well as SkyModel objects in pyradiosky. For more information about HDF5, please
python bindings in the package \texttt{h5py}\footnote{\url{https://www.h5py.org/}}, as
well as SkyModel objects in \texttt{pyradiosky}. For more information about HDF5, please
visit \url{https://portal.hdfgroup.org/display/HDF5/HDF5}. For more information
about the parameters present in a SkyModel object, please visit
\url{https://pyradiosky.readthedocs.io/en/latest/skymodel.html}.
Examples of how to interact with SkyModel objects in pyradiosky are available at
Examples of how to interact with SkyModel objects in \texttt{pyradiosky} are available at
\url{http://pyradiosky.readthedocs.io/en/latest/tutorial.html}.

Note that throughout the documentation, we assume a row-major convention (i.e.,
@@ -51,21 +51,22 @@ \section{Overview}
\label{sec:overview}
A SkyH5 object contains data representing catalogs and maps of
astrophysical radio sources, including the associated metadata necessary to interpret them.
A SkyH5 file contains two primary HDF5 groups: the \verb+Header+ group, which contains the metadata, and
the \verb+Data+ group, which contains the Stokes parameters representing the
flux densities or temperatures of the sources. Datasets in the \verb+Data+ group
A SkyH5 file contains two primary HDF5 groups: the \texttt{pyradiosky} group, which contains the metadata, and
the \texttt{Data} group, which contains the Stokes parameters representing the
flux densities or temperatures of the sources (as well as some optional arrays the same size as the Stokes parameter data).
Datasets in the \texttt{Data} group
can be passed through HDF5's compression
pipeline, to reduce the amount of on-disk space required to store the data.
However, because HDF5 is aware of any compression applied to a dataset, there is
little that the user has to explicitly do when reading data. For users
interested in creating new files, the use of compression is optional in the
SkyH5 format, because the HDF5 file is self-documenting in this regard.

Many of the datasets in SkyH5 files have units associated with them (represented as astropy Quantity objects on the SkyModel object).
The units are stored as attributes on the datasets with the name `unit'.
Datasets that derive from other astropy objects (e.g. astropy.time.Time,
astropy.coordinate.EarthLocation, astropy.coordinate.Latitude, astropy.coordinate.Longitude)
also have an `object\_type' attribute indicating the object type.
Many of the datasets in SkyH5 files have units associated with them (represented as \texttt{astropy Quantity} objects on the SkyModel object).
The units are stored as attributes on the datasets with the name ``unit''.
Datasets that derive from other \texttt{astropy} objects (e.g. \texttt{astropy Time,
astropy EarthLocation, astropy Latitude, astropy Longitude})
also have an ``object\_type'' attribute indicating the object type.
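
The attribute convention described above can be sketched with \texttt{h5py} as follows. This is a minimal illustration rather than the \texttt{pyradiosky} implementation: the metadata group name, the second dataset name and the stored type string are assumptions made for the example, and the file is kept in memory so nothing is written to disk.

```python
import h5py
import numpy as np

METADATA_GROUP = "Header"  # assumption: name of the metadata group in the file

# driver="core" with backing_store=False keeps the file in memory only.
with h5py.File("units_demo.skyh5", "w", driver="core", backing_store=False) as h5f:
    header = h5f.create_group(METADATA_GROUP)

    # A dataset backed by an astropy Quantity: the values plus a "unit" attribute.
    freq = header.create_dataset("freq_array", data=np.array([1.0e8, 1.5e8]))
    freq.attrs["unit"] = "Hz"

    # A dataset derived from another astropy object also records its type.
    # The dataset name and the type string here are illustrative assumptions.
    lon = header.create_dataset("example_lon", data=np.array([0.1, 0.2]))
    lon.attrs["unit"] = "rad"
    lon.attrs["object_type"] = "Longitude"

    # Reading: the attributes travel with the raw numbers.
    freq_values = header["freq_array"][()]
    freq_unit = header["freq_array"].attrs["unit"]
    lon_type = header["example_lon"].attrs["object_type"]

print(freq_values, freq_unit, lon_type)
```

A reader can use the two attributes to decide how to rebuild the corresponding \texttt{astropy} object from the stored array.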

Below, we discuss required and optional datasets in the
various groups. We note in parentheses the corresponding attribute of a SkyModel
@@ -74,13 +75,13 @@ \section{Overview}

\section{Header}
\label{sec:header}
The \verb+Header+ group of the file contains the metadata necessary to interpret
The \texttt{pyradiosky} group of the file contains the metadata necessary to interpret
the data. We begin with the required parameters, then continue to optional
ones. Unless otherwise noted, all datasets are scalars (i.e., not arrays). The
precision of the data type is also not specified as part of the format, because
in general the user is free to set it according to the desired use case (and
HDF5 records the precision and endianness when generating datasets). When using
the standard \verb+h5py+-based implementation in pyradiosky, this typically
the standard \texttt{h5py}-based implementation in \texttt{pyradiosky}, this typically
results in 32-bit integers and double precision floating point numbers. Each
entry in the list contains \textbf{(1)} the exact name of the dataset in the
HDF5 file, in boldface, \textbf{(2)} the expected datatype of the dataset, in
@@ -95,97 +96,97 @@ \subsection{Required Parameters}
\label{sec:req_params}
\begin{itemize}

\item \textbf{component\_type}: \textit{string} The type of components in the SkyModel. The options are: `healpix' and `point'.
If component_type is `healpix', the components are the pixels in a HEALPix map in units compatible with K or Jy/sr.
If the component_type is `point', the components are point-like sources, or point like components of extended sources,
\item \textbf{component\_type}: \textit{string} The type of components in the SkyModel. The options are: ``healpix'' and ``point''.
If component\_type is ``healpix'', the components are the pixels in a HEALPix map in units compatible with K or Jy/sr.
If the component\_type is ``point'', the components are point-like sources, or point-like components of extended sources,
in units compatible with Jy or K sr. Some additional parameters are required depending on the component type. (\textit{component\_type})

\item \textbf{Ncomponents}: \textit{int} The number of components in the SkyModel. This can be the number of individual
compact sources, or it can include components of extended sources, or the number of pixels in a map. (\textit{Ncomponents})

\item \textbf{spectral\_type}: \textit{string} This describes the type of spectral model for the components. The options are:
`spectral\_index', `subband', `flat', or `full'. If the spectral model uses a spectral index, the `reference\_frequency' and
`spectral\_index` parameters are required. The convention for the spectral index is $I=I_0 \frac{f}{f_0}^{\alpha}$, where
$I_0} is the `stokes` parameter at the `reference\_frequency' parameter $f_0$ and $\alpha$ is the `spectral\_index` parameter.
``spectral\_index'', ``subband'', ``flat'', or ``full''. If the spectral model uses a spectral index, the reference\_frequency and
spectral\_index parameters are required. The convention for the spectral index is $I=I_0 \left(\frac{f}{f_0}\right)^{\alpha}$, where
$I_0$ is the stokes parameter at the reference\_frequency parameter $f_0$ and $\alpha$ is the spectral\_index parameter.
Note that the spectral index is assumed to apply in the units of the stokes parameter (i.e. there is no additive factor of 2 applied
to convert between temperature and flux density units).
The subband spectral model is used for catalogs with multiple flux measurements at different frequencies (e.g. GLEAM
\url{https://www.mwatelescope.org/science/galactic-science/gleam/}). For subband spectral models, the `freq_array`
and `freq_edge_array` parameters are required to give the nominal (usually the central) frequency and the top and bottom of
\url{https://www.mwatelescope.org/science/galactic-science/gleam/}). For subband spectral models, the freq\_array
and freq\_edge\_array parameters are required to give the nominal (usually the central) frequency and the top and bottom of
each subband respectively.
The flat spectral model assumes no spectral flux dependence, which can be useful for testing.
The full spectral model is used for catalogs with flux values at multiple frequencies that are not expected to have flux correlations as a function of frequency, so they cannot be interpolated to frequencies not included in the catalog. This is a good representation for e.g. Epoch of Reionization signal cubes.
For full spectral models, the `freq_array` parameter is required to give the frequencies.
For full spectral models, the freq\_array parameter is required to give the frequencies.
(\textit{spectral\_type})

\item \textbf{Nfreqs}: \textit{int}
Number of frequencies if spectral_type is `full' or `subband', 1 otherwise. (\textit{Nfreqs})
Number of frequencies if spectral\_type is ``full'' or ``subband'', 1 otherwise. (\textit{Nfreqs})
\item \textbf{history}: \textit{string} The history of the catalog. (\textit{history})
\end{itemize}
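
As a worked illustration of the spectral index convention given in the spectral\_type entry, the sketch below evaluates $I = I_0 (f/f_0)^{\alpha}$ for a single component. The function name and the numerical values are hypothetical; the only thing taken from the memo is the formula itself, applied in whatever units the stokes parameter is stored in.

```python
def flux_at_frequency(stokes_ref, reference_frequency_hz, spectral_index, freq_hz):
    """Scale a reference Stokes value to another frequency.

    Implements I = I_0 * (f / f_0) ** alpha, with I_0 the value at the
    reference frequency f_0 and alpha the spectral index.
    """
    return stokes_ref * (freq_hz / reference_frequency_hz) ** spectral_index


# Hypothetical component: 2 Jy at 150 MHz with a spectral index of -1.
flux_300mhz = flux_at_frequency(
    stokes_ref=2.0,
    reference_frequency_hz=150e6,
    spectral_index=-1.0,
    freq_hz=300e6,
)
print(flux_300mhz)
```

Doubling the frequency with $\alpha = -1$ halves the value, so the hypothetical 2 Jy component evaluates to 1 Jy at 300 MHz.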


\subsection{Optional Parameters}
\label{sec:opt_params}
\begin{itemize}
\item \textbf{name}: \textit{string} The name for each component. This is a one-dimensional array of size (Ncomponents).
Note this is \textbf{required} if the component\_type is `point'. (\textit{name})
Note this is \textbf{required} if the component\_type is ``point''. (\textit{name})

\item \textbf{skycoord}:
A nested dataset that contains the information to create an \verb+astropy.coordinates.SkyCoord+object representing the component positions.
Note this is \textbf{required} if the component\_type is `point'. The keys must include:
A nested dataset that contains the information to create an \texttt{astropy SkyCoord} object representing the component positions.
Note this is \textbf{required} if the component\_type is ``point''. The keys must include:
\begin{itemize}
\item \textbf{frame}: \textit{string} The name of the coordinate frame (e.g. `icrs', `fk5', `galactic'). Must be a frame supported by \verb+astropy+.
\item \textbf{representation\_type}: \textit{string} The representation type, one of `spherical', `cartesian' or `cylindrical'
\item \textbf{frame}: \textit{string} The name of the coordinate frame (e.g. ``icrs'', ``fk5'', ``galactic''). Must be a frame supported by \texttt{astropy}.
\item \textbf{representation\_type}: \textit{string} The representation type, one of ``spherical'', ``cartesian'' or ``cylindrical''
\item Representation component names (e.g. \textbf{ra}, \textbf{dec}, \textbf{alt}, \textbf{az}): \textit{float} Two or three such components must be present; which ones are required depends on the frame. These are one-dimensional arrays of size (Ncomponents).
\end{itemize}
And may include any other attributes accepted as input parameters for an \verb+astropy.coordinates.SkyCoord+object (e.g. `obstime', `equinox', `location').
Each of these datasets may have `unit' and `object\_type' attributes and may be either a scalar or a one-dimensional array of size (Ncomponents) as appropriate.
The keys may also include any other attributes accepted as input parameters for an \texttt{astropy SkyCoord} object (e.g. obstime, equinox, location).
Each of these datasets may have ``unit'' and ``object\_type'' attributes and may be either a scalar or a one-dimensional array of size (Ncomponents) as appropriate.
(\textit{skycoord})

\item \textbf{nside}: \textit{int}
The HEALPix nside parameter. Note this is \textbf{required} if the component\_type is `healpix' and should not be defined otherwise. (\textit{nside})
The HEALPix nside parameter. Note this is \textbf{required} if the component\_type is ``healpix'' and should not be defined otherwise. (\textit{nside})

\item \textbf{hpx\_order}: \textit{string}
The HEALPix pixel ordering convention, either `ring' or `nested'.
Note this is \textbf{required} if the component\_type is `healpix' and should not be defined otherwise. (\textit{hpx\_order})
The HEALPix pixel ordering convention, either ``ring'' or ``nested''.
Note this is \textbf{required} if the component\_type is ``healpix'' and should not be defined otherwise. (\textit{hpx\_order})

\item \textbf{hpx\_frame}: A nested dataset that contains the information to describe an astropy coordinate frame giving the HEALPix coordinate frame.
\item \textbf{hpx\_frame}: A nested dataset that contains the information to describe an \texttt{astropy} coordinate frame giving the HEALPix coordinate frame.
This is similar to the skycoord dataset described above but it does not contain the representation\_type or the representation component names.
Note this is \textbf{required} if the component\_type is `healpix' and should not be defined otherwise.
Note this is \textbf{required} if the component\_type is ``healpix'' and should not be defined otherwise.
The keys must include:
\begin{itemize}
\item \textbf{frame}: \textit{string} The name of the coordinate frame (e.g. `icrs', `fk5', `galactic'). Must be a frame supported by \verb+astropy+.
\item \textbf{frame}: \textit{string} The name of the coordinate frame (e.g. ``icrs'', ``fk5'', ``galactic''). Must be a frame supported by \texttt{astropy}.
\end{itemize}
And may include any other scalar attributes accepted as input parameters for an \verb+astropy.coordinates.SkyCoord+object (e.g. `obstime', `equinox', `location').
Each of these datasets may have `unit' and `object\_type' attributes as appropriate.
The keys may also include any other scalar attributes accepted as input parameters for an \texttt{astropy SkyCoord} object (e.g. obstime, equinox, location).
Each of these datasets may have ``unit'' and ``object\_type'' attributes as appropriate.
(\textit{hpx\_frame})

\item \textbf{hpx\_inds}: \textit{int}
The HEALPix indices for the included components. Does not need to include all the HEALPix pixels in the map.
This is a one-dimensional array of size (Ncomponents).
Note this is \textbf{required} if the component\_type is `healpix' and should not be defined otherwise. (\textit{hpx\_inds})
Note this is \textbf{required} if the component\_type is ``healpix'' and should not be defined otherwise. (\textit{hpx\_inds})

\item \textbf{freq\_array}: \textit{float}
Frequency array giving the nominal (or central) frequency in a unit that can be converted to Hz.
Note this is \textbf{required} if the spectral\_type is `full' or `subband' and should not be defined otherwise. (\textit{freq\_array})
Note this is \textbf{required} if the spectral\_type is ``full'' or ``subband'' and should not be defined otherwise. (\textit{freq\_array})

\item \textbf{freq\_edge\_array}: \textit{float}
Array giving the frequency band edges in a unit that can be converted to Hz, only required if spectral\_type is `subband'.
Array giving the frequency band edges in a unit that can be converted to Hz, only required if spectral\_type is ``subband''.
This is a two dimensional array with shape (2, Nfreqs). The zeroth index in the first dimension holds the lower band edge
and the first index holds the upper band edge.
Note this is \textbf{required} if the spectral\_type is `subband' and should not be defined otherwise. (\textit{freq\_edge\_array})
Note this is \textbf{required} if the spectral\_type is ``subband'' and should not be defined otherwise. (\textit{freq\_edge\_array})

\item \textbf{reference\_frequency}: \textit{float}
Reference frequency giving the frequency at which the flux in the stokes parameter was measured in a unit that can be converted to Hz.
This is a one-dimensional array of size (Ncomponents).
Note this is \textbf{required} if the spectral\_type is `spectral\_index' and should not be defined if the spectral\_type is `full' or `subband'.
Note this is \textbf{required} if the spectral\_type is ``spectral\_index'' and should not be defined if the spectral\_type is ``full'' or ``subband''.
(\textit{reference\_frequency})

\item \textbf{spectral\_index}: \textit{float}
The spectral index describing the flux evolution with frequency, see details in the `spectral\_type' description above.
The spectral index describing the flux evolution with frequency; see details in the spectral\_type description above.
This is a one-dimensional array of size (Ncomponents).
Note this is \textbf{required} if the spectral\_type is `spectral\_index' and should not be defined otherwise.
Note this is \textbf{required} if the spectral\_type is ``spectral\_index'' and should not be defined otherwise.
(\textit{spectral\_index})

\item \textbf{extended\_model\_group}: \textit{string}
@@ -194,5 +195,27 @@ \subsection{Optional Parameters}
(\textit{extended\_model\_group})
\end{itemize}
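
The layout of the freq\_edge\_array parameter, with shape (2, Nfreqs), lower edges in row 0 and upper edges in row 1, can be sketched in plain Python. The helper name is hypothetical, and the sketch assumes equal-width subbands centred on the nominal frequencies, which the format itself does not require.

```python
def subband_edges(freq_array_hz, channel_width_hz):
    """Build a (2, Nfreqs) nested list of band edges.

    Row 0 holds the lower edge of each subband and row 1 the upper edge.
    Assumes every subband is centred on its nominal frequency.
    """
    half_width = channel_width_hz / 2
    lower = [freq - half_width for freq in freq_array_hz]
    upper = [freq + half_width for freq in freq_array_hz]
    return [lower, upper]


# Two hypothetical 10 MHz subbands centred at 100 MHz and 110 MHz.
edges = subband_edges([100e6, 110e6], 10e6)
print(edges)
```

Indexing \texttt{edges[0][i]} and \texttt{edges[1][i]} then gives the bottom and top of subband \texttt{i}.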

\subsection{Extra Columns}
\label{sec:extra-columns}
SkyModel objects support ``extra columns'', which are additional arbitrary per-component arrays of metadata that are useful to carry
around with the data but which are not formally supported as reserved keywords in \texttt{pyradiosky}. In a SkyH5 file,
extra columns are handled by creating a datagroup called \texttt{extra\_columns} inside the \texttt{pyradiosky} datagroup.
When possible, these quantities should be HDF5 datatypes, to support interoperability between SkyH5 readers.
Inside the extra\_columns datagroup, each extra column is saved as a key-value pair using a dataset, where the name of the
extra column is the name of the dataset and its corresponding array is saved in the dataset. The ``unit'' and ``object\_type'' HDF5 attributes
are used in the same way as for the other header items (see Section~\ref{sec:overview}), but using other attribute names is not recommended
because \texttt{pyradiosky} does not ensure that such attributes are preserved when reading and writing SkyH5 files.
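
A reader that wants to collect the extra columns can iterate over the datasets in the \texttt{extra\_columns} datagroup. The sketch below uses \texttt{h5py} with an in-memory file; the metadata group name, the column name and its values are assumptions chosen for illustration, and this is not the \texttt{pyradiosky} reader itself.

```python
import h5py
import numpy as np

METADATA_GROUP = "Header"  # assumption: name of the metadata group in the file

with h5py.File("extra_demo.skyh5", "w", driver="core", backing_store=False) as h5f:
    header = h5f.create_group(METADATA_GROUP)
    extra = header.create_group("extra_columns")

    # One per-component column; the name "rms_noise" is hypothetical.
    column = extra.create_dataset("rms_noise", data=np.array([0.01, 0.02, 0.015]))
    column.attrs["unit"] = "Jy"

    # Each dataset name is the column name; its array is the column data.
    extra_columns = {}
    for name, dset in h5f[METADATA_GROUP]["extra_columns"].items():
        extra_columns[name] = (dset[()], dset.attrs.get("unit"))

print(extra_columns)
```

Using \texttt{attrs.get} means a column without a ``unit'' attribute is returned with \texttt{None} instead of raising an error.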

\section{Data}
\label{sec:data}
In addition to the \texttt{pyradiosky} datagroup in the root namespace, there must be one called \texttt{Data}.
This datagroup saves the Stokes parameters representing the flux densities or temperatures of the sources and
some optional arrays that are the same size. All of these arrays are expected to have the same shape, (4, Nfreqs, Ncomponents),
where the first dimension indexes the polarization, ordered (I, Q, U, V).
The \textbf{stokes} dataset must be present in this datagroup and it must have a ``unit'' attribute that is equivalent to Jy or K sr if the
component\_type is ``point'' or equivalent to Jy/sr or K if the component\_type is ``healpix''.
In addition, this datagroup may contain a \textbf{stokes\_error} dataset that gives the standard error on the stokes values and
should have the same ``unit'' attribute as the stokes dataset. It may also contain a \textbf{beam\_amp} dataset that gives the beam amplitude at the
source position for the instrument that made the measurement.
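
The \texttt{Data} group layout can be sketched with \texttt{h5py} as follows. The array sizes and values are hypothetical, gzip is only one of the compression filters HDF5 offers, and the file is kept in memory; the sketch is meant to show the expected shape, the ``unit'' attribute and the fact that compression is transparent on read.

```python
import h5py
import numpy as np

n_freqs, n_components = 2, 3

# Shape is (4, Nfreqs, Ncomponents); the first axis is ordered (I, Q, U, V).
stokes = np.zeros((4, n_freqs, n_components))
stokes[0] = 1.0  # hypothetical unpolarized sources: only Stokes I is non-zero

with h5py.File("data_demo.skyh5", "w", driver="core", backing_store=False) as h5f:
    data = h5f.create_group("Data")

    # Compression is optional; HDF5 records the filter in the file itself.
    dset = data.create_dataset("stokes", data=stokes, compression="gzip")
    dset.attrs["unit"] = "Jy"  # point components: a unit equivalent to Jy or K sr

    # Reading needs no special handling for the compressed dataset.
    read_back = h5f["Data"]["stokes"][()]
    stokes_unit = h5f["Data"]["stokes"].attrs["unit"]
    compression = h5f["Data"]["stokes"].compression

print(read_back.shape, stokes_unit, compression)
```

Optional arrays such as \textbf{stokes\_error} would be written the same way, with the same shape as the stokes dataset.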

\end{document}
