Fix: 1381 change to the axes attribute meaning #1396

woutdenolf · 2024-06-30T14:31:43Z

HTML RENDERING OF NXDATA FIX <-> current nxdata for comparison

When reviewing this PR, please keep in mind that the purpose is to rectify NXdata, not improve it. Suggestions for improvements can go in #1381.

For reference: multi-dimensional axes were introduced here https://www.nexusformat.org/2014_axes_and_uncertainties.html

Context

The NXdata definition got a makeover in PR #1213 to make it more understandable. In this effort, I assumed the @axes attribute was supposed to say what all the axes are of the NXdata group. It didn't occur to me this attribute is not about the axes, it is about the signal. It defines what the default axis is for each signal dimension. The unintended change do not make existing files invalid but it does introduce more flexibility which changes things for existing readers as pointed out in issue #1381 by @jacobfilik .

Purpose of this PR

The sole purpose of this PR is to rectify PR #1213 and NOT introduce anything new. The alternative PR #1392 by @PeterC-DLS fixes the situation by carefully modifying some sentences here and there. I would argue however that the entire "axes" section in the introduction has been structured with the less restrictive @axes in mind. In this PR I refactor the entire "axes" section to better reflect restrictive @axes.

Details on @axes rectification

This is the current less restrictive @axes attribute definition which I'm rectifying

The `@axes` attribute provides the names of all the `AXISNAME` fields in the NXdata group.
The corresponding `@AXISNAME_indices` attributes provide the signal dimensions they span
and when missing the position(s) of `AXISNAME` in `@axes` are taken as the signal dimensions
spanned by `AXISNAME`. The shape of an `AXISNAME` field is required to be equal to the shape
of the spanned signal dimensions.

This definition is much simpler and more concise than the definition in the "axes" section in this PR. However it removes the restriction that length(axes) == rank(signal) so we need to put it back in with @axes being the default axes, not all axes.

base_classes/NXdata.nxdl.xml

PeterC-DLS · 2024-07-17T12:00:04Z

Also add to @axes the following from my #1392

    <dimensions rank="1">
        <dim index="1" value="dataRank"/>
    </dimensions>

base_classes/NXdata.nxdl.xml

woutdenolf · 2024-07-17T16:03:33Z

NIAC Jul 2024 comments

AXISNAME_indices is not allowed to contradict the position of "AXISNAME" in @axes @rayosborn
"Axes without defaults" -> DATA dimensions with a default axis
"However it is strongly encouraged to provide them." -> When AXISNAME_indices is the same a the indices of "AXISNAME" in @axes there is no need to provide them. @benajamin
Axis dimensions were always allowed to be +1 to the data dimensions (@rayosborn will provide a reference for this)
Put the canonical example right at the top saying something like "NXdata allows representing data with corresponding coordinates like this two-dimensional example ... More complex cases where each data point has its own unique coordinate or where a dimension can have alternative axes is supported as well" @phyy-nx

woutdenolf · 2024-07-17T16:09:55Z

@rayosborn As for you comment about stating that "AXISNAME_indices are not allowed to contradict the position of AXISNAME in @axes". So this is the canonical example

data:NXdata
  @signal = "data"
  @axes = ["x", "y"]
  @x_indices=0
  @y_indices=1
  data: float[10,20]
  x: float[10]
  y: float[20]

Then this should be invalid

data:NXdata
  @signal = "data"
  @axes = ["y", "x"]  # position does not correspond indices
  @x_indices=0
  @y_indices=1
  data: float[10,20]
  x: float[10]
  y: float[20]

But then is this invalid as well?

data:NXdata
  @signal = "data"
  @axes = ["x", "y"]
  @x_indices = [0, 1]
  @y_indices = [0, 1]
  data: float[10,20]
  x: float[10,20]
  y: float[10,20]

From the position of the string "x" in @axes we get [0] and x_indices is [0,1]. It doesn't contradict in the sense that {0} is a subset of {0,1} but it contradicts in the sense that [0] != [0,1].

Edit: maybe it should say that "if @AXISNAME_indices is provided, then the position index of "AXISNAME" in @axes must be one of the indices." ???

rayosborn · 2024-07-17T18:47:29Z

@woutdenolf requested some evidence that NIAC had previously allowed the dimension size of an axis array to be one greater than the associated data dimension. It turns out that this was approved at the very first NIAC meeting in Pasadena in September, 2003. According to the minutes

"Histograms can be specified either by having an extra element in the axis array or with an attribute histogram_offset in the axis."

If you look at the current PDF version of the manual, you will find examples of axes implementing this rule on page 16/17. An example NeXus file has the following two entries:

NX Data  : data[148,750] (NX_INT32)
NX Data  : time_of_flight[751] (NX_FLOAT32)
NX Data  : polar_angle[148] (NX_FLOAT32)

Finally, it is also implied in the current NXdetector definition. Here are two of the field definitions:

time_of_flight: (optional) [NX_FLOAT](https://manual.nexusformat.org/nxdl-types.html#nx-float) (Rank: 1, Dimensions: [tof+1]) {units=[NX_TIME_OF_FLIGHT](https://manual.nexusformat.org/nxdl-types.html#nx-time-of-flight)}
raw_time_of_flight: (optional) [NX_INT](https://manual.nexusformat.org/nxdl-types.html#nx-int) (Rank: 1, Dimensions: [tof+1]) {units=[NX_PULSES](https://manual.nexusformat.org/nxdl-types.html#nx-pulses)}

The tof symbol refers to the number of time-of-flight bins, so the field contains tof+1 bin boundaries. The time_of_flight field even includes a (now deprecated) axis attribute, implying that this could be linked to a NXdata group. Time-of-flight bin boundaries are stored in the raw data files of virtually every spallation neutron source instrument as well as time-of-flight spectrometers at reactor sources.

woutdenolf · 2024-08-15T09:57:15Z

Refactor again to take NIAC comments into account:

start with simple example at the very top (@phyy-nx)
histogram edges (axis with 1 extra value, @rayosborn )
when AXISNAME_indices is the same a the indices of "AXISNAME" in axes there is no need to provide them (@benajamin )
indices of "AXISNAME" in axes must be a subset of AXISNAME_indices (comment by @rayosborn to prohibit contradiction between AXISNAME_indices and axes)
add <dimensions rank="1"> to @axes (@PeterC-DLS )

The structure of the NXdata doc section at the top is now

Usage: canonical example + list supported cases
Signals:
- Defined by ...
- Definition
- Example
Axes:
- Defined by ...
- Definition
- Example
Uncertainties
- Defined by ...
- Definition
- Example

One example per section is enough. The example in the axes section covers all axes use cases supported by NXdata (histogram edges +1, alternative axes, multi-dimensional axes, "." in axes, missing indices). I made this example very concrete by referring to a specific scientific case and included an image.

Edit: I keep discovering new things about the axes. I added this statement which did not seem to be in the current definition but I think is implied?

The number of values in AXISNAME_indices must be equal to the rank of the corresponding AXISNAME field.

woutdenolf · 2025-01-03T09:13:18Z

In the main documentation, we have used Sphinx tabs to provide examples using both nexusformat and h5py. Do you know if that is possible in the context of what you are proposing?

Sphinx-Gallery is meant to generate a gallery of plots from python code (see examples). So very different from code examples with different implementations, which is what sphinx-tabs or sphinxcontrib-osexample does well.

woutdenolf · 2025-01-04T12:27:21Z

Concerning histograms (axis has one value more than the signal): we do not define whether the signal contains the bin height or the bin area. I'm assuming it is the area. Should we add this to "The fields and attributes have the following constraints ..."?

Edit: often histograms contain photon counts integrated over the bin width. For example in powder diffraction the bin value unit is photons, not photons per degree (assuming the X axis is in degrees). So it is the integrated bin area.

Edit: however when plotting a histogram, the value is usually the height. I'm confused now.

rayosborn · 2025-01-04T17:14:48Z

@woutdenolf, you have opened up another can of worms here. What is stored could well depend on the conventions of the community writing the data, and we have probably been lax in NeXus in providing a mechanism to define which convention is being used.

In time-of-flight neutron scattering (at least when I was writing code), the convention was to divide the counts by the bin width when storing the data, so that we were effectively plotting in units of 'neutrons/microsecond', i.e., bin heights. When we wanted to convert to other units, e.g., to energy, we simply computed the energy values at all the bin boundaries and then multiplied by (dt/dE), a numerical Jacobian, to switch to units of 'neutrons/meV'. Because I come from that background, NeXpy just assumes the values are bin heights when plotting.

However, another community might well choose to store the original counts, i.e., bin integrals, and there is no way of telling with the current NeXus standard. We probably need to define an attribute that says which convention is being used, with one of them being the default. Of course, when the bin widths are all constant, as they nearly always are, it doesn't really matter, but we nevertheless shouldn't assume that.

I will contact some friendly Mantid developers to make sure that 'neutrons/microsecond' is still the convention at neutron sources, and perhaps you could check with your powder diffraction colleagues to see if anyone is storing bin integrals.

ggoneiESS · 2025-01-09T15:03:42Z

@PeterC-DLS has mentioned this PR in an issue I raised. I think this is all good, pictorial examples will be very useful, but there are some things that need to be explicit (some are raised in the issue and some raised by the current solution):

how non-contiguous data is recorded (it is not correct to say a bin has value 0 if it is not present in the data)
whether the bin value should be taken from the centre, leading, or trailing edge
how overflow is recorded

Raised by above discussion in this PR:

how AXIS_errors* is defined (I can't see any alternative except ignoring it if AXIS has length len(DATA)+1 which would define NXdata as containing a histogram)
in case of constant width bins, could axis just be defined by a start and end value, with bins given by len(DATA[INDEX])?
how different should scatter-like and histogram-like data look in the resulting NXdata group?

I think there's inspiration to be taken from
https://www.mathworks.com/help/matlab/ref/matlab.graphics.chart.primitive.histogram-properties.html which has properties for Matlab histograms (so we can see if an NXdata group would contain enough to generate a plot in Matlab)
https://www.silx.org/doc/silx/latest/modules/io/nxdata.html (some definitions are better here already than in the nexusformat definitions)

There's also https://root.cern.ch/doc/master/classTH1.html (a very complete histogram class for ROOT in C++) and it might be interesting that they have separate plot types for histograms and graphs (point-like data), which might help (or provide motivation for) divergence of NXdata trying to add too much information and essentially start plotting data (which, as far as I am aware, is explicitly not the aim of the class).

Anecdotally, it's been very difficult to get started with NXdata, so I'm encouraged by this discussion :) whilst we are talking about AXISNAME, it would be very useful to add some clarity to
the attributes:

@distribution: (optional) [NX_BOOLEAN](https://manual.nexusformat.org/nxdl-types.html#nx-boolean)

``0|false``: single value, ...
0|false: single value, 1|true: multiple values

@first_good: (optional) [NX_INT](https://manual.nexusformat.org/nxdl-types.html#nx-int)

Index of first good value

@last_good: (optional) [NX_INT](https://manual.nexusformat.org/nxdl-types.html#nx-int)

Index of last good value

since it seems that .*st_good is irrelevant with both the current definitions and the proposed, and that distribution is also overruled by the rest of the definitions surrounding the axis.

rayosborn · 2025-01-09T16:37:53Z

@ggoneiESS, thanks for raising the issue. I think this should be the subject of a separate PR. @woutdenolf has made significant progress in improving the NXdata documentation but has, in the process, uncovered some limitations in the current standard including, coincidentally, the one you raised in #1527. I think the discussion will get too confusing if we mix documentation fixes with substantive changes to the standard, although they are obviously linked. In my view, I think it would be best to reopen #1527 to complete the discussion with a view to preparing a separate PR. It shouldn't take too much work to update the documentation if we resolve your issue.

@woutdenolf, for the purposes of this PR, do you think it's sufficient just to state that histograms are plotted using the bin heights for now? Perhaps #1527 can be mentioned in the documentation as an ongoing discussion, but I would prefer that this PR is not further delayed. The vast majority of histogram plotting is currently in the context of time-of-flight neutron scattering, so I think this represents the current practice, even if it could be modified in the future.

ggoneiESS · 2025-01-20T10:51:10Z

I reopened my issue, but I'll hold off submitting the PR until this issue is implemented.

I still wonder whether the extra information about the bin plotting (leading/trailing edge or centre) and the errors info should form part of this, or whether it should all be pushed into my new PR (which, having compared the two, will also change the standard slightly rather than just the documentation).

galleries/nxdata/plot_fscan2d.py

galleries/nxdata/plot_image.py

galleries/nxdata/plot_fscan2d.py

typo Co-authored-by: Peter Chang <[email protected]>

woutdenolf · 2025-02-26T15:23:42Z

Summary of what this MR provides:

"@axes" does not provide a list of all the axes, it provides the default axis for each signal dimension. This is fixed.
The axes can have one more value than the corresponding signal dimensions they span (histograms). Was always supported but got lost. This is fixed.
Restructure the description with an example gallery.

HTML RENDERING of the NXdata after this MR. Especially expand the "description".

phyy-nx · 2025-02-26T16:51:06Z

Hi all, as discussed in the Telco, we can now move this PR to an online vote. NIAC committee members please vote on this PR using emojis on this comment. 👍 for yes, 👎 for no, anything else (for example 👀) to abstain. We need 14 votes to hit quorum so please review and vote!

woutdenolf · 2025-02-26T19:55:12Z

A technical detail: we see orphan: in the HTML rendering. @tacaswell Do you have a workaround since sphinx-gallery/sphinx-gallery#854 is not addressed?

woutdenolf · 2025-02-27T06:15:32Z

Voters: consider the two open discussions

#1396 (comment) (can be handled separately in #1544)

#1396 (comment) (potential extra condition to allow axis_indices to be omitted)

NXdata: the gift that keeps on giving ;-).

woutdenolf linked an issue Jun 30, 2024 that may be closed by this pull request

Change to the axes attribute meaning #1381

Open

woutdenolf requested a review from a team June 30, 2024 14:38

This was referenced Jun 30, 2024

Retighten specification of NXdata's axes attribute for #1381 #1392

Closed

Change to the axes attribute meaning #1381

Open

PeterC-DLS reviewed Jul 1, 2024

View reviewed changes