Create specs_discussion.md
jochenstahn authored Jan 30, 2024
1 parent b83a72e commit d617fe0
Showing 1 changed file with 259 additions and 0 deletions.
259 changes: 259 additions & 0 deletions advanced_and_expert_level/file_format/specs_discussion.md
---
layout: page
title: "ORSO - file formats - discussions on specifications"
author: "Jochen Stahn"
---


## Feedback and discussions on the .ort specs

In the following you find a rather unsorted collection of feedback and ideas about the [`.ort` specs](https://www.reflectometry.org/advanced_and_expert_level/file_format/file_format_specs) and the related [`orsopy`](https://orsopy.readthedocs.io/en/latest) package.

last modified: 2023-05-05

### anonymous 2022 workshop participant

#### naming confusion *angle_of_incidence* vs. *alpha_i* etc

For the optional columns of wavelength and angle of incidence, are lambda and alpha_i the expected/standardized names?
Or should they be wavelength and incident_angle, as written in the header?
Or is this discrepancy in the example required because column names must be different from values in the header?

> (Jochen) See also below. The discrepancy arises because we use the formally correct terms in the header,
> but the *conventional* symbols in the column description.
>
> The keywords in the header are taken from the *physical quantity* name, e.g. *incident_angle*,
> while the (optional) column description has two possible (and recommended) entries: `name` and `physical_quantity`.
> The `name` is used to create the one-line header right above the data array, so a well-established *symbol* is the right choice there.
> The `physical_quantity` entry is used to avoid all ambiguities.

If there is no current standard name (understanding that this is optional information), this should be made clear, with a statement/explanation of whether or not a standard name is expected in the future.

> (Jochen) I agree that we should define a set of key words here and recommend their use. Suggestions can be found below.

A third important quantity for which there could be a standard name in the future is photon energy (for synchrotron x-ray experiments).

> (Jochen) I agree.

#### wrong declaration in specs

The documentation states that `Value` can be a list, but in orsopy it cannot (`ComplexValue` can be a list, though).
This discrepancy should be corrected either by modifying the documentation or the implementation,
otherwise people could attempt to write files with data which cannot be handled by orsopy.

> (Jochen) Wrong in specs. I'll correct this.

#### redundant information and prioritisation

Specs are not clear about what should happen if there is a header entry and a column with the same name:

1. is this invalid (see first point),
2. do they have to be consistent, or
3. does the column overwrite any header information?

E.g., if a column is supplied and the same quantity also appears in the header,

- must the header entry be a range with matching min/max?
- Or a value with matching average?
- Must it be left out of the header entirely?
- Should it be overwritten if a column is found,
- or should there be a pointer to a column, for example an optional keyword `column`, where the value or range could then be used by software which cannot support point-by-point calculation (the column data)?

My vote is for 3, implemented in this last way, so that the header can still contain some human-readable information useful for experimental reproducibility even if the contents are overwritten by a column.

> (Jochen) Here we have to differentiate between the data format rules and recommendations for software using this format.
>
> The format allows for redundant and even for contradicting information. It is the responsibility of the
> programmer to write out a physically consistent data file.
>
> On the other hand we should give some recommendations like the ones mentioned above.
>
> Personally I also prefer option 3, but without any further restrictions for the header.
>
> We discussed the pointer from header to column entries at an early stage and it was dropped at some point.
> With a prioritisation *column over header* this is clear to the software.
>
> ``` YAML
> # measurement:
> #     instrument_settings:
> #         incident_angle:
> #             min: <value>
> #             max: <value>
> #             details_at_column: alpha_i
> #             unit: deg
> ```
>
> Alternatively to `details_at_column`, one can use the already existing `comment` to create a human-readable link.

**item 1**: What is the ORSO recommendation for using redundant information?
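
The *column over header* rule and a simple consistency test can be sketched in a few lines of Python. This is only an illustration of the discussion above, not part of the specs or of orsopy: the function names `resolve_quantity` and `header_is_consistent` are hypothetical, the header entry is assumed to be a plain dictionary as obtained from parsing the YAML header, and the choice of what counts as "consistent" is an assumption.

```python
from typing import Optional, Sequence


def resolve_quantity(header_entry: Optional[dict],
                     column_values: Optional[Sequence[float]]):
    """Apply the 'column over header' rule (hypothetical helper).

    Returns the point-by-point column values when a column exists,
    otherwise the (possibly None) header entry unchanged.
    """
    if column_values is not None:
        return list(column_values)
    return header_entry


def header_is_consistent(header_entry: dict,
                         column_values: Sequence[float],
                         tol: float = 1e-9) -> bool:
    """Check that a header entry does not contradict the column.

    Assumed rule: a range (min/max) must enclose all column values;
    a single magnitude must lie within the span of the column values.
    """
    lo, hi = min(column_values), max(column_values)
    if "min" in header_entry and "max" in header_entry:
        return (header_entry["min"] <= lo + tol
                and header_entry["max"] >= hi - tol)
    if "magnitude" in header_entry:
        return lo - tol <= header_entry["magnitude"] <= hi + tol
    return True  # nothing to compare against


# Illustrative numbers only.
header = {"min": 0.1, "max": 6.0, "unit": "deg"}
alpha_i = [0.1, 0.5, 1.0, 6.0]

print(resolve_quantity(header, alpha_i))      # the column wins
print(resolve_quantity(header, None))         # falls back to the header range
print(header_is_consistent(header, alpha_i))
```

A reader that cannot handle point-by-point values could call `resolve_quantity(header, None)` deliberately and work with the header range only.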

#### Future feature request:

Ability to have an error defined for a quantity in the header, either implemented similar to how quantities are allowed to have a range, or similar to how columns are allowed to be an error of another column.

> The following syntax is now implemented in orsopy:
>
> ``` YAML
> # measurement:
> #     instrument_settings:
> #         incident_angle:
> #             magnitude: 2.1
> #             unit: deg
> #             error:
> #                 magnitude: 0.01
> #                 error_type: resolution
> #                 distribution: gaussian
> #                 value_is: sigma
> ```
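
How analysis software might consume such an `error` entry can be sketched as follows. The helper `gaussian_sigma` is hypothetical and works on a plain dictionary, not on orsopy classes. It assumes that `value_is` may alternatively be `FWHM` and that a missing `distribution` means `gaussian`; both are assumptions of this sketch. The conversion itself is the standard Gaussian relation FWHM = 2·sqrt(2·ln 2)·sigma.

```python
import math

# For a Gaussian: FWHM = 2 * sqrt(2 * ln 2) * sigma (about 2.3548 * sigma).
GAUSS_FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))


def gaussian_sigma(error: dict) -> float:
    """Return the standard deviation described by an `error` entry.

    Hypothetical helper; only gaussian distributions are handled.
    """
    if error.get("distribution", "gaussian") != "gaussian":
        raise ValueError("this sketch only handles gaussian distributions")
    value_is = error.get("value_is", "sigma")
    if value_is == "sigma":
        return error["magnitude"]
    if value_is == "FWHM":  # assumed alternative to 'sigma'
        return error["magnitude"] / GAUSS_FWHM_PER_SIGMA
    raise ValueError(f"unknown value_is: {value_is!r}")


# The entry from the header example above, as a dictionary.
error = {
    "magnitude": 0.01,
    "error_type": "resolution",
    "distribution": "gaussian",
    "value_is": "sigma",
}
print(gaussian_sigma(error))
```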

### confusion of physical terms

(Jochen)
We are inconsistent in what we use as key words. Since the German terms are different, I had problems figuring out the correct English definitions...

#### official definitions

What can be measured or calculated is a **physical quantity**.

> E.g. the *incident angle*

This has a **dimension** = dim(*physical quantity*) relating it to a set of base quantities like *length*, *time*, *charge*, *temperature* etc. The *dimension* is not a unit, nor can it be used to unambiguously describe a *physical quantity* (*plane angle* does not distinguish between *scattering angle*, *incident angle*, *total reflection angle*, ...).

> dim( *incident angle* ) = *plane angle*

The *physical quantity* is often referred to by using a **symbol**.

> One possible symbol for *incident angle* is $\alpha_i$ (or *alpha_i* in the orso header)

The *physical quantity* is composed of a **numerical magnitude** times **unit**. Depending on the chosen *unit*, the *numerical magnitude* changes.

> $\alpha_i = 2.3 \cdot \mathrm{deg}$
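
As a small worked example of the last statement, the same *incident angle* can be written with different numerical magnitudes. The sketch below is illustrative only: the helper `convert_angle` and the unit strings `rad` and `mrad` are assumptions and are not taken from the specs.

```python
import math

# Conversion factors from each unit to radians (unit strings are assumed).
TO_RAD = {"rad": 1.0, "mrad": 1e-3, "deg": math.pi / 180.0}


def convert_angle(magnitude: float, unit: str, new_unit: str) -> float:
    """Express the same plane angle in another unit."""
    return magnitude * TO_RAD[unit] / TO_RAD[new_unit]


# The physical quantity alpha_i = 2.3 deg stays the same,
# only its numerical magnitude changes with the unit.
for unit in ("deg", "rad", "mrad"):
    print(f"alpha_i = {convert_angle(2.3, 'deg', unit):.5g} {unit}")
# alpha_i = 2.3 deg
# alpha_i = 0.040143 rad
# alpha_i = 40.143 mrad
```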

#### what we do wrong or inconsistently

- For the column *name* we use the *symbol* (R, Qz, alpha_i, ...) rather than the *physical quantity*. But in the header above we use the latter as key words. Thus if the analysis software searches, for example, for information about the *incident angle*, it has to look in various places (this is intended) for different keys. A solution might be that the software searches for standardised `physical_quantity` entries in the column description which match the keys in the header.
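
The proposed solution could look roughly like this on the software side. It is a minimal sketch, assuming the column descriptions are available as a list of dictionaries. The function `index_by_physical_quantity` is hypothetical, and the example columns are illustrative rather than a complete `.ort` column list.

```python
def index_by_physical_quantity(columns: list) -> dict:
    """Map each declared physical_quantity to its column position.

    Columns without a `physical_quantity` entry are skipped;
    if a quantity is declared twice, the first column wins.
    """
    index = {}
    for position, column in enumerate(columns):
        quantity = column.get("physical_quantity")
        if quantity is not None:
            index.setdefault(quantity, position)
    return index


# Illustrative column descriptions: the symbol goes into `name`,
# the unambiguous term into `physical_quantity`.
columns = [
    {"name": "Qz", "unit": "1/angstrom"},
    {"name": "R"},
    {"name": "alpha_i", "unit": "deg", "physical_quantity": "incident_angle"},
]

index = index_by_physical_quantity(columns)
for header_key in ("incident_angle", "wavelength"):
    if header_key in index:
        print(f"{header_key}: use column {index[header_key]}")
    else:
        print(f"{header_key}: no column, use the header entry")
```

With this approach the software searches for the header key `incident_angle` only and never needs to know that the column happens to be called `alpha_i`.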

## stitched data

- Where do we store e.g. the angles for stitched ToF measurements? These are **no longer used for processing**, but may help future planning.
- x-ray data obtained with different attenuator settings.

> In case this information is not provided in one of the optional columns or in the individual headers of multiple data sets,
> it cannot be used by the analysis software. Good choices for this information might be extra entries, e.g. in the `incident_angle` section:
>
> ``` YAML
> incident_angle:
>     min: 1.0
>     max: 5.8
>     individual_magnitudes: [1.0, 2.7, 5.8]
>     unit: deg
> ```

**item 7**: Do we introduce `individual_magnitudes` as a new key within the class `ValueRange`?
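
If such a key were introduced, a writer could guard against contradicting entries with a check like the one below. This is a sketch under assumptions: `individual_magnitudes` is only a proposal, the helper `check_individual_magnitudes` is hypothetical, and the rule that `min` and `max` must equal the extremes of the list is one possible convention, not an agreed one.

```python
def check_individual_magnitudes(entry: dict, tol: float = 1e-9) -> bool:
    """True if min/max agree with the extremes of individual_magnitudes.

    Entries without the (proposed) key pass trivially.
    """
    values = entry.get("individual_magnitudes")
    if not values:
        return True
    return (abs(min(values) - entry["min"]) <= tol
            and abs(max(values) - entry["max"]) <= tol)


incident_angle = {
    "min": 1.0,
    "max": 5.8,
    "individual_magnitudes": [1.0, 2.7, 5.8],
    "unit": "deg",
}
print(check_individual_magnitudes(incident_angle))
```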

## guidelines for writing and reading

- hierarchy for looking up information (e.g. column beats header content)
- avoid contradicting information (e.g. single incident angle in the header for an angle-dispersive measurement)

## open issues for lab x-ray reflectometers

**item 8**: Which of the keys discussed below should be included in the specs to (better) incorporate lab x-ray data files?

When attempting to convert the ASCII output files of various commercial lab x-ray reflectometers (diffractometers)
it became obvious that the present dictionary misses several entries.

- It is not exactly clear where to put the *brand*, *model* and probably *configuration* information.

``` YAML
experiment:
    title: ...
    instrument:
        type: x-ray lab source (neutron reflectometer, synchrotron diffractometer, ....)
        brand: Bruker
        model: Discovery
        hardware_indicator: 65519
```

- The wavelength is often defined via the anode material, the line(s) and probably the presence of a monochromator.
- The scan modes might be `steps` or `continuous`.
- The slit sizes are reported to enable resolution calculation.
- Often a long list of hardware settings is supplied, e.g. tube current, temperature, configuration, etc.
These things do not really belong in a *reduced data* file, but we should at least recommend a place for
these entries. In the example below I put it as a multi-line string in `instrument_settings.details`.

``` YAML
measurement:
    instrument_settings:
        incident_angle:
            min: 0.1
            max: 6.0
            unit: deg
        wavelength:
            magnitude: 1.54184
            unit: angstrom
            anode: Cu
            lines:
                - name: K_alpha1
                  magnitude: 1.5405980
                  weight: 2/3
                - name: K_alpha2
                  magnitude: 1.5444260
                  weight: 1/3
        scan_type: continuous
        details: |
            "Configuration=Reflection-Transmission Spinner 3.0, Owner=user, Creation date=3/5/2021 8:12:09 AM"
            "Goniometer=Theta/Theta; Minimum step size 2Theta:0.0001; Minimum step size Omega:0.0001"
            "Sample stage=Reflection-transmission spinner 3.0; Minimum step size Phi:0.1"
```
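
One practical point about the `lines` entries: a YAML loader reads `2/3` as the string `"2/3"`, not as a number, so reading software would have to parse the weights itself. The sketch below shows one way to do that and to form the weighted mean wavelength. The helper `mean_wavelength` is hypothetical, and the keys `lines` and `weight` are only the draft proposal from the example above.

```python
from fractions import Fraction

# The `lines` entries as a YAML loader would return them:
# the weights arrive as strings because "2/3" is not a YAML number.
lines = [
    {"name": "K_alpha1", "magnitude": 1.5405980, "weight": "2/3"},
    {"name": "K_alpha2", "magnitude": 1.5444260, "weight": "1/3"},
]


def mean_wavelength(lines: list) -> float:
    """Weighted mean of the line wavelengths, normalised by the total weight."""
    weights = [float(Fraction(line["weight"])) for line in lines]
    weighted_sum = sum(w * line["magnitude"] for w, line in zip(weights, lines))
    return weighted_sum / sum(weights)


print(f"{mean_wavelength(lines):.6f}")
```

With the numbers of the example this evaluates to about 1.541874, which is close to, but not identical with, the 1.54184 given as `magnitude`. Writing the weights as decimal numbers would avoid the string parsing altogether.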

- Most present-day files report the *incident angle*, the *counting time* and probably the *attenuation factor*
as columns. We should define standard keys for the corresponding column descriptions.

``` YAML
- name: alpha_i
  unit: deg
  physical_quantity: incident_angle
- name: alpha_f
  unit: deg
  physical_quantity: final_angle
- name: two_theta
  unit: deg
  physical_quantity: scattering_angle
- name: tme ?
  unit: s
  physical_quantity: counting_time
- name: att ?
  physical_quantity: attenuation_factor
```

- The `.ort` specs clearly separate data origin and data reduction. For lab reflectometers it is often the same software for
instrument control and reduction.
- Information about the facility, the owner and the sample is often missing.

## new column type: `flag`

suggested by Artur, draft by Jochen

``` YAML
# columns:
...
# - flag_is:
#     0: electric field off
#     1: electric field on, positive
#     2: electric field on, negative
```

or

``` YAML
# columns:
...
# - flag_is:
#     0: ignored for fitting
#     1: used for fitting
```
