From d617fe066b995601c8163f4af2fd5735a25d974d Mon Sep 17 00:00:00 2001
From: Jochen Stahn <57442805+jochenstahn@users.noreply.github.com>
Date: Tue, 30 Jan 2024 14:31:43 +0100
Subject: [PATCH] Create specs_discussion.md

---
 .../file_format/specs_discussion.md | 259 ++++++++++++++++++
 1 file changed, 259 insertions(+)
 create mode 100644 advanced_and_expert_level/file_format/specs_discussion.md

diff --git a/advanced_and_expert_level/file_format/specs_discussion.md b/advanced_and_expert_level/file_format/specs_discussion.md
new file mode 100644
index 0000000..d76ae98
--- /dev/null
+++ b/advanced_and_expert_level/file_format/specs_discussion.md
@@ -0,0 +1,259 @@
+---
+layout: page
+title: "ORSO - file formats - discussions on specifications"
+author: "Jochen Stahn"
+---
+
+
+## Feedback and discussions on the .ort specs
+
+The following is a rather unsorted collection of feedback and ideas about the [`.ort` specs](https://www.reflectometry.org/advanced_and_expert_level/file_format/file_format_specs) and the related [`orsopy`](https://orsopy.readthedocs.io/en/latest) package.
+
+last modified: 2023-05-05
+
+### anonymous 2022 workshop participant
+
+#### naming confusion *angle_of_incidence* vs. *alpha_i* etc.
+
+For the optional columns of wavelength and angle of incidence, are lambda and alpha_i the expected/standardized names?
+Or should they be wavelength and incident_angle, as written in the header?
+Or is this discrepancy in the example required because column names must be different from values in the header?
+
+> (Jochen) See also below. This problem arises from the fact that we try to get it right in the header, and use the *conventional*
+> terms in the column description.
+>
+> The keywords in the header are taken from the *physical quantity* name, e.g. *incident_angle*,
+> while in the (optional) column description there are two possible (and recommended) entries: `name` and `physical_quantity`.
+> The `name` is used to create the 1-line header right above the data array, so a well-established *symbol* is the right choice there.
+> The `physical_quantity` entry is used to avoid all ambiguities.
+
+If there is no current standard name (understanding that this is optional information), this should be made clear, with a statement/explanation of whether or not a standard name is expected in the future.
+
+> (Jochen) I agree that we should define a set of key words here and recommend their use. Suggestions can be found below.
+
+A third important quantity for which there could be a standard name in the future is photon energy (for synchrotron x-ray experiments).
+
+> (Jochen) I agree.
+
+#### wrong declaration in specs
+
+The documentation states that `Value` can be a list, but it cannot (`ComplexValue` can be a list, though).
+This discrepancy should be corrected either by modifying the documentation or the implementation,
+otherwise people could attempt to write files with data which cannot be handled by orsopy.
+
+> (Jochen) Wrong in specs. I'll correct this.
+
+#### redundant information and prioritisation
+
+Specs are not clear about what should happen if there is a header entry and a column with the same name:
+
+1. is this invalid (see first point),
+2. do they have to be consistent, or
+3. does the column overwrite any header information?
+
+E.g., if a column is supplied and the same quantity is also in the header, would it be required that
+
+- the header entry is a range with matching min/max?
+- the header entry is a value with matching average?
+- Must it be left out of the header entirely?
+- Should it be overwritten if a column is found,
+- or should there be a pointer to a column, for example an optional keyword `column`, where the value or range could then be used by software which cannot support point-by-point calculation (the column data)?
+
+My vote is for 3 implemented in this last way, for the purpose that the header can still contain some human-readable information useful for experimental reproducibility even if the contents are overwritten by a column.
+
+> (Jochen) Here we have to differentiate between the data format rules and recommendations for software using this format.
+>
+> The format allows for redundant and even for contradicting information. It is the responsibility of the
+> programmer to write out a physically consistent data file.
+>
+> On the other hand we should give some recommendations like the ones mentioned above.
+>
+> Personally I also prefer option 3, but without any further restrictions for the header.
+>
+> We discussed the pointer from header to column entries at an early stage and it was dropped at some point.
+> With a prioritisation *column over header* this is clear to the software.
+>
+> ``` YAML
+> # measurement:
+> #     instrument_settings:
+> #         incident_angle:
+> #             min:
+> #             max:
+> #             details_at_column: alpha_i
+> #             unit: deg
+> ```
+>
+> As an alternative to `details_at_column`, one can use the already existing `comment` to create a human-readable link.
+
+**item 1**: What is the ORSO recommendation for using redundant information?
+
+#### Future feature request:
+
+Ability to have an error defined for a quantity in the header, either implemented similarly to how quantities are allowed to have a range, or similarly to how columns are allowed to be an error of another column.
+
+> The following syntax is now implemented in orsopy:
+>
+> ``` YAML
+> # measurement:
+> #     instrument_settings:
+> #         incident_angle:
+> #             magnitude: 2.1
+> #             unit: deg
+> #             error:
+> #                 magnitude: 0.01
+> #                 error_type: resolution
+> #                 distribution: gaussian
+> #                 value_is: sigma
+> ```
+
+
+### confusion of physical terms
+
+(Jochen)
+
+We are inconsistent in what we use as key words. Since the German terms are different, I had problems figuring out the correct English definitions...
+
+#### official definitions
+
+What can be measured or calculated is a **physical quantity**.
+
+> E.g. the *incident angle*
+
+This has a **dimension** = dim(*physical quantity*) relating it to a set of base quantities like *length*, *time*, *charge*, *temperature* etc. The *dimension* is not a unit, nor can it be used to unambiguously describe a *physical quantity* (*plane angle* does not distinguish between *scattering angle*, *incident angle*, *total reflection angle*, ...).
+
+> dim( *incident angle* ) = *plane angle*
+
+The *physical quantity* is often referred to by using a **symbol**.
+
+> one possible symbol for *incident angle* is $\alpha_i$ (or *alpha_i* in the ORSO header)
+
+The *physical quantity* is composed of a **numerical magnitude** times a **unit**. Depending on the chosen *unit*, the *numerical magnitude* changes.
+
+> $\alpha_i = 2.3 \cdot \mathrm{deg}$
+
+#### what we do wrong or inconsistent
+
+- For the column *name* we use the *symbol* (R, Qz, alpha_i, ...) rather than the *physical quantity*. But in the header above we use the latter as key words. Thus if the analysis software searches, for example, for information about the *incident angle*, it has to look in various places (this is intended) for different keys. A solution might be that the software searches for standardised `physical_quantity` entries in the column description which match the keys in the header.
+
+
+## stitched data
+
+- Where do we store e.g. the angles for stitched ToF measurements? These are **no longer used for processing**, but may help future planning.
+- The same question applies to x-ray data obtained with different attenuator settings.
+
+> In case this information is not provided in one of the optional columns or in the individual headers of multiple data sets,
+> it cannot be used by the analysis software. Good choices for this information might be extra entries, e.g. in the `incident_angle` section:
+>
+> ``` YAML
+> incident_angle:
+>     min: 1.0
+>     max: 5.8
+>     individual_magnitudes: [1.0, 2.7, 5.8]
+>     unit: deg
+> ```
+
+**item 7**: Do we introduce `individual_magnitudes` as a new key within the class `ValueRange`?
+
+
+## guidelines for writing and reading
+
+- hierarchy for looking up information (e.g. column beats header content)
+- avoid contradicting information (e.g. single incident angle in the header for angle-dispersive measurement)
+
+## open issues for lab x-ray reflectometers
+
+**item 8**: Which of the keys discussed below should be included in the specs to (better) incorporate lab x-ray data files?
+
+When attempting to convert the ASCII output files of various commercial lab x-ray reflectometers (diffractometers)
+it became obvious that the present dictionary misses several entries.
+
+- It is not exactly clear where to put the *brand*, *model* and possibly *configuration* information.
+
+  ``` YAML
+  experiment:
+      title: ...
+      instrument:
+          type: x-ray lab source (neutron reflectometer, synchrotron diffractometer, ....)
+          brand: Bruker
+          model: Discovery
+          hardware_indicator: 65519
+  ```
+
+- The wavelength is often defined via the anode material, the line(s) and possibly the presence of a monochromator.
+- The scan modes might be `steps` or `continuous`.
+- The slit sizes are reported to enable resolution calculation.
+- Often a long list of hardware settings is supplied, e.g. tube current, temperature, configuration, etc.
+  These things do not really belong in a *reduced data* file, but we should at least recommend a place for
+  these entries. In the example below I put it as a multi-line string in `instrument_settings.details`.
+
+``` YAML
+measurement:
+    instrument_settings:
+        incident_angle:
+            min: 0.1
+            max: 6.0
+            unit: deg
+        wavelength:
+            magnitude: 1.54184
+            unit: angstrom
+            anode: Cu
+            lines:
+                - name: K_alpha1
+                  magnitude: 1.5405980
+                  weight: 2/3
+                - name: K_alpha2
+                  magnitude: 1.5444260
+                  weight: 1/3
+        scan_type: continuous
+        details: |
+            "Configuration=Reflection-Transmission Spinner 3.0, Owner=user, Creation date=3/5/2021 8:12:09 AM"
+            "Goniometer=Theta/Theta; Minimum step size 2Theta:0.0001; Minimum step size Omega:0.0001"
+            "Sample stage=Reflection-transmission spinner 3.0; Minimum step size Phi:0.1"
+```
+
+- Most present-day files report the *incident angle*, the *counting time* and possibly the *attenuation factor*
+  as columns. We should define standard keys for the corresponding column descriptions.
+
+  ``` YAML
+  - name: alpha_i
+    unit: deg
+    physical_quantity: incident_angle
+  - name: alpha_f
+    unit: deg
+    physical_quantity: final_angle
+  - name: two_theta
+    unit: deg
+    physical_quantity: scattering_angle
+  - name: tme ?
+    unit: s
+    physical_quantity: counting_time
+  - name: att ?
+    physical_quantity: attenuation_factor
+  ```
+
+- The `.ort` specs clearly separate data origin and data reduction. For lab reflectometers the same software often handles
+  both instrument control and reduction.
+- Information about the facility, the owner and the sample is often missing.
+
+## new column type: `flag`
+
+suggested by Artur, draft by Jochen
+
+``` YAML
+# columns:
+...
+# - flag_is:
+#     0: electric field off
+#     1: electric field on, positive
+#     2: electric field on, negative
+```
+
+or
+
+``` YAML
+# columns:
+...
+# - flag_is:
+#     0: ignored for fitting
+#     1: used for fitting
+```
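
A reading program could use the drafted `flag_is` mapping to select rows by their label instead of a hard-coded number. The following Python sketch assumes the second draft (fitting flags). The toy `rows` and the helper `select_rows` are illustrative names only; neither they nor `flag_is` are part of the current `.ort` specs or of `orsopy`.

```python
# Hypothetical sketch: `flag_is` is only a draft key, not part of the .ort specs.
flag_is = {0: "ignored for fitting", 1: "used for fitting"}

# Toy data rows (Qz, R, flag); the numbers are made up for illustration.
rows = [
    (0.010, 0.98, 1),
    (0.015, 0.71, 0),
    (0.020, 0.35, 1),
]


def select_rows(rows, flag_is, wanted_label):
    """Return the rows whose flag value is described by `wanted_label`."""
    wanted = {value for value, label in flag_is.items() if label == wanted_label}
    if not wanted:
        raise ValueError(f"no flag value is described as {wanted_label!r}")
    # The flag is assumed to be the last entry of each row.
    return [row for row in rows if row[-1] in wanted]


fit_rows = select_rows(rows, flag_is, "used for fitting")
```

Selecting by label keeps the reading code independent of which integer a particular file assigns to a given meaning, which seems to be the point of writing the mapping into the column description.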