Create specs_discussion.md

reflectivity · Jan 30, 2024 · d617fe0 · d617fe0
1 parent b83a72e
commit d617fe0
Showing 1 changed file with 259 additions and 0 deletions.
diff --git a/advanced_and_expert_level/file_format/specs_discussion.md b/advanced_and_expert_level/file_format/specs_discussion.md
@@ -0,0 +1,259 @@
+---
+layout: page  
+title: "ORSO - file formats - discussions on specifications"  
+author: "Jochen Stahn"  
+---
+
+
+## Feedback and discussions on the .ort specs
+
+In the following you find a rather unsorted collection of feedback and ideas about the [`.ort` specs](https://www.reflectometry.org/advanced_and_expert_level/file_format/file_format_specs) and the related [`orsopy`](https://orsopy.readthedocs.io/en/latest) package.
+
+last modified: 2023-05-05
+
+### anonymous 2022 workshop participant
+
+#### naming confusion *angle_of_incidence* vs. *alpha_i* etc
+
+For the optional columns of wavelength and angle of incidence, are lambda and alpha_i the expected/standardized names? 
+Or should they be wavelength and incident_angle, as written in the header? 
+Or is this discrepancy in the example required because column names must be different from values in the header? 
+
+> (Jochen) See also below. This problem arises from the fact that we try to make it right in the header, and use the *conventional*
+> terms in the column description. 
+> 
+> The keywords in the header are taken from the *physical quantity* name, e.g. *incident_angle*, 
+> while in the (optional) column description there are two possible (and recommended) entries: `name` and `physical_quantity`. 
+> The `name` is used to create the 1-line header right above the data array and thus a well-established *symbol* is the right choice there.
+> And `physical_quantity` is used to avoid all ambiguities.
+
+If there is no current standard name (understanding that this is optional information), this should be made clear, with a statement/explanation of whether or not a standard name is expected in the future. 
+
+> (Jochen) I agree that we should define a set of key words here and recommend their use. Suggestions can be found below.
+
+A third important quantity for which there could be a standard name in the future is photon energy (for synchrotron x-ray experiments). 
+
+> (Jochen) I agree.
+
+#### wrong declaration in specs
+
+The documentation states Value can be a list, but it cannot. (ComplexValue can be a list though). 
+This discrepancy should be corrected either by modifying the documentation or the implementation, 
+otherwise people could attempt to write files with data which cannot be handled by orsopy. 
+
+> (Jochen) Wrong in specs. I'll correct this.
+
+#### redundant information and prioritisation
+
+Specs are not clear about what should happen if there is a header entry and a column with the same name: 
+
+1. is this invalid (see first point) 
+2. do they have to be consistent or 
+3. does the column overwrite any header information? 
+
+E.g., if a column is supplied, would it be required that, if that is also in the header, 
+
+- it is a range with matching min/max? 
+- That it is a value with matching average? 
+- Must it be left out of the header entirely? 
+- Should it be overwritten if a column is found, 
+- or should there be a pointer to a column, for example an optional keyword ‘column’, where the value or range could then be used by software which cannot support point-by-point calculation (the column data)? 
+
+My vote is for 3 implemented in this last way, for the purpose that the header can still contain some human-readable information useful for experimental reproducibility even if the contents are overwritten by a column. 
+
+> (Jochen) Here we have to differentiate between the data format rules and recommendations for software using this format. 
+> 
+> The format allows for redundant and even for contradicting information. It is the responsibility of the 
+> programmer to write out a physically consistent data file. 
+> 
+> On the other hand we should give some recommendations like the ones mentioned above. 
+> 
+> Personally I also prefer option 3, but without any further restrictions for the header. 
+> 
+> We discussed the pointer from header to column entries in an early stage and it was dropped at some point. 
+> With a prioritisation *column over header* this is clear to the software.
+>
+> ``` YAML
+> #    measurement: 
+> #         instrument_settings:  
+> #             incident_angle:  
+> #                 min:   <value>
+> #                 max:   <value>
+> #                 details_at_column: alpha_i
+> #                 unit:  deg    
+> ```
+> 
+> Alternatively to `details_at_column`, one can use the already existing `comment` to create a human-readable link.
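+
+A minimal sketch of the `comment` variant, assuming `comment` takes free text at this level; the wording of the comment and the column name `alpha_i` are illustrative only:
+
+``` YAML
+#    measurement: 
+#         instrument_settings:  
+#             incident_angle:  
+#                 min:     <value>
+#                 max:     <value>
+#                 unit:    deg
+#                 comment: point-by-point values are given in column alpha_i
+```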
+
+**item 1**: What is the ORSO recommendation for using redundant information? 
+ 
+#### Future feature request:
+
+Ability to have an error defined for a quantity in the header, either implemented similar to how quantities are allowed to have a range, or similar to how columns are allowed to be an error of another column.
+
+> The following syntax is now implemented in orsopy:
+>
+> ``` YAML
+> #    measurement: 
+> #         instrument_settings:  
+> #             incident_angle:  
+> #                 magnitude:        2.1
+> #                 unit:             deg    
+> #                 error:
+> #                     magnitude:    0.01
+> #                     error_type:   resolution
+> #                     distribution: gaussian
+> #                     value_is:     sigma
+> ```
+
+
+### confusion of physical terms 
+
+(Jochen)
+
+There is some confusion about what we use as key words. Since the German terms are different, I had problems figuring out the correct English definitions...
+
+#### official definitions
+
+What can be measured or calculated is a **physical quantity**.
+
+> E.g. the *incident angle*
+
+This has a **dimension** = dim(*physical quantity*) relating it to a set of base quantities like *length*, *time*, *charge*, *temperature* etc. The *dimension* is not a unit, nor can it be used to unambiguously describe a *physical quantity* (*plane angle* does not distinguish between *scattering angle*, *incident angle*, *total reflection angle*, ...). 
+
+> dim( *incident angle* ) = *plane angle*
+
+The *physical quantity* is often referred to by using a **symbol**.
+
+> one possible symbol for *incident angle* is $\alpha_i$ (or *alpha_i* in the orso header)
+
+The *physical quantity* is composed of a **numerical magnitude** times **unit**. Depending on the chosen *unit*, the *numerical magnitude* changes.
+
+> $\alpha_i = 2.3 \cdot \mathrm{deg}$
+
+#### what we do wrong or inconsistently
+ 
+- For the column *name* we use the *symbol* (R, Qz, alpha_i, ...) rather than the *physical quantity*. But in the header above we use the latter as key words. Thus if the analysis software searches for example for information about the *incident angle*, it has to look in various places (this is intended) for different keys. A solution might be that the software searches for standardised `physical_quantity` entries in the column description which match the keys in the header. 
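+
+A sketch of that lookup, assuming the column description carries a `physical_quantity` entry equal to the header key (here `incident_angle`); the numerical values are illustrative only:
+
+``` YAML
+# measurement:
+#     instrument_settings:
+#         incident_angle:                   # header key
+#             min:  0.2
+#             max:  3.0
+#             unit: deg
+# columns:
+#     - name: alpha_i                       # conventional symbol, used for the 1-line header
+#       unit: deg
+#       physical_quantity: incident_angle   # matches the header key
+```
+
+Software looking for the *incident angle* could then search the columns for `physical_quantity: incident_angle` and, if no such column exists, use the header entry.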
+
+
+## stitched data
+
+- Where do we store e.g. the angles for stitched TOF measurements? These are **no longer used for processing**, but may help future planning.
+- x-ray data obtained with different attenuator settings.
+
+> If this information is not provided in one of the optional columns or in the individual headers of multiple data sets,
+> it cannot be used by the analysis software. Good choices for this information might be extra entries e.g. in the `incident_angle` section:
+>
+> ``` YAML
+>         incident_angle:
+>              min: 1.0
+>              max: 5.8
+>              individual_magnitudes: [1.0, 2.7, 5.8]
+>              unit: deg
+> ```
+
+**item 7**: Do we introduce `individual_magnitudes` as a new key within the class `ValueRange`? 
+
+
+## guidelines for writing and reading
+
+- hierarchy for looking up information (e.g. column beats header content)
+- avoid contradicting information (e.g. single incident angle in the header for angle-dispersive measurement)
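+
+As an illustration of the second point (values are made up): for an angle-dispersive measurement with an `alpha_i` column, a single magnitude in the header contradicts the column, whereas a range is consistent with it.
+
+``` YAML
+# contradicts the alpha_i column:
+#         incident_angle:
+#              magnitude: 1.0
+#              unit: deg
+
+# consistent with the alpha_i column:
+#         incident_angle:
+#              min: 0.1
+#              max: 6.0
+#              unit: deg
+```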
+
+## open issues for lab x-ray reflectometers
+ 
+**item 8**: Which of the keys discussed below should be included in the specs to (better) incorporate lab x-ray data files?
+
+When attempting to convert the ASCII output files of various commercial lab x-ray reflectometers (diffractometers) 
+it became obvious that the present dictionary misses several entries.
+ 
+- It is not exactly clear where to put the *brand*, *model* and probably *configuration* information.
+
+  ``` YAML
+     experiment:
+         title: ...
+         instrument:
+             type: x-ray lab source        (neutron reflectometer, synchrotron diffractometer, ....)
+             brand: Bruker
+             model: Discovery
+             hardware_indicator: 65519
+  ```
+ 
+- The wavelength is often defined via the anode material, the line(s) and probably the presence of a monochromator.
+- The scan modes might be `steps` or `continuous`.
+- The slit sizes are reported to enable resolution calculation.
+- Often a long list of hardware settings is supplied, e.g. tube current, temperature, configuration, etc. 
+  These things do not really belong in a *reduced data* file, but we should at least recommend a place for
+  these entries. In the example below I put it as a multi-line string in `instrument_settings.details`.
+ 
+``` YAML 
+     measurement: 
+         instrument_settings:  
+             incident_angle:           
+                 min:          0.1
+                 max:          6.0
+                 unit:         deg
+             wavelength:               
+                 magnitude:    1.54184
+                 unit:         angstrom
+                 anode:        Cu 
+                 lines:
+                    - name:    K_alpha1
+                      magnitude:  1.5405980
+                      weight:  2/3
+                    - name:    K_alpha2
+                      magnitude:  1.5444260
+                      weight:  1/3
+             scan_type:        continuous
+             details: |
+                 "Configuration=Reflection-Transmission Spinner 3.0, Owner=user, Creation date=3/5/2021 8:12:09 AM"
+                 "Goniometer=Theta/Theta; Minimum step size 2Theta:0.0001; Minimum step size Omega:0.0001"
+                 "Sample stage=Reflection-transmission spinner 3.0; Minimum step size Phi:0.1"
+```
+ 
+- Most present day files report the *incident angle*, the *counting time* and probably the *attenuation factor* 
+  as columns. We should define standard keys for the corresponding column descriptions.
+ 
+  ``` YAML
+      - name: alpha_i
+        unit: deg
+        physical_quantity: incident_angle
+      - name: alpha_f
+        unit: deg
+        physical_quantity: final_angle
+      - name: two_theta
+        unit: deg
+        physical_quantity: scattering_angle
+      - name: tme ?
+        unit: s
+        physical_quantity: counting_time
+      - name: att ?
+        physical_quantity: attenuation_factor
+  ```
+ 
+- The `.ort` specs clearly separate data origin and data reduction. For lab reflectometers it is often the same software for 
+  instrument control and reduction.
+- Information about the facility, the owner and the sample is often missing.
+ 
+## new column type: `flag`
+ 
+suggested by Artur, draft by Jochen
+ 
+``` YAML
+# columns:
+...
+#    - flag_is:
+#      0: electric field off
+#      1: electric field on, positive
+#      2: electric field on, negative
+```
+
+or 
+ 
+``` YAML
+# columns:
+...
+#    - flag_is:
+#      0: ignored for fitting
+#      1: used for fitting
+```