Define what we mean by array range #716

Open
2 tasks
graeme-winter opened this issue Mar 26, 2024 · 11 comments

Comments

@graeme-winter
Collaborator

graeme-winter commented Mar 26, 2024

Even if we want to replace this with z_range (#719), we need to decide what this actually means.

Options:

  • zero is the start of the images
  • zero is the logical beginning of the scan i.e. if we collect images 10001-20000

graeme-winter changed the title from "Define what we need by array range" to "Define what we mean by array range" on Mar 26, 2024

@graeme-winter
Collaborator Author

@jbeilstenedmands suggests considering three cases:

  • the first image is numbered 0
  • the first image is numbered 1
  • the first image is numbered 100109101 (&c.)

What should the internal data structure representation of this be?

@graeme-winter
Collaborator Author

@benjaminhwilliams asserts:

Fundamentally we have a contiguous 3D array of data. For HDF5 this is a trivial mapping. Where the data array is stored as a sequence of image files, we need to store a map that lets the software fetch the correct "image" for a given slice.
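
A minimal sketch of such a map for the image-per-file case, assuming a filename template where a run of `#` is a zero-padded image number. The name `make_template_lookup` and the `first_image` parameter are hypothetical, not part of any existing API:

```python
from typing import Callable


def make_template_lookup(template: str, first_image: int) -> Callable[[int], str]:
    """Return a function mapping a zero-based slice index to a filename.

    `template` uses a run of '#' as a zero-padded image-number field,
    e.g. "foo_####.cbf"; `first_image` is the image number at slice 0.
    Both names are assumptions made for this sketch.
    """
    width = template.count("#")
    prefix, suffix = template.split("#" * width)

    def lookup(slice_index: int) -> str:
        if slice_index < 0:
            raise IndexError("slice index must be non-negative")
        return f"{prefix}{first_image + slice_index:0{width}d}{suffix}"

    return lookup


lookup = make_template_lookup("foo_####.cbf", first_image=901)
print(lookup(0))   # foo_0901.cbf
print(lookup(99))  # foo_1000.cbf
```

For a single HDF5 file the equivalent map would simply add a constant to the slice index.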

@graeme-winter
Collaborator Author

@ndevenish asks: what if we take a subset of a data set?

dials.import image=foo.nxs:901:1000

This counts in people numbers: in array index terms we are importing 100 images, 900...999 inclusive. This already highlights some confusion.

dials.import template=foo_####.cbf image_range=901,1000

... would open images 901 to 1000

What if one of the images is called 0?
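
To make the two counting conventions concrete, here is a small sketch that converts an inclusive, one-based image range into zero-based indices. The function name is hypothetical, and the one-based assumption is exactly what breaks if an image is called 0, so the sketch rejects that case rather than guessing:

```python
def image_range_to_indices(first: int, last: int) -> range:
    """Convert an inclusive, one-based image range to zero-based indices.

    Assumes "people numbers" start at 1; an image numbered 0 has no
    valid index under this convention, so it is rejected.
    """
    if first < 1:
        raise ValueError("one-based image numbers must be >= 1")
    if last < first:
        raise ValueError("image range must satisfy first <= last")
    return range(first - 1, last)


indices = image_range_to_indices(901, 1000)
print(len(indices), indices[0], indices[-1])  # 100 900 999
```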

@graeme-winter
Collaborator Author

graeme-winter commented Mar 26, 2024

All interfaces internally use C counting, from zero. We have an internal offset (yet to be named) to take us to image numbers, and functions to map from an array index to an image number or to a slice offset in a 3D data file. The user interface then presents image_range in people numbers (which could start from zero) and array_range, which counts in C; the internal offset is derived from the lower of these values.

We propose we do this. We welcome reactions.

@graeme-winter
Collaborator Author

array_range is exactly zero-indexed; image_range is exactly one-indexed, i.e. people numbers. Most of this is a string formatting problem. Internally the array will always be exactly zero-indexed: the first image that the user has imported will always have index zero, with an internal offset defined so that the correct file can be loaded.
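
One possible shape for this, as a sketch only. The class name `SequenceIndexing` and the field `first_image` (standing in for the yet-to-be-named offset) are assumptions for illustration, not existing code:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SequenceIndexing:
    """Zero-based internal indexing plus an offset to image numbers."""

    first_image: int  # hypothetical name: image number of array index 0
    num_images: int   # number of imported images

    @classmethod
    def from_image_range(cls, first: int, last: int) -> "SequenceIndexing":
        # image_range is inclusive at both ends (people numbers).
        if last < first:
            raise ValueError("image_range must satisfy first <= last")
        return cls(first_image=first, num_images=last - first + 1)

    @property
    def array_range(self) -> Tuple[int, int]:
        # Half-open, C-style: always starts at zero.
        return (0, self.num_images)

    @property
    def image_range(self) -> Tuple[int, int]:
        return (self.first_image, self.first_image + self.num_images - 1)

    def image_number(self, array_index: int) -> int:
        if not 0 <= array_index < self.num_images:
            raise IndexError(f"array index {array_index} out of range")
        return self.first_image + array_index

    def describe(self) -> str:
        # The "string formatting problem": people numbers only appear here.
        first, last = self.image_range
        return f"images {first} to {last} ({self.num_images} images)"


seq = SequenceIndexing.from_image_range(901, 1000)
print(seq.array_range)      # (0, 100)
print(seq.image_number(0))  # 901
print(seq.describe())       # images 901 to 1000 (100 images)
```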

@graeme-winter
Collaborator Author

If we truncate the scan, the truncated scan must refer to a subset of the imported image sequence which is strictly contained within the imported set.
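
A sketch of the containment check this implies, assuming the truncated scan is described by a half-open, zero-based range on the imported array. The function name is hypothetical, and "contained" is read here as "within the imported bounds":

```python
def check_truncation(start: int, stop: int, num_imported: int) -> range:
    """Validate a half-open sub-range [start, stop) of the imported array.

    Raises ValueError if the sub-range is empty or reaches outside
    [0, num_imported).
    """
    if not 0 <= start < stop <= num_imported:
        raise ValueError(
            f"[{start}, {stop}) is not contained in [0, {num_imported})"
        )
    return range(start, stop)


kept = check_truncation(10, 50, num_imported=100)
print(len(kept))  # 40
```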

@graeme-winter
Collaborator Author

Shoebox coordinates are indices into the imported array, and centroid z values are on the same imported-array index baseline. The mapping to the actual image the spot came from is handled by the image set / image sequence.
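
As an illustration of that mapping, a sketch assuming a shoebox spans a half-open z range and that a centroid at z lies on the frame with index floor(z). Both conventions, and the `first_image` parameter, are assumptions for this example:

```python
import math
from typing import List


def centroid_image_number(z: float, first_image: int) -> int:
    """Image number of the frame containing a centroid at array z.

    Assumes frame i covers z in [i, i + 1).
    """
    return first_image + math.floor(z)


def shoebox_image_numbers(z0: int, z1: int, first_image: int) -> List[int]:
    """Image numbers covered by a shoebox spanning array frames [z0, z1)."""
    return [first_image + z for z in range(z0, z1)]


print(centroid_image_number(2.5, first_image=901))    # 903
print(shoebox_image_numbers(1, 4, first_image=901))   # [902, 903, 904]
```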

@graeme-winter
Collaborator Author

DO NOT internally store a tuple for the array range: the first value is canonically zero, so do not store it. Internally, expose only a way to get the exclusive upper bound, i.e. the value one past the last valid index.

Then use explicit API calls to get the people numbers, e.g. for reporting or for fetching data from a CBF file.

@graeme-winter
Collaborator Author

array_index_to_file_offset(), file_offset_to_array_index() as the API?

get_raw_data() needs to take the array index as input and internally call the above functions.
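
A rough sketch of how these three calls could fit together. The class `ImportedImages` and its fields are hypothetical, plain lists stand in for the real image data, and "file offset" is taken to mean the slice position within the data file:

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ImportedImages:
    """Hypothetical wrapper around a file holding a stack of frames."""

    file_frames: Sequence[List[int]]  # stand-in for slices in the data file
    file_offset: int                  # file slice that array index 0 maps to
    num_images: int                   # number of imported images

    def array_index_to_file_offset(self, index: int) -> int:
        if not 0 <= index < self.num_images:
            raise IndexError(f"array index {index} outside [0, {self.num_images})")
        return index + self.file_offset

    def file_offset_to_array_index(self, offset: int) -> int:
        index = offset - self.file_offset
        if not 0 <= index < self.num_images:
            raise IndexError(f"file offset {offset} is not in the imported set")
        return index

    def get_raw_data(self, index: int) -> List[int]:
        # Callers only ever pass the zero-based array index.
        return self.file_frames[self.array_index_to_file_offset(index)]


frames = [[n] for n in range(1000)]  # fake file: frame n holds the value n
images = ImportedImages(frames, file_offset=900, num_images=100)
print(images.get_raw_data(0))                   # [900]
print(images.file_offset_to_array_index(999))   # 99
```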

@graeme-winter
Collaborator Author

Looking at scan.h I note

    int num_images_;

=> we have the upper bound on the array size

@graeme-winter
Collaborator Author

In the same file I note without further comment

     * @param file_offset A offset to add to the image number (for tracking of
     *                     unique batch numbers for multi-crystal datasets)

Should this be in the scan object? I suspect @jbeilstenedmands holds an opinion here.
