Use Case 1 : Max number of dimensions question #1

Open
brianthomas opened this issue Oct 10, 2014 · 10 comments
@brianthomas
Member

We need to determine a maximum, if any, in the number of allowed dimensions in a data cube.

@brianthomas brianthomas self-assigned this Oct 10, 2014
@brianthomas brianthomas added this to the Collect Usecases milestone Oct 16, 2014
@brianthomas brianthomas changed the title Use case 1 max number of dimensions Use case 1 : Max number of dimensions question Oct 16, 2014
@brianthomas brianthomas changed the title Use case 1 : Max number of dimensions question Use Case 1 : Max number of dimensions question Oct 16, 2014
@migueldvb
Member

I think that the maximum number of dimensions in HDF5 is 32, defined by H5S_MAX_RANK (edited this number in the use case).

@brianthomas
Member Author

"640K ought to be enough for anybody." - Bill Gates

I supply the quote to wonder if building limits into the data format is wise. What appears to be a good limit today may look inadequate in the future.

@timj
Contributor

timj commented Oct 16, 2014

A fixed number simplifies some coding in C and similar languages, and having it as a compile-time parameter makes it easy to change. Are there any datasets in astronomy that even come close to 32, though?
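
As a rough illustration of that trade-off (a Python sketch, not code from HDF5 or any other library), the limit can live in a single constant that every shape is checked against. `MAX_RANK` and `validate_shape` are hypothetical names; `MAX_RANK` stands in for a compile-time parameter such as `H5S_MAX_RANK`.

```python
# Hypothetical stand-in for a compile-time limit such as HDF5's H5S_MAX_RANK.
MAX_RANK = 32


def validate_shape(shape, max_rank=MAX_RANK):
    """Return the shape as a tuple, or raise if it breaks the rank limit."""
    shape = tuple(int(n) for n in shape)
    if len(shape) > max_rank:
        raise ValueError(
            f"rank {len(shape)} exceeds the configured maximum of {max_rank}"
        )
    if any(n < 0 for n in shape):
        raise ValueError("axis lengths must be non-negative")
    return shape


# A 4-axis cube passes; raising the limit is a one-line change to MAX_RANK.
cube = validate_shape((1024, 1024, 4, 4096))
```

Keeping the limit in one place means the fixed-size bookkeeping stays simple while the value itself remains easy to raise.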

@brianthomas
Member Author

Radio guys are probably the ones pushing the number of dimensions needed more than anyone else.

@juandesant

For the SKA data products the number of elements per dimension will be very
large, but the number of dimensions will typically be small: 2 spatial axes
(on the order of 100 Mpixel), 1 polarization axis (4 values), and a frequency
axis (with up to 256k channels). An optional RFI axis could be added, where
elements in one plane are the actual measurement and those in the other are
an RFI map, along with other such maps, but I cannot envision anything beyond
32 for a data product. A velocity axis is typically a second representation
of the frequency axis, as is a wavelength one.

For raw data I can imagine more dimensions, including baseline, but I find
it difficult to go beyond 32 dimensions.
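
To put rough numbers on the data product described above, here is a small Python sketch. The axis names and exact lengths are assumptions chosen to match the orders of magnitude quoted (two spatial axes totalling about 100 Mpixel, 4 polarization values, "256k" channels read as 262144); they are not SKA specifications.

```python
from math import prod

# Illustrative axis lengths only: the thread gives orders of magnitude,
# so the exact values below are assumptions.
ska_axes = {
    "ra": 10_000,             # two spatial axes, ~100 Mpixel in total
    "dec": 10_000,
    "polarization": 4,
    "frequency": 256 * 1024,  # "up to 256k channels", read here as 262144
}

rank = len(ska_axes)
n_elements = prod(ska_axes.values())

print(f"rank = {rank}")
print(f"elements = {n_elements:.3e}")
```

The rank stays at 4 even though the element count is on the order of 10^14, which is the point being made: the pressure is on axis length, not on the number of axes.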

Juande Santander-Vela
System Engineer (Science Data Processor/Telescope Manager)
Square Kilometre Array/SKA Organisation
Jodrell Bank Observatory, Lower Withington
Macclesfield SK11 9DL, United Kingdom

@migueldvb
Member

It looks like the maximum number of dimensions is defined in the header file H5Spublic.h in HDF5. The standard library limits dataspace objects to a maximum rank of 32 but it should be possible to change this up to the maximum value on the system and recompile the library if necessary. I think this is a good approach and I agree that it is unlikely that a larger value is needed.

@embray

embray commented Oct 17, 2014

I believe it would be foolish to bake in any absolute upper limit, though it might make sense to define a minimum number of dimensions that software readers must be able to support somehow. Even for very large numbers of dimensions readers should probably at least be able to return slices along a subset of those dimensions--after all it's still just bytes.
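
The "it's still just bytes" point can be sketched in Python. This assumes a contiguous, row-major (C-ordered) layout, which the thread does not specify; `row_major_strides` and `byte_offset` are hypothetical helper names. The arithmetic for locating an element does not depend on the rank, so a reader can fix some axes and walk the rest.

```python
def row_major_strides(shape, itemsize):
    """Byte stride of each axis for a C-ordered (row-major) array."""
    strides = [itemsize] * len(shape)
    for axis in range(len(shape) - 2, -1, -1):
        strides[axis] = strides[axis + 1] * shape[axis + 1]
    return strides


def byte_offset(index, shape, itemsize):
    """Byte offset of one element, valid for any rank."""
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same rank")
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError("index out of bounds")
    strides = row_major_strides(shape, itemsize)
    return sum(i * s for i, s in zip(index, strides))


# A 40-axis array with two elements per axis: more axes than a 32-axis reader
# could hold, yet locating any single element is plain arithmetic.
shape = (2,) * 40
offset = byte_offset((1,) + (0,) * 39, shape, itemsize=4)
```

A reader limited to fewer axes could use the same offsets to pull out a lower-rank slice by holding the extra axes at fixed indices.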

I believe NumPy has a baked-in limit of 32 axes for ndarrays (NPY_MAXDIMS), but that can be changed at compile time if needed. So data with more dimensions than that may not be readable into a typical NumPy array, and software should be able to detect that.

I guess what I'm trying to say is, I don't feel like all data needs to be readable by all readers (at least in extreme cases) as long as it's clear where the limitations are, and that it's at least possible to find a way to read the data in those files in the preferred format.
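
One possible shape for such a check, as a Python sketch: `read_strategy` is a hypothetical function, and the two-way "full" versus "sliced" outcome is an assumption about how a reader might report its limits, not an agreed design.

```python
def read_strategy(file_rank, reader_max_rank):
    """Decide how a reader with a rank limit can approach a file.

    Returns "full" when the whole array fits within the reader's limit,
    otherwise "sliced" together with how many axes must be fixed first.
    """
    if file_rank < 0 or reader_max_rank < 1:
        raise ValueError("ranks must be non-negative and the limit positive")
    if file_rank <= reader_max_rank:
        return ("full", 0)
    return ("sliced", file_rank - reader_max_rank)
```

For example, `read_strategy(40, 32)` reports that 8 axes must be held at fixed indices before the remainder fits in a 32-axis array, which makes the limitation explicit instead of failing silently.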

@brianthomas
Member Author

Yes, I agree with Erik; this is what I was alluding to earlier. Further, I'd expect that various data models probably have different limits. Not sure what the minimum for all images in the format might be, but it's at least 3 axes.

@juandesant

OK, so we should characterize some typical dimensions (spatial, polarization, frequency/wavelength, time, intensity), which can be orthogonal, and perhaps hope for the referees to suggest a few more. Then we can use the typical dimension limits in modern formats to show that there is plenty of headroom, and that more can be achieved.
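
A minimal Python sketch of that headroom argument follows. The axis list mirrors the kinds named in this comment, with spatial counted as two axes (an assumption), and the only format limit included is the HDF5 value of 32 quoted earlier in this thread.

```python
# Axis kinds named in the thread; counting two spatial axes is an assumption.
typical_axes = ["spatial x", "spatial y", "polarization",
                "frequency/wavelength", "time", "intensity"]

# Rank limit quoted earlier in the thread for HDF5 (H5S_MAX_RANK).
format_limits = {"HDF5": 32}

# Axes left over in each format after the typical ones are accounted for.
headroom = {name: limit - len(typical_axes)
            for name, limit in format_limits.items()}
```

Other formats and any axes the referees suggest can be added to the two collections without changing the calculation.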

@brianthomas
Member Author

@juandesant I think that would be a good starting point for justification of any derived requirement(s)

5 participants