Uninitialized variable length sequences are returned as scalars instead of empty arrays #321

Open
mattjala opened this issue Feb 29, 2024 · 2 comments
@mattjala
Contributor

mattjala commented Feb 29, 2024

When reading elements from a dataset with a variable-length (vlen) type, HSDS returns uninitialized elements as scalars. The HDF5 library API instead treats uninitialized vlen elements as zero-length arrays.

Test program in C to generate an erroneous response from HSDS using the REST VOL:

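#include "hdf5.h"

#define COUNT 10

int main(void) {
  hid_t file_id = H5I_INVALID_HID;
  hid_t dset_id = H5I_INVALID_HID;
  hvl_t rbuf[COUNT];
  const hsize_t dims[] = {COUNT};

  /* Variable-length sequence of native ints */
  hid_t vlen_id = H5Tvlen_create(H5T_NATIVE_INT);

  file_id = H5Fcreate("/home/test_user1/tfile.c", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
  hid_t space_id = H5Screate_simple(1, dims, NULL);

  /* Create the dataset and read it back without writing any elements */
  dset_id = H5Dcreate2(file_id, "dset_vlen", vlen_id, space_id, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
  H5Dread(dset_id, vlen_id, space_id, H5S_ALL, H5P_DEFAULT, (void*) rbuf);

  /* Free any vlen buffers allocated by the read, then release the handles */
  H5Treclaim(vlen_id, space_id, H5P_DEFAULT, (void*) rbuf);
  H5Dclose(dset_id);
  H5Sclose(space_id);
  H5Tclose(vlen_id);
  H5Fclose(file_id);
  return 0;
}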

Server response:

{..."value": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], ...}

Expected response:

{..."value": [[], [], [], [], [], [], [], [], [], []], ...}
@jreadey
Member

jreadey commented Mar 4, 2024

I think this is working as designed from the HSDS point of view.
Take a look at the h5pyd test here: https://github.com/HDFGroup/h5pyd/blob/master/test/hl/test_vlentype.py#L151.
The bytes returned from HSDS in this case (dset is {[], []}) are:

b'\x00\x00\x00\x00\x00\x00\x00\x00'

i.e., 2 elements are returned, and both the 1st and the 2nd are zero-length arrays.
By contrast, if the vlen dataset held {[0,], [0,]}, the bytes returned would be:

b'\x02\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00'
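
To make the two byte strings above easier to compare, here is a minimal C sketch that walks such a buffer and reports the length of each element. The layout it assumes (a 4-byte little-endian byte count followed by that many payload bytes per element) is inferred only from the two examples quoted here, not taken from the HSDS documentation, and vlen_element_lengths is a hypothetical helper name, not part of HSDS or the REST VOL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: each element is encoded as a 4-byte little-endian byte count
 * followed by that many payload bytes. This layout is inferred from the two
 * byte strings in this thread, not from the HSDS documentation. */

/* Walks `buf` and stores the payload byte count of each element in `lens`.
 * Returns the number of elements found, or -1 if the buffer is truncated
 * or holds more than `max_elems` elements. */
static int vlen_element_lengths(const unsigned char *buf, size_t nbytes,
                                uint32_t *lens, size_t max_elems)
{
    size_t pos = 0;
    size_t count = 0;

    assert(buf != NULL || nbytes == 0);
    assert(lens != NULL || max_elems == 0);

    while (pos < nbytes) {
        /* Need room for a length prefix and a free output slot */
        if (nbytes - pos < 4 || count == max_elems)
            return -1;

        /* Decode the little-endian length prefix */
        uint32_t len = (uint32_t)buf[pos]
                     | ((uint32_t)buf[pos + 1] << 8)
                     | ((uint32_t)buf[pos + 2] << 16)
                     | ((uint32_t)buf[pos + 3] << 24);
        pos += 4;

        /* The payload must fit in the remaining bytes */
        if (nbytes - pos < len)
            return -1;

        lens[count++] = len;
        pos += len; /* skip the payload itself */
    }
    return (int)count;
}
```

Under that assumed layout, the 8-byte response decodes to two elements of length 0, and the 12-byte response decodes to two elements with 2 payload bytes each. A client could use the same walk to fill hvl_t entries with len == 0 for uninitialized elements.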

@mattjala
Contributor Author

mattjala commented Mar 4, 2024

Looks like the result is correct when HSDS returns variable-length types in binary instead of JSON. Changing the VOL to request binary instead of JSON should also be faster for this.
