Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backup and restore apps may fail when moving between formats #287

Open
JSKenyon opened this issue Aug 1, 2023 · 6 comments
Open

Backup and restore apps may fail when moving between formats #287

JSKenyon opened this issue Aug 1, 2023 · 6 comments

Comments

@JSKenyon
Copy link
Collaborator

JSKenyon commented Aug 1, 2023

Describe the bug
See title. Flags backed up from a zarr ms cannot be restored to a conventional ms - they end up as appended rows.

To Reproduce
Backup flags from a zarr ms and attempt to restore them to a conventional ms.

Expected behavior
The above should work as expected (provided ordering and selection are consistent).

Version
main

@landmanbester
Copy link
Collaborator

I was just bitten by this again. It seems that it's not just the column that get's restored that is affected, all the columns seem to have additional rows in the final scan. I guess ROWID gets appended to. Any ideas on how to recover from this? cc @sjperkins

@landmanbester
Copy link
Collaborator

Trying to access data in the affected scan with

In [99]: xds[-1].DATA.values.shape

results in a long error culminating in

RuntimeError: IPosition::getFirst(n); n is too high

@sjperkins
Copy link
Member

WIthout ROWID present, dask-ms will append https://dask-ms.readthedocs.io/en/latest/tutorial/writes.html#updating-appending-rows.

Its not immediately obvious to me what the correct thing to do here is. ROWID is needed to map the zarr format back to the original ms, but it is discarded during conversion to zarr. Possibly we could retain the ROWID column when writing to zarr.

@landmanbester
Copy link
Collaborator

Yeah just keeping ROWID makes the most sense to me. I have nuked the old MS, splitting again. We should probably prioritize this as it can corrupt data

@JSKenyon
Copy link
Collaborator Author

Ah, yeah. I had forgotten about this one. This may be another point to add to the desired features list on dask-ms. If we retained ROWID on the zarr datasets, we would have substantially more power in xarray-land too, as it would be possible to resolve selections i.e. read from zarr, manipulate with xarray and write back to ms. It would be expensive in the worst-case scenario, but probably worth it overall.

@JSKenyon
Copy link
Collaborator Author

Any thoughts on how difficult this would be to change @sjperkins?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants