Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

map is broken, includeing missing collect_similar for DiskGenerator #144

Open
felixcremer opened this issue Feb 1, 2024 · 3 comments
Open

Comments

@felixcremer
Copy link
Contributor

The DiskGenerator is missing an implementation of collect_similar which makes the filter function not work properly. See yeesian/ArchGDAL.jl#409 for the details.

@rafaqz
Copy link
Collaborator

rafaqz commented Feb 1, 2024

map is actually broken in a lot of ways, not just collect_similar

I'm keen to try my very old idea of caching a whole column of chunks for iterate instead of a single chunk. Then iteration can follow the normal order and all of these problems go away. We can just delete DiskGenerator and DiskZip completely.

(As in your full band iteration - it works fine if we have the whole column)

@rafaqz rafaqz changed the title collect_similar for DiskGenerator map is broken, includeing missing collect_similar for DiskGenerator Feb 1, 2024
@meggart
Copy link
Owner

meggart commented Feb 1, 2024

I'm keen to try my very old idea of caching a whole column of chunks for iterate instead of a single chunk. Then iteration can follow the normal order and all of these problems go away. We can just delete DiskGenerator and DiskZip completely.

Won't this very easily lead to Out-Of-Memory errors for large multi-dimensional arrays where you just can not keep all chunks along the first dimension in memory? Very easy to construct examples where this fails.

Another option might just be to completely deprecate map, broadcast and reduce from DiskArrays and refer to DiskArrayEngine. Alternatively we could of course try to come up with more principled implementations, but I don't know if it would be worth the effort...

@rafaqz
Copy link
Collaborator

rafaqz commented Feb 1, 2024

Does map work in DiskArrayEngine ?

It just seems to me that we are silently returning the wrong answer for map currently, which is worse than OOM errors. I am really keen to not silently return the wrong result ever.

We could always fall back to loading partial chunks if the whole column is too large for memory.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants