Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add support for RNTuple RC2 #1115

Merged
merged 13 commits into from
Mar 22, 2024
Merged

feat: add support for RNTuple RC2 #1115

merged 13 commits into from
Mar 22, 2024

Conversation

ariostas
Copy link
Collaborator

@ariostas ariostas commented Feb 1, 2024

This is still work in progress. I'll update this comment to when there are new changes.

Current status:

  • Seems like everything is read correctly (in a very simple test), but the column logic is broken.
  • The writing part hasn't been updated.

@ariostas
Copy link
Collaborator Author

ariostas commented Mar 6, 2024

All RNTuple tests are passing now (although they require scikit-hep/scikit-hep-testdata#144), so I think this is pretty close to being done. One thing I want to confirm is whether we want to make xxhash a dependency or not. I was working with the assumption that it should be a dependency, but in a brief discussion with @jpivarski it seemed like maybe it was preferable to have it only as a writing dependency. I'll fix it if that's the case.

I haven't confirmed that all the fancy features of RNTuple are working, but I think the basic functionality is good enough for now and I can look into further fixes and improvements in other PRs.

@ariostas
Copy link
Collaborator Author

ariostas commented Mar 7, 2024

I'm gonna reply to scikit-hep/scikit-hep-testdata#144 (comment) here since it was probably meant to be posted here.

Yeah, that's a good point. xxhash is not currently in cramjam and I'm not sure if they would be open to include it since a "good" implementation should be very fast, given that that's its main selling point. It could be that they're willing to implement a slow version for compatibility reasons. There's other alternatives like ppxxh that implement xxhash in pure Python, so they're compatible with Pyodide, but I'm not sure if they're actively maintained. For now, I'll remove the dependency and we can discuss it further later on.

@Moelf
Copy link
Collaborator

Moelf commented Mar 7, 2024

I can confirm ppxxh works in the sense that I implemented a native XXhash (https://github.com/Moelf/XXHashNative.jl) based on spec and used ppxxh for debugging.

As for the speed, hash is hard to get fast... as you can see my naive Julia implementation only has 33% of the speed compare to wrapper around the reference C implementation. In fact the thing I wish existed when I implemented it was a Numba or Jax implementation... because all the class-OOP implementations are not straightforward to learn from

@ariostas ariostas marked this pull request as ready for review March 21, 2024 15:18
@ariostas
Copy link
Collaborator Author

I marked this as ready for review, but it would probably be better to merge and release scikit-hep/scikit-hep-testdata#144 before merging this PR so that we don't have a bunch of failing tests.

@jpivarski
Copy link
Member

scikit-hep/scikit-hep-testdata#144 is included in scikit-hep-testdata 0.4.41, so I'm going to trigger these tests to run again. Hopefully, they should pass with the new RNTuple files.

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very nice! I see that extensive changes were needed, but they're all using the chunk/cursor/struct tools appropriately. I see that you even updated some comments, generated from cursor.debug, to serve as a guide. (Thank you! Outdated comments are the worst.)

I see that the test is picking up scikit-hep-testdata-0.4.39. That's weird; why not 0.4.41? I'm going to try running the tests locally, to be sure they work with the new files.

@ariostas
Copy link
Collaborator Author

@jpivarski the upload of the test data package to PyPI from the CI is failing. It seems like it's been broken for a couple of weeks because v0.4.40 also failed to be deployed to the PyPI. Could it have anything to do with the time when there was a bug with GHA secrets and you did some cleanup for a bunch of repos?

https://github.com/scikit-hep/scikit-hep-testdata/actions/runs/8379922201/job/22947949026

@ariostas
Copy link
Collaborator Author

Also, I'm really confused how the test passed when it downloaded 0.4.39. Does it download the files from somewhere else?

@jpivarski
Copy link
Member

Could it have anything to do with the time when there was a bug with GHA secrets and you did some cleanup for a bunch of repos?

That's what it was. scikit-hep-testdata 0.4.42 is out now. However, the tests are and were failing for me, when I've been running them locally. I don't understand why they passed in CI.

@ariostas
Copy link
Collaborator Author

I had a bunch of issues when using scikit-hep-testdata locally. Even if I installed the new version it kept using the old files. I had to manually go where it keeps the files and put the new ones in there. I think it must be downloading them fresh instead of using what's packaged in the wheel for some reason.

@jpivarski
Copy link
Member

That's right! The files are in ~/.local/skhepdata, and this is a weakness for scikit-hep-testdata, at least for any files that are ever changed in place. Normally, we only add new ones, but for RNTuple, it makes sense to replace them. (Maybe. Maybe it would have been better to give them new names. Oh well! Too late now!)

I'm going to flush my local cache and try to test again locally.

@jpivarski
Copy link
Member

That did it, for my tests locally.

It may be that this test workflow has a cache of its own? No, I don't see it in the workflow YAMLs.

@jpivarski
Copy link
Member

With the new RNTuple files (and not with the old RNTuple files), all of the tests pass locally for me. The CI might have been incorrect when it passed before, but a careful check says it's okay and the CI is giving a green light, so it's okay to merge it now.

Go ahead and do that—I'm asking you to, in order to manage the concurrency problem of the fact that you might still be trying things on this branch.

@ariostas ariostas merged commit e29932f into main Mar 22, 2024
40 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants