entry_stop not working with concatenation? #1387

Open
acampove opened this issue Feb 23, 2025 · 2 comments · May be fixed by #1396

@acampove

I am using version 5.5.2. The reproducer is:

import uproot
import numpy as np

def _make_file(fname: str):
    n_entries = 1000
    branch1_data = np.random.rand(n_entries)
    branch2_data = np.random.rand(n_entries)

    with uproot.recreate(fname) as f:
        f["tree"] = {
            "a": branch1_data,
            "b": branch2_data,
        }

def main():
    _make_file('file_1.root')
    _make_file('file_2.root')

    df = uproot.concatenate({'file_1.root': 'tree', 'file_2.root' : 'tree'}, library='pd', entry_stop=100)

    print(len(df))

if __name__ == "__main__":
    main()

I expect the dataframe to have 200 entries but I see 2000. Am I doing anything stupid here?

@acampove acampove added the bug (unverified) The problem described would be a bug, but needs to be triaged label Feb 23, 2025
@pfackeldey pfackeldey self-assigned this Feb 27, 2025
@pfackeldey
Collaborator

Hi @acampove,
entry_start and entry_stop are not supported by uproot.concatenate. The question is what these entry ranges should correspond to:

  • To the entry_start and entry_stop of each input file? Then we need to have a different type of argument that recognizes these per tree.
  • To the entry_start and entry_stop of the concatenated file? That still requires us to concatenate the original trees and then slice them down. If you want this behavior you can always slice with df[:200].

I agree though that uproot.concatenate should complain if it gets an unrecognized argument (the drawbacks of **kwargs...).

Let me know what you think about this. It would help if you could describe your specific application in more detail.

@acampove
Author

Hi @pfackeldey

Thanks for your reply. The entry_stop and entry_start should correspond to the full, concatenated tree. That is the intuitive behavior, and it is how I would implement it.
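
To make the "full, concatenated tree" semantics concrete, here is a minimal sketch of how a global entry range could be translated into one local range per file, so that only the needed entries would have to be read. It is plain Python and does not call uproot; the function name `split_global_range` is hypothetical, and it assumes the number of entries in each file is already known (for example from each tree's `num_entries`).

```python
from typing import List, Optional, Tuple


def split_global_range(
    num_entries_per_file: List[int],
    entry_start: int = 0,
    entry_stop: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Map a global [entry_start, entry_stop) range onto per-file local ranges.

    Hypothetical helper, not part of uproot. Returns one
    (local_start, local_stop) pair per file; a file that contributes
    no entries gets (0, 0).
    """
    total = sum(num_entries_per_file)
    if entry_stop is None or entry_stop > total:
        entry_stop = total
    entry_start = max(0, entry_start)

    ranges = []
    offset = 0  # global index of the first entry of the current file
    for n in num_entries_per_file:
        # Shift the global bounds into this file's frame and clamp to [0, n].
        local_start = min(max(entry_start - offset, 0), n)
        local_stop = min(max(entry_stop - offset, 0), n)
        if local_stop <= local_start:
            ranges.append((0, 0))
        else:
            ranges.append((local_start, local_stop))
        offset += n
    return ranges


# Two files of 1000 entries each, as in the reproducer.
print(split_global_range([1000, 1000], entry_stop=100))
# [(0, 100), (0, 0)]
print(split_global_range([1000, 1000], entry_start=900, entry_stop=1200))
# [(900, 1000), (0, 200)]
```

With these semantics, entry_stop=100 on the reproducer would give 100 rows in total, all taken from the first file, rather than 100 rows per file.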

I need to run a quick local test involving code using this functionality. What you need is a line of code saying:

for key in kwargs:
    if key not in self.implemented_args:
        raise NotImplementedError(f'Argument {key} not implemented')

and I would put that in a function, so that you only need to call one line.
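
As a rough illustration of that idea, the check could be wrapped in a small helper like the one below. This is only a sketch: `check_kwargs`, `IMPLEMENTED_ARGS` and `concatenate_like` are hypothetical names for illustration, not uproot's actual code, and the set of accepted arguments is made up.

```python
from typing import Any, Iterable, Mapping

# Hypothetical list of supported keyword arguments, for illustration only.
IMPLEMENTED_ARGS = ("library", "cut", "how")


def check_kwargs(kwargs: Mapping[str, Any], implemented_args: Iterable[str]) -> None:
    """Raise NotImplementedError if any keyword argument is not supported."""
    allowed = set(implemented_args)
    unknown = sorted(key for key in kwargs if key not in allowed)
    if unknown:
        raise NotImplementedError(
            f"Argument(s) not implemented: {', '.join(unknown)}"
        )


def concatenate_like(files: Any, **kwargs: Any) -> None:
    """Stand-in for a function that forwards **kwargs (hypothetical)."""
    check_kwargs(kwargs, IMPLEMENTED_ARGS)  # the single validation line
    # ... the real work would happen here ...


try:
    concatenate_like({"file_1.root": "tree"}, library="pd", entry_stop=100)
except NotImplementedError as err:
    print(err)
```

Collecting all unknown names before raising means the caller sees every unsupported argument in one error message instead of fixing them one at a time.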

Is there any way to achieve what entry_stop would do? If not, it may be a good idea to implement it.

Cheers.
