Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use partial rather than custom partial classes #469

Open
agoose77 opened this issue Feb 12, 2024 · 6 comments
Open

Use partial rather than custom partial classes #469

agoose77 opened this issue Feb 12, 2024 · 6 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@agoose77
Copy link
Collaborator

Looking at the internals of dask, it seems that features such as tokenize have fast-path logic for partial. If instead dask encounters class instances, it has to invoke a serialiser. We should probably just move to partial (which will also reduce the sloc).

@agoose77 agoose77 added enhancement New feature or request good first issue Good for newcomers labels Feb 12, 2024
@martindurant
Copy link
Collaborator

You mean specifically the class itertools.partial? Which function classes are you thinking about in particular, like FromParquetFn ?

@agoose77
Copy link
Collaborator Author

@martindurant yes, much of dask-awkward's curried calls to ak.XXX operations are class-based, e.g. the structure module.

@martindurant
Copy link
Collaborator

Microbenchmark

from functools import partial

kwargs = {"arg": 1}


class Callme:
  def __init__(self, fn, kwargs):
    self.fn = fn
    self.kwargs = kwargs

  def __call__(self, *args, **kwargs):
    return self.fn(*args, **self.kwargs, **kwargs)


one = Callme(sum, kwargs)
two = partial(sum, **kwargs)
In [15]: %timeit dask.base.tokenize(callme.one)
1.66 µs ± 3.49 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [16]: %timeit dask.base.tokenize(callme.two)
1.71 µs ± 2.78 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

@martindurant
Copy link
Collaborator

If I make kwargs bigger, I find that the class tokenises faster than partial

@agoose77
Copy link
Collaborator Author

That's fascinating! I'm guessing you didn't run the code as above, because callme.two isn't defined?

@martindurant
Copy link
Collaborator

The first block was a module called "callme"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

2 participants