Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speedup arglist #13879

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Speedup arglist #13879

wants to merge 3 commits into from

Conversation

bonzini
Copy link
Collaborator

@bonzini bonzini commented Nov 6, 2024

This speeds up the target generation of QEMU, which is very slow.

  • Before: 41 seconds (total time 174 seconds)
  • After: 29 21 seconds (total time 150 134 seconds)

@bonzini bonzini force-pushed the speedups branch 4 times, most recently from c7bd527 to f501fa9 Compare November 6, 2024 16:08
mesonbuild/backend/backends.py Outdated Show resolved Hide resolved
mesonbuild/compilers/compilers.py Outdated Show resolved Hide resolved
@bonzini bonzini force-pushed the speedups branch 4 times, most recently from 47d8542 to 8c0faca Compare November 7, 2024 09:35
@bonzini bonzini changed the title Speedup Ninja backend with many extract_objects Speedup Ninja backend with many extract_objects or targets Nov 7, 2024
@bonzini bonzini force-pushed the speedups branch 2 times, most recently from 234adf6 to bc660cb Compare November 7, 2024 12:13
@bonzini bonzini requested a review from eli-schwartz November 7, 2024 12:15
@eli-schwartz
Copy link
Member

Pushed 1b15bd0 and 58d1efb for now. The backend changes look very interesting, though also take quite a bit more effort to understand what the code is doing both before and after. :D

@bonzini bonzini force-pushed the speedups branch 3 times, most recently from 14150bd to 84b08ff Compare November 20, 2024 07:16
@bonzini
Copy link
Collaborator Author

bonzini commented Nov 20, 2024

The backend changes look very interesting, though also take quite a bit more effort to understand what the code is doing both before and after. :D

Fair enough! I've split the changes more finely, which should be easier to both review and apply individually. If you prefer to have multiple PRs let me know.

@bonzini bonzini force-pushed the speedups branch 2 times, most recently from 60192f2 to 58a072d Compare November 20, 2024 07:22
@bonzini bonzini requested a review from eli-schwartz November 20, 2024 13:46
@bonzini
Copy link
Collaborator Author

bonzini commented Dec 5, 2024

Ping. :)

@jpakkane
Copy link
Member

LGTM at least.

@eli-schwartz
Copy link
Member

eli-schwartz commented Dec 30, 2024

Cherry-picked 4 commits and merged to master: e542901...bb23db4

"Inline" CompilerArgs.__iter__() into CompilerArgs.__init__(), so that
replace list(Iterable) is replaced by the much faster list(List).

Before:

   ncalls  tottime  cumtime
    19268    0.163    3.586 arglist.py:97(__init__)

After:

   ncalls  tottime  cumtime
    18674    0.211    3.442 arglist.py:97(__init__)

Signed-off-by: Paolo Bonzini <[email protected]>
Unless an argument is marked as Dedup.OVERRIDDEN, pre_flush_set and
post_flush_set will always be empty and the loops in flush_pre_post()
will not be doing anything interesting:

        for a in self.pre:
            dedup = self._can_dedup(a)
            if a not in pre_flush_set:
                # This just makes new a copy of self.pre
                new.append(a)
                if dedup is Dedup.OVERRIDDEN:
                    # this never happens
                    pre_flush_set.add(a)

        for a in reversed(self.post):
            dedup = self._can_dedup(a)
            if a not in post_flush_set:
                # Here self.post is reversed twice
                post_flush.appendleft(a)
                if dedup is Dedup.OVERRIDDEN:
                    # this never happens
                    post_flush_set.add(a)
        new.extend(post_flush)

In this case it's possible to avoid expensive calls and loops, instead
relying as much on Python builtins as possible.  Track whether any options
have that flag and if not just concatenate pre, _container and post.

Before:

   ncalls  tottime  cumtime
    45127    0.251    4.530 arglist.py:142(__iter__)
    81866    3.623    5.013 arglist.py:108(flush_pre_post)
    76618    3.793    5.338 arglist.py:273(__iadd__)

After:

    35647    0.156    0.627 arglist.py:160(__iter__)
    78998    2.627    3.603 arglist.py:116(flush_pre_post)
    73774    3.605    5.049 arglist.py:292(__iadd__)

The time in __iadd__ is reduced because it calls __iter__, which flushes
pre and post.

Signed-off-by: Paolo Bonzini <[email protected]>
self.post is only ever appended to on the right hand.  However,
it is then reversed twice in flush_pre_post(), by using "for a in
reversed.post()" and appendleft() within the loop.  It would be tempting
to use appendleft() in __iadd__ to avoid the call to reversed(), but that
is not a good idea because the loop of flush_pre_post() is part of a slow
path.  It's rather more important to use a fast extend-with-list-argument
in the fast path where needs_override_check if False.

For clarity, and to remove the temptation, make "post" a list instead
of a deque.

Signed-off-by: Paolo Bonzini <[email protected]>
@bonzini
Copy link
Collaborator Author

bonzini commented Dec 30, 2024

Thanks, I reduced the rest to just three that provide the most bang for the buck (and are easier to review than it seems).

@bonzini bonzini added this to the 1.7 milestone Dec 31, 2024
@bonzini bonzini modified the milestones: 1.7, 1.8 Jan 9, 2025
@bonzini bonzini changed the title Speedup Ninja backend with many extract_objects or targets Speedup arglist Jan 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants