-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speedup arglist #13879
base: master
Are you sure you want to change the base?
Speedup arglist #13879
Conversation
c7bd527
to
f501fa9
Compare
47d8542
to
8c0faca
Compare
234adf6
to
bc660cb
Compare
14150bd
to
84b08ff
Compare
Fair enough! I've split the changes more finely, which should be easier to both review and apply individually. If you prefer to have multiple PRs let me know. |
60192f2
to
58a072d
Compare
Ping. :) |
LGTM at least. |
Cherry-picked 4 commits and merged to master: e542901...bb23db4 |
"Inline" CompilerArgs.__iter__() into CompilerArgs.__init__(), so that replace list(Iterable) is replaced by the much faster list(List). Before: ncalls tottime cumtime 19268 0.163 3.586 arglist.py:97(__init__) After: ncalls tottime cumtime 18674 0.211 3.442 arglist.py:97(__init__) Signed-off-by: Paolo Bonzini <[email protected]>
Unless an argument is marked as Dedup.OVERRIDDEN, pre_flush_set and post_flush_set will always be empty and the loops in flush_pre_post() will not be doing anything interesting: for a in self.pre: dedup = self._can_dedup(a) if a not in pre_flush_set: # This just makes new a copy of self.pre new.append(a) if dedup is Dedup.OVERRIDDEN: # this never happens pre_flush_set.add(a) for a in reversed(self.post): dedup = self._can_dedup(a) if a not in post_flush_set: # Here self.post is reversed twice post_flush.appendleft(a) if dedup is Dedup.OVERRIDDEN: # this never happens post_flush_set.add(a) new.extend(post_flush) In this case it's possible to avoid expensive calls and loops, instead relying as much on Python builtins as possible. Track whether any options have that flag and if not just concatenate pre, _container and post. Before: ncalls tottime cumtime 45127 0.251 4.530 arglist.py:142(__iter__) 81866 3.623 5.013 arglist.py:108(flush_pre_post) 76618 3.793 5.338 arglist.py:273(__iadd__) After: 35647 0.156 0.627 arglist.py:160(__iter__) 78998 2.627 3.603 arglist.py:116(flush_pre_post) 73774 3.605 5.049 arglist.py:292(__iadd__) The time in __iadd__ is reduced because it calls __iter__, which flushes pre and post. Signed-off-by: Paolo Bonzini <[email protected]>
self.post is only ever appended to on the right hand. However, it is then reversed twice in flush_pre_post(), by using "for a in reversed.post()" and appendleft() within the loop. It would be tempting to use appendleft() in __iadd__ to avoid the call to reversed(), but that is not a good idea because the loop of flush_pre_post() is part of a slow path. It's rather more important to use a fast extend-with-list-argument in the fast path where needs_override_check if False. For clarity, and to remove the temptation, make "post" a list instead of a deque. Signed-off-by: Paolo Bonzini <[email protected]>
Thanks, I reduced the rest to just three that provide the most bang for the buck (and are easier to review than it seems). |
This speeds up the target generation of QEMU, which is very slow.
2921 seconds (total time150134 seconds)