Some perf tweaks to the tokenizer. #112
The stylesheet is a mix of MozReview, The Guardian, and Facebook, so I can remove it if there's any concern about having it in-repo (we can find others too). cc @heycam @bholley: I believe this may improve our page load times. Is there any way to measure Gecko's tokenizer in isolation, to see whether we should aim for more improvement? r? @SimonSapin |
Yes, please remove. |
Done |
Review ping @SimonSapin? |
Reviewed 1 of 1 files at r1, 7 of 7 files at r2, 3 of 3 files at r3, 1 of 1 files at r4, 1 of 1 files at r5, 1 of 1 files at r7.

.travis.yml, line 13 at r6 (raw file):
Why is it useful to run benchmarks on CI? They take time, and the CPU speed of VMs can vary a lot (e.g. if another Travis job is running on the same CPU at the same time).

build.rs, line 1 at r2 (raw file):
Needs a copyright header.

src/tokenizer.rs, line 408 at r1 (raw file):
Please rename

src/tokenizer.rs, line 556 at r1 (raw file):
Since we’re already excluding non-ASCII,

src/tokenizer.rs, line 409 at r2 (raw file):
Have you checked that

src/macros/match_byte.rs, line 1 at r2 (raw file):
Needs a copyright header.

src/macros/match_byte.rs, line 53 at r2 (raw file):
This could use some docs. What is the expected "shape" of the input tokens (maybe give an example)? What are the respective components of the return value?

src/macros/match_byte.rs, line 82 at r2 (raw file):
The parser allows multiple bindings but only the last one will work. But since this macro is not public and only used with a known set of inputs maybe that’s ok.

src/macros/match_byte.rs, line 108 at r2 (raw file):
This accepts some inputs that it shouldn’t (the empty input, an ident followed by a byte pattern, …) but since this macro is not public and only used with a known set of inputs maybe that’s ok.

src/macros/match_byte.rs, line 182 at r2 (raw file):
Replace

src/macros/mod.rs, line 1 at r2 (raw file):
Needs a copyright header.

src/macros/visit.rs, line 1 at r2 (raw file):
I believe I wrote all of this code, and I agree to re-licensing it under the MPL 2. Please change this license header to the same as the rest of cssparser.

src/macros/visit.rs, line 14 at r2 (raw file):
In html5ever I initially did that but then switched to only parsing token trees and avoiding the AST entirely, so that a visitor trait is not needed. Was there a reason not to do that here? |
I think all your comments are addressed now :)
Review status: 0 of 8 files reviewed at latest revision, 13 unresolved discussions.

.travis.yml, line 13 at r6 (raw file):
Previously, SimonSapin (Simon Sapin) wrote…
Only to prevent them from breaking. I can back this out if you want.

build.rs, line 1 at r2 (raw file):
Previously, SimonSapin (Simon Sapin) wrote…
Good catch.

src/tokenizer.rs, line 408 at r1 (raw file):
Previously, SimonSapin (Simon Sapin) wrote…
Done

src/tokenizer.rs, line 556 at r1 (raw file):
Previously, SimonSapin (Simon Sapin) wrote…
Good catch.

src/tokenizer.rs, line 409 at r2 (raw file):
without the custom macro expansion, to:
with it.

src/macros/match_byte.rs, line 1 at r2 (raw file):
Previously, SimonSapin (Simon Sapin) wrote…
Right

src/macros/match_byte.rs, line 82 at r2 (raw file):
Previously, SimonSapin (Simon Sapin) wrote…
Yeah, I only focused on making something that kept enough idiomatic code working.

src/macros/match_byte.rs, line 182 at r2 (raw file):
Previously, SimonSapin (Simon Sapin) wrote…
Yup, fine. I didn't want to pass it on the stack, but it's small anyway.

src/macros/visit.rs, line 1 at r2 (raw file):
Previously, SimonSapin (Simon Sapin) wrote…
Ok, thanks.

src/macros/visit.rs, line 14 at r2 (raw file):
Previously, SimonSapin (Simon Sapin) wrote…
Well, we use it to visit expressions and statements in the AST. I can definitely rework it, but I think this way works quite fine. |
Review status: 0 of 8 files reviewed at latest revision, 13 unresolved discussions.

src/tokenizer.rs, line 409 at r2 (raw file):
Previously, emilio (Emilio Cobos Álvarez) wrote…
After conversations in rust-internals (cc @nox), it seems that rustc doesn't generate the best possible code if we use ranges, but could if we moved the ranges to the bottom or expanded them. We can try to make the |
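For readers following the range discussion, here is a minimal sketch of the two arm orderings being compared. The `Kind` enum and both function names are hypothetical and are not taken from cssparser. The two functions return the same result for every byte because the patterns are disjoint; whether one ordering compiles to better code depends on the rustc version, so it is worth measuring rather than assuming.

```rust
// Hypothetical token kinds, for illustration only (not cssparser's types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Ident,
    Digit,
    Punct,
    Other,
}

// Range patterns listed first, single-byte patterns after them.
fn kind_ranges_first(b: u8) -> Kind {
    match b {
        b'a'..=b'z' | b'A'..=b'Z' => Kind::Ident,
        b'0'..=b'9' => Kind::Digit,
        b'{' | b'}' | b';' | b':' => Kind::Punct,
        _ => Kind::Other,
    }
}

// Same arms, but with the range patterns moved to the bottom.
fn kind_ranges_last(b: u8) -> Kind {
    match b {
        b'{' | b'}' | b';' | b':' => Kind::Punct,
        b'0'..=b'9' => Kind::Digit,
        b'a'..=b'z' | b'A'..=b'Z' => Kind::Ident,
        _ => Kind::Other,
    }
}

// Checks that reordering the arms did not change behaviour for any byte.
fn agree_on_all_bytes() -> bool {
    (0u8..=255).all(|b| kind_ranges_first(b) == kind_ranges_last(b))
}
```

An exhaustive check like `agree_on_all_bytes` is cheap for a `u8` input and guards against accidentally changing semantics while reordering arms for performance.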
Reviewed 1 of 6 files at r8, 3 of 3 files at r9, 1 of 1 files at r11, 1 of 1 files at r13, 4 of 4 files at r14, 1 of 1 files at r15.

.travis.yml, line 13 at r6 (raw file):
Previously, emilio (Emilio Cobos Álvarez) wrote…
I just checked:

src/tokenizer.rs, line 409 at r2 (raw file):
Previously, emilio (Emilio Cobos Álvarez) wrote…
This is already an improvement that I’m happy with, so up to you to land it as-is or investigate alternatives first.

src/macros/match_byte.rs, line 182 at r2 (raw file):
Previously, emilio (Emilio Cobos Álvarez) wrote…
Could also be

src/macros/visit.rs, line 14 at r2 (raw file):
Previously, emilio (Emilio Cobos Álvarez) wrote…
Yeah, both work. It’s just less code to maintain, or to update if the AST needs to change: servo/html5ever@2ddcd53 |
Review status: 4 of 7 files reviewed at latest revision, 2 unresolved discussions.

.travis.yml, line 13 at r6 (raw file):
Previously, SimonSapin (Simon Sapin) wrote…
Oh, right! done :)

src/tokenizer.rs, line 409 at r2 (raw file):
Previously, SimonSapin (Simon Sapin) wrote…
Ok, I think I want to land this as-is for now.

src/macros/match_byte.rs, line 182 at r2 (raw file):
Previously, SimonSapin (Simon Sapin) wrote…
Yup, agreed.

src/macros/visit.rs, line 14 at r2 (raw file):
Previously, SimonSapin (Simon Sapin) wrote…
Thanks for the pointers :) I made it work on token trees, I've left the last fixup commit intentionally so you can review it. |
Squash as desired, then r=me :)
Reviewed 1 of 1 files at r16, 3 of 3 files at r17. |
This increases the performance of the stylesheet tokenization test by about 20%, and now one of the hottest instructions is the sign extension Rust does to index into the array.
This was causing unaligned moves (movups instructions), for some reason.
@bors-servo r=SimonSapin |
📌 Commit 21f8573 has been approved by |
Some perf tweaks to the tokenizer.
This makes parsing noticeably faster, by stripping UTF-8 logic and using tables instead of branching everywhere.
We may be able to tweak it a bit more (sometimes the table may be overkill? I don't know). I've written the table macro so you can skip it easily if you want.
In any case, benchmark results:
Before:
> test tests::big_stylesheet ... bench: 10,392,017 ns/iter (+/- 1,954,644)
> test tests::unquoted_url ... bench: 261,854 ns/iter (+/- 53,335)
After:
> test tests::big_stylesheet ... bench: 8,638,215 ns/iter (+/- 381,980)
> test tests::unquoted_url ... bench: 211,863 ns/iter (+/- 73,418)
Which is quite good if you ask me.
This change is Reviewable: https://reviewable.io/reviews/servo/rust-cssparser/112
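To illustrate the "tables instead of branching" idea from the description above, here is a minimal sketch of a 256-entry lookup table indexed by byte value. It is not the PR's actual macro or cssparser's real byte categories: `ByteClass`, `classify_slow`, `build_table`, `TABLE`, `classify`, and `count_class` are all hypothetical names chosen for this example. Non-ASCII bytes get a single class, so no UTF-8 decoding is needed to classify a byte.

```rust
// Hypothetical byte classes for illustration; not cssparser's categories.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ByteClass {
    Whitespace,
    Digit,
    IdentStart,
    NonAscii,
    Other,
}

// Branching classifier. It is only used to fill the table at compile time.
const fn classify_slow(b: u8) -> ByteClass {
    match b {
        b' ' | b'\t' | b'\n' | b'\r' | 0x0C => ByteClass::Whitespace,
        b'0'..=b'9' => ByteClass::Digit,
        b'a'..=b'z' | b'A'..=b'Z' | b'_' => ByteClass::IdentStart,
        0x80..=0xFF => ByteClass::NonAscii,
        _ => ByteClass::Other,
    }
}

// Builds one table entry per possible byte value.
const fn build_table() -> [ByteClass; 256] {
    let mut table = [ByteClass::Other; 256];
    let mut i = 0;
    while i < 256 {
        table[i] = classify_slow(i as u8);
        i += 1;
    }
    table
}

static TABLE: [ByteClass; 256] = build_table();

// Hot-path classifier: a single indexed load instead of a chain of compares.
#[inline]
fn classify(b: u8) -> ByteClass {
    TABLE[b as usize]
}

// Counts the bytes of `input` that fall in `class`, working on raw bytes only.
fn count_class(input: &str, class: ByteClass) -> usize {
    input
        .as_bytes()
        .iter()
        .filter(|&&b| classify(b) == class)
        .count()
}
```

Because the index is a `u8` and the table has 256 entries, the lookup can never go out of bounds. Whether the table actually beats a `match` depends on the input and the compiler, which is why the benchmark numbers above matter more than the sketch.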
☀️ Test successful - status-travis |
Thanks for the review @SimonSapin! :) |
Part of this (or maybe all of it) can be reverted once rust-lang/rust#39456 is in Rust stable. |