Optimize scan with a lookup table to reduce branching #59106
It's probably faster to actually check the common case of spaces, tabs, and newlines, but the current draft always does the lookup.
@typescript-bot perf test this
@DanielRosenwasser Here they are:
[tsc, tsserver, and startup comparison reports, baseline..pr; system info unknown; result tables not recoverable]
I'm actually not sure whether that's much better; it could just be a better run.
@typescript-bot pack this
Hey @DanielRosenwasser, I've packed this into an installable tgz that you can install for testing. There is also a playground for this build and an npm module.
@typescript-bot test top400
@DanielRosenwasser Here are the results of running the top 400 repos with tsc: everything looks good!
@dragomirtitian maybe this balances out the parse-time hit from #58928. 😄
27392e7 to 33ae58d
This change introduces two lookup tables to the lexical scanner/tokenizer. One is a simple array-based lookup for characters in the ASCII range, and the other is a Map for wider codepoints.
These tables assign each codepoint a category that indicates how it should be handled. For example, line breaks, single-line whitespace, and single-character tokens can each be handled generically as a group. Doing so lets the scanner perform fewer individual checks before jumping into or skipping over a given branch.
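As a rough illustration of the two-table idea (not the code in this PR), the sketch below pairs a dense array for ASCII with a Map for wider codepoints. The names `CharCategory`, `asciiCategories`, `wideCategories`, and `categoryOf`, as well as the particular characters assigned to each category, are assumptions made up for this example.

```typescript
// Hypothetical categories; the PR's real category set is not shown in this text.
const CharCategory = {
    Other: 0,
    LineBreak: 1,
    Whitespace: 2,
    SingleCharToken: 3,
} as const;
type CharCategory = (typeof CharCategory)[keyof typeof CharCategory];

// Dense table for codepoints 0..127. A Uint8Array is zero-filled,
// so every entry starts out as CharCategory.Other.
const asciiCategories = new Uint8Array(128);

function setAscii(chars: string, category: CharCategory): void {
    for (let i = 0; i < chars.length; i++) {
        asciiCategories[chars.charCodeAt(i)] = category;
    }
}

setAscii("\n\r", CharCategory.LineBreak);
setAscii(" \t\v\f", CharCategory.Whitespace);
setAscii("(){}[];,~", CharCategory.SingleCharToken); // illustrative subset

// Sparse table for the few interesting codepoints outside ASCII.
const wideCategories = new Map<number, CharCategory>([
    [0x2028, CharCategory.LineBreak],  // line separator
    [0x2029, CharCategory.LineBreak],  // paragraph separator
    [0x00a0, CharCategory.Whitespace], // no-break space
    [0xfeff, CharCategory.Whitespace], // byte order mark
]);

function categoryOf(codePoint: number): CharCategory {
    if (codePoint < 128) {
        return asciiCategories[codePoint] as CharCategory;
    }
    return wideCategories.get(codePoint) ?? CharCategory.Other;
}
```

With a table like this, a scanner can switch on a handful of categories instead of testing each character against many individual cases.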
One additional optimization concerns simple single-character tokens: the token itself is encoded in the lower 8 bits of the category value. This allows us to avoid any branching for such cases.
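One way to picture this encoding is sketched below, under the assumption that a flag bit above the low byte marks "single-character token" and the low byte carries the token kind. `TokenKind`, `SINGLE_CHAR_TOKEN`, `TOKEN_MASK`, `table`, and `scanSimpleToken` are hypothetical names, and the numeric values are illustrative rather than the compiler's real ones.

```typescript
// Hypothetical token kinds; each value must fit in 8 bits.
const TokenKind = {
    Unknown: 0,
    OpenParen: 1,
    CloseParen: 2,
    Semicolon: 3,
    Comma: 4,
} as const;

// The category flag lives above the low byte so it can never collide
// with a token kind stored in bits 0..7.
const SINGLE_CHAR_TOKEN = 1 << 8;
const TOKEN_MASK = 0xff;

// 16-bit entries leave room for the flag plus the 8-bit token kind.
const table = new Uint16Array(128);
table["(".charCodeAt(0)] = SINGLE_CHAR_TOKEN | TokenKind.OpenParen;
table[")".charCodeAt(0)] = SINGLE_CHAR_TOKEN | TokenKind.CloseParen;
table[";".charCodeAt(0)] = SINGLE_CHAR_TOKEN | TokenKind.Semicolon;
table[",".charCodeAt(0)] = SINGLE_CHAR_TOKEN | TokenKind.Comma;

// Returns the token kind for a simple single-character token at `pos`,
// or undefined when the character needs the scanner's slower paths.
function scanSimpleToken(text: string, pos: number): number | undefined {
    const ch = text.charCodeAt(pos);
    const category = ch < 128 ? table[ch] : 0;
    if (category & SINGLE_CHAR_TOKEN) {
        // One check covers every simple token; the kind is read
        // straight out of the low byte with no per-character branch.
        return category & TOKEN_MASK;
    }
    return undefined;
}
```

The point of the layout is that adding another simple token only adds a table entry, not another branch in the scanning loop.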
This all comes at the expense of some memory overhead, along with some overhead from fetching from the table. To avoid hitting the table too often, spaces, tabs, carriage returns, and line feeds are currently handled separately.
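The fast path described here might look roughly like the following sketch, which checks the four most common whitespace characters inline and only falls back to a lookup for anything else. `skipWhitespace`, `rareWhitespace`, `isRareWhitespace`, and the `tableLookups` counter are hypothetical; the Set stands in for the category table, and the counter exists only to make the saved lookups visible.

```typescript
// Stand-in for the category table: less common whitespace and line breaks.
const rareWhitespace = new Set<number>([
    0x0b,   // vertical tab
    0x0c,   // form feed
    0xa0,   // no-break space
    0x2028, // line separator
    0x2029, // paragraph separator
    0xfeff, // byte order mark
]);

// Counts how often the slow path is taken (for illustration only).
let tableLookups = 0;

function isRareWhitespace(ch: number): boolean {
    tableLookups++;
    return rareWhitespace.has(ch);
}

// Returns the index of the first non-whitespace character at or after `pos`.
function skipWhitespace(text: string, pos: number): number {
    while (pos < text.length) {
        const ch = text.charCodeAt(pos);
        // Fast path: space, tab, line feed, and carriage return are
        // compared directly and never touch the table.
        if (ch === 0x20 || ch === 0x09 || ch === 0x0a || ch === 0x0d) {
            pos++;
            continue;
        }
        // Slow path: everything else goes through the lookup.
        if (!isRareWhitespace(ch)) {
            break;
        }
        pos++;
    }
    return pos;
}
```

In typical source text nearly all whitespace is one of those four characters, so the lookup is reached mostly once per run of whitespace, when the loop hits the first non-whitespace character.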