Performance regression from 6.0.2 to main #186

Open
kitsuniru opened this issue Jan 1, 2025 · 9 comments

kitsuniru commented Jan 1, 2025

Greetings, I'm experiencing a performance regression in my code after updating from 6.0.2 to the latest main branch.

I measured both versions 20 times each, and the results were consistently worse on main.
There were no changes to my code between the runs; the only change was the petitparser version in the pubspec file (from ^6.0.2 to the git main branch).
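
For completeness, the pubspec change was along these lines (the git URL below is assumed, not copied verbatim from my project):

dependencies:
  petitparser: ^6.0.2

versus

dependencies:
  petitparser:
    git:
      url: https://github.com/petitparser/dart-petitparser.git
      ref: main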

6.0.2:

chunkSize: 21869061
[Isolate 0] Read: Range(0...21869075)
...
[Isolate 15] Read: Range(328036163...349970780)
length: 349904988
8868744
----------------------------

Days              : 0
Hours             : 0
Minutes           : 1
Seconds           : 47     <-----
Milliseconds      : 593
Ticks             : 1075930866
TotalDays         : 0,00124529035416667
TotalHours        : 0,0298869685
TotalMinutes      : 1,79321811
TotalMilliseconds : 107593,0866

main branch (latest commit):

chunkSize: 21869061
[Isolate 0] Read: Range(0...21869075)
...
[Isolate 15] Read: Range(328036163...349970780)
length: 349904988
8868744
----------------------------


Days              : 0
Hours             : 0
Minutes           : 1
Seconds           : 53     <-----
Milliseconds      : 375
Ticks             : 1133753292
TotalDays         : 0,00131221445833333
TotalHours        : 0,031493147
TotalMinutes      : 1,88958882
TotalSeconds      : 113,3753292
TotalMilliseconds : 113375,3292

kitsuniru commented Jan 1, 2025

Measurements were obtained using the following PowerShell script:

Measure-Command { .\parser.exe | Out-Default }

renggli commented Jan 1, 2025

Thank you for highlighting this.

Can you provide the commit hash from the main branch that you used for testing? There were many changes that could affect performance, and I believe some of the more recent ones could improve performance again.

What exactly are you measuring when collecting the benchmark? Could you provide a minimal reproducible example?

kitsuniru commented Jan 1, 2025

For benchmarking, I kept only the file reading and parsing functionality.

The file is split into 16 parts and read in parallel in 16 isolates, with chunks delimited by a separator (a semicolon in our case).

The parser, built from the grammar definition, always returns one object (meaning one object sits in each sub-chunk between semicolons).

There are no delays in reading the file (it is small, around 400 MB); the main time is spent parsing and tokenizing the output.
There is no difference between SSD and HDD; the measurements are the same.

I can give you access to my experimental private repo with the whole code; it is still rough and unstructured for now.
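
Roughly, the setup boils down to this (placeholder grammar and simplified chunking, not the real code):

import 'dart:io';
import 'dart:isolate';
import 'dart:math' as math;

import 'package:petitparser/petitparser.dart';

// Placeholder parser: the real grammar definition is not shown here.
final record = pattern('^;').plus().flatten();

List<String> parseChunk(String chunk) => [
      // One object per semicolon-separated sub-chunk.
      for (final part in chunk.split(';'))
        if (part.isNotEmpty) record.parse(part).value,
    ];

Future<void> main() async {
  const isolateCount = 16;
  final text = await File('data.txt').readAsString();
  final chunkSize = (text.length / isolateCount).ceil();
  final futures = <Future<List<String>>>[];
  for (var i = 0; i < isolateCount; i++) {
    // The real code aligns chunk boundaries to the separator;
    // this sketch just cuts at fixed offsets.
    final start = math.min(i * chunkSize, text.length);
    final end = math.min(start + chunkSize, text.length);
    final chunk = text.substring(start, end);
    // Isolate.run (Dart 2.19+) spawns a short-lived isolate per chunk.
    futures.add(Isolate.run(() => parseChunk(chunk)));
  }
  final results = await Future.wait(futures);
  print(results.fold<int>(0, (sum, list) => sum + list.length));
}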

kitsuniru commented Jan 2, 2025

@renggli

It may sound strange, but it looks like the performance regressed at commit 5e944abe55945cab53b045453e835a610aa8d1d4.

renggli commented Jan 2, 2025

Thank you for the clarification. 5e944ab is an interesting find: in my quick tests I found it to improve speed by around 5% (but I am aware of some older regressions). I will run some of my standard benchmarks now.

renggli commented Jan 2, 2025

I've updated and run my usual benchmarks on the new development branch: https://docs.google.com/spreadsheets/d/15rRvLnmjT3rhGLP1iEYv5QF1JVMZmPksIjmFysDnnz0/edit?usp=sharing

The real problem is that unicode parsing requires duplicating each parser that reads individual characters, and I tried to avoid that in some cases. I am reintroducing the specialised any-parser to see if that at least resolves the regressions in the existing parsers.
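
To illustrate the duplication (plain Dart, not the actual library internals): reading a single UTF-16 code unit is one index lookup, while reading a full Unicode code point first has to check for a surrogate pair, so every character-level parser ends up needing a second, more expensive variant.

// A code-unit reader is a single index lookup.
int readCodeUnit(String input, int position) => input.codeUnitAt(position);

// A unicode-aware reader must detect surrogate pairs and may consume
// two code units to produce one code point.
(int, int) readCodePoint(String input, int position) {
  final first = input.codeUnitAt(position);
  if (first >= 0xD800 && first <= 0xDBFF && position + 1 < input.length) {
    final second = input.codeUnitAt(position + 1);
    if (second >= 0xDC00 && second <= 0xDFFF) {
      final codePoint =
          0x10000 + ((first - 0xD800) << 10) + (second - 0xDC00);
      return (codePoint, position + 2);
    }
  }
  return (first, position + 1);
}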

kitsuniru commented Jan 2, 2025

I tried the latest commit, 517c63b.

517c63b commit:

Seconds           : 38
Milliseconds      : 321
Ticks             : 383218801

6.0.2 version:

Seconds           : 35
Milliseconds      : 399
Ticks             : 353998834

All values are averaged; I ran the measurements several times. Commit 517c63b gave me 37-39 seconds, while 6.0.2 gave 34-36.

renggli commented Jan 3, 2025

The performance regression in parse speed seems to happen with 06cfe90. It is not clear why, because this change only really alters the way the parsers are instantiated. Maybe it prevents some optimizations?

renggli commented Jan 5, 2025

I have spent the last couple of days reverting line by line and still have no clue where the performance regression comes from. For existing parsers that do not use unicode character decoding, the executed code should be exactly the same; only some extra parsers were added. Maybe that prevents some optimisations?

The effect on complicated grammars is small, but clearly measurable:
[Screenshot: benchmark comparison, 2025-01-05]
