Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise text.lines for longer lines, larger chunks use-cases #3481

Merged
merged 2 commits into from
Oct 8, 2024

Conversation

svalaskevicius
Copy link
Contributor

The main idea is to avoid copying single characters, and instead copy string slices, that can use avx2 instructions.

The copy is still unavoidable unfortunately in JVM - AFAIK.

benchmarks running locally -wi 3 -i 3 -f 1 -t 1

Before:

LinesBenchmark.linesBenchmark                0            4  thrpt    3  122139.570 ± 236181.395  ops/s
LinesBenchmark.linesBenchmark                0           16  thrpt    3  134204.974 ±  76569.841  ops/s
LinesBenchmark.linesBenchmark                0           64  thrpt    3  133776.725 ± 137902.948  ops/s
LinesBenchmark.linesBenchmark                1            4  thrpt    3   44078.282 ±  46048.728  ops/s
LinesBenchmark.linesBenchmark                1           16  thrpt    3   95173.337 ±  31919.684  ops/s
LinesBenchmark.linesBenchmark                1           64  thrpt    3   91207.064 ± 134077.287  ops/s
LinesBenchmark.linesBenchmark               10            4  thrpt    3   11310.247 ±   6860.349  ops/s
LinesBenchmark.linesBenchmark               10           16  thrpt    3   53543.590 ±  12547.417  ops/s
LinesBenchmark.linesBenchmark               10           64  thrpt    3   66603.479 ±  66765.096  ops/s
LinesBenchmark.linesBenchmark              100            4  thrpt    3    1759.697 ±   1187.625  ops/s
LinesBenchmark.linesBenchmark              100           16  thrpt    3   10084.010 ±   6844.210  ops/s
LinesBenchmark.linesBenchmark              100           64  thrpt    3   15806.198 ±   3290.178  ops/s

After:

LinesBenchmark.linesBenchmark                0            4  thrpt    3  129302.595 ±  42736.093  ops/s
LinesBenchmark.linesBenchmark                0           16  thrpt    3  130886.744 ±  47487.794  ops/s
LinesBenchmark.linesBenchmark                0           64  thrpt    3  123490.463 ± 156930.971  ops/s
LinesBenchmark.linesBenchmark                1            4  thrpt    3   41200.649 ±  10262.513  ops/s
LinesBenchmark.linesBenchmark                1           16  thrpt    3   87733.636 ± 111704.611  ops/s
LinesBenchmark.linesBenchmark                1           64  thrpt    3   94767.345 ±  62120.286  ops/s
LinesBenchmark.linesBenchmark               10            4  thrpt    3   12385.057 ±   7146.545  ops/s
LinesBenchmark.linesBenchmark               10           16  thrpt    3   49429.892 ±  33859.569  ops/s
LinesBenchmark.linesBenchmark               10           64  thrpt    3   75037.094 ± 107270.451  ops/s
LinesBenchmark.linesBenchmark              100            4  thrpt    3    1699.683 ±    544.435  ops/s
LinesBenchmark.linesBenchmark              100           16  thrpt    3   13667.955 ±   3294.870  ops/s
LinesBenchmark.linesBenchmark              100           64  thrpt    3   31498.156 ±  13943.473  ops/s

The main idea is to avoid copying single characters, and instead copy
string slices, that can use avx2 instructions.

The copy is still unavoidable unfortunately in JVM - AFAIK.

Before:
```
LinesBenchmark.linesBenchmark                0            4  thrpt    3  122139.570 ± 236181.395  ops/s
LinesBenchmark.linesBenchmark                0           16  thrpt    3  134204.974 ±  76569.841  ops/s
LinesBenchmark.linesBenchmark                0           64  thrpt    3  133776.725 ± 137902.948  ops/s
LinesBenchmark.linesBenchmark                1            4  thrpt    3   44078.282 ±  46048.728  ops/s
LinesBenchmark.linesBenchmark                1           16  thrpt    3   95173.337 ±  31919.684  ops/s
LinesBenchmark.linesBenchmark                1           64  thrpt    3   91207.064 ± 134077.287  ops/s
LinesBenchmark.linesBenchmark               10            4  thrpt    3   11310.247 ±   6860.349  ops/s
LinesBenchmark.linesBenchmark               10           16  thrpt    3   53543.590 ±  12547.417  ops/s
LinesBenchmark.linesBenchmark               10           64  thrpt    3   66603.479 ±  66765.096  ops/s
LinesBenchmark.linesBenchmark              100            4  thrpt    3    1759.697 ±   1187.625  ops/s
LinesBenchmark.linesBenchmark              100           16  thrpt    3   10084.010 ±   6844.210  ops/s
LinesBenchmark.linesBenchmark              100           64  thrpt    3   15806.198 ±   3290.178  ops/s
```

After:
```
LinesBenchmark.linesBenchmark                0            4  thrpt    3  129302.595 ±  42736.093  ops/s
LinesBenchmark.linesBenchmark                0           16  thrpt    3  130886.744 ±  47487.794  ops/s
LinesBenchmark.linesBenchmark                0           64  thrpt    3  123490.463 ± 156930.971  ops/s
LinesBenchmark.linesBenchmark                1            4  thrpt    3   41200.649 ±  10262.513  ops/s
LinesBenchmark.linesBenchmark                1           16  thrpt    3   87733.636 ± 111704.611  ops/s
LinesBenchmark.linesBenchmark                1           64  thrpt    3   94767.345 ±  62120.286  ops/s
LinesBenchmark.linesBenchmark               10            4  thrpt    3   12385.057 ±   7146.545  ops/s
LinesBenchmark.linesBenchmark               10           16  thrpt    3   49429.892 ±  33859.569  ops/s
LinesBenchmark.linesBenchmark               10           64  thrpt    3   75037.094 ± 107270.451  ops/s
LinesBenchmark.linesBenchmark              100            4  thrpt    3    1699.683 ±    544.435  ops/s
LinesBenchmark.linesBenchmark              100           16  thrpt    3   13667.955 ±   3294.870  ops/s
LinesBenchmark.linesBenchmark              100           64  thrpt    3   31498.156 ±  13943.473  ops/s
```
@@ -542,9 +548,9 @@ object text {
else Pull.output1(result)
}
case Some((chunk, stream)) =>
val linesBuffer = ArrayBuffer.empty[String]
linesBuffer.clear()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we clear this earlier? To avoid holding on to lines after they've been emitted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check, thx

Copy link
Contributor Author

@svalaskevicius svalaskevicius Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I've reverted this part back to creating a new array buffer for each output chunk

it's a mutable array and it is not safe to clear it because of unexpected consequences if the chunk is used after we pull the next one

thanks for pointing my attention to this line! :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Thanks!

underlying array is mutable and can cause unexpected consequences when
clearing it - e.g. if a chunk is used after pulling the next chunk
@mpilquist mpilquist merged commit 74693cc into typelevel:main Oct 8, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants