Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get column value by column label in Framework Core ASoA #13498

Open
wants to merge 21 commits into
base: dev
Choose a base branch
from

Conversation

mytkom
Copy link
Contributor

@mytkom mytkom commented Sep 10, 2024

I will continue our discussion here @ktf, @jgrosseo, @saganatt.

@jgrosseo was correct; this solution shouldn't be implemented as it is. It was slower in this benchmark by 3-10 times for simple examples. I will leave this draft PR here and try to implement this alternative solution I was writing about in AliceO2Group/O2Physics#7371. That 20% I was speaking of during the presentation was the whole task performance difference locally on my laptop. These are my local benchmark results:

------------------------------------------------------------------------------------------------------------
Benchmark                                                  Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------------
BM_ASoADynamicColumnPresent/8                            237 ns          236 ns      2983783 bytes_per_second=258.465M/s
BM_ASoADynamicColumnPresent/64                           395 ns          395 ns      1717027 bytes_per_second=1.20857G/s
BM_ASoADynamicColumnPresent/512                         1559 ns         1555 ns       448023 bytes_per_second=2.45328G/s
BM_ASoADynamicColumnPresent/4096                       10961 ns        10933 ns        62651 bytes_per_second=2.79137G/s
BM_ASoADynamicColumnPresent/32768                      85382 ns        85164 ns         8024 bytes_per_second=2.86673G/s
BM_ASoADynamicColumnPresent/262144                    679514 ns       677289 ns          998 bytes_per_second=2.88374G/s
BM_ASoADynamicColumnPresentGetValueByLabel/8             419 ns          418 ns      1622625 bytes_per_second=145.876M/s
BM_ASoADynamicColumnPresentGetValueByLabel/64           1772 ns         1768 ns       392634 bytes_per_second=276.255M/s
BM_ASoADynamicColumnPresentGetValueByLabel/512         12671 ns        12641 ns        54994 bytes_per_second=309.019M/s
BM_ASoADynamicColumnPresentGetValueByLabel/4096        99301 ns        99056 ns         6893 bytes_per_second=315.478M/s
BM_ASoADynamicColumnPresentGetValueByLabel/32768      805034 ns       802857 ns          858 bytes_per_second=311.388M/s
BM_ASoADynamicColumnPresentGetValueByLabel/262144    6319410 ns      6301477 ns          107 bytes_per_second=317.386M/s
BM_ASoADynamicColumnCall/8                              25.3 ns         25.2 ns     28377608 bytes_per_second=2.36521G/s
BM_ASoADynamicColumnCall/64                              183 ns          182 ns      3722494 bytes_per_second=2.61903G/s
BM_ASoADynamicColumnCall/512                            1378 ns         1375 ns       478496 bytes_per_second=2.7751G/s
BM_ASoADynamicColumnCall/4096                          11039 ns        11012 ns        62893 bytes_per_second=2.77126G/s
BM_ASoADynamicColumnCall/32768                         88298 ns        88067 ns         7847 bytes_per_second=2.77221G/s
BM_ASoADynamicColumnCall/262144                       706708 ns       704322 ns          996 bytes_pelementaryer_second=2.77306G/s
BM_ASoADynamicColumnCallGetValueByLabel/8               67.6 ns         67.4 ns     10002511 bytes_per_second=905.098M/s
BM_ASoADynamicColumnCallGetValueByLabel/64               500 ns          498 ns      1345915 bytes_per_second=979.614M/s
BM_ASoADynamicColumnCallGetValueByLabel/512             3981 ns         3972 ns       175994 bytes_per_second=983.554M/s
BM_ASoADynamicColumnCallGetValueByLabel/4096           31831 ns        31757 ns        22014 bytes_per_second=984.025M/s
BM_ASoADynamicColumnCallGetValueByLabel/32768         255368 ns       254706 ns         2722 bytes_per_second=981.526M/s
BM_ASoADynamicColumnCallGetValueByLabel/262144       2029059 ns      2022533 ns          340 bytes_per_second=988.859M/s

Copy link
Contributor

REQUEST FOR PRODUCTION RELEASES:
To request your PR to be included in production software, please add the corresponding labels called "async-" to your PR. Add the labels directly (if you have the permissions) or add a comment of the form (note that labels are separated by a ",")

+async-label <label1>, <label2>, !<label3> ...

This will add <label1> and <label2> and removes <label3>.

The following labels are available
async-2023-pbpb-apass3
async-2023-pbpb-apass4
async-2023-pp-apass4
async-2024-pp-apass1
async-2022-pp-apass7
async-2024-pp-cpass0

@ktf
Copy link
Member

ktf commented Sep 12, 2024

Thank you for your investigation. Much appreciated.

As a general note, please try to have a smaller contributions if possible, because that will for sure simplify and speedup the review process.

Please consider the following formatting changes to AliceO2Group#13498
@mytkom
Copy link
Contributor Author

mytkom commented Sep 26, 2024

Previously, I used std::function as the return type for the getter, but the performance was pretty bad. I changed O2 to debug mode and used perf + speedscope to analyze it. I obtained a much better performance, almost equal to an explicit getter call (less than 10% execution time difference). Then I switched back to release mode for compilation, and these are the results:

-------------------------------------------------------------------------------------------------------------
Benchmark                                                   Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------
BM_ASoADynamicColumnPresent/8                             236 ns          235 ns      2951784 bytes_per_second=259.24M/s
BM_ASoADynamicColumnPresent/64                            397 ns          396 ns      1765157 bytes_per_second=1.2049G/s
BM_ASoADynamicColumnPresent/512                          1557 ns         1553 ns       454082 bytes_per_second=2.45591G/s
BM_ASoADynamicColumnPresent/4096                        10783 ns        10758 ns        63512 bytes_per_second=2.83666G/s
BM_ASoADynamicColumnPresent/32768                       85087 ns        84863 ns         8263 bytes_per_second=2.87686G/s
BM_ASoADynamicColumnPresent/262144                     683696 ns       681478 ns         1030 bytes_per_second=2.86601G/s
BM_ASoADynamicColumnPresentGetGetterByLabel/8             277 ns          276 ns      2529198 bytes_per_second=221.199M/s
BM_ASoADynamicColumnPresentGetGetterByLabel/64            658 ns          656 ns      1077150 bytes_per_second=744.385M/s
BM_ASoADynamicColumnPresentGetGetterByLabel/512          3490 ns         3482 ns       199110 bytes_per_second=1121.89M/s
BM_ASoADynamicColumnPresentGetGetterByLabel/4096        27125 ns        27054 ns        26808 bytes_per_second=1.12801G/s
BM_ASoADynamicColumnPresentGetGetterByLabel/32768      208795 ns       208255 ns         3366 bytes_per_second=1.17232G/s
BM_ASoADynamicColumnPresentGetGetterByLabel/262144    1668853 ns      1663740 ns          423 bytes_per_second=1.17394G/s
BM_ASoADynamicColumnCall/8                               24.8 ns         24.7 ns     28070062 bytes_per_second=2.41155G/s
BM_ASoADynamicColumnCall/64                               180 ns          180 ns      3630937 bytes_per_second=2.65376G/s
BM_ASoADynamicColumnCall/512                             1377 ns         1374 ns       497025 bytes_per_second=2.77621G/s
BM_ASoADynamicColumnCall/4096                           10961 ns        10935 ns        63683 bytes_per_second=2.79084G/s
BM_ASoADynamicColumnCall/32768                          87853 ns        87642 ns         8035 bytes_per_second=2.78565G/s
BM_ASoADynamicColumnCall/262144                        703888 ns       701646 ns          982 bytes_per_second=2.78363G/s
BM_ASoADynamicColumnCallGetGetterByLabel/8               40.0 ns         39.9 ns     17621092 bytes_per_second=1.4949G/s
BM_ASoADynamicColumnCallGetGetterByLabel/64               299 ns          298 ns      2335159 bytes_per_second=1.60043G/s
BM_ASoADynamicColumnCallGetGetterByLabel/512             2400 ns         2394 ns       300728 bytes_per_second=1.59373G/s
BM_ASoADynamicColumnCallGetGetterByLabel/4096           18681 ns        18635 ns        36845 bytes_per_second=1.63765G/s
BM_ASoADynamicColumnCallGetGetterByLabel/32768         148504 ns       148136 ns         4728 bytes_per_second=1.64809G/s
BM_ASoADynamicColumnCallGetGetterByLabel/262144       1189493 ns      1185832 ns          571 bytes_per_second=1.64705G/s

I am currently stuck with this solution; I would like to know if using the get() explicitly column struct member method is possible. I was trying to implement such an approach, but there is a type problem; the member method pointer type is ex. float(*ColumnType)(const RowItType&), and I would know ColumnType only in runtime after string comparison, so I can't think of an efficient implementation.

I would like to know if you think such a performance difference would be acceptable and if you can think of a more efficient solution.

@mytkom mytkom marked this pull request as ready for review October 8, 2024 17:36
@mytkom mytkom requested a review from a team as a code owner October 8, 2024 17:36
@mytkom
Copy link
Contributor Author

mytkom commented Oct 9, 2024

@ktf @jgrosseo,
May you have a look at the latest changes?

@ktf
Copy link
Member

ktf commented Oct 9, 2024

@mytkom thank you for your work. See inline comments. I might have more once those are resolved.

Please consider the following formatting changes to AliceO2Group#13498
@mytkom
Copy link
Contributor Author

mytkom commented Oct 14, 2024

@ktf
I have finally made the changes requested in your last review. Could you take a look at the current implementation?

Please consider the following formatting changes to AliceO2Group#13498
@mytkom
Copy link
Contributor Author

mytkom commented Oct 14, 2024

@ktf
I simplified this using nullptr as you advised; now I wonder if this strncmp in my implementation is not making it overcomplicated for such a comparison. Could you take a look and give me your opinion on this? Maybe using string + strcmp for str.data() instead of string_view would be more readable and performance drawback should not be so heavy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants