Improve benchmark setup #1323

Merged
merged 13 commits into from
Aug 30, 2023

Member

@upsj upsj commented Apr 15, 2023

I needed to find something lightweight to do to get my mind off the Cholesky issues, so here we are 😄

  • Refactor benchmarks to use a common framework for handling JSON etc.
  • Replace RapidJSON with nlohmann_json

@upsj upsj added the 1:ST:ready-for-review This PR is ready for review label Apr 15, 2023
@upsj upsj requested a review from a team April 15, 2023 12:32
@upsj upsj self-assigned this Apr 15, 2023
@ginkgo-bot ginkgo-bot added reg:build This is related to the build system. reg:testing This is related to testing. mod:core This is related to the core module. reg:benchmarking This is related to benchmarking. type:solver This is related to the solvers type:preconditioner This is related to the preconditioners labels Apr 15, 2023
Member

@yhmtsai yhmtsai left a comment

There are several empty files for the distributed tests.
On the CUDA side, profiling has two parts: Nsight Systems and Nsight Compute (I am not sure about the other vendors).
I do not know the reason for the split, but Nsight Systems is for tracing (timeline) and Nsight Compute is for profiling kernels.
For tracing, annotating the repetitions should give a better timeline overview after the warmup.
For profiling kernels, cuProfilerStart and cuProfilerStop, or filtering by annotation, should help here.
This also matters because we have some workspaces that avoid reallocation and skip some operations in the second run.
If we only profile the first run, we will always see the reallocation overhead.
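
To make the last point concrete, here is a minimal sketch of why a profile of the first run alone can be misleading. The `workspace` type, `apply_once` and `allocations_after_warmup` are hypothetical stand-ins written for this illustration, not Ginkgo code: the first call allocates a buffer, later calls reuse it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a cached workspace: it only (re)allocates when
// the requested size exceeds what it already holds.
struct workspace {
    std::vector<double> buffer;
    int num_allocations = 0;

    double* get(std::size_t size)
    {
        if (buffer.size() < size) {
            buffer.resize(size);
            ++num_allocations;
        }
        return buffer.data();
    }
};

// Hypothetical operation that needs temporary storage on every call.
void apply_once(workspace& ws, std::size_t size)
{
    double* tmp = ws.get(size);
    for (std::size_t i = 0; i < size; ++i) {
        tmp[i] = static_cast<double>(i);
    }
}

// Runs `warmup` calls followed by `repetitions` calls and returns how many
// allocations happened inside the repetitions, i.e. the region a profiler
// would see if it were only started after the warmup.
int allocations_after_warmup(int warmup, int repetitions, std::size_t size)
{
    workspace ws;
    for (int i = 0; i < warmup; ++i) {
        apply_once(ws, size);
    }
    const int before = ws.num_allocations;
    for (int i = 0; i < repetitions; ++i) {
        apply_once(ws, size);
    }
    return ws.num_allocations - before;
}
```

Without a warmup call, the measured region contains the one-time allocation; with a single warmup call it does not, which is the hot-versus-cold difference described above.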

benchmark/blas/blas_common.hpp (outdated, resolved)
benchmark/utils/generator.hpp (outdated, resolved)
benchmark/utils/general.hpp (outdated, resolved)
benchmark/utils/general.hpp (outdated, resolved)
benchmark/utils/general.hpp (outdated, resolved)
benchmark/conversions/conversions.cpp (outdated, resolved)
benchmark/test/compare.py (outdated, resolved)
benchmark/test/compare.py (outdated, resolved)
benchmark/test/preconditioner.py (outdated, resolved)
Member Author

upsj commented Apr 18, 2023

The distributed benchmarks are tested now as well. The -profile flag is meant as a shortcut. If users are interested in the difference between hot and cold calls, they can still see the individual generate and apply calls in the timeline by controlling the repetitions themselves. We could consider adding ranges to the timer iterations if repetitions > 1?
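
One way the suggestion of per-iteration ranges could look is sketched below. This is an assumption-laden illustration, not the benchmark code: `profiler_range` and `run_with_ranges` are hypothetical names, and the range only appends to an in-memory log where a real build would forward to the vendor's annotation API.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical RAII range: records begin/end events in a log. A real
// implementation would call the vendor's annotation API instead.
class profiler_range {
public:
    profiler_range(std::vector<std::string>& log, std::string name)
        : log_{&log}, name_{std::move(name)}
    {
        log_->push_back("begin " + name_);
    }

    ~profiler_range() { log_->push_back("end " + name_); }

    profiler_range(const profiler_range&) = delete;
    profiler_range& operator=(const profiler_range&) = delete;

private:
    std::vector<std::string>* log_;
    std::string name_;
};

// Calls op() `repetitions` times. Each call is wrapped in its own range only
// if there is more than one repetition, as proposed in the comment above.
template <typename Op>
std::vector<std::string> run_with_ranges(int repetitions, Op op)
{
    std::vector<std::string> log;
    for (int i = 0; i < repetitions; ++i) {
        if (repetitions > 1) {
            profiler_range range{log, "repetition " + std::to_string(i)};
            op();
        } else {
            op();
        }
    }
    return log;
}
```

With two repetitions the log holds a begin/end pair per iteration, so hot and cold calls show up as separate ranges in a timeline; with a single repetition no extra ranges are emitted.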

@upsj upsj force-pushed the improve_benchmarks branch 3 times, most recently from a2f11db to 715f4ab Compare April 18, 2023 13:25
@MarcelKoch
Member

I feel like this PR mixes quite a few things that could each stand as their own PR. I think splitting this up into 3 PRs

  1. CLI changes (-profile, -input, ...)
  2. test framework
  3. JSON changes

would make each part simpler to review. If that is too inconvenient, I would suggest at least extracting the test framework changes.

Member

@yhmtsai yhmtsai left a comment

LGTM in general. Currently, the pull request contains some content from the other pull request and is not rebased.

benchmark/test/reference/blas.simple.stderr (outdated, resolved)
@@ -272,7 +272,7 @@ if(GINKGO_BUILD_TESTS)
 endif()
 if(GINKGO_BUILD_BENCHMARKS)
     find_package(gflags 2.2.2 QUIET)
-    find_package(RapidJSON 1.1.0 QUIET)
+    find_package(nlohmann_json 3.9.1 QUIET)
Member

any reason for picking 3.9.1?

Member Author

I think I needed this particular minimum version to provide ordered_json support

benchmark/blas/blas.cpp (outdated, resolved)
benchmark/solver/distributed/solver.cpp (resolved)
benchmark/utils/generator.hpp (outdated, resolved)
benchmark/utils/generator.hpp (outdated, resolved)
-    value.Accept(writer);
-    return os;
-}
+using json = nlohmann::ordered_json;
Member

Is it mainly for testing purposes?

Member Author

RapidJSON has the same property, and I wanted to preserve it since it also makes the output more stable.
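
As a rough illustration of the property under discussion: `nlohmann::json` stores objects in a `std::map` by default, so keys come out sorted, while `nlohmann::ordered_json` keeps insertion order. The sketch below mimics both behaviours with standard containers only, so it runs without the library. `entries`, `sorted_keys` and `insertion_ordered_keys` are hypothetical helpers for this example, and the keys are assumed to be unique.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Key/value pairs in the order the caller wrote them (keys assumed unique).
using entries = std::vector<std::pair<std::string, int>>;

// Keys in the order a sorted associative container (std::map) emits them.
std::vector<std::string> sorted_keys(const entries& input)
{
    std::map<std::string, int> object(input.begin(), input.end());
    std::vector<std::string> keys;
    for (const auto& kv : object) {
        keys.push_back(kv.first);
    }
    return keys;
}

// Keys in the order an insertion-ordered container emits them.
std::vector<std::string> insertion_ordered_keys(const entries& input)
{
    std::vector<std::string> keys;
    for (const auto& kv : input) {
        keys.push_back(kv.first);
    }
    return keys;
}
```

Keeping the insertion order means the benchmark output lists fields in the order the code wrote them, which keeps reference outputs easy to compare across runs.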

benchmark/utils/runner.hpp (resolved)
benchmark/conversion/conversion.cpp (outdated, resolved)
@upsj upsj force-pushed the benchmark_tests branch 2 times, most recently from 6ab159f to 22e12e9 Compare June 21, 2023 09:39
Base automatically changed from benchmark_tests to develop July 20, 2023 07:27

codecov bot commented Aug 22, 2023

Codecov Report

Patch has no changes to coverable lines.

Member

@MarcelKoch MarcelKoch left a comment

Only a few minor comments left.

.gitlab/image.yml (outdated, resolved)
benchmark/solver/solver_common.hpp (resolved)
benchmark/test/multi_vector_distributed.py (outdated, resolved)
benchmark/utils/loggers.hpp (resolved)
benchmark/test/test_framework.py.in (resolved)
Member

@MarcelKoch MarcelKoch left a comment

LGTM, but don't forget to update the CI.

upsj and others added 11 commits August 28, 2023 09:52
This reverts commit 0dab762.
Additionally replaces the JSON test case output by their description
they are sometimes implementation-dependent
for libstdc++ types
- rename 'determinize' -> 'sanitize'
- use empty struct for empty benchmark state
- use version tag instead of commit ID
- use std::endl where appropriate

Co-authored-by: Marcel Koch
- remove unnecessary stdin in tests
- simplify validate_config
- consistently use pointer members instead of reference members

Co-authored-by: Marcel Koch
benchmark/test/CMakeLists.txt (outdated, resolved)
benchmark/test/reference/blas.simple.stdout (outdated, resolved)
benchmark/test/reference/conversion.simple.stderr (outdated, resolved)
Comment on lines +25 to +26
add_benchmark_test(multi_vector_distributed)
add_benchmark_test(spmv_distributed)
Member

I think this had an issue with unstable output from MPI. Is it solved now?

Member Author

Yes, the instability came from the fact that multiple ranks were printing output. This is now fixed thanks to the do_print variables that are set everywhere.
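
The idea can be sketched without MPI as follows. Only the name `do_print` comes from the comment above. The helpers `print_if` and `simulate_output`, and the choice of rank 0 as the printing rank, are assumptions made for this example. The loop visits the ranks one after another, so it shows the duplicated output but not the nondeterministic interleaving of a real parallel run.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Writes `message` only if do_print is set. In an MPI run, do_print would be
// true on exactly one rank (assumed here to be rank 0).
void print_if(bool do_print, std::ostream& os, const std::string& message)
{
    if (do_print) {
        os << message << '\n';
    }
}

// Simulates num_ranks ranks reaching the same print statement and returns
// the combined output. With gate_on_rank == false every rank prints.
std::string simulate_output(int num_ranks, bool gate_on_rank)
{
    std::ostringstream os;
    for (int rank = 0; rank < num_ranks; ++rank) {
        const bool do_print = !gate_on_rank || rank == 0;
        print_if(do_print, os, "{\"size\": 100}");
    }
    return os.str();
}
```

With the gate enabled, the output contains a single copy of the message regardless of the number of ranks, which is what makes it comparable against a fixed reference file.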

 [
 {
-"size": 125,
+"size": 100,
Member

Has the meaning of size changed now, from matrix size to stencil points?

Member Author

Yes. Previously, the matrix dimensions were written into the size field; now they are written into rows and cols to avoid overwriting the input size specified for the stencil.
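
A small sketch of the difference, using a plain list of key/value pairs in place of the JSON object. The field names `size`, `rows` and `cols` come from the comment above. The `test_case` alias and the helpers `set_field`, `get_field`, `annotate_old` and `annotate_new` are hypothetical and only illustrate the two behaviours as described.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one JSON test case: insertion-ordered fields.
using test_case = std::vector<std::pair<std::string, long>>;

// Sets key to value, overwriting an existing entry or appending a new one.
void set_field(test_case& tc, const std::string& key, long value)
{
    for (auto& kv : tc) {
        if (kv.first == key) {
            kv.second = value;
            return;
        }
    }
    tc.emplace_back(key, value);
}

// Returns the value stored under key, or -1 if the key is absent.
long get_field(const test_case& tc, const std::string& key)
{
    for (const auto& kv : tc) {
        if (kv.first == key) {
            return kv.second;
        }
    }
    return -1;
}

// Old behaviour as described: the matrix dimension replaces the input size.
void annotate_old(test_case& tc, long rows) { set_field(tc, "size", rows); }

// New behaviour as described: the dimensions go into separate fields.
void annotate_new(test_case& tc, long rows, long cols)
{
    set_field(tc, "rows", rows);
    set_field(tc, "cols", cols);
}
```

With an input size of 100 and a generated matrix of dimension 125 (the two values from the diff above), the old variant leaves `size` at 125, while the new one keeps `size` at 100 and reports 125 under `rows` and `cols`.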

benchmark/utils/generator.hpp (outdated, resolved)
benchmark/preconditioner/preconditioner.cpp (resolved)
benchmark/spmv/spmv_common.hpp (outdated, resolved)
Comment on lines -119 to -120
DEBUG: begin components::aos_to_soa
DEBUG: end components::aos_to_soa
Member

The logged operations changed from aos_to_soa to fill_array + copy + convert_idxs_to_ptrs.
Any idea why?

Member Author

I think this has to do with changing the benchmark from using matrix_data to using device_matrix_data, so the AOS-SOA conversion only happens once.
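
For readers unfamiliar with the term, the AOS-SOA conversion mentioned here turns an array of (row, column, value) structures into one array per field. The sketch below uses simplified, hypothetical types (`entry`, `soa_data`) rather than Ginkgo's actual matrix_data and device_matrix_data classes.

```cpp
#include <vector>

// Array-of-structures layout: one struct per nonzero entry.
struct entry {
    int row;
    int col;
    double value;
};

// Structure-of-arrays layout: one contiguous array per field.
struct soa_data {
    std::vector<int> row_idxs;
    std::vector<int> col_idxs;
    std::vector<double> values;
};

// Splits the entries into separate index and value arrays, preserving order.
soa_data aos_to_soa(const std::vector<entry>& aos)
{
    soa_data out;
    out.row_idxs.reserve(aos.size());
    out.col_idxs.reserve(aos.size());
    out.values.reserve(aos.size());
    for (const auto& e : aos) {
        out.row_idxs.push_back(e.row);
        out.col_idxs.push_back(e.col);
        out.values.push_back(e.value);
    }
    return out;
}
```

If the benchmark holds on to the converted data and reuses it for every format, the conversion runs once up front instead of once per operation, which would be consistent with the explanation given above.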

benchmark/utils/runner.hpp (resolved)
- don't install nlohmann-json
- simplify code
- improve config description formatting

Co-authored-by: Yuhsiang M. Tsai
@upsj upsj added 1:ST:no-changelog-entry Skip the wiki check for changelog update 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review 1:ST:run-full-test labels Aug 29, 2023
@upsj upsj merged commit 1100cbd into develop Aug 30, 2023
12 of 14 checks passed
@upsj upsj deleted the improve_benchmarks branch August 30, 2023 12:28

sonarcloud bot commented Aug 30, 2023

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 71 (rating A)

Coverage: 87.7%
Duplication: 2.8%

Warning: The version of Java (11.0.3) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
