Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AVX2 and AVX512 optimization #1552

Merged
merged 4 commits into from
Sep 6, 2024

Conversation

tszumski
Copy link
Contributor

@tszumski tszumski commented Sep 3, 2024

Encoder: performance gain ~0.1%​
Decoder: performance gain ~2.5%

@rouault
Copy link
Collaborator

rouault commented Sep 3, 2024

Encoder: performance gain ~0.1%​
Decoder: performance gain ~2.5%

How was that measured? Total performance gain, or just on the DWT?

@tszumski
Copy link
Contributor Author

tszumski commented Sep 3, 2024

Encoder: performance gain ~0.1%​
Decoder: performance gain ~2.5%

How was that measured? Total performance gain, or just on the DWT?

Total performance gain

@rouault
Copy link
Collaborator

rouault commented Sep 3, 2024

This breaks at least Windows x64 regression tests.
Did you run the regression test suite locally? Make sure to clone https://github.com/uclouvain/openjpeg-data in a "data" subdirectory of your openjpeg repository, and run ctest

@tszumski
Copy link
Contributor Author

tszumski commented Sep 5, 2024

This breaks at least Windows x64 regression tests. Did you run the regression test suite locally? Make sure to clone https://github.com/uclouvain/openjpeg-data in a "data" subdirectory of your openjpeg repository, and run ctest

Thanks for reply, I was able to fix it. Root cause is that CI use MSVC 2015 toolset that does not have _mm256_extract_epi32 and _mm256_insert_epi32 intrinsic defined,

@rouault
Copy link
Collaborator

rouault commented Sep 5, 2024

@tszumski Can you fix the formatting of the code according to the instructions at end of https://github.com/uclouvain/openjpeg/actions/runs/10718601930/job/29724100308?pr=1552, that is

* Enable WITH_ASTYLE in your cmake configuration to format C++ code
* Use "scripts/astyle.sh file" to fix the now badly indented files
* Consider using scripts/prepare-commit.sh as pre-commit hook to avoid this
  in the future (ln -s scripts/prepare-commit.sh .git/hooks/pre-commit) or
  run it manually before each commit.

@tszumski
Copy link
Contributor Author

tszumski commented Sep 5, 2024

@tszumski Can you fix the formatting of the code according to the instructions at end of https://github.com/uclouvain/openjpeg/actions/runs/10718601930/job/29724100308?pr=1552, that is

* Enable WITH_ASTYLE in your cmake configuration to format C++ code
* Use "scripts/astyle.sh file" to fix the now badly indented files
* Consider using scripts/prepare-commit.sh as pre-commit hook to avoid this
  in the future (ln -s scripts/prepare-commit.sh .git/hooks/pre-commit) or
  run it manually before each commit.

Fixed

@rouault
Copy link
Collaborator

rouault commented Sep 5, 2024

@tszumski I don't have hardware to test AVX512F and github CI doesn't seem to have machines with it. Could you paste somewhere (here or a github gist if too large):

  • the output of "grep C_FLAGS CMakeCache.txt" , showing that it includes -mavx512f
  • the output of "ctest -V"

@tszumski
Copy link
Contributor Author

tszumski commented Sep 6, 2024

@rouault
For consistency with CI scans(windows) i applied flag -DCMAKE_C_FLAGS="/arch:AVX512" And i can confirm that AVX512F is set/defined* (test logs from MSVC 2017 and MSVC 2022 are identical)

Note: *MSVC added support for AVX512 in Visual Studio 2017

  • the output of "grep C_FLAGS CMakeCache.txt"
$ grep C_FLAGS CMakeCache.txt
CMAKE_C_FLAGS:STRING=/arch:AVX512
CMAKE_C_FLAGS_DEBUG:STRING=/MDd /Zi /Ob0 /Od /RTC1
CMAKE_C_FLAGS_MINSIZEREL:STRING=/MD /O1 /Ob1 /DNDEBUG
CMAKE_C_FLAGS_RELEASE:STRING=/MD /O2 /Ob2 /DNDEBUG
CMAKE_C_FLAGS_RELWITHDEBINFO:STRING=/MD /Zi /O2 /Ob1 /DNDEBUG
CMAKE_RC_FLAGS:STRING=-DWIN32
CMAKE_RC_FLAGS_DEBUG:STRING=-D_DEBUG
CMAKE_RC_FLAGS_MINSIZEREL:STRING=
CMAKE_RC_FLAGS_RELEASE:STRING=
CMAKE_RC_FLAGS_RELWITHDEBINFO:STRING=
//ADVANCED property for variable: CMAKE_C_FLAGS
CMAKE_C_FLAGS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_C_FLAGS_DEBUG
CMAKE_C_FLAGS_DEBUG-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_C_FLAGS_MINSIZEREL
CMAKE_C_FLAGS_MINSIZEREL-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_C_FLAGS_RELEASE
CMAKE_C_FLAGS_RELEASE-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_C_FLAGS_RELWITHDEBINFO
CMAKE_C_FLAGS_RELWITHDEBINFO-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_RC_FLAGS
CMAKE_RC_FLAGS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_RC_FLAGS_DEBUG
CMAKE_RC_FLAGS_DEBUG-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_RC_FLAGS_MINSIZEREL
CMAKE_RC_FLAGS_MINSIZEREL-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_RC_FLAGS_RELEASE
CMAKE_RC_FLAGS_RELEASE-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_RC_FLAGS_RELWITHDEBINFO
CMAKE_RC_FLAGS_RELWITHDEBINFO-ADVANCED:INTERNAL=1
  • the output of "ctest", to make logs shorter here is only first and last few lines with summary
1>------ Build started: Project: RUN_TESTS, Configuration: Release x64 ------
1>Test project D:/openjpeg/build-tests
1>          Start    1: tte0
1>   1/1587 Test    #1: tte0 .........................................................................................   Passed    0.88 sec
...
1>          Start 1587: testjp2
1>1587/1587 Test #1587: testjp2 ......................................................................................   Passed    0.27 sec
1>
1>99% tests passed, 2 tests failed out of 1587
1>
1>Total Test time (real) = 224.17 sec
1>
1>The following tests FAILED:
1>	1147 - NR-DEC-issue226.j2k-74-decode (Failed)
1>	1148 - NR-DEC-issue226.j2k-74-decode-md5 (Failed)
1>Errors while running CTest

Failed Tests 1147/1148 are also failing on master branch without my changes

@rouault
Copy link
Collaborator

rouault commented Sep 6, 2024

Failed Tests 1147/1148 are also failing on master branch without my changes

yes, those are "expected" (we have some a file in CI to ignore them

ok, so everything looks good. Merging

@rouault rouault merged commit e0e0c80 into uclouvain:master Sep 6, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants