Add operators support for Ascend NPU (CANN backend) #3552

Merged: 1 commit merged on Nov 21, 2023

@hipudding (Contributor) commented Aug 17, 2023

CANN (Compute Architecture of Neural Networks), developed by Huawei, is a heterogeneous computing architecture for AI. OpenCV DNN already supports the CANN backend (#22634).

More and more users run Ascend NPUs and program them with CANN, and their number is still growing rapidly. AI training and inference depend on data preprocessing, but when users combine OpenCV with the CANN backend, preprocessing can only run on the CPU, which is inefficient.

The purpose of this PR is to enable OpenCV operators on the CANN backend. We also ran an end-to-end test of the newly added operators on an Ascend 310; see the test results.

Usage of the CANN backend is consistent with OpenCV DNN; please refer to the OpenCV DNN CANN backend manual:

  1. Install dependencies
  2. Install CANN
  3. Compile OpenCV with CANN

The CANN backend is used in a similar way to CUDA:

| Object    | CANN         | CUDA     |
| --------- | ------------ | -------- |
| Namespace | cv::cann     | cv::cuda |
| Matrix    | AscendMat    | GpuMat   |
| Stream    | AscendStream | Stream   |
| Event     | AscendEvent  | Event    |

This PR provides the framework for CANN backend operator support. To make the code easier to review, only a few basic interfaces are implemented. All of the following operators have been tested and their results compared with the CPU backend:

  • add
  • subtract
  • multiply
  • divide
  • bitwise_and
  • bitwise_or
  • bitwise_xor
  • bitwise_not
  • addWeighted
  • crop
  • cvtColor (only some color conversions are supported)
  • merge
  • split
  • rotate
  • flip
  • threshold
  • transpose
  • resize

More operators will be implemented in separate follow-up PRs.

OpenCVFindCANN.cmake is modified in opencv#24488.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • N/A There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

Co-authored-by: CaoMengqing [email protected]

@hipudding force-pushed the npu_support branch 5 times, most recently from fe53efd to 0a751c3 (August 22, 2023 02:41)
@hipudding changed the title from "Support Ascend NPU" to "Support operators to execute on CANN backend" (Aug 22, 2023)
@hipudding force-pushed the npu_support branch 2 times, most recently from 4176ecd to dc927ce (August 30, 2023 08:04)
@hipudding (Contributor Author) commented:

Performance Test Result

Test environment: VM on Huawei Cloud

  • CPU: Intel(R) Xeon(R) Gold 6278C CPU @ 2.60GHz
  • Memory: 32 GB
  • NPU: Ascend 310 (driver version: 22.0.4)
  • CANN: 6.3.RC2.alpha003
| Operator | Size | Type | NPU time (ms) | CPU time (ms) | Speedup (cpu_time / npu_time) |
| --- | --- | --- | ---: | ---: | ---: |
| add | 1920x1080 | CV_32S | 234 | 219 | 0.94 |
| add | 1920x1080 | CV_32SC3 | 545 | 663 | 1.22 |
| add | 2048x2048 | CV_32S | 395 | 448 | 1.13 |
| add | 2048x2048 | CV_32SC3 | 984 | 1564 | 1.59 |
| add | 3840x2160 | CV_32S | 672 | 1021 | 1.52 |
| add | 3840x2160 | CV_32SC3 | 1867 | 3122 | 1.67 |
| add | 7680x4320 | CV_32S | 2454 | 4148 | 1.69 |
| add | 7680x4320 | CV_32SC3 | 7196 | 12382 | 1.72 |
| subtract | 1920x1080 | CV_32S | 227 | 221 | 0.97 |
| subtract | 1920x1080 | CV_32SC3 | 525 | 763 | 1.45 |
| subtract | 2048x2048 | CV_32S | 392 | 700 | 1.79 |
| subtract | 2048x2048 | CV_32SC3 | 1020 | 1567 | 1.54 |
| subtract | 3840x2160 | CV_32S | 706 | 1024 | 1.45 |
| subtract | 3840x2160 | CV_32SC3 | 1902 | 3738 | 1.97 |
| subtract | 7680x4320 | CV_32S | 2492 | 4164 | 1.67 |
| subtract | 7680x4320 | CV_32SC3 | 7308 | 12478 | 1.71 |
| multiply | 1920x1080 | CV_32S | 236 | 220 | 0.93 |
| multiply | 1920x1080 | CV_32SC3 | 539 | 667 | 1.24 |
| multiply | 2048x2048 | CV_32S | 392 | 448 | 1.14 |
| multiply | 2048x2048 | CV_32SC3 | 983 | 1567 | 1.59 |
| multiply | 3840x2160 | CV_32S | 678 | 1020 | 1.50 |
| multiply | 3840x2160 | CV_32SC3 | 1869 | 3129 | 1.67 |
| multiply | 7680x4320 | CV_32S | 2461 | 4172 | 1.70 |
| multiply | 7680x4320 | CV_32SC3 | 7170 | 12497 | 1.74 |
| divide | 1920x1080 | CV_32S | 232 | 460 | 1.98 |
| divide | 1920x1080 | CV_32SC3 | 530 | 1588 | 3.00 |
| divide | 2048x2048 | CV_32S | 394 | 929 | 2.36 |
| divide | 2048x2048 | CV_32SC3 | 994 | 3046 | 3.06 |
| divide | 3840x2160 | CV_32S | 678 | 1988 | 2.93 |
| divide | 3840x2160 | CV_32SC3 | 1912 | 5614 | 2.94 |
| divide | 7680x4320 | CV_32S | 2512 | 7488 | 2.98 |
| divide | 7680x4320 | CV_32SC3 | 7290 | 22479 | 3.08 |
| bitwise_and | 1920x1080 | CV_32S | 223 | 218 | 0.98 |
| bitwise_and | 1920x1080 | CV_32SC3 | 524 | 756 | 1.44 |
| bitwise_and | 2048x2048 | CV_32S | 389 | 520 | 1.34 |
| bitwise_and | 2048x2048 | CV_32SC3 | 1022 | 1551 | 1.52 |
| bitwise_and | 3840x2160 | CV_32S | 676 | 1021 | 1.51 |
| bitwise_and | 3840x2160 | CV_32SC3 | 1882 | 3103 | 1.65 |
| bitwise_and | 7680x4320 | CV_32S | 2471 | 4125 | 1.67 |
| bitwise_and | 7680x4320 | CV_32SC3 | 7187 | 12366 | 1.72 |
| bitwise_or | 1920x1080 | CV_32S | 234 | 221 | 0.94 |
| bitwise_or | 1920x1080 | CV_32SC3 | 564 | 667 | 1.18 |
| bitwise_or | 2048x2048 | CV_32S | 407 | 449 | 1.10 |
| bitwise_or | 2048x2048 | CV_32SC3 | 1019 | 1573 | 1.54 |
| bitwise_or | 3840x2160 | CV_32S | 669 | 1027 | 1.54 |
| bitwise_or | 3840x2160 | CV_32SC3 | 1847 | 3141 | 1.70 |
| bitwise_or | 7680x4320 | CV_32S | 2653 | 4176 | 1.57 |
| bitwise_or | 7680x4320 | CV_32SC3 | 7181 | 12495 | 1.74 |
| bitwise_xor | 1920x1080 | CV_32S | 218 | 218 | 1.00 |
| bitwise_xor | 1920x1080 | CV_32SC3 | 521 | 755 | 1.45 |
| bitwise_xor | 2048x2048 | CV_32S | 387 | 518 | 1.34 |
| bitwise_xor | 2048x2048 | CV_32SC3 | 1022 | 1552 | 1.52 |
| bitwise_xor | 3840x2160 | CV_32S | 675 | 1013 | 1.50 |
| bitwise_xor | 3840x2160 | CV_32SC3 | 1905 | 3098 | 1.63 |
| bitwise_xor | 7680x4320 | CV_32S | 2487 | 4117 | 1.66 |
| bitwise_xor | 7680x4320 | CV_32SC3 | 7193 | 12321 | 1.71 |
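The last column is CPU time divided by NPU time, so values above 1 mean the NPU run was faster and values below 1 mean it was slower. The minimal C++ sketch below recomputes two entries of the table to make that direction explicit; `speedup` and `round2` are illustrative helper names, not part of OpenCV.

```cpp
#include <cmath>

// Speedup as defined in the table: CPU time divided by NPU time.
// > 1.0 means the NPU was faster, < 1.0 means the NPU was slower.
double speedup(double cpu_ms, double npu_ms) {
    return cpu_ms / npu_ms;
}

// Round to two decimals, the precision used in the table.
double round2(double x) {
    return std::round(x * 100.0) / 100.0;
}
```

For example, `round2(speedup(219, 234))` is 0.94 (add, 1920x1080, CV_32S, where the NPU is slightly slower), and `round2(speedup(22479, 7290))` is 3.08 (divide, 7680x4320, CV_32SC3).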

@hipudding changed the title from "Support operators to execute on CANN backend" to "Add operators support for Ascend NPU (CANN backend)" (Aug 31, 2023)
@hipudding force-pushed the npu_support branch 2 times, most recently from 381d853 to b07acec (August 31, 2023 03:15)
@hipudding (Contributor Author) commented Aug 31, 2023

Hi @vpisarev, could you please review this PR or assign someone else to do it? This PR mainly enables the Ascend NPU (CANN backend) to be used as an accelerator backend for OpenCV and implements several simple arithmetic operators. In the performance test above, the speedup over the CPU ranges from 0.93x to 3.08x; the NPU is slower or on par only for single-channel 1920x1080 inputs.

In addition, I have two more questions and would like to get your advice:

  1. I have considered two implementation approaches: a dedicated namespace, as this PR does, and HAL replacement. Each has its own pros and cons. Given that the goal is complete support for a new backend, which approach would you recommend?
  2. With the namespace approach, I had to implement a new Mat class that duplicates a lot of code from Mat and GpuMat. Moreover, I had to modify InputArray and OutputArray to accept the new Mat class, and I had to modify the Python binding code generator. Is there a better way to achieve this goal with no or only slight modifications to OpenCV's core module? Modifying this code for every new backend does not seem like a good approach.

Thanks.

// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.

#ifndef OPENCV_CANN_HPP

Review comment:

Identifier should have the following format: OPENCV_<module>_<header_underscore_subpath>_HPP

Contributor Author:

Thanks for the code review; I will fix these in the next commit.

Contributor Author:

Done. Changed all header-guard identifiers in the .hpp files.

CV_EXPORTS_W void subtract(InputArray src1, InputArray src2, OutputArray dst,
InputArray mask = noArray(), int dtype = -1,
AclStream& stream = AclStream::Null());
#ifdef NEVER_DEFINED

Review comment:

NEVER_DEFINED

What is that?

We should not have platform-specific conditional compilation in OpenCV public headers.

Also, OpenCV's binding generators can't handle that properly.

Contributor Author:

NEVER_DEFINED means this part of the code is never compiled. I used it only to trick the Python binding generator into generating these two interfaces; it was the only way I found to support subtracting a scalar from a Mat. Otherwise, only Mat arguments are accepted.

To be honest, it's not a very good idea. Do you have any suggestions?

#ifndef OPENCV_CANN_STREAM_ACCESSOR_HPP
#define OPENCV_CANN_STREAM_ACCESSOR_HPP

#include <acl/acl.h>

Review comment:

BTW, this is a platform-specific include in a public header.

It forces assumptions on the user side (such as a properly configured build environment, including header paths and defines).

Contributor Author:

I did some code refactoring: <acl/acl.h> is now included only in cann_call.hpp, so other files can't call ACL functions directly.
OpenCVFindCANN.cmake was already introduced in opencv/opencv#22634, and CMake sets the header and library paths correctly.
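A common general C++ technique for keeping a vendor header such as <acl/acl.h> out of public headers is the "pimpl" idiom: the public class holds a pointer to an incomplete implementation type, and only the private source file includes the vendor header. This is a swapped-in illustration, not necessarily how the cann_call.hpp refactoring works; `DeviceStream` and its members are hypothetical names, and a string stands in for real native handles.

```cpp
#include <memory>
#include <string>

// ---- What a public header could expose ----
// No vendor header is needed here: vendor-specific state lives in an
// incomplete type (Impl) that users of the header never see.
class DeviceStream {
public:
    DeviceStream();
    ~DeviceStream();  // defined below, where Impl is a complete type
    std::string backendName() const;

private:
    struct Impl;
    std::unique_ptr<Impl> impl_;
};

// ---- What a private implementation file could contain ----
// A real backend would include its vendor header here and keep native
// handles (streams, contexts) in Impl; a string is a placeholder for them.
struct DeviceStream::Impl {
    std::string name = "placeholder-backend";
};

DeviceStream::DeviceStream() : impl_(std::make_unique<Impl>()) {}
DeviceStream::~DeviceStream() = default;

std::string DeviceStream::backendName() const { return impl_->name; }
```

The cost of this design is one heap allocation and one indirection per object; the benefit is that code including the public header compiles without the vendor SDK on the include path.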

void aclTwoInputs(const AclMat& src1, const AclMat& src2, AclMat& dst, const char* op,
AclStream& stream = AclStream::Null());

void transNCHWToNHWC(const AclMat& src, AclMat& dst, AclStream& stream = AclStream::Null());

Review comment:

Do we need a reverse operation?

Contributor Author:

Yes, we do. This function has now been renamed to transData, and it supports conversion between all kinds of image formats.
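The reverse of an NCHW to NHWC conversion is the inverse index permutation. The CPU-only C++ sketch below shows both directions on a dense float buffer to make the index mapping explicit. `nchwToNhwc` and `nhwcToNchw` are hypothetical names; transData in this PR operates on device memory (AclMat), not on a std::vector.

```cpp
#include <cstddef>
#include <vector>

// Channel-first (N x C x H x W) to channel-last (N x H x W x C).
// Assumes src.size() == n * c * h * w.
std::vector<float> nchwToNhwc(const std::vector<float>& src,
                              std::size_t n, std::size_t c,
                              std::size_t h, std::size_t w) {
    std::vector<float> dst(src.size());
    for (std::size_t in = 0; in < n; ++in)
        for (std::size_t ic = 0; ic < c; ++ic)
            for (std::size_t ih = 0; ih < h; ++ih)
                for (std::size_t iw = 0; iw < w; ++iw)
                    dst[((in * h + ih) * w + iw) * c + ic] =
                        src[((in * c + ic) * h + ih) * w + iw];
    return dst;
}

// The reverse operation: channel-last back to channel-first.
std::vector<float> nhwcToNchw(const std::vector<float>& src,
                              std::size_t n, std::size_t c,
                              std::size_t h, std::size_t w) {
    std::vector<float> dst(src.size());
    for (std::size_t in = 0; in < n; ++in)
        for (std::size_t ic = 0; ic < c; ++ic)
            for (std::size_t ih = 0; ih < h; ++ih)
                for (std::size_t iw = 0; iw < w; ++iw)
                    dst[((in * c + ic) * h + ih) * w + iw] =
                        src[((in * h + ih) * w + iw) * c + ic];
    return dst;
}
```

For a 1x3x2x2 input holding 0..11, the first NHWC pixel gathers one value from each channel plane (0, 4, 8), and applying `nhwcToNchw` to the result restores the original buffer.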


#include "opencv2/cann.hpp"

typedef std::vector<cann::AclMat> vector_AclMat;

Review comment:

Any Python tests?

misc/python/test/test_cann.py

Contributor Author:

A Python interface test case has been added.

cv::cann::initAcl();
cv::cann::setDevice(0);

cv::cann::AclMat aclMat = cv::cann::AclMat();

Review comment:

cv::cann::AclMat aclMat = cv::cann::AclMat();

Just:

cv::cann::AclMat aclMat;

Contributor Author:

Fixed.

Mat cpuMat1 = randomMat(10, 10, CV_32SC3); \
Mat cpuMat2 = randomMat(10, 10, CV_32SC3); \
Mat cpuDst; \
cv::op(cpuMat1, cpuMat2, cpuDst, __VA_ARGS__); \

Review comment:

It is better to avoid multi-line macros, especially in tests, because they can't be debugged line by line (a macro expands into a single complex line of code).

Use templates instead.

Contributor Author:

Fixed. Some multi-line macros still remain; I will change them in the next round of refactoring (possibly in a separate PR).
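The reviewer's suggestion to use templates instead of multi-line macros can be sketched as follows. Instead of a macro that pastes `cv::op(cpuMat1, cpuMat2, cpuDst, __VA_ARGS__)`, a function template takes the reference operation and the operation under test as callables, so every step is an ordinary source line a debugger can stop on. This is a self-contained illustration: `Buffer` stands in for cv::Mat, and `randomBuffer` and `compareBinaryOp` are hypothetical names, not helpers from this PR.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Stand-in for a matrix type; real tests would compare cv::Mat results with
// results downloaded from the NPU.
using Buffer = std::vector<int>;

// Hypothetical helper: deterministic pseudo-random input data.
Buffer randomBuffer(std::size_t size, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 99);
    Buffer b(size);
    for (auto& v : b) v = dist(gen);
    return b;
}

// Template replacing a multi-line macro: run the same inputs through a
// reference operation and the operation under test, then compare outputs.
// Extra operator arguments (mask, dtype, ...) can be bound by the caller
// through lambda captures instead of __VA_ARGS__.
template <typename RefOp, typename TestOp>
bool compareBinaryOp(RefOp refOp, TestOp testOp, std::size_t size) {
    const Buffer src1 = randomBuffer(size, 1);
    const Buffer src2 = randomBuffer(size, 2);
    Buffer refDst;
    Buffer testDst;
    refOp(src1, src2, refDst);
    testOp(src1, src2, testDst);
    return refDst == testDst;
}
```

Each operator is then passed as a lambda with the signature `(const Buffer&, const Buffer&, Buffer&)`; a failing comparison can be stepped into directly, which is the debuggability the reviewer asked for.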

@hipudding (Contributor Author):

Hi @opencv-alalek, thanks for your review. I addressed the review comments and did some code refactoring. Could you please review it again? Thanks.

@fengyuentau (Member):

Hello @hipudding, does this PR build against opencv/opencv#24277?

@hipudding (Contributor Author):

> Hello @hipudding, does this PR build against opencv/opencv#24277?

Yes, it does. This PR needs some forward declarations, which have to be placed in the main OpenCV repository.

@hipudding (Contributor Author):

@fengyuentau Have a good weekend. This PR no longer depends on the main OpenCV repository (except for OpenCVFindCANN.cmake). Please review it again. Thanks.

CANN (Compute Architecture of Neural Networks), developed by Huawei, is
a heterogeneous computing architecture for AI. OpenCV DNN already
supports the CANN backend [#22634](opencv/opencv#22634).

More and more users run [Ascend NPU](https://www.hiascend.com/) hardware
and program it with CANN, and their number is still growing rapidly.
AI training and inference depend on data preprocessing. When users
combine OpenCV with the CANN backend, however, preprocessing can only
run on the CPU, which is inefficient.

The purpose of this commit is to enable OpenCV operators on CANN backend.

Usage of the CANN backend is consistent with OpenCV DNN; please refer to the OpenCV DNN [CANN backend manual](https://gist.github.com/fengyuentau/083f7f339592545c1f1d2c1fde6a53dc#file-a_ocv_cann-md):
1. [Install dependencies](https://gist.github.com/fengyuentau/083f7f339592545c1f1d2c1fde6a53dc#install-dependencies)
2. [Install CANN](https://gist.github.com/fengyuentau/083f7f339592545c1f1d2c1fde6a53dc#install-cann)
3. [Compile OpenCV with CANN](https://gist.github.com/fengyuentau/083f7f339592545c1f1d2c1fde6a53dc#build-opencv-with-cann)

The CANN backend is used in a similar way to CUDA:

| Object    | CANN         | CUDA     |
| --------- | ------------ | -------- |
| Namespace | cv::cann     | cv::cuda |
| Matrix    | AscendMat    | GpuMat   |
| Stream    | AscendStream | Stream   |
| Event     | AscendEvent  | Event    |

This commit provides the framework for CANN backend operator support.
To make the code easier to review, only a few basic interfaces are
implemented; all of them have been tested and their results compared
with the CPU backend.

More operators will be implemented in separate follow-up commits.

Co-authored-by: CaoMengqing <[email protected]>
@diandianliu:

@hipudding @fengyuentau Hi, when will the cv::Canny and cv::GaussianBlur operators be supported with CANN?

@hipudding (Contributor Author):

> @hipudding @fengyuentau Hi, when will the cv::Canny and cv::GaussianBlur operators be supported with CANN?

Thank you for your interest in OpenCV's CANN support. Unfortunately, we have no plans to support these operators. OpenCV's CANN support mainly provides the ability to call CANN built-in operators and to run AscendC kernels (see also #3614).

Could we discuss the technical details further via email? ([email protected])

asmorkalov pushed a commit that referenced this pull request Mar 28, 2024
Add additional image processing operators for Ascend NPU by utilizing DVPP #3608

The user base for [Ascend NPU](https://www.hiascend.com/en/) and programming with CANN is growing rapidly. To better serve these users, this PR adds more Ascend backend operators. All operators in this PR use DVPP as the computational unit. Digital Vision Pre-Processing (DVPP) is an image processing unit built into the Ascend AI processor. Its main functions include image and video encoding/decoding, as well as image cropping and scaling.

High-frequency operators with the NPU as backend, together with the basic data structure AscendMat, were provided in #3552, but many image processing operators are still missing. Moreover, #3552 supports only two interpolation algorithms for the resize operator. In this PR, the bilinear and nearest-neighbour interpolation algorithms are implemented for the resize operator, along with an Ascend implementation of the copyMakeBorder operator.
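Bilinear and nearest-neighbour resizing differ in how a destination pixel is mapped back to the source and how source pixels are combined. The CPU-only C++ sketch below follows the conventions OpenCV's CPU `cv::resize` uses for INTER_NEAREST (source index = floor(dst * scale)) and INTER_LINEAR (half-pixel centres clamped to the image, then a weighted blend of four neighbours). It is an illustration under those assumptions, not the DVPP implementation, whose coordinate mapping and rounding may differ; `Image`, `resizeNearest` and `resizeBilinear` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Single-channel float image stored row-major.
struct Image {
    std::size_t rows = 0, cols = 0;
    std::vector<float> data;
    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Nearest neighbour: copy the source pixel at floor(dst_index * scale).
Image resizeNearest(const Image& src, std::size_t rows, std::size_t cols) {
    Image dst{rows, cols, std::vector<float>(rows * cols)};
    const double sy = double(src.rows) / rows, sx = double(src.cols) / cols;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c) {
            std::size_t y = std::min<std::size_t>(std::size_t(std::floor(r * sy)), src.rows - 1);
            std::size_t x = std::min<std::size_t>(std::size_t(std::floor(c * sx)), src.cols - 1);
            dst.data[r * cols + c] = src.at(y, x);
        }
    return dst;
}

// Bilinear: map pixel centres with (dst + 0.5) * scale - 0.5, clamp to the
// image, then blend the four neighbouring source pixels by distance.
Image resizeBilinear(const Image& src, std::size_t rows, std::size_t cols) {
    Image dst{rows, cols, std::vector<float>(rows * cols)};
    const double sy = double(src.rows) / rows, sx = double(src.cols) / cols;
    for (std::size_t r = 0; r < rows; ++r) {
        double fy = std::clamp((r + 0.5) * sy - 0.5, 0.0, double(src.rows - 1));
        std::size_t y0 = std::size_t(fy), y1 = std::min(y0 + 1, src.rows - 1);
        double wy = fy - double(y0);
        for (std::size_t c = 0; c < cols; ++c) {
            double fx = std::clamp((c + 0.5) * sx - 0.5, 0.0, double(src.cols - 1));
            std::size_t x0 = std::size_t(fx), x1 = std::min(x0 + 1, src.cols - 1);
            double wx = fx - double(x0);
            double top = src.at(y0, x0) * (1 - wx) + src.at(y0, x1) * wx;
            double bottom = src.at(y1, x0) * (1 - wx) + src.at(y1, x1) * wx;
            dst.data[r * cols + c] = float(top * (1 - wy) + bottom * wy);
        }
    }
    return dst;
}
```

Upscaling the 1x2 row {0, 10} to 1x4 gives {0, 0, 10, 10} with nearest neighbour (blocky copies) and {0, 2.5, 7.5, 10} with bilinear (a smooth ramp).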

In addition, chains of image processing operations are widely used in the preprocessing and post-processing stages of deep-learning-based computer vision. Providing integrated operators therefore makes OpenCV more convenient for users who combine it with deep learning. For example, torchvision provides a similar operator: [RESIZED_CROP](https://pytorch.org/vision/stable/generated/torchvision.transforms.functional.resized_crop.html?highlight=resizedcrop).
Thus, this PR also provides two integrated operators that chain these steps: cropResize and cropResizeMakeBorder.
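cropResize and cropResizeMakeBorder chain three steps: crop a region of interest, resize it, then pad it with a border (as copyMakeBorder does). The CPU-only C++ sketch below composes those steps to show the data flow. All names and parameter orders here (`Image`, `Rect`, `crop`, `resizeNearest`, `makeBorderConstant`, `cropResizeSketch`, `cropResizeMakeBorderSketch`) are hypothetical and do not reproduce the signatures added by this PR; nearest-neighbour resizing and a constant border are used only to keep the sketch short.

```cpp
#include <cstddef>
#include <vector>

// Single-channel float image stored row-major.
struct Image { std::size_t rows = 0, cols = 0; std::vector<float> data; };
struct Rect { std::size_t x, y, width, height; };

// Step 1: crop a region of interest (assumed to lie inside src).
Image crop(const Image& src, Rect roi) {
    Image dst{roi.height, roi.width, std::vector<float>(roi.height * roi.width)};
    for (std::size_t r = 0; r < roi.height; ++r)
        for (std::size_t c = 0; c < roi.width; ++c)
            dst.data[r * roi.width + c] = src.data[(roi.y + r) * src.cols + roi.x + c];
    return dst;
}

// Step 2: nearest-neighbour resize using integer floor(dst * src / dst_size).
Image resizeNearest(const Image& src, std::size_t rows, std::size_t cols) {
    Image dst{rows, cols, std::vector<float>(rows * cols)};
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            dst.data[r * cols + c] =
                src.data[(r * src.rows / rows) * src.cols + c * src.cols / cols];
    return dst;
}

// Step 3: pad with a constant value, like a constant-border copyMakeBorder.
Image makeBorderConstant(const Image& src, std::size_t top, std::size_t bottom,
                         std::size_t left, std::size_t right, float value) {
    Image dst{src.rows + top + bottom, src.cols + left + right, {}};
    dst.data.assign(dst.rows * dst.cols, value);
    for (std::size_t r = 0; r < src.rows; ++r)
        for (std::size_t c = 0; c < src.cols; ++c)
            dst.data[(r + top) * dst.cols + c + left] = src.data[r * src.cols + c];
    return dst;
}

// Hypothetical integrated helpers: one call per chain.
Image cropResizeSketch(const Image& src, Rect roi, std::size_t rows, std::size_t cols) {
    return resizeNearest(crop(src, roi), rows, cols);
}

Image cropResizeMakeBorderSketch(const Image& src, Rect roi, std::size_t rows,
                                 std::size_t cols, std::size_t top, std::size_t bottom,
                                 std::size_t left, std::size_t right, float value) {
    return makeBorderConstant(cropResizeSketch(src, roi, rows, cols),
                              top, bottom, left, right, value);
}
```

An integrated operator exposes the whole chain as a single call, which is the convenience argued for above; callers no longer manage the intermediate images themselves.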

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [N/A] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
rurban pushed a commit to SpexAI/opencv_contrib that referenced this pull request Mar 28, 2024
Add additional image processing operators for Ascend NPU by utilizing DVPP opencv#3608
