# layer support behavior
This document is for developers implementing new layers in ncnn. It explains the support_XYZ boolean properties in the ncnn::Layer base class. Correctly setting these properties declares the capabilities of your layer to the ncnn inference engine. This allows the engine to apply specific optimizations, such as enabling SIMD, half-precision floating-point computation, or Vulkan GPU acceleration, to achieve optimal performance and memory efficiency.
A layer can set its support properties in two ways:
- Statically in the constructor: If the layer's capabilities are fixed, the simplest way is to set them in its constructor.
- Dynamically in `create_pipeline`: If the layer's capabilities depend on parameters loaded from `load_param` or `load_model` (e.g., the data type of weights), you can set these properties dynamically within the `create_pipeline` method. (A short sketch of both approaches follows below.)
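For illustration, here is a minimal, hedged sketch of both approaches. The layer name and the fp16 condition are invented for this example; a real layer would base the dynamic decision on its own loaded parameters or weights.

```cpp
#include "layer.h" // ncnn's Layer base class

// Hypothetical custom layer showing both ways of declaring capabilities.
class MyLayer : public ncnn::Layer
{
public:
    MyLayer()
    {
        // static: these capabilities never change
        one_blob_only = true;
        support_inplace = true;
    }

    virtual int create_pipeline(const ncnn::Option& opt)
    {
        // dynamic: decided after parameters/weights are known
        // (here simply mirroring the option flag; a real layer might also
        //  check the data type of its loaded weights)
        support_fp16_storage = opt.use_fp16_storage;
        return 0;
    }
};
```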
Here is a detailed breakdown of each support property and what it means for your layer's implementation.
## one_blob_only

- Purpose: Declares that the layer accepts only one input blob and produces only one output blob.
- Requirements if `true`: You must implement the single-input, single-output version of the `forward` method:

  ```cpp
  virtual int forward(const Mat& bottom_blob, Mat& top_blob, const Option& opt) const;
  ```

- Behavior: When `true`, ncnn calls this overload. If `false` (the default), the `std::vector<Mat>` version of `forward` is called. A minimal sketch of this single-blob overload follows this section.
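As a rough sketch of the single-blob `forward` overload, here is a hypothetical element-wise layer that negates every value. It assumes plain `fp32` data with `elempack=1`; the class and operation are purely illustrative.

```cpp
#include "layer.h"

// Hypothetical single-input, single-output layer: negates every element.
class MyNegate : public ncnn::Layer
{
public:
    MyNegate() { one_blob_only = true; }

    virtual int forward(const ncnn::Mat& bottom_blob, ncnn::Mat& top_blob, const ncnn::Option& opt) const
    {
        int w = bottom_blob.w;
        int h = bottom_blob.h;
        int channels = bottom_blob.c;
        size_t elemsize = bottom_blob.elemsize;

        // allocate the output blob with the same shape as the input
        top_blob.create(w, h, channels, elemsize, opt.blob_allocator);
        if (top_blob.empty())
            return -100;

        #pragma omp parallel for num_threads(opt.num_threads)
        for (int q = 0; q < channels; q++)
        {
            const float* inptr = bottom_blob.channel(q);
            float* outptr = top_blob.channel(q);
            for (int i = 0; i < w * h; i++)
            {
                outptr[i] = -inptr[i];
            }
        }

        return 0;
    }
};
```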
## support_inplace

- Purpose: Declares that the layer supports in-place computation, meaning the input and output can share the same memory. This significantly reduces memory overhead.
- Requirements if `true`: You must implement the `forward_inplace` method. Depending on whether `one_blob_only` is also enabled, implement the corresponding version (a minimal sketch follows this section):

  ```cpp
  // If one_blob_only is true
  virtual int forward_inplace(Mat& bottom_top_blob, const Option& opt) const;

  // If one_blob_only is false
  virtual int forward_inplace(std::vector<Mat>& bottom_top_blobs, const Option& opt) const;
  ```
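A minimal, hypothetical in-place layer (single blob, `fp32`, `elempack=1`) that clamps negative values to zero:

```cpp
#include "layer.h"

// Hypothetical in-place layer: clamps negative values to zero in the shared blob.
class MyClampZero : public ncnn::Layer
{
public:
    MyClampZero()
    {
        one_blob_only = true;
        support_inplace = true;
    }

    virtual int forward_inplace(ncnn::Mat& bottom_top_blob, const ncnn::Option& opt) const
    {
        int size = bottom_top_blob.w * bottom_top_blob.h;
        int channels = bottom_top_blob.c;

        #pragma omp parallel for num_threads(opt.num_threads)
        for (int q = 0; q < channels; q++)
        {
            float* ptr = bottom_top_blob.channel(q);
            for (int i = 0; i < size; i++)
            {
                if (ptr[i] < 0.f)
                    ptr[i] = 0.f;
            }
        }

        return 0;
    }
};
```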
## support_vulkan

- Purpose: Declares that the layer has a Vulkan implementation for GPU-accelerated inference (see the declaration sketch after this list).
- Requirements if `true`:
  - Implement `forward` / `forward_inplace` methods that accept `VkMat` for input and output.
  - Implement `upload_model` to transfer weight data to the GPU.
  - Implement `create_pipeline` and `destroy_pipeline` to manage Vulkan `Pipeline` objects and other GPU resources.
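For orientation, the Vulkan-side entry points usually look roughly like the declarations below. This is a sketch only; verify the exact signatures against `layer.h` of the ncnn version you are building against, and note that these overrides are only available when ncnn is built with Vulkan support.

```cpp
#include "layer.h" // VkMat, VkCompute, VkTransfer are available in Vulkan-enabled builds

// Sketch of the Vulkan-related overrides a GPU-capable layer typically provides.
class MyVulkanLayer : public ncnn::Layer
{
public:
    MyVulkanLayer()
    {
        support_vulkan = true;
    }

    // create/destroy GPU pipelines and other resources
    virtual int create_pipeline(const ncnn::Option& opt);
    virtual int destroy_pipeline(const ncnn::Option& opt);

    // upload weight data to device-local memory
    virtual int upload_model(ncnn::VkTransfer& cmd, const ncnn::Option& opt);

    // GPU forward, recording work into the compute command buffer
    virtual int forward(const ncnn::VkMat& bottom_blob, ncnn::VkMat& top_blob,
                        ncnn::VkCompute& cmd, const ncnn::Option& opt) const;
};
```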
## support_packing

- Purpose: Declares that the layer's CPU implementation can handle `Mat` data with a "packing" memory layout (i.e., `elempack > 1`). This is crucial for SIMD optimizations (e.g., processing 4 or 8 floats at once with NEON or AVX).
- Behavior if `true`:
  - When the input `Mat` channel count is a multiple of the SIMD width, the ncnn engine ensures that the input `Mat` passed to `forward` / `forward_inplace` is packed (e.g., `elempack=4` or `elempack=8`).
  - Your implementation must correctly handle `Mat` data where `cstep` and `elempack` are not their default values.
- Behavior if `false`: The ncnn engine guarantees that the input `Mat` passed to your layer will always have `elempack=1`. The engine will automatically insert conversions if the preceding layer produced a packed output.
- Output: Regardless of the property's value, your layer can output a `Mat` with any `elempack`. However, it is highly recommended to output a `Mat` with an adaptive `elempack` to avoid unnecessary conversions in subsequent layers. A minimal packing-aware sketch follows this list.
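For a purely element-wise operation, a packed `Mat` can be treated as a flat array per channel, so tolerating `elempack > 1` only requires scaling the loop bound. The layer and operation below are hypothetical; a real implementation would add NEON/AVX intrinsics on top of this structure.

```cpp
#include "layer.h"

// Hypothetical packing-aware element-wise operation (multiply by 2, fp32).
// Each channel holds w * h * elempack scalar values, so the same loop works
// for elempack = 1, 4, or 8.
class MyScale2 : public ncnn::Layer
{
public:
    MyScale2()
    {
        one_blob_only = true;
        support_inplace = true;
        support_packing = true;
    }

    virtual int forward_inplace(ncnn::Mat& bottom_top_blob, const ncnn::Option& opt) const
    {
        int elempack = bottom_top_blob.elempack;
        int size = bottom_top_blob.w * bottom_top_blob.h * elempack;
        int channels = bottom_top_blob.c;

        #pragma omp parallel for num_threads(opt.num_threads)
        for (int q = 0; q < channels; q++)
        {
            float* ptr = bottom_top_blob.channel(q);
            for (int i = 0; i < size; i++)
            {
                ptr[i] *= 2.f;
            }
        }

        return 0;
    }
};
```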
## support_any_packing

- Purpose: An extension of `support_packing`. It declares that the layer's CPU implementation is flexible enough to handle a `Mat` with any `elempack` value (`1`, `4`, `8`, etc.).
- Behavior if `true`:
  - The ncnn engine can pass an input `Mat` with any packing layout to your `forward` method, without forcing a conversion to the hardware's "optimal" `elempack`. For example, on an AVX-512 system where `elempack=16` is optimal, your layer can still accept `elempack=1`, `4`, or `8`.
  - This gives the engine more flexibility to avoid unnecessary packing/unpacking conversions between layers.
- Behavior if `false`: If `false` (but `support_packing` is `true`), the engine will try to provide an input `Mat` with an optimal `elempack` for the target architecture.
- Output: This property does not enforce any constraint on the output `Mat`, which can have any `elempack`.
## support_vulkan_packing

- Purpose: This is the Vulkan equivalent of `support_packing`. It declares that the layer's Vulkan implementation can handle `VkMat` with `elempack=4`.
- Behavior if `true`: When the input `VkMat` has a channel count that is a multiple of 4, the ncnn engine will provide a packed `VkMat` (with `elempack=4`) to your Vulkan `forward` methods.
- Behavior if `false`: The engine will ensure the input `VkMat` has `elempack=1`.
- Note: `support_packing` and `support_vulkan_packing` are independent. A layer can support packing on CPU but not on Vulkan, or vice versa.
## Vulkan any-packing support

- Purpose: An extension of `support_vulkan_packing`. It declares that the layer's Vulkan implementation can handle a `VkMat` with any supported `elempack` value (e.g., `1`, `4`).
- Behavior if `true`:
  - The ncnn engine can pass an input `VkMat` with any supported packing layout to your Vulkan `forward` method. This allows the engine to avoid unnecessary repacking operations on the GPU.
  - This is particularly useful for optimizing shader dispatch and memory access patterns.
- Behavior if `false`: If `false` (but `support_vulkan_packing` is `true`), the engine will try to provide a `VkMat` with `elempack=4` if the channel count is a multiple of 4.
- Note: This property is independent of its CPU counterpart, `support_any_packing`.
## support_bf16_storage

- Purpose: Declares that the layer can process `bfloat16` data.
- Behavior if `true`:
  - The `forward` method may receive an input `Mat` of type `bfloat16` (`elembits() == 16`) or `fp32`.
  - Inside your `forward` implementation, you must check `opt.use_bf16_storage` and `bottom_blob.elembits()` to determine whether to use a `bfloat16`-optimized code path.
- Behavior if `false`: The ncnn engine ensures your layer will not receive a `bfloat16` `Mat`.
- Output: Your layer can output either a `bfloat16` or `fp32` `Mat`. When `opt.use_bf16_storage` is active, outputting `bfloat16` is recommended to maintain precision and performance across the network. A scalar bf16 code-path sketch follows this section.
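A minimal sketch of what a bf16 code path can look like for an element-wise operation, assuming ncnn's `float32_to_bfloat16` / `bfloat16_to_float32` helpers declared in `mat.h`. The function name and operation are hypothetical, and real ARM layers vectorize this with NEON rather than converting scalar by scalar.

```cpp
#include "mat.h"    // ncnn::Mat and bfloat16 conversion helpers
#include "option.h" // ncnn::Option

// Hypothetical bf16 path of an element-wise operation (multiply by 2).
// bf16 values are stored as unsigned short; convert to fp32, compute, convert back.
static int scale2_inplace_bf16s(ncnn::Mat& bottom_top_blob, const ncnn::Option& opt)
{
    int size = bottom_top_blob.w * bottom_top_blob.h * bottom_top_blob.elempack;
    int channels = bottom_top_blob.c;

    #pragma omp parallel for num_threads(opt.num_threads)
    for (int q = 0; q < channels; q++)
    {
        unsigned short* ptr = bottom_top_blob.channel(q);
        for (int i = 0; i < size; i++)
        {
            float v = ncnn::bfloat16_to_float32(ptr[i]);
            ptr[i] = ncnn::float32_to_bfloat16(v * 2.f);
        }
    }

    return 0;
}
```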
## support_fp16_storage

- Purpose: Declares that the layer can process `float16` data for half-precision inference.
- Behavior if `true`:
  - Similar to `support_bf16_storage`, the `forward` method may receive an `fp16` or `fp32` `Mat`.
  - Your implementation should check `opt.use_fp16_storage` and `bottom_blob.elembits()` to select the correct code path.
- Behavior if `false`: The ncnn engine ensures your layer will not receive an `fp16` `Mat`.
- Output: Your layer can output either an `fp16` or `fp32` `Mat`. When `opt.use_fp16_storage` is active, outputting an `fp16` `Mat` is recommended.
## support_int8_storage

- Purpose: Declares that the layer supports `int8` quantized inference.
- Behavior if `true`:
  - When `opt.use_int8_inference` is `true`, the `forward` method may receive an `int8` or `fp32` `Mat`.
  - Important: If the input is `fp32`, your `forward` implementation is responsible for dynamically quantizing it to `int8` before performing computations.
- Behavior if `false`: The ncnn engine ensures your layer will not receive an `int8` `Mat`.
- Output: The output can be `int8` or `fp32`, depending on your layer's design. A sketch of dynamic quantization follows this section.
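As a sketch of the "dynamically quantize the fp32 input" responsibility mentioned above: the helper below is hypothetical and uses a single hard-coded scale, whereas real int8 layers obtain per-blob or per-channel scales from the quantization tables loaded with the model.

```cpp
#include <math.h>
#include "mat.h"
#include "option.h"

// Hypothetical helper: quantize an fp32 Mat to int8 with a single scale factor.
static void quantize_to_int8_sketch(const ncnn::Mat& src, ncnn::Mat& dst, float scale, const ncnn::Option& opt)
{
    int w = src.w;
    int h = src.h;
    int channels = src.c;

    // int8 data uses 1 byte per element
    dst.create(w, h, channels, (size_t)1u, opt.workspace_allocator);

    for (int q = 0; q < channels; q++)
    {
        const float* inptr = src.channel(q);
        signed char* outptr = dst.channel(q);
        for (int i = 0; i < w * h; i++)
        {
            // round to nearest and saturate to the int8 range
            int v = (int)roundf(inptr[i] * scale);
            if (v > 127) v = 127;
            if (v < -127) v = -127;
            outptr[i] = (signed char)v;
        }
    }
}
```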
## Combining fp16 and bf16 storage

A layer can set `support_fp16_storage` and `support_bf16_storage` to `true` simultaneously. The ncnn engine prioritizes these formats based on the `Option` flags. As seen in the `convert_layout` function in `src/net.cpp`, if `opt.use_bf16_storage` is `true`, the engine will prefer converting inputs to `bfloat16`. Otherwise, it falls back to `fp16` if `opt.use_fp16_storage` is `true`.
The chosen `elempack` also depends on the precision. For instance, with SIMD, the priority might be:

- FP16: `elempack=8` (if supported), then `elempack=4`, then `1`.
- BF16: `elempack=4`, then `1`.

Your `forward` implementation should reflect this by checking `elembits()` and `elempack` to dispatch to the correct kernel.
## Example: Clip_arm

The `Clip_arm` layer provides a great example of these concepts in practice.
- Declaring Support in the Constructor: It declares support for packing and, conditionally, for fp16 and bf16 storage.

  ```cpp
  // From: src/layer/arm/clip_arm.cpp
  Clip_arm::Clip_arm()
  {
  #if __ARM_NEON
      support_packing = true;
  #if NCNN_ARM82
      support_fp16_storage = cpu_support_arm_asimdhp();
  #endif
  #endif // __ARM_NEON

  #if NCNN_BF16
      support_bf16_storage = true;
  #endif
  }
  ```
- Dispatching in `forward_inplace`: The `forward_inplace` method acts as a dispatcher. It first checks the element size (`elembits`) and the corresponding `opt` flag to decide whether to call a specialized low-precision implementation (`fp16s` or `bf16s`). If neither is applicable, it defaults to the standard `fp32` implementation.

  ```cpp
  // From: src/layer/arm/clip_arm.cpp
  int Clip_arm::forward_inplace(Mat& bottom_top_blob, const Option& opt) const
  {
      int elembits = bottom_top_blob.elembits();

  #if NCNN_ARM82
      if (support_fp16_storage && opt.use_fp16_storage && elembits == 16)
          return forward_inplace_fp16s(bottom_top_blob, opt);
  #endif

  #if NCNN_BF16
      if (opt.use_bf16_storage && elembits == 16)
          return forward_inplace_bf16s(bottom_top_blob, opt);
  #endif

      // Default fp32 implementation follows...
      int w = bottom_top_blob.w;
      // ...
  }
  ```
## Incremental development workflow

Adopting a gradual approach can simplify the development of a new layer:
- Implement the Core Algorithm: Start with all `support_XYZ` properties set to `false`. Focus on getting the mathematical logic correct using standard `fp32` data and `elempack=1`.
- Add Packing Support: Once the core logic is validated, set `support_packing = true`. Modify your code to handle `elempack > 1` and implement SIMD optimizations (e.g., using NEON intrinsics).
- Add Low-Precision Support: Next, add support for `fp16`, `bf16`, or `int8`. Set the corresponding `support_*_storage` flags to `true` and add branches in your `forward` method to handle these data types based on the `opt` flags.
- Add Vulkan Support: Finally, if GPU acceleration is desired, set `support_vulkan = true` and implement the Vulkan-specific methods.
This incremental process allows you to tackle one challenge at a time, making it easier to develop a highly optimized and feature-rich layer.