C++ benchmarks CMake error caused by enable_fp16 option in generate.py #734

rtxxxpro · 2025-01-13T02:36:23Z

When I followed the guidance here to compile the C++ benchmarks, an error was raised during the CMake step. The log shows that generate.py was called without its required option --enable_fp16. After briefly reading CMakeLists.txt and generate.py, I suspect the argument at line 290 should be --enable_f16 instead of --enable_fp16.
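The suspected naming mismatch can be reproduced in isolation with a small argparse sketch. The parser below is hypothetical and is not the real generate.py: it keeps only the two required dtype flags visible in the usage message from the log, and it assumes the caller spells the fp16 flag as --enable_f16.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical, trimmed-down parser modeled on the usage message in the log:
    # the fp16 switch is a *required* option named --enable_fp16.
    parser = argparse.ArgumentParser("Generate cuda files")
    parser.add_argument("--enable_fp16", nargs="+", required=True)
    parser.add_argument("--enable_bf16", nargs="+", required=True)
    return parser


def try_parse(argv):
    """Return the parsed namespace, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse prints the usage and error text to stderr, then exits (status 2).
        return None


# Assumed caller spelling (--enable_f16): --enable_fp16 is never supplied, so
# argparse reports "the following arguments are required: --enable_fp16".
mismatched = try_parse(["--enable_f16", "true", "--enable_bf16", "true"])

# Caller using the name the parser actually declares: parses fine.
matched = try_parse(["--enable_fp16", "true", "--enable_bf16", "true"])
```

Note that argparse checks required options before complaining about unrecognized ones, which is why the log only mentions the missing --enable_fp16 and not the unknown spelling.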

Commands:

mkdir build
cd build
cp ../cmake/config.cmake ./
cmake .. -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release
make

CMake output:

-- The CUDA compiler identification is NVIDIA 12.3.103 with host compiler GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Python3: /home/rtx/miniconda3/bin/python3.12 (found version "3.12.2") found components: Interpreter
-- CMAKE_CUDA_ARCHITECTURES set to native.
-- NVBench and GoogleTest enabled
-- Testing CXX17 Support: TRUE
-- Testing CXX20 Support: TRUE
-- Testing CUDA17 Support: TRUE
-- Testing CUDA20 Support: TRUE
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.3.103")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Creating symlink from /home/rtx/nvidia/flashinfer/compile_commands.json to /home/rtx/nvidia/flashinfer/build/compile_commands.json...
-- Performing Test NVBench_CXX_FLAG__Wall
-- Performing Test NVBench_CXX_FLAG__Wall - Success
-- Performing Test NVBench_CXX_FLAG__Wextra
-- Performing Test NVBench_CXX_FLAG__Wextra - Success
-- Performing Test NVBench_CXX_FLAG__Wconversion
-- Performing Test NVBench_CXX_FLAG__Wconversion - Success
-- Performing Test NVBench_CXX_FLAG__Woverloaded_virtual
-- Performing Test NVBench_CXX_FLAG__Woverloaded_virtual - Success
-- Performing Test NVBench_CXX_FLAG__Wcast_qual
-- Performing Test NVBench_CXX_FLAG__Wcast_qual - Success
-- Performing Test NVBench_CXX_FLAG__Wpointer_arith
-- Performing Test NVBench_CXX_FLAG__Wpointer_arith - Success
-- Performing Test NVBench_CXX_FLAG__Wunused_local_typedef
-- Performing Test NVBench_CXX_FLAG__Wunused_local_typedef - Failed
-- Performing Test NVBench_CXX_FLAG__Wunused_parameter
-- Performing Test NVBench_CXX_FLAG__Wunused_parameter - Success
-- Performing Test NVBench_CXX_FLAG__Wvla
-- Performing Test NVBench_CXX_FLAG__Wvla - Success
-- Performing Test NVBench_CXX_FLAG__Wgnu
-- Performing Test NVBench_CXX_FLAG__Wgnu - Failed
-- Performing Test NVBench_CXX_FLAG__Wno_gnu_line_marker
-- Performing Test NVBench_CXX_FLAG__Wno_gnu_line_marker - Success
-- Found Git: /usr/bin/git (found version "2.34.1")
-- CPM: Using local package [email protected]
-- CPM: Adding package [email protected] (3.11.3)
CMake Warning (dev) at /home/rtx/miniconda3/share/cmake-3.31/Modules/FetchContent.cmake:1953 (message):
  Calling FetchContent_Populate(nlohmann_json) is deprecated, call
  FetchContent_MakeAvailable(nlohmann_json) instead.  Policy CMP0169 can be
  set to OLD to allow FetchContent_Populate(nlohmann_json) to be called
  directly for now, but the ability to call it with declared details will be
  removed completely in a future version.
Call Stack (most recent call first):
  build/cmake/CPM_0.38.5.cmake:1004 (FetchContent_Populate)
  build/cmake/CPM_0.38.5.cmake:798 (cpm_fetch_package)
  build/cmake/CPM_0.38.5.cmake:306 (CPMAddPackage)
  build/_deps/rapids-cmake-src/rapids-cmake/cpm/find.cmake:176 (CPMFindPackage)
  3rdparty/nvbench/cmake/NVBenchDependencies.cmake:27 (rapids_cpm_find)
  3rdparty/nvbench/CMakeLists.txt:57 (include)
This warning is for project developers.  Use -Wno-dev to suppress it.

-- NVBench CUDA architectures: native
Detected NVIDIA/CUDA.
-- Found clang-format: /usr/bin/clang-format
STATUS,black not found.
-- Found IBVerbs: /usr/include
-- Found NUMA: /usr/include
-- Using the multi-header code from /home/rtx/nvidia/flashinfer/build/_deps/json-src/include/
CMake Warning (dev) at 3rdparty/mscclpp/CMakeLists.txt:141 (install):
  Policy CMP0177 is not set: install() DESTINATION paths are normalized.  Run
  "cmake --help-policy CMP0177" for policy details.  Use the cmake_policy
  command to set the policy and suppress this warning.
This warning is for project developers.  Use -Wno-dev to suppress it.

CMake Warning (dev) at 3rdparty/mscclpp/CMakeLists.txt:143 (install):
  Policy CMP0177 is not set: install() DESTINATION paths are normalized.  Run
  "cmake --help-policy CMP0177" for policy details.  Use the cmake_policy
  command to set the policy and suppress this warning.
This warning is for project developers.  Use -Wno-dev to suppress it.

CMake Warning (dev) at 3rdparty/mscclpp/CMakeLists.txt:145 (install):
  Policy CMP0177 is not set: install() DESTINATION paths are normalized.  Run
  "cmake --help-policy CMP0177" for policy details.  Use the cmake_policy
  command to set the policy and suppress this warning.
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Found MPI_CXX: /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- The C compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Found Python: /home/rtx/miniconda3/bin/python3.12 (found suitable version "3.12.2", minimum required is "3.8") found components: Interpreter Development.Module
-- Found Thrust: /usr/local/cuda/include (found version "200200")
-- Compile fp8_e4m3 kernels.
-- Compile fp8_e5m2 kernels.
-- Compile bf16 kernels.
-- FLASHINFER_HEAD_DIMS=64;128;256;512
-- FLASHINFER_POS_ENCODING_MODES=0;1;2
-- FLASHINFER_ALLOW_FP16_QK_REDUCTIONS=false;true
-- FLASHINFER_MASK_MODES=0;1;2
usage: Generate cuda files [-h] --path PATH --head_dims HEAD_DIMS
                           [HEAD_DIMS ...] --pos_encoding_modes
                           POS_ENCODING_MODES [POS_ENCODING_MODES ...]
                           --allow_fp16_qk_reductions ALLOW_FP16_QK_REDUCTIONS
                           [ALLOW_FP16_QK_REDUCTIONS ...] --mask_modes
                           MASK_MODES [MASK_MODES ...] --enable_fp16
                           ENABLE_FP16 [ENABLE_FP16 ...] --enable_bf16
                           ENABLE_BF16 [ENABLE_BF16 ...]
                           [--enable_fp8_e4m3 ENABLE_FP8_E4M3 [ENABLE_FP8_E4M3 ...]]
                           [--enable_fp8_e5m2 ENABLE_FP8_E5M2 [ENABLE_FP8_E5M2 ...]]
Generate cuda files: error: the following arguments are required: --enable_fp16
-- Compile single decode kernel benchmarks.
-- Compile single decode kernel tests.
-- Compile batch decode kernel benchmarks.
-- Compile batch mla decode kernel benchmarks.
-- Compile batch decode kernel tests.
-- Compile single prefill kernel benchmarks
-- Compile single prefill kernel tests.
-- Compile batch prefill kernel benchmarks.
-- Compile batch prefill kernel tests.
-- Compile page kernel tests.
-- Compile cascade kernel benchmarks.
-- Compile cascade kernel tests.
-- Compile sampling kernel benchmarks.
-- Compile sampling kernel tests.
-- Compile normalization kernel benchmarks.
-- Compile normalization kernel tests.
-- Compile tvm binding.
-- FlashInfer uses TVM home /home/rtx/nvidia/exp/tvm.
-- Compile fastdiv test.
-- Compile fast dequant test.
-- Compile sum all-reduce kernel tests.
-- Compile attention allreduce kernel tests.
-- Configuring done (31.5s)
-- Generating done (0.5s)
-- Build files have been written to: /home/rtx/nvidia/flashinfer/build

Unix Makefiles output:

[  0%] Generating kernel sources
usage: Generate cuda files [-h] --path PATH --head_dims HEAD_DIMS [HEAD_DIMS ...]
                           --pos_encoding_modes POS_ENCODING_MODES [POS_ENCODING_MODES ...]
                           --allow_fp16_qk_reductions ALLOW_FP16_QK_REDUCTIONS
                           [ALLOW_FP16_QK_REDUCTIONS ...] --mask_modes MASK_MODES
                           [MASK_MODES ...] --enable_fp16 ENABLE_FP16 [ENABLE_FP16 ...]
                           --enable_bf16 ENABLE_BF16 [ENABLE_BF16 ...]
                           [--enable_fp8_e4m3 ENABLE_FP8_E4M3 [ENABLE_FP8_E4M3 ...]]
                           [--enable_fp8_e5m2 ENABLE_FP8_E5M2 [ENABLE_FP8_E5M2 ...]]
Generate cuda files: error: the following arguments are required: --enable_fp16
make[2]: *** [CMakeFiles/decode_kernels.dir/build.make:86: /home/rtx/nvidia/flashinfer/src/generated/batch_paged_decode_head_128_posenc_0_dtypeq_bf16_dtypekv_bf16_dtypeout_bf16_idtype_i32.cu] Error 2
make[1]: *** [CMakeFiles/Makefile2:682: CMakeFiles/decode_kernels.dir/all] Error 2
make: *** [Makefile:136: all] Error 2

Renaming the argument at line 290 from --enable_fp16 to --enable_f16, as suggested above, solved this problem.
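A minimal sketch of the fix under the same assumptions. Instead of a straight rename, this hypothetical variant registers both spellings on one argument (argparse accepts several option strings per add_argument call) and stores them under a single dest, so either caller spelling parses. The names and values are illustrative only and are not taken from the FlashInfer sources.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser("Generate cuda files")
    # Hypothetical fix: accept --enable_f16 (assumed CMake spelling) and keep
    # --enable_fp16 as an alias; both write to args.enable_f16.
    parser.add_argument(
        "--enable_f16", "--enable_fp16", dest="enable_f16", nargs="+", required=True
    )
    parser.add_argument("--enable_bf16", nargs="+", required=True)
    return parser


# Both spellings now resolve to the same destination attribute.
args_new = build_parser().parse_args(["--enable_f16", "true", "--enable_bf16", "true"])
args_old = build_parser().parse_args(["--enable_fp16", "false", "--enable_bf16", "true"])
```

A plain rename is the smaller change; the alias only matters if other callers still pass the old spelling.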
