C++ benchmarks CMake error caused by enable_fp16 option in generate.py #734

Open
rtxxxpro opened this issue Jan 13, 2025 · 0 comments

When I followed the guidance here to compile the C++ benchmarks, an error was raised during the CMake step. The log reports that the required option --enable_fp16 is missing when generate.py is called. After briefly reading CMakeLists.txt and generate.py, I think the argument at line 290 should be --enable_f16 instead of --enable_fp16.

Commands:

mkdir build
cd build
cp ../cmake/config.cmake ./
cmake .. -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release
make

CMake output:

-- The CUDA compiler identification is NVIDIA 12.3.103 with host compiler GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Python3: /home/rtx/miniconda3/bin/python3.12 (found version "3.12.2") found components: Interpreter
-- CMAKE_CUDA_ARCHITECTURES set to native.
-- NVBench and GoogleTest enabled
-- Testing CXX17 Support: TRUE
-- Testing CXX20 Support: TRUE
-- Testing CUDA17 Support: TRUE
-- Testing CUDA20 Support: TRUE
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.3.103")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Creating symlink from /home/rtx/nvidia/flashinfer/compile_commands.json to /home/rtx/nvidia/flashinfer/build/compile_commands.json...
-- Performing Test NVBench_CXX_FLAG__Wall
-- Performing Test NVBench_CXX_FLAG__Wall - Success
-- Performing Test NVBench_CXX_FLAG__Wextra
-- Performing Test NVBench_CXX_FLAG__Wextra - Success
-- Performing Test NVBench_CXX_FLAG__Wconversion
-- Performing Test NVBench_CXX_FLAG__Wconversion - Success
-- Performing Test NVBench_CXX_FLAG__Woverloaded_virtual
-- Performing Test NVBench_CXX_FLAG__Woverloaded_virtual - Success
-- Performing Test NVBench_CXX_FLAG__Wcast_qual
-- Performing Test NVBench_CXX_FLAG__Wcast_qual - Success
-- Performing Test NVBench_CXX_FLAG__Wpointer_arith
-- Performing Test NVBench_CXX_FLAG__Wpointer_arith - Success
-- Performing Test NVBench_CXX_FLAG__Wunused_local_typedef
-- Performing Test NVBench_CXX_FLAG__Wunused_local_typedef - Failed
-- Performing Test NVBench_CXX_FLAG__Wunused_parameter
-- Performing Test NVBench_CXX_FLAG__Wunused_parameter - Success
-- Performing Test NVBench_CXX_FLAG__Wvla
-- Performing Test NVBench_CXX_FLAG__Wvla - Success
-- Performing Test NVBench_CXX_FLAG__Wgnu
-- Performing Test NVBench_CXX_FLAG__Wgnu - Failed
-- Performing Test NVBench_CXX_FLAG__Wno_gnu_line_marker
-- Performing Test NVBench_CXX_FLAG__Wno_gnu_line_marker - Success
-- Found Git: /usr/bin/git (found version "2.34.1")
-- CPM: Using local package [email protected]
-- CPM: Adding package [email protected] (3.11.3)
CMake Warning (dev) at /home/rtx/miniconda3/share/cmake-3.31/Modules/FetchContent.cmake:1953 (message):
  Calling FetchContent_Populate(nlohmann_json) is deprecated, call
  FetchContent_MakeAvailable(nlohmann_json) instead.  Policy CMP0169 can be
  set to OLD to allow FetchContent_Populate(nlohmann_json) to be called
  directly for now, but the ability to call it with declared details will be
  removed completely in a future version.
Call Stack (most recent call first):
  build/cmake/CPM_0.38.5.cmake:1004 (FetchContent_Populate)
  build/cmake/CPM_0.38.5.cmake:798 (cpm_fetch_package)
  build/cmake/CPM_0.38.5.cmake:306 (CPMAddPackage)
  build/_deps/rapids-cmake-src/rapids-cmake/cpm/find.cmake:176 (CPMFindPackage)
  3rdparty/nvbench/cmake/NVBenchDependencies.cmake:27 (rapids_cpm_find)
  3rdparty/nvbench/CMakeLists.txt:57 (include)
This warning is for project developers.  Use -Wno-dev to suppress it.

-- NVBench CUDA architectures: native
Detected NVIDIA/CUDA.
-- Found clang-format: /usr/bin/clang-format
STATUS,black not found.
-- Found IBVerbs: /usr/include
-- Found NUMA: /usr/include
-- Using the multi-header code from /home/rtx/nvidia/flashinfer/build/_deps/json-src/include/
CMake Warning (dev) at 3rdparty/mscclpp/CMakeLists.txt:141 (install):
  Policy CMP0177 is not set: install() DESTINATION paths are normalized.  Run
  "cmake --help-policy CMP0177" for policy details.  Use the cmake_policy
  command to set the policy and suppress this warning.
This warning is for project developers.  Use -Wno-dev to suppress it.

CMake Warning (dev) at 3rdparty/mscclpp/CMakeLists.txt:143 (install):
  Policy CMP0177 is not set: install() DESTINATION paths are normalized.  Run
  "cmake --help-policy CMP0177" for policy details.  Use the cmake_policy
  command to set the policy and suppress this warning.
This warning is for project developers.  Use -Wno-dev to suppress it.

CMake Warning (dev) at 3rdparty/mscclpp/CMakeLists.txt:145 (install):
  Policy CMP0177 is not set: install() DESTINATION paths are normalized.  Run
  "cmake --help-policy CMP0177" for policy details.  Use the cmake_policy
  command to set the policy and suppress this warning.
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Found MPI_CXX: /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- The C compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Found Python: /home/rtx/miniconda3/bin/python3.12 (found suitable version "3.12.2", minimum required is "3.8") found components: Interpreter Development.Module
-- Found Thrust: /usr/local/cuda/include (found version "200200")
-- Compile fp8_e4m3 kernels.
-- Compile fp8_e5m2 kernels.
-- Compile bf16 kernels.
-- FLASHINFER_HEAD_DIMS=64;128;256;512
-- FLASHINFER_POS_ENCODING_MODES=0;1;2
-- FLASHINFER_ALLOW_FP16_QK_REDUCTIONS=false;true
-- FLASHINFER_MASK_MODES=0;1;2
usage: Generate cuda files [-h] --path PATH --head_dims HEAD_DIMS
                           [HEAD_DIMS ...] --pos_encoding_modes
                           POS_ENCODING_MODES [POS_ENCODING_MODES ...]
                           --allow_fp16_qk_reductions ALLOW_FP16_QK_REDUCTIONS
                           [ALLOW_FP16_QK_REDUCTIONS ...] --mask_modes
                           MASK_MODES [MASK_MODES ...] --enable_fp16
                           ENABLE_FP16 [ENABLE_FP16 ...] --enable_bf16
                           ENABLE_BF16 [ENABLE_BF16 ...]
                           [--enable_fp8_e4m3 ENABLE_FP8_E4M3 [ENABLE_FP8_E4M3 ...]]
                           [--enable_fp8_e5m2 ENABLE_FP8_E5M2 [ENABLE_FP8_E5M2 ...]]
Generate cuda files: error: the following arguments are required: --enable_fp16
-- Compile single decode kernel benchmarks.
-- Compile single decode kernel tests.
-- Compile batch decode kernel benchmarks.
-- Compile batch mla decode kernel benchmarks.
-- Compile batch decode kernel tests.
-- Compile single prefill kernel benchmarks
-- Compile single prefill kernel tests.
-- Compile batch prefill kernel benchmarks.
-- Compile batch prefill kernel tests.
-- Compile page kernel tests.
-- Compile cascade kernel benchmarks.
-- Compile cascade kernel tests.
-- Compile sampling kernel benchmarks.
-- Compile sampling kernel tests.
-- Compile normalization kernel benchmarks.
-- Compile normalization kernel tests.
-- Compile tvm binding.
-- FlashInfer uses TVM home /home/rtx/nvidia/exp/tvm.
-- Compile fastdiv test.
-- Compile fast dequant test.
-- Compile sum all-reduce kernel tests.
-- Compile attention allreduce kernel tests.
-- Configuring done (31.5s)
-- Generating done (0.5s)
-- Build files have been written to: /home/rtx/nvidia/flashinfer/build

Unix Makefiles output:

[  0%] Generating kernel sources
usage: Generate cuda files [-h] --path PATH --head_dims HEAD_DIMS [HEAD_DIMS ...]
                           --pos_encoding_modes POS_ENCODING_MODES [POS_ENCODING_MODES ...]
                           --allow_fp16_qk_reductions ALLOW_FP16_QK_REDUCTIONS
                           [ALLOW_FP16_QK_REDUCTIONS ...] --mask_modes MASK_MODES
                           [MASK_MODES ...] --enable_fp16 ENABLE_FP16 [ENABLE_FP16 ...]
                           --enable_bf16 ENABLE_BF16 [ENABLE_BF16 ...]
                           [--enable_fp8_e4m3 ENABLE_FP8_E4M3 [ENABLE_FP8_E4M3 ...]]
                           [--enable_fp8_e5m2 ENABLE_FP8_E5M2 [ENABLE_FP8_E5M2 ...]]
Generate cuda files: error: the following arguments are required: --enable_fp16
make[2]: *** [CMakeFiles/decode_kernels.dir/build.make:86: /home/rtx/nvidia/flashinfer/src/generated/batch_paged_decode_head_128_posenc_0_dtypeq_bf16_dtypekv_bf16_dtypeout_bf16_idtype_i32.cu] Error 2
make[1]: *** [CMakeFiles/Makefile2:682: CMakeFiles/decode_kernels.dir/all] Error 2
make: *** [Makefile:136: all] Error 2

The simple modification I mentioned above solved this problem.
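
For illustration, here is a minimal sketch of how this kind of flag-name mismatch produces the error in the logs. It is not the actual generate.py: `build_parser` and `try_parse` are hypothetical helpers, and only the program name, the flag name, and its "required" status are taken from the usage text above. Because `--enable_f16` is not a prefix of `--enable_fp16`, argparse does not treat it as an abbreviation, so the required option counts as missing and the process exits with code 2 (which is what make reports as "Error 2").

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the parser in generate.py; only the flag name
    # and its "required" status come from the usage text in the log.
    parser = argparse.ArgumentParser("Generate cuda files")
    parser.add_argument("--enable_fp16", nargs="+", required=True)
    return parser


def try_parse(argv):
    """Return (ok, exit_code); argparse exits with code 2 on a usage error."""
    try:
        build_parser().parse_args(argv)
        return True, 0
    except SystemExit as exc:
        return False, exc.code


if __name__ == "__main__":
    # Caller and parser agree on the flag name: parsing succeeds.
    print(try_parse(["--enable_fp16", "true"]))
    # Caller passes a differently spelled flag: the required option is
    # reported as missing and argparse exits.
    print(try_parse(["--enable_f16", "true"]))
```

Either side can be changed to fix the mismatch, as long as the flag name passed by the build script and the one declared by the parser are spelled the same.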
