metal: accelerated conv2d #17175

bghira · 2025-11-11T19:03:06Z

This is a pull of ggml-org/ggml#1384 into the llama.cpp repository for review/sync to ggml, since I'm mostly unfamiliar with the contribution process.

I noted a lack of Metal-accelerated ops in GGML and thought Conv2d would be a simple target for my first contribution.

Performance tests on an M3 Max (the only hardware I have for testing) show a substantial boost from leveraging simdgroup, roughly 11x to 69x over CPU:

| Shape | Metal (GFLOPS) | CPU (GFLOPS) |
| --- | --- | --- |
| 19x19, Cin=256, Cout=4096, fp32 | 191.6 | 17.1 |
| 224x224, Cin=3, Cout=8, fp32 | 103.0 | 1.5 |
| 58x58, Cin=32, Cout=64, fp32 | 159.3 | 7.0 |
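The PR does not say how the GFLOPS figures were derived. A common convention, assumed here, counts each multiply-add of a direct convolution as 2 FLOPs and divides by the measured wall time. The helper names below (`conv2d_flops`, `gflops`, `speedup`) are hypothetical and not part of ggml; the kernel size and batch are parameters because the table does not list them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a ggml API): FLOPs of a direct 2D convolution,
// counting every multiply-add as 2 floating-point operations.
inline std::uint64_t conv2d_flops(std::uint64_t out_w, std::uint64_t out_h,
                                  std::uint64_t out_c, std::uint64_t in_c,
                                  std::uint64_t k_w, std::uint64_t k_h,
                                  std::uint64_t batch) {
    return 2 * batch * out_c * out_h * out_w * in_c * k_h * k_w;
}

// Convert a FLOP count and a measured duration into GFLOPS.
inline double gflops(std::uint64_t flops, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(flops) / seconds / 1e9;
}

// Ratio of two throughput figures, e.g. Metal GFLOPS over CPU GFLOPS.
inline double speedup(double accelerated, double baseline) {
    assert(baseline > 0.0);
    return accelerated / baseline;
}
```

Applied to the table, `speedup(191.6, 17.1)` is about 11.2, `speedup(103.0, 1.5)` about 68.7, and `speedup(159.3, 7.0)` about 22.8.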

Copilot-generated summary:

This pull request adds support for 2D convolution (CONV_2D) operations in the Metal backend of GGML, enabling hardware-accelerated execution of this operation on supported Apple devices. The changes include the implementation of the Metal kernel, integration into the operation pipeline, and updates to device capability checks and argument structures.

2D Convolution (CONV_2D) Support:

- Added a new Metal kernel `kernel_conv_2d` in `ggml-metal.metal` for efficient 2D convolution, with template instantiations for both `float` and `half`.
- Introduced the `ggml_metal_kargs_conv_2d` argument struct in `ggml-metal-impl.h` to pass necessary parameters to the Metal kernel.
- Implemented the `ggml_metal_op_conv_2d` function in `ggml-metal-ops.cpp` to encode and dispatch the 2D convolution operation.
- Registered the new operation in the Metal operation pipeline and header files (`ggml-metal-ops.cpp`, `ggml-metal-ops.h`) [1] [2].
- Added the pipeline getter for `CONV_2D` in `ggml-metal-device.cpp` and declared it in the header [1] [2].
- Updated device capability checks to recognize `CONV_2D` support in `ggml-metal-device.m`.
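For readers unfamiliar with the operation being offloaded, the following is a minimal CPU reference sketch of a direct 2D convolution. It is not the PR's Metal kernel and does not use simdgroup operations. The `Conv2dParams` struct, the function names, and the memory layout (`[channel][row][column]`, single batch, same stride/padding/dilation on both axes) are assumptions made for illustration and do not mirror `ggml_metal_kargs_conv_2d`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical parameter bundle for illustration only.
struct Conv2dParams {
    int in_w, in_h, in_c;      // input width, height, channels
    int k_w, k_h, out_c;       // kernel width, height, output channels
    int stride, pad, dilation; // applied identically to both axes
};

// Standard output-size formula for a strided, padded, dilated convolution.
inline int conv_out_size(int in, int k, int stride, int pad, int dilation) {
    return (in + 2 * pad - dilation * (k - 1) - 1) / stride + 1;
}

// input:  [in_c][in_h][in_w]
// kernel: [out_c][in_c][k_h][k_w]
// result: [out_c][out_h][out_w]
std::vector<float> conv2d_reference(const std::vector<float>& input,
                                    const std::vector<float>& kernel,
                                    const Conv2dParams& p) {
    const int out_w = conv_out_size(p.in_w, p.k_w, p.stride, p.pad, p.dilation);
    const int out_h = conv_out_size(p.in_h, p.k_h, p.stride, p.pad, p.dilation);
    assert(input.size() == static_cast<std::size_t>(p.in_c) * p.in_h * p.in_w);
    assert(kernel.size() == static_cast<std::size_t>(p.out_c) * p.in_c * p.k_h * p.k_w);

    std::vector<float> out(static_cast<std::size_t>(p.out_c) * out_h * out_w, 0.0f);
    for (int oc = 0; oc < p.out_c; ++oc)
    for (int oy = 0; oy < out_h; ++oy)
    for (int ox = 0; ox < out_w; ++ox) {
        float acc = 0.0f;
        for (int ic = 0; ic < p.in_c; ++ic)
        for (int ky = 0; ky < p.k_h; ++ky)
        for (int kx = 0; kx < p.k_w; ++kx) {
            const int iy = oy * p.stride - p.pad + ky * p.dilation;
            const int ix = ox * p.stride - p.pad + kx * p.dilation;
            // Positions outside the input are treated as zero padding.
            if (iy < 0 || iy >= p.in_h || ix < 0 || ix >= p.in_w) continue;
            const std::size_t i_idx = (static_cast<std::size_t>(ic) * p.in_h + iy) * p.in_w + ix;
            const std::size_t k_idx = ((static_cast<std::size_t>(oc) * p.in_c + ic) * p.k_h + ky) * p.k_w + kx;
            acc += input[i_idx] * kernel[k_idx];
        }
        out[(static_cast<std::size_t>(oc) * out_h + oy) * out_w + ox] = acc;
    }
    return out;
}
```

A reference like this is mainly useful for checking a GPU kernel's output element by element; each output element is an independent dot product, which is what makes the operation a good fit for parallel hardware.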

Other Minor Changes:

- Updated tensor API enablement logic for device compatibility, removing checks for some device models.
- Fixed type consistency in argument passing for the concat operation.
- Minor code cleanup and header includes [1] [2].

These changes collectively allow GGML to offload 2D convolution operations to the GPU via Metal, improving performance for models that use this operation.

metal: accelerated conv2d

20acc63

bghira requested a review from ggerganov as a code owner November 11, 2025 19:03

github-actions bot added the ggml and Apple Metal labels Nov 11, 2025

bghira mentioned this pull request Nov 11, 2025

metal: add ops DIAG_MASK_INF, IM2COL_3D, fix op PAD #16669

Open

DajanaV mentioned this pull request Nov 11, 2025

UPSTREAM PR #17175: metal: accelerated conv2d auroralabs-loci/llama.cpp#171

Open
