Add Metal backend core ETMetal runtime. #15020
Conversation
Stack from ghstack (oldest at bottom):
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15020
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit d37e7ef with merge base 6e0c9f6.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
See inline
// Commit buffer and allow immediate reuse for better performance
[commandBuffer_ commit];
ET_LOG(Debug, "ETMetalStream::commitAndContinue: Committed buffer %p with continue", commandBuffer_);
Don't you need to release the buffer after commit?
[commandBuffer_ release];
commandBuffer_ = nil;
With commitAndContinue we commit and keep reusing the buffer until we flush. So we don't release it after commit.
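For context, the commit-and-continue lifecycle described in this reply can be modeled roughly as follows. This is a simplified C++ sketch, not the actual ETMetalStream code: a toy CommandBuffer stands in for the manually refcounted Metal command buffer, and all names are illustrative.

```cpp
#include <cassert>

// Toy stand-in for an Objective-C command buffer with manual refcounting.
struct CommandBuffer {
    int refcount = 1;
    int commits = 0;
    void retain() { ++refcount; }
    bool release() { return --refcount == 0; }  // true when deallocated
};

// Sketch of the commit-and-continue policy: commits do not drop ownership;
// only flush() releases the buffer and forces a fresh one next time.
struct Stream {
    CommandBuffer* buf = nullptr;
    CommandBuffer* current() {
        if (!buf) buf = new CommandBuffer();
        return buf;
    }
    void commitAndContinue() {
        current()->commits++;  // commit, but keep reusing the same buffer
    }
    void flush() {
        if (buf) {
            if (buf->release()) delete buf;  // ownership ends here, not at commit
            buf = nullptr;
        }
    }
};
```

Under this policy, repeated commitAndContinue() calls reuse one buffer object, and the release happens exactly once at flush time, matching the reply above.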
if (cps_) [cps_ retain];
if (func_) [func_ retain];
Why do you need these retain lines? Aren't these already owned by the class?
I think retaining ensures that the objects remain valid for the lifetime of the ETMetalKernelFunction instance, preventing them from being deallocated elsewhere. Without the retains, cps_ and func_ could become dangling pointers.
A similar pattern is followed in PyTorch here.
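The retain-on-construct pattern under discussion can be illustrated with a minimal manual-refcount model. This is plain C++ with a toy RefCounted type standing in for the Objective-C pipeline-state and function objects; it is a sketch of the ownership argument, not the actual class.

```cpp
#include <cassert>

// Toy manual-refcount object standing in for cps_ / func_.
struct RefCounted {
    int refs = 1;
    bool alive = true;
    void retain() { ++refs; }
    void release() { if (--refs == 0) alive = false; }
};

// Sketch of the wrapper: it retains what it stores, so the objects outlive
// any release performed by whoever created them.
struct KernelFunction {
    RefCounted* cps;
    RefCounted* func;
    KernelFunction(RefCounted* c, RefCounted* f) : cps(c), func(f) {
        if (cps) cps->retain();    // keep pipeline state valid for our lifetime
        if (func) func->retain();  // keep function valid for our lifetime
    }
    ~KernelFunction() {
        if (cps) cps->release();
        if (func) func->release();
    }
};
```

Without the constructor retains, the original owner's release would drop the last reference while the wrapper still holds raw pointers.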
// Don't release encoder_ here - the stream owns it
// Only clean up our own references
if (cps_) {
  [cps_ release];
  cps_ = nil;
}
if (func_) {
  [func_ release];
  func_ = nil;
}

encoder_ = nil; // Clear reference without releasing
Just this?
cps_ = nil;
func_ = nil;
encoder_ = nil;
No, we need to release them, because we own them now.
The same pattern is followed in PyTorch here.
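The owned-versus-borrowed split in this cleanup path can be sketched with the same kind of toy refcount model (plain C++, illustrative names only, not the PR's actual code):

```cpp
#include <cassert>

// Minimal manual-refcount stand-in for the Objective-C objects.
struct Ref {
    int refs = 1;
    void retain() { ++refs; }
    void release() { --refs; }
};

// Sketch: cps_/func_ were retained by this object, so cleanup releases them;
// encoder_ is owned by the stream, so we only drop our pointer to it.
struct Cleanup {
    Ref* cps_;
    Ref* func_;
    Ref* encoder_;  // borrowed from the stream, never released here
    void run() {
        if (cps_) { cps_->release(); cps_ = nullptr; }
        if (func_) { func_->release(); func_ = nullptr; }
        encoder_ = nullptr;  // clear reference without releasing
    }
};
```

Only setting all three to nil, as suggested above, would leak the two retained references; releasing the encoder as well would over-release an object the stream still owns.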
resultsDictionary:results
executionDescriptor:nil];

//synchronize(syncType);
Why commented out?
Leftover from debugging; deleted.
extern "C" {
#endif

// Memory management functions for Metal
Please add a docblock about the memory management aspects of this design, such as buffer lifecycle, thread safety, etc.
Buffer management is something I want to change. I think I want to have some kind of RAII instead of a global map. But I was hoping to do that after this first landing. Is it ok if I add documentation here later?
// C++ only - expose the Metal buffer mapping
#ifdef __OBJC__
extern std::unordered_map<void*, MTLBuffer_t> ptr_to_mtl_buffer;
Do you need a lock and thread safety to access ptr_to_mtl_buffer?
It currently works without one, because the operations are executed in sequence. I don't see any CPU threading on the AOTI MPS side. But again, I want to replace this design with something more robust.
This commit introduces the foundational Metal backend runtime.

Key features:
- ETMetalStream for managing Metal devices, command queues, buffers, and synchronization.
- ETMetalShaderLibrary for compiling Metal shader source and caching pipeline states.
- ETMetalKernelFunction for kernel argument binding, dispatching, and synchronization with stream-managed encoders.
- Added global buffer management and pointer tracking between host and Metal buffers.
- Added global stream management utilities and synchronization helpers.

This provides the necessary runtime primitives for executing compute shaders and MPSGraph workloads.

ghstack-source-id: ea4fbb5
ghstack-comment-id: 3392300041
Pull-Request: pytorch#15020
@autoreleasepool {
  // Case 1: Device-to-device copy - use GPU blit encoder (most efficient)
  if (src_is_device && dst_is_device) {
Do we need this? Shouldn't we already have unified memory for all the M series?
// Global storage to keep shared_ptr alive while raw pointers are used
static std::unordered_map<ETMetalKernelFunction*, std::shared_ptr<ETMetalKernelFunction>> function_storage;
Huh. Can you explain why we need this? Is the raw pointer mapping to its own shared_ptr version?
We need this to keep the shared_ptr alive while the raw pointer is still in use.
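The keep-alive map being described can be sketched in plain C++ as follows. Fn stands in for ETMetalKernelFunction, and hand_out/hand_back are invented names for the hand-off across the raw-pointer boundary:

```cpp
#include <memory>
#include <unordered_map>

struct Fn { int id; };

// Sketch: when a shared_ptr-managed object is handed out as a raw pointer,
// stash the shared_ptr keyed by that raw pointer so the object stays alive
// until explicitly released.
static std::unordered_map<Fn*, std::shared_ptr<Fn>> function_storage;

Fn* hand_out(std::shared_ptr<Fn> fn) {
    Fn* raw = fn.get();
    function_storage[raw] = std::move(fn);  // map pins the refcount
    return raw;
}

void hand_back(Fn* raw) {
    function_storage.erase(raw);  // may destroy the object here
}
```

Without the map entry, the shared_ptr could go out of scope while callers still hold the raw pointer, which is exactly the dangling case this storage prevents.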
// =======================
// ETMetalStream - Metal command buffer and synchronization management
noob question, how much of https://github.com/pytorch/executorch/blob/main/backends/apple/mps/runtime/MPSStream.h can be reused here?
They are all adaptations of PyTorch's MPSStream.h.
I think with enough time (not right now) we should refactor PyTorch's MPS classes to be ATen-agnostic, so that we can reuse them.
So, basically what I am saying is, I don't think we should reuse code from the ET MPS backend, instead, we should refactor PyTorch's code and reuse that.