Add Metal backend core ETMetal runtime. #15020
Conversation
Stack from ghstack (oldest at bottom):
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15020
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit d37e7ef with merge base 6e0c9f6.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
See inline
// Commit buffer and allow immediate reuse for better performance
[commandBuffer_ commit];
ET_LOG(Debug, "ETMetalStream::commitAndContinue: Committed buffer %p with continue", commandBuffer_);
Don't you need to release the buffer after commit?
[commandBuffer_ release];
commandBuffer_ = nil;
With commitAndContinue we commit and keep reusing the buffer until we flush. So we don't release it after commit.
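For context, the commit-and-continue lifecycle described in this reply can be modeled roughly as follows. This is a simplified C++ sketch, not the actual ETMetalStream code: a toy CommandBuffer stands in for the manually refcounted Metal command buffer, and all names are illustrative.

```cpp
#include <cassert>

// Toy stand-in for an Objective-C command buffer with manual refcounting.
struct CommandBuffer {
    int refcount = 1;
    int commits = 0;
    void retain() { ++refcount; }
    bool release() { return --refcount == 0; }  // true when deallocated
};

// Sketch of the commit-and-continue policy: commits do not drop ownership;
// only flush() releases the buffer and forces a fresh one next time.
struct Stream {
    CommandBuffer* buf = nullptr;
    CommandBuffer* current() {
        if (!buf) buf = new CommandBuffer();
        return buf;
    }
    void commitAndContinue() {
        current()->commits++;  // commit, but keep reusing the same buffer
    }
    void flush() {
        if (buf) {
            if (buf->release()) delete buf;  // ownership ends here, not at commit
            buf = nullptr;
        }
    }
};
```

Under this policy, repeated commitAndContinue() calls reuse one buffer object, and the release happens exactly once at flush time, matching the reply above.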
if (cps_) [cps_ retain];
if (func_) [func_ retain];
Why do you need these retain lines? Aren't these already owned by the class?
I think retaining ensures that the objects remain valid for the lifetime of the ETMetalKernelFunction instance, preventing them from being deallocated elsewhere. Without the retains, cps_ and func_ could become dangling pointers.
A similar pattern is followed in PyTorch here.
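The retain-on-construct pattern under discussion can be illustrated with a minimal manual-refcount model. This is plain C++ with a toy RefCounted type standing in for the Objective-C pipeline-state and function objects; it is a sketch of the ownership argument, not the actual class.

```cpp
#include <cassert>

// Toy manual-refcount object standing in for cps_ / func_.
struct RefCounted {
    int refs = 1;
    bool alive = true;
    void retain() { ++refs; }
    void release() { if (--refs == 0) alive = false; }
};

// Sketch of the wrapper: it retains what it stores, so the objects outlive
// any release performed by whoever created them.
struct KernelFunction {
    RefCounted* cps;
    RefCounted* func;
    KernelFunction(RefCounted* c, RefCounted* f) : cps(c), func(f) {
        if (cps) cps->retain();    // keep pipeline state valid for our lifetime
        if (func) func->retain();  // keep function valid for our lifetime
    }
    ~KernelFunction() {
        if (cps) cps->release();
        if (func) func->release();
    }
};
```

Without the constructor retains, the original owner's release would drop the last reference while the wrapper still holds raw pointers.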
// Don't release encoder_ here - the stream owns it
// Only clean up our own references
if (cps_) {
  [cps_ release];
  cps_ = nil;
}
if (func_) {
  [func_ release];
  func_ = nil;
}

encoder_ = nil; // Clear reference without releasing
Just this?
cps_ = nil;
func_ = nil;
encoder_ = nil;
No, we need to release them, because we own them now.
The same pattern is followed in PyTorch here.
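The owned-versus-borrowed split in this cleanup path can be sketched with the same kind of toy refcount model (plain C++, illustrative names only, not the PR's actual code):

```cpp
#include <cassert>

// Minimal manual-refcount stand-in for the Objective-C objects.
struct Ref {
    int refs = 1;
    void retain() { ++refs; }
    void release() { --refs; }
};

// Sketch: cps_/func_ were retained by this object, so cleanup releases them;
// encoder_ is owned by the stream, so we only drop our pointer to it.
struct Cleanup {
    Ref* cps_;
    Ref* func_;
    Ref* encoder_;  // borrowed from the stream, never released here
    void run() {
        if (cps_) { cps_->release(); cps_ = nullptr; }
        if (func_) { func_->release(); func_ = nullptr; }
        encoder_ = nullptr;  // clear reference without releasing
    }
};
```

Only setting all three to nil, as suggested above, would leak the two retained references; releasing the encoder as well would over-release an object the stream still owns.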
resultsDictionary:results
executionDescriptor:nil];

//synchronize(syncType);
Why commented out?
Leftover from debugging; deleted.
extern "C" {
#endif

// Memory management functions for Metal
Please add a docblock about the memory management aspects of this design, such as buffer lifecycle, thread safety, etc.
Buffer management is something I want to change. I think I want to have some kind of RAII instead of a global map. But I was hoping to do that after this first landing. Is it ok if I add documentation here later?
// C++ only - expose the Metal buffer mapping
#ifdef __OBJC__
extern std::unordered_map<void*, MTLBuffer_t> ptr_to_mtl_buffer;
Do you need a lock and thread safety to access ptr_to_mtl_buffer?
It currently works without one, because the operations are executed in sequence. I don't see any CPU threading on the AOTI MPS side. But again, I want to replace this design with something more robust.
This commit introduces the foundational Metal backend runtime.

Key features:
- ETMetalStream for managing Metal devices, command queues, buffers, and synchronization.
- ETMetalShaderLibrary for compiling Metal shader source and caching pipeline states.
- ETMetalKernelFunction for kernel argument binding, dispatching, and synchronization with stream-managed encoders.
- Added global buffer management and pointer tracking between host and Metal buffers.
- Added global stream management utilities and synchronization helpers.

This provides the necessary runtime primitives for executing compute shaders and MPSGraph workloads.

ghstack-source-id: ea4fbb5
ghstack-comment-id: 3392300041
Pull-Request: pytorch#15020
@autoreleasepool {
  // Case 1: Device-to-device copy - use GPU blit encoder (most efficient)
  if (src_is_device && dst_is_device) {
Do we need this? Shouldn't we already have unified memory for all the M series?
// Global storage to keep shared_ptr alive while raw pointers are used
static std::unordered_map<ETMetalKernelFunction*, std::shared_ptr<ETMetalKernelFunction>> function_storage;
Huh. Can you explain why we need this? Is the raw pointer mapping to its own shared_ptr version?
We need this to keep the shared_ptr alive while the raw pointer is still in use.
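The keep-alive map being described can be sketched in plain C++ as follows. Fn stands in for ETMetalKernelFunction, and hand_out/hand_back are invented names for the hand-off across the raw-pointer boundary:

```cpp
#include <memory>
#include <unordered_map>

struct Fn { int id; };

// Sketch: when a shared_ptr-managed object is handed out as a raw pointer,
// stash the shared_ptr keyed by that raw pointer so the object stays alive
// until explicitly released.
static std::unordered_map<Fn*, std::shared_ptr<Fn>> function_storage;

Fn* hand_out(std::shared_ptr<Fn> fn) {
    Fn* raw = fn.get();
    function_storage[raw] = std::move(fn);  // map pins the refcount
    return raw;
}

void hand_back(Fn* raw) {
    function_storage.erase(raw);  // may destroy the object here
}
```

Without the map entry, the shared_ptr could go out of scope while callers still hold the raw pointer, which is exactly the dangling case this storage prevents.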
// =======================
// ETMetalStream - Metal command buffer and synchronization management
noob question, how much of https://github.com/pytorch/executorch/blob/main/backends/apple/mps/runtime/MPSStream.h can be reused here?
They are all adaptations of PyTorch's MPSStream.h.
I think with enough time (not right now) we should refactor PyTorch's MPS classes to be ATen-agnostic, so that we can reuse them.
So, basically what I am saying is, I don't think we should reuse code from the ET MPS backend, instead, we should refactor PyTorch's code and reuse that.