Metal and CoreML Backends #865

Draft · wants to merge 403 commits into base: master
Conversation

@ChinChangYang (Contributor) commented Dec 16, 2023

Summary:

KataGo takes advantage of Apple Silicon by integrating the Metal Performance Shaders Graph and CoreML. This gives KataGo GPU acceleration and Neural Engine support for high performance on Apple hardware.

Documentation for Metal and CoreML Backends in KataGo:

https://github.com/ChinChangYang/KataGo/blob/metal-coreml-stable/docs/CoreML_Backend.md

Release:

https://github.com/ChinChangYang/KataGo/releases

Resolve:

In this commit, I added conditional compilation around the main function in the cpp/main.cpp file. This ensures that the main function is only compiled when the target OS is not iOS.
In this commit, I improved the device selection logic in the metalbackend.swift file. I replaced the MTLCopyAllDevices function with MTLCreateSystemDefaultDevice to select the default Metal device. Additionally, I removed code related to validating the GPU index and logging device information. Instead, I now simply log the name of the selected Metal device.
This commit adds support for creating the application support directory if it does not already exist.
This commit adds the KataGoHelper class, which provides a method to run the gtp command from the main.cpp file. The KataGoHelper class is integrated with the KataGo iOS app by calling the runGtp method asynchronously in the KataGo_iOSApp initialization.
1. Implemented a new `Message` struct that is `Identifiable`, `Equatable`, and `Hashable` for storing text messages and their IDs.
2. Created a `KataGoController` class which keeps track of messages and handles their updates.
3. Refactored `ContentView` to display the KataGo messages in a `ScrollView`.
4. Added a new `getMessageLine` method to `KataGoHelper` to get a line from KataGo output.
5. Made significant modifications to `KataGoHelper` to make it thread-safe and to accommodate new changes.
6. The `KataGo_iOSApp` now initiates KataGo GTP run in a separate thread on start.
- Introduced an actor called `MessageId` to allow only one task to access mutable state at a time
- Added a `getNextId()` function to retrieve the next ID for a message in an asynchronous manner
- Created a `Message` struct with an ID and text, utilizing the `MessageId` actor
- Modified the `KataGoController` class to include a list of messages and methods for processing and retrieving IDs
- Refactored the `ContentView` to use the updated `KataGoController` methods and removed the previous message processing logic
- Added a new method `startMessageThread()` to start a thread for processing messages from KataGo

Note: These changes improve message handling and ensure synchronized access to message IDs.
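The serialized-ID idea can be sketched outside Swift. Below is a minimal Python analogue of the `MessageId` actor described above, using a lock where Swift uses actor isolation; the class and method names mirror the commit message, but the code itself is illustrative, not the app's actual implementation:

```python
import threading
from dataclasses import dataclass

class MessageId:
    """Toy stand-in for the Swift `MessageId` actor: the lock lets only
    one thread at a time touch the mutable counter."""
    def __init__(self):
        self._lock = threading.Lock()
        self._next_id = 0

    def get_next_id(self):
        # Critical section: read and increment atomically.
        with self._lock:
            value = self._next_id
            self._next_id += 1
            return value

@dataclass
class Message:
    id: int
    text: str

ids = MessageId()
messages = [Message(ids.get_next_id(), text) for text in ("= ok", "? error")]
```

The key property is the same as with the actor: two concurrent callers can never observe or hand out the same ID.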
- Simplify the message processing loop in the `KataGoController` struct.
- Move the message processing tasks to the `ContentView` struct.
- Modify the `KataGoHelper` class to provide the `getOneMessageLineWithCompletion` method.

This commit refactors the message processing loop in the `KataGoController` struct to remove redundancy. It also moves the message processing tasks to the `ContentView` struct for better organization. Additionally, the `KataGoHelper` class is modified to provide the `getOneMessageLineWithCompletion` method, which is used to asynchronously retrieve message lines from KataGo.
Scroll to the last message when the "messages" array changes, by using the ID of the last message. Also, create a message task on the initial view appearance to fetch messages from KataGo and continuously append them to the list of messages.

Created an infinite while loop in the "createMessageTask" function to continuously fetch and append new messages from KataGo.
- The message ID generation is fixed to use UUID instead of a custom implementation.
- Unnecessary code related to managing message IDs is removed.

The previous implementation used a custom MessageId actor to generate and manage message IDs. This commit replaces that with the use of UUID to generate unique IDs for each message. The unnecessary code related to managing message IDs, including the MessageId actor and its methods, is removed.
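The simplification works because UUIDs are unique without any shared mutable state, so no actor or lock is needed. A Python sketch of the same pattern (illustrative, not the app's Swift code):

```python
import uuid
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Message:
    text: str
    # Each message gets a fresh random UUID; no shared counter required.
    id: uuid.UUID = field(default_factory=uuid.uuid4)

a = Message("showboard")
b = Message("showboard")
```

Even with identical text, `a` and `b` carry distinct IDs, which is exactly what a SwiftUI list needs for `Identifiable` conformance.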
- Added a `@State` property `command` to track the user's input.
- Created a `TextField` for the user to enter their message.
- Added an `onSubmit` action to send the entered command to KataGoHelper and clear the input.
- Added a `Button` to send the command to KataGoHelper and clear the input when pressed.

This change enhances the user interface by letting users send commands to KataGo GTP from the app.
The nullability annotation for the `sendCommand` method was fixed, ensuring that a non-null `command` parameter is expected. This change ensures better code clarity and helps prevent potential runtime issues.
- The `getOutputWithBinInputs` method's output variable names have been updated to improve readability and consistency. This commit changes `policyOutput` to `policyOutputs`, `valueOutput` to `valueOutputs`, `ownershipOutput` to `ownershipOutputs`, `miscValuesOutput` to `miscValueOutputs`, and `moreMiscValuesOutput` to `moreMiscValueOutputs`.
The `init(text: String) async` method has been changed to `init(text: String)` in order to remove the `async` attribute. Now, when entering a GTP command in the TextField, it will disable autocorrection and autocapitalization. The `onSubmit` action has been updated to append a new Message to the list of messages before sending the command. Additionally, the `await` operator has been removed from the creation of a new Message object.
- Added `CommandButton` struct to display command buttons with specific titles and actions.
- Included buttons for `genmove b`, `genmove w`, `showboard`, and `clear_board`.
- Initialized message task by adding `Initializing...` message and sending `showboard` command.
This commit adds the GobanView.swift file to KataGo iOS, which includes functions for rendering a Go board. The file defines a SwiftUI view called GobanView, which is responsible for drawing the background, lines, and star points of the board. It also calculates the dimensions of the board based on the available geometry. The GobanView struct is previewed in the GobanView_Previews struct.
This commit adds the CommandView.swift file, which contains the implementation of a view for handling commands and displaying messages. The CommandView struct includes properties and functionality for managing a list of messages, handling GTP commands, and displaying the messages in a scrollable view.
- Added a new CommandView tab for entering GTP commands and displaying messages.
- Added a new GobanView tab for displaying the Goban interface.
- Updated ContentView to use TabView to switch between tabs.
- Change maxTime value from 10 to 1 second for capping search time.
This commit adds the ability to draw black and white stones on the GobanView. The `drawBlackStone` and `drawWhiteStone` functions are implemented to draw the stones at specific coordinates. The `drawStones` function is added to the `GobanView` and calls the stone-drawing functions to draw several stones on the board.
1. Rename `StarPoint` struct to `BoardPoint` for clearer semantics.
2. Modify `drawStones` method to use ForEach for better maintainability.
3. Revise stone rendering with gradient and shadow optimizations.
- `CommandView` now uses a `messagesObject` environment object instead of a local state variable for managing messages.
- The `CommandView` no longer starts a thread in the `init()` method.
- The `CommandView` now retrieves messages from `messagesObject` and appends new messages to it.
- The `createMessageTask()` method has been moved to the `ContentView` and is now responsible for appending new messages to `messagesObject`.
- The `ContentView` now initializes and uses `stones` and `messagesObject` as environment objects.
- The `createMessageTask()` method in `ContentView` now retrieves messages from KataGo and appends them to `messagesObject`.

This commit introduces changes to improve the message management in the CommandView and ContentView structures.
…nd GobanView

The commit adds the stones and board objects as environment objects for the CommandView and GobanView structs in ContentView.swift. The stones object is added to the environment for CommandView, and the stones and board objects are added to the environment for GobanView. These environment objects allow these structs to access and update the state of the stones and board objects.
This update adds detailed instructions for running human-trained CoreML models with KataGo, including downloading and converting the checkpoint file to a CoreML model, configuring multi-threaded Metal and CoreML execution, and running the model with the katago executable. The documentation also includes notes on reorganizing the models and updating the human-trained CoreML model.
- Introduced steps in the GitHub Actions workflow to set up the human supervised learning (SL) network for testing.
- Added a step to download the human SL model from the KataGo GitHub releases and link it for the GPU error test.
- Implemented a new test using the downloaded model with the Eigen backend to evaluate GPU error for the human SL network.
- Added steps to set up both FP16 and FP32 CoreML models for the human SL network.
- Ensured the workflow includes GPU error tests for the CoreML backend using the relevant models.

This update enhances the testing framework by integrating human SL network capabilities, enabling more comprehensive evaluation of error metrics.
In the configuration file `gtp_human5k_coreml.cfg`, I have modified the settings related to the usage of CoreML devices for the neural network. This change was prompted by persistent issues with the Neural Engine, specifically its inability to pass KataGo's GPU error tests due to a high output error rate.

Changes made:
- Set `numNNServerThreadsPerModel` to 1, indicating that only one server thread will be used.
- Unified the backend setting to use the GPU only by setting `coremlDeviceToUse` to 0, while disabling Neural Engine support by commenting out the line for `coremlDeviceToUseThread1`.

Additionally, I have included comments to clarify the configuration for situations where one or two models may be utilized in the future. These changes aim to enhance the stability and performance of the model by relying solely on the GPU, which has been shown to provide more consistent results.

This commit addresses the issue of high output errors with the Neural Engine, streamlining the configuration for better reliability.
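Putting those settings together, the relevant section of `gtp_human5k_coreml.cfg` would look roughly like this (a sketch based on the description above; the `coremlDeviceToUse*` keys exist only in this fork):

```ini
# gtp_human5k_coreml.cfg (sketch)
numNNServerThreadsPerModel = 1
coremlDeviceToUse = 0            # 0 = GPU only
# coremlDeviceToUseThread1 = 1   # Neural Engine disabled: high output error
```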
- **numNNServerThreadsPerModel** is increased from 2 to 4. This change allocates two threads for GPU processing and two for the Neural Engine, effectively ensuring near 100% utilization of both processing units.
- Removed unused conditional compilation blocks for `USE_COREML_BACKEND`, streamlining the codebase as these parts were not contributing to any feature variations.
- Updated assertions in getCoreMLOutput to resolve a compiler warning about an unused variable in release mode.
- Added a method to retrieve model metadata descriptions in the CoreML backend to enhance clarity and debugging capabilities.
…Backend

**Summary:** This commit refactors the PolicyHead class in the CoreML model to eliminate unsupported gathering operations, allowing the model to be fully executed on the Apple Neural Engine (ANE). The change enhances performance by leveraging ANE for all inference operations.

**Details:**

- **PolicyHead Refactor:**
  - Removed operations that involved gathering policy data from the PolicyHead, which were previously required for compatibility with the CoreML framework but are not supported by the ANE.
  - This change ensures that the model can operate entirely on the ANE, maximizing performance and efficiency.

- **CoreML Backend Update:**
  - Updated the CoreML backend to accommodate the new output shapes resulting from the PolicyHead refactor.
  - Changed variable names and buffer allocations to align with the updated policy output specifications.
  - The new backend implementation is compatible only with models that integrate the recent changes, thus making previous versions of the CoreML model incompatible with the upgraded backend.

- **Impact:**
  - Previous CoreML models, which emit the policy results in a different shape, can no longer be processed by the newly upgraded CoreML backend.
  - This upgrade commits to optimizing for the capabilities of the Apple Neural Engine; users must update their models for compatibility with the new backend.
This commit updates the CoreML model references in the GitHub Actions workflow and the setup script to the latest versions (v1.15.1) from the KataGo GitHub repository.

**Changes include:**

1. **GitHub Actions Workflow Updates:**
   - Replaced the model URLs for FP16 and FP32 models in multiple steps to use the new version `v1.15.1-coreml2`:
     - **FP16 Model**: Updated from `KataGoModel19x19fp16v14s7709731328.mlpackage.zip` to `KataGoModel19x19fp16v14s9996604416.mlpackage.zip`.
     - **FP32 Model**: Updated from `KataGoModel19x19fp32v14s7709731328.mlpackage.zip` to `KataGoModel19x19fp32v14s9996604416.mlpackage.zip`.
     - **FP32 Meta Model**: Updated from `KataGoModel19x19fp32meta1.mlpackage.zip` to `KataGoModel19x19fp32v15m1humanv0.mlpackage.zip`.
   - Ensured symbolic links point to the updated model names.

2. **Setup Script Updates:**
   - Updated the model download command for FP16 in the setup script to reflect the new version `KataGoModel19x19fp16v14s9996604416.mlpackage.zip`.
   - Added commands to download and setup the new FP32 model version `KataGoModel19x19fp32v15m1humanv0.mlpackage.zip`.
   - Adjusted the unzip command and file renaming for consistency with new model names.

**Impact:**
These changes ensure that the workflow and setup scripts use the latest models, which may include performance improvements and updates. This is crucial for maintaining compatibility and leveraging the latest features provided by the KataGo models.

**Note:**
The old model versions have been phased out from the scripts, and the new versions maintain the existing symbolic link structure for seamless integration in the build process.
This commit updates the documentation in the `CoreML_Backend.md` file to reflect the changes in the KataGo model versions and includes necessary adjustments for downloading and linking models. Key changes include:

- Updated the download links for the binary models to the latest version `v1.15.1-coreml2`, replacing the previous version `v1.13.2-coreml2`.
- Updated the symbolic links to reflect the new model filenames corresponding to the latest releases.
- Adjusted benchmark, GTP, and analysis command examples to use the new binary model filenames.
- Replaced the outdated human-trained CoreML model download link with the updated model from `v1.15.1-coreml2`.
- Enhanced clarity on linking the human-trained CoreML model in the run directory.
- Reintroduced the section for updating the human-trained CoreML model, including instructions for downloading the checkpoint and converting it to a CoreML model.

These changes ensure that the documentation provides accurate and up-to-date instructions for utilizing the CoreML backend with the latest models available.
This commit enhances the `createComputeHandle` function within the `NeuralNet` class to ensure that the instantiation of the `ComputeHandle` object is thread-safe. The modification employs a mutex to prevent simultaneous access to the critical section of code responsible for creating the `ComputeHandle` instance.

**Changes Made:**
- Introduced a static mutex variable `computeHandleMutex` to synchronize access to the `ComputeHandle` creation logic.
- Encapsulated the instantiation of `ComputeHandle` within a lock guard (`std::lock_guard`) to lock the mutex and ensure that only one thread can execute the instantiation at any given time.
- Ensured that the lock is held only during the critical section where the `ComputeHandle` instance is created, thereby minimizing contention and maximizing efficiency for other threads that might be attempting to use the `createComputeHandle` method concurrently.

**Rationale:**
The previous implementation of `createComputeHandle` allowed concurrent invocations that could lead to race conditions during the creation of `ComputeHandle`, especially since this operation involves writing data to the file system. By enforcing thread safety, we minimize the risk of corruption and enhance the robustness of the neural network's backend processing capabilities.

**Related Issues:**
- This commit addresses potential threading issues outlined in previous test processes of GitHub Actions.
Updated the model download links in the build workflow and setup script
from version v1.13.2-coreml1 to v1.15.1-coreml2 to ensure compatibility
and resolve issues related to the GPU error test.
This commit updates the version number in the source code to reflect the new coreml3 version. Both the getKataGoVersion and getKataGoVersionForHelp methods have been modified to return the updated version string.
- Renamed the meta encoder version prefix from "meta" to "m" in convert_coreml_pytorch.py for enhanced consistency.
- Updated CoreML_Backend.md to format the model directory name as code, improving clarity.
**Description:**
This commit introduces a new feature to compress the CoreML model after conversion from PyTorch. The following changes were made:

- Imported `coremltools.optimize` to leverage optimization functionalities for model compression.
- Moved the definition of the model file name to a new location for better readability.
- Added a model compression process:
  - Configured the palettization with a bit depth of 8 bits.
  - Created an optimization configuration using the defined configuring options.
  - Implemented the palettization of the model weights, resulting in a compressed model.
  - Defined a new file naming convention for the compressed model that indicates the bit configuration.
  - Implemented saving for the compressed model, followed by logging the location of the saved file.

**Impact:**
This enhancement reduces the size of the finalized CoreML model, improving storage efficiency and potentially speeding up inference in resource-constrained environments.
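For intuition on what 8-bit palettization does: each weight is replaced by an index into a lookup table of at most 2^nbits values, typically chosen by clustering. A toy 1-D k-means sketch in pure Python (coremltools does this far more efficiently over tensors; this only illustrates the idea):

```python
import random

def palettize(weights, nbits=8, iters=10):
    """Cluster weights into at most 2**nbits centroids.
    Returns (lut, indices): the lookup table and one index per weight."""
    k = min(2 ** nbits, len(set(weights)))
    lut = sorted(random.sample(sorted(set(weights)), k))
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        buckets = [[] for _ in range(k)]
        for w in weights:
            i = min(range(k), key=lambda j: abs(w - lut[j]))
            buckets[i].append(w)
        # Move each centroid to the mean of its bucket.
        lut = [sum(b) / len(b) if b else lut[j] for j, b in enumerate(buckets)]
    indices = [min(range(k), key=lambda j: abs(w - lut[j])) for w in weights]
    return lut, indices
```

With 8 bits the table holds 256 entries, so storage drops from 16/32 bits per weight to 8 bits per index plus a small table.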
…ility

This commit introduces a new method, `safelyPredict`, in the `CoreMLBackend` class to improve the robustness of the model's prediction capabilities. The following changes have been made:

1. **Retry Logic for Predictions:**
   - The `safelyPredict` function attempts to execute a prediction using the CoreML model up to two times. This is to catch transient errors that may arise during the prediction process.
   - If both attempts fail, the function falls back to a third attempt using a model compiled for CPU execution.

2. **Model Compilation Improvement:**
   - The model is now compiled with flexible compute units, allowing for better resource management based on the device's capabilities. The transition from using a boolean `useCpuAndNeuralEngine` flag to `MLComputeUnits` increases clarity and future-proofs the method by accommodating additional compute configurations.

3. **Code Refactoring:**
   - Updated the `init` method of `CoreMLBackend` and several references to the `compileBundleMLModel` method to align with the new parameters.
   - Adjusted corresponding unit tests in `CoreMLModelTest` to align with the new parameters.

4. **Error Handling:**
   - Introduced enhanced error handling within the `safelyPredict` method, ensuring that any issues during the prediction process are properly managed and do not crash the application.
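The retry-then-fallback control flow of `safelyPredict` can be sketched language-agnostically. A Python analogue (the two callables stand in for the normal model and the CPU-only compiled model; this is not the Swift implementation):

```python
def safely_predict(predict, predict_cpu_only, attempts=2):
    """Try the normal predictor up to `attempts` times to ride out
    transient errors; on repeated failure, make one final attempt
    with a CPU-only compiled model."""
    last_err = None
    for _ in range(attempts):
        try:
            return predict()
        except Exception as err:  # transient prediction failure
            last_err = err
    try:
        return predict_cpu_only()  # third attempt: CPU-only fallback
    except Exception:
        raise last_err
```

The fallback matters because CPU execution, while slower, avoids the device-specific failure modes that the first two attempts may hit.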
Changed the `model` property in `CoreMLBackend` from a constant to a variable to allow reassignment when recompiling the model.

- Updated the `safelyPredict` function to handle prediction failures more gracefully:
  - Reorganized the logic to include a loop that attempts compilation and prediction with both cached and recompilation strategies.
  - Introduced a new private method `compileAndPredict` to encapsulate the model compilation and prediction logic, improving code readability and maintainability.

- Enhanced the `KataGoModel` class by modifying the `compileBundleMLModel` and `compileMLModel` methods to accept a `mustCompile` parameter, allowing conditional recompilation of the model based on input flags.

- This change addresses issues where the model fails to produce valid predictions by ensuring a fresh compilation under specific circumstances, improving overall reliability in predicting with CoreML models.
This update introduces a new optional argument, `-nbits`, that allows users to specify the number of bits to use when palettizing model weights. The weights are palettized during conversion, improving flexibility and enabling different quantization levels based on user preference. The code also handles cases where no palettization is applied.
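The CLI surface for this might look as follows (a hypothetical slice of the `convert_coreml_pytorch.py` argument parser, with the bit-width choices inferred from the commit messages; the real script defines more options):

```python
import argparse

def parse_args(argv=None):
    """Sketch of the -nbits option: selects the palettization bit
    width; omitting it skips palettization entirely."""
    p = argparse.ArgumentParser(description="Convert PyTorch model to CoreML")
    p.add_argument("-nbits", type=int, default=None,
                   choices=(1, 2, 3, 4, 6, 8),
                   help="bits per weight for palettization (default: none)")
    return p.parse_args(argv)
```

Making the argument optional with a `None` default is what lets the script "handle cases where no palettization is applied."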
- Introduced a new command-line argument `-sparsity` to specify the target sparsity level for pruning weights during model conversion.
- Updated the CoreML model conversion process to include a sparsity configuration that prunes weights according to the specified target.
- Adjustments made to ensure that models can be converted with both weight pruning and quantization.
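Target-sparsity pruning means zeroing the smallest-magnitude weights until the requested fraction of them is zero. A toy magnitude-pruning sketch (not the coremltools pruner, which works on whole tensors with its own configs):

```python
def prune_to_sparsity(weights, target_sparsity):
    """Zero out the smallest-magnitude weights so that roughly
    `target_sparsity` of the entries are zero."""
    n_zero = int(round(target_sparsity * len(weights)))
    # Indices ordered by magnitude, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    zeroed = set(order[:n_zero])
    return [0.0 if i in zeroed else w for i, w in enumerate(weights)]
```

Combined with quantization, this is the "joint compression" the later commits refer to: prune first, then quantize the surviving weights.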
- Introduced OpLinearQuantizerConfig and linear_quantize_weights functions.
- Added support for 8-bit weight quantization based on a predefined weight threshold.
- Enhanced the existing weight pruning process to include joint compression options.
- Updated argument handling for sparsity, ensuring default values are set correctly.
Updated `convert_coreml_pytorch.py` to add a sparsity description for pruned models and modified the compression description for better clarity. It now falls back to an empty sparsity description when no pruning is applied.
- Introduced a new argument '-prune-to-zero' to allow users to prune all weights to zero, creating a null model during export.
- Updated the `write_weights` function to handle the new pruning logic, ensuring models can be exported as zero-weight models if desired.
…lity

- Added detailed docstrings to functions for better documentation.
- Separated version printing into a dedicated function.
- Consolidated argument parsing into a single function for clarity.
- Modularized model tracing and conversion logic for better separation of concerns.
- Improved handling of optional parameters with defaults.
- Enhanced error handling with try-except block in the main execution flow.
- Cleaned up variable names and function calls for readability.

This refactoring aims to improve maintainability and enhance the clarity of the code structure while preserving existing functionality.
- Updated nbits choices to include 6, 3, and additional granularity options.
- Changed the quantization mode to "linear" for improved accuracy.
- Enhanced the palettization configuration with 'kmeans' mode and per-grouped channel granularity for better performance.
- Removed unnecessary weight threshold parameter in quantization for cleaner code.

These changes optimize the quantization process, improving both accuracy and latency.
Updated the logic for determining the meta encoder version to handle cases where the metadata encoder is not present or the version is missing from the configuration. This ensures the correct version is set and prevents errors during conversion.
Enhanced the logic for determining the minimum deployment target based on model sparsity and the number of bits specified. The updated conditions provide clearer handling for different scenarios, ensuring compatibility with iOS16 for 8-bit models while maintaining support for iOS18 for others.
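The selection rule described above might be sketched like this (a hypothetical helper, not the script's actual function; the exact conditions live in `convert_coreml_pytorch.py`, and the assumption that sparsity forces the newer target is mine):

```python
def min_deployment_target(nbits, sparsity):
    """Sketch: plain 8-bit models can target iOS16; sparse models and
    other bit widths need the newer iOS18 compression ops."""
    if nbits == 8 and not sparsity:
        return "iOS16"
    return "iOS18"
```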