2 hardware agnostic front and backend #5
base: master
Conversation
Commits:
'update backend to be hardware agnostic' Rony Leppänen <[email protected]>
'update frontend to be hardware agnostic' Anders Smedegaard Pedersen <[email protected]>
'update Dockerfile.dev to also work for AMD'
'update requirements/ for AMD support' Samu Tamminen <[email protected]>
Other contributions: Bipradip Chowdhury <[email protected]>, Jarkko Lehtiranta <[email protected]>, Jarkko Vainio <[email protected]>, Tero Kemppi <[email protected]>
Review threads (resolved):
frontend/server/src/main/java/org/pytorch/serve/device/utils/XpuUtil.java
frontend/server/src/main/java/org/pytorch/serve/util/ConfigManager.java (outdated)
frontend/server/src/test/java/org/pytorch/serve/device/utils/CudaUtilTest.java
amdsmi.amdsmi_init()

handle = amdsmi.amdsmi_get_processor_handles()[gpu_index]
mem_used = amdsmi.amdsmi_get_gpu_vram_usage(handle)["vram_used"]
I believe that torch.cuda.mem_get_info should work fine on our systems if we want to follow the same approach.
@jataylo thank you for the comment! I am not sure if we can do that yet; see the example below using torch 2.4.1+rocm6.1:
(venv) root@6f6ab1e7f4fb:/workspaces/torch-serve-amd# amd-smi monitor --vram
GPU VRAM_USED VRAM_TOTAL
0 61496 MB 65501 MB
1 61496 MB 65501 MB
2 59062 MB 65501 MB
3 61496 MB 65501 MB
4 13 MB 65501 MB
5 13 MB 65501 MB
6 13 MB 65501 MB
7 13 MB 65501 MB
(venv) root@6f6ab1e7f4fb:/workspaces/torch-serve-amd# python -c "import amdsmi; import torch; print(*[(i, amdsmi.amdsmi_get_gpu_vram_usage(amdsmi.amdsmi_get_processor_handles()[i])) for i in range(torch.cuda.device_count())], sep='\n');"
(0, {'vram_total': 65501, 'vram_used': 61496})
(1, {'vram_total': 65501, 'vram_used': 61496})
(2, {'vram_total': 65501, 'vram_used': 59062})
(3, {'vram_total': 65501, 'vram_used': 61496})
(4, {'vram_total': 65501, 'vram_used': 13})
(5, {'vram_total': 65501, 'vram_used': 13})
(6, {'vram_total': 65501, 'vram_used': 13})
(7, {'vram_total': 65501, 'vram_used': 13})
(venv) root@6f6ab1e7f4fb:/workspaces/torch-serve-amd# python -c "import torch; import numpy as np; print(*[(i, np.array(torch.cuda.mem_get_info(i)) // 1024**2) for i in range(torch.cuda.device_count())], sep='\n');"
(0, array([ 6146, 65520])) # vram_used 59374
(1, array([ 4046, 65520])) # vram_used 61474
(2, array([ 4046, 65520])) # vram_used 61474
(3, array([ 4046, 65520])) # vram_used 61474
(4, array([65414, 65520])) # vram_used 106
(5, array([65414, 65520])) # vram_used 106
(6, array([65414, 65520])) # vram_used 106
(7, array([65414, 65520])) # vram_used 106
Here the amdsmi handle-based approach seems to provide correct numbers, but when using torch.cuda.mem_get_info() and accessing devices by index, the information does not seem to be correct (note that mem_get_info() returns (free, total) memory, from which the used amounts in the comments above are derived). It seems that the device indices somehow get mixed up.
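For readability, here is the same cross-check as a short script (a sketch equivalent to the one-liners above; amdsmi reports megabytes while mem_get_info() returns bytes):

# Cross-check amdsmi's handle-based VRAM usage against torch.cuda.mem_get_info,
# mirroring the one-liners above. amdsmi reports megabytes; mem_get_info returns bytes.
import amdsmi
import torch

amdsmi.amdsmi_init()
for i in range(torch.cuda.device_count()):
    handle = amdsmi.amdsmi_get_processor_handles()[i]
    used_smi = amdsmi.amdsmi_get_gpu_vram_usage(handle)["vram_used"]
    free, total = torch.cuda.mem_get_info(i)
    used_torch = (total - free) // 1024**2
    # On this system the two columns disagree per device, suggesting an index mix-up.
    print(i, used_smi, used_torch)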
Co-authored-by: Jack Taylor <[email protected]>
Force-pushed from 05211fa to ff4daa8.
extend serve.ModelServerTest.testMetricManager
Add latest ROCM support
Co-authored-by: Jeff Daily <[email protected]>
Generally looks good to me now. I think we should consult upstream feedback once any remaining comments are addressed.
But there still seem to be some unnecessary formatting changes in the Java code that we may want to clean up.
Description
This PR decouples the hardware layer from the front- and backend of TorchServe.
Relates to #740
Requirement Files
Added requirements/torch_rocm62.txt, requirements/torch_rocm61.txt and requirements/torch_rocm60.txt for easy installation of the dependencies needed for AMD support.
Backend
The Python backend currently supports NVIDIA GPUs using hardware-specific libraries. There were also a number of functions that could be refactored using more generalized interfaces.
Changes Made to Backend
Extend print_env_info for AMD GPUs and reimplement a number of functions (see the sketch below).
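As an illustration of the hardware-agnostic direction (a minimal sketch, not the code in this PR; the helper name vram_used_mb and the NVIDIA fallback via torch.cuda.mem_get_info are assumptions), the backend can branch on the torch build:

# Minimal sketch of a vendor-agnostic VRAM query; not the exact implementation in
# this PR. The function name and the NVIDIA fallback path are assumptions.
import torch

def vram_used_mb(gpu_index: int) -> int:
    """Return used VRAM in megabytes for the given GPU index."""
    if torch.version.hip is not None:
        # ROCm build: query through amdsmi handles, as in the diff discussed above.
        import amdsmi
        amdsmi.amdsmi_init()
        handle = amdsmi.amdsmi_get_processor_handles()[gpu_index]
        return amdsmi.amdsmi_get_gpu_vram_usage(handle)["vram_used"]
    # CUDA build: derive used memory from the (free, total) bytes reported by torch.
    free, total = torch.cuda.mem_get_info(gpu_index)
    return (total - free) // (1024 ** 2)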
Frontend
The Java frontend, which acts as the workload manager, had calls to SMIs hard-coded in a few places. This made it difficult for TorchServe to support multiple hardware vendors in a graceful manner.
Changes Made to Frontend
We've introduced a new package org.pytorch.serve.device with the classes SystemInfo and Accelerator. SystemInfo holds an array list of Accelerator objects, which hold static information about the specific accelerators on a machine and the relevant metrics.
Instead of calling the SMIs directly in multiple places in the frontend code, we have abstracted the hardware away by adding an instance of SystemInfo to the pre-existing ConfigManager. Now the frontend can get data from the hardware via the methods on SystemInfo without knowing about the specifics of the hardware and SMIs.
To implement the specifics for each of the vendors that were already partially supported, we have created a number of utility classes that communicate with the hardware via the relevant SMI.
The following steps are taken in the SystemInfo constructor:
1. Detect the vendor by running which {relevant smi} for each of the supported vendors (where is used on Windows systems). This is how vendor detection was done previously; there might be more robust ways.
2. Query the hardware through the vendor-specific utility class, e.g. ROCmUtility for AMD.
3. Detect visible devices from the relevant environment variable: HIP_VISIBLE_DEVICES for AMD, CUDA_VISIBLE_DEVICES for NVIDIA and XPU_VISIBLE_DEVICES for Intel. All devices are detected if the relevant environment variable is not set.
The following is a class diagram showing how the new classes relate to the existing code:
[class diagram]
Documentation
Placed the hardware support documentation in serve/docs/hardware_support/ and added it under "Hardware Support" in the TOC.
Type of change
Feature/Issue validation/testing
We build a new Docker container for each platform using Dockerfile.dev and the build arguments CUDA_VERSION and ROCM_VERSION.
Run containers
Tests
Note: the test test_handler.py::test_huggingface_bert_model_parallel_inference fails due to: ValueError: Input length of input_ids is 150, but max_length is set to 50. This can lead to unexpected behavior. You should consider increasing max_length or, better yet, setting max_new_tokens.
This indicates that preprocessing uses a different max_length than inference, which can be verified by looking at the handler from when the test was originally implemented: model.generate() has max_length=50 by default, while the tokenizer uses max_length from setup_config (max_length=150). It seems that the bert-based Textgeneration.mar needs an update.
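As a hedged illustration of the mismatch and of the max_new_tokens fix suggested by the error message (not the handler code; gpt2 is a placeholder model, not the bert-based Textgeneration.mar used by the test):

# Illustrative only: preprocessing pads the prompt to max_length=150 (as
# setup_config does in the handler), while generate() caps the total length at
# max_length=50, which triggers the ValueError quoted above.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
tokenizer.pad_token = tokenizer.eos_token          # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("example prompt", padding="max_length", max_length=150,
                   truncation=True, return_tensors="pt")

# model.generate(**inputs, max_length=50) reproduces the failure; bounding only
# the newly generated tokens avoids the conflict with the padded input length:
outputs = model.generate(**inputs, max_new_tokens=50,
                         pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))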
Checklist: