Add XPU support (duplicate #125) #209

Status: Closed. @ma595 wants to merge 26 commits.

@ma595 ma595 commented Dec 20, 2024

Adds XPU support to examples and associated instructions in the documentation.

Cpp-Linter Report ⚠️

Some files did not pass the configured checks!

clang-format (v12.0.0) reports: 1 file(s) not formatted
  • src/ctorch.cpp

ma595 commented Dec 20, 2024

Build script for CSD3 (@ma595 needs to check this works end to end).

module purge
module load default-dawn
module load intel-oneapi-compilers/2025.0.3/gcc/sb5vj5us
module load gcc/14.2.0/vaetnoca
module load python/3.11.9/gcc/7xr7o47s

python3 -m venv ./venv3-pvc
source venv3-pvc/bin/activate
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/test/xpu

git clone git@github.com:Cambridge-ICCS/FTorch.git
cd FTorch/src; mkdir build; cd build

export TORCH=$(python -c "import torch; print(torch.__path__[0])")

export CMAKE_PREFIX_PATH=$TORCH

cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/rds/project/rds-5mCMIDBOkPU/rse/ftorch/FTorch/src/build/install

cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/rds/project/rds-5mCMIDBOkPU/rse/ftorch/FTorch/src/build/install -DCMAKE_BUILD_TESTS=TRUE  -DCMAKE_Fortran_COMPILER=$(which ifx)

cmake --build . --target install
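
The `TORCH` export above imports torch just to read its install path. As a minimal sketch (the helper name `torch_cmake_prefix` is hypothetical, not part of FTorch or PyTorch), the same directory can usually be found without importing torch at all, which avoids touching the XPU runtime during configuration. This assumes torch is installed as a regular package in the active venv, in which case the result should match `torch.__path__[0]`.

```python
import importlib.util
import os
from typing import Optional


def torch_cmake_prefix() -> Optional[str]:
    """Return the install directory of the torch package, or None if absent.

    Hypothetical helper: it only consults the import machinery, so torch
    itself is never imported and no device runtime is initialised.
    """
    spec = importlib.util.find_spec("torch")
    if spec is None or not spec.submodule_search_locations:
        return None
    # For a regular package this is the directory containing torch/__init__.py.
    return os.path.abspath(list(spec.submodule_search_locations)[0])


if __name__ == "__main__":
    # Prints an empty line when torch is not installed in this environment.
    print(torch_cmake_prefix() or "")
```

If saved as, say, `torch_prefix.py` (an assumed file name), it could be used as `export CMAKE_PREFIX_PATH=$(python torch_prefix.py)`.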

@ma595 changed the title from "Add PVC support (duplicate #125)" to "Add XPU support (duplicate #125)" on Dec 20, 2024

ma595 commented Dec 20, 2024

Running the 2_ResNet18 example (built with gfortran):

./resnet_infer_fortran

[ERROR]: 0 <= device && static_cast<size_t>(device) < device_allocators.size() INTERNAL ASSERT FAILED at "/pytorch/c10/xpu/XPUCachingAllocator.cpp":555, please report a bug to PyTorch. Allocator not initialized for device 0: did you call init?

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x14626d1e6688 in ???
#1  0x146251733d68 in ???
#2  0x1463501f95af in ???
#3  0x14633846fca4 in ???
#4  0x1463385010bc in ???
#5  0x14633835ca87 in ???
#6  0x146350e79501 in ???
#7  0x1463501fc1f6 in ???
#8  0x146350e65fc6 in ???
Segmentation fault

@ma595 ma595 self-assigned this Dec 21, 2024

ma595 commented Jan 24, 2025

>>> import torch
>>> torch.xpu.is_available()
True
>>> torch.xpu.is_initialized()
False
>>> torch.xpu.init()
>>> torch.xpu.is_initialized()
True
>>> a = torch.tensor([1,2,3])
>>> a.to('xpu')
tensor([1, 2, 3], device='xpu:0')
>>> torch.xpu.is_initialized()
True

Failing example:

>>> import torch
>>> torch.jit.load("examples/2_ResNet18/saved_resnet18_model_cpu.pt")
[WARNING] Failed to create Level Zero tracer: 2013265921
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/rds/project/rds-5mCMIDBOkPU/rse/ftorch/FTorch/venv3-pvc/lib/python3.11/site-packages/torch/jit/_serialization.py", line 163, in load
    cpp_module = torch._C.import_ir_module(cu, os.fspath(f), map_location, _extra_files, _restore_shapes)  # type: ignore[call-arg]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: 0 <= device && static_cast<size_t>(device) < device_allocators.size() INTERNAL ASSERT FAILED at "/pytorch/c10/xpu/XPUCachingAllocator.cpp":555, please report a bug to PyTorch. Allocator not initialized for device 0: did you call init?

Solution: call torch.xpu.init() before torch.jit.load().

Successful example:

>>> import torch
>>> torch.xpu.init()
>>> torch.jit.load("examples/2_ResNet18/saved_resnet18_model_cpu.pt")
(lots of model output here)
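
The workaround above could be wrapped so the initialisation is never forgotten. This is only a sketch of the Python-side session shown here: `ensure_xpu_initialized` and `load_torchscript` are hypothetical helper names, and the torch module is passed in as an argument so the logic can be exercised without an XPU build. It uses only the calls already seen in this thread (`torch.xpu.is_available`, `torch.xpu.is_initialized`, `torch.xpu.init`, `torch.jit.load`) and says nothing about how the Fortran/C++ side should be fixed.

```python
def ensure_xpu_initialized(torch_module) -> bool:
    """Initialise the XPU runtime if one is present.

    Returns True when an XPU device is available and initialised,
    False when the build has no XPU support or no device is available.
    """
    xpu = getattr(torch_module, "xpu", None)  # older builds may lack torch.xpu
    if xpu is None or not xpu.is_available():
        return False
    if not xpu.is_initialized():
        xpu.init()
    return xpu.is_initialized()


def load_torchscript(torch_module, path):
    """Load a TorchScript file after making sure XPU has been initialised."""
    ensure_xpu_initialized(torch_module)
    return torch_module.jit.load(path)


# Intended usage (assumes an XPU-enabled torch install):
#   import torch
#   model = load_torchscript(torch, "examples/2_ResNet18/saved_resnet18_model_cpu.pt")
```

Passing the module explicitly is a design choice for testability; in a real script one would more likely import torch at the top and call the helpers directly.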

@jwallwork23
Contributor

6034cb7 should've been "DO NOT MERGE".

@jwallwork23
Contributor

Closing as superseded by #276.

Labels
gpu Related to building and running on GPU