add NPU support for huggingface.py #1787
Conversation
@haileyschoelkopf @lintangsutawika would you mind having a look?
Force-pushed from 06d4b13 to 0f4fe94
Hi @jiaqiw09!

> - using accelerate to do evaluation on multiple cards
>   The major change is just replacing `f"cuda:{accelerator.local_process_index}"` with `f"{accelerator.device}"`; it does the same thing, and it may help adapt to more devices later if accelerate supports them.
This is a good change. For the rest though:
- could we rename `device_counts` either back to `gpus` or to `device_count`?
- we should be able to handle `"npu:{i}"`, as @LSinev mentions (sketched below).
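A minimal sketch of one way the harness could accept `npu:{i}` strings, assuming the Ascend `torch_npu` plugin (which registers the `torch.npu` namespace) is available; `device_list` mirrors the existing cuda/mps handling, but the exact names here are illustrative, not this PR's diff:

```python
import torch

# Guarded import: torch.npu only exists once the Ascend plugin is loaded,
# so this must not break installs that lack torch_npu.
try:
    import torch_npu  # noqa: F401  (assumption: registers torch.npu)

    npu_list = ["npu"] + [f"npu:{i}" for i in range(torch.npu.device_count())]
except ImportError:
    npu_list = []

device_list = (
    ["cuda", "cpu"]
    + [f"cuda:{i}" for i in range(torch.cuda.device_count())]
    + ["mps", "mps:0"]
    + npu_list
)
```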
Before any merge of this we'd want to be sure we can 1) make certain that this code isn't going to break any existing integrations, and 2) keep this logic minimally invasive -- currently it seems to rely on Accelerate and torch NPU support, which I presume is recent. I don't want these to make the requirements for installing the library more unwieldy.
Actually, would you be able to point me to where NPU support lives in torch, or the easiest path to installing it? My understanding is that as of #1470 this support was not native, in which case I'd be wary of merging code that requires `torch.npu`, if this is still the case.
@haileyschoelkopf @lintangsutawika thanks for your suggestions, and thanks @statelesshz for helping fix the code. I have just tested the code on NPU and GPU; all three methods work. Would you mind having a look again? Best
what this PR does
issue: #1797
This PR adds NPU support for huggingface.py. It just makes some fixes to the existing code to support NPU devices.
what part to fix
Currently, the class `HFLM` supports three different ways to do evaluation: a single card (e.g. `cuda:0`), accelerate on multiple cards, and `device_map = 'auto'`.
how to fix and why
Here is an explanation of my code:
- using a single card (just set `cuda:0`)
  Simply add `"npu"` and `"npu:0"` to `device_list`, the same way `mps` is handled. If users want to use a different card, they can export `ASCEND_RT_VISIBLE_DEVICES=1` (or 2, 3, ...) and set `--device npu` to run the task (see the first sketch after this list).
- using accelerate to do evaluation on multiple cards
  The major change is just replacing `f"cuda:{accelerator.local_process_index}"` with `f"{accelerator.device}"`; it does the same thing, and it may help adapt to more devices later if accelerate supports them (second sketch below).
- using `device_map = 'auto'` to do evaluation on multiple cards
  For `device_map = 'auto'`, things are a little different. If people want to use `device_map = 'auto'` on NPUs, the card info should be set first (third sketch below). I do this because of this issue: I met the same problem on NPUs, and the problem could be solved by setting a specific card, so I think it's better to just set the device.
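For the first method, the description doesn't include a snippet, so here is a minimal sketch of single-card usage, assuming the Ascend `torch_npu` plugin is installed; the card index is illustrative:

```python
import os

# Choose which physical card is visible to this process; this must be set
# before torch_npu initializes (it plays the role CUDA_VISIBLE_DEVICES
# plays for GPUs). The index "1" is illustrative.
os.environ["ASCEND_RT_VISIBLE_DEVICES"] = "1"

import torch
import torch_npu  # assumption: Ascend plugin that registers torch.npu

device = torch.device("npu")         # resolves to the single visible card
x = torch.ones(2, 2, device=device)
print(x.device)                      # index is relative to the visible set
```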
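For the second method, the one-line change the description refers to can be illustrated as follows (a sketch, not the PR's exact diff; `accelerator.device` is the device Accelerate assigns to the current process):

```python
from accelerate import Accelerator

accelerator = Accelerator()

# Before: the per-process device string was hard-wired to CUDA.
# device = f"cuda:{accelerator.local_process_index}"

# After: ask Accelerate which device it assigned to this process. On CUDA
# this still yields "cuda:<local rank>"; on other backends (e.g. Ascend NPU,
# if the installed accelerate supports it) it yields the matching device.
device = f"{accelerator.device}"
```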
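For the third method, the snippet referenced by "the card info should be set" did not survive in this copy; a minimal reconstruction, assuming `torch_npu` exposes `torch.npu.set_device` (mirroring `torch.cuda.set_device`) and using an illustrative checkpoint, might look like:

```python
import torch
import torch_npu  # assumption: Ascend plugin that registers torch.npu
from transformers import AutoModelForCausalLM

# Pin this process to a specific card before loading. Without this, loading
# with device_map="auto" hit the issue referenced above on NPUs; setting the
# device explicitly resolved it.
torch.npu.set_device("npu:0")

model = AutoModelForCausalLM.from_pretrained(
    "gpt2",             # illustrative checkpoint
    device_map="auto",  # shard the model across the visible NPUs
)
```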