
add NPU support for huggingface.py #1787

Closed
jiaqiw09 wants to merge 4 commits from the device_npu branch

Conversation

jiaqiw09
Contributor

@jiaqiw09 jiaqiw09 commented May 6, 2024

What this PR does

Issue: #1797
This PR adds NPU support to huggingface.py. It only makes small fixes to the existing code so that NPU devices work.

What this PR changes

Currently, the HFLM class supports three different ways of running evaluations:

  • using a single card by setting cuda:0
  • using accelerate to evaluate on multiple cards
  • using device_map = 'auto' to evaluate on multiple cards

How it is fixed and why

Here is an explanation of each change:

- using a single card by setting cuda:0

Simply add "npu" and "npu:0" to device_list, the same way "mps" is handled. Users who want a different card can export ASCEND_RT_VISIBLE_DEVICES=1 (or 2, 3, ...) and pass --device npu to run the task.
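
A minimal sketch of what this amounts to (the device_list name follows the existing handling in huggingface.py; the exact surrounding code may differ):

import torch

# Hypothetical sketch of the single-card change: extend the accepted device
# strings with NPU entries, mirroring how "mps"/"mps:0" are already handled.
device_list = (
    ["cuda", "cpu"]
    + [f"cuda:{i}" for i in range(torch.cuda.device_count())]
    + ["mps", "mps:0"]
    + ["npu", "npu:0"]
)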

- using accelerate to evaluate on multiple cards

The major change is replacing f"cuda:{accelerator.local_process_index}" with f"{accelerator.device}". It does the same thing, and it may make it easier to adapt to other device types later as accelerate adds support for them.
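
A standalone before/after sketch of that change:

import torch
from accelerate import Accelerator

accelerator = Accelerator()
# Before: the per-process device was hard-coded to CUDA:
#   device = torch.device(f"cuda:{accelerator.local_process_index}")
# After: accelerator.device already resolves to the right backend-specific
# device (cuda:N, npu:N, mps, ...), with no CUDA-specific string needed.
device = torch.device(f"{accelerator.device}")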

- using device_map = 'auto' to evaluate on multiple cards

device_map = 'auto' works a little differently. To use device_map = 'auto' on NPUs, run the following command:

lm_eval --model hf \
    --tasks lambada_openai,arc_easy \
    --model_args parallelize=True \
    --device npu:0 \
    --batch_size 16

Note that a specific card (npu:0 above) must be set. I did this because of this issue: I ran into the same problem on NPUs, and it was solved by setting a specific card, so I think it is better to simply require the device to be set.
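
A sketch of the idea (hypothetical code, assuming the torch_npu plugin provides torch.npu.set_device):

import torch

# Hypothetical sketch: when parallelize=True is used on NPUs, pin the process
# to the card named in --device before dispatching with device_map="auto".
# torch.npu comes from the torch_npu plugin, not from stock PyTorch.
device = "npu:0"
if device.startswith("npu"):
    import torch_npu  # noqa: F401  # registers the torch.npu backend
    torch.npu.set_device(torch.device(device))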

@CLAassistant

CLAassistant commented May 6, 2024

CLA assistant check
All committers have signed the CLA.

@jiaqiw09
Contributor Author

jiaqiw09 commented May 7, 2024

@haileyschoelkopf @lintangsutawika

Would you mind taking a look?
Best

@jiaqiw09 jiaqiw09 marked this pull request as ready for review May 7, 2024 02:09
@jiaqiw09 jiaqiw09 force-pushed the device_npu branch 3 times, most recently from 06d4b13 to 0f4fe94 Compare May 8, 2024 04:44
Collaborator

@haileyschoelkopf haileyschoelkopf left a comment


Hi @jiaqiw09 !

  • using accelerate to evaluate on multiple cards

The major change is replacing f"cuda:{accelerator.local_process_index}" with f"{accelerator.device}". It does the same thing, and it may make it easier to adapt to other device types later as accelerate adds support for them.

This is a good change. For the rest though:

  • could we rename device_counts either back to gpus or to device_count?
  • we should be able to handle "npu:{i}", as @LSinev mentions (see the sketch after this list).
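
Something like the following would cover that (a hypothetical sketch; is_npu_device is an illustrative helper name, not code from this PR):

import re

# Accept any indexed NPU device string such as "npu:3" instead of
# enumerating every card in device_list up front.
def is_npu_device(device: str) -> bool:
    return re.match(r"^npu(:\d+)?$", device) is not None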

Before any merge of this we'd want to be sure we can 1) make certain that this code isn't going to break any existing integrations, and 2) keep this logic minimally invasive. Currently it seems to rely on Accelerate and torch NPU support, which I presume is recent; I don't want these changes to make the requirements for installing the library more unwieldy.

Actually, would you be able to point out to me where NPU support lives in torch, or the easiest path to installing such? It's my understanding that as of #1470 this support was not native, in which case I'd be wary of merging code that requires torch.npu if this is still the case.
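
For what it's worth, my understanding is that NPU support is not native to stock torch; it comes from the separately installed torch_npu (Ascend) plugin. A guarded import along these lines would keep the dependency optional (a sketch, assuming torch_npu attaches torch.npu on import):

import torch

# Stock PyTorch has no NPU backend; the separately installed torch_npu
# (Ascend) plugin attaches torch.npu on import. Guarding the import keeps
# the harness usable without the plugin installed.
try:
    import torch_npu  # noqa: F401
    NPU_AVAILABLE = torch.npu.is_available()
except ImportError:
    NPU_AVAILABLE = False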

@jiaqiw09
Contributor Author

jiaqiw09 commented May 20, 2024

@haileyschoelkopf @lintangsutawika

Thanks for your suggestions, and thanks to @statelesshz for helping fix the code. I have just tested the code on both NPU and GPU, and all three methods work. Would you mind having a look again?

Best

@jiaqiw09 jiaqiw09 requested a review from haileyschoelkopf May 21, 2024 13:32
@ji-huazhong
Contributor

ji-huazhong commented May 21, 2024

cc @haileyschoelkopf
