Inference speed worse on AMD CPU than on Intel CPU #119
Unanswered
CrazyChildren asked this question in Q&A

I tested Chronos with an Intel Core CPU (a Mac Pro), Linux with an Intel CPU (a server), and Linux with an AMD CPU (a server), running the same code on all three. Inference on the AMD CPU seems to be ~30x slower.
On the Intel CPUs it takes approximately 0.7 s with batch_num = 1, predict_len = 1, and context_len = 70; on the AMD CPU the same call takes about 30 s.
I don't know whether this is specific to my setup, but I found a report that turning on AMP on an AMD CPU by autocasting to bfloat16 degrades performance: "Bfloat16 CPU inference speed is too slow on AMD cpu".
I'm quite a newbie with torch, so if someone finds a solution, please post it here. Thanks.
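As a side check, independent of Chronos: one way to see whether bf16 autocast itself is the slow path on a given CPU is to time a plain matmul with and without CPU bf16 autocast. The sketch below is only illustrative; the matrix size and iteration count are arbitrary, and torch.autocast(device_type="cpu", ...) needs a reasonably recent PyTorch.

```python
import contextlib
import time

import torch

def avg_matmul_time(ctx, n=1024, iters=20):
    # Average wall-clock time of a square matmul, optionally under autocast.
    a, b = torch.randn(n, n), torch.randn(n, n)
    with ctx:
        for _ in range(3):  # warm-up so one-time costs don't skew the timing
            _ = a @ b
        start = time.perf_counter()
        for _ in range(iters):
            _ = a @ b
        elapsed = time.perf_counter() - start
    return elapsed / iters

fp32 = avg_matmul_time(contextlib.nullcontext())
bf16 = avg_matmul_time(torch.autocast(device_type="cpu", dtype=torch.bfloat16))
print(f"fp32: {fp32 * 1e3:.1f} ms/iter, bf16 autocast: {bf16 * 1e3:.1f} ms/iter")
```

A large gap in favor of fp32 here would point at the same bf16 slowdown described above rather than at anything Chronos-specific.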
Replies: 1 comment
-
@CrazyChildren one quick check to verify if this is indeed due to bf16 (which is the likely case) is to load the model in fp32. Here's the relevant code:

```python
import pandas as pd  # requires: pip install pandas
import torch
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-small",
    device_map="cuda",  # use "cpu" for CPU inference and "mps" for Apple Silicon
    torch_dtype=torch.float32,
)
```
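To actually run the check on the slow machine, one could load the pipeline with device_map="cpu" as noted in the comment above and time a single forecast, once with the default load and once with torch_dtype=torch.float32. The snippet below is a rough sketch: the context series is random stand-in data, and it assumes the usual ChronosPipeline.predict(context, prediction_length) call from the Chronos README.

```python
import time

import torch

context = torch.randn(70)  # stand-in for a real series with context_len = 70

start = time.perf_counter()
forecast = pipeline.predict(context, prediction_length=1)
print(f"predict took {time.perf_counter() - start:.2f} s, "
      f"forecast shape {tuple(forecast.shape)}")  # [batch, num_samples, prediction_length]
```

If the fp32 load brings the AMD timing back in line with the Intel numbers, that confirms the bf16 hypothesis.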