This repository has been archived by the owner on May 28, 2024. It is now read-only.
Hi, I was wondering whether it's possible to allocate a fraction of a GPU per model instead of dedicating one GPU to each deployed model. For example, on an AWS g5.12xlarge instance with 4 GPUs, rather than deploying at most 4 models, it might be possible to deploy eight quantized models by giving each model half a GPU. Changing `num_gpus_per_worker` resulted in an error.
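For context, Ray itself accepts fractional values for GPU resource requests, so "half a GPU per model" is expressible in principle. The fragment below is a hypothetical sketch of what such a scaling config might look like, assuming a `scaling_config` schema with the `num_gpus_per_worker` field mentioned above; the exact keys and whether fractional values are accepted depend on this repository's model-config schema, so verify before use.

```yaml
# Hypothetical model scaling config (field names assumed from the
# num_gpus_per_worker setting discussed above; not verified against
# this repo's schema).
scaling_config:
  num_workers: 1            # workers per model replica
  num_gpus_per_worker: 0.5  # fractional GPU: two models could share one GPU
  num_cpus_per_worker: 4
```

Note that fractional GPU scheduling in Ray only divides the accounting, not the memory: two models packed onto one GPU must together fit in that GPU's memory, which is why quantization matters here.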
Any update on this? I am running a similar test and would like to know the best practice for deploying 8 models on a 4-GPU instance. Also, what is the difference between 1 worker with 4 replicas and 4 workers with 1 replica each? Thanks.