Inference Problem #42
python interleaved_generation.py -i 'Please introduce the city of Gyumri with pictures.' -s "./test/"

It was running for ages, so I stopped it.

I can't generate anything with your example. What did I do wrong?
Thank you for your interest! Inference on Anole-7b requires at least 20GB of GPU memory, so this is likely a memory issue. Would you mind trying another GPU with more memory? Thanks!
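As a rough sanity check of that 20GB figure, here is some back-of-envelope arithmetic (an assumption on my part: half-precision weights at 2 bytes per parameter; the real footprint is larger because activations and the KV cache also live in VRAM):

```python
# Back-of-envelope VRAM estimate for a 7B-parameter model.
# Assumes fp16/bf16 weights (2 bytes/param); activations and the
# KV cache add several more GB on top of this, which is why the
# stated requirement (~20GB) exceeds the weights-only figure.
params = 7e9                 # 7 billion parameters
bytes_per_param = 2          # fp16 / bf16
weights_gb = params * bytes_per_param / 1024**3
print(round(weights_gb, 1))  # -> 13.0 (GB for the weights alone)
```

This is why a 24GB card sits right at the edge: the weights fit, but generation-time overhead can push total usage past the limit.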
Can I not perform model inference with a single 3090 (24GB)?
Hi, quantization might help: https://github.com/GAIR-NLP/anole/pull/21/files
I reinstalled the environment and ran it according to the steps, but this still happens.
I have been installing according to your steps, but I keep running into problems, which is very frustrating.
Use Python 3.10, or change the annotations to the form `rank: Union[int, None] = None`. I recommend using Python 3.10; otherwise I found a lot of places that need to be changed.
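For context, the comment above refers to PEP 604 union syntax (`int | None`), which is only valid in annotations on Python 3.10+. A minimal sketch of the two equivalent spellings (the function name here is illustrative, not from the Anole codebase):

```python
# `rank: int | None = None` raises a TypeError at definition time on
# Python 3.9 and earlier. The Union spelling below works on 3.7+,
# and both forms are valid on 3.10+.
from typing import Union

def set_rank(rank: Union[int, None] = None) -> int:
    # Equivalent on 3.10+: def set_rank(rank: int | None = None) -> int:
    return 0 if rank is None else rank

print(set_rank())   # -> 0
print(set_rank(3))  # -> 3
```

So either upgrade the interpreter to 3.10, or rewrite every `X | None` annotation into `Union[X, None]` / `Optional[X]`.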
After I use the quantization function, the program still raises OutOfMemoryError.

My device is also an RTX 3090, and I don't know how to solve this problem. If you can help me solve it, I would be extremely grateful.
Same device as yours. This issue occurs during model initialization (`unquantized_model = ChameleonInferenceModel()`), which is why quantization has not taken effect: the full-precision model is materialized before any quantization can run.
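The ordering matters because quantizing after a full-precision load does not lower the peak allocation. A toy stdlib-only illustration of that point (the buffer names here are illustrative, not the Anole API; real loaders avoid the peak by writing weights directly into the low-precision buffer):

```python
# Toy illustration: a quantized copy is 4x smaller, but if the fp32
# buffer is allocated first (as in post-load quantization), peak
# memory is still dominated by the fp32 copy. Loading directly into
# the low-precision representation is what avoids the OOM.
from array import array

n = 1_000_000                             # pretend "parameters"
fp32 = array('f', [0.0] * n)              # full-precision load: 4 bytes each
int8 = array('b', bytes(n))               # quantized copy: 1 byte each

fp32_bytes = fp32.itemsize * len(fp32)
int8_bytes = int8.itemsize * len(int8)
print(fp32_bytes // int8_bytes)           # -> 4
```

In other words, the quantization step has to be applied inside (or before) model initialization, not after `ChameleonInferenceModel()` has already built the unquantized model.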
Why does it take so long to infer just two pictures?