fix a bug when calculating neuron_cap before invoking the solver #231

Open
wants to merge 1 commit into main
Conversation

KiritoHugh

For example: ReluLLaMA-7B on an NVIDIA GeForce RTX 2080 Ti (11264 MiB), where `ffn_up`, `ffn_gate`, and `ffn_down_t` are all [4096,11008]. A neuron should be [4096,1], not [1,11008].
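A minimal sketch of the shape distinction above (illustration only, not PowerInfer code): with the FFN weights stored as [4096,11008] matrices, each neuron corresponds to a column of length 4096, not a row of length 11008.

```python
import numpy as np

# Dimensions taken from the PR description.
hidden_dim, ffn_dim = 4096, 11008

# Stand-in for one of the [4096, 11008] FFN tensors (int8 to keep it small).
ffn_up = np.zeros((hidden_dim, ffn_dim), dtype=np.int8)

neuron = ffn_up[:, 0:1]        # one neuron: shape (4096, 1)
not_a_neuron = ffn_up[0:1, :]  # the wrong axis: shape (1, 11008)
print(neuron.shape, not_a_neuron.shape)
```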

When running `env CUDA_VISIBLE_DEVICES=0 ./build/bin/main -m ./ReluLLaMA-7B/llama-7b-relu.powerinfer.gguf -n 128 -t 8 -p "Once upon a time"`:

- before revising:
`slice_size=22016`
`vram_bytes_per_slice=99072`
`vram_allocatable_bytes=4212178944`
`neuron_cap=170064`

- after revising:
`slice_size=8192`
`vram_bytes_per_slice=24576`
`vram_allocatable_bytes=4212178944`
`neuron_cap=171394`
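As a sanity check on the logged values, here is a minimal sketch of the capacity arithmetic. The formula is an assumption (the actual solver code may round or batch differently), but a plain floor division of allocatable VRAM by the per-slice footprint reproduces the after-fix `neuron_cap` exactly:

```python
# Assumption, not the actual PowerInfer solver code: neuron_cap as the
# number of neuron slices whose VRAM footprint fits in the budget.
vram_allocatable_bytes = 4212178944  # from the log above
vram_bytes_per_slice = 24576         # after-fix value from the log

neuron_cap = vram_allocatable_bytes // vram_bytes_per_slice
print(neuron_cap)  # 171394, matching the after-fix log line
```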
