semiring/IRL-llama.cpp

in situ recurrent layering (and some ablation studies) on llama.cpp. Ugly experimental hacks. Nothing stable here.
This is an appalling hack of llama.cpp to see if we can create in situ Frankenmerges at the computation graph building level.

Layer chunks are specified in the environment variable $LLAMA_CHUNKS as a comma-separated string of floats of the form "first_chunk_begin, first_chunk_end, second_chunk_begin, second_chunk_end, ...".

e.g.,

export LLAMA_CHUNKS="0.0,0.6,0.2,0.8,0.6,1.0"

creates a Frankenmodel consisting of the first 60% of the model's layers, followed by the block of layers running from 20% to 80% of the way through the model, and finally the block from 60% to 100%.

Note that layer positions are always addressed as fractions of the model's total layer count (0.0 - 1.0), not as absolute layer indices.
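For concreteness, here is a minimal sketch of how a chunk string like the one above could be expanded into an explicit layer sequence before the computation graph is built. The function names (parse_chunks, build_layer_order), the n_layer parameter, and the truncation-based rounding are illustrative assumptions, not the repo's actual implementation.

// Sketch only: expand LLAMA_CHUNKS fractions into a concrete layer order.
// Names and rounding rule are assumptions for illustration.
#include <cstdlib>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Parse "a,b,c,d,..." into (begin, end) pairs of fractional boundaries.
static std::vector<std::pair<float, float>> parse_chunks(const char * spec) {
    std::vector<float> vals;
    std::stringstream ss(spec);
    std::string tok;
    while (std::getline(ss, tok, ',')) {
        vals.push_back(std::stof(tok));
    }
    std::vector<std::pair<float, float>> chunks;
    for (size_t i = 0; i + 1 < vals.size(); i += 2) {
        chunks.push_back({vals[i], vals[i + 1]});
    }
    return chunks;
}

// Expand fractional chunks into an explicit list of layer indices for a
// model with n_layer layers (truncation toward zero assumed here).
static std::vector<int> build_layer_order(int n_layer, const char * spec) {
    std::vector<int> order;
    for (const auto & c : parse_chunks(spec)) {
        const int il_begin = (int)(c.first  * n_layer);
        const int il_end   = (int)(c.second * n_layer);
        for (int il = il_begin; il < il_end; ++il) {
            order.push_back(il);
        }
    }
    return order;
}

// Usage: for a 32-layer model,
//   const char * spec = getenv("LLAMA_CHUNKS");   // "0.0,0.6,0.2,0.8,0.6,1.0"
//   auto order = build_layer_order(32, spec);
// would visit layers 0..18, then 6..24, then 19..31 under this rounding;
// the exact boundaries depend on the rounding rule the hack actually uses.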
