Feat: Update LLM entry-point #987
Conversation
@Giuseppe5, I've removed the quant_embedding support. It is currently broken since channelwise scaling is not supported for the embedding layer, and quantizing the embedding seems to have limited utility anyway.
@torch.no_grad()
def add_zero_bias_to_linear(model: torch.nn.Module) -> torch.nn.Module:
Is this here for loading checkpoint + bias correction?
We have a context manager for that now (load_quant_model in graph/calibrate)
Ah no it's for accelerate compatibility, nevermind.
No, this is to make bias correction work with accelerate properly. Can this also be handled by the context manager you mentioned?
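For readers following along, here is a minimal sketch of what a helper with the signature above could do, based only on the discussion here (register an explicit zero bias on every bias-less Linear so bias correction has a parameter to update in place under accelerate's hooks). This is an illustration, not necessarily the exact implementation in the PR:

```python
import torch
import torch.nn as nn


@torch.no_grad()
def add_zero_bias_to_linear(model: nn.Module) -> nn.Module:
    # Sketch: give every bias-less Linear an explicit zero-valued bias so that
    # bias correction has an existing parameter to write into, which avoids
    # placement surprises when accelerate hooks manage the module.
    for module in model.modules():
        if isinstance(module, nn.Linear) and module.bias is None:
            zero_bias = torch.zeros(
                module.out_features,
                dtype=module.weight.dtype,
                device=module.weight.device)
            module.register_parameter('bias', nn.Parameter(zero_bias))
    return model
```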
 from brevitas.graph.calibrate import bias_correction_mode

 @torch.no_grad()
 def apply_bias_correction(model, dataloader):
     with bias_correction_mode(model):
-        for inps in dataloader:
+        for inps in tqdm(dataloader):
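The diff above cuts off the loop body. As a reference, here is a minimal sketch of how such a calibration pass is typically driven, assuming each batch yielded by the dataloader is a dict of model keyword arguments (the actual loop body in this PR is not shown here):

```python
from tqdm import tqdm
import torch

from brevitas.graph.calibrate import bias_correction_mode


@torch.no_grad()
def apply_bias_correction(model, dataloader):
    # Forward passes inside the context manager are used to estimate and
    # correct the output error introduced by weight quantization.
    with bias_correction_mode(model):
        for inps in tqdm(dataloader):
            model(**inps)  # assumption: each batch is a dict of model kwargs
```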
I think we need to add tqdm as a required dependency (in the examples requirements, but at this point maybe everywhere).
    default=None,
    help="Filename to save checkpoint. If `None`, no checkpoint is saved (default: %(default)s)")
add_bool_arg(
    parser, 'use-ocp', default=False, help='Use OCP format for float quantization. Default: False')
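For context, `add_bool_arg` is typically a small argparse helper that exposes a paired `--flag` / `--no-flag` switch with a shared default; a sketch of the common pattern (the helper actually used in the example may differ):

```python
import argparse


def add_bool_arg(parser, name, default, help=''):
    # Register both --name and --no-name, writing to the same destination,
    # and fall back to `default` when neither flag is passed.
    dest = name.replace('-', '_')
    group = parser.add_mutually_exclusive_group(required=False)
    group.add_argument('--' + name, dest=dest, action='store_true', help=help)
    group.add_argument('--no-' + name, dest=dest, action='store_false', help='Disable: ' + help)
    parser.set_defaults(**{dest: default})


parser = argparse.ArgumentParser()
add_bool_arg(parser, 'use-ocp', default=False, help='Use OCP format for float quantization.')
print(parser.parse_args(['--use-ocp']))  # Namespace(use_ocp=True)
```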
Let's merge this and then I'll update the entry-point here in the same style as the stable diffusion one in #971.
There currently is no requirements file for the LLM example. I'm adding one in #1002. I'll add tqdm there.
Addresses #889. Updates the entry-point to leverage many features of our optimum-amd integration effort, and updates the example to use the available quantizers.
Builds on #977 (now merged).

Todo:
- Update to use the fx tracing method from optimum-amd: not necessary, already implemented
- `--ln-affine-merge`: fails
- `--weight-equalization`
- `--act-equalization layerwise`
- `--act-equalization fx`
- `--bias-corr`
- `--act-calibration`
- `--gptq`
- `--replace-mha`
- Allow optional quantization of first (embedded) layer: disabled, see comment
- torchmlir: fails, won't fix in this PR
- torchmlir (packed weights): fails, won't fix in this PR

A sketch of how these flags could be wired together is shown below.
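As an illustration of how the flags in the checklist above could be exposed, here is a hypothetical argparse sketch. The flag names are taken from the checklist; the defaults, help strings, and example invocation are placeholders, not taken from this PR:

```python
import argparse

parser = argparse.ArgumentParser(description='LLM quantization entry-point (sketch)')
parser.add_argument('--ln-affine-merge', action='store_true', help='Merge LayerNorm affine parameters')
parser.add_argument('--weight-equalization', action='store_true', help='Apply weight equalization')
parser.add_argument('--act-equalization', choices=['layerwise', 'fx'], default=None,
                    help='Activation equalization mode')
parser.add_argument('--bias-corr', action='store_true', help='Apply bias correction')
parser.add_argument('--act-calibration', action='store_true', help='Apply activation calibration')
parser.add_argument('--gptq', action='store_true', help='Apply GPTQ')
parser.add_argument('--replace-mha', action='store_true', help='Replace attention with a quantizable MHA')

# Example invocation (placeholder): activation calibration + bias correction + GPTQ.
args = parser.parse_args(['--act-calibration', '--bias-corr', '--gptq'])
print(args)
```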