Improvements to learned round #1107

Giuseppe5 · 2024-12-03T12:31:13Z

Reason for this PR

Fix entrypoint for learned scale
Fix training with float32 + amp

Testing Summary

NA

Risk Highlight

This PR includes code from another work (please detail).
This PR contains API-breaking changes.
This PR depends on work in another PR (please provide links/details).
This PR introduces new dependencies (please detail).
There are coverage gaps not covered by tests.
Documentation updates required in subsequent PR.

Checklist

Code comments added to any hard-to-understand areas, if applicable.
Changes generate no new warnings.
Updated any relevant tests, if applicable.
No conflicts with destination dev branch.
I reviewed my own code changes.
Initial CI/CD passing.
1+ reviews given, and any review issues addressed and approved.
Post-review full CI/CD passing.

src/brevitas_examples/llm/llm_quant/learned_round_utils.py

src/brevitas_examples/common/learned_round/learned_round_optimizer.py

pablomlago · 2024-12-03T15:43:53Z

src/brevitas_examples/common/learned_round/learned_round_optimizer.py

                    loss, loss_components = block_loss(quant_outs, fp_outs)
            else:
+                # Run block forward to obtain quant outputs
+                quant_outs = block_forward(block, inputs)
+                fp_outs = send_to_device(fp_outs, quant_outs.device)
                loss, loss_components = block_loss(quant_outs.to(torch.float32), fp_outs.to(torch.float32))


The code in the two branches is almost exactly the same. Maybe we could use autocast(enabled=use_amp, ...) and keep a single conditional for upcasting the outputs to float32 before computing the loss, to avoid repeating block_forward/send_to_device/block_loss.

I understand it's less code, but I think it would be more confusing, so I am leaving it as it is. I am happy to trade some extra verbosity for clarity.
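
For reference, the single-path alternative suggested in this thread could look roughly like the sketch below. It is not code from the PR. It assumes the branch not shown in the snippet is the AMP one, and the names single_path_loss, mse_block_loss and amp_dtype are hypothetical. A plain block(inputs) call and .to(...) stand in for block_forward and send_to_device.

```python
import torch
from torch import nn


def mse_block_loss(quant_outs, fp_outs):
    # Hypothetical stand-in for block_loss: returns (loss, loss_components).
    loss = (quant_outs - fp_outs).pow(2).mean()
    return loss, (loss.detach().item(),)


def single_path_loss(block, inputs, fp_outs, block_loss, use_amp, amp_dtype=torch.bfloat16):
    # Hypothetical helper, not part of the PR.
    # autocast(enabled=False) is a no-op, so one context manager covers both cases.
    with torch.autocast(device_type=inputs.device.type, dtype=amp_dtype, enabled=use_amp):
        quant_outs = block(inputs)  # stands in for block_forward(block, inputs)
        fp_outs = fp_outs.to(quant_outs.device)  # stands in for send_to_device(...)
        if not use_amp:
            # Without autocast, upcast explicitly before computing the loss.
            quant_outs = quant_outs.to(torch.float32)
            fp_outs = fp_outs.to(torch.float32)
        return block_loss(quant_outs, fp_outs)
```

This removes the duplicated calls, but the upcast condition ends up nested inside the autocast context. That is the readability trade-off raised in the reply above, and the merged code keeps the two explicit branches.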

pablomlago

LGTM, I've included a couple of minor comments.

Giuseppe5 force-pushed the fix_llm_entrypoint branch from 00f496c to acee4d3 Compare December 3, 2024 12:52

Giuseppe5 requested a review from pablomlago December 3, 2024 13:09

pablomlago reviewed Dec 3, 2024

src/brevitas_examples/llm/llm_quant/learned_round_utils.py (outdated)

src/brevitas_examples/common/learned_round/learned_round_optimizer.py

pablomlago approved these changes Dec 3, 2024

Giuseppe5 added 3 commits December 3, 2024 23:13

Fix (brevitas_example/llm): fix learned_round entrypoint

d661228

Feat (ptq/learned_round): fast amp training

a9edc8e

Feat (mse): use grid search for scale

8fff8ea

Giuseppe5 force-pushed the fix_llm_entrypoint branch from a15163b to a51322e Compare December 3, 2024 23:13

Feat (brevitas_examples): Po2 per channel float OCP weight quantization

1ab9a0e

Giuseppe5 force-pushed the fix_llm_entrypoint branch from a51322e to 1ab9a0e Compare December 3, 2024 23:40

Giuseppe5 merged commit bddfe1e into Xilinx:dev Dec 3, 2024
23 checks passed

Giuseppe5 deleted the fix_llm_entrypoint branch December 3, 2024 23:42
