
[WIP] Run inference with CB/non-CB SpD Models #155

Merged

merged 12 commits into quic:main on Dec 18, 2024

Conversation

quic-agokhale
Contributor

No description provided.

@Akshat-Tripathi

Hey all, to help with this effort, I previously also created a minimal inference script for SpD
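For context, the core of a minimal greedy speculative decoding (SpD) inference script looks roughly like the sketch below. This is a hedged illustration, not the linked script's actual code: the `draft_model`/`target_model` callables and the speculation length `k` are assumptions, standing in for the draft (DLM) and target (TLM) model forward passes.

```python
# Minimal greedy speculative decoding loop (a sketch under assumed interfaces,
# not the linked script's code). `draft_model` and `target_model` are assumed
# to be callables mapping a list of token ids to per-position next-token logits.
import numpy as np

def speculative_decode(draft_model, target_model, prompt_ids, max_new_tokens=128, k=4):
    out = list(prompt_ids)
    while len(out) - len(prompt_ids) < max_new_tokens:
        # 1. Draft model (DLM) proposes k tokens autoregressively.
        drafted, ctx = [], list(out)
        for _ in range(k):
            tok = int(np.argmax(draft_model(ctx)[-1]))
            drafted.append(tok)
            ctx.append(tok)
        # 2. Target model (TLM) scores context + drafted tokens in one pass;
        #    logits[i] predicts the token that follows position i.
        logits = np.asarray(target_model(out + drafted))
        target_toks = np.argmax(logits[len(out) - 1 : len(out) + k - 1], axis=-1)
        # 3. Accept the longest prefix where draft and target agree, then
        #    append one bonus token from the target at the first mismatch
        #    (or after the last drafted token if all k were accepted).
        n_accepted = 0
        for d, t in zip(drafted, target_toks):
            if d != t:
                break
            n_accepted += 1
        out.extend(drafted[:n_accepted])
        out.append(int(target_toks[n_accepted]) if n_accepted < k
                   else int(np.argmax(logits[-1])))
    return out[len(prompt_ids):][:max_new_tokens]
```

Each iteration emits between one and k+1 tokens at the cost of one target forward pass, which is where the speedup over plain autoregressive decoding comes from.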

@eplatero97
Contributor

Hello all, I followed the implementation of Apoorva's inference app in tests/spd/test_spd_inference.py, but merged it into a more modularized pattern to fit the TextGeneration inference class in QEfficient.

This is far from perfect and needs more work and alignment before it can merge into QEfficient. So far I have only validated that the app works for full_batch_size=1; next week I can validate that it also works with higher batch sizes.

The general idea is that this PR will be used to validate that the features for exporting TLM/DLM models work as expected.
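As a rough illustration of the modularized pattern, the per-request results could be packaged as a small record like the sketch below. The class and constructor here are hypothetical placeholders, not QEfficient's actual API; only the field names mirror the statistics printed further down.

```python
# Hypothetical per-request statistics record; the class layout is an
# illustrative assumption, only the field names match the printed output.
from dataclasses import dataclass

@dataclass
class SpDRequestStats:
    generated_output: str            # decoded text for this request
    ttft: float                      # time-to-first-token, in seconds
    avg_decode_throughput: float     # tokens/second over the decode phase
    avg_num_accepted_tokens: float   # mean accepted draft tokens per target step

# Example: one record per request, matching the four printed lines below.
stats = SpDRequestStats("...", ttft=0.080, avg_decode_throughput=22.6,
                        avg_num_accepted_tokens=1.98)
print(f"{stats.ttft=}")
```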

An example of the statistics that this returns is below:

```
$ python3 QEfficient/generation/spd_text_generation_inference.py 
generated_output=', welcome my forum i new new this my is first.\n.\n am can help with any question problem youI happy to you with any i.\n.\n am here to you any any can.\nI to help you help any can.\nI to help help any i can am help any i i.\n am help help any i can to help you help any can am to help any i am help help any i can to help help any i am help you any am to you any can am help any can am help you any am help help any am am help you am help you any can help you am help am help'
ttft=0.0804790819529444
avg_decode_throughput=22.607819810201747
avg_num_accepted_tokens=1.9838709677419355
generated_output=', welcome my forum i new new this my is first.\n.\n am can help with any question problem youI happy to you with any i.\n.\n am here to you any any can.\nI to help you help any can.\nI to help help any i can am help any i i.\n am help help any i can to help you help any can am to help any i am help help any i can to help help any i am help you any am to you any can am help any can am help you any am help help any am am help you am help you any can help you am help am help'
ttft=0.07963072706479579
avg_decode_throughput=22.61431604245997
avg_num_accepted_tokens=1.9838709677419355
```

NOTE: avg_decode_throughput will be the same for each request, because it is calculated as the total number of generated tokens divided by the elapsed decode time over the whole decode batch.
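To make that note concrete, a sketch of the metric arithmetic follows; the function and variable names are assumptions for illustration, not the script's actual internals.

```python
# Sketch of the metric arithmetic behind the note above; all names here
# are illustrative assumptions, not the script's actual internals.
def decode_metrics(prefill_start, first_token_time, decode_start, decode_end,
                   total_decoded_tokens, total_accepted_tokens, num_target_steps):
    ttft = first_token_time - prefill_start
    # Computed once over the whole decode batch, which is why every request
    # in the same batch reports an identical avg_decode_throughput value.
    avg_decode_throughput = total_decoded_tokens / (decode_end - decode_start)
    # Accepted draft tokens averaged over target verification steps.
    avg_num_accepted_tokens = total_accepted_tokens / num_target_steps
    return ttft, avg_decode_throughput, avg_num_accepted_tokens
```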

@eplatero97 eplatero97 force-pushed the run_spd_infer branch 3 times, most recently from b9fc5e9 to b3c6da4 on November 19, 2024 11:59
@eplatero97 eplatero97 force-pushed the run_spd_infer branch 2 times, most recently from ac52060 to f59a988 on December 11, 2024 04:13
@ochougul ochougul merged commit 1d7c624 into quic:main Dec 18, 2024
4 checks passed