Commit de72761: Update halo-models.md
IanNod authored Dec 16, 2024 (1 parent: 8f3182b)
Showing 1 changed file with 5 additions and 5 deletions: halo-models.md
ITL: Average time between each new token generated in the decode phase (second token onward)
(Model is assumed to be llama3.1 in the following table, e.g. "8B FP8" means "llama3.1 8B FP8 model")
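The ITL definition above can be made concrete with a small sketch. This helper is illustrative only (it is not part of shark-ai or shortfin): given wall-clock timestamps at which each decode-phase token was emitted, it averages the gaps between consecutive tokens, so the first token (prefill/TTFT) is naturally excluded.

```python
def inter_token_latency(token_timestamps):
    """Mean seconds between consecutive generated tokens (ITL).

    token_timestamps: emission times in seconds, in generation order.
    Needs at least two tokens, since ITL measures gaps between tokens.
    """
    if len(token_timestamps) < 2:
        raise ValueError("ITL needs at least two token timestamps")
    # Pairwise gaps between consecutive tokens; the first token's
    # latency (time-to-first-token) is deliberately not counted.
    gaps = [b - a for a, b in zip(token_timestamps, token_timestamps[1:])]
    return sum(gaps) / len(gaps)

# Example: tokens emitted at t = 0.0, 0.05, 0.11, 0.18 seconds
print(inter_token_latency([0.0, 0.05, 0.11, 0.18]))
```

The benchmarking tasks in the table below (decode/prefill refresh, Tracy profiles) report this style of per-token average rather than end-to-end request latency.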
|Item | Current Week (Dec 9-13) | Next Week (Dec 16-20) |
|------------------------------|-----------------------|--------------------------|
| Sharktank Modeling | - @Ian finish Flux VAE decode (DONE 12/11) <br> - @Kyle finish Flux model (DONE 12/11) <br> - @Boian Flux CLIP model export and compile for bf16 (DONE 12/11) <br> - @Dan finish and merge FP8 llama PR (ETA 12/12) | - @Rob multi-device fixes (ETA 12/16) <br> - @Boian land Flux transformer model (ETA 12/16) <br> - @Boian update CLIP and T5 tests (ETA 12/16) |
| IREE code generation | - @Kunwar decode flash attention (DONE 12/11) | - @Dan rework FP8 attention for Stan (ETA 12/16) <br> - @Dan lowering issue for FP8 (ETA 12/17) |
| Serving | - @Ean flesh out bf16 Flux in shortfin (ETA 12/12) <br> - @Xida fix flakiness in batch handling (DONE 12/12) <br> - @Stephen test and ensure sglang/shortfin batch runs work (ETA 12/12) | - @Stephen debug multi-device LLMs in shortfin (ETA 12/16) <br> - @Ean debug fp16 Flux pipeline (ETA 12/16) <br> - @Xida debug batching issue (ETA 12/16) |
| Test Automation | - @Avi refresh benchmarking decode and prefill for 8B, 70B (ETA 12/12) <br> - @Archana shortfin PPL debugging (ETA 12/10) <br> - @Rob debug multi-device (ETA 12/11) | - @Archana triage PPL breakages from block size and device affinities (ETA 12/16) <br> - @Archana shortfin PPL integration (ETA 12/17) |
| Performance Tuning | - @Avi Tracy profile for decode (ETA 12/11) | - @Avi land fixes for block size changes (ETA 12/16) <br> - @Avi Tracy profiling updates (ETA 12/17) |

# Nightly Test Reports
See the latest [CI/Nightly Test Report](https://nod-ai.github.io/shark-ai/). Use the [Nod.AI Lab](https://confluence.amd.com/pages/viewpage.action?spaceKey=ENGIT&title=Nod.AI+Lab) page to ssh into the SharkMi300X machine and find logs and artifacts for triaging failures. File an issue (if not already filed/listed) and add it to the Issues table below.
