Hi,
According to this blog:
"The optimal bucket sizes should be determined by prompt length distribution in a target application. Here, we adopt bucket lengths: 128, 256, 384, 512. Any input prompt with up to 2,047 tokens requires up to 4 graph executions. For example, a 1,500 input prompt with generation length of 256 requires 260 graph executions - 4 to process the input, and 256 to generate the output."
How is a 1,500-token input prompt divided into 4 executions? What are the section lengths? Could we have some more details, please?
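To make the question concrete, the arithmetic can be sketched as below. This is only one naive reading, not the blog's confirmed method: it assumes the prompt is cut into chunks of at most the largest bucket (512 tokens) and that each chunk runs in the smallest bucket that fits it. The function names `naive_chunks` and `total_executions` are hypothetical.

```python
# Bucket lengths as quoted from the blog.
BUCKETS = [128, 256, 384, 512]


def naive_chunks(prompt_len, buckets=BUCKETS):
    """Hypothetical split (an assumption, not the blog's stated rule):
    take chunks of at most the largest bucket, and pair each chunk with
    the smallest bucket that can hold it.
    Returns a list of (tokens_in_chunk, bucket_used) tuples."""
    largest = max(buckets)
    chunks = []
    remaining = prompt_len
    while remaining > 0:
        take = min(remaining, largest)
        bucket = min(b for b in buckets if b >= take)
        chunks.append((take, bucket))
        remaining -= take
    return chunks


def total_executions(prefill_executions, generation_len):
    """One graph execution per prefill chunk plus one per generated token."""
    return prefill_executions + generation_len


if __name__ == "__main__":
    for n in (1500, 2047):
        chunks = naive_chunks(n)
        print(n, len(chunks), chunks)
    # The blog's own figures: 4 prefill executions + 256 generated tokens.
    print(total_executions(4, 256))
```

Under this naive reading, a 2,047-token prompt needs 4 executions, which matches the blog's upper bound, but a 1,500-token prompt needs only 3 (512 + 512 + 476) rather than the 4 in the blog's example. So the actual splitting rule must differ from this assumption, and that is the detail being asked about.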