How is the Max Performance estimated? #400
Comments
Hello @LorcaQAQ, I think you are referring to the max. performance used to compute the performance ideality on Ara2, i.e., how close the measured throughput is to the maximum achievable by that kernel on the vector processor. So, max. performance is the maximum number of operations per cycle achievable, limited by the number of FPUs if the kernel is compute bound on Ara2, or by the memory BW if the kernel is memory bound on Ara2.
Max 1 OP/cycle
Max 1.5 OP/cycle
Max 2 OP/cycle
When the kernel is memory bound, the max. performance for performance ideality is limited by the memory bandwidth (in Ara, the memory BW is 32L-bit/cycle, while the compute works at 64L-bit/cycle).
Max 0.25 OP/cycle
Hope it helps!
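The rule described above (the bound is set by the FPUs when compute bound, by the memory BW when memory bound) can be sketched as a roofline-style minimum. This is only an illustration, not the calculation used for Table 2: the function names, the per-lane view (taking `L` as the number of lanes, so 32 bit/cycle of memory BW per lane), and the two example kernels are all assumptions made for the sketch.

```python
def max_ops_per_cycle_per_lane(compute_peak, bits_moved_per_op, mem_bw_bits=32):
    """Roofline-style upper bound on OP/cycle for a single lane.

    compute_peak      -- OP/cycle the lane could sustain if never starved
                         (hypothetical input, depends on the kernel's op mix)
    bits_moved_per_op -- memory traffic per operation, in bits
                         (hypothetical input, depends on the kernel)
    mem_bw_bits       -- memory bandwidth per lane in bit/cycle; 32 is taken
                         from the 32L-bit/cycle figure, assuming L = lanes
    """
    if bits_moved_per_op <= 0:
        # No memory traffic: only the compute limit applies.
        return compute_peak
    memory_peak = mem_bw_bits / bits_moved_per_op
    return min(compute_peak, memory_peak)


def max_ops_per_cycle(compute_peak, bits_moved_per_op, lanes):
    """Scale the per-lane bound to a configuration with `lanes` lanes."""
    return lanes * max_ops_per_cycle_per_lane(compute_peak, bits_moved_per_op)


# Hypothetical compute-bound kernel: little traffic per operation,
# so the compute peak (here 2 OP/cycle) is the limit.
compute_bound = max_ops_per_cycle_per_lane(compute_peak=2.0, bits_moved_per_op=8)

# Hypothetical memory-bound kernel: 128 bits moved per operation,
# so the bound is 32 / 128 OP/cycle.
memory_bound = max_ops_per_cycle_per_lane(compute_peak=1.0, bits_moved_per_op=128)

print(compute_bound, memory_bound)  # prints "2.0 0.25"
```

The second example happens to produce 0.25 OP/cycle, but that does not mean it is the kernel behind the "Max 0.25 OP/cycle" figure quoted above; the traffic per operation has to be read off the actual kernel.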
@mp-17 Thanks! I think I now understand the four examples. But I still have a question. In Table 2, under the first row for "matmul," what does "1 ×" represent in the expression "1 × 2.0 × L"? Additionally, in the "dropout" row, what does "2 ×" stand for?
I am reading your manuscript on Ara2, and I have a question about Table 2: how do you estimate the Max Performance? Could you provide some examples to illustrate the calculation? Thanks!