-
Notifications
You must be signed in to change notification settings - Fork 91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCLomatic] Block Store headers core #1819
Conversation
@danhoeflinger @mmichel11 requesting review when available. Thanks. |
|
||
}; | ||
|
||
/// Stores a blocked arrangement of work items linear segment of items. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to have a more detail comments like
SYCLomatic/clang/runtime/dpct-rt/include/dpct/sparse_utils.hpp
Lines 278 to 306 in b2e5588
/// Computes a CSR format sparse matrix-dense matrix product. | |
/// C = alpha * op(A) * B + beta * C | |
/// \param [in] queue The queue where the routine should be executed. It must | |
/// have the in_order property when using the USM mode. | |
/// \param [in] trans The operation applied to the matrix A. | |
/// \param [in] sparse_rows Number of rows of the matrix A. | |
/// \param [in] dense_cols Number of columns of the matrix op(B) or C. | |
/// \param [in] sparse_cols Number of columns of the matrix A. | |
/// \param [in] alpha Scaling factor for the matrix A. | |
/// \param [in] info Matrix info of the matrix A. | |
/// \param [in] val An array containing the non-zero elements of the matrix A. | |
/// \param [in] row_ptr An array of length \p num_rows + 1. | |
/// \param [in] col_ind An array containing the column indices in index-based | |
/// numbering. | |
/// \param [in] b Data of the matrix B. | |
/// \param [in] ldb Leading dimension of the matrix B. | |
/// \param [in] beta Scaling factor for the matrix B. | |
/// \param [in, out] c Data of the matrix C. | |
/// \param [in] ldc Leading dimension of the matrix C. | |
template <typename T> | |
void csrmm(sycl::queue &queue, oneapi::mkl::transpose trans, int sparse_rows, | |
int dense_cols, int sparse_cols, const T *alpha, | |
const std::shared_ptr<matrix_info> info, const T *val, | |
const int *row_ptr, const int *col_ind, const T *b, int ldb, | |
const T *beta, T *c, int ldc) { | |
csrmm<T>(queue, trans, oneapi::mkl::transpose::nontrans, sparse_rows, | |
dense_cols, sparse_cols, alpha, info, val, row_ptr, col_ind, b, ldb, | |
beta, c, ldc); | |
} |
Please create a testing PR and link it in the description. Doing this earlier in the process as compared to load will help make this review progress more smoothly. |
Linking test PR : oneapi-src/SYCLomatic-test#680 WIP. |
// storage | ||
size_t linear_tid = item.get_local_linear_id(); | ||
OutputIteratorT workitem_itr = block_itr + linear_tid; | ||
size_t GROUP_WORK_ITEMS = item.get_global_range().size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_global_range
returns the global dimensions of the kernel, so when launching more than a single work-group this will be incorrect.
We can switch this to what we did in group load: size_t group_work_items = item.get_local_range().size();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend making sure we have testing coverage for such a case as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes local range should be used, all the tests are currently in 1 wg . Will extend tests for other wg sizes in a separate PR. thanks.
store_subgroup_striped(const Item &item, OutputIteratorT block_itr, | ||
InputT (&items)[ITEMS_PER_WORK_ITEM]) { | ||
|
||
// This implementation does not take in account range loading across |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you describe what you mean by "range loading" means in this context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Range loading/storing refers to loading/storing within bounded intervals across the warps. Not in the full scope .
/// Stores a subgroup-striped arrangement of work items linear segment of items. | ||
// Created as free function until exchange mechanism is | ||
// implemented. | ||
// To-do: inline this function with BLOCK_STORE_WARP_TRANSPOSE mechanism |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this comment means exactly. But also, lets use our own terminology here.
Closed as implemented. |
PR for Store header functions for Block API (related later to #1305 )
Linked with Load: #1640
cc @yihanwg @danhoeflinger @mmichel11