Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCLomatic] Block Store headers core #1819

Closed
wants to merge 27 commits into from

Conversation

abhilash1910
Copy link
Contributor

PR for Store header functions for Block API (related later to #1305 )
Linked with Load: #1640
cc @yihanwg @danhoeflinger @mmichel11

@abhilash1910 abhilash1910 requested a review from a team as a code owner March 25, 2024 07:31
@abhilash1910
Copy link
Contributor Author

@danhoeflinger @mmichel11 requesting review when available. Thanks.


};

/// Stores a blocked arrangement of work items linear segment of items.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to have a more detail comments like

/// Computes a CSR format sparse matrix-dense matrix product.
/// C = alpha * op(A) * B + beta * C
/// \param [in] queue The queue where the routine should be executed. It must
/// have the in_order property when using the USM mode.
/// \param [in] trans The operation applied to the matrix A.
/// \param [in] sparse_rows Number of rows of the matrix A.
/// \param [in] dense_cols Number of columns of the matrix op(B) or C.
/// \param [in] sparse_cols Number of columns of the matrix A.
/// \param [in] alpha Scaling factor for the matrix A.
/// \param [in] info Matrix info of the matrix A.
/// \param [in] val An array containing the non-zero elements of the matrix A.
/// \param [in] row_ptr An array of length \p num_rows + 1.
/// \param [in] col_ind An array containing the column indices in index-based
/// numbering.
/// \param [in] b Data of the matrix B.
/// \param [in] ldb Leading dimension of the matrix B.
/// \param [in] beta Scaling factor for the matrix B.
/// \param [in, out] c Data of the matrix C.
/// \param [in] ldc Leading dimension of the matrix C.
template <typename T>
void csrmm(sycl::queue &queue, oneapi::mkl::transpose trans, int sparse_rows,
int dense_cols, int sparse_cols, const T *alpha,
const std::shared_ptr<matrix_info> info, const T *val,
const int *row_ptr, const int *col_ind, const T *b, int ldb,
const T *beta, T *c, int ldc) {
csrmm<T>(queue, trans, oneapi::mkl::transpose::nontrans, sparse_rows,
dense_cols, sparse_cols, alpha, info, val, row_ptr, col_ind, b, ldb,
beta, c, ldc);
}

@danhoeflinger
Copy link
Contributor

danhoeflinger commented May 30, 2024

Please create a testing PR and link it in the description. Doing this earlier in the process as compared to load will help make this review progress more smoothly.

@abhilash1910
Copy link
Contributor Author

Linking test PR : oneapi-src/SYCLomatic-test#680 WIP.

// storage
size_t linear_tid = item.get_local_linear_id();
OutputIteratorT workitem_itr = block_itr + linear_tid;
size_t GROUP_WORK_ITEMS = item.get_global_range().size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_global_range returns the global dimensions of the kernel, so when launching more than a single work-group this will be incorrect.

We can switch this to what we did in group load: size_t group_work_items = item.get_local_range().size();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend making sure we have testing coverage for such a case as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes local range should be used, all the tests are currently in 1 wg . Will extend tests for other wg sizes in a separate PR. thanks.

store_subgroup_striped(const Item &item, OutputIteratorT block_itr,
InputT (&items)[ITEMS_PER_WORK_ITEM]) {

// This implementation does not take in account range loading across
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you describe what you mean by "range loading" means in this context?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Range loading/storing refers to loading/storing within bounded intervals across the warps. Not in the full scope .

/// Stores a subgroup-striped arrangement of work items linear segment of items.
// Created as free function until exchange mechanism is
// implemented.
// To-do: inline this function with BLOCK_STORE_WARP_TRANSPOSE mechanism
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this comment means exactly. But also, lets use our own terminology here.

@abhilash1910
Copy link
Contributor Author

Closed as implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants