Merge branch 'main' into dev/zhangrb/fused_multi_pad_cast_transpose
Showing 11 changed files with 1,394 additions and 1 deletion.
21 changes: 21 additions & 0 deletions
transformer_engine/common/include/transformer_engine/permutation.h
@@ -0,0 +1,21 @@
/*************************************************************************
 * Copyright (c) 2022-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 *
 * See LICENSE for license information.
 ************************************************************************/

#ifndef TRANSFORMER_ENGINE_PERMUTATION_H_
#define TRANSFORMER_ENGINE_PERMUTATION_H_

#include "transformer_engine.h"

void nvte_permute(const NVTETensor input, NVTETensor output, const NVTETensor sorted_row_id,
                  NVTETensor row_id_map, const NVTETensor prob, NVTETensor prob_grad,
                  const NVTETensor input_fwd, const int num_rows, const int topK,
                  const int num_cols, const int num_out_tokens, cudaStream_t stream = nullptr);

void nvte_unpermute(const NVTETensor input, NVTETensor output, NVTETensor row_id_map,
                    const NVTETensor prob, const int num_rows, const int topK, const int num_cols,
                    cudaStream_t stream = nullptr);

#endif  // TRANSFORMER_ENGINE_PERMUTATION_H_
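
The header above only declares `nvte_permute` and `nvte_unpermute`; the diff shown here does not include their semantics. For readers unfamiliar with top-K token permutation, the following is a minimal CPU sketch of what a row permute/unpermute pair of this shape might compute. Everything in it is an assumption for illustration: the names `PermuteResult`, `permute_rows_reference`, and `unpermute_rows_reference` are hypothetical, the flattened index convention `token * topK + k` is assumed, and `-1` as a marker for dropped slots is assumed. It is not the Transformer Engine implementation and does not use `NVTETensor` or CUDA streams.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference; NOT Transformer Engine code.
// Assumption: sorted_row_id[dst] holds a flattened (token, k) index equal to
// token * topK + k, already sorted by expert.
struct PermuteResult {
  std::vector<float> output;    // [num_out_tokens, num_cols]
  std::vector<int> row_id_map;  // [num_rows * topK]; -1 marks a dropped slot
};

inline PermuteResult permute_rows_reference(const std::vector<float>& input,
                                            const std::vector<int>& sorted_row_id,
                                            int num_rows, int topK, int num_cols,
                                            int num_out_tokens) {
  const int expanded = num_rows * topK;
  assert(static_cast<int>(input.size()) == num_rows * num_cols);
  assert(static_cast<int>(sorted_row_id.size()) == expanded);
  assert(num_out_tokens >= 0 && num_out_tokens <= expanded);

  PermuteResult r;
  r.output.assign(static_cast<std::size_t>(num_out_tokens) * num_cols, 0.0f);
  r.row_id_map.assign(static_cast<std::size_t>(expanded), -1);
  // Gather: destination row dst copies the source token's row.
  for (int dst = 0; dst < num_out_tokens; ++dst) {
    const int src = sorted_row_id[dst];
    const int token = src / topK;
    r.row_id_map[src] = dst;  // remember where this (token, k) slot landed
    for (int c = 0; c < num_cols; ++c) {
      r.output[dst * num_cols + c] = input[token * num_cols + c];
    }
  }
  return r;
}

// Scatter back: each token sums its topK permuted rows, weighted by prob.
// An empty prob vector is treated as a weight of 1 for every slot (assumption).
inline std::vector<float> unpermute_rows_reference(const std::vector<float>& permuted,
                                                   const std::vector<int>& row_id_map,
                                                   const std::vector<float>& prob,
                                                   int num_rows, int topK, int num_cols) {
  assert(static_cast<int>(row_id_map.size()) == num_rows * topK);
  assert(prob.empty() || static_cast<int>(prob.size()) == num_rows * topK);

  std::vector<float> out(static_cast<std::size_t>(num_rows) * num_cols, 0.0f);
  for (int token = 0; token < num_rows; ++token) {
    for (int k = 0; k < topK; ++k) {
      const int dst = row_id_map[token * topK + k];
      if (dst < 0) continue;  // slot was dropped during permute
      const float w = prob.empty() ? 1.0f : prob[token * topK + k];
      for (int c = 0; c < num_cols; ++c) {
        out[token * num_cols + c] += w * permuted[dst * num_cols + c];
      }
    }
  }
  return out;
}
```

Under these assumptions, permuting and then unpermuting with per-token weights that sum to 1 reproduces the original rows, which can serve as a quick sanity check for a GPU implementation with a comparable contract.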