
Norms Refactor #1140

Draft · phu0ngng wants to merge 1 commit into main
Conversation

phu0ngng (Collaborator)

Description

LayerNorm and RMSNorm refactor, preparing for the TE/cuDNN norms integration.

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactor

Changes

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@phu0ngng phu0ngng requested a review from ptrendx August 27, 2024 20:59
Signed-off-by: Phuong Nguyen <[email protected]>
@phu0ngng (Collaborator, Author)

/te-ci pytorch

Comment on lines +7 to +8
#ifndef TRANSFORMER_ENGINE_COMMON_LAYER_NORM_LN_H_
#define TRANSFORMER_ENGINE_COMMON_LAYER_NORM_LN_H_
Collaborator

Suggested change
#ifndef TRANSFORMER_ENGINE_COMMON_LAYER_NORM_LN_H_
#define TRANSFORMER_ENGINE_COMMON_LAYER_NORM_LN_H_
#ifndef TRANSFORMER_ENGINE_COMMON_LAYER_NORM_NORMS_H_
#define TRANSFORMER_ENGINE_COMMON_LAYER_NORM_NORMS_H_


#include "../common.h"

namespace transformer_engine {
Collaborator

I think it would be cleaner if we wrapped these objects in a namespace:

Suggested change
namespace transformer_engine {
namespace transformer_engine {
namespace norms {

The names of these classes (LaunchParams, FwdParams, etc.) are not very specific, so putting them in a namespace makes it clear that they are related to LayerNorm/RMSNorm.
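
For illustration only, a minimal sketch of what the nested namespace might look like; the FwdParams members shown here are placeholders, not the PR's actual struct layout:

namespace transformer_engine {
namespace norms {

// Short, generic names now live under transformer_engine::norms,
// so they do not collide with unrelated parameter types elsewhere.
struct FwdParams {
  void* x = nullptr;  // hypothetical members, for illustration only
  void* z = nullptr;
};

}  // namespace norms
}  // namespace transformer_engine

// A call site outside the namespace qualifies the name explicitly:
// transformer_engine::norms::FwdParams params;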

z->data.dtype, // otype,
DType::kFloat32, // ctype,
params);
if (params.fp8_out) set_amax();
Collaborator

This zeroing might be unnecessary. The FP8 scale update kernel called by PyTorch also zeros out the amax.

That said, this matches the existing behavior, so there is no need to change it right now.


// Scaling factor
void* scale;
int scale_byte_size;
Collaborator

Are we using this value?

Suggested change
int scale_byte_size;


// AMax output
void* amax;
int amax_byte_size;
Collaborator

If we remove the amax zeroing kernel, I don't think we are using this value:

Suggested change
int amax_byte_size;

If we want to keep an option to zero out the amax, then could we just assume the amax is FP32 like we do in the kernels?
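
As a rough sketch of that simplification, assuming the amax buffer is a single FP32 value and a CUDA stream is at hand (the helper name is made up for this example):

#include <cuda_runtime.h>

// Zero the amax without tracking a separate byte size,
// assuming it is one float, as the kernels already do.
inline void zero_amax(void* amax, cudaStream_t stream) {
  if (amax != nullptr) {
    cudaMemsetAsync(amax, 0, sizeof(float), stream);
  }
}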

Comment on lines +118 to +127
enum NVTE_NORM_TYPE {
  LN_FWD_TE,
  LN_BWD_TE,
  LN_FWD_CUDNN,
  LN_BWD_CUDNN,
  RMS_FWD_TE,
  RMS_BWD_TE,
  RMS_FWD_CUDNN,
  RMS_BWD_CUDNN,
};
Collaborator

Logically this can be split into three enums:

Suggested change
enum NVTE_NORM_TYPE {
  LN_FWD_TE,
  LN_BWD_TE,
  LN_FWD_CUDNN,
  LN_BWD_CUDNN,
  RMS_FWD_TE,
  RMS_BWD_TE,
  RMS_FWD_CUDNN,
  RMS_BWD_CUDNN,
};
enum class NormType { LayerNorm, RMSNorm };
enum class NormStage { Forward, Backward };
enum class NormImpl { TE, CUDNN };
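
A minimal sketch of how a dispatcher could consume the three enums instead of one flat value; the function name and branch bodies are hypothetical:

enum class NormType { LayerNorm, RMSNorm };
enum class NormStage { Forward, Backward };
enum class NormImpl { TE, CUDNN };

// Hypothetical dispatcher: each axis is selected independently, so adding
// a new implementation does not multiply the number of enum values.
void launch_norm(NormType type, NormStage stage, NormImpl impl) {
  if (type == NormType::LayerNorm && stage == NormStage::Forward && impl == NormImpl::TE) {
    // launch the TE LayerNorm forward kernel
  } else if (type == NormType::RMSNorm && stage == NormStage::Backward && impl == NormImpl::CUDNN) {
    // launch the cuDNN RMSNorm backward kernel
  }
  // ... remaining combinations
}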

Comment on lines +6 to +7

/* #include <transformer_engine/layer_norm.h> */
Collaborator

Suggested change
/* #include <transformer_engine/layer_norm.h> */

Comment on lines +176 to +181
template <NVTE_NORM_TYPE NormEnum, bool IF_TUNED>
struct RegistryType<NormEnum, IF_TUNED, typename std::enable_if<IF_TE_NORMS<NormEnum>()>::type> {
  using type = std::conditional_t<
      IF_TUNED, std::conditional_t<IF_TE_FWD_NORMS<NormEnum>(), FwdTunedRegistry, BwdTunedRegistry>,
      std::conditional_t<IF_TE_FWD_NORMS<NormEnum>(), FwdGeneralRegistry, BwdGeneralRegistry>>;
};
@timmoon10 (Collaborator), Aug 30, 2024

While templating on IF_TUNED makes sense for our current kernel implementation, I don't think it's a good idea going forward. The reason TE compilation is so slow (and is unusable on some systems) is that we statically compile so many tuned LayerNorm/RMSNorm kernels. If we choose to keep the TE tuned kernels, we should port them to NVRTC so we can compile them at runtime, similar to how we handle the transpose kernels. This would completely remove the need for this static registry for tuned kernels (we will still need it for general kernels, though).
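
For context, a rough sketch of runtime compilation with NVRTC, assuming a kernel source string is generated per configuration; this is only an outline with error handling omitted, and it is not necessarily how the transpose kernels are wired up:

#include <nvrtc.h>

#include <string>
#include <vector>

// Compile a generated kernel source string to PTX at runtime instead of
// statically building every tuned variant ahead of time.
std::string compile_to_ptx(const std::string& src, const std::vector<const char*>& opts) {
  nvrtcProgram prog;
  nvrtcCreateProgram(&prog, src.c_str(), "norm_kernel.cu", 0, nullptr, nullptr);
  nvrtcCompileProgram(prog, static_cast<int>(opts.size()), opts.data());
  size_t ptx_size = 0;
  nvrtcGetPTXSize(prog, &ptx_size);
  std::string ptx(ptx_size, '\0');
  nvrtcGetPTX(prog, &ptx[0]);
  nvrtcDestroyProgram(&prog);
  return ptx;
}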

template <NVTE_NORM_TYPE NormEnum>
class NormFwdTe : public NormBase {
 public:
  NormFwdTe();
Collaborator

I see that calling this constructor is considered an error:

template <NVTE_NORM_TYPE NormEnum>
NormFwdTe<NormEnum>::NormFwdTe() {
  if constexpr (NormEnum == NVTE_NORM_TYPE::LN_FWD_TE) {
    NVTE_ERROR("NormFwdTe default constructor is only for its inherited classes!");
  }
}

We can catch this at compile-time with:

Suggested change
NormFwdTe();
NormFwdTe() = delete;
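
As a small illustration of the difference (stand-in class names, not the PR's actual hierarchy), a deleted default constructor turns the misuse into a compile error, while derived classes can still delegate to a different base constructor:

struct Base {
  Base() = delete;           // direct default construction no longer compiles
  explicit Base(int tag) {}  // derived classes delegate to this instead
};

struct Derived : Base {
  Derived() : Base(0) {}     // OK: picks the non-deleted constructor
};

// Base b;     // error: use of deleted function 'Base::Base()'
// Derived d;  // OK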

template <NVTE_NORM_TYPE NormEnum>
FwdFunction& get_fwd_launcher(DType wtype, DType itype, DType otype, DType ctype,
                              const FwdParams& params) {
  if constexpr (!IF_TE_FWD_NORMS<NormEnum>()) NVTE_ERROR("Unexpected NVTE_NORM_TYPE!");
Collaborator

We can catch these errors at compile-time with a static_assert:

Suggested change
if constexpr (!IF_TE_FWD_NORMS<NormEnum>()) NVTE_ERROR("Unexpected NVTE_NORM_TYPE!");
static_assert(IF_TE_FWD_NORMS<NormEnum>(), "Unexpected NVTE_NORM_TYPE");

I see this if constexpr (...) { NVTE_ERROR(...) } pattern used in several other places.

@timmoon10 timmoon10 self-requested a review August 30, 2024 22:29
@phu0ngng phu0ngng marked this pull request as draft September 4, 2024 23:35