
Drafting polymorphic value support for dtensors. #3940

Draft · wants to merge 1 commit into main
Conversation

@csarofeen (Collaborator) commented on Feb 21, 2025

I was giving this a quick shot but ran into some difficulty. Just posting as a heads-up for @wujingyue @syed-ahmed

I would prefer not to modify polymorphic_value.h so heavily (it looks like I also removed some functions that are needed). To avoid modifying PolymorphicValue so much, we'd need a simpler DistributedTensor implementation that doesn't depend on DeviceMesh and ParallelType.

Otherwise we just end up with circular dependencies involving type.h.

A bit more context:
I intend to change all of nvFuser's return types to KernelArgumentHolder, which holds a vector of PolymorphicValue. So if we want to return distributed tensors, they should work with PolymorphicValue.
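
To make the decoupling idea above concrete, here is a minimal sketch (not part of this PR) of a DistributedTensor that stores only the local shard plus plain-integer sharding metadata, so polymorphic_value.h would not need DeviceMesh or ParallelType at all; the class and member names are hypothetical.

#include <ATen/ATen.h>
#include <cstdint>
#include <unordered_map>
#include <utility>

namespace nvfuser {

// Hypothetical simplified DistributedTensor: only the local shard and a plain
// map from mesh axis to tensor axis, so there is no dependency on DeviceMesh
// or ParallelType (and hence no include cycle through type.h).
class SimpleDistributedTensor {
 public:
  explicit SimpleDistributedTensor(at::Tensor local)
      : local_(std::move(local)) {}

  // Record that mesh axis `mesh_axis` shards tensor axis `tensor_axis`.
  void setAxisShardedOn(int64_t mesh_axis, int64_t tensor_axis) {
    axis_sharded_on_[mesh_axis] = tensor_axis;
  }

  const at::Tensor& local() const {
    return local_;
  }

 private:
  at::Tensor local_;
  // Mesh axis -> tensor axis, as plain integers instead of ParallelType keys.
  std::unordered_map<int64_t, int64_t> axis_sharded_on_;
};

} // namespace nvfuser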

@csarofeen csarofeen marked this pull request as draft February 21, 2025 16:57

Description

  • Added support for holding a DistributedTensor in a PolymorphicValue.

  • Implemented isSame and toTensor functions in polymorphic_value.cpp (see the sketch after this list).

  • Enhanced DistributedTensor class with copy constructor and assignment operator.

  • Updated Val class to use PolymorphicValue_functions::castToDtype.
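
As a reference for the isSame item above, here is a minimal sketch of the kind of NaN-sensitive comparison isSame needs for at::Tensor payloads; the helper name and dispatch are illustrative, not the PR's actual implementation.

#include <ATen/ATen.h>

namespace {

// Hypothetical helper: compare two at::Tensor payloads the way a NaN-sensitive
// isSame might, mirroring isSameNanSensitive for scalars (NaNs at matching
// positions compare equal).
bool tensorsAreSame(const at::Tensor& a, const at::Tensor& b) {
  if (!a.defined() || !b.defined()) {
    return a.defined() == b.defined();
  }
  if (a.scalar_type() != b.scalar_type() || !a.sizes().equals(b.sizes())) {
    return false;
  }
  if (c10::isFloatingType(a.scalar_type())) {
    // rtol/atol of 0 demands exact equality; equal_nan=true makes NaN == NaN.
    return at::allclose(a, b, /*rtol=*/0.0, /*atol=*/0.0, /*equal_nan=*/true);
  }
  // Exact elementwise comparison for the remaining dtypes.
  return a.equal(b);
}

} // namespace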


Changes walkthrough 📝

Relevant files (Enhancement, 9 files):

  • device_mesh.cpp (+3/-2): Include additional headers for polymorphic value support.
  • polymorphic_value.cpp (+91/-5): Implement `isSame` and `toTensor` functions.
  • distributed_tensor.cpp (+31/-1): Add copy constructor, assignment operator, and destructor for DistributedTensor.
  • fusion_definition.cpp (+1/-1): Update `FusionDefinition::execute` to use pointer for `DeviceMesh`.
  • base_nodes.h (+1/-1): Update `Val` class to use `PolymorphicValue_functions::castToDtype`.
  • polymorphic_value.h (+19/-169): Add DistributedTensor to PolymorphicValue and remove redundant functions.
  • distributed_tensor.h (+12/-14): Update `DistributedTensor` to use `std::unique_ptr` for `DeviceMesh`.
  • fusion_record.h (+3/-1): Update ScalarRecord to use PolymorphicValue_functions::castToDtype.
  • type.h (+0/-23): Remove redundant `castToDtype` function.

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 No relevant tests
⚡ Recommended focus areas for review

Possible Issue

The removal of several functions from polymorphic_value.h might lead to missing functionality. Ensure that the removed functions are not needed elsewhere in the codebase.

template <typename T>
inline bool isSameNanSensitive(const T& a, const T& b) {
  if (isNan(a) && isNan(b)) {
    return true;
  }
  return a == b;
}

bool isSame(const PolymorphicValue& a, const PolymorphicValue& b);

// Convert scalars, vector of scalars, vector of vector of scalars, etc., into
// an at::Tensor. device argument allows for the creation of CPU Scalars.
PolymorphicValue toTensor(
    const PolymorphicValue& x,
    at::DeviceType device_type = at::kCUDA,
    int8_t device_index = 0);

constexpr bool isScalar(const PolymorphicValue& x) {
  return x.is<int64_t>() || x.is<double>() || x.is<bool>() ||
      x.is<std::complex<double>>();
}

// Convert PolymorphicValue to c10::Scalar.
inline c10::Scalar toScalar(const PolymorphicValue& x) {
  if (x.is<double>()) {
    return (c10::Scalar)x.as<double>();
  } else if (x.is<int64_t>()) {
    return (c10::Scalar)x.as<int64_t>();
  } else if (x.is<bool>()) {
    return (c10::Scalar)x.as<bool>();
  } else if (x.is<std::complex<double>>()) {
    return (c10::complex<double>)x.as<std::complex<double>>();
  }
  NVF_THROW("Cannot convert ", x, " to a scalar.");
}

PolymorphicValue IValueToPolymorphicValue(const c10::IValue& val);

c10::IValue toIValue(const PolymorphicValue& x);

PolymorphicValue castToDtype(PolymorphicValue value, const DataType& dtype);

} // namespace PolymorphicValue_functions

} // namespace nvfuser

#include <struct.inl>
Circular Dependency

Including DeviceMesh and ParallelType in distributed_tensor.h might introduce circular dependencies. Ensure that the inclusion is necessary and does not cause issues.

namespace nvfuser {
class DeviceMesh;
enum class ParallelType;
namespace python_frontend {
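
A standard way to keep this from becoming a cycle is exactly the pattern hinted at by these forward declarations: the header never includes device_mesh.h or type.h, and only the .cpp sees the complete types. A hedged sketch of that split follows; the class name, members, and include paths are illustrative, and a stub DeviceMesh stands in so the sketch compiles on its own.

// distributed_tensor.h (sketch): forward declarations only, so this header
// cannot re-enter the include cycle through device_mesh.h or type.h.
#include <memory>
#include <utility>
#include <ATen/core/Tensor.h>

namespace nvfuser {

class DeviceMesh;        // complete definition lives in device_mesh.h
enum class ParallelType; // scoped enums may be forward-declared

class DistributedTensorSketch {
 public:
  explicit DistributedTensorSketch(at::Tensor local);
  ~DistributedTensorSketch(); // defined where DeviceMesh is complete

 private:
  at::Tensor local_;
  std::unique_ptr<DeviceMesh> mesh_; // fine with an incomplete type here
};

} // namespace nvfuser

// distributed_tensor.cpp (sketch): the only translation unit that needs the
// complete DeviceMesh definition (in the real code this would come from
// including device_mesh.h; the stub below keeps the sketch self-contained).
namespace nvfuser {

class DeviceMesh {}; // stub standing in for the real device_mesh.h definition

DistributedTensorSketch::DistributedTensorSketch(at::Tensor local)
    : local_(std::move(local)) {}

// With DeviceMesh complete, the defaulted destructor can destroy mesh_.
DistributedTensorSketch::~DistributedTensorSketch() = default;

} // namespace nvfuser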
Memory Management

The use of std::make_unique in the copy constructor and assignment operator might lead to memory management issues. Ensure that the ownership semantics are correct and that there are no memory leaks.

DistributedTensor::DistributedTensor(const DistributedTensor& other)
    : local_(other.local_), mesh_(std::make_unique<DeviceMesh>(*other.mesh_)) {
  axis_sharded_on_ = other.axis_sharded_on_;
}

DistributedTensor& DistributedTensor::operator=(
    const DistributedTensor& other) {
  if (this != &other) {
    local_ = other.local_;
    mesh_ = std::make_unique<DeviceMesh>(*other.mesh_);
    axis_sharded_on_ = other.axis_sharded_on_;
  }
  return *this;
}

DistributedTensor::~DistributedTensor() = default;
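
If reviewers want stronger guarantees here, one option is a null-safe copy plus copy-and-swap assignment, which is self-assignment-safe and leaves *this untouched if the copy throws. This is a sketch only; it assumes mesh_ may be null, DeviceMesh is copyable, and the operator= declaration in the header is changed to take its argument by value.

// Sketch of a null-safe, copy-and-swap alternative; not the PR's code.
DistributedTensor::DistributedTensor(const DistributedTensor& other)
    : local_(other.local_),
      mesh_(other.mesh_ ? std::make_unique<DeviceMesh>(*other.mesh_) : nullptr),
      axis_sharded_on_(other.axis_sharded_on_) {}

DistributedTensor& DistributedTensor::operator=(DistributedTensor other) {
  // The by-value parameter already made the copy; swapping into *this makes
  // assignment self-assignment-safe and exception-safe.
  std::swap(local_, other.local_);
  std::swap(mesh_, other.mesh_);
  std::swap(axis_sharded_on_, other.axis_sharded_on_);
  return *this;
}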

@wujingyue (Collaborator) commented

I think it'll help if I do https://github.com/NVIDIA/Fuser/blob/main/csrc/python_frontend/fusion_definition.h#L210. I can do that today. I didn't do that before because I like to bundle up the mesh, shardings, and tensor for clarity.

Can you remind me why returning PolymorphicValue all the way to the Python frontend helps latency? Thunder talks in at::Tensor, so at some point we'll have to extract the at::Tensor out of the PolymorphicValue.
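
For context, the extraction itself is cheap and can live right at the Python-frontend boundary. Here is a sketch using the is<>()/as<>() accessors visible in the snippet above; namespaces are abbreviated, and the local() accessor on DistributedTensor is an assumption rather than the PR's actual API.

// Sketch: turn one PolymorphicValue result into the at::Tensor that Thunder
// consumes. The DistributedTensor branch and its local() accessor are
// illustrative only.
at::Tensor extractTensor(const PolymorphicValue& v) {
  if (v.is<at::Tensor>()) {
    return v.as<at::Tensor>();
  }
  if (v.is<DistributedTensor>()) {
    // Hand back the local shard; mesh/sharding metadata stays available on
    // the DistributedTensor for callers that need it.
    return v.as<DistributedTensor>().local();
  }
  NVF_THROW("Expected a tensor result, got: ", v);
}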
