[ntuple] improve type name renormalization #19323

Open
wants to merge 6 commits into
base: master
36 changes: 24 additions & 12 deletions roottest/root/ntuple/atlas-datavector/read.cxx
@@ -4,26 +4,38 @@
#include <ROOT/RNTupleModel.hxx>
#include <ROOT/RNTupleReader.hxx>

#include <TClassEdit.h>

#include <gtest/gtest.h>

#include "AtlasLikeDataVector.hxx"

TEST(RNTupleAtlasDataVector, Read)
{
const auto typeNameBefore = ROOT::RField<AtlasLikeDataVector<CustomStruct>>::TypeName();
const std::string expectedBefore{"AtlasLikeDataVector<CustomStruct,DataModel_detail::NoBase>"};
// Make sure no autoloading happened yet
ASSERT_EQ(typeNameBefore, expectedBefore);
const std::string fullTypeName{"AtlasLikeDataVector<CustomStruct,DataModel_detail::NoBase>"};
const std::string shortTypeName{"AtlasLikeDataVector<CustomStruct>"};
// Make sure that TypeName() expands the optional template argument
EXPECT_EQ(fullTypeName, ROOT::RField<AtlasLikeDataVector<CustomStruct>>::TypeName());

// Make sure autoloading did not happen yet, so ROOT Meta also expands the optional template argument
EXPECT_EQ(fullTypeName, ROOT::Internal::GetDemangledTypeName(typeid(AtlasLikeDataVector<CustomStruct>)));

// By creating the RField, we autoload the dictionary. Subsequently, ROOT Meta normalizes the type name
// taking into account that AtlasLikeDataVector inherits from KeepFirstTemplateArguments<1>
EXPECT_EQ(shortTypeName, ROOT::RField<AtlasLikeDataVector<CustomStruct>>("f").GetTypeName());
EXPECT_EQ(shortTypeName, ROOT::Internal::GetDemangledTypeName(typeid(AtlasLikeDataVector<CustomStruct>)));
// Ensure that RField<T>::TypeName() is not changing depending on the loaded dictionaries
EXPECT_EQ(fullTypeName, ROOT::RField<AtlasLikeDataVector<CustomStruct>>::TypeName());

// Ensure that we can access the field by typeid, short name, and long name and the
// type name checks will be fine with it
auto reader = ROOT::RNTupleReader::Open("ntpl", "test_ntuple_datavector.root");
const auto &entry = reader->GetModel().GetDefaultEntry();
// The following call should not throw an exception
ASSERT_NO_THROW(entry.GetPtr<AtlasLikeDataVector<CustomStruct>>("my_field"));

const auto typeNameAfter = ROOT::RField<AtlasLikeDataVector<CustomStruct>>::TypeName();
const std::string expectedAfter{"AtlasLikeDataVector<CustomStruct>"};
// Make sure autoloading happened and the rule to suppress the second template argument kicked in
Comment on lines -24 to -25
Member

Isn't that removing the test of the actual ATLAS issue?

Member

I.e. don't we need both the original testing and a new test?

Contributor Author

@jblomer, Jul 16, 2025

The problem observed by ATLAS is tested above. This particular part tests the TypeName() function only, which in the DataVector case is different from RField<T>::GetTypeName(). What we test now (after the patch) is that TypeName() always gives the same result independent of the currently loaded libraries.

It's arguably weak for that purpose and we may drop the TypeName() part here altogether.

Member

I am missing something, as I no longer see the test for the short name ... I am especially thrown off by the comment that is being removed:

Make sure autoloading happened and the rule to suppress the second template argument kicked in

which seems to explicitly say that the above call and the following test are exactly for the ATLAS scenario ... and/or where should that comment move to?

Contributor Author

Line 21 is the actual test:

ASSERT_NO_THROW(entry.GetPtr<AtlasLikeDataVector<CustomStruct>>("my_field"));

We ask for a shared pointer without the extra template argument, and it works even though the extra argument is present on disk.

Member

At which point in the new test is the autoloading of the library triggered?

Contributor Author

Actually I'm not sure. I guess by the constructor of ROOT::RField<AtlasLikeDataVector<CustomStruct>>? But does it matter?

Member

It does, in the sense that the previous code/test was checking both before and after the autoloading, ensuring that the autoloading did happen. Without these tests, the test can later be unwittingly updated to no longer test the autoloading at all (for example by explicitly linking the dictionary library), thus removing the test of 'RNTuple use of a specific type customized by the users will ensure that the user's library is auto-loaded' (and increasing the possibility that we unwittingly stop supporting it).

Contributor Author

Fair enough. Test is updated.

Contributor Author

@pcanal can this be resolved?

ASSERT_EQ(typeNameAfter, expectedAfter);
AtlasLikeDataVector<CustomStruct> dummy;
EXPECT_NO_THROW(reader->GetView("my_field", &dummy));
EXPECT_NO_THROW(reader->GetView("my_field", &dummy, fullTypeName));
EXPECT_NO_THROW(reader->GetView("my_field", &dummy, shortTypeName));
EXPECT_NO_THROW(reader->GetView<AtlasLikeDataVector<CustomStruct>>("my_field"));
EXPECT_NO_THROW(reader->GetModel().GetDefaultEntry().GetPtr<AtlasLikeDataVector<CustomStruct>>("my_field"));
}

int main(int argc, char **argv)
8 changes: 5 additions & 3 deletions tree/ntuple/doc/BinaryFormatSpecification.md
@@ -1,4 +1,4 @@
# RNTuple Binary Format Specification 1.0.0.1
# RNTuple Binary Format Specification 1.0.0.2

## Versioning Notes

@@ -805,10 +805,12 @@ e.g. `std::vector<MyEvent>` or `std::vector<std::vector<float>>`.
### Type Name Normalization

Type names are stored according to the following normalization rules
- The type name of a field has typedefs and usings fully resolved (except for the following rule).
- The type name of a field has typedefs and usings fully resolved (except for the following two rules).
- The integer types `signed char`, `unsigned char`, and `[signed|unsigned](short|int|long[ long])`
are replaced by the corresponding (at the time of writing) `std::[u]int(8|16|32|64)_t` standard integer typedef.
- Qualifiers `volatile` and `const` that do not appear in template arguments are removed.
- Supported stdlib types are not further resolved (e.g., `std::string` is _not_ stored as `std::basic_string<char>`).
- C style array types (`T[N][M]`) are mapped to stdlib arrays (`std::array<std::array<T,M>,N>`)
- Qualifiers `volatile` and `const` that do not appear in template arguments of user-defined types are removed.
- The `class`, `struct`, and `enum` keywords are removed.
- Type names are fully qualified by the namespace in which they are declared;
the root namespace ('::' prefix) is stripped.
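
The C-style array rule listed above can be illustrated with a small standalone sketch. The helper name `MapCArrayToStdArray` is hypothetical and not part of ROOT; it assumes the input has no extra whitespace and only handles a trailing sequence of `[<digits>]` groups. Anything else is returned unchanged.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of ROOT): rewrites "T[N][M]" as "std::array<std::array<T,M>,N>".
// Input that does not end in well-formed "[<digits>]" groups is returned unchanged.
std::string MapCArrayToStdArray(const std::string &typeName)
{
   std::vector<std::string> sizes; // dimension sizes, collected from the last (innermost) to the first (outermost)
   std::string base = typeName;
   while (!base.empty() && base.back() == ']') {
      const auto open = base.rfind('[');
      if (open == std::string::npos)
         return typeName;
      // Characters strictly between '[' and the trailing ']'
      const std::string dim = base.substr(open + 1, base.size() - open - 2);
      if (dim.empty() || dim.find_first_not_of("0123456789") != std::string::npos)
         return typeName;
      sizes.push_back(dim);
      base.erase(open);
   }
   if (base.empty())
      return typeName;

   // Wrap the base type with the innermost dimension first, then work outwards.
   std::string result = base;
   for (const auto &dim : sizes)
      result = "std::array<" + result + "," + dim + ">";
   return result;
}
```

For `float[2][3]` the sketch yields `std::array<std::array<float,3>,2>`, which matches the `T[N][M]` to `std::array<std::array<T,M>,N>` ordering stated in the rule.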
2 changes: 1 addition & 1 deletion tree/ntuple/inc/ROOT/REntry.hxx
@@ -135,7 +135,7 @@ private:
void EnsureMatchingType(ROOT::RFieldToken token [[maybe_unused]]) const
{
if constexpr (!std::is_void_v<T>) {
if (fFieldTypes[token.fIndex] != ROOT::RField<T>::TypeName()) {
if (!Internal::IsMatchingFieldType<T>(fFieldTypes[token.fIndex])) {
throw RException(R__FAIL("type mismatch for field " + FindFieldName(token) + ": " +
fFieldTypes[token.fIndex] + " vs. " + ROOT::RField<T>::TypeName()));
}
23 changes: 21 additions & 2 deletions tree/ntuple/inc/ROOT/RField.hxx
@@ -284,7 +284,7 @@ public:
template <typename T, typename = void>
class RField final : public RClassField {
public:
static std::string TypeName() { return ROOT::Internal::GetRenormalizedDemangledTypeName(typeid(T)); }
static std::string TypeName() { return ROOT::Internal::GetRenormalizedTypeName(typeid(T)); }
RField(std::string_view name) : RClassField(name, TypeName())
{
static_assert(std::is_class_v<T>, "no I/O support for this basic C++ type");
@@ -498,10 +498,29 @@ public:
// Has to be implemented after the definition of all RField<T> types
// The void type is specialized in RField.cxx

namespace Internal {

/// Helper to check if a given type name is the one expected of Field<T>. Usually, this check can be done by
/// type renormalization of the demangled type name T. The failure case, however, needs to additionally check for
/// ROOT-specific special cases.
template <class T>
bool IsMatchingFieldType(const std::string &typeName)
{
if (typeName == ROOT::RField<T>::TypeName())
return true;

// The typeName may be equal to the alternative, short type name issued by Meta. This is a rare case used, e.g.,
// by the ATLAS DataVector class to hide a default template parameter from the on-disk type name.
// Thus, we check again using first ROOT Meta normalization followed by RNTuple re-normalization.
return (typeName == ROOT::Internal::GetRenormalizedTypeName(ROOT::Internal::GetDemangledTypeName(typeid(T))));
}

} // namespace Internal

template <typename T>
std::unique_ptr<T, typename RFieldBase::RCreateObjectDeleter<T>::deleter> RFieldBase::CreateObject() const
{
if (GetTypeName() != RField<T>::TypeName()) {
if (!Internal::IsMatchingFieldType<T>(GetTypeName())) {
throw RException(
R__FAIL("type mismatch for field " + GetFieldName() + ": " + GetTypeName() + " vs. " + RField<T>::TypeName()));
}
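
The two-step check in `IsMatchingFieldType` can be sketched without ROOT as follows. This is only an illustration under stated assumptions: `TypeNameSources` and `IsMatchingName` are hypothetical names, and the two callables stand in for `RField<T>::TypeName()` and for the Meta-normalized, then renormalized name. The point it shows is that the alternative name is computed only when the primary comparison fails.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-ins for the two ways of deriving a type name:
//  - primaryName: plays the role of RField<T>::TypeName(), which should not depend on loaded dictionaries
//  - alternativeName: plays the role of the Meta-normalized name after RNTuple renormalization
struct TypeNameSources {
   std::function<std::string()> primaryName;
   std::function<std::string()> alternativeName;
};

// Returns true if the given name matches either form of the type name.
bool IsMatchingName(const std::string &typeName, const TypeNameSources &src)
{
   if (typeName == src.primaryName())
      return true; // common case: the alternative name is never computed

   // Rare case: fall back to the alternative (e.g. short) type name
   return typeName == src.alternativeName();
}
```

Keeping the fallback in the failure path means the common case pays for a single string comparison only.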
2 changes: 1 addition & 1 deletion tree/ntuple/inc/ROOT/RField/RFieldProxiedCollection.hxx
@@ -265,7 +265,7 @@ struct IsCollectionProxy : HasCollectionProxyMemberType<T> {
template <typename T>
class RField<T, typename std::enable_if<IsCollectionProxy<T>::value>::type> final : public RProxiedCollectionField {
public:
static std::string TypeName() { return ROOT::Internal::GetRenormalizedDemangledTypeName(typeid(T)); }
static std::string TypeName() { return ROOT::Internal::GetRenormalizedTypeName(typeid(T)); }
RField(std::string_view name) : RProxiedCollectionField(name, TypeName())
{
static_assert(std::is_class<T>::value, "collection proxy unsupported for fundamental types");
15 changes: 6 additions & 9 deletions tree/ntuple/inc/ROOT/RFieldUtils.hxx
@@ -20,14 +20,16 @@ namespace Internal {
/// Applies RNTuple specific type name normalization rules (see specs) that help the string parsing in
/// RFieldBase::Create(). The normalization of templated types does not include full normalization of the
/// template arguments (hence "Prefix").
/// Furthermore, if the type is a C-style array, rules are applied to the base type and the C style array
/// is then mapped to an std::array.
std::string GetCanonicalTypePrefix(const std::string &typeName);

/// Given a type name normalized by ROOT meta, renormalize it for RNTuple. E.g., insert std::prefix.
std::string GetRenormalizedTypeName(const std::string &metaNormalizedName);

/// Given a type info ask ROOT meta to demangle it, then renormalize the resulting type name for RNTuple. Useful to
/// ensure that e.g. fundamental types are normalized to the type used by RNTuple (e.g. int -> std::int32_t).
std::string GetRenormalizedDemangledTypeName(const std::type_info &ti);
std::string GetRenormalizedTypeName(const std::type_info &ti);

/// Applies all RNTuple type normalization rules except typedef resolution.
std::string GetNormalizedUnresolvedTypeName(const std::string &origName);
@@ -48,17 +50,12 @@ enum class ERNTupleSerializationMode {

ERNTupleSerializationMode GetRNTupleSerializationMode(TClass *cl);

/// Parse a type name of the form `T[n][m]...` and return the base type `T` and a vector that contains,
/// in order, the declared size for each dimension, e.g. for `unsigned char[1][2][3]` it returns the tuple
/// `{"unsigned char", {1, 2, 3}}`. Extra whitespace in `typeName` should be removed before calling this function.
///
/// If `typeName` is not an array type, it returns a tuple `{T, {}}`. On error, it returns a default-constructed tuple.
std::tuple<std::string, std::vector<std::size_t>> ParseArrayType(const std::string &typeName);

/// Used in RFieldBase::Create() in order to get the comma-separated list of template types
/// E.g., gets {"int", "std::variant<double,int>"} from "int,std::variant<double,int>".
/// If maxArgs > 0, stop tokenizing after the given number of tokens are found. Used to strip
/// STL allocator and other optional arguments.
/// TODO(jblomer): Try to merge with TClassEdit::TSplitType
std::vector<std::string> TokenizeTypeList(std::string_view templateType);
std::vector<std::string> TokenizeTypeList(std::string_view templateType, std::size_t maxArgs = 0);

} // namespace Internal
} // namespace ROOT
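
A minimal sketch of how a tokenizer with a `maxArgs` cutoff could behave, following the doc comment on `TokenizeTypeList` above. The name `TokenizeTypeListSketch` is hypothetical, and the sketch assumes that only angle brackets need nesting awareness and that the input contains no whitespace. The actual implementation is not shown in this excerpt and may differ.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical sketch (not the ROOT implementation): splits a comma-separated list of template
// arguments at top-level commas only. If maxArgs > 0, stops once maxArgs tokens have been collected.
std::vector<std::string> TokenizeTypeListSketch(std::string_view templateType, std::size_t maxArgs = 0)
{
   std::vector<std::string> result;
   if (templateType.empty())
      return result;

   std::size_t tokenStart = 0;
   int nestingLevel = 0; // depth of '<' ... '>' nesting at the current position
   for (std::size_t i = 0; i < templateType.size(); ++i) {
      const char c = templateType[i];
      if (c == '<') {
         ++nestingLevel;
      } else if (c == '>') {
         --nestingLevel;
      } else if (c == ',' && nestingLevel == 0) {
         result.emplace_back(templateType.substr(tokenStart, i - tokenStart));
         if (maxArgs > 0 && result.size() == maxArgs)
            return result; // remaining (optional) arguments are dropped
         tokenStart = i + 1;
      }
   }
   result.emplace_back(templateType.substr(tokenStart));
   return result;
}
```

With `maxArgs = 1`, the argument list `CustomStruct,DataModel_detail::NoBase` is cut down to `CustomStruct`, which is the kind of stripping of optional arguments the doc comment describes.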
2 changes: 1 addition & 1 deletion tree/ntuple/inc/ROOT/RNTuple.hxx
@@ -78,7 +78,7 @@ public:
static constexpr std::uint16_t kVersionEpoch = 1;
static constexpr std::uint16_t kVersionMajor = 0;
static constexpr std::uint16_t kVersionMinor = 0;
static constexpr std::uint16_t kVersionPatch = 1;
static constexpr std::uint16_t kVersionPatch = 2;

private:
/// Version of the RNTuple binary format that the writer supports (see specification).
4 changes: 2 additions & 2 deletions tree/ntuple/inc/ROOT/RNTupleReader.hxx
@@ -324,7 +324,7 @@ public:
/// \sa GetView(std::string_view, std::shared_ptr<T>)
ROOT::RNTupleView<void> GetView(std::string_view fieldName, void *rawPtr, const std::type_info &ti)
{
return GetView(RetrieveFieldId(fieldName), rawPtr, ROOT::Internal::GetRenormalizedDemangledTypeName(ti));
return GetView(RetrieveFieldId(fieldName), rawPtr, ROOT::Internal::GetRenormalizedTypeName(ti));
}

/// Provides access to an individual (sub)field from its on-disk ID.
@@ -377,7 +377,7 @@ public:
/// \sa GetView(std::string_view, std::shared_ptr<T>)
ROOT::RNTupleView<void> GetView(ROOT::DescriptorId_t fieldId, void *rawPtr, const std::type_info &ti)
{
return GetView(fieldId, rawPtr, ROOT::Internal::GetRenormalizedDemangledTypeName(ti));
return GetView(fieldId, rawPtr, ROOT::Internal::GetRenormalizedTypeName(ti));
}

/// Provides direct access to the I/O buffers of a **mappable** (sub)field.
2 changes: 1 addition & 1 deletion tree/ntuple/inc/ROOT/RNTupleView.hxx
@@ -264,7 +264,7 @@ protected:
{
const auto &desc = pageSource.GetSharedDescriptorGuard().GetRef();
const auto &fieldDesc = desc.GetFieldDescriptor(fieldId);
if (fieldDesc.GetTypeName() != ROOT::RField<T>::TypeName()) {
if (!Internal::IsMatchingFieldType<T>(fieldDesc.GetTypeName())) {
throw RException(R__FAIL("type mismatch for field " + fieldDesc.GetFieldName() + ": " +
fieldDesc.GetTypeName() + " vs. " + ROOT::RField<T>::TypeName()));
}
10 changes: 0 additions & 10 deletions tree/ntuple/src/RFieldBase.cxx
@@ -292,7 +292,6 @@ ROOT::RFieldBase::Create(const std::string &fieldName, const std::string &typeNa
const ROOT::RCreateFieldOptions &options, const ROOT::RNTupleDescriptor *desc,
ROOT::DescriptorId_t fieldId)
{
using ROOT::Internal::ParseArrayType;
using ROOT::Internal::ParseUIntTypeToken;
using ROOT::Internal::TokenizeTypeList;

@@ -331,15 +330,6 @@ ROOT::RFieldBase::Create(const std::string &fieldName, const std::string &typeNa
// try-catch block to intercept any exception that may be thrown by Unwrap() so that this
// function never throws but returns RResult::Error instead.
try {
if (auto [arrayBaseType, arraySizes] = ParseArrayType(resolvedType); !arraySizes.empty()) {
std::unique_ptr<RFieldBase> arrayField = Create("_0", arrayBaseType, options, desc, fieldId).Unwrap();
for (int i = arraySizes.size() - 1; i >= 0; --i) {
arrayField =
std::make_unique<RArrayField>((i == 0) ? fieldName : "_0", std::move(arrayField), arraySizes[i]);
}
return arrayField;
}

if (resolvedType == "bool") {
result = std::make_unique<RField<bool>>(fieldName);
} else if (resolvedType == "char") {