Yul: Introduces ASTNodeRegistry #15823

Open
wants to merge 1 commit into
base: develop
172 changes: 172 additions & 0 deletions libyul/ASTNodeRegistry.cpp
@@ -0,0 +1,172 @@
/*
This file is part of solidity.

solidity is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

solidity is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with solidity. If not, see <http://www.gnu.org/licenses/>.
*/
// SPDX-License-Identifier: GPL-3.0

#include <libyul/ASTNodeRegistry.h>

#include <libyul/Exceptions.h>

#include <fmt/format.h>

#include <range/v3/algorithm/max.hpp>
#include <range/v3/view/map.hpp>

#include <algorithm>

using namespace solidity::yul;

ASTNodeRegistry::ASTNodeRegistry(): m_labels{""}, m_idToLabelMapping{0} {}

ASTNodeRegistry::ASTNodeRegistry(std::vector<std::string> _labels, std::vector<size_t> _idToLabelMapping)
{
yulAssert(_labels.size() >= 1);
yulAssert(_labels[0].empty());
yulAssert(_idToLabelMapping.size() >= 1);
yulAssert(_idToLabelMapping[0] == 0);
// Use vector<uint8_t> instead of vector<bool>, as the latter is a bit-packed specialization optimized for space-efficiency.
std::vector<uint8_t> labelVisited(_labels.size(), false);
size_t numLabels = 0;
for (auto const& id: _idToLabelMapping)
{
if (id == ghostId())
continue;
yulAssert(id < _labels.size());
// it is possible to have multiple references to empty / ghost
yulAssert(
id == 0 || !labelVisited[id],
fmt::format("NodeId {} (label \"{}\") is not unique.", id, _labels[id])
);
labelVisited[id] = true;
if (id >= 1)
++numLabels;
}
yulAssert(numLabels + 1 == _labels.size(), "Unused labels present.");
m_labels = std::move(_labels);
m_idToLabelMapping = std::move(_idToLabelMapping);
}

ASTNodeRegistry::NodeId ASTNodeRegistry::maximumId() const
{
yulAssert(m_idToLabelMapping.size() > 0);
return m_idToLabelMapping.size() - 1;
}

size_t ASTNodeRegistry::idToLabelIndex(NodeId const _id) const
{
yulAssert(_id < m_idToLabelMapping.size());
return m_idToLabelMapping[_id];
}

std::string_view ASTNodeRegistry::operator[](NodeId const _id) const
{
auto const labelIndex = idToLabelIndex(_id);
if (labelIndex == ghostId())
return lookupGhost(_id);
return m_labels[labelIndex];
}

std::optional<ASTNodeRegistry::NodeId> ASTNodeRegistry::findIdForLabel(std::string_view const _label) const
{
if (_label.empty())
return emptyId();
for (NodeId id = 1; id <= maximumId(); ++id)
if ((*this)[id] == _label)
return id;
return std::nullopt;
}

std::string_view ASTNodeRegistry::lookupGhost(NodeId const _id) const
{
yulAssert(idToLabelIndex(_id) == ghostId());
auto const [it, _] = m_ghostLabelCache.try_emplace(_id, fmt::format("GHOST[{}]", _id));
Member

In what situation would you need to print these ghost labels? Can we just assume that it will never happen and assert against it instead?

If you do need to print them, you should at least ensure that they cannot conflict with existing labels. I see that define() and ASTNodeRegistryBuilder constructor allow arbitrary identifiers, but it seems that the implicit assumption is that they will only ever get valid Yul identifiers. That should be documented and it would be good to at least assert that the value does not contain any [ characters.

Member

Also, currently define() will accept a label like "GHOST[@]" and just give you the ID of the ghost placeholder instead of defining a label with that exact text, which I think is unexpected. IMO it should trigger an assert and if inserting ghosts is an intended use case, there should be a dedicated method for that.

Member Author

> In what situation would you need to print these ghost labels? Can we just assume that it will never happen and assert against it instead?

They are attached to the CFG (not the regular AST) and could occur, e.g., in error messages and also in the dotgraph output of CFGs.

> If you do need to print them, you should at least ensure that they cannot conflict with existing labels. I see that define() and ASTNodeRegistryBuilder constructor allow arbitrary identifiers, but it seems that the implicit assumption is that they will only ever get valid Yul identifiers. That should be documented and it would be good to at least assert that the value does not contain any [ characters.

In general I would like to avoid that assumption. The ghost stuff is a bit of a corner case which was handled pretty much the same way with YulStrings, just that now the id is the hash. I do like the suggestion, though, to take numeric_limits<size_t>::max() as the ghost base ID and simply not have the placeholder; it should not be necessary.
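
For illustration, the collision concern raised in this thread could be checked along the following lines. This is only a sketch: `ghostLabel` and `mayCollideWithGhostLabel` are hypothetical helpers that are not part of the PR, and the check assumes that labels are otherwise restricted to valid Yul identifiers, which do not contain `[`.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not part of the PR): formats the artificial label for a ghost ID
// using the same "GHOST[<id>]" template as lookupGhost().
inline std::string ghostLabel(std::size_t _id)
{
	return "GHOST[" + std::to_string(_id) + "]";
}

// Hypothetical check: a user-supplied label can only collide with a generated ghost label
// if it contains '[' (assumption: all other labels are valid Yul identifiers).
inline bool mayCollideWithGhostLabel(std::string_view _label)
{
	return _label.find('[') != std::string_view::npos;
}
```

A builder could assert `!mayCollideWithGhostLabel(_label)` in `define()`. Whether that restriction is desirable is exactly what the thread above debates.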

return it->second;
}

ASTNodeRegistryBuilder::DefinedLabels::DefinedLabels():
m_mapping{{"", 0}}
{}

std::tuple<ASTNodeRegistry::NodeId, bool> ASTNodeRegistryBuilder::DefinedLabels::tryInsert(
std::string_view const _label,
ASTNodeRegistry::NodeId const _id
)
{
auto const [it, emplaced] = m_mapping.try_emplace(std::string{_label}, _id);
return std::make_tuple(it->second, emplaced);
}

ASTNodeRegistryBuilder::ASTNodeRegistryBuilder():
m_nextId(1)
{}

ASTNodeRegistryBuilder::ASTNodeRegistryBuilder(ASTNodeRegistry const& _existingRegistry)
{
auto const maxId = _existingRegistry.maximumId();
yulAssert(_existingRegistry[0] == "");
for (size_t i = 1; i <= maxId; ++i)
{
auto const existingLabel = _existingRegistry[i];
if (!existingLabel.empty())
{
if (_existingRegistry.idToLabelIndex(i) == ASTNodeRegistry::ghostId())
m_ghosts.push_back(i);
else
{
auto const [_, inserted] = m_definedLabels.tryInsert(_existingRegistry[i], i);
yulAssert(inserted);
}
}
}
m_nextId = _existingRegistry.maximumId() + 1;
}

ASTNodeRegistry::NodeId ASTNodeRegistryBuilder::define(std::string_view const _label)
{
auto const [id, inserted] = m_definedLabels.tryInsert(_label, m_nextId);
if (inserted)
m_nextId++;
return id;
}

ASTNodeRegistry::NodeId ASTNodeRegistryBuilder::addGhost()
{
m_ghosts.push_back(m_nextId);
return m_nextId++;
}

ASTNodeRegistry ASTNodeRegistryBuilder::build() const
{
auto const& labelToIdMapping = m_definedLabels.labelToIdMapping();
yulAssert(labelToIdMapping.contains(""));
yulAssert(labelToIdMapping.at("") == 0);

std::vector<std::string> labels{""};
labels.reserve(labelToIdMapping.size());
auto const maxLabelId = ranges::max(labelToIdMapping | ranges::views::values);
auto const maxGhostId = m_ghosts.empty() ? 0 : m_ghosts.back();
std::vector<size_t> idToLabelMapping( std::max(maxLabelId, maxGhostId) + 1, 0);
yulAssert(idToLabelMapping.size() >= 1, "Mapping must at least contain empty label");
for (auto const& [label, id]: labelToIdMapping)
{
// skip the empty label (ID 0); ghosts are tracked separately in m_ghosts
if (id < 1)
continue;

labels.emplace_back(label);
idToLabelMapping[id] = labels.size() - 1;
}
for (auto const ghostId: m_ghosts)
idToLabelMapping[ghostId] = ASTNodeRegistry::ghostId();
return ASTNodeRegistry{std::move(labels), std::move(idToLabelMapping)};
}
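
The ID assignment performed by `define()` and `addGhost()` above can be modelled in isolation. The following is a simplified sketch, not the PR's class: `MiniBuilder` is a hypothetical stand-in that keeps only the bookkeeping (label deduplication via `try_emplace`, a shared ID counter for labels and ghosts) and omits `build()`.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Simplified stand-in for ASTNodeRegistryBuilder (hypothetical, for illustration only).
class MiniBuilder
{
public:
	// Returns the existing ID if the label is already known, otherwise assigns the next free one.
	std::size_t define(std::string_view _label)
	{
		auto const [it, inserted] = m_labelToId.try_emplace(std::string{_label}, m_nextId);
		if (inserted)
			++m_nextId;
		return it->second;
	}

	// Ghosts have no label; each call consumes a fresh ID.
	std::size_t addGhost()
	{
		m_ghosts.push_back(m_nextId);
		return m_nextId++;
	}

private:
	// ID 0 is reserved for the empty label.
	std::map<std::string, std::size_t, std::less<>> m_labelToId{{"", 0}};
	std::vector<std::size_t> m_ghosts;
	std::size_t m_nextId = 1;
};
```

In this model, defining the same label twice yields the same ID, while every `addGhost()` call yields a new one. Labels and ghosts draw from the same counter, so their IDs never overlap.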
95 changes: 95 additions & 0 deletions libyul/ASTNodeRegistry.h
@@ -0,0 +1,95 @@
/*
This file is part of solidity.

solidity is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

solidity is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with solidity. If not, see <http://www.gnu.org/licenses/>.
*/
// SPDX-License-Identifier: GPL-3.0

#pragma once

#include <functional>
#include <limits>
#include <map>
#include <optional>
#include <string>
#include <string_view>
#include <tuple>
#include <vector>

namespace solidity::yul
{

/// Instances of `ASTNodeRegistry` are immutable containers describing a labelling of nodes inside the AST.
/// Each element of the AST that possesses a label has an `ASTNodeRegistry::NodeId`, with which the label can
/// be queried in O(1).
/// The preferred way of creating instances is via `ASTNodeRegistryBuilder` when parsing/importing and
/// via `NodeIdDispenser` during/after optimization.
class ASTNodeRegistry
Member

You described these classes in the PR description, but I'd much rather just have docstrings for them :)

Member

Especially m_idToLabelMapping and m_labels should be documented. Like, what are the numbers stored in m_idToLabelMapping and the assumptions about them (can there be duplicates? do both vectors have to have same length?)

Member Author

I have added some docs - let me know what you think and/or if it should be expanded somewhere. :)

Member

One thing I still don't fully understand is the purpose of the ghost IDs. Looking at other PRs, I see the mechanism for adding them, but not yet any place that would actually add them. The intended usage is a detail that the docstring should explain.

Member Author
@clonker, Feb 12, 2025

Ghost nodes are added during CFG construction. They only live in the CFG itself and are not actually referenced in the AST. We could potentially also remove them here and specialize them out for CFGs. Then it's more local to where they are introduced and needed.
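
To make the layout discussed in this thread concrete, here is a minimal standalone model of the lookup path. It is a sketch under assumptions: `MiniRegistry` and `ghostIndex` are hypothetical names, the ghost sentinel mirrors `ghostId()` from the diff (the maximum `size_t` value), and the ghost label cache is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Simplified, hypothetical model of the registry layout (not the PR's actual class).
struct MiniRegistry
{
	// Sentinel label index marking a ghost ID, mirroring ASTNodeRegistry::ghostId().
	static constexpr std::size_t ghostIndex = std::numeric_limits<std::size_t>::max();

	// labels[0] is always the empty label.
	std::vector<std::string> labels;
	// Index: ID. Value: index into `labels`, or `ghostIndex` for ghosts.
	std::vector<std::size_t> idToLabelMapping;

	std::string label(std::size_t _id) const
	{
		assert(_id < idToLabelMapping.size());
		std::size_t const index = idToLabelMapping[_id];
		if (index == ghostIndex)
			return "GHOST[" + std::to_string(_id) + "]";
		return labels[index];
	}
};
```

With `labels = {"", "x", "y"}` and `idToLabelMapping = {0, 2, ghostIndex, 1}`, ID 1 resolves to `y`, ID 3 to `x`, and ID 2 has no stored label and gets a generated one.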

{
public:
/// A `NodeId` is only meaningful for the registry instance that issued it. Using it with a different
/// instance is unsafe, and it is up to the user to safeguard against this.
using NodeId = size_t;
Member

I think it should be ID for consistency with other acronyms like AST:

Suggested change
- using NodeId = size_t;
+ using NodeID = size_t;

Here and generally in all the other names.

I'm also still not sure NodeID and ASTNodeRegistry really reflect the purpose well. I see that in AST you even call the registry labels. Wouldn't LabelID and ASTLabelRegistry fit better? I don't think we're really using it as a node ID. We have some node types without any and technically we could also define some nodes where having more than one would make sense.


ASTNodeRegistry();
ASTNodeRegistry(std::vector<std::string> _labels, std::vector<size_t> _idToLabelMapping);

std::string_view operator[](NodeId _id) const;

static bool constexpr empty(NodeId const _id) { return _id == emptyId(); }
static NodeId constexpr emptyId() { return 0; }
static NodeId constexpr ghostId() { return std::numeric_limits<NodeId>::max(); }

std::vector<std::string> const& labels() const { return m_labels; }
NodeId maximumId() const;

size_t idToLabelIndex(NodeId _id) const;
/// Potentially expensive: performs a linear scan over all IDs.
std::optional<NodeId> findIdForLabel(std::string_view _label) const;
private:
std::string_view lookupGhost(NodeId _id) const;

/// unique labels in the container, beginning with empty ("") and ghost (ghostPlaceholder).
std::vector<std::string> m_labels;
/// Each element in the vector is one NodeId. The value of the vector points to the corresponding label. E.g.,
/// m_labels[m_idToLabelMapping[3]] is the label for NodeId 3. Therefore, there can be duplicates and the lengths
/// of `m_labels` and `m_idToLabelMapping` do not need to correspond.
std::vector<size_t> m_idToLabelMapping;
mutable std::map<NodeId, std::string> m_ghostLabelCache;
Comment on lines +60 to +66
Member

Suggested change
- /// unique labels in the container, beginning with empty ("") and ghost (ghostPlaceholder).
- std::vector<std::string> m_labels;
- /// Each element in the vector is one NodeId. The value of the vector points to the corresponding label. E.g.,
- /// m_labels[m_idToLabelMapping[3]] is the label for NodeId 3. Therefore, there can be duplicates and the lengths
- /// of `m_labels` and `m_idToLabelMapping` do not need to correspond.
- std::vector<size_t> m_idToLabelMapping;
- mutable std::map<NodeId, std::string> m_ghostLabelCache;
+ /// Unique labels in the container; the first two items are: empty ("") and ghost (ghostPlaceholder).
+ std::vector<std::string> m_labels;
+ /// Each index in the vector is one NodeId. The value of the vector points to the corresponding label. E.g.,
+ /// m_labels[m_idToLabelMapping[3]] is the label for NodeId 3. Therefore, there can be duplicates and the lengths
+ /// of `m_labels` and `m_idToLabelMapping` do not need to correspond.
+ std::vector<size_t> m_idToLabelMapping;
+ mutable std::map<NodeId, std::string> m_ghostLabelCache;

Though TBH this still does not say everything I wanted to know when I started reviewing this. That's how I'd put it myself:

Suggested change
- /// unique labels in the container, beginning with empty ("") and ghost (ghostPlaceholder).
- std::vector<std::string> m_labels;
- /// Each element in the vector is one NodeId. The value of the vector points to the corresponding label. E.g.,
- /// m_labels[m_idToLabelMapping[3]] is the label for NodeId 3. Therefore, there can be duplicates and the lengths
- /// of `m_labels` and `m_idToLabelMapping` do not need to correspond.
- std::vector<size_t> m_idToLabelMapping;
- mutable std::map<NodeId, std::string> m_ghostLabelCache;
+ /// All Yul AST node labels present in the registry.
+ /// Always contains at least two items: an empty label and a ghost placeholder.
+ /// All items must be unique. All but the first two must be valid Yul identifiers.
+ std::vector<std::string> m_labels;
+ /// Assignment of labels to `NodeId`s. Indices are `NodeId`s and values are indices into `m_labels`.
+ /// Every label except ghost placeholder always has exactly one `NodeId` pointing at it.
+ /// Ghost placeholder can have more than one.
+ std::vector<size_t> m_idToLabelMapping;
+ /// Artificial labels generated for ghost IDs from a template.
+ /// Generated on demand through `lookupGhost()` and cached for future lookups.
+ /// Must never contain non-ghost IDs. Labels are guaranteed to be unique.
+ mutable std::map<NodeId, std::string> m_ghostLabelCache;

But note that it includes some of my assumptions how it should work, which may or may not be true depending on the answers to my earlier comments.

Member

BTW, I think we should make m_labels and m_idToLabelMapping const. The container is immutable so they're not supposed to ever change after initialization.

Member Author

> BTW, I think we should make m_labels and m_idToLabelMapping const. The container is immutable so they're not supposed to ever change after initialization.

That'll get rid of implicit copy/move, though. The immutability is already reflected by the class not having any non-const methods, which takes care of it unless one const-casts.
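
The trade-off mentioned here can be checked with type traits. The sketch below uses two hypothetical structs (`WithConstMembers`, `WithConstInterface`) that stand in for the two design options; neither is the PR's class.

```cpp
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical variant with const members, as suggested in the review.
struct WithConstMembers
{
	std::vector<std::string> const labels;
};

// Hypothetical variant matching the PR's approach: non-const members, const-only interface.
class WithConstInterface
{
public:
	explicit WithConstInterface(std::vector<std::string> _labels): m_labels(std::move(_labels)) {}
	std::vector<std::string> const& labels() const { return m_labels; }
private:
	std::vector<std::string> m_labels;
};

// Const members delete the implicit copy and move assignment operators.
static_assert(!std::is_copy_assignable_v<WithConstMembers>);
static_assert(!std::is_move_assignable_v<WithConstMembers>);
// Construction from an rvalue still compiles, but it copies the const member instead of moving it.
static_assert(std::is_move_constructible_v<WithConstMembers>);

// Without const members, assignment remains available.
static_assert(std::is_copy_assignable_v<WithConstInterface>);
static_assert(std::is_move_assignable_v<WithConstInterface>);
```

So with const members a registry could no longer be reassigned, and moving one would silently copy its vectors. A const-only interface keeps both operations cheap.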

};

/// Produces instances of `ASTNodeRegistry`. Preferably used during parsing/importing.
class ASTNodeRegistryBuilder
{
public:
ASTNodeRegistryBuilder();
explicit ASTNodeRegistryBuilder(ASTNodeRegistry const& _existingRegistry);
ASTNodeRegistry::NodeId define(std::string_view _label);
ASTNodeRegistry::NodeId addGhost();
ASTNodeRegistry build() const;
private:
class DefinedLabels
{
public:
DefinedLabels();
std::tuple<ASTNodeRegistry::NodeId, bool> tryInsert(std::string_view _label, ASTNodeRegistry::NodeId _id);

auto const& labelToIdMapping() const { return m_mapping; }
private:
std::map<std::string, size_t, std::less<>> m_mapping;
};

DefinedLabels m_definedLabels;
std::vector<ASTNodeRegistry::NodeId> m_ghosts;
size_t m_nextId = 0;
};

}
2 changes: 2 additions & 0 deletions libyul/CMakeLists.txt
@@ -7,6 +7,8 @@ add_library(yul
AST.h
AST.cpp
ASTForward.h
ASTNodeRegistry.cpp
ASTNodeRegistry.h
AsmJsonConverter.h
AsmJsonConverter.cpp
AsmJsonImporter.h