Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor parser #346

Merged
merged 7 commits into from
Sep 29, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 10 additions & 8 deletions include/mata/parser/inter-aut.hh
Original file line number Diff line number Diff line change
Expand Up @@ -110,10 +110,10 @@ public:
FormulaNode(const FormulaNode& n)
: type(n.type), raw(n.raw), name(n.name), operator_type(n.operator_type), operand_type(n.operand_type) {}

FormulaNode(FormulaNode&&) = default;
FormulaNode(FormulaNode&&) noexcept = default;

FormulaNode& operator=(const FormulaNode& other) = default;
FormulaNode& operator=(FormulaNode&& other) = default;
FormulaNode& operator=(FormulaNode&& other) noexcept = default;
};

/**
Expand All @@ -123,15 +123,16 @@ public:
* E.g., a formula q1 & s1 will be transformed to a tree with & as a root node
* and q1 and s2 being children nodes of the root.
*/
struct FormulaGraph {
class FormulaGraph {
public:
FormulaNode node{};
std::vector<FormulaGraph> children{};

FormulaGraph() = default;
FormulaGraph(const FormulaNode& n) : node(n), children() {}
FormulaGraph(FormulaNode&& n) : node(std::move(n)), children() {}
explicit FormulaGraph(const FormulaNode& n) : node(n), children() { children.reserve(2); }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just wandering, can you put the two alredy to the the line 129 with the declaration?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, vector has no such constructor, unfortunately. It can be done with a lambda or a named function inside the parentheses in the initialization list. Something like

FormulaGraph() : children([](){ std::vector<FormulaNode> children; children.reserve(100); return children;}()) {}

Adda0 marked this conversation as resolved.
Show resolved Hide resolved
explicit FormulaGraph(FormulaNode&& n) : node(std::move(n)), children() { children.reserve(2); }
FormulaGraph(const FormulaGraph& g) : node(g.node), children(g.children) {}
FormulaGraph(FormulaGraph&& g) : node(std::move(g.node)), children(std::move(g.children)) {}
FormulaGraph(FormulaGraph&& g) noexcept : node(std::move(g.node)), children(std::move(g.children)) {}

FormulaGraph& operator=(const mata::FormulaGraph&) = default;
FormulaGraph& operator=(mata::FormulaGraph&&) noexcept = default;
Expand All @@ -146,7 +147,7 @@ struct FormulaGraph {
* and type of alphabet. It contains also the transitions formula and formula for initial and final
* states. The formulas are represented as tree where nodes are either operands or operators.
*/
struct IntermediateAut {
class IntermediateAut {
public:
/**
* Type of automaton. So far we support nondeterministic finite automata (NFA) and
Expand Down Expand Up @@ -218,7 +219,8 @@ public:
const FormulaGraph& get_symbol_part_of_transition(const std::pair<FormulaNode, FormulaGraph>& trans) const;

/**
* A method for building a vector of IntermediateAut for a parsed input.
* @brief A method for building a vector of IntermediateAut for a parsed input.
*
* For each section in input is created one IntermediateAut.
* It parses basic information about type of automata, its naming conventions etc.
* Then it parses input and final formula of automaton.
Expand Down
86 changes: 46 additions & 40 deletions src/inter-aut.cc
Original file line number Diff line number Diff line change
Expand Up @@ -20,11 +20,11 @@
#include "mata/parser/inter-aut.hh"

namespace {
bool has_atmost_one_auto_naming(const mata::IntermediateAut& aut)
{
return !(!(aut.node_naming == mata::IntermediateAut::Naming::AUTO &&
aut.symbol_naming == mata::IntermediateAut::Naming::AUTO) &&
(aut.state_naming == mata::IntermediateAut::Naming::AUTO));

bool has_atmost_one_auto_naming(const mata::IntermediateAut& aut) {
return !(!(aut.node_naming == mata::IntermediateAut::Naming::AUTO &&
aut.symbol_naming == mata::IntermediateAut::Naming::AUTO) &&
(aut.state_naming == mata::IntermediateAut::Naming::AUTO));
}

bool is_logical_operator(char ch)
Expand Down Expand Up @@ -111,7 +111,7 @@ namespace {
if (graph.children.size() == 1) { // unary operator
const auto& child = graph.children.front();
std::string child_name = child.node.is_operand() ? child.node.raw :
"(" + serialize_graph(child.node) + ")";
"(" + serialize_graph(mata::FormulaGraph{ std::move(child.node) }) + ")";
return graph.node.raw + child_name;
}

Expand All @@ -130,7 +130,7 @@ namespace {
return lhs + " " + graph.node.raw + " " + rhs;
}

mata::FormulaNode create_node(const mata::IntermediateAut &mata, const std::string &token) {
mata::FormulaNode create_node(const mata::IntermediateAut &mata, const std::string& token) {
if (is_logical_operator(token[0])) {
switch (token[0]) {
case '&':
Expand Down Expand Up @@ -206,8 +206,8 @@ namespace {
* @param tokens Series of tokens representing transition formula parsed from the input text
* @return A postfix notation for input
*/
std::vector<mata::FormulaNode> infix_to_postfix(
const mata::IntermediateAut &aut, const std::vector<std::string>& tokens) {
std::vector<mata::FormulaNode> infix_to_postfix(const mata::IntermediateAut &aut,
const std::vector<std::string>& tokens) {
std::vector<mata::FormulaNode> opstack;
std::vector<mata::FormulaNode> output;

Expand All @@ -229,13 +229,14 @@ namespace {
break;
case mata::FormulaNode::Type::OPERATOR:
for (int j = static_cast<int>(opstack.size())-1; j >= 0; --j) {
size_t j_size_t{ static_cast<size_t>(j) };
assert(!opstack[j_size_t].is_operand());
if (opstack[j_size_t].is_leftpar())
auto formula_node_opstack_it{ opstack.begin() + j };
Comment on lines 231 to +232
Copy link
Member

@jurajsic jurajsic Sep 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this just iterating from end to begin? So we could have
for (auto formula_node_opstack_it = opstack.rbegin(); formula_node_opstack_it != opstack.rend(); ++formula_node_opstack_it)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right. It should be possible to refactor this as you say. I will investigate.

Copy link
Collaborator Author

@Adda0 Adda0 Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The int arithmetic is there so you can do erase() in the vector. It will not work with a reverse iterator. I would keep it as it is, or rework the logic to not have to erase inside the loop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be done somehow with erase(), because it returns iterator, but a forward one.
But moving erasing outside the loop would probably be the best.

assert(!formula_node_opstack_it->is_operand());
if (formula_node_opstack_it->is_leftpar()) {
break;
if (lower_precedence(node.operator_type, opstack[j_size_t].operator_type)) {
output.emplace_back(std::move(opstack[j_size_t]));
opstack.erase(opstack.begin()+j);
}
if (lower_precedence(node.operator_type, formula_node_opstack_it->operator_type)) {
output.emplace_back(std::move(*formula_node_opstack_it));
opstack.erase(formula_node_opstack_it);
}
}
opstack.emplace_back(std::move(node));
Expand Down Expand Up @@ -269,35 +270,41 @@ namespace {
std::vector<mata::FormulaGraph> opstack{};
opstack.reserve(4);
for (mata::FormulaNode& node: postfix) {
mata::FormulaGraph gr(std::move(node));
switch (gr.node.type) {
switch (node.type) {
case mata::FormulaNode::Type::OPERAND:
opstack.emplace_back(std::move(gr));
opstack.emplace_back(std::move(node));
break;
case mata::FormulaNode::Type::OPERATOR:
switch (gr.node.operator_type) {
case mata::FormulaNode::OperatorType::NEG:
case mata::FormulaNode::Type::OPERATOR: {
switch (node.operator_type) {
case mata::FormulaNode::OperatorType::NEG: { // 1 child: graph will be a NEG node.
assert(!opstack.empty());
gr.children.emplace_back(std::move(opstack.back()));
mata::FormulaGraph child{ std::move(opstack.back()) };
opstack.pop_back();
opstack.emplace_back(std::move(gr));
mata::FormulaGraph& gr{ opstack.emplace_back(std::move(node)) };
gr.children.emplace_back(std::move(child));
break;
default:
}
default: { // 2 children: Graph will be either an AND node, or an OR node.
assert(opstack.size() > 1);
mata::FormulaGraph second_child{ std::move(opstack.back()) };
opstack.pop_back();
gr.children.emplace_back(std::move(opstack.back()));
mata::FormulaGraph first_child{ std::move(opstack.back()) };
opstack.pop_back();
mata::FormulaGraph& gr{ opstack.emplace_back(std::move(node)) };
gr.children.emplace_back(std::move(first_child));
gr.children.emplace_back(std::move(second_child));
opstack.emplace_back(std::move(gr));
break;
}
}
break;
default: assert(false && "Unknown type of node");
}
default:
assert(false && "Unknown type of node");
}
}

assert(opstack.size() == 1);
return std::move(opstack[0]);
return std::move(*opstack.begin());
}

/**
Expand All @@ -307,9 +314,9 @@ namespace {
* @param postfix Postfix to which operators are eventually added
* @return A postfix with eventually added operators
*/
std::vector<mata::FormulaNode> add_disjunction_implicitly(std::vector<mata::FormulaNode> postfix)
{
if (postfix.size() == 1) // no need to add operators
std::vector<mata::FormulaNode> add_disjunction_implicitly(std::vector<mata::FormulaNode> postfix) {
const size_t postfix_size{ postfix.size() };
if (postfix_size == 1) // no need to add operators
return postfix;

for (const auto& op : postfix) {
Expand All @@ -318,15 +325,15 @@ namespace {
}

std::vector<mata::FormulaNode> res;
if (postfix.size() >= 2) {
res.push_back(postfix[0]);
res.push_back(postfix[1]);
if (postfix_size >= 2) {
res.push_back(std::move(postfix[0]));
res.push_back(std::move(postfix[1]));
res.emplace_back(
mata::FormulaNode::Type::OPERATOR, "|", "|", mata::FormulaNode::OperatorType::OR);
}

for (size_t i = 2; i < postfix.size(); ++i) {
res.push_back(postfix[i]);
for (size_t i{ 2 }; i < postfix_size; ++i) {
res.push_back(std::move(postfix[i]));
res.emplace_back(
mata::FormulaNode::Type::OPERATOR, "|", "|", mata::FormulaNode::OperatorType::OR);
}
Expand Down Expand Up @@ -375,20 +382,19 @@ namespace {
for (const auto& keypair : section.dict) {
const std::string &key = keypair.first;

if (key.find("Initial") != std::string::npos) {
if (key.starts_with("Initial")) {
auto postfix = infix_to_postfix(aut, keypair.second);
if (no_operators(postfix)) {
aut.initial_enumerated = true;
postfix = add_disjunction_implicitly(std::move(postfix));
}
aut.initial_formula = postfix_to_graph(std::move(postfix));
} else if (key.find("Final") != std::string::npos) {
} else if (key.starts_with("Final")) {
auto postfix = infix_to_postfix(aut, keypair.second);
if (no_operators(postfix)) {
postfix = add_disjunction_implicitly(std::move(postfix));
aut.final_enumerated = true;
}

aut.final_formula = postfix_to_graph(std::move(postfix));;
}
}
Expand All @@ -403,7 +409,7 @@ namespace {

return aut;
}
} // anonymous
} // Anonymous namespace.

size_t mata::IntermediateAut::get_number_of_disjuncts() const
{
Expand Down
6 changes: 3 additions & 3 deletions src/parser.cc
Original file line number Diff line number Diff line change
Expand Up @@ -199,14 +199,14 @@ std::vector<std::pair<std::string, bool>> tokenize_line(const std::string& line)

std::vector<std::pair<std::string, bool>> split_tokens(std::vector<std::pair<std::string, bool>> tokens) {
std::vector<std::pair<std::string, bool>> result;
for (const auto& token : tokens) {
for (auto& token: tokens) {
if (token.second) { // is quoted?
result.push_back(std::move(token));
continue;
}

const std::string_view token_string = token.first;
size_t last_operator = 0;
const std::string_view token_string{ token.first };
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this everywhere actually?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mena why to initialize things with the curly brackets {bla} instead of = bla?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is some C++ idiom, that is advised to use. I myself don't like it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is to prevent narrowing of the types (https://stackoverflow.com/a/27755143), but naturally, C++ community has to make sure it is ugly as pug's arse.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly as you say. It is a good idea, only horribly implemented as far as visualization goes. But it helps indeed and warns you about potential implicit conversion bugs. In modern C++, it should always be preferred over assignment operator.

size_t last_operator{ 0 };
for (size_t i = 0, token_string_size{ token_string.size() }; i < token_string_size; ++i) {
if (is_logical_operator(token_string[i])) {
const std::string_view token_candidate = token_string.substr(last_operator, i - last_operator);
Expand Down
Loading