Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] Depth First Traversal Methods #19

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dyst5422
Copy link
Contributor

@dyst5422 dyst5422 commented Feb 7, 2020

This adds traits and implementations to do depth first traversals (both pre-order and post-order). It's still rough and needs some work to organize the api and code, but it should provide some good utility to finding nodes in the tree efficiently.

Definitely still needs discussion before it would be ready for consideration for merging.

Additionally moved the parse and parse_all functions into the module scope rather than scoped to Element (though they are still on Element and delegated to the module scope functions to keep backwards compatibility). <- This needs to be moved into another PR I think.

I think some additional questions to be answered:

  • Is there a need to have both pre and post order traversals or is one sufficient?
  • Is there a need to have traversal methods on both Element and XMLNode?
  • Does the performance benefit of having a depth stop predicate sufficiently pay for the API overhead? (perhaps two methods, but that seems confusing as well)
  • Probably need mutable iterators as well

@eminence eminence self-assigned this Feb 9, 2020
@eminence
Copy link
Owner

eminence commented Feb 9, 2020

Thanks for the PR to start the discussion. I'm interesting in learning more about some of the use-cases you have in mind. A few thoughts in response to your questions:

  • If I had to guess, pre-order traversals would be more commonly used than post-order. I say this mostly because pre-order is generally how my brain reads XML files in an editor (with my eyes seeing the parent element before the children).

  • I'm inclined to only provide traversal on Element. I don't have a good sense of real-world usage, but I'm guessing that most users only work with Elements.

  • Can you say some more about how you envision the stop predicate being used in practice? I certainly can see the value in having an ability in controlling the traversal. But I had always assumed that most xml files used with xmltree-rs would be small enough to fully hold in memory, and small enough that traversing the entire tree would not be too expensive.

@dyst5422
Copy link
Contributor Author

dyst5422 commented Feb 9, 2020

  1. Agreed on pre-order being the only one necessary - I don't think XML is often treated as a graph tree in practice.

  2. I imagine that traversals on XMLNode would allow you to do something like

let uppercased_comments = root_xml_node.traverse(|xml_node| match xml_node {
  XMLNode::Comment(comment) => true,
  _ => false,
}).map(|XMLNode::Comment(comment)| comment.to_uppercase())

I think it becomes more interesting when combining it with the writing back out - allowing intermediate transformations on the XML while preserving the parts that are not of concern (comments, etc).

  1. Stop predicate I see as being the way to pull out certain elements
let some_elements = xml_node.traverse(|node| {
  if let XMLNode::Element(element) = node {
    return element.name == 'SomeElement';
  }
  return false;
}).filter(|node| {
  if let XMLNode::Element(element) = node {
    return element.name == 'SomeElement';
  }
  return false;
});

Although now that I write it out (and since its probably the most common use case) this is incredibly bad to write. There should probably just be a traverse() method returning an iterator and a get_children(predicate: fn(&XMLNode) -> bool) ->Vec<&XMLNode> method that does the retrieval for you.

I have a related method I threw together for modifying a tree that might similarly useful

pub fn traverse_and_process<I: Iterator<Item=XMLNode>>(
    xml_tree: I,
    processor: fn(&XMLNode) -> XMLNode,
) -> Vec<XMLNode> {
    let mut new_tree = Vec::new();
    for node in xml_tree {
        new_tree.push(match node {
            XMLNode::CData(_) => processor(&node),
            XMLNode::Comment(_) => processor(&node),
            XMLNode::ProcessingInstruction(_, _) => processor(&node),
            XMLNode::Text(_) => processor(&node),
            XMLNode::Element(element) => {
                let mut new_element = element.clone();
                new_element.children = traverse_and_process(element.children.into_iter(), processor);
                processor(&XMLNode::Element(new_element))
            }
        })
    }
    new_tree
}

Also not sure how you feel about returning Iterators vs Vectors. I like the completeness and simplicity of Vectors, but the usability and implied performance benefits of Iterators in terms of performing intermediate transformations/filters/maps/etc.

@eminence
Copy link
Owner

In this example you wrote:

let uppercased_comments = root_xml_node.traverse(|xml_node| match xml_node {
  XMLNode::Comment(comment) => true,
  _ => false,
}).map(|XMLNode::Comment(comment)| comment.to_uppercase())

I like your other suggestion better of using a traverse():

let uppercased_comments = root_xml_node.traverse()
    .filter(|node| node.as_comment().is_some())
    .map(|XMLNode::Comment(comment)| comment.to_uppercase())

Where .traverse() is a pre-order depth-first traversal iterator. Or maybe we can just implement Iterator on Element

(BTW, this example of getting all comments is compelling to me, so I now agree we should support traversal over XMLNodes)

I guess I still don't fully understand the point of something like traverse_and_process, since that looks a whole lot like root_xml_node.traverse().map(...).collect()

@dyst5422
Copy link
Contributor Author

dyst5422 commented Feb 18, 2020

With something like traverse_and_process() instead of traverse(), you preserve the structure of the tree.

Did a second pass at it here which allows removing/changing nodes

pub fn traverse_and_process<I: Iterator<Item=XMLNode>>(
    xml_tree: I,
    processor: fn(&XMLNode) -> Option<XMLNode>,
) -> Vec<XMLNode> {
    let mut new_tree = Vec::new();
    for node in xml_tree {
        let new_node_option = match node {
            XMLNode::CData(_) => processor(&node),
            XMLNode::Comment(_) => processor(&node),
            XMLNode::ProcessingInstruction(_, _) => processor(&node),
            XMLNode::Text(_) => processor(&node),
            XMLNode::Element(element) => {
                let mut new_element = element.clone();
                new_element.children = traverse_and_process(element.children.into_iter(), processor);
                processor(&XMLNode::Element(new_element))
            }
        }
        if let Some(new_node) = new_node_option {
          new_tree.push(new_node)
        }
    }
    new_tree
}

@dyst5422 dyst5422 closed this Feb 18, 2020
@eminence
Copy link
Owner

Just to confirm: did you mean to close this PR?

@dyst5422
Copy link
Contributor Author

No

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants