Add position information for text nodes #533

Open
corynezin opened this issue Apr 16, 2021 · 0 comments
Would it be possible to add position information (line and column) to text nodes, or at least make it available to the tree builder? I implemented a minimal proof of concept that attaches the position to each token and passes it along to the DOM tree builder, which gives the following result:

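import html5lib

html = '<div>&amp;<p>b<span>c</span></p> cab</div>'

parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))
doc = parser.parse(html)


def parse(n):
    # Walk the DOM depth-first and print each node's source position.
    # `sourcepos` is set by the proof-of-concept patch, not by stock html5lib.
    for c in n.childNodes:
        if hasattr(c, 'sourcepos'):
            print(c.sourcepos, c)
        parse(c)


parse(doc)

Output with the proof-of-concept patch applied: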
None <DOM Element: head at 0x10bbed0d0>
None <DOM Element: body at 0x10bbed1f0>
(1, 5) <DOM Element: div at 0x10bbfb790>
(1, 10) <DOM Text node "'&'">
(1, 13) <DOM Element: p at 0x10bbfb820>
(1, 14) <DOM Text node "'b'">
(1, 20) <DOM Element: span at 0x10bbfb8b0>
(1, 21) <DOM Text node "'c'">
(1, 33) <DOM Text node "' '">
(1, 36) <DOM Text node "'cab'">
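
For comparison, here is a minimal sketch of the same idea using Python's standard-library `html.parser` instead of html5lib. This is not the proof of concept above and not html5lib's API: `TextPositionCollector` and `collect_text_positions` are hypothetical names, and the sketch relies on `HTMLParser.getpos()`, which in CPython reports the start of the text run when called from inside `handle_data` (1-based line, 0-based column).

```python
from html.parser import HTMLParser


class TextPositionCollector(HTMLParser):
    """Hypothetical helper: records (line, column) for every text run."""

    def __init__(self):
        # convert_charrefs=True delivers '&amp;' to handle_data as '&'.
        super().__init__(convert_charrefs=True)
        self.text_positions = []

    def handle_data(self, data):
        # getpos() returns the parser's current (line, column); while a text
        # run is being reported, this is the position where the run starts.
        self.text_positions.append((self.getpos(), data))


def collect_text_positions(html):
    """Return a list of ((line, column), text) pairs for the given markup."""
    collector = TextPositionCollector()
    collector.feed(html)
    collector.close()  # flush any text still buffered at end of input
    return collector.text_positions


if __name__ == "__main__":
    sample = '<div>&amp;<p>b<span>c</span></p> cab</div>'
    for pos, text in collect_text_positions(sample):
        print(pos, repr(text))
```

Note that this sketch reports where each text run starts, whereas the positions in the output above appear to point just past the end of each token (for example, `&amp;` occupies columns 5 to 9 and is reported as (1, 10)). The standard-library parser also does not split leading whitespace into a separate text node, so ' cab' comes back as a single run.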

I would be willing to implement it.
