Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: source locations for elements and attributes #5

Open
JoshuaWise opened this issue May 20, 2023 · 4 comments
Open

Feature request: source locations for elements and attributes #5

JoshuaWise opened this issue May 20, 2023 · 4 comments

Comments

@JoshuaWise
Copy link

JoshuaWise commented May 20, 2023

Hi, first of all, thank you for this fantastic library.

I am using it to transform HTML attributes. During this transformation, it's possible for errors to occur. I'd like to display the source location of the attribute that caused the error. However I don't see any way to get the source location of an element or attribute.

I am using the element handlers like this:

const rewriter = new HTMLRewriter(callback);
rewriter.on('*[src]', {
    async element(element) {
        try {
            element.setAttribute('src', await modify(element.getAttribute('src')));
        } catch (err) {
            // No way of getting the source location of the element or the "src" attribute
            const lineNumber = ???;
            const columnNumber = ???;
            console.error(`A problem occurred on line ${lineNumber}, column ${columnNumber}`);
        }
    },
});

Could this feature be added?

Edit: A simple offset would also work, since I could use it to calculate the line and column numbers myself.

@JoshuaWise
Copy link
Author

It seems that maybe this issue is relevant: cloudflare/lol-html#157

@mrbbot
Copy link
Contributor

mrbbot commented Jun 13, 2023

Hey! 👋 Apologies for the delayed response here. Thanks for raising this (and for making better-sqlite3!).

It looks like lol-html stores some ranges in its lexed tokens (https://github.com/cloudflare/lol-html/blob/2681dcf0b3e6907111565199df8c43cc9aab7fe8/src/parser/lexer/lexeme/token_outline.rs#L23-L29) but these don't make it to the parsed tokens (https://github.com/cloudflare/lol-html/blob/2681dcf0b3e6907111565199df8c43cc9aab7fe8/src/rewritable_units/tokens/start_tag.rs#L9-L17).

It may be possible to pipe this through, but I think it would be tricky with rewriting. Would the original position be returned, or the position in the rewritten document? As this is a streaming parser, we probably wouldn't know all original positions ahead of time, but positions in the rewritten document might not be that useful for user facing errors.

In your case, would you be able to return an error based on the content of the elements instead? Could each element have some identifiable marker you could find the location of?

@JoshuaWise
Copy link
Author

No worries, I've been quite busy too.

My goal is to display errors in the original source file, so I would need the original position, not the position in the rewritten document. The former should be simpler than the latter, since positions don't need to be calculated dynamically.

We just need to pass the positions from the lexer's tokens to the parser's AST nodes, and if an error occurs in an element handler, then it should assign the element's AST node's position info to the Error object. Also, if an error occurs while lexing or parsing, the Error that gets thrown should have the position of where the invalid syntax was encountered. This way, any unexpected error can be traced back to the original source text that caused it.

@JoshuaWise
Copy link
Author

Any updates on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants