Selector list #15

Open
Lyokovic opened this issue Jan 24, 2018 · 2 comments

Comments

@Lyokovic

Hi,

I started using Lambda Soup and found that it does not seem to support selector lists, like ".bg1, .bg3".
I need to parse an HTML document containing various <div> elements with the classes bg2, bg1, bgbc and bg3, and I want to keep only the bg1 and bg3 ones while preserving their order.

I am wondering if it would be easy to implement this feature?

@aantron
Owner

aantron commented Jan 29, 2018

Yes, it should be fairly straightforward. One would have to:

  1. Extend the grammar of selectors with one more level: https://github.com/aantron/lambda-soup/blob/8084d5b86ce8f1223271fc1e67398ac618dacbda/src/soup.ml#L489

    A simple_selector is something like .class-foo or [attribute-bar]; combinators are >, +, etc. So, this grammar is capable of representing things like .class-foo > [attribute-bar]. It needs one more level of list to be able to represent comma-separated lists of these.

  2. This is the parser's top-level function. It needs to stop being the top-level function and become a parser for a single comma-delimited item. A new top-level function then needs to wrap it: it reads the commas and calls the current parser to read everything in between. https://github.com/aantron/lambda-soup/blob/8084d5b86ce8f1223271fc1e67398ac618dacbda/src/soup.ml#L896-L913

  3. This is the select code. Its logic needs to be wrapped in a new top-level loop that tries additional selectors from the new top-level list if the preceding ones didn't yield a match. https://github.com/aantron/lambda-soup/blob/8084d5b86ce8f1223271fc1e67398ac618dacbda/src/soup.ml#L611-L647
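
The three steps above can be sketched on a toy model. This is not Lambda Soup's actual code: the types `single_selector`, `selector_list` and `element`, and the functions `parse_single`, `parse`, `matches` and `select`, are all hypothetical names made up for illustration. The toy grammar only understands tag names and classes (no combinators or attribute selectors), and it models the document as a flat list of elements.

```ocaml
(* Toy model only; none of these names come from Lambda Soup's soup.ml. *)

type simple_selector =
  | Class of string   (* e.g. .bg1 *)
  | Tag of string     (* e.g. div *)

(* One comma-free selector. The real grammar also has combinators (>, +). *)
type single_selector = simple_selector list

(* Step 1: one more level of list for the comma-separated alternatives. *)
type selector_list = single_selector list

type element = { tag : string; classes : string list }

(* The former "top-level" parser, now responsible for a single item.
   It accepts forms like "div", ".bg1" or "div.bg1". *)
let parse_single (s : string) : single_selector =
  match String.split_on_char '.' (String.trim s) with
  | [] | [ "" ] -> invalid_arg "empty selector"
  | tag :: classes ->
    let classes = List.map (fun c -> Class c) classes in
    if tag = "" then classes else Tag tag :: classes

(* Step 2: the new top-level parser reads commas and delegates the rest.
   Splitting the raw string is a simplification: a real parser would need
   to ignore commas inside attribute values. *)
let parse (s : string) : selector_list =
  List.map parse_single (String.split_on_char ',' s)

let matches_simple (e : element) = function
  | Class c -> List.mem c e.classes
  | Tag t -> String.equal t e.tag

let matches_single e (sel : single_selector) =
  List.for_all (matches_simple e) sel

(* Step 3: try each alternative in turn; later ones are only consulted
   when the preceding ones did not match. *)
let matches e (sels : selector_list) = List.exists (matches_single e) sels

(* List.filter keeps elements in their original (document) order. *)
let select (sels : selector_list) (elements : element list) =
  List.filter (fun e -> matches e sels) elements

let () =
  let doc =
    [ { tag = "div"; classes = [ "bg2" ] };
      { tag = "div"; classes = [ "bg1" ] };
      { tag = "div"; classes = [ "bgbc" ] };
      { tag = "div"; classes = [ "bg3" ] } ]
  in
  select (parse ".bg1, .bg3") doc
  |> List.iter (fun e -> print_endline (String.concat " " e.classes))
```

In this sketch, the selector list is tested per element rather than running one query per alternative and concatenating the results, which is what keeps matches in document order, as requested in the original question.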

@Lyokovic
Author

Thanks, I'll take a look ASAP.
