Adding source endings to tokens #76
Note: Babylon's … Any thoughts?
So, given the following examples: …

The lexer currently marks the beginning of the code/interpolated-code token as being at position 1 for all examples. However, it then goes on to set … If we create tokens for JavaScript expressions (as @ForbesLindesay mentioned here) and store them inside code/interpolated-code tokens, then we would be storing both positions. Thoughts? For now, I'll just use position 1 and will change it later if needed.
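For illustration, a token that stores both positions might look something like this; the field and type names here are assumptions, not the lexer's actual output:

```js
// Hypothetical: an interpolated-code token for `#{user.name}` that nests
// a JavaScript expression token, preserving both the token's own start
// (the `#{`) and the expression's start (1-based columns, end exclusive).
const token = {
  type: 'interpolated-code',
  val: 'user.name',
  loc: { start: { line: 1, column: 1 }, end: { line: 1, column: 13 } },
  expression: {
    type: 'js-expression',
    loc: { start: { line: 1, column: 3 }, end: { line: 1, column: 12 } },
  },
};
```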
For the filename, in the test cases, you can just replace …

Yes, I think we will need a separate JavaScript expression token. This will be essential when we generate source maps / use babel-code-gen. Using position 1 sounds fine for now.
Cool, I'll go with that. I suppose we could have the lexer just store the relative file path as well. Yeah, I keep thinking an identifier token for tag/attribute/mixin/class/id/filter/etc. names, as well as a JavaScript expression token, will be in order later down the road. For now I'm just focusing on getting an end location for all current tokens. For future reference though, consider the following: …

I would expect the lexer to have an attribute token spanning from 1 to 5, followed by an identifier spanning 2 to 3 (since quoted attribute names do not include the quotes), and then a JavaScript expression to span …

So far, things are going pretty well. Had to write a function to console log all result tokens followed by the text they are spanning so I can quickly see if one of the endpoints is off. After that, things are moving along well. Getting a little hung up on the attributes as the code is a little tricky, but not bad. Attributes with quoted names currently start at position 2 (using the example above), where I would expect them to start at position 1, correct?
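A minimal sketch of that kind of debugging helper, assuming Babylon-style loc objects (1-based lines, 0-based end-exclusive columns); the function name is made up:

```js
// Print each token's type followed by the exact source text its loc spans,
// so an off-by-one endpoint is immediately visible in the output.
function dumpTokenSpans(source, tokens) {
  const lines = source.split('\n');
  for (const token of tokens) {
    const { start, end } = token.loc;
    const text =
      start.line === end.line
        ? lines[start.line - 1].slice(start.column, end.column)
        : [
            lines[start.line - 1].slice(start.column),
            ...lines.slice(start.line, end.line - 1),
            lines[end.line - 1].slice(0, end.column),
          ].join('\n');
    console.log(`${token.type}: ${JSON.stringify(text)}`);
  }
}

// Usage: dumpTokenSpans(source, lex(source));
```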
For info if you need it, here is the work-in-progress rewrite of pug-code-gen using a babel AST. It works, and it parses the JavaScript expressions into an AST, a step we can remove once the lexer 'caches' the parsed AST: https://github.com/jeromew/pug-code-gen/blob/master/index.js. Currently I use https://github.com/jeromew/pug-code-gen/blob/master/index.js#L108 for parsing expressions and https://github.com/jeromew/pug-code-gen/blob/master/index.js#L111 for parsing argument-like structures. The loc numbering with these hacks has an offset, and there are corner cases that could be parsed wrongly with these constructs, so a PR is open on babylon (babel/babylon#213) to be able to parse expressions directly.
The lexer should not produce a separate tag token + identifier token, since the tag token in the lexer ends at the end of the tag name. It's the parser where we would create most of that nesting. Yes, I think currently we just produce a single "attributes" token, but I think it would be better if the lexer produced a stream of …
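One possible shape for such a stream, sketched below; the token type names are illustrative rather than the lexer's actual output at the time:

```js
// Hypothetical token stream for `a(href="#" title="x")`: the single
// attributes blob becomes delimiter tokens plus one token per attribute,
// each of which can carry its own start/end location.
const stream = [
  { type: 'tag', val: 'a' },
  { type: 'start-attributes' },
  { type: 'attribute', name: 'href', val: '"#"', mustEscape: true },
  { type: 'attribute', name: 'title', val: '"x"', mustEscape: true },
  { type: 'end-attributes' },
];
```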
@jeromew: Nice! I'm excited for that babylon pull request. I was doing the hacky stuff you described on a project myself. Thank you for pointing these lines out! I'll keep them in mind as I get closer to that piece.

@ForbesLindesay: That makes sense. I've been reading up on the difference between lexers and parsers and believe I was blurring the two. I'm keeping the single attributes token for now.

I did find an inconsistency. One of the tests has the following line:

```
a(foo, bar, baz)
```

The expected tokens for these attributes are:

```
{"type":"attribute","line":3,"col":3,"name":"foo","val":true,"mustEscape":true}
{"type":"attribute","line":3,"col":8,"name":"bar","val":true,"mustEscape":false}
{"type":"attribute","line":3,"col":13,"name":"baz","val":true,"mustEscape":false}
```

Notice the values of mustEscape.
@jbsulli, good catch! Practically, I guess it doesn't matter, but I think the preferred behavior should be to reset mustEscape for each attribute.
Yes, this definitely looks like a bug. mustEscape should be set to the same value in all three attributes. I would tend to agree with @TimothyGu that resetting it for each attribute is the preferred behavior.
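A minimal sketch of that fix, under the assumption that the bug was an escape flag carried across loop iterations; the helper below is deliberately simplified and is not pug-lexer's actual code:

```js
// Sketch: lex a comma-separated attribute list, resetting the escape flag
// for every attribute instead of carrying it over from the previous one
// (a carried-over flag is what would produce the inconsistent tokens above).
function lexAttributeList(str) {
  const tokens = [];
  for (const part of str.split(',')) {
    let mustEscape = true; // reset per attribute
    const attr = part.trim();
    if (attr.includes('!=')) mustEscape = false; // `!=` means "do not escape"
    const [name, val = true] = attr.split(/!?=/);
    tokens.push({ type: 'attribute', name: name.trim(), val, mustEscape });
  }
  return tokens;
}

console.log(lexAttributeList('foo, bar, baz'));
// all three tokens now have mustEscape: true
```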
@jbsulli We're moving everything to a single monorepo: https://github.com/pugjs/pug. There are a few advantages: …

I suggest making your changes on the monorepo, as this repo will be disappearing soon :)
Done with the lexer! That was much crazier than I expected, but I learned a lot. All unit tests are passing and I have comparable test coverage (I'd be down for getting it to 100%, if you want). All tokens are the same except the location data, and there are no new tokens yet. Most tokens just needed the end location added.

Attributes, however, were completely rewritten, as it was much too complicated to keep track of actual starting and ending locations. There are now three functions handling attributes: …

I'll work on building a pull request for the new monorepo so you guys can look at it. Need to work on updating the parser as well. I'm super excited for the monorepo. Should be much easier to handle pulling in changes for both the lexer and parser for this.
Thanks, this all sounds incredibly awesome. Getting to 100% test coverage would be great, but don't stress over getting it there as part of this pull request. I'd much rather see this pull request merged with decent test coverage, then see a subsequent pull request that just increases the test coverage. Can't wait to see the code you've written, especially the new attributes parsing code.
Started working on moving it over to the monorepo. The snapshots are making my eyes cross. Is there a way to break those out so it's not a single 46,000-line file? Haven't messed with Jest so I'm still learning it. So far it seems pretty cool. Their …
The only way to split the file up is to run the tests in separate JavaScript files. I'm starting to move towards this approach with new tests (e.g. pugjs/pug@d91bab7), but I don't think it's worth doing so with existing ones. I don't think it makes much difference to how painful the diffs are to review, though. It might be something we could propose to Jest.
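For reference, a split-out test file along those lines might look like this; the file path and test name are assumptions (Jest writes one snapshot file per test file, so each stays small):

```js
// __tests__/attributes.test.js — one feature per file, so this test file
// gets its own, much smaller snapshot file.
const lex = require('pug-lexer');

test('attributes carry start and end locations', () => {
  expect(lex('a(foo, bar, baz)')).toMatchSnapshot();
});
```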
Yeah, that's true. I'll play with them more and see how it goes. I like some of the things Jest does to try and keep diffs readable. Right now, I was able to copy the updated tokens from the lexer output over to the parser test cases, but I see that being a pain to maintain in the future. Spent most of the weekend trying to track down why I was getting strange behavior when debugging Jest tests with VSCode. Oddly, the fix was to do …
Hey everyone. I've been fighting the flu, so I've not been able to get anything done on this lately. But now I'm ready to get this going again. What are your thoughts for things like tags, mixins, blocks, etc.? Do you want the …

```
#content
  p.first
    a(href="http://www.google.com") Google
    | is planning on taking over the world.
```

So do you want the …
Hm, you also have different file sources being included into the main file. The … So what I am thinking for now is to have blocks span all children, but the tag would span its name, id, classes, attributes, and &attributes, but not the children. When includes/extends happen, the container block's …
The general rule is: the loc should refer to the entirety of the source code that was part of that token (before linking). In the lexer, this normally means just the tag. In the parser, it always means tag + children, i.e. everything that's within that object. Don't worry about the multiple-files issue, since that doesn't happen until the linking step.
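Applying that rule to the `p.first` line from the earlier example, the difference would look roughly like this; the exact line/column values are made up for illustration:

```js
// Lexer: the tag token's loc covers only the tag name `p`.
const tagToken = {
  type: 'tag',
  val: 'p',
  loc: { start: { line: 2, column: 3 }, end: { line: 2, column: 4 } },
};

// Parser: the Tag node's loc covers the tag plus all of its children,
// through the end of the `| is planning...` line.
const tagNode = {
  type: 'Tag',
  name: 'p',
  loc: { start: { line: 2, column: 3 }, end: { line: 4, column: 46 } },
};
```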
@ForbesLindesay: That makes things much easier. I've gotten pretty far with the parser, but I'm not sure I like how I've implemented adding the loc value to the node. The problem is merging all children into the current node. Right now I have wrapper functions that you add around each node you add as a child. This takes care of expanding the parent's loc so it includes the child. However, the wrapper-function approach seems fragile and requires a pretty good knowledge of what is actually going on. It doesn't seem like it will be very friendly for other maintainers going forward. Right now there are 4 different functions for handling this, and they really don't seem intuitive. You can see what is going on by checking out this commit.

I feel like a better approach would be to track the last legit token's ending location and expand the parent's end to that position when the children are done being parsed. I don't know that I can do this in a friendly way without some pretty big changes. I'm curious if a lot of the parser logic could be cleaned up by treating the nodes more anonymously. I'm going to play with it a bit and see if it's a bad road to go down or not. Any thoughts? Are you OK with some pretty big changes? I don't want to hijack this.

If you want to really see what the loc values look like, you should check out my handy CLI tool pug-loc-debugger. It's been a life-saver for this project and is kind of fun to see in action. It should be pretty future-proof too, because it's a global package that just takes over the pug modules in the folder it's running from and taps the plugin system. It's a great example of using the pug plugin system too. Just clone my repo, switch to the …
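A sketch of that alternative: rather than wrapping every call site, expand the parent's end once its children are known (the names here are made up for illustration):

```js
// Grow `parent.loc.end` to cover a child's end location, so a single pass
// over the finished children replaces the per-call-site wrapper functions.
function expandToInclude(parent, child) {
  const p = parent.loc.end;
  const c = child.loc.end;
  if (c.line > p.line || (c.line === p.line && c.column > p.column)) {
    parent.loc.end = { line: c.line, column: c.column };
  }
}

// After a block's children have been parsed:
// for (const child of block.nodes) expandToInclude(block, child);
```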
I don't have any strong opinions on this at the moment. Would you be able to submit a pull request for adding token endings to the lexer? I'd much rather review the updates to the lexer, get that landed, then start looking at what changes are needed in the parser. It's much easier if we do one thing at a time :)
P.S. pug-loc-debugger looks awesome. I've been meaning to build something like that for a while.
Kindly following up on this. Would love to see this merged so I can build proper source maps for pug-based webpack loaders. Anything that I can help with?
A pull request to the lexer (https://github.com/pugjs/pug/tree/master/packages/pug-lexer) to add full location info to each token would be helpful and would get accepted.
@nojvek, thanks for the bump! I forgot about this project with all the life changes I had (new job/big move). I'm wrapping up the pug-lexer piece now and just got it to pass all the tests. This is just the first phase though. Gotta update the parser etc. to implement endings as well.

Pug source maps are going to be sick. Also looking forward to getting everything moved over to babel and passing the babel AST along so we (hopefully) have a nice speed improvement.
Re: life changes. Totally understand.
Very much looking forward to sourcemaps and speed changes.
You are the change the world needs to be :)
Currently working on adding source line endings to all tokens as discussed here. However, I wanted to move this to its own issue instead of hijacking the other.
Going to make it match babylon's loc format, which looks like:
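The original snippet was not preserved in this thread; for reference, Babylon's loc objects have this shape (1-based lines, 0-based columns):

```js
const loc = {
  start: { line: 1, column: 0 },
  end: { line: 1, column: 5 },
};
```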