Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct begin and end (line+char) positions for each token and node #23

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

nene
Copy link
Contributor

@nene nene commented Jun 12, 2013

Related to the issue I just raised: #22

I replaced the rudimentary @line field in Token and Node classes with @range field that contains the following 2D array:

[[start_line, start_char], [end_line, end_char]]

Unlike the old implementation, all nodes now have this information. And I've kept the #line accessor for backwards compatibility.

Also, I eliminated the whole comments-to-syntax-nodes mapping. Placing the array of all comments into the root node.

There are things to improve here:

  • Maybe the 2D array should be replaced with a hash or object, so it would be a bit more readable. e.g. {:start => {:line => 3, :char => 8}, :end => {:line => 3, :char => 10}}
  • Instead of line+character based ranges we could provide a simple index based ranges. Some tools might prefer one way, others a different way. So the best might be to support both.
  • With all comments residing in the root SourceElementsNode - all other node types no more need the @comments field. So maybe it would be better to wrap everything inside a special ProgramNode that contains this @comments field. Or not return the comments at all, instead providing RKelly::Parser#comments accessor to access the comments list if needed.

All this aside, I think it's an improvement over the current implementation which is simply broken.

nene added 5 commits June 5, 2013 19:47
Previously the @line field that some Nodes had wasn't always
the number of the line where a Node started.  And it was hard
to tell why some nodes had this info, while others didn't.

Now both the start and end line of every Token and Node is
stored in @range field as two-element array.

The #line method is still there for backwards compatibility,
but the @range field is the one that holds the actual data.

For now I had to disable the tests in test_comments.rb as
the #apply_comments (which it effectively tested) relied on
only some nodes having the line number data available.
Each Token and Node range is now a 2D array of the following:

    [[start_line, start_char], [end_line, end_char]]

One can now know the exact position of each node.
Instead of attempting to associate all comments with syntax nodes
(and actually failing to correctly do so) all comments found are
added to the root SoureElementsNode.comments - all other nodes
will have an empty #comments array.
Create classes for holding character positions data.

- The CharPos contains both line/char and absolute character index.
- The CharRange combines two CharPos objects.
- Both have #next method to abstract away the creation
  of the ranges inside Tokenizer.
@nene
Copy link
Contributor Author

nene commented Jun 12, 2013

Already took care of the two points mentioned and replaced the array-based ranges with CharRange and CharPos objects. Much cleaner.

Alias the #to_s to #inspect so that when using `pp` to inspect
the created AST, the ranges will appear nice and compact.

Also tweak to #to_s a bit to output its stuff in a bit shorter form.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant