Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: begin/end ranges for primitive nodes #50

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

Artawower
Copy link

I had no luck setting begin/end for the Timestamp object (since this type already has an end property, I could use a universal property name - contentsBegin/contentsEnd, if you don't mind).

@rasendubi
Copy link
Owner

Hey, thanks for the pull request!

I researched a little, and it would be great if we could comply with Unist—all Node have position field that has start and end subfields. Doing so would allow us to use more of unified plugins (e.g., reporting messages/warnings on source file).

To convert offset to Point, we can use #location field on Reader (a vfile-location instance):

public toPoint(offset: number): Point {
  return this.#location.toPoint(offset);
}

public toPosition(begin: number, end: number): Position {
  return {
    start: this.toPoint(begin),
    end: this.toPoint(end),
  };
}

That should also resolve end collision on Keyword.

Position information adds quite a lot of details and makes AST much bigger, so it would be nice if we would only add positional information if position setting is set to true. (We need to add new setting.)

Copy link
Owner

@rasendubi rasendubi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, looks good. we'll also need to update test snapshots (run npm test -- --update from the root)

packages/uniorg-parse/src/parser.ts Outdated Show resolved Hide resolved
packages/uniorg-parse/src/parse-options.ts Outdated Show resolved Hide resolved
@Artawower
Copy link
Author

npm test -- --update

When I try to run the test from the root path, I get this error Cannot find module 'uniorg-parse/lib/parser' from different packages. Do you know how to fix this?
I was running the test from the uniorg-parse package before, and it worked fine.

@rasendubi
Copy link
Owner

Try running npm run bootstrap && npm run build first

@codecov-commenter
Copy link

Codecov Report

Merging #50 (ba0581e) into master (c6fdf0d) will increase coverage by 0.01%.
The diff coverage is 98.03%.

@@            Coverage Diff             @@
##           master      #50      +/-   ##
==========================================
+ Coverage   95.93%   95.94%   +0.01%     
==========================================
  Files          15       15              
  Lines        1622     1651      +29     
  Branches      522      532      +10     
==========================================
+ Hits         1556     1584      +28     
- Misses         65       66       +1     
  Partials        1        1              
Impacted Files Coverage Δ
packages/uniorg-parse/src/parse-options.ts 100.00% <ø> (ø)
packages/uniorg-parse/src/parser.ts 95.24% <97.87%> (+0.01%) ⬆️
packages/uniorg-parse/src/reader.ts 96.29% <100.00%> (+0.19%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Owner

@rasendubi rasendubi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for a lot of comments. The position handling seems to be quite convoluted

It would be also great to support the positions for all nodes—I will try to look into this over the weekend.

packages/uniorg-parse/src/parse-options.ts Outdated Show resolved Hide resolved
packages/uniorg-parse/src/parser.ts Outdated Show resolved Hide resolved
packages/uniorg-parse/src/parser.ts Outdated Show resolved Hide resolved
packages/uniorg-parse/src/parser.ts Outdated Show resolved Hide resolved
packages/uniorg-parse/src/parser.ts Outdated Show resolved Hide resolved
packages/uniorg-parse/src/parser.ts Outdated Show resolved Hide resolved
@@ -1466,7 +1532,8 @@ class Parser {
const contentsBegin = this.r.offset() + m.index + m[1].length + m[3].length;
const contentsEnd = contentsBegin + m[4].length;
this.r.resetOffset(contentsEnd + 1);
return u('code', { value }, []);
const [begin, end] = this.getCurrentRange(value);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❗🐛 getCurrentRange assumes that cursor is the beginning of the element. It's at the end after this.r.resetOffset(contentsEnd + 1)

packages/uniorg-parse/src/parser.ts Outdated Show resolved Hide resolved
packages/uniorg-parse/src/reader.ts Outdated Show resolved Hide resolved
/*
* Return begin and end positions from current cursor position + val length
*/
private getCurrentRange(val: string): [number, number] {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: this function seems to be rather unfortunate—most of the usages above misuse it because cursor dependency is not obvious. It seems that it's better to remove this function and inline all the usages (that would probably make the bugs more obvious)

Copy link
Owner

@rasendubi rasendubi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, sorry for the long review 🙃 the things have been a bit crazy over here

Comment on lines 848 to 854
this.parseEmptyLines();
const _end = this.r.offset();

return u('src-block', { affiliated, language, value });
return u('src-block', {
affiliated,
language,
value,
...this.getPosition(contentsBegin, contentsEnd),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is still valid:

❗🐛 same here and all blocks below: begin/end should be wider

For all blocks, the begin/end should be wider than contenstBegin/contentsEnd and include #+begin_xxx and #+end_xxx lines.

Suggested change
this.parseEmptyLines();
const _end = this.r.offset();
return u('src-block', { affiliated, language, value });
return u('src-block', {
affiliated,
language,
value,
...this.getPosition(contentsBegin, contentsEnd),
this.parseEmptyLines();
const end = this.r.offset();
return u('src-block', {
affiliated,
language,
value,
...this.getPosition(begin, end),

this.r.advance(this.r.forceMatch(/^[ \t]*CLOCK:[ \t]*/));
const value = this.parseTimestamp();

const end = this.r.offset();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❗🐛 the end does not account for advances on the next two lines. Need to move this line just before this.parseEmptyLines()

Comment on lines 1537 to 1539
this.r.resetOffset(contentsEnd + 1);
return u('code', { value }, []);
const begin = this.r.offset();
const end = begin + value.length;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❗🐛 the begin here is set too late and points after contentsEnd (because of resetOffset on the previous line)

The correct begin should be set before contentsBegin to this.r.offset() + m.index (can use this.r.advance(m.index) right before it)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that begin should be placed before backoff function here

  private parseCode(): Code | null {
    // backoff one char to check border
    const begin = this.r.offset();
    this.r.backoff(1);
    const m = this.r.lookingAt(this.re.verbatimRe());
    if (!m) return null;
    const value = m[4];
    const contentsBegin = this.r.offset() + m.index + m[1].length + m[3].length;
    const contentsEnd = contentsBegin + m[4].length;
    const end = contentsEnd + 1;
    this.r.resetOffset(end);

    return u('code', { value, ...this.getPosition(begin, end) }, []);
  }

because this code also takes into account ~ symbol before code

@@ -1808,6 +1892,7 @@ class Parser {
rawValue,
start,
end,
...this.getPosition(begin, begin + rawValue.length),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: works correctly but best to use this.r.offset() for consistency:

const end = this.r.offset();

...this.getPosition(begin, end),

@Artawower
Copy link
Author

Hey, sorry for the long review 🙃 the things have been a bit crazy over here

No problem, it really is a crazy time for all of us

Also i noticed that timestamp parsing was incorrect,

children:
  - type: "clock"
    value:
      type: "timestamp"
      timestampType: "inactive"
      rawValue: "[2021-01-10 Sun 14:36]"
      start:
        year: 2021
        month: 1
        day: 10
        hour: 14
        minute: 36
      end: null
      position:
        start:
          line: 1
          column: 8
          offset: 7
        end:
          line: 1
          column: 30
          offset: 29

The real start should be 0. I added additional argument to parseTimestamp function, that indicate about real offset before usage. I dunno might be there is better alternative to determine real offset

- Fix begin/end positions for content with children
- Fix timestamp range
- Some minor refactor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants