-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(module-source): Sub non-conforming ZWJ
prefixes with CGJ
#2436
fix(module-source): Sub non-conforming ZWJ
prefixes with CGJ
#2436
Conversation
3b32ab3
to
eff44d1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m in favor of this change. Since this is a design passed down from @michaelfig and @erights, I would ask for them to sign-off as well.
There are some tests for "zero width joiner" invariants in @endo/module-source
. We need to make sure that CGJ is disallowed for all source
in ModuleSource(source)
.
@kriskowal To your point:
I only came across the following endo/packages/module-source/src/babelPlugin.js Lines 268 to 285 in 67141a1
The code is structured to ensure there is a single source of truth, so this holds with That said, I am also not sure if the following could prove problematic down the road: new ModuleSource<CGJ>(bundleSource<ZWJ>(sourcePath, { format: 'endoScript' }).source); I don't foresee this being a problem, but I wonder if adding a test case could be reasonable while we're here. Concluding Note: I raised the above based on the realization that until this change, there was never a case where a rewritten module could have undergone a second pass. In other words, while the intent is that we are changing the invisible joiner character, the outcome is that we are also altering the behaviour of the constructor. This altered behaviour may lead to edge cases where modules previously rewritten with We are in agreement that we do not foresee problems, at least in theory, omitting adding a test in this PR. |
eff44d1
to
ef92306
Compare
I’m not worried about mixing CGJ and ZWJ. The version of I just want to make the names of the tests have "combining grapheme joiner" instead of "zero-width joiner" now, or some abstract term that would apply regardless of which is used. Otherwise, this is good to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I marked a few cases where the test description could be loosened to just "joiner". Up to you as to what you do with these descriptions.
ef92306
to
862dfd8
Compare
As requested, @SMotaal updated the occurrences of "zero width joiner" (ZWJ)
Description
This PR changes the
HIDDEN_PREFIX
ofModuleSource
from the non-conforming$h\u200D_
zero-width joiner (ZWJ
) notation to the conforming$h\u034F_
combining grapheme joiner (CGJ
) notation.A future PR may address further changes to a
$\u034F
-prefixed and\u034F$
-suffixed format as was suggested by @michaelfig in discussions.Motivation
This change is motivated after encountering a parsing error when using
rollup
which was traced back to the$h\u200D_
-prefixed identifier in anendoScript
bundle. More importantly, this is also motivated by the subsequent discovery thatrollup
's implementation was actually conforming to the ECMAScript Specification when it was throwing this error.To elaborate, while runtimes today will accept the special identifier notation that is currently being introduced by the
ModuleSource
rewrites, the current$h\u200D_
zero-width joiner (ZWJ
) notation does not conform to the specifications defined in the ECMAScript Lexical Grammar. In essence, what the specifications entail is that the character sequence for Identifier Names once unescaped would be expected to match the/^[$_\p{ID_Start}][$_\p{ID_Continue}]*$/u
pattern, aside from the additional#
character prefix required in the case of private fields.As such, one can test this in the console by evaluating the following:
The above would yield the following object in a runtime where the unicode escape sequences are retained:
Digging closer in the Unicode Standard, it seems that the zero-width joiner (
ZWJ
) may indeed be used in a conforming notation per Emoji Profile in Annex #31 of the Unicode Standard, however this is not applicable for this purpose as it would require the use of emojis.At this point, my suggestion to instead use the combining grapheme joiner (
CGJ
) is best articulated with this excerpt that I am borrowing from its canonical Wikipedia entry:The Wikipedia article offers additional nuances about the differences, while the Proposal for addition of COMBINING GRAPHEME JOINER offers the necessary context about its intent.
It is fair to note that there are many uses of the zero-width joiner (
ZWJ
) already in the wild, and in fact there are currentlytest262
tests for its occurrence. That said, unless those uses are conforming to the ECMAScript Specification and the Unicode Standard, they will limit code portability and adoption by users who may end up confused by failures similar to the one encountered withrollup
.Ultimately, with the reasonable recommendations to exercise caution when it comes to bundling
ses
and related sources that are best bundled withbundleSource
instead, those sources may still need to be parsed with tools likerollup
for different purposes that would be aligned with the expectations that they are being handled safely.Approach
Substituting the invisible joiner character
A search across the monorepo for
(?:\u200d|\\u200d)_
yields only 3 files of interest:packages/module-source/TESTS.md
packages/module-source/src/hidden.js
packages/module-source/test/module-source.test.js
While making changes to the 3 files of interest, a distinction is made between matching
\$h\\u200d_
and\$h\u200d_
where the replacements are respectively$h\\u034f_
and$h\u034f_
, along with their$c
equivalents.The search across the monorepo for
(?:\u200d|\\u200d)_
yields another 978 files that are not of interest found in:packages/test262-runner/test262/test/language/expressions/class/elements
packages/test262-runner/test262/test/language/statements/class/elements
All those files remain unchanged.
Ensuring generic wording is used
For testing and other purposes where descriptive phrases are used to refer to the use of
ZWJ
,CGJ
or other characters for this same intent, the phrase "invisible joiner character" is suggested.Security Considerations
Does not apply to my knowledge
Scaling Considerations
Does not apply to my knowledge
Documentation Considerations
Does not apply to my knowledge
Testing Considerations
See: #2436 (comment)
Compatibility Considerations
While the changes do not affect compatibility when the generated code is evaluated at runtime, there can potentially be compatibility concerns with tools that have been specifically designed to work with the current notation.
Upgrade Considerations
Does not apply to my knowledge
Footnotes
https://en.wikipedia.org/wiki/Combining_grapheme_joiner ↩