-
Notifications
You must be signed in to change notification settings - Fork 51
Pattern-matching fixes, optimization, tests #248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Aside from this previously being a needless operation (when 'recursive' was specified as 'true') - previously this sometimes erroneously return 'true' for a recursive/operand match (already-matched operand which has been replaced by an expression with the same structure).
- Also account for canonical-status of the *replacement*-expr. for successful replacements. - If 'canonical' is not specified as an option during replacement: - There is differentiation between canonicalizing any direct replacements (i.e. these may be recursive), and the overall/input expr. - ^In this case, each canonical-status, of sub-exprs. will generally be *preserved*: but will 'opportunistically' mark expressions as canonical (e.g., because replaced operands are now canonical)
- easier to follow (at least from outsider's perspective) - removes unnecessary/duplicate stmts.
- In essence, more-or-less 'fixes' expr. matching behaviour for patterns involving sequences. These changes result in the following variety of cases successfully matching where they would not have done prior: - Sequence wildcards matching >1 operand, e.g.: - ['Add', '__a', 'x'] will now match ['Add', 1, 2, 3, 'x'] - Multiple sequence wildcards (at the same level), e.g.: - ['Tuple', 1, '__a', 4, '__b', 7, '__c'] will now match ['Tuple', 1, 2, 3, 4, 5, 6, 7, 8] - Handles behaviour, and operand/expression capturing for cases of a regular sequence wildcard, followed by one+ optional-seq. wildcards, e.g.: - For ['Multiply', '__f', '___g', '___h'], 'f' will now match 'greedily' and essentially 'merge' with following sequences, such that 'g' and 'h' each capture '0' operands/each: this experientially being the preferred/expected behavior. (^note that sequences of (regular) sequences do not need to be accounted for since these are considered 'invalid' anyway (a subsequent commit set to account for this)) The fact of these cases failing to match has hitherto been obscured on account of absence of tests (these have now been added). Changes/fixes have been achieved by trying permutations of quantity of operands/expressions matched by sequence wildcards (this logic appears to _mistakenly_ have been absent prior), as well as special 'lookahead' checks for cases of 'regular/optional sequence' cards. 'matchArguments' - where the majority of these changes are situated, has (should) now also be refactored for readability. - 'patterns.test.ts': re-writes, almost entirely, the test cases & structure throughout, including addition of/more particular tests, increased qty. of matchers within each, & a healthy handful of controls. - Also: - Reverts/fixes the condition of argument matching skipping pattern permutations with sequences/sub-sequences consisting of a regular sequence wildcard followed by a universal ('_') wildcard (consider ['Add', '__a', '_b']. Whilst in some sensitive intuitive - if considering sequences as 'greedy'-matching - this sequence of wildcards clearly has matching utility for some cases (notably this was breaking some existing tests, too).
* containing the replaced expr. will still however have their (previous) canonical-status | ||
* *preserved*... unless this expr. was previously non-canonical, and *replacements have resulted | ||
* in canonical operands*. In this case, an expr. meeting this criteria will be updated to | ||
* canonical status. (Canonicalization is opportunistic here, in other words). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs your verification and amendment if necessary.
After considering the possibilities here, particularly with recursion, thought that this strategy makes sense.
const canonical = | ||
options?.canonical ?? | ||
(rule instanceof _BoxedExpression ? rule.isCanonical : false); | ||
return ce.box(rule, { canonical }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think that this slightly refined check is a bit more correct ?
@@ -715,12 +728,11 @@ export function applyRule( | |||
|
|||
if (canonical && match) { | |||
const awc = getWildcards(match); | |||
const originalMatch = match; | |||
match = match.canonical; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This (line) appeared to be non-effective, its removal not affecting any tests, and so removed it.
replace instanceof _BoxedExpression && | ||
replace.isCanonical | ||
) | ||
canonical = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(note: this in reference to the added doc. for method replace
)
@@ -419,7 +419,7 @@ function parseRule( | |||
|
|||
// Check for conditions | |||
const conditions = parseModifierExpression(parser); | |||
if (conditions === null) return null; | |||
if (conditions === null) return `${prefix}${id}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Believe the fix of this typo has resulted in correct parsing of LaTeX with sequence-syntax (...n + ...
)
@@ -1122,6 +1122,9 @@ export interface BoxedExpression { | |||
* | |||
* :::info[Note] | |||
* Applicable to canonical and non-canonical expressions. | |||
* | |||
* To specify a match for single symbol (non wildcard), it must be boxed (e.g. `{ match: | |||
* ce.box('x'), ... }`), but it is suggested to use method *'subs()'* for this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This in reference to opening-PR message: this is the current 'workaround'
(not that it matters since subs()
can be used)
* Assuming the input is canonicalized, the resultant expr. against which the pattern is matched | ||
* is `['Add', 'x', 1, 2, 3, 5]`: which cannot be matched against any pattern permutation... (this | ||
* should be expected to be a valid match: with expr. expected to be commutative & therefore | ||
* operands validly taking on any arrangement, canonical-form aside). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also see opening PR message: this a remaining issue for commutative matching.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(A bit off-topic an observation, but also noticed that for some definitions such as Set
, these not marked as 'commutative' where they could be)
isWildcard(nextPattern) && | ||
wildcardType(nextPattern) === 'Sequence' | ||
) | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'Validation' step as mentioned (for call to expr.match()
) would still be needed to verify this
// @note: if pattern has been validated prior, the next should never be a '_' | ||
// (universal) or '__' (regular sequence) wildcard | ||
nextAppPattern = patterns[++nextAppPatternIndex]; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a relatively big added change/check, but believe procedure here is sensible, if you want to verify... ?
// against '3 + 4 + x + b'...: the sequence will have initially captured just '3', but | ||
// this will result in a final overall no-match. In this case. allow the sequence to | ||
// capture '3 + 4' (finally permitting a 'total' match) | ||
while (j <= ops.length) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This the primary added loop permitting correct/fuller matching of sequences (outlined in this' commit message); appears sensible... ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be end of review
Also, have seen that there are lots of changes with the recent major release, by the way! Have yet to examine them in more detail (out of interest), but looks good 👏. Were the canonicalization/binding changes included also, and was the primary change here the recognition of an evaluation-scope? |
Thanks for this PR. Will review it in details soon. The canonicalization/binding changes should be incorporated, that is only canonical functions are bound (conversely, bound functions are necessarily canonical). There are lot of changes in the latest release. Yes, the addition of proper scoping is one of them (variable capture is not yet implemented, though). There is indeed clearer difference between runtime/evaluation scope and lexical scope (the definitions). This allow proper recursion to work now. |
Ah, that's good to hear; I wasn't so sure if those changes (canonicalization) were present from the changelog, but have seen at least some evidence in the source-code. Later, I will try to get more to grips with it: [again] out of interest. Yes, apparently some decent fixes and optimizations present with this one: noticed them (bugs) whilst experimenting & pondering over pattern-matching. Should be worthwhile... |
Sorry that this is later than expected... At least, it's a substantial contribution with some good fixes.
Like usual, will do an inline/source-code review on most outstanding (in literal sense) changes.
There are still one or two remaining issues on the subject of pattern matching (notably, often referenced in source-code comments. Therefore, these likely best reviewed after scanning over changes):
A simple case is (extracted from inline doc.) is
['Add', 1, 2, 3, 'x', 5]
matched against pattern['Add', 1, '__a']
. This does not match (neither prior to, or after changes added here), due to the match being made against the canonicalized input (['Add', 'x', 1, 2, 3, 4, 5]
(I think).Specifying symbols as strings for match/replace - via
expr.replace()
- as illustrated within thePatterns and Rules
guide, does not function as expected (i.e. as documentation suggests), due to the 'match' part being parsed as a rule-string: that is, in which symbols (single-character) are parsed as wildcards.I.e. the example given listed on this page does not function as indicated, but each rule instead matches/replaces any expression:
And some general notes (again best seen after visiting changes):
'Pattern' validation ideally to be carried out upon calls to
expr.match()
: such that sequences of (adjacent) regular-sequence wildcards (consider['List', '__a', '__b', ...]
) are considered 'invalid' (a bit too strong a term).In reference to changes: generated permutations skip patterns of this structure, on the basis of these being redundant (if not, unpredictable).
There is the question also, of what the return value should be (of
expr.match()
);null
does not seem quite fitting. Perhaps something like a 'reason', as returned fromce.ask()
...?And a few future suggestions & feature requests:
Personally, would find it useful to have a
matchPermutations
option (i.e. as applicable to commutative FN's), forexpr.match()
. Sometimes, particularly when wildcards are included in a pattern, I would find desirable to only match exactly as is/as requested.Also for expr.match(), it would have utility to be able to specify expression-scoped / pertinent conditions in a similar way as can be expressed via replacement-rule LaTeX match strings (
x_{numeric} + y_{prime}
for instance.).Being able to specify LaTeX-style 'match' strings in the same manner of method 'replace', would not go amiss either...
For method
replace()
:recursive: true
should be default...? (personal preference/use is for this to be so. Semantically, the sense I get/interpret is 'replace all instances (of)'. Mentionably also, without 'recursive: true', a successful result is nothing but a (top-level) expression re-assignment (assuming assignment of call result) at the Javascript/Typescript level).A possible means could be to use latex ID-prefixes (
\operatorname
,\mathrm
...), but clearly this would be long-winded. Perhaps alternatively a special minor, syntactical rule to specify this (same level of simplicity as the use of prefix...
to match sequences, for instance).Bit heavy on the text there, but for future reference...!