Pattern-matching fixes, optimization, tests #248

samueltlg · 2025-08-01T00:52:51Z

Sorry that this is later than expected... At least, it's a substantial contribution with some good fixes.

As usual, I will do an inline source-code review of the most outstanding (in the literal sense) changes.

There are still one or two remaining issues on the subject of pattern matching. These are often referenced in source-code comments, so they are likely best reviewed after scanning over the changes:

- Using permutations of patterns (only) does not, with current behaviour, capture some operand sequences of commutative operators where a match would otherwise be expected.
  A simple case (extracted from the inline doc.) is ['Add', 1, 2, 3, 'x', 5] matched against the pattern ['Add', 1, '__a']. This does not match (neither before nor after the changes added here), because the match is made against the canonicalized input, ['Add', 'x', 1, 2, 3, 5] (I think).
- This one may just be due to outdated (public) documentation:
  Specifying symbols as strings for match/replace via expr.replace(), as illustrated in the Patterns and Rules guide, does not function as the documentation suggests, because the 'match' part is parsed as a rule-string, in which single-character symbols are parsed as wildcards.
  I.e. the example listed on that page does not function as indicated; each rule instead matches/replaces any expression:

const expr = ce.box(["Add", ["Multiply", "a", "x"], "b"]);
expr.replace([
    { match: "a", replace: 2 },
    { match: "b", replace: 3 }
  ],
  { recursive: true }
)?.print();

And some general notes (again best seen after visiting changes):

- Match permutations (i.e. for commutative expr. operators) could benefit from further optimization.
- (I was going to add this as a final commit, and it would not take me much time to add:)
  'Pattern' validation should ideally be carried out upon calls to expr.match(), such that sequences of adjacent regular-sequence wildcards (consider ['List', '__a', '__b', ...]) are considered 'invalid' (a bit too strong a term).
  In reference to the changes: generated permutations skip patterns of this structure, on the basis of these being redundant (if not unpredictable).
  There is also the question of what the return value of expr.match() should be in this case; null does not seem quite fitting. Perhaps something like a 'reason', as returned from ce.ask()...?
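
To make the validation idea above concrete, here is a minimal, self-contained sketch. It is not the library's code: the `Pattern` and `Validation` types and the `validatePattern` function are hypothetical names chosen for illustration, and the 'reason' shape is only one possible answer to the return-value question.

```typescript
// Hypothetical MathJSON-like pattern: a symbol/number, or [operator, ...operands].
type Pattern = string | number | Pattern[];

// Hypothetical result type: either valid, or invalid with a human-readable reason.
type Validation = { valid: true } | { valid: false; reason: string };

// '__a' is a regular sequence wildcard; '___a' (three underscores) is an optional one.
function isRegularSequence(p: Pattern): boolean {
  return typeof p === 'string' && p.startsWith('__') && !p.startsWith('___');
}

// Reject patterns in which two regular-sequence wildcards are adjacent operands,
// since the split of operands between them is ambiguous.
function validatePattern(pattern: Pattern): Validation {
  if (!Array.isArray(pattern)) return { valid: true };
  const ops = pattern.slice(1); // skip the operator at index 0
  for (let i = 0; i < ops.length; i++) {
    if (i > 0 && isRegularSequence(ops[i - 1]) && isRegularSequence(ops[i])) {
      return {
        valid: false,
        reason: `adjacent sequence wildcards '${ops[i - 1]}' and '${ops[i]}'`,
      };
    }
    // Recurse into nested sub-patterns.
    const nested = validatePattern(ops[i]);
    if (!nested.valid) return nested;
  }
  return { valid: true };
}

console.log(validatePattern(['List', '__a', '__b'])); // invalid, with a reason
console.log(validatePattern(['List', '__a', 1, '__b'])); // valid
```

Returning a tagged result rather than null lets a caller distinguish 'the pattern did not match' from 'the pattern itself is malformed'.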

And a few future suggestions & feature requests:

- Personally, I would find it useful to have a matchPermutations option for expr.match() (i.e. as applicable to commutative functions). Sometimes, particularly when wildcards are included in a pattern, I would find it desirable to match only exactly as requested.
- Also for expr.match(), it would be useful to be able to specify expression-scoped conditions, in a similar way to what can be expressed via replacement-rule LaTeX match strings (x_{numeric} + y_{prime}, for instance).
- Being able to specify LaTeX-style 'match' strings in the same manner as for the replace method would not go amiss either...
- For method replace():
  - Arguably, recursive: true should be the default...? (My personal preference/use is for this to be so. Semantically, the sense I get is 'replace all instances (of)'. It is also worth mentioning that, without 'recursive: true', a successful result is nothing but a (top-level) expression re-assignment, assuming the call result is assigned, at the JavaScript/TypeScript level.)
  - A way (a syntax variant) to match single-character variables as symbols, instead of wildcards, in LaTeX strings would be desirable. I could not find a way to do this (but this is of course easily expressible with boxed expressions).
    A possible means could be to use LaTeX ID-prefixes (\operatorname, \mathrm...), but clearly this would be long-winded. Perhaps alternatively a special, minor syntactical rule to specify this (at the same level of simplicity as the use of the prefix ... to match sequences, for instance).

Bit heavy on the text there, but for future reference...!

Aside from this previously being a needless operation (when 'recursive' was specified as 'true'), this sometimes erroneously returned 'true' for a recursive/operand match (an already-matched operand which has been replaced by an expression with the same structure).

- Also account for canonical-status of the *replacement*-expr. for successful replacements.
- If 'canonical' is not specified as an option during replacement:
  - There is differentiation between canonicalizing any direct replacements (i.e. these may be recursive), and the overall/input expr.
  - ^In this case, each canonical-status of sub-exprs. will generally be *preserved*: but will 'opportunistically' mark expressions as canonical (e.g., because replaced operands are now canonical)

- easier to follow (at least from outsider's perspective)
- removes unnecessary/duplicate stmts.

- In essence, more-or-less 'fixes' expr. matching behaviour for patterns involving sequences. These changes result in the following variety of cases successfully matching where they would not have done prior:
  - Sequence wildcards matching >1 operand, e.g.:
    - ['Add', '__a', 'x'] will now match ['Add', 1, 2, 3, 'x']
  - Multiple sequence wildcards (at the same level), e.g.:
    - ['Tuple', 1, '__a', 4, '__b', 7, '__c'] will now match ['Tuple', 1, 2, 3, 4, 5, 6, 7, 8]
  - Handles behaviour, and operand/expression capturing, for cases of a regular sequence wildcard followed by one+ optional-seq. wildcards, e.g.:
    - For ['Multiply', '__f', '___g', '___h'], 'f' will now match 'greedily' and essentially 'merge' with following sequences, such that 'g' and 'h' each capture '0' operands: this experientially being the preferred/expected behavior.
      (^note that sequences of (regular) sequences do not need to be accounted for since these are considered 'invalid' anyway (a subsequent commit set to account for this))

  The fact of these cases failing to match has hitherto been obscured by the absence of tests (these have now been added).
  Changes/fixes have been achieved by trying permutations of the quantity of operands/expressions matched by sequence wildcards (this logic appears to _mistakenly_ have been absent prior), as well as special 'lookahead' checks for cases of 'regular/optional sequence' wildcards.
  'matchArguments', where the majority of these changes are situated, has (or should have) now also been refactored for readability.
- 'patterns.test.ts': re-writes, almost entirely, the test cases & structure throughout, including addition of more particular tests, increased qty. of matchers within each, & a healthy handful of controls.
- Also:
  - Reverts/fixes the condition of argument matching skipping pattern permutations with sequences/sub-sequences consisting of a regular sequence wildcard followed by a universal ('_') wildcard (consider ['Add', '__a', '_b']).
    Whilst skipping these is in some sense intuitive (if considering sequences as 'greedy'-matching), this sequence of wildcards clearly has matching utility for some cases (notably, skipping it was breaking some existing tests, too).
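
The idea of 'trying permutations of quantity of operands matched by sequence wildcards' can be sketched as a small backtracking matcher. This is an illustrative sketch only, not the PR's `matchArguments`: the `Operand`, `Captures`, `wildcard`, `matchOperands` and `match` names are hypothetical, operands are restricted to flat symbols and numbers, and repeated wildcard names are not checked for consistency.

```typescript
// Hypothetical simplified types: operands are symbols or numbers only (no nesting).
type Operand = string | number;
type Captures = Record<string, Operand[]>;

// '_x' matches exactly one operand, '__x' one or more, '___x' zero or more.
function wildcard(p: Operand): { name: string; min: number; max: number } | null {
  if (typeof p !== 'string' || !p.startsWith('_')) return null; // literal operand
  if (p.startsWith('___')) return { name: p.slice(3), min: 0, max: Infinity };
  if (p.startsWith('__')) return { name: p.slice(2), min: 1, max: Infinity };
  return { name: p.slice(1), min: 1, max: 1 };
}

// Match `patterns` against all of `ops`, in order, backtracking over how many
// operands each wildcard captures.
function matchOperands(
  patterns: Operand[],
  ops: Operand[],
  captures: Captures = {}
): Captures | null {
  if (patterns.length === 0) return ops.length === 0 ? captures : null;
  const [head, ...rest] = patterns;
  const wc = wildcard(head);
  if (wc === null) {
    // A literal must equal the next operand.
    if (ops.length === 0 || ops[0] !== head) return null;
    return matchOperands(rest, ops.slice(1), captures);
  }
  // Greedy: try the longest capture first, then back off one operand at a time.
  const longest = Math.min(wc.max, ops.length);
  for (let n = longest; n >= wc.min; n--) {
    const result = matchOperands(rest, ops.slice(n), {
      ...captures,
      [wc.name]: ops.slice(0, n),
    });
    if (result !== null) return result;
  }
  return null;
}

// Match a flat [operator, ...operands] pattern against an expression of the same shape.
function match(pattern: Operand[], expr: Operand[]): Captures | null {
  if (pattern[0] !== expr[0]) return null;
  return matchOperands(pattern.slice(1), expr.slice(1));
}

console.log(JSON.stringify(match(['Add', '__a', 'x'], ['Add', 1, 2, 3, 'x'])));
// {"a":[1,2,3]}
console.log(JSON.stringify(match(
  ['Tuple', 1, '__a', 4, '__b', 7, '__c'],
  ['Tuple', 1, 2, 3, 4, 5, 6, 7, 8]
)));
// {"a":[2,3],"b":[5,6],"c":[8]}
console.log(JSON.stringify(match(['Multiply', '__f', '___g', '___h'], ['Multiply', 'a', 'b', 'c'])));
// {"f":["a","b","c"],"g":[],"h":[]}
```

Trying the longest capture first is what produces the 'greedy' behaviour described for '__f' followed by optional sequences; the commit itself describes lookahead checks rather than plain recursion, so the two approaches should only be expected to agree on simple cases like these.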

samueltlg · 2025-08-01T01:06:43Z

src/compute-engine/global-types.ts

+   * containing the replaced expr. will still however have their (previous) canonical-status
+   * *preserved*... unless this expr. was previously non-canonical, and *replacements have resulted
+   * in canonical operands*. In this case, an expr. meeting this criteria will be updated to
+   * canonical status. (Canonicalization is opportunistic here, in other words).


Needs your verification and amendment if necessary.
After considering the possibilities here, particularly with recursion, I thought that this strategy makes sense.
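
A toy model of the 'opportunistic' strategy described in the doc comment above may help when verifying it. Everything here is an assumption for illustration: `ExprNode`, `leaf` and `replaceAll` are hypothetical, and 'canonical' is reduced to a boolean flag, whereas real canonical forms also involve things like operand ordering that this sketch does not model.

```typescript
// Hypothetical toy expression node; NOT the library's BoxedExpression.
interface ExprNode {
  op: string;
  args: ExprNode[];
  canonical: boolean;
}

const leaf = (op: string, canonical: boolean): ExprNode => ({ op, args: [], canonical });

// Recursively replace every node whose `op` equals `target`.
function replaceAll(node: ExprNode, target: string, replacement: ExprNode): ExprNode {
  if (node.op === target) return replacement;
  const args = node.args.map((a) => replaceAll(a, target, replacement));
  const changed = args.some((a, i) => a !== node.args[i]);
  if (!changed) return node; // nothing replaced below: status untouched
  // The previous status is preserved, except that a previously non-canonical
  // node is promoted when all of its operands are now canonical.
  const canonical = node.canonical || args.every((a) => a.canonical);
  return { ...node, args, canonical };
}

// A non-canonical sum with one non-canonical operand 'a'.
const sum: ExprNode = {
  op: 'Add',
  args: [leaf('a', false), leaf('x', true)],
  canonical: false,
};

console.log(replaceAll(sum, 'a', leaf('2', true)).canonical); // true: promoted
console.log(replaceAll(sum, 'a', leaf('2', false)).canonical); // false: preserved
```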

samueltlg · 2025-08-01T01:08:22Z

src/compute-engine/boxed-expression/rules.ts

+  const canonical =
+    options?.canonical ??
+    (rule instanceof _BoxedExpression ? rule.isCanonical : false);
+  return ce.box(rule, { canonical });


I think this slightly refined check is a bit more correct?
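
For reference, the effect of the `??` fallback in the snippet above can be shown in isolation. This is a standalone sketch: `Boxed` is a hypothetical stand-in for `_BoxedExpression`, and `resolveCanonical` is an illustrative name, not a function in the codebase.

```typescript
// Hypothetical stand-in for a boxed expression.
class Boxed {
  constructor(readonly isCanonical: boolean) {}
}

// An explicit option always wins (including an explicit `false`); otherwise
// fall back to the canonical status of an already-boxed rule, else `false`.
function resolveCanonical(
  rule: Boxed | string,
  options?: { canonical?: boolean }
): boolean {
  return options?.canonical ?? (rule instanceof Boxed ? rule.isCanonical : false);
}

console.log(resolveCanonical(new Boxed(true))); // true
console.log(resolveCanonical(new Boxed(true), { canonical: false })); // false
console.log(resolveCanonical('x + 1')); // false
console.log(resolveCanonical('x + 1', { canonical: true })); // true
```

Using `??` rather than `||` matters here: with `||`, an explicit `canonical: false` would be ignored for a canonical boxed rule.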

samueltlg · 2025-08-01T01:09:53Z

src/compute-engine/boxed-expression/rules.ts

@@ -715,12 +728,11 @@ export function applyRule(

  if (canonical && match) {
    const awc = getWildcards(match);
-    const originalMatch = match;
-    match = match.canonical;


This line appeared to have no effect (its removal did not affect any tests), so I removed it.

samueltlg · 2025-08-01T01:12:04Z

src/compute-engine/boxed-expression/rules.ts

+    replace instanceof _BoxedExpression &&
+    replace.isCanonical
+  )
+    canonical = true;


(Note: this is in reference to the added doc. for the replace method.)

samueltlg · 2025-08-01T01:14:41Z

src/compute-engine/boxed-expression/rules.ts

@@ -419,7 +419,7 @@ function parseRule(

        // Check for conditions
        const conditions = parseModifierExpression(parser);
-        if (conditions === null) return null;
+        if (conditions === null) return `${prefix}${id}`;


I believe that fixing this typo has resulted in correct parsing of LaTeX with sequence-syntax (...n + ...)

samueltlg · 2025-08-01T01:28:41Z

src/compute-engine/global-types.ts

@@ -1122,6 +1122,9 @@ export interface BoxedExpression {
   *
   * :::info[Note]
   * Applicable to canonical and non-canonical expressions.
+   * 
+   * To specify a match for single symbol (non wildcard), it must be boxed (e.g. `{ match:
+   * ce.box('x'), ... }`), but it is suggested to use method *'subs()'* for this.


This is in reference to the opening PR message: this is the current 'workaround'
(not that it matters, since subs() can be used).

samueltlg · 2025-08-01T01:31:13Z

src/compute-engine/boxed-expression/match.ts

+ *   Assuming the input is canonicalized, the resultant expr. against which the pattern is matched
+ *   is `['Add', 'x', 1, 2, 3, 5]`: which cannot be matched against any pattern permutation... (this
+ *   should be expected to be a valid match: with expr. expected to be commutative & therefore
+ *   operands validly taking on any arrangement, canonical-form aside).


Also see the opening PR message: this is a remaining issue for commutative matching.

(A somewhat off-topic observation, but I also noticed that some definitions, such as Set, are not marked as 'commutative' where they could be.)
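
The remaining commutative-matching issue can be reproduced with a small standalone sketch. The helper names (`matches`, `permutations`, `anyPermutationMatches`) are hypothetical and the matcher is deliberately minimal (flat operands, only '__' sequence wildcards); the canonical operand order used is the one quoted in the doc comment above.

```typescript
type Operand = string | number;

// True when `patterns` matches all of `ops`, in order.
// A '__x' wildcard captures one or more operands; anything else is a literal.
function matches(patterns: Operand[], ops: Operand[]): boolean {
  if (patterns.length === 0) return ops.length === 0;
  const [head, ...rest] = patterns;
  if (typeof head === 'string' && head.startsWith('__')) {
    for (let n = 1; n <= ops.length; n++) {
      if (matches(rest, ops.slice(n))) return true;
    }
    return false;
  }
  return ops.length > 0 && ops[0] === head && matches(rest, ops.slice(1));
}

// All orderings of a (small) array.
function permutations<T>(items: T[]): T[][] {
  if (items.length <= 1) return [items];
  const result: T[][] = [];
  items.forEach((item, i) => {
    const others = [...items.slice(0, i), ...items.slice(i + 1)];
    for (const p of permutations(others)) result.push([item, ...p]);
  });
  return result;
}

const patternOps: Operand[] = [1, '__a']; // operands of ['Add', 1, '__a']
const asWritten: Operand[] = [1, 2, 3, 'x', 5]; // operand order as entered
const canonicalOps: Operand[] = ['x', 1, 2, 3, 5]; // order after canonicalization

// Permute the pattern only, as the current approach does.
const anyPermutationMatches = (ops: Operand[]): boolean =>
  permutations(patternOps).some((p) => matches(p, ops));

console.log(anyPermutationMatches(asWritten)); // true
console.log(anyPermutationMatches(canonicalOps)); // false
```

Against the canonical order, [1, '__a'] needs the operands to start with 1 and ['__a', 1] needs them to end with 1; neither holds, which is why permuting the pattern alone cannot recover this match.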

samueltlg · 2025-08-01T01:34:38Z

src/compute-engine/boxed-expression/match.ts

+                isWildcard(nextPattern) &&
+                wildcardType(nextPattern) === 'Sequence'
+              )
+            );


'Validation' step as mentioned (for call to expr.match()) would still be needed to verify this

samueltlg · 2025-08-01T01:37:29Z

src/compute-engine/boxed-expression/match.ts

+              // @note: if pattern has been validated prior, the next should never be a '_'
+              // (universal) or '__' (regular sequence) wildcard
+              nextAppPattern = patterns[++nextAppPatternIndex];
+            }


This is a relatively big added change/check, but I believe the procedure here is sensible, if you want to verify...?

samueltlg · 2025-08-01T01:42:41Z

src/compute-engine/boxed-expression/match.ts

+              // against '3 + 4 + x + b'...: the sequence will have initially captured just '3', but
+              // this will result in a final overall no-match. In this case. allow the sequence to
+              // capture '3 + 4' (finally permitting a 'total' match)
+              while (j <= ops.length) {


This is the primary added loop permitting correct/fuller matching of sequences (outlined in this commit's message); it appears sensible...?

Should be end of review

samueltlg · 2025-08-01T17:36:01Z

Also, I have seen that there are lots of changes in the recent major release, by the way! I have yet to examine them in more detail (out of interest), but it looks good 👏. Were the canonicalization/binding changes included as well, and was the primary change here the recognition of an evaluation scope?
And was the change in syntax for types (at least for function signatures, I think) inspired by the Scala language, or did you just fancy a change 🤔? Best

arnog · 2025-08-02T06:34:16Z

Thanks for this PR. Will review it in detail soon.

The canonicalization/binding changes should be incorporated, that is, only canonical functions are bound (conversely, bound functions are necessarily canonical).

There are a lot of changes in the latest release. Yes, the addition of proper scoping is one of them (variable capture is not yet implemented, though). There is indeed a clearer difference between runtime/evaluation scope and lexical scope (the definitions). This allows proper recursion to work now.
One tweak on the types was the syntax for variadic arguments, which used ...xs previously and now uses xs+ and xs*. The motivation for the change is that while using types, I actually could never remember if ...xs was 1 or more or 0 or more, so I figured I would switch to a clearer syntax that I would remember :) Is that what you're referring to? Does Scala really have the same syntax? If so, amusing that we arrived at the same place. The main inspiration for this was actually the syntax for RegEx.

samueltlg · 2025-08-03T20:55:29Z

Ah, that's good to hear; I wasn't so sure if those changes (canonicalization) were present from the changelog, but have seen at least some evidence in the source-code. Later, I will try to get more to grips with it: [again] out of interest.
From looking at the Scala documentation some time ago (have not used the language personally), I recall seeing a similar syntax, at least a 'postfix x, in the context of function signatures. I may be wrong, but will have to investigate...

Yes, apparently some decent fixes and optimizations present with this one: noticed them (bugs) whilst experimenting & pondering over pattern-matching. Should be worthwhile...

samueltlg added 6 commits July 31, 2025 22:10

!fix: parsing of sequence-wildcard syntax in LaTeX pattern-matching

c3e14fe

refactor: clearer argument matching logic for pattern matching

4eb640f

- easier to follow (at least from outsider's perspective)
- removes unnecessary/duplicate stmts.

refactor: optimise pattern matching via reducing arg. permutations

03d7284

samueltlg changed the title ~~Patterns substitution fixes~~ Pattern-matching fixes, optimization, tests Aug 1, 2025
