Skip to content

Pattern-matching fixes, optimization, tests #248

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

samueltlg
Copy link
Contributor

Sorry that this is later than expected... At least, it's a substantial contribution with some good fixes.

Like usual, will do an inline/source-code review on most outstanding (in literal sense) changes.

There are still one or two remaining issues on the subject of pattern matching (notably, often referenced in source-code comments. Therefore, these likely best reviewed after scanning over changes):

  • Using permutations on patterns (only) does not, with current behaviour, capture, some cases of commutative-operator operand sequences where it would otherwise be expected.
    A simple case is (extracted from inline doc.) is ['Add', 1, 2, 3, 'x', 5] matched against pattern ['Add', 1, '__a']. This does not match (neither prior to, or after changes added here), due to the match being made against the canonicalized input (['Add', 'x', 1, 2, 3, 4, 5] (I think).
  • This one may more-so just be due to outdated (public) documentation:
    Specifying symbols as strings for match/replace - via expr.replace() - as illustrated within the Patterns and Rules guide, does not function as expected (i.e. as documentation suggests), due to the 'match' part being parsed as a rule-string: that is, in which symbols (single-character) are parsed as wildcards.
    I.e. the example given listed on this page does not function as indicated, but each rule instead matches/replaces any expression:
const expr = ce.box(["Add", ["Multiply", "a", "x"], "b"]);
expr.replace([
    { match: "a", replace: 2 },
    { match: "b", replace: 3 }
  ],
  { recursive: true }
)?.print();

And some general notes (again best seen after visiting changes):

  • Match permutations (i.e. for commutative expr. operators) could benefit from further optimization.
  • (Was going to add this as a final commit, & would not take me much time to add):
    'Pattern' validation ideally to be carried out upon calls to expr.match(): such that sequences of (adjacent) regular-sequence wildcards (consider ['List', '__a', '__b', ...]) are considered 'invalid' (a bit too strong a term).
    In reference to changes: generated permutations skip patterns of this structure, on the basis of these being redundant (if not, unpredictable).
    There is the question also, of what the return value should be (of expr.match()); null does not seem quite fitting. Perhaps something like a 'reason', as returned from ce.ask() ...?

And a few future suggestions & feature requests:

  • Personally, would find it useful to have a matchPermutations option (i.e. as applicable to commutative FN's), for expr.match(). Sometimes, particularly when wildcards are included in a pattern, I would find desirable to only match exactly as is/as requested.

  • Also for expr.match(), it would have utility to be able to specify expression-scoped / pertinent conditions in a similar way as can be expressed via replacement-rule LaTeX match strings (x_{numeric} + y_{prime} for instance.).
    Being able to specify LaTeX-style 'match' strings in the same manner of method 'replace', would not go amiss either...

  • For method replace():

    • Arguably, recursive: true should be default...? (personal preference/use is for this to be so. Semantically, the sense I get/interpret is 'replace all instances (of)'. Mentionably also, without 'recursive: true', a successful result is nothing but a (top-level) expression re-assignment (assuming assignment of call result) at the Javascript/Typescript level).
    • A way/a syntax variant to match single-character variables as symbols (instead of wildcards) in LaTeX strings would be desirable? Could not find a way to do this (but this is of course easily expressible with boxed-expressions.
      A possible means could be to use latex ID-prefixes (\operatorname, \mathrm...), but clearly this would be long-winded. Perhaps alternatively a special minor, syntactical rule to specify this (same level of simplicity as the use of prefix ... to match sequences, for instance).

Bit heavy on the text there, but for future reference...!

Aside from this previously being a needless operation (when 'recursive'
was specified as 'true') - previously this sometimes erroneously return
'true' for a recursive/operand match (already-matched operand which has
been replaced by an expression with the same structure).
- Also account for canonical-status of the *replacement*-expr. for
  successful replacements.

- If 'canonical' is not specified as an option during replacement:
  - There is differentiation between canonicalizing any direct
    replacements (i.e. these may be recursive), and the overall/input
    expr.
  - ^In this case, each canonical-status, of sub-exprs. will generally
    be *preserved*: but will 'opportunistically' mark expressions as
    canonical (e.g., because replaced operands are now canonical)
- easier to follow (at least from outsider's perspective)
- removes unnecessary/duplicate stmts.
- In essence, more-or-less 'fixes' expr. matching behaviour for patterns
  involving sequences. These changes result in the following variety of
  cases successfully matching where they would not have done
  prior:
    - Sequence wildcards matching >1 operand, e.g.:
      - ['Add', '__a', 'x'] will now match ['Add', 1, 2, 3, 'x']
    - Multiple sequence wildcards (at the same level), e.g.:
      - ['Tuple', 1, '__a', 4, '__b', 7, '__c'] will now match ['Tuple',
        1, 2, 3, 4, 5, 6, 7, 8]
    - Handles behaviour, and operand/expression capturing for cases of a
      regular sequence wildcard, followed by one+ optional-seq.
      wildcards, e.g.:
      - For ['Multiply', '__f', '___g', '___h'], 'f' will now match
        'greedily' and essentially 'merge' with following sequences,
        such that 'g' and 'h' each capture '0' operands/each: this
        experientially being the preferred/expected behavior.
         (^note that sequences of (regular) sequences do not need to be
         accounted for since these are considered 'invalid' anyway
         (a subsequent commit set to account for this))

  The fact of these cases failing to match has hitherto been obscured on
  account of absence of tests (these have now been added).
  Changes/fixes have been achieved by trying permutations of quantity of
  operands/expressions matched by sequence wildcards (this logic appears
  to _mistakenly_ have been absent prior), as well as special
  'lookahead' checks for cases of 'regular/optional sequence' cards.
  'matchArguments' - where the majority of these changes are situated,
  has (should) now also be refactored for readability.

- 'patterns.test.ts': re-writes, almost entirely, the test cases &
  structure throughout, including addition of/more particular tests,
  increased qty. of matchers within each, & a healthy handful of
  controls.

- Also:
 - Reverts/fixes the condition of argument matching skipping pattern
   permutations with sequences/sub-sequences consisting of a regular
   sequence wildcard followed by a universal ('_') wildcard (consider
   ['Add', '__a', '_b']. Whilst in some sensitive intuitive - if
   considering sequences as 'greedy'-matching - this sequence of
   wildcards clearly has matching utility for some cases (notably this
   was breaking some existing tests, too).
@samueltlg samueltlg changed the title Patterns substitution fixes Pattern-matching fixes, optimization, tests Aug 1, 2025
* containing the replaced expr. will still however have their (previous) canonical-status
* *preserved*... unless this expr. was previously non-canonical, and *replacements have resulted
* in canonical operands*. In this case, an expr. meeting this criteria will be updated to
* canonical status. (Canonicalization is opportunistic here, in other words).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs your verification and amendment if necessary.
After considering the possibilities here, particularly with recursion, thought that this strategy makes sense.

const canonical =
options?.canonical ??
(rule instanceof _BoxedExpression ? rule.isCanonical : false);
return ce.box(rule, { canonical });
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think that this slightly refined check is a bit more correct ?

@@ -715,12 +728,11 @@ export function applyRule(

if (canonical && match) {
const awc = getWildcards(match);
const originalMatch = match;
match = match.canonical;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (line) appeared to be non-effective, its removal not affecting any tests, and so removed it.

replace instanceof _BoxedExpression &&
replace.isCanonical
)
canonical = true;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(note: this in reference to the added doc. for method replace)

@@ -419,7 +419,7 @@ function parseRule(

// Check for conditions
const conditions = parseModifierExpression(parser);
if (conditions === null) return null;
if (conditions === null) return `${prefix}${id}`;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Believe the fix of this typo has resulted in correct parsing of LaTeX with sequence-syntax (...n + ...)

@@ -1122,6 +1122,9 @@ export interface BoxedExpression {
*
* :::info[Note]
* Applicable to canonical and non-canonical expressions.
*
* To specify a match for single symbol (non wildcard), it must be boxed (e.g. `{ match:
* ce.box('x'), ... }`), but it is suggested to use method *'subs()'* for this.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This in reference to opening-PR message: this is the current 'workaround'
(not that it matters since subs() can be used)

* Assuming the input is canonicalized, the resultant expr. against which the pattern is matched
* is `['Add', 'x', 1, 2, 3, 5]`: which cannot be matched against any pattern permutation... (this
* should be expected to be a valid match: with expr. expected to be commutative & therefore
* operands validly taking on any arrangement, canonical-form aside).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also see opening PR message: this a remaining issue for commutative matching.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(A bit off-topic an observation, but also noticed that for some definitions such as Set, these not marked as 'commutative' where they could be)

isWildcard(nextPattern) &&
wildcardType(nextPattern) === 'Sequence'
)
);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'Validation' step as mentioned (for call to expr.match()) would still be needed to verify this

// @note: if pattern has been validated prior, the next should never be a '_'
// (universal) or '__' (regular sequence) wildcard
nextAppPattern = patterns[++nextAppPatternIndex];
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a relatively big added change/check, but believe procedure here is sensible, if you want to verify... ?

// against '3 + 4 + x + b'...: the sequence will have initially captured just '3', but
// this will result in a final overall no-match. In this case. allow the sequence to
// capture '3 + 4' (finally permitting a 'total' match)
while (j <= ops.length) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This the primary added loop permitting correct/fuller matching of sequences (outlined in this' commit message); appears sensible... ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be end of review

@samueltlg
Copy link
Contributor Author

samueltlg commented Aug 1, 2025

Also, have seen that there are lots of changes with the recent major release, by the way! Have yet to examine them in more detail (out of interest), but looks good 👏. Were the canonicalization/binding changes included also, and was the primary change here the recognition of an evaluation-scope?
And the change in syntax for types (I think at least function signatures), inspired by the scala language, or just went by how you feel/fancied a change 🤔 ? Best

@arnog
Copy link
Member

arnog commented Aug 2, 2025

Thanks for this PR. Will review it in details soon.

The canonicalization/binding changes should be incorporated, that is only canonical functions are bound (conversely, bound functions are necessarily canonical).

There are lot of changes in the latest release. Yes, the addition of proper scoping is one of them (variable capture is not yet implemented, though). There is indeed clearer difference between runtime/evaluation scope and lexical scope (the definitions). This allow proper recursion to work now.
One tweak on the type was the syntax for variadic arguments, which used ...xs previously and now use xs+ and xs*. The motivation for the change is that while using types, I actually could never remember if ...xs was 1 or more or 0 or more, so I'd figured I would switch to a clearer syntax that I would remember :) Is that what you're referring to? Does Scala really has the same syntax? If so, amusing that we arrived at the same place. The main inspiration for this was actually the syntax for RegEx.

@samueltlg
Copy link
Contributor Author

Thanks for this PR. Will review it in details soon.

The canonicalization/binding changes should be incorporated, that is only canonical functions are bound (conversely, bound functions are necessarily canonical).

There are lot of changes in the latest release. Yes, the addition of proper scoping is one of them (variable capture is not yet implemented, though). There is indeed clearer difference between runtime/evaluation scope and lexical scope (the definitions). This allow proper recursion to work now. One tweak on the type was the syntax for variadic arguments, which used ...xs previously and now use xs+ and xs*. The motivation for the change is that while using types, I actually could never remember if ...xs was 1 or more or 0 or more, so I'd figured I would switch to a clearer syntax that I would remember :) Is that what you're referring to? Does Scala really has the same syntax? If so, amusing that we arrived at the same place. The main inspiration for this was actually the syntax for RegEx.

Ah, that's good to hear; I wasn't so sure if those changes (canonicalization) were present from the changelog, but have seen at least some evidence in the source-code. Later, I will try to get more to grips with it: [again] out of interest.
From looking at the Scala documentation some time ago (have not used the language personally), I recall seeing a similar syntax, at least a 'postfix x, in the context of function signatures. I may be wrong, but will have to investigate...

Yes, apparently some decent fixes and optimizations present with this one: noticed them (bugs) whilst experimenting & pondering over pattern-matching. Should be worthwhile...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants