Skip to content

Add trie for patterns #3076

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

Add trie for patterns #3076

wants to merge 9 commits into from

Conversation

shreyas-goenka
Copy link
Contributor

@shreyas-goenka shreyas-goenka commented Jun 17, 2025

Why

A prefix tree datastructure for patterns allows us to do quick lookup for patterns. This will be useful for required and enum field validation.

Downstream validation PR: #3044

Tests

Unit tests.

@shreyas-goenka shreyas-goenka changed the title Add PatternTrie Add trie for patterns Jun 17, 2025
@shreyas-goenka shreyas-goenka marked this pull request as ready for review June 17, 2025 16:54
@Copilot Copilot AI review requested due to automatic review settings June 17, 2025 16:54
Copilot

This comment was marked as outdated.

@shreyas-goenka shreyas-goenka marked this pull request as draft June 18, 2025 09:21
@shreyas-goenka shreyas-goenka marked this pull request as ready for review June 18, 2025 11:28
@shreyas-goenka shreyas-goenka requested a review from Copilot June 18, 2025 11:28
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a trie-based data structure to support efficient pattern matching with wildcards, which is useful for required and enum field validation. It adds new trie implementation logic and accompanying unit tests to verify various pattern matching scenarios.

  • Added a PatternTrie implementation supporting exact and wildcard matches.
  • Introduced comprehensive unit tests covering empty, simple, nested, overlapping, and wildcard scenarios.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
libs/dyn/pattern_trie_test.go Added unit tests for validating the behavior of the new pattern trie implementation.
libs/dyn/pattern_trie.go Implemented the PatternTrie and its recursive search functionality to support wildcards.

}

// trieNode represents a node in the pattern trie.
// Note that it can only be one of anyKey, anyIndex, collection of pathKeys, or collection of pathIndexes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this statement is enforced well enough in the current code, which is prone to state inconsistencies, which are difficult to debug, if multiple fields end up being set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. That comment is misleading and I've removed it. Also added a bigger blurb explaining it all. It is valid for both anyKey and pathKeys to be set at the same time and we have a test that actually does so.

From the perspective of the trie it's also fine if both anyKey and anyIndex are set at the same time, but since such a scenario does not arise in practice I've omitted to support that in SearchPath.

While validation would be nice to have, it seem unnecessary to me given the unit tests already cover correctness. I'm happy to add validation that both key (one of anyKey and pathKeys) and index (one of anyIndex and pathIndex) are not set at the same time if you think we should add it regardless.

// Indicates if this node is the end of a pattern. Encountering a node
// with isEnd set to true in a trie means the pattern from the root to this
// node is a complete pattern.
isEnd bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are isEnd nodes expected to have any of the other fields set?

// It supports both exact matches and wildcard matches. You can insert [Pattern]s
// into the trie and then query it to see if a given [Path] matches any of the
// patterns.
type PatternTrie struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q - why have this wrapper struct, why not work on *trieNode directly?

// is not supported by the [PatternTrie.SearchPath] method. We don't perform validation for this
// case because it's not expected to arise in practice where a field is either a map or an array,
// but not both.
type trieNode struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we need to attach value to nodes? e.g. list of available enum values?

}

// Mark as end of pattern if this is the last component.
if !next.isEnd && i == len(pattern)-1 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!next.isEnd

this check is redundant?

// SearchPath checks if the given path matches any pattern in the trie.
// A path matches if it exactly matches a pattern or if it matches a pattern with wildcards.
func (t *PatternTrie) SearchPath(path Path) (Pattern, bool) {
// We statically allocate the prefix array that is used to track the current
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd call this pre-allocate, static allocation means something else https://en.wikipedia.org/wiki/Static_variable


// Zero case, when the query path is the root node. We return nil here to match
// the semantics of [MustPatternFromString] which returns nil for the empty string.
if len(path) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

path is always the same in all recursive calls, right? (This code somewhat implies that path is shrinking with depth).

We should not need this condition - we already check below that index == len(path).

if currentComponent.isIndex() {
child, exists := node.pathIndex[currentComponent.Index()]
if !exists {
return prefix, false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we supposed to return partial pattern? Is there a use case for that?

}

// First check if the index wildcard is set for the current index.
if currentComponent.isIndex() && node.anyIndex != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q - currentComponent.IsIndex() must be true at this point, right?


// Set of indices which this trie node matches.
// Maps to the [Index] component.
pathIndex map[int]*trieNode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot think of any case where we would need to match exact index array. Can be dropped to simplify?

name: "empty pattern",
pattern: "",
expected: []string{""},
notExpected: []string{"foo"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: expected / notExpected make it sound like we return all matching patterns, whereas we just check if they match. mustMatch / mustNotMatch would better explain that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants