-
Notifications
You must be signed in to change notification settings - Fork 93
Add trie for patterns #3076
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add trie for patterns #3076
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a trie-based data structure to support efficient pattern matching with wildcards, which is useful for required and enum field validation. It adds new trie implementation logic and accompanying unit tests to verify various pattern matching scenarios.
- Added a PatternTrie implementation supporting exact and wildcard matches.
- Introduced comprehensive unit tests covering empty, simple, nested, overlapping, and wildcard scenarios.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
libs/dyn/pattern_trie_test.go | Added unit tests for validating the behavior of the new pattern trie implementation. |
libs/dyn/pattern_trie.go | Implemented the PatternTrie and its recursive search functionality to support wildcards. |
libs/dyn/pattern_trie.go
Outdated
} | ||
|
||
// trieNode represents a node in the pattern trie. | ||
// Note that it can only be one of anyKey, anyIndex, collection of pathKeys, or collection of pathIndexes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this statement is enforced well enough in the current code, which is prone to state inconsistencies, which are difficult to debug, if multiple fields end up being set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. That comment is misleading and I've removed it. Also added a bigger blurb explaining it all. It is valid for both anyKey and pathKeys to be set at the same time and we have a test that actually does so.
From the perspective of the trie it's also fine if both anyKey and anyIndex are set at the same time, but since such a scenario does not arise in practice I've omitted to support that in SearchPath
.
While validation would be nice to have, it seem unnecessary to me given the unit tests already cover correctness. I'm happy to add validation that both key (one of anyKey and pathKeys) and index (one of anyIndex and pathIndex) are not set at the same time if you think we should add it regardless.
// Indicates if this node is the end of a pattern. Encountering a node | ||
// with isEnd set to true in a trie means the pattern from the root to this | ||
// node is a complete pattern. | ||
isEnd bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are isEnd nodes expected to have any of the other fields set?
// It supports both exact matches and wildcard matches. You can insert [Pattern]s | ||
// into the trie and then query it to see if a given [Path] matches any of the | ||
// patterns. | ||
type PatternTrie struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
q - why have this wrapper struct, why not work on *trieNode directly?
// is not supported by the [PatternTrie.SearchPath] method. We don't perform validation for this | ||
// case because it's not expected to arise in practice where a field is either a map or an array, | ||
// but not both. | ||
type trieNode struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't we need to attach value to nodes? e.g. list of available enum values?
} | ||
|
||
// Mark as end of pattern if this is the last component. | ||
if !next.isEnd && i == len(pattern)-1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
!next.isEnd
this check is redundant?
// SearchPath checks if the given path matches any pattern in the trie. | ||
// A path matches if it exactly matches a pattern or if it matches a pattern with wildcards. | ||
func (t *PatternTrie) SearchPath(path Path) (Pattern, bool) { | ||
// We statically allocate the prefix array that is used to track the current |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd call this pre-allocate, static allocation means something else https://en.wikipedia.org/wiki/Static_variable
|
||
// Zero case, when the query path is the root node. We return nil here to match | ||
// the semantics of [MustPatternFromString] which returns nil for the empty string. | ||
if len(path) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
path is always the same in all recursive calls, right? (This code somewhat implies that path is shrinking with depth).
We should not need this condition - we already check below that index == len(path).
if currentComponent.isIndex() { | ||
child, exists := node.pathIndex[currentComponent.Index()] | ||
if !exists { | ||
return prefix, false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we supposed to return partial pattern? Is there a use case for that?
} | ||
|
||
// First check if the index wildcard is set for the current index. | ||
if currentComponent.isIndex() && node.anyIndex != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
q - currentComponent.IsIndex() must be true at this point, right?
|
||
// Set of indices which this trie node matches. | ||
// Maps to the [Index] component. | ||
pathIndex map[int]*trieNode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot think of any case where we would need to match exact index array. Can be dropped to simplify?
name: "empty pattern", | ||
pattern: "", | ||
expected: []string{""}, | ||
notExpected: []string{"foo"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: expected / notExpected make it sound like we return all matching patterns, whereas we just check if they match. mustMatch / mustNotMatch would better explain that.
Why
A prefix tree datastructure for patterns allows us to do quick lookup for patterns. This will be useful for required and enum field validation.
Downstream validation PR: #3044
Tests
Unit tests.