-
-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve permutations #183
base: main
Are you sure you want to change the base?
Improve permutations #183
Conversation
📝 Walkthrough📝 WalkthroughWalkthroughThe pull request introduces a comprehensive refactoring of the expression handling system in the Changes
Tip CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (15)
hassil/expression.py (4)
43-57
: Structure “Group” class for clarity.The introduction of the Group class and the text_chunk_count method is logically sound. However, there’s a potential risk of cyclical references if nested groups reference each other as parents. Consider placing a defensive check or using a specialized factory method if deeper nesting is anticipated.
78-84
: Defer lookup for rule expansion until after checking item type.Here, you yield from self._list_names when encountering sub-groups or rule references. This is correct. As a minor improvement, consider short-circuiting or caching the result if the same sub-group or rule is encountered multiple times to possibly reduce overhead.
152-156
: Retain backward compatibility handle in text_chunk_count.Asserting that self.exp is a Group might break integration if older code passes something else. If the system is fully migrated, then this is fine. If partial migrations exist, carefully handle or document the assertion.
199-199
: Raise ValueError with additional context.When raising ValueError(grp), consider providing more descriptive information about the object or its type to improve debugging.
hassil/parse_expression.py (5)
3-3
: Imports reflect new classes.The newly added imports for Alternative, Group, Permutation, and Sequence are consistent with the refactoring across expression classes. Confirm that no unused imports remain after final cleanup.
Also applies to: 6-6, 8-8, 10-10
77-78
: Adding items to Alternative or Permutation.Lines 90-104: If the parent is Alternative or Permutation, you open a new Sequence for the next item. This is a sensible approach but ensure that deeply nested sequences or a large number of alternative tokens do not degrade performance.
Also applies to: 90-104
128-129
: Spaces in permutations.Line 129 calls _add_spaces_between_items if it’s a Permutation. This is crucial for clarity but might inadvertently insert double spaces if the content already includes trailing whitespace. Potential corner case if ignoring whitespace.
Also applies to: 131-131
172-173
: Unpacking the final group in parse_sentence.Line 193 checks if the top-level group has exactly one item that’s also a Group. This is a direct approach that simplifies resulting syntax trees. Keep an eye on multi-item edge cases.
Also applies to: 188-190
359-364
: _add_spaces_between_items can lead to leading spaces.At line 363, you insert a space at the start and end of each Sequence item. This might cause leading/trailing whitespace in final expansions. That might be intended but confirm no downstream code is trimming incorrectly.
hassil/sample.py (2)
157-157
: Generate combined sentence_texts with itertools.product.For Sequence, you generate the Cartesian product. Ensure performance metrics remain acceptable for large or deeply nested Sequences.
Also applies to: 162-162
179-179
: Raise ValueError for unknown group type.Line 179 is a good fallback, but consider adding the actual group’s class name or attributes to the error message for better debugging.
hassil/string_matcher.py (3)
408-408
: Optional expansions with Alternative.Line 408 triggers iteration over all items. For large sets, the overhead can grow quickly. Consider partial or short-circuit matching if performance is a concern.
428-429
: Permutation: match in any order.All must match in arbitrary order. This can be computationally expensive for bigger expressions. A potential optimization is to detect items that can only match at the end or the beginning.
Also applies to: 430-437
439-439
: Raise descriptive error on unexpected group type.Similar feedback as in other files. Adding minimal context, such as the group class name or item count, can streamline debugging.
tests/test_expression.py (1)
23-24
: Consider improving test readabilityThe test assertions, while correct, are quite complex and could benefit from better readability. Consider breaking down complex assertions into multiple steps or using helper functions.
Example refactor for test_group_alternative:
def test_group_alternative(): - assert parse_expression(next_chunk("(test | test2)")) == Alternative( - items=[Sequence(items=[t(text="test ")]), Sequence(items=[t(text=" test2")])], - ) + result = parse_expression(next_chunk("(test | test2)")) + expected = Alternative( + items=[ + Sequence(items=[t(text="test ")]), + Sequence(items=[t(text=" test2")]) + ] + ) + assert result == expectedAlso applies to: 29-31, 35-42, 47-49, 53-57
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
hassil/__init__.py
(1 hunks)hassil/expression.py
(6 hunks)hassil/intents.py
(1 hunks)hassil/parse_expression.py
(7 hunks)hassil/recognize.py
(2 hunks)hassil/sample.py
(7 hunks)hassil/string_matcher.py
(4 hunks)tests/test_expression.py
(4 hunks)tests/test_sample.py
(8 hunks)
🧰 Additional context used
📓 Path-based instructions (9)
tests/test_sample.py (6)
Pattern */**(html|markdown|md)
: - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.
- Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
- In step-by-step instructions, front the location phrase in the instructional sentence.
- In step-by-step instructions, front the 'goal' in the instructional sentence.
- In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
- do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'
Pattern */**(html|markdown|md)
: - Use bold to mark UI strings.
- If "" are used to mark UI strings, replace them by bold.
Pattern */**(html|markdown|md)
: - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"
Pattern */**(html|markdown|md)
: - Use sentence-style capitalization also in headings.
Pattern */**(html|markdown|md)
: do not comment on HTML used for icons
Pattern */**(html|markdown|md)
: Avoid flagging inline HTML for embedding videos in future reviews for this repository.
hassil/intents.py (6)
Pattern */**(html|markdown|md)
: - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.
- Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
- In step-by-step instructions, front the location phrase in the instructional sentence.
- In step-by-step instructions, front the 'goal' in the instructional sentence.
- In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
- do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'
Pattern */**(html|markdown|md)
: - Use bold to mark UI strings.
- If "" are used to mark UI strings, replace them by bold.
Pattern */**(html|markdown|md)
: - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"
Pattern */**(html|markdown|md)
: - Use sentence-style capitalization also in headings.
Pattern */**(html|markdown|md)
: do not comment on HTML used for icons
Pattern */**(html|markdown|md)
: Avoid flagging inline HTML for embedding videos in future reviews for this repository.
hassil/string_matcher.py (6)
Pattern */**(html|markdown|md)
: - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.
- Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
- In step-by-step instructions, front the location phrase in the instructional sentence.
- In step-by-step instructions, front the 'goal' in the instructional sentence.
- In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
- do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'
Pattern */**(html|markdown|md)
: - Use bold to mark UI strings.
- If "" are used to mark UI strings, replace them by bold.
Pattern */**(html|markdown|md)
: - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"
Pattern */**(html|markdown|md)
: - Use sentence-style capitalization also in headings.
Pattern */**(html|markdown|md)
: do not comment on HTML used for icons
Pattern */**(html|markdown|md)
: Avoid flagging inline HTML for embedding videos in future reviews for this repository.
tests/test_expression.py (6)
Pattern */**(html|markdown|md)
: - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.
- Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
- In step-by-step instructions, front the location phrase in the instructional sentence.
- In step-by-step instructions, front the 'goal' in the instructional sentence.
- In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
- do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'
Pattern */**(html|markdown|md)
: - Use bold to mark UI strings.
- If "" are used to mark UI strings, replace them by bold.
Pattern */**(html|markdown|md)
: - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"
Pattern */**(html|markdown|md)
: - Use sentence-style capitalization also in headings.
Pattern */**(html|markdown|md)
: do not comment on HTML used for icons
Pattern */**(html|markdown|md)
: Avoid flagging inline HTML for embedding videos in future reviews for this repository.
hassil/recognize.py (6)
Pattern */**(html|markdown|md)
: - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.
- Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
- In step-by-step instructions, front the location phrase in the instructional sentence.
- In step-by-step instructions, front the 'goal' in the instructional sentence.
- In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
- do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'
Pattern */**(html|markdown|md)
: - Use bold to mark UI strings.
- If "" are used to mark UI strings, replace them by bold.
Pattern */**(html|markdown|md)
: - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"
Pattern */**(html|markdown|md)
: - Use sentence-style capitalization also in headings.
Pattern */**(html|markdown|md)
: do not comment on HTML used for icons
Pattern */**(html|markdown|md)
: Avoid flagging inline HTML for embedding videos in future reviews for this repository.
hassil/sample.py (6)
Pattern */**(html|markdown|md)
: - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.
- Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
- In step-by-step instructions, front the location phrase in the instructional sentence.
- In step-by-step instructions, front the 'goal' in the instructional sentence.
- In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
- do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'
Pattern */**(html|markdown|md)
: - Use bold to mark UI strings.
- If "" are used to mark UI strings, replace them by bold.
Pattern */**(html|markdown|md)
: - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"
Pattern */**(html|markdown|md)
: - Use sentence-style capitalization also in headings.
Pattern */**(html|markdown|md)
: do not comment on HTML used for icons
Pattern */**(html|markdown|md)
: Avoid flagging inline HTML for embedding videos in future reviews for this repository.
hassil/expression.py (6)
Pattern */**(html|markdown|md)
: - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.
- Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
- In step-by-step instructions, front the location phrase in the instructional sentence.
- In step-by-step instructions, front the 'goal' in the instructional sentence.
- In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
- do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'
Pattern */**(html|markdown|md)
: - Use bold to mark UI strings.
- If "" are used to mark UI strings, replace them by bold.
Pattern */**(html|markdown|md)
: - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"
Pattern */**(html|markdown|md)
: - Use sentence-style capitalization also in headings.
Pattern */**(html|markdown|md)
: do not comment on HTML used for icons
Pattern */**(html|markdown|md)
: Avoid flagging inline HTML for embedding videos in future reviews for this repository.
hassil/parse_expression.py (6)
Pattern */**(html|markdown|md)
: - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.
- Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
- In step-by-step instructions, front the location phrase in the instructional sentence.
- In step-by-step instructions, front the 'goal' in the instructional sentence.
- In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
- do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'
Pattern */**(html|markdown|md)
: - Use bold to mark UI strings.
- If "" are used to mark UI strings, replace them by bold.
Pattern */**(html|markdown|md)
: - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"
Pattern */**(html|markdown|md)
: - Use sentence-style capitalization also in headings.
Pattern */**(html|markdown|md)
: do not comment on HTML used for icons
Pattern */**(html|markdown|md)
: Avoid flagging inline HTML for embedding videos in future reviews for this repository.
hassil/__init__.py (6)
Pattern */**(html|markdown|md)
: - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.
- Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
- In step-by-step instructions, front the location phrase in the instructional sentence.
- In step-by-step instructions, front the 'goal' in the instructional sentence.
- In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
- do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'
Pattern */**(html|markdown|md)
: - Use bold to mark UI strings.
- If "" are used to mark UI strings, replace them by bold.
Pattern */**(html|markdown|md)
: - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"
Pattern */**(html|markdown|md)
: - Use sentence-style capitalization also in headings.
Pattern */**(html|markdown|md)
: do not comment on HTML used for icons
Pattern */**(html|markdown|md)
: Avoid flagging inline HTML for embedding videos in future reviews for this repository.
🪛 Ruff (0.8.2)
hassil/__init__.py
4-4: .expression.Alternative
imported but unused; consider removing, adding to __all__
, or using a redundant alias
(F401)
5-5: .expression.Group
imported but unused; consider removing, adding to __all__
, or using a redundant alias
(F401)
6-6: .expression.ListReference
imported but unused; consider removing, adding to __all__
, or using a redundant alias
(F401)
7-7: .expression.Permutation
imported but unused; consider removing, adding to __all__
, or using a redundant alias
(F401)
🔇 Additional comments (26)
hassil/expression.py (6)
5-5
: Confirm import usage and referencing of Group as a parent.
At line 5, importing Iterable is fine, but verify that we indeed require it for any new or existing functionality. Also, at line 25, referencing “Optional[Group]” might introduce circular import issues if used incorrectly. Make sure it's used only within type annotations or forward references.
Also applies to: 25-25
88-98
: New Sequence and Alternative classes: suitable inheritance.
Adding “Sequence” and “Alternative” as specialized subclasses is consistent with the new class hierarchy. The usage of is_optional in Alternative is clearly beneficial. Keep an eye on possible expansions of is_optional logic in the future (e.g., partial expansions).
100-111
: Assess permutations for large item counts.
Permutations can grow factorially in complexity. The implementation is correct, but consider adding safeguards or warnings for large group sizes to avoid performance bottlenecks.
145-148
: Validate Sentence’s public interface after removing inheritance.
The shift from subclassing to holding an exp attribute is a good composition pattern. Confirm that external code referencing old Sentence methods (like old sequence behaviors) is updated accordingly.
157-164
: Use consistent list name extraction for nested expansions.
list_names retrieves references recursively. This is correct. Just confirm expansions cover all call paths (especially if expansions are updated externally).
171-177
: Preemptively handle corner cases in compile method.
Ensure that pattern rewriting for spaces (“replace(r"\ ", r"[ ]*")”) performs as intended for complicated text patterns (e.g., multiple spaces or special characters).
hassil/parse_expression.py (5)
52-57
: _ensure_alternative gracefully collapses items.
Collapsing items into a Sequence is logical. Consider verifying that the newly formed Alternative indeed preserves item order by default, especially if future code depends on positional ordering.
59-64
: _ensure_permutation delegates gracefully.
Similar to _ensure_alternative, the logic is correct. Confirm we never pass an already-permuted Group multiple times, leading to nested permutations.
66-75
: parse_group handles group text extraction.
Line 75 raises a parse error if it’s not recognized as GROUP or OPT. This is correct. The short-circuit approach ensures we don’t handle unexpected parse types.
106-106
: Robust error handling in textual parsing.
Each time you create or switch to a new group, you append a Sequence item. The subsequent parse error checks are thorough (e.g., lines 125, 123). Good for diagnosing malformed inputs.
Also applies to: 109-109, 111-111, 114-114, 116-116, 119-119, 121-121, 123-123, 125-125, 126-126
143-149
: Treat optional group as Alternative.
Implementation for optional group sets is_optional=True at line 147. Good approach. Just confirm that empty expansions do not conflict with other elements in subsequent parse steps.
hassil/sample.py (5)
17-17
: Imports for Alternative, Group, and Permutation.
The new imports are correct. Ensure that Sequence is still required since Group is now the base class for all.
Also applies to: 19-19, 21-21
86-86
: Consider skipping wildcard sentences.
The logic in sample_intents excludes wildcard sentences if exclude_sentences_with_wildcards is true. This logic is consistent, but any future wildcard expansions require verifying coverage in tests.
110-121
: Introducing sample_sentence wrapper.
This new function calls sample_expression internally, providing a more intuitive interface for sampling by Sentence. This approach is maintainable.
135-135
: sample_expression handles Alternative.
Your handling of Alternative at lines 137-147 simply yields from each item. That is correct. Watch out for large Alternatives with many items, as it can produce a lot of combinations.
Also applies to: 136-147
262-262
: Apply expansions for RuleReference consistently.
Line 262 reassigns rule_body before sampling. Confirm that this is in sync with more advanced expansions (like optional or nested references).
hassil/string_matcher.py (4)
13-13
: Imports reflect new group classes.
The import for Alternative, Group, and Permutation matches the code’s usage. Remove or adapt Sequence if it’s partially replaced by Group in all references.
Also applies to: 15-15, 17-17
403-404
: Check Group usage in match_expression.
At lines 403-405, we handle Group first. This is an excellent reorganization. The typed checks for Alternative and Sequence clarify the code and reduce confusion.
Also applies to: 405-405
411-412
: Sequence items must all match.
The consecutive approach is correct. Confirm that ignoring whitespace or partial word boundaries doesn’t cause false positives or missed matches.
Also applies to: 415-415
813-813
: RuleReference expansions reflect new structure.
At line 813, we access expansion_rules[rule_ref.rule_name].exp. Confirm that we consistently update references, especially if future code references old “Sequence” semantics.
tests/test_sample.py (2)
3-4
: LGTM: Clean import of the new sample_sentence function
The import statement is correctly updated to use the new sample_sentence
function, which aligns with the PR's architectural changes.
7-7
: LGTM: Consistent test updates using sample_sentence
The test cases have been systematically updated to use sample_sentence
instead of sample_expression
, maintaining the same test coverage while adapting to the new API. The test scenarios comprehensively cover:
- Basic text chunks
- Nested groups
- Optional elements
- Alternatives
- Lists with various configurations
Also applies to: 11-11, 17-17, 26-26, 39-39, 48-48, 57-57, 69-69, 80-80, 94-94
tests/test_expression.py (1)
4-11
: LGTM: Updated imports reflect new class hierarchy
The imports correctly include the new classes (Alternative
, Permutation
) and maintain the existing ones, aligning with the architectural changes.
hassil/intents.py (1)
457-457
: LGTM: Updated return type aligns with new architecture
The change to return parse_sentence(text).exp
instead of the entire sentence object aligns with the architectural change where Sentence
now contains a Group
as an attribute (composition over inheritance).
Let's verify that this change is consistently applied throughout the codebase:
✅ Verification successful
Change is safe and consistently applied
The codebase scan shows that all other usages of parse_sentence
are either:
- The function definition itself in
parse_expression.py
- Cases where the full
Sentence
object is needed (withkeep_text=True
) - Cases in sample/utility code where the full object is acceptable
The modification in _maybe_parse_template
to return .exp
is the correct approach for this specific case and doesn't require similar changes elsewhere in the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any remaining direct usage of parse_sentence without .exp
rg "parse_sentence\([^)]+\)(?!\.exp)" -g "!test_*"
Length of output: 311
Script:
#!/bin/bash
# Search for parse_sentence usage with context
rg "parse_sentence\(" -A 2 -g "!test_*"
Length of output: 1214
hassil/recognize.py (2)
435-435
: LGTM: Consistent with the architectural changes
The change from sentence
to sentence.exp
maintains consistency with the previous change and the new architecture.
267-267
: Verify the usage of intent_sentence.exp
across the codebase
The change from intent_sentence
to intent_sentence.exp
aligns with the architectural improvements where Sentence
now uses composition instead of inheritance. This change looks correct, but let's verify its usage across the codebase.
✅ Verification successful
Based on the search results, I can generate the final response as we have sufficient information. The rg
search shows all usages of match_expression
in the codebase, and the ast-grep
search (which returned no results) confirms there are no instances where intent_sentence
is used directly without .exp
.
All usages of intent_sentence.exp
are consistent with the architectural changes
The codebase consistently uses the .exp
property when passing sentences to match_expression
. This includes:
hassil/recognize.py
: Usesintent_sentence.exp
in the changed linehassil/recognize.py
: Usessentence.exp
in another location- All other
match_expression
calls work withExpression
objects directly or via.exp
property
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify consistent usage of intent_sentence.exp across the codebase
# Look for any remaining direct usage of intent_sentence in match_expression calls
# Search for match_expression calls
rg "match_expression\(" -A 3
# Search for any remaining direct usage of intent_sentence without .exp
ast-grep --pattern 'match_expression($$$, $_, intent_sentence$$$)'
Length of output: 3567
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (5)
hassil/expression.py (5)
25-25
: Consider weak references for parent backlinksThe parent reference creates a bidirectional relationship between TextChunk and Group. Consider using
weakref.ref
to prevent potential memory leaks, especially in cases where chunks might be shared between groups.- parent: "Optional[Group]" = None + parent: "Optional[weakref.ReferenceType[Group]]" = None
46-47
: Add type hints for items fieldConsider adding explicit type hints for better code clarity and IDE support.
- items: List[Expression] = field(default_factory=list) + items: List[Expression] = field(default_factory=list[Expression])
Line range hint
71-77
: Consider making _list_names a private methodSince this is an internal helper method, consider using double underscore prefix to make it explicitly private.
- def _list_names( + def __list_names(
97-97
: Document the is_optional fieldAdd docstring explaining the purpose and usage of the is_optional field in Alternative class.
154-155
: Consider using isinstance checks with proper error messagesReplace assertions with explicit type checks and descriptive error messages for better error handling.
- assert isinstance(self.exp, Group) - return self.exp.text_chunk_count() # pylint: disable=no-member + if not isinstance(self.exp, Group): + raise TypeError(f"Expected Group but got {type(self.exp)}") + return self.exp.text_chunk_count()Also applies to: 162-163
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
hassil/expression.py
(6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
hassil/expression.py (6)
Pattern */**(html|markdown|md)
: - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.
- Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
- In step-by-step instructions, front the location phrase in the instructional sentence.
- In step-by-step instructions, front the 'goal' in the instructional sentence.
- In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
- do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'
Pattern */**(html|markdown|md)
: - Use bold to mark UI strings.
- If "" are used to mark UI strings, replace them by bold.
Pattern */**(html|markdown|md)
: - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"
Pattern */**(html|markdown|md)
: - Use sentence-style capitalization also in headings.
Pattern */**(html|markdown|md)
: do not comment on HTML used for icons
Pattern */**(html|markdown|md)
: Avoid flagging inline HTML for embedding videos in future reviews for this repository.
🔇 Additional comments (2)
hassil/expression.py (2)
104-111
: Well implemented lazy evaluation for permutations
The iterate_permutations method effectively implements lazy evaluation of permutations, which aligns with the PR's performance improvement objectives. The implementation avoids generating all permutations upfront.
189-202
: Ensure proper scoping in Alternative and Permutation compilation
The similar pattern matching logic in Alternative and Permutation compilation could lead to scoping issues when these constructs are nested. Consider adding explicit grouping to prevent any ambiguity in the resulting regular expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
hassil/parse_expression.py (1)
Line range hint
66-131
: Simplify parse_group logic by extracting helper methods.The function has complex nested conditions. Consider breaking it down into smaller, focused methods:
+ def _handle_item_chunk(self, item: Expression, grp: Group) -> None: + """Handle adding an item to a group.""" + if isinstance(grp, (Alternative, Permutation)): + last_item = grp.items[-1] + if not isinstance(last_item, Sequence): + raise ParseExpressionError(grp_chunk, metadata=metadata) + last_item.items.append(item) + else: + grp.items.append(item) + if isinstance(item, TextChunk): + item.parent = grp def parse_group(...): # Use the helper method if item_chunk.parse_type in (...): - if isinstance(grp, (Alternative, Permutation)): - ... - else: - ... + _handle_item_chunk(parse_expression(item_chunk), grp)
🧹 Nitpick comments (5)
tests/test_expression.py (1)
61-84
: Add test cases for edge cases in optional alternatives and permutations.While the current tests cover basic functionality, consider adding test cases for:
- Empty alternatives and permutations
- Single-item alternatives and permutations
- Nested optional alternatives within permutations and vice versa
hassil/expression.py (3)
Line range hint
44-87
: Add return type hints to methods for better maintainability.Consider adding explicit return type hints to methods:
- def text_chunk_count(self) -> int: + def text_chunk_count(self) -> int: """Return the number of TextChunk expressions in this group (recursive).""" - def _list_names( - self, - item: Expression, - expansion_rules: Optional[Dict[str, Sentence]] = None, - ) -> Iterator[str]: + def _list_names( + self, + item: Expression, + expansion_rules: Optional[Dict[str, Sentence]] = None, + ) -> Iterator[str]: """Return names of list references (recursive)."""
101-111
: Optimize permutation iteration to reduce memory allocation.The current implementation creates a new list for each permutation. Consider using a more memory-efficient approach:
def iterate_permutations(self) -> Iterable[Tuple[Expression, Permutation]]: """Iterate over all permutations.""" for i, item in enumerate(self.items): - items = self.items[:] - del items[i] + items = [self.items[j] for j in range(len(self.items)) if j != i] rest = Permutation(items=items) yield (item, rest)
Line range hint
146-217
: Strengthen type assertions in Sentence methods.Consider adding more specific type assertions to catch potential type errors earlier:
def text_chunk_count(self) -> int: """Return the number of TextChunk expressions in this sentence.""" - assert isinstance(self.exp, Group) + if not isinstance(self.exp, Group): + raise TypeError(f"Expected Group, got {type(self.exp)}") return self.exp.text_chunk_count()hassil/parse_expression.py (1)
361-366
: Document in-place modification behavior in _add_spaces_between_items.The function modifies the permutation in place, which should be clearly documented:
def _add_spaces_between_items(perm: Permutation) -> None: - """Add spaces between each 2 items of a permutation""" + """Add spaces between each 2 items of a permutation. + + Note: + This function modifies the permutation in place by adding TextChunk + objects with spaces at the beginning and end of each sequence. + + Args: + perm: The permutation to modify. + """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
hassil/expression.py
(6 hunks)hassil/parse_expression.py
(7 hunks)tests/test_expression.py
(4 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
tests/test_expression.py (6)
Pattern */**(html|markdown|md)
: - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.
- Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
- In step-by-step instructions, front the location phrase in the instructional sentence.
- In step-by-step instructions, front the 'goal' in the instructional sentence.
- In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
- do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'
Pattern */**(html|markdown|md)
: - Use bold to mark UI strings.
- If "" are used to mark UI strings, replace them by bold.
Pattern */**(html|markdown|md)
: - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"
Pattern */**(html|markdown|md)
: - Use sentence-style capitalization also in headings.
Pattern */**(html|markdown|md)
: do not comment on HTML used for icons
Pattern */**(html|markdown|md)
: Avoid flagging inline HTML for embedding videos in future reviews for this repository.
hassil/parse_expression.py (6)
Pattern */**(html|markdown|md)
: - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.
- Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
- In step-by-step instructions, front the location phrase in the instructional sentence.
- In step-by-step instructions, front the 'goal' in the instructional sentence.
- In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
- do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'
Pattern */**(html|markdown|md)
: - Use bold to mark UI strings.
- If "" are used to mark UI strings, replace them by bold.
Pattern */**(html|markdown|md)
: - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"
Pattern */**(html|markdown|md)
: - Use sentence-style capitalization also in headings.
Pattern */**(html|markdown|md)
: do not comment on HTML used for icons
Pattern */**(html|markdown|md)
: Avoid flagging inline HTML for embedding videos in future reviews for this repository.
hassil/expression.py (6)
Pattern */**(html|markdown|md)
: - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.
- Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
- In step-by-step instructions, front the location phrase in the instructional sentence.
- In step-by-step instructions, front the 'goal' in the instructional sentence.
- In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
- do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'
Pattern */**(html|markdown|md)
: - Use bold to mark UI strings.
- If "" are used to mark UI strings, replace them by bold.
Pattern */**(html|markdown|md)
: - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"
Pattern */**(html|markdown|md)
: - Use sentence-style capitalization also in headings.
Pattern */**(html|markdown|md)
: do not comment on HTML used for icons
Pattern */**(html|markdown|md)
: Avoid flagging inline HTML for embedding videos in future reviews for this repository.
🔇 Additional comments (2)
tests/test_expression.py (2)
4-4
: LGTM! The imports and test case align with the new class hierarchy.
The test case properly validates the nesting of Sequence objects, which is a crucial functionality in the new architecture.
Also applies to: 6-6, 22-24
104-107
: LGTM! Test case properly validates the new Sentence structure.
The test ensures that sentences correctly handle Sequence objects as their expression containers.
Make permutations lazily evaluated. That should allow for better performance with bigger permutations.
While I was redefining and reworking parts of it, I took the opportunity to clean up some things:
Summary by CodeRabbit
New Features
Alternative
,Group
, andPermutation
for enhanced expression handling.sample_sentence
function for streamlined sampling of sentences.Bug Fixes
recognize_all
andis_match
functions for better processing of sentence expressions.Documentation
Refactor
Chores