Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve permutations #183

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

arturpragacz
Copy link
Contributor

@arturpragacz arturpragacz commented Dec 20, 2024

Make permutations lazily evaluated. That should allow for better performance with bigger permutations.

While I was redefining and reworking parts of it, I took the opportunity to clean up some things:

  • Sequence, Alternative and Permutation are now subclasses of Group.
  • Sentence is no longer a subclass, instead it has a Group as an attribute, favoring composition over inheritance.

Summary by CodeRabbit

  • New Features

    • Introduced new classes: Alternative, Group, and Permutation for enhanced expression handling.
    • Added sample_sentence function for streamlined sampling of sentences.
  • Bug Fixes

    • Improved matching logic in recognize_all and is_match functions for better processing of sentence expressions.
  • Documentation

    • Updated test cases to reflect changes in expression handling and new class structures.
  • Refactor

    • Reorganized and renamed classes in expression handling for clarity and maintainability.
    • Simplified parsing logic to accommodate new class hierarchy.
    • Enhanced error handling to improve clarity in reporting.
  • Chores

    • Removed deprecated functions and types to streamline the codebase.

Copy link

coderabbitai bot commented Dec 20, 2024

📝 Walkthrough
📝 Walkthrough

Walkthrough

The pull request introduces a comprehensive refactoring of the expression handling system in the hassil library. The primary changes involve renaming and restructuring classes related to expression parsing, introducing new classes like Group, Alternative, and Permutation, and modifying how sentences and expressions are processed. The modifications impact multiple files across the library, including __init__.py, expression.py, parse_expression.py, intents.py, recognize.py, sample.py, and string_matcher.py. The changes provide a more flexible and clear approach to handling different types of expression groups and matching scenarios.

Changes

File Change Summary
hassil/__init__.py - Added imports: Alternative, Group, Permutation from .expression
- Removed import: SequenceType
hassil/expression.py - Renamed Sequence to Group
- Added new classes: Sequence, Alternative, Permutation
- Modified Sentence class to use exp attribute
- Updated text_chunk_count and list_names methods
hassil/intents.py - Modified _maybe_parse_template to return parse_sentence(text).exp
hassil/parse_expression.py - Refactored parsing functions to work with Group instead of Sequence
- Updated error handling and type checking
- Modified _add_spaces_between_items to work with Permutation
hassil/recognize.py - Updated recognize_all and is_match to use intent_sentence.exp
hassil/sample.py - Added sample_sentence function
- Updated sample_expression to handle new expression types
hassil/string_matcher.py - Added Alternative, Group, and Permutation classes
- Updated match_expression to handle new expression types
tests/test_expression.py - Removed group and alt functions
- Updated test cases to use new Sequence and Alternative classes
tests/test_sample.py - Replaced sample_expression with sample_sentence in test cases

Tip

CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command @coderabbitai generate docstrings to have CodeRabbit automatically generate docstrings for your pull request. We would love to hear your feedback on Discord.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (15)
hassil/expression.py (4)

43-57: Structure “Group” class for clarity.

The introduction of the Group class and the text_chunk_count method is logically sound. However, there’s a potential risk of cyclical references if nested groups reference each other as parents. Consider placing a defensive check or using a specialized factory method if deeper nesting is anticipated.


78-84: Defer lookup for rule expansion until after checking item type.

Here, you yield from self._list_names when encountering sub-groups or rule references. This is correct. As a minor improvement, consider short-circuiting or caching the result if the same sub-group or rule is encountered multiple times to possibly reduce overhead.


152-156: Retain backward compatibility handle in text_chunk_count.

Asserting that self.exp is a Group might break integration if older code passes something else. If the system is fully migrated, then this is fine. If partial migrations exist, carefully handle or document the assertion.


199-199: Raise ValueError with additional context.

When raising ValueError(grp), consider providing more descriptive information about the object or its type to improve debugging.

hassil/parse_expression.py (5)

3-3: Imports reflect new classes.

The newly added imports for Alternative, Group, Permutation, and Sequence are consistent with the refactoring across expression classes. Confirm that no unused imports remain after final cleanup.

Also applies to: 6-6, 8-8, 10-10


77-78: Adding items to Alternative or Permutation.

Lines 90-104: If the parent is Alternative or Permutation, you open a new Sequence for the next item. This is a sensible approach but ensure that deeply nested sequences or a large number of alternative tokens do not degrade performance.

Also applies to: 90-104


128-129: Spaces in permutations.

Line 129 calls _add_spaces_between_items if it’s a Permutation. This is crucial for clarity but might inadvertently insert double spaces if the content already includes trailing whitespace. Potential corner case if ignoring whitespace.

Also applies to: 131-131


172-173: Unpacking the final group in parse_sentence.

Line 193 checks if the top-level group has exactly one item that’s also a Group. This is a direct approach that simplifies resulting syntax trees. Keep an eye on multi-item edge cases.

Also applies to: 188-190


359-364: _add_spaces_between_items can lead to leading spaces.

At line 363, you insert a space at the start and end of each Sequence item. This might cause leading/trailing whitespace in final expansions. That might be intended but confirm no downstream code is trimming incorrectly.

hassil/sample.py (2)

157-157: Generate combined sentence_texts with itertools.product.

For Sequence, you generate the Cartesian product. Ensure performance metrics remain acceptable for large or deeply nested Sequences.

Also applies to: 162-162


179-179: Raise ValueError for unknown group type.

Line 179 is a good fallback, but consider adding the actual group’s class name or attributes to the error message for better debugging.

hassil/string_matcher.py (3)

408-408: Optional expansions with Alternative.

Line 408 triggers iteration over all items. For large sets, the overhead can grow quickly. Consider partial or short-circuit matching if performance is a concern.


428-429: Permutation: match in any order.

All must match in arbitrary order. This can be computationally expensive for bigger expressions. A potential optimization is to detect items that can only match at the end or the beginning.

Also applies to: 430-437


439-439: Raise descriptive error on unexpected group type.

Similar feedback as in other files. Adding minimal context, such as the group class name or item count, can streamline debugging.

tests/test_expression.py (1)

23-24: Consider improving test readability

The test assertions, while correct, are quite complex and could benefit from better readability. Consider breaking down complex assertions into multiple steps or using helper functions.

Example refactor for test_group_alternative:

 def test_group_alternative():
-    assert parse_expression(next_chunk("(test | test2)")) == Alternative(
-        items=[Sequence(items=[t(text="test ")]), Sequence(items=[t(text=" test2")])],
-    )
+    result = parse_expression(next_chunk("(test | test2)"))
+    expected = Alternative(
+        items=[
+            Sequence(items=[t(text="test ")]),
+            Sequence(items=[t(text=" test2")])
+        ]
+    )
+    assert result == expected

Also applies to: 29-31, 35-42, 47-49, 53-57

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3bf179c and 07c3ca9.

📒 Files selected for processing (9)
  • hassil/__init__.py (1 hunks)
  • hassil/expression.py (6 hunks)
  • hassil/intents.py (1 hunks)
  • hassil/parse_expression.py (7 hunks)
  • hassil/recognize.py (2 hunks)
  • hassil/sample.py (7 hunks)
  • hassil/string_matcher.py (4 hunks)
  • tests/test_expression.py (4 hunks)
  • tests/test_sample.py (8 hunks)
🧰 Additional context used
📓 Path-based instructions (9)
tests/test_sample.py (6)

Pattern */**(html|markdown|md): - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.

  • Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
  • In step-by-step instructions, front the location phrase in the instructional sentence.
  • In step-by-step instructions, front the 'goal' in the instructional sentence.
  • In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
  • do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'

Pattern */**(html|markdown|md): - Use bold to mark UI strings.

  • If "" are used to mark UI strings, replace them by bold.

Pattern */**(html|markdown|md): - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"


Pattern */**(html|markdown|md): - Use sentence-style capitalization also in headings.


Pattern */**(html|markdown|md): do not comment on HTML used for icons


Pattern */**(html|markdown|md): Avoid flagging inline HTML for embedding videos in future reviews for this repository.

hassil/intents.py (6)

Pattern */**(html|markdown|md): - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.

  • Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
  • In step-by-step instructions, front the location phrase in the instructional sentence.
  • In step-by-step instructions, front the 'goal' in the instructional sentence.
  • In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
  • do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'

Pattern */**(html|markdown|md): - Use bold to mark UI strings.

  • If "" are used to mark UI strings, replace them by bold.

Pattern */**(html|markdown|md): - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"


Pattern */**(html|markdown|md): - Use sentence-style capitalization also in headings.


Pattern */**(html|markdown|md): do not comment on HTML used for icons


Pattern */**(html|markdown|md): Avoid flagging inline HTML for embedding videos in future reviews for this repository.

hassil/string_matcher.py (6)

Pattern */**(html|markdown|md): - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.

  • Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
  • In step-by-step instructions, front the location phrase in the instructional sentence.
  • In step-by-step instructions, front the 'goal' in the instructional sentence.
  • In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
  • do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'

Pattern */**(html|markdown|md): - Use bold to mark UI strings.

  • If "" are used to mark UI strings, replace them by bold.

Pattern */**(html|markdown|md): - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"


Pattern */**(html|markdown|md): - Use sentence-style capitalization also in headings.


Pattern */**(html|markdown|md): do not comment on HTML used for icons


Pattern */**(html|markdown|md): Avoid flagging inline HTML for embedding videos in future reviews for this repository.

tests/test_expression.py (6)

Pattern */**(html|markdown|md): - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.

  • Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
  • In step-by-step instructions, front the location phrase in the instructional sentence.
  • In step-by-step instructions, front the 'goal' in the instructional sentence.
  • In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
  • do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'

Pattern */**(html|markdown|md): - Use bold to mark UI strings.

  • If "" are used to mark UI strings, replace them by bold.

Pattern */**(html|markdown|md): - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"


Pattern */**(html|markdown|md): - Use sentence-style capitalization also in headings.


Pattern */**(html|markdown|md): do not comment on HTML used for icons


Pattern */**(html|markdown|md): Avoid flagging inline HTML for embedding videos in future reviews for this repository.

hassil/recognize.py (6)

Pattern */**(html|markdown|md): - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.

  • Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
  • In step-by-step instructions, front the location phrase in the instructional sentence.
  • In step-by-step instructions, front the 'goal' in the instructional sentence.
  • In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
  • do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'

Pattern */**(html|markdown|md): - Use bold to mark UI strings.

  • If "" are used to mark UI strings, replace them by bold.

Pattern */**(html|markdown|md): - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"


Pattern */**(html|markdown|md): - Use sentence-style capitalization also in headings.


Pattern */**(html|markdown|md): do not comment on HTML used for icons


Pattern */**(html|markdown|md): Avoid flagging inline HTML for embedding videos in future reviews for this repository.

hassil/sample.py (6)

Pattern */**(html|markdown|md): - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.

  • Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
  • In step-by-step instructions, front the location phrase in the instructional sentence.
  • In step-by-step instructions, front the 'goal' in the instructional sentence.
  • In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
  • do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'

Pattern */**(html|markdown|md): - Use bold to mark UI strings.

  • If "" are used to mark UI strings, replace them by bold.

Pattern */**(html|markdown|md): - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"


Pattern */**(html|markdown|md): - Use sentence-style capitalization also in headings.


Pattern */**(html|markdown|md): do not comment on HTML used for icons


Pattern */**(html|markdown|md): Avoid flagging inline HTML for embedding videos in future reviews for this repository.

hassil/expression.py (6)

Pattern */**(html|markdown|md): - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.

  • Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
  • In step-by-step instructions, front the location phrase in the instructional sentence.
  • In step-by-step instructions, front the 'goal' in the instructional sentence.
  • In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
  • do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'

Pattern */**(html|markdown|md): - Use bold to mark UI strings.

  • If "" are used to mark UI strings, replace them by bold.

Pattern */**(html|markdown|md): - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"


Pattern */**(html|markdown|md): - Use sentence-style capitalization also in headings.


Pattern */**(html|markdown|md): do not comment on HTML used for icons


Pattern */**(html|markdown|md): Avoid flagging inline HTML for embedding videos in future reviews for this repository.

hassil/parse_expression.py (6)

Pattern */**(html|markdown|md): - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.

  • Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
  • In step-by-step instructions, front the location phrase in the instructional sentence.
  • In step-by-step instructions, front the 'goal' in the instructional sentence.
  • In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
  • do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'

Pattern */**(html|markdown|md): - Use bold to mark UI strings.

  • If "" are used to mark UI strings, replace them by bold.

Pattern */**(html|markdown|md): - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"


Pattern */**(html|markdown|md): - Use sentence-style capitalization also in headings.


Pattern */**(html|markdown|md): do not comment on HTML used for icons


Pattern */**(html|markdown|md): Avoid flagging inline HTML for embedding videos in future reviews for this repository.

hassil/__init__.py (6)

Pattern */**(html|markdown|md): - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.

  • Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
  • In step-by-step instructions, front the location phrase in the instructional sentence.
  • In step-by-step instructions, front the 'goal' in the instructional sentence.
  • In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
  • do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'

Pattern */**(html|markdown|md): - Use bold to mark UI strings.

  • If "" are used to mark UI strings, replace them by bold.

Pattern */**(html|markdown|md): - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"


Pattern */**(html|markdown|md): - Use sentence-style capitalization also in headings.


Pattern */**(html|markdown|md): do not comment on HTML used for icons


Pattern */**(html|markdown|md): Avoid flagging inline HTML for embedding videos in future reviews for this repository.

🪛 Ruff (0.8.2)
hassil/__init__.py

4-4: .expression.Alternative imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


5-5: .expression.Group imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


6-6: .expression.ListReference imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


7-7: .expression.Permutation imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

🔇 Additional comments (26)
hassil/expression.py (6)

5-5: Confirm import usage and referencing of Group as a parent.

At line 5, importing Iterable is fine, but verify that we indeed require it for any new or existing functionality. Also, at line 25, referencing “Optional[Group]” might introduce circular import issues if used incorrectly. Make sure it's used only within type annotations or forward references.

Also applies to: 25-25


88-98: New Sequence and Alternative classes: suitable inheritance.

Adding “Sequence” and “Alternative” as specialized subclasses is consistent with the new class hierarchy. The usage of is_optional in Alternative is clearly beneficial. Keep an eye on possible expansions of is_optional logic in the future (e.g., partial expansions).


100-111: Assess permutations for large item counts.

Permutations can grow factorially in complexity. The implementation is correct, but consider adding safeguards or warnings for large group sizes to avoid performance bottlenecks.


145-148: Validate Sentence’s public interface after removing inheritance.

The shift from subclassing to holding an exp attribute is a good composition pattern. Confirm that external code referencing old Sentence methods (like old sequence behaviors) is updated accordingly.


157-164: Use consistent list name extraction for nested expansions.

list_names retrieves references recursively. This is correct. Just confirm expansions cover all call paths (especially if expansions are updated externally).


171-177: Preemptively handle corner cases in compile method.

Ensure that pattern rewriting for spaces (“replace(r"\ ", r"[ ]*")”) performs as intended for complicated text patterns (e.g., multiple spaces or special characters).

hassil/parse_expression.py (5)

52-57: _ensure_alternative gracefully collapses items.

Collapsing items into a Sequence is logical. Consider verifying that the newly formed Alternative indeed preserves item order by default, especially if future code depends on positional ordering.


59-64: _ensure_permutation delegates gracefully.

Similar to _ensure_alternative, the logic is correct. Confirm we never pass an already-permuted Group multiple times, leading to nested permutations.


66-75: parse_group handles group text extraction.

Line 75 raises a parse error if it’s not recognized as GROUP or OPT. This is correct. The short-circuit approach ensures we don’t handle unexpected parse types.


106-106: Robust error handling in textual parsing.

Each time you create or switch to a new group, you append a Sequence item. The subsequent parse error checks are thorough (e.g., lines 125, 123). Good for diagnosing malformed inputs.

Also applies to: 109-109, 111-111, 114-114, 116-116, 119-119, 121-121, 123-123, 125-125, 126-126


143-149: Treat optional group as Alternative.

Implementation for optional group sets is_optional=True at line 147. Good approach. Just confirm that empty expansions do not conflict with other elements in subsequent parse steps.

hassil/sample.py (5)

17-17: Imports for Alternative, Group, and Permutation.

The new imports are correct. Ensure that Sequence is still required since Group is now the base class for all.

Also applies to: 19-19, 21-21


86-86: Consider skipping wildcard sentences.

The logic in sample_intents excludes wildcard sentences if exclude_sentences_with_wildcards is true. This logic is consistent, but any future wildcard expansions require verifying coverage in tests.


110-121: Introducing sample_sentence wrapper.

This new function calls sample_expression internally, providing a more intuitive interface for sampling by Sentence. This approach is maintainable.


135-135: sample_expression handles Alternative.

Your handling of Alternative at lines 137-147 simply yields from each item. That is correct. Watch out for large Alternatives with many items, as it can produce a lot of combinations.

Also applies to: 136-147


262-262: Apply expansions for RuleReference consistently.

Line 262 reassigns rule_body before sampling. Confirm that this is in sync with more advanced expansions (like optional or nested references).

hassil/string_matcher.py (4)

13-13: Imports reflect new group classes.

The import for Alternative, Group, and Permutation matches the code’s usage. Remove or adapt Sequence if it’s partially replaced by Group in all references.

Also applies to: 15-15, 17-17


403-404: Check Group usage in match_expression.

At lines 403-405, we handle Group first. This is an excellent reorganization. The typed checks for Alternative and Sequence clarify the code and reduce confusion.

Also applies to: 405-405


411-412: Sequence items must all match.

The consecutive approach is correct. Confirm that ignoring whitespace or partial word boundaries doesn’t cause false positives or missed matches.

Also applies to: 415-415


813-813: RuleReference expansions reflect new structure.

At line 813, we access expansion_rules[rule_ref.rule_name].exp. Confirm that we consistently update references, especially if future code references old “Sequence” semantics.

tests/test_sample.py (2)

3-4: LGTM: Clean import of the new sample_sentence function

The import statement is correctly updated to use the new sample_sentence function, which aligns with the PR's architectural changes.


7-7: LGTM: Consistent test updates using sample_sentence

The test cases have been systematically updated to use sample_sentence instead of sample_expression, maintaining the same test coverage while adapting to the new API. The test scenarios comprehensively cover:

  • Basic text chunks
  • Nested groups
  • Optional elements
  • Alternatives
  • Lists with various configurations

Also applies to: 11-11, 17-17, 26-26, 39-39, 48-48, 57-57, 69-69, 80-80, 94-94

tests/test_expression.py (1)

4-11: LGTM: Updated imports reflect new class hierarchy

The imports correctly include the new classes (Alternative, Permutation) and maintain the existing ones, aligning with the architectural changes.

hassil/intents.py (1)

457-457: LGTM: Updated return type aligns with new architecture

The change to return parse_sentence(text).exp instead of the entire sentence object aligns with the architectural change where Sentence now contains a Group as an attribute (composition over inheritance).

Let's verify that this change is consistently applied throughout the codebase:

✅ Verification successful

Change is safe and consistently applied

The codebase scan shows that all other usages of parse_sentence are either:

  1. The function definition itself in parse_expression.py
  2. Cases where the full Sentence object is needed (with keep_text=True)
  3. Cases in sample/utility code where the full object is acceptable

The modification in _maybe_parse_template to return .exp is the correct approach for this specific case and doesn't require similar changes elsewhere in the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining direct usage of parse_sentence without .exp
rg "parse_sentence\([^)]+\)(?!\.exp)" -g "!test_*"

Length of output: 311


Script:

#!/bin/bash
# Search for parse_sentence usage with context
rg "parse_sentence\(" -A 2 -g "!test_*"

Length of output: 1214

hassil/recognize.py (2)

435-435: LGTM: Consistent with the architectural changes

The change from sentence to sentence.exp maintains consistency with the previous change and the new architecture.


267-267: Verify the usage of intent_sentence.exp across the codebase

The change from intent_sentence to intent_sentence.exp aligns with the architectural improvements where Sentence now uses composition instead of inheritance. This change looks correct, but let's verify its usage across the codebase.

✅ Verification successful

Based on the search results, I can generate the final response as we have sufficient information. The rg search shows all usages of match_expression in the codebase, and the ast-grep search (which returned no results) confirms there are no instances where intent_sentence is used directly without .exp.

All usages of intent_sentence.exp are consistent with the architectural changes

The codebase consistently uses the .exp property when passing sentences to match_expression. This includes:

  • hassil/recognize.py: Uses intent_sentence.exp in the changed line
  • hassil/recognize.py: Uses sentence.exp in another location
  • All other match_expression calls work with Expression objects directly or via .exp property
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify consistent usage of intent_sentence.exp across the codebase
# Look for any remaining direct usage of intent_sentence in match_expression calls

# Search for match_expression calls
rg "match_expression\(" -A 3

# Search for any remaining direct usage of intent_sentence without .exp
ast-grep --pattern 'match_expression($$$, $_, intent_sentence$$$)'

Length of output: 3567

hassil/expression.py Show resolved Hide resolved
hassil/sample.py Show resolved Hide resolved
hassil/__init__.py Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
hassil/expression.py (5)

25-25: Consider weak references for parent backlinks

The parent reference creates a bidirectional relationship between TextChunk and Group. Consider using weakref.ref to prevent potential memory leaks, especially in cases where chunks might be shared between groups.

-    parent: "Optional[Group]" = None
+    parent: "Optional[weakref.ReferenceType[Group]]" = None

46-47: Add type hints for items field

Consider adding explicit type hints for better code clarity and IDE support.

-    items: List[Expression] = field(default_factory=list)
+    items: List[Expression] = field(default_factory=list[Expression])

Line range hint 71-77: Consider making _list_names a private method

Since this is an internal helper method, consider using double underscore prefix to make it explicitly private.

-    def _list_names(
+    def __list_names(

97-97: Document the is_optional field

Add docstring explaining the purpose and usage of the is_optional field in Alternative class.


154-155: Consider using isinstance checks with proper error messages

Replace assertions with explicit type checks and descriptive error messages for better error handling.

-        assert isinstance(self.exp, Group)
-        return self.exp.text_chunk_count()  # pylint: disable=no-member
+        if not isinstance(self.exp, Group):
+            raise TypeError(f"Expected Group but got {type(self.exp)}")
+        return self.exp.text_chunk_count()

Also applies to: 162-163

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 07c3ca9 and fbf21bc.

📒 Files selected for processing (1)
  • hassil/expression.py (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
hassil/expression.py (6)

Pattern */**(html|markdown|md): - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.

  • Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
  • In step-by-step instructions, front the location phrase in the instructional sentence.
  • In step-by-step instructions, front the 'goal' in the instructional sentence.
  • In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
  • do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'

Pattern */**(html|markdown|md): - Use bold to mark UI strings.

  • If "" are used to mark UI strings, replace them by bold.

Pattern */**(html|markdown|md): - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"


Pattern */**(html|markdown|md): - Use sentence-style capitalization also in headings.


Pattern */**(html|markdown|md): do not comment on HTML used for icons


Pattern */**(html|markdown|md): Avoid flagging inline HTML for embedding videos in future reviews for this repository.

🔇 Additional comments (2)
hassil/expression.py (2)

104-111: Well implemented lazy evaluation for permutations

The iterate_permutations method effectively implements lazy evaluation of permutations, which aligns with the PR's performance improvement objectives. The implementation avoids generating all permutations upfront.


189-202: Ensure proper scoping in Alternative and Permutation compilation

The similar pattern matching logic in Alternative and Permutation compilation could lead to scoping issues when these constructs are nested. Consider adding explicit grouping to prevent any ambiguity in the resulting regular expression.

@synesthesiam synesthesiam self-assigned this Dec 23, 2024
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🔭 Outside diff range comments (1)
hassil/parse_expression.py (1)

Line range hint 66-131: Simplify parse_group logic by extracting helper methods.

The function has complex nested conditions. Consider breaking it down into smaller, focused methods:

+    def _handle_item_chunk(self, item: Expression, grp: Group) -> None:
+        """Handle adding an item to a group."""
+        if isinstance(grp, (Alternative, Permutation)):
+            last_item = grp.items[-1]
+            if not isinstance(last_item, Sequence):
+                raise ParseExpressionError(grp_chunk, metadata=metadata)
+            last_item.items.append(item)
+        else:
+            grp.items.append(item)
+            if isinstance(item, TextChunk):
+                item.parent = grp

     def parse_group(...):
         # Use the helper method
         if item_chunk.parse_type in (...):
-            if isinstance(grp, (Alternative, Permutation)):
-                ...
-            else:
-                ...
+            _handle_item_chunk(parse_expression(item_chunk), grp)
🧹 Nitpick comments (5)
tests/test_expression.py (1)

61-84: Add test cases for edge cases in optional alternatives and permutations.

While the current tests cover basic functionality, consider adding test cases for:

  • Empty alternatives and permutations
  • Single-item alternatives and permutations
  • Nested optional alternatives within permutations and vice versa
hassil/expression.py (3)

Line range hint 44-87: Add return type hints to methods for better maintainability.

Consider adding explicit return type hints to methods:

-    def text_chunk_count(self) -> int:
+    def text_chunk_count(self) -> int:
         """Return the number of TextChunk expressions in this group (recursive)."""
-    def _list_names(
-        self,
-        item: Expression,
-        expansion_rules: Optional[Dict[str, Sentence]] = None,
-    ) -> Iterator[str]:
+    def _list_names(
+        self,
+        item: Expression,
+        expansion_rules: Optional[Dict[str, Sentence]] = None,
+    ) -> Iterator[str]:
         """Return names of list references (recursive)."""

101-111: Optimize permutation iteration to reduce memory allocation.

The current implementation creates a new list for each permutation. Consider using a more memory-efficient approach:

     def iterate_permutations(self) -> Iterable[Tuple[Expression, Permutation]]:
         """Iterate over all permutations."""
         for i, item in enumerate(self.items):
-            items = self.items[:]
-            del items[i]
+            items = [self.items[j] for j in range(len(self.items)) if j != i]
             rest = Permutation(items=items)
             yield (item, rest)

Line range hint 146-217: Strengthen type assertions in Sentence methods.

Consider adding more specific type assertions to catch potential type errors earlier:

     def text_chunk_count(self) -> int:
         """Return the number of TextChunk expressions in this sentence."""
-        assert isinstance(self.exp, Group)
+        if not isinstance(self.exp, Group):
+            raise TypeError(f"Expected Group, got {type(self.exp)}")
         return self.exp.text_chunk_count()
hassil/parse_expression.py (1)

361-366: Document in-place modification behavior in _add_spaces_between_items.

The function modifies the permutation in place, which should be clearly documented:

 def _add_spaces_between_items(perm: Permutation) -> None:
-    """Add spaces between each 2 items of a permutation"""
+    """Add spaces between each 2 items of a permutation.
+    
+    Note:
+        This function modifies the permutation in place by adding TextChunk
+        objects with spaces at the beginning and end of each sequence.
+    
+    Args:
+        perm: The permutation to modify.
+    """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fbf21bc and fcef7cf.

📒 Files selected for processing (3)
  • hassil/expression.py (6 hunks)
  • hassil/parse_expression.py (7 hunks)
  • tests/test_expression.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
tests/test_expression.py (6)

Pattern */**(html|markdown|md): - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.

  • Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
  • In step-by-step instructions, front the location phrase in the instructional sentence.
  • In step-by-step instructions, front the 'goal' in the instructional sentence.
  • In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
  • do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'

Pattern */**(html|markdown|md): - Use bold to mark UI strings.

  • If "" are used to mark UI strings, replace them by bold.

Pattern */**(html|markdown|md): - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"


Pattern */**(html|markdown|md): - Use sentence-style capitalization also in headings.


Pattern */**(html|markdown|md): do not comment on HTML used for icons


Pattern */**(html|markdown|md): Avoid flagging inline HTML for embedding videos in future reviews for this repository.

hassil/parse_expression.py (6)

Pattern */**(html|markdown|md): - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.

  • Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
  • In step-by-step instructions, front the location phrase in the instructional sentence.
  • In step-by-step instructions, front the 'goal' in the instructional sentence.
  • In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
  • do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'

Pattern */**(html|markdown|md): - Use bold to mark UI strings.

  • If "" are used to mark UI strings, replace them by bold.

Pattern */**(html|markdown|md): - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"


Pattern */**(html|markdown|md): - Use sentence-style capitalization also in headings.


Pattern */**(html|markdown|md): do not comment on HTML used for icons


Pattern */**(html|markdown|md): Avoid flagging inline HTML for embedding videos in future reviews for this repository.

hassil/expression.py (6)

Pattern */**(html|markdown|md): - For instructional content in documentation, use a direct and authoritative tone. Avoid expressions of politeness such as 'may' or 'please', and ensure the goal of the instruction is fronted.

  • Apply the Microsoft Style Guide to ensure documentation maintains clarity and conciseness.
  • In step-by-step instructions, front the location phrase in the instructional sentence.
  • In step-by-step instructions, front the 'goal' in the instructional sentence.
  • In step-by-step instructions, if in doubt what to front, front the 'goal' before the location phrase in the instructional sentence.
  • do not hyphenate terms like 'top-right' or 'bottom-left' with 'corner'

Pattern */**(html|markdown|md): - Use bold to mark UI strings.

  • If "" are used to mark UI strings, replace them by bold.

Pattern */**(html|markdown|md): - Be brief in your replies and don't add fluff like "thank you for..." and "Please let me know if"


Pattern */**(html|markdown|md): - Use sentence-style capitalization also in headings.


Pattern */**(html|markdown|md): do not comment on HTML used for icons


Pattern */**(html|markdown|md): Avoid flagging inline HTML for embedding videos in future reviews for this repository.

🔇 Additional comments (2)
tests/test_expression.py (2)

4-4: LGTM! The imports and test case align with the new class hierarchy.

The test case properly validates the nesting of Sequence objects, which is a crucial functionality in the new architecture.

Also applies to: 6-6, 22-24


104-107: LGTM! Test case properly validates the new Sentence structure.

The test ensures that sentences correctly handle Sequence objects as their expression containers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants