Feat/roman numerals #40

JarbasAl · 2022-11-27T15:25:08Z

Summary by CodeRabbit

New Features
- Added the ability to convert Roman numerals to integers.
- Introduced functionality to extract and normalize Roman numerals in English text.
Enhancements
- Improved text normalization by adding normalize_roman_numerals and get_color functions.
Dependencies
- Minor update to the quebra_frases package version specification (no functional change).

codecov · 2022-11-27T15:26:42Z

Codecov Report

❗ No coverage uploaded for pull request base (dev@08ed3c6).
The diff coverage is n/a.

❗ Current head 8b26f61 differs from pull request most recent head f9f7019. Consider uploading reports for the commit f9f7019 to get more accurate results.

@@          Coverage Diff          @@
##             dev     #40   +/-   ##
=====================================
  Coverage       ?   0.00%           
=====================================
  Files          ?      65           
  Lines          ?   16425           
  Branches       ?       0           
=====================================
  Hits           ?       0           
  Misses         ?   16425           
  Partials       ?       0



coderabbitai · 2024-07-17T01:11:46Z

Warning

Rate limit exceeded

@JarbasAl has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 16 minutes and 26 seconds before requesting another review.


Commits

Files that changed from the base of the PR and between f5f2606 and 7bd502d.

Walkthrough

The updates introduce functionality for converting and normalizing Roman numerals within the lingua_franca library, adding new utility functions, supporting imports, and a normalization feature specifically for English text. Additionally, minor changes were made to the requirements file without affecting functionality.

Changes

| Files | Change Summary |
|---|---|
| `lingua_franca/lang/parse_common.py` | Added functions `roman_to_int`, `is_roman_numeral`, and `extract_roman_numeral_spans`, and the `ROMAN_NUMERALS` dictionary. |
| `lingua_franca/lang/parse_en.py` | Added `normalize_roman_numerals_en` function and imported `extract_roman_numeral_spans`. |
| `lingua_franca/parse.py` | Added `normalize_roman_numerals` and `get_color` functions, along with various module imports. |
| `requirements/requirements.txt` | Minor change to version specification for `quebra_frases`, no impact on functionality. |

Poem

In code they roam, the numbers of Rome,
To integers they transform, no longer alone.
With functions new, and imports too,
The lingua_franca library, now shines anew.
So let's give cheer, for changes here,
As code advances, year by year.


JarbasAl · 2024-07-17T01:13:00Z

@coderabbitai review

coderabbitai · 2024-07-17T01:13:17Z

Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 5

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 6a13ca5 and f5f2606.

Files selected for processing (5)

lingua_franca/lang/parse_common.py (2 hunks)
lingua_franca/lang/parse_en.py (2 hunks)
lingua_franca/parse.py (2 hunks)
requirements/requirements.txt (1 hunk)
test/unittests/test_parse_common.py (1 hunks)

Files skipped from review due to trivial changes (1)

requirements/requirements.txt

Additional context used

Ruff

lingua_franca/parse.py

18-18: lingua_franca.lang.parse_common.is_roman_numeral imported but unused

Remove unused import

(F401)

18-18: lingua_franca.lang.parse_common.roman_to_int imported but unused

Remove unused import

(F401)

20-20: difflib.SequenceMatcher imported but unused

Remove unused import: difflib.SequenceMatcher

(F401)

21-21: warnings.warn imported but unused

Remove unused import: warnings.warn

(F401)

22-22: lingua_franca.time.now_local imported but unused

Remove unused import: lingua_franca.time.now_local

(F401)

24-24: lingua_franca.internal.get_primary_lang_code imported but unused

Remove unused import

(F401)

25-25: lingua_franca.internal.get_default_lang imported but unused

Remove unused import

(F401)

25-25: lingua_franca.internal._raise_unsupported_language imported but unused

Remove unused import

(F401)

Additional comments not posted (10)

test/unittests/test_parse_common.py (4)

23-24: Setup method is correctly implemented for language loading.

The use of load_language("en") in the setup ensures that the tests will run with the correct language configuration.

40-44: Correctly tests Roman numeral span extraction.

The method provides clear and specific test cases for extracting Roman numeral spans from text.

46-48: Properly tests normalization of Roman numerals in text.

This test ensures that Roman numerals are correctly converted to their numeric equivalents in a textual context.

64-70: Effectively tests handling of spaces in Roman numerals.

The method correctly identifies that spaces between Roman numeral characters invalidate detection and tests the normalization of such spaced numerals.

lingua_franca/lang/parse_en.py (1)

29-30: Review of added import statement.

The import extract_roman_numeral_spans is correctly added for use in the new function normalize_roman_numerals_en. This aligns with the PR's goal of adding functionality related to Roman numerals.

lingua_franca/parse.py (1)

59-59: Verify the presence and format of color JSON files.

The function relies on JSON files for color definitions. It's crucial to verify that these files exist and are correctly formatted to avoid runtime errors.

Verification successful

Verify the format of the color JSON files.

The color JSON files are present. Now, we need to ensure they are correctly formatted to avoid runtime errors.

The color JSON files are present and correctly formatted.

The verification confirms that the necessary color JSON files exist and are correctly structured, ensuring no runtime errors related to these files.

Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Check for the presence of color JSON files.

# Test: Search for the JSON files. Expect: At least one file should be found.
fd 'colors.json'
Length of output: 464

Script:
#!/bin/bash
# Description: Validate the format of color JSON files.

# Test: Validate each JSON file to ensure it is correctly formatted.
fd 'colors.json' --exec jq empty {}
Length of output: 37

lingua_franca/lang/parse_common.py (4)

23-23: Dictionary for Roman numerals is correctly implemented.

The ROMAN_NUMERALS dictionary correctly maps Roman numeral characters to their integer values.

471-480: Function to convert Roman numerals to integers is correctly implemented.

The roman_to_int function correctly converts Roman numerals to integers using a well-defined algorithm.
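The conversion algorithm is not quoted in this review, so the following is only a minimal sketch of one common approach (add each symbol's value, but subtract it when a larger symbol follows). The name `roman_to_int_sketch` and the contents of the `ROMAN_NUMERALS` mapping are assumptions for illustration, not the PR's actual code; returning `None` for lowercase input mirrors the behavior the PR's tests expect.

```python
# Assumed mapping; the PR's ROMAN_NUMERALS dictionary may differ.
ROMAN_NUMERALS = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int_sketch(word):
    """Return the integer value of an uppercase Roman numeral, or None."""
    # Reject empty input and any character outside the uppercase symbol set.
    if not word or any(ch not in ROMAN_NUMERALS for ch in word):
        return None
    total = 0
    for i, ch in enumerate(word):
        value = ROMAN_NUMERALS[ch]
        # Subtractive notation: a smaller symbol before a larger one is subtracted.
        if i + 1 < len(word) and value < ROMAN_NUMERALS[word[i + 1]]:
            total -= value
        else:
            total += value
    return total


print(roman_to_int_sketch("MCMLXXIV"))  # 1974
print(roman_to_int_sketch("v"))         # None
```

This sketch does not reject non-canonical forms such as "IIII" or "VX"; stricter validation would need an additional check.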

483-484: Function to check if a word is a Roman numeral is correctly implemented.

The is_roman_numeral function efficiently checks if all characters in a word are valid Roman numerals.
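A character-membership check of this kind could look like the sketch below. The name `is_roman_numeral_sketch` and the symbol set are assumptions; the PR's implementation is not shown in this review.

```python
# Assumed set of valid uppercase Roman numeral symbols.
ROMAN_CHARS = frozenset("IVXLCDM")


def is_roman_numeral_sketch(word):
    """True if word is non-empty and made only of uppercase Roman symbols."""
    return bool(word) and all(ch in ROMAN_CHARS for ch in word)


print(is_roman_numeral_sketch("IV"))              # True
print(is_roman_numeral_sketch("ii"))              # False
print(is_roman_numeral_sketch("the IV century"))  # False
```

A check based purely on character membership also accepts ordinary uppercase words built from these letters, such as "MIX" or the pronoun "I", so callers may want extra context before treating a match as a numeral.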

487-500: Function to extract spans of Roman numerals from an utterance is correctly implemented.

The extract_roman_numeral_spans function correctly identifies and converts spans of Roman numerals in an utterance using efficient tokenization and conversion methods.
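To illustrate the shape of the result, here is a hedged sketch that returns `(value, (start, end))` pairs, which is the form `normalize_roman_numerals_en` unpacks. It uses a regular expression rather than whatever tokenizer the PR relies on; `extract_spans_sketch`, `_VALUES`, and `_to_int` are hypothetical names.

```python
import re

# Assumed symbol values; the PR's ROMAN_NUMERALS dictionary may differ.
_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

# A whole word made only of uppercase Roman symbols.
_ROMAN_WORD = re.compile(r"\b[IVXLCDM]+\b")


def _to_int(word):
    """Convert an uppercase Roman numeral using the subtractive rule."""
    total = 0
    for i, ch in enumerate(word):
        value = _VALUES[ch]
        if i + 1 < len(word) and value < _VALUES[word[i + 1]]:
            total -= value
        else:
            total += value
    return total


def extract_spans_sketch(utterance):
    """Return [(value, (start, end)), ...] in left-to-right order."""
    return [(_to_int(m.group()), (m.start(), m.end()))
            for m in _ROMAN_WORD.finditer(utterance)]


print(extract_spans_sketch("Louis XIV and Henry VIII"))
# [(14, (6, 9)), (8, (20, 24))]
```

Because the pattern requires word boundaries on both sides, fragments embedded in longer tokens (for example "123IVabc") are not matched.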

coderabbitai · 2024-07-17T01:17:45Z

lingua_franca/lang/parse_en.py

+def normalize_roman_numerals_en(utterance, ordinals=False):
+    # localization might be needed for ordinals flag
+    norm_utt = utterance
+    for num, (start, end) in reversed(extract_roman_numeral_spans(utterance)):
+        if ordinals:
+            if str(num)[-1] == "1":
+                num = f"{num}st"
+            elif str(num)[-1] == "2":
+                num = f"{num}nd"
+            elif str(num)[-1] == "3":
+                num = f"{num}rd"
+            else:
+                num = f"{num}th"
+        norm_utt = norm_utt[:start] + f"{num}" + norm_utt[end:]
+    return norm_utt


Review of the new function normalize_roman_numerals_en.

The function normalize_roman_numerals_en is designed to normalize Roman numerals within English text. The implementation uses the extract_roman_numeral_spans function to locate and replace Roman numeral spans with their numeric equivalents, optionally formatting them as ordinals.

Correctness and Logic:

The function correctly handles the conversion of Roman numerals to numbers and adjusts them based on the ordinals flag.
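One caveat on the ordinals branch: the diff picks the suffix from the last digit only, so 11, 12, and 13 would come out as "11st", "12nd", and "13rd". A minimal sketch of an English suffix rule that handles those cases is shown here; `ordinal_suffix_en` is a hypothetical helper, not part of the PR.

```python
def ordinal_suffix_en(n):
    """Return the English ordinal suffix for a positive integer."""
    # 11, 12, and 13 (and 111, 112, 113, ...) take "th" despite their last digit.
    if 11 <= n % 100 <= 13:
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


for n in (1, 2, 3, 4, 11, 12, 13, 21, 112):
    print(f"{n}{ordinal_suffix_en(n)}")
```

The helper takes an integer, so it would be applied before the number is formatted into the output string.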

The use of reversed ensures that replacements do not affect the indices of subsequent replacements, which is crucial for correctness.
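The effect of replacing from right to left can be seen in a small standalone sketch. The function names and the sample spans are illustrative assumptions; the spans use the `(replacement, (start, end))` shape with offsets into the original string.

```python
def replace_right_to_left(text, replacements):
    """Apply replacements given in left-to-right order, starting from the end."""
    for new_text, (start, end) in reversed(replacements):
        # Text to the left of `start` is untouched, so earlier offsets stay valid.
        text = text[:start] + new_text + text[end:]
    return text


def replace_left_to_right_naive(text, replacements):
    """Same loop without reversing: later offsets go stale once lengths change."""
    for new_text, (start, end) in replacements:
        text = text[:start] + new_text + text[end:]
    return text


spans = [("14", (6, 9)), ("8", (20, 24))]
print(replace_right_to_left("Louis XIV and Henry VIII", spans))
# Louis 14 and Henry 8
print(replace_left_to_right_naive("Louis XIV and Henry VIII", spans))
# Louis 14 and Henry V8
```

Replacing "XIV" with "14" shortens the string by one character, which is why the second span lands one position too far right in the naive version.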

Performance:

The function iterates over each numeral span once, which is efficient. However, the repeated slicing and concatenation of strings might be less efficient for very large strings. Consider using a list and joining it at the end for potentially better performance.

Readability and Maintainability:

The function is well-structured and the logic is clear. Inline comments or a docstring explaining the parameters and the return value could enhance readability and maintainability.

Error Handling:

There is no explicit error handling. It would be beneficial to add error handling to manage cases where extract_roman_numeral_spans might return unexpected values or fail.

Best Practices:

Using f-strings for string operations is a good practice and is used here effectively.

Overall, the function meets the objectives of the PR and is implemented correctly, but there are opportunities for optimization and improved error handling.

def normalize_roman_numerals_en(utterance, ordinals=False):
    # localization might be needed for ordinals flag
    norm_utt = []
    last_end = 0
    # walk the spans left to right (not reversed) so each slice starts
    # where the previous numeral ended
    for num, (start, end) in extract_roman_numeral_spans(utterance):
        norm_utt.append(utterance[last_end:start])  # append the text before the numeral
        if ordinals:
            last_digit = str(num)[-1]  # convert first; indexing an int would raise TypeError
            ordinal_suffix = {'1': 'st', '2': 'nd', '3': 'rd'}.get(last_digit, 'th')
            norm_utt.append(f"{num}{ordinal_suffix}")
        else:
            norm_utt.append(f"{num}")
        last_end = end
    norm_utt.append(utterance[last_end:])  # append the remaining text after the last numeral
    return ''.join(norm_utt)  # join all parts to form the normalized utterance

coderabbitai · 2024-07-17T01:17:45Z

test/unittests/test_parse_common.py

+    def test_roman(self):
+        # valid numerals
+        self.assertEqual(roman_to_int("III"), 3)
+        self.assertEqual(roman_to_int("IV"), 4)
+        self.assertEqual(roman_to_int("V"), 5)
+        self.assertEqual(roman_to_int("MCMLXXIV"), 1974)
+        self.assertEqual(roman_to_int("MCMLXXV"), 1975)
+        self.assertEqual(is_roman_numeral("IV"), True)
+
+        # invalid numerals
+        self.assertEqual(roman_to_int("v"), None)
+        self.assertEqual(is_roman_numeral("ii"), False)
+        self.assertEqual(is_roman_numeral("the IV century"), False)
+


Comprehensive tests for Roman numeral functionalities.

The method effectively tests both valid and invalid inputs. Consider adding more edge cases, such as numerals at the boundary of valid ranges or mixed-case inputs.

Would you like me to help by adding more edge case tests?

@JarbasAl, I've prepared some additional edge case tests for the Roman numeral functionalities. These tests cover boundary values, mixed-case inputs, and numerals embedded in text to ensure robustness. Here's the updated test code:

class TestParseCommon(unittest.TestCase):
    def setUp(self) -> None:
        load_language("en")

    def test_roman(self):
        # valid numerals
        self.assertEqual(roman_to_int("III"), 3)
        self.assertEqual(roman_to_int("IV"), 4)
        self.assertEqual(roman_to_int("V"), 5)
        self.assertEqual(roman_to_int("MCMLXXIV"), 1974)
        self.assertEqual(roman_to_int("MCMLXXV"), 1975)
        self.assertEqual(is_roman_numeral("IV"), True)

        # invalid numerals
        self.assertEqual(roman_to_int("v"), None)
        self.assertEqual(is_roman_numeral("ii"), False)
        self.assertEqual(is_roman_numeral("the IV century"), False)

        # edge cases
        self.assertEqual(roman_to_int("MCMXCIX"), 1999)  # Upper boundary of common usage
        self.assertEqual(roman_to_int("I"), 1)  # Lower boundary
        # Mixed case: these two assume case-insensitive parsing, which conflicts
        # with the lowercase "v" -> None expectation above
        self.assertEqual(roman_to_int("iV"), 4)
        self.assertEqual(roman_to_int("xIv"), 14)
        self.assertEqual(is_roman_numeral("MDCCLXXVI"), True)  # Long valid numeral
        self.assertEqual(is_roman_numeral("123IVabc"), False)  # Numeral embedded in other characters

Feel free to integrate these into your test suite or let me know if there are any other specific scenarios you'd like to cover!


JarbasAl added the enhancement New feature or request label Nov 27, 2022

JarbasAl requested a review from NeonDaniel November 27, 2022 15:25

NeonDaniel requested changes Nov 28, 2022

test/test_parse_common.py (outdated)

JarbasAl force-pushed the feat/roman_numerals branch from 846039b to 8b26f61 Compare December 1, 2022 19:25

JarbasAl requested a review from NeonDaniel December 1, 2022 19:26

JarbasAl added 2 commits December 1, 2022 19:31

roman_numerals

d5e5cb6

authored-by: jarbasal <[email protected]>

move utils to parse_common

f9f7019

JarbasAl force-pushed the feat/roman_numerals branch from 8b26f61 to f9f7019 Compare December 1, 2022 19:32

NeonDaniel requested changes Dec 1, 2022

lingua_franca/parse.py (outdated)

JarbasAl added 3 commits July 17, 2024 02:05

Merge branch 'dev' into feat/roman_numerals

5a98c82

Update parse_en.py

09b5ea2

Update parse_en.py

ba78601

OpenVoiceOS deleted a comment from coderabbitai bot Jul 17, 2024

Update parse.py

f5f2606

coderabbitai bot reviewed Jul 17, 2024

Update test/unittests/test_parse_common.py

7bd502d

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

JarbasAl marked this pull request as draft July 17, 2024 01:29

JarbasAl closed this Nov 12, 2024
