Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/roman numerals #40

Closed
wants to merge 7 commits into from
Closed

Feat/roman numerals #40

wants to merge 7 commits into from

Conversation

JarbasAl
Copy link
Member

@JarbasAl JarbasAl commented Nov 27, 2022

port HelloChatterbox#28

Summary by CodeRabbit

  • New Features

    • Added the ability to convert Roman numerals to integers.
    • Introduced functionality to extract and normalize Roman numerals in English text.
  • Enhancements

    • Improved text normalization by adding normalize_roman_numerals and get_color functions.
  • Dependencies

    • Minor update to the quebra_frases package version specification (no functional change).

@JarbasAl JarbasAl added the enhancement New feature or request label Nov 27, 2022
@JarbasAl JarbasAl requested a review from NeonDaniel November 27, 2022 15:25
@codecov
Copy link

codecov bot commented Nov 27, 2022

Codecov Report

❗ No coverage uploaded for pull request base (dev@08ed3c6). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head 8b26f61 differs from pull request most recent head f9f7019. Consider uploading reports for the commit f9f7019 to get more accurate results

@@          Coverage Diff          @@
##             dev     #40   +/-   ##
=====================================
  Coverage       ?   0.00%           
=====================================
  Files          ?      65           
  Lines          ?   16425           
  Branches       ?       0           
=====================================
  Hits           ?       0           
  Misses         ?   16425           
  Partials       ?       0           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

test/test_parse_common.py Outdated Show resolved Hide resolved
@JarbasAl JarbasAl force-pushed the feat/roman_numerals branch from 846039b to 8b26f61 Compare December 1, 2022 19:25
@JarbasAl JarbasAl requested a review from NeonDaniel December 1, 2022 19:26
@JarbasAl JarbasAl force-pushed the feat/roman_numerals branch from 8b26f61 to f9f7019 Compare December 1, 2022 19:32
lingua_franca/parse.py Outdated Show resolved Hide resolved
@OpenVoiceOS OpenVoiceOS deleted a comment from coderabbitai bot Jul 17, 2024
Copy link

coderabbitai bot commented Jul 17, 2024

Warning

Rate limit exceeded

@JarbasAl has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 16 minutes and 26 seconds before requesting another review.

How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

Commits

Files that changed from the base of the PR and between f5f2606 and 7bd502d.

Walkthrough

The updates introduce functionality for converting and normalizing Roman numerals within the lingua_franca library, adding new utility functions, supporting imports, and a normalization feature specifically for English text. Additionally, minor changes were made to the requirements file without affecting functionality.

Changes

Files Change Summary
lingua_franca/lang/parse_common.py Added functions roman_to_int, is_roman_numeral, and extract_roman_numeral_spans, and the ROMAN_NUMERALS dictionary.
lingua_franca/lang/parse_en.py Added normalize_roman_numerals_en function and imported extract_roman_numeral_spans.
lingua_franca/parse.py Added normalize_roman_numerals and get_color functions, along with various module imports.
requirements/requirements.txt Minor change to version specification for quebra_frases, no impact on functionality.

Poem

In code they roam, the numbers of Rome,
To integers they transform, no longer alone.
With functions new, and imports too,
The lingua_franca library, now shines anew.
So let's give cheer, for changes here,
As code advances, year by year.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@JarbasAl
Copy link
Member Author

@coderabbitai review

Copy link

coderabbitai bot commented Jul 17, 2024

Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 6a13ca5 and f5f2606.

Files selected for processing (5)
  • lingua_franca/lang/parse_common.py (2 hunks)
  • lingua_franca/lang/parse_en.py (2 hunks)
  • lingua_franca/parse.py (2 hunks)
  • requirements/requirements.txt (1 hunks)
  • test/unittests/test_parse_common.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • requirements/requirements.txt
Additional context used
Ruff
lingua_franca/parse.py

18-18: lingua_franca.lang.parse_common.is_roman_numeral imported but unused

Remove unused import

(F401)


18-18: lingua_franca.lang.parse_common.roman_to_int imported but unused

Remove unused import

(F401)


20-20: difflib.SequenceMatcher imported but unused

Remove unused import: difflib.SequenceMatcher

(F401)


21-21: warnings.warn imported but unused

Remove unused import: warnings.warn

(F401)


22-22: lingua_franca.time.now_local imported but unused

Remove unused import: lingua_franca.time.now_local

(F401)


24-24: lingua_franca.internal.get_primary_lang_code imported but unused

Remove unused import

(F401)


25-25: lingua_franca.internal.get_default_lang imported but unused

Remove unused import

(F401)


25-25: lingua_franca.internal._raise_unsupported_language imported but unused

Remove unused import

(F401)

Additional comments not posted (10)
test/unittests/test_parse_common.py (4)

23-24: Setup method is correctly implemented for language loading.

The use of load_language("en") in the setup ensures that the tests will run with the correct language configuration.


40-44: Correctly tests Roman numeral span extraction.

The method provides clear and specific test cases for extracting Roman numeral spans from text.


46-48: Properly tests normalization of Roman numerals in text.

This test ensures that Roman numerals are correctly converted to their numeric equivalents in a textual context.


64-70: Effectively tests handling of spaces in Roman numerals.

The method correctly identifies that spaces between Roman numeral characters invalidate detection and tests the normalization of such spaced numerals.

lingua_franca/lang/parse_en.py (1)

29-30: Review of added import statement.

The import extract_roman_numeral_spans is correctly added for use in the new function normalize_roman_numerals_en. This aligns with the PR's goal of adding functionality related to Roman numerals.

lingua_franca/parse.py (1)

59-59: Verify the presence and format of color JSON files.

The function relies on JSON files for color definitions. It's crucial to verify that these files exist and are correctly formatted to avoid runtime errors.

Verification successful

Verify the format of the color JSON files.

The color JSON files are present. Now, we need to ensure they are correctly formatted to avoid runtime errors.


The color JSON files are present and correctly formatted.

The verification confirms that the necessary color JSON files exist and are correctly structured, ensuring no runtime errors related to these files.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for the presence of color JSON files.

# Test: Search for the JSON files. Expect: At least one file should be found.
fd 'colors.json'

Length of output: 464


Script:

#!/bin/bash
# Description: Validate the format of color JSON files.

# Test: Validate each JSON file to ensure it is correctly formatted.
fd 'colors.json' --exec jq empty {}

Length of output: 37

lingua_franca/lang/parse_common.py (4)

23-23: Dictionary for Roman numerals is correctly implemented.

The ROMAN_NUMERALS dictionary correctly maps Roman numeral characters to their integer values.


471-480: Function to convert Roman numerals to integers is correctly implemented.

The roman_to_int function correctly converts Roman numerals to integers using a well-defined algorithm.


483-484: Function to check if a word is a Roman numeral is correctly implemented.

The is_roman_numeral function efficiently checks if all characters in a word are valid Roman numerals.


487-500: Function to extract spans of Roman numerals from an utterance is correctly implemented.

The extract_roman_numeral_spans function correctly identifies and converts spans of Roman numerals in an utterance using efficient tokenization and conversion methods.

Comment on lines +1779 to +1793
def normalize_roman_numerals_en(utterance, ordinals=False):
# localization might be needed for ordinals flag
norm_utt = utterance
for num, (start, end) in reversed(extract_roman_numeral_spans(utterance)):
if ordinals:
if str(num)[-1] == "1":
num = f"{num}st"
elif str(num)[-1] == "2":
num = f"{num}nd"
elif str(num)[-1] == "3":
num = f"{num}rd"
else:
num = f"{num}th"
norm_utt = norm_utt[:start] + f"{num}" + norm_utt[end:]
return norm_utt
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of the new function normalize_roman_numerals_en.

The function normalize_roman_numerals_en is designed to normalize Roman numerals within English text. The implementation uses the extract_roman_numeral_spans function to locate and replace Roman numeral spans with their numeric equivalents, optionally formatting them as ordinals.

  1. Correctness and Logic:

    • The function correctly handles the conversion of Roman numerals to numbers and adjusts them based on the ordinals flag.
    • The use of reversed ensures that replacements do not affect the indices of subsequent replacements, which is crucial for correctness.
  2. Performance:

    • The function iterates over each numeral span once, which is efficient. However, the repeated slicing and concatenation of strings might be less efficient for very large strings. Consider using a list and joining it at the end for potentially better performance.
  3. Readability and Maintainability:

    • The function is well-structured and the logic is clear. Inline comments or a docstring explaining the parameters and the return value could enhance readability and maintainability.
  4. Error Handling:

    • There is no explicit error handling. It would be beneficial to add error handling to manage cases where extract_roman_numeral_spans might return unexpected values or fail.
  5. Best Practices:

    • Using f-strings for string operations is a good practice and is used here effectively.

Overall, the function meets the objectives of the PR and is implemented correctly, but there are opportunities for optimization and improved error handling.

def normalize_roman_numerals_en(utterance, ordinals=False):
    # localization might be needed for ordinals flag
    norm_utt = []
    original_length = len(utterance)
    last_end = 0
    for num, (start, end) in reversed(extract_roman_numeral_spans(utterance)):
        norm_utt.append(utterance[last_end:start])  # append the text before the numeral
        if ordinals:
            ordinal_suffix = 'th' if num[-1] not in '123' else {'1': 'st', '2': 'nd', '3': 'rd'}[num[-1]]
            norm_utt.append(f"{num}{ordinal_suffix}")
        else:
            norm_utt.append(f"{num}")
        last_end = end
    norm_utt.append(utterance[last_end:original_length])  # append the remaining text after the last numeral
    return ''.join(norm_utt)  # join all parts to form the normalized utterance

Comment on lines +26 to +39
def test_roman(self):
# valid numerals
self.assertEqual(roman_to_int("III"), 3)
self.assertEqual(roman_to_int("IV"), 4)
self.assertEqual(roman_to_int("V"), 5)
self.assertEqual(roman_to_int("MCMLXXIV"), 1974)
self.assertEqual(roman_to_int("MCMLXXV"), 1975)
self.assertEqual(is_roman_numeral("IV"), True)

# invalid numerals
self.assertEqual(roman_to_int("v"), None)
self.assertEqual(is_roman_numeral("ii"), False)
self.assertEqual(is_roman_numeral("the IV century"), False)

Copy link

@coderabbitai coderabbitai bot Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comprehensive tests for Roman numeral functionalities.

The method effectively tests both valid and invalid inputs. Consider adding more edge cases, such as numerals at the boundary of valid ranges or mixed-case inputs.

Would you like me to help by adding more edge case tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JarbasAl, I've prepared some additional edge case tests for the Roman numeral functionalities. These tests cover boundary values, mixed-case inputs, and numerals embedded in text to ensure robustness. Here's the updated test code:

class TestParseCommon(unittest.TestCase):
    def setUp(cls) -> None:
        load_language("en")

    def test_roman(self):
        # valid numerals
        self.assertEqual(roman_to_int("III"), 3)
        self.assertEqual(roman_to_int("IV"), 4)
        self.assertEqual(roman_to_int("V"), 5)
        self.assertEqual(roman_to_int("MCMLXXIV"), 1974)
        self.assertEqual(roman_to_int("MCMLXXV"), 1975)
        self.assertEqual(is_roman_numeral("IV"), True)

        # invalid numerals
        self.assertEqual(roman_to_int("v"), None)
        self.assertEqual(is_roman_numeral("ii"), False)
        self.assertEqual(is_roman_numeral("the IV century"), False)

        # edge cases
        self.assertEqual(roman_to_int("MCMXCIX"), 1999)  # Upper boundary of common usage
        self.assertEqual(roman_to_int("I"), 1)  # Lower boundary
        self.assertEqual(roman_to_int("iV"), 4)  # Mixed case
        self.assertEqual(roman_to_int("xIv"), 14)  # Mixed case
        self.assertEqual(is_roman_numeral("MDCCLXXVI"), True)  # Valid numeral in a string
        self.assertEqual(is_roman_numeral("123IVabc"), False)  # Invalid numeral in a string

Feel free to integrate these into your test suite or let me know if there are any other specific scenarios you'd like to cover!

test/unittests/test_parse_common.py Outdated Show resolved Hide resolved
lingua_franca/parse.py Show resolved Hide resolved
lingua_franca/parse.py Show resolved Hide resolved
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@JarbasAl JarbasAl marked this pull request as draft July 17, 2024 01:29
@JarbasAl JarbasAl closed this Nov 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants