-
-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/roman numerals #40
Conversation
Codecov Report
@@ Coverage Diff @@
## dev #40 +/- ##
=====================================
Coverage ? 0.00%
=====================================
Files ? 65
Lines ? 16425
Branches ? 0
=====================================
Hits ? 0
Misses ? 16425
Partials ? 0 Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
846039b
to
8b26f61
Compare
authored-by: jarbasal <[email protected]>
8b26f61
to
f9f7019
Compare
Warning Rate limit exceeded@JarbasAl has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 16 minutes and 26 seconds before requesting another review. How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. WalkthroughThe updates introduce functionality for converting and normalizing Roman numerals within the Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configuration File (
|
@coderabbitai review |
Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (5)
- lingua_franca/lang/parse_common.py (2 hunks)
- lingua_franca/lang/parse_en.py (2 hunks)
- lingua_franca/parse.py (2 hunks)
- requirements/requirements.txt (1 hunks)
- test/unittests/test_parse_common.py (1 hunks)
Files skipped from review due to trivial changes (1)
- requirements/requirements.txt
Additional context used
Ruff
lingua_franca/parse.py
18-18:
lingua_franca.lang.parse_common.is_roman_numeral
imported but unusedRemove unused import
(F401)
18-18:
lingua_franca.lang.parse_common.roman_to_int
imported but unusedRemove unused import
(F401)
20-20:
difflib.SequenceMatcher
imported but unusedRemove unused import:
difflib.SequenceMatcher
(F401)
21-21:
warnings.warn
imported but unusedRemove unused import:
warnings.warn
(F401)
22-22:
lingua_franca.time.now_local
imported but unusedRemove unused import:
lingua_franca.time.now_local
(F401)
24-24:
lingua_franca.internal.get_primary_lang_code
imported but unusedRemove unused import
(F401)
25-25:
lingua_franca.internal.get_default_lang
imported but unusedRemove unused import
(F401)
25-25:
lingua_franca.internal._raise_unsupported_language
imported but unusedRemove unused import
(F401)
Additional comments not posted (10)
test/unittests/test_parse_common.py (4)
23-24
: Setup method is correctly implemented for language loading.The use of
load_language("en")
in the setup ensures that the tests will run with the correct language configuration.
40-44
: Correctly tests Roman numeral span extraction.The method provides clear and specific test cases for extracting Roman numeral spans from text.
46-48
: Properly tests normalization of Roman numerals in text.This test ensures that Roman numerals are correctly converted to their numeric equivalents in a textual context.
64-70
: Effectively tests handling of spaces in Roman numerals.The method correctly identifies that spaces between Roman numeral characters invalidate detection and tests the normalization of such spaced numerals.
lingua_franca/lang/parse_en.py (1)
29-30
: Review of added import statement.The import
extract_roman_numeral_spans
is correctly added for use in the new functionnormalize_roman_numerals_en
. This aligns with the PR's goal of adding functionality related to Roman numerals.lingua_franca/parse.py (1)
59-59
: Verify the presence and format of color JSON files.The function relies on JSON files for color definitions. It's crucial to verify that these files exist and are correctly formatted to avoid runtime errors.
Verification successful
Verify the format of the color JSON files.
The color JSON files are present. Now, we need to ensure they are correctly formatted to avoid runtime errors.
The color JSON files are present and correctly formatted.
The verification confirms that the necessary color JSON files exist and are correctly structured, ensuring no runtime errors related to these files.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for the presence of color JSON files. # Test: Search for the JSON files. Expect: At least one file should be found. fd 'colors.json'Length of output: 464
Script:
#!/bin/bash # Description: Validate the format of color JSON files. # Test: Validate each JSON file to ensure it is correctly formatted. fd 'colors.json' --exec jq empty {}Length of output: 37
lingua_franca/lang/parse_common.py (4)
23-23
: Dictionary for Roman numerals is correctly implemented.The
ROMAN_NUMERALS
dictionary correctly maps Roman numeral characters to their integer values.
471-480
: Function to convert Roman numerals to integers is correctly implemented.The
roman_to_int
function correctly converts Roman numerals to integers using a well-defined algorithm.
483-484
: Function to check if a word is a Roman numeral is correctly implemented.The
is_roman_numeral
function efficiently checks if all characters in a word are valid Roman numerals.
487-500
: Function to extract spans of Roman numerals from an utterance is correctly implemented.The
extract_roman_numeral_spans
function correctly identifies and converts spans of Roman numerals in an utterance using efficient tokenization and conversion methods.
def normalize_roman_numerals_en(utterance, ordinals=False): | ||
# localization might be needed for ordinals flag | ||
norm_utt = utterance | ||
for num, (start, end) in reversed(extract_roman_numeral_spans(utterance)): | ||
if ordinals: | ||
if str(num)[-1] == "1": | ||
num = f"{num}st" | ||
elif str(num)[-1] == "2": | ||
num = f"{num}nd" | ||
elif str(num)[-1] == "3": | ||
num = f"{num}rd" | ||
else: | ||
num = f"{num}th" | ||
norm_utt = norm_utt[:start] + f"{num}" + norm_utt[end:] | ||
return norm_utt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review of the new function normalize_roman_numerals_en
.
The function normalize_roman_numerals_en
is designed to normalize Roman numerals within English text. The implementation uses the extract_roman_numeral_spans
function to locate and replace Roman numeral spans with their numeric equivalents, optionally formatting them as ordinals.
-
Correctness and Logic:
- The function correctly handles the conversion of Roman numerals to numbers and adjusts them based on the
ordinals
flag. - The use of
reversed
ensures that replacements do not affect the indices of subsequent replacements, which is crucial for correctness.
- The function correctly handles the conversion of Roman numerals to numbers and adjusts them based on the
-
Performance:
- The function iterates over each numeral span once, which is efficient. However, the repeated slicing and concatenation of strings might be less efficient for very large strings. Consider using a list and joining it at the end for potentially better performance.
-
Readability and Maintainability:
- The function is well-structured and the logic is clear. Inline comments or a docstring explaining the parameters and the return value could enhance readability and maintainability.
-
Error Handling:
- There is no explicit error handling. It would be beneficial to add error handling to manage cases where
extract_roman_numeral_spans
might return unexpected values or fail.
- There is no explicit error handling. It would be beneficial to add error handling to manage cases where
-
Best Practices:
- Using f-strings for string operations is a good practice and is used here effectively.
Overall, the function meets the objectives of the PR and is implemented correctly, but there are opportunities for optimization and improved error handling.
def normalize_roman_numerals_en(utterance, ordinals=False):
# localization might be needed for ordinals flag
norm_utt = []
original_length = len(utterance)
last_end = 0
for num, (start, end) in reversed(extract_roman_numeral_spans(utterance)):
norm_utt.append(utterance[last_end:start]) # append the text before the numeral
if ordinals:
ordinal_suffix = 'th' if num[-1] not in '123' else {'1': 'st', '2': 'nd', '3': 'rd'}[num[-1]]
norm_utt.append(f"{num}{ordinal_suffix}")
else:
norm_utt.append(f"{num}")
last_end = end
norm_utt.append(utterance[last_end:original_length]) # append the remaining text after the last numeral
return ''.join(norm_utt) # join all parts to form the normalized utterance
def test_roman(self): | ||
# valid numerals | ||
self.assertEqual(roman_to_int("III"), 3) | ||
self.assertEqual(roman_to_int("IV"), 4) | ||
self.assertEqual(roman_to_int("V"), 5) | ||
self.assertEqual(roman_to_int("MCMLXXIV"), 1974) | ||
self.assertEqual(roman_to_int("MCMLXXV"), 1975) | ||
self.assertEqual(is_roman_numeral("IV"), True) | ||
|
||
# invalid numerals | ||
self.assertEqual(roman_to_int("v"), None) | ||
self.assertEqual(is_roman_numeral("ii"), False) | ||
self.assertEqual(is_roman_numeral("the IV century"), False) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comprehensive tests for Roman numeral functionalities.
The method effectively tests both valid and invalid inputs. Consider adding more edge cases, such as numerals at the boundary of valid ranges or mixed-case inputs.
Would you like me to help by adding more edge case tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JarbasAl, I've prepared some additional edge case tests for the Roman numeral functionalities. These tests cover boundary values, mixed-case inputs, and numerals embedded in text to ensure robustness. Here's the updated test code:
class TestParseCommon(unittest.TestCase):
def setUp(cls) -> None:
load_language("en")
def test_roman(self):
# valid numerals
self.assertEqual(roman_to_int("III"), 3)
self.assertEqual(roman_to_int("IV"), 4)
self.assertEqual(roman_to_int("V"), 5)
self.assertEqual(roman_to_int("MCMLXXIV"), 1974)
self.assertEqual(roman_to_int("MCMLXXV"), 1975)
self.assertEqual(is_roman_numeral("IV"), True)
# invalid numerals
self.assertEqual(roman_to_int("v"), None)
self.assertEqual(is_roman_numeral("ii"), False)
self.assertEqual(is_roman_numeral("the IV century"), False)
# edge cases
self.assertEqual(roman_to_int("MCMXCIX"), 1999) # Upper boundary of common usage
self.assertEqual(roman_to_int("I"), 1) # Lower boundary
self.assertEqual(roman_to_int("iV"), 4) # Mixed case
self.assertEqual(roman_to_int("xIv"), 14) # Mixed case
self.assertEqual(is_roman_numeral("MDCCLXXVI"), True) # Valid numeral in a string
self.assertEqual(is_roman_numeral("123IVabc"), False) # Invalid numeral in a string
Feel free to integrate these into your test suite or let me know if there are any other specific scenarios you'd like to cover!
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
port HelloChatterbox#28
Summary by CodeRabbit
New Features
Enhancements
normalize_roman_numerals
andget_color
functions.Dependencies
quebra_frases
package version specification (no functional change).