-
-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/roman numerals #40
Changes from all commits
d5e5cb6
f9f7019
5a98c82
09b5ea2
ba78601
f5f2606
7bd502d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,4 +2,4 @@ python-dateutil~=2.6 | |
rapidfuzz | ||
colour~=0.1 | ||
webcolors | ||
quebra_frases>=0.3.7 | ||
quebra_frases>=0.3.7 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,71 @@ | ||
# | ||
# Copyright 2019 Mycroft AI Inc. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
import unittest | ||
from lingua_franca import load_language | ||
from lingua_franca.parse import roman_to_int, is_roman_numeral,\ | ||
extract_roman_numeral_spans, normalize_roman_numerals | ||
|
||
|
||
class TestParseCommon(unittest.TestCase): | ||
def setUp(cls) -> None: | ||
load_language("en") | ||
|
||
def test_roman(self): | ||
# valid numerals | ||
self.assertEqual(roman_to_int("III"), 3) | ||
self.assertEqual(roman_to_int("IV"), 4) | ||
self.assertEqual(roman_to_int("V"), 5) | ||
self.assertEqual(roman_to_int("MCMLXXIV"), 1974) | ||
self.assertEqual(roman_to_int("MCMLXXV"), 1975) | ||
self.assertEqual(is_roman_numeral("IV"), True) | ||
|
||
# invalid numerals | ||
self.assertEqual(roman_to_int("v"), None) | ||
self.assertEqual(is_roman_numeral("ii"), False) | ||
self.assertEqual(is_roman_numeral("the IV century"), False) | ||
|
||
Comment on lines
+26
to
+39
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Comprehensive tests for Roman numeral functionalities. The method effectively tests both valid and invalid inputs. Consider adding more edge cases, such as numerals at the boundary of valid ranges or mixed-case inputs. Would you like me to help by adding more edge case tests? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @JarbasAl, I've prepared some additional edge case tests for the Roman numeral functionalities. These tests cover boundary values, mixed-case inputs, and numerals embedded in text to ensure robustness. Here's the updated test code: class TestParseCommon(unittest.TestCase):
def setUp(cls) -> None:
load_language("en")
def test_roman(self):
# valid numerals
self.assertEqual(roman_to_int("III"), 3)
self.assertEqual(roman_to_int("IV"), 4)
self.assertEqual(roman_to_int("V"), 5)
self.assertEqual(roman_to_int("MCMLXXIV"), 1974)
self.assertEqual(roman_to_int("MCMLXXV"), 1975)
self.assertEqual(is_roman_numeral("IV"), True)
# invalid numerals
self.assertEqual(roman_to_int("v"), None)
self.assertEqual(is_roman_numeral("ii"), False)
self.assertEqual(is_roman_numeral("the IV century"), False)
# edge cases
self.assertEqual(roman_to_int("MCMXCIX"), 1999) # Upper boundary of common usage
self.assertEqual(roman_to_int("I"), 1) # Lower boundary
self.assertEqual(roman_to_int("iV"), 4) # Mixed case
self.assertEqual(roman_to_int("xIv"), 14) # Mixed case
self.assertEqual(is_roman_numeral("MDCCLXXVI"), True) # Valid numeral in a string
self.assertEqual(is_roman_numeral("123IVabc"), False) # Invalid numeral in a string Feel free to integrate these into your test suite or let me know if there are any other specific scenarios you'd like to cover! |
||
# test spans | ||
self.assertEqual(extract_roman_numeral_spans("the IV century"), | ||
[(4, (4, 6))]) | ||
self.assertEqual(extract_roman_numeral_spans("the XIV century"), | ||
[(14, (4, 7))]) | ||
|
||
# test normalization | ||
self.assertEqual(normalize_roman_numerals("the XV century"), | ||
"the 15 century") | ||
|
||
# test ordinals | ||
self.assertEqual(normalize_roman_numerals("the XXI century", | ||
ordinals=True), | ||
"the 21st century") | ||
self.assertEqual(normalize_roman_numerals("the XII century", | ||
ordinals=True), | ||
"the 12th century") | ||
self.assertEqual(normalize_roman_numerals("the XXIII century", | ||
ordinals=True), | ||
"the 23rd century") | ||
self.assertEqual(normalize_roman_numerals("the XXIV century", | ||
ordinals=True), | ||
"the 24th century") | ||
|
||
# test space | ||
self.assertEqual(is_roman_numeral("I V"), False) | ||
self.assertEqual(normalize_roman_numerals("the X V century"), | ||
"the 10 5 century") | ||
self.assertEqual(extract_roman_numeral_spans("the X V century"), | ||
[(10, (4, 5)), | ||
(5, (6, 7))]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review of the new function
normalize_roman_numerals_en
.The function
normalize_roman_numerals_en
is designed to normalize Roman numerals within English text. The implementation uses theextract_roman_numeral_spans
function to locate and replace Roman numeral spans with their numeric equivalents, optionally formatting them as ordinals.Correctness and Logic:
ordinals
flag.reversed
ensures that replacements do not affect the indices of subsequent replacements, which is crucial for correctness.Performance:
Readability and Maintainability:
Error Handling:
extract_roman_numeral_spans
might return unexpected values or fail.Best Practices:
Overall, the function meets the objectives of the PR and is implemented correctly, but there are opportunities for optimization and improved error handling.