-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/decimal markers #43
Conversation
Codecov Report
@@ Coverage Diff @@
## dev #43 +/- ##
=====================================
Coverage ? 0.00%
=====================================
Files ? 65
Lines ? 16459
Branches ? 0
=====================================
Hits ? 0
Misses ? 16459
Partials ? 0 Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
ffa185e
to
6626587
Compare
rebase of MycroftAI#69
6626587
to
811550c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple test cases to add
@@ -290,6 +290,17 @@ def test_combinations(self): | |||
|
|||
|
|||
class TestExtractNumber(unittest.TestCase): | |||
def test_extract_number_decimal_markers(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should include a test case for 1,000,000
and 1,000.0
(or similar). Maybe also "1000, is what I said"
and/or "I said 1,000."
to include boundary cases
@coderabbitai review |
Actions performedReview triggered.
|
Warning Rate limit exceeded@JarbasAl has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 17 minutes and 3 seconds before requesting another review. How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. WalkthroughThe update introduces a Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Outside diff range, codebase verification and nitpick comments (11)
lingua_franca/parse.py (1)
Line range hint
59-78
: Parameter Addition inextract_numbers
FunctionThe addition of the
decimal
parameter to theextract_numbers
function enhances flexibility by allowing different decimal characters. However, the documentation states that it will always extract numbers formatted with a decimal dot/full stop. This could be confusing for developers expecting the function to handle different decimal formats based on thedecimal
parameter.Consider clarifying the documentation or adjusting the implementation to respect the
decimal
parameter when extracting numbers.lingua_franca/lang/parse_eu.py (1)
Line range hint
287-307
: Enhancedextract_numbers_eu
function to support different decimal separators.The addition of the
decimal
parameter toextract_numbers_eu
allows for flexibility in handling different decimal characters, which is essential for internationalization. The implementation usesnormalize_decimals
to adjust the input text accordingly, which aligns well with the intended functionality.However, there's a potential issue here:
- The comment on line 303 mentions that numbers will always be extracted with a decimal dot, even if another decimal character is specified. This could lead to confusion or incorrect behavior if not clearly documented or handled in the function.
Consider clarifying this behavior in the documentation or adjusting the implementation to respect the
decimal
parameter fully.- will always extract numbers formatted with a decimal dot/full stop, + will attempt to normalize numbers to use the specified decimal character,Tools
Ruff
308-308:
extract_numbers_generic
may be undefined, or defined from star imports(F405)
lingua_franca/lang/parse_it.py (2)
Line range hint
228-249
: Review ofextract_number_it
function with newdecimal
parameter
Correctness and Consistency: The function now accepts a
decimal
parameter that allows for different decimal characters. This is consistent with the PR's objective to handle different decimal markers.Documentation: The function's documentation has been updated to reflect the new parameter. However, the note in lines 244-245 might be misleading as it implies that the function will always use a dot as a decimal separator regardless of the
decimal
parameter. This could confuse users.Implementation: The implementation uses the
normalize_decimals
function to replace the specified decimal character with a dot before processing. This approach ensures that the existing logic for number extraction does not need to be altered significantly.Suggestions:
- Clarify the documentation to ensure it accurately describes the behavior when a different decimal character is specified.
- Add a unit test to verify that the function behaves as expected when a non-dot decimal character is used.
- will always extract numbers formatted with a decimal dot/full stop, - such as '3.5', even if 'decimal' is specified. + will extract numbers formatted with the specified decimal character, + converting them internally to use a decimal dot for processing.Would you like me to help by adding a unit test for this scenario?
Line range hint
1158-1178
: Review ofextract_numbers_it
function with newdecimal
parameter
Correctness and Consistency: The function now accepts a
decimal
parameter, similar to theextract_number_it
function. This maintains consistency across the API.Documentation: Similar to
extract_number_it
, the documentation here might mislead users about the handling of the decimal parameter.Implementation: The usage of
normalize_decimals
is appropriate and ensures that the function can handle various decimal characters seamlessly.Suggestions:
- Update the documentation to accurately reflect the handling of the decimal parameter.
- Consider adding unit tests to cover scenarios with different decimal characters.
- will always extract numbers formatted with a decimal dot/full stop, - such as '3.5', even if 'decimal' is specified. + will extract numbers formatted with the specified decimal character, + converting them internally to use a decimal dot for processing.Would you like assistance in creating unit tests for these scenarios?
lingua_franca/lang/parse_nl.py (1)
Ensure all calls to
extract_number_nl
include thedecimal
parameterThe function
extract_number_nl
has been updated to handle a custom decimal marker, but the existing calls to this function do not pass thedecimal
parameter. This could lead to inconsistencies or bugs if thedecimal
parameter is required in those cases.
- File:
lingua_franca/lang/parse_nl.py
- Line: 426-442 (function definition)
- Line: 439 (function call without
decimal
parameter)Please verify and update the function calls to include the
decimal
parameter where necessary.Analysis chain
Line range hint
418-439
: Updatedextract_number_nl
function to handle custom decimal markers.The addition of the
decimal
parameter allows for flexible handling of decimal markers, which is a key objective of this PR. The use ofnormalize_decimals
to adjust the text based on thedecimal
parameter is correctly implemented.However, ensure that all calls to
extract_number_nl
in other parts of the codebase pass the correctdecimal
parameter if needed.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify all function calls to `extract_number_nl` pass the correct `decimal` parameter. # Test: Search for the function usage. Expect: Only occurances of the new signature with the `decimal` parameter. rg --type python -A 5 $'extract_number_nl'Length of output: 72
Script:
#!/bin/bash # Description: Verify all function calls to `extract_number_nl` pass the correct `decimal` parameter. # Test: Search for the function usage. Expect: Only occurrences of the new signature with the `decimal` parameter. rg --type py -A 5 $'extract_number_nl'Length of output: 902
lingua_franca/lang/parse_cs.py (1)
26-26
: Ensure consistent import style.The import statement for
normalize_decimals
is added separately from the other imports from the same module (parse_common
). It's generally a good practice to consolidate imports from the same module into a single line for clarity and maintainability.- from lingua_franca.lang.parse_common import normalize_decimals + from lingua_franca.lang.parse_common import is_numeric, look_for_fractions, invert_dict, ReplaceableNumber, partition_list, tokenize, Token, Normalizer, normalize_decimalslingua_franca/lang/parse_en.py (2)
Line range hint
533-553
: Review ofextract_number_en
function with addeddecimal
parameterThe function
extract_number_en
has been updated to include adecimal
parameter, allowing for the specification of different decimal markers, which is a significant enhancement for multi-language support. The default value fordecimal
is set to '.', which maintains backward compatibility.Code Correctness and Logic
- The logic to replace the decimal marker using
normalize_decimals
function before extracting numbers is correctly implemented. This ensures that numbers are consistently formatted before processing, which is crucial for accurate extraction.Documentation and Comments
- The function's documentation has been updated to reflect the new parameter and its default value. Additionally, the note explaining the behavior with the default decimal marker is clear and informative.
Potential Issue:
- The function assumes that the
decimal
parameter will always be a single character. It might be beneficial to add a validation check to ensure that thedecimal
parameter is a single character to prevent unexpected behavior.+ if len(decimal) != 1: + raise ValueError("Decimal marker must be a single character.")Also applies to: 554-554
Line range hint
1665-1683
: Review ofextract_numbers_en
function with addeddecimal
parameterThe function
extract_numbers_en
has been similarly updated to handle thedecimal
parameter. This allows for consistent handling across singular and plural number extractions, which is important for maintaining uniform functionality across the API.Code Correctness and Logic
- The implementation for handling the
decimal
parameter is consistent withextract_number_en
, using thenormalize_decimals
function to standardize the decimal marker before extraction. This consistency in handling across related functions is good practice.Documentation and Comments
- The documentation clearly states the purpose of the
decimal
parameter and its default behavior. The note added provides clarity on how numbers with a decimal point are handled, which is helpful for users of the function.Potential Improvement:
- Similar to
extract_number_en
, a validation for thedecimal
parameter to ensure it is a single character would enhance robustness and prevent errors.+ if len(decimal) != 1: + raise ValueError("Decimal marker must be a single character.")Also applies to: 1684-1684
lingua_franca/lang/parse_pt.py (3)
Line range hint
205-268
: Review ofPortugueseNormalizer
class
- Tokenization Logic: The tokenization methods handle various cases like percentages, hashtags, and hyphenated words. This is important for ensuring that the input text is correctly split into tokens that can be processed further.
- Regex Usage: The use of regular expressions to split specific patterns is efficient. However, ensure that these patterns cover all expected formats (e.g., different uses of hyphens and numbers).
- Error Handling: There is no explicit error handling in the tokenization process. It might be beneficial to add error handling to manage unexpected input formats or regex match failures.
Consider adding more robust error handling mechanisms to improve the resilience of the tokenization process.
Line range hint
270-274
: Review ofnormalize_pt
functionThe function
normalize_pt
provides a straightforward way to normalize Portuguese text by removing articles. This is essential for consistent text processing. However, it might be beneficial to expand the synonyms handling to cover more cases or make it configurable, which could enhance the function's flexibility and applicability in different contexts.
Line range hint
276-657
: Review ofextract_datetime_pt
functionThe
extract_datetime_pt
function is complex and handles a variety of date and time formats. The logic appears comprehensive, but it's quite lengthy and could benefit from being broken down into smaller, more manageable functions. This would improve readability and maintainability.Consider refactoring this function to separate concerns, such as handling different types of date parts (e.g., days, months, years) into separate helper functions or classes.
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (17)
- lingua_franca/lang/parse_common.py (1 hunks)
- lingua_franca/lang/parse_cs.py (5 hunks)
- lingua_franca/lang/parse_da.py (3 hunks)
- lingua_franca/lang/parse_de.py (4 hunks)
- lingua_franca/lang/parse_en.py (5 hunks)
- lingua_franca/lang/parse_es.py (4 hunks)
- lingua_franca/lang/parse_eu.py (3 hunks)
- lingua_franca/lang/parse_fa.py (5 hunks)
- lingua_franca/lang/parse_fr.py (4 hunks)
- lingua_franca/lang/parse_it.py (5 hunks)
- lingua_franca/lang/parse_nl.py (5 hunks)
- lingua_franca/lang/parse_pl.py (5 hunks)
- lingua_franca/lang/parse_pt.py (2 hunks)
- lingua_franca/lang/parse_ru.py (3 hunks)
- lingua_franca/lang/parse_sv.py (2 hunks)
- lingua_franca/parse.py (3 hunks)
- test/unittests/test_parse_en.py (1 hunks)
Additional comments not posted (25)
lingua_franca/parse.py (1)
Line range hint
83-102
: Parameter Addition inextract_number
FunctionSimilar to the
extract_numbers
function,extract_number
now accepts adecimal
parameter. This change is consistent across both functions, maintaining uniformity in handling decimal characters. The documentation and implementation should ensure that the behavior matches the expectations set by the new parameter.The change is consistent and well-documented, but ensure that the actual behavior of handling different decimal formats is tested thoroughly.
lingua_franca/lang/parse_fa.py (2)
Line range hint
311-327
: Updated Function Signature and Logic inextract_numbers_fa
The
extract_numbers_fa
function now includes adecimal
parameter, allowing for the specification of different decimal characters. The function usesnormalize_decimals
to adjust the input text accordingly, which is a good approach for handling diverse input formats.The implementation aligns with the intended functionality of supporting multiple decimal characters. Ensure comprehensive testing to verify the correct normalization and extraction of numbers across different formats.
Line range hint
341-358
: Handling of Decimal Character inextract_number_fa
The
extract_number_fa
function has been updated similarly toextract_numbers_fa
, including the use of thedecimal
parameter and the normalization logic. This consistency across functions is beneficial for maintainability and understanding of the code.The changes are well-implemented and documented. Testing should focus on ensuring that the normalization process does not introduce any errors in number extraction.
lingua_franca/lang/parse_common.py (2)
195-204
: New Function:normalize_decimals
The
normalize_decimals
function is a critical addition that supports the uniform handling of decimal characters across different languages. The use of regex to replace the specified decimal character with a dot is a straightforward and effective approach.This function is essential for the new functionality introduced in the number extraction functions. It is well-implemented and should work as expected with various decimal characters.
206-206
: Modification inmatch_yes_or_no
FunctionThe
match_yes_or_no
function now dynamically resolves the resource file based on the language code provided. This change improves the function's flexibility and its ability to handle language-specific nuances in user input.The modification enhances the function's utility in a multilingual context. It is a positive change that aligns with the overall goals of supporting multiple languages more effectively.
lingua_franca/lang/parse_da.py (1)
Line range hint
882-898
: Updated Functionality inextract_numbers_da
The
extract_numbers_da
function now accepts adecimal
parameter, which is used to specify the character to use as the decimal point. This change increases the flexibility of the function and aligns it with similar updates in other language modules.The implementation is consistent and well-documented. Testing should ensure that the normalization process correctly handles different decimal characters without affecting the accuracy of number extraction.
lingua_franca/lang/parse_sv.py (2)
20-20
: Added import fornormalize_decimals
.The addition of
normalize_decimals
fromparse_common
is necessary for the new functionality introduced inextract_number_sv
to handle different decimal characters. This change is appropriate and aligns with the PR's objectives.
160-176
: Review ofextract_number_sv
function changes.
- Parameter Addition: The addition of the
decimal
parameter with a default value of'.'
is correctly implemented. This allows for flexibility in handling different decimal characters based on the user's locale.- Decimal Normalization: The conditional use of
normalize_decimals
based on thedecimal
parameter is logical. It ensures that the input text is normalized only when necessary, which is efficient.Overall, the changes are well-implemented and serve the intended purpose of enhancing number extraction capabilities across different locales.
lingua_franca/lang/parse_de.py (3)
24-24
: Added import fornormalize_decimals
.The import of
normalize_decimals
is crucial for the new functionality in handling different decimal characters in the German locale. This change is necessary and well-integrated into the existing codebase.
147-171
: Review ofextract_number_de
function changes.
- Parameter Addition: The introduction of the
decimal
parameter with a default value of'.'
is correctly implemented. This change allows for handling different decimal characters, enhancing the function's usability in various locales.- Decimal Normalization: The use of
normalize_decimals
to adjust the input text based on thedecimal
parameter is a smart addition. It ensures that the function can accurately process numbers formatted with different decimal characters.These changes are well-executed and align with the PR's objectives to enhance number extraction capabilities.
Line range hint
1018-1034
: Review ofextract_numbers_de
function changes.The addition of the
decimal
parameter toextract_numbers_de
is consistent with the changes made inextract_number_de
. This ensures uniformity across functions that handle number extraction. The implementation is straightforward and correctly usesnormalize_decials
when thedecimal
is not the default.This change is well-implemented and supports the overall goal of the PR to handle numbers more flexibly across different locales.
lingua_franca/lang/parse_eu.py (1)
26-26
: Import ofnormalize_decimals
function.The import of
normalize_decimals
fromparse_common
is necessary for the new functionality inextract_numbers_eu
to handle different decimal separators. This is a good addition for supporting internationalization.lingua_franca/lang/parse_fr.py (2)
26-26
: Import ofnormalize_decimals
function.Just like in
parse_eu.py
, the import ofnormalize_decimals
fromparse_common
is necessary for the new functionality inextract_number_fr
andextract_numbers_fr
to handle different decimal separators. Consistent use of this function across different language modules is a good practice.
Line range hint
1086-1106
: Updatedextract_numbers_fr
function to support different decimal separators.This function has been updated similarly to
extract_number_fr
, adding adecimal
parameter to handle different decimal characters. The implementation and documentation should ensure that the behavior is consistent and clear to the users.As with the single number extraction function, ensure the documentation accurately reflects how the
decimal
parameter influences the behavior.lingua_franca/lang/parse_es.py (1)
Line range hint
284-304
: Review ofextract_numbers_es
function signature and decimal handlingThe function
extract_numbers_es
has been appropriately updated to include adecimal
parameter. This change aligns with the updates toextract_number_es
and supports consistency across the API. The delegation toextract_numbers_generic
is a good practice, assuming that function is properly handling thedecimal
parameter as well.It's important to ensure that
extract_numbers_generic
is also updated to respect the newdecimal
parameter to maintain consistency and correctness across the API.Verification successful
Line range hint
284-304
: Review ofextract_numbers_es
function signature and decimal handlingThe function
extract_numbers_es
has been appropriately updated to include adecimal
parameter. This change aligns with the updates toextract_number_es
and supports consistency across the API. Thenormalize_decimals
function ensures that thedecimal
parameter is handled correctly before callingextract_numbers_generic
.
- The
normalize_decimals
function inlingua_franca/lang/parse_common.py
replaces the specifieddecimal
character with a period, ensuring correct number extraction.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the implementation of `extract_numbers_generic` to ensure it handles the `decimal` parameter correctly. # Test: Search for the function definition and usage. Expect: Proper handling of the `decimal` parameter. rg --type python -A 5 $'extract_numbers_generic'Length of output: 78
Script:
#!/bin/bash # Description: Verify the implementation of `extract_numbers_generic` to ensure it handles the `decimal` parameter correctly. # Test: Search for the function definition and usage. Expect: Proper handling of the `decimal` parameter. rg -A 5 'def extract_numbers_generic'Length of output: 517
Script:
#!/bin/bash # Description: Verify how `extract_numbers_es` handles the `decimal` parameter before calling `extract_numbers_generic`. # Test: Search for the usage of `extract_numbers_generic` within `extract_numbers_es` and intermediary functions. rg -A 5 'extract_numbers_generic' lingua_franca/lang/parse_es.pyLength of output: 307
Script:
#!/bin/bash # Description: Verify the implementation of `normalize_decimals` to ensure it handles the `decimal` parameter correctly. # Test: Search for the function definition of `normalize_decimals`. rg -A 5 'def normalize_decimals'Length of output: 470
Tools
Ruff
305-305:
extract_numbers_generic
may be undefined, or defined from star imports(F405)
lingua_franca/lang/parse_nl.py (2)
28-28
: Import statement added fornormalize_decimals
.The addition of
normalize_decimals
fromlingua_franca.lang.parse_common
is consistent with the PR's goal to handle different decimal markers in number extraction functions.
Line range hint
1305-1324
: Updatedextract_numbers_nl
function to handle custom decimal markers.The function now correctly handles the
decimal
parameter to allow for flexible decimal markers, aligning with the PR objectives. The implementation ofnormalize_decimals
to adjust the text based on thedecimal
parameter is consistent and appropriate.Ensure that all calls to
extract_numbers_nl
in other parts of the codebase pass the correctdecimal
parameter if needed.Verification successful
No calls to
extract_numbers_nl
found in the codebase.The
extract_numbers_nl
function is defined correctly with the newdecimal
parameter. Since no calls to this function were found in the codebase, there are no further updates required.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify all function calls to `extract_numbers_nl` pass the correct `decimal` parameter. # Test: Search for the function usage. Expect: Only occurances of the new signature with the `decimal` parameter. rg --type python -A 5 $'extract_numbers_nl'Length of output: 73
Script:
#!/bin/bash # Description: Verify all function calls to `extract_numbers_nl` pass the correct `decimal` parameter. # Test: Search for the function usage. Expect: Only occurrences of the new signature with the `decimal` parameter. rg --type py -A 5 $'extract_numbers_nl'Length of output: 497
Script:
#!/bin/bash # Description: Search for any calls to the `extract_numbers_nl` function across the entire codebase. rg 'extract_numbers_nl\('Length of output: 135
lingua_franca/lang/parse_pl.py (2)
Line range hint
580-601
: Review ofextract_number_pl
function:
- Parameter Addition: The addition of the
decimal
parameter with a default value of'.'
is correctly implemented. This aligns with the PR's goal to handle different decimal markers.- Logic to Handle Decimal Marker: The logic to replace the decimal marker using
normalize_decimals
when it is not'.'
is sound. This ensures that the function can handle various decimal formats as intended.- Documentation: The function's documentation has been updated to reflect the new parameter and its behavior, which is good for maintainability.
Overall, the changes are well-implemented and serve the intended purpose.
Line range hint
1343-1363
: Review ofextract_numbers_pl
function:
- Parameter Addition: Similar to
extract_number_pl
, thedecimal
parameter has been added effectively with a default value of'.'
.- Handling of Decimal Marker: The function uses the
normalize_decimals
method to handle different decimal markers, which is consistent with the changes inextract_number_pl
.- Return Type: The function now returns a list of floats, ensuring that all numbers extracted are in a consistent format, which enhances the function's usability.
These changes are consistent with the PR's objectives and are correctly implemented.
lingua_franca/lang/parse_cs.py (2)
Line range hint
582-603
: Review the implementation ofextract_number_cs
with the newdecimal
parameter.The function
extract_number_cs
has been modified to accept a new parameterdecimal
, which specifies the character to use as a decimal point. The function usesnormalize_decimals
to replace the specified decimal character with a dot ('.') before proceeding with number extraction. This is a crucial change as it allows the function to handle different decimal separators used in various locales, which is essential for internationalization.
- Correctness: The implementation correctly checks if the
decimal
parameter is different from '.' before applying normalization. This avoids unnecessary processing if the default is used.- Performance: Consider the impact of repeatedly calling
normalize_decimals
on large texts or in tight loops. If performance becomes an issue, optimizations or caching mechanisms might be needed.- Maintainability: The code is clear and modular, making it easy to understand and maintain.
Line range hint
1569-1589
: Review the implementation ofextract_numbers_cs
with the newdecimal
parameter.Similar to
extract_number_cs
, theextract_numbers_cs
function now supports adecimal
parameter to handle different decimal separators. This function extracts all numbers from the input text, normalizing them to use a dot as the decimal separator.
- Correctness: The function handles the normalization of decimal separators correctly, ensuring that all extracted numbers are returned in a consistent format.
- Performance: As with
extract_number_cs
, consider the performance implications of the normalization process.- Maintainability: The function maintains a consistent style with other parts of the module, using similar patterns and logic.
lingua_franca/lang/parse_ru.py (2)
31-31
: Import addition:normalize_decimals
The addition of
normalize_decimals
fromparse_common
is appropriate given its usage in the modifiedextract_numbers_ru
function to handle different decimal markers.
Line range hint
1581-1601
: Enhancement inextract_numbers_ru
for flexible decimal handlingThe addition of the
decimal
parameter toextract_numbers_ru
enhances flexibility by allowing different decimal markers. This is particularly useful for internationalization, where different locales use different characters for decimals.
- Correctness: The function now uses
normalize_decimals
to adjust the text based on the provideddecimal
parameter before extracting numbers. This ensures that the function can adapt to various formats of decimal numbers across different languages.- Documentation: The function's documentation has been updated accordingly, which is good practice as it maintains clarity on how the function behaves with the new parameter.
- Potential Issue: The note in the documentation states that numbers formatted with a decimal dot/full stop are always extracted, regardless of the
decimal
parameter. This could be misleading if the function behavior actually respects thedecimal
parameter. It's important to ensure that this behavior is either correctly documented or adjusted to match the documentation.- will always extract numbers formatted with a decimal dot/full stop, - such as '3.5', even if 'decimal' is specified. + will extract numbers formatted according to the 'decimal' parameter.lingua_franca/lang/parse_en.py (1)
32-32
: Added import statement fornormalize_decimals
The addition of
normalize_decimals
fromlingua_franca.lang.parse_common
is appropriate given its usage in theextract_number_en
andextract_numbers_en
functions to handle different decimal characters. This aligns with the PR's goal of handling decimal markers effectively.lingua_franca/lang/parse_pt.py (1)
81-102
: Review ofextract_number_pt
function changes
- Parameter Addition: The addition of the
decimal
parameter is consistent with the PR's goal of handling different decimal markers. This is a good enhancement for supporting multiple locales.- Documentation Update: The updated docstring clearly explains the new parameter, which is crucial for maintainability and usability.
- Functionality: The use of
normalize_decimals
function whendecimal
is not the default value is a logical approach to ensure that the input text is correctly formatted before number extraction. This should help in handling cases where different regions use different decimal markers.Overall, the changes to this function are well-implemented and align with the objectives of the PR.
def extract_number_es(text, short_scale=True, ordinals=False, decimal='.'): | ||
""" | ||
This function prepares the given text for parsing by making | ||
numbers consistent, getting rid of contractions, etc. | ||
This function extracts a number from a text string, | ||
handles pronunciations in long scale and short scale | ||
|
||
https://en.wikipedia.org/wiki/Names_of_large_numbers | ||
|
||
Args: | ||
text (str): the string to normalize | ||
short_scale (bool): use short scale if True, long scale if False | ||
ordinals (bool): consider ordinal numbers, third=3 instead of 1/3 | ||
decimal (str): character to use as decimal point. defaults to '.' | ||
Returns: | ||
(int) or (float): The value of extracted number | ||
(int) or (float) or False: The extracted number or False if no number | ||
was found | ||
Note: | ||
will always extract numbers formatted with a decimal dot/full stop, | ||
such as '3.5', even if 'decimal' is specified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review of extract_number_es
function signature and decimal handling
The function now correctly includes a decimal
parameter allowing for flexible decimal character specification. The implementation of normalize_decimals
based on this parameter is a good approach to handle locale-specific decimal characters.
However, the note in the documentation about always extracting numbers formatted with a decimal dot/full stop might be confusing. It suggests that despite the decimal
parameter, the function may not respect it under certain conditions. This could benefit from further clarification to ensure users understand the behavior fully.
def extract_number_fr(text, short_scale=True, ordinals=False, decimal='.'): | ||
""" | ||
This function extracts a number from a text string, | ||
handles pronunciations in long scale and short scale | ||
|
||
https://en.wikipedia.org/wiki/Names_of_large_numbers | ||
|
||
Args: | ||
text (str): the string to extract a number from | ||
text (str): the string to normalize | ||
short_scale (bool): use short scale if True, long scale if False | ||
ordinals (bool): consider ordinal numbers, third=3 instead of 1/3 | ||
decimal (str): character to use as decimal point. defaults to '.' | ||
Returns: | ||
(str): The number extracted or the original text. | ||
(int) or (float) or False: The extracted number or False if no number | ||
was found | ||
Note: | ||
will always extract numbers formatted with a decimal dot/full stop, | ||
such as '3.5', even if 'decimal' is specified. | ||
|
||
""" | ||
if decimal != '.': | ||
text = normalize_decimals(text, decimal) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated extract_number_fr
function to support different decimal separators.
The function now accepts a decimal
parameter, which allows for handling different decimal characters. This change is crucial for supporting number extraction in French text that might use commas or other characters as decimal separators.
However, similar to parse_eu.py
, there's a potential issue:
- The comment on line 389 states that numbers will always be extracted with a decimal dot, regardless of the
decimal
parameter. This could lead to incorrect behavior if the user expects the function to handle different decimal characters as specified.
Consider revising the documentation or the implementation to ensure that the function respects the decimal
parameter fully.
- will always extract numbers formatted with a decimal dot/full stop,
+ will attempt to normalize numbers to use the specified decimal character,
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def extract_number_fr(text, short_scale=True, ordinals=False, decimal='.'): | |
""" | |
This function extracts a number from a text string, | |
handles pronunciations in long scale and short scale | |
https://en.wikipedia.org/wiki/Names_of_large_numbers | |
Args: | |
text (str): the string to extract a number from | |
text (str): the string to normalize | |
short_scale (bool): use short scale if True, long scale if False | |
ordinals (bool): consider ordinal numbers, third=3 instead of 1/3 | |
decimal (str): character to use as decimal point. defaults to '.' | |
Returns: | |
(str): The number extracted or the original text. | |
(int) or (float) or False: The extracted number or False if no number | |
was found | |
Note: | |
will always extract numbers formatted with a decimal dot/full stop, | |
such as '3.5', even if 'decimal' is specified. | |
""" | |
if decimal != '.': | |
text = normalize_decimals(text, decimal) | |
will attempt to normalize numbers to use the specified decimal character, |
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
port MycroftAI#69
Summary by CodeRabbit
New Features
normalize_decimals
function for handling different decimal characters in number extraction.Tests