fix #88 and #93: multiple decimal places and multiple decimal numbers #91
base: master
found bug: multiple instances of "point" or "dot" in the input string confuse the parser.
* extract_number(), extract_numbers(), and all helper functions gain a keyword parameter `decimal_places` (for helpers, just `places`) which does what it sounds like, using builtin round().
* avoid capturing non-adjacent numbers as decimal places
* avoid capturing already-used decimal places as separate numbers in extract_numbers()
* add a few tests for the above
Hey Chance, loving all your work on this stuff.
A few questions and suggestions here, but push back if you disagree :)
if "." not in str(decimal.text):
    return number.value + float('0.' + str(decimal.value)), \
        number.tokens + partitions[1] + decimal.tokens
if "." not in str(numbers2[0].text):
Should this be updated to check for the new `_DECIMAL_MARKER`?
By this point, `_DECIMAL_MARKER` has done its job. This bit is easier to look at in a debugger than it is to explain, but `numbers2` is a list of `ReplaceableNumber` objects corresponding to the digits of our decimal part. It's going to be joined, then appended to a decimal point, then cast to a float and summed with the whole number part of our result.
The `if` statement here goes back to Core, and it makes sure the first element of `numbers2` doesn't already have a decimal point. It's hypothetically possible, but I can't figure out how, and removing the `if` clause doesn't break any test cases. Still, I left it there, just removed a superfluous variable is all.
I'll see if I can work backwards through `git blame` and ask whoever wrote the `if` statement. I was assuming it had something to do with the way the tokenizer handles certain input. If it's just a relic, I'm all for removing line 318 entirely.
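The join-then-sum step described in this comment can be sketched roughly as follows. This is only an illustration, not the code in `parse_en.py`: `Digit` is a hypothetical stand-in for `ReplaceableNumber` that models nothing but a `value` field, and `combine_decimal` is an invented name.

```python
from collections import namedtuple

# Hypothetical stand-in for ReplaceableNumber; only `value` is modelled.
Digit = namedtuple("Digit", ["value"])


def combine_decimal(whole, digits):
    """Join the decimal digits, prefix them with "0.", cast the result
    to float, and add it to the whole-number part."""
    fraction = float("0." + "".join(str(d.value) for d in digits))
    return whole + fraction


# "two point two five": whole part 2, decimal digits [2, 5]
result = combine_decimal(2, [Digit(2), Digit(5)])
```

Joining the digits as strings before the cast keeps each spoken digit in its own decimal position, which is presumably why the digits are kept as a list until this point.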
If it doesn't break test cases, I say let's remove it or find a test case for it.
lingua_franca/lang/parse_en.py
Outdated
if return_value == int(return_value):
    return_value = int(return_value)
If I was calling an `extract_decimal` function I'd expect a float to be returned. What's the benefit of returning an integer instead?
Should be `and not places`, good catch! That is, this is fundamentally a helper invoked by `extract_number()`. Unless the user has asked for decimal places, I'd figured no rounding means a whole number is just that. On the other hand, you're right, that doesn't work either: "one" is a whole number. "one point zero" equals a whole number.
What if `decimal_places=None` does no rounding, `decimal_places=0` always returns an int, and `decimal_places > 0` does what it does?
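A minimal sketch of the semantics proposed in this comment, for discussion only. `apply_decimal_places` is a hypothetical helper, not a function in lingua_franca, and it assumes that "always returns an int" means ordinary rounding with the builtin `round()` rather than truncation.

```python
def apply_decimal_places(value, decimal_places=None):
    """Hypothetical helper illustrating the proposed behaviour."""
    if decimal_places is None:
        # No rounding: hand the value back untouched.
        return value
    if decimal_places == 0:
        # Assumption: "always returns an int" means round to nearest.
        # round() with a single argument already returns an int.
        return round(value)
    # Positive values round to that many places and return a float.
    return round(value, decimal_places)
```

Under this reading, "one point zero" stays the float `1.0` unless the caller explicitly asks for `decimal_places=0`.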
👍 Yeah that sounds intuitive to me
- more visible funcs now have more explicitly-named parameter for places
- `extract_number(decimal_places=...)` now has several options:
  - `decimal_places=n` will round to `n` places
  - `decimal_places=0` will round up to nearest int, equiv. ceil(result)
  - `decimal_places=-1` will round down to int, equiv. floor(result)
- expanded comments and docstrings
- remove old commented-out code
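The options listed in this commit message could look roughly like the sketch below. `round_result` is a hypothetical name chosen for illustration, and treating `None` as "no rounding" is an assumption; the actual helper in the PR may be structured differently.

```python
import math


def round_result(result, decimal_places=None):
    """Hypothetical dispatcher for the options in the commit message."""
    if decimal_places is None:
        # Assumed: None means no rounding at all.
        return result
    if decimal_places == 0:
        # Round up to the nearest int.
        return math.ceil(result)
    if decimal_places == -1:
        # Round down to the nearest int.
        return math.floor(result)
    # Any positive n rounds to n places with the builtin round().
    return round(result, decimal_places)
```

One consequence of this design is that `0` and `-1` act as mode switches rather than place counts, which differs from how the builtin `round()` treats its `ndigits` argument.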
@@ -1487,11 +1542,12 @@ def extract_numbers_en(text, short_scale=True, ordinals=False):
is now common in most English speaking countries.
See https://en.wikipedia.org/wiki/Names_of_large_numbers
ordinals (bool): consider ordinal numbers, e.g. third=3 instead of 1/3
decimal_places (int or False): rounds to # decimal places. uses builtin round()
One more int or False
lingua_franca/parse.py
Outdated
decimal_places (int or None): Positive value will round to X places.
    Val of 0 will round up to nearest int,
    equivalent to `math.ceil(result)`
    Val of -1 will round down to nearest int,
    equivalent to `math.floor(result)`
    Val of None will perform no rounding,
    potentially returning a very long string.
Was this a previous idea for implementation? I'm assuming this will become the new behaviour you described, where 0 is a regular rounding and there is no handling of a `-1` input?
At this point, I can't decide which way is better. There are three things at play:
- If these functions don't round, their output could be a string of stupendous length
- You don't always want to round the same way
- Sometimes you feel like an int, sometimes you don't
The status box on this PR seems to be broken. If I click on Travis, it shows all the builds completed successfully.
It happens sometimes with the checks: if there's a communication error on the webhook, the status update can be lost.
- extract_number(), extract_numbers(), and all helper functions gain a keyword parameter `decimal_places` (for helpers, just `places`) which does what it sounds like, using builtin round().
- avoid capturing non-adjacent numbers as decimal places
- avoid capturing already-used decimal places as separate numbers in extract_numbers()
- add a few tests for the above

closes #88, #93