Add logic to normalize comma-delimited decimals #69

ChanceNCounter · 2020-01-11T05:31:51Z

Closes #65

All languages, including English, will now normalize 34,5 to 34.5 before beginning to extract numbers.

Decimal markers can now be specified through extract_number() and extract_numbers() function calls, using a new parameter decimal='.'

Note that these functions will only normalize decimals if they are called as such. Individual parsers, such as extractnumber_es(), have not been modified in any way, but will produce correct output when called via extract_number(lang='es', decimal=',')
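To illustrate the wrapper-level behavior described above, here is a minimal, self-contained sketch. It is not the code in this PR: `extract_number_sketch` and `_parse_plain_number` are hypothetical names, and the stand-in parser is a placeholder for a language-specific parser that only understands '.' as the decimal marker.

```python
import re


def _parse_plain_number(text):
    """Hypothetical stand-in for a per-language parser that only knows '.'."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group(0)) if match else False


def extract_number_sketch(text, decimal="."):
    """Normalize the caller-supplied decimal marker, then delegate.

    The per-language parser is left untouched; only the wrapper
    rewrites e.g. '34,5' to '34.5' before handing the text over.
    """
    if decimal != ".":
        pattern = r"\b(\d+)" + re.escape(decimal) + r"(\d+)\b"
        text = re.sub(pattern, r"\1.\2", text)
    return _parse_plain_number(text)


print(extract_number_sketch("cuesta 34,5 euros", decimal=","))  # normalized first
print(extract_number_sketch("cuesta 34,5 euros"))  # no normalization requested
```

Calling the stand-in parser directly on "34,5" would only see 34, which is why the normalization has to happen in the wrapper when `decimal=','` is passed.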

For those who don't speak regex (though I encourage you to run the pattern through a regex tester), it means this:

\b\d+,{1}\d+\b:
\b a word boundary
\d+ one or more digits, followed by
,{1} exactly one comma, followed by
\d+ one or more digits
\b another word boundary
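For anyone who would rather poke at the pattern in Python, a small sketch follows. It is a variation on the regex above, with capture groups added so the comma can be swapped for a full stop; the function name `normalize_comma_decimals` is made up for illustration and is not necessarily what the PR uses.

```python
import re

# Same shape as the pattern above, with groups around the digit runs.
COMMA_DECIMAL = re.compile(r"\b(\d+),(\d+)\b")


def normalize_comma_decimals(text):
    """Rewrite digits,digits as digits.digits wherever the pattern matches."""
    return COMMA_DECIMAL.sub(r"\1.\2", text)


print(normalize_comma_decimals("it costs 34,5 euros"))
print(normalize_comma_decimals("population: 1,234,567"))
```

The second call shows the weak spot: the pattern cannot tell a comma-delimited decimal from comma-delimited thousands, so "1,234,567" comes out mangled rather than untouched.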

ChanceNCounter · 2020-01-11T07:37:55Z

run test

ChanceNCounter · 2020-01-30T00:32:28Z

A few minutes ago, I realized the way I was trying to distinguish comma-delimited decimals from comma-delimited thousands wasn't gonna work.

So I came here to mark this WIP, and I now see that commit didn't make it to the branch in any case 😳

ChanceNCounter · 2020-02-04T01:29:17Z

It occurs to me that this argument will almost always be used to parse ',' as a decimal point, and that users who want ',' for decimal points will almost always want '.' for thousands separators.

Should this PR address that?

If so, should it be an additional keyword, thousands=',', or should the decimal keyword be scrapped in favor of actual keywords?

Although the function signatures may become bloated, I'm partial toward two keywords. Standards exist other than full stops or commas for decimal separators. Indeed, it might be worth handling spaces, if specifically indicated in the function call, but that's another layer of complexity, probably involving a while loop.
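A rough sketch of what the two-keyword option could look like, purely for discussion. The name `normalize_separators` and its signature are assumptions, not part of this PR, and it only rewrites text; it does not call any lingua_franca parser.

```python
import re


def normalize_separators(text, decimal=".", thousands=","):
    """Hypothetical helper: strip thousands separators and use '.' for decimals."""
    if decimal == thousands:
        raise ValueError("decimal and thousands separators must differ")
    d, t = re.escape(decimal), re.escape(thousands)
    pattern = re.compile(
        # grouped thousands with an optional decimal part, e.g. 1.234.567,89
        r"\b\d{1,3}(?:" + t + r"\d{3})+(?:" + d + r"\d+)?\b"
        # or a plain decimal with no grouping, e.g. 4,5
        r"|\b\d+" + d + r"\d+\b"
    )

    def _rewrite(match):
        number = match.group(0).replace(thousands, "")
        return number.replace(decimal, ".")

    return pattern.sub(_rewrite, text)


print(normalize_separators("1.234,5", decimal=",", thousands="."))
print(normalize_separators("1 234,5", decimal=",", thousands=" "))
```

Passing `thousands=" "` works in this sketch without a while loop, but it will also merge unrelated neighbours such as a year followed by a three-digit number, so space handling would need more care than shown here.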

JarbasAl · 2020-02-04T02:13:59Z

In a computer, a decimal number is always represented with a . regardless of language.

I don't think we will ever get an STT transcription where this isn't the case.

It's still good to think about this; chat usage is also important!

ChanceNCounter · 2020-02-04T18:22:39Z

> In a computer, a decimal number is always represented with a . regardless of language.
>
> I don't think we will ever get an STT transcription where this isn't the case.
>
> It's still good to think about this; chat usage is also important!

Exactly. This PR is so that people can parse data written the opposite way. @TheLastProject asked and shall receive.

ChanceNCounter · 2020-02-05T01:13:34Z

Alright, here we go, now that I'm free and at a desktop. Say you wanted TTS to read lines from a document containing some yearly figure, and you knew the file's format: YYYY: #,#

With this PR, you could do:

>>> from datetime import datetime
>>> from lingua_franca import parse, format
>>> foo = parse.extract_numbers("1942: 4,5".replace(":",""), decimal=",")
>>> foo
[1942.0, 4.5]
>>> d = format.nice_year(datetime(year=int(foo[0]), month=1, day=1))
>>> d
'nineteen forty two'
>>> bar = d + ", " + format.pronounce_number(foo[1])
>>> bar
'nineteen forty two, four point five'

Pass that along to TTS and you've got some nice behavior.

devs-mycroft added the "CLA: Yes" label (Contributor License Agreement exists; see https://github.com/MycroftAI/contributors) Jan 11, 2020

ChanceNCounter assigned JarbasAl and unassigned JarbasAl Jan 11, 2020

ChanceNCounter requested a review from JarbasAl January 11, 2020 18:40

ChanceNCounter added the WIP - do not merge label Jan 30, 2020

ChanceNCounter added 6 commits February 2, 2020 09:56

Add logic to normalize comma-delimited decimals

12453ab

spin off normalize_decimal logic

b7c8ad6

create function for both extract_number and extract_numbers to call

iterate over regex the python.regex way

f7e8f5b

add tests for decimal normalization

9375e55

fix regex to support py3.5

246855d

replace comma-decimal handling with param

402c1f2

Alternate decimal points now specified with function parameter

ChanceNCounter force-pushed the normalize-decimal-numbers branch from e7fde5d to 402c1f2 Compare February 2, 2020 18:20

ChanceNCounter removed the WIP - do not merge label Feb 2, 2020

JarbasAl added the "enhancement" (New feature or request) and "multi_lang" (relates to several languages) labels Nov 2, 2020

JarbasAl added a commit to HelloChatterbox/lingua-nostra that referenced this pull request May 9, 2021

feat/support_decimal_markers

9690ffa

rebase of MycroftAI#69

JarbasAl added a commit to HelloChatterbox/lingua-nostra that referenced this pull request May 9, 2021

support decimal markers (#20)

cfbbd19

rebase of MycroftAI#69 Co-authored-by: jarbasal <[email protected]>

JarbasAl added a commit to OpenVoiceOS/ovos-lingua-franca that referenced this pull request Nov 27, 2022

support decimal markers (#20)

e18ddcb

rebase of MycroftAI#69 Co-authored-by: jarbasal <[email protected]>

JarbasAl added a commit to OpenVoiceOS/ovos-lingua-franca that referenced this pull request Nov 27, 2022

feat/normalize_decimals

be3a1bb

port lingua_nostra/pull/20 - support decimal markers rebase of MycroftAI#69 Co-authored-by: jarbasal <[email protected]>

JarbasAl added a commit to OpenVoiceOS/ovos-lingua-franca that referenced this pull request Nov 27, 2022

feat/number_spans

eb257da

feat/normalize_decimals port lingua_nostra/pull/20 - support decimal markers rebase of MycroftAI#69 Co-authored-by: jarbasal <[email protected]>

JarbasAl added a commit to OpenVoiceOS/ovos-lingua-franca that referenced this pull request Nov 27, 2022

support decimal markers

3566f3a

rebase of MycroftAI#69

JarbasAl mentioned this pull request Nov 27, 2022

Feat/decimal markers OpenVoiceOS/ovos-lingua-franca#43

Closed

JarbasAl added a commit to OpenVoiceOS/ovos-lingua-franca that referenced this pull request Nov 27, 2022

support decimal markers

6626587

rebase of MycroftAI#69

JarbasAl added a commit to OpenVoiceOS/ovos-lingua-franca that referenced this pull request Nov 27, 2022

support decimal markers

811550c

rebase of MycroftAI#69

JarbasAl removed their request for review January 25, 2025 04:21
