Is it possible to have a thread where people can submit examples or testcases where the text normalizer is not doing perfectly? #494

huangruizhe · 2022-11-08T19:30:36Z

Thanks for the great work!
I was wondering whether we could submit or share ad-hoc test cases to make the text normalizer more robust.

For example, some wrong normalizations are:
2020 Third Quarter => 2023rd quarter
Monroe's financial release => monroe is financial release
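
One way such reports could be collected is as a small table of regression cases. This is only a sketch: `KnownFailure`, `KNOWN_FAILURES`, and `still_failing` are hypothetical names, not part of any library, and `normalize` stands for whatever normalizer callable is being tested. Only the inputs and the wrong outputs quoted above are recorded, since the thread does not state the desired outputs.

```python
from typing import Callable, List, NamedTuple


class KnownFailure(NamedTuple):
    text: str        # input given to the normalizer
    bad_output: str  # wrong normalization reported in this thread


# Hypothetical registry holding the two cases reported above.
KNOWN_FAILURES = [
    KnownFailure("2020 Third Quarter", "2023rd quarter"),
    KnownFailure("Monroe's financial release", "monroe is financial release"),
]


def still_failing(normalize: Callable[[str], str]) -> List[KnownFailure]:
    """Return the reported cases whose wrong output the normalizer still reproduces."""
    return [case for case in KNOWN_FAILURES if normalize(case.text) == case.bad_output]
```

Calling `still_failing(normalizer)` with the normalizer under test would list the cases that still reproduce the reported output; an empty list means none of them do.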

jongwook · 2022-11-08T23:05:15Z

Maintainer

Yes, the normalizer definitely has many rough edges. Your first example seems easy to fix; the second is trickier, because a rule-based approach cannot easily distinguish a possessive 's from a contraction of "is" or "has". But I would appreciate more examples like this, so that we are aware of its failure modes!
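
To illustrate the ambiguity, here is a minimal sketch of a toy rule, assuming a normalizer that lowercases text and expands every 's to " is". The function `expand_s_naively` and its pattern are hypothetical and written only for this illustration; they are not presented as the normalizer's actual code.

```python
import re

# Toy rule (an assumption for illustration): treat every "'s" as a contraction of "is".
CONTRACTION_S = re.compile(r"'s\b")


def expand_s_naively(text: str) -> str:
    """Lowercase the text and expand every 's as if it were a contraction of 'is'."""
    return CONTRACTION_S.sub(" is", text.lower())


examples = [
    "It's a strong quarter",       # contraction: the expansion is right
    "Monroe's financial release",  # possessive: the expansion is wrong
]
for text in examples:
    print(f"{text!r} -> {expand_s_naively(text)!r}")
```

The pattern sees the same characters in both inputs, so a purely textual rule has nothing to tell the two cases apart. Separating them would need extra context, such as the part of speech of the following word.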
