When I handle Roman numerals, I want to convert them to integers inside mixed strings so they can be compared in fuzzy matching. Even false positives for 'Roman numbers' aren't a problem, as long as both sides are translated the same way. Since Roman numerals are almost always upper case in my dataset, I only recognize upper-case Roman numerals.
So a method that translates every Roman numeral sequence found in a mixed string, and leaves any non-'Roman' character alone, would be valuable.
I'm sure it can be done manually, outside the library, by taking a run of Roman numeral characters and looking ahead in the string to 'extract' the valid ones. But I'm worried that an algorithm like this, built on top of a library, would choke on 'illegal' Roman numbers. Imagine passing 'IIII'. It might get translated to '13' or '31', but a library like this might also just throw an exception.
I'm asking for a (maybe optional) permissive method that does this kind of 'best effort' translation of mixed strings, documented as best effort only: illegal numbers will get mistranslated into two (or more) valid numbers.
Or, alternatively, a DIY-friendly method that accepts only Roman numeral characters but allows 'illegal sequences' by splitting them into multiple numbers. I'm not asking for every possible combination of multiple numbers, mind you; I want it for fuzzy matching, so a deterministic split suffices, but someone else might want more.
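To make the request concrete, here is a rough sketch of the behaviour I have in mind, written in plain Python with only the standard `re` module. None of these names come from this library: `split_roman` and `translate_mixed` are hypothetical, the left-to-right greedy split is just one deterministic choice, and treating only whole upper-case "words" as candidates is my own assumption to cut down on false positives.

```python
import re

# Hypothetical sketch, not this library's API.
# Strict pattern for one well-formed numeral (1..3999). It can match the
# empty string, but never does so when started on a Roman letter.
_VALID = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")
# Assumption: a candidate is a whole word made only of upper-case Roman letters.
_RUN = re.compile(r"\b[MDCLXVI]+\b")
_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def _to_int(numeral):
    """Convert one well-formed numeral using the subtractive rule."""
    total = 0
    for ch, nxt in zip(numeral, numeral[1:] + " "):
        value = _VALUES[ch]
        # A smaller symbol in front of a larger one is subtracted.
        total += -value if _VALUES.get(nxt, 0) > value else value
    return total


def split_roman(run):
    """Split a string of Roman letters into consecutive valid numerals.

    Illegal sequences do not raise; they are cut left to right,
    so 'IIII' becomes [3, 1].
    """
    numbers = []
    pos = 0
    while pos < len(run):
        match = _VALID.match(run, pos)
        numbers.append(_to_int(match.group(0)))
        pos = match.end()
    return numbers


def translate_mixed(text, sep=""):
    """Replace each upper-case Roman word in text, leaving the rest alone."""
    def repl(match):
        return sep.join(str(n) for n in split_roman(match.group(0)))
    return _RUN.sub(repl, text)
```

With this sketch, `translate_mixed("Rocky IIII")` gives `"Rocky 31"` and `translate_mixed("Rocky IIII", sep=" ")` gives `"Rocky 3 1"`. Words such as `CIVIL` would still be translated, which is fine for my use case because both sides of the comparison get the same treatment.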