Bug: Many serious TOML parsing/loading bugs caused by underlying toml library, consider switching? #439
Comments
After looking more into the other options, unfortunately they are worse in other ways; for example, they can barely handle a fraction of the types. I guess we are stuck writing a custom encoder:
import json
import re
from pathlib import Path
import toml
def better_toml_dump_str(val):
    try:
        return toml.encoder._dump_str(val)
    except Exception:
        # if we hit any of toml's numerous encoding bugs,
        # fall back to using json representation of string
        return json.dumps(str(val), default=repr)

class CustomTOMLEncoder(toml.encoder.TomlEncoder):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # override the dumper funcs for any types you need to support here:
        self.dump_funcs[str] = better_toml_dump_str
        self.dump_funcs[Path] = better_toml_dump_str
        self.dump_funcs[re.RegexFlag] = better_toml_dump_str
        ...

# this works:
>>> test_dict = {
    "string_with_escape_sequences": "\033[00;00m",
    "some_Path": "/some/path/example.txt",
    "example_weird_type": re.IGNORECASE | re.UNICODE | re.MULTILINE,
    # etc. try more weird types here
}
>>> print(benedict(test_dict).to_toml(encoder=CustomTOMLEncoder()))
string_with_escape_sequences = "\u001b[00;00m"
example_weird_type = "re.IGNORECASE|re.UNICODE|re.MULTILINE"

You're welcome to close this issue 🤷, up to you.
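The JSON fallback in the encoder above works because JSON string escapes (`\uXXXX`, `\n`, `\"`, `\\`, and so on) are, as far as I can tell, also valid escapes in a TOML basic string. Two edge cases seem worth guarding, though: with its default settings `json.dumps` writes characters outside the BMP as surrogate-pair escapes, which the TOML spec does not allow, and with `ensure_ascii=False` it leaves DEL (U+007F) unescaped. A minimal sketch of a stricter fallback follows; `toml_basic_string` is a hypothetical helper name, and both guards are assumptions layered on top of the snippet above, not part of it:

```python
import json

def toml_basic_string(value):
    """Render str(value) as a double-quoted string using JSON escaping."""
    # ensure_ascii=False keeps non-ASCII text literal, so json never emits
    # UTF-16 surrogate-pair escapes for characters outside the BMP.
    text = json.dumps(str(value), ensure_ascii=False)
    # json leaves DEL (U+007F) unescaped in this mode; escape it by hand.
    return text.replace("\x7f", "\\u007f")

print(toml_basic_string("\033[00;00m"))  # prints "\u001b[00;00m"

# Optional sanity check with the standard-library TOML parser, if present.
try:
    import tomllib  # standard library from Python 3.11 onwards
except ImportError:
    tomllib = None

if tomllib is not None:
    original = "\033[00;00m \x7f \U0001F600"
    parsed = tomllib.loads("s = " + toml_basic_string(original))
    print(parsed["s"] == original)
```

If this turns out to be useful, it could be dropped in as the `except` branch of `better_toml_dump_str`; that wiring is left out here since it depends on the installed `toml` version.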
@pirate thank you for reporting this problem. Actually, for decoding I think it would be cool to improve TOML support. Would you like to submit a PR with a list of failing test cases (that should succeed with the perfect library)?
@pirate is it in your plans to submit a PR for this?
I think given there are only two real Python TOML library options and you're already using the best one, it's not worth trying to patch the toml bugs on top within benedict. You can leave it up to end users to figure out a workaround if they really need one (e.g. my custom encoder above). Here is a good test suite for TOML parsing though, if you want one: https://github.com/toml-lang/toml-test
@pirate totally agree, it would make more sense to fix these issues at the toml lib level.
Thanks so much for building python-benedict, it's really awesome!
So unfortunately, the underlying TOML library https://github.com/uiri/toml used by benedict has a bunch of long-time outstanding bugs that break parsing/loading and generally make it unsafe to load->dump->load the same string.
For example, you cannot dump any dict containing escape sequences without it throwing an exception:
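A minimal sketch of how one might check for this kind of failure (this is not the original reproduction; `check_roundtrip` is a hypothetical helper, and the exact exception `toml` raises, if any, may depend on the installed version):

```python
import json

def check_roundtrip(data, dumps, loads):
    """Return (ok, detail); ok is True only if loads(dumps(data)) == data."""
    try:
        text = dumps(data)
    except Exception as exc:  # the encoder crashed outright
        return False, f"dump raised {type(exc).__name__}: {exc}"
    try:
        restored = loads(text)
    except Exception as exc:  # the encoder wrote text the parser rejects
        return False, f"load raised {type(exc).__name__}: {exc}"
    if restored != data:
        return False, f"value changed: {restored!r}"
    return True, text

sample = {"string_with_escape_sequences": "\033[00;00m"}

# Baseline with the stdlib json module, which round-trips this value.
print(check_roundtrip(sample, json.dumps, json.loads))

# Same check against uiri/toml, only if it happens to be installed.
try:
    import toml
except ImportError:
    toml = None

if toml is not None:
    print(check_roundtrip(sample, toml.dumps, toml.loads))
```

Passing the `dumps`/`loads` pair in as arguments makes it easy to run the same inputs against any candidate replacement library.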
But that's not it: there are many other fairly major string escaping, quoting, and parse/dump cycle inconsistency bugs that have bitten other projects using uiri/toml.
Almost all of these are 2yr+ old, indicating they're probably not all going to get fixed anytime soon without a significant increase in velocity. No harm, no foul: it's open source, we can't demand they go faster, and they don't owe us anything. But maybe benedict could consider a different library with fewer major outstanding parser consistency issues?
Is benedict open to switching to a library without these issues? Maybe one of these:
I'll chip in $20 towards the work to make the switch if it's an option ⬇️