Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch From Using Confuse to Pydantic For Config Parsing #254

Open
TimothyWillard opened this issue Jul 17, 2024 · 5 comments
Open

Switch From Using Confuse to Pydantic For Config Parsing #254

TimothyWillard opened this issue Jul 17, 2024 · 5 comments
Labels
enhancement Request for improvement or addition of new feature(s). gempyor Concerns the Python core. medium priority Medium priority.

Comments

@TimothyWillard
Copy link
Contributor

The config parsing in gempyor currently relies on the confuse in some ways that are less than ideal. The current issues I see are:

  1. The config object is a global object in gempyor.utils, which can make it challenging to unit test the config parsing/validation independent of other package components. To create unit tests for the config (or just have two instances of config in the same python process) a copy has to be made with copy.deepcopy.
  2. Extending confuse relies on a custom gempyor.utils.add_method function to hijack and add methods to the confuse.ConfigView class, which modifies confuse.ConfigView for the entire python process (see below). pydantic defines it's model in regular python classes so extending a model is as easy as adding a method.
>>> import confuse
>>> original_config = confuse.Configuration("flepiMoP", read=False)
>>> original_config.clear()
>>> original_config.read(user=False)
>>> original_config.set_file("/Users/twillard/Desktop/GitHub/HopkinsIDD/flepimop_sample/config_sample_2pop.yml")
>>> config_view_methods_before = set(dir(original_config["name"]))
>>> from gempyor.utils import *
>>> config_view_methods_after = set(dir(original_config["name"]))
>>> config_view_methods_after - config_view_methods_before
{'as_evaled_expression', 'as_random_distribution', 'as_date'}

Also, it appears that pydantic is more actively maintained than confuse. There are some other benefits as well such as opening the door to more config file format options (I think @pearsonca is a fan of JSON for example). Looks like some initial work has already been completed by @saraloo in #222. Probably need to expand on parts of the package that touch the gempyor.utils.config object to ensure stability when working on this. Thoughts @jcblemai and others?

@TimothyWillard TimothyWillard added enhancement Request for improvement or addition of new feature(s). gempyor Concerns the Python core. medium priority Medium priority. labels Jul 17, 2024
@saraloo
Copy link
Contributor

saraloo commented Jul 17, 2024

Quick thought. Not being a software person, I just used pydantic as an enhancement with regards to config checks/validation of all existing and possible options (@jcblemai pointed me to it), but so far it isn't actually used as no one had time to build on it. I support this suggestion though. Happy to help on any of these enhancements though!

@jcblemai
Copy link
Collaborator

Thanks @TimothyWillard, that was indeed the next step after Sara developed the pydantic parser. I really love Pydantic and there are so many downstream advantage we would get switching to it (biggest one is one place with every config option and value, second come validation, ...).

I agree about confuse being... confusing. AFAIK (because I wasnot there when this choice was made) It has been used because it has some very good performance when config are very long (you can pickle small subviews in subprocesses, it's easy to read multiple files for the same configs). But now pydantic is just amazing and confuse as very strange behaviour that has caused it share of bugs.

This is not a easy change though.

@jcblemai
Copy link
Collaborator

Also absolutely not helpful at all but somewhere in the code there is an example of how to create an object for tests without reading a file, from a dictionnary or a string. Could not find it.

@TimothyWillard
Copy link
Contributor Author

Yeah, I agree that it's not an easy change. Definitely will need more test coverage of gempyor before attempting this and it might be wise to try to do this incrementally. Could write some utilities for converting between confuse and pydantic and convert the package piecemeal.

No worries on finding the example, I'm assuming it's just using io.StringIO or similar.

@TimothyWillard
Copy link
Contributor Author

Possible bug: #313 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Request for improvement or addition of new feature(s). gempyor Concerns the Python core. medium priority Medium priority.
Projects
None yet
Development

No branches or pull requests

3 participants