Added configuration management using pydantic #986

benmalef · 2025-02-12T10:41:06Z

add Pydantic configuration

Fixes #ISSUE_NUMBER

Proposed Changes

Add Pydantic configuration
Refactor the config_manager

Checklist

add Pydantic configuration

github-actions · 2025-02-12T10:41:21Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

sarthakpati · 2025-02-12T14:11:50Z

Please check the codacy errors: https://app.codacy.com/gh/mlcommons/GaNDLF/pull-requests/986/issues

sarthakpati · 2025-02-12T20:56:59Z

@szmazurek could you please take a first pass?

szmazurek · 2025-02-17T20:32:33Z

> @szmazurek could you please take a first pass?

yup, will do in like 1hr or tomorrow morning

szmazurek · 2025-02-18T07:09:15Z

GANDLF/Configuration/Parameters/default_parameters.py

+    save_output: bool = Field(
+        default=False, description="Save outputs during validation/testing."
+    )
+    in_memory: bool = Field(default=False, description="Pin data to CPU memory.")


What does "pin data to CPU/GPU memory" mean? Also, does 'in_memory' really enforce a page-lock on the memory that stores a given chunk of data, or does it just keep everything in RAM?

szmazurek · 2025-02-18T07:14:22Z

GANDLF/Configuration/Parameters/default_parameters.py

+    data_postprocessing: Union[dict, set] = Field(
+        default={}, description="Default data postprocessing configuration."
+    )
+    grid_aggregator_overlap: str = Field(


What other options can we have here? I believe this cannot be an arbitrary string, so should it be an optional Literal with the available strings?
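
A minimal sketch of what restricting this field could look like. The class name is a stand-in, and the option names are an assumption: they mirror TorchIO's GridAggregator overlap modes and would need to be checked against what GaNDLF actually accepts.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError

# Assumption: placeholder option names mirroring TorchIO's GridAggregator
# overlap modes; the values GaNDLF really supports need to be confirmed.
OverlapMode = Literal["crop", "average", "hann"]


class AggregatorDefaults(BaseModel):  # hypothetical stand-in class
    grid_aggregator_overlap: OverlapMode = Field(
        default="crop",  # assumed default, for illustration only
        description="How overlapping patch predictions are merged.",
    )


# A typo in the config is rejected when the model is built.
try:
    AggregatorDefaults(grid_aggregator_overlap="cropp")
except ValidationError as err:
    print(f"rejected with {err.error_count()} error(s)")
```

With a plain str annotation the misspelled value would pass validation and only fail later, wherever the string is finally interpreted.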

szmazurek · 2025-02-18T07:15:51Z

GANDLF/Configuration/Parameters/model_parameters.py

+    model_config = ConfigDict(
+        extra="allow"
+    )  #  it allows extra fields in the model dict
+    dimension: Optional[int] = Field(description="Dimension.")


Is the dimension optional? Also, maybe it should accept only 2 or 3, as no other dimensionalities are supported. And perhaps the description can be made more expressive - like 'model input dimension (2D or 3D).'?
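
As a rough illustration of the suggestion, assuming the field becomes required and only 2 and 3 are valid (the class name below is hypothetical, not the PR's Model class):

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class ModelParameters(BaseModel):  # hypothetical stand-in class
    # No default, so the field is required; Literal limits it to 2 or 3.
    dimension: Literal[2, 3] = Field(
        description="Model input dimension (2D or 3D)."
    )


print(ModelParameters(dimension=3).dimension)

# Both an unsupported value and a missing value fail validation.
for bad_config in ({"dimension": 4}, {}):
    try:
        ModelParameters(**bad_config)
    except ValidationError as err:
        print(f"{bad_config} rejected with {err.error_count()} error(s)")
```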

szmazurek · 2025-02-18T07:19:00Z

GANDLF/Configuration/Parameters/model_parameters.py

+    )  #  it allows extra fields in the model dict
+    dimension: Optional[int] = Field(description="Dimension.")
+    architecture: Union[ARCHITECTURE_OPTIONS, dict] = Field(description="Architecture.")
+    final_layer: str = Field(description="Final layer.")


Here we are also limited to a certain set of acceptable values, so using a Literal seems like a good option.

szmazurek · 2025-02-18T07:19:39Z

GANDLF/Configuration/Parameters/model_parameters.py

+        ),
+        default=3,
+    )  # TODO: check it
+    type: Optional[str] = Field(description="Type of model.", default="torch")


Should it also be a Literal? The options are probably torch and openvino. @sarthakpati am I right?

szmazurek · 2025-02-18T07:21:03Z

GANDLF/Configuration/Parameters/model_parameters.py

+        default=3,
+    )  # TODO: check it
+    type: Optional[str] = Field(description="Type of model.", default="torch")
+    data_type: str = Field(description="Data type.", default="FP32")


Is it true that we support such a field in the config, and does it really influence anything in base GaNDLF? I thought that precision changes only when amp is enabled.

szmazurek · 2025-02-18T07:21:24Z

GANDLF/Configuration/Parameters/model_parameters.py

+    type: Optional[str] = Field(description="Type of model.", default="torch")
+    data_type: str = Field(description="Data type.", default="FP32")
+    save_at_every_epoch: bool = Field(default=False, description="Save at every epoch.")
+    amp: bool = Field(default=False, description="Amplifier.")


amp stands for automatic mixed precision, not amplifier

szmazurek · 2025-02-18T07:23:17Z

GANDLF/Configuration/Parameters/nested_training_parameters.py

+        default=-5,
+        description="this controls the number of validation data folds to be used for model *selection* during training (not used for back-propagation)",
+    )
+    proportional: Optional[bool] = Field(default=None)


What does this parameter do? Also, if it's a boolean, can't we set the default to False?

szmazurek · 2025-02-18T07:24:09Z

GANDLF/Configuration/Parameters/nested_training_parameters.py

+        description="this will perform stratified k-fold cross-validation but only with offline data splitting",
+    )
+    testing: int = Field(
+        default=-5,


Open question: are there any limits on the values that can be set in this field? For example, what happens if I set testing to 10? If there are limits, maybe it's worth including the possible ranges when defining this field. What do you think, @sarthakpati @benmalef?
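
If limits do exist, they could be declared on the field itself. A minimal sketch, where the class name and the bound of 10 are purely hypothetical and only show the mechanism:

```python
from pydantic import BaseModel, Field, ValidationError

MAX_FOLDS = 10  # hypothetical bound, chosen only for illustration


class NestedTrainingParameters(BaseModel):  # hypothetical stand-in class
    testing: int = Field(
        default=-5,
        ge=-MAX_FOLDS,  # lower bound (inclusive)
        le=MAX_FOLDS,  # upper bound (inclusive)
        description="Number of testing folds (bounds are illustrative).",
    )


print(NestedTrainingParameters().testing)

# A value outside the declared range is rejected at construction time.
try:
    NestedTrainingParameters(testing=50)
except ValidationError as err:
    print(f"rejected with {err.error_count()} error(s)")
```

The ge and le constraints also show up in the generated JSON schema, so the allowed range would be documented together with the field.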

szmazurek · 2025-02-18T07:27:50Z

GANDLF/Configuration/Parameters/patch_sampler.py

+
+
+class PatchSampler(BaseModel):
+    type: str = Field(default="uniform")


Are there any other options available for type and padding_mode? If so, maybe we should use a Literal here?

szmazurek · 2025-02-18T07:28:56Z

GANDLF/Configuration/Parameters/user_defined_parameters.py

+    @model_validator(mode="after")
+    def validate_version(self) -> Self:
+        if version_check(self.model_dump(), version_to_check=version("GANDLF")):
+            return self


Should we raise an error here if the condition is not met?
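
For illustration, a validator that fails loudly instead of falling through and implicitly returning None. This is only a sketch: version_ok and SUPPORTED_VERSION are hypothetical stand-ins for GaNDLF's version_check and the installed package version.

```python
from pydantic import BaseModel, ValidationError, model_validator

SUPPORTED_VERSION = "1.0.0"  # hypothetical; the PR uses version("GANDLF")


def version_ok(config: dict, version_to_check: str) -> bool:
    """Hypothetical stand-in for GaNDLF's version_check helper."""
    return config.get("version") == version_to_check


class VersionedParameters(BaseModel):  # hypothetical stand-in class
    version: str

    @model_validator(mode="after")
    def validate_version(self) -> "VersionedParameters":
        if not version_ok(self.model_dump(), version_to_check=SUPPORTED_VERSION):
            # pydantic wraps a ValueError raised here in a ValidationError.
            raise ValueError(
                f"config version {self.version!r} is not compatible "
                f"with {SUPPORTED_VERSION!r}"
            )
        return self


try:
    VersionedParameters(version="0.0.1")
except ValidationError as err:
    print(f"rejected with {err.error_count()} error(s)")
```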

szmazurek · 2025-02-18T07:32:24Z

GANDLF/Configuration/Parameters/scheduler_parameters.py

+    )
+    # min_lr: 0.00001, #TODO: this should be defined ??
+    # max_lr: 1, #TODO: this should be defined ??
+    step_size: float = Field(description="step_size", default=None)


I think we need to consider separate classes that define the parameters for each scheduler. For example, if we use a scheduler that reduces on plateau, we need a field for the tracked metric. I'm not really sure how to implement that nicely, though. Another approach is to define every field that any scheduler can take and then add validation logic that handles the conditionality, i.e. if the reduce_on_plateau type is chosen, the monitor field is required.
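
One possible shape for the first approach is a discriminated union keyed on type, so each scheduler gets its own class and its own required fields. This is a sketch only: every class name, field name and default below is an assumption made for illustration.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, ValidationError


class StepSchedulerConfig(BaseModel):  # hypothetical
    type: Literal["step"] = "step"
    step_size: float = 1.0
    gamma: float = 0.1


class ReduceOnPlateauSchedulerConfig(BaseModel):  # hypothetical
    type: Literal["reduce_on_plateau"] = "reduce_on_plateau"
    monitor: str  # required only for this scheduler
    patience: int = 10


# pydantic reads "type" first and validates against the matching class only.
SchedulerConfig = Annotated[
    Union[StepSchedulerConfig, ReduceOnPlateauSchedulerConfig],
    Field(discriminator="type"),
]


class TrainingParameters(BaseModel):  # hypothetical stand-in class
    scheduler: SchedulerConfig


ok = TrainingParameters(
    scheduler={"type": "reduce_on_plateau", "monitor": "val_loss"}
)
print(type(ok.scheduler).__name__)

# Omitting "monitor" is an error for reduce_on_plateau, but not for step.
try:
    TrainingParameters(scheduler={"type": "reduce_on_plateau"})
except ValidationError as err:
    print(f"rejected with {err.error_count()} error(s)")
```

Compared with a single class that holds every possible field, this keeps the conditional requirements in the type definitions, so no hand-written validation logic is needed for them.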

szmazurek · 2025-02-18T07:33:17Z

GANDLF/Configuration/Parameters/user_defined_parameters.py

+class UserDefinedParameters(DefaultParameters):
+    version: Version = Field(
+        default=Version(minimum=version("GANDLF"), maximum=version("GANDLF")),
+        description="Whether weighted loss is to be used or not.",


The description is not valid, I believe :P

szmazurek · 2025-02-18T07:33:54Z

GANDLF/Configuration/Parameters/user_defined_parameters.py

+    patch_size: Union[list[Union[int, float]], int, float] = Field(
+        description="Patch size."
+    )
+    model: Model = Field(..., description="The model to use. ")


Should be a list of available strings, no?

szmazurek · 2025-02-18T07:35:46Z

GANDLF/Configuration/Parameters/user_defined_parameters.py

+        description="Scheduler.", default=Scheduler(type="triangle_modified")
+    )
+    optimizer: Union[str, Optimizer] = Field(
+        description="Optimizer.", default=Optimizer(type="adam")


Question about naming: at first I assumed we were initializing a real torch optimizer here. Perhaps the names of the config classes should be suffixed with Config/Params?

szmazurek · 2025-02-18T07:38:24Z

GANDLF/Configuration/Parameters/user_defined_parameters.py

+        description="Inference mechanism.", default=InferenceMechanism()
+    )
+    data_postprocessing_after_reverse_one_hot_encoding: dict = Field(
+        description="data_postprocessing_after_reverse_one_hot_encoding.", default={}


Any options we support here? Not sure if this can be an arbitrary dict defined by user

szmazurek · 2025-02-18T07:38:53Z

GANDLF/Configuration/Parameters/user_defined_parameters.py

+    data_postprocessing_after_reverse_one_hot_encoding: dict = Field(
+        description="data_postprocessing_after_reverse_one_hot_encoding.", default={}
+    )
+    differential_privacy: Any = Field(description="Differential privacy.", default=None)


Can it be just a boolean field?

szmazurek · 2025-02-18T07:39:09Z

GANDLF/Configuration/Parameters/user_defined_parameters.py

+        Field(description="Data preprocessing."),
+        AfterValidator(validate_data_preprocessing),
+    ] = {}
+    # TODO: It should be defined with a better way (using a BaseModel class)


I agree with the comment; it would add a lot of clarity.

szmazurek · 2025-02-18T07:41:46Z

GANDLF/Configuration/utils.py

+        file.write("\n".join(markdown))
+
+
+def initialize_key(


Do we need such a utility? If there are default parameters to be set, ideally they are defined via pydantic and populated automatically when the user does not set them explicitly.
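
To illustrate the point, a small sketch of nested defaults being filled in by pydantic itself, with no helper that initializes keys by hand. The class and field names are stand-ins, not the PR's actual definitions.

```python
from pydantic import BaseModel, Field


class PatchSamplerParameters(BaseModel):  # hypothetical stand-in class
    type: str = "uniform"
    enable_padding: bool = False


class UserParameters(BaseModel):  # hypothetical stand-in class
    # default_factory builds a fresh nested model when the key is absent.
    patch_sampler: PatchSamplerParameters = Field(
        default_factory=PatchSamplerParameters
    )


# Nothing supplied: the whole nested block comes from the defaults.
empty = UserParameters.model_validate({})
print(empty.model_dump())

# Partially supplied: only the missing keys are filled in.
partial = UserParameters.model_validate({"patch_sampler": {"type": "label"}})
print(partial.model_dump())
```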

szmazurek · 2025-02-18T07:42:42Z

setup.py

@@ -85,6 +87,7 @@
    "openslide-bin",
    "openslide-python==1.4.1",
    "lion-pytorch==0.2.2",
+    "pydantic",


I would pin the version; future releases of pydantic may introduce breaking changes (little chance, but I am a little paranoid).
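
Something along these lines, for example. The bounds are an assumption: the diff uses ConfigDict and model_validator, which points to the pydantic 2.x line, but the exact range is a placeholder for the maintainers to decide.

```python
# Fragment of a requirements list as it might appear in setup.py.
# Assumption: the bounds are illustrative; the PR adds "pydantic" unpinned.
requirements = [
    "openslide-bin",
    "openslide-python==1.4.1",
    "lion-pytorch==0.2.2",
    "pydantic>=2.0,<3.0",  # stay on the 2.x line; exact bounds are a placeholder
]
```

An exact pin such as the == style used for the neighbouring entries would be stricter still, at the cost of more frequent manual bumps.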

976 add pydantic configuration v1 (#34)

da9014f

add Pydantic configuration

benmalef added 2 commits February 12, 2025 21:18

fix codacy errors

79173fd

fix codacy error eval -> ast.literal_eval()

c5882f8

sarthakpati changed the title ~~Add Pydantic configuration (new)~~ Added configuration management using pydantic Feb 12, 2025

sarthakpati mentioned this pull request Feb 17, 2025

Pydantic config #976

Closed

11 tasks

szmazurek reviewed Feb 18, 2025
