Added configuration management using pydantic #986
base: master
Conversation
add Pydantic configuration
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

Please check the codacy errors: https://app.codacy.com/gh/mlcommons/GaNDLF/pull-requests/986/issues

@szmazurek could you please take a first pass?

yup, will do in like 1hr or tomorrow morning
```python
save_output: bool = Field(
    default=False, description="Save outputs during validation/testing."
)
in_memory: bool = Field(default=False, description="Pin data to CPU memory.")
```
What does it mean to "pin data to CPU/GPU memory"? Also, is `in_memory` really enforcing a page-lock on the memory storing a given chunk of data, or does it just keep everything in RAM?
```python
data_postprocessing: Union[dict, set] = Field(
    default={}, description="Default data postprocessing configuration."
)
grid_aggregator_overlap: str = Field(
```
What other options can we have here? I believe this cannot be an arbitrary string, so it should be a Literal of the available strings.
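The suggestion above could be sketched like this. The class and field names mirror the PR, but the set of allowed values (`"crop"`, `"average"`, `"hann"`) is an assumption based on torchio's GridAggregator and should be verified against its documentation:

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class InferenceMechanism(BaseModel):
    # Literal restricts the field to the listed strings at validation time.
    grid_aggregator_overlap: Literal["crop", "average", "hann"] = Field(
        default="crop", description="Overlap mode for the grid aggregator."
    )


try:
    InferenceMechanism(grid_aggregator_overlap="bogus")
except ValidationError:
    print("invalid overlap mode rejected")
```

Any string outside the Literal set fails validation with a clear error, instead of silently passing through to inference code.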
```python
model_config = ConfigDict(
    extra="allow"
)  # it allows extra fields in the model dict
dimension: Optional[int] = Field(description="Dimension.")
```
Is the dimension optional? Also, maybe it should accept only 2 or 3, since no other dimensionalities are supported. And perhaps the description could be made more expressive, e.g. 'model input dimension (2D or 3D)'.
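A minimal sketch of that suggestion (the class name here is illustrative, not GaNDLF's actual one):

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class ModelConfig(BaseModel):
    # Only 2D and 3D are supported, so the Literal encodes that directly.
    dimension: Literal[2, 3] = Field(
        description="Model input dimension (2D or 3D)."
    )


assert ModelConfig(dimension=3).dimension == 3
try:
    ModelConfig(dimension=4)  # any other dimensionality fails validation
except ValidationError:
    print("unsupported dimensionality rejected")
```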
```python
)  # it allows extra fields in the model dict
dimension: Optional[int] = Field(description="Dimension.")
architecture: Union[ARCHITECTURE_OPTIONS, dict] = Field(description="Architecture.")
final_layer: str = Field(description="Final layer.")
```
Here we are also limited to a certain number of acceptable values - leveraging Literal seems like a good option.
```python
),
default=3,
)  # TODO: check it
type: Optional[str] = Field(description="Type of model.", default="torch")
```
Should it also be a Literal? The options are probably torch and openvino? @sarthakpati am I right?
```python
default=3,
)  # TODO: check it
type: Optional[str] = Field(description="Type of model.", default="torch")
data_type: str = Field(description="Data type.", default="FP32")
```
Is it true that we support such a field in the config and that it really influences anything in base GaNDLF? I thought that precision changes only when AMP is enabled.
```python
type: Optional[str] = Field(description="Type of model.", default="torch")
data_type: str = Field(description="Data type.", default="FP32")
save_at_every_epoch: bool = Field(default=False, description="Save at every epoch.")
amp: bool = Field(default=False, description="Amplifier.")
```
amp stands for automatic mixed precision, not amplifier.
```python
default=-5,
description="this controls the number of validation data folds to be used for model *selection* during training (not used for back-propagation)",
)
proportional: Optional[bool] = Field(default=None)
```
What does this parameter do? Also, if it's a boolean, can't we set the default to False?
```python
description="this will perform stratified k-fold cross-validation but only with offline data splitting",
)
testing: int = Field(
    default=-5,
```
Open question - are there any limits to the values that can be set in this field? For example, what happens if I set testing to 10? If there are limits, maybe it's worth including the possible ranges when defining this field. What do you think? @sarthakpati @benmalef
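If hard limits do exist, pydantic can enforce them directly on the field. The bounds below (-100..100) are purely illustrative; the real valid range for GaNDLF's nested-training folds would need to be confirmed first:

```python
from pydantic import BaseModel, Field, ValidationError


class NestedTraining(BaseModel):
    # ge/le add range constraints that are checked at validation time.
    # These particular bounds are an assumption for illustration only.
    testing: int = Field(default=-5, ge=-100, le=100)


assert NestedTraining().testing == -5
try:
    NestedTraining(testing=1000)
except ValidationError:
    print("out-of-range fold count rejected")
```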
```python
class PatchSampler(BaseModel):
    type: str = Field(default="uniform")
```
Are there any other options available for type and padding_mode? If so, maybe we should use a Literal here?
```python
@model_validator(mode="after")
def validate_version(self) -> Self:
    if version_check(self.model_dump(), version_to_check=version("GANDLF")):
        return self
```
Should we raise an error here if the condition is not met?
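One way to do that is to raise a ValueError inside the validator, which pydantic converts into a ValidationError for the caller. The check below is a stand-in for GaNDLF's `version_check()`, purely for illustration:

```python
from pydantic import BaseModel, ValidationError, model_validator


class Version(BaseModel):
    minimum: str
    maximum: str

    @model_validator(mode="after")
    def validate_version(self) -> "Version":
        # Illustrative stand-in for version_check(); a raised ValueError
        # surfaces as a ValidationError instead of silently returning None.
        if self.minimum > self.maximum:
            raise ValueError("minimum version exceeds maximum version")
        return self


try:
    Version(minimum="0.1.0", maximum="0.0.1")
except ValidationError:
    print("failed version check raises instead of returning None")
```

Note that the current code path returns None when the check fails, which pydantic would also reject, but with a much less helpful message than an explicit raise.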
```python
)
# min_lr: 0.00001, #TODO: this should be defined ??
# max_lr: 1, #TODO: this should be defined ??
step_size: float = Field(description="step_size", default=None)
```
I think we need to think about separate classes that would allow defining params for each scheduler - for example, if we use a scheduler which does reduce-on-plateau, then we need a field to define the tracked metric. Not really sure how to implement that nicely, though. Another approach is to define all possible fields that any scheduler can take, and then provide validation logic that takes care of the conditionality - i.e., if the `reduce_on_plateau` type is chosen, then we require the `monitor` field.
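The second approach could be sketched as one class holding the union of all scheduler fields plus a conditional validator. Field names here are illustrative, not GaNDLF's actual schema:

```python
from typing import Optional

from pydantic import BaseModel, ValidationError, model_validator


class SchedulerConfig(BaseModel):
    type: str = "triangle_modified"
    step_size: Optional[float] = None
    monitor: Optional[str] = None  # only meaningful for reduce_on_plateau

    @model_validator(mode="after")
    def require_monitor_for_plateau(self) -> "SchedulerConfig":
        # Conditional requirement: this scheduler type needs a tracked metric.
        if self.type == "reduce_on_plateau" and self.monitor is None:
            raise ValueError("reduce_on_plateau requires a 'monitor' metric")
        return self


try:
    SchedulerConfig(type="reduce_on_plateau")
except ValidationError:
    print("missing monitor rejected")
```

A discriminated union of per-scheduler classes would be the tidier alternative, at the cost of more boilerplate.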
```python
class UserDefinedParameters(DefaultParameters):
    version: Version = Field(
        default=Version(minimum=version("GANDLF"), maximum=version("GANDLF")),
        description="Whether weighted loss is to be used or not.",
```
The description is not valid, I believe :P
```python
patch_size: Union[list[Union[int, float]], int, float] = Field(
    description="Patch size."
)
model: Model = Field(..., description="The model to use. ")
```
Should be a list of available strings, no?
```python
description="Scheduler.", default=Scheduler(type="triangle_modified")
)
optimizer: Union[str, Optimizer] = Field(
    description="Optimizer.", default=Optimizer(type="adam")
```
Question about naming - here I first assumed we are initializing a real torch optimizer - perhaps the names of the config classes should be suffixed with `Config` or `Params`?
```python
description="Inference mechanism.", default=InferenceMechanism()
)
data_postprocessing_after_reverse_one_hot_encoding: dict = Field(
    description="data_postprocessing_after_reverse_one_hot_encoding.", default={}
```
Are there any options we support here? Not sure this can be an arbitrary dict defined by the user.
```python
data_postprocessing_after_reverse_one_hot_encoding: dict = Field(
    description="data_postprocessing_after_reverse_one_hot_encoding.", default={}
)
differential_privacy: Any = Field(description="Differential privacy.", default=None)
```
Can it be just a boolean field?
```python
Field(description="Data preprocessing."),
AfterValidator(validate_data_preprocessing),
] = {}
# TODO: It should be defined with a better way (using a BaseModel class)
```
I agree with the comments; it would add a lot of clarity.
```python
file.write("\n".join(markdown))


def initialize_key(
```
Do we need such a utility? Meaning, if there are default parameters to be set, ideally they are defined via pydantic and automatically populated if the user did not set them explicitly.
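A sketch of that point: defaults declared on the model are populated automatically when the user omits them, which may make a separate `initialize_key()` helper unnecessary. The field defaults below are illustrative:

```python
from pydantic import BaseModel, Field


class PatchSamplerConfig(BaseModel):
    # Defaults live on the model itself; no post-hoc key initialization needed.
    type: str = Field(default="uniform")
    padding_mode: str = Field(default="symmetric")  # illustrative default


sampler = PatchSamplerConfig()  # user supplied nothing
assert sampler.type == "uniform"
assert sampler.padding_mode == "symmetric"
```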
```diff
@@ -85,6 +87,7 @@
 "openslide-bin",
 "openslide-python==1.4.1",
 "lion-pytorch==0.2.2",
+"pydantic",
```
I would pin the version; future pydantic releases may introduce breaking changes (little chance, but I am a little paranoid).
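A pinned requirement along those lines might look like this (the exact bounds are illustrative, not a tested constraint):

```
"pydantic>=2.0,<3.0",
```

An upper bound on the major version guards against the kind of breaking API changes pydantic introduced between v1 and v2.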
add Pydantic configuration
Fixes #ISSUE_NUMBER
Proposed Changes
Checklist
- The CONTRIBUTING guide has been followed.
- `typing` is used to provide type hints (including and not limited to using `Optional` if a variable has a pre-defined value).
- If a new `pip install` step is needed for the PR to be functional, please ensure it is reflected in all the files that control the CI, namely: python-test.yml, and all docker files [1,2,3].
- The `logging` library is being used and no `print` statements are left.