Enhance type checking #382

ggalloni · 2024-09-04T12:59:00Z

The validate_info method of CobayaComponent currently checks only bool values.

This PR extends that to check every relevant type, including generic types (List[], Dict[], Tuple[], etc.).

ggalloni · 2024-09-04T16:06:12Z

The new code raises a TypeError when max_samples="bad_value"; however, test_mcmc.py (MPI case) is still failing, as if it is not catching that error.

Do you have an idea why this could be happening?

codecov-commenter · 2024-09-04T16:29:48Z

Codecov Report

Attention: Patch coverage is 26.31579% with 42 lines in your changes missing coverage. Please review.

Project coverage is 74.25%. Comparing base (735f7a8) to head (19e4143).

Files with missing lines	Patch %	Lines
cobaya/component.py	22.64%	41 Missing ⚠️
cobaya/samplers/polychord/polychord.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #382      +/-   ##
==========================================
- Coverage   74.57%   74.25%   -0.33%     
==========================================
  Files         147      147              
  Lines       11200    11247      +47     
==========================================
- Hits         8352     8351       -1     
- Misses       2848     2896      +48

ggalloni · 2024-09-06T09:56:46Z

I left the validation of bools as it was and added an enforce_types attribute that will trigger the new code.

In this way, one can force type checking by setting enforce_types=True in any descendant class of CobayaComponent, without touching the old validation.
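A minimal sketch of the opt-in idea described above, not the PR's actual code. `Component`, `LenientSampler`, `StrictSampler` and `_collect_annotations` are hypothetical names used only for illustration, and the check here handles plain classes only (no generics):

```python
from typing import Any, Dict


class Component:
    """Hypothetical stand-in for CobayaComponent (not the real class)."""

    enforce_types: bool = False  # opt-in flag, off by default

    def __init__(self, **info: Any) -> None:
        annotations = self._collect_annotations()
        for name, value in info.items():
            if self.enforce_types and name in annotations:
                expected = annotations[name]
                # Only plain classes are handled in this sketch.
                if isinstance(expected, type) and not isinstance(value, expected):
                    raise TypeError(
                        f"Attribute '{name}' must be of type {expected}, "
                        f"not {type(value)} (value={value!r})"
                    )
            setattr(self, name, value)

    @classmethod
    def _collect_annotations(cls) -> Dict[str, Any]:
        # Merge annotations from base classes first, so subclasses win.
        merged: Dict[str, Any] = {}
        for klass in reversed(cls.__mro__):
            merged.update(getattr(klass, "__annotations__", {}))
        return merged


class LenientSampler(Component):
    max_samples: int  # annotated, but never checked (flag is False)


class StrictSampler(Component):
    enforce_types = True  # this subclass opts in to type checking
    max_samples: int
```

With this layout, `LenientSampler(max_samples="bad_value")` is accepted as before, while `StrictSampler(max_samples="bad_value")` raises a TypeError.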

cmbant

Interesting to see this. I have slightly mixed feelings though: it is a bit more complicated than I was hoping (for something that is just checking), and it is not very clear how robust it is.

cmbant · 2024-09-27T10:25:58Z

cobaya/model.py

@@ -589,7 +589,7 @@ def logpost(self,
        return self.logposterior(params_values, make_finite=make_finite,
                                 return_derived=False, cached=cached).logpost

-    def get_valid_point(self, max_tries: int, ignore_fixed_ref: bool = False,
+    def get_valid_point(self, max_tries: Union[int, str], ignore_fixed_ref: bool = False,


Has to be int as used in the code?

Hello @cmbant, thanks for the feedback on this!

Indeed, I think that, technically, it could be a float, as it is used just for evaluations like if tries<max_tries. Still, I wouldn't feel comfortable with having a floating-point max_tries, right?

Also, that number comes from the value of a NumberWithUnits, already parsed, so it should be either a float or an int. Thus, I would drop the Union with str (going back to how it was before).

cmbant · 2024-09-27T10:26:23Z

cobaya/cosmo_input/convert_cosmomc.py

@@ -37,7 +37,7 @@ def cosmomc_root_to_cobaya_info_dict(root: str, derived_to_input=()) -> InputDic
            name = name.replace('chi2_', 'chi2__')
        if name.startswith('minuslogprior') or name == 'chi2':
            continue
-        param_dict: ParamDict = {'latex': par.label}
+        param_dict: 'ParamDict' = {'latex': par.label}


Actual type should be better than deferred where possible

I completely removed the deferred 'ParamDict' annotations. This comes at the cost of moving the definition of ParamsDict in typing a few lines down, after the definition of ParamDict. Let me know what you think of this.
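To illustrate why the definition order matters, here is a small sketch. The `ParamDict` fields and the `Deferred`/`Direct` classes are made up for the example and are not the real cobaya definitions. A quoted annotation is stored as a plain string until something resolves it, whereas an annotation using the already-defined class is stored as the class itself:

```python
from typing import Dict, TypedDict, get_type_hints


class ParamDict(TypedDict, total=False):
    # Illustrative subset of fields; not the real cobaya definition.
    latex: str


# Defined after ParamDict, so the real class can be referenced directly.
ParamsDict = Dict[str, ParamDict]


class Deferred:
    param: "ParamDict"  # stored as the plain string 'ParamDict'


class Direct:
    param: ParamDict  # stored as the class itself


deferred_raw = Deferred.__annotations__["param"]
direct_raw = Direct.__annotations__["param"]

# A deferred annotation needs an explicit resolution step before it can be
# used in a runtime check; the namespace is passed explicitly here.
resolved = get_type_hints(Deferred, localns={"ParamDict": ParamDict})["param"]
```

A runtime checker that reads `__annotations__` directly would therefore see a `str` for the deferred case and would need extra handling for it.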

cmbant · 2024-09-27T10:27:09Z

cobaya/component.py

@@ -331,6 +331,8 @@ class CobayaComponent(HasLogger, HasDefaults):
    _at_resume_prefer_new: List[str] = ["version"]
    _at_resume_prefer_old: List[str] = []

+    enforce_types: bool = False


Probably should start with _ as not something we want changed by yaml

I am not sure about this. With the underscore, we can enforce types neither through the yaml nor through the info dict (as I do in the test, for instance). So I guess this becomes a question of what default behavior we want. If enforce_types defaults to False, users can only enforce types on components they define themselves with that flag set to True; types of built-in samplers and the like cannot be checked (unless we set the opposite flag in their definitions). This may be a good choice, since I guess the types of built-in components are checked anyway, one way or another. Vice versa, with enforce_types = True by default, everything will be explicitly checked and there is no need for the user to access the flag.

Am I missing something?

It would probably have to be False by default, to keep people from hitting annoying errors and to keep compatibility with existing external likelihoods. I was thinking of this more as an option for likelihood developers who support accurate type hints in their code.

Ok great, then I'll make it private and default at False 👍

cmbant · 2024-09-27T19:59:51Z

cobaya/component.py

+            elif expected_type is int: # for numpy integers
+                if value == float('inf'): # for infinite values parsed as floats
+                    return isinstance(value, float)
+                return isinstance(value, int) or "numpy.int" in str(type(value))


Can simplify some of this using generic numbers.Number types?

Yes, I was thinking about using some more generic types, I will commit that soon.
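As a rough sketch of what the abstract classes in the standard `numbers` module offer here (the helper names `matches_int` and `matches_float` are hypothetical): NumPy registers its scalar types with these ABCs, so an `isinstance` check against them should not need to inspect the type name as a string.

```python
import math
from fractions import Fraction
from numbers import Integral, Real


def matches_int(value) -> bool:
    # +inf is accepted because infinite limits are parsed as floats.
    # Note that bool is a subclass of int, so True/False also pass.
    return value == math.inf or isinstance(value, Integral)


def matches_float(value) -> bool:
    # Real covers int, float, Fraction and any type registered as Real.
    return isinstance(value, Real)


checks = {
    "int for int": matches_int(3),
    "inf for int": matches_int(math.inf),
    "float for int": matches_int(3.5),
    "str for int": matches_int("bad_value"),
    "int for float": matches_float(3),
    "Fraction for float": matches_float(Fraction(1, 2)),
    "complex for float": matches_float(1 + 2j),
}
```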

cmbant · 2024-09-27T20:03:09Z

cobaya/parameterization.py

@@ -42,7 +42,7 @@ def is_derived_param(info_param: ParamInput) -> bool:
    return expand_info_param(info_param).get("derived", False) is not False


-def expand_info_param(info_param: ParamInput, default_derived=True) -> ParamDict:
+def expand_info_param(info_param: ParamInput, default_derived=True) -> 'ParamDict':


Oops, probably a search and replace gone wrong...

cmbant · 2024-09-27T20:03:21Z

cobaya/parameterization.py

@@ -76,7 +76,7 @@ def expand_info_param(info_param: ParamInput, default_derived=True) -> ParamDict
    return info_param


-def reduce_info_param(info_param: ParamDict) -> ParamInput:
+def reduce_info_param(info_param: 'ParamDict') -> ParamInput:


I removed all of these, see above

cmbant · 2024-09-27T20:06:55Z

cobaya/samplers/mcmc/mcmc.py

@@ -56,11 +56,11 @@ class MCMC(CovmatSampler):

    # instance variables from yaml
    burn_in: NumberWithUnits
-    learn_every: NumberWithUnits
-    output_every: NumberWithUnits
+    learn_every: Union[NumberWithUnits, str]


Not sure why only some of these changed. In terms of usage, optional str is probably not very helpful since should be converted at read in, maybe NumberWithUnits and more flexible check

I changed only the ones causing errors but, following your suggestion, I moved the problem to the type-checking side, as we know that a NumberWithUnits can come in as a string.

ggalloni · 2024-10-02T10:34:48Z

For assessing the robustness of this, I am not sure how to test it. An idea would be to switch the default of _enforce_types to True, let tests here run and, if successful, set it back to False. So at least we know that everything internal is working as expected. What do you think?

cmbant · 2024-10-14T15:54:00Z

cobaya/component.py

+
+        if hasattr(expected_type, "__origin__"):
+            return self._validate_composite_type(expected_type, value)
+        else:


Why are only the values general here, not also the expected_types?
e.g. any Mapping type could accept any Mapping value? (e.g. empty_dict is MappingProxyType)

Note also that in numpy you can end up with "numbers" that are zero-rank arrays like np.array(1), which I suspect may not pass isinstance(Real), though I am not sure if that's ever an issue for setting parameters.
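A small sketch of the Mapping point, using only the standard library. The helper `origin_accepts` is hypothetical, and `empty` is a stand-in for the `empty_dict` mentioned above. The idea is to generalize the expected type as well as the value, so that any Mapping annotation accepts any Mapping value:

```python
from collections.abc import Mapping
from types import MappingProxyType
from typing import Dict, get_origin

# Stand-in for the read-only empty mapping mentioned above.
empty = MappingProxyType({})


def origin_accepts(expected_type, value) -> bool:
    # get_origin(Dict[str, float]) is dict; plain classes have no origin.
    origin = get_origin(expected_type) or expected_type
    if isinstance(origin, type) and issubclass(origin, Mapping):
        # Treat every mapping annotation as "any Mapping".
        return isinstance(value, Mapping)
    return isinstance(value, origin)


strict_check = isinstance(empty, dict)                   # exact dict check
general_check = origin_accepts(Dict[str, float], empty)  # Mapping-based check
```

Here the exact `dict` check rejects the read-only mapping, while the Mapping-based check accepts it. Keys and values are not inspected in this sketch.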

cmbant · 2024-11-01T12:26:12Z

The best I can come up with that works with empty_dict, Sequence, Tuple[float] and TypedDicts, and allows numpy arrays for Sequence[float] and Tuple[float], is something like this:


    def validate_info(self, name: str, value: Any, annotations: dict):
        print(annotations)
        if name in annotations:
            expected_type = annotations[name]
            print(name, expected_type)
            if not self._validate_type(expected_type, value):
                msg = f"Attribute '{name}' must be of type {expected_type}, not {type(value)}(value={value})"
                raise TypeError(msg)

    def _validate_composite_type(self, expected_type, value):
        origin = expected_type.__origin__
        # Callable and Sequence types may have no __args__; default to an
        # empty tuple so that the len(args) checks below cannot hit an unbound name
        args = getattr(expected_type, "__args__", ())
        if origin is Union:
            return any(self._validate_type(t, value) for t in args)
        elif origin is Optional:
            return value is None or self._validate_type(args[0], value)
        elif issubclass(origin, Sequence) and isinstance(value, Iterable) and len(args)==1:
            return all(self._validate_type(args[0], item) for item in value)
        elif issubclass(origin, Sequence):
            return isinstance(value, Sequence) and len(args) == len(value) and all(
                self._validate_type(t, v) for t, v in zip(args, value)
            )
        elif origin is dict:
            return isinstance(value, Mapping) and all(
                self._validate_type(args[0], k) and self._validate_type(args[1], v)
                for k, v in value.items()
            )
        elif origin is ClassVar:
            return self._validate_type(args[0], value)
        else:
            return isinstance(value, origin)

    def _validate_type(self, expected_type, value):
        if value is None or expected_type is Any: # Any is always valid
            return True

        if hasattr(expected_type, "__origin__"):
            return self._validate_composite_type(expected_type, value)
        else:
            print(expected_type, value)
            # Exceptions for some types
            if is_typeddict(expected_type):
                type_hints = get_type_hints(expected_type)
                if not isinstance(value, Mapping) or not set(value.keys()).issubset(set(type_hints.keys())):
                    return False
                for key, value in value.items():
                    self.validate_info(key, value, type_hints)
                return True
            elif expected_type is int:
                return value == float('inf') or isinstance(value, Integral)
            elif expected_type is float:
                return isinstance(value, Real) or isinstance(value, np.ndarray) and not value.ndim
            elif expected_type is NumberWithUnits:
                return isinstance(value, (Real, str))
            return isinstance(value, expected_type)

    def validate_attributes(self):
        annotations = self.get_annotations()
        for name in annotations.keys():
            self.validate_info(name, getattr(self, name, None), annotations)

However, is_typeddict is only in core typing from 3.10.

cmbant · 2024-11-04T20:09:32Z

My attempt to generalize and refactor this a bit is now in #388.
@ggalloni did you have an SOLikeT build to test against? Anything missed?

ggalloni · 2024-11-06T10:40:43Z

Hello @cmbant, thanks for your help with this!
Yes, I was using SOLikeT/#192 to test this, so it should be sufficient to point it to the new branch of #388. I guess that would also tell us if something is missing since it was passing all tests using #382 instead.

cmbant · 2024-11-06T11:59:40Z

OK great, let me know any probs. I also just pushed change to hopefully also make it work with deferred types.

ggalloni · 2024-11-06T15:13:53Z

Currently, all non-Windows builds are failing due to CCL not building correctly...
Still, Windows is passing all tests, which is reassuring 👍

cmbant · 2024-11-06T16:48:16Z

Except that you don't have _enforce_types=True, only enforce_types...

ggalloni · 2024-11-07T17:28:50Z

I fixed that (I thought I already did...) and am getting an error handling ClassVar.

This seems to happen because that is dealt with only if origin and args are defined for the expected_type.
Instead, some checks skip all that part and produce an error at line 248 of typing.py when trying to execute

isinstance(value, typing.ClassVar)
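A short sketch of the failure mode and one way around it. `unwrap_classvar` and `isinstance_fails` are hypothetical helpers, not the fix that was eventually merged. A bare `ClassVar` has no `__origin__`, so it skips the composite branch, and `isinstance` refuses special forms outright:

```python
from typing import Any, ClassVar, List, get_args, get_origin


def unwrap_classvar(expected_type):
    """Strip a ClassVar wrapper so the inner type can be checked instead."""
    if expected_type is ClassVar:
        # Bare ClassVar carries no inner type: treat it as Any.
        return Any
    if get_origin(expected_type) is ClassVar:
        return get_args(expected_type)[0]
    return expected_type


def isinstance_fails(special_form) -> bool:
    """Return True if isinstance() raises TypeError for this annotation."""
    try:
        isinstance(1, special_form)
    except TypeError:
        return True
    return False
```

Unwrapping before the final `isinstance` call means both `ClassVar[int]` and a bare `ClassVar` reach the checker as something it can handle.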

cmbant · 2024-11-07T17:31:05Z

Can you give specific example?

cmbant · 2024-11-07T22:38:57Z

I made a fix, looks like running OK on windows

cmbant · 2024-11-11T11:58:38Z

I merged, thanks!

ggalloni added 5 commits September 3, 2024 17:31

Allow all types for checking

930db00

Handle exceptions

71fc109

Add str when allowed

e4c2971

Clean imports

dc48e84

Handle ParamDict

7a568e7

Try skipping test for now

7bf99d8

ggalloni added 5 commits September 5, 2024 15:35

Separate bool validation and optional type enforcing

61a0405

ParamDict -> ForwardRef['ParamDict'] for type check

af5c727

Handle ClassVar

a16b804

Add test for type checking

726f522

Test for compatibility

bbb1f6b

ggalloni added 2 commits September 10, 2024 14:09

Test Optional

8e62704

Merge branch 'master' into checking_types

39b75b4

cmbant reviewed Sep 27, 2024

ggalloni added 7 commits October 2, 2024 12:22

Remove deferred types

3fe9b84

Remove ForwardRef handling

b985a40

Handle ints and floats with generic types

6ecc79a

Allow NumberWithUnits to be a str

00bcbb3

Remove useless type

5988a15

Merge branch 'checking_types' of https://github.com/ggalloni/cobaya into checking_types

4cdd11e

Change enforce_types to private attribute

436e70d

Clean

19e4143

cmbant reviewed Oct 14, 2024

Merge branch 'master' into checking_types

dad347c

cmbant closed this Nov 11, 2024
