New configuration to specify qp.Ensemble parameterization in CatEstimator stages #28

Open
drewoldag opened this issue Jul 27, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@drewoldag
Collaborator

Currently, almost all subclasses of rail_base.estimator.CatEstimator store their resulting qp.Ensembles using a qp.interp gridded representation. We should add a new configuration parameter that lets users select which qp representation they prefer (e.g. qp.hist, qp.spline, qp.packed_interp, etc.).

The work here is similar to issue #11 in that the work in this repository (rail_base) is relatively small, but the work to respect the new configuration parameter in all of the subclasses of CatEstimator will be substantial.

Also note that several Jupyter notebooks will likely need to be updated as well, but we do not currently have an exhaustive list of which notebooks will be affected.

@aimalz aimalz added the enhancement New feature or request label Jul 27, 2023
@drewoldag drewoldag removed their assignment Apr 26, 2024
@eacharles
Collaborator

So, a lot of the estimators have native representations of ensembles. How would you propose to handle this in those cases?

@aimalz
Collaborator

aimalz commented May 20, 2024

In those cases, the default value of the configuration parameter for that stage would just be the (known, for that stage) native parameterization, no?
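
One way to picture this suggestion is the minimal sketch below. It uses plain Python stand-ins rather than the real rail_base or qp classes, and the names `ToyCatEstimator`, `ToyMixtureEstimator`, `native_representation`, and `output_representation` are hypothetical. It only illustrates a per-stage default that a user can override.

```python
class ToyCatEstimator:
    """Stand-in for a CatEstimator-like base class (not the real rail_base API)."""

    # Hypothetical class attribute: the representation this stage produces natively.
    native_representation = "interp"

    def __init__(self, **config):
        # Fall back to the stage's native parameterization when the user
        # does not ask for a specific one.
        self.output_representation = config.get(
            "output_representation", self.native_representation
        )


class ToyMixtureEstimator(ToyCatEstimator):
    """Stand-in for a stage whose native output is a mixture model."""

    native_representation = "mixmod"


default_stage = ToyMixtureEstimator()
forced_stage = ToyMixtureEstimator(output_representation="hist")
print(default_stage.output_representation)  # mixmod
print(forced_stage.output_representation)   # hist
```

With this shape, a stage that never sets the parameter keeps writing its native representation, so existing behaviour would not change unless a user opts in.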

@eacharles
Collaborator

A couple of thoughts.

  1. I think we should do this in a way that touches only the base class code, not any of the subclasses, since changing those would be rather disruptive. This is going to be kinda tricky because we don't just write the ensemble at the end; rather, we allocate the memory at the beginning of run() and then fill it in from the parallel processes. I.e., we will have to modify the _run() and _do_chunk_output() methods to do this.

  2. I think a better solution than requiring parameters for the output representation would be to use parameters that default to None but that allow you to force the qp representation to a particular type.
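
To make the difficulty in point 1 concrete, here is a toy sketch of the allocate-then-fill pattern. It is not the rail_base implementation: `ToyChunkedStage` is hypothetical, the method signatures are invented for illustration, and plain lists stand in for the output storage. The point is that the layout is fixed at allocation time, so the target representation would have to be known before the first chunk is written.

```python
class ToyChunkedStage:
    """Stand-in for the allocate-then-fill pattern (not the real rail_base code)."""

    def __init__(self, n_rows, n_cols):
        self.n_rows = n_rows
        self.n_cols = n_cols  # in practice this depends on the output representation
        self.buffer = None

    def _run(self, chunks):
        # Allocate the full output up front, before any chunk is processed.
        self.buffer = [[None] * self.n_cols for _ in range(self.n_rows)]
        for start, rows in chunks:
            self._do_chunk_output(start, rows)
        return self.buffer

    def _do_chunk_output(self, start, rows):
        # Each chunk writes into its own slice of the preallocated buffer.
        for offset, row in enumerate(rows):
            if len(row) != self.n_cols:
                raise ValueError("chunk row does not match the allocated layout")
            self.buffer[start + offset] = list(row)


stage = ToyChunkedStage(n_rows=4, n_cols=3)
# Chunks may arrive in any order, as they would from parallel processes.
chunks = [(2, [[7, 8, 9], [10, 11, 12]]), (0, [[1, 2, 3], [4, 5, 6]])]
result = stage._run(chunks)
print(result)
```

In this toy, a chunk whose rows do not match the allocated width is rejected, which mirrors why a representation change cannot simply be bolted on after allocation.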

Either the function
    qp.factory.convert(in_dist, class_name, **kwds)
used as
    new_ensemble = qp.factory.convert(orig_ensemble, self.config.qp_output_classname, **self.config.qp_output_class_pars)
or the method
    qp.Ensemble.convert_to(self, to_class, **kwargs)
used as
    new_ensemble = orig_ensemble.convert_to(qp.factory.stats[self.config.qp_output_classname], **self.config.qp_output_class_pars)

would allow you to convert from one representation to another.

So, this could be something like:

if self.config.qp_output_classname is not None:
    new_ensemble = orig_ensemble.convert_to(qp.factory.stats[self.config.qp_output_classname], **self.config.qp_output_class_pars)
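
The same guard can be exercised without qp installed using the stand-in below. `ToyEnsemble` and `maybe_convert` are hypothetical names; the toy `convert_to` takes a class name and only relabels the object, whereas the real `qp.Ensemble.convert_to` takes a class and recomputes the distributions. Treat it as a sketch of the None-means-leave-alone behaviour, not of the conversion itself.

```python
class ToyEnsemble:
    """Stand-in for qp.Ensemble; only records which representation it holds."""

    def __init__(self, class_name, **pars):
        self.class_name = class_name
        self.pars = pars

    def convert_to(self, class_name, **pars):
        # The real qp conversion recomputes the PDFs; this toy only relabels.
        return ToyEnsemble(class_name, **pars)


def maybe_convert(ensemble, qp_output_classname=None, qp_output_class_pars=None):
    """Return the ensemble unchanged unless a target representation is forced."""
    if qp_output_classname is None:
        return ensemble
    # Treat a missing parameter dict as "no extra conversion arguments".
    return ensemble.convert_to(qp_output_classname, **(qp_output_class_pars or {}))


native = ToyEnsemble("mixmod")
same = maybe_convert(native)
forced = maybe_convert(native, "hist", {"bins": [0.0, 1.0, 2.0, 3.0]})
print(same.class_name, forced.class_name)
```

Defaulting both parameters to None keeps the native output path untouched and confines the new behaviour to a single branch in the base class.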


3 participants