Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix typos and clarify sample site naming guidance in developer_documentation.rst #368

Closed
wants to merge 8 commits into from
24 changes: 13 additions & 11 deletions docs/source/developer_documentation.rst
Original file line number Diff line number Diff line change
Expand Up @@ -26,20 +26,22 @@ A variety of coding conventions are enforced by automated tools in continuous in
PyRenew Principles
------------------

- Variable naming conventions
Variable naming conventions
---------------------------
- Use the ``data_`` prefix for (potentially) observed data.
- Use the ``_rv`` suffix for random variables.
- Use the ``observed_`` for the output of sample statements where ``obs`` is a ``data_`` prefixed object.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not entirely sure that these prefixes make complete sense. Particularly, I make an explicit distinction between observed and unobserved data. In principle, everything can be called data, so it is superfluous to use that as a prefix. Instead, I would say sampled_ for any output (deterministic or random), leave the observed_ to actually observed data, and keep the _rv suffix.

Copy link
Collaborator

@damonbayer damonbayer Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gvegayon @dylanhmorris How about this? The observed_ prefix is nice because it maps to the obs argument in numpyro.sample, though I understand why it is confusing to say the observed_ object may not actually be observed.

-  Use the ``observed_`` prefix for (potentially) observed data.
-  Use the ``_rv`` suffix for random variables.
-  Use the ``sampled_`` for the output of sample statements where ``obs`` is a ``obserced_`` prefixed object.
-  Thus, code which may reasonably written like ``infections = infections.sample(x, obs=infections)`` should instead be written ``sampled_infections = infections_rv.sample(x, obs=observed_infections)``.

- Thus, code which may reasonably written like ``infections = infections.sample(x, obs=infections)`` should instead be written ``observed_infections = infections_rv.sample(x, obs=data_infections)``.

- Use the ``data_`` prefix for (potentially) observed data.
- Use the ``_rv`` suffix for random variables.
- Use the ``observed_`` for the output of sample statements where ``obs`` is a ``data_`` prefixed object.
- Thus, code which may reasonably written like ``infections = infections.sample(x, obs=infections)`` should instead be written ``observed_infections = infections_rv.sample(x, obs=data_infections)``.

- Class conventions
Class conventions
-----------------
- Composing ``Model`` objects is discouraged. To modularize and reuse code across ``Model``s, create custom ``RandomVariable`` classes.
- Returning anything from ``Model.sample`` is discouraged. Instead, sample from models using ``Predictive`` or our ``prior_predictive`` or ``posterior_predictive`` functions.
damonbayer marked this conversation as resolved.
Show resolved Hide resolved
- Using ``numpyro.deterministic`` within a ``RandomVariable.sample`` method is discouraged. Only use it in ``Model.model`` calls. If something might need to be recorded from a ``RandomVariable``, it should be returned from the ``RandomVariable.sample`` call so it can be recorded within the ``Model.model`` call.
- Using default site names in a ``RandomVariable.sample`` is discouraged. Only use default site names at the ``Model.model`` level.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a better term than “using.” This is meant to suggest that someone writing a random variable should not give their own random variable a default site name. We should say something closer to that and point out that the site name should be an argument to the RV constructor.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried something in 33063a7. Might be too verbose. Let me know what you think, @damonbayer

- Use ``DeterministicVariable``\ s instead of constants within a model.

- Composing models is discouraged.
- Returning anything from ``Model.sample`` is discouraged. Instead, sample from models using ``Predictive`` or our ``prior_predictive`` or ``posterior_predictive`` functions.
- Using ``numpyro.deterministic`` within a ``RandomVariable`` is discouraged. Only use at the ``numpyro.deterministic`` ``Model`` level. If something might need to be recorded from a ``RandomVariable``, it should be returned from the ``RandomVariable`` so it can be recorded at the ``Model`` level.
- Using default site names in a ``RandomVariable`` is discouraged. Only use default site names at the ``Model`` level.
- Use ``DeterministicVariable``\ s instead of constants within a model.

Adding Documentation to Sphinx
------------------------------
Expand Down