Skip to content

Commit

Permalink
Merge pull request #2466 from stan-dev/docs/distributions
Browse files Browse the repository at this point in the history
Docs for adding new distributions to Stan Math
  • Loading branch information
rok-cesnovar authored May 7, 2021
2 parents b994475 + 2ab00b3 commit 1c31a77
Show file tree
Hide file tree
Showing 7 changed files with 629 additions and 47 deletions.
469 changes: 469 additions & 0 deletions doxygen/contributor_help_pages/adding_new_distributions.md

Large diffs are not rendered by default.

84 changes: 73 additions & 11 deletions doxygen/contributor_help_pages/common_pitfalls.md
Original file line number Diff line number Diff line change
Expand Up @@ -203,7 +203,7 @@ class my_big_type {
We can see in the above that the standard style of a move (the constructor taking an rvalue reference) is to copy the pointer and then null out the original pointer. But in Stan, particularly for reverse mode, we need to keep memory around even if it's only a temporary for when we call the gradient calculations in the reverse pass. And since memory for reverse mode is stored in our arena allocator no copying happens in the first place.
When working with arithmetic types, keep in mind that moving Scalars is often less optimal than simply taking their copy. For instance, Stan's `var` type is a PIMPL implimentation, so it simply holds a pointer of size 8 bytes. A `double` is also 8 bytes which just so happens to fit exactly in a [word](https://en.wikipedia.org/wiki/Word_(computer_architecture)) of most modern CPUs. While a reference to a double is also 8 bytes, unless the function is inlined by the compiler, the computer will have to place the reference into cache, then go fetch the value that is being referenced which now takes up two words instead of one!
When working with arithmetic types, keep in mind that moving Scalars is often less optimal than simply taking their copy. For instance, Stan's `var` type is a PIMPL implementation, so it simply holds a pointer of size 8 bytes. A `double` is also 8 bytes which just so happens to fit exactly in a [word](https://en.wikipedia.org/wiki/Word_(computer_architecture)) of most modern CPUs. While a reference to a double is also 8 bytes, unless the function is inlined by the compiler, the computer will have to place the reference into cache, then go fetch the value that is being referenced which now takes up two words instead of one!
The general rules to follow for passing values to a function are:
Expand All @@ -227,31 +227,93 @@ The pointer is cheap to copy around and is safe to copy into lambdas for
As an example, see the implementation of `mdivide_left`
[here](https://github.com/stan-dev/math/blob/develop/stan/math/rev/fun/mdivide_left.hpp)
where `make_callback_ptr` is used to save the result of an Eigen
where `make_callback_ptr()` is used to save the result of an Eigen
Householder QR decomposition for use in the reverse pass.
The implementation is in
[stan/math/rev/core/chainable_object.hpp](https://github.com/stan-dev/math/blob/develop/stan/math/rev/core/chainable_object.hpp)
### Returning arena types
### `.val()` vs. `.val_op()`
Suppose we have a function such as
### Returning expressions
```cpp
/**
* Returns the dot product of a vector of var with itself.
*
* @tparam EigVec An Eigen type with compile time rows or columns equal to 1 and a scalar var type.
* @param[in] v Vector.
* @return Dot product of the vector with itself.
*/
template <typename EigVec, require_eigen_vt<is_var, EigVec>* = nullptr>
inline var cool_fun(const EigVec& v) {
arena_t<EigVec> arena_v(v);
arena_t<EigVec> res = arena_v.val().array() * arena_v.val().array();
reverse_pass_callback([res, arena_v]() mutable {
arena_v.adj().array() += (2.0 * res.adj().array()) * arena_v.val().array();
});
return res;
}
```

There's a deceptive problem in this return! We are returning back an `arena_matrix<EigVec>`, which is an Eigen matrix completely existing on our arena allocator. The problem here is that we've also passed `res` to our callback, and now suppose a user does something like the following.

```cpp
Eigen::Matrix<var, -1, 1> x = Eigen::Matrix<double, -1, 1>::Random(5);
auto innocent_return = cool_fun(x);
innocent_return.coeffRef(3) = var(3.0);
auto other_return = cool_fun2(innocent_return);
grad();
```

Now `res` is `innocent_return` and we've changed one of the elements of `innocent_return`, but that is also going to change the element of `res` which is being used in our reverse pass callback! The answer for this is simple but sadly requires a copy.

```cpp
template <typename EigVec, require_eigen_vt<is_var, EigVec>* = nullptr>
inline var cool_fun(const EigVec& v) {
arena_t<EigVec> arena_v(v);
arena_t<EigVec> res = arena_v.val().array() * arena_v.val().array();
reverse_pass_callback([res, arena_v]() mutable {
arena_v.adj().array() += (2.0 * res.adj().array()) * arena_v.val().array();
});
return plain_type_t<EigVec>(res);
}
```
we make a deep copy of the return whose inner `vari` will not be the same but the `var` will produce a new copy of the pointer to the `vari`. Now the user code above will be protected and it is safe for them to assign to individual elements of the `auto` returned matrix.
### Const correctness, reverse mode autodiff, and arena types
In general, it's good to have arithmetic and integral types as `const`, for instance pulling out the size of a vector to reuse later such as `const size_t x_size = x.size();`. However vars, and anything that can contain a var should not be `const`. This is because in reverse mode we want to update the value of the `adj_` inside of the var, which requires that it is non-const.
## Handy tricks
### `forward_as`
### Nested Stacks
In functions such as [Stan's distributions](https://github.com/stan-dev/math/blob/1bf96579de5ca3d06eafbc2eccffb228565b4607/stan/math/prim/prob/exponential_cdf.hpp#L64) you will see code which uses a little function called `forward_as<>` inside of if statements whose values are known at compile time. In the following code, `one_m_exp` can be either an Eigen vector type or a scalar.
```cpp
T_partials_return cdf(1.0);
if (is_vector<T_y>::value || is_vector<T_inv_scale>::value) {
cdf = forward_as<T_partials_array>(one_m_exp).prod();
} else {
cdf = forward_as<T_partials_return>(one_m_exp);
}
```

Do note that in the above, since the if statements values are known at compile time, the compiler will always remove the unused side of the `if` during the dead code elimination pass. But the dead code elimination pass does not happen until all the code is instantiated and verified as compilable. So `forward_as()` exists to trick the compiler into believing both sides of the `if` will compile. If we used C++17, the above would become

```cpp
T_partials_return cdf(1.0);
if constexpr (is_vector<T_y>::value || is_vector<T_inv_scale>::value) {
cdf = one_m_exp.prod();
} else {
cdf = one_m_exp;
}
```

### Scoped Stacks

As an aside, in order to get the actual vector result, Eigen defines the
`.eval()` operator, so `c.eval()` would return an actual `Eigen::VectorXd`.
Similarly, Math defines the `eval` function which will evaluate an Eigen
expression and forward anything that is not an Eigen expression.
Where [`if constexpr`](https://en.cppreference.com/w/cpp/language/if) is run before any tests are done to verify the code can be compiled.

### `value_of_rec`
Using `forward_as<TheTypeIWant>(the_obj)` will, when `the_obj` matches the type the user passes, simply pass back a reference to `the_obj`. But when `TheTypeIWant` and `the_obj` have different types it will throw a runtime error. This function should only be used inside of `if` statements like the above where the conditionals of the `if` are known at compile time.
2 changes: 2 additions & 0 deletions doxygen/contributor_help_pages/developer_doc.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,8 @@

Thanks for checking out these docs. This is where you'll find information on contributing to the [Math library](https://github.com/stan-dev/math), how the source code is laid out, and other technical details that could help. This wiki focuses on things relevant to the Math library. There's also a separate [Stan wiki](https://github.com/stan-dev/stan/wiki) for things related to the language, services, algorithms.

For the most up to date guide on contributing to Stan Math see the [Getting Started Guide](@ref getting_started), [Common Pitfalls](@ref common_pitfalls), and [Adding New Distributions Guide](@ref new_distribution).

This wiki is a work in progress. If you have suggestions, please update the wiki or mention it on our forums on [this thread](https://discourse.mc-stan.org/t/what-doc-would-help-developers-of-the-math-library/).

-----------------
Expand Down
2 changes: 2 additions & 0 deletions doxygen/contributor_help_pages/distribution_tests.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,8 @@ From within the Math library:
```

For development on windows, add `GENERATE_DISTRIBUTION_TESTS=true` to the make/local file.

This will take hours, perhaps many hours. Most of the time is spent compiling. You might want to add the parallel flag to the python script.


Expand Down
Loading

0 comments on commit 1c31a77

Please sign in to comment.