Adjoint ode design #37

Merged 14 commits on Apr 29, 2021

Conversation

wds15
Contributor

@wds15 wds15 commented Feb 17, 2021

This PR discusses how the adjoint ODE approach can be integrated with Stan math. The focus of the document is to

  • describe the merits of the approach
  • explain how the CVODES user manual notation lines up with Stan math notation
  • show how the CVODES functionality is integrated with the Stan math autodiff library
  • propose a super-set of tuning parameters for the numerical procedures, which needs to be assessed by the community in order to weed them out somewhat (or not at all)

I am not sure how to present a rendered version of this. I used this R command to render the text:

rmarkdown::render("0027-adjoint-ode.md", output_format="bookdown::html_document2")

Here is a rendered pdf of the document for easier reading:
0027-adjoint-ode.pdf

Updated version, 7 March 2021

0027-adjoint-ode.pdf

Updated version, 23 March 2021

0027-adjoint-ode.pdf

Updated version, 7 April 2021

0027-adjoint-ode.pdf

@wds15 wds15 mentioned this pull request Feb 17, 2021
@wds15
Contributor Author

wds15 commented Mar 4, 2021

For anyone who wants to get his hands dirty with the running prototype of this, you need to:

  • clone cmdstan with the stan and stan-math submodule
  • cd into cmdstan/stan/lib/stan_math/
  • checkout the branch feature/adjoint-odes
  • move back to cmdstan directory
  • grab the latest stanc experimental version created by Rok as listed on the PR Feature/adjoint odes math#1905
  • Currently that means to head to https://github.com/stan-dev/stanc3/actions/runs/621697392
  • from there you grab mac-stanc (for mac), which you put into cmdstan/bin. I then copy this manually to bin/stanc and make sure that macOS allows you to run it by right-clicking the binary in the Finder while holding down the Control key, then pressing "Open"; from then on you can run the binary from the command line. This step is not needed on Linux... no idea about Windows.
  • at this point it should be a matter of building cmdstan the usual way
  • Stan models can now include an ode_adjoint_tol or ode_adjoint_tol_ctl call.

An example model I used for benchmarks is in cmdstan/stan/lib/stan_math/ode-adjoint/linked-mass-flow.stan.
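
(For readers who have not opened the design doc yet, here is a rough, hypothetical sketch of what such a call could look like, following the argument ordering of the prototype ode_adjoint_tol_ctl signature quoted later in this thread. The right-hand side, data layout, tolerance values, and the integer codes for the interpolation and solver choices are illustrative assumptions, not the prototype's defaults.)

functions {
  // hypothetical two-compartment linear ODE right-hand side
  vector flow_rhs(real t, vector y, vector p) {
    vector[2] dydt;
    dydt[1] = -p[1] * y[1];
    dydt[2] = p[1] * y[1] - p[2] * y[2];
    return dydt;
  }
}
data {
  int<lower=1> T;
  real t0;
  real times[T];
  vector[2] y0;
  vector[2] y_obs[T];
}
parameters {
  vector<lower=0>[2] p;
  real<lower=0> sigma;
}
model {
  vector[2] y_hat[T]
    = ode_adjoint_tol_ctl(flow_rhs, y0, t0, times,
                          1e-8, rep_vector(1e-8, 2),  // forward solve: rel_tol_f, abs_tol_f
                          1e-6, rep_vector(1e-6, 2),  // backward solve: rel_tol_b, abs_tol_b
                          1e-5, 1e-5,                 // quadrature: rel_tol_q, abs_tol_q
                          10000,                      // max_num_steps
                          150,                        // num_checkpoints (steps between checkpoints)
                          2,                          // interpolation_polynomial (assumed: 2 = polynomial)
                          2, 2,                       // solver_f, solver_b (assumed: 2 = BDF)
                          p);
  p ~ lognormal(0, 1);
  sigma ~ normal(0, 1);
  for (t in 1:T)
    y_obs[t] ~ normal(y_hat[t], sigma);
}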

@bbbales2
Member

bbbales2 commented Mar 4, 2021

@wds15 use the cmdstan tarball builder. Docs for it are hidden here

It will allow you to build a tarball that is easy to share and use in cmdstanr (see the closures tarball here)

@funko-unko

Alright, so this is how it worked for me, via @wds15's instructions:

# Assumes that a compatible `stanc` file is located at ./stanc
# Uses 8 cores to build cmdstan
git clone --recurse-submodules https://github.com/stan-dev/cmdstan
cd cmdstan/stan/lib/stan_math
git checkout -b feature/adjoint-odes origin/feature/adjoint-odes
cd ../../../
mkdir bin
cd ..
cp stanc cmdstan/bin/stanc
cd cmdstan
chmod +x bin/stanc
make build -j8

And then this showed up, which also worked for me:
stan-dev/math#1905 (comment)

I've found this template for the design doc:
https://github.com/stan-dev/design-docs/blob/92e173efc04aaa3d3dae277dcc7b8681f1008543/0000-template.md

I guess we want to keep to that template? Is a "Stan programmer", as mentioned in the template, a user of Stan ("programming Stan models") or a developer of Stan? I guess a simple example comparing forward/adjoint would be helpful both for users and developers?

Anyhow, some comments:

  • As described in the doc, the complexity of the forward mode grows quadratically, not exponentially.
  • num_checkpoints should be renamed into something like num_steps_between_checkpoints (as we talked about).
  • Increasing num_steps_between_checkpoints too much degrades performance, I suspect because the memory needed gets allocated anew at every leapfrog step? Not something we should worry about or optimize for now, I guess. Also, making this too large will of course result in a failure to allocate the memory. I guess this should not really be a problem?
  • Different values of num_steps_between_checkpoints appear to lead to different results, and then to (not insignificantly) different samples. Maybe something to keep in mind.
  • The interface looks fine to me.

What needs to be done?

@wds15
Contributor Author

wds15 commented Mar 5, 2021

What we need to do in my view is

  • Find out if the simplified interface has sane defaults set up which work "OK" for many problems.
  • Gather experience with the adjoint method on real problems. Finding out when it is better to use the adjoint method is what we would like to be able to answer to some degree of accuracy. Ideally people do that on problems they know well.
  • I came up with a scalable example - should there be other scalable examples?
  • Collect as much information as we can about making recommendations for the different knobs of the Ctl method. This will be hard to do in general, but maybe we can still distill some rules of thumb.
  • For example: I found that it is better - I think - to have greater precision (smaller absolute tolerance) for the forward solve than for the backward solve, and the quadrature should have the least precision (largest absolute tolerance value).

Did I miss any option to expose for the Ctl thing?

Thanks for the other comments, will update the doc accordingly.

@funko-unko

Hmmm, I believe at least for the interpolation method the polynomial interpolation appears to be preferred. The manual (https://computing.llnl.gov/sites/default/files/public/cvs_guide-5.7.0.pdf) states:

The Hermite cubic interpolation option is present because it was implemented chronologically first and it is also used
by other adjoint solvers (e.g. daspkadjoint). The variable-degree polynomial is more memory-efficient (it requires
only half of the memory storage of the cubic Hermite interpolation) and is more accurate.

As for the number of steps between checkpoints, there probably is no way to tell a priori which number is best. 250 sounds fine I guess; most ODE problems in Stan will still be rather small, won't they? With 1k double-valued unknowns, that would mean 2 MB of RAM? The AD tape is probably larger (and also kept in memory?).
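
(Presumably the arithmetic behind that estimate is roughly one 1000-element state vector of doubles stored per step: 250 steps between checkpoints × 1000 states × 8 bytes ≈ 2 MB per checkpoint interval.)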

As for the tolerances, 🤷‍♂️. We'll need a suite of testcases for that.

I do believe more and simpler (scalable) examples would be useful, and not only to evaluate the use cases of the adjoint method. They could also be presented to users as examples of when which method performs well?

@wds15
Contributor Author

wds15 commented Mar 5, 2021

I will switch to the polynomial method for the simpler interface then. Maybe we even drop the interpolation option entirely and always use the polynomial?

For the steps between checkpoints: I tested with example values like 10, 50, 250, and 500 and found that you see speedups when going from 10 to 250, but then it gets flat. In the CVODES examples they use values in the hundreds (100 or 500), as I recall. Memory-wise I would not expect an issue here at all. ODE problems are small in a Bayesian context, as far as I have seen (or you will have to wait forever for your results).

Tolerances: I opt for making the user-specified tolerance the loosest one in the set. So the forward & backward solves get higher precision, and the tolerance given is the one applied to the computed parameter gradients (the quadrature).

For now we need to come to the conclusion that this is all good enough to land in Stan. Maybe we mark the simple interface as "experimental" in the doc to warn people that we may change the underlying numeric details? Do you have any specific examples in mind before proceeding?

Tagging @charlesm93 who wanted to get his hands dirty on this.

@betanalpha

betanalpha commented Mar 5, 2021 via email

@wds15
Contributor Author

wds15 commented Mar 6, 2021

@betanalpha thanks for looking over it.

  • Section 1: I adapted accordingly.
  • Section 3 & 4: Will work on that.
  • Section 6: I reworded it.
  • Section 7: I adapted as suggested.

Wrt. the suggested ode_adjoint_tol signature which sets some defaults: The thought is that users should be able to quickly try out the adjoint method. So when users have already spent some thought on their ODE problem, as they do when using the tol flavor, it should really be easy for them to just try adjoint. I think we should be able to set up defaults which give reasonable performance (maybe not the best). This is also a big argument for encouraging users to try the prototype; it's just about swapping out the function name. I would suggest keeping the simpler interface at least for getting users to test this prototype during this month... @funko-unko liked the short function call for basically those reasons.

@wds15
Contributor Author

wds15 commented Mar 7, 2021

@betanalpha ok, now I hopefully got your other comments straight as well. Thanks for looking into it. I am now talking of "autodiff adjoints" and "ODE adjoint states" as you suggest, which hopefully brings the confusion level down...

I suggest we now launch an experimental phase with Stan users, running until Easter. The goal is to collect some feedback on

  • the utility of the method
  • the utility of the tuning parameters (we can probably drop the interpolation_polynomial, for example... I think)
  • the utility of the ode_adjoint_tol call

We can collect feedback on the forum and, in summarized form, in the wiki. After Easter we then proceed with finishing this up and merging it into Stan math during April such that it will be ready for the May release.

Sounds like a plan?

Tagging @charlesm93, @bbbales2 , @rok-cesnovar & @funko-unko ... @yizhang-yiz

@funko-unko

Can we maybe just slap a warning on the ode_adjoint_tol function? Something along the lines of

The other implicit tuning parameters may change in the future, the current setting is equivalent to ode_adjoint_tol_ctl(..., values of implicit parameters).

such that the user may just swap out the function call? Maybe with a notice to include this message if they post in the forums? This would discourage the use of the ode_adjoint_tol function while still making it easy to "just use" the adjoint method, and we would always know all of the tuning parameters?

PS: Clearly my opinion should have very little weight.

@rok-cesnovar
Member

rok-cesnovar commented Mar 8, 2021

The laid-out plan works for me! Maybe just mention that Easter is the 4th of April.

Long term we might want to use tuples or, even better, structs (though I am not sure there is any work being done on structs) for default arguments.

So the signature becomes:

vector[] ode_adjoint_tol_ctl(F f,
    vector y0,
    real t0, real[] times,
    (real rel_tol_f, vector abs_tol_f, real rel_tol_b, vector abs_tol_b,
    real rel_tol_q, real abs_tol_q, int max_num_steps, int num_checkpoints,
    int interpolation_polynomial, int solver_f, int solver_b)
    T1 arg1, T2 arg2, ...)

and you could call it as:

ode_adjoint_tol_ctl(udf, y0, t0, times, default_ode_adjoint_args(), ...)

where default_ode_adjoint_args() would be a built-in function that would return a tuple with the default arguments.

One could also do this:

transformed data {
  (real, vector, real, vector, real, real, int, int, int, int, int) args = default_ode_adjoint_args();
  args.7 = 500; // would set max_num_steps
}

Structs with named elements would obviously work much better for this.

@betanalpha

betanalpha commented Mar 9, 2021 via email

@betanalpha

betanalpha commented Mar 9, 2021 via email

@rok-cesnovar
Member

Currently the compiler cannot deduce where the control arguments stop and the UDF args start.
For example:

vector[] ode_adjoint_tol_ctl(F f,
    vector y0, real t0, real[] times,
    real rel_tol_f, vector abs_tol_f, real rel_tol_b, vector abs_tol_b,
    real rel_tol_q, real abs_tol_q, int max_num_steps, int num_checkpoints,
    int interpolation_polynomial, int solver_f, int solver_b)
vector[] ode_adjoint_tol_ctl(F f,
    vector y0, real t0, real[] times,
    real rel_tol_f, vector abs_tol_f, real rel_tol_b, vector abs_tol_b,
    real rel_tol_q, real abs_tol_q, int max_num_steps, int num_checkpoints,
    int interpolation_polynomial, int arg1, int arg2)

are the same. We could add logic where we check what args f has, deduce based on that, and add the missing control args for the C++ call (the C++ compiler cannot do the same). But that would mean stanc3 would hold the ODE default values, and I am not sure I like that.

So doable, but I am not sure it's worth it. I agree with "if you allow defaults people will use them and expect them to work".

@betanalpha

betanalpha commented Mar 9, 2021 via email

@wds15
Contributor Author

wds15 commented Mar 20, 2021

I just updated some more details.

@betanalpha did I get your comments in as you had in mind?

It would be good to get the design doc agreed on so that it's ready to be merged by the end of this month.

The only points left open at this stage should be settled during the experimental phase which ends at Easter. The questions open are

  • Does the ode_adjoint_tol_ctl function have all required tuning parameters?
    • Can some parameters be dropped?
    • Are some additional ones needed?
  • Should we keep the simpler ode_adjoint_tol function signature?
    • Are the defaults sensible?

As it has been a sticking point to include the simpler ode_adjoint_tol: Is it also a possibility to include the function, but mark clearly in the documentation that the defaults can change between releases? That would essentially mimic the statement about the default priors in rstanarm, which are not guaranteed to stay the same between releases.

Pinging @bbbales2 @yizhang-yiz @rok-cesnovar @funko-unko @charlesm93

@betanalpha

Mostly small comments. I'm still not comfortable with including a simpler function in the initial release. Users will not respect any "experimental feature" warnings and they will complain about changing behavior from version to version even if the behavior is explicitly documented as not stable. Even when they don't complain they often end up using the same function without realizing the difference, which is already a huge problem with rstanarm priors.

Section 1

“severely costly to fit” -> “expensive to evaluate”. The autodiff method is independent of how hard ODE-based models are to fit.

ressources -> resources

This matters less in the design doc and more in the final user-facing doc, but given the heterogeneity in terminology across fields we should try to mention as many common terms as possible. For example, not just "so-called forward-mode ODE method" but rather "forward-mode, direct, sensitivity".

Section 2

“susceptible infections & removed” -> “susceptible, infected, and recovered (SIR)”

Section 3

“merely” -> “many”, although I think “more” by itself is sufficient here.

"in addition to solve M one-dimensional quadrature problems" -> "in addition to solve M one-dimensional quadrature problems that depend on the adjoint states". No harm in being extra clear here, to avoid any misunderstanding about the solves being parallelizable or the like.

"We need to first collect some experience before a simpler version can be made available (if that is feasible at all)." I very much agree, but then why is a simpler version being considered for the initial implementation?

Are the quadrature tolerances different for each of the M integrations?

Section 4

Use theta or p consistently to denote the parameters throughout.

g(t, y, p) will not equal l(p) for the autodiff application -- it will equal dy/dt = f(t, y, p) which allows us to propagate the autodiff adjoint information from any subsequent expression graph.

@wds15
Contributor Author

wds15 commented Mar 22, 2021

Thanks @betanalpha !

Mostly small comments. I'm still not comfortable with including a simpler function in the initial release. Users will not respect any "experimental feature" warnings and they will complain about changing behavior from version to version even if the behavior is explicitly documented as not stable. Even when they don't complain they often end up using the same function without realizing the difference, which is already a huge problem with rstanarm priors.

I did change my mind in the process of all this, so I do now think that we should supply a simplified version to make it easy for users to try this thing out (which is why there could be left-overs in the doc saying otherwise). But let's be pragmatic here; I suggest we proceed as follows: we don't provide the simpler version in the Stan language itself. Instead, we pack the simplified function call into the Stan user documentation in the form of a Stan function... unless another reviewer here can convince @betanalpha otherwise, of course (sounds unlikely ;) ). Is that reasonable?

Are the quadrature tolerances different for each of the M integrations?

Right now the quadrature tolerances are all the same... but I just checked, and the CVODES interface allows specifying the absolute tolerance for each parameter separately (I thought that was not possible). Should we change that and also make the quadrature absolute tolerance a vector? That's an awful lot of stuff to specify, and it's a bit inconvenient, as the length of the vector changes whenever you add or drop a parameter. Even worse, the correspondence of absolute error tolerance to actual parameter is not so transparent: we have a variadic interface, but the absolute tolerances would correspond to a "flattened" version of the variadic parameter pack. For the sake of simplifying at least something right now, I would leave things as they are and keep the same absolute tolerance for all parameters. Agree?
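
(A hypothetical illustration of the flattening issue, just to make it concrete; the names and the assumed flattening order are made up:)

// suppose the variadic arguments of the ODE call were: vector[3] k, real forcing
// a per-parameter quadrature tolerance would then need 4 entries whose order
// mirrors the internally flattened pack (assumed here to be k[1], k[2], k[3], forcing)
vector[4] abs_tol_q = [1e-6, 1e-6, 1e-8, 1e-4]';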

Use theta or p consistently to denote the parameters throughout.

My intention is to keep "theta" as denoting the super-set of parameters, with p being the subset of parameters needed for the ODE. I would like to keep the notation as is, since this sets apart the notation in Stan math (theta) from the notation in the CVODES manual (p).

g(t, y, p) will not equal l(p) for the autodiff application -- it will equal dy/dt = f(t, y, p) which allows us to propagate the autodiff adjoint information from any subsequent expression graph.

Maybe it's not ideal as it is right now... I will have one more look, but I would like to make it as simple as possible. Possibly we just say that l(theta) is - as an example - a normal lpdf of an ODE state at time T.

@rok-cesnovar
Member

Instead, we pack the simplified function call into the Stan user documentation in the form of a Stan function

I agree; this way the default values are out in the open in the UDF code, and you also get the option to switch to adjoint ODEs "easily".
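
(To make this concrete, here is a rough, hypothetical sketch of what such a documentation-only wrapper could look like for one specific model. Since user-defined functions are not variadic, the wrapper has to be written per model; the default values, the split of the user tolerance across forward/backward/quadrature, and the integer codes below are illustrative placeholders, not agreed-on defaults.)

functions {
  // model-specific right-hand side (hypothetical two-state system, one parameter vector)
  vector my_rhs(real t, vector y, vector p) {
    vector[2] dydt;
    dydt[1] = -p[1] * y[1];
    dydt[2] = p[1] * y[1] - p[2] * y[2];
    return dydt;
  }
  // wrapper exposing a single tolerance pair: the forward and backward solves run
  // tighter, and the user tolerance is applied to the parameter-gradient quadrature
  vector[] my_ode_adjoint(vector y0, real t0, real[] times, vector p,
                          real rel_tol, real abs_tol) {
    return ode_adjoint_tol_ctl(my_rhs, y0, t0, times,
                               rel_tol / 100, rep_vector(abs_tol / 100, 2),  // forward
                               rel_tol / 10, rep_vector(abs_tol / 10, 2),    // backward
                               rel_tol, abs_tol,                             // quadrature
                               10000, 150,                                   // max_num_steps, checkpoint interval
                               2, 2, 2,                                      // interpolation, solver_f, solver_b (assumed codes)
                               p);
  }
}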

@wds15
Contributor Author

wds15 commented Mar 23, 2021

There is just one catch with putting the function into the manual: it would be a user-defined Stan function, and we do not support variadic arguments to user-defined functions (or do we?).

Anyway, we can still agree at a later time point on the simpler function being part of the language itself... unless any other reviewer here has different views.

@betanalpha is that a plan?

@charlesm93

I read through the design doc and I think overall it's very well written. The only comments I have are that I expect the adjoint method to work better than forward diff also in the small-number-of-parameters, large-number-of-states regime -- to be tested, I'll put something together -- and that one reference we could include is the work on Neural ODEs by Chen et al. (https://arxiv.org/abs/1806.07366).

I agree with @betanalpha and @funko-unko that we should have per-parameter absolute tolerances for the quadrature step. This is consistent with the other absolute tolerances we have, I don't think it's a burden to users who won't use it, and it can help us study adjoint methods for ODEs in a Bayesian context. It seems a reasonable application would be a case where we use varying absolute tolerances for the forward and backward solves. CVODES looks at different contexts and the Julia package is fairly novel -- though I'd be interested to hear their reasoning.

In my view, the most compelling reason for not having varying absolute tolerances right now is that it's more work for the developers and it'll delay the feature's release -- can someone confirm this? That's fair enough. We can release the version as is, but the design doc should acknowledge that varying absolute tolerances are a desirable extension, if not a top priority, and also that the design as is doesn't hinder this extension. This strikes me as an acceptable compromise.

I'd love some documentation on the benefits of scaling parameters in ODEs, some examples, and how well this works in practice!

@funko-unko

I believe part of the problem with settling the question of whether or not these parameter-wise tolerances are needed is that there is no quick/easy way to estimate the impact of these tolerances, neither for us in the very general setting, nor for users with a specific model.

I believe the only way we have right now to estimate this is to just run full HMC and see how the sampler performs, which will take "forever". I think it would help a lot if it were (easily or at all) possible to explore how the solver performs for samples from the prior, i.e. what tolerances are needed for a given (estimated) final error in the density and in the gradient.

The former you can do in a slightly hacky way, but for the latter you have to jump through a thousand hoops. I strongly believe that this should be made very, very (very) easy for the user.

One way to do this would be to be able to pass a csv file to fixed_param. And while we're at it, users should also be very interested in the amount of time it takes to integrate the ODE for a given set of parameters, which, as we all know, may vary widely across the parameter space.
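
(The "slightly hacky" density-side check can already be sketched with the existing solvers: draw from the prior in generated quantities and compare a loose solve against a tight reference. Everything below - the right-hand side, the prior, and the tolerance values - is a hypothetical illustration built on the existing ode_bdf_tol interface, not part of this proposal; the gradient-error side indeed has no comparably easy route. Run it with the fixed_param sampler and inspect max_abs_diff across draws.)

functions {
  // hypothetical two-state ODE with a single uncertain rate
  vector decay_rhs(real t, vector y, vector p) {
    vector[2] dydt;
    dydt[1] = -p[1] * y[1];
    dydt[2] = p[1] * y[1] - 0.5 * y[2];
    return dydt;
  }
}
data {
  int<lower=1> T;
  real t0;
  real times[T];
  vector[2] y0;
}
generated quantities {
  real k = lognormal_rng(0, 1);  // prior draw for the rate
  real max_abs_diff;             // discrepancy between the loose and tight solves
  {
    vector[1] p = rep_vector(k, 1);
    vector[2] y_loose[T] = ode_bdf_tol(decay_rhs, y0, t0, times,
                                       1e-4, 1e-4, 10000, p);
    vector[2] y_tight[T] = ode_bdf_tol(decay_rhs, y0, t0, times,
                                       1e-10, 1e-10, 100000, p);
    real d = 0;
    for (t in 1:T)
      d = fmax(d, max(fabs(y_loose[t] - y_tight[t])));
    max_abs_diff = d;
  }
}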

@wds15
Contributor Author

wds15 commented Apr 13, 2021

At this point it would be great to really get to the point of either rejecting (option 1) or accepting (option 2) the proposal with your post. A rationale as to why a specific option has been chosen is a great bonus, of course.

The design doc already has a short discussion under "Drawbacks" about the lack of the per-parameter tolerances. Here are a few more points to this:

  1. We have not yet seen any case where per-parameter absolute tolerances would have been needed. The empirical observation is that other communities (CVODES, Julia) do not make use of such a feature, and hence the need for this feature should be driven by an actual problem which we cannot solve with the simplified approach. Moreover, I think that we should have a discussion about whether we need this feature only once we have shown that there is a need. Per se, the default is that the feature is not needed for the real-world problems Stan is being used for.
  2. Our current ODE solvers also do not allow different tolerances for anything. They are all the same and no one has ever complained about it. There were discussions on extending that to per-state tolerances, but that did not happen for various reasons. In any case, all Stan programs using ODEs so far have been working nicely with just a single absolute tolerance for states and gradients. Thus, if this assumption is wrong by now, then this actually affects all our solvers, and my view would be that we should treat it separately from the adjoint method.
  3. The usability is severely complicated given that we have a variadic interface. One would have to respect the ordering of parameters and their storage order. That can be avoided if the user restricts themselves to only using 1d array inputs, but still.
  4. Indeed, my own resources are limited, and in addition I personally do not think that the per-parameter thing is good for users, as it complicates usability a lot.
  5. The per-parameter tolerance thing can be emulated - to some extent and with some work - using scaling and shifting of parameters and states (see the sketch at the end of this comment). This won't work perfectly, but it is totally possible to do.
  6. Finally, the code will be written in a way that the per-parameter tolerance thing can be added at a later point - should it be deemed useful for problems we encounter (and should we deem those problems important enough to warrant an implementation).

So far we have not seen any strong use case for per-parameter tolerances. Instead, we have seen that the existing system gives users huge speedups.
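
(A hypothetical sketch of the rescaling idea from point 5: declare the parameters on roughly unit scale and absorb the known scales inside the right-hand side, so the gradients accumulated by the adjoint quadrature have comparable magnitudes and a single absolute tolerance behaves roughly like a per-parameter one. The 1e-4 scale and all names are illustrative only, and this is a fragment rather than a full model.)

functions {
  vector scaled_rhs(real t, vector y, vector p_unit) {
    real k1 = p_unit[1];         // already O(1)
    real k2 = 1e-4 * p_unit[2];  // known small scale absorbed here
    vector[2] dydt;
    dydt[1] = -k1 * y[1];
    dydt[2] = k1 * y[1] - k2 * y[2];
    return dydt;
  }
}
parameters {
  vector<lower=0>[2] p_unit;  // both components are of order one for the solver
}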

@yizhang-yiz

What's the established voting procedure for design docs here on GitHub?

@wds15
Contributor Author

wds15 commented Apr 14, 2021

I am not calling for an established formal ruling here. Instead I brought up the topic at the Stan Gathering last Thursday, April 8th, and it was suggested to go through a vote here in the way I started it (at least this is how I understood the meeting outcome). To me that is reasonable given that

  • we are in principle in alignment on the proposal in that we want adjoint ODEs
  • we seem to have different views on a specific feature
  • I would assume that we can resolve this now with a voting procedure in a constructive and collaborative manner
  • I also assume that we can trust each other here, as I don't see any reason why someone would intend to manipulate, cheat or do any other odd things... we do agree on the goal of getting adjoint ODEs in, so I would expect people to be constructive here.

I hope that makes sense to everyone involved.

@SteveBronder
Collaborator

In the past, for these we had a one-week voting period where folks would use the GitHub review process to approve or request changes. If the majority approves, then we merge the design doc.

If we wanted anyone to be able to vote, we could also just have a comment people could thumbs-up or thumbs-down. I'm fine with either, but prefer the review process so that all the tallies stay at the bottom of the page and dissents show up as request-changes reviews, which are easier to spot when looking over a proposal's history.

@yizhang-yiz

yizhang-yiz commented Apr 14, 2021

Thanks @SteveBronder. Looks like this procedure is missing from the voting proposal discussion. I suggest we stick to the process in that proposal.

@SteveBronder
Collaborator

I don't want to get the doc discussion off track, so maybe this would be better to discuss in the general meeting Thursday. The actual review process in the design-docs readme is a bit vague, which tbh I think is actually good, as it lets the scope of the voting pool be decided by the reviewer(s). Whatever way people want to vote is cool and good by me.

@betanalpha

betanalpha commented Apr 15, 2021 via email

@mitzimorris
Member

I have read through this thread and was present at the online meeting where the issues were clearly laid out and discussed by Sebastian and Charles, among others.
Sebastian's proposal is both pragmatic and promising. I vote yes.

@wds15
Contributor Author

wds15 commented Apr 15, 2021

As per the discussion today with @charlesm93, I have added a short sentence to clarify that a vector absolute tolerance for the parameters can be added to the implementation later. That is, the C++ code should be written in a way that allows for this extension should it be deemed necessary at a later point. Otherwise no change.

Great to see one vote here (even better that it is positive from my personal perspective).

@bbbales2
Member

I vote yes. Reasons over here

@funko-unko

For what it's worth, I also vote yes. But I do see @betanalpha's point.

@wds15
Contributor Author

wds15 commented Apr 19, 2021

BUMP.

One more week to go for voting. So far we have 3 positive votes and no vote against... though @betanalpha has concerns about the proposal, I have not read a clear vote for or against it.

Tagging again interested people I am aware of:

@charlesm93 @yizhang-yiz @bob-carpenter @rok-cesnovar @SteveBronder

@rok-cesnovar
Member

I also give a positive vote. Even though I am familiar with the underlying implementation and have worked on the stanc3 side of things, I do have to acknowledge that I am by no means an expert in modeling with ODEs, so put a lesser weight on my vote :)

Comments:

  • I agree with Michael's comments about this being a somewhat arbitrary subset of exposed arguments, though the reasoning for this particular one seems sound to me (easier to implement; no per-parameter tolerances are currently being used in ODE solvers in Stan/CVODES/Julia; etc.), so I think this is a good step 1 that does not hinder us in any future evolution of the ODE adjoint interface.

  • This implementation does not make a per-parameter tolerances implementation any more difficult to add in the future if we find that to be needed. The only cost is that we would have 2 Stan function signatures, but the backend code would overlap, so there is not much maintenance cost there. I do not think we will have to deprecate the existing one, as it seems to be working for at least the problems currently used.

  • By waiting with the per-parameter tolerances implementation we could add a much friendlier and less error-prone signature in the future once tuples are introduced (maybe for this release, but probably 50/50 at this point), as instead of parameters we would supply (parameter, tolerance) tuples. So the signature would be

vector[] ode_adjoint_tol_ctl(F f,
    vector y0,
    real t0, real[] times,
    (real rel_tol_f, vector abs_tol_f, real rel_tol_b, vector abs_tol_b,
    real rel_tol_q, real abs_tol_q, int max_num_steps, int num_checkpoints,
    int interpolation_polynomial, int solver_f, int solver_b)
    (T1 arg1, T1 arg1_tol), (T2 arg2, T2 arg2_tol) ...)

So the bottom line for me is: this seems useful (I trust the judgement of Sebastian, Charles and the rest) and should not put a significant maintenance burden on us even if the per-parameter tolerance implementation becomes available.

@SteveBronder
Collaborator

+1 I vote yes!

@charlesm93

I also vote yes.

@wds15
Contributor Author

wds15 commented Apr 22, 2021

BUMP.

6 positive votes and no (definite) negative vote by now.

Thanks to everyone for taking the time to dive into this so far.

Voting closes on Monday 26th April 2021.

@syclik
Member

syclik commented Apr 25, 2021

My vote: yes.

@wds15
Contributor Author

wds15 commented Apr 26, 2021

BUMP.

Last day for voting is today!

7 positive votes and no (definite) negative vote by now.

@betanalpha

betanalpha commented Apr 26, 2021 via email

@yizhang-yiz

Note also that in the history of Stan we have never actually deprecated a function from the language

IMO allowing deprecation is a healthy thing to do in the long term. Maybe we should have a discussion on that at some point.

@rok-cesnovar
Member

rok-cesnovar commented Apr 26, 2021

Note also that in the history of Stan we have never actually deprecated a function from the language.

The functions are deprecated in the sense that the compiler will warn users that a function or some syntax (`#` is deprecated for comments, as is <-, etc.) is deprecated. We have not removed them, as that would require a bump of the major version, and when that will happen needs a bit more planning and discussion than just a typical release cycle. There is no immediate plan for when we will do that.

I have personally been slowly collecting things that will need to be removed once we bump the major version (I will open an issue somewhere at some point). Some already have an issue in one of the repos, others do not.

So far collected:

Language:

  • _log, _cdf_log, _ccdf_log suffixed functions
  • multiply_log, binomial_coefficient_log, cov_exp_quad
  • integrate_ode_* functions
  • get_lp, if_else, increment_log_prob
  • abs in favor of fabs (there has been some flip-flopping, so this one is a maybe)
  • the old array syntax in favor of array[] (another maybe, I don't think this was decided, though it would simplify code)

Others:

  • bin/print in cmdstan
  • remove Math code associated with all deprecated functions

EDIT: Let's not derail this thread; we can discuss that elsewhere if needed. Just wanted to point out that removing deprecated features is in the plans once we decide to bump the major version.

@wds15
Contributor Author

wds15 commented Apr 26, 2021

My understanding from the start was to expose a function signature which exposes a large set of tuning options - and we do have that now. In fact, I personally was for including an additional bdf variant, but in the discussions over it I concurred with others that this functionality is not needed. Everything is a trade-off.

There is no variadic vs. per-parameter trade-off here, not at all. Also, I really do not see a need to deprecate the proposed signature at some point in favor of the more complicated one with per-parameter tolerances. Deprecating the simpler signature would only be needed if almost all cases we encounter required the more complicated signature - but that is really unlikely (time will tell).

Let's look at it positively: we are really making progress here! Solving large ODE systems in Stan will be faster for a very large class of problems (at least for those we benchmarked).

@betanalpha

betanalpha commented Apr 26, 2021 via email

@wds15
Contributor Author

wds15 commented Apr 27, 2021

The voting period is over as of today, 27th April 2021.

Result:

  • positive votes: 7
  • negative votes: 1

Thus, the decision is that we accept this PR as is ... and merge it.

Thanks to everyone for taking the time to go through the material!

The PR for this in stan-math is coming along very well.

@martinmodrak

I don't have an opinion on the technical aspects of the design doc, but the discussion prompted me to think about the culture I would personally like to see in the Stan dev community (while others may obviously differ).

I think it is important to bear in mind that everybody here shares a big portion of values - we all want Stan to be useful, reliable, usable, fast, ... We all value the effort others put into improving Stan, we all want our perspectives to be treated seriously. Disagreements like this are IMHO rarely due to some deep fundamental differences. Instead they arise partly from genuine uncertainty about what a design choice will actually entail in the real world and partly from somewhat different weights each of us puts on individual values/goals and how we evaluate the uncertainty. And I think this should be a reason for humility - some of the judgements any of us makes will be wrong. Some of our values/goals may turn out to lead to outcomes we didn't envision and don't like. Moreover whether a decision will in fact turn out to be good in practice doesn't in my experience correlate neatly with ability to convince others or to come up with good arguments. To me a consequence is that in most projects one should expect to rarely have all their requirements/expectations/... met - otherwise it would likely mean that the project only reflects the values of some of its team members.

This humility also IMHO entails that we should try to avoid looking at results of votes (or any decision) as personal wins or losses, or try to keep scores on who was right how many times, or engage in some form of "I told you so" when it turns out somebody was wrong. We share most of our goals and we all benefit if Stan works great and lose out when there are issues. And I felt it is important to remind ourselves of this from time to time. End of my two cents.

Collaborator

@SteveBronder SteveBronder left a comment


Approving since votes were good!

@SteveBronder SteveBronder merged commit 91177e7 into stan-dev:master Apr 29, 2021