Add troubleshooting page

penelopeysm · penelopeysm · commit f43bb30fa6db · 2025-05-21T15:50:40.000+01:00
diff --git a/_quarto.yml b/_quarto.yml
@@ -66,6 +66,7 @@ website:
             - usage/sampler-visualisation/index.qmd
             - usage/dynamichmc/index.qmd
             - usage/external-samplers/index.qmd
+            - usage/troubleshooting/index.qmd
 
         - section: "Tutorials"
           contents:
@@ -181,17 +182,19 @@ probabilistic-pca: tutorials/11-probabilistic-pca
 gplvm: tutorials/12-gplvm
 seasonal-time-series: tutorials/13-seasonal-time-series
 using-turing-advanced: tutorials/docs-09-using-turing-advanced
-using-turing-autodiff: tutorials/docs-10-using-turing-autodiff
-using-turing-dynamichmc: tutorials/docs-11-using-turing-dynamichmc
 using-turing: tutorials/docs-12-using-turing-guide
-using-turing-performance-tips: tutorials/docs-13-using-turing-performance-tips
-using-turing-sampler-viz: tutorials/docs-15-using-turing-sampler-viz
-using-turing-external-samplers: tutorials/docs-16-using-turing-external-samplers
-using-turing-mode-estimation: tutorials/docs-17-mode-estimation
-usage-probability-interface: tutorials/usage-probability-interface
-usage-custom-distribution: tutorials/usage-custom-distribution
-usage-tracking-extra-quantities: tutorials/tracking-extra-quantities
-usage-modifying-logprob: tutorials/usage-modifying-logprob
+
+usage-automatic-differentiation: usage/automatic-differentiation
+usage-custom-distribution: usage/custom-distribution
+usage-dynamichmc: usage/dynamichmc
+usage-external-samplers: usage/external-samplers
+usage-mode-estimation: usage/mode-estimation
+usage-modifying-logprob: usage/modifying-logprob
+usage-performance-tips: usage/performance-tips
+usage-probability-interface: usage/probability-interface
+usage-sampler-visualisation: usage/sampler-visualisation
+usage-tracking-extra-quantities: usage/tracking-extra-quantities
+usage-troubleshooting: usage/troubleshooting
 
 contributing-guide: developers/contributing
 dev-model-manual: developers/compiler/model-manual
diff --git a/usage/performance-tips/index.qmd b/usage/performance-tips/index.qmd
@@ -52,7 +52,7 @@ supports several AD backends, including [ForwardDiff](https://github.com/JuliaDi
 
 For many common types of models, the default ForwardDiff backend performs great, and there is no need to worry about changing it. However, if you need more speed, you can try
 different backends via the standard [ADTypes](https://github.com/SciML/ADTypes.jl) interface by passing an `AbstractADType` to the sampler with the optional `adtype` argument, e.g.
-`NUTS(adtype = AutoZygote())`. See [Automatic Differentiation]({{<meta using-turing-autodiff>}}) for details. Generally, `adtype = AutoForwardDiff()` is likely to be the fastest and most reliable for models with
+`NUTS(adtype = AutoZygote())`. See [Automatic Differentiation]({{<meta usage-automatic-differentiation>}}) for details. Generally, `adtype = AutoForwardDiff()` is likely to be the fastest and most reliable for models with
 few parameters (say, less than 20 or so), while reverse-mode backends such as `AutoZygote()` or `AutoReverseDiff()` will perform better for models with many parameters or linear algebra
 operations. If in doubt, it's easy to try a few different backends to see how they compare.
 
diff --git a/usage/troubleshooting/index.qmd b/usage/troubleshooting/index.qmd
@@ -0,0 +1,98 @@
+---
+title: Troubleshooting
+engine: julia
+---
+
+```{julia}
+#| echo: false
+#| output: false
+using Pkg;
+Pkg.instantiate();
+```
+
+This page collects a number of common error messages observed when using Turing, along with suggestions on how to fix them.
+
+If the suggestions here do not resolve your problem, please feel free to [open an issue](https://github.com/TuringLang/Turing.jl/issues).
+
+```{julia}
+using Turing
+```
+
+## T0001
+
+> failed to find valid initial parameters in {N} tries. This may indicate an error with the model or AD backend...
+
+This error occurs when a Hamiltonian Monte Carlo sampler cannot find a valid set of initial parameters for sampling.
+Here, 'valid' means that the log probability density of the model and its gradient with respect to each parameter are all finite and not `NaN`.
+
+### `NaN` gradient
+
+One of the most common causes of this error is having a `NaN` gradient.
+To find out whether this is happening, you can evaluate the gradient manually.
+Here is an example with a model that is known to be problematic:
+
+```{julia}
+using Turing
+using DynamicPPL.TestUtils.AD: run_ad
+
+@model function t0001_bad()
+    a ~ Normal()
+    x ~ truncated(Normal(a), 0, Inf)
+end
+
+model = t0001_bad()
+adtype = AutoForwardDiff()
+result = run_ad(model, adtype; test=false, benchmark=false)
+result.grad_actual
+```
+
+(See [the DynamicPPL docs](https://turinglang.org/DynamicPPL.jl/stable/api/#AD-testing-and-benchmarking-utilities) for more details on the `run_ad` function and its return type.)
+
+In this case, the `NaN` gradient is caused by the `Inf` argument to `truncated`.
+(See, e.g., [this issue on Distributions.jl](https://github.com/JuliaStats/Distributions.jl/issues/1910).)
+Here, the upper bound of `Inf` is not needed, so it can be removed:
+
+```{julia}
+@model function t0001_good()
+    a ~ Normal()
+    x ~ truncated(Normal(a); lower=0)
+end
+
+model = t0001_good()
+adtype = AutoForwardDiff()
+run_ad(model, adtype; test=false, benchmark=false).grad_actual
+```
+
+More generally, you could try using a different AD backend; if you don't know why a model is returning `NaN` gradients, feel free to open an issue.
+
+### `-Inf` log density
+
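+Another cause of this error is a model whose log density is `-Inf` for all of the initial parameters that the sampler tries, for example because the only valid values are very extreme.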
+This example is taken from [this Turing.jl issue](https://github.com/TuringLang/Turing.jl/issues/2476):
+
+```{julia}
+@model function t0001_bad2()
+    x ~ Exponential(100)
+    y ~ Uniform(0, x)
+end
+model = t0001_bad2() | (y = 50.0,)
+```
+
+The problem here is that HMC attempts to find initial values for parameters in the interval `[-2, 2]`, _after_ the parameters have been transformed to unconstrained space.
+For `x ~ Exponential(100)`, the appropriate transformation is `log(x)` (see the [variable transformation docs]({{< meta dev-transforms-distributions >}}) for more info).
+
+Thus, HMC attempts to find initial values of `log(x)` in the interval `[-2, 2]`, which corresponds to `x` in the interval `[exp(-2), exp(2)]` ≈ `[0.135, 7.39]`.
+However, all of these values of `x` will give rise to a zero probability density for `y` because the value of `y = 50.0` is outside the support of `Uniform(0, x)`.
+Thus, the log density of the model is `-Inf`, as can be seen with `logjoint`:
+
+```{julia}
+logjoint(model, (x = exp(-2),))
+```
+
+```{julia}
+logjoint(model, (x = exp(2),))
+```
+
+The most direct fix is to manually provide a set of valid initial parameters (in this example, any value of `x` greater than 50).
+For example, you can obtain a set of initial parameters with `rand(Dict, model)`, and then pass this as the `initial_params` keyword argument to `sample`.
+Alternatively, consider reparameterising the model to avoid such issues.