---
title: "Bayesian Vector Autoregressions"
author: "by Tomasz Woźniak"
email: "tomasz.wozniak@unimelb.edu.au"
title-slide-attributes:
  data-background-color: "#F500BD"
number-sections: false
format: 
  revealjs: 
    theme: [simple, theme.scss]
    slide-number: c
    transition: concave
    smaller: true
    multiplex: true
execute: 
  echo: true
---

##  {background-color="#F500BD"}

$$ $$

### Vector Autoregressions

### Three Useful Distributions

### Bayesian Estimation

### Minnesota and Dummy Observations Prior

### Bayesian Estimation for Hierarchical Prior

### Bayesian Forecasting using VARs

### US Data Analysis Using R Package bsvarSIGNs


##  {background-color="#001D31"}

![](bsvarSIGNs.png){.absolute top=30 right=275 width="500"}


## Materials {background-color="#F500BD"}

$$ $$

### Lecture Slides [as a Website](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)

### A Dedicated Reading [Woźniak (2016, AERev)](https://doi.org/10.1111/1467-8462.12179)

### **Quarto** [document template](https://github.com/bsvars/2024-10-be24-bsvarSIGNs/blob/main/be23-AUdata.qmd) for your own Australian data forecasting

### GitHub [repo](https://github.com/bsvars/2024-10-be24-bsvarSIGNs) to reproduce the slides and results

### A [Kahoot!](https://kahoot.it/) Quiz

### Tasks


## Vector Autoregressions {background-color="#F500BD"}

## Vector Autoregressions

-   go-to models for forecasting

::: incremental
-   simple: *linear and Gaussian*
-   extendible: *featuring many variations in specification*
    -   non-normality
    -   heteroskedasticity
    -   time-varying parameters
    -   Bayesian
-   interpretable
    -   Granger causality
    -   spillovers
    -   networks
    -   structural
-   Proposed by [Sims (1980)](https://doi.org/10.2307/1912017)
:::

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

## VAR(p) Model

### Model equations.

\begin{align*}
y_t &= \mathbf{A}_1 y_{t-1} + \dots + \mathbf{A}_p y_{t-p} + \boldsymbol\mu_0 + \epsilon_t\\
\epsilon_t|Y_{t-1} &\sim iidN_N\left(\mathbf{0}_N,\mathbf\Sigma\right)
\end{align*} for $t=1,\dots,T$

::: fragment
### Notation.

-   $y_t$ is an $N\times 1$ vector of observations at time $t$
-   $\mathbf{A}_i$ - $N\times N$ matrix of autoregressive slope parameters
-   $\boldsymbol\mu_0$ - $N\times 1$ vector of constant terms
-   $\epsilon_t$ - $N\times 1$ vector of error terms - a multivariate white noise process
-   $Y_{t-1}$ - information set collecting observations on $y$ up to time $t-1$
-   $\mathbf\Sigma$ - $N\times N$ covariance matrix of the error term
:::

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

## A Bivariate VAR(2) Model

Let the number of variables be $N=2$ and the lag order $p=2$.

Then, the model equation is:

```{=tex}
\begin{align*} 
\begin{bmatrix} y_{1.t} \\ y_{2.t} \end{bmatrix}
&= \begin{bmatrix} \mathbf{A}_{1.11} & \mathbf{A}_{1.12} \\ \mathbf{A}_{1.21} & \mathbf{A}_{1.22} \end{bmatrix}  \begin{bmatrix} y_{1.t-1} \\ y_{2.t-1} \end{bmatrix}
+ \begin{bmatrix} \mathbf{A}_{2.11} & \mathbf{A}_{2.12} \\ \mathbf{A}_{2.21} & \mathbf{A}_{2.22}   \end{bmatrix} \begin{bmatrix} y_{1.t-2} \\ y_{2.t-2} \end{bmatrix} + \begin{bmatrix} \boldsymbol\mu_{0.1} \\ \boldsymbol\mu_{0.2} \end{bmatrix} + \begin{bmatrix} \epsilon_{1.t} \\ \epsilon_{2.t}  \end{bmatrix}\\[2ex]
\begin{bmatrix} \epsilon_{1.t} \\ \epsilon_{2.t}  \end{bmatrix} &\Big|Y_{t-1} \sim iid N_2\left( \begin{bmatrix} 0\\ 0\end{bmatrix}, \begin{bmatrix}\boldsymbol\sigma_1^2 & \boldsymbol\sigma_{12} \\ \boldsymbol\sigma_{12} & \boldsymbol\sigma_2^2\end{bmatrix} \right)
\end{align*}
```
. . .

### Task:

Perform the matrix multiplications and write out the equation for $y_{1.t}$.

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

## Three Useful Distributions {background-color="#F500BD"}

## Matrix-Variate Normal Distribution

A $K\times N$ matrix $\mathbf{A}$ is said to follow a *matrix-variate normal* distribution: $$ \mathbf{A} \sim MN_{K\times N}\left( M, Q, P \right), $$ where

-   $M$ - a $K\times N$ matrix of the mean
-   $Q$ - an $N\times N$ row-specific covariance matrix
-   $P$ - a $K\times K$ column-specific covariance matrix

if $\text{vec}(\mathbf{A})$ is multivariate normal: $$ \text{vec}(\mathbf{A}) \sim N_{KN}\left( \text{vec}(M), Q\otimes P \right) $$

### Density function.

```{=tex}
\begin{align*}
MN_{K\times N}\left( M, Q, P \right) &\propto \exp\left\{ -\frac{1}{2}\text{tr}\left[ Q^{-1}(\mathbf{A}-M)'P^{-1}(\mathbf{A}-M) \right] \right\}
\end{align*}
```
-   $\text{tr}()$ is the trace of a matrix - the sum of its diagonal elements

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::
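
## Matrix-Variate Normal Distribution

The definition through $\text{vec}(\mathbf{A})$ suggests a simple way to simulate from this distribution. The chunk below is a minimal sketch, not a package function: `rmatnorm_sketch()` is a hypothetical helper, and the values of `M`, `Q`, and `P` are purely illustrative. It draws $\mathbf{A}$ using Cholesky factors and checks by Monte Carlo that the covariance of $\text{vec}(\mathbf{A})$ is close to $Q\otimes P$.

```{r}
#| label: sketch-matrix-normal

# Hypothetical helper (not from any package): one draw from MN_{K x N}(M, Q, P)
rmatnorm_sketch = function(M, Q, P) {
  K = nrow(M)
  N = ncol(M)
  Z = matrix(rnorm(K * N), K, N)          # K x N matrix of iid N(0,1) draws
  # chol() returns an upper-triangular R with R'R equal to its argument,
  # so vec(t(chol(P)) %*% Z %*% chol(Q)) has covariance Q %x% P
  M + t(chol(P)) %*% Z %*% chol(Q)
}

set.seed(1)
M = matrix(0, 2, 2)                       # illustrative K x N mean
Q = matrix(c(1.0, 0.5, 0.5, 2.0), 2, 2)   # illustrative N x N matrix
P = matrix(c(1.0, 0.3, 0.3, 0.5), 2, 2)   # illustrative K x K matrix

# Monte Carlo check: sample covariance of vec(A) against Q %x% P
vec_draws = replicate(20000, as.vector(rmatnorm_sketch(M, Q, P)))
max_error = max(abs(cov(t(vec_draws)) - Q %x% P))
max_error
```

The reported discrepancy should shrink as the number of replications grows.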

## Inverse Wishart Distribution

An $N\times N$ square symmetric and positive definite matrix $\mathbf\Sigma$ follows an *inverse Wishart* distribution: $$ \mathbf\Sigma \sim IW_{N}\left( S, \nu \right) $$ where

-   $S$ is an $N\times N$ positive definite symmetric matrix called the scale matrix
-   $\nu \geq N$ denotes degrees of freedom

if its density is given by:

### Density function.

```{=tex}
\begin{align*}
IW_{N}\left( S, \nu \right) \propto \text{det}(\mathbf\Sigma)^{-\frac{\nu+N+1}{2}}\exp\left\{ -\frac{1}{2}\text{tr}\left[ \mathbf\Sigma^{-1} S \right] \right\}
\end{align*}
```
::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::
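
## Inverse Wishart Distribution

Draws from the inverse Wishart distribution can be obtained from base R's `rWishart()`: if $W$ is Wishart distributed with $\nu$ degrees of freedom and scale $S^{-1}$, then $W^{-1} \sim IW_N(S,\nu)$. The chunk below is a sketch under that parameterisation; `riwish_sketch()` is a hypothetical helper and the numerical values are illustrative. It compares the Monte Carlo mean with $S/(\nu-N-1)$, the mean of this distribution for $\nu>N+1$.

```{r}
#| label: sketch-inverse-wishart

# Hypothetical helper: one draw from IW_N(S, nu), using the fact that
# if W ~ Wishart(nu, S^{-1}) then W^{-1} ~ IW_N(S, nu)
riwish_sketch = function(S, nu) {
  W = rWishart(1, df = nu, Sigma = solve(S))[, , 1]
  solve(W)
}

set.seed(2)
S_scale  = diag(c(1, 2))   # illustrative scale matrix, N = 2
nu_toy   = 10              # illustrative degrees of freedom

iw_draws = replicate(20000, riwish_sketch(S_scale, nu_toy))  # 2 x 2 x 20000 array
mc_mean  = apply(iw_draws, 1:2, mean)

# discrepancy between the Monte Carlo mean and S / (nu - N - 1)
iw_error = max(abs(mc_mean - S_scale / (nu_toy - 2 - 1)))
iw_error
```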

## Normal-Inverse Wishart Distribution

```{=tex}
\begin{align*}
\mathbf{A}|\mathbf\Sigma &\sim MN_{K\times N}\left( M, \mathbf\Sigma, P \right)\\
\mathbf\Sigma &\sim IW_{N}\left( S, \nu \right)
\end{align*}
```
If $\mathbf{A}|\mathbf\Sigma$ and $\mathbf\Sigma$ are distributed as above, then the joint distribution of $(\mathbf{A},\mathbf\Sigma)$ is *normal-inverse Wishart* $$
p(\mathbf{A},\mathbf\Sigma) = NIW_{K\times N}\left( M,P,S,\nu\right)
$$

### Density function.

```{=tex}
\begin{align*}
NIW_{K\times N}\left( M,P,S,\nu\right) \propto &\text{det}(\mathbf{\Sigma})^{-(\nu+N+K+1)/2}\exp\left\{ -\frac{1}{2}\text{tr}\left[ \mathbf{\Sigma}^{-1} S \right] \right\}\\ 
&\times\exp\left\{ -\frac{1}{2}\text{tr}\left[ \mathbf{\Sigma}^{-1} (\mathbf{A}-M)'P^{-1}(\mathbf{A}-M) \right] \right\}
\end{align*}
```
::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

## Example: Error Term Distribution

The model assumptions state: \begin{align*}
\epsilon_t|Y_{t-1} &\sim iidN_N\left(\mathbf{0}_N,\mathbf\Sigma\right)
\end{align*}

Collect error term vectors in a $T\times N$ matrix: $$\underset{(T\times N)}{E}= \begin{bmatrix}\epsilon_1 & \epsilon_2 & \dots & \epsilon_{T}\end{bmatrix}'$$

The error term matrix follows a matrix-variate normal distribution: \begin{align*} 
E|X &\sim MN_{T\times N}\left(\mathbf{0}_{T\times N},\mathbf\Sigma, I_T\right)
\end{align*}

::: {.fragment .fade-in}
### Tasks: what is

-   the covariance of $\text{vec}(E)$
-   the distribution of the first equation error terms $\begin{bmatrix}\epsilon_{1.1} &\dots&\epsilon_{1.T}\end{bmatrix}'$
:::

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

## Example: Univariate *Inverse Wishart* Distribution

The *inverse Wishart* density function is proportional to: \begin{align*}
\text{det}(\mathbf\Sigma)^{-\frac{\nu+N+1}{2}}\exp\left\{ -\frac{1}{2}\text{tr}\left[ \mathbf\Sigma^{-1} S \right] \right\}
\end{align*}

Consider a case where:

-   $N=1$
-   the matrix $\mathbf\Sigma$ is replaced by a scalar $\boldsymbol\sigma^2$

::: {.fragment .fade-in}
### Task:

-   write out the kernel of the density function for $\boldsymbol\sigma^2$
-   which density does this kernel represent?
:::

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

## Bayesian Estimation {background-color="#F500BD"}

## The model in Matrix Notation

### VAR(p) model.

```{=tex}
\begin{align*}
y_t &= \mathbf{A}_1 y_{t-1} + \dots + \mathbf{A}_p y_{t-p} + \boldsymbol\mu_0 + \epsilon_t\\
\epsilon_t|Y_{t-1} &\sim iidN_N\left(\mathbf{0}_N,\mathbf\Sigma\right)
\end{align*}
```
### Matrix notation.

```{=tex}
\begin{align*} 
Y &= X\mathbf{A} + E\\
E|X &\sim MN_{T\times N}\left(\mathbf{0}_{T\times N},\mathbf\Sigma, I_T\right)
\end{align*}
```
::: {.fragment .fade-out}
$$ 
\underset{(K\times N)}{\mathbf{A}}=\begin{bmatrix} \mathbf{A}_1'\\ \vdots \\ \mathbf{A}_p' \\ \boldsymbol\mu_0' \end{bmatrix} \quad
\underset{(T\times N)}{Y}= \begin{bmatrix}y_1' \\ y_2'\\ \vdots \\ y_T'\end{bmatrix} \quad
\underset{(K\times1)}{x_t}=\begin{bmatrix} y_{t-1}\\ \vdots \\ y_{t-p}\\ 1 \end{bmatrix}\quad
\underset{(T\times K)}{X}= \begin{bmatrix}x_1' \\ x_2' \\ \vdots \\ x_{T}'\end{bmatrix} \quad
\underset{(T\times N)}{E}= \begin{bmatrix}\epsilon_1' \\ \epsilon_2' \\ \vdots \\ \epsilon_{T}'\end{bmatrix}
$$ where $K=pN+1$
:::

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::
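
## The model in Matrix Notation

The stacking of $Y$ and $X$ can be sketched in a few lines of R. The helper `make_YX_sketch()` below is hypothetical (it is not part of **bsvarSIGNs**), the data are simulated for illustration only, and the sketch assumes that the first $p$ rows of the data matrix serve as initial conditions, so that $T$ rows remain.

```{r}
#| label: sketch-data-matrices

# Hypothetical helper: y is a (T + p) x N data matrix whose first p rows
# are used as initial conditions (an assumption of this sketch)
make_YX_sketch = function(y, p) {
  n_obs = nrow(y)
  Y     = y[(p + 1):n_obs, , drop = FALSE]            # T x N
  # l-th block of columns holds the data lagged by l periods
  lags  = lapply(1:p, function(l) y[(p + 1 - l):(n_obs - l), , drop = FALSE])
  X     = cbind(do.call(cbind, lags), 1)              # T x K with K = pN + 1
  list(Y = Y, X = X)
}

set.seed(3)
y_toy = matrix(rnorm(40), 20, 2)    # simulated data: 20 periods, N = 2
yx    = make_YX_sketch(y_toy, p = 2)
dim(yx$Y)
dim(yx$X)

# MLE of A from the matrix form Y = XA + E
A_hat_toy = solve(crossprod(yx$X), crossprod(yx$X, yx$Y))
```

Each row of `X` is $x_t' = (y_{t-1}', \dots, y_{t-p}', 1)$, matching the definition of $x_t$ above.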

## The model as Predictive Density

### VAR model.

```{=tex}
\begin{align*} 
Y &= X\mathbf{A} +E\\
E|X &\sim MN_{T\times N}\left(\mathbf{0}_{T\times N},\mathbf\Sigma, I_T\right)
\end{align*}
```
### Predictive density.

```{=tex}
\begin{align*} 
Y|X,\mathbf{A}, \mathbf{\Sigma} &\sim MN_{T\times N}\left(X\mathbf{A},\mathbf{\Sigma},I_T\right)
\end{align*}
```
::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

## Likelihood Function

### Predictive density.

```{=tex}
\begin{align*} 
Y|X,\mathbf{A}, \mathbf{\Sigma} &\sim MN_{T\times N}\left(X\mathbf{A},\mathbf{\Sigma},I_T\right)
\end{align*}
```
### Likelihood function.

```{=tex}
\begin{align*}
L\left(\mathbf{A},\mathbf{\Sigma}|Y,X\right)&\propto\text{det}(\mathbf{\Sigma})^{-\frac{T}{2}}\exp\left\{-\frac{1}{2}\text{tr}\left[\mathbf{\Sigma}^{-1}(Y-X\mathbf{A})'(Y-X\mathbf{A})\right]\right\}
\end{align*}
```
::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

## Likelihood Function as NIW

Define the MLE: $\widehat{A}=(X'X)^{-1}X'Y$

Perform a simple transformation of the likelihood:

```{=tex}
\begin{align*}
L\left(\mathbf{A},\mathbf{\Sigma}|Y,X\right)&\propto\text{det}(\mathbf{\Sigma})^{-\frac{T}{2}}\exp\left\{-\frac{1}{2}\text{tr}\left[\mathbf{\Sigma}^{-1}(Y-X\mathbf{A})'(Y-X\mathbf{A})\right]\right\}\\
&=\text{det}(\mathbf{\Sigma})^{-\frac{T}{2}}\\
&\quad\times\exp\left\{ -\frac{1}{2}\text{tr}\left[\mathbf{\Sigma}^{-1}(\mathbf{A}-\widehat{A})'X'X(\mathbf{A}-\widehat{A}) \right] \right\}\\
&\quad\times \exp\left\{ -\frac{1}{2}\text{tr}\left[ \mathbf{\Sigma}^{-1}(Y-X\widehat{A})'(Y-X\widehat{A}) \right] \right\}
\end{align*}
```
Under the likelihood, $(\mathbf{A},\mathbf{\Sigma})$ are *normal-inverse Wishart* distributed:

```{=tex}
\begin{align*}
L\left( \mathbf{A},\mathbf{\Sigma}|Y,X \right) &= NIW_{K\times N}\left(\widehat{A},(X'X)^{-1},(Y-X\widehat{A})'(Y-X\widehat{A}), T-N-K-1 \right)
\end{align*}
```
::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::
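
## Likelihood Function as NIW

### Completing the square.

The transformation of the likelihood kernel relies on the fact that the MLE residuals are orthogonal to the regressors, $X'(Y-X\widehat{A}) = \mathbf{0}_{K\times N}$, which follows from the definition of $\widehat{A}$. A sketch of the step, writing $Y-X\mathbf{A} = (Y-X\widehat{A}) - X(\mathbf{A}-\widehat{A})$:

```{=tex}
\begin{align*}
(Y-X\mathbf{A})'(Y-X\mathbf{A}) &= (Y-X\widehat{A})'(Y-X\widehat{A}) + (\mathbf{A}-\widehat{A})'X'X(\mathbf{A}-\widehat{A})\\
&\quad - \underbrace{(Y-X\widehat{A})'X}_{=\mathbf{0}}(\mathbf{A}-\widehat{A}) - (\mathbf{A}-\widehat{A})'\underbrace{X'(Y-X\widehat{A})}_{=\mathbf{0}}
\end{align*}
```
Substituting the remaining two terms under the trace and splitting the exponential gives the two exponential factors of the transformed likelihood.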

## Prior Distribution

### Construction.

A natural-conjugate prior leads to a joint posterior distribution for $(\mathbf{A},\mathbf{\Sigma})$ of the same form \begin{align*} 
p\left( \mathbf{A}, \mathbf{\Sigma} \right) &= p\left( \mathbf{A}| \mathbf{\Sigma} \right)p\left( \mathbf{\Sigma} \right)\\
\mathbf{A}|\mathbf{\Sigma} &\sim MN_{K\times N}\left( \underline{A},\mathbf{\Sigma},\underline{V} \right)\\
\mathbf{\Sigma} &\sim IW_N\left( \underline{S}, \underline{\nu} \right)
\end{align*}

### Kernel.

```{=tex}
\begin{align*} 
p\left( \mathbf{A},\mathbf{\Sigma} \right) 
&\propto \text{det}(\mathbf{\Sigma})^{-\frac{N+K+\underline{\nu}+1}{2}}\\
&\quad\times\exp\left\{ -\frac{1}{2}\text{tr}\left[ \mathbf{\Sigma}^{-1}(\mathbf{A}-\underline{A})'\underline{V}^{-1}(\mathbf{A}-\underline{A}) \right] \right\}\\
&\quad\times \exp\left\{ -\frac{1}{2}\text{tr}\left[ \mathbf{\Sigma}^{-1}\underline{S} \right] \right\}
\end{align*}
```
::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

## Posterior Distribution

### Bayes Rule.

```{=tex}
\begin{align*} 
p\left( \mathbf{A}, \mathbf{\Sigma}|Y,X \right) &\propto L(\mathbf{A},\mathbf{\Sigma}|Y,X)p\left( \mathbf{A}, \mathbf{\Sigma} \right)\\
&= L(\mathbf{A},\mathbf{\Sigma}|Y,X)p\left( \mathbf{A}| \mathbf{\Sigma} \right)p\left( \mathbf{\Sigma} \right)
\end{align*}
```
### Kernel.

```{=tex}
\begin{align*} 
p\left( \mathbf{A},\mathbf{\Sigma} |Y,X\right) 
&\propto  \text{det}(\mathbf{\Sigma})^{-\frac{T}{2}}\exp\left\{-\frac{1}{2}\text{tr}\left[\mathbf{\Sigma}^{-1}(Y-X\mathbf{A})'(Y-X\mathbf{A})\right]\right\}\\
& \quad\times\text{det}(\mathbf{\Sigma})^{-\frac{N+K+\underline{\nu}+1}{2}}\exp\left\{ -\frac{1}{2}\text{tr}\left[ \mathbf{\Sigma}^{-1}\underline{S} \right] \right\}\\
&\quad\times\exp\left\{ -\frac{1}{2}\text{tr}\left[ \mathbf{\Sigma}^{-1}(\mathbf{A}-\underline{A})'\underline{V}^{-1}(\mathbf{A}-\underline{A}) \right] \right\}
\end{align*}
```
::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

## Joint Posterior Distribution

### Conditional and marginal.

```{=tex}
\begin{align*} 
p\left( \mathbf{A}, \mathbf{\Sigma}|Y,X \right) &= p(\mathbf{A}|Y,X,\mathbf{\Sigma})p\left( \mathbf{\Sigma}|Y,X \right)\\[2ex]
p(\mathbf{A}|Y,X,\mathbf{\Sigma}) &= MN_{K\times N}\left( \overline{A},\mathbf{\Sigma},\overline{V} \right)\\
p(\mathbf{\Sigma}|Y,X) &= IW_N\left( \overline{S}, \overline{\nu} \right)
\end{align*}
```
### Posterior parameters.

```{=tex}
\begin{align*}
\overline{V}&= \left( X'X + \underline{V}^{-1}\right)^{-1} \\
\overline{A}&= \overline{V}\left( X'Y + \underline{V}^{-1}\underline{A} \right)\\
\overline{\nu}&= T+\underline{\nu}\\
\overline{S}&= \underline{S}+Y'Y + \underline{A}'\underline{V}^{-1}\underline{A} - \overline{A}'\overline{V}^{-1}\overline{A}
\end{align*}
```
::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::
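
## Joint Posterior Distribution

The posterior parameters translate directly into matrix operations. The chunk below is a minimal sketch with a hypothetical helper `niw_posterior_sketch()`; the data and the prior values are simulated and chosen for illustration only.

```{r}
#| label: sketch-posterior-parameters

# Hypothetical helper: posterior parameters of the NIW posterior
niw_posterior_sketch = function(Y, X, A_prior, V_prior, S_prior, nu_prior) {
  V_prior_inv = solve(V_prior)
  V_bar_inv   = crossprod(X) + V_prior_inv            # X'X + V^{-1}
  V_bar       = solve(V_bar_inv)
  A_bar       = V_bar %*% (crossprod(X, Y) + V_prior_inv %*% A_prior)
  nu_bar      = nrow(Y) + nu_prior                    # T + nu
  S_bar       = S_prior + crossprod(Y) +
    t(A_prior) %*% V_prior_inv %*% A_prior -
    t(A_bar) %*% V_bar_inv %*% A_bar
  list(A = A_bar, V = V_bar, S = S_bar, nu = nu_bar)
}

# simulated example with T = 50, N = 2, K = 3 (illustrative values only)
set.seed(4)
X_toy    = cbind(matrix(rnorm(100), 50, 2), 1)
A_true   = rbind(0.5 * diag(2), c(0.1, -0.1))
Y_toy    = X_toy %*% A_true + matrix(rnorm(100, sd = 0.1), 50, 2)

A_prior  = matrix(0, 3, 2)
V_prior  = 10 * diag(3)
toy_post = niw_posterior_sketch(Y_toy, X_toy, A_prior, V_prior, diag(2), 4)
round(toy_post$A, 2)
```

With a diffuse $\underline{V}$, as here, the posterior mean $\overline{A}$ should lie close to the MLE.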


## Joint Posterior Distribution

### Task.

- Derive the full conditional posterior distribution of $\mathbf{A}$ given $\mathbf{\Sigma}$ and data $Y$ and $X$, denoted by $p(\mathbf{A}|Y,X,\mathbf{\Sigma})$

$$ $$

Hint: Proceed by:

- writing out the kernel of this distribution as a product of the likelihood and the prior,
- collecting all the terms within the exponential function and under the trace operator,
- completing the squares,
- applying the conditioning whenever convenient.

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::


## Posterior Mean of $\mathbf{A}$

Posterior mean of matrix $\mathbf{A}$ is: \begin{align*}
\overline{A} &= \overline{V}\left( X'Y + \underline{V}^{-1}\underline{A} \right)\\[2ex] 
&= \overline{V}\left( X'X\widehat{A} + \underline{V}^{-1}\underline{A} \right)\\[2ex]
&= \overline{V} X'X\widehat{A} + \overline{V}\underline{V}^{-1}\underline{A} 
\end{align*} <font color="#F500BD">a linear combination</font> of the MLE $\widehat{A}$ and the prior mean $\underline{A}$

Note that: $$
\overline{V} X'X + \overline{V}\underline{V}^{-1} = \overline{V} ( X'X + \underline{V}^{-1}) = I_K
$$

[Play with the posterior in an interactive graph](https://rpsychologist.com/d3/bayes/){preview-link="true"}

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

## Marginal Data Density

According to *Bayes Rule*, the kernel of the posterior is normalised by the *Marginal Data Density* $p(data)$:

$$
p\left( \mathbf{A}, \mathbf{\Sigma}| data \right) = \frac{L(\mathbf{A},\mathbf{\Sigma}| data)p\left( \mathbf{A}, \mathbf{\Sigma} \right)}{p(data)}
$$

For *Bayesian VARs* the posterior is known $$
p\left( \mathbf{A}, \mathbf{\Sigma}| data \right) = MNIW\left(\overline{A},\overline{V}, \overline{S}, \overline{\nu} \right)
$$

and so is the analytical formula for the *MDD*: $$p(data)$$

This can be used to our advantage!

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

## Minnesota and Dummy Observations Prior {background-color="#F500BD"}

## Minnesota Prior

[Doan, Litterman, Sims (1984)](https://doi.org/10.1080/07474938408800053) proposed an interpretable way of setting the hyper-parameters of the NIW prior $\underline{A}$, $\underline{V}$, $\underline{S}$, and $\underline{\nu}$ for macroeconomic data.

. . .

$$    $$ The prior reflects the following stylised facts about macro time series:

-   the data are unit-root non-stationary
-   the effect of more lagged variables should be smaller and smaller
-   the effect of other variables' lags should be smaller than that of own lags

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

## Minnesota Prior

### Inverse-Wishart prior.

```{=tex}
\begin{align*}
\mathbf{\Sigma} &\sim IW_N\left( \underline{S}, \underline{\nu} \right)
\end{align*}
```
Set

```{=tex}
\begin{align*}
\underline{S} &= \begin{bmatrix}
\psi_1 &0 &\dots & 0 \\
0 & \psi_2 &\dots & 0\\
\vdots &\vdots&\ddots& \vdots\\
0&0&\dots&\psi_N
\end{bmatrix}\\[2ex]
\underline{\nu}&= N+2 
\end{align*}
```
### Hyper-parameters.

$\psi =(\psi_1, \dots, \psi_N)$ have to be chosen (or estimated)

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

## Minnesota Prior

### Matrix-Variate Normal prior.

```{=tex}
\begin{align*}
\mathbf{A}|\mathbf{\Sigma} &\sim MN_{K\times N}\left( \underline{A},\mathbf{\Sigma},\underline{V} \right)
\end{align*}
```
For $\quad l = 1+\text{floor}((i-1)/N) \quad\text{and }\quad k = i - (l-1)N$, set:
\begin{align*}
\underline{A} &= \begin{bmatrix} I_N \\ \mathbf{0}_{((p-1)N +1)\times N}\end{bmatrix}&
\underline{V}_{ij} &= \left\{\begin{array}{ll} \lambda^2 / (\psi_k l^2) &\text{ for }i=j,\text{ and } i\neq pN+1 \\
\lambda^2 &\text{ for }i=j,\text{ and } i= pN+1 \\
0&\text{ for } i\neq j
\end{array}\right.
\end{align*}

### Hyper-parameters.

$\lambda^2$ has to be chosen (or estimated)

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::
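
## Minnesota Prior

The index mapping from $i$ to the lag $l$ and the variable $k$ is easiest to see in code. The chunk below is a sketch only: `minnesota_prior_sketch()` is a hypothetical helper, not the **bsvarSIGNs** implementation, and the values of $\lambda$ and $\psi$ are arbitrary.

```{r}
#| label: sketch-minnesota-prior

# Hypothetical helper: Minnesota prior parameters for given lambda and psi
minnesota_prior_sketch = function(N, p, lambda, psi) {
  K = p * N + 1
  # prior mean: identity for the first lag, zeros for further lags and constant
  A_prior = rbind(diag(N), matrix(0, K - N, N))
  i = 1:(p * N)
  l = 1 + floor((i - 1) / N)       # lag to which the i-th row of A belongs
  k = i - (l - 1) * N              # variable to which the i-th row belongs
  V_diag = c(lambda^2 / (psi[k] * l^2), lambda^2)   # last element: constant
  list(A = A_prior, V = diag(V_diag), S = diag(psi, N), nu = N + 2)
}

# illustrative values: N = 3 variables, p = 2 lags
mn = minnesota_prior_sketch(N = 3, p = 2, lambda = 0.2, psi = c(1, 2, 4))
diag(mn$V)
```

The diagonal of $\underline{V}$ shrinks with the lag order through $l^2$, which encodes the belief that more distant lags matter less.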


## Minnesota Prior

### Task.

Consider a simple case of a model for $N=2$ variables and with $p=1$ lag.

- write out the $3\times2$ matrix $\underline{A}$ and the $3\times3$ matrix $\underline{V}$ for the Minnesota prior, specifying each of their elements.

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::


## Dummy Observations Prior

### Idea.

1.  Generate artificial data matrices with $T_d$ rows $Y^*$ and $X^*$
2.  Append them to the original data matrices $Y$ and $X$ respectively.

::: fragment
### Implied prior distribution.

Use Bayes Rule to derive the joint prior of $(\mathbf{A},\mathbf\Sigma)$ given $Y^*$ and $X^*$.

It is given by the MNIW distribution:

\begin{align*} 
p\left( \mathbf{A}, \mathbf{\Sigma}|Y^*,X^* \right) &= MNIW_{K\times N}\left( \underline{A}^*,\underline{V}^*, \underline{S}^*, \underline{\nu}^* \right)
\end{align*} \begin{align*} 
\underline{V}^*&= \left( X^{*\prime}X^* + \underline{V}^{-1}\right)^{-1} & \underline{A}^*&= \underline{V}^*\left( X^{*\prime}Y^* + \underline{V}^{-1}\underline{A} \right)\\
\underline{\nu}^*&= T_d+\underline{\nu} & \underline{S}^*&= \underline{S}+Y^{*\prime}Y^* + \underline{A}'\underline{V}^{-1}\underline{A} - \underline{A}^{*\prime}\underline{V}^{*-1}\underline{A}^*
\end{align*}
:::

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

## Dummy Observations Prior

Let a $p\times N$ matrix $Y_0$ denote the initial observations, that is, the first $p$ observations of the available time series.

Let an $N$-vector $\bar{Y}_0$ denote its columns' means.

::: fragment
### Sum-of-coefficients prior.

Generate additional $N$ rows by $$
Y^+ = \text{diag}\left(\frac{\bar{Y}_0}{\mu}\right) \quad\text{ and }\quad X^+ = \begin{bmatrix}Y^+ & \dots & Y^+ & \mathbf{0}_N \end{bmatrix}
$$

-   $\mu$ is a hyper-parameter to be chosen (or estimated)
-   if $\mu \rightarrow 0$ the prior implies the presence of a unit root in each equation and rules out cointegration
-   if $\mu \rightarrow\infty$ the prior becomes uninformative
:::

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

## Dummy Observations Prior

### Dummy-initial-observation prior.

Generate an additional row by $$
Y^{++} = \frac{\bar{Y}_0'}{\delta} \quad\text{ and }\quad X^{++} = \begin{bmatrix} Y^{++} & \dots & Y^{++} & \frac{1}{\delta} \end{bmatrix}
$$

-   hyper-parameter $\delta$ is to be chosen (or estimated)
-   if $\delta \rightarrow 0$ all the variables of the VAR are forced to be at their unconditional mean, or the system is characterized by the presence of an unspecified number of unit roots without drift (cointegration)
-   if $\delta \rightarrow\infty$ the prior becomes uninformative

::: fragment
### Combining dummy observations.

$$
Y^* = \begin{bmatrix}Y^+ \\ Y^{++} \end{bmatrix}\quad\text{ and }\quad
X^* = \begin{bmatrix}X^+ \\ X^{++} \end{bmatrix}
$$
:::

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::
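
## Dummy Observations Prior

The construction of $Y^*$ and $X^*$ can be sketched as follows. The helper `dummy_observations_sketch()` is hypothetical and the input values are arbitrary; they are chosen only to show the shapes of the matrices.

```{r}
#| label: sketch-dummy-observations

# Hypothetical helper: stack sum-of-coefficients and
# dummy-initial-observation rows
dummy_observations_sketch = function(Y0_bar, p, mu, delta) {
  N     = length(Y0_bar)
  # sum-of-coefficients: N rows, no constant term
  Y_soc = diag(Y0_bar / mu, N)
  X_soc = cbind(do.call(cbind, rep(list(Y_soc), p)), 0)
  # dummy-initial-observation: one row, constant scaled by 1 / delta
  Y_dio = matrix(Y0_bar / delta, 1, N)
  X_dio = cbind(do.call(cbind, rep(list(Y_dio), p)), 1 / delta)
  list(Y = rbind(Y_soc, Y_dio), X = rbind(X_soc, X_dio))
}

# illustrative values: N = 3, p = 2
dummy = dummy_observations_sketch(Y0_bar = c(2, 4, 6), p = 2, mu = 2, delta = 4)
dim(dummy$Y)   # (N + 1) x N
dim(dummy$X)   # (N + 1) x (pN + 1)
```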


## Dummy Observations Prior

### Task.

Suppose that:

- $\bar{Y}_0 = \begin{bmatrix}1&2\end{bmatrix}'$
- $\mu = 0.5$
- $\delta = 3$
- $p = 1$

Write out the matrices $Y^*$ and $X^*$ of dimensions $3\times 2$ and $3\times 3$ respectively.


::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::


## Bayesian Estimation for Hierarchical Prior {background-color="#F500BD"}

## Bayesian Estimation for Hierarchical Prior

Hyper-parameters $\psi$, $\lambda$, $\mu$ and $\delta$ can be fixed to values chosen by the econometrician.

### Hierarchical prior.

A better idea is to assume priors for these hyper-parameters and estimate them as in [Giannone, Lenza, Primiceri (2015)](https://doi.org/10.1162/REST_a_00483).

Extend the existing prior to: \begin{align*} 
p\left( \mathbf{A}, \mathbf{\Sigma}|Y^*,X^*,\psi,\lambda,\mu,\delta \right) &= MNIW_{K\times N}\left( \underline{A}^*,\underline{V}^*, \underline{S}^*, \underline{\nu}^* \right)
\end{align*} And specify: \begin{align*} 
\psi_n &\sim IG\left(0.02^2, 0.02^2\right)\\
\lambda &\sim G\left(0.2,2\right)\\
\mu &\sim G\left(1,2\right)\\
\delta &\sim G\left(1,2\right)
\end{align*}

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

## Bayesian Estimation for Hierarchical Prior

[Giannone, Lenza, Primiceri (2015)](https://doi.org/10.1162/REST_a_00483) propose the following estimation procedure:

#### Step 1: Estimate $(\psi,\lambda,\mu,\delta)$ using a random-walk Metropolis-Hastings sampler

-   sample these hyper-parameters marginally, that is, with $(\mathbf{A},\mathbf\Sigma)$ integrated out
-   extend the conditioning of the *Marginal Data Density*: $$ p(data|\psi,\lambda,\mu,\delta)$$
-   apply Bayes Rule to obtain the kernel of the posterior: $$ p(\psi,\lambda,\mu,\delta|data) \propto p(\psi,\lambda,\mu,\delta)p(data|\psi,\lambda,\mu,\delta)$$
-   use an $(N+3)$-variate Student-t distribution as the candidate generating density

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::
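
## Bayesian Estimation for Hierarchical Prior

The mechanics of a random-walk Metropolis-Hastings sampler with Student-t increments can be sketched generically. The chunk below is not the algorithm implemented in **bsvarSIGNs**: `rw_mh_sketch()` is a hypothetical helper, the tuning constants `scale` and `df` are arbitrary, and the target is a toy kernel (independent gamma densities) standing in for the product of the hyper-parameter prior and the marginal data density.

```{r}
#| label: sketch-rw-mh

# Hypothetical helper: random-walk Metropolis-Hastings with Student-t steps
rw_mh_sketch = function(log_kernel, init, n_draws, scale = 0.5, df = 4) {
  d       = length(init)
  draws   = matrix(NA_real_, n_draws, d)
  current = init
  current_lk = log_kernel(current)
  for (s in 1:n_draws) {
    # symmetric multivariate Student-t increment: normal / sqrt(chi2 / df)
    step      = scale * rnorm(d) * sqrt(df / rchisq(1, df))
    candidate = current + step
    cand_lk   = log_kernel(candidate)
    # accept with probability min(1, kernel ratio); proposal is symmetric
    if (log(runif(1)) < cand_lk - current_lk) {
      current    = candidate
      current_lk = cand_lk
    }
    draws[s, ] = current
  }
  draws
}

# toy target (an assumption of this sketch): independent Gamma(3, rate = 2)
toy_log_kernel = function(h) {
  if (any(h <= 0)) return(-Inf)    # hyper-parameters must be positive
  sum(dgamma(h, shape = 3, rate = 2, log = TRUE))
}

set.seed(5)
mh_draws = rw_mh_sketch(toy_log_kernel, init = c(1, 1), n_draws = 20000)
colMeans(mh_draws)
```

In the actual procedure the log-kernel would be the sum of the log prior of $(\psi,\lambda,\mu,\delta)$ and the log marginal data density.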

## Bayesian Estimation for Hierarchical Prior

#### Step 2: For each draw of $(\psi,\lambda,\mu,\delta)$ sample the corresponding draw of $(\mathbf{A},\mathbf{\Sigma})$

Use the MNIW posterior derived for the implied prior:

```{=tex}
\begin{align*} 
p\left( \mathbf{A}, \mathbf{\Sigma}|Y,X, Y^*,X^* \right) &= MNIW_{K\times N}\left( \overline{A}^*,\overline{V}^*,\overline{S}^*, \overline{\nu}^* \right)\\[2ex]
\overline{V}^*&= \left( X'X + \underline{V}^{*-1}\right)^{-1} \\
\overline{A}^*&= \overline{V}^*\left( X'Y + \underline{V}^{*-1}\underline{A}^* \right)\\
\overline{\nu}^*&= T+\underline{\nu}^*\\
\overline{S}^*&= \underline{S}^*+Y'Y + \underline{A}^{*\prime}\underline{V}^{*-1}\underline{A}^* - \overline{A}^{*\prime}\overline{V}^{*-1}\overline{A}^*
\end{align*}
```
### R implementation.

Package [**bsvarSIGNs**](https://cran.r-project.org/package=bsvarSIGNs) by [Wang & Woźniak (2024)](https://doi.org/10.32614/CRAN.package.bsvarSIGNs) implements this algorithm.

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::


## Bayesian Forecasting using VARs {background-color="#F500BD"}


## The objective of economic forecasting

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

$\left.\right.$

... is to use the available data to provide a statistical characterisation of the unknown future values of quantities of interest.

$\left.\right.$

The full statistical characterisation of the unknown future values of random variables is given by their *predictive density*.

$\left.\right.$

Simplified outcomes in the form of statistics summarising the predictive densities are usually used in decision-making processes.

$\left.\right.$

Summary statistics are also communicated to general audiences.


## One-Period Ahead Predictive Density

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

### VAR(p) model.
\begin{align*}
y_t &= \mathbf{A}_1 y_{t-1} + \dots + \mathbf{A}_p y_{t-p} + \boldsymbol\mu_0 + \epsilon_t\\[2ex]
\epsilon_t|Y_{t-1} &\sim iidN_N\left(\mathbf{0}_N,\mathbf\Sigma\right)\\
&
\end{align*}

### One-Period Ahead Conditional Predictive Density

... is implied by the model formulation:
\begin{align*}
y_{t+h}|Y_{t+h-1},\mathbf{A},\mathbf\Sigma &\sim N_N\left(\mathbf{A}_1 y_{t+h-1} + \dots + \mathbf{A}_p y_{t+h-p} + \boldsymbol\mu_0,\mathbf\Sigma\right)
\end{align*}


## One-Period Ahead Predictive Density

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

$\left.\right.$

Bayesian forecasting accounts for parameter estimation uncertainty by integrating the parameters out of the predictive density.


\begin{align*}
&\\
p(y_{T+1}|Y,X) &= \int p(y_{T+1}|Y_{T},\mathbf{A},\mathbf\Sigma) p(\mathbf{A},\mathbf\Sigma|Y,X) d(\mathbf{A},\mathbf\Sigma)\\ &
\end{align*}

- $p(y_{T+1}|Y,X)$ - predictive density
- $p(y_{T+1}|Y_{T},\mathbf{A},\mathbf\Sigma)$ - one-period-ahead conditional predictive density
- $p(\mathbf{A},\mathbf\Sigma|Y,X)$ - joint posterior distribution


## Sampling from One-Period Ahead Predictive Density

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

$\left.\right.$

#### Step 1: Sample from the posterior 

... and obtain $S$ draws $\left\{ \mathbf{A}^{(s)},\mathbf\Sigma^{(s)} \right\}_{s=1}^{S}$

$\left.\right.$

#### Step 2: Sample from the predictive density

In order to obtain draws from $p(y_{T+1}|Y,X)$, for each of the $S$ draws of $(\mathbf{A},\mathbf\Sigma)$ sample the corresponding draw of $y_{T+1}$:

Sample $y_{T+1}^{(s)}$ from 
$$ 
N_N\left(\mathbf{A}_1^{(s)} y_{T} + \dots + \mathbf{A}_p^{(s)} y_{T-p+1} + \boldsymbol\mu_0^{(s)},\mathbf\Sigma^{(s)}\right)
$$
and obtain $\left\{y_{T+1}^{(s)}\right\}_{s=1}^{S}$


## $h$-Period Ahead Predictive Density

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

$\left.\right.$

This procedure can be generalised to any forecasting horizon. 

This is an illustration for $h=2$.

\begin{align*}
&\\
p(y_{T+2},y_{T+1}|Y,X) 
&= \int p(y_{T+2},y_{T+1}|Y_{T},\mathbf{A},\mathbf\Sigma) p(\mathbf{A},\mathbf\Sigma|Y,X) d(\mathbf{A},\mathbf\Sigma)\\[1ex]
&= \int p(y_{T+2}|y_{T+1},Y_{T},\mathbf{A},\mathbf\Sigma)p(y_{T+1}|Y_{T},\mathbf{A},\mathbf\Sigma) p(\mathbf{A},\mathbf\Sigma|Y,X) d(\mathbf{A},\mathbf\Sigma)\\ &
\end{align*}


## $h$-Period Ahead Predictive Density

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

$\left.\right.$

#### Step 1: Sample from the posterior 

... and obtain $S$ draws $\left\{ \mathbf{A}^{(s)},\mathbf\Sigma^{(s)} \right\}_{s=1}^{S}$

#### Step 2: Sample from 1-period ahead predictive density

For each of the $S$ draws, sample $y_{T+1}^{(s)}$ from 
$$ 
N_N\left(\mathbf{A}_1^{(s)} y_{T} + \dots + \mathbf{A}_p^{(s)} y_{T-p+1} + \boldsymbol\mu_0^{(s)},\mathbf\Sigma^{(s)}\right)
$$

#### Step 3: Sample from 2-period ahead predictive density

For each of the $S$ draws, sample $y_{T+2}^{(s)}$ from 
$$ 
N_N\left(\mathbf{A}_1^{(s)} y_{T+1}^{(s)} + \mathbf{A}_2^{(s)} y_{T} + \dots + \mathbf{A}_p^{(s)} y_{T-p+2} + \boldsymbol\mu_0^{(s)},\mathbf\Sigma^{(s)}\right)
$$

and obtain $\left\{y_{T+2}^{(s)},y_{T+1}^{(s)}\right\}_{s=1}^{S}$
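
## $h$-Period Ahead Predictive Density

The recursive sampling steps can be sketched as a double loop over posterior draws and horizons. The helper `sample_forecasts_sketch()` is hypothetical and does not reproduce the `forecast()` function of **bsvarSIGNs**. It assumes that the draws of $\mathbf{A}$ are stored as a $K\times N\times S$ array with rows ordered as in the matrix notation, $(\mathbf{A}_1', \dots, \mathbf{A}_p', \boldsymbol\mu_0')$. The posterior draws below are placeholders, not estimates.

```{r}
#| label: sketch-forecast-sampling

# Hypothetical helper: draws from the h-period ahead predictive density
# A: K x N x S array, Sigma: N x N x S array, y: T x N data matrix
sample_forecasts_sketch = function(A, Sigma, y, p, h) {
  N       = ncol(y)
  n_draws = dim(A)[3]
  out     = array(NA_real_, c(h, N, n_draws))
  for (s in 1:n_draws) {
    A_s    = matrix(A[, , s], ncol = N)
    chol_s = t(chol(matrix(Sigma[, , s], N, N)))     # lower-triangular factor
    path   = y[(nrow(y) - p + 1):nrow(y), , drop = FALSE]  # last p observations
    for (j in 1:h) {
      n_path = nrow(path)
      recent = path[n_path:(n_path - p + 1), , drop = FALSE]  # newest first
      x      = c(t(recent), 1)                       # (y_{t-1}', ..., y_{t-p}', 1)
      draw   = as.vector(x %*% A_s) + as.vector(chol_s %*% rnorm(N))
      out[j, , s] = draw
      path   = rbind(path, draw)                     # forecast feeds next step
    }
  }
  out
}

# placeholder inputs: N = 2, p = 2, three identical "posterior draws"
set.seed(6)
A_one     = rbind(0.5 * diag(2), 0.2 * diag(2), c(0, 0))
A_draws   = array(A_one, c(5, 2, 3))
Sig_draws = array(0.01 * diag(2), c(2, 2, 3))
y_last    = rbind(c(1, 1), c(2, 4))
fc_toy    = sample_forecasts_sketch(A_draws, Sig_draws, y_last, p = 2, h = 2)
dim(fc_toy)   # horizon x variable x draw
```

Because each simulated $y_{T+1}^{(s)}$ enters the mean of $y_{T+2}^{(s)}$, the draws for different horizons come from the joint predictive density.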


## US Data Analysis Using R Package bsvarSIGNs {background-color="#F500BD"}

## Data 

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

```{r}
#| label: us-data
#| warning: false
#| fig-align: "center"
#| fig-height: 11
#| output-location: column
#| cache: true

# load the package
library(bsvarSIGNs)

# load the data
data(optimism)

# plot data
plot.ts(
  optimism[,c(1,2,5)], 
  main = "",
  col = "#F500BD",
  lwd = 4,
  cex.axis = 2,
  cex.lab = 2
)
```


## Setup and hyper-parameters estimation

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

```{r}
#| label: priors
#| warning: false
#| message: false
#| cache: true

# set seed
set.seed(123)

# MCMC settings
burn = 1000
S    = 10000

# specify the model
spec = specify_bsvarSIGN$new(
  optimism[,c(1,2,5)],
  p        = 4
)

# estimate all hyper-parameters
spec$prior$estimate_hyper(
  S = S + burn, 
  burn_in = burn, 
  mu = TRUE, 
  delta = TRUE, 
  lambda = TRUE, 
  psi = TRUE
)
```


## Setup and hyper-parameters estimation

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

```{r}
#| label: prior_plot
#| warning: false
#| message: false
#| cache: true
#| fig-align: "center"
#| fig-height: 11
#| output-location: column

# assign hyper-parameter names to estimation output
rownames(spec$prior$hyper) = 
  c("mu", 
    "delta", 
    "lambda", 
    paste0("psi", 1:3)
  )

# trace plot
plot.ts(
  t(spec$prior$hyper), 
  main = "Hyper-parameter trace plot",
  col = "#F500BD", 
  bty = "n",
  nc = 1,
  cex.axis = 2,
  cex.lab = 2,
  cex.main = 2
)
```


## Estimation, Forecasting and Visualisation

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

```{r}
#| label: estimation
#| cache: true
#| fig-align: "center"
#| fig-height: 11
#| output-location: column

# estimate the model
post = estimate(
  spec, 
  S = S,
  show_progress = FALSE
)

# forecast
fore = forecast(post, horizon = 8)

# plot the forecasts
plot(
  fore, 
  data_in_plot = 0.5, 
  probability = 0.68, 
  col = "#F500BD"
)

```


## Forecasting

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

```{r}
#| label: 3D-gdp
#| warning: false
#| fig-align: "center"
#| output-location: slide
#| fig-height: 7
#| cache: true

forecasts_prod = fore$forecasts[1,,]
h           = dim(forecasts_prod)[1]


limits.1    = range(forecasts_prod)

point.f     = apply(forecasts_prod,1,mean)
interval.f  = apply(forecasts_prod,1,HDInterval::hdi,credMass=0.90)

x           = seq(from=limits.1[1], to=limits.1[2], length.out=100)
z           = matrix(NA,h,99)
for (i in 1:h){
  z[i,]     = hist(forecasts_prod[i,], breaks=x, plot=FALSE)$density
}
x           = hist(forecasts_prod[i,], breaks=x, plot=FALSE)$mids
yy          = 1:h
z           = t(z)

library(plot3D)
theta = 150
phi   = 15.5
f4    = persp3D(x=x, y=yy, z=z, phi=phi, theta=theta, xlab="\nproductivity[t+h|t]", ylab="h", zlab="\npredictive densities of productivity", shade=NA, border=NA, ticktype="detailed", nticks=3,cex.lab=1, col=NA,plot=FALSE)

perspbox (x=x, y=yy, z=z, bty="f", col.axis="black", phi=phi, theta=theta, xlab="\nproductivity[t+h|t]", ylab="h", zlab="\npredictive densities of productivity", ticktype="detailed", nticks=3,cex.lab=1, col = NULL, plot = TRUE)

polygon3D(x=c(interval.f[1,],interval.f[2,h:1]), y=c(1:h,h:1), z=rep(0,2*h), col = "#F500BD", NAcol = "white", border = NA, add = TRUE, plot = TRUE)

for (i in 1:h){
  f4.l = trans3d(x=x, y=yy[i], z=z[,i], pmat=f4)
  lines(f4.l, lwd=0.9, col="black")
}
f4.l1 = trans3d(x=point.f, y=yy, z=0, pmat=f4)
lines(f4.l1, lwd=2, col="black")
```


## MCMC convergence for $\mathbf\Sigma_{1\cdot}$

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

```{r}
#| label: mcmc-si
#| warning: false
#| fig-align: "center"
#| out.width: "100%"
#| cache: true

plot.ts(t(post$posterior$Sigma[1,,]), main = "", col = "#F500BD", xlab = "s", cex.lab = 1.3, cex.axis = 1.3, bty = "n")
```

## MCMC convergence for $\boldsymbol\mu_0$

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

```{r}
#| label: mcmc-mu
#| warning: false
#| fig-align: "center"
#| out.width: "100%"
#| cache: true

plot.ts(t(post$posterior$A[,13,]), main = "", col = "#F500BD", xlab = "s", cex.lab = 1.3, cex.axis = 1.3, bty = "n")
```

## Posterior means for $\mathbf{A}$

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

```{r}
#| label: mean-a
#| warning: false
#| out.width: "100%"
#| cache: true

mean_A  = apply(post$posterior$A, 1:2, mean)
rownames(mean_A) = colnames(optimism[,c(1,2,5)])
colnames(mean_A) = c(paste0("A",1:4 %x% rep(1,3)),"mu0")
knitr::kable(mean_A, caption = "Posterior estimates for autoregressive parameters", digits = 2)
```


## Posterior means for $\mathbf\Sigma$

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

```{r}
#| label: mean-s
#| warning: false
#| out.width: "100%"
#| cache: true

mean_S  = t(apply(post$posterior$Sigma, 1:2, mean))
mean_S  = cbind(mean_S, cov2cor(mean_S))
rownames(mean_S) = colnames(optimism[,c(1,2,5)])
colnames(mean_S) = c(rep("cov",3),rep("cor",3))
knitr::kable(mean_S, caption = "Posterior estimates for covariance", digits = 5)
```

## Posterior means for hyper-parameters

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

```{r}
#| label: mean-h
#| warning: false
#| out.width: "100%"
#| cache: true

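# hyper-parameter draws are stored in spec$prior$hyper (rows: hyper-parameters, columns: draws)
mean_h  = rbind(apply(spec$prior$hyper, 1, mean), apply(spec$prior$hyper, 1, sd))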
rownames(mean_h) = c("E[hyper|data]", "sd[hyper|data]")
knitr::kable(mean_h, caption = "Posterior estimates for hyper-parameters", digits = 3)
```


## Australian Data Forecasting

::: footer
[Bayesian VARs](https://bsvars.github.io/2024-10-be24-bsvarSIGNs)
:::

$$ $$

[DOWNLOAD THE SCRIPT](https://github.com/bsvars/2024-10-be24-bsvarSIGNs/blob/main/be23-AUdata.qmd)