Introduction
Let \(\mathbf{y}_t =
(y_{t1},y_{t2},\ldots,y_{tK})^\top\) denote a \(K\times 1\) vector of endogenous variables.
A vector autoregressive (VAR) model of order \(p\) can be expressed as \[
\mathbf{y}_t = \mathbf{A}_1 \mathbf{y}_{t-1} + \cdots + \mathbf{A}_p
\mathbf{y}_{t-p} + \mathbf{C} \mathbf{d}_t + \boldsymbol\epsilon_t,
\qquad\qquad\qquad (1)
\] where \(\mathbf{d}_t\) is
an \(L\times 1\) vector of
deterministic regressors, \(\boldsymbol\epsilon_t\) is a \(K\times 1\) noise vector, and \(\mathbf{A}_1,\ldots,\mathbf{A}_p\) and
\(\mathbf{C}\) are coefficient matrices
(Hamilton 1994; Tsay 2005).
Background
Recently, several R
packages have been developed for parameter estimation and forecasting
using stochastic time series models. The forecast
package provides various methods and tools for univariate time series
models such as the ARIMA and ETS models (Hyndman
et al. 2018). The MTS package has been developed
for a wide variety of multivariate linear time series models and
multivariate volatility models such as the VARMA, multivariate EWMA, and
low-dimensional BEKK models (Tsay and Wood
2018). The vars package provides methods for
multivariate time series analysis using the VAR, SVAR, and SVEC models
(Pfaff and Stigler 2018).
In this study, we focus on shrinkage estimation of VAR model
parameters. Shrinkage estimation methods have been playing crucial roles
in high-dimensional statistical modeling; see, e.g., Beltrachini, von Ellenrieder, and Muravchik
(2013), Böhm and von Sachs (2009),
Fiecas and Ombao (2011) and Ledoit and Wolf (2004). For VAR models, several
shrinkage estimation methods have been suggested such as a Stein-type
nonparametric shrinkage estimation method (Opgen-Rhein and Strimmer 2007b), Bayesian VARs
using informative priors (Bańbura, Giannone, and
Reichlin 2010; Doan, Litterman, and Sims 1984; Koop and Korobilis 2010;
Litterman 1986), Bayesian VARs using noninformative priors (Ni and Sun 2005; Sun and Ni 2004), and a
semiparametric Bayesian approach adopting a modified \(K\)-fold cross validation (Lee, Choi, and Kim 2016).
Due to the popularity of Bayesian VARs in macroeconomic time series analysis, several
Bayesian VAR methods have been implemented in R packages. For example, the
function BVAR() in MTS implements a
Bayesian VAR method using an informative prior (Tsay and Wood 2018); the package
bvarsv implements Bayesian VAR models with stochastic
volatility and time-varying parameters (Krueger
2015; Koop and Korobilis 2010; Primiceri 2005).
On the other hand, we note that shrinkage methods
implemented for purposes other than multivariate time
series analysis can be applied to shrinkage estimation of VAR
parameters. For instance, the function cov.shrink() in the
package corpcor computes
shrinkage estimates of covariance matrices (Schäfer et al. 2017), and it has been applied to
estimating VAR coefficients by Opgen-Rhein and Strimmer
(2007b). Moreover, we note that VAR models
can be reformulated into multivariate regression problems, so that
penalized least squares methods can be used for shrinkage estimation of
VAR parameters, e.g., the functions lm.gls() for
generalized least squares and lm.ridge() for ridge
regression in the package MASS (Ripley et al. 2018); the function
glmnet() for Lasso and Elastic-Net regularized generalized
linear models in the package glmnet (Friedman et al. 2018); the function
linearRidge() for ridge regression in the package
ridge (Moritz and Cule
2018).
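As a quick illustration of this reformulation (a toy sketch, not package code; the data and the lambda value are arbitrary assumptions), a VAR(1) can be fit equation by equation with lm.ridge() from MASS:
# Toy sketch: recast a VAR(1) as a regression with responses y_t and
# regressors y_{t-1}, then run one ridge fit per equation.
library(MASS)
set.seed(1)
Y <- matrix(rnorm(200), ncol = 2)   # T x K matrix of endogenous variables
X <- Y[-nrow(Y), ]                  # regressors: y_{t-1}
Z <- Y[-1, ]                        # responses:  y_t
fits <- lapply(seq_len(ncol(Z)),
               function(k) lm.ridge(Z[, k] ~ X, lambda = 0.1))
t(sapply(fits, coef))               # K x (K + 1) coefficient matrix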
Main Purpose
While Bayesian approaches have been widely used in the literature, we
note that nonparametric and semiparametric approaches have advantages in
the case of high-dimensional VARs with several hundred or more
time series variables due to their relatively low computational costs
(Opgen-Rhein and Strimmer 2007b). Despite
their relatively high computational costs, Bayesian approaches can
impose proper assumptions on the multivariate time series data flexibly,
such as VAR roots near unity and correlations between noise processes
(Lee, Choi, and Kim 2016). In this sense,
a semiparametric approach can be a trade-off between nonparametric and
parametric approaches (Lee, Choi, and Kim
2016).
In this study, we developed an integrative R package,
VARshrink, for implementing nonparametric, parametric,
and semiparametric approaches for shrinkage estimation of VAR model
parameters. By providing a simple interface function,
VARshrink(), for running various types of shrinkage
estimation methods, the performance of the methods can be easily
compared. We note that the package vars implements an
ordinary least squares method for VAR models via the function
VAR(). The function VARshrink() was built to
extend VAR() to shrinkage estimation methods, so that the
VARshrink package takes advantage of the tools and
methods available in the vars package. For example, we
demonstrate the use of model selection criteria such as AIC and BIC in
this paper, where the methods AIC(), BIC(), and
logLik() in VARshrink can handle the multivariate
t-distribution and the effective number of parameters.
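For instance, candidate lag orders of shrinkage VAR fits can be compared as follows (a sketch assuming a T-by-K data matrix Y as constructed in Section 3):
# Sketch: comparing candidate lag orders of shrinkage VAR fits by
# information criteria; the fits use the effective number of parameters.
fit1 <- VARshrink(Y, p = 1, type = "const", method = "ridge")
fit2 <- VARshrink(Y, p = 2, type = "const", method = "ridge")
AIC(fit1, fit2)
BIC(fit1, fit2)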
This paper is a brief version of the original manuscript.
This paper is organized as follows. In Section 2, we explain the formulation of VAR models in a
multivariate regression problem, which simplifies implementation of the
package. In Section 3, we describe the common
interface function and the four shrinkage estimation methods included in
the package. We clearly present closed form expressions for the
shrinkage estimators inferred by the shrinkage methods, so that we can
indicate the role of the shrinkage intensity parameters in each method.
In addition, we explain how the effective number of parameters can be
computed for shrinkage estimators. In Section 4, we briefly present numerical
experiments using benchmark and simulated data to compare the
performance of the shrinkage estimation methods. Discussion and
conclusions are provided in Section 5.
Shrinkage Estimation Methods
In this section, we describe the shrinkage estimation methods
implemented in this package. The methods estimate either the VAR
coefficient matrix \(\mathbf{\Psi}\)
alone or both \(\mathbf{\Psi}\)
and \(\mathbf{\Sigma}\) in Eq. (6).
We provide a common R
function interface VARshrink() for running the estimation
methods, which is defined by
VARshrink(y, p = 1, type = c("const", "trend", "both", "none"),
season = NULL, exogen = NULL, method = c("ridge", "ns", "fbayes",
"sbayes", "kcv"), lambda = NULL, lambda_var = NULL, dof = Inf, ...)
The input arguments are described as follows.
y: A T-by-K matrix of endogenous variables.
p: Integer for the lag order.
type: Type of deterministic regressors to include.
- “const” - the constant: \(\mathbf{d}_t =
1\).
- “trend” - the trend: \(\mathbf{d}_t =
t\).
- “both” - both the constant and the trend: \(\mathbf{d}_t = (t, 1)^\top\).
- “none” - no deterministic regressors.
season: An integer value of frequency for inclusion of
centered seasonal dummy variables.
exogen: A T-by-L matrix of exogenous variables.
method:
- “ridge” - multivariate ridge regression.
- “ns” - a Stein-type nonparametric shrinkage method.
- “fbayes” - a full Bayesian shrinkage method using noninformative
priors.
- “sbayes” - a semiparametric Bayesian shrinkage method using
parameterized cross validation.
- “kcv” - a semiparametric Bayesian shrinkage method using K-fold
cross validation.
lambda, lambda_var: Shrinkage parameter value(s). The use
of these parameters differs slightly across methods: the same value
does not imply the same shrinkage estimates. See the descriptions in the
following subsections for the use of the shrinkage parameters in each
method.
dof: Degrees of freedom of the multivariate t-distribution
for noise. Valid only for method = "fbayes" and
method = "sbayes". dof = Inf means
a multivariate normal distribution.
The output value is an object of class “varshrinkest”, which inherits
the class “varest” of the package vars; an object
of class “varest” is produced by the function VAR() in
the package vars. The input arguments and the output
value indicate that VARshrink() is an extension of
VAR() in the vars package. As a result,
almost all the methods and functions included in the package
vars are available for the package
VARshrink, such as fevd(), Phi(), plot().
The methods and functions available in the VARshrink
package are summarized in Table 1. The names of the functions for class
“varshrinkest” have the suffix “_sh” in order to distinguish them from the
functions for class “varest”. We remark that the classes “varshrinkest”,
“varshirf”, and “varshsum” inherit the classes “varest”, “varirf”, and
“varsum” of the vars package, respectively.
Table 1. Structure of the package VARshrink.
| Function or method | Class | Methods for class | Functions for class |
|---|---|---|---|
| VAR | varshrinkest, varest | coef, fevd, fitted, irf, logLik, Phi, plot, predict, print, Psi, resid, summary | Acoef_sh, arch.test_sh, Bcoef_sh, BQ_sh, causality_sh, normality.test_sh, restrict_sh, roots_sh, serial.test_sh, stability_sh |
| fevd | varfevd | plot, print | |
| irf | varshirf, varirf | plot, print | |
| predict | varprd | plot, print | fanchart |
| summary | varshsum, varsum | print | |
| arch.test_sh | varcheck | plot, print | |
| normality.test_sh | varcheck | plot, print | |
| serial.test_sh | varcheck | plot, print | |
| stability_sh | varstabil | plot, print | |
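For instance, once a model is fitted, the inherited tools apply directly (a sketch assuming a T-by-K data matrix Y as constructed in the examples below):
# Sketch: applying vars tools to a "varshrinkest" object.
est <- VARshrink(Y, p = 1, type = "const", method = "ridge")
predict(est, n.ahead = 5)   # forecasts, via the inherited predict method
fevd(est, n.ahead = 8)      # forecast error variance decomposition
roots_sh(est)               # stability check via the "_sh" variant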
Multivariate Ridge Regression
The ridge regression method is a kind of penalized least squares
(PLS) method, which produces a biased estimate of the VAR coefficients
(Hoerl and Kennard 1970). Formally
speaking, the ridge regression estimator of \(\mathbf{\Psi}\) can be obtained by
minimizing the penalized sum of squared prediction errors (SSPE) as
\[\begin{equation}
\widehat{ \mathbf{\Psi} }^{\text{R}} (\lambda) =
\arg\min_{\mathbf{\Psi}} \;
\frac{1}{N}
\left\|\mathbf{Y} - \mathbf{X} \mathbf{\Psi} \right\|_F^2 +
\lambda \left\| \mathbf{\Psi} \right\|_F^2,
\end{equation}\] where \(\|\mathbf{A}\|_F = \sqrt{ \sum_{i} \sum_{j}
a_{ij}^2 }\) is the Frobenius norm of a matrix \(\mathbf{A}\), \(N
= T-p\), and \(\lambda \geq 0\)
is called the regularization parameter or the shrinkage parameter. The
ridge regression estimator \(\widehat{
\mathbf{\Psi} }^\text{R} (\lambda)\) can be expressed in the
closed form \[\begin{equation}
\widehat{ \mathbf{\Psi} }^\text{R} (\lambda) =
\left( \mathbf{X}^\top \mathbf{X} + N \lambda \mathbf{I}
\right)^{-1} \mathbf{X}^\top \mathbf{Y},
\qquad
\lambda \geq 0.
\end{equation}\]
The shrinkage parameter \(\lambda\)
for the ridge regression can be determined automatically by
generalized cross-validation (GCV) (Golub, Heath,
and Wahba 1979). GCV selects the value of \(\lambda\) that minimizes the GCV score
\[\begin{equation}
GCV(\lambda) = \frac{1}{N} \left\| (\mathbf{I} - \mathbf{H}(\lambda))
\mathbf{Y} \right\|^2_\text{F}
\left/
\left[ \frac{1}{N} \text{Trace}(\mathbf{I} - \mathbf{H}(\lambda) )
\right]^2
\right. ,
\end{equation}\] where \(\mathbf{H}(\lambda) = \mathbf{X}
\left(\mathbf{X}^\top \mathbf{X} + N \lambda \mathbf{I}
\right)^{-1} \mathbf{X}^\top\).
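As a sanity check of these formulas, the GCV score can be computed in a few lines (a minimal sketch under the notation above; the inputs X and Y are assumptions, not package internals):
# Sketch: ridge hat matrix and GCV score written out from the formulas
# above; X is N x (Kp + L) and Y is N x K.
ridge_gcv <- function(X, Y, lambda) {
  N <- nrow(X)
  H <- X %*% solve(crossprod(X) + N * lambda * diag(ncol(X)), t(X))
  num <- sum(((diag(N) - H) %*% Y)^2) / N   # (1/N) ||(I - H) Y||_F^2
  den <- (sum(diag(diag(N) - H)) / N)^2     # [(1/N) Trace(I - H)]^2
  num / den
}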
In this package, the interface to the ridge regression method is
provided by the function VARshrink(method = "ridge", ...).
If the input argument lambda is set to a value or a vector
of values, then the corresponding GCV score is computed automatically for
each \(\lambda\) value, and the VAR
coefficients with the smallest GCV score are selected. If
lambda = NULL, then the default grid of
c(0.0001, 0.0005, 0.001, 0.005, ..., 10, 50) is used.
For example, simulated time series data of length \(T=100\) were generated based on a
multivariate normal distribution for noise and a VAR model with \(p=1\), \(K=2\), \(\mathbf{A}_1 = 0.5\mathbf{I}_2\), \(\mathbf{C}=(0.2, 0.7)^\top\), and \(\mathbf{\Sigma} = 0.1^2\mathbf{I}_2\) as
follows:
set.seed(1000)
myCoef <- list(A = list(matrix(c(0.5, 0, 0, 0.5), 2, 2)), c = c(0.2, 0.7))
myModel <- list(Coef = myCoef, Sigma = diag(0.1^2, 2), dof = Inf)
Y <- simVARmodel(numT = 100, model = myModel, burnin = 10)
resu_estim <- list()
Then, the multivariate ridge regression can be carried out for VAR
models as follows. The result printed on the screen shows all the
lambda values considered and the corresponding GCV values.
The VAR parameters are estimated using the lambda value with the minimum
GCV value.
resu_estim$`Ridge regression` <-
VARshrink(Y, p = 1, type = "const", method = "ridge", lambda = NULL)
resu_estim$`Ridge regression`
#>
#> VAR Shrinkage Estimation Results:
#> =================================
#>
#> Estimated coefficients for equation y1:
#> =======================================
#> Call:
#> y1 = y1.l1 + y2.l1 + const
#>
#> y1.l1 y2.l1 const
#> 0.61424699 0.07318362 0.05264740
#>
#>
#> Estimated coefficients for equation y2:
#> =======================================
#> Call:
#> y2 = y1.l1 + y2.l1 + const
#>
#> y1.l1 y2.l1 const
#> 0.1195550 0.3254618 0.9055450
#>
#>
#> lambda: 1e-04 5e-04 0.001 0.005 0.01 0.05 0.1 0.5 1 5 10 50 (estimated: TRUE)
#> GCV: 0.0187 0.0186 0.0186 0.0192 0.02 0.0227 0.025 0.0647 0.1506 0.8406 1.2754 1.9329
The method summary() is available for objects of class
“varshrinkest” as follows:
summary(resu_estim$`Ridge regression`)
#>
#> VAR Shrinkage Estimation Results:
#> ===================================
#> Endogenous variables: y1, y2
#> Deterministic variables: const
#> Sample size: 99
#> Log Likelihood: 188.793
#> Roots of the characteristic polynomial:
#> 0.6419 0.2978
#> Call:
#> VARshrink(y = Y, p = 1, type = "const", method = "ridge", lambda = NULL)
#>
#>
#> Estimation results for equation y1:
#> ===================================
#> y1 = y1.l1 + y2.l1 + const
#>
#> Estimate Std. Error t value Pr(>|t|)
#> y1.l1 0.61425 0.07636 8.045 2.27e-12 ***
#> y2.l1 0.07318 0.07549 0.969 0.335
#> const 0.05265 0.10888 0.484 0.630
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> Residual standard error: 0.08913 on 96.16244 degrees of freedom
#> Multiple R-Squared: 0.3864, Adjusted R-squared: 0.3747
#> F-statistic: 32.95 on 1.837559 and 96.16244 DF, p-value: 4.725e-11
#>
#> Estimation results for equation y2:
#> ===================================
#> y2 = y1.l1 + y2.l1 + const
#>
#> Estimate Std. Error t value Pr(>|t|)
#> y1.l1 0.11955 0.08632 1.385 0.169249
#> y2.l1 0.32546 0.08534 3.814 0.000242 ***
#> const 0.90555 0.12308 7.357 6.3e-11 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> Residual standard error: 0.1008 on 96.16244 degrees of freedom
#> Multiple R-Squared: 0.1219, Adjusted R-squared: 0.1051
#> F-statistic: 7.263 on 1.837559 and 96.16244 DF, p-value: 0.00157
#>
#>
#> Scale matrix, Sigma, of the multivariate t-distribution for noise:
#> y1 y2
#> y1 0.0077160 -0.0006829
#> y2 -0.0006829 0.0098611
#>
#> Degrees of freedom of the multivariate t-distribution for noise:
#> [1] Inf
#>
#> Correlation matrix of Sigma:
#> y1 y2
#> y1 1.00000 -0.07829
#> y2 -0.07829 1.00000
Nonparametric Shrinkage (NS)
The nonparametric shrinkage (NS) estimation method for VAR models,
proposed by Opgen-Rhein and Strimmer
(2007b), produces an estimate of \(\mathbf{\Psi}\) based on a James-Stein type
shrinkage of sample covariance matrices (Opgen-Rhein and Strimmer 2007a; Schäfer and Strimmer
2005).
We briefly describe the NS method. In the NS method, the data
matrices \(\mathbf{X}\) and \(\mathbf{Y}\) are mean-corrected so that
each column has mean zero. Let \(\mathbf{Z} = [\mathbf{X}, \mathbf{Y}] \in
\mathbb{R}^{N \times (Kp + K + L)}\) be a combined data matrix.
The sample covariance matrix of \(\mathbf{Z}\) is partitioned as \[\begin{equation}
\mathbf{S}_\text{ZZ} = \frac{1}{N - 1} \mathbf{Z}^\top \mathbf{Z} =
\begin{bmatrix} \mathbf{S}_\text{XX} & \mathbf{S}_\text{XY} \\
\mathbf{S}_\text{XY}^\top & \mathbf{S}_\text{YY}
\end{bmatrix}
\in \mathbb{R}^{(Kp + K + L) \times (Kp + K + L)},
\end{equation}\] where \(\mathbf{S}_\text{XX} = (N - 1)^{-1}
\mathbf{X}^\top \mathbf{X}\), \(\mathbf{S}_\text{XY} = (N - 1)^{-1}
\mathbf{X}^\top \mathbf{Y}\), and \(\mathbf{S}_\text{YY} = (N - 1)^{-1}
\mathbf{Y}^\top \mathbf{Y}\). The matrix \(\mathbf{S}_\text{ZZ}\) can be decomposed as
\[\begin{equation}
\mathbf{S}_\text{ZZ} = \mathbf{D}_\text{Z}^{1/2}
\mathbf{R}_\text{ZZ} \mathbf{D}_\text{Z}^{1/2},
\end{equation}\] where \(\mathbf{R}_\text{ZZ}\) is the sample
correlation matrix and \(\mathbf{D}_\text{Z} =
\text{diag}(s_{11}, s_{22}, \ldots, s_{Kp + K + L, Kp + K + L})\)
is a diagonal matrix with diagonal elements of sample variances.
Shrinkage estimates of the correlation matrix and the variances can be
written as \[\begin{equation}
\widehat{\mathbf{R}}_\text{ZZ} =
(1-\lambda) \mathbf{R}_\text{ZZ} + \lambda \mathbf{I}
\quad
\text{and}
\quad
\widehat{\mathbf{D}}_\text{Z} =
\text{diag}(\hat{s}_{11}, \hat{s}_{22}, \ldots,
\hat{s}_{Kp+K+L,Kp+K+L})
\end{equation}\] with \[\begin{equation}
\hat{s}_{ii} = (1-\lambda_v) s_{ii} + \lambda_v s_\text{med},
\quad i = 1, 2, \ldots, Kp + K + L,
\end{equation}\] where \(s_\text{med}\) is a median of all the
sample variances \(s_{ii}\), and \(0\leq \lambda, \lambda_v \leq 1\) are
shrinkage parameters. The estimated covariance matrix can be computed by
\[\begin{equation}
\widehat{\mathbf{S}}_\text{ZZ}(\lambda, \lambda_v)
= \widehat{\mathbf{D}}_\text{Z}^{1/2}
\widehat{\mathbf{R}}_\text{ZZ}
\widehat{\mathbf{D}}_\text{Z}^{1/2}.
\end{equation}\] The values of the shrinkage parameters \(\lambda\) and \(\lambda_v\) are determined by the
James-Stein type shrinkage method of Opgen-Rhein and Strimmer (2007a)
and Schäfer and Strimmer (2005), which we call the NS method.
The ordinary least squares estimate \(\widehat{\mathbf{\Psi}}^{\text{OLS}}\) of
\(\mathbf{\Psi}\) is given by \(\widehat{\mathbf{\Psi}}^{\text{OLS}} =
\mathbf{S}_\text{XX}^{-1} \mathbf{S}_\text{XY}\). We define the
NS estimate of \(\mathbf{\Psi}\) as
\[\begin{equation}
\widehat{ \mathbf{\Psi} }^\text{N} (\lambda, \lambda_v) =
\widehat{\mathbf{S}}_\text{XX}^{-1}
\widehat{\mathbf{S}}_\text{XY},
\qquad
0\leq \lambda, \lambda_v \leq 1,
\end{equation}\] where \(\widehat{\mathbf{S}}_\text{XX}\) and \(\widehat{\mathbf{S}}_\text{XY}\) are parts
of the estimated covariance matrix, \[\begin{equation}
\widehat{\mathbf{S}}_\text{ZZ} (\lambda, \lambda_v) =
\begin{bmatrix} \widehat{\mathbf{S}}_\text{XX} &
\widehat{\mathbf{S}}_\text{XY} \\
\widehat{\mathbf{S}}_\text{XY}^\top &
\widehat{\mathbf{S}}_\text{YY}
\end{bmatrix}.
\end{equation}\]
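In terms of these formulas, the NS estimate can be assembled from a shrinkage covariance estimate; the following sketch uses corpcor::cov.shrink() and assumes mean-corrected matrices X and Y (the package's internal code may differ):
# Sketch: the NS estimate Psi-hat = S_XX^{-1} S_XY from a shrinkage
# covariance of the combined data matrix Z = [X, Y].
library(corpcor)
Z  <- cbind(X, Y)                       # N x (Kp + K + L) combined matrix
S  <- cov.shrink(Z)                     # lambda, lambda_v chosen internally
ix <- seq_len(ncol(X))
Psi_ns <- solve(S[ix, ix], S[ix, -ix])  # S_XX^{-1} S_XY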
In the package VARshrink, the function
VARshrink(method = "ns", ...) provides an interface to
the NS method. Specifically, the package corpcor (Schäfer et al. 2017) includes the R function
cov.shrink(), which can determine \(\lambda\) and \(\lambda_v\) and estimate the covariance
matrix \(\widehat{\mathbf{S}}_\text{ZZ}(\lambda,
\lambda_v)\). The function VARshrink() in the
VARshrink package infers the NS estimates of VAR
coefficients, \(\widehat{\mathbf{\Psi}}^\text{N} (\lambda,
\lambda_v)\), by using the covariance matrix \(\widehat{\mathbf{S}}_\text{ZZ}(\lambda,
\lambda_v)\). If the input arguments lambda and
lambda_var are set as lambda = NULL and
lambda_var = NULL, then \(\lambda\) and \(\lambda_v\) are determined automatically.
We can find the estimated \(\lambda\)
and \(\lambda_v\) values on the printed
screen. For example,
resu_estim$`Nonparametric shrinkage` <-
VARshrink(Y, p = 1, type = "none", method = "ns",
lambda = NULL, lambda_var = NULL)
resu_estim$`Nonparametric shrinkage`
#>
#> VAR Shrinkage Estimation Results:
#> =================================
#>
#> Estimated coefficients for equation y1:
#> =======================================
#> Call:
#> y1 = y1.l1 + y2.l1
#>
#> y1.l1 y2.l1
#> 0.56430137 0.06139375
#>
#>
#> Estimated coefficients for equation y2:
#> =======================================
#> Call:
#> y2 = y1.l1 + y2.l1
#>
#> y1.l1 y2.l1
#> 0.1080570 0.2537706
#>
#>
#> lambda: 0.104881997093098 (estimated: TRUE)
#> lambda_var: 1 (estimated: TRUE)
The method summary() shows statistical inference on the
estimated VAR coefficients together with the estimated scale matrix
\(\mathbf{\Sigma}\) of the multivariate
t-distribution for noise.
summary(resu_estim$`Nonparametric shrinkage`)
#>
#> VAR Shrinkage Estimation Results:
#> ===================================
#> Endogenous variables: y1, y2
#> Deterministic variables: none
#> Sample size: 99
#> Log Likelihood: 188.443
#> Roots of the characteristic polynomial:
#> 0.5844 0.2337
#> Call:
#> VARshrink(y = Y, p = 1, type = "none", method = "ns", lambda = NULL,
#> lambda_var = NULL)
#>
#>
#> Estimation results for equation y1:
#> ===================================
#> y1 = y1.l1 + y2.l1
#>
#> Estimate Std. Error t value Pr(>|t|)
#> y1.l1 0.564301 0.007754 72.775 < 2e-16 ***
#> y2.l1 0.061394 0.007315 8.393 3.47e-13 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> Residual standard error: 0.08828 on 98.81547 degrees of freedom
#> Multiple R-Squared: 0.3446
#> Warning in pf(result$fstatistic[1L], result$fstatistic[2L],
#> result$fstatistic[3L], : NaNs produced
#> , Adjusted R-squared: 0.35
#> F-statistic: -63.71 on -0.8154669 and 98.81547 DF, p-value: NA
#>
#> Estimation results for equation y2:
#> ===================================
#> y2 = y1.l1 + y2.l1
#>
#> Estimate Std. Error t value Pr(>|t|)
#> y1.l1 0.108057 0.008725 12.38 <2e-16 ***
#> y2.l1 0.253771 0.008231 30.83 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> Residual standard error: 0.09933 on 98.81547 degrees of freedom
#> Multiple R-Squared: 0.08139
#> Warning in pf(result$fstatistic[1L], result$fstatistic[2L],
#> result$fstatistic[3L], : NaNs produced
#> , Adjusted R-squared: 0.08897
#> F-statistic: -10.74 on -0.8154669 and 98.81547 DF, p-value: NA
#>
#>
#> Scale matrix, Sigma, of the multivariate t-distribution for noise:
#> y1 y2
#> y1 0.0077791 -0.0006733
#> y2 -0.0006733 0.0098486
#>
#> Degrees of freedom of the multivariate t-distribution for noise:
#> [1] Inf
#>
#> Correlation matrix of Sigma:
#> y1 y2
#> y1 1.00000 -0.07693
#> y2 -0.07693 1.00000
Full Bayesian Shrinkage
Ni and Sun (2005) and Sun and Ni (2004) studied Bayesian estimation
methods using noninformative priors for VAR models, where they used
Markov Chain Monte Carlo (MCMC) methods for estimating coefficient
matrices, noise covariance matrices, and other hyperparameters in
Bayesian VAR models. In Ni and Sun (2005),
various Bayesian estimators were compared using several types of loss
functions, noninformative priors, and multivariate normal and
t-distributions. Among them, Bayesian estimators using a certain type of
noninformative priors showed higher accuracies than the other Bayesian
estimators in simulated experiments. The noninformative prior for
coefficient matrices was called the shrinkage prior, and the prior for
the noise covariance matrix was called the reference prior.
In the package VARshrink, we can obtain Bayesian
estimators using the shrinkage prior and the reference prior, which are
the noninformative priors that yielded the highest accuracies in the
simulation results (Ni and Sun 2005). As a
Bayesian estimator of VAR parameters, the minimizer of the posterior
expectation of quadratic loss is computed, which is the mean of the
posterior distribution. Additionally, the minimizer of the posterior
expectation of LINEX loss is also computed (Ni
and Sun 2005; Zellner 1986). In this section, we will explain the
model assumptions in more detail.
First, the noise vectors are independent and identically distributed
according to a multivariate t-distribution with \(\nu\) degrees of freedom, i.e., \(\boldsymbol\epsilon_t \sim t_\nu (\mathbf{0},
\mathbf{\Sigma})\). It can be expressed as a hierarchical model
as \[\begin{equation}
\begin{split}
( \boldsymbol\epsilon_t | q_t ) & \sim \text{N}_K
(\mathbf{0}, q_t^{-1} \mathbf{\Sigma}),
\qquad\qquad\qquad (7)
\\
q_t & \sim \text{Gamma}(\nu/2, \nu/2).
\end{split}
\end{equation}\]
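For intuition, the hierarchy in Eq. (7) can be simulated directly; the following sketch (our own helper rmvt_mix, not a package function) draws noise vectors from the scale mixture:
# Sketch: eps_t | q_t ~ N(0, Sigma / q_t) with q_t ~ Gamma(nu/2, nu/2).
rmvt_mix <- function(n, Sigma, nu) {
  K <- ncol(Sigma)
  q <- rgamma(n, shape = nu / 2, rate = nu / 2)     # latent scales q_t
  Zn <- matrix(rnorm(n * K), n, K) %*% chol(Sigma)  # rows ~ N(0, Sigma)
  Zn / sqrt(q)                                      # scale row t by q_t^{-1/2}
}
eps <- rmvt_mix(1000, Sigma = diag(0.1^2, 2), nu = 6)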
Second, we denote the vectorized version of \(\mathbf{\Psi} \in\mathbb{R}^{(J/K)\times
K}\) by \(\boldsymbol\psi =
\text{vec}(\mathbf{\Psi}) \in\mathbb{R}^{J}\). Here \(J = K (Kp + L)\). The shrinkage prior \(\pi_\text{S}(\boldsymbol\psi)\) for \(\boldsymbol\psi \in\mathbb{R}^{J}\) is
taken as \(\pi_\text{S}(\boldsymbol\psi)
\propto \left\| \boldsymbol\psi \right\|^{ -(J-2) }.\) By
introducing a latent variable \(\lambda>0\), the shrinkage prior can
also be expressed as a scale mixture of multivariate normals as \[\begin{equation}
\begin{split}
(\boldsymbol\psi | \lambda) & \sim \text{N}_{J} (\mathbf{0},
\lambda^{-1} \mathbf{I}_J),
\qquad\qquad\qquad (8)
\\
\pi (\lambda) & \propto 1.
\end{split}
\end{equation}\]
Third, the reference prior \(\pi_\text{R}(\mathbf{\Sigma})\) for \(\mathbf{\Sigma}\) is taken as \[\begin{equation}
\pi_\text{R}(\mathbf{\Sigma}) \propto \left| \mathbf{\Sigma}
\right|^{-1} \prod_{1\leq i < j\leq K} (\lambda_i - \lambda_j)^{-1},
\end{equation}\] where \(\lambda_1
> \lambda_2 > \cdots > \lambda_K\) are the eigenvalues of
\(\mathbf{\Sigma}\).
Note that no hyperparameters are involved in the shrinkage prior and
the reference prior, since they are noninformative priors. The Gibbs
MCMC method makes use of the hierarchical expression of \(\boldsymbol\epsilon_t\). The Gibbs MCMC
samples the parameters \((\boldsymbol\psi, \lambda, \mathbf{\Sigma},
\mathbf{Q}, \nu)\) with \(\mathbf{Q} =
\text{diag}(q_{p+1}, \ldots, q_T)\) from conditional posterior
distributions (Ni and Sun 2005). We remark
that the mean of the conditional posterior distribution \(\pi(\boldsymbol\psi | \lambda, \mathbf{\Sigma},
\mathbf{Q}, \nu; \mathbf{Y})\) of \(\boldsymbol\psi\) is given by \[\begin{equation}
\widehat{\boldsymbol\psi}^{F} (\lambda) =
\left[\left(\mathbf{\Sigma}^{-1} \otimes \left( \mathbf{X}^\top
\mathbf{Q} \mathbf{X} \right) \right) + \lambda
\mathbf{I}_J \right]^{-1} \text{vec}\left( \mathbf{X}^\top
\mathbf{Q} \mathbf{Y} \mathbf{\Sigma}^{-1} \right),
\qquad
\lambda > 0.
\qquad\qquad\qquad (9)
\end{equation}\] Note that if \(\mathbf{\Sigma} = \mathbf{I}_K\) and \(q_{p+1}=\cdots=q_T = 1\), then the
estimator reduces to the ridge regression estimator. That is, Bayesian
shrinkage estimators take more flexible forms, though more
computational effort is required to estimate the additional
parameters.
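To make Eq. (9) concrete, the conditional posterior mean can be written out directly (a sketch of the formula only; the inputs Q, Sigma, and lambda are assumptions standing in for the Gibbs sampler's current draws):
# Sketch of Eq. (9): [Sigma^{-1} (x) (X'QX) + lambda I_J]^{-1}
#                    vec(X' Q Y Sigma^{-1}), with J = K (Kp + L).
post_mean_psi <- function(X, Y, Q, Sigma, lambda) {
  Si <- solve(Sigma)
  A  <- kronecker(Si, crossprod(X, Q %*% X)) +
        lambda * diag(ncol(X) * ncol(Y))
  b  <- as.vector(crossprod(X, Q %*% Y %*% Si))  # vec(X' Q Y Sigma^{-1})
  matrix(solve(A, b), ncol = ncol(Y))            # reshape to Psi form
}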
In the package VARshrink, the function
VARshrink(method = "fbayes", ...) serves as the
interface to the full Bayesian shrinkage method with the shrinkage
prior and the reference prior. The shrinkage parameter \(\lambda\) is not set at a fixed value,
i.e., the arguments lambda and lambda_var are
not used here. Instead, the mean of the posterior distribution, \(\hat{\delta} = \mathbb{E}\left[ \lambda^{-1} |
\mathbf{Y} \right]\), is estimated during the MCMC process (Ni and Sun 2005), and we define \(\hat{\lambda} = \hat{\delta}^{-1}\). There
are several arguments to be specified as follows.
dof: If dof = Inf, then a multivariate
normal distribution is applied and the weight \(\mathbf{Q}\) is not estimated. If
dof is a finite value, then we apply the multivariate
t-distribution with fixed degrees of freedom \(\nu=\) dof and estimate \(\mathbf{Q}\). If dof = NULL,
we estimate both the degrees of freedom \(\nu\) and the weight \(\mathbf{Q}\). In this case, the package
ars is required for sampling the parameter \(\nu\) from its conditional posterior
distribution.
burnincycle, mcmccycle: The Gibbs MCMC method
samples a sequence of parameter values of length
burnincycle + mcmccycle. burnincycle is the
number of sampled parameter values discarded at the beginning, and
mcmccycle is the number retained and used for
computing the parameter estimates. By default, we set
burnincycle = 1000 and
mcmccycle = 2000.
For example, we run the full Bayesian shrinkage method with a fixed
\(\nu = 6\) as follows.
resu_estim$`Full Bayes (fixed dof)` <-
VARshrink(Y, p = 1, type = "const", method = "fbayes", dof = 6,
burnincycle = 1000, mcmccycle = 2000)
resu_estim$`Full Bayes (fixed dof)`
#>
#> VAR Shrinkage Estimation Results:
#> =================================
#>
#> Estimated coefficients for equation y1:
#> =======================================
#> Call:
#> y1 = y1.l1 + y2.l1 + const
#>
#> y1.l1 y2.l1 const
#> 0.64042931 0.05773600 0.06400323
#>
#>
#> Estimated coefficients for equation y2:
#> =======================================
#> Call:
#> y2 = y1.l1 + y2.l1 + const
#>
#> y1.l1 y2.l1 const
#> 0.09971729 0.31292326 0.93493057
#>
#>
#> Sigma for noise:
#> [,1] [,2]
#> [1,] 0.0072637763 -0.0001772936
#> [2,] -0.0001772936 0.0081118713
#> dof for noise: 6 (estimated: FALSE)
#> lambda: 1.46042035262722 (estimated: TRUE)
By using summary(), we can find the estimated scale
matrix \(\mathbf{\Sigma}\) of the
multivariate t-distribution for noise.
summary(resu_estim$`Full Bayes (fixed dof)`)
#>
#> VAR Shrinkage Estimation Results:
#> ===================================
#> Endogenous variables: y1, y2
#> Deterministic variables: const
#> Sample size: 99
#> Log Likelihood: 184.114
#> Roots of the characteristic polynomial:
#> 0.6572 0.2962
#> Call:
#> VARshrink(y = Y, p = 1, type = "const", method = "fbayes", dof = 6,
#> burnincycle = 1000, mcmccycle = 2000)
#>
#>
#> Estimation results for equation y1:
#> ===================================
#> y1 = y1.l1 + y2.l1 + const
#>
#> Estimate Std. Error t value Pr(>|t|)
#> y1.l1 0.64043 0.07897 8.110 1.66e-12 ***
#> y2.l1 0.05774 0.08313 0.695 0.489
#> const 0.06400 0.12041 0.532 0.596
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> Residual standard error: 0.08916 on 96.04009 degrees of freedom
#> Multiple R-Squared: 0.4049, Adjusted R-squared: 0.3927
#> F-statistic: 33.34 on 1.959913 and 96.04009 DF, p-value: 1.399e-11
#>
#> Estimation results for equation y2:
#> ===================================
#> y2 = y1.l1 + y2.l1 + const
#>
#> Estimate Std. Error t value Pr(>|t|)
#> y1.l1 0.09972 0.08931 1.117 0.26696
#> y2.l1 0.31292 0.09401 3.329 0.00124 **
#> const 0.93493 0.13617 6.866 6.52e-10 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> Residual standard error: 0.1008 on 96.04009 degrees of freedom
#> Multiple R-Squared: 0.1101, Adjusted R-squared: 0.09197
#> F-statistic: 6.064 on 1.959913 and 96.04009 DF, p-value: 0.003513
#>
#>
#> Scale matrix, Sigma, of the multivariate t-distribution for noise:
#> [,1] [,2]
#> [1,] 0.0072638 -0.0001773
#> [2,] -0.0001773 0.0081119
#>
#> Degrees of freedom of the multivariate t-distribution for noise:
#> [1] 6
#>
#> Correlation matrix of Sigma:
#> [,1] [,2]
#> [1,] 1.0000 -0.0231
#> [2,] -0.0231 1.0000
Instead, we can estimate the degrees of freedom, \(\nu\), of the multivariate t-distribution by
setting the argument dof = NULL as follows:
resu_estim$`Full Bayes (estim dof)` <-
VARshrink(Y, p = 1, type = "const", method = "fbayes", dof = NULL,
burnincycle = 1000, mcmccycle = 2000)
resu_estim$`Full Bayes (estim dof)`
#>
#> VAR Shrinkage Estimation Results:
#> =================================
#>
#> Estimated coefficients for equation y1:
#> =======================================
#> Call:
#> y1 = y1.l1 + y2.l1 + const
#>
#> y1.l1 y2.l1 const
#> 0.63997678 0.05938708 0.06191316
#>
#>
#> Estimated coefficients for equation y2:
#> =======================================
#> Call:
#> y2 = y1.l1 + y2.l1 + const
#>
#> y1.l1 y2.l1 const
#> 0.1043844 0.3132871 0.9323669
#>
#>
#> Sigma for noise:
#> [,1] [,2]
#> [1,] 0.0070211398 -0.0002166658
#> [2,] -0.0002166658 0.0081808592
#> dof for noise: 8.43991341078547 (estimated: TRUE)
#> lambda: 1.35334572461602 (estimated: TRUE)
Semiparametric Bayesian Shrinkage
Whereas full Bayesian shrinkage methods estimate all the
hyperparameters including latent variables via MCMC methods,
semiparametric Bayesian methods estimate some of the hyperparameters by
a certain nonparametric method and estimate the rest in a Bayesian
framework. The semiparametric approach is advantageous especially when
the dimensionality of the model is so high that standard MCMC methods
are not computationally tractable.
Lee, Choi, and Kim (2016) assumed scale
mixtures of multivariate normal distributions for noise vectors as in
Eq. (7). The prior distribution for the model coefficients, \(\boldsymbol\psi =
\text{vec}(\mathbf{\Psi})\), was set as multivariate normal
distributions, similarly to Eq. (8). Here, we can choose either the
conjugate (CJ) prior or the non-conjugate (NCJ) prior as follows:
The conjugate prior for \(\boldsymbol\psi \in\mathbb{R}^J\) is
expressed by \[\begin{equation}
(\boldsymbol\psi | \lambda, \mathbf{\Sigma}) \sim \text{N}_{J}
\left(\mathbf{0}, \frac{1-\lambda}{(N - 1) \lambda} \mathbf{\Sigma}
\otimes \mathbf{I} \right),
\qquad
0<\lambda < 1.
\end{equation}\]
The non-conjugate prior for \(\boldsymbol\psi \in\mathbb{R}^J\) is
expressed by \[\begin{equation}
(\boldsymbol\psi | \lambda) \sim \text{N}_{J} \left(\mathbf{0},
\frac{1-\lambda}{(N - 1) \lambda} \mathbf{I}_J \right),
\qquad
0<\lambda < 1.
\end{equation}\]
The scale matrix \(\mathbf{\Sigma}\)
for noise is included in the conjugate prior distribution
but not in the non-conjugate prior distribution. The non-conjugate
prior is quite similar to the full Bayesian formulation in Eq. (8).
However, the main difference is that, in the semiparametric Bayesian
approach, the shrinkage parameter \(\lambda\) should be estimated explicitly
via a nonparametric method, but in the full Bayes approach, it is a
latent variable which should be sampled and estimated implicitly via an
MCMC method.
The prior distribution for \(\mathbf\Sigma\) was set as an inverse
Wishart distribution as \[\begin{equation}
(\mathbf{\Sigma} | \mathbf{L}_0, m_0) \sim
\text{InvWishart}( \mathbf{L}_0, m_0),
\qquad
\mathbf{L}_0 \succ \mathbf{0}, \; m_0 > K - 1,
\end{equation}\] where \(\mathbf{L}_0 \succ \mathbf{0}\) means that
\(\mathbf{L}_0\) is positive
definite.
Once the shrinkage parameter \(\lambda\) is set at a fixed value, the
other parameters \(\boldsymbol\psi\),
\(\mathbf{\Sigma}\), and \(\mathbf{Q}\) can be estimated iteratively
and efficiently in a Bayesian framework. Briefly speaking, we consider
estimating the parameters \(\boldsymbol\psi\) and \(\mathbf{\Sigma}\) at the maximum point
(i.e., the mode) of the marginal posterior density function \(\pi (\boldsymbol\psi, \mathbf{\Sigma} | \lambda;
\mathbf{Y})\). In the case of the non-conjugate prior, the mode,
\((\widehat{\boldsymbol\psi}^\text{S}
(\lambda), \widehat{\mathbf{\Sigma}}^\text{S} (\lambda))\), is
expressed as \[\begin{equation}
\widehat{\boldsymbol\psi}^{S} (\lambda) =
\left[\left( (\widehat{\mathbf{\Sigma}}^\text{S})^{-1} \otimes
\left( \mathbf{X}^\top \widehat{\mathbf{Q}}^\text{S}
\mathbf{X} \right) \right) + \frac{ (N - 1) \lambda}{1-\lambda}
\mathbf{I}_J \right]^{-1}
\text{vec}\left( \mathbf{X}^\top \widehat{\mathbf{Q}}^\text{S}
\mathbf{Y} ( \widehat{\mathbf{\Sigma}}^\text{S})^{-1} \right),
\qquad\qquad\qquad (10)
\end{equation}\] and
\[\begin{equation}
\widehat{\mathbf{\Sigma}}^\text{S} (\lambda) =
\frac{1}{m_0 + T + K + 1} \left(
\mathbf{L}_0 + \mathbf{Y}^\top \widehat{\mathbf{Q}}^\text{S} \mathbf{Y} -
\mathbf{Y}^\top \widehat{\mathbf{Q}}^\text{S} \mathbf{X}
\widehat{\mathbf{\Psi}}^\text{S} (\lambda)
\right),
\end{equation}\] for \(0<
\lambda < 1\), where \(\widehat{\mathbf{Q}}^\text{S} =
\text{diag}(\hat{q}_{p+1}, \ldots, \hat{q}_T)\) is obtained in an
iterative manner. The weights in \(\widehat{\mathbf{Q}}^\text{S}\) are
expressed as \[\begin{equation}
\hat{q}_t =
h \left( \left\| (\widehat{\mathbf{\Sigma}}^\text{S})^{-1/2}
\hat{\boldsymbol\epsilon}_t \right\|^2 \right),
\qquad
t = p+1, \ldots, T,
\end{equation}\] where \(h(x)\) is defined depending on the noise
distribution and \(\hat{\boldsymbol\epsilon}_t\) is the
residual given by \(\hat{\boldsymbol\epsilon}_t = \mathbf{y}_t -
\widehat{\mathbf{\Psi}}^{\text{S}\top} \mathbf{x}_t\) (Lee, Choi, and Kim 2016).
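For multivariate t-distributed noise, a standard EM-type choice is \(h(x) = (\nu + K)/(\nu + x)\); the sketch below assumes this form (whether the package uses exactly this \(h\) is an assumption):
# Sketch: EM-type weights q_t for multivariate-t noise, assuming
# h(x) = (nu + K) / (nu + x); E is the N x K matrix of residuals.
est_Q <- function(E, Sigma, nu) {
  d <- rowSums((E %*% solve(Sigma)) * E)  # squared Mahalanobis distances
  (nu + ncol(E)) / (nu + d)               # one weight per residual row
}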
The shrinkage parameter \(\lambda\)
is determined via an internal nonparametric method called
parameterized cross validation (PCV). This algorithm can be considered
a modified \(K\)-fold cross
validation, designed for estimating the shrinkage parameter of VAR
models; see Lee, Choi, and Kim (2016) for
a detailed explanation.
In addition, the semiparametric Bayesian shrinkage method adopts the
idea of shrinking both the correlations and variances from the NS
method. As a result, there are two shrinkage parameters \(\lambda\) and \(\lambda_v\), where \(0 \leq \lambda \leq 1\) is used for the
shrinkage estimation of the VAR coefficient matrix \(\mathbf{\Psi}\) and the noise covariance matrix \(\mathbf{\Sigma}\), while \(0 \leq \lambda_v \leq 1\) is used for the
shrinkage estimation of the variances of the variables \(y_{tj}, j = 1, \ldots, K\).
In the package VARshrink, the function
VARshrink(method = "sbayes", ...) is for the semiparametric
Bayesian shrinkage method. There are several input arguments to this
function, as follows:
dof: If dof = Inf, we use a multivariate
normal distribution for noise vectors. Otherwise, dof can
be set to a finite number \(\nu\)
between \(0\) and \(\infty\) for the multivariate
t-distribution with degrees of freedom \(\nu\). If dof = NULL, then
\(\nu\) is automatically selected by
the \(K\)-fold cross validation, with
\(K\) = num_folds.
dof = Inf by default.
lambda, lambda_var: lambda = NULL and
lambda_var = NULL implies that the shrinkage parameters
\(\lambda\) and \(\lambda_v\) are estimated automatically
with \(0\leq \lambda, \lambda_v \leq
1\).
prior_type: Either prior_type = "NCJ" or
prior_type = "CJ", for the non-conjugate or the
conjugate prior, respectively. The default value is “NCJ”.
num_folds: The number of folds for the parameterized
cross validation method for determining the \(\lambda\) value. num_folds = 5
by default. It works only when lambda = NULL or
dof = NULL.
m0: The value of the hyperparameter \(m_0\) of the inverse Wishart prior of \(\mathbf{\Sigma}\); m0 = \(K\) by default. The
other hyperparameter \(\mathbf{L}_0\)
of the inverse Wishart prior is set by \(\mathbf{L}_0 = (m_0 + K + 1)
\mathbf{I}_K\).
For example, we can run the semiparametric shrinkage method as
follows.
resu_estim$`Semi Bayes (fixed dof)` <-
VARshrink(Y, p = 1, type = "const", method = "sbayes", dof = 6,
lambda = NULL, lambda_var = NULL, prior_type = "NCJ",
num_folds = 5, m0 = ncol(Y))
resu_estim$`Semi Bayes (fixed dof)`
#>
#> VAR Shrinkage Estimation Results:
#> =================================
#>
#> Estimated coefficients for equation y1:
#> =======================================
#> Call:
#> y1 = y1.l1 + y2.l1 + const
#>
#> y1.l1 y2.l1 const
#> 0.65206869 0.05452195 0.05686968
#>
#>
#> Estimated coefficients for equation y2:
#> =======================================
#> Call:
#> y2 = y1.l1 + y2.l1 + const
#>
#> y1.l1 y2.l1 const
#> 0.1178293 0.3705490 0.8807748
#>
#>
#> Sigma for noise:
#> y1 y2
#> y1 0.0064309599 -0.0003425727
#> y2 -0.0003425727 0.0092293198
#> dof for noise: 6 (estimated: FALSE)
#> lambda: 0.000798110056835296 (estimated: TRUE)
#> lambda_var: 0.91370763181838 (estimated: TRUE)
summary(resu_estim$`Semi Bayes (fixed dof)`)
#>
#> VAR Shrinkage Estimation Results:
#> ===================================
#> Endogenous variables: y1, y2
#> Deterministic variables: const
#> Sample size: 99
#> Log Likelihood: 178.1
#> Roots of the characteristic polynomial:
#> 0.6733 0.3493
#> Call:
#> VARshrink(y = Y, p = 1, type = "const", method = "sbayes", lambda = NULL,
#> lambda_var = NULL, dof = 6, prior_type = "NCJ", num_folds = 5,
#> m0 = ncol(Y))
#>
#>
#> Estimation results for equation y1:
#> ===================================
#> y1 = y1.l1 + y2.l1 + const
#>
#> Estimate Std. Error t value Pr(>|t|)
#> y1.l1 0.65207 0.07737 8.428 3.5e-13 ***
#> y2.l1 0.05452 0.08503 0.641 0.523
#> const 0.05687 0.12323 0.461 0.645
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> Residual standard error: 0.08948 on 96.00218 degrees of freedom
#> Multiple R-Squared: 0.4116, Adjusted R-squared: 0.3993
#> F-statistic: 33.61 on 1.997823 and 96.00218 DF, p-value: 8.772e-12
#>
#> Estimation results for equation y2:
#> ===================================
#> y2 = y1.l1 + y2.l1 + const
#>
#> Estimate Std. Error t value Pr(>|t|)
#> y1.l1 0.11783 0.09351 1.260 0.210682
#> y2.l1 0.37055 0.10276 3.606 0.000496 ***
#> const 0.88077 0.14894 5.914 5.13e-08 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> Residual standard error: 0.1082 on 96.00218 degrees of freedom
#> Multiple R-Squared: 0.1311, Adjusted R-squared: 0.113
#> F-statistic: 7.247 on 1.997823 and 96.00218 DF, p-value: 0.001176
#>
#>
#> Scale matrix, Sigma, of the multivariate t-distribution for noise:
#> y1 y2
#> y1 0.0064310 -0.0003426
#> y2 -0.0003426 0.0092293
#>
#> Degrees of freedom of the multivariate t-distribution for noise:
#> [1] 6
#>
#> Correlation matrix of Sigma:
#> y1 y2
#> y1 1.00000 -0.04447
#> y2 -0.04447 1.00000
We can also let the package choose the degrees of freedom
parameter \(\nu\) automatically by
setting dof = NULL. In this case, the package simply uses a
\(K\)-fold cross validation to find an
optimal value of \(\nu\).
resu_estim$`Semi Bayes (estim dof)` <-
VARshrink(Y, p = 1, type = "const", method = "sbayes", dof = NULL,
lambda = NULL, lambda_var = NULL, prior_type = "NCJ",
num_folds = 5, m0 = ncol(Y))
resu_estim$`Semi Bayes (estim dof)`
#>
#> VAR Shrinkage Estimation Results:
#> =================================
#>
#> Estimated coefficients for equation y1:
#> =======================================
#> Call:
#> y1 = y1.l1 + y2.l1 + const
#>
#> y1.l1 y2.l1 const
#> 0.63678246 0.06545189 0.04630952
#>
#>
#> Estimated coefficients for equation y2:
#> =======================================
#> Call:
#> y2 = y1.l1 + y2.l1 + const
#>
#> y1.l1 y2.l1 const
#> 0.1478771 0.4041864 0.8161762
#>
#>
#> Sigma for noise:
#> y1 y2
#> y1 0.0074456739 -0.0006360331
#> y2 -0.0006360331 0.0116072333
#> dof for noise: Inf (estimated: TRUE)
#> lambda: 0.00126661656054025 (estimated: TRUE)
#> lambda_var: 0.91370763181838 (estimated: TRUE)
K-fold Cross Validation for Semiparametric Shrinkage
The VARshrink package includes an implementation of the
K-fold cross validation (CV) method for selecting shrinkage parameters.
In the current version of the package, the K-fold CV method can select
the \(\lambda\) and \(\lambda_v\) values for the semiparametric
Bayesian shrinkage estimator described in Section 3.4; a generic sketch of
such a CV procedure appears below. Note that the semiparametric shrinkage method
in the previous section selects a \(\lambda\) value by using the PCV method and
selects a \(\lambda_v\) value by a
Stein-type nonparametric shrinkage method.
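To make the procedure concrete, the following generic sketch selects \(\lambda\) over a grid by K-fold CV, with the ridge closed form standing in for the semiparametric estimator (the package's internal CV differs in detail; X, Y, and the grid are assumptions):
# Generic sketch of K-fold CV for lambda: fit on K-1 folds, score the
# held-out fold by mean squared prediction error, pick the best lambda.
kcv_lambda <- function(X, Y, lambdas, K = 5) {
  folds <- sample(rep(seq_len(K), length.out = nrow(X)))
  cv <- sapply(lambdas, function(lam) {
    mean(sapply(seq_len(K), function(k) {
      tr  <- folds != k
      Xtr <- X[tr, , drop = FALSE]; Ytr <- Y[tr, , drop = FALSE]
      B   <- solve(crossprod(Xtr) + nrow(Xtr) * lam * diag(ncol(X)),
                   crossprod(Xtr, Ytr))
      mean((Y[!tr, , drop = FALSE] - X[!tr, , drop = FALSE] %*% B)^2)
    }))
  })
  lambdas[which.min(cv)]   # grid value with the smallest CV error
}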
The K-fold CV method can be run as follows. The arguments to
VARshrink() are the same as those for the semiparametric
Bayesian shrinkage method, except for the argument
method = "kcv".
resu_estim$`K-fold CV (fixed dof)` <-
VARshrink(Y, p = 1, type = "const", method = "kcv", dof = 6,
lambda = NULL, lambda_var = NULL, prior_type = "NCJ",
num_folds = 5, m0 = ncol(Y))
resu_estim$`K-fold CV (fixed dof)`
#>
#> VAR Shrinkage Estimation Results:
#> =================================
#>
#> Estimated coefficients for equation y1:
#> =======================================
#> Call:
#> y1 = y1.l1 + y2.l1 + const
#>
#> y1.l1 y2.l1 const
#> 0.65223689 0.05916827 0.05784108
#>
#>
#> Estimated coefficients for equation y2:
#> =======================================
#> Call:
#> y2 = y1.l1 + y2.l1 + const
#>
#> y1.l1 y2.l1 const
#> 0.1135260 0.3886018 0.8219991
#>
#>
#> Sigma for noise:
#> y1 y2
#> y1 0.0068900761 -0.0003378826
#> y2 -0.0003378826 0.0087180682
#> dof for noise: 6 (estimated: FALSE)
#> lambda: 0.001 (estimated: TRUE)
#> lambda_var: 0.001 (estimated: TRUE)
The degrees of freedom of the multivariate t-distribution for noise can be
selected automatically by setting the argument dof = NULL as
follows.
resu_estim$`K-fold CV (estim dof)` <-
VARshrink(Y, p = 1, type = "const", method = "kcv", dof = NULL,
lambda = NULL, lambda_var = NULL, prior_type = "NCJ",
num_folds = 5, m0 = ncol(Y))
Finally, the function calcSSE_Acoef() computes the sum
of squared errors (SSE) between two sets of VAR coefficient matrices, \(\{\mathbf{A}_k^{(1)}\}\) and \(\{\mathbf{A}_k^{(2)}\}\), as \(SSE = \sum_{k=1}^p \sum_{i,j}
(\mathbf{A}^{(1)}_{kij} - \mathbf{A}^{(2)}_{kij})^2\). For
example, Table 2 shows the SSEs of the estimated VAR coefficients.
resu_sse <- data.frame(SSE = sapply(resu_estim,
function(x) calcSSE_Acoef(Acoef_sh(x), myCoef$A)))
Table 2. Sum of squared errors of VAR coefficients estimated by
the shrinkage methods.
| Method | SSE |
|---|---|
| Ridge regression | 0.063 |
| Nonparametric shrinkage | 0.080 |
| Full Bayes (fixed dof) | 0.068 |
| Full Bayes (estim dof) | 0.069 |
| Semi Bayes (fixed dof) | 0.057 |
| Semi Bayes (estim dof) | 0.054 |
| K-fold CV (fixed dof) | 0.052 |
| K-fold CV (estim dof) | 0.040 |