Introduction
Let \(\mathbf{y}_t =
(y_{t1},y_{t2},\ldots,y_{tK})^\top\) denote a \(K\times 1\) vector of endogenous variables.
A vector autoregressive (VAR) model of order \(p\) can be expressed as \[
\mathbf{y}_t = \mathbf{A}_1 \mathbf{y}_{t-1} + \cdots + \mathbf{A}_p
\mathbf{y}_{t-p} + \mathbf{C} \mathbf{d}_t + \boldsymbol\epsilon_t,
\qquad\qquad\qquad (1)
\] where \(\mathbf{d}_t\) is
an \(L\times 1\) vector of
deterministic regressors, \(\boldsymbol\epsilon_t\) is a \(K\times 1\) noise vector, and \(\mathbf{A}_1,\ldots,\mathbf{A}_p\) and
\(\mathbf{C}\) are coefficient matrices
(Hamilton 1994; Tsay 2005).
Background
Recently, several R
packages have been developed for parameter estimation and forecasting
using stochastic time series models. The forecast
package provides various methods and tools for univariate time series
models such as the ARIMA and ETS models (Hyndman
et al. 2018). The MTS package has been developed
for a wide variety of multivariate linear time series models and
multivariate volatility models such as the VARMA, multivariate EWMA, and
low-dimensional BEKK models (Tsay and Wood
2018). The vars package provides methods for
multivariate time series analysis using the VAR, SVAR, and SVEC models
(Pfaff and Stigler 2018).
In this study, we focus on shrinkage estimation of VAR model
parameters. Shrinkage estimation methods have been playing crucial roles
in high-dimensional statistical modeling; see, e.g., Beltrachini, von Ellenrieder, and Muravchik
(2013), Böhm and von Sachs (2009),
Fiecas and Ombao (2011) and Ledoit and Wolf (2004). For VAR models, several
shrinkage estimation methods have been suggested such as a Stein-type
nonparametric shrinkage estimation method (Opgen-Rhein and Strimmer 2007b), Bayesian VARs
using informative priors (Bańbura, Giannone, and
Reichlin 2010; Doan, Litterman, and Sims 1984; Koop and Korobilis 2010;
Litterman 1986), Bayesian VARs using noninformative priors (Ni and Sun 2005; Sun and Ni 2004), and a
semiparametric Bayesian approach adopting a modified \(K\)-fold cross validation (Lee, Choi, and Kim 2016).
Due to the popularity of Bayesian VARs in macroeconomic time series analysis, several
Bayesian VAR methods have been implemented in R packages. For example, the
function BVAR() in MTS implements a
Bayesian VAR method using an informative prior (Tsay and Wood 2018); the package
bvarsv implements Bayesian VAR models with stochastic
volatility and time-varying parameters (Krueger
2015; Koop and Korobilis 2010; Primiceri 2005).
On the other hand, we note that shrinkage methods
implemented for purposes other than multivariate time
series analysis can be applied to shrinkage estimation of VAR
parameters. For instance, the function cov.shrink() in the
package corpcor computes
shrinkage estimates of covariance matrices (Schäfer et al. 2017), and it has been applied to
estimating VAR coefficients by Opgen-Rhein and Strimmer
(2007b). Moreover, we note that VAR models
can be reformulated into multivariate regression problems, so that
penalized least squares methods can be used for shrinkage estimation of
VAR parameters, e.g., the functions lm.gls() for
generalized least squares and lm.ridge() for ridge
regression in the package MASS (Ripley et al. 2018); the function
glmnet() for Lasso and Elastic-Net regularized generalized
linear models in the package glmnet (Friedman et al. 2018); the function
linearRidge() for ridge regression in the package
ridge (Moritz and Cule
2018).
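As a quick illustration of this reformulation (a toy sketch, not package code; the data and the lambda value are arbitrary assumptions), a VAR(1) can be fit equation by equation with lm.ridge() from MASS:
# Toy sketch: recast a VAR(1) as a regression with responses y_t and
# regressors y_{t-1}, then run one ridge fit per equation.
library(MASS)
set.seed(1)
Y <- matrix(rnorm(200), ncol = 2)   # T x K matrix of endogenous variables
X <- Y[-nrow(Y), ]                  # regressors: y_{t-1}
Z <- Y[-1, ]                        # responses:  y_t
fits <- lapply(seq_len(ncol(Z)),
               function(k) lm.ridge(Z[, k] ~ X, lambda = 0.1))
t(sapply(fits, coef))               # K x (K + 1) coefficient matrix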
Main Purpose
While Bayesian approaches have been widely used in the literature, we
note that nonparametric and semiparametric approaches have advantages in
the case of high-dimensional VARs with several hundred or more
time series variables due to their relatively low computational costs
(Opgen-Rhein and Strimmer 2007b). Despite
their relatively high computational costs, Bayesian approaches can
impose proper assumptions on the multivariate time series data flexibly,
such as VAR roots near unity and correlations between noise processes
(Lee, Choi, and Kim 2016). In this sense,
a semiparametric approach can be a trade-off between nonparametric and
parametric approaches (Lee, Choi, and Kim
2016).
In this study, we developed an integrative R package,
VARshrink, for implementing nonparametric, parametric,
and semiparametric approaches for shrinkage estimation of VAR model
parameters. By providing a simple interface function,
VARshrink(), for running various types of shrinkage
estimation methods, the performance of the methods can be easily
compared. We note that the package vars implements an
ordinary least squares method for VAR models via the function
VAR(). The function VARshrink() was built to
extend VAR() to shrinkage estimation methods, so that the
VARshrink package takes advantage of the tools and
methods available in the vars package. For example, we
demonstrate the use of model selection criteria such as AIC and BIC in
this paper, where the methods AIC(), BIC(), and
logLik() in VARshrink can handle the multivariate
t-distribution and the effective number of parameters.
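For instance, candidate lag orders of shrinkage VAR fits can be compared as follows (a sketch assuming a T-by-K data matrix Y as constructed in Section 3):
# Sketch: comparing candidate lag orders of shrinkage VAR fits by
# information criteria; the fits use the effective number of parameters.
fit1 <- VARshrink(Y, p = 1, type = "const", method = "ridge")
fit2 <- VARshrink(Y, p = 2, type = "const", method = "ridge")
AIC(fit1, fit2)
BIC(fit1, fit2)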
This paper is a brief version of the original manuscript.
This paper is organized as follows. In Section 2, we explain the formulation of VAR models in a
multivariate regression problem, which simplifies implementation of the
package. In Section 3, we describe the common
interface function and the four shrinkage estimation methods included in
the package. We clearly present closed form expressions for the
shrinkage estimators inferred by the shrinkage methods, so that we can
indicate the role of the shrinkage intensity parameters in each method.
In addition, we explain how the effective number of parameters can be
computed for shrinkage estimators. In Section 4, we briefly present numerical
experiments using benchmark and simulated data to compare the
performance of the shrinkage estimation methods. Discussion and
conclusions are provided in Section 5.
Shrinkage Estimation Methods
In this section, we describe the shrinkage estimation methods
implemented in this package. The methods estimate either the VAR
coefficient matrix \(\mathbf{\Psi}\)
alone or both \(\mathbf{\Psi}\)
and \(\mathbf{\Sigma}\) in Eq. (6).
We provide a common R
function interface VARshrink() for running the estimation
methods, which is defined by
VARshrink(y, p = 1, type = c("const", "trend", "both", "none"),
season = NULL, exogen = NULL, method = c("ridge", "ns", "fbayes",
"sbayes", "kcv"), lambda = NULL, lambda_var = NULL, dof = Inf, ...)
The input arguments are described as follows.
y: A T-by-K matrix of endogenous variables.
p: Integer for the lag order.
type: Type of deterministic regressors to include.
- “const” - the constant: \(\mathbf{d}_t =
1\).
- “trend” - the trend: \(\mathbf{d}_t =
t\).
- “both” - both the constant and the trend: \(\mathbf{d}_t = (t, 1)^\top\).
- “none” - no deterministic regressors.
season: An integer value of frequency for inclusion of
centered seasonal dummy variables.
exogen: A T-by-L matrix of exogenous variables.
method:
- “ridge” - multivariate ridge regression.
- “ns” - a Stein-type nonparametric shrinkage method.
- “fbayes” - a full Bayesian shrinkage method using noninformative
priors.
- “sbayes” - a semiparametric Bayesian shrinkage method using
parameterized cross validation.
- “kcv” - a semiparametric Bayesian shrinkage method using K-fold
cross validation.
lambda, lambda_var: Shrinkage parameter value(s). The use
of these parameters differs slightly across methods: the same value
does not imply the same shrinkage estimates. See the descriptions in the
following subsections for the use of the shrinkage parameters in each
method.
dof: Degrees of freedom of the multivariate t-distribution
for noise. Valid only for method = "fbayes" and
method = "sbayes". dof = Inf means
a multivariate normal distribution.
The output value is an object of class “varshrinkest”, which inherits
the class “varest” of the package vars; an object
of class “varest” is produced by the function VAR() in
the package vars. The input arguments and the output
value indicate that VARshrink() is an extension of
VAR() in the vars package. As a result,
almost all the methods and functions included in the package
vars are available for the package
VARshrink, such as fevd(), Phi(), plot().
The methods and functions available in the VARshrink
package are summarized in Table 1. The names of the functions for class
“varshrinkest” have the suffix “_sh” in order to distinguish them from the
functions for class “varest”. We remark that the classes “varshrinkest”,
“varshirf”, and “varshsum” inherit the classes “varest”, “varirf”, and
“varsum” of the vars package, respectively.
Table 1. Structure of the package VARshrink.
| Function or method | Class | Methods for class | Functions for class |
|---|---|---|---|
| VAR | varshrinkest, varest | coef, fevd, fitted, irf, logLik, Phi, plot, predict, print, Psi, resid, summary | Acoef_sh, arch.test_sh, Bcoef_sh, BQ_sh, causality_sh, normality.test_sh, restrict_sh, roots_sh, serial.test_sh, stability_sh |
| fevd | varfevd | plot, print | |
| irf | varshirf, varirf | plot, print | |
| predict | varprd | plot, print | fanchart |
| summary | varshsum, varsum | print | |
| arch.test_sh | varcheck | plot, print | |
| normality.test_sh | varcheck | plot, print | |
| serial.test_sh | varcheck | plot, print | |
| stability_sh | varstabil | plot, print | |
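For instance, once a model is fitted, the inherited tools apply directly (a sketch assuming a T-by-K data matrix Y as constructed in the examples below):
# Sketch: applying vars tools to a "varshrinkest" object.
est <- VARshrink(Y, p = 1, type = "const", method = "ridge")
predict(est, n.ahead = 5)   # forecasts, via the inherited predict method
fevd(est, n.ahead = 8)      # forecast error variance decomposition
roots_sh(est)               # stability check via the "_sh" variant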
Multivariate Ridge Regression
The ridge regression method is a kind of penalized least squares
(PLS) method, which produces a biased estimate of the VAR coefficients
(Hoerl and Kennard 1970). Formally
speaking, the ridge regression estimator of \(\mathbf{\Psi}\) can be obtained by
minimizing the penalized sum of squared prediction errors (SSPE) as
\[\begin{equation}
\widehat{ \mathbf{\Psi} }^{\text{R}} (\lambda) =
\arg\min_{\mathbf{\Psi}} \;
\frac{1}{N}
\left\|\mathbf{Y} - \mathbf{X} \mathbf{\Psi} \right\|_F^2 +
\lambda \left\| \mathbf{\Psi} \right\|_F^2,
\end{equation}\] where \(\|\mathbf{A}\|_F = \sqrt{ \sum_{i} \sum_{j}
a_{ij}^2 }\) is the Frobenius norm of a matrix \(\mathbf{A}\), \(N
= T-p\), and \(\lambda \geq 0\)
is called the regularization parameter or the shrinkage parameter. The
ridge regression estimator \(\widehat{
\mathbf{\Psi} }^\text{R} (\lambda)\) can be expressed in the
closed form \[\begin{equation}
\widehat{ \mathbf{\Psi} }^\text{R} (\lambda) =
\left( \mathbf{X}^\top \mathbf{X} + N \lambda \mathbf{I}
\right)^{-1} \mathbf{X}^\top \mathbf{Y},
\qquad
\lambda \geq 0.
\end{equation}\]
The shrinkage parameter \(\lambda\)
for the ridge regression can be determined automatically by
generalized cross-validation (GCV) (Golub, Heath,
and Wahba 1979). GCV selects the value of \(\lambda\) that minimizes the GCV score
\[\begin{equation}
GCV(\lambda) = \frac{1}{N} \left\| (\mathbf{I} - \mathbf{H}(\lambda))
\mathbf{Y} \right\|^2_\text{F}
\left/
\left[ \frac{1}{N} \text{Trace}(\mathbf{I} - \mathbf{H}(\lambda) )
\right]^2
\right. ,
\end{equation}\] where \(\mathbf{H}(\lambda) = \mathbf{X}
\left(\mathbf{X}^\top \mathbf{X} + N \lambda \mathbf{I}
\right)^{-1} \mathbf{X}^\top\).
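As a sanity check of these formulas, the GCV score can be computed in a few lines (a minimal sketch under the notation above; the inputs X and Y are assumptions, not package internals):
# Sketch: ridge hat matrix and GCV score written out from the formulas
# above; X is N x (Kp + L) and Y is N x K.
ridge_gcv <- function(X, Y, lambda) {
  N <- nrow(X)
  H <- X %*% solve(crossprod(X) + N * lambda * diag(ncol(X)), t(X))
  num <- sum(((diag(N) - H) %*% Y)^2) / N   # (1/N) ||(I - H) Y||_F^2
  den <- (sum(diag(diag(N) - H)) / N)^2     # [(1/N) Trace(I - H)]^2
  num / den
}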
In this package, the interface to the ridge regression method is
provided by the function VARshrink(method = "ridge", ...).
If the input argument lambda is set to a value or a vector
of values, then the corresponding GCV score is computed automatically for
each \(\lambda\) value, and the VAR
coefficients with the smallest GCV score are selected. If
lambda = NULL, then the default grid of
c(0.0001, 0.0005, 0.001, 0.005, ..., 10, 50) is used.
For example, simulated time series data of length \(T=100\) were generated based on a
multivariate normal distribution for noise and a VAR model with \(p=1\), \(K=2\), \(\mathbf{A}_1 = 0.5\mathbf{I}_2\), \(\mathbf{C}=(0.2, 0.7)^\top\), and \(\mathbf{\Sigma} = 0.1^2\mathbf{I}_2\) as
follows:
set.seed(1000)
myCoef <- list(A = list(matrix(c(0.5, 0, 0, 0.5), 2, 2)), c = c(0.2, 0.7))
myModel <- list(Coef = myCoef, Sigma = diag(0.1^2, 2), dof = Inf)
Y <- simVARmodel(numT = 100, model = myModel, burnin = 10)
resu_estim <- list()
Then, the multivariate ridge regression can be carried out for VAR
models as follows. The result printed on the screen shows all the
lambda values considered and the corresponding GCV values.
The VAR parameters are estimated using the lambda value with the minimum
GCV value.
resu_estim$`Ridge regression` <-
VARshrink(Y, p = 1, type = "const", method = "ridge", lambda = NULL)
resu_estim$`Ridge regression`
#>
#> VAR Shrinkage Estimation Results:
#> =================================
#>
#> Estimated coefficients for equation y1:
#> =======================================
#> Call:
#> y1 = y1.l1 + y2.l1 + const
#>
#> y1.l1 y2.l1 const
#> 0.61424699 0.07318362 0.05264740
#>
#>
#> Estimated coefficients for equation y2:
#> =======================================
#> Call:
#> y2 = y1.l1 + y2.l1 + const
#>
#> y1.l1 y2.l1 const
#> 0.1195550 0.3254618 0.9055450
#>
#>
#> lambda: 1e-04 5e-04 0.001 0.005 0.01 0.05 0.1 0.5 1 5 10 50 (estimated: TRUE)
#> GCV: 0.0187 0.0186 0.0186 0.0192 0.02 0.0227 0.025 0.0647 0.1506 0.8406 1.2754 1.9329
The method summary() is available for objects of class
“varshrinkest” as follows:
summary(resu_estim$`Ridge regression`)
#>
#> VAR Shrinkage Estimation Results:
#> ===================================
#> Endogenous variables: y1, y2
#> Deterministic variables: const
#> Sample size: 99
#> Log Likelihood: 188.793
#> Roots of the characteristic polynomial:
#> 0.6419 0.2978
#> Call:
#> VARshrink(y = Y, p = 1, type = "const", method = "ridge", lambda = NULL)
#>
#>
#> Estimation results for equation y1:
#> ===================================
#> y1 = y1.l1 + y2.l1 + const
#>
#> Estimate Std. Error t value Pr(>|t|)
#> y1.l1 0.61425 0.07636 8.045 2.27e-12 ***
#> y2.l1 0.07318 0.07549 0.969 0.335
#> const 0.05265 0.10888 0.484 0.630
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> Residual standard error: 0.08913 on 96.16244 degrees of freedom
#> Multiple R-Squared: 0.3864, Adjusted R-squared: 0.3747
#> F-statistic: 32.95 on 1.837559 and 96.16244 DF, p-value: 4.725e-11
#>
#> Estimation results for equation y2:
#> ===================================
#> y2 = y1.l1 + y2.l1 + const
#>
#> Estimate Std. Error t value Pr(>|t|)
#> y1.l1 0.11955 0.08632 1.385 0.169249
#> y2.l1 0.32546 0.08534 3.814 0.000242 ***
#> const 0.90555 0.12308 7.357 6.3e-11 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> Residual standard error: 0.1008 on 96.16244 degrees of freedom
#> Multiple R-Squared: 0.1219, Adjusted R-squared: 0.1051
#> F-statistic: 7.263 on 1.837559 and 96.16244 DF, p-value: 0.00157
#>
#>
#> Scale matrix, Sigma, of the multivariate t-distribution for noise:
#> y1 y2
#> y1 0.0077160 -0.0006829
#> y2 -0.0006829 0.0098611
#>
#> Degrees of freedom of the multivariate t-distribution for noise:
#> [1] Inf
#>
#> Correlation matrix of Sigma:
#> y1 y2
#> y1 1.00000 -0.07829
#> y2 -0.07829 1.00000
Nonparametric Shrinkage (NS)
The nonparametric shrinkage (NS) estimation method for VAR models,
proposed by Opgen-Rhein and Strimmer
(2007b), produces an estimate of \(\mathbf{\Psi}\) based on a James-Stein type
shrinkage of sample covariance matrices (Opgen-Rhein and Strimmer 2007a; Schäfer and Strimmer
2005).
We briefly describe the NS method. In the NS method, the data
matrices \(\mathbf{X}\) and \(\mathbf{Y}\) are mean-corrected so that
each column has mean zero. Let \(\mathbf{Z} = [\mathbf{X}, \mathbf{Y}] \in
\mathbb{R}^{N \times (Kp + K + L)}\) be a combined data matrix.
The sample covariance matrix of \(\mathbf{Z}\) is partitioned as \[\begin{equation}
\mathbf{S}_\text{ZZ} = \frac{1}{N - 1} \mathbf{Z}^\top \mathbf{Z} =
\begin{bmatrix} \mathbf{S}_\text{XX} & \mathbf{S}_\text{XY} \\
\mathbf{S}_\text{XY}^\top & \mathbf{S}_\text{YY}
\end{bmatrix}
\in \mathbb{R}^{(Kp + K + L) \times (Kp + K + L)},
\end{equation}\] where \(\mathbf{S}_\text{XX} = (N - 1)^{-1}
\mathbf{X}^\top \mathbf{X}\), \(\mathbf{S}_\text{XY} = (N - 1)^{-1}
\mathbf{X}^\top \mathbf{Y}\), and \(\mathbf{S}_\text{YY} = (N - 1)^{-1}
\mathbf{Y}^\top \mathbf{Y}\). The matrix \(\mathbf{S}_\text{ZZ}\) can be decomposed as
\[\begin{equation}
\mathbf{S}_\text{ZZ} = \mathbf{D}_\text{Z}^{1/2}
\mathbf{R}_\text{ZZ} \mathbf{D}_\text{Z}^{1/2},
\end{equation}\] where \(\mathbf{R}_\text{ZZ}\) is the sample
correlation matrix and \(\mathbf{D}_\text{Z} =
\text{diag}(s_{11}, s_{22}, \ldots, s_{Kp + K + L, Kp + K + L})\)
is a diagonal matrix with diagonal elements of sample variances.
Shrinkage estimates of the correlation matrix and the variances can be
written as \[\begin{equation}
\widehat{\mathbf{R}}_\text{ZZ} =
(1-\lambda) \mathbf{R}_\text{ZZ} + \lambda \mathbf{I}
\quad
\text{and}
\quad
\widehat{\mathbf{D}}_\text{Z} =
\text{diag}(\hat{s}_{11}, \hat{s}_{22}, \ldots,
\hat{s}_{Kp+K+L,Kp+K+L})
\end{equation}\] with \[\begin{equation}
\hat{s}_{ii} = (1-\lambda_v) s_{ii} + \lambda_v s_\text{med},
\quad i = 1, 2, \ldots, Kp + K + L,
\end{equation}\] where \(s_\text{med}\) is a median of all the
sample variances \(s_{ii}\), and \(0\leq \lambda, \lambda_v \leq 1\) are
shrinkage parameters. The estimated covariance matrix can be computed by
\[\begin{equation}
\widehat{\mathbf{S}}_\text{ZZ}(\lambda, \lambda_v)
= \widehat{\mathbf{D}}_\text{Z}^{1/2}
\widehat{\mathbf{R}}_\text{ZZ}
\widehat{\mathbf{D}}_\text{Z}^{1/2}.
\end{equation}\] The values of the shrinkage parameters \(\lambda\) and \(\lambda_v\) are determined by the
James-Stein type shrinkage method of Opgen-Rhein and Strimmer (2007a)
and Schäfer and Strimmer (2005), which we call the NS method.
The ordinary least squares estimate \(\widehat{\mathbf{\Psi}}^{\text{OLS}}\) of
\(\mathbf{\Psi}\) is given by \(\widehat{\mathbf{\Psi}}^{\text{OLS}} =
\mathbf{S}_\text{XX}^{-1} \mathbf{S}_\text{XY}\). We define the
NS estimate of \(\mathbf{\Psi}\) as
\[\begin{equation}
\widehat{ \mathbf{\Psi} }^\text{N} (\lambda, \lambda_v) =
\widehat{\mathbf{S}}_\text{XX}^{-1}
\widehat{\mathbf{S}}_\text{XY},
\qquad
0\leq \lambda, \lambda_v \leq 1,
\end{equation}\] where \(\widehat{\mathbf{S}}_\text{XX}\) and \(\widehat{\mathbf{S}}_\text{XY}\) are parts
of the estimated covariance matrix, \[\begin{equation}
\widehat{\mathbf{S}}_\text{ZZ} (\lambda, \lambda_v) =
\begin{bmatrix} \widehat{\mathbf{S}}_\text{XX} &
\widehat{\mathbf{S}}_\text{XY} \\
\widehat{\mathbf{S}}_\text{XY}^\top &
\widehat{\mathbf{S}}_\text{YY}
\end{bmatrix}.
\end{equation}\]
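In terms of these formulas, the NS estimate can be assembled from a shrinkage covariance estimate; the following sketch uses corpcor::cov.shrink() and assumes mean-corrected matrices X and Y (the package's internal code may differ):
# Sketch: the NS estimate Psi-hat = S_XX^{-1} S_XY from a shrinkage
# covariance of the combined data matrix Z = [X, Y].
library(corpcor)
Z  <- cbind(X, Y)                       # N x (Kp + K + L) combined matrix
S  <- cov.shrink(Z)                     # lambda, lambda_v chosen internally
ix <- seq_len(ncol(X))
Psi_ns <- solve(S[ix, ix], S[ix, -ix])  # S_XX^{-1} S_XY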
In the package VARshrink, the function
VARshrink(method = "ns", ...) provides an interface to
the NS method. Specifically, the package corpcor (Schäfer et al. 2017) includes the R function
cov.shrink(), which can determine \(\lambda\) and \(\lambda_v\) and estimate the covariance
matrix \(\widehat{\mathbf{S}}_\text{ZZ}(\lambda,
\lambda_v)\). The function VARshrink() in the
VARshrink package infers the NS estimates of VAR
coefficients, \(\widehat{\mathbf{\Psi}}^\text{N} (\lambda,
\lambda_v)\), by using the covariance matrix \(\widehat{\mathbf{S}}_\text{ZZ}(\lambda,
\lambda_v)\). If the input arguments lambda and
lambda_var are set as lambda = NULL and
lambda_var = NULL, then \(\lambda\) and \(\lambda_v\) are determined automatically.
We can find the estimated \(\lambda\)
and \(\lambda_v\) values on the printed
screen. For example,
resu_estim$`Nonparametric shrinkage` <-
VARshrink(Y, p = 1, type = "none", method = "ns",
lambda = NULL, lambda_var = NULL)
resu_estim$`Nonparametric shrinkage`
#>
#> VAR Shrinkage Estimation Results:
#> =================================
#>
#> Estimated coefficients for equation y1:
#> =======================================
#> Call:
#> y1 = y1.l1 + y2.l1
#>
#> y1.l1 y2.l1
#> 0.56430137 0.06139375
#>
#>
#> Estimated coefficients for equation y2:
#> =======================================
#> Call:
#> y2 = y1.l1 + y2.l1
#>
#> y1.l1 y2.l1
#> 0.1080570 0.2537706
#>
#>
#> lambda: 0.104881997093098 (estimated: TRUE)
#> lambda_var: 1 (estimated: TRUE)
The method summary() shows statistical inference on the
estimated VAR coefficients together with the estimated scale matrix
\(\mathbf{\Sigma}\) of the multivariate
t-distribution for noise.
summary(resu_estim$`Nonparametric shrinkage`)
#>
#> VAR Shrinkage Estimation Results:
#> ===================================
#> Endogenous variables: y1, y2
#> Deterministic variables: none
#> Sample size: 99
#> Log Likelihood: 188.443
#> Roots of the characteristic polynomial:
#> 0.5844 0.2337
#> Call:
#> VARshrink(y = Y, p = 1, type = "none", method = "ns", lambda = NULL,
#> lambda_var = NULL)
#>
#>
#> Estimation results for equation y1:
#> ===================================
#> y1 = y1.l1 + y2.l1
#>
#> Estimate Std. Error t value Pr(>|t|)
#> y1.l1 0.564301 0.007754 72.775 < 2e-16 ***
#> y2.l1 0.061394 0.007315 8.393 3.47e-13 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> Residual standard error: 0.08828 on 98.81547 degrees of freedom
#> Multiple R-Squared: 0.3446
#> Warning in pf(result$fstatistic[1L], result$fstatistic[2L],
#> result$fstatistic[3L], : NaNs produced
#> , Adjusted R-squared: 0.35
#> F-statistic: -63.71 on -0.8154669 and 98.81547 DF, p-value: NA
#>
#> Estimation results for equation y2:
#> ===================================
#> y2 = y1.l1 + y2.l1
#>
#> Estimate Std. Error t value Pr(>|t|)
#> y1.l1 0.108057 0.008725 12.38 <2e-16 ***
#> y2.l1 0.253771 0.008231 30.83 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> Residual standard error: 0.09933 on 98.81547 degrees of freedom
#> Multiple R-Squared: 0.08139
#> Warning in pf(result$fstatistic[1L], result$fstatistic[2L],
#> result$fstatistic[3L], : NaNs produced
#> , Adjusted R-squared: 0.08897
#> F-statistic: -10.74 on -0.8154669 and 98.81547 DF, p-value: NA
#>
#>
#> Scale matrix, Sigma, of the multivariate t-distribution for noise:
#> y1 y2
#> y1 0.0077791 -0.0006733
#> y2 -0.0006733 0.0098486
#>
#> Degrees of freedom of the multivariate t-distribution for noise:
#> [1] Inf
#>
#> Correlation matrix of Sigma:
#> y1 y2
#> y1 1.00000 -0.07693
#> y2 -0.07693 1.00000
Full Bayesian Shrinkage
Ni and Sun (2005) and Sun and Ni (2004) studied Bayesian estimation
methods using noninformative priors for VAR models, where they used
Markov Chain Monte Carlo (MCMC) methods for estimating coefficient
matrices, noise covariance matrices, and other hyperparameters in
Bayesian VAR models. In Ni and Sun (2005),
various Bayesian estimators were compared using several types of loss
functions, noninformative priors, and multivariate normal and
t-distributions. Among them, Bayesian estimators using a certain type of
noninformative priors showed higher accuracies than the other Bayesian
estimators in simulated experiments. The noninformative prior for
coefficient matrices was called the shrinkage prior, and the prior for
the noise covariance matrix was called the reference prior.
In the package VARshrink, we can obtain Bayesian
estimators using the shrinkage prior and the reference prior, which are
the noninformative priors that yielded the highest accuracies in the
simulation results (Ni and Sun 2005). As a
Bayesian estimator of VAR parameters, the minimizer of the posterior
expectation of quadratic loss is computed, which is the mean of the
posterior distribution. Additionally, the minimizer of the posterior
expectation of LINEX loss is also computed (Ni
and Sun 2005; Zellner 1986). In this section, we will explain the
model assumptions in more detail.
First, the noise vectors are independent and identically distributed
according to a multivariate t-distribution with \(\nu\) degrees of freedom, i.e., \(\boldsymbol\epsilon_t \sim t_\nu (\mathbf{0},
\mathbf{\Sigma})\). It can be expressed as a hierarchical model
as \[\begin{equation}
\begin{split}
( \boldsymbol\epsilon_t | q_t ) & \sim \text{N}_K
(\mathbf{0}, q_t^{-1} \mathbf{\Sigma}),
\qquad\qquad\qquad (7)
\\
q_t & \sim \text{Gamma}(\nu/2, \nu/2).
\end{split}
\end{equation}\]
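For intuition, the hierarchy in Eq. (7) can be simulated directly; the following sketch (our own helper rmvt_mix, not a package function) draws noise vectors from the scale mixture:
# Sketch: eps_t | q_t ~ N(0, Sigma / q_t) with q_t ~ Gamma(nu/2, nu/2).
rmvt_mix <- function(n, Sigma, nu) {
  K <- ncol(Sigma)
  q <- rgamma(n, shape = nu / 2, rate = nu / 2)     # latent scales q_t
  Zn <- matrix(rnorm(n * K), n, K) %*% chol(Sigma)  # rows ~ N(0, Sigma)
  Zn / sqrt(q)                                      # scale row t by q_t^{-1/2}
}
eps <- rmvt_mix(1000, Sigma = diag(0.1^2, 2), nu = 6)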
Second, we denote the vectorized version of \(\mathbf{\Psi} \in\mathbb{R}^{(J/K)\times
K}\) by \(\boldsymbol\psi =
\text{vec}(\mathbf{\Psi}) \in\mathbb{R}^{J}\). Here \(J = K (Kp + L)\). The shrinkage prior \(\pi_\text{S}(\boldsymbol\psi)\) for \(\boldsymbol\psi \in\mathbb{R}^{J}\) is
taken as \(\pi_\text{S}(\boldsymbol\psi)
\propto \left\| \boldsymbol\psi \right\|^{ -(J-2) }.\) By
introducing a latent variable \(\lambda>0\), the shrinkage prior can
also be expressed as a scale mixture of multivariate normals as \[\begin{equation}
\begin{split}
(\boldsymbol\psi | \lambda) & \sim \text{N}_{J} (\mathbf{0},
\lambda^{-1} \mathbf{I}_J),
\qquad\qquad\qquad (8)
\\
\pi (\lambda) & \propto 1.
\end{split}
\end{equation}\]
Third, the reference prior \(\pi_\text{R}(\mathbf{\Sigma})\) for \(\mathbf{\Sigma}\) is taken as \[\begin{equation}
\pi_\text{R}(\mathbf{\Sigma}) \propto \left| \mathbf{\Sigma}
\right|^{-1} \prod_{1\leq i < j\leq K} (\lambda_i - \lambda_j)^{-1},
\end{equation}\] where \(\lambda_1
> \lambda_2 > \cdots > \lambda_K\) are the eigenvalues of
\(\mathbf{\Sigma}\).
Note that no hyperparameters are involved in the shrinkage prior and
the reference prior, since they are noninformative priors. The Gibbs
MCMC method makes use of the hierarchical expression of \(\boldsymbol\epsilon_t\). The Gibbs MCMC
samples the parameters \((\boldsymbol\psi, \lambda, \mathbf{\Sigma},
\mathbf{Q}, \nu)\) with \(\mathbf{Q} =
\text{diag}(q_{p+1}, \ldots, q_T)\) from conditional posterior
distributions (Ni and Sun 2005). We remark
that the mean of the conditional posterior distribution \(\pi(\boldsymbol\psi | \lambda, \mathbf{\Sigma},
\mathbf{Q}, \nu; \mathbf{Y})\) of \(\boldsymbol\psi\) is given by \[\begin{equation}
\widehat{\boldsymbol\psi}^{F} (\lambda) =
\left[\left(\mathbf{\Sigma}^{-1} \otimes \left( \mathbf{X}^\top
\mathbf{Q} \mathbf{X} \right) \right) + \lambda
\mathbf{I}_J \right]^{-1} \text{vec}\left( \mathbf{X}^\top
\mathbf{Q} \mathbf{Y} \mathbf{\Sigma}^{-1} \right),
\qquad
\lambda > 0.
\qquad\qquad\qquad (9)
\end{equation}\] Note that if \(\mathbf{\Sigma} = \mathbf{I}_K\) and \(q_{p+1}=\cdots=q_T = 1\), then the
estimator reduces to the ridge regression estimator. That is, Bayesian
shrinkage estimators take more flexible forms, though more
computational effort is required to estimate the additional
parameters.
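To make Eq. (9) concrete, the conditional posterior mean can be written out directly (a sketch of the formula only; the inputs Q, Sigma, and lambda are assumptions standing in for the Gibbs sampler's current draws):
# Sketch of Eq. (9): [Sigma^{-1} (x) (X'QX) + lambda I_J]^{-1}
#                    vec(X' Q Y Sigma^{-1}), with J = K (Kp + L).
post_mean_psi <- function(X, Y, Q, Sigma, lambda) {
  Si <- solve(Sigma)
  A  <- kronecker(Si, crossprod(X, Q %*% X)) +
        lambda * diag(ncol(X) * ncol(Y))
  b  <- as.vector(crossprod(X, Q %*% Y %*% Si))  # vec(X' Q Y Sigma^{-1})
  matrix(solve(A, b), ncol = ncol(Y))            # reshape to Psi form
}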
In the package VARshrink, the function
VARshrink(method = "fbayes", ...) serves as the
interface to the full Bayesian shrinkage method with the shrinkage
prior and the reference prior. The shrinkage parameter \(\lambda\) is not set at a fixed value,
i.e., the arguments lambda and lambda_var are
not used here. Instead, the mean of the posterior distribution, \(\hat{\delta} = \mathbb{E}\left[ \lambda^{-1} |
\mathbf{Y} \right]\), is estimated during the MCMC process (Ni and Sun 2005), and we define \(\hat{\lambda} = \hat{\delta}^{-1}\). There
are several arguments to be specified as follows.
dof: If dof = Inf, then a multivariate
normal distribution is applied and the weight \(\mathbf{Q}\) is not estimated. If
dof is a finite value, then we apply the multivariate
t-distribution with fixed degrees of freedom \(\nu=\) dof and estimate \(\mathbf{Q}\). If dof = NULL,
we estimate both the degrees of freedom \(\nu\) and the weight \(\mathbf{Q}\). In this case, the package
ars is required for sampling the parameter \(\nu\) from its conditional posterior
distribution.
burnincycle, mcmccycle: The Gibbs MCMC method
samples a sequence of parameter values of length
burnincycle + mcmccycle. burnincycle is the
number of sampled parameter values discarded at the beginning, and
mcmccycle is the number retained and used for
computing the parameter estimates. By default, we set
burnincycle = 1000 and
mcmccycle = 2000.
For example, we run the full Bayesian shrinkage method with a fixed
\(\nu = 6\) as follows.
resu_estim$`Full Bayes (fixed dof)` <-
VARshrink(Y, p = 1, type = "const", method = "fbayes", dof = 6,
burnincycle = 1000, mcmccycle = 2000)
resu_estim$`Full Bayes (fixed dof)`
#>
#> VAR Shrinkage Estimation Results:
#> =================================
#>
#> Estimated coefficients for equation y1:
#> =======================================
#> Call:
#> y1 = y1.l1 + y2.l1 + const
#>
#> y1.l1 y2.l1 const
#> 0.64042931 0.05773600 0.06400323
#>
#>
#> Estimated coefficients for equation y2:
#> =======================================
#> Call:
#> y2 = y1.l1 + y2.l1 + const
#>
#> y1.l1 y2.l1 const
#> 0.09971729 0.31292326 0.93493057
#>
#>
#> Sigma for noise:
#> [,1] [,2]
#> [1,] 0.0072637763 -0.0001772936
#> [2,] -0.0001772936 0.0081118713
#> dof for noise: 6 (estimated: FALSE)
#> lambda: 1.46042035262722 (estimated: TRUE)
By using summary(), we can find the estimated scale
matrix \(\mathbf{\Sigma}\) of the
multivariate t-distribution for noise.
summary(resu_estim$`Full Bayes (fixed dof)`)
#>
#> VAR Shrinkage Estimation Results:
#> ===================================
#> Endogenous variables: y1, y2
#> Deterministic variables: const
#> Sample size: 99
#> Log Likelihood: 184.114
#> Roots of the characteristic polynomial:
#> 0.6572 0.2962
#> Call:
#> VARshrink(y = Y, p = 1, type = "const", method = "fbayes", dof = 6,
#> burnincycle = 1000, mcmccycle = 2000)
#>
#>
#> Estimation results for equation y1:
#> ===================================
#> y1 = y1.l1 + y2.l1 + const
#>
#> Estimate Std. Error t value Pr(>|t|)
#> y1.l1 0.64043 0.07897 8.110 1.66e-12 ***
#> y2.l1 0.05774 0.08313 0.695 0.489
#> const 0.06400 0.12041 0.532 0.596
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> Residual standard error: 0.08916 on 96.04009 degrees of freedom
#> Multiple R-Squared: 0.4049, Adjusted R-squared: 0.3927
#> F-statistic: 33.34 on 1.959913 and 96.04009 DF, p-value: 1.399e-11
#>
#> Estimation results for equation y2:
#> ===================================
#> y2 = y1.l1 + y2.l1 + const
#>
#> Estimate Std. Error t value Pr(>|t|)
#> y1.l1 0.09972 0.08931 1.117 0.26696
#> y2.l1 0.31292 0.09401 3.329 0.00124 **
#> const 0.93493 0.13617 6.866 6.52e-10 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> Residual standard error: 0.1008 on 96.04009 degrees of freedom
#> Multiple R-Squared: 0.1101, Adjusted R-squared: 0.09197
#> F-statistic: 6.064 on 1.959913 and 96.04009 DF, p-value: 0.003513
#>
#>
#> Scale matrix, Sigma, of the multivariate t-distribution for noise:
#> [,1] [,2]
#> [1,] 0.0072638 -0.0001773
#> [2,] -0.0001773 0.0081119
#>
#> Degrees of freedom of the multivariate t-distribution for noise:
#> [1] 6
#>
#> Correlation matrix of Sigma:
#> [,1] [,2]
#> [1,] 1.0000 -0.0231
#> [2,] -0.0231 1.0000
Instead, we can estimate the degrees of freedom, \(\nu\), of the multivariate t-distribution by
setting the argument dof = NULL as follows:
resu_estim$`Full Bayes (estim dof)` <-
VARshrink(Y, p = 1, type = "const", method = "fbayes", dof = NULL,
burnincycle = 1000, mcmccycle = 2000)
resu_estim$`Full Bayes (estim dof)`
#>
#> VAR Shrinkage Estimation Results:
#> =================================
#>
#> Estimated coefficients for equation y1:
#> =======================================
#> Call:
#> y1 = y1.l1 + y2.l1 + const
#>
#> y1.l1 y2.l1 const
#> 0.63997678 0.05938708 0.06191316
#>
#>
#> Estimated coefficients for equation y2:
#> =======================================
#> Call:
#> y2 = y1.l1 + y2.l1 + const
#>
#> y1.l1 y2.l1 const
#> 0.1043844 0.3132871 0.9323669
#>
#>
#> Sigma for noise:
#> [,1] [,2]
#> [1,] 0.0070211398 -0.0002166658
#> [2,] -0.0002166658 0.0081808592
#> dof for noise: 8.43991341078547 (estimated: TRUE)
#> lambda: 1.35334572461602 (estimated: TRUE)
Semiparametric Bayesian Shrinkage
Whereas full Bayesian shrinkage methods estimate all the
hyperparameters including latent variables via MCMC methods,
semiparametric Bayesian methods estimate some of the hyperparameters by
a certain nonparametric method and estimate the rest in a Bayesian
framework. The semiparametric approach is advantageous especially when
the dimensionality of the model is so high that standard MCMC methods
are not computationally tractable.
Lee, Choi, and Kim (2016) assumed scale
mixtures of multivariate normal distributions for noise vectors as in
Eq. (7). The prior distribution for the model coefficients, \(\boldsymbol\psi =
\text{vec}(\mathbf{\Psi})\), was set as multivariate normal
distributions, similarly to Eq. (8). Here, we can choose either the
conjugate (CJ) prior or the non-conjugate (NCJ) prior as follows:
The conjugate prior for \(\boldsymbol\psi \in\mathbb{R}^J\) is
expressed by \[\begin{equation}
(\boldsymbol\psi | \lambda, \mathbf{\Sigma}) \sim \text{N}_{J}
\left(\mathbf{0}, \frac{1-\lambda}{(N - 1) \lambda} \mathbf{\Sigma}
\otimes \mathbf{I} \right),
\qquad
0<\lambda < 1.
\end{equation}\]
The non-conjugate prior for \(\boldsymbol\psi \in\mathbb{R}^J\) is
expressed by \[\begin{equation}
(\boldsymbol\psi | \lambda) \sim \text{N}_{J} \left(\mathbf{0},
\frac{1-\lambda}{(N - 1) \lambda} \mathbf{I}_J \right),
\qquad
0<\lambda < 1.
\end{equation}\]
The scale matrix \(\mathbf{\Sigma}\)
for noise is included in the conjugate prior distribution
but not in the non-conjugate prior distribution. The non-conjugate
prior is quite similar to the full Bayesian formulation in Eq. (8).
However, the main difference is that, in the semiparametric Bayesian
approach, the shrinkage parameter \(\lambda\) should be estimated explicitly
via a nonparametric method, but in the full Bayes approach, it is a
latent variable which should be sampled and estimated implicitly via an
MCMC method.
The prior distribution for \(\mathbf\Sigma\) was set as an inverse
Wishart distribution as \[\begin{equation}
(\mathbf{\Sigma} | \mathbf{L}_0, m_0) \sim
\text{InvWishart}( \mathbf{L}_0, m_0),
\qquad
\mathbf{L}_0 \succ \mathbf{0}, \; m_0 > K - 1,
\end{equation}\] where \(\mathbf{L}_0 \succ \mathbf{0}\) means that
\(\mathbf{L}_0\) is positive
definite.
Once the shrinkage parameter \(\lambda\) is set at a fixed value, the
other parameters \(\boldsymbol\psi\),
\(\mathbf{\Sigma}\), and \(\mathbf{Q}\) can be estimated iteratively
and efficiently in a Bayesian framework. Briefly speaking, we consider
estimating the parameters \(\boldsymbol\psi\) and \(\mathbf{\Sigma}\) at the maximum point
(i.e., the mode) of the marginal posterior density function \(\pi (\boldsymbol\psi, \mathbf{\Sigma} | \lambda;
\mathbf{Y})\). In the case of the non-conjugate prior, the mode,
\((\widehat{\boldsymbol\psi}^\text{S}
(\lambda), \widehat{\mathbf{\Sigma}}^\text{S} (\lambda))\), is
expressed as \[\begin{equation}
\widehat{\boldsymbol\psi}^{S} (\lambda) =
\left[\left( (\widehat{\mathbf{\Sigma}}^\text{S})^{-1} \otimes
\left( \mathbf{X}^\top \widehat{\mathbf{Q}}^\text{S}
\mathbf{X} \right) \right) + \frac{ (N - 1) \lambda}{1-\lambda}
\mathbf{I}_J \right]^{-1}
\text{vec}\left( \mathbf{X}^\top \widehat{\mathbf{Q}}^\text{S}
\mathbf{Y} ( \widehat{\mathbf{\Sigma}}^\text{S})^{-1} \right),
\qquad\qquad\qquad (10)
\end{equation}\] and
\[\begin{equation}
\widehat{\mathbf{\Sigma}}^\text{S} (\lambda) =
\frac{1}{m_0 + T + K + 1} \left(
\mathbf{L}_0 + \mathbf{Y}^\top \widehat{\mathbf{Q}}^\text{S} \mathbf{Y} -
\mathbf{Y}^\top \widehat{\mathbf{Q}}^\text{S} \mathbf{X}
\widehat{\mathbf{\Psi}}^\text{S} (\lambda)
\right),
\end{equation}\] for \(0<
\lambda < 1\), where \(\widehat{\mathbf{Q}}^\text{S} =
\text{diag}(\hat{q}_{p+1}, \ldots, \hat{q}_T)\) is obtained in an
iterative manner. The weights in \(\widehat{\mathbf{Q}}^\text{S}\) are
expressed as \[\begin{equation}
\hat{q}_t =
h \left( \left\| (\widehat{\mathbf{\Sigma}}^\text{S})^{-1/2}
\hat{\boldsymbol\epsilon}_t \right\|^2 \right),
\qquad
t = p+1, \ldots, T,
\end{equation}\] where \(h(x)\) is defined depending on the noise
distribution and \(\hat{\boldsymbol\epsilon}_t\) is the
residual given by \(\hat{\boldsymbol\epsilon}_t = \mathbf{y}_t -
\widehat{\mathbf{\Psi}}^{\text{S}\top} \mathbf{x}_t\) (Lee, Choi, and Kim 2016).
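For multivariate t-distributed noise, a standard EM-type choice is \(h(x) = (\nu + K)/(\nu + x)\); the sketch below assumes this form (whether the package uses exactly this \(h\) is an assumption):
# Sketch: EM-type weights q_t for multivariate-t noise, assuming
# h(x) = (nu + K) / (nu + x); E is the N x K matrix of residuals.
est_Q <- function(E, Sigma, nu) {
  d <- rowSums((E %*% solve(Sigma)) * E)  # squared Mahalanobis distances
  (nu + ncol(E)) / (nu + d)               # one weight per residual row
}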
The shrinkage parameter \(\lambda\)
is determined via an internal nonparametric method called
parameterized cross validation (PCV). This algorithm can be considered
a modified \(K\)-fold cross
validation, designed for estimating the shrinkage parameter of VAR
models; see Lee, Choi, and Kim (2016) for
a detailed explanation.
In addition, the semiparametric Bayesian shrinkage method adopts the
idea of shrinking both the correlations and variances from the NS
method. As a result, there are two shrinkage parameters \(\lambda\) and \(\lambda_v\), where \(0 \leq \lambda \leq 1\) is used for the
shrinkage estimation of the VAR coefficient matrix \(\mathbf{\Psi}\) and the noise covariance matrix \(\mathbf{\Sigma}\), while \(0 \leq \lambda_v \leq 1\) is used for the
shrinkage estimation of the variances of the variables \(y_{tj}, j = 1, \ldots, K\).
In the package VARshrink, the function
VARshrink(method = "sbayes", ...) is for the semiparametric
Bayesian shrinkage method. There are several input arguments to this
function, as follows:
dof: If dof = Inf, we use a multivariate
normal distribution for noise vectors. Otherwise, dof can
be set to a finite number \(\nu\)
between \(0\) and \(\infty\) for the multivariate
t-distribution with degrees of freedom \(\nu\). If dof = NULL, then
\(\nu\) is automatically selected by
the \(K\)-fold cross validation, with
\(K\) = num_folds.
dof = Inf by default.
lambda, lambda_var: lambda = NULL and
lambda_var = NULL implies that the shrinkage parameters
\(\lambda\) and \(\lambda_v\) are estimated automatically
with \(0\leq \lambda, \lambda_v \leq
1\).
prior_type: Either prior_type = "NCJ" or
prior_type = "CJ", for the non-conjugate or the
conjugate prior, respectively. The default value is “NCJ”.
num_folds: The number of folds for the parameterized
cross validation method for determining the \(\lambda\) value. num_folds = 5
by default. It works only when lambda = NULL or
dof = NULL.
m0: The value of the hyperparameter \(m_0\) of the inverse Wishart prior of \(\mathbf{\Sigma}\); m0 = \(K\) by default. The
other hyperparameter \(\mathbf{L}_0\)
of the inverse Wishart prior is set by \(\mathbf{L}_0 = (m_0 + K + 1)
\mathbf{I}_K\).
For example, we can run the semiparametric shrinkage method as
follows.
resu_estim$`Semi Bayes (fixed dof)` <-
VARshrink(Y, p = 1, type = "const", method = "sbayes", dof = 6,
lambda = NULL, lambda_var = NULL, prior_type = "NCJ",
num_folds = 5, m0 = ncol(Y))
resu_estim$`Semi Bayes (fixed dof)`
#>
#> VAR Shrinkage Estimation Results:
#> =================================
#>
#> Estimated coefficients for equation y1:
#> =======================================
#> Call:
#> y1 = y1.l1 + y2.l1 + const
#>
#> y1.l1 y2.l1 const
#> 0.65206869 0.05452195 0.05686968
#>
#>
#> Estimated coefficients for equation y2:
#> =======================================
#> Call:
#> y2 = y1.l1 + y2.l1 + const
#>
#> y1.l1 y2.l1 const
#> 0.1178293 0.3705490 0.8807748
#>
#>
#> Sigma for noise:
#> y1 y2
#> y1 0.0064309599 -0.0003425727
#> y2 -0.0003425727 0.0092293198
#> dof for noise: 6 (estimated: FALSE)
#> lambda: 0.000798110056835296 (estimated: TRUE)
#> lambda_var: 0.91370763181838 (estimated: TRUE)
summary(resu_estim$`Semi Bayes (fixed dof)`)
#>
#> VAR Shrinkage Estimation Results:
#> ===================================
#> Endogenous variables: y1, y2
#> Deterministic variables: const
#> Sample size: 99
#> Log Likelihood: 178.1
#> Roots of the characteristic polynomial:
#> 0.6733 0.3493
#> Call:
#> VARshrink(y = Y, p = 1, type = "const", method = "sbayes", lambda = NULL,
#> lambda_var = NULL, dof = 6, prior_type = "NCJ", num_folds = 5,
#> m0 = ncol(Y))
#>
#>
#> Estimation results for equation y1:
#> ===================================
#> y1 = y1.l1 + y2.l1 + const
#>
#> Estimate Std. Error t value Pr(>|t|)
#> y1.l1 0.65207 0.07737 8.428 3.5e-13 ***
#> y2.l1 0.05452 0.08503 0.641 0.523
#> const 0.05687 0.12323 0.461 0.645
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> Residual standard error: 0.08948 on 96.00218 degrees of freedom
#> Multiple R-Squared: 0.4116, Adjusted R-squared: 0.3993
#> F-statistic: 33.61 on 1.997823 and 96.00218 DF, p-value: 8.772e-12
#>
#> Estimation results for equation y2:
#> ===================================
#> y2 = y1.l1 + y2.l1 + const
#>
#> Estimate Std. Error t value Pr(>|t|)
#> y1.l1 0.11783 0.09351 1.260 0.210682
#> y2.l1 0.37055 0.10276 3.606 0.000496 ***
#> const 0.88077 0.14894 5.914 5.13e-08 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> Residual standard error: 0.1082 on 96.00218 degrees of freedom
#> Multiple R-Squared: 0.1311, Adjusted R-squared: 0.113
#> F-statistic: 7.247 on 1.997823 and 96.00218 DF, p-value: 0.001176
#>
#>
#> Scale matrix, Sigma, of the multivariate t-distribution for noise:
#> y1 y2
#> y1 0.0064310 -0.0003426
#> y2 -0.0003426 0.0092293
#>
#> Degrees of freedom of the multivariate t-distribution for noise:
#> [1] 6
#>
#> Correlation matrix of Sigma:
#> y1 y2
#> y1 1.00000 -0.04447
#> y2 -0.04447 1.00000
We can also let the package choose the degrees of freedom
parameter \(\nu\) automatically by
setting dof = NULL. In this case, the package simply uses a
\(K\)-fold cross validation to find an
optimal value of \(\nu\).
resu_estim$`Semi Bayes (estim dof)` <-
VARshrink(Y, p = 1, type = "const", method = "sbayes", dof = NULL,
lambda = NULL, lambda_var = NULL, prior_type = "NCJ",
num_folds = 5, m0 = ncol(Y))
resu_estim$`Semi Bayes (estim dof)`
#>
#> VAR Shrinkage Estimation Results:
#> =================================
#>
#> Estimated coefficients for equation y1:
#> =======================================
#> Call:
#> y1 = y1.l1 + y2.l1 + const
#>
#> y1.l1 y2.l1 const
#> 0.63678246 0.06545189 0.04630952
#>
#>
#> Estimated coefficients for equation y2:
#> =======================================
#> Call:
#> y2 = y1.l1 + y2.l1 + const
#>
#> y1.l1 y2.l1 const
#> 0.1478771 0.4041864 0.8161762
#>
#>
#> Sigma for noise:
#> y1 y2
#> y1 0.0074456739 -0.0006360331
#> y2 -0.0006360331 0.0116072333
#> dof for noise: Inf (estimated: TRUE)
#> lambda: 0.00126661656054025 (estimated: TRUE)
#> lambda_var: 0.91370763181838 (estimated: TRUE)
K-fold Cross Validation for Semiparametric Shrinkage
The VARshrink package includes an implementation of the
K-fold cross validation (CV) method for selecting shrinkage parameters.
In the current version of the package, the K-fold CV method can select
the \(\lambda\) and \(\lambda_v\) values for the semiparametric
Bayesian shrinkage estimator described in Section 3.4; a generic sketch of
such a CV procedure appears below. Note that the semiparametric shrinkage method
in the previous section selects a \(\lambda\) value by using the PCV method and
selects a \(\lambda_v\) value by a
Stein-type nonparametric shrinkage method.
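To make the procedure concrete, the following generic sketch selects \(\lambda\) over a grid by K-fold CV, with the ridge closed form standing in for the semiparametric estimator (the package's internal CV differs in detail; X, Y, and the grid are assumptions):
# Generic sketch of K-fold CV for lambda: fit on K-1 folds, score the
# held-out fold by mean squared prediction error, pick the best lambda.
kcv_lambda <- function(X, Y, lambdas, K = 5) {
  folds <- sample(rep(seq_len(K), length.out = nrow(X)))
  cv <- sapply(lambdas, function(lam) {
    mean(sapply(seq_len(K), function(k) {
      tr  <- folds != k
      Xtr <- X[tr, , drop = FALSE]; Ytr <- Y[tr, , drop = FALSE]
      B   <- solve(crossprod(Xtr) + nrow(Xtr) * lam * diag(ncol(X)),
                   crossprod(Xtr, Ytr))
      mean((Y[!tr, , drop = FALSE] - X[!tr, , drop = FALSE] %*% B)^2)
    }))
  })
  lambdas[which.min(cv)]   # grid value with the smallest CV error
}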
The K-fold CV method can be run as follows. The arguments to
VARshrink() are the same as those for the semiparametric
Bayesian shrinkage method, except for the argument
method = "kcv".
resu_estim$`K-fold CV (fixed dof)` <-
VARshrink(Y, p = 1, type = "const", method = "kcv", dof = 6,
lambda = NULL, lambda_var = NULL, prior_type = "NCJ",
num_folds = 5, m0 = ncol(Y))
resu_estim$`K-fold CV (fixed dof)`
#>
#> VAR Shrinkage Estimation Results:
#> =================================
#>
#> Estimated coefficients for equation y1:
#> =======================================
#> Call:
#> y1 = y1.l1 + y2.l1 + const
#>
#> y1.l1 y2.l1 const
#> 0.65223689 0.05916827 0.05784108
#>
#>
#> Estimated coefficients for equation y2:
#> =======================================
#> Call:
#> y2 = y1.l1 + y2.l1 + const
#>
#> y1.l1 y2.l1 const
#> 0.1135260 0.3886018 0.8219991
#>
#>
#> Sigma for noise:
#> y1 y2
#> y1 0.0068900761 -0.0003378826
#> y2 -0.0003378826 0.0087180682
#> dof for noise: 6 (estimated: FALSE)
#> lambda: 0.001 (estimated: TRUE)
#> lambda_var: 0.001 (estimated: TRUE)
The degrees of freedom of the multivariate t-distribution for noise can be
selected automatically by setting the argument dof = NULL as
follows.
resu_estim$`K-fold CV (estim dof)` <-
VARshrink(Y, p = 1, type = "const", method = "kcv", dof = NULL,
lambda = NULL, lambda_var = NULL, prior_type = "NCJ",
num_folds = 5, m0 = ncol(Y))
Finally, the function calcSSE_Acoef() computes the sum
of squared errors (SSE) between two sets of VAR coefficient matrices, \(\{\mathbf{A}_k^{(1)}\}\) and \(\{\mathbf{A}_k^{(2)}\}\), as \(SSE = \sum_{k=1}^p \sum_{i,j}
(\mathbf{A}^{(1)}_{kij} - \mathbf{A}^{(2)}_{kij})^2\). For
example, Table 2 shows the SSEs of the estimated VAR coefficients.
resu_sse <- data.frame(SSE = sapply(resu_estim,
function(x) calcSSE_Acoef(Acoef_sh(x), myCoef$A)))
Table 2. Sum of squared errors of VAR coefficients estimated by
the shrinkage methods.
| Method | SSE |
|---|---|
| Ridge regression | 0.063 |
| Nonparametric shrinkage | 0.080 |
| Full Bayes (fixed dof) | 0.068 |
| Full Bayes (estim dof) | 0.069 |
| Semi Bayes (fixed dof) | 0.057 |
| Semi Bayes (estim dof) | 0.054 |
| K-fold CV (fixed dof) | 0.052 |
| K-fold CV (estim dof) | 0.040 |