---
title: "Modified HP Filter: Theory and Methodology"
author:
  - name: "Muhammad Yaseen"
    affiliation: "School of Mathematical and Statistical Sciences, Clemson University"
    email: "myaseen208@gmail.com"
    orcid: "0000-0002-5923-1714"
  - name: "Javed Iqbal"
    affiliation: "State Bank of Pakistan"
    email: "Javed.iqbal6@sbp.org.pk"
  - name: "M. Nadim Hanif"
    affiliation: "State Bank of Pakistan"
    email: "Nadeem.hanif@sbp.org.pk"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
    number_sections: true
    fig_width: 8
    fig_height: 6
  pdf_document:
    toc: true
    number_sections: true
    keep_tex: false
    fig_width: 8
    fig_height: 6
vignette: >
  %\VignetteIndexEntry{Modified HP Filter: Theory and Methodology}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
```

## Introduction

This vignette provides a detailed explanation of the methodology underlying the Modified Hodrick-Prescott (HP) filter as implemented in the `mhpfilter` package. We cover the mathematical foundations, the cross-validation approach for optimal parameter selection, and empirical evidence supporting its use.

```{r library}
library(mhpfilter)
library(fastverse)
library(tidyverse)
```

## 1. The Standard HP Filter

### 1.1 Problem Formulation

The Hodrick-Prescott (1997) filter decomposes a time series $y_t$ into a trend component $g_t$ and a cyclical component $c_t$:

$$y_t = g_t + c_t, \quad t = 1, 2, \ldots, T$$

The filter estimates the trend by solving the optimization problem:

$$\min_{g_t} \left\{ \sum_{t=1}^{T}(y_t - g_t)^2 + \lambda \sum_{t=3}^{T}[(g_t - g_{t-1}) - (g_{t-1} - g_{t-2})]^2 \right\}$$

This objective function balances two competing goals:

1. **Goodness of fit**: $\sum_{t=1}^{T}(y_t - g_t)^2$ - the trend should be close to the data
2. **Smoothness**: $\sum_{t=3}^{T}(\Delta^2 g_t)^2$ - the trend's growth rate should not change too rapidly

The smoothing parameter $\lambda$ controls the trade-off:

- $\lambda \to 0$: the trend equals the original series (no smoothing)
- $\lambda \to \infty$: the trend approaches a linear time trend (maximum smoothing)

### 1.2 Matrix Formulation

The solution can be expressed in matrix form. Define the second-difference operator matrix $K$ of dimension $(T-2) \times T$:

$$K = \begin{pmatrix} 1 & -2 & 1 & 0 & \cdots & 0 \\ 0 & 1 & -2 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 1 & -2 & 1 \end{pmatrix}$$

Let $A = K'K$. Then the HP filter solution is:

$$\hat{g} = (I + \lambda A)^{-1} y$$

where $I$ is the $T \times T$ identity matrix.
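To make the matrix formulation concrete, here is a minimal base-R sketch that builds $K$ with `diff()`, forms $A = K'K$, and solves the linear system directly. The function name `hp_trend_matrix` is purely illustrative; it is not the package's implementation, which `hp_filter()` provides.

```{r matrix-form-sketch}
# Minimal sketch of the closed-form solution g = (I + lambda * A)^{-1} y
# (illustration only; use hp_filter() in practice)
hp_trend_matrix <- function(y, lambda) {
  T <- length(y)
  K <- diff(diag(T), differences = 2)  # (T-2) x T second-difference operator
  A <- crossprod(K)                    # A = K'K
  as.numeric(solve(diag(T) + lambda * A, y))
}

set.seed(1)
y_demo <- cumsum(rnorm(40))
head(cbind(series = y_demo, trend_1600 = hp_trend_matrix(y_demo, lambda = 1600)))
```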
### 1.3 The Problem with a Fixed Lambda

Hodrick and Prescott (1997) recommended $\lambda = 1600$ for quarterly U.S. GDP data, based on the assumption that:

> "A 5 percent cyclical component is moderately large, as is one-eighth of
> 1 percent change in the growth rate in a quarter."

This led to $\lambda = (5 / 0.125)^2 = 1600$.

However, this assumption may not hold for:

- Different countries (emerging markets have more volatile cycles)
- Different series (investment is more volatile than consumption)
- Different time periods (the Great Moderation vs. earlier periods)

```{r lambda-effect, fig.cap="Effect of Lambda on Trend Smoothness"}
set.seed(100)
y <- cumsum(rnorm(80)) + 2 * sin((1:80) * pi / 20)
lambdas <- c(100, 400, 1600, 6400)

# Create a data frame holding results for all lambda values
plot_data <- data.frame()
for (lam in lambdas) {
  result <- hp_filter(y, lambda = lam)
  temp_data <- data.frame(
    time = rep(1:length(y), 2),
    value = c(y, result$trend),
    series = rep(c("Original", "Trend"), each = length(y)),
    lambda = paste("λ =", lam)
  )
  plot_data <- rbind(plot_data, temp_data)
}

# Keep the facets in increasing order of lambda
plot_data$lambda <- factor(plot_data$lambda, levels = paste("λ =", lambdas))

# Create a faceted plot with a combined legend
ggplot(plot_data, aes(x = time, y = value, color = series, linewidth = series)) +
  geom_line() +
  scale_color_manual(values = c("Original" = "black", "Trend" = "blue")) +
  scale_linewidth_manual(values = c("Original" = 0.7, "Trend" = 1.2)) +
  facet_wrap(~ lambda, ncol = 2) +
  labs(x = "Time", y = "Value", color = "Series", linewidth = "Series") +
  theme_minimal() +
  theme(legend.position = "bottom",
        strip.text = element_text(size = 10, face = "bold"))
```

## 2. The Modified HP Filter

### 2.1 Cross-Validation Approach

The Modified HP filter, developed by McDermott (1997) and applied empirically by Choudhary, Hanif & Iqbal (2014), selects $\lambda$ using **generalized cross-validation (GCV)**. The idea is based on leave-one-out cross-validation:

1. For each observation $k$, fit the HP filter to all data *except* $y_k$
2. Predict the left-out observation using the fitted trend
3. Choose the $\lambda$ that minimizes the average prediction error

Let $g_{T,\lambda}^{(k)}(t_k)$ denote the predicted value at time $t_k$ from the spline fitted leaving out observation $k$. The cross-validation function is:

$$CV(\lambda) = \frac{1}{T} \sum_{k=1}^{T} \left(y_k - g_{T,\lambda}^{(k)}(t_k)\right)^2$$

### 2.2 Generalized Cross-Validation

Computing the leave-one-out CV is computationally intensive. Craven & Wahba (1979) showed it can be approximated by the **generalized cross-validation** (GCV) function:

$$GCV(\lambda) = \frac{T^{-1} \sum_{k=1}^{T}\left(y_k - g_k^{\lambda}\right)^2}{\left(1 - \frac{1}{T}\operatorname{tr}(B(\lambda))\right)^2}$$

where $B(\lambda) = (I + \lambda A)^{-1}$ is the "hat matrix" and $g_k^{\lambda} = \sum_{s=1}^{T} b_{ks}(\lambda)\, y_s$ is the fitted trend at time $k$.

Using Silverman's (1984) approximation for the trace,

$$\operatorname{tr}(B(\lambda)) \approx \frac{T^2}{\lambda},$$

this yields the simplified GCV formula used in this package:

$$GCV(\lambda) = T^{-1} \left(1 + \frac{2T}{\lambda}\right) \sum_{k=1}^{T}\left(y_k - g_k^{\lambda}\right)^2$$

### 2.3 Algorithm

The algorithm implemented in `mhpfilter` is:

1. Compute $A = K'K$ once (shared across all $\lambda$ values)
2. For $\lambda = 1, 2, \ldots, \lambda_{max}$:
   - Compute the trend: $g = (I + \lambda A)^{-1} y$
   - Compute the residuals: $r = y - g$
   - Compute the GCV: $GCV(\lambda) = T^{-1}(1 + 2T/\lambda) \cdot r'r$
3. Select $\lambda^* = \arg\min_\lambda GCV(\lambda)$
4. Return the final decomposition using $\lambda^*$
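A minimal base-R sketch of this grid search, using the simplified GCV formula above, is shown below. The function `gcv_search_sketch` and the example series are illustrative only; in the package this search is what `mhp_filter()` performs.

```{r gcv-search-sketch}
# Sketch of the GCV grid search in Section 2.3 (illustration only;
# mhp_filter() carries out this search in the package)
gcv_search_sketch <- function(y, max_lambda = 2000) {
  y <- as.numeric(y)
  T <- length(y)
  K <- diff(diag(T), differences = 2)
  A <- crossprod(K)                      # step 1: compute A = K'K once
  gcv <- vapply(seq_len(max_lambda), function(lam) {
    g <- solve(diag(T) + lam * A, y)     # step 2: trend for this lambda
    r <- y - g                           #         residuals
    (1 + 2 * T / lam) * sum(r^2) / T     #         simplified GCV
  }, numeric(1))
  list(lambda = which.min(gcv),          # step 3: lambda minimizing the GCV
       gcv = min(gcv))
}

set.seed(7)
y_demo <- cumsum(rnorm(60, 0.3)) + as.numeric(arima.sim(list(ar = 0.7), 60))
gcv_search_sketch(y_demo, max_lambda = 500)
```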
```{r gcv-curve, fig.cap="GCV as a Function of Lambda"}
set.seed(42)
y <- cumsum(rnorm(100, 0.5, 0.3)) + arima.sim(list(ar = 0.8), 100)

# Compute the GCV over a grid of lambdas
lambdas <- seq(100, 5000, by = 50)
gcv_values <- sapply(lambdas, function(lam) {
  res <- hp_filter(y, lambda = lam, as_dt = FALSE)
  T <- length(y)
  resid_ss <- sum((y - res$trend)^2)
  (1 + 2 * T / lam) * resid_ss / T
})

# Create a data frame for plotting
plot_data <- data.frame(
  lambda = lambdas,
  gcv = gcv_values
)

# Find the minimum
opt_idx <- which.min(gcv_values)
opt_lambda <- lambdas[opt_idx]
opt_gcv <- gcv_values[opt_idx]

ggplot(plot_data, aes(x = lambda, y = gcv)) +
  geom_line(linewidth = 1.2) +
  geom_vline(xintercept = opt_lambda, color = "red",
             linetype = "dashed", linewidth = 0.8) +
  geom_point(data = data.frame(lambda = opt_lambda, gcv = opt_gcv),
             aes(x = lambda, y = gcv), color = "red", size = 3) +
  # annotate() draws the label once rather than once per data row
  annotate("label", x = opt_lambda + 500, y = max(gcv_values) * 0.95,
           label = paste("Optimal λ =", opt_lambda),
           color = "red", fill = "white", size = 4) +
  labs(
    x = expression(lambda),
    y = "GCV",
    title = "Generalized Cross-Validation Function"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    axis.title = element_text(size = 12),
    panel.grid.minor = element_blank()
  )
```

## 3. Why the Modified HP Filter Works Better

### 3.1 Simulation Evidence

Choudhary et al. (2014) conducted Monte Carlo simulations comparing the standard HP filter with the Modified HP filter. The data generating process was:

$$g_t = \mu + g_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma_\varepsilon^2)$$

$$c_t = \phi_1 c_{t-1} + \phi_2 c_{t-2} + \xi_t, \quad \xi_t \sim N(0, \sigma_\xi^2)$$

They varied:

- the ratio $\sigma_\varepsilon / \sigma_\xi$ (the relative importance of the trend vs. the cycle)
- the AR coefficients $\phi_1, \phi_2$ (cycle persistence)
- linear vs. nonlinear trend specifications

**Key finding**: the Modified HP filter produced a lower mean squared error in recovering the true cycle in nearly 100% of the simulations.

```{r simulation, fig.cap="Simulation: True vs Estimated Cycles"}
set.seed(2024)
n <- 100

# Generate the true components
trend_true <- cumsum(c(0, rnorm(n - 1, 0.5, 0.2)))
cycle_true <- arima.sim(list(ar = c(1.2, -0.4)), n, sd = 1.5)
y <- trend_true + cycle_true

# Apply both filters
mhp_result <- mhp_filter(y, max_lambda = 10000)
hp_result <- hp_filter(y, lambda = 1600)

# Compute the MSE of each estimated cycle
mse_mhp <- mean((mhp_result$cycle - cycle_true)^2)
mse_hp <- mean((hp_result$cycle - cycle_true)^2)

oldpar <- par(mfrow = c(1, 1))
plot(cycle_true, type = "l", lwd = 2,
     ylim = range(c(cycle_true, mhp_result$cycle, hp_result$cycle)),
     main = "True vs Estimated Cycles", ylab = "Cycle")
lines(mhp_result$cycle, col = "blue", lwd = 2)
lines(hp_result$cycle, col = "red", lwd = 2, lty = 2)
legend("topright",
       c("True Cycle",
         paste0("MHP (λ=", get_lambda(mhp_result), ", MSE=", round(mse_mhp, 2), ")"),
         paste0("HP (λ=1600, MSE=", round(mse_hp, 2), ")")),
       col = c("black", "blue", "red"), lty = c(1, 1, 2), lwd = 2)
par(oldpar)
```
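The single draw above can be extended to a small Monte Carlo exercise in the spirit of Choudhary et al. (2014): repeat the same data generating process and count how often the GCV-selected $\lambda$ recovers the true cycle with a lower MSE than $\lambda = 1600$. The number of replications and the settings below are illustrative choices, not those of the original study.

```{r monte-carlo-sketch}
# Small Monte Carlo comparison under the DGP of Section 3.1
# (illustrative settings; not a replication of Choudhary et al., 2014)
set.seed(123)
n_obs <- 100
n_rep <- 20
mhp_wins <- replicate(n_rep, {
  trend_sim <- cumsum(c(0, rnorm(n_obs - 1, 0.5, 0.2)))
  cycle_sim <- as.numeric(arima.sim(list(ar = c(1.2, -0.4)), n_obs, sd = 1.5))
  y_sim <- trend_sim + cycle_sim
  mse_mhp_sim <- mean((mhp_filter(y_sim, max_lambda = 10000)$cycle - cycle_sim)^2)
  mse_hp_sim  <- mean((hp_filter(y_sim, lambda = 1600)$cycle - cycle_sim)^2)
  mse_mhp_sim < mse_hp_sim
})
cat("Share of replications where the Modified HP filter has lower MSE:",
    mean(mhp_wins), "\n")
```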
### 3.2 Empirical Evidence

Choudhary et al. (2014) estimated the optimal $\lambda$ for 93 countries using annual data and for 25 countries using quarterly data. Key findings:

| Statistic | Annual Data | Quarterly Data |
|-----------|-------------|----------------|
| Range of optimal λ | 11 - 6,566 | 229 - 4,898 |
| Conventional fixed λ | 100 | 1,600 |

**Implications**:

1. The optimal $\lambda$ varies substantially across countries
2. Few countries have an optimal $\lambda$ close to the conventional values
3. The standard HP filter tends to underestimate cyclical volatility (positive `sd_diff`)
4. AR(1) coefficients are relatively robust to the filter choice
5. Correlations between series can differ significantly

```{r empirical-demo}
# Demonstrate cross-country variation in the optimal lambda
set.seed(999)

# Simulate countries with different trend/cycle characteristics
countries <- data.table(
  name = c("Stable_Developed", "Volatile_Emerging", "Moderate"),
  trend_sd = c(0.2, 0.5, 0.3),
  cycle_sd = c(0.5, 2.0, 1.0),
  cycle_ar = c(0.9, 0.7, 0.8)
)

results <- lapply(1:nrow(countries), function(i) {
  trend <- cumsum(rnorm(80, 0.5, countries$trend_sd[i]))
  cycle <- arima.sim(list(ar = countries$cycle_ar[i]), 80, sd = countries$cycle_sd[i])
  y <- trend + cycle
  res <- mhp_filter(y, max_lambda = 10000)
  data.table(country = countries$name[i], lambda = get_lambda(res))
})

print(rbindlist(results))
```

## 4. Theoretical Justification

### 4.1 Why Cross-Validation?

Cross-validation is optimal in the sense that it asymptotically minimizes the **integrated squared error** between the estimated trend and the true trend (Craven & Wahba, 1979). The GCV function has several desirable properties:

1. **Rotation invariant**: it does not depend on the coordinate system
2. **Asymptotically optimal**: it converges to the best possible $\lambda$
3. **Computationally efficient**: a closed-form approximation is available

### 4.2 Relationship to Signal Extraction

The HP filter can be viewed as a Wiener-Kolmogorov filter for signal extraction. If:

- the trend's growth rate follows a random walk: $\Delta^2 g_t = \varepsilon_t$
- the cycle $c_t$ is stationary noise

then $\lambda = \sigma_c^2 / \sigma_\varepsilon^2$ is the optimal smoothing parameter. The GCV approach implicitly estimates this ratio from the data.
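This interpretation also rationalizes the conventional value from Section 1.3: plugging Hodrick and Prescott's assumed volatilities into the variance ratio reproduces $\lambda = 1600$. A one-line worked example:

```{r signal-extraction-example}
# Lambda as a variance ratio (Section 4.2), using the calibration quoted in
# Section 1.3: a 5 percent cycle and a 1/8 of 1 percent change in trend growth
sigma_c <- 5        # cyclical standard deviation (percent)
sigma_eps <- 1 / 8  # standard deviation of the change in trend growth (percent)
(sigma_c / sigma_eps)^2  # implied lambda = 1600
```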
### 4.3 Connection to Spline Smoothing

The HP filter is the discrete-time analogue of a cubic smoothing spline. The GCV method for choosing $\lambda$ is the same one used in the spline smoothing literature (Silverman, 1984).

## 5. Practical Recommendations

### 5.1 When to Use the Modified HP Filter

**Use the Modified HP filter when**:

- Analyzing series from different countries (cross-country comparisons)
- Working with different macro variables (GDP, consumption, investment)
- Unsure about the appropriate λ for your specific context
- A defensible, data-driven parameter selection is needed

**The standard HP filter may be acceptable when**:

- Following established literature conventions exactly
- Comparing results with published studies that use λ = 1600
- The series closely matches the characteristics of U.S. quarterly GDP

### 5.2 Choosing max_lambda

The `max_lambda` parameter affects the computation time and the search range:

| max_lambda | Use Case | Approximate Time |
|------------|----------|------------------|
| 5,000 | Quick exploratory analysis | ~0.05s |
| 10,000 | Most quarterly macro series | ~0.1s |
| 50,000 | Conservative, unusual series | ~0.5s |
| 100,000 | Very smooth trends needed | ~1s |

```{r max-lambda-choice}
# For most macro applications, max_lambda = 10000 is sufficient
result <- mhp_filter(y, max_lambda = 10000)
cat("Optimal lambda:", get_lambda(result), "\n")

# If the optimal λ hits the upper bound, increase max_lambda
if (get_lambda(result) >= 9900) {
  warning("Lambda near upper bound - consider increasing max_lambda")
}
```

### 5.3 Interpreting Results

Key diagnostics when comparing the HP and Modified HP filters:

```{r diagnostics}
comp <- mhp_compare(y, frequency = "quarterly")
print(comp)

# Interpretation guide:
# 1. Large lambda difference: series-specific smoothing is important
# 2. Positive sd_diff: the HP filter under-estimates cycle volatility
# 3. Large ar1_diff: affects model calibration
```

## 6. Mathematical Appendix

### 6.1 Derivation of the HP Filter Solution

The first-order condition of the minimization problem is:

$$\frac{\partial}{\partial g} \left[ (y-g)'(y-g) + \lambda g'A g \right] = 0$$

$$-2(y-g) + 2\lambda A g = 0$$

$$(I + \lambda A)g = y$$

$$g = (I + \lambda A)^{-1} y$$

### 6.2 Structure of the Matrix A

The matrix $A = K'K$ is a symmetric, positive semi-definite band matrix:

$$A = \begin{pmatrix} 1 & -2 & 1 & 0 & 0 & \cdots \\ -2 & 5 & -4 & 1 & 0 & \cdots \\ 1 & -4 & 6 & -4 & 1 & \cdots \\ 0 & 1 & -4 & 6 & -4 & \cdots \\ \vdots & & & \ddots & & \end{pmatrix}$$

This banded structure would allow efficient computation using sparse matrix methods (not exploited in the current implementation, but a potential future optimization).

### 6.3 GCV Approximation

Starting from:

$$GCV(\lambda) = \frac{T^{-1} \|y - \hat{g}\|^2}{\left(1 - T^{-1}\operatorname{tr}(B)\right)^2}$$

and using $(1-x)^{-2} \approx 1 + 2x$ for small $x$:

$$GCV(\lambda) \approx T^{-1} \|y - \hat{g}\|^2 \left(1 + 2T^{-1}\operatorname{tr}(B)\right)$$

With the trace approximation $\operatorname{tr}(B) \approx T^2/\lambda$ from Section 2.2:

$$GCV(\lambda) \approx T^{-1}\left(1 + \frac{2T}{\lambda}\right) \|y - \hat{g}\|^2$$

## References

1. Choudhary, M. A., Hanif, M. N., & Iqbal, J. (2014). On smoothing macroeconomic time series using the modified HP filter. *Applied Economics*, 46(19), 2205-2214.
2. Craven, P., & Wahba, G. (1979). Smoothing noisy data with spline functions. *Numerische Mathematik*, 31, 377-403.
3. Hodrick, R. J., & Prescott, E. C. (1997). Postwar U.S. business cycles: An empirical investigation. *Journal of Money, Credit and Banking*, 29(1), 1-16.
4. Marcet, A., & Ravn, M. O. (2004). The HP-filter in cross-country comparisons. *CEPR Discussion Paper* No. 4244.
5. McDermott, C. J. (1997). An automatic method for choosing the smoothing parameter in the HP filter. Unpublished manuscript, International Monetary Fund.
6. Ravn, M. O., & Uhlig, H. (2002). On adjusting the Hodrick-Prescott filter for the frequency of observations. *Review of Economics and Statistics*, 84(2), 371-376.
7. Silverman, B. W. (1984). A fast and efficient cross-validation method for smoothing parameter choice in spline regression. *Journal of the American Statistical Association*, 79, 584-589.