--- title: "Introduction to mhpfilter" author: - name: "Muhammad Yaseen" affiliation: "School of Mathematical and Statistical Sciences, Clemson University" email: "myaseen208@gmail.com" orcid: "0000-0002-5923-1714" - name: "Javed Iqbal" affiliation: "State Bank of Pakistan" email: "Javed.iqbal6@sbp.org.pk" - name: "M. Nadim Hanif" affiliation: "State Bank of Pakistan" email: "Nadeem.hanif@sbp.org.pk" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true toc_depth: 3 number_sections: true fig_width: 8 fig_height: 6 pdf_document: toc: true number_sections: true keep_tex: false fig_width: 8 fig_height: 6 vignette: > %\VignetteIndexEntry{Introduction to mhpfilter} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} bibliography: references.bib --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 8, fig.height = 5.5, dpi = 96, message = FALSE, warning = FALSE ) ``` ## Overview The `mhpfilter` package provides a fast, efficient implementation of the **Modified Hodrick-Prescott (HP) filter** for decomposing time series into trend and cyclical components. The key innovation is **automatic optimal selection** of the smoothing parameter λ using generalized cross-validation (GCV), based on the methodology of Choudhary, Hanif & Iqbal (2014). ### Why Modified HP Filter? The standard HP filter uses **fixed λ values**: - λ = 1600 for quarterly data - λ = 100 for annual data - λ = 14400 for monthly data However, these values were calibrated for U.S. quarterly GDP in the 1990s. They may not be appropriate for: 1. **Different countries** (emerging markets vs developed economies) 2. **Different variables** (investment is more volatile than consumption) 3. **Different time periods** (pre/post Great Moderation) The Modified HP filter solves this by **estimating the optimal λ** from the data itself using the GCV criterion: $$\text{GCV}(\lambda) = T^{-1} \left(1 + \frac{2T}{\lambda}\right) \sum_{t=1}^{T}(y_t - g_t)^2$$ where $g_t$ is the estimated trend for a given $\lambda$, and $T$ is the number of observations. ## Installation ```{r, eval=FALSE} # Install from GitHub devtools::install_github("myaseen208/mhpfilter") ``` ## Quick Start ```{r library} library(mhpfilter) library(data.table) library(ggplot2) ``` ### Example 1: GDP-like Series Let's start with a simulated series that resembles real GDP data: ```{r gdp-simulation} set.seed(2024) n <- 120 # 30 years of quarterly data # Generate components trend_growth <- 0.005 # 0.5% quarterly growth trend <- cumsum(c(100, rnorm(n - 1, mean = trend_growth, sd = 0.002))) # Business cycle: AR(2) process cycle <- arima.sim(list(ar = c(1.3, -0.5)), n, sd = 2) # Observed GDP (in logs) gdp <- trend + cycle # Apply Modified HP filter result <- mhp_filter(gdp, max_lambda = 10000) cat("Optimal lambda:", get_lambda(result), "\n") cat("GCV value:", format(get_gcv(result), digits = 4), "\n") ``` ### Visualization with autoplot ```{r autoplot-example, fig.cap="Figure 1: Modified HP Filter Decomposition"} result_obj <- mhp_filter(gdp, max_lambda = 10000, as_dt = FALSE) autoplot(result_obj) + labs( title = "Modified HP Filter Decomposition of Simulated GDP", subtitle = paste0("Optimal λ = ", get_lambda(result_obj)) ) + theme_minimal(base_size = 11) + theme( plot.title = element_text(face = "bold", size = 13), plot.subtitle = element_text(color = "gray40"), legend.position = "bottom" ) ``` ## Comparison with Standard HP Filter One of the key features of `mhpfilter` is the ability to compare results with the standard HP filter: ### Example 2: Why Fixed Lambda Matters ```{r comparison-setup} # Apply both filters mhp_res <- mhp_filter(gdp, max_lambda = 10000) hp_res <- hp_filter(gdp, lambda = 1600) # Extract key statistics comp <- mhp_compare(gdp, frequency = "quarterly", max_lambda = 10000) print(comp) ``` ### Visual Comparison ```{r comparison-plot, fig.cap="Figure 2: HP vs Modified HP Filter Comparison", fig.height=7} plot_comparison(gdp, frequency = "quarterly", max_lambda = 10000, show_cycle = TRUE) + theme_minimal(base_size = 11) + theme( plot.title = element_text(face = "bold", size = 13), legend.position = "bottom", legend.box = "vertical" ) ``` ### Key Insights from Comparison The comparison reveals important differences: ```{r interpretation} # Optimal lambda from MHP opt_lambda <- get_lambda(mhp_res) cat("Optimal λ (Modified HP):", opt_lambda, "\n") cat("Fixed λ (Standard HP): 1600\n\n") # Cyclical volatility sd_mhp <- sd(mhp_res$cycle) sd_hp <- sd(hp_res$cycle) cat("Cycle SD (Modified HP):", round(sd_mhp, 3), "\n") cat("Cycle SD (Standard HP):", round(sd_hp, 3), "\n") cat("Difference:", round(sd_mhp - sd_hp, 3), paste0("(", round(100 * (sd_mhp - sd_hp) / sd_hp, 1), "%)"), "\n") ``` ## Batch Processing: Multi-Country Analysis A powerful feature is batch processing multiple series simultaneously. This is particularly useful for cross-country macroeconomic comparisons. ### Example 3: Cross-Country GDP Comparison ```{r batch-setup} set.seed(456) n_time <- 100 # 25 years quarterly # Simulate GDP for multiple countries with different characteristics countries <- c("USA", "Germany", "Japan", "China", "Brazil", "India") characteristics <- data.table( country = countries, trend_growth = c(0.5, 0.4, 0.3, 1.5, 0.6, 1.2), # % quarterly trend_vol = c(0.2, 0.15, 0.15, 0.4, 0.5, 0.6), cycle_vol = c(1.5, 1.2, 1.0, 2.5, 3.0, 2.0), cycle_ar = c(0.85, 0.88, 0.90, 0.75, 0.70, 0.75) ) # Generate data gdp_data <- sapply(1:nrow(characteristics), function(i) { char <- characteristics[i, ] trend <- 100 + cumsum(rnorm(n_time, mean = char$trend_growth / 100, sd = char$trend_vol / 100)) cycle <- arima.sim(list(ar = char$cycle_ar), n_time, sd = char$cycle_vol) trend + cycle }) colnames(gdp_data) <- countries # Apply Modified HP filter to all countries batch_result <- mhp_batch(gdp_data, max_lambda = 10000) # View optimal lambdas lambdas_table <- attr(batch_result, "lambdas") print(lambdas_table) ``` ### Visualizing Cross-Country Cycles ```{r batch-cycles-plot, fig.cap="Figure 3: Business Cycles Across Countries", fig.height=8} plot_batch(batch_result, show = "cycle", facet = TRUE) + labs( title = "Business Cycle Components: Cross-Country Comparison", subtitle = "Modified HP Filter with Country-Specific λ", caption = "Note: Each country has its own optimal smoothing parameter" ) + theme_minimal(base_size = 10) + theme( plot.title = element_text(face = "bold", size = 12), strip.text = element_text(face = "bold", size = 9), strip.background = element_rect(fill = "gray95", color = "gray80"), panel.spacing = unit(0.8, "lines") ) ``` ### Highlighting Specific Countries ```{r batch-highlight-plot, fig.cap="Figure 4: Highlighted Comparison", fig.height=6} plot_batch(batch_result, show = "cycle", facet = FALSE, highlight = c("USA", "China", "India")) + labs( title = "Business Cycles: Advanced vs Emerging Economies", subtitle = "USA (developed) vs China & India (emerging markets)", color = "Country (λ)" ) + theme_minimal(base_size = 11) + theme( plot.title = element_text(face = "bold", size = 12), legend.position = "right", legend.key.height = unit(0.8, "lines") ) ``` ### Batch Comparison with Standard HP ```{r batch-compare} comparison <- batch_compare(gdp_data, frequency = "quarterly", max_lambda = 10000) print(comparison) # Analyze differences cat("\nSummary of Lambda Differences:\n") cat("Mean optimal λ:", round(mean(comparison$lambda), 0), "\n") cat("Fixed λ (HP): 1600\n") cat("Range of optimal λ:", paste0("[", min(comparison$lambda), ", ", max(comparison$lambda), "]"), "\n\n") cat("Countries with λ > 2000:", paste(comparison$series[comparison$lambda > 2000], collapse = ", "), "\n") cat("Countries with λ < 1000:", paste(comparison$series[comparison$lambda < 1000], collapse = ", "), "\n") ``` ## Example 4: Real Business Cycle Analysis Demonstrating how the Modified HP filter affects business cycle statistics: ```{r business-cycle-stats} # Focus on one country for detailed analysis usa_gdp <- gdp_data[, "USA"] usa_mhp <- mhp_filter(usa_gdp, max_lambda = 10000, as_dt = FALSE) usa_hp <- hp_filter(usa_gdp, lambda = 1600, as_dt = FALSE) # Create comparison data.table bc_stats <- data.table( Method = c("Modified HP", "Standard HP"), Lambda = c(usa_mhp$lambda, 1600), `Mean Cycle` = c(mean(usa_mhp$cycle), mean(usa_hp$cycle)), `SD Cycle` = c(sd(usa_mhp$cycle), sd(usa_hp$cycle)), `Min Cycle` = c(min(usa_mhp$cycle), min(usa_hp$cycle)), `Max Cycle` = c(max(usa_mhp$cycle), max(usa_hp$cycle)), `AR(1) Coef` = c( cor(usa_mhp$cycle[-1], usa_mhp$cycle[-length(usa_mhp$cycle)]), cor(usa_hp$cycle[-1], usa_hp$cycle[-length(usa_hp$cycle)]) ) ) print(bc_stats) ``` ### Cyclical Properties Visualization ```{r cycle-properties, fig.cap="Figure 5: Cyclical Properties Comparison", fig.height=6} # Create data for plotting plot_data <- data.table( Time = rep(1:n_time, 2), Cycle = c(usa_mhp$cycle, usa_hp$cycle), Method = rep(c("Modified HP", "Standard HP"), each = n_time), Lambda = rep(c(paste0("λ = ", usa_mhp$lambda), "λ = 1600"), each = n_time) ) ggplot(plot_data, aes(x = Time, y = Cycle, color = Method, linetype = Method)) + geom_hline(yintercept = 0, linetype = "dashed", color = "gray60", linewidth = 0.5) + geom_line(linewidth = 0.8) + scale_color_manual(values = c("Modified HP" = "#0072B2", "Standard HP" = "#D55E00")) + scale_linetype_manual(values = c("Modified HP" = "solid", "Standard HP" = "dashed")) + labs( title = "Business Cycle Components: Method Comparison", subtitle = "Modified HP filter allows for data-driven smoothing", x = "Time Period (Quarters)", y = "Cyclical Component", color = NULL, linetype = NULL ) + theme_minimal(base_size = 11) + theme( plot.title = element_text(face = "bold", size = 12), legend.position = "bottom", panel.grid.minor = element_blank() ) ``` ## Example 5: Sensitivity to Lambda Demonstrating why optimal lambda selection matters: ```{r lambda-sensitivity, fig.cap="Figure 6: Sensitivity to Lambda Choice", fig.height=7} # Try different lambda values lambdas_to_try <- c(100, 400, 1600, 6400, get_lambda(mhp_res)) lambda_results <- lapply(lambdas_to_try, function(lam) { res <- hp_filter(gdp, lambda = lam, as_dt = FALSE) data.table( Time = 1:length(gdp), Original = gdp, Trend = res$trend, Cycle = res$cycle, Lambda = paste0("λ = ", lam), Is_Optimal = lam == get_lambda(mhp_res) ) }) sens_data <- rbindlist(lambda_results) # Plot trends ggplot(sens_data, aes(x = Time)) + geom_line(aes(y = Original), color = "gray60", linewidth = 0.5, alpha = 0.7) + geom_line(aes(y = Trend, color = Is_Optimal), linewidth = 0.9) + facet_wrap(~Lambda, ncol = 2) + scale_color_manual(values = c("TRUE" = "#D55E00", "FALSE" = "#0072B2"), labels = c("Sub-optimal", "Optimal (MHP)"), name = NULL) + labs( title = "Effect of Lambda on Trend Estimation", subtitle = "Lower λ = less smooth (overfitting), Higher λ = too smooth (underfitting)", x = "Time Period", y = "Value", caption = "Gray line shows original series" ) + theme_minimal(base_size = 10) + theme( plot.title = element_text(face = "bold", size = 12), strip.text = element_text(face = "bold"), strip.background = element_rect(fill = "gray95", color = NA), legend.position = "bottom", panel.spacing = unit(1, "lines") ) ``` ## Performance Tips ### Choosing max_lambda The `max_lambda` parameter determines the search range: ```{r max-lambda-guidance} # For most macroeconomic applications result_standard <- mhp_filter(gdp, max_lambda = 10000) cat("Standard search (max_lambda=10000):\n") cat(" Optimal λ:", get_lambda(result_standard), "\n") cat(" GCV:", format(get_gcv(result_standard), digits = 4), "\n\n") # For very smooth trends or unusual series result_extended <- mhp_filter(gdp, max_lambda = 50000) cat("Extended search (max_lambda=50000):\n") cat(" Optimal λ:", get_lambda(result_extended), "\n") cat(" GCV:", format(get_gcv(result_extended), digits = 4), "\n\n") # Check if we're near the boundary if (get_lambda(result_standard) > 0.95 * 10000) { cat("WARNING: Optimal λ near upper bound. Consider increasing max_lambda.\n") } ``` ### Batch Processing Efficiency ```{r batch-efficiency, eval=FALSE} # Efficient for multiple series large_dataset <- matrix(rnorm(100 * 50), nrow = 100, ncol = 50) system.time(batch_result <- mhp_batch(large_dataset, max_lambda = 5000)) # Less efficient: loop over columns system.time({ manual_result <- lapply(1:50, function(i) { mhp_filter(large_dataset[, i], max_lambda = 5000) }) }) ``` ## Summary and Recommendations ### When to Use Modified HP Filter **✓ Recommended for:** - Cross-country macroeconomic comparisons - Analysis of different economic variables (GDP, consumption, investment) - Uncertain about appropriate λ for your context - Need defensible, data-driven parameter selection - Comparing business cycles across countries with different volatility **Standard HP may suffice when:** - Replicating published studies using fixed λ - U.S. quarterly GDP analysis matching original HP (1997) context - Speed is critical and λ = 1600 is demonstrably appropriate ### Key Takeaways 1. **Optimal λ varies substantially** across countries and variables 2. **Fixed λ may mis-assign variation** to trend vs cycle 3. **Modified HP filter provides data-driven selection** via GCV 4. **Batch processing enables efficient multi-series analysis** 5. **Visualization tools help understand differences** between methods ## References - Choudhary, M.A., Hanif, M.N., & Iqbal, J. (2014). On smoothing macroeconomic time series using the modified HP filter. *Applied Economics*, 46(19), 2205-2214. https://doi.org/10.1080/00036846.2014.896982 - Hodrick, R.J., & Prescott, E.C. (1997). Postwar US business cycles: An empirical investigation. *Journal of Money, Credit and Banking*, 29(1), 1-16. - McDermott, C.J. (1997). An automatic method for choosing the smoothing parameter in the HP filter. Unpublished manuscript, International Monetary Fund. - Ravn, M.O., & Uhlig, H. (2002). On adjusting the Hodrick-Prescott filter for the frequency of observations. *Review of Economics and Statistics*, 84(2), 371-376. ## See Also - `vignette("methodology", package = "mhpfilter")` - Detailed mathematical theory - `?mhp_filter` - Main filtering function - `?mhp_batch` - Batch processing - `?mhp_compare` - Comparison tools