--- title: "Time Series Analysis with NMF-VAR" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Time Series Analysis with NMF-VAR} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} %\VignetteDepends{vars} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5 ) ``` ## Introduction This vignette demonstrates how to apply the `nmfkc` package to time series data using the **NMF-VAR (Non-negative Matrix Factorization - Vector Autoregression)** framework. This approach models the coefficient matrix of NMF as a VAR process ($B \approx C A_{lag}$), allowing for both dimensionality reduction and temporal modeling. Key functions in `nmfkc` such as `nmfkc.ar` and `nmfkc.ar.degree.cv` directly support R's native **`ts` objects**, preserving time series properties automatically. We will cover two scenarios: 1. **Univariate AR:** Forecasting airline passengers (`AirPassengers`). 2. **Multivariate VAR:** Analyzing macroeconomic indicators (`Canada`). First, let's load the necessary packages. ```{r load-packages} library(nmfkc) library(vars) # For Canada dataset ``` ----- ## Example 1: Univariate Autoregression with AirPassengers The `AirPassengers` dataset contains monthly international airline passenger numbers. We will model this univariate time series using an autoregressive (AR) model. ### 1\. Data Preparation We simply log-transform the `ts` object to stabilize variance. We do **not** need to manually convert it to a matrix or create time vectors; `nmfkc` functions handle `ts` objects directly. ```{r air-data-prep} # Load and transform the ts object d_air <- AirPassengers d_air_log <- log10(d_air) # Still a ts object ``` ### 2\. Model Selection (Lag Order) We use **Cross-Validation** to find the optimal degree (lag order). By passing the `ts` object `d_air_log`, the function automatically handles the time dimension. ```{r air-degree-cv} # Evaluate lag orders from 1 to 14 # Note: ts objects are automatically transposed to (Variables x Time) internally cv_res <- nmfkc.ar.degree.cv(d_air_log, rank = 1, degree = 1:14, epsilon=1e-6, maxit=500000) # Check the optimal degree cv_res$degree # For this example, we will proceed with D=12 (capturing monthly seasonality) D <- 12 ``` ### 3\. Model Fitting We construct the observation matrix `Y` and covariate matrix `A` (lagged Y) using `nmfkc.ar()`. The returned matrices `Y` and `A` will have time-based column names derived from the `ts` object. ```{r air-model-fit} # Create matrices for the AR(12) model a_air <- nmfkc.ar(d_air_log, degree = D, intercept = TRUE) # Fit the NMF-AR model (Rank=1 for univariate) res_air <- nmfkc(Y = a_air$Y, A = a_air$A, rank = 1, epsilon = 1e-6, maxit=500000) # Check goodness of fit res_air$r.squared # Check for stationarity (spectral radius < 1) nmfkc.ar.stationarity(res_air) ``` ### 4\. Forecasting We can forecast future values using the fitted model. `nmfkc.ar.predict` uses the time properties stored in the model to generate the correct future time sequence. ```{r air-forecast} # Forecast next 2 years (24 months) h <- 24 pred_res <- nmfkc.ar.predict(x = res_air, Y = a_air$Y, n.ahead = h) # Convert predictions back to original scale pred_val <- 10^as.vector(pred_res$pred) pred_time <- pred_res$time # Future time points generated by the function # --- Plotting --- # Setup plot range xlim_range <- range(c(time(d_air), pred_time)) ylim_range <- range(c(d_air, pred_val)) # 1. Observed data (Black) plot(d_air, type = "l", col = "black", xlim = xlim_range, ylim = ylim_range, lwd = 1, xlab = "Year", ylab = "Air Passengers", main = "NMF-VAR Forecast (h=24)") # 2. Fitted values during training (Red) # a_air$Y has column names as time strings; we parse them for plotting fitted_time <- as.numeric(colnames(res_air$XB)) lines(fitted_time, 10^as.vector(res_air$XB), col = "red", lwd = 2) # 3. Forecast (Blue) # Connect the last observed point to the first forecast for a continuous line last_t <- tail(as.numeric(time(d_air)), 1) last_y <- tail(as.vector(d_air), 1) lines(c(last_t, pred_time), c(last_y, pred_val), col = "blue", lwd = 2, lty = 2) # Add legend legend("topleft", legend = c("Observed", "Fitted", "Forecast"), col = c("black", "red", "blue"), lty = c(1, 1, 2), lwd = 2) ``` ----- ## Example 2: Vector Autoregression with Canada Dataset The `Canada` dataset contains four quarterly macroeconomic variables. We'll use a multivariate NMF-VAR model to analyze the relationship between these variables. ### 1\. Data Preparation We take the first difference to achieve stationarity, normalize the data to $[0, 1]$, and transpose it to the `Variables x Time` format required by NMF. ```{r canada-data-prep} # Load, difference, and normalize d0_canada <- Canada dd_canada <- apply(d0_canada, 2, diff) # Returns a matrix (Time x Vars) dn_canada <- nmfkc.normalize(dd_canada) # Transpose to (Variables x Time) for NMF Y0_canada <- t(dn_canada) # Create matrices for VAR(1) a_canada <- nmfkc.ar(Y0_canada, degree = 1, intercept = TRUE) ``` ### 2\. Model Fitting We fit a model with `rank = 2` to identify two latent economic conditions driving the four variables. ```{r canada-model-fit} # Fit the NMF-VAR model res_canada <- nmfkc(Y = a_canada$Y, A = a_canada$A, rank = 2, prefix = "Condition", epsilon = 1e-6) # R-squared and Stationarity res_canada$r.squared nmfkc.ar.stationarity(res_canada) ``` ### 3\. Latent Structure & Causal Graph We can visualize how the two latent conditions change over time. ```{r canada-soft-cluster} # Visualize soft clustering of time trends barplot(res_canada$B.prob, col = c(2, 3), border = NA, main = "Soft Clustering of Economic Conditions", xlab = "Time", ylab = "Probability", names.arg = colnames(a_canada$Y), las=2, cex.names = 0.5) legend("topright", legend = colnames(res_canada$X), fill = c(2, 3), bg = "white") ``` Finally, we can generate a **DOT script** to visualize the Granger causality (relationships) between variables inferred by the model. ```{r canada-dot} # Generate DOT script for graph visualization dot_script <- nmfkc.ar.DOT(res_canada, intercept = TRUE, threshold=0.01) # plot(dot_script) # requires DiagrammeR ```