---
title: "Model Selection for Masked Series Systems via Likelihood Ratio Tests"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Model Selection for Masked Series Systems via Likelihood Ratio Tests}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
header-includes:
- \renewcommand{\v}[1]{\boldsymbol{#1}}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  echo = TRUE,
  comment = "#>",
  fig.align = "center")

# Set to TRUE to regenerate long-running simulation results
run_long <- FALSE

library(maskedcauses)
old_opts <- options(digits = 4)
```

Introduction {#introduction}
============================

When analyzing masked series system data, a fundamental question precedes parameter estimation: which parametric family should we use for the component lifetimes? The `maskedcauses` package provides three models of increasing generality --- exponential, homogeneous Weibull, and heterogeneous Weibull --- each nested in the next. This hierarchy forms a natural scaffold for model selection via the likelihood ratio test (LRT).

We develop a two-step testing cascade that moves top-down from the most general model, simplifying only when the data cannot distinguish between levels. We demonstrate the cascade on worked examples and then assess the power of the shape heterogeneity test through Monte Carlo simulation.

The Weibull Nesting Chain {#nesting-chain}
==========================================

## Three levels of complexity

The three models form a strict nesting chain:

$$
\underbrace{\text{Exponential}}_{\lambda_1, \ldots, \lambda_m}
\;\subset\;
\underbrace{\text{Hom.\ Weibull}}_{k,\, \beta_1, \ldots, \beta_m}
\;\subset\;
\underbrace{\text{Het.\ Weibull}}_{k_1, \beta_1, \ldots, k_m, \beta_m}
$$

| Model | Constructor | Parameters | Count |
|-------|-------------|------------|:-----:|
| Exponential | `exp_series_md_c1_c2_c3()` | $\lambda_1, \ldots, \lambda_m$ | $m$ |
| Hom. Weibull | `wei_series_homogeneous_md_c1_c2_c3()` | $k, \beta_1, \ldots, \beta_m$ | $m + 1$ |
| Het. Weibull | `wei_series_md_c1_c2_c3()` | $k_1, \beta_1, \ldots, k_m, \beta_m$ | $2m$ |

## The two LRT steps

Each nesting defines a hypothesis test:

**Step 1: Shape heterogeneity.** Does the data support different shapes for different components?

$$
H_0: k_1 = k_2 = \cdots = k_m \quad \text{(homogeneous)}
\quad\text{vs}\quad
H_1: \text{shapes unrestricted (heterogeneous)}
$$

The LRT statistic is $\Lambda = -2[\ell(\hat\theta_{\text{hom}}) - \ell(\hat\theta_{\text{het}})]$, which under $H_0$ follows $\chi^2_{m-1}$ asymptotically.

**Step 2: Aging.** If the homogeneous model is retained, does the common shape differ from 1 (exponential)?

$$
H_0: k = 1 \quad \text{(exponential)}
\quad\text{vs}\quad
H_1: k \neq 1 \quad \text{(homogeneous Weibull)}
$$

The LRT statistic follows $\chi^2_1$ under $H_0$.

## Physical interpretation

The nesting chain reflects increasing physical realism:

- **Exponential $\to$ Homogeneous Weibull** tests whether components *age* ($k > 1$, increasing hazard) or exhibit infant mortality ($k < 1$, decreasing hazard). Rejecting $k = 1$ means the constant-hazard assumption is inadequate.
- **Homogeneous $\to$ Heterogeneous Weibull** tests whether components have *different aging patterns*. Rejecting equal shapes indicates at least one component has a fundamentally different failure mechanism --- for example, an electronic component with infant mortality alongside a bearing with wear-out.

## Top-down testing cascade

We recommend a **top-down** approach: start with the most general model and simplify until a restriction is rejected:

1. Fit all three models to the data.
2. Test heterogeneous vs homogeneous. If we cannot reject $H_0$, adopt the homogeneous model.
3. Test homogeneous vs exponential. If we cannot reject $H_0$, adopt the exponential model.

This approach controls the risk of prematurely simplifying the model.
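Both steps share the same mechanics, so they can be wrapped in a small helper. The sketch below is illustrative only --- `lrt` is not a `maskedcauses` function, and the log-likelihood values are made up --- but it shows exactly the computation each step performs:

```r
# Generic LRT: Lambda = -2 * (loglik_restricted - loglik_full),
# compared against a chi-squared with df = difference in parameter counts.
lrt <- function(loglik_restricted, loglik_full, df) {
  stat <- -2 * (loglik_restricted - loglik_full)
  list(stat = stat, df = df,
       p_value = pchisq(stat, df = df, lower.tail = FALSE))
}

# Step 1 with illustrative (made-up) log-likelihoods and m = 5:
step1 <- lrt(-1205.3, -1201.0, df = 4)   # df = m - 1
round(c(stat = step1$stat, p = step1$p_value), 3)
# stat = 8.6, p ~ 0.072: fail to reject at alpha = 0.05 -> retain homogeneous
```

In the worked examples below, the restricted and full log-likelihoods come from the fitted model objects.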
Worked Example: Homogeneous True Model {#example-hom}
=====================================================

We generate data from a 5-component homogeneous Weibull system --- the "middle" model in the chain --- and show that the LRT correctly identifies it.

## Setup and data generation

```{r hom-setup}
m <- 5
theta_hom <- c(k = 1.5,
               beta1 = 300, beta2 = 400, beta3 = 500,
               beta4 = 600, beta5 = 700)
model_hom <- wei_series_homogeneous_md_c1_c2_c3()

# Right-censoring time at the 80th percentile of the system lifetime
beta_sys <- wei_series_system_scale(theta_hom[1], theta_hom[-1])
tau <- qweibull(0.8, shape = theta_hom[1], scale = beta_sys)
cat("System scale:", round(beta_sys, 1), "\n")
cat("Right-censoring time (q = 0.8):", round(tau, 1), "\n")
```

```{r hom-generate}
set.seed(2024)
gen_hom <- rdata(model_hom)
df <- gen_hom(theta_hom, n = 300, p = 0.2,
              observe = observe_right_censor(tau = tau))
cat("Observation types:\n")
print(table(df$omega))
```

## Fitting all three models

```{r hom-fit-all, warning=FALSE}
# Exponential (m params)
model_exp <- exp_series_md_c1_c2_c3()
fit_e <- fit(model_exp)(df, par = rep(0.002, m), method = "Nelder-Mead")

# Homogeneous Weibull (m+1 params)
fit_h <- fit(model_hom)(df, par = c(1, rep(500, m)),
                        method = "Nelder-Mead",
                        control = list(maxit = 5000))

# Heterogeneous Weibull (2m params) — warm-start from hom fit
model_het <- wei_series_md_c1_c2_c3()
het_start <- as.numeric(rbind(rep(fit_h$par[1], m), fit_h$par[-1]))
fit_w <- fit(model_het)(df, par = het_start,
                        method = "Nelder-Mead",
                        control = list(maxit = 10000))
```

## Likelihood ratio tests

```{r hom-lrt}
np_exp <- m
np_hom <- m + 1
np_het <- 2 * m
n_obs <- nrow(df)

# LRT: heterogeneous vs homogeneous
LRT_shape <- -2 * (fit_h$loglik - fit_w$loglik)
df_shape <- np_het - np_hom  # m - 1 = 4
p_shape <- pchisq(LRT_shape, df = df_shape, lower.tail = FALSE)

# LRT: homogeneous vs exponential
LRT_aging <- -2 * (fit_e$loglik - fit_h$loglik)
df_aging <- np_hom - np_exp  # 1
p_aging <- pchisq(LRT_aging, df = df_aging, lower.tail = FALSE)

# AIC and BIC
aic_val <- function(ll, k) -2 * ll + 2 * k
bic_val <- function(ll, k, n) -2 * ll + log(n) * k

comparison <- data.frame(
  Model = c("Exponential", "Hom. Weibull", "Het. Weibull"),
  Params = c(np_exp, np_hom, np_het),
  LogLik = c(fit_e$loglik, fit_h$loglik, fit_w$loglik),
  AIC = c(aic_val(fit_e$loglik, np_exp),
          aic_val(fit_h$loglik, np_hom),
          aic_val(fit_w$loglik, np_het)),
  BIC = c(bic_val(fit_e$loglik, np_exp, n_obs),
          bic_val(fit_h$loglik, np_hom, n_obs),
          bic_val(fit_w$loglik, np_het, n_obs))
)
knitr::kable(comparison, digits = 2,
             caption = "Model comparison (true model: homogeneous Weibull)",
             col.names = c("Model", "# Params", "Log-Lik", "AIC", "BIC"))
```

```{r hom-lrt-results}
cat("Step 1: Het vs Hom (shape heterogeneity)\n")
cat(sprintf("  LRT = %.2f, df = %d, p = %.4f\n", LRT_shape, df_shape, p_shape))
if (p_shape < 0.05) {
  cat("  Decision: Reject homogeneous -> adopt heterogeneous model\n\n")
} else {
  cat("  Decision: Fail to reject -> retain homogeneous model\n\n")
}

cat("Step 2: Hom vs Exp (aging)\n")
cat(sprintf("  LRT = %.2f, df = %d, p = %.4f\n", LRT_aging, df_aging, p_aging))
if (p_aging < 0.05) {
  cat("  Decision: Reject exponential -> adopt homogeneous model\n")
} else {
  cat("  Decision: Fail to reject -> retain exponential model\n")
}
```

Since the true model is homogeneous Weibull with $k = 1.5$, we expect:

1. The heterogeneous vs homogeneous test to **fail to reject** --- the additional shape parameters do not significantly improve the fit.
2. The homogeneous vs exponential test to **reject** --- the data shows clear evidence of $k \neq 1$.

The cascade correctly identifies the homogeneous model. Both AIC and BIC should agree, with the homogeneous model having the smallest values.
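The decision logic just demonstrated can be collected into a single function. The sketch below is not a package API --- `select_by_cascade` is a hypothetical helper that assumes all three fits converged and takes only their maximized log-likelihoods:

```r
# Two-step top-down cascade on the three maximized log-likelihoods.
# Returns the name of the adopted model.
select_by_cascade <- function(ll_exp, ll_hom, ll_het, m, alpha = 0.05) {
  # Step 1: heterogeneous vs homogeneous (df = m - 1)
  p_shape <- pchisq(-2 * (ll_hom - ll_het), df = m - 1, lower.tail = FALSE)
  if (p_shape < alpha) return("heterogeneous")
  # Step 2: homogeneous vs exponential (df = 1)
  p_aging <- pchisq(-2 * (ll_exp - ll_hom), df = 1, lower.tail = FALSE)
  if (p_aging < alpha) "homogeneous" else "exponential"
}

# Illustrative (made-up) values: clear aging signal, no shape heterogeneity
select_by_cascade(ll_exp = -1250.1, ll_hom = -1230.4, ll_het = -1228.2, m = 5)
# -> "homogeneous"
```

With real data, plug in `fit_e$loglik`, `fit_h$loglik`, and `fit_w$loglik` from the fits above.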
Worked Example: Heterogeneous True Model {#example-het}
=======================================================

We now generate data from a system where components have meaningfully different shapes, and show that the LRT detects this.

```{r het-setup}
# Shapes span DFR to strong IFR
theta_het_true <- c(0.8, 300,   # DFR (electronics)
                    1.0, 400,   # CFR (random)
                    1.5, 500,   # mild IFR (seals)
                    2.0, 600,   # moderate IFR (bearings)
                    2.5, 700)   # strong IFR (wear-out)
model_het <- wei_series_md_c1_c2_c3()
gen_het <- rdata(model_het)

set.seed(7231)
df_het <- gen_het(theta_het_true, n = 300, p = 0.2,
                  observe = observe_right_censor(tau = tau))
cat("Observation types:\n")
print(table(df_het$omega))
```

```{r het-fit-all, warning=FALSE}
fit_e2 <- fit(model_exp)(df_het, par = rep(0.002, m), method = "Nelder-Mead")
fit_h2 <- fit(model_hom)(df_het, par = c(1, rep(500, m)),
                         method = "Nelder-Mead",
                         control = list(maxit = 5000))
het_start2 <- as.numeric(rbind(rep(fit_h2$par[1], m), fit_h2$par[-1]))
fit_w2 <- fit(model_het)(df_het, par = het_start2,
                         method = "Nelder-Mead",
                         control = list(maxit = 10000))
```

```{r het-lrt}
# LRT: heterogeneous vs homogeneous
LRT_shape2 <- -2 * (fit_h2$loglik - fit_w2$loglik)
p_shape2 <- pchisq(LRT_shape2, df = m - 1, lower.tail = FALSE)

# LRT: homogeneous vs exponential
LRT_aging2 <- -2 * (fit_e2$loglik - fit_h2$loglik)
p_aging2 <- pchisq(LRT_aging2, df = 1, lower.tail = FALSE)

comparison2 <- data.frame(
  Model = c("Exponential", "Hom. Weibull", "Het. Weibull"),
  Params = c(np_exp, np_hom, np_het),
  LogLik = c(fit_e2$loglik, fit_h2$loglik, fit_w2$loglik),
  AIC = c(aic_val(fit_e2$loglik, np_exp),
          aic_val(fit_h2$loglik, np_hom),
          aic_val(fit_w2$loglik, np_het)),
  BIC = c(bic_val(fit_e2$loglik, np_exp, nrow(df_het)),
          bic_val(fit_h2$loglik, np_hom, nrow(df_het)),
          bic_val(fit_w2$loglik, np_het, nrow(df_het)))
)
knitr::kable(comparison2, digits = 2,
             caption = "Model comparison (true model: heterogeneous Weibull)",
             col.names = c("Model", "# Params", "Log-Lik", "AIC", "BIC"))

cat(sprintf("\nStep 1: Het vs Hom — LRT = %.2f, df = %d, p = %.4f\n",
            LRT_shape2, m - 1, p_shape2))
cat(sprintf("Step 2: Hom vs Exp — LRT = %.2f, df = %d, p = %.4f\n",
            LRT_aging2, 1, p_aging2))
```

With true shapes spanning 0.8 to 2.5, the cascade correctly **rejects** the homogeneous constraint (step 1) --- the shapes are genuinely different. The cascade stops here and adopts the heterogeneous model.

Note that the homogeneous vs exponential test (step 2) may *fail to reject* even though component aging is present. Pooling across DFR ($k < 1$) and IFR ($k > 1$) components flattens the system hazard, pulling the common shape estimate $\hat{k}$ toward 1 --- that is, toward the exponential model. The top-down cascade avoids this trap by testing heterogeneity *first*.

Monte Carlo Study: Power of the Heterogeneity Test {#power-study}
=================================================================

The worked examples show the cascade on individual datasets. We now assess the *statistical power* of the shape heterogeneity test (step 1) as a function of how different the component shapes actually are.

## Design

We parameterize shape heterogeneity by the **coefficient of variation (CV)** of the shape parameters. All $m = 5$ shapes are set to $k_j = \bar{k}(1 + \text{cv} \cdot z_j)$, where $\bar{k} = 1.5$ and $z_1, \ldots, z_m$ are fixed offsets with mean 0 and standard deviation 1.
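This parameterization is easy to check numerically. A base-R sketch (the offsets below duplicate those defined in the simulation setup):

```r
# Shapes k_j = kbar * (1 + cv * z_j), offsets z_j with mean 0 and sd 1
kbar <- 1.5
offsets <- c(-2, -1, 0, 1, 2)
offsets <- offsets / sd(offsets)   # rescale to sd 1 (mean is already 0)

cv <- 0.30
shapes <- kbar * (1 + cv * offsets)
round(shapes, 3)
# 0.931 1.215 1.500 1.785 2.069

# The realized mean and CV match the design exactly:
c(mean = mean(shapes), cv = sd(shapes) / mean(shapes))
# mean = 1.5, cv = 0.3
```

So CV = 30% spreads the shapes from about 0.93 to 2.07 around the baseline $\bar{k} = 1.5$, while CV = 0 collapses all five to exactly 1.5.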
| CV | Meaning |
|----|---------|
| 0% | All shapes equal --- homogeneous model is correct |
| 5--10% | Mild heterogeneity --- hard to detect |
| 20--30% | Moderate heterogeneity --- detectable with adequate $n$ |
| 50% | Strong heterogeneity --- reliably detected |

Other settings: $n = 500$ observations, $p = 0.2$ (masking probability), right-censoring at the 80th percentile of the system lifetime under the homogeneous baseline, $\alpha = 0.05$.

```{r load-precomputed-mc, include=FALSE, eval=!run_long}
mc_data <- readRDS("precomputed_model_selection.rds")
```

```{r mc-setup, eval=run_long}
set.seed(42)
m <- 5
base_k <- 1.5
base_scales <- c(300, 400, 500, 600, 700)
cv_levels <- c(0, 0.05, 0.10, 0.20, 0.30, 0.50)
n_reps <- 200
n_obs <- 500
p_mask <- 0.2
alpha <- 0.05

# Fixed offsets: mean 0, sd 1
offsets <- c(-2, -1, 0, 1, 2)
offsets <- offsets / sd(offsets)

# Right-censoring time (based on homogeneous baseline at CV = 0)
beta_sys <- wei_series_system_scale(base_k, base_scales)
tau <- qweibull(0.8, shape = base_k, scale = beta_sys)

# Models and solvers
model_exp <- exp_series_md_c1_c2_c3()
model_hom <- wei_series_homogeneous_md_c1_c2_c3()
model_het <- wei_series_md_c1_c2_c3()
solver_exp <- fit(model_exp)
solver_hom <- fit(model_hom)
solver_het <- fit(model_het)
gen_het <- rdata(model_het)

np_exp <- m
np_hom <- m + 1
np_het <- 2 * m
```

```{r mc-run, eval=run_long, cache=TRUE, warning=FALSE, message=FALSE}
results_list <- vector("list", length(cv_levels) * n_reps)
idx <- 1

for (cv in cv_levels) {
  shapes <- base_k * (1 + cv * offsets)
  theta <- as.numeric(rbind(shapes, base_scales))

  for (r in seq_len(n_reps)) {
    df_r <- gen_het(theta, n = n_obs, p = p_mask,
                    observe = observe_right_censor(tau = tau))

    row <- list(cv = cv, rep = r,
                reject_het_vs_hom = NA, reject_hom_vs_exp = NA,
                ll_exp = NA, ll_hom = NA, ll_het = NA,
                aic_exp = NA, aic_hom = NA, aic_het = NA,
                bic_exp = NA, bic_hom = NA, bic_het = NA,
                converged_all = FALSE)

    tryCatch({
      fe <- solver_exp(df_r, par = rep(0.002, m), method = "Nelder-Mead")

      # Two-phase for hom: Nelder-Mead warm-up -> L-BFGS-B refinement
      fh_nm <- solver_hom(df_r, par = c(1, rep(500, m)),
                          method = "Nelder-Mead",
                          control = list(maxit = 5000))
      fh <- solver_hom(df_r, par = fh_nm$par, method = "L-BFGS-B",
                       lower = rep(1e-6, np_hom))

      # Two-phase for het: warm-start from hom, Nelder-Mead -> L-BFGS-B
      het_start <- as.numeric(rbind(rep(fh$par[1], m), fh$par[-1]))
      fw_nm <- solver_het(df_r, par = het_start,
                          method = "Nelder-Mead",
                          control = list(maxit = 10000))
      fw <- solver_het(df_r, par = fw_nm$par, method = "L-BFGS-B",
                       lower = rep(1e-6, np_het))

      if (fe$converged && fh$converged && fw$converged) {
        row$ll_exp <- fe$loglik
        row$ll_hom <- fh$loglik
        row$ll_het <- fw$loglik
        row$aic_exp <- -2 * fe$loglik + 2 * np_exp
        row$aic_hom <- -2 * fh$loglik + 2 * np_hom
        row$aic_het <- -2 * fw$loglik + 2 * np_het
        row$bic_exp <- -2 * fe$loglik + log(n_obs) * np_exp
        row$bic_hom <- -2 * fh$loglik + log(n_obs) * np_hom
        row$bic_het <- -2 * fw$loglik + log(n_obs) * np_het

        LRT_het <- -2 * (fh$loglik - fw$loglik)
        LRT_hom <- -2 * (fe$loglik - fh$loglik)
        row$reject_het_vs_hom <- pchisq(LRT_het, df = m - 1,
                                        lower.tail = FALSE) < alpha
        row$reject_hom_vs_exp <- pchisq(LRT_hom, df = 1,
                                        lower.tail = FALSE) < alpha
        row$converged_all <- TRUE
      }
    }, error = function(e) NULL)

    results_list[[idx]] <- row
    idx <- idx + 1
  }
  cat(sprintf("CV = %.0f%% complete\n", cv * 100))
}

power_study <- do.call(rbind, lapply(results_list, as.data.frame))
```

```{r mc-save, eval=run_long, include=FALSE}
mc_data <- list(
  power_study = power_study,
  config = list(m = m, n = n_obs, p = p_mask, tau = tau,
                n_reps = n_reps, alpha = alpha, base_k = base_k,
                base_scales = base_scales, offsets = offsets)
)
saveRDS(mc_data, "precomputed_model_selection.rds")
```

## Rejection rates

```{r mc-rejection-rates}
ps <- mc_data$power_study
ps_conv <- ps[ps$converged_all, ]

rej_summary <- do.call(rbind, lapply(split(ps_conv, ps_conv$cv), function(d) {
  data.frame(
    CV_pct = d$cv[1] * 100,
    n_converged = nrow(d),
    reject_het_vs_hom = mean(d$reject_het_vs_hom),
    reject_hom_vs_exp = mean(d$reject_hom_vs_exp)
  )
}))
knitr::kable(rej_summary, digits = 3, row.names = FALSE,
             caption = sprintf(
               "Rejection rates at alpha = %.2f (n = %d, m = %d)",
               mc_data$config$alpha, mc_data$config$n, mc_data$config$m),
             col.names = c("Shape CV (%)", "Converged",
                           "Reject Hom (Het vs Hom)", "Reject Exp (Hom vs Exp)"))
```

## Power curve

```{r mc-power-curve, fig.width=7, fig.height=5}
plot(rej_summary$CV_pct, rej_summary$reject_het_vs_hom,
     type = "b", pch = 19, col = "steelblue", lwd = 2,
     xlab = "Coefficient of Variation of Shapes (%)",
     ylab = "Rejection Rate", ylim = c(0, 1),
     main = "Power of the LRT Cascade")
lines(rej_summary$CV_pct, rej_summary$reject_hom_vs_exp,
      type = "b", pch = 17, col = "firebrick", lwd = 2, lty = 2)
abline(h = mc_data$config$alpha, lty = 3, col = "gray50")
legend("right",
       c("Het vs Hom (shape heterogeneity)", "Hom vs Exp (aging)",
         expression(alpha == 0.05)),
       col = c("steelblue", "firebrick", "gray50"),
       lty = c(1, 2, 3), pch = c(19, 17, NA), lwd = 2, cex = 0.85)
grid()
```

At CV = 0% (no heterogeneity), the rejection rate of the shape heterogeneity test should be near the nominal level $\alpha = 0.05$ --- the Type I error rate. As CV increases, the power rises, approaching 1 at CV = 50%.

The aging test (hom vs exp) maintains high rejection rates across all CV levels, because the baseline shape $\bar{k} = 1.5$ is far from 1 regardless of how much the individual shapes vary around it.

## AIC/BIC model selection accuracy

We also evaluate how often each information criterion selects the correct model. At CV = 0, the correct model is homogeneous Weibull. At CV > 0, it is heterogeneous Weibull.
```{r mc-aic-accuracy}
aic_correct <- do.call(rbind, lapply(split(ps_conv, ps_conv$cv), function(d) {
  cv <- d$cv[1]
  aic_choice <- ifelse(d$aic_het < d$aic_hom & d$aic_het < d$aic_exp, "het",
                       ifelse(d$aic_hom < d$aic_exp, "hom", "exp"))
  bic_choice <- ifelse(d$bic_het < d$bic_hom & d$bic_het < d$bic_exp, "het",
                       ifelse(d$bic_hom < d$bic_exp, "hom", "exp"))
  correct_model <- if (cv == 0) "hom" else "het"
  data.frame(
    CV_pct = cv * 100,
    AIC_correct = mean(aic_choice == correct_model),
    BIC_correct = mean(bic_choice == correct_model)
  )
}))
knitr::kable(aic_correct, digits = 3, row.names = FALSE,
             caption = "Proportion selecting the correct model",
             col.names = c("Shape CV (%)", "AIC Correct", "BIC Correct"))
```

Practical Guidelines {#guidelines}
===================================

Based on the Monte Carlo study and general theory, we offer the following recommendations for practitioners:

| Shape CV | LRT Power | Recommended Approach |
|----------|-----------|---------------------|
| 0% (identical) | $\sim\alpha$ (nominal) | Homogeneous Weibull --- no benefit to extra parameters |
| $< 10\%$ | Low ($< 20\%$) | Prefer homogeneous --- heterogeneity is undetectable |
| 10--25% | Moderate | Run the cascade; decision depends on $n$ and censoring |
| $> 25\%$ | High ($> 80\%$) | Heterogeneous Weibull --- shape differences are reliably detected |

**The aging test is almost always decisive.** When the true common shape is far from 1, even moderate sample sizes ($n \geq 200$) overwhelmingly reject the exponential model. The choice between exponential and Weibull is rarely ambiguous.

**The heterogeneity test is the crux.** Power depends strongly on how different the component shapes are. With $n = 500$ and $m = 5$, the test has good power for CV $\geq 25\%$ but is nearly blind to CV $< 10\%$.
Practitioners should consider:

- Increasing sample size if moderate heterogeneity is suspected
- Reducing censoring to provide more information per observation
- Using AIC/BIC as complementary criteria --- BIC tends to favor simpler models while AIC is more sensitive to heterogeneity

For extensive sensitivity analysis varying $n$, $m$, $p$, and censoring schemes, see the companion paper on model selection for masked series systems.

```{r cleanup, include=FALSE}
options(old_opts)
```