---
title: "jab_after_boot"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{jab_after_boot}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
csl: apa.csl
link-citations: true
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Overview

This vignette demonstrates how to use `jab_after_boot()`
in the package [`semboottools`](https://yangzhen1999.github.io/semboottools/) [@yang_forming_2026] to compute **Jackknife-after-Bootstrap (JAB)** influence values
[@efron_introduction_1993]
for a single parameter of a fitted `lavaan` model
whose bootstrap replicates, including the bootstrap indices (`boot.idx`), have been stored.
The function summarizes the full bootstrap distribution and the leave-one-out (LOO) subdistributions, ranks observations by influence, and optionally plots a diagnostic figure.

The following two packages are needed:


```{r setup}
library(semboottools)
library(lavaan)
```

## Arguments

The function `jab_after_boot()` accepts the following arguments:

| Argument | Description |
|----|----|
| `fit` | A fitted `lavaan` object for which `store_boot()` has been called with `keep.idx = TRUE`. Must contain `fit@external$sbt_boot_ustd` (with `boot.idx` attribute), and typically `fit@external$sbt_boot_std`. |
| `param` | The target parameter to diagnose. Can be given as (i) `"lhs op rhs"` (e.g., `"Y ~ X"`, `"Y ~~ Y"`, `"X =~ x1"`), (ii) a user-defined `":="` label (e.g., `"ab"`), or (iii) a parameter label (e.g., `"b"`). |
| `standardized` | Logical. If `TRUE`, use the standardized bootstrap estimates (`standardizedSolution_boot()`); if `FALSE`, use the unstandardized bootstrap estimates (`parameterEstimates_boot()`). |
| `top_k` | Integer. Number of most influential cases (by absolute JAB value) to report in the summary table. |
| `ci_level` | Numeric between 0 and 1. Confidence level for percentile bootstrap confidence intervals recomputed on each leave-one-out (LOO) subdistribution. |
| `min_keep` | Integer. Minimum number of bootstrap replicates to retain in each LOO subset. Default is `max(30, floor(0.2 * B))`, where `B` is the number of bootstrap replicates. |
| `plot` | Logical. If `TRUE`, produce a diagnostic plot showing full-sample bootstrap mean and case-deleted bootstrap means. |
| `plot_engine` | Character. Choose `"ggplot2"` (default, modern graphics) or `"base"` (basic R graphics) for the plot. |
| `ylab_override` | Optional character. Override the default y-axis label in the plot. |
| `verbose` | Logical. If `TRUE` (default), print summaries to console (ALL vs. LOO). |
| `font_family` | Character. Font family for plotting (default `"serif"`). Use `"sans"` or `"Times New Roman"` for cross-platform robustness. |

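As a rough illustration of the `min_keep` default, the sketch below evaluates `max(30, floor(0.2 * B))` for the 500 replicates and 200 cases used in the example in this vignette. It then compares the result with the number of replicates expected to omit any one case, roughly `B * (1 - 1/n)^n`. The names `B_demo`, `n_demo`, `min_keep_demo`, and `expected_loo` are placeholders for illustration, and the sketch assumes `B` is the number of stored bootstrap replicates. How `jab_after_boot()` applies the threshold internally is not shown here.

```{r}
# Placeholder values matching the example in this vignette
B_demo <- 500   # number of bootstrap replicates (`R` in `store_boot()`)
n_demo <- 200   # sample size

# Default threshold stated for `min_keep`
min_keep_demo <- max(30, floor(0.2 * B_demo))
min_keep_demo

# Expected number of replicates in which a given case is never drawn:
# each draw misses the case with probability 1 - 1/n, and there are n draws
expected_loo <- B_demo * (1 - 1 / n_demo)^n_demo
expected_loo

# The expected LOO subset size should comfortably exceed the threshold
expected_loo > min_keep_demo
```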
------------------------------------------------------------------------

## Value

The function returns a list with the following components:

| Component | Description |
|----|----|
| `param` | The target parameter string. |
| `standardized` | Logical flag indicating whether standardized bootstrap was used. |
| `full_summary` | Data frame with the summary statistics (mean, SE, CI) for the full bootstrap distribution (`scope = "ALL"`). |
| `cases_summary` | Data frame of the top `top_k` cases ranked by absolute JAB influence, with case index, JAB value, and LOO summaries. |
| `F` | The B × n occurrence matrix (bootstrap replicate by observation). |
| `tstar` | Full bootstrap vector for the parameter. |
| `plot_obj` | A `ggplot` object if `plot = TRUE` and `plot_engine = "ggplot2"`, otherwise `NULL`. |

------------------------------------------------------------------------

## Example

```{r}
library(lavaan)
# Simulate data
set.seed(1234)
n <- 200
x <- runif(n) - 0.5
m <- 0.4 * x + rnorm(n)
y <- 0.3 * m + rnorm(n)
dat <- data.frame(x, m, y)

# Specify model
model <- '
  m ~ a * x
  y ~ b * m + cp * x
  ab := a * b
'

# Fit model
fit0 <- sem(model,
            data = dat,
            fixed.x = FALSE)

# Store bootstrap draws using `store_boot()`.
# `R`, the number of bootstrap samples, should be ≥2000 in real studies.
# Use `parallel` unless the model is fast to fit.
# Set `ncpus` to a larger value, or omit it, in real studies.
# `keep.idx = TRUE` is required before calling `jab_after_boot()`:
# without it, the bootstrap index matrix (`boot.idx`) is not saved,
# and JAB cannot form the leave-one-out subdistributions.

fit2 <- store_boot(
          fit0,
          R = 500,
          ncpus = 2,
          iseed = 2345,
          keep.idx = TRUE,
          parallel = "snow"
        )
```

When you run `jab_after_boot()` with `verbose = TRUE`, two blocks of output are shown:

1.  **Full-sample bootstrap summary (ALL)**\
    This block reports the bootstrap distribution of the chosen parameter across all bootstrap replicates. It includes the bootstrap mean, the standard error (SE), and the percentile confidence interval. These values are consistent with what you would obtain from `standardizedSolution_boot()` or `parameterEstimates_boot()` without JAB. In other words, this block is the reference point against which the leave-one-out results are compared.

2.  **Leave-one-out (LOO) summaries**\
    This block lists the top `top_k` most influential cases, ranked by the absolute JAB influence statistic. For each case, it shows:

-   **JAB_value**: the influence index

    $$
    I_j = n \big( \bar{\theta}_{(-j)} - \bar{\theta} \big),
    $$

    where $n$ is the number of observations, $\bar{\theta}$ is the full bootstrap mean, and $\bar{\theta}_{(-j)}$ is the bootstrap mean when case $j$ is excluded. Large positive values mean that excluding the case increases the bootstrap mean; large negative values mean that excluding it decreases the bootstrap mean.
-   **mean, SE, CI.Lo, CI.Up**: the bootstrap mean, standard error, and confidence interval recomputed when this case is excluded from the resampling.

By comparing these numbers to the full-sample summary, you can diagnose which individual observations exert the strongest influence on the bootstrap results.

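As a rough sketch of the idea behind the JAB value, the base R code below uses a toy statistic (the sample mean) instead of a `lavaan` model. It assumes that the LOO subdistribution for case $j$ consists of the bootstrap estimates from replicates in which case $j$ was never drawn, which is the usual jackknife-after-bootstrap construction. It is not the code used by `jab_after_boot()`, and all object names ending in `_toy` are made up for illustration.

```{r}
# Illustrative sketch only: a toy statistic, not a fitted lavaan model
set.seed(1)
n_toy <- 30    # number of observations
B_toy <- 500   # number of bootstrap replicates
x_toy <- rnorm(n_toy)

# Row b holds the indices of the observations drawn in replicate b
idx_toy <- matrix(
  sample.int(n_toy, n_toy * B_toy, replace = TRUE),
  nrow = B_toy,
  ncol = n_toy
)

# Bootstrap estimates of the toy statistic, one per replicate
tstar_toy <- apply(idx_toy, 1, function(i) mean(x_toy[i]))

# B x n occurrence matrix: how often observation j appears in replicate b
F_toy <- t(apply(idx_toy, 1, tabulate, nbins = n_toy))

# LOO mean for case j: average over replicates that never drew case j
loo_mean_toy <- vapply(
  seq_len(n_toy),
  function(j) mean(tstar_toy[F_toy[, j] == 0]),
  numeric(1)
)

# Influence index I_j = n * (LOO mean - full bootstrap mean)
jab_toy <- n_toy * (loo_mean_toy - mean(tstar_toy))

# Indices of the five cases with the largest absolute influence
head(order(abs(jab_toy), decreasing = TRUE), 5)
```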
The diagnostic plot visualizes the same information:

-   **Black points**: leave-one-out bootstrap means for each observation.

-   **Blue points**: the most influential cases (top `top_k` by absolute JAB value).

-   **Horizontal line**: the full-sample bootstrap mean, serving as the reference.

If a case is highly influential, its point lies far from the horizontal line. The plot therefore provides an intuitive complement to the numeric summaries described above.

```{r}
# Run JAB analysis for b
res1 <- semboottools::jab_after_boot(
  fit2,
  param        = "b",
  standardized = TRUE,
  top_k        = 5,
  plot         = TRUE,
  plot_engine  = "ggplot2",
  font_family  = "sans"
)

# Run JAB analysis for ab
res2 <- semboottools::jab_after_boot(
  fit2,
  param        = "ab",
  standardized = TRUE,
  top_k        = 10,
  plot         = TRUE
)
```

## References