funbootband computes simultaneous
prediction and confidence bands for dense time series data on a common
grid. It supports both i.i.d. and clustered (hierarchical) designs and
uses a fast ‘Rcpp’ backend.
Curves are preprocessed via finite Fourier series
(k.coef harmonics) and bootstrapped to
generate empirical distributions, from which band limits are obtained as
quantiles.
The main user function is:
band(data, type = c("prediction","confidence"),
alpha = 0.05, iid = TRUE, id = NULL,
B = 1000L, k.coef = 50L)This document gives a quick tour of funbootband
including examples with simulated data for both i.i.d. and hierarchical
scenarios. The functional bootstrap method implemented in the package is
an implementation of the algorithm described in Lenhoff et al. (1999)
and further developed for hierarchical design in Koska et al. (2023).
The vignette contains additional advice for choosing function arguments
in band().
The vignette was written in R Markdown, using ‘knitr’.
When curves are independent and identically distributed, the function
argument iid in band() should be set to
TRUE.
For this example, consider a set of simulated smooth curves from a Gaussian process on a common grid (it is generally assumed that all curves are of equal length and reasonably smooth).
library(funbootband)
set.seed(1)
T <- 200
n <- 10
x <- seq(0, 1, length.out = T)
# Simulate smooth Gaussian-process-like curves of equal length
mu <- 10 * sin(2 * pi * x)
ell <- 0.12; sig <- 3
Kmat <- outer(x, x, function(s, t) sig^2 * exp(-(s - t)^2 / (2 * ell^2)))
ev <- eigen(Kmat + 1e-8 * diag(T), symmetric = TRUE)
Z <- matrix(rnorm(T * n), T, n)
Y <- mu + ev$vectors %*% (sqrt(pmax(ev$values, 0)) * Z)
Y <- Y + matrix(rnorm(T * n, sd = 0.2), T, n) # observation noiseSimultaneous prediction and confidence bands are then computed by
setting the type argument to either prediction
or confidence:
# Fit prediction and confidence bands
fit_pred <- band(Y, type = "prediction", alpha = 0.11, iid = TRUE, B = B_demo, k.coef = k.coef_demo)
fit_conf <- band(Y, type = "confidence", alpha = 0.11, iid = TRUE, B = B_demo, k.coef = k.coef_demo)When plotting the bands alongside the original curves, we see that the shaded region is calibrated to contain entire curves with probability \(1-\alpha\) (simultaneous coverage).
x_idx <- seq_len(fit_pred$meta$T)
ylim <- range(c(Y, fit_pred$lower, fit_pred$upper), finite = TRUE)
plot(x_idx, fit_pred$mean, type = "n", ylim = ylim,
xlab = "Index (Time)", ylab = "Amplitude",
main = "Simultaneous bands (i.i.d.)")
matlines(x_idx, Y, col = "gray70", lty = 1, lwd = 1)
polygon(c(x_idx, rev(x_idx)), c(fit_pred$lower, rev(fit_pred$upper)),
col = grDevices::adjustcolor("steelblue", alpha.f = 0.25), border = NA)
polygon(c(x_idx, rev(x_idx)), c(fit_conf$lower, rev(fit_conf$upper)),
col = grDevices::adjustcolor("gray40", alpha.f = 0.3), border = NA)
lines(x_idx, fit_pred$mean, col = "black", lwd = 1)When the i.i.d. assumption is violated, iid needs to be
set to FALSE. band() will then automatically
detect the cluster structure from the column names. Optionally, an
integer/factor vector of length ncol(data) giving a cluster
id for each curve can be used.
The clustered case is illustrated using a design, where each group (cluster) follows its own mean pattern with smooth within-cluster variation.
library(funbootband)
set.seed(2)
T <- 200
m <- c(5, 5)
x <- seq(0, 1, length.out = T)
# Cluster-specific means
mu <- list(
function(z) 8 * sin(2 * pi * z),
function(z) 8 * cos(2 * pi * z)
)
# Generate curves with smooth within-cluster variation
Bm <- cbind(sin(2 * pi * x), cos(2 * pi * x))
gen_curve <- function(k) {
sc <- rnorm(ncol(Bm), sd = c(2.0, 1.5))
mu[[k]](x) + as.vector(Bm %*% sc)
}
Ylist <- lapply(seq_along(m), function(k) {
sapply(seq_len(m[k]), function(i) gen_curve(k) + rnorm(T, sd = 0.6))
})
Y <- do.call(cbind, Ylist)
colnames(Y) <- unlist(mapply(
function(k, mk) paste0("C", k, "_", seq_len(mk)),
seq_along(m), m
))
# Fit prediction and confidence bands
fit_pred <- band(Y, type = "prediction", alpha = 0.11, iid = FALSE, B = B_demo, k.coef = k.coef_demo)
fit_conf <- band(Y, type = "confidence", alpha = 0.11, iid = FALSE, B = B_demo, k.coef = k.coef_demo)Note: When iid = FALSE, the bootstrap
uses a two-stage scheme: sample clusters (with
replacement), then choose curves within cluster (without replacement
until exhausted, then with replacement), which respects within-cluster
dependence.
x_idx <- seq_len(fit_pred$meta$T)
ylim <- range(c(Y, fit_pred$lower, fit_pred$upper), finite = TRUE)
plot(x_idx, fit_pred$mean, type = "n", ylim = ylim,
xlab = "Index (Time)", ylab = "Amplitude",
main = "Simultaneous bands (clustered)")
matlines(x_idx, Y, col = "gray70", lty = 1, lwd = 1)
polygon(c(x_idx, rev(x_idx)), c(fit_pred$lower, rev(fit_pred$upper)),
col = grDevices::adjustcolor("steelblue", alpha.f = 0.25), border = NA)
polygon(c(x_idx, rev(x_idx)), c(fit_conf$lower, rev(fit_conf$upper)),
col = grDevices::adjustcolor("gray40", alpha.f = 0.3), border = NA)
lines(x_idx, fit_pred$mean, col = "black", lwd = 1)k.coef (Fourier harmonics)k.coef controls the number of sine/cosine harmonics
(plus intercept) used to represent each curve before bootstrapping.
k.coef
to the maximum meaningful value on a length-T periodic
grid. If a larger value is supplied, it is reduced accordingly.k.coef that
adequately reconstructs your curves.Heuristic for selecting k.coef
The appropriate value of k.coef depends on both the
grid length T and the
smoothness of the curves:
T) can support higher
k.coef values.As a rule of thumb, increase k.coef only as far as
necessary. Very high values mainly fit high-frequency noise, increase
runtime, and may lead to numerical instability near the Nyquist
limit.
One simple way to inspect the effect of k.coef is to
evaluate the reconstruction error (e.g., mean squared
error, MSE) for different choices of k.coef:
# MSE vs k.coef for the i.i.d. example (uses Y from above)
fourier_basis <- function(T, K) {
t <- seq_len(T)
if (K == 0L) return(cbind(1))
cbind(
1,
sapply(1:K, function(k) cos(2*pi*k*t/T)),
sapply(1:K, function(k) sin(2*pi*k*t/T))
)
}
reconstruct <- function(B, Y) {
coef <- qr.coef(qr(B), Y) # solve for all curves at once
B %*% coef
}
mse <- function(A, B) mean((A - B)^2)
Ks <- c(10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L, 90L, 99L)
T <- nrow(Y)
tab <- do.call(rbind, lapply(Ks, function(K){
B <- fourier_basis(T, K)
Yhat <- reconstruct(B, Y)
data.frame(k.coef = K,
mse = mse(Y, Yhat),
pve = 100 * (1 - sum((Y - Yhat)^2) / sum((Y - mean(Y))^2)))
}))
row.names(tab) <- NULL
print(tab)
# Plot
op <- par(mar = c(4,4,2,1))
plot(tab$k.coef, tab$mse, type = "b", xlab = "k.coef", ylab = "MSE",
main = "Fourier reconstruction error vs. k.coef")Recommendation
If your curves are smooth and runtime is not a concern, moderately
large k.coef values (e.g., 20–50 for dense grids) are often
a good choice. For noisy data or when computation time matters, examine
how the reconstruction error (or resulting band limits) changes with
k.coef and select the smallest value beyond which improvements become
negligible. Recomputing the bands for a few candidate values can also
help confirm that results have converged.
B (bootstrap replicates)The argument B controls the number of bootstrap replications.
B generally yields tighter and more stable
bands, as quantile estimates become more precise.B can be
used to reduce runtime. For final inference, rerun with a sufficiently
large B to ensure convergence of the band limits.band(data, type = c("prediction","confidence"),
alpha = 0.05,
iid = TRUE,
id = NULL,
B = 1000L,
k.coef = 50L)T × n (rows: time
points; cols: curves). A numeric data.frame is
accepted."prediction" (future curve
coverage) or "confidence" (mean).0.05 gives 95%
bands.iid = FALSE and supply id (length
n) for clusters, or allow inference from column-name
prefixes.50;
clamped to a grid-specific max).Return value. A list with numeric vectors
lower, mean, upper (length
T) and meta (settings, n,
T, etc.).
Lenhoff, M. W., Santner, T. J., Otis, J. C., Peterson, M. G., Williams, B. J., & Backus, S. I. (1999). Bootstrap prediction and confidence bands: a superior statistical method for analysis of gait data. Gait & Posture, 9(1), 10–17. doi:10.1016/S0966-6362(98)00043-5
Koska, D., Oriwol, D., & Maiwald, C. (2023). Comparison of statistical models for characterizing continuous differences between two biomechanical measurement systems. Journal of Biomechanics, 149, 111506. doi:10.1016/j.jbiomech.2023.111506
sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Europe/Berlin
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] funbootband_0.2.0
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.37 R6_2.6.1 fastmap_1.2.0 xfun_0.53
#> [5] cachem_1.1.0 knitr_1.50 htmltools_0.5.8.1 rmarkdown_2.29
#> [9] lifecycle_1.0.4 cli_3.6.5 sass_0.4.10 jquerylib_0.1.4
#> [13] compiler_4.5.1 rstudioapi_0.17.1 tools_4.5.1 evaluate_1.0.5
#> [17] bslib_0.9.0 Rcpp_1.1.0 yaml_2.3.10 rlang_1.1.6
#> [21] jsonlite_2.0.0