| Type: | Package |
| Title: | Conformal Random Forests for Response Surface Emulation |
| Version: | 2.0.1 |
| Description: | Fits emulators, also known as surrogates or response surfaces, using conformal inference with random forests. The conformal calibration is performed using out-of-bag samples from the forest, eliminating the need for a separate hold-out set. The method is based on Johansson et al. (2014 <doi:10.1007/s10994-014-5453-0>). |
| License: | BSD_3_clause + file LICENSE |
| Encoding: | UTF-8 |
| Imports: | randomForest, RANN |
| RoxygenNote: | 7.3.2 |
| NeedsCompilation: | no |
| Packaged: | 2026-05-14 15:28:28 UTC; knrumsey |
| Author: | Kellin Rumsey [aut, cre] |
| Maintainer: | Kellin Rumsey <knrumsey@lanl.gov> |
| Repository: | CRAN |
| Date/Publication: | 2026-05-19 09:00:02 UTC |
Plot method for rfok
Description
A conformal RF based on Johansson et al. (2014). Rather than using a traditional calibration set, conformal intervals are constructed using out-of-bag samples for each tree. K-nearest neighbors is used to calibrate the prediction intervals for each prediction location.
Usage
## S3 method for class 'rfok'
plot(x, conf = 0.95, ...)
Arguments
x |
Object returned by |
conf |
Confidence level for plotting |
... |
Optional parameters to pass to |
Value
Invisibly returns x.
Examples
# Friedman function
f <- function(x){
10 * sin(pi * x[1] * x[2]) + 20 * (x[3] - 0.5)^2 + 10 * x[4] + 5 * x[5]
}
X <- matrix(runif(250), nrow=50, ncol=5)
y <- apply(X, 1, f)
fit <- rfok(X, y)
plot(fit)
Predict method for rfok
Description
A conformal RF based on Johansson et al. (2014). Rather than using a traditional calibration set, conformal intervals are constructed using out-of-bag samples for each tree. K-nearest neighbors is used to calibrate the prediction intervals for each prediction location.
Usage
## S3 method for class 'rfok'
predict(
object,
newdata = NULL,
samples = 1000,
conf = NULL,
smoothing = TRUE,
eps = 0,
...
)
Arguments
object |
Object returned by |
newdata |
a data frame or matrix of new data |
samples |
Number of samples from the predictive distribution. |
conf |
a vector of levels for desired confidence intervals. Ignored unless samples is NULL. |
smoothing |
logical; should conformity scores be sampled or smoothed (using quantiles) |
eps |
Tolerance for nearest neighbor approach. When |
... |
Additional, ignored, parameters |
Details
Prediction intervals are constructed by estimating a local scale for each
prediction point using the k nearest training observations. The
conformal margin of error is the requested quantile of the training
non-conformity scores multiplied by this local scale.
If samples = NULL, the function returns point predictions and margins
of error for the confidence levels supplied in conf. For example,
conf = 0.95 uses the 0.95 quantile of the non-conformity scores.
If samples is a positive integer, the function returns simulated
predictive samples. When smoothing = TRUE, signed non-conformity scores
are generated using quantiles of the symmetrized empirical distribution. When
smoothing = FALSE, non-conformity scores are sampled directly with
replacement and assigned random signs.
Value
A matrix of predictions with 1 column per test point.
Examples
# Friedman function
f <- function(x){
10 * sin(pi * x[1] * x[2]) + 20 * (x[3] - 0.5)^2 + 10 * x[4] + 5 * x[5]
}
X <- matrix(runif(250), nrow=50, ncol=5)
y <- apply(X, 1, f)
fit <- rfok(X, y)
Xnew <- matrix(runif(250), nrow=50, ncol=5)
predict(fit, Xnew)
Print method for rfok objects
Description
Print method for rfok objects
Usage
## S3 method for class 'rfok'
print(x, ...)
Arguments
x |
Object returned by |
... |
Additional arguments, currently ignored. |
Value
Invisibly returns x.
Conformal Random Forest with Out-of-bag KNN
Description
A conformal RF based on Johansson et al. (2014). Rather than using a traditional calibration set, conformal intervals are constructed using out-of-bag samples for each tree. K-nearest neighbors is used to calibrate the prediction intervals for each prediction location.
Usage
rfok(X, y, k = 5, beta = sd(y)/30, eps = 0, ...)
Arguments
X |
a data frame or a matrix of predictors |
y |
a response vector |
k |
Number of nearest neighbors to use for out-of-bag error estimaters |
beta |
Sensitivity parameter |
eps |
Tolerance for nearest neighbor approach. When |
... |
additional parameters passed to randomForest |
Details
This function fits a random forest and constructs conformal prediction intervals using out-of-bag (OOB) predictions. For each training observation, an OOB prediction is computed using only trees for which that observation was not included in the bootstrap sample. These OOB predictions are used to form residuals without requiring a separate calibration set.
Local scale estimates are computed using the k nearest neighbors in
the predictor space. For observation i, the local scale estimate
\mu_i is the average absolute OOB residual among its nearest neighbors.
The non-conformity score is
\alpha_i = \frac{|y_i - \hat{y}^{OOB}_i|}{\mu_i + \beta}.
The parameter k controls the locality of the scale estimate. Smaller
values make the intervals more locally adaptive, while larger values produce
smoother, more stable local scale estimates.
The parameter beta is a non-negative stabilization parameter added to
the local scale estimate. Smaller values make the intervals more sensitive to
local changes in the OOB residuals. Larger values reduce this sensitivity and
make the intervals behave more like globally scaled conformal intervals. The
default, sd(y) / 30, is a scale-aware value intended to provide mild
stabilization.
The parameter eps is passed to the nearest-neighbor search. Setting
eps = 0 uses exact nearest neighbors. Larger values allow approximate
nearest-neighbor search and may be faster for large data sets.
Value
An object with class "rfok"
References
Johansson, U., Boström, H., Löfström, T., & Linusson, H. (2014). Regression conformal prediction with random forests. Machine learning, 97, 155-176.
Examples
# Friedman function
f <- function(x){
10 * sin(pi * x[1] * x[2]) + 20 * (x[3] - 0.5)^2 + 10 * x[4] + 5 * x[5]
}
X <- matrix(runif(250), nrow=50, ncol=5)
y <- apply(X, 1, f)
fit <- rfok(X, y)
Xnew <- matrix(runif(250), nrow=50, ncol=5)
predict(fit, Xnew)
Cross-validate the rfok sensitivity parameter
Description
Cross-validates the sensitivity parameter beta for an rfok
model over a user-supplied or automatically generated grid.
Usage
rfok_cv_beta(
X,
y,
beta = NULL,
beta_mult = c(0, 1/100, 1/50, 1/30, 1/20, 1/10),
k = 5,
num_folds = 5,
selection_rule = c("crps", "interval_score", "coverage_width"),
conf = 0.95,
samples = 1000,
smoothing = TRUE,
eps = 0,
return_fit = FALSE,
verbose = TRUE,
...
)
Arguments
X |
a data frame, matrix, or vector of predictors. A vector is coerced to a one-column matrix. |
y |
a numeric response vector. |
beta |
optional numeric vector of candidate beta values. If |
beta_mult |
numeric vector used to generate the beta grid when
|
k |
Number of nearest neighbors to use for out-of-bag error estimators. |
num_folds |
Number of cross-validation folds. |
selection_rule |
Criterion used to choose beta. Options are
|
conf |
Confidence level used for interval-based selection rules. |
samples |
Number of conformal samples used when
|
smoothing |
logical; should conformity scores be sampled or smoothed when generating conformal samples? |
eps |
Tolerance for nearest neighbor search. |
return_fit |
logical; if |
verbose |
logical. should progress be printed? |
... |
additional parameters passed to |
Details
The parameter beta controls the sensitivity of the conformal interval
widths to the local out-of-bag residual scale estimate. Smaller values make
intervals more locally adaptive, while larger values make intervals more
stable and more similar to globally scaled conformal intervals.
This function evaluates candidate beta values using cross-validation.
For selection_rule = "crps", predictive samples are generated for each
held-out observation and the candidate with the smallest average continuous
ranked probability score is selected.
For selection_rule = "interval_score", conformal intervals are
constructed at level conf and the candidate with the smallest average
interval score is selected. The interval score rewards narrow intervals but
penalizes observations that fall outside the interval.
For selection_rule = "coverage_width", the selected candidate is the
one with the smallest average interval width among candidates with empirical
coverage at least conf. If no candidate reaches the desired coverage,
the candidate with the largest empirical coverage is selected, with ties
broken by average interval width.
Value
A data frame with one row per candidate beta. The selected beta is
marked in the selected column. If return_fit = TRUE, a list is
returned with elements results, best_beta, and fit.
Summary method for rfok objects
Description
Summary method for rfok objects
Usage
## S3 method for class 'rfok'
summary(object, ...)
Arguments
object |
Object returned by |
... |
Additional arguments, currently ignored. |
Value
An object of class "summary.rfok".