| Title: | The SHAPBoost Feature Selection Algorithm |
| Version: | 1.0.0 |
| Description: | The implementation of SHAPBoost, a boosting-based feature selection technique that ranks features iteratively based on Shapley values. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.2 |
| Imports: | xgboost, SHAPforxgboost, methods, caret, Matrix |
| Suggests: | flare, survival |
| URL: | https://github.com/O-T-O-Z/SHAPBoost-R |
| BugReports: | https://github.com/O-T-O-Z/SHAPBoost-R/issues |
| NeedsCompilation: | no |
| Packaged: | 2025-09-22 09:04:24 UTC; o.t.ozyilmaz |
| Author: | Ömer Tarik Özyilmaz |
| Maintainer: | Ömer Tarik Özyilmaz <o.t.ozyilmaz@umcg.nl> |
| Repository: | CRAN |
| Date/Publication: | 2025-09-29 16:40:02 UTC |
SHAPBoostEstimator Class
Description
This class implements the SHAPBoost algorithm for feature selection. It is designed to be extended by specific implementations such as SHAPBoostRegressor and SHAPBoostSurvival. Any new subclass should implement the abstract methods defined in this class.
Fields
evaluator: The model that is used to evaluate each additional feature.
metric: A character string representing the evaluation metric.
xgb_params: A list of parameters for the XGBoost model.
number_of_folds: The number of folds for cross-validation.
epsilon: A small value to determine convergence.
max_number_of_features: The maximum number of features to select.
siso_ranking_size: The number of features to consider in the SISO ranking.
siso_order: The order of combinations to consider in SISO.
reset: A logical indicating whether to reset the weights.
num_resets: The number of resets allowed.
fold_random_state: The random state for reproducibility in cross-validation.
verbose: The verbosity level of the output.
stratification: A logical indicating whether to use stratified sampling. Only applicable to the c-index metric.
collinearity_check: A logical indicating whether to check for collinearity.
correlation_threshold: The correlation threshold above which features are considered collinear.
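The collinearity_check and correlation_threshold fields describe a correlation-based filter. The sketch below illustrates that idea in base R only; it is not the package's implementation. The helper name find_collinear, the use of absolute Pearson correlation, and the threshold of 0.9 are assumptions made for illustration.

```r
# Hypothetical helper, not part of the SHAPBoost package.
# Returns the names of candidate columns whose absolute Pearson correlation
# with any already-selected column exceeds `threshold`.
find_collinear <- function(X, selected, threshold = 0.9) {
  candidates <- setdiff(colnames(X), selected)
  if (length(selected) == 0 || length(candidates) == 0) {
    return(character(0))
  }
  # Rows correspond to candidates, columns to selected features
  corr <- abs(cor(X[, candidates, drop = FALSE], X[, selected, drop = FALSE]))
  candidates[apply(corr, 1, max) > threshold]
}

# Toy data: `b` is nearly a copy of `a`, `c` is independent noise
set.seed(1)
a <- rnorm(100)
X <- data.frame(
  a = a,
  b = a + rnorm(100, sd = 0.01),
  c = rnorm(100)
)

collinear <- find_collinear(X, selected = "a", threshold = 0.9)
print(collinear)
```

With "a" already selected, a filter of this kind would flag "b" as redundant and leave "c" available for later selection rounds.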
Examples
if (requireNamespace("flare", quietly = TRUE)) {
  # Loads the objects `x` and `y` used below
  data("eyedata", package = "flare")
  shapboost <- SHAPBoostRegressor$new(
    max_number_of_features = 1,
    evaluator = "lr",
    metric = "mae",
    siso_ranking_size = 10,
    verbose = 0
  )
  # fit() is called with data frames for both predictors and response
  X <- as.data.frame(x)
  y <- as.data.frame(y)
  subset <- shapboost$fit(X, y)
}
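The number_of_folds and fold_random_state fields describe seeded k-fold cross-validation. As a rough illustration of why a fixed random state gives reproducible folds, here is a base R sketch. The helper make_folds and its default values are hypothetical and do not reflect how the package assigns folds internally.

```r
# Hypothetical helper, not part of the SHAPBoost package.
# Assigns each of `n` observations to one of `number_of_folds` folds.
# Note: set.seed() changes the global RNG state of the session.
make_folds <- function(n, number_of_folds = 5, fold_random_state = 0) {
  set.seed(fold_random_state)
  # Balanced fold labels, shuffled into a random order
  sample(rep(seq_len(number_of_folds), length.out = n))
}

folds_a <- make_folds(20, number_of_folds = 5, fold_random_state = 42)
folds_b <- make_folds(20, number_of_folds = 5, fold_random_state = 42)

# The same seed yields the same assignment
print(identical(folds_a, folds_b))
# Each fold receives the same number of observations
print(table(folds_a))
```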
SHAPBoostRegressor is a reference class for regression feature selection through gradient boosting.
Description
This class extends the SHAPBoostEstimator class and implements methods for initializing, updating weights, scoring, and fitting estimators.
Fields
evaluator: The model that is used to evaluate each additional feature. Choice between "lr" and "xgb".
metric: The metric used for evaluation, such as "mae", "mse", or "r2".
xgb_params: A list of parameters for the XGBoost model.
number_of_folds: The number of folds for cross-validation.
epsilon: A small value to prevent division by zero.
max_number_of_features: The maximum number of features to consider.
siso_ranking_size: The size of the SISO ranking.
siso_order: The order of the SISO ranking.
reset: A boolean indicating whether to reset the model.
xgb_importance: The importance type for XGBoost.
num_resets: The number of resets for the model.
fold_random_state: The random state for folds.
verbose: The verbosity level for logging.
stratification: A boolean indicating whether to use stratification. Only applicable to the c-index metric.
use_shap: A boolean indicating whether to use SHAP values.
collinearity_check: A boolean indicating whether to check for collinearity.
correlation_threshold: The correlation threshold above which features are considered collinear.
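For reference, the three regression metrics named in the metric field have the following standard definitions. This is a base R sketch of the textbook formulas; the function names are chosen for illustration, and the package may compute these quantities differently internally.

```r
# Standard definitions, written as illustrative helpers
# (not functions exported by the SHAPBoost package).
mae <- function(y_true, y_pred) mean(abs(y_true - y_pred))
mse <- function(y_true, y_pred) mean((y_true - y_pred)^2)
r2 <- function(y_true, y_pred) {
  ss_res <- sum((y_true - y_pred)^2)        # residual sum of squares
  ss_tot <- sum((y_true - mean(y_true))^2)  # total sum of squares
  1 - ss_res / ss_tot
}

y_true <- c(3, 5, 7, 9)
y_pred <- c(2, 5, 8, 9)

print(mae(y_true, y_pred))  # 0.5
print(mse(y_true, y_pred))  # 0.5
print(r2(y_true, y_pred))   # 0.9
```

Lower values are better for "mae" and "mse", while higher values are better for "r2".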
Examples
if (requireNamespace("flare", quietly = TRUE)) {
  # Loads the objects `x` and `y` used below
  data("eyedata", package = "flare")
  shapboost <- SHAPBoostRegressor$new(
    max_number_of_features = 1,
    evaluator = "lr",
    metric = "mae",
    siso_ranking_size = 10,
    verbose = 0
  )
  # fit() is called with data frames for both predictors and response
  X <- as.data.frame(x)
  y <- as.data.frame(y)
  subset <- shapboost$fit(X, y)
}
SHAPBoostSurvival is a reference class for survival analysis feature selection through gradient boosting.
Description
This class extends the SHAPBoostEstimator class and implements methods for initializing, updating weights, scoring, and fitting estimators.
Fields
evaluator: The model that is used to evaluate each additional feature. Choice between "coxph" and "xgb".
metric: The metric used for evaluation, such as "c-index".
xgb_params: A list of parameters for the XGBoost model.
number_of_folds: The number of folds for cross-validation.
epsilon: A small value to prevent division by zero.
max_number_of_features: The maximum number of features to consider.
siso_ranking_size: The size of the SISO ranking.
siso_order: The order of the SISO ranking.
reset: A boolean indicating whether to reset the model.
xgb_importance: The importance type for XGBoost.
num_resets: The number of resets for the model.
fold_random_state: The random state for folds.
verbose: The verbosity level for logging.
stratification: A boolean indicating whether to use stratification. Only applicable to the c-index metric.
use_shap: A boolean indicating whether to use SHAP values.
collinearity_check: A boolean indicating whether to check for collinearity.
correlation_threshold: The correlation threshold above which features are considered collinear.
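The c-index mentioned in the stratification field is the concordance index commonly used in survival analysis. The base R sketch below shows a simplified version of Harrell's definition to make the metric concrete. The function c_index is hypothetical, it ignores tied event times, and the package may rely on a different implementation.

```r
# Hypothetical helper, not part of the SHAPBoost package.
# Simplified concordance index: among comparable pairs, the fraction in which
# the subject with the earlier observed event has the higher predicted risk.
# Tied risk scores count as half; tied times are not treated as comparable.
c_index <- function(time, status, risk) {
  concordant <- 0
  comparable <- 0
  n <- length(time)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # Comparable pair: subject i had an event strictly before time[j]
      if (status[i] == 1 && time[i] < time[j]) {
        comparable <- comparable + 1
        if (risk[i] > risk[j]) {
          concordant <- concordant + 1
        } else if (risk[i] == risk[j]) {
          concordant <- concordant + 0.5
        }
      }
    }
  }
  concordant / comparable
}

time   <- c(2, 4, 6, 8)
status <- c(1, 1, 0, 1)          # 1 = event observed, 0 = censored
risk   <- c(0.9, 0.7, 0.8, 0.1)  # higher score = higher predicted risk

print(c_index(time, status, risk))  # 0.8
```

In this toy example five pairs are comparable and four of them are ordered correctly, giving 0.8. A value of 0.5 corresponds to random ranking and 1 to perfect ranking.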
Examples
if (requireNamespace("survival", quietly = TRUE)) {
  shapboost <- SHAPBoostSurvival$new(
    max_number_of_features = 1,
    evaluator = "coxph",
    metric = "c-index",
    verbose = 0,
    xgb_params = list(
      objective = "survival:cox",
      eval_metric = "cox-nloglik"
    )
  )
  # Predictors: every column of gbsg except columns 1, 10, and 11
  X <- as.data.frame(survival::gbsg[, -c(1, 10, 11)])
  # Outcome: columns 10 and 11 of gbsg
  y <- as.data.frame(survival::gbsg[, c(10, 11)])
  subset <- shapboost$fit(X, y)
}