--- title: "rSDR vignette" output: rmarkdown::html_vignette author: "Sheau-Chiann Chen, Shilin Zhao and Hsin-Hsiung Bill Huang" vignette: > %\VignetteIndexEntry{rSDR_vignette} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # Introduction This R Markdown document demonstrates the usage of the `rSDR` package for Robust Sufficient Dimension Reduction (rSDR) using alpha-distance covariance and Stiefel manifold learning for estimation (Hsin-Hsiung Huang, Feng Yu & Teng Zhang, 2024). We apply the method to `ionosphere` dataset, including cross-validation, bootstrap methods for optimal alpha selection, and visualization of results for both 2 and 3 dimensions. # Setup - Load all packages required by rSDR package. ```{r setup, include=FALSE, echo=FALSE} library(ManifoldOptim) library(Matrix) library(rstiefel) library(RcppNumerical) library(expm) library(ggplot2) library(future) #parallel library(future.apply) #parallel library(fdm2id) #ionosphere dataset ``` ```{r, echo=TRUE} # # # Load rSDR package # if (!requireNamespace("rSDR", quietly = TRUE)) { # if (!requireNamespace("devtools", quietly = TRUE)){ install.packages("devtools") # } # library(devtools) # devtools::install_github("scchen1/rSDR") # # or # # #a local zip file (`rSDR-main.zip`) downloaded from GitHub # #devtools::install_local(path='yourpath/rSDR-main.zip') # } library(rSDR) ``` # Data Preparation We use the `ionosphere` dataset from the `fdm2id` package. The dataset has 33 predictor variables (V1-V33) and one response variable (V35, ‘g’ for good or ‘b’ for bad). The data include a response variable `Y` with nx1 matrix and predictors `X` with nxp matrix, where n is the number of observation and p is the number of variable. ```{r ionosphere, echo=TRUE} #Load ionosphere data in fdm2id package utils::data("ionosphere", package = "fdm2id") X<-as.matrix(ionosphere[,c(1:33)]) Y<-ifelse(ionosphere[,34]=='b',0,1) # Y will be used for rSDR Y<-matrix(Y,length(Y),1) #Y.factor will be used in plot_rSDR Y.factor<-factor(ionosphere$V35,levels=c('b','g'),labels=c('Bad','Good')) ``` # Robust Sufficient Dimension Reduction - Robust dimension reduction with optimal alpha + beta: estimated SDR directions, i.e. reduced p x d matrix. + result: ManifoldOptim result + projected_data: This projects X onto the estimated SDR directions (X %*% beta), the projected matrix (data) n x d. + f_value: the optimal cost function ## Example: 3-Dimensional Reduction ```{r, echo=TRUE} set.seed(2435) sdr_result<-rSDR(X=X, Y=Y, d=3, alpha=0.3,maxiter=1000,tol=1e-7) # An optimal of C is obtained by maximizing the target function using # ManifoldOptim method head(sdr_result$C_value) #The value of cost function f is equal to the negative of the target function. sdr_result$f_value #beta=sigma_x^(-0.5)%*%C head(sdr_result$beta) #projected_data: This projects X onto the estimated SDR directions (X %*% beta), #the projected matrix (data) n x d head(sdr_result$projected_data) ``` ## Visualizing the 3D Projected Data ```{r, echo=TRUE,fig.height=3,fig.width=4, fig.cap='Plot for 3-dimensional reduction using rSDR'} #Plot for 3-dimensional reduction using rSDR plot_rSDR(projected_data=sdr_result$projected_data,Y=Y.factor,Y.name='group',colors=c("#374E55FF", "#DF8F44FF")) ``` ## Cross-Validation for Optimal Alpha (d=3) - opt.alpha: The optimal alpha that gives minimum mean of cost (target) value (f_test.mean) - f_test.mean: The mean of cost function value by the alpha sequence - a vector of length length(alpha.v). - f_test.sd: The standard deviation of cost function value by the alpha sequence - a vector of length length(alpha.v). - f_test: A kfolds x length(alpha.v) matrix. The cost function value for each fold at a given alpha. - d: The number of reduced dimensions. The value of d as passed to optimal_alpha_cv. - kfolds: The value of kfolds as passed to optimal_alpha_cv. ```{r, echo=TRUE} # plan(multisession) will launch parallel workers running in the background # to save running time. To shut down background workers launched this way, call # plan(sequential) # use all local cores except one # future::plan(future::multisession, workers = future::availableCores() - 1) # use 2 cores for parallel future::plan("multisession", workers = 2) set.seed(2435) opt_results<-optimal_alpha_cv(alpha.v=c(0.3, 0.5, 0.7),X=X,Y=Y,d=3,kfolds=10) #opt_results<-optimal_alpha_cv(alpha.v=c(0.3, 0.4, 0.5, 0.6, 0.7),X=X,Y=Y,d=3,kfolds=10) # Optimal alpha using cross validation opt_results$opt.alpha # The cost function values by alpha (column) for each cross validation (row) head(opt_results$f_test) # The mean of cost function value by alpha opt_results$f_test.mean # The standard deviation of cost function value by alpha opt_results$f_test.sd # The reduced dimension opt_results$d # Number of folds opt_results$kfolds ``` ```{r, fig.height=3,fig.width=3,echo=TRUE,fig.cap='Mean and standard deviation of cost function (k-fold cross-validation method)'} plot_alpha(opt_results=opt_results) ``` ## Bootstrap for Optimal Alpha (d=3) - opt.alpha: the optimal alpha that gives minimum mean of cost (target) value (f_test.mean) - f_test.mean: The mean of cost function value by the alpha sequence - a vector of length length(alpha.v). - f_test.sd: The standard deviation of cost function value by the alpha sequence - a vector of length length(alpha.v). - f_test: A R x length(alpha.v) matrix. The cost function value for each replicates at a given alpha. - d: The number of reduced dimensions. The value of d as passed to optimal_alpha_cv. - R: The value of R as passed to optimal_alpha_boot. the number of bootstrap replicates. The default is 50. ```{r,echo=TRUE} # plan(multisession) will launch parallel workers running in the background # to save running time. To shut down background workers launched this way, call # plan(sequential) # use all local cores except one # future::plan(future::multisession, workers = future::availableCores() - 1) # use 2 cores for parallel future::plan("multisession", workers = 2) set.seed(2435) opt_results<-optimal_alpha_boot(alpha.v=c(0.3,0.5, 0.7),X=X,Y=Y,d=3,R=10) #opt_results<-optimal_alpha_boot(alpha.v=c(0.3,0.4,0.5,0.6, 0.7),X=X,Y=Y,d=3,R=50) # Optimal alpha using bootstrap method opt_results$opt.alpha # The cost function values by alpha (column) for R bootstrap replicates (row) head(opt_results$f_test) # The mean of cost function value by alpha opt_results$f_test.mean # The standard deviation of cost function value by alpha opt_results$f_test.sd # The reduced dimension opt_results$d # Number of bootstrap replicates opt_results$R ``` ```{r,fig.height=3,fig.width=3,fig.cap='Mean and standard deviation of cost function (bootstrap method)'} plot_alpha(opt_results=opt_results) ``` ## Example: 2-Dimensional Reduction and Plotting ```{r} set.seed(2435) sdr_result<-rSDR(X=X, Y=Y, d=2, alpha=0.3,maxiter=1000,tol=1e-7) # An optimal of C is obtained by maximizing the target function using ManifoldOptim method head(sdr_result$C_value) #The value of cost function f is equal to the negative of the target function. sdr_result$f_value #beta=sigma_x^(-0.5)%*%C head(sdr_result$beta) #projected_data: This projects X onto the estimated SDR directions (X %*% beta), the projected matrix (data) n x d head(sdr_result$projected_data) ``` ```{r} #Plot for 2-dimensional reduction using rSDR plot_rSDR(projected_data=sdr_result$projected_data,Y=Y.factor,Y.name='group',colors=c("#374E55FF", "#DF8F44FF")) ``` # Conclusion This document tested the core functionality of the `rSDR` package, including: - Performing Robust Sufficient Dimension Reduction with `rSDR`. - Selecting the optimal alpha using cross-validation (`optimal_alpha_cv`) and bootstrap (`optimal_alpha_boot`). - Visualizing the results with `plot_alpha`, and `plot_rSDR`. The package successfully processes the ionosphere dataset, optimizes the alpha parameter, and produces meaningful visualizations of the reduced dimensions. # Reference Hsin-Hsiung Huang, Feng Yu & Teng Zhang (19 Feb 2024): Robust sufficient dimension reduction via α-distance covariance, Journal of Nonparametric Statistics, DOI:10.1080/10485252.2024.2313137