Type: Package
Title: Information Assessment for Individual Modalities in Multimodal Regression Models
Version: 1.0
Description: Provides methods for quantifying the information gain contributed by individual modalities in multimodal regression models. Information gain is measured using Expected Relative Entropy (ERE) or pseudo-R² metrics, with corresponding p-values and confidence intervals. Currently supports linear and logistic regression models, with planned extensions to other generalized linear models and the Cox proportional hazards model.
License: GPL-3
Encoding: UTF-8
RoxygenNote: 7.3.2
Imports: tidyverse, MASS, SIS, glmnet, ncvreg, MBESS, survival, dplyr
Depends: R (≥ 3.6.0)
NeedsCompilation: no
Packaged: 2025-08-29 20:33:59 UTC; 10518
Author: Wanting Jin [aut, cre], Quefeng Li [aut]
Maintainer: Wanting Jin <jinwanting5@gmail.com>
LazyData: true
Repository: CRAN
Date/Publication: 2025-09-03 21:30:02 UTC

Example Dataset

Description

A toy dataset demonstrating how to use this package with multimodal linear models.

Usage

data_linear_model

Format

A data object with the following components:

y

A numeric vector of 200 continuous outcomes.

X

A 200 × 600 matrix of features from all modalities (the training data).

mod.idx

A list of column indices of X, one element per modality.
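To make the layout of these components concrete, the sketch below builds a toy object of the same shape in base R. The three equal-sized modalities and the coefficients are assumptions for illustration only; the actual modality split in data_linear_model is not stated here.

```r
# Hypothetical toy object with the same layout as data_linear_model.
# Assumption: three modalities of 200 columns each (the real split may differ).
set.seed(1)
n <- 200
p <- 600
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# mod.idx: one integer vector of column indices of X per modality
mod.idx <- split(seq_len(p), rep(1:3, each = p / 3))

# Continuous outcome driven by a few features in modalities 1 and 2 only
beta <- numeric(p)
beta[c(1:3, 201:203)] <- 1
y <- as.vector(X %*% beta + rnorm(n))

toy_linear <- list(y = y, X = X, mod.idx = mod.idx)
str(toy_linear$mod.idx)
```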


Example Dataset

Description

A toy dataset demonstrating how to use this package with multimodal logistic models.

Usage

data_logistic_model

Format

A data object with the following components:

y

A vector of 200 binary outcomes, coded 0 or 1.

X

A 200 × 600 matrix of features from all modalities (the training data).

mod.idx

A list of column indices of X, one element per modality.
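Before calling mglm.test on your own data, it can help to check that the inputs follow this layout. The helper check_multimodal_data below is hypothetical (it is not part of the package) and assumes that modalities are disjoint column blocks of X; the toy binary outcome is generated through the logit link under an assumed three-modality split.

```r
# Hypothetical helper (not part of the package). Assumes modalities are
# disjoint column blocks of X and that a binary y is coded 0/1.
check_multimodal_data <- function(X, y, mod.idx) {
  idx <- unlist(mod.idx, use.names = FALSE)
  stopifnot(is.list(mod.idx), nrow(X) == length(y))
  if (any(idx < 1 | idx > ncol(X))) stop("mod.idx contains out-of-range columns")
  if (anyDuplicated(idx) > 0) stop("modalities overlap")
  if (!all(y %in% c(0, 1))) stop("binary outcome must be coded 0 or 1")
  invisible(TRUE)
}

# Toy binary outcome generated through the logit (canonical) link
set.seed(2)
n <- 200
p <- 600
X <- matrix(rnorm(n * p), n, p)
mod.idx <- split(seq_len(p), rep(1:3, each = p / 3))  # assumed split
eta <- X[, 1] - X[, 201]
y <- rbinom(n, size = 1, prob = plogis(eta))

check_multimodal_data(X, y, mod.idx)
table(y)
```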


Modality Assessment in Multimodal Generalized Linear Models

Description

Provides statistical inference for the information gain contributed by each modality in a multimodal GLM. Estimates the ERE and pseudo-R² of each modality, with confidence intervals and p-values, using (iterative) Sure Independence Screening ((I)SIS) for variable selection and penalized likelihood for inference.
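The screening step named above can be illustrated for a Gaussian response with a short base-R sketch of one-step Sure Independence Screening: rank features by absolute marginal correlation with y and keep the top d. The screening size d = floor(n / log(n)) and the function name sis_screen are assumptions; this is not the package's implementation, which lists the SIS package among its imports and also handles iterative SIS and the binomial family.

```r
# Minimal sketch of one-step SIS (Fan and Lv, 2008) for a Gaussian response.
# Hypothetical helper; not the package's code.
sis_screen <- function(X, y, d = floor(nrow(X) / log(nrow(X)))) {
  marg <- abs(drop(cor(X, y)))   # one absolute marginal correlation per column
  order(marg, decreasing = TRUE)[seq_len(min(d, ncol(X)))]
}

set.seed(3)
n <- 200
p <- 600
X <- matrix(rnorm(n * p), n, p)
y <- 2 * X[, 5] - 2 * X[, 400] + rnorm(n)

keep <- sis_screen(X, y)
length(keep)        # floor(200 / log(200)) = 37
c(5, 400) %in% keep
```

Screening to d features before the penalized fit keeps the subsequent likelihood problem low-dimensional; iterative SIS repeats the ranking on residual information to recover features that are jointly but not marginally important.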

Usage

mglm.test(
  X,
  y,
  mod.idx,
  family = c("gaussian", "binomial"),
  iter = TRUE,
  penalty = c("SCAD", "MCP", "lasso"),
  tune = c("bic", "ebic", "aic"),
  lambda = NULL,
  nlambda = 100,
  conf.level = 0.95,
  CI.type = c("two.sided", "one.sided"),
  trace = FALSE
)

Arguments

X

The n × p data matrix consisting of features from all modalities.

y

The n × 1 response vector.

mod.idx

A list with one element per modality, each giving the column indices of that modality's features in the concatenated data matrix X.

family

The error distribution and link function of the model. Currently, only the Gaussian ("gaussian") and binomial ("binomial") families with their canonical links (identity and logit, respectively) are supported.

iter

Logical; whether to perform iterative SIS (ISIS) rather than one-step SIS. The default is iter=TRUE.

penalty

The type of penalty used in the variable selection and inference procedures. Options are 'SCAD', 'MCP', and 'lasso'. The default is penalty='SCAD'.

tune

The criterion used to select the tuning parameters in the (I)SIS and penalized likelihood procedures. Options are 'bic', 'ebic', and 'aic'. The default is tune='bic'.

lambda

An optional user-specified decreasing sequence of lambda values for the penalized likelihood procedure. If lambda=NULL (the default), a sequence of nlambda values, equally spaced on the log scale, is computed automatically.

nlambda

The number of lambda values. The default is 100.

conf.level

The confidence level of the intervals. The default is conf.level=0.95.

CI.type

A string specifying whether the confidence intervals are two-sided ('two.sided') or one-sided ('one.sided'). The default is CI.type='two.sided'.

trace

Logical; whether to print logs of the iterations in the (I)SIS procedure. The default is trace=FALSE.
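For reference, the three options of the penalty argument can be written as functions of a single coefficient. The sketch below uses the standard SCAD (Fan and Li, 2001) and MCP (Zhang, 2010) forms; the concavity parameters a = 3.7 and gamma = 3 are assumptions (they are the ncvreg defaults), since this manual does not state the values mglm.test uses. The function names are hypothetical.

```r
# Penalty functions evaluated at coefficient value b (hypothetical helpers).
lasso_pen <- function(b, lambda) lambda * abs(b)

# SCAD: linear near zero, quadratic transition, constant beyond a * lambda
scad_pen <- function(b, lambda, a = 3.7) {
  b <- abs(b)
  ifelse(b <= lambda, lambda * b,
         ifelse(b <= a * lambda,
                (2 * a * lambda * b - b^2 - lambda^2) / (2 * (a - 1)),
                lambda^2 * (a + 1) / 2))
}

# MCP: penalty rate decreases linearly to zero at gamma * lambda
mcp_pen <- function(b, lambda, gamma = 3) {
  b <- abs(b)
  ifelse(b <= gamma * lambda, lambda * b - b^2 / (2 * gamma),
         gamma * lambda^2 / 2)
}

coefs <- c(0.5, 1, 2, 5)
rbind(lasso = lasso_pen(coefs, 1),
      SCAD  = scad_pen(coefs, 1),
      MCP   = mcp_pen(coefs, 1))
```

Unlike the lasso, SCAD and MCP level off for large coefficients, so large effects are shrunk less; this is the usual motivation for preferring them when the fitted model is used for inference.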
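The lambda, nlambda and tune arguments interact: a grid of lambda values is fitted and the criterion picks one. The sketch below builds a decreasing grid equally spaced on the log scale and scores a fit by AIC, BIC or EBIC. Several details are assumptions, not taken from the package: standardized columns, lambda.max = max|X'(y - mean(y))| / n (the smallest lambda at which all coefficients are zero for the objective (1/(2n))||y - Xb||² + penalty), lambda.min.ratio = 0.05, and the EBIC of Chen and Chen (2008) with gamma = 1. Both function names are hypothetical.

```r
# Hypothetical log-spaced lambda grid (assumed defaults; see lead-in).
lambda_grid <- function(X, y, nlambda = 100, lambda.min.ratio = 0.05) {
  Xs <- scale(X)
  lambda.max <- max(abs(crossprod(Xs, y - mean(y)))) / nrow(X)
  # nlambda values, decreasing and equally spaced on the log scale
  exp(seq(log(lambda.max), log(lambda.min.ratio * lambda.max),
          length.out = nlambda))
}

# dev: deviance of a fit; df: number of nonzero coefficients;
# n: sample size; p: number of candidate features.
info_criterion <- function(dev, df, n, p, tune = c("bic", "ebic", "aic"),
                           gamma = 1) {
  tune <- match.arg(tune)
  switch(tune,
         aic  = dev + 2 * df,
         bic  = dev + log(n) * df,
         ebic = dev + log(n) * df + 2 * gamma * lchoose(p, df))
}

set.seed(4)
X <- matrix(rnorm(50 * 10), 50, 10)
y <- X[, 1] + rnorm(50)
lam <- lambda_grid(X, y, nlambda = 20)
lam[c(1, 20)]
info_criterion(dev = 45, df = 3, n = 50, p = 10, tune = "ebic")
```

Selection keeps the lambda whose fit minimizes the chosen criterion. Because lchoose(p, df) > 0 whenever 0 < df < p, EBIC penalizes model size more heavily than BIC, which matters when p is large relative to n.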

Value

An object with S3 class "mglm.test" containing:

sel.idx

A list giving, for each modality, the indices of the features selected by (I)SIS.

num.nonzeros

The number of features selected by (I)SIS in each modality.

ERE

Point estimate of the ERE for each modality.

ERE.CI.L

Lower bound of the confidence interval for the ERE of each modality.

ERE.CI.U

Upper bound of the confidence interval for the ERE of each modality.

R2

Point estimate of the pseudo-R² for each modality.

R2.CI.L

Lower bound of the confidence interval for the pseudo-R² of each modality.

R2.CI.U

Upper bound of the confidence interval for the pseudo-R² of each modality.

conf.level

The confidence level of the intervals.
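The conf.level and CI.type arguments determine the critical value behind bounds such as ERE.CI.L and ERE.CI.U. As an assumption for illustration, the sketch below uses a normal-approximation (Wald-type) interval and treats a one-sided interval as a lower confidence bound with an infinite upper bound; the intervals returned by mglm.test may be constructed differently. The function wald_ci is hypothetical.

```r
# Hypothetical Wald-type interval: estimate +/- z * se.
wald_ci <- function(est, se, conf.level = 0.95,
                    CI.type = c("two.sided", "one.sided")) {
  CI.type <- match.arg(CI.type)
  alpha <- 1 - conf.level
  if (CI.type == "two.sided") {
    z <- qnorm(1 - alpha / 2)           # about 1.96 for conf.level = 0.95
    c(lower = est - z * se, upper = est + z * se)
  } else {
    z <- qnorm(1 - alpha)               # about 1.645 for conf.level = 0.95
    c(lower = est - z * se, upper = Inf)
  }
}

wald_ci(0.3, 0.05)
wald_ci(0.3, 0.05, CI.type = "one.sided")
```

At the same conf.level, the one-sided lower bound is higher than the two-sided one because all of alpha is placed in a single tail.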

Examples

## Example 1: Linear model
data(data_linear_model)
X <- data_linear_model$X
y <- data_linear_model$y
mod.idx <- data_linear_model$mod.idx
test <- mglm.test(X = X, y = y, mod.idx = mod.idx, family = "gaussian",
               iter = TRUE, penalty = "SCAD", tune = "bic",
               conf.level = 0.95, CI.type = "one.sided")
summary(test)


## Example 2: Logistic regression
data(data_logistic_model)
X <- data_logistic_model$X
y <- data_logistic_model$y
mod.idx <- data_logistic_model$mod.idx
test <- mglm.test(X = X, y = y, mod.idx = mod.idx, family = "binomial",
               iter = TRUE, penalty = "SCAD", tune = "bic",
               conf.level = 0.95, CI.type = "two.sided")
sum.test <- summary(test)




Summary method for objects of class "mglm.test"

Description

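Summarizes an object of class "mglm.test": tabulates the point estimates and confidence intervals of the ERE and pseudo-R² for each modality and identifies the most informative modality.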

Usage

## S3 method for class 'mglm.test'
summary(object, ...)

## S3 method for class 'summary.mglm.test'
print(x, ...)

Arguments

object

An mglm.test object.

...

Additional arguments passed to summary.mglm.test.

x

A summary.mglm.test object.

Value

An object of S3 class "summary.mglm.test": a list with the following elements. The class has its own print method.

sum.ERE

A table of the point estimate and confidence interval of the ERE for each modality.

sum.R2

A table of the point estimate and confidence interval of the pseudo-R² for each modality.

conf.level

The confidence level of the intervals.

sel.mod

Index of the most informative modality.
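The summary/print arrangement described above is the standard S3 pattern: summary() returns an object of a new class, and auto-printing dispatches to that class's print method. The sketch below reproduces the pattern for a hypothetical class "toy.test" so that it runs without the package; the numbers are placeholders, not package output. It assumes, for illustration only, that the most informative modality is the one with the largest ERE point estimate.

```r
# Hypothetical summary method for a toy class (not the package's code).
summary.toy.test <- function(object, ...) {
  sum.ERE <- data.frame(modality = seq_along(object$ERE),
                        estimate = object$ERE,
                        lower = object$ERE.CI.L,
                        upper = object$ERE.CI.U)
  structure(list(sum.ERE = sum.ERE,
                 conf.level = object$conf.level,
                 sel.mod = which.max(object$ERE)),  # assumed selection rule
            class = "summary.toy.test")
}

# Print method used when a summary.toy.test object is auto-printed
print.summary.toy.test <- function(x, ...) {
  cat(sprintf("ERE estimates with %g%% confidence intervals:\n",
              100 * x$conf.level))
  print(x$sum.ERE, row.names = FALSE)
  cat("Most informative modality:", x$sel.mod, "\n")
  invisible(x)
}

# Placeholder values for illustration only (not package output)
obj <- structure(list(ERE = c(0.10, 0.40, 0.02),
                      ERE.CI.L = c(0.05, 0.30, 0.00),
                      ERE.CI.U = c(0.15, 0.50, 0.06),
                      conf.level = 0.95),
                 class = "toy.test")

summary(obj)  # dispatches to summary.toy.test, then print.summary.toy.test
```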