---
title: "Creating Baseline Characteristics Tables"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Creating Baseline Characteristics Tables}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  warning = FALSE,
  message = FALSE,
  error = TRUE
)
```

```{r setup}
library(clinpubr)
library(dplyr)
library(survival)
```

## Introduction

Baseline characteristics tables (Table 1) summarize patient demographics and clinical features at study entry. The `clinpubr` package automates the key decisions: variable type classification, normality assessment, statistical test selection, and missing data reporting.

## Loading and Preparing Data

We'll use the NCCTG Lung Cancer dataset from the `survival` package:

```{r load-data}
data(cancer, package = "survival")
str(cancer)
knitr::kable(head(cancer), caption = "Raw Data Preview")
```

Create derived variables for demonstration:

```{r prepare-data}
cancer$age_group <- cut(cancer$age,
  breaks = c(0, 50, 60, 70, 100),
  labels = c("<50", "50-60", "60-70", ">70")
)

# Combine sparse ECOG categories: the repeated ">=2" label merges levels 2 and 3
cancer$ph.ecog_cat <- factor(cancer$ph.ecog,
  levels = 0:3,
  labels = c("0", "1", ">=2", ">=2")
)

# Add missing values for demonstration
set.seed(123)
cancer$meal.cal[sample(1:nrow(cancer), 30)] <- NA
cancer$wt.loss[sample(1:nrow(cancer), 20)] <- NA

knitr::kable(head(cancer), caption = "Data After Preparation")
```

## Automatic Variable Type Determination

Before creating a baseline table, `get_var_types()` sorts variables into the following groups:

- **Factor variables**: Categorical (factors or numeric with few unique values)
- **Non-normal variables**: Continuous variables failing normality tests
- **Exact test variables**: Categorical variables with small cell counts (Fisher's exact test)
- **Omitted variables**: Variables with too many levels

```{r get-var-types}
var_types <- get_var_types(cancer, strata = "sex")
var_types
```

### Customizing Classification

Adjust thresholds for automatic classification:

```{r customize-var-types}
var_types_custom <- get_var_types(
  cancer,
  strata = "sex",
  num_to_factor = 10,        # Numeric vars with <=10 unique values treated as factor
  omit_factor_above = 15,    # Omit factors with >15 levels
  norm_test_by_group = TRUE  # Test normality within each stratum
)
var_types_custom
```

Save QQ plots for manual review of normality tests (optional):

```{r save-qqplots, eval = FALSE}
# var_types_with_plots <- get_var_types(
#   cancer, strata = "sex",
#   save_qqplots = TRUE, folder_name = "qqplots_review"
# )
```

## Creating Baseline Tables

### Basic Baseline Table

`baseline_table()` automatically selects summary statistics (mean/SD vs median/IQR) and statistical tests (t-test vs Mann-Whitney vs Chi-square vs Fisher):

```{r basic-baseline}
baseline_result <- baseline_table(
  cancer,
  var_types = var_types,
  save_table = FALSE
)
knitr::kable(baseline_result$baseline, caption = "Baseline Characteristics by Sex")
```

### Multi-Group Comparisons

With more than 2 groups, pairwise comparisons are automatically generated with optional multiple testing correction:

```{r multi-group}
# Reload the original data, then rebuild the three-level ECOG grouping
data(cancer, package = "survival")
cancer$ph.ecog_cat <- factor(cancer$ph.ecog,
  levels = 0:3,
  labels = c("0", "1", ">=2", ">=2")
)

var_types_ecog <- get_var_types(cancer, strata = "ph.ecog_cat")
baseline_multi <- baseline_table(
  cancer,
  var_types = var_types_ecog,
  save_table = FALSE,
  multiple_comparison_test = TRUE,
  p_adjust_method = "BH"
)

knitr::kable(baseline_multi$baseline, caption = "Baseline Characteristics by ECOG Status")
knitr::kable(baseline_multi$pairwise, caption = "Pairwise Comparison P-values")
```

### Customizing the Table

Select specific variables, add standardized mean differences (SMD), and handle missing strata:

```{r customize-baseline}
baseline_custom <- baseline_table(
  cancer,
  var_types = var_types,
  vars = c("age", "wt.loss", "meal.cal", "ph.ecog"),
  smd = TRUE,
  omit_missing_strata = TRUE,
  seed = 123
)
knitr::kable(baseline_custom$baseline, caption = "Customized Baseline Table")
```

### Missing Data Summary

```{r missing-table}
knitr::kable(baseline_result$missing, caption = "Missing Data Summary")
```

## Manual Override

Override automatic classification based on clinical knowledge or manual review:

```{r manual-override}
baseline_manual <- baseline_table(
  cancer,
  strata = "sex",
  factor_vars = c("ph.ecog", "pat.karno"),
  nonnormal_vars = c("age"),
  exact_vars = c("ph.ecog")
)
knitr::kable(baseline_manual$baseline, caption = "Baseline Table with Manual Overrides")
```

## Saving Results

Save all tables to CSV files:

```{r save-results, eval = FALSE}
# baseline_saved <- baseline_table(
#   cancer, var_types = var_types,
#   save_table = TRUE, filename = "baseline_characteristics.csv"
# )
```

## Complete Workflow

A streamlined 5-step workflow from data preparation to final table:

```{r complete-workflow}
# Step 1: Prepare data
data(cancer, package = "survival")
cancer_clean <- cancer %>%
  mutate(
    age_group = cut(age,
      breaks = c(0, 50, 60, 70, 100),
      labels = c("<50", "50-60", "60-70", ">70")
    ),
    ph.ecog_cat = factor(ph.ecog,
      levels = 0:3,
      labels = c("0", "1", ">=2", ">=2")
    ),
    sex = factor(sex, labels = c("Male", "Female"))
  )

# Step 2: Determine variable types
var_types <- get_var_types(cancer_clean, strata = "sex", num_to_factor = 5)

# Step 3: Review classification
knitr::kable(data.frame(
  Variable_Type = c("Factor", "Non-normal", "Exact"),
  Variables = c(
    paste(var_types$factor_vars, collapse = ", "),
    paste(var_types$nonnormal_vars, collapse = ", "),
    paste(var_types$exact_vars, collapse = ", ")
  )
), caption = "Variable Type Review")

# Step 4: Create baseline table
baseline_final <- baseline_table(
  cancer_clean,
  var_types = var_types,
  smd = TRUE
)

# Step 5: Review results
knitr::kable(baseline_final$baseline, caption = "Final Baseline Characteristics Table")
knitr::kable(baseline_final$missing, caption = "Final Missing Data Summary")
```

## Summary

### Key Functions

- **`get_var_types()`**: Automatic variable type determination with customizable thresholds
- **`baseline_table()`**: Create comprehensive baseline tables with automatic test selection

### Best Practices

1. **Review automatic classifications**: clinical knowledge should override statistical defaults when appropriate
2. **Include SMD for observational studies**: standardized mean differences help assess group balance
3. **Handle missing data transparently**: report missing patterns in your tables
4. **Use BH correction** for multi-group pairwise comparisons
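
To make the second and fourth practices more concrete, here is a minimal base R sketch of what a Benjamini-Hochberg (BH) adjustment and a standardized mean difference look like when computed by hand. It uses only `p.adjust()`, `mean()`, and `var()`. The p-values and vectors are invented for illustration, and `smd_continuous()` is a hypothetical helper; `clinpubr` may compute its SMD differently, so treat this as a way to build intuition rather than a description of the package internals.

```r
# Benjamini-Hochberg adjustment with base R's p.adjust().
# The raw p-values are made up for illustration; they are not results
# from the lung cancer data.
p_raw <- c("0 vs 1" = 0.010, "0 vs >=2" = 0.040, "1 vs >=2" = 0.030)
p_bh <- p.adjust(p_raw, method = "BH")
print(p_bh)

# Standardized mean difference for a continuous variable in two groups,
# using the square root of the average of the two group variances as the
# denominator. This helper is hypothetical and may differ from what
# baseline_table(smd = TRUE) computes internally.
smd_continuous <- function(x1, x2) {
  pooled_sd <- sqrt((var(x1, na.rm = TRUE) + var(x2, na.rm = TRUE)) / 2)
  abs(mean(x1, na.rm = TRUE) - mean(x2, na.rm = TRUE)) / pooled_sd
}

# Toy vectors (not study data): group means 2 and 3, both variances 1
smd_continuous(c(1, 2, 3), c(2, 3, 4))
```

BH controls the false discovery rate and never makes a p-value smaller, so an adjusted value can only stay the same or move upward. For the SMD, a value below roughly 0.1 is often read as negligible imbalance, though that cutoff is a convention rather than a rule.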