The pridit package implements the PRIDIT (Principal Component Analysis applied to RIDITs) methodology, a powerful technique for analyzing ordinal data and detecting patterns in multivariate datasets. This vignette provides a comprehensive introduction to the methodology and demonstrates its application using the package functions.
PRIDIT combines two statistical techniques:
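Ridit Analysis: Originally developed by Bross (1958), ridit analysis replaces each value of an ordinal variable with a score derived from that variable's empirical distribution. Bross's ridits lie between 0 and 1; the scores produced by this package, as the outputs below show, use a rescaled variant that lies between -1 and 1.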
Principal Component Analysis (PCA): Applied to the ridit scores to identify the most important underlying factors and create composite scores.
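The resulting PRIDIT scores provide a single measure that captures the largest share of the common variation in your data. This makes them useful for combining several related indicators into one composite score and for comparing performance across time periods, as the examples below illustrate.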
The PRIDIT process involves three main steps:
Ridit scores transform your raw data into a standardized form based on the empirical distribution of each variable. For each observation and variable, the ridit score is the proportion of observations with a lower value minus the proportion with a higher value. It therefore lies between -1 and 1, and values near the middle of the distribution score close to 0.
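As a minimal base-R sketch of this step, the hypothetical helper `brockett_ridit()` below (not a pridit function) computes each value's share of observations below it minus its share above it. Applied to the `Smoking_cessation` column from the example below, it reproduces the printed ridit scores. Rescaling with `(r + 1) / 2` recovers Bross's original 0-to-1 ridit, the proportion below plus half the proportion tied.

```r
# Hypothetical helper, not exported by pridit:
# ridit = P(value below x) - P(value above x), estimated from the sample
brockett_ridit <- function(x) {
  vapply(x, function(v) mean(x < v) - mean(x > v), numeric(1))
}

smoking <- c(0.9, 0.85, 0.89, 1.0, 0.89)  # Smoking_cessation in the example
r <- brockett_ridit(smoking)
print(r)            # 0.4 -0.8 -0.2 0.8 -0.2

# Bross (1958) ridit on the 0-1 scale: P(below) + 0.5 * P(tie)
print((r + 1) / 2)  # 0.7 0.1 0.4 0.9 0.4
```

Tied values (the two hospitals at 0.89) receive the same ridit, which is why the helper compares against strict inequalities on both sides.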
Using Principal Component Analysis on the ridit scores, we identify the linear combination of variables that explains the most variance in the data. The weights represent the importance of each variable in this optimal combination.
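A minimal base-R sketch of this step follows. It rests on an assumption inferred from the numbers printed in the healthcare example below, not from the package source. The assumption is that each weight is the loading of a variable on the first principal component of the correlation matrix of the ridit scores, that is, the leading eigenvector scaled by the square root of its eigenvalue. The ridit matrix is copied from that example's output.

```r
# Ridit scores copied from the printed output of ridit(healthcare_data)
B <- cbind(
  Smoking_cessation = c(0.4, -0.8, -0.2, 0.8, -0.2),
  ACE_Inhibitor     = c(0.4, -0.4, -0.8, 0.8,  0.0),
  Proper_Antibiotic = c(0.6, -0.2, -0.8, 0.6, -0.2)
)

# First principal component of the correlation matrix of the ridits
eig <- eigen(cor(B), symmetric = TRUE)
v1 <- eig$vectors[, 1]
v1 <- v1 * sign(sum(v1))       # eigenvector sign is arbitrary; orient it positively
w <- sqrt(eig$values[1]) * v1  # loadings = sqrt(eigenvalue) * eigenvector
names(w) <- colnames(B)
print(round(w, 4))  # close to the printed weights 0.8975 0.9809 0.9498
```

Under this reading each weight is a correlation, so it cannot exceed 1 in absolute value. The sign step is needed because `eigen()` may return either orientation of an eigenvector.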
The final PRIDIT scores are computed by applying the weights to the ridit scores, resulting in a single score for each observation that ranges from -1 to 1.
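The scoring step can be sketched the same way. Again, this is an inference from the printed numbers rather than a description of the package internals. Each ridit column is scaled to unit Euclidean length, the scaled columns are combined with the weights, and the result is divided by the sum of squared weights. Under the loading interpretation of the weights, that sum equals the first eigenvalue. With the ridits and weights printed in the healthcare example as inputs, the sketch reproduces the printed PRIDIT scores to four decimals.

```r
# Ridit scores and weights copied from the printed healthcare example
B <- cbind(
  Smoking_cessation = c(0.4, -0.8, -0.2, 0.8, -0.2),
  ACE_Inhibitor     = c(0.4, -0.4, -0.8, 0.8,  0.0),
  Proper_Antibiotic = c(0.6, -0.2, -0.8, 0.6, -0.2)
)
w <- c(0.8974684, 0.9808691, 0.9497501)

# Scale each ridit column to unit Euclidean length
B_unit <- sweep(B, 2, sqrt(colSums(B^2)), "/")

# Weighted combination, normalized by the sum of squared weights
score <- as.vector(B_unit %*% w) / sum(w^2)
names(score) <- c("A", "B", "C", "D", "E")
print(round(score, 4))  # close to 0.4031 -0.3936 -0.5241 0.6284 -0.1138
```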
The pridit package provides three main functions:
ridit(): Calculates ridit scores for your data
PRIDITweight(): Computes PRIDIT weights using PCA
PRIDITscore(): Calculates final PRIDIT scores
Let’s start with a simple example using healthcare quality data:
library(pridit)
# Create sample healthcare quality data
healthcare_data <- data.frame(
  Hospital_ID = c("A", "B", "C", "D", "E"),
  Smoking_cessation = c(0.9, 0.85, 0.89, 1.0, 0.89),
  ACE_Inhibitor = c(0.99, 0.92, 0.90, 1.0, 0.93),
  Proper_Antibiotic = c(1.0, 0.99, 0.98, 1.0, 0.99)
)
print(healthcare_data)
#>   Hospital_ID Smoking_cessation ACE_Inhibitor Proper_Antibiotic
#> 1           A              0.90          0.99              1.00
#> 2           B              0.85          0.92              0.99
#> 3           C              0.89          0.90              0.98
#> 4           D              1.00          1.00              1.00
#> 5           E              0.89          0.93              0.99
# Calculate ridit scores
ridit_scores <- ridit(healthcare_data)
print(ridit_scores)
#>   Claim.ID Smoking_cessation ACE_Inhibitor Proper_Antibiotic
#> 1        A               0.4           0.4               0.6
#> 2        B              -0.8          -0.4              -0.2
#> 3        C              -0.2          -0.8              -0.8
#> 4        D               0.8           0.8               0.6
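#> 5        E              -0.2           0.0              -0.2
The ridit scores show how each hospital performs relative to the others on each quality measure. A value close to 1 means the hospital outperforms most of the others on that measure, while a value close to -1 means most of the others outperform it. Note that ridit() treats the first column as an identifier and labels it Claim.ID in its output, whatever that column was called in the input.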
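Because a ridit depends only on how the values are ordered, any strictly increasing transformation of a column leaves its ridits unchanged. Reversing the direction, for example with `1 - x` as in the hospital example later, simply flips their sign. The short check below uses a hypothetical helper `ridit_vec()` rather than the package's `ridit()`:

```r
# Hypothetical helper (not part of pridit): P(below) - P(above) for each value
ridit_vec <- function(x) {
  vapply(x, function(v) mean(x < v) - mean(x > v), numeric(1))
}

ace <- c(0.99, 0.92, 0.90, 1.0, 0.93)  # ACE_Inhibitor column from the example
r <- ridit_vec(ace)
print(r)  # 0.4 -0.4 -0.8 0.8 0.0, matching the printed ridits

# A strictly increasing transform preserves the ordering, so ridits are unchanged
stopifnot(isTRUE(all.equal(ridit_vec(log(ace)), r)))
# Reversing the direction flips the sign
stopifnot(isTRUE(all.equal(ridit_vec(1 - ace), -r)))
```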
# Calculate PRIDIT weights
weights <- PRIDITweight(ridit_scores)
print(weights)
#> Smoking_cessation     ACE_Inhibitor Proper_Antibiotic 
#>         0.8974684         0.9808691         0.9497501
The weights tell us the relative importance of each variable in the overall quality assessment. Variables with larger absolute weights contribute more to the final score.
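One way to read these weights is inferred from the printed numbers rather than documented by the package. Each weight matches, up to rounding, the correlation between that variable's ridit column and the final PRIDIT score shown in the next step. A weight near 1 therefore means the variable moves almost in lockstep with the composite. Using the printed values:

```r
# Printed ridit scores and final PRIDIT scores from this example
B <- cbind(
  Smoking_cessation = c(0.4, -0.8, -0.2, 0.8, -0.2),
  ACE_Inhibitor     = c(0.4, -0.4, -0.8, 0.8,  0.0),
  Proper_Antibiotic = c(0.6, -0.2, -0.8, 0.6, -0.2)
)
pridit_score <- c(0.4031461, -0.3936292, -0.5240944, 0.6284083, -0.1138308)

# Correlation of each ridit column with the composite score
r_with_score <- cor(B, pridit_score)[, 1]
print(round(r_with_score, 4))  # close to the printed weights 0.8975 0.9809 0.9498
```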
# Calculate final PRIDIT scores
final_scores <- PRIDITscore(ridit_scores, healthcare_data$Hospital_ID, weights)
print(final_scores)
#>   Claim.ID PRIDITscore
#> 1        A   0.4031461
#> 2        B  -0.3936292
#> 3        C  -0.5240944
#> 4        D   0.6284083
#> 5        E  -0.1138308
The final PRIDIT scores provide a single quality measure for each hospital. Positive scores indicate above-average quality, while negative scores indicate below-average quality.
The package includes a test dataset that you can use to explore the functionality:
# Load the test dataset
data(test)
print(test)
#>   ID Smoking_cessation ACE_Inhibitor Proper_Antibiotic
#> 1  A              0.90          0.99              1.00
#> 2  B              0.85          0.92              0.99
#> 3  C              0.89          0.90              0.98
#> 4  D              1.00          1.00              1.00
#> 5  E              0.89          0.93              0.99
# Run the complete analysis
ridit_result <- ridit(test)
weights <- PRIDITweight(ridit_result)
final_scores <- PRIDITscore(ridit_result, test$ID, weights)
print(final_scores)
#>   Claim.ID PRIDITscore
#> 1        A   0.4031461
#> 2        B  -0.3936292
#> 3        C  -0.5240944
#> 4        D   0.6284083
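#> 5        E  -0.1138308
PRIDIT scores range from -1 to 1. Their sign shows whether an observation lies above (positive) or below (negative) the center of the sample, and their magnitude shows how far from that center it lies.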
Because the scores are a linear combination of the ridits centered at zero, magnitudes can be compared on a common scale: a score of 0.6 lies twice as far from the sample center as a score of 0.3.
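The scores printed for the test dataset illustrate where the -1 to 1 range comes from. The five scores sum to zero, so zero marks the sample center. Their squares sum to one, so no single score can exceed 1 in absolute value. This is an observation on the printed output, consistent with a score vector scaled to unit length, and not a documented guarantee of the package:

```r
# PRIDIT scores printed for the test dataset
pridit_score <- c(A = 0.4031461, B = -0.3936292, C = -0.5240944,
                  D = 0.6284083, E = -0.1138308)

print(sum(pridit_score))    # essentially 0: the scores are centered
print(sum(pridit_score^2))  # essentially 1: the score vector has unit length
print(names(pridit_score)[pridit_score > 0])  # "A" "D": above the sample center
```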
PRIDIT is particularly useful for combining multiple quality indicators into a single score:
# Hospital quality assessment example
hospital_quality <- data.frame(
  Hospital = paste0("Hospital_", 1:10),
  Mortality_Rate = c(0.02, 0.03, 0.01, 0.04, 0.02, 0.03, 0.01, 0.02, 0.05, 0.01),
  Readmission_Rate = c(0.10, 0.12, 0.08, 0.15, 0.09, 0.11, 0.07, 0.10, 0.16, 0.08),
  Patient_Satisfaction = c(8.5, 7.2, 9.1, 6.8, 8.0, 7.5, 9.3, 8.2, 6.5, 9.0),
  Safety_Score = c(85, 78, 92, 70, 82, 79, 94, 86, 68, 90)
)
# Note: For this example, we'll need to invert mortality and readmission rates
# since lower values indicate better quality
hospital_quality$Mortality_Rate <- 1 - hospital_quality$Mortality_Rate
hospital_quality$Readmission_Rate <- 1 - hospital_quality$Readmission_Rate
# Calculate PRIDIT scores
ridit_scores <- ridit(hospital_quality)
weights <- PRIDITweight(ridit_scores)
quality_scores <- PRIDITscore(ridit_scores, hospital_quality$Hospital, weights)
# Sort by PRIDIT score
quality_ranking <- quality_scores[order(quality_scores$PRIDITscore, decreasing = TRUE), ]
print(quality_ranking)
#>       Claim.ID PRIDITscore
#> 7   Hospital_7  0.47655942
#> 3   Hospital_3  0.37904783
#> 10 Hospital_10  0.32332033
#> 1   Hospital_1  0.07007513
#> 8   Hospital_8  0.07007513
#> 5   Hospital_5  0.02826797
#> 6   Hospital_6 -0.18276586
#> 2   Hospital_2 -0.26634942
#> 4   Hospital_4 -0.39297586
#> 9   Hospital_9 -0.50525468
The PRIDIT weights can help identify which variables are most important for distinguishing between high and low performers:
# Create a data frame showing variable importance
variable_names <- colnames(hospital_quality)[-1]  # Exclude ID column
importance_df <- data.frame(
  Variable = variable_names,
  Weight = weights,
  Abs_Weight = abs(weights)
)
# Sort by absolute weight to see most important variables
importance_df <- importance_df[order(importance_df$Abs_Weight, decreasing = TRUE), ]
print(importance_df)
#>                                  Variable    Weight Abs_Weight
#> Mortality_Rate             Mortality_Rate 0.9916197  0.9916197
#> Patient_Satisfaction Patient_Satisfaction 0.9902713  0.9902713
#> Safety_Score                 Safety_Score 0.9902713  0.9902713
#> Readmission_Rate         Readmission_Rate 0.9839797  0.9839797
PRIDIT can be particularly useful for tracking changes over time:
# Simulate hospital performance over two time periods
hospitals <- paste0("Hospital_", 1:5)
# Time 1 data
time1_data <- data.frame(
  Hospital = hospitals,
  Quality_A = c(0.85, 0.90, 0.78, 0.92, 0.88),
  Quality_B = c(0.82, 0.85, 0.80, 0.88, 0.84),
  Quality_C = c(0.90, 0.87, 0.85, 0.91, 0.86)
)
# Time 2 data
time2_data <- data.frame(
  Hospital = hospitals,
  Quality_A = c(0.88, 0.91, 0.82, 0.93, 0.85),
  Quality_B = c(0.85, 0.87, 0.83, 0.89, 0.82),
  Quality_C = c(0.92, 0.88, 0.87, 0.93, 0.88)
)
# Calculate PRIDIT scores for both time periods
time1_ridit <- ridit(time1_data)
time1_weights <- PRIDITweight(time1_ridit)
time1_scores <- PRIDITscore(time1_ridit, time1_data$Hospital, time1_weights)
time2_ridit <- ridit(time2_data)
time2_weights <- PRIDITweight(time2_ridit)
time2_scores <- PRIDITscore(time2_ridit, time2_data$Hospital, time2_weights)
# Combine results for comparison
longitudinal_results <- merge(time1_scores, time2_scores, by = "Claim.ID", suffixes = c("_Time1", "_Time2"))
longitudinal_results$Change <- longitudinal_results$PRIDITscore_Time2 - longitudinal_results$PRIDITscore_Time1
print(longitudinal_results)
#>     Claim.ID PRIDITscore_Time1 PRIDITscore_Time2       Change
#> 1 Hospital_1        -0.1332252         0.1110432  0.244268406
#> 2 Hospital_2         0.2358129         0.1759543 -0.059858601
#> 3 Hospital_3        -0.6768010        -0.5726373  0.104163668
#> 4 Hospital_4         0.6768010         0.6850380  0.008237074
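#> 5 Hospital_5        -0.1025876        -0.3993982 -0.296810547
Because the ridits and weights are recomputed separately for each period, the Change column measures a shift in each hospital's standing relative to the others, not an absolute change in quality. For example, Hospital_5's score falls by about 0.30 even though its Quality_C value rises from 0.86 to 0.88: its other two measures decline while every other hospital improves on all three.
The PRIDIT methodology provides a powerful approach for analyzing multivariate ordinal data and creating meaningful composite scores. The pridit package makes this methodology accessible through simple, well-documented functions that can be easily integrated into your analysis workflow.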
For more information about the theoretical foundations of PRIDIT, see the references below.