Introduction to the Topic Testlet Model

The Topic Testlet Model (TTM) integrates topic modeling (Latent Dirichlet Allocation) with psychometric models (Partial Credit Model) to calibrate testlet-based assessments. This approach uses the textual content of student responses to account for local item dependence (LID) caused by shared stimuli.

This vignette demonstrates how to:

1. Pre-process item scores and textual responses.
2. Determine the optimal number of latent topics.
3. Extract person-specific topic proportions (delta).
4. Calibrate the TTM to estimate student ability (theta), topic penalties (lambda), and testlet effects (gamma).

Setup

First, we load the package. We also set a seed for reproducibility.

library(TopicTestlet)
set.seed(1234)

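Simulating Data

For this tutorial, we simulate a dataset of 100 students responding to a testlet with 4 items.

- Scores: polytomous scores (0, 1, 2), drawn uniformly at random.
- Essays: 20-word textual responses sampled from the vocabularies of 2 latent topics.

Because every essay draws from both vocabularies with equal probability, the simulated topics are not well separated; the data serve only to exercise the workflow.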

# Simulation parameters
N_students <- 100
J_items <- 4
K_topics_true <- 2

# A. Simulate Numeric Scores (0-2)
score_matrix <- matrix(
  sample(0:2, N_students * J_items, replace = TRUE), 
  nrow = N_students, 
  ncol = J_items
)

# B. Simulate Textual Essays
# Define vocabularies for two topics
vocab_topic1 <- c("logic", "reasoning", "evidence", "fact", "analysis", "data")
vocab_topic2 <- c("feeling", "story", "character", "plot", "narrative", "mood")

# Helper function to generate a random essay
generate_essay <- function() {
  words <- sample(c(vocab_topic1, vocab_topic2), size = 20, replace = TRUE)
  paste(words, collapse = " ")
}

# Create matrix of essays
essay_matrix <- matrix(
  replicate(N_students * J_items, generate_essay()),
  nrow = N_students,
  ncol = J_items
)

# Preview the data
head(score_matrix, 3)
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    1    0    1
#> [2,]    1    0    2    0
#> [3,]    0    2    1    0
substr(essay_matrix[1,1], 1, 50) # First 50 chars of first essay
#> [1] "reasoning story reasoning analysis data story mood"

Aggregating Responses

The TTM treats the collection of responses within a testlet as a single “document” for each student. We use aggregate_responses() to combine the text from all items.

text_vector <- aggregate_responses(essay_matrix)

# Check the first student's aggregated text
substr(text_vector[1], 1, 60)
#> [1] "reasoning story reasoning analysis data story mood character"
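
To make the aggregation step concrete, the sketch below reproduces the idea in base R by pasting each student's item responses into one string. It assumes that aggregation amounts to row-wise concatenation; `aggregate_rows_sketch` and `toy_essays` are hypothetical names, and the package's aggregate_responses() may handle separators or missing responses differently.

```r
# Hypothetical stand-in for row-wise aggregation; not the package's code.
aggregate_rows_sketch <- function(essays, sep = " ") {
  # Paste the J item responses in each row (student) into one document
  apply(essays, 1, paste, collapse = sep)
}

# Toy input: 2 students x 2 items
toy_essays <- matrix(
  c("logic data", "story mood",
    "fact evidence", "plot character"),
  nrow = 2, byrow = TRUE
)

toy_docs <- aggregate_rows_sketch(toy_essays)
length(toy_docs)  # one document per student
```

The result is a character vector with one element per student, which is the shape a document-level topic model expects.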

Determining the Number of Topics

We use perplexity to determine the optimal number of latent topics (K). A lower perplexity indicates a better model fit. We will test K=2 and K=3.

# In a real analysis, you might check a wider range (e.g., 2:10)
perp_results <- ttm_perplexity(text_vector, k_range = 2:3)
#> Calculating perplexity...
#>   Fitting LDA with k = 2
#>   Fitting LDA with k = 3

print(perp_results)
#>   k perplexity
#> 1 2   12.00455
#> 2 3   12.01140

# Select the K with the lowest perplexity
best_k <- perp_results$k[which.min(perp_results$perplexity)]
cat("Optimal number of topics:", best_k)
#> Optimal number of topics: 2
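
Both perplexities sit close to 12, which is the size of the simulated vocabulary (6 + 6 words). The sketch below shows why, using the standard definition of perplexity as the exponentiated negative mean log-probability per token. How ttm_perplexity() evaluates the fitted model internally is not stated here, so this is a generic illustration with hypothetical names (`perplexity_sketch`, `uniform_probs`).

```r
# Perplexity = exp(-(sum of per-token log-probabilities) / number of tokens)
perplexity_sketch <- function(token_probs) {
  exp(-mean(log(token_probs)))
}

# A model that assigns each of 12 vocabulary words equal probability
vocab_size <- 12
n_tokens <- 80  # 4 essays x 20 words per student
uniform_probs <- rep(1 / vocab_size, n_tokens)

perplexity_sketch(uniform_probs)  # equals vocab_size, up to floating point
```

A perplexity near the vocabulary size means the fitted topics predict words about as well as uniform guessing, which is expected for text sampled uniformly from the pooled vocabularies.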

Extracting Topic Proportions

Using the optimal K, we fit the Latent Dirichlet Allocation (LDA) model to extract the topic proportion matrix (delta). Each row gives the estimated proportions of the K topics in one student's aggregated text.

delta_matrix <- ttm_lda(text_vector, k = best_k)
#> Fitting LDA with k = 2

# The result is an N x K matrix
head(delta_matrix)
#>           [,1]      [,2]
#> [1,] 0.4968303 0.5031697
#> [2,] 0.4973594 0.5026406
#> [3,] 0.4971421 0.5028579
#> [4,] 0.4999181 0.5000819
#> [5,] 0.5029592 0.4970408
#> [6,] 0.4991809 0.5008191
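
A quick sanity check on any topic proportion matrix is that each row is a valid probability vector. The sketch below applies that check to the first three rows of the output above, typed in by hand so the snippet runs on its own; `delta_demo` is a hypothetical name.

```r
# First three rows of the printed delta matrix
delta_demo <- matrix(
  c(0.4968303, 0.5031697,
    0.4973594, 0.5026406,
    0.4971421, 0.5028579),
  ncol = 2, byrow = TRUE
)

# Every entry lies in [0, 1] and every row sums to 1 (within rounding)
entries_ok <- all(delta_demo >= 0 & delta_demo <= 1)
rows_ok <- max(abs(rowSums(delta_demo) - 1)) < 1e-6
c(entries_ok = entries_ok, rows_ok = rows_ok)
```

The same two checks can be run on delta_matrix directly before passing it to the calibration step.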

Estimating the TTM

Finally, we calibrate the Topic Testlet Model using the scores and the extracted topic proportions. This estimates:

- Theta: student latent ability.
- Gamma: person-specific testlet effect (calculated as the inner product of lambda and delta).
- Item parameters: step difficulties.
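
The inner product that defines gamma can be written out in a few lines. The sketch assumes one topic penalty per topic (a length-K vector); the values in `delta_toy` and `lambda_toy` are made up for illustration and are not estimates produced by ttm_est().

```r
# Hypothetical topic proportions for 3 students and K = 2 topics
delta_toy <- matrix(
  c(0.50, 0.50,
    0.80, 0.20,
    0.10, 0.90),
  ncol = 2, byrow = TRUE
)

# Hypothetical topic penalties, one per topic
lambda_toy <- c(0.4, -0.2)

# gamma_i = sum over k of lambda_k * delta_ik: one testlet effect per student
gamma_toy <- as.vector(delta_toy %*% lambda_toy)
gamma_toy
```

Because gamma is a weighted average of the penalties, two students with the same theta can still differ in their testlet effect when their text leans toward different topics.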

# We use max_iter = 50 for speed in this vignette. 
# For operational use, allow more iterations for convergence.
ttm_results <- ttm_est(
  scores = score_matrix, 
  delta = delta_matrix, 
  max_iter = 50
)
#> Iter: 1 | LogLik: -371.8979 | Diff: Inf
#> Iter: 2 | LogLik: -375.6258 | Diff: 3.7280
#> Iter: 3 | LogLik: -376.0733 | Diff: 0.4475
#> Iter: 4 | LogLik: -376.1024 | Diff: 0.0291
#> Iter: 5 | LogLik: -376.1017 | Diff: 0.0007
#> Iter: 6 | LogLik: -376.0989 | Diff: 0.0028
#> Iter: 7 | LogLik: -376.0989 | Diff: 0.0000

# Model Fit Statistics
print(paste("AIC:", round(ttm_results$AIC, 2)))
#> [1] "AIC: 768.2"
print(paste("BIC:", round(ttm_results$BIC, 2)))
#> [1] "BIC: 789.04"
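
The reported fit statistics can be checked by hand from the final log-likelihood with the usual formulas AIC = -2 logLik + 2p and BIC = -2 logLik + p log(N). The parameter count p = 8 used below is inferred from the printed values (it is consistent with 4 items times 2 step difficulties) rather than taken from the package documentation, so treat it as an assumption; all variable names are hypothetical.

```r
log_lik <- -376.0989   # final log-likelihood from the iteration log
n_persons <- 100       # number of students
n_params <- 8          # assumed: 4 items x 2 step difficulties

aic_manual <- -2 * log_lik + 2 * n_params
bic_manual <- -2 * log_lik + n_params * log(n_persons)

round(c(AIC = aic_manual, BIC = bic_manual), 2)
```

Under that assumption the hand calculation agrees with the AIC and BIC printed above to two decimal places.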

Visualizing Results

The TTM allows us to analyze the relationship between student ability and the testlet effect. In many empirical applications, the testlet effect varies with ability, meaning that students at different ability levels respond to the testlet content differently.

plot(ttm_results$theta, ttm_results$gamma,
     xlab = "Student Ability (Theta)",
     ylab = "Testlet Effect (Gamma)",
     main = "Relationship between Ability and Testlet Effect",
     pch = 19, col = rgb(0, 0, 1, 0.6))
grid()
abline(lm(ttm_results$gamma ~ ttm_results$theta), col = "red", lwd = 2)

In this simulated example, we can examine the distribution of the estimated abilities:

hist(ttm_results$theta, 
     main = "Distribution of Estimated Abilities",
     xlab = "Theta", 
     col = "lightblue", 
     border = "white")

References

Xiong, J., Kuang, H., Tang, C., Liu, Q., Wang, B., Engelhard, G., Cohen, A. S., Xiong, X., & Sheng, R. (2025). A Topic Testlet Model for Calibrating Testlet Constructed Responses. Journal of Educational Measurement.