This document accompanies the “A method to estimate probability of disease and vaccine efficacy from clinical trial immunogenicity data.” publication. It demonstrates how to use the PoDBAY package for PoDBAY efficacy estimation, with examples based on data from clinical trial(s).
The goal of PoDBAY efficacy estimation analysis is to:
We describe two scenarios of application in PoDBAY efficacy estimation:
PoDBAY efficacy is estimated in two subsequent steps as described in the publication, section Methods.
Notes:
PoD curve is estimated (point estimate together with confidence intervals) in three steps - further details can be found in the publication, section Methods.
Titers of all diseased and all non-diseased subjects are used for estimation of PoD curve parameters. Parameter estimates \(p_{max}^`\), \(et_{50}^`\) and \(\gamma^`\) are obtained.
Titers of all diseased and all non-diseased subjects are put together and bootstrapped. For each individual titer a probability of disease is calculated using the PoD curve with parameter values \(p_{max}^`\), \(et_{50}^`\) and \(\gamma^`\). New disease status is assigned to each titer based on the probability of disease.
Titers of all new diseased and all new non-diseased subjects are used for re-estimation of PoD curve parameters. Parameter estimates \(p_{max}^{``}\), \(et_{50}^{``}\) and \(\gamma^{``}\) are obtained.
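The bootstrap and re-assignment in step 2 can be sketched in base R as follows. This is an illustration only, not the code used inside PoDBAY. It assumes a sigmoid PoD curve of the form \(p_{max} / (1 + (titer / et_{50})^{\gamma})\), with PoD equal to \(p_{max}\) for titers at or below zero, and it assumes that the new disease status is a Bernoulli draw with that probability. The function name `pod_curve`, the simulated titers and the parameter values are hypothetical.

```r
# Assumed PoD curve: p_max / (1 + (titer / et50)^slope); equals p_max for titers <= 0.
pod_curve <- function(titer, p_max, et50, slope) {
  p_max / (1 + (pmax(titer, 0) / et50)^slope)
}

set.seed(1)

# Hypothetical pooled log2 titers of diseased and non-diseased subjects.
titers <- rnorm(2000, mean = 6, sd = 2)

# Step 2a: non-parametric bootstrap of the pooled titers.
boot_titers <- sample(titers, size = length(titers), replace = TRUE)

# Step 2b: probability of disease for every bootstrapped titer,
# using illustrative step 1 estimates (not values produced by the package).
boot_pod <- pod_curve(boot_titers, p_max = 0.03, et50 = 6, slope = 7)

# Step 2c: draw a new disease status for each titer (assumed Bernoulli draw).
new_status <- runif(length(boot_pod)) < boot_pod

# Inputs for the re-estimation in step 3.
new_diseased    <- boot_titers[new_status]
new_nondiseased <- boot_titers[!new_status]
```

The two resulting vectors play the role of the diseased and non-diseased titers in the re-estimation described in step 3.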
Diseased and non-diseased subject level data are required. We’ll use PoDBAY::diseased and PoDBAY::nondiseased mock-up data. Both datasets contain population summary statistics (N, mean, sd) and individual subject level data (log2 titers, disease status (DS)).
Only the individual subject level data (log2 titers, DS) are used for the PoD curve estimation as described above.
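To show how titers and disease status alone can determine a PoD curve, here is a minimal base R sketch that fits the three parameters by maximum likelihood. It is an assumption-laden illustration, not the PoDBAY estimation routine: the curve form, the Bernoulli likelihood, the use of `optim()` with its default Nelder-Mead method, the starting values and the simulated data are all choices made for this example.

```r
# Assumed PoD curve: p_max / (1 + (titer / et50)^slope); equals p_max for titers <= 0.
pod_curve <- function(titer, p_max, et50, slope) {
  p_max / (1 + (pmax(titer, 0) / et50)^slope)
}

# Negative log-likelihood of Bernoulli disease outcomes given titers.
# Parameters are optimized on an unconstrained scale:
# theta = (logit(p_max), log(et50), log(slope)).
neg_log_lik <- function(theta, diseased_titers, nondiseased_titers) {
  p_max <- plogis(theta[1])
  et50  <- exp(theta[2])
  slope <- exp(theta[3])
  eps <- 1e-12  # keeps log() finite when a probability reaches 0 or 1
  p_d <- pmin(pmax(pod_curve(diseased_titers, p_max, et50, slope), eps), 1 - eps)
  p_n <- pmin(pmax(pod_curve(nondiseased_titers, p_max, et50, slope), eps), 1 - eps)
  -(sum(log(p_d)) + sum(log(1 - p_n)))
}

set.seed(42)

# Hypothetical data simulated from a known curve (p_max = 0.05, et50 = 5, slope = 7).
titers <- rnorm(5000, mean = 6, sd = 2)
status <- runif(length(titers)) < pod_curve(titers, 0.05, 5, 7)

# Hypothetical starting values on the unconstrained scale.
start <- c(qlogis(0.1), log(median(titers)), log(5))

fit <- optim(start, neg_log_lik,
             diseased_titers = titers[status],
             nondiseased_titers = titers[!status])

# Back-transform to the natural parameter scale.
estimates <- c(p_max = plogis(fit$par[1]),
               et50  = exp(fit$par[2]),
               slope = exp(fit$par[3]))
estimates
```

Optimizing on a transformed scale keeps \(p_{max}\) inside (0, 1) and the other two parameters positive without needing a constrained optimizer.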
library(PoDBAY)
data(diseased)
data(nondiseased)
str(diseased)
#> Reference class 'Population' [package ".GlobalEnv"] with 8 fields
#>  $ N                  : int 35
#>  $ mean               : num 3.83
#>  $ stdDev             : num 1.66
#>  $ unknownDistribution: logi FALSE
#>  $ UDFunction         :function ()  
#>  $ titers             : Named num [1:35] 5.59 6.07 2.43 5.84 6.29 ...
#>   ..- attr(*, "names")= chr [1:35] "vacc" "vacc" "vacc" "vacc" ...
#>  $ PoDs               : num(0) 
#>  $ diseaseStatus      : logi [1:35] TRUE TRUE TRUE TRUE TRUE TRUE ...
#>  and 24 methods, of which 10 are  possibly relevant:
#>    assignPoD, getDiseasedCount, getDiseasedTiters, getNondiseasedCount,
#>    getNondiseasedTiters, getTiters, getUnknown, initialize, popFun, popX
str(nondiseased)
#> Reference class 'Population' [package ".GlobalEnv"] with 8 fields
#>  $ N                  : int 1965
#>  $ mean               : num 6.01
#>  $ stdDev             : num 2.3
#>  $ unknownDistribution: logi FALSE
#>  $ UDFunction         :function ()  
#>  $ titers             : Named num [1:1965] 5.75 7.37 5.33 10.19 7.66 ...
#>   ..- attr(*, "names")= chr [1:1965] "vacc" "vacc" "vacc" "vacc" ...
#>  $ PoDs               : num(0) 
#>  $ diseaseStatus      : logi [1:1965] FALSE FALSE FALSE FALSE FALSE FALSE ...
#>  and 24 methods, of which 10 are  possibly relevant:
#>    assignPoD, getDiseasedCount, getDiseasedTiters, getNondiseasedCount,
#>    getNondiseasedTiters, getTiters, getUnknown, initialize, popFun, popX
Note: To convert your data into a population class object, use the generatePopulation() function from the PoDBAY package. See vignette("population", package = "PoDBAY") for further details.
Once our data are prepared, the function PoDParamEstimation is used to estimate the PoD curve parameters in three steps as described above. For more details about the usage of the function, see the examples in ?PoDParamEstimation().
estimatedParameters <- PoDParamEstimation(diseasedTiters = diseased$titers,
                                          nondiseasedTiters = nondiseased$titers, 
                                          nondiseasedGenerationCount = nondiseased$N,
                                          repeatCount = 50)
Step 1: \(p_{max}^`\), \(et_{50}^`\) and \(\gamma^`\)
Results corresponding to the first step of estimation of PoD-titer relationship can be obtained via estimatedParameters$resultsPriorReset.
#> # A tibble: 50 x 3
#>      pmax slope  et50
#>     <dbl> <dbl> <dbl>
#>  1 0.0343  28.5  6.05
#>  2 0.0343  28.5  6.05
#>  3 0.0343  28.5  6.05
#>  4 0.0343  28.5  6.05
#>  5 0.0343  28.5  6.05
#>  6 0.0343  28.5  6.05
#>  7 0.0343  28.5  6.05
#>  8 0.0343  28.5  6.05
#>  9 0.0343  28.5  6.05
#> 10 0.0343  28.5  6.05
#> # … with 40 more rows
Note that the parameter estimates are the same for every repeatCount iteration. This is expected, as the same diseased and non-diseased cases are used in every iteration in step 1 of this example.
Step 2: Bootstrap and re-assignment of disease status
Titers of all diseased and all non-diseased subjects are put together and bootstrapped. For each individual titer a probability of disease is calculated using the PoD curve with parameter values \(p_{max}^`\), \(et_{50}^`\) and \(\gamma^`\). New disease status is assigned to each titer based on the probability of disease.
Step 3: \(p^{``}_{max}\), \(et^{``}_{50}\) and \(\gamma^{``}\)
Results corresponding to the third step of estimation of PoD-titer relationship can be obtained via estimatedParameters$results.
#> # A tibble: 50 x 3
#>      pmax slope  et50
#>     <dbl> <dbl> <dbl>
#>  1 0.0349  31.4  5.89
#>  2 0.0361  26.9  6.37
#>  3 0.0371  28.5  6.01
#>  4 0.0288  28.5  6.31
#>  5 0.0321  33.2  6.02
#>  6 0.0331  27.4  5.98
#>  7 0.0464  16.3  5.66
#>  8 0.0479  17.7  5.69
#>  9 0.0378  28.5  6.01
#> 10 0.0324  29.5  6.07
#> # … with 40 more rows
The non-parametric bootstrap described in step 2 is applied inside the function. Therefore, the estimated PoD curve parameters differ between iterations in this case.
Parameters of PoD curve point estimate representing the PoD-titer relationship are estimated using results from ‘step 1’ - estimatedParameters$resultsPriorReset.
Confidence intervals (95% confidence level) of PoD curve parameters are calculated using results from ‘step 3’ - estimatedParameters$results.
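As an illustration of how an interval can be read off a set of bootstrap estimates, the sketch below computes percentile intervals in base R from the first ten rows of estimatedParameters$results shown above. The percentile method and the helper name `percentile_ci` are assumptions made for this example. PoDBAY provides its own functions for this step, and with only ten rows the numbers will not match the package output.

```r
# First ten bootstrap rows of estimatedParameters$results, copied from the output above.
results <- data.frame(
  pmax  = c(0.0349, 0.0361, 0.0371, 0.0288, 0.0321,
            0.0331, 0.0464, 0.0479, 0.0378, 0.0324),
  slope = c(31.4, 26.9, 28.5, 28.5, 33.2, 27.4, 16.3, 17.7, 28.5, 29.5),
  et50  = c(5.89, 6.37, 6.01, 6.31, 6.02, 5.98, 5.66, 5.69, 6.01, 6.07)
)

# Percentile interval: lower and upper quantiles of a bootstrap distribution.
percentile_ci <- function(x, level = 0.95) {
  alpha <- 1 - level
  quantile(x, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# One column per parameter, one row per interval bound.
param_ci <- sapply(results, percentile_ci)
rownames(param_ci) <- c("CILow", "CIHigh")
param_ci
```

Changing `level` to 0.9 or 0.8 gives narrower intervals from the same bootstrap set.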
PoDBAY Efficacy (point estimate together with confidence intervals) is estimated - further details can be found in the publication, section Methods.
PoDParamsPointEst from step 1 PoD-titer relationship estimation - Trial A
estimatedParameters$results from step 1 PoD-titer relationship estimation - Trial A
step 2) and standard deviations
step 3
Vaccinated and control population summary statistics (N, mean, sd) are required. We’ll use PoDBAY::vaccinated and PoDBAY::control mock-up data. Both datasets contain population summary statistics (N, mean, sd) and individual subject level log2 titers.
Only the population summary statistics (N, mean, sd) data are used for the PoDBAY efficacy estimation as described above.
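The sketch below shows one way an efficacy value can be obtained from summary statistics alone. It assumes that log2 titers are normally distributed in each group, that the PoD curve has the form \(p_{max} / (1 + (titer / et_{50})^{\gamma})\), and that efficacy is one minus the ratio of the expected PoD in the vaccinated group to that in the control group. The helper names `pod_curve`, `expected_pod` and `efficacy_sketch` are hypothetical, and this is not the implementation of efficacyComputation. The inputs are the rounded group means, standard deviations and PoD curve point estimates that appear in this document.

```r
# Assumed PoD curve: p_max / (1 + (titer / et50)^slope); equals p_max for titers <= 0.
pod_curve <- function(titer, p_max, et50, slope) {
  p_max / (1 + (pmax(titer, 0) / et50)^slope)
}

# Expected probability of disease in a population whose log2 titers
# follow Normal(mu, sigma); integration is limited to mu +/- 8 sigma.
expected_pod <- function(mu, sigma, p_max, et50, slope) {
  integrand <- function(t) pod_curve(t, p_max, et50, slope) * dnorm(t, mu, sigma)
  integrate(integrand, lower = mu - 8 * sigma, upper = mu + 8 * sigma)$value
}

# Efficacy as one minus the ratio of expected PoD, vaccinated versus control.
efficacy_sketch <- function(means, sds, p_max, et50, slope) {
  1 - expected_pod(means$vaccinated, sds$vaccinated, p_max, et50, slope) /
      expected_pod(means$control, sds$control, p_max, et50, slope)
}

eff <- efficacy_sketch(
  means = list(vaccinated = 7, control = 5),
  sds   = list(vaccinated = 2, control = 2),
  p_max = 0.0343, et50 = 6.05, slope = 28.5
)
eff
```

Because \(p_{max}\) scales both expectations equally, it cancels in the ratio, so under these assumptions the efficacy depends only on \(et_{50}\), \(\gamma\) and the two titer distributions.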
data(vaccinated)
data(control)
str(vaccinated)
#> Reference class 'Population' [package ".GlobalEnv"] with 8 fields
#>  $ N                  : num 1000
#>  $ mean               : num 7
#>  $ stdDev             : num 2
#>  $ unknownDistribution: logi FALSE
#>  $ UDFunction         :function ()  
#>  $ titers             : num [1:1000] 5.75 7.37 5.33 10.19 7.66 ...
#>  $ PoDs               : num [1:1000] 0.0137 0.00311 0.01952 0.00034 0.00241 ...
#>  $ diseaseStatus      : logi [1:1000] FALSE FALSE FALSE FALSE FALSE FALSE ...
#>  and 24 methods, of which 10 are  possibly relevant:
#>    assignPoD, getDiseasedCount, getDiseasedTiters, getNondiseasedCount,
#>    getNondiseasedTiters, getTiters, getUnknown, initialize, popFun, popX
str(control)
#> Reference class 'Population' [package ".GlobalEnv"] with 8 fields
#>  $ N                  : num 1000
#>  $ mean               : num 5
#>  $ stdDev             : num 2
#>  $ unknownDistribution: logi FALSE
#>  $ UDFunction         :function ()  
#>  $ titers             : num [1:1000] 7.27 7.22 3.26 5.42 5.14 ...
#>  $ PoDs               : num [1:1000] 0.00339 0.00354 0.04762 0.0181 0.02261 ...
#>  $ diseaseStatus      : logi [1:1000] FALSE FALSE FALSE FALSE FALSE FALSE ...
#>  and 24 methods, of which 10 are  possibly relevant:
#>    assignPoD, getDiseasedCount, getDiseasedTiters, getNondiseasedCount,
#>    getNondiseasedTiters, getTiters, getUnknown, initialize, popFun, popX
Note: To convert your data into a population class object, use the generatePopulation() function from the PoDBAY package. See vignette("population", package = "PoDBAY") for further details.
Once our data are prepared, the function efficacyComputation is used to calculate the efficacy point estimate as described above in step 1.
Jittering of the population mean from step 2, by drawing from its sampling distribution, is done inside the PoDBAYEfficacy function. The efficacy set is estimated as described above in step 3.
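The idea of jittering the means and building an efficacy set can be sketched in base R as below. The sketch is not the PoDBAYEfficacy implementation. It assumes that the sampling distribution of a group mean is Normal(mean, sd / sqrt(N)), that titers are normally distributed within each group, and that a percentile interval is taken over the efficacy set. The bootstrap parameter rows are the first ten rows of estimatedParameters$results shown earlier, and all helper names are hypothetical.

```r
# Assumed PoD curve: p_max / (1 + (titer / et50)^slope); equals p_max for titers <= 0.
pod_curve <- function(titer, p_max, et50, slope) {
  p_max / (1 + (pmax(titer, 0) / et50)^slope)
}

# Expected PoD for log2 titers distributed as Normal(mu, sigma).
expected_pod <- function(mu, sigma, p_max, et50, slope) {
  integrand <- function(t) pod_curve(t, p_max, et50, slope) * dnorm(t, mu, sigma)
  integrate(integrand, lower = mu - 8 * sigma, upper = mu + 8 * sigma)$value
}

set.seed(7)

# Bootstrap PoD curve parameters (first ten rows of the output shown earlier).
boot_params <- data.frame(
  pmax  = c(0.0349, 0.0361, 0.0371, 0.0288, 0.0321,
            0.0331, 0.0464, 0.0479, 0.0378, 0.0324),
  slope = c(31.4, 26.9, 28.5, 28.5, 33.2, 27.4, 16.3, 17.7, 28.5, 29.5),
  et50  = c(5.89, 6.37, 6.01, 6.31, 6.02, 5.98, 5.66, 5.69, 6.01, 6.07)
)

# Summary statistics of the mock-up populations.
vaccinated <- list(N = 1000, mean = 7, sd = 2)
control    <- list(N = 1000, mean = 5, sd = 2)

# Draw a population mean from its assumed sampling distribution.
jitter_mean <- function(pop) rnorm(1, mean = pop$mean, sd = pop$sd / sqrt(pop$N))

# One efficacy value per bootstrap parameter row, each with freshly jittered means.
efficacy_set <- vapply(seq_len(nrow(boot_params)), function(i) {
  p <- boot_params[i, ]
  1 - expected_pod(jitter_mean(vaccinated), vaccinated$sd, p$pmax, p$et50, p$slope) /
      expected_pod(jitter_mean(control), control$sd, p$pmax, p$et50, p$slope)
}, numeric(1))

# Percentile interval over the efficacy set.
quantile(efficacy_set, probs = c(0.025, 0.975))
```

Jittering the means lets the interval reflect uncertainty in the titer distributions as well as uncertainty in the PoD curve parameters.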
The analysis provides the following results:
result <- list(
  EfficacyPointEst = EfficacyPointEst,
  efficacyCI = unlist(CI),
  PoDParamsPointEst = PoDParamsPointEst,
  PoDParametersCI = unlist(PoDParametersCI),
  PoDCurve = PoDCurve
)
  
result
#> $EfficacyPointEst
#> [1] 0.5383405
#> 
#> $efficacyCI
#>      mean    median     CILow    CIHigh 
#> 0.5429269 0.5492815 0.4767003 0.6106386 
#> 
#> $PoDParamsPointEst
#>            pmax    slope     et50
#> pmax 0.03430108 28.51659 6.051696
#> 
#> $PoDParametersCI
#>   PmaxCILow  PmaxCIHigh   Et50CILow  Et50CIHigh  SlopeCILow SlopeCIHigh 
#>  0.02425859  0.04787302  5.53529308  6.36748061 12.86874466 34.17583340 
#> 
#> $PoDCurve
In the frequent case where serum samples at baseline and after vaccination are collected and assayed only in a subset of subjects (“immunogenicity sample/subset”), and titers are also obtained for all disease cases at the same time points, the general method for PoD curve estimation described above can be extended. Further details can be found in the publication, Appendix A.
PoD curve is estimated (point estimate together with confidence intervals) in three steps.
Titers of all non-diseased subjects are generated by random sampling with replacement from immunogenicity subset.
Titers of all diseased and all generated non-diseased subjects (generated in step 1) are used for estimation of PoD curve parameters. Parameter estimates \(p_{max}^`\), \(et_{50}^`\) and \(\gamma^`\) are obtained.
Titers of all diseased and all non-diseased subjects are put together and bootstrapped. For each individual titer a probability of disease is calculated using the PoD curve with parameter values \(p_{max}^`\), \(et_{50}^`\) and \(\gamma^`\). New disease status is assigned to each titer based on the probability of disease.
A new immunogenicity subset is selected from all new non-diseased subjects, such that the ratio of all diseased subjects to non-diseased subjects in the immunogenicity subset in the new data matches the ratio in the original data.
Titers of all new non-diseased subjects are generated by random sampling with replacement from new immunogenicity subset.
Titers of all new diseased and all new generated non-diseased subjects (generated in step 5) are used for re-estimation of PoD curve parameters. Parameter estimates \(p_{max}^{``}\), \(et_{50}^{``}\) and \(\gamma^{``}\) are obtained.
Assume a hypothetical case where we have clinical trial data for 2,000 subjects, of whom only 200 have plasma samples collected and examined in the immunogenicity study. Further, among these 2,000 subjects we identify 35 disease cases, for which we measure titers at the same time point. In the end we have titer information for 200 subjects from the immunogenicity study and 35 diseased subjects.
| Population | # subjects (N) | 
|---|---|
| Whole Trial | |
| All subjects | 2,000 | 
| Diseased | 35 | 
| Non-diseased | 1,965 | 
| Measured titers | |
| Diseased | 35 | 
| Immunogenicity sample | 200 | 
Note that in the immunogenicity sample the disease status is unknown as the sample is created before the clinical study. However, vaccination status is known.
In our example, the steps would be as follows:
Titers of all non-diseased subjects (N = 1,965) are generated by random sampling with replacement from immunogenicity subset (N = 200).
Titers of all diseased (N = 35) and all generated non-diseased (N = 1,965) subjects are used for estimation of PoD curve parameters.
Titers of all diseased (N = 35) and all generated non-diseased (N = 1,965) subjects are put together and bootstrapped (N = 2,000).
A new immunogenicity subset is selected from all new non-diseased subjects (\(N^`\) = 2000 - X), such that the ratio of the new immunogenicity subset size to the number of new diseased subjects (\(N^`\) = X) matches the ratio in the original data (200:35).
| Population | # subjects (\(N^`\)) | 
|---|---|
| New diseased | \(X\) | 
| New non-diseased | \(2000 - X\) | 
| New Immunogenicity sample | \(X * \frac{200}{35}\) | 
Titers of all new non-diseased subjects are generated by random sampling with replacement from new immunogenicity subset (\(N^` = X * \frac{200}{35}\)).
Titers of all new diseased (\(N^` = X\)) and all new generated non-diseased subjects (\(N^` = 2000 - X\)) are used for second estimation of PoD curve parameters.
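The resampling steps in this example can be sketched in base R as below. The sketch uses simulated titers and is not the code inside PoDBAY. Rounding the new subset size to a whole number, capping it at the number of new non-diseased subjects, and the value of `x_new` are assumptions made for the illustration, and the helper name `new_subset_size` is hypothetical.

```r
set.seed(123)

n_trial       <- 2000
n_diseased    <- 35
n_immuno      <- 200
n_nondiseased <- n_trial - n_diseased  # 1,965

# Hypothetical log2 titers measured in the immunogenicity subset.
immuno_titers <- rnorm(n_immuno, mean = 6, sd = 2)

# Step 1: generate titers for all non-diseased subjects by sampling with replacement.
nondiseased_titers <- sample(immuno_titers, size = n_nondiseased, replace = TRUE)

# Step 4: size of the new immunogenicity subset for x new diseased subjects,
# keeping the original 200:35 ratio (rounding and capping are assumptions).
new_subset_size <- function(x) {
  size <- round(x * n_immuno / n_diseased)
  min(size, n_trial - x)
}

x_new <- 40  # hypothetical number of new diseased subjects after step 3

# Stand-in for the new non-diseased titers produced by step 3.
new_nondiseased <- sample(nondiseased_titers, size = n_trial - x_new, replace = TRUE)

# Step 4: select the new immunogenicity subset (without replacement).
new_immuno <- sample(new_nondiseased, size = new_subset_size(x_new))

# Step 5: regenerate titers of all new non-diseased subjects from the new subset.
regenerated_nondiseased <- sample(new_immuno, size = n_trial - x_new, replace = TRUE)
```

With 40 new diseased subjects the new immunogenicity subset has round(40 * 200 / 35) = 229 members, and 1,960 non-diseased titers are regenerated from it.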
Diseased and non-diseased subject level data are required. We’ll use PoDBAY::diseased and PoDBAY::nondiseased mock-up data. Both datasets contain population summary statistics (N, mean, sd) and individual subject level data (log2 titers, disease status (DS)).
Only the individual subject level data (log2 titers, DS) are used for the PoD curve estimation as described above.
We create the immunogenicity sample from our mock-up data as described above: we start with titer information for 200 subjects from the immunogenicity study and 35 diseased subjects.
data(diseased)
data(nondiseased)
# Immunogenicity sample created
ImmunogenicitySample <- BlindSampling(diseased, nondiseased, method = list(name = "Fixed", value = 200))
nondiseasedImmunogenicitySample <- ImmunogenicitySample$ImmunogenicityNondiseased
str(diseased)
#> Reference class 'Population' [package ".GlobalEnv"] with 8 fields
#>  $ N                  : int 35
#>  $ mean               : num 3.83
#>  $ stdDev             : num 1.66
#>  $ unknownDistribution: logi FALSE
#>  $ UDFunction         :function ()  
#>  $ titers             : Named num [1:35] 5.59 6.07 2.43 5.84 6.29 ...
#>   ..- attr(*, "names")= chr [1:35] "vacc" "vacc" "vacc" "vacc" ...
#>  $ PoDs               : num(0) 
#>  $ diseaseStatus      : logi [1:35] TRUE TRUE TRUE TRUE TRUE TRUE ...
#>  and 24 methods, of which 10 are  possibly relevant:
#>    assignPoD, getDiseasedCount, getDiseasedTiters, getNondiseasedCount,
#>    getNondiseasedTiters, getTiters, getUnknown, initialize, popFun, popX
str(nondiseasedImmunogenicitySample)
#> Reference class 'Population' [package "PoDBAY"] with 8 fields
#>  $ N                  : int 196
#>  $ mean               : num 5.93
#>  $ stdDev             : num 2.45
#>  $ unknownDistribution: logi FALSE
#>  $ UDFunction         :function ()  
#>  $ titers             : Named num [1:196] 7.7 8.59 8.15 9.1 11.89 ...
#>   ..- attr(*, "names")= chr [1:196] "vacc" "vacc" "vacc" "vacc" ...
#>  $ PoDs               : num(0) 
#>  $ diseaseStatus      : logi [1:196] FALSE FALSE FALSE FALSE FALSE FALSE ...
#>  and 24 methods, of which 10 are  possibly relevant:
#>    assignPoD, getDiseasedCount, getDiseasedTiters, getNondiseasedCount,
#>    getNondiseasedTiters, getTiters, getUnknown, initialize, popFun, popX
Note: From now on, the analysis and the functions used are the same as in the general case. Only the input changes, from nondiseased to nondiseasedImmunogenicitySample. The nondiseasedGenerationCount stays the same because the total number of non-diseased subjects in the whole trial is unchanged.
Once we have our data prepared, function PoDParamEstimation is used to estimate PoD curve parameters in six steps as described above. For more details about the usage of the function see examples in ?PoDParamEstimation().
estimatedParametersAP <- PoDParamEstimation(diseasedTiters = diseased$titers,
                                            nondiseasedTiters = nondiseasedImmunogenicitySample$titers, 
                                            nondiseasedGenerationCount = nondiseased$N,
                                            repeatCount = 50)
Step 1: \(p_{max}^`\), \(et_{50}^`\) and \(\gamma^`\)
Results corresponding to the first step of estimation of PoD-titer relationship can be obtained via estimatedParametersAP$resultsPriorReset.
#> # A tibble: 49 x 3
#>      pmax slope  et50
#>     <dbl> <dbl> <dbl>
#>  1 0.0324  34.3  6.05
#>  2 0.0318  34.3  6.08
#>  3 0.0322  34.3  6.08
#>  4 0.0324  34.3  6.08
#>  5 0.0315  34.3  6.10
#>  6 0.0316  34.3  6.07
#>  7 0.0325  34.3  6.09
#>  8 0.0322  34.3  6.10
#>  9 0.0316  34.3  6.09
#> 10 0.0323  34.3  6.07
#> # … with 39 more rows
Note that the parameter estimates now differ between repeatCount iterations. This is expected, as titers of all non-diseased subjects are generated by random sampling with replacement from the immunogenicity subset in every iteration in step 1 of this example.
Step 2: Data generation and re-assignment of disease status
Step 3: \(p^{``}_{max}\), \(et^{``}_{50}\) and \(\gamma^{``}\)
Results corresponding to the sixth step of estimation of PoD-titer relationship can be obtained via estimatedParametersAP$results.
#> # A tibble: 49 x 3
#>      pmax slope  et50
#>     <dbl> <dbl> <dbl>
#>  1 0.0347 39.8   5.99
#>  2 0.0231 16.0   5.89
#>  3 0.0351 21.4   5.77
#>  4 0.0419 12.1   5.46
#>  5 0.0301 39.8   6.04
#>  6 0.0483  4.59  3.92
#>  7 0.0393 43.7   5.94
#>  8 0.0386 36.8   6.06
#>  9 0.0314 40.5   6.35
#> 10 0.0263 32.7   6.00
#> # … with 39 more rows
The non-parametric bootstrap described in step 3, together with the creation of the new immunogenicity sample in steps 4-5, is applied inside the function.
Parameters of PoD curve point estimate representing the PoD-titer relationship are estimated using results from ‘step 1’ - estimatedParametersAP$resultsPriorReset.
Confidence intervals (80%, 90% and 95% confidence levels) of PoD curve parameters are calculated using results from ‘step 6’ - estimatedParametersAP$results.
There are two possible situations:
We will describe the approach in the situation where Trial A = Trial B.
As stated above the only difference is in the data availability. The fact that vaccinated and control population summary statistics (N, mean, sd) are required remains the same. Therefore, we calculate summary statistics for both populations using immunogenicity subset data - created in the PoD-titer relationship estimation step.
# Immunogenicity sample - vaccinated
str(ImmunogenicitySample$ImmunogenicityVaccinated)
#> Reference class 'Population' [package "PoDBAY"] with 8 fields
#>  $ N                  : int 97
#>  $ mean               : num 7.03
#>  $ stdDev             : num 2.34
#>  $ unknownDistribution: logi FALSE
#>  $ UDFunction         :function ()  
#>  $ titers             : Named num [1:97] 7.7 8.59 8.15 9.1 11.89 ...
#>   ..- attr(*, "names")= chr [1:97] "vacc_FALSE" "vacc_FALSE" "vacc_FALSE" "vacc_FALSE" ...
#>  $ PoDs               : num(0) 
#>  $ diseaseStatus      : logi [1:97] FALSE FALSE FALSE FALSE FALSE FALSE ...
#>  and 24 methods, of which 10 are  possibly relevant:
#>    assignPoD, getDiseasedCount, getDiseasedTiters, getNondiseasedCount,
#>    getNondiseasedTiters, getTiters, getUnknown, initialize, popFun, popX
# Immunogenicity sample - control
str(ImmunogenicitySample$ImmunogenicityControl)
#> Reference class 'Population' [package "PoDBAY"] with 8 fields
#>  $ N                  : int 103
#>  $ mean               : num 4.73
#>  $ stdDev             : num 2.13
#>  $ unknownDistribution: logi FALSE
#>  $ UDFunction         :function ()  
#>  $ titers             : Named num [1:103] 2.17 4.53 2.51 6.08 4.22 ...
#>   ..- attr(*, "names")= chr [1:103] "control_FALSE" "control_FALSE" "control_FALSE" "control_FALSE" ...
#>  $ PoDs               : num(0) 
#>  $ diseaseStatus      : logi [1:103] FALSE FALSE FALSE FALSE FALSE FALSE ...
#>  and 24 methods, of which 10 are  possibly relevant:
#>    assignPoD, getDiseasedCount, getDiseasedTiters, getNondiseasedCount,
#>    getNondiseasedTiters, getTiters, getUnknown, initialize, popFun, popX
means <- list("vaccinated" = ImmunogenicitySample$ImmunogenicityVaccinated$mean,
              "control" = ImmunogenicitySample$ImmunogenicityControl$mean)

standardDeviations <- list("vaccinated" = ImmunogenicitySample$ImmunogenicityVaccinated$stdDev,
                           "control" = ImmunogenicitySample$ImmunogenicityControl$stdDev)
  
EfficacyPointEst <- efficacyComputation(PoDParamsPointEst, 
                                        means, 
                                        standardDeviations)
EfficacyPointEst
#> [1] 0.5298448
The analysis provides the following results:
result <- list(
  EfficacyPointEst = EfficacyPointEst,
  efficacyCI = unlist(CI),
  PoDParamsPointEst = PoDParamsPointEst,
  PoDParametersCI = unlist(PoDParametersCI),
  PoDCurve = PoDCurve
)
  
result
#> $EfficacyPointEst
#> [1] 0.5298448
#> 
#> $efficacyCI
#>      mean    median   CILow95  CIHigh95   CILow90  CIHigh90   CILow80  CIHigh80 
#> 0.5375903 0.5419903 0.4432219 0.6231131 0.4527044 0.6106371 0.4608834 0.5999084 
#> 
#> $PoDParamsPointEst
#>            pmax    slope     et50
#> pmax 0.03218197 33.85696 6.085131
#> 
#> $PoDParametersCI
#>   PmaxCILow  PmaxCIHigh   Et50CILow  Et50CIHigh  SlopeCILow SlopeCIHigh 
#>  0.02140353  0.04420208  5.36664708  6.54030103  6.63482304 43.47149473 
#> 
#> $PoDCurve