vignette("maskedhaz") establishes that
fit(model)(df, par) returns a fisher_mle
object, which realises both the mle_fit concept (from
algebraic.mle) and the algebraic.dist
distribution interface. That gives us coef,
vcov, confint, se,
bias, observed_fim, score_val,
and a logLik method, all through generics the
maskedhaz package itself never had to implement.
This vignette takes the next step: everything
hypothesize does works on these fits directly. No
bridge code, no wrappers, just generics flowing through generics. We’ll
work through five questions a statistician would actually ask about a
real analysis, using different test machinery for each.
We generate data from a 2-component Weibull series system (true shapes are 2, not 1) and pretend we don’t know the true family. We’ll fit both an exponential-series model (simpler, 2 params) and a Weibull-series model (fuller, 4 params) to the same data, then use hypothesis tests to decide which is justified.
set.seed(42)
true_model <- dfr_series_md(components = list(
  dfr_weibull(shape = 2, scale = 5),
  dfr_weibull(shape = 2, scale = 8)
))
df <- rdata(true_model)(theta = c(2, 5, 2, 8), n = 300, tau = 10, p = 0.3)
table(df$omega)
#> 
#> exact 
#>   300 
null_model <- dfr_series_md(
  components = list(dfr_exponential(), dfr_exponential()),
  n_par = c(1L, 1L)
)
alt_model <- dfr_series_md(
  components = list(dfr_weibull(), dfr_weibull()),
  n_par = c(2L, 2L)
)
fit_null <- suppressWarnings(fit(null_model)(df, par = c(0.1, 0.1)))
fit_alt <- suppressWarnings(fit(alt_model)(df, par = c(1.5, 4, 1.5, 6)))
coef(fit_null)
#> [1] 0.17853850 0.09337168
coef(fit_alt)
#> [1] 1.798337 5.115741 2.306984 6.855044
The exponential-series model is nested inside the Weibull-series model (it’s the boundary case where all shapes equal 1). A likelihood ratio test compares the two.
# lrt() accepts logLik objects directly and derives the dof difference
# from the `df` attribute attached by likelihood.model::logLik.fisher_mle
lrt_result <- lrt(logLik(fit_null), logLik(fit_alt))
lrt_result
#> Hypothesis test (likelihood_ratio_test)
#> -----------------------------
#> Test statistic: 157.552414688934
#> P-value: 6.13660041402268e-35
#> Degrees of freedom: 2
#> Significant at 5% level: TRUE
The test statistic is \(2(\ell_{\text{alt}} - \ell_{\text{null}})\), asymptotically \(\chi^2\) with dof equal to the parameter-count difference (here, 2 extra Weibull shape parameters). The p-value is vanishingly small — the exponential family is rejected decisively.
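The printed result is easy to verify by hand in base R (the statistic value is copied from the output above; nothing here uses hypothesize):

```r
# LRT arithmetic, reproduced directly: the statistic is
# 2 * (logLik(fit_alt) - logLik(fit_null)), referred to chisq with dof = 2.
stat <- 157.552414688934
dof  <- 2  # 4 Weibull parameters minus 2 exponential parameters
pchisq(stat, df = dof, lower.tail = FALSE)
#> [1] 6.1366e-35
```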
Notice that we never wrote any maskedhaz-specific glue code.
logLik() is a standard R generic,
hypothesize::lrt() consumes its output. The
maskedhaz package didn’t have to know anything about
hypothesize, and vice versa.
The LRT rejects the global hypothesis “both shapes equal 1 simultaneously,” but says nothing about each shape separately. For that we use Wald tests — one per shape parameter — feeding them the estimate and standard error directly.
est <- coef(fit_alt)
se_vec <- se(fit_alt) # from algebraic.mle
# Parameter layout: (shape1, scale1, shape2, scale2) = positions 1, 2, 3, 4
w_shape1 <- wald_test(est[1], se = se_vec[1], null_value = 1)
w_shape2 <- wald_test(est[3], se = se_vec[3], null_value = 1)
c(shape1_pval = pval(w_shape1),
  shape2_pval = pval(w_shape2))
#>  shape1_pval  shape2_pval 
#> 2.932220e-12 7.595611e-09 
Both individual tests reject shape = 1 at any
conventional level. The MLEs are about 1.8 and 2.3, and the standard
errors are small enough to distinguish both from 1 cleanly.
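The Wald machinery itself is one line of arithmetic. A minimal base-R sketch (the helper name and the numbers are illustrative, not hypothesize's implementation):

```r
# Two-sided Wald test by hand: z = (estimate - null) / se,
# p = 2 * P(Z > |z|) under the standard normal reference.
wald_pval <- function(est, se, null_value = 0) {
  z <- (est - null_value) / se
  2 * pnorm(abs(z), lower.tail = FALSE)
}
# e.g. an estimate of 2.0 with standard error 0.25, tested against 1:
wald_pval(2.0, 0.25, null_value = 1)
#> [1] 6.334248e-05
```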
If we plan to report these two tests as a family (making decisions
only when the family-wise error rate is bounded), we should adjust.
adjust_pval() accepts a list of tests and returns a list of
tests with the corrected p-values stored.
adjusted <- adjust_pval(list(w_shape1, w_shape2), method = "bonferroni")
c(shape1_adj = pval(adjusted[[1]]),
shape2_adj = pval(adjusted[[2]]))
#> shape1_adj shape2_adj
#> 5.864441e-12 1.519122e-08With only two tests, Bonferroni simply doubles each p-value; both
remain far below 0.05. The point is that adjust_pval()
returns hypothesis_test objects — they continue to compose
with the rest of the algebra. You could hand these adjusted tests into
is_significant_at(), invert_test(), or
anything else that expects a test.
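Base R's p.adjust() performs the same Bonferroni correction; feeding it the raw p-values printed earlier reproduces the adjusted values (up to the rounding in the printed inputs):

```r
# Bonferroni multiplies each p-value by the number of tests (capped at 1).
p_raw <- c(shape1 = 2.932220e-12, shape2 = 7.595611e-09)
p.adjust(p_raw, method = "bonferroni")
```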
A different but related question: is the conjunction “shape1 = 1 AND shape2 = 1” supported by the data? Two natural ways to build a test:
Intersection-union via max p-value.
intersection_test() takes the max of the individual
p-values. If the max is small, both individual tests rejected, so the
intersection rejects too.
comp_intersection <- intersection_test(w_shape1, w_shape2)
pval(comp_intersection)
#> [1] 7.595611e-09
Fisher’s method for combining independent p-values.
Under the conjunction null, \(-2 \sum_i \log p_i\) follows \(\chi^2_{2k}\), where \(k\) is the number of tests combined.
fisher_combine() implements this.
comp_fisher <- fisher_combine(w_shape1, w_shape2)
comp_fisher
#> Hypothesis test (fisher_combined_test)
#> -----------------------------
#> Test statistic: 90.5019128167015
#> P-value: 1.03010156933654e-18
#> Degrees of freedom: 4
#> Significant at 5% level: TRUE
Both agree that the data contradicts the composite exponential null, but with different levels of sensitivity (Fisher’s method uses more of the information in moderate p-values; max-p is conservative when some tests are strong and others weak).
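Both combinations are easy to reproduce from the raw p-values in base R (values copied from the Wald output above; small discrepancies with the printout come from that rounding):

```r
# Fisher's method: under the conjunction null, -2 * sum(log(p_i)) ~ chisq(2k).
p_i  <- c(2.932220e-12, 7.595611e-09)  # the two Wald p-values from above
stat <- -2 * sum(log(p_i))             # about 90.50, matching the printout
pchisq(stat, df = 2 * length(p_i), lower.tail = FALSE)
# The intersection-union p-value is simply the larger of the two:
max(p_i)
#> [1] 7.595611e-09
```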
The LRT and Wald tests both required fitting the full Weibull model. A score test can evaluate a null hypothesis using only the null model and the full-model score/Hessian machinery — no alternative fit needed. This is useful when the alternative is expensive to fit or not globally identified.
For the composite null “all shapes = 1, scales from the null MLE,” we evaluate the score and observed Fisher information of the Weibull-series model at the null parameter vector.
# Map the exp-series MLE into the Weibull parameter space:
# (shape = 1, scale = 1/rate) for each component
null_in_alt <- c(1, 1/coef(fit_null)[1],
                 1, 1/coef(fit_null)[2])
# Score and observed Fisher info of the ALT model at the null point
s_at_null <- score(alt_model)(df, par = null_in_alt)
I_at_null <- -hess_loglik(alt_model)(df, par = null_in_alt)
s_at_null
#> [1] 124.464166255 -0.003897959 74.107135509 0.002022227
sc <- score_test(s_at_null, I_at_null, null_value = null_in_alt)
sc
#> Hypothesis test (score_test)
#> -----------------------------
#> Test statistic: 107.425992161895
#> P-value: 2.57532884934985e-22
#> Degrees of freedom: 4
#> Significant at 5% level: TRUE
Two things worth pointing out. First, the test statistic \(s^\top I^{-1} s\) and its \(\chi^2_4\) reference distribution come out large — the null is rejected strongly, agreeing with the LRT and Wald results. Second, look at the components of the score vector: positions 1 and 3 (the shape entries) are far from zero, while positions 2 and 4 (the scales) are nearly zero. The likelihood gradient at the null “wants” to move shape up, not scale; the null MLE already got the scales roughly right for a shape-1 fit. That’s diagnostic information the LRT doesn’t give you for free.
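The quadratic form behind the score statistic is itself a one-liner. A self-contained toy example with made-up score and information values (not the Weibull fit above):

```r
# Score statistic: s' I^{-1} s, referred to chisq with length(s) dof.
s <- c(3.1, -0.4)                  # toy score vector
I <- matrix(c(10, 1,
              1,  4), nrow = 2)    # toy observed Fisher information
stat <- drop(t(s) %*% solve(I, s)) # = 42.52/39, about 1.090256
pchisq(stat, df = length(s), lower.tail = FALSE)
```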
A \(100(1-\alpha)\%\) confidence set
for a parameter is, by definition, the set of null values the test
fails to reject at level \(\alpha\). invert_test()
automates this inversion over a user-supplied grid.
# Build a Wald test parameterised by shape1's null value
wald_for_shape1 <- function(theta0) {
  wald_test(est[1], se = se_vec[1], null_value = theta0)
}
ci_shape1 <- invert_test(wald_for_shape1, grid = seq(1.0, 3.0, by = 0.01), alpha = 0.05)
c(lower = lower(ci_shape1), upper = upper(ci_shape1))
#> lower.lower upper.upper 
#>        1.58        2.02 
Compare with the direct Wald CI:
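The direct interval comes from the same est and se_vec computed earlier (equivalently, the confint generic on fit_alt); the closed form is estimate \(\pm z_{0.975}\) times the standard error:

```r
# Direct 95% Wald CI for shape1, from the estimate and standard error above.
z975 <- qnorm(0.975)  # about 1.959964
c(lower = est[1] - z975 * se_vec[1],
  upper = est[1] + z975 * se_vec[1])
```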
They agree to within the grid resolution. For a Wald test this
redundancy is by design (the two are algebraically identical). The real
power of invert_test() shows when you plug in a
different test-fn — e.g., a likelihood-ratio test with a
profile likelihood — to get a non-Wald confidence
region whose shape is driven by the actual likelihood surface rather
than the quadratic approximation. That’s expensive (requires refitting
under each grid point) but fully composable: the test-fn just has to
return a hypothesis_test.
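The inversion logic itself is only a few lines. A self-contained base-R sketch with a hand-rolled Wald p-value and illustrative numbers (not hypothesize's implementation):

```r
# Keep every theta0 on the grid that a level-alpha two-sided Wald test
# fails to reject; the confidence set is the range of survivors.
invert_wald <- function(est, se, grid, alpha = 0.05) {
  p <- sapply(grid, function(theta0)
    2 * pnorm(abs((est - theta0) / se), lower.tail = FALSE))
  range(grid[p >= alpha])
}
invert_wald(est = 2.0, se = 0.2, grid = seq(1, 3, by = 0.01))
#> [1] 1.61 2.39
```

Swapping the Wald p-value for one computed by refitting a model at each theta0 gives a likelihood-based region with the same three lines of driver code.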
Because maskedhaz and maskedcauses both
implement the series_md protocol and both return
fisher_mle objects, the hypothesis-testing code is
literally identical across them. We can verify by fitting the
same data with maskedcauses’ closed-form exponential-series
likelihood and running the same LRT.
library(maskedcauses)
fit_null_mc <- suppressWarnings(
  likelihood.model::fit(exp_series_md_c1_c2_c3())(df, par = c(0.1, 0.1))
)
# LRT using maskedcauses null fit + maskedhaz alt fit, across two packages
lrt_cross <- lrt(logLik(fit_null_mc), logLik(fit_alt))
c(
  lrt_same_package = pval(lrt_result),
  lrt_cross_package = pval(lrt_cross)
)
#> lrt_same_package lrt_cross_package
#>       6.1366e-35        6.1366e-35 
The p-values agree. The test statistic, dof, and conclusion are the
same because both packages compute the same likelihood (just via
different numerical strategies). hypothesize doesn’t know
or care which package the logLik came from — it only needs the
logLik generic with the df attribute. That’s
the protocol in action.
We ran six distinct hypothesis-testing analyses on the same fitted model, each using different test machinery:
| Question | Test | hypothesize function | Input from maskedhaz fit |
|---|---|---|---|
| Is the simpler model enough? | LRT | lrt() | logLik(fit_null), logLik(fit_alt) |
| Is each parameter different from the null value? | Wald | wald_test() | coef(fit), se(fit) |
| Family-wise error control | Multiple testing | adjust_pval() | list of Wald tests |
| Is the composite null rejected? | Intersection-union, Fisher | intersection_test(), fisher_combine() | individual tests |
| Alt-model-free null rejection | Score | score_test() | score(), hess_loglik() from the alt model at the null point |
| Confidence interval as rejection set | Test inversion | invert_test() | a test-fn parameterised by the null value |
None of this is maskedhaz-specific. Everything works
through the logLik / coef / vcov
/ se / score_val / observed_fim
generics that algebraic.mle and
likelihood.model wired up long before
hypothesize existed. The layered design lets each package
do its one job well; capabilities compose transparently.