A series system with \(m\) components fails when any component fails. We observe the system failure time \(T_i = \min(T_{i1}, \ldots, T_{im})\), but the component that caused the failure is often masked — diagnostics only narrow it to a set of candidates \(C_i \subseteq \{1, \ldots, m\}\).
Given observations \(\{(t_i, C_i)\}\) from \(n\) systems, we want to recover the component-level parameters \(\theta = (\theta_1, \ldots, \theta_m)\) from system-level data. This is the masked-cause likelihood problem, and under masking conditions C1, C2, and C3 it has a clean likelihood form:
\[ L(\theta) = \prod_{i=1}^{n} S_{\text{sys}}(t_i; \theta) \cdot \sum_{j \in C_i} h_j(t_i; \theta_j) \]
where \(h_j\) is component \(j\)’s hazard and \(S_{\text{sys}}(t) = \exp(-\sum_j H_j(t))\) is the system survival.
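For intuition, the likelihood specialises to a one-liner when every component is exponential (\(h_j(t) = \lambda_j\), \(H_j(t) = \lambda_j t\), so \(S_{\text{sys}}(t) = e^{-t\sum_j \lambda_j}\)). A minimal base-R sketch for exact, uncensored observations (toy data and helper name, not a package API):

```r
# log L = sum_i [ -t_i * sum_j(lambda_j) + log(sum_{j in C_i} lambda_j) ]
masked_exp_loglik <- function(t, cand, lambda) {
  # t: failure times; cand: n x m logical matrix of candidate sets
  sum(-t * sum(lambda) + log(cand %*% lambda))
}
t_obs <- c(12, 30, 7)
cand  <- rbind(c(TRUE, FALSE, FALSE),  # cause known: component 1
               c(TRUE, TRUE,  FALSE),  # masked to {1, 2}
               c(TRUE, TRUE,  TRUE))   # fully masked
masked_exp_loglik(t_obs, cand, lambda = c(0.01, 0.02, 0.005))
```

The candidate-set sum enters only through `log(cand %*% lambda)`, which is why masking degrades but does not destroy the information about individual components.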
The series_md Protocol and This Implementation

The masked-cause likelihood problem has a protocol package: maskedcauses.
It specialises the general likelihood.model
framework for the masked-series-system setting — adding the
series_md S3 class (which inherits from
likelihood_model), the C1–C2–C3 data schema, and
domain-specific generics such as
conditional_cause_probability,
cause_probability, component_hazard, and
ncomponents. It also ships reference implementations for
exponential and Weibull component families as illustrations of the
protocol in use; other implementations can be written against the same
generics.
maskedhaz is such an implementation, for
arbitrary component hazard functions. The model class
dfr_series_md inherits from series_md — and
therefore also from likelihood_model — and wraps a
dfr_dist_series from serieshaz,
which in turn is built from dfr_dist components in flexhaz.
Components can be anything: Gompertz aging, log-logistic mid-life peaks,
user-defined bathtub curves, or a heterogeneous mixture of all of the
above.
Because dfr_series_md implements the
likelihood_model / series_md generics, every
workflow written against those protocols — loglik evaluation, MLE
fitting, diagnostics, simulation, cross-package comparison — works
without modification. The implementation choices that distinguish
maskedhaz from the closed-form reference
implementations:
- numerical integration (stats::integrate) for left and interval censoring
- numerical derivatives via numDeriv
- any dfr_dist component plugs in — no closed-form assumption anywhere

Fitting a model with fit(model)(df, par) returns a
fisher_mle object from likelihood.model, which
realises the mle_fit concept from algebraic.mle.
That’s what lets the downstream diagnostics — standard errors, bias,
FIM, delta-method confidence intervals, MSE — work uniformly across
maskedhaz, maskedcauses, and any other likelihood implementation in this
ecosystem.
library(maskedhaz)
# A three-component series system with a mixed-distribution setup
model <- dfr_series_md(components = list(
dfr_weibull(shape = 2, scale = 100), # wear-out
dfr_gompertz(a = 0.001, b = 0.02), # aging
dfr_exponential(0.005) # random failure
))
model
#> Masked-cause likelihood model (3-component series)
#> Component 1: 2 param(s) [ 2, 100]
#> Component 2: 2 param(s) [0.001, 0.020]
#> Component 3: 1 param(s) [0.005]
#> Data columns: t (lifetime), omega (type), x * (candidates)

The model knows the parameter layout, the column conventions, and the full series system distribution. Every generic in the likelihood ecosystem dispatches through this one object.
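The three component families above use standard hazard parameterisations (assuming dfr_gompertz's a, b map to the usual Gompertz hazard \(a e^{bt}\)); writing them out in base R makes the mixture concrete, since the series-system hazard is just the sum:

```r
h_weibull  <- function(t, shape, scale) (shape / scale) * (t / scale)^(shape - 1)
h_gompertz <- function(t, a, b) a * exp(b * t)
h_exp      <- function(t, rate) rep(rate, length(t))

# series-system hazard = sum of component hazards, at the true parameters
h_sys <- function(t) {
  h_weibull(t, shape = 2, scale = 100) +
    h_gompertz(t, a = 0.001, b = 0.02) +
    h_exp(t, rate = 0.005)
}
round(h_sys(c(10, 25, 50)), 5)
#> [1] 0.00822 0.01165 0.01772
```

The rising Weibull and Gompertz terms on top of the constant exponential floor are what make the later cause-of-failure probabilities time-dependent.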
set.seed(42)
rdata_fn <- rdata(model)
df <- rdata_fn(
theta = c(2, 100, 0.001, 0.02, 0.005), # matches the parameter layout
n = 300,
tau = 50, # right-censoring time
p = 0.4 # masking probability for non-failed components
)
head(df, 3)
#> t omega x1 x2 x3
#> 1 50.00000 right FALSE FALSE FALSE
#> 2 50.00000 right FALSE FALSE FALSE
#> 3 40.45117 exact FALSE TRUE TRUE
table(df$omega)
#>
#> exact right
#> 133 167

A few things to notice in the output. Rows 1 and 2 are right-censored
(omega = "right") at exactly t = tau = 50 —
that’s the convention for right-censored rows: t is the
censoring time, and the candidate columns x1, x2, x3 are
all FALSE because a right-censored observation carries no
information about which component would have caused the (unobserved)
failure. Row 3 is an exact failure at t = 40.45 with all
three candidate columns TRUE — a legitimate outcome under
p = 0.4 (each non-failing component is included in the
candidate set with probability 0.4, so two of them being included has
probability \(0.4^2 = 0.16\)). This is
the maximum-masking extreme: we know the system failed and when, but the
diagnostic narrowed the cause to “any of the three.” The opposite
extreme is a singleton candidate set, in which case the cause is fully
known.
Candidate sets in general are encoded as Boolean columns
(x1, x2, …, xm). The C1 condition
guarantees the true failed component is always TRUE; the
additional masking depends on the diagnostic procedure (under C1–C2 with
uniform masking, p is the per-non-failed inclusion
probability).
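The candidate-set mechanism is easy to simulate directly. Assuming the uniform-masking scheme just described (the true cause always included per C1, each other component included independently with probability p), the fraction of fully-masked sets should approach \(p^2 = 0.16\) for \(m = 3\):

```r
set.seed(123)
m <- 3; p <- 0.4; n <- 10000
cand <- t(replicate(n, {
  k <- sample(m, 1)          # the true failed component
  x <- runif(m) < p          # each other component included w.p. p
  x[k] <- TRUE               # C1: the true cause is always in the set
  x
}))
mean(rowSums(cand) == m)     # fraction masked to "any of the three"
```

The empirical fraction lands close to 0.16, matching the probability quoted for row 3 above.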
ll_fn <- loglik(model)
true_par <- c(2, 100, 0.001, 0.02, 0.005)
ll_fn(df, par = true_par)
#> [1] -797.9056

loglik() is the core generic from the
likelihood.model package — maskedhaz just
provides the method for the dfr_series_md class. The
generic returns a closure, not a value:
loglik(model) gives you a function of
(df, par) that computes the log-likelihood for that model.
This two-step pattern is shared by score(model),
hess_loglik(model), fit(model), and
rdata(model); once you learn it for one, you have it for
all.
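The closure pattern itself is plain R, independent of these packages. A toy sketch (illustrative generic and class names, not the packages' internals) of a generic whose methods return a function of (df, par):

```r
# step 0: a generic whose methods return closures rather than values
loglik_factory <- function(model) UseMethod("loglik_factory")
loglik_factory.exp_model <- function(model) {
  function(df, par) sum(dexp(df$t, rate = par, log = TRUE))
}

toy   <- structure(list(), class = "exp_model")
ll_fn <- loglik_factory(toy)                  # step 1: get the closure
ll_fn(data.frame(t = c(1, 2, 4)), par = 0.5)  # step 2: evaluate it
#> [1] -5.579442
```

Returning a closure lets the model object be inspected once (parameter layout, column schema) and the resulting function called many times cheaply, e.g. inside an optimiser.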
The value here (\(-797.9\) on 300 observations, or roughly
\(-2.7\) nats per system) is an
unremarkable log-likelihood magnitude — nothing about the number itself
tells you whether the fit is good. What matters is comparing
log-likelihoods at different parameter values: the MLE should
give a larger loglik than any nearby point, and the curvature around the
MLE feeds the asymptotic variance.
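That comparison logic is easiest to see in a one-parameter toy with a closed-form MLE; for an exponential sample, the analytic MLE \(\hat\lambda = 1/\bar{t}\) gives a strictly larger log-likelihood than any other rate:

```r
set.seed(1)
x <- rexp(300, rate = 0.02)
ll <- function(lam) sum(dexp(x, rate = lam, log = TRUE))
lam_hat <- 1 / mean(x)             # closed-form exponential MLE

ll(lam_hat) > ll(1.1 * lam_hat)    # TRUE: the MLE beats nearby rates...
ll(lam_hat) > ll(0.9 * lam_hat)    # TRUE: ...on both sides
```

How sharply the log-likelihood drops off away from \(\hat\lambda\) is exactly the curvature that the observed Fisher information captures below.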
fit() is the generic from generics
(specialised by likelihood.model’s
fit.likelihood_model and by maskedhaz’s
fit.dfr_series_md). It returns a fisher_mle
object. Let’s put the estimate next to the truth:
rbind(truth = true_par, estimate = round(coef(result), 4))
#> [,1] [,2] [,3] [,4] [,5]
#> truth 2.0000 100.0000 0.0010 0.0200 0.0050
#> estimate 2.3527 89.2322 0.0015 0.0147 0.0046

Each parameter is recovered to within about one standard
error — see below. Standard coefficient / variance methods all dispatch
through the usual stats generics:
# 95% Wald confidence intervals — note the Gompertz rows (3 and 4)
# cross zero. That's a known limitation of Wald intervals near
# a boundary: the asymptotic normal approximation is too symmetric
# for parameters bounded below by zero when the SE is comparable
# to the estimate.
confint(result)
#> 2.5% 97.5%
#> [1,] 1.557532e+00 3.147960e+00
#> [2,] 6.878647e+01 1.096779e+02
#> [3,] -7.598318e-05 3.147689e-03
#> [4,] -2.182700e-02 5.132068e-02
#> [5,] 3.072264e-03 6.196758e-03

A fisher_mle also realises the mle_fit
concept from algebraic.mle,
which in turn is an algebraic.dist distribution (under the
standard MLE asymptotic normal approximation). That means you get a
second layer of generics “for free” — they weren’t implemented by
maskedhaz, but they work because the class inheritance
wires them up. Attach those two packages to bring their generics onto
the search path:
library(algebraic.mle) # se(), bias(), observed_fim(), mse(), ...
library(algebraic.dist) # as_dist(), sampler(), expectation(), cdf(), ...

MLE-algebra generics work directly on the fit:
se(result) # standard errors (from algebraic.mle)
#> [1] 4.057288e-01 1.043169e+01 8.223805e-04 1.866046e-02 7.970793e-04
bias(result) # asymptotic bias — zero to first order for MLEs
#> [1] 0 0 0 0 0
observed_fim(result) # observed Fisher information — the negative Hessian
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 19.4974052 0.61247691 -1957.18831 -66.59989 -1729.36388
#> [2,] 0.6124769 0.02884912 -71.30818 -3.57579 -54.14572
#> [3,] -1957.1883106 -71.30817750 6909304.08999 267103.41229 793403.71919
#> [4,] -66.5998863 -3.57578960 267103.41229 13329.82769 28457.36004
#> [5,] -1729.3638819 -54.14571794 793403.71919 28457.36004 1785365.25596

Because the MLE is a distribution (asymptotically
multivariate Normal with mean \(\theta_0\) and covariance \(I(\theta_0)^{-1}\)), the
algebraic.dist generics also apply:
# Convert to an explicit algebraic.dist object
mle_dist <- as_dist(result) # from algebraic.dist
class(mle_dist)
#> [1] "mvn" "multivariate_dist" "continuous_dist"
#> [4] "dist"
# The expected value of the sampling distribution is the MLE itself
expectation(mle_dist) # from algebraic.dist
#> [1] 2.357600152 89.127595819 0.001547257 0.014583132 0.004624848
# Draw 5 samples from the MLE's sampling distribution
set.seed(7)
samp <- sampler(mle_dist) # from algebraic.dist
round(samp(5), 4)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 3.3049 76.0186 0.0019 0.0017 0.0044
#> [2,] 1.8778 97.3373 0.0011 0.0211 0.0061
#> [3,] 1.5703 117.4455 0.0018 0.0292 0.0060
#> [4,] 2.7558 79.7628 0.0015 0.0112 0.0055
#> [5,] 2.3363 96.3118 0.0029 -0.0093 0.0057

Every row above is a plausible “alternative MLE” under the asymptotic approximation. Note that some samples have negative values for the Gompertz parameters (columns 3 and 4) — that’s not a bug, it’s the MVN approximation being symmetric around the MLE when the true parameter space is \((0, \infty)\). For uncertainty propagation over nonlinear functions of the MLE this is usually fine (the probability mass is still concentrated in the feasible region); for simulation-based CIs of boundary-near parameters you’d want a log-reparametrisation or a profile likelihood instead.
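The log-reparametrisation is a one-liner via the delta method: \(\operatorname{se}(\log\hat\theta) \approx \operatorname{se}(\hat\theta)/\hat\theta\), so a Wald interval built on the log scale and exponentiated back stays strictly positive. A base-R sketch using the Gompertz-a numbers printed earlier (rounded estimate 0.0015, SE 8.22e-4):

```r
log_scale_ci <- function(est, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  # Wald interval for log(theta), mapped back to the original scale:
  # endpoints are est * exp(+/- z * se/est), so they can never cross zero
  est * exp(c(-1, 1) * z * se / est)
}
log_scale_ci(0.0015, 8.22e-4)   # ~ [0.00051, 0.00439], strictly positive
```

Compare with the plain Wald interval for the same parameter above, whose lower endpoint was negative.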
The punchline: none of this algebraic.mle /
algebraic.dist machinery knows or cares that
result came from a masked-cause series-system model. It
just sees an mle_fit. That’s what the layered design buys —
each layer provides capabilities that compose uniformly with everything
below it.
Once fit, you can ask domain-specific cause-of-failure questions.
conditional_cause_probability is a
maskedcauses generic (part of the series_md
protocol) — maskedhaz provides the method for
dfr_series_md:
ccp_fn <- conditional_cause_probability(model)
ccp_fn(t = c(25, 50), par = coef(result))
#> [,1] [,2] [,3]
#> [1,] 0.4075557 0.1919099 0.4005344
#> [2,] 0.6055589 0.1614216 0.2330195

At \(t = 25\), the Weibull (component 1) and the exponential (component 3) are nearly tied (\(\approx 0.41\) and \(\approx 0.40\)): the Weibull hazard \(h_1(t) = 2t/100^2\) is still small at \(t = 25\) (about \(0.005\)), comparable to the constant exponential rate. By \(t = 50\), the Weibull hazard has doubled (\(\approx 0.01\)) and component 1 clearly dominates: its conditional probability rises to \(\approx 0.61\) while the exponential's falls to \(\approx 0.23\). The Gompertz (component 2) is a minor, steady contributor throughout because its parameters (\(a=0.001, b=0.02\)) keep its hazard well below the other two on this time scale.
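In a series system, the conditional cause probability at an exact failure time is the hazard ratio \(h_j(t)/\sum_k h_k(t)\). The printed rows can be sanity-checked in base R with the rounded estimates (assuming the \(a e^{bt}\) Gompertz hazard; small discrepancies come from rounding the coefficients to four decimals):

```r
est <- c(2.3527, 89.2322, 0.0015, 0.0147, 0.0046)   # rounded coef(result)
hazards <- function(t) c(
  weibull     = (est[1] / est[2]) * (t / est[2])^(est[1] - 1),
  gompertz    = est[3] * exp(est[4] * t),
  exponential = est[5]
)
# conditional cause probability = each hazard's share of the system hazard
t(sapply(c(25, 50), function(t) hazards(t) / sum(hazards(t))))
```

The result agrees with the ccp_fn output above to about two decimal places.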
The marginal version averages over the system failure time distribution:
cp_fn <- cause_probability(model)
set.seed(1)
cp_fn(par = coef(result), n_mc = 5000)
#> [1] 0.5553825 0.1751522 0.2694653

Roughly: over the population of failed systems, \(\approx 56\%\) are caused by the Weibull component, \(\approx 27\%\) by the exponential, and \(\approx 18\%\) by the Gompertz. The Weibull's marginal share is well above its near-tie with the exponential at \(t=25\) because the system failure-time distribution has enough mass at later times (where the Weibull hazard dominates) to tip the average.
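The Monte Carlo marginalisation behind this can be reproduced in base R: draw a lifetime from each fitted component family, record which one is smallest (the cause, in a series system), and tabulate. Base R has no Gompertz sampler, so we invert its cumulative hazard \(H(t) = (a/b)(e^{bt}-1)\) against an Exp(1) draw (rounded estimates; expect small Monte Carlo and rounding differences from the output above):

```r
set.seed(2)
n <- 50000
est <- c(2.3527, 89.2322, 0.0015, 0.0147, 0.0046)   # rounded coef(result)
t1 <- rweibull(n, shape = est[1], scale = est[2])
t2 <- log(1 + est[4] * rexp(n) / est[3]) / est[4]   # inverse cumulative hazard
t3 <- rexp(n, rate = est[5])
cause <- max.col(-cbind(t1, t2, t3))                # index of the smallest lifetime
table(cause) / n
```

The empirical shares land close to the cause_probability output, confirming that the marginal probabilities are just "which component fails first" frequencies.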
maskedhaz sits at the intersection of three stacks — a
protocol stack defining what a masked-cause likelihood
model is, a component stack providing the hazard
functions inside a series system, and an MLE result
stack defining what the fit object looks like.
likelihood.model::likelihood_model
▲
│ (specialises)
maskedcauses::series_md
▲
│ (specialises)
maskedhaz::dfr_series_md
- likelihood.model defines the general Fisherian likelihood framework: the likelihood_model class and the core generics (loglik, score, hess_loglik, fim, rdata, fit, assumptions).
- maskedcauses specialises that framework for masked-series-system problems: the series_md subclass, C1–C2–C3 validation, and domain generics (conditional_cause_probability, cause_probability, component_hazard, ncomponents). Its exp/Weibull likelihoods are reference implementations; other implementations plug into the same generics.
- maskedhaz provides dfr_series_md, the series_md implementation for systems whose components are arbitrary dfr_dist objects.

flexhaz::dfr_dist (hazard-first component distributions)
│
▼ (composed into)
serieshaz::dfr_dist_series (system = sum of component hazards)
│
▼ (wrapped by)
maskedhaz::dfr_series_md (masked-cause likelihood over that system)
algebraic.dist::dist (probability distribution algebra)
▲
│ (an MLE IS a distribution, via asymptotic normality)
algebraic.mle::mle_fit (MLE concept: coefs, vcov, CIs, delta-method)
▲
│ (realised by)
likelihood.model::fisher_mle (Fisherian MLE with observed FIM, score at MLE)
A call to fit(model)(df, par) produces a
fisher_mle. Because it inherits mle_fit,
standard MLE diagnostics — coef(), vcov(),
confint(), se(), bias(),
observed_fim(), mse() — dispatch through the
algebraic.mle algebra. Because mle_fit in turn
realises algebraic.dist
(an MLE is a sampling distribution, approximately multivariate Normal
under standard theory), you can also density(),
cdf(), expectation(), and sample from a fitted
model — composing it with other distributions, propagating uncertainty,
and so on. The same fisher_mle returned by
maskedhaz, maskedcauses, or any other
implementation plugs into all of this uniformly.
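The "free" second layer is ordinary S3 inheritance. A self-contained toy (illustrative generic name, not the packages' code) shows why a method written for a parent class works on every descendant:

```r
# a generic with a method only for the parent class
describe <- function(x) UseMethod("describe")
describe.mle_fit <- function(x) "dispatched via the parent class"

# a fit object carries its whole ancestry in its class vector
fit_obj <- structure(list(), class = c("fisher_mle", "mle_fit"))
describe(fit_obj)  # no describe.fisher_mle exists; R falls back to mle_fit
#> [1] "dispatched via the parent class"
```

This is the whole wiring: as long as fisher_mle keeps mle_fit (and, via as_dist, the distribution classes) in its class vector, every generic written against those concepts dispatches correctly.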
- vignette("custom-components", package = "maskedhaz") — a worked example with components that have no closed-form likelihood: Gompertz aging, log-logistic early failure, and how the numerical-integration path handles them transparently.
- vignette("censoring-and-masking", package = "maskedhaz") — all four observation types (exact, right, left, interval) with realistic examples, plus a cross-validation check against maskedcauses for the exponential case.
- ?dfr_series_md — constructor reference, including the data-column naming conventions.

maskedhaz inherits the classical C1–C2–C3 assumptions from the foundational masked-cause literature:
assumptions(model)
#> [1] "iid observations"
#> [2] "series system configuration"
#> [3] "component independence: component lifetimes are independent"
#> [4] "non-negative hazard: h_j(t) >= 0 for all j, t > 0"
#> [5] "C1: failed component is in candidate set with probability 1"
#> [6] "C2: uniform probability for candidate sets given component cause"
#> [7] "C3: masking probabilities independent of system parameters"

These hold in most practical reliability scenarios where candidate
sets come from unbiased diagnostic procedures. If your data-generating
process violates them (e.g., mechanics preferentially blame the cheapest
component), you need relaxed-masking extensions — see mdrelax.