\documentclass[a4paper, 11pt]{article}
\usepackage[OT1]{fontenc}
\usepackage{url}
\usepackage{graphicx}
\usepackage{tikz}
\usetikzlibrary{decorations,arrows,shapes}
\usepackage[margin=0.9in]{geometry}
\usepackage{hyperref}
\usepackage{listings}
\usepackage{xspace}
\usepackage[numbers]{natbib}
\bibliographystyle{plainnat}

\setlength{\parindent}{0mm}
\setlength{\parskip}{1mm}

\newcommand{\commentout}[1]{}
\newcommand{\al}{$\alpha$-level\xspace}
\newcommand{\eg}{\emph{e.g.}\xspace}
\newcommand{\ie}{\emph{i.e.}\xspace}
\newcommand{\gmcp}{\texttt{gMCP}\xspace}
\newcommand{\gact}{\texttt{gACT}\xspace}

\begin{document}

% \VignetteEngine{knitr::knitr}
% \VignetteIndexEntry{Graphical approaches for multiple endpoint problems using weighted parametric tests}

\title{Weighted parametric tests defined by graphs}
\author{Florian Klinglmueller}

\maketitle

\tableofcontents

\section{Introduction}
\label{sec:intro}

This document describes how to use \gmcp to construct, conduct and evaluate multiple comparison procedures based on weighted parametric tests that are defined by directed graphs. In addition to sequentially rejective Bonferroni procedures, \gmcp provides functionality to construct tests that take advantage of (partial) knowledge of the joint distribution of the $p$-values associated with the multiple hypotheses. For the time being, \gmcp implements the case of multiple inferences based on one-sided $z$-tests.

If the joint distribution of the $p$-values is known, the sequentially rejective tests described in \cite{bretzEtAl2009graphical} may be performed using the corresponding closed test procedures based on weighted min-$p$ tests for each intersection hypothesis. It is assumed that under the global null hypothesis $(\Phi^{-1}(1-p_1),\ldots,\Phi^{-1}(1-p_m))$ follows a multivariate normal distribution with correlation matrix $\Sigma$, where $\Phi^{-1}$ denotes the inverse of the standard normal distribution function. For example, this is the case if $p_1,\ldots, p_m$ are the raw $p$-values from one-sided $z$-tests for each of the elementary hypotheses, where the correlation between the $z$-test statistics is generated by an overlap in the observations (e.g.\ comparison with a common control, group-sequential analyses, etc.). Applying the transformation $\Phi^{-1}(1-p_i)$ to raw $p$-values from two-sided tests will, in general, not lead to a multivariate normal distribution.

Partial knowledge of the correlation matrix is supported. If the correlation matrix of a given subset of test statistics is known, tests for intersection hypotheses containing the associated elementary hypotheses are computed using the multivariate distribution under the respective null hypothesis. However, the consonance condition necessary for the shortcut that yields sequentially rejective tests can no longer be guaranteed with this approach; hence, the whole closed test is performed. For an exhaustive theoretical treatment of the underlying statistical principle see \cite[Section 3.2]{Bretz11}. As examples we will use the graph-based procedures introduced in Examples 1 and 2 of \cite[Sections 2 and 3.2]{Bretz11}.

% \section{Statement of Problem}
% \label{sec:prob}
% Consider the problem of testing $m$ null hypotheses $H_1^0,...,H_m^0$
% at multiple significance level $\alpha$. Let $S_1,...,S_m$ denote the
% test statistics associated with the elementary
% hypotheses\footnote{Note that at this point we only consider test
% normally distributed test statistics with known variance}.
% The graphical approach \cite{paper1} provides a convenient tool to define
% weighted sequential tests that capture the contextual relationships
% between elementary hypotheses, \eg~the hypotheses might be of
% different importance in the sense that rejecting a hypotheses of low
% importance while retaining one of higher importance might not be
% clinically relevant. This is done by defining initial weights
% specifying the proportion of the overall \al allocated to each
% elementary hyptheses. Weights for the elementary hypotheses of smaller
% intersection hypotheses are then recursively determined according to
% an \al reallocation scheme which is specified by means of a directed
% graph. Suppose that for some pairs of hypotheses pairwise correlations
% of the respective test statistics $\textrm{cor}(S_i,S_j)$ are known
% and let $C := (c_{ij})_{i,j \in \{1,...,m\}}$ denote the correlation
% matrix with:
% \begin{displaymath}
% c_{ij} = \left\{
% \begin{array}{rl}
% \textrm{cor}(S_i,S_j), & \textrm{if known, or} \\
% \texttt{NA}, & \textrm{otherwise.}
% \end{array}\right.
% \end{displaymath}

\section{Creating Graphs}
\label{sec:creating}

Examples 1 and 2 of \cite{Bretz11} are based on the same weighting scheme. Inferences on two primary and two secondary hypotheses are to be made. For example, consider the comparison of two treatments to a control using a primary and a secondary endpoint. The secondary hypothesis for a treatment is only to be tested if the corresponding primary hypothesis can be rejected. If both the primary and the secondary hypothesis can be rejected for one treatment, then the portion of the global \al reserved for this treatment is passed on to the other treatment. The graph corresponding to this procedure is depicted in Figure~\ref{fig:bretz}.

We assume that all four hypotheses are tested using normally distributed test statistics with known variances. In Example 1 no further assumptions on the joint distribution of the test statistics are made. Example 2 assumes that both the statistics associated with the primary hypotheses and the statistics associated with the secondary hypotheses have pairwise correlations of $\frac{1}{2}$. This would be the case if the two treatments were compared to the same control group using balanced sample sizes.

The main inputs needed for the construction of weighted parametric tests are a directed graph, initial weights for the elementary hypotheses in the global intersection hypothesis, and a correlation matrix. The graph for both Examples 1 and 2 is depicted in Figure~\ref{fig:bretz}. As in the article, we will distribute the overall \al equally between the two primary hypotheses. Initially the secondary hypotheses will be given no weight. The corresponding initial weights are therefore $(\frac{1}{2},\frac{1}{2},0,0)$. We show in Section \ref{sec:comline} how these parameters can be defined using R. In Section \ref{sec:gMCPgui} we demonstrate how to achieve the same using the graphical user interface provided by \gmcp.

\subsection{Creating Graphs using the Command Line}
\label{sec:comline}

The most basic way of defining a multiple testing procedure (MTP) in \gmcp is by way of a numeric matrix where each element defines the proportion of the local \al of the elementary hypothesis corresponding to the row index that is passed on to the elementary hypothesis associated with the column index. Since no hypothesis reallocates parts of its local \al to itself, the diagonal elements of this matrix are zero.
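Two further constraints are implicit in the graphical approach: the edge weights are non-negative and no row may pass on more than the full local \al, i.e.\ each row sum is at most one. The following helper is only an illustrative sketch (the function \texttt{isTransitionMatrix} is not part of \gmcp) and may be used to check a candidate matrix, such as the matrix \texttt{Gm} constructed below.

<<eval=FALSE>>=
## Illustrative sketch (not part of gMCP): check the usual constraints on a
## transition matrix of the graphical approach.
isTransitionMatrix <- function(G, tol = 1e-8) {
  is.matrix(G) && nrow(G) == ncol(G) &&
    all(diag(G) == 0) &&           # no hypothesis passes alpha to itself
    all(G >= 0) &&                 # edge weights are non-negative
    all(rowSums(G) <= 1 + tol)     # each row passes on at most its full local alpha
}
isTransitionMatrix(Gm)             # TRUE for the example matrix constructed below
@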
The simple graph from our examples is defined by the matrix:
\begin{displaymath}
\left(
\begin{array}{cccc}
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 \\
\end{array}
\right)
\end{displaymath}

<<>>=
Gm <- matrix(0, nrow=4, ncol=4)
Gm[1,3] <- 1
Gm[2,4] <- 1
Gm[3,2] <- 1
Gm[4,1] <- 1
Gm
@

Initial weights are set by means of a numeric vector $\omega_1,...,\omega_m$, where $m$ denotes the number of test statistics. For the example graph of Figure~\ref{fig:bretz} the corresponding weight vector is $(\frac{1}{2},\frac{1}{2},0,0)$\footnote{Note that this differs from the approach taken in \texttt{gMCP}, where a local \al is specified for each elementary hypothesis instead of a weight.}.

<<>>=
w <- c(1/2,1/2,0,0)
w
@

Finally, a correlation matrix has to be specified, providing information on the pairwise correlations between the test statistics. All known pairwise correlations in this matrix are set to their numeric values, whereas unknown coefficients are set to \texttt{NA}. Example 1 assumes no knowledge of the correlation structure. The corresponding $4\times 4$ matrix therefore has all elements set to \texttt{NA} except for the diagonal elements. Example 2 assumes pairwise correlations of $\frac{1}{2}$ between the statistics associated with the two treatments for either the primary or the secondary endpoint. The corresponding correlation matrix has to be specified as:
\begin{displaymath}
\left(
\begin{array}{cccc}
1 & \frac{1}{2} & \texttt{NA} & \texttt{NA} \\
\frac{1}{2} & 1 & \texttt{NA} & \texttt{NA} \\
\texttt{NA} & \texttt{NA} & 1 & \frac{1}{2} \\
\texttt{NA} & \texttt{NA} & \frac{1}{2} & 1
\end{array}
\right)
\end{displaymath}
We construct the correlation matrix corresponding to Example 1 in R as \texttt{Cm1} and the one corresponding to Example 2 as \texttt{Cm2}.

<<>>=
Cm <- matrix(NA, nrow=4, ncol=4)
diag(Cm) <- 1
Cm1 <- Cm
Cm[1,2] <- 1/2
Cm[2,1] <- 1/2
Cm[3,4] <- 1/2
Cm[4,3] <- 1/2
Cm2 <- Cm
Cm1
Cm2
@

Note that the diagonal elements have to be set to one. If it is known that some test statistics are uncorrelated, the corresponding elements have to be set to zero.

\begin{figure}\centering
\begin{tikzpicture}[scale=1]
\node (H1) at (100bp,200bp) [draw,circle split,fill=green!80] {$H_1$ \nodepart{lower} $\frac{1}{2}$};
\node (H2) at (200bp,200bp) [draw,circle split,fill=green!80] {$H_2$ \nodepart{lower} $\frac{1}{2}$};
\node (H3) at (100bp,100bp) [draw,circle split,fill=green!80] {$H_3$ \nodepart{lower} $0$};
\node (H4) at (200bp,100bp) [draw,circle split,fill=green!80] {$H_4$ \nodepart{lower} $0$};
\draw [->,line width=1pt] (H1) to[auto] node[near start,above,fill=blue!20] {1} (H3);
\draw [->,line width=1pt] (H2) to[auto] node[near start,above,fill=blue!20] {1} (H4);
\draw [->,line width=1pt] (H3) to[auto] node[near start,above,fill=blue!20] {1} (H2);
\draw [->,line width=1pt] (H4) to[auto] node[near start,above,fill=blue!20] {1} (H1);
\end{tikzpicture}
\caption{Graph corresponding to Examples 1 and 2 in \cite{Bretz11}.}
\label{fig:bretz}
\end{figure}

\subsection{Creating Graphs using the gMCP GUI}
\label{sec:gMCPgui}

Alternatively, graphs can be specified using the graphical user interface provided by \gmcp.

<<eval=FALSE>>=
library(gMCP)
graphGUI()
@

These commands load the \gmcp library and open the GUI, a screenshot of which is presented in Figure~\ref{fig:gui}.
Graphs defined using this tool can be saved to an R object whose name can be specified in the line above the graph manipulation window. The graph is exported by setting the cursor into this line (optionally editing the variable name) and pressing enter. By default the variable name is set to \texttt{createdGraph}. This creates an object of class \texttt{graphMCP}. After switching back to the R interpreter, the created graph can be converted to a matrix using the function \texttt{graph2matrix} from package \gmcp. The weights set in the graph can be extracted using the function \texttt{getWeights}.

<<eval=FALSE>>=
Gm <- graph2matrix(createdGraph)
w <- getWeights(createdGraph)
@

Conversely, graphs defined as matrices can also be converted to \texttt{graphMCP} objects.

<<eval=FALSE>>=
G <- matrix2graph(Gm, weights=w)
graphGUI(G)
@

\begin{figure}
\centering
\includegraphics[width=.95\textwidth]{pictures/example.png}
\caption{Screenshot of the \gmcp GUI.}
\label{fig:gui}
\end{figure}

\section{Testing}
\label{sec:testing}

Tests can be performed with \gmcp in several ways. \gmcp provides functions covering every step of the test procedure, which can be carried out separately or all at once. The first step is the computation of all intersection hypotheses in the closure of the test problem, together with the conventional weights of the graphical approach, i.e.\ without using knowledge of any correlations. This can be done using the function \texttt{generateWeights}. For both of our examples this looks as follows:

<<>>=
library(gMCP, quietly=TRUE)
generateWeights(Gm, w)
@

\texttt{generateWeights} takes the graph defined as a matrix and the vector of initial weights and returns a matrix in which each row corresponds to an intersection hypothesis in the closure of the test problem. The first half of each row indicates the intersection hypothesis: hypotheses contained in the intersection are indicated by a $1$, hypotheses not contained in the intersection by a $0$. For example, $(1,1,0,0)$ stands for the intersection of $H_1$ and $H_2$. The second half of each row provides the weights for the hypotheses in the corresponding intersection.

In the next step, critical values for all elementary hypotheses in each intersection hypothesis are computed. This can be done using the function \texttt{generateBounds}. Here, for the first time, the different assumptions on the correlation structure in Examples 1 and 2 matter:

<<>>=
generateBounds(Gm, w, Cm1, al=.025)
generateBounds(Gm, w, Cm2, al=.025)
@

Alternatively, we can transform these bounds from the $z$-scale to the $p$-value scale (in percent) to recreate Table 2 of \cite{Bretz11}:

<<>>=
(1-pnorm(generateBounds(Gm, w, Cm2, al=.025)))*100
@

\texttt{generateBounds} takes the graph in matrix form, the weights, the correlation matrix and the overall \al as inputs and computes rejection bounds for all elementary hypotheses in each intersection. Bounds are computed using the multivariate distribution of the test statistics whenever the correlation matrix is fully known for more than one of the test statistics associated with the elementary hypotheses within the intersection. For a concise explanation of the algorithm involved see \cite[Section 3.2]{Bretz11}.

Once the rejection bounds have been computed, overall rejection of the elementary hypotheses, based on the observed data, is determined using the closed testing principle: an elementary hypothesis is rejected at the overall \al if it is rejected in every intersection hypothesis of which it is a part.
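To make the closed testing step concrete, the following sketch spells it out for the simple case in which no correlation knowledge is used, i.e.\ with weighted Bonferroni bounds $\Phi^{-1}(1-w_j\alpha)$ per intersection (essentially the situation of Example 1). It assumes the column layout of the \texttt{generateWeights} output described above (the first four columns indicate the intersection, the last four hold the weights) and is only an illustration of the principle, not of how \gmcp computes its bounds internally.

<<eval=FALSE>>=
## Illustrative sketch of the closed testing principle with weighted
## Bonferroni bounds (no correlation knowledge used); the column layout of
## generateWeights() described above is assumed.
hw    <- generateWeights(Gm, w)
inter <- hw[, 1:4]                  # 0/1 membership of each intersection hypothesis
wgts  <- hw[, 5:8]                  # weights within each intersection
z     <- c(2.24, 2.24, 2.24, 2.3)   # observed z-scores
alpha <- 0.025

## Local test: an intersection is rejected if some hypothesis it contains
## exceeds its weighted Bonferroni critical value qnorm(1 - w_j * alpha).
crit <- qnorm(1 - wgts * alpha)     # Inf wherever the weight is zero
zmat <- matrix(z, nrow = nrow(crit), ncol = length(z), byrow = TRUE)
rejIntersection <- apply(inter == 1 & zmat >= crit, 1, any)

## Closed test: H_i is rejected overall if every intersection containing
## H_i is rejected by its local test.
sapply(1:4, function(i) all(rejIntersection[inter[, i] == 1]))
@

For the bounds of Example 2 the same closed-test logic applies; only the critical values per intersection are then computed from the known multivariate normal distribution, as done by \texttt{generateBounds}.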
Since the bounds of a testing procedure are independent of the observed data, \gmcp provides the function \texttt{generateTest} to create a test function for a particular MTP:

<<>>=
Example1 <- generateTest(Gm, w, Cm1, al=.025)
Example2 <- generateTest(Gm, w, Cm2, al=.025)
@

Defining a test function in this way is efficient if a test is applied many times, for example in simulations. The functions \texttt{Example1} and \texttt{Example2} defined above take a vector of four $z$-scores and return the results as a logical vector, where \texttt{TRUE} stands for rejection of the corresponding null hypothesis and \texttt{FALSE} signifies that it has to be retained.

<<>>=
Example1(c(2.24,2.24,2.24,2.3))
Example2(c(2.24,2.24,2.24,2.3))
@

We see that in the above $z$-score scenario the procedure gains power when knowledge about the correlation structure is used.

{\em Attention:} Not all scenarios of partial knowledge of the correlation matrix can currently be handled by \gmcp. The correlation matrix must have a block structure: we assume that the set of hypotheses can be partitioned into subsets such that all pairwise correlations within each subset are known, while correlations between statistics from different subsets are set to \texttt{NA}. For example, assume that in the example given above not only the correlation between the statistics for $H_1$ and $H_2$ is known but also the correlation between the statistics for $H_1$ and $H_3$, yet not the one between those associated with $H_2$ and $H_3$. In this case it is unclear, when testing the intersection hypothesis $H_1 \cap H_2 \cap H_3$, whether the bounds for the statistics associated with $H_1$ and $H_2$ or alternatively for those associated with $H_1$ and $H_3$ should be computed using their joint multivariate distribution. The current implementation would try to find bounds using the multivariate normal distribution of all three statistics, which is not fully known, and would therefore stop with an error.

\subsection{Testing using the gMCP interface}
\label{sec:gmcpint}

The function \texttt{gMCP} provides a common interface to both sequentially rejective Bonferroni procedures and parametric tests. It takes an object of class \texttt{graphMCP} as its input, together with a vector of $p$-values, and computes which hypotheses the corresponding test procedure rejects. For Example 1 this amounts to the call:

<<>>=
p <- 1-pnorm(c(2.24,2.24,2.24,2.3))
G <- matrix2graph(Gm, w)
gMCP(G, p)
@

In the case of a sequentially rejective Bonferroni-type procedure, \gmcp returns an object of class \texttt{graphMCP-Result}, which holds information on the specific sequence the test procedure has taken through the graph and also provides adjusted $p$-values.

For Example 2 we use the same graph and assume normally distributed test statistics with known variances and a correlation matrix of the form:
\begin{displaymath}
\left(
\begin{array}{cccc}
1 & \frac{1}{2} & \texttt{NA} & \texttt{NA} \\
\frac{1}{2} & 1 & \texttt{NA} & \texttt{NA} \\
\texttt{NA} & \texttt{NA} & 1 & \frac{1}{2} \\
\texttt{NA} & \texttt{NA} & \frac{1}{2} & 1
\end{array}
\right) .
\end{displaymath}
This can be implemented by additionally passing the correlation matrix to \gmcp:

<<>>=
gMCP(G, p, corr=Cm2, test="parametric")
@

This returns a similar result, however without the graphs representing the rejection sequence, since the whole closed test is performed in this example. The result contains a vector specifying which of the elementary hypotheses can be rejected at the overall \al; the reported $p$-values are the unadjusted $p$-values passed to \gmcp.

\newpage
\bibliography{literatur}

\end{document}