% \VignetteIndexEntry{desirability Manual} % \VignetteDepends{desirability} % \VignettePackage{desirability} \documentclass[12pt]{article} \usepackage{amsmath} \usepackage[pdftex]{graphicx} \usepackage{color} \usepackage{xspace} \usepackage{fancyvrb} \usepackage{fancyhdr} \usepackage{lastpage} \usepackage{algorithm2e} \usepackage[ colorlinks=true, linkcolor=blue, citecolor=blue, urlcolor=blue] {hyperref} \usepackage{Sweave} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % define new colors for use \definecolor{darkgreen}{rgb}{0,0.6,0} \definecolor{darkred}{rgb}{0.6,0.0,0} \definecolor{lightbrown}{rgb}{1,0.9,0.8} \definecolor{brown}{rgb}{0.6,0.3,0.3} \definecolor{darkblue}{rgb}{0,0,0.8} \definecolor{darkmagenta}{rgb}{0.5,0,0.5} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \DefineVerbatimEnvironment{Sinput}{Verbatim}{fontshape=sl,formatcom=\color{darkblue}} \fvset{listparameters={\setlength{\topsep}{0pt}}} \renewenvironment{Schunk}{\vspace{\topsep}}{\vspace{\topsep}} \fvset{fontsize=\footnotesize} \newcommand{\bld}[1]{\mbox{\boldmath $#1$}} \newcommand{\shell}[1]{\mbox{$#1$}} \renewcommand{\vec}[1]{\mbox{\bf {#1}}} \newcommand{\code}[1]{\mbox{\footnotesize\color{darkblue}\texttt{#1}}} \newcommand{\pkg}[1]{{\fontseries{b}\selectfont #1}} \newcommand{\ReallySmallSpacing}{\renewcommand{\baselinestretch}{.6}\Large\normalsize} \newcommand{\SmallSpacing}{\renewcommand{\baselinestretch}{1.1}\Large\normalsize} \newcommand{\halfs}{\frac{1}{2}} \setlength{\oddsidemargin}{-.25 truein} \setlength{\evensidemargin}{0truein} \setlength{\topmargin}{-0.2truein} \setlength{\textwidth}{7 truein} \setlength{\textheight}{8.5 truein} \setlength{\parindent}{0.20truein} \setlength{\parskip}{0.10truein} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \pagestyle{fancy} \lhead{} \chead{The \pkg{desirability} Package} \rhead{} \lfoot{} \cfoot{} \rfoot{\thepage\ of \pageref{LastPage}} \renewcommand{\headrulewidth}{1pt} \renewcommand{\footrulewidth}{1pt} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \title{The \pkg{desirability} Package} \author{Max Kuhn \\ max.kuhn@pfizer.com} \begin{document} \maketitle \thispagestyle{empty} \section{Introduction} The \pkg{desirability} package contains S3 classes for multivariate optimization using the desirability function approach of Harrington (1965) using functional forms described by Derringer and Suich (1980). \section{Basic Desirability Functions} The desirability function approach to simultaneously optimizing multiple equations was originally proposed by Harrington (1980). Essentially, the approach is to translate the functions to a common scale ([0, 1]), combine them using the geometric mean and optimize the overall metric. The equations may represent model predictions or other equations. For example, desirability functions are popular in response surface methodology (Box and Wilson (1951), Myers and Montgomery (1995)) as a method to simultaneously optimize a series of quadratic models. A response surface experiment may use measurements on a set of outcomes. Instead of optimizing each outcome separately, settings for the predictor variables sought to satisfy all of the outcomes at once. Also, in drug discovery, predictive models can be constructed to relate the molecular structures of compounds to characteristics of interest (such as absorption properties, potency and selectivity for the intended target). Given a set of predictive models built using existing compounds, predictions can be made on a large set of virtual compounds that have been designed but not necessarily synthesized. Using the model predictions, a virtual compound can be scored on how well the model results agree with required properties. In this case, ranking compounds on multiple endpoints may be sufficient to meet the scientist's needs. Originally, Harrington used exponential functions to quantify desirability. In this package, the simple discontinuous functions of Derringer and Suich (1984) are used. Suppose that there are $R$ equations or function to simultaneously optimize, denoted $f_r(\vec{x})$ ($r=1\ldots R$). For each of the $R$ functions, an individual ``desirability'' function is constructed that is high when $f_r(\vec{x})$ is at the desirable level (such as a maximum, minimum, or target) and low when $f_r(\vec{x})$ is at an undesirable value. Derringer and Suich proposed three forms of these functions, corresponding to the type of optimization goal. For maximization of $f_r(\vec{x})$, the function \begin{equation}\label{E1:Dmax} d_r^{max}= \begin{cases} 0 &\text{if $f_r(\vec{x})< A$} \\ \left(\frac{f_r(\vec{x})-A}{B-A}\right)^{s} &\text{if $A \leq f_r(\vec{x}) \leq B$} \\ 1 &\text{if $f_r(\vec{x})>B$} \end{cases} \end{equation} \noindent can be used, where $A$, $B$, and $s$ are chosen by the investigator. When the equation is to be minimized, they proposed the function \begin{equation}\label{E1:Dmin} d_r^{min}= \begin{cases} 0 &\text{if $f_r(\vec{x})> B$} \\ \left(\frac{f_r(\vec{x})-B}{A-B}\right)^{s} &\text{if $A \leq f_r(\vec{x}) \leq B$} \\ 1 &\text{if $f_r(\vec{x})< A$} \end{cases}, \end{equation} \noindent and for target is best situations, \begin{equation}\label{E1:Dtarget} d_r^{target}= \begin{cases} \left(\frac{f_r(\vec{x})-A}{t_0-A}\right)^{s_1} &\text{if $A \leq f_r(\vec{x}) \leq t_0$} \\ \left(\frac{f_r(\vec{x})-B}{t_0-B}\right)^{s_2} &\text{if $t_0 \leq f_r(\vec{x}) \leq B$} \\ 0 &\text{otherwise} \\ \end{cases} \end{equation} \noindent These functions are on the same scale and are discontinuous at the points $A$, $B$, and $t_0$. The values of $s$, $s_1$ or $s_2$ can be chosen so that the desirability criterion is easier or more difficult to satisfy. For example, if $s$ is chosen to be less than 1 in (\ref{E1:Dmin}), $d_r^{Min}$ is near 1 even if the model $f_r(\vec{x})$ is not low. As values of $s$ move closer to $0$, the desirability reflected by (\ref{E1:Dmin}) becomes higher. Likewise, values of $s$ greater than 1 will make $d_r^{Min}$ harder to satisfy in terms of desirability. These scaling factors are useful when one equation is of greater importance than the others. Examples of these functions are given in Figure \ref{F1:desire}. It should be noted that any function can be used to mirror the desirability of a model. For example, Del Castillo, Montgomery, and McCarville (1996) develop alternative desirability functions that can be used in conjunction with gradient based optimization routines. <>= library(desirability) pdf("dMax.pdf", width = 7, height = 4) par(mfrow=c(1,3)) plot(dMax(1, 3), nonInform = FALSE) plot(dMax(1, 3, 5), TRUE, col = 2, nonInform = FALSE) text(2.73, .3, "scale = 5", col = 2) plot(dMax(1, 3, 1/5), TRUE, col = 4, nonInform = FALSE) text(1.3, .8, "scale = .2", col = 4) text(2, .62, "scale = 1", col = 1) title("(a)") plot(dMin(1, 3), nonInform = FALSE) plot(dMin(1, 3, 5), TRUE, col = 2, nonInform = FALSE) text(1.5, 0.1, "scale = 5", col = 2) plot(dMin(1, 3, 1/5), TRUE, col = 4, nonInform = FALSE) text(1.5, 1, "scale = .2", col = 4) text(2, .62, "scale = 1", col = 1) title("(b)") plot(dTarget(1, 2, 3), nonInform = FALSE) plot(dTarget(1, 2, 3, 5), TRUE, col = 2, nonInform = FALSE) text(1.9, .1, "lowScale = 5", col = 2) plot(dTarget(1, 2, 3, 1/5), TRUE, col = 4, nonInform = FALSE) text(1.3, 0.9, "lowScale = .2", col = 4) text(1.35, 0.6, "lowScale = 1", col = 1) title("(c)") dev.off() par(mfrow=c(1,1)) @ \begin{figure}[p] \begin{center} \includegraphics[clip, width = 1\textwidth]{dMax} \caption{Examples of the three primary desirability functions. Panel (a) shows an example of a larger--is--better function, panel (b) shows a smaller--is--better desirability function and panel (c) shows a function where the optimal value corresponds to a target value. Not that increasing the scale parameter makes it more difficult to achieve higher desirability, while values smaller than 1 make it easier to achieve good results.} \label{F1:desire} \end{center} \end{figure} For each of these three desirability functions (and the others discussed in Section 6.3), there are \code{print}, \code{plot} and \code{predict} methods. \section{Overall Desirability} Given that the $R$ desirability functions $d_1 \ldots d_r$ are on the [0,1] scale, they can be combined to achieve an overall desirability function, $D$. One method of doing this is by the geometric mean \[ D=\left(\prod_{r=1}^Rd_r\right)^{1/R}. \] The geometric mean has the property that if any one model is undesirable ($d_r=0$), the overall desirability is also unacceptable ($D=0$). Once $D$ has been defined and the prediction equations for each of the $R$ equations have been computed, it can be use to optimize or rank the predictors. \section{An Example} Myers and Montgomery (1995) describe a response surface experiment where three factors (reaction time, reaction temperature and percent catalyst) were used to model two characteristics of the chemical reaction: percent conversion and thermal activity. They present two equations\footnote{In practice, we would just use the predict method for the linear model objects to get the prediction equation. Our results are slightly different from those given by Myers and Montgomery because they used prediction equations with full floating--point precision.} for the fitted quadratic response surface models: <>= conversionPred <- function(x) 81.09 + 1.0284 * x[1] + 4.043 * x[2] + 6.2037 * x[3] - 1.8366 * x[1]^2 + 2.9382 * x[2]^2 - 5.1915 * x[3]^2 + 2.2150 * x[1] * x[2] + 11.375 * x[1] * x[3] - 3.875 * x[2] * x[3] activityPred <- function(x) 59.85 + 3.583 * x[1] + 0.2546 * x[2] + 2.2298 * x[3] + 0.83479 * x[1]^2 + 0.07484 * x[2]^2 + 0.05716 * x[3]^2 - 0.3875 * x[1] * x[2] - 0.375 * x[1] * x[3] + 0.3125 * x[2] * x[3] @ The goal of the analysis was to maximize conversion while keeping the thermal activity between 55 and 60 units. An activity target of 57.5 was used in the analysis. Plots of the response surface models are in Figures \ref{f:conversionPlot} and \ref{f:activityPred}, where reaction time and percent catalyst are plotted while the reaction temperature was varied at four different levels. Both quadratic models are saddle surfaces and the stationary points are outside of the experimental region. To determine predictor settings these models, a constrained optimization can be used to stay inside of the experimental region. <>= plotGrid <- expand.grid(time = seq(-1.7, 1.7, length = 50), temperature = seq(-1.7, 1.7, length = 4), catalyst = seq(-1.7, 1.7, length = 50)) plotGrid$conversionPred <- apply(plotGrid[, 1:3], 1, conversionPred) plotGrid$activityPred <- apply(plotGrid[, 1:3], 1, activityPred) @ \begin{figure}[p] \begin{center} <>= library(lattice) textInfo <- trellis.par.get("add.text") textInfo$cex <- .7 trellis.par.set("add.text", textInfo) print(contourplot(conversionPred ~ time + catalyst|temperature, plotGrid, aspect = 1, as.table = TRUE)) @ \caption{The response surface for the percent conversion model. To plot the model contours, the temperature variable was fixed at four diverse levels. The largest effects in the fitted model are due to the time$\times$catalyst interaction and the linear and quadratic effects of catalyst.} \label{f:conversionPlot} \end{center} \end{figure} Translating the experimental goals to desirability functions, a larger--is--better function (\ref{E1:Dmax}) is used for percent conversion with values $A=80$ and $B=97$. A target oriented desirability function (\ref{E1:Dtarget}) was used for thermal activity with $t_0 = 57.5$, $A=55$ and $B=60$. Although the original analysis used numerous combinations of scaling parameters, we will only show analyses with the default scaling factor values. Figures \ref{f:conversionDesire}, \ref{f:conversionDesire} and \ref{f:Overall} show contour plots of the individual desirability function surfaces and the overall surface. \begin{figure}[p] \begin{center} <>= print(contourplot(activityPred ~ time + catalyst|temperature, plotGrid, aspect = 1, as.table = TRUE)) @ \caption{The response surface for the thermal activity model. To plot the model contours, the temperature variable was fixed at four diverse levels. The main effects of time and catalyst have the largest effect on the fitted model.} \label{f:activityPred} \end{center} \end{figure} To construct the overall desirability functions, objects must be created for the individual functions. For example, the following code chunk creates the appropriate objects and uses the \code{predict} method to estimate desirability at the center point of the design: <>= conversionD <- dMax(80, 97) activityD <- dTarget(55, 57.5, 60) predOutcomes <- c(conversionPred(c(0,0,0)), activityPred(c(0,0,0))) print(predOutcomes) predict(conversionD, predOutcomes[1]) predict(activityD, predOutcomes[2]) @ To get the overall score for these settings of the experimental factors, the \code{dOverall} function is used to combine the objects and \code{predict} is used to get the final score: <>= overallD <- dOverall(conversionD, activityD) print(overallD) predict(overallD, predOutcomes) @ <>= dValues <- predict(overallD, plotGrid[, 4:5], all = TRUE) plotGrid <- cbind(plotGrid, dValues) @ \begin{figure}[p] \begin{center} <>= print(contourplot(D1 ~ time + catalyst|temperature, plotGrid, aspect = 1, as.table = TRUE)) @ \caption{The individual desirability surface for the percent conversion outcome using \code{dMax(80, 97)}} \label{f:conversionDesire} \end{center} \end{figure} \begin{figure}[p] \begin{center} <>= print(contourplot(D2 ~ time + catalyst|temperature, plotGrid, aspect = 1, as.table = TRUE)) @ \caption{The individual desirability surface for the thermal activity outcome using \code{dTarget(55, 57.5, 60)}} \label{f:activityDesire} \end{center} \end{figure} \begin{figure}[p] \begin{center} <>= print(contourplot(Overall ~ time + catalyst|temperature, plotGrid, aspect = 1, as.table = TRUE)) @ \caption{The overall desirability surface that combines the model for percent conversion and thermal activity} \label{f:Overall} \end{center} \end{figure} \clearpage \section{Maximizing Desirability} Following Myers and Montgomery, we can maximize desirability within a cuboidal region bounded by the value of the axial points. To do this, the objective function (\code{rsmOpt}) uses a penalty approach; if a candidate point falls outside of the cuboidal design region, the desirability is set to zero. <>= rsmOpt <- function(x, dObject, space = "square") { conv <- conversionPred(x) acty <- activityPred(x) out <- predict(dObject, data.frame(conv = conv, acty = acty)) if(space == "circular") { if(sqrt(sum(x^2)) > 1.682) out <- 0 } else if(space == "square") if(any(abs(x) > 1.682)) out <- 0 out } @ The Nelder--Mean simplex method (Nelder and Mead (1965), Olsson and Nelson (1975)) can be used to maximize desirability using the \code{optim} function. This is a direct search method that uses only function calls and does not use gradient information. This optimization technique can handle discontinuous or non--smooth functions that are commonly produced by desirability functions. However, this method has the propensity to converge to a local optimum. Since the technique is fast when we have an efficient function to optimize, we can restart the algorithm in multiple locations in the feasible space and use the best outcome. As an alternative, other direct search methods are more likely to yield a global optimum, such as simulated annealing (Bohachevsky {\it et al}, 1986). This particular routine is also available in \code{optim}, but tuning the annealing parameters may be needed to ensure that the technique is performing well. The following code chunk uses the Nelder--Mean simplex method and a cubiodal design region: <>= searchGrid <- expand.grid(time = seq(-1.5, 1.5, length = 5), temperature = seq(-1.5, 1.5, length = 5), catalyst = seq(-1.5, 1.5, length = 5)) for(i in 1:dim(searchGrid)[1]) { tmp <- optim(as.vector(searchGrid[i,]), rsmOpt, dObject = overallD, space = "square", control = list(fnscale = -1)) if(i == 1) { best <- tmp } else { if(tmp$value > best$value) best <- tmp } } print(best) @ From this optimization, the predicted value of conversion was \Sexpr{round(conversionPred(best$par), 2)} and activity was predicted to be \Sexpr{round(activityPred(best$par), 2)}. Alternatively we can try to maximize desirability such that the experimental factors are constrained to be within a spherical design region with a radius equal to the axial point distance: <>= for(i in 1:dim(searchGrid)[1]) { tmp <- optim(as.vector(searchGrid[i,]), rsmOpt, space = "circular", dObject = overallD, control = list(fnscale = -1)) if(i == 1) { best <- tmp } else { if(tmp$value > best$value) best <- tmp } } print(best) @ The process converges to relative sub--optimum values (conversion = \Sexpr{round(conversionPred(best$par), 2)} and activity = \Sexpr{round(activityPred(best$par), 2)}). Using a radius of 2 produces overall desirability equal to one, although the solution extrapolates slightly outside of the design region. \section{Non--Standard Features} The preceding approach has been faithful to the process described in Derringer and Suich (1980) and is consistent with current implementations of desirability functions. This package also contains a few non--standard features. \subsection{Non--Informative Desirability and Missing Values} In some cases, the inputs to the desirability functions cannot be computed. When individual desirability functions are defined, a non--informative value is estimated by computing the desirabilities over the possible range and taking the mean value. By default, if the input to this desirability function is \code{NA}, it is replaced by the non--informative value. In order to have the calculation return an \code{NA} value, the value of \code{object\$missing} can be changed to \code{NA}, where \code{object} is the result of a call to one of the desirability R functions. The non--informative value is plotted as a broken line in the default \code{plot} methods (see Figure \ref{f:logistic} for an example). \subsection{Zero--Desirability Tolerances} In some cases where the dimensionality of the outcomes is large, it may be difficult to find feasible solutions where every individual desirability value is acceptable. Each desirability R function has a \code{tol} argument that can be set to a number on [0, 1] (the default value is \code{NULL}). If this value is not null, desirability values equal to zero are replaced by the value of \code{tol}. This computation is applied after the missing value imputation step. \subsection{Non--Standard Desirability Functions} In some cases, the three R desirability functions previously discussed may not cover all of the possible forms of the desirability function required by the user. The function \code{dArb} takes as input a numeric vector of input values and their matching desirabilities and can approximate many other functional forms. For example, if a symmetric, sinusoidal curve was needed to translate a real--valued equation to the desirability scale, the logistic function could be used: \[ d(\vec{x}) = \frac{1}{1+\exp(-\vec{x})} \] Outside of the range $\pm$5, the desirability values are close to zero and one, so we can define 20 points on this range, compute the logistic model and use these values to define the desirability function: <>= foo <- function(u) 1/(1+exp(-u)) xInput <- seq(-5,5, length = 20) logisticD <- dArb(xInput, foo(xInput)) @ Inputs in--between the grid points are linearly interpolated. Using this approach, the extreme values are used outside of the input range. For example, an input value of --10 would be assigned to desirability value of \code{foo(-5)}. Similarly, on the high side, the last desirability value (when the inputs are ordered) is carried forward. Figure \ref{f:logistic} shows an example of the \code{plot} method applied to the object \code{logisticD}. \begin{figure}[ht] \begin{center} <>= pdf("logistic.pdf", width = 5, height = 4) plot(logisticD) dev.off() @ \includegraphics[clip, width = .6\textwidth]{logistic} \caption{An example of using a logistic function to translate inputs to desirability using the \code{dArb} function.} \label{f:logistic} \end{center} \end{figure} Similarly, there is another desirability function to implement box constraints on an equation. For example, instead of using a penalty approach to constraining the experimental factors to be within the design region, we could coerce values outside of $\pm$1.682 to have zero desirability. Figure \ref{f:box} shows an example function. In our previous optimization example, we could create a desirability function for each of the predictors and add them to the overall desirability object. Another non--standard application of desirability functions uses categorical inputs. Desirabilities can be assigned for all possible values. For example: <>= values <- c("value1" = .1, "value2" = .9, "value3" = .2) groupedDesirabilities <- dCategorical(values) groupedDesirabilities @ Figure \ref{f:cat} shows a plot of the desirability profiles for this configuration. \begin{figure}[ht] \begin{center} <>= pdf("box.pdf", width = 5, height = 4) plot(dBox(-1.682, 1.682), nonInform = FALSE) dev.off() @ \includegraphics[clip, width = .6\textwidth]{box} \caption{An example of using a box--like desirability function.} \label{f:box} \end{center} \end{figure} \begin{figure} \begin{center} <>= pdf("groups.pdf", width = 5, height = 4) plot(groupedDesirabilities, nonInform = FALSE) dev.off() @ \includegraphics[clip, width = .6\textwidth]{groups} \caption{An example of using a desirability function for categorical values.} \label{f:cat} \end{center} \end{figure} \clearpage \section{References} \begin{description} \item [] Bohachevsky, I. O., Johnson, M. E. and Stein, M. L. (1986), Generalized Simulated Annealing for Function Optimization. {\em Technometrics} {\bf 28}, 209--217. \item [] Box, G. E. P. and Wilson, K. B. (1951), On the Experimental Attainment of Optimum Conditions. {\em Journal of the Royal Statistical Society, Series B} {\bf 13}, 1--45. \item [] Del Castillo, E., Montgomery, D. C., and McCarville, D. R. (1996), Modified Desirability Functions for Multiple Response Optimization. {\em Journal of Quality Technology} {\bf 28}, 337--345. \item [] Derringer, G. and Suich, R. (1980), Simultaneous Optimization of Several Response Variables. {\em Journal of Quality Technology} {\bf 12}, 214--219. \item [] Harington, J. (1965), The Desirability Function. {\em Industrial Quality Control} {\bf 21}, 494--498. \item [] Myers, R. H. and Montgomery, C. M. (1995), {\em Response Surfaces Methodology: Process and Product Optimization Using Designed Experiments}. New York: Wiley. \item [] Nelder, J. A. and Mead, R. (1965), A Simplex Method for Function Minimization. {\em Computer Journal} {\bf 7}, 308--313. \item [] Olsson, D. M. and Nelson, L. S. (1975), The Nelder--Mead Simplex Procedure for function minimization. {\em Technometrics} {\bf 17}, 45--51. \end{description} \end{document}