%\VignetteIndexEntry{Enabling packages as web services}
%\VignetteKeywords{Web services}
%\VignettePackage{RWebServices}
\documentclass[]{article}

\usepackage[colorlinks,linkcolor=blue,pagecolor=blue,urlcolor=blue]{hyperref}
\usepackage{graphicx}
\usepackage{Sweave}

\newcommand{\lang}[1]{{\texttt{#1}}}
\newcommand{\pkg}[1]{{\textsf{#1}}}
\newcommand{\code}[1]{\texttt{#1}}
\newcommand{\func}[1]{{\texttt{#1}}}
\newcommand{\method}[1]{{\texttt{#1}}}
\renewcommand{\arg}[1]{{\texttt{#1}}}
\newcommand{\ret}[1]{{\texttt{#1}}}
\newcommand{\obj}[1]{{\texttt{#1}}}
\newcommand{\class}[1]{{\textit{#1}}}
\newcommand{\R}{\textsf{R}}
\newcommand{\Java}{\textsf{Java}}
\newcommand{\caBIG}{\textsf{caBIG}}
\newcommand{\caGrid}{\textsf{caGrid}}
\newcommand{\introduce}{\pkg{introduce}}
\newcommand{\Globus}{\textsf{Globus}}
\newcommand{\activeMQ}{\textsf{activeMQ}}
\newcommand{\RWebServices}{\pkg{RWebServices}}
\newcommand{\TypeInfo}{\pkg{TypeInfo}}
\newcommand{\SJava}{\pkg{SJava}}
\newcommand{\file}[1]{\texttt{#1}}

\newcommand{\STS}{\code{SimultaneousTypeSpecification}}
\newcommand{\ITS}{\code{IndependentTypeSpecification}}
\newcommand{\TypedSignature}{\code{TypedSignature}}
\newcommand{\STT}{\code{StrictIsTypeTest}}
\newcommand{\DTT}{\code{DynamicTypeTest}}
\newcommand{\ITT}{\code{InheritsTypeTest}}
%% \newcommand{\STS}{\func{SimultaneousTypeSpecification}}
%% \newcommand{\ITS}{\func{IndependentTypeSpecification}}
%% \newcommand{\TypedSignature}{\func{TypedSignature}}
%% \newcommand{\STT}{\func{StrictIsTypeTest}}
%% \newcommand{\DTT}{\func{DynamicTypeTest}}
%% \newcommand{\ITT}{\func{InheritsTypeTest}}

\begin{document}

\title{Enabling \R{} packages for web or grid services}
\author{
  Martin T.
Morgan\footnote{Fred Hutchinson Cancer Research Center, 1100
    Fairview Ave.\ N., PO Box 19024 Seattle, WA 98109},
  Nianhua Li, Seth Falcon,\\
  Robert Gentleman,
}
\date{30 November, 2006, 20 March, 2007}

\maketitle

<>=
options(width=69)
@

\section{Preliminaries}

\subsection{Prerequisites}

\RWebServices{} and associated software must be installed; see the
accompanying documentation ``Installing and testing RWebServices and
enabled packages''.

You must have a valid \R{} package, including a NAMESPACE file. See
the Writing \R{} Extensions manual.

All objects to be translated to \Java{} \emph{must} be either
primitive types (e.g., numeric, character) or instances of S4 classes.

\section{Creating \Java{} templates}

\subsection{\TypeInfo}

Add type information to your functions.
\begin{enumerate}
\item Include \TypeInfo{} in the `Depends' line of the DESCRIPTION
  file.
\item Provide \func{typeInfo} for each function to be exposed. From
  the \pkg{caDNAcopy} package, an example is:
<>=
typeInfo(caDNAcopy) <-
    SimultaneousTypeSpecification(
        TypedSignature(dnacopyAssays="DNAcopyAssays",
                       dnacopyParameter="DNAcopyParameter"),
        returnType="DerivedDNAcopySegment")
@
%%
  Provide this information within the package, in a `.R' file after
  the corresponding function (\func{caDNAcopy}) has been defined. See
  the documentation and vignettes in the \TypeInfo{} package for
  details.
\item Install the package, e.g.,
\begin{verbatim}
R CMD INSTALL --clean
\end{verbatim}
  where \verb|| is the name of your package. This can also be done
  from within \R{} using \func{install.packages} or other means.
\end{enumerate}

\subsection{Unpack ant scripts}

Unpack the ant scripts with the \R{} \func{unpackAntScript} command,
or at the command line with
\begin{verbatim}
echo "library(RWebServices); unpackAntScript('~/tmp/')" | R --vanilla
\end{verbatim}
where \verb|~/tmp/| is the path to a temporary directory.

\subsection{Create \Java{} templates}

There are several ways of proceeding. One way is to use
\func{createMap} from within \R{}.
A second way is to change to the directory where the ant scripts were
unpacked, and evaluate
\begin{verbatim}
cd ~/tmp/
ant -Dpkg= map-package
\end{verbatim}
(\verb|~/tmp/| is the directory where the ant scripts were unpacked).
Both methods create a directory hierarchy \verb|src/|, and usually
\verb|test/src|.

Sometimes additional \Java{} templates may be required for extra \R{}
data types. Suppose your function returns a \class{list} of
\class{DerivedDNAcopySegment}. Your type information only shows
\verb|returnType="list"|, but you also need the \Java{} templates for
\class{DerivedDNAcopySegment}. If you use \func{createMap} within
\R{}, use the argument \arg{extraClasses}. If you use the ant scripts,
set the property \arg{extra.classes} in
\verb|~/tmp//RWebServicesTuning.properties| to
\arg{DerivedDNAcopySegment}. You can also specify multiple \R{} data
types as extra classes in a comma-delimited character string.

\section{Writing and running tests}

\subsection{Writing test code -- data}

The files
\begin{verbatim}
test/src/org/bioconductor/rserviceJms/worker/RWorkerDataTest.java
test/src/org/bioconductor/rserviceJms/worker/R/*.R
test/src/org/bioconductor/rserviceJms/worker/Data/*.data
\end{verbatim}
contain skeletons to help generate \Java{} and \R{} components for
testing data transfer between \R{} and \Java{}. Templates are
established for tests from \Java{} to \R{} for all function arguments,
and from \R{} to \Java{} for all return values. If any extra classes
are specified, their tests are established in both directions.

The \Java{} code for testing uses the \pkg{JUnit} framework. A typical
method starts with
\begin{verbatim}
@Ignore("please initialize data")
@Test
public void TestDNAcopyParameterToR() throws Exception {
    org.bioconductor.packages.caDNAcopy.DNAcopyParameter inputVal = null;
    inputVal = new ...
    String rScript =
        getClass().getResource("R/DNAcopyParameterData.R").getFile();
    String rVariable = "DNAcopyParameterData";
    assertTrue(myService.mockJava2R(inputVal, rScript, rVariable));
}
\end{verbatim}
The first two lines are directives for \pkg{JUnit}. The test framework
will arrange to pass \obj{inputVal} to \R{}, and use the value of the
variable \obj{rVariable} in \obj{rScript} to assess whether the data
transfer is successful. The developer needs to customize
\obj{inputVal} and the corresponding \R{} script in the
\verb|test/src| hierarchy. Comment out \verb|@Ignore| to enable the
test.

Serialized data instances can be added to the \verb|Data| directory.
Brave users can even render serialized \Java{} data instances from
\R{} data instances. Save \R{} objects into binary files, put them in
one directory, say \verb||, and then evaluate:
\begin{verbatim}
cd ~/tmp/
ant create-data -Daction=load -Ddata.dir=
\end{verbatim}
The ant task transfers those \R{} objects into \Java{} objects and
saves them into binary files in the same directory. You can then use
the serialized \Java{} data in the test. This task requires the \R{}
to \Java{} converters for the \R{} objects. These converters are not
created for function arguments, so please make sure your \R{} objects
are instances of either a function return type or an extra class.

An alternative task
\begin{verbatim}
ant create-data -Daction=data -Ddata.name= \
    -Ddata.dir=
\end{verbatim}
invokes the \R{} function \func{data} with argument \arg{}, and saves
the serialized \Java{} data in \arg{}. The default \arg{} for the task
\func{create-data} is
\verb|~/tmp//test/src/org/bioconductor/rservicesJms/worker/Data|.

The values \verb|load| and \verb|data| of the argument \arg{action}
correspond to the \R{} functions \func{load} and \func{data},
respectively. If the \R{} object is provided by the package, you can
use \arg{action=data} and provide the object name as argument
\arg{data.name}.
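
As a rough illustration of how a serialized data instance might be
read back inside a test, the sketch below assumes that the files
written by \verb|create-data| use standard \Java{} object
serialization; this is an assumption and is not confirmed by the
\RWebServices{} documentation quoted here. The class
\code{SerializedDataSketch}, its two helper methods, and the
\code{double[]} value standing in for a generated data bean are
hypothetical names introduced only for this example.
\begin{verbatim}
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class SerializedDataSketch {
    // Reads one object from a file written with standard Java
    // serialization (assumed format of the create-data output).
    public static Object readInstance(Path file)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            return in.readObject();
        }
    }

    // Writes one object; stands in for the file that create-data
    // would produce, so that the example is self-contained.
    public static void writeInstance(Serializable value, Path file)
            throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeObject(value);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for a generated data bean.
        double[] original = {0.01, 2.0, 10000.0};
        Path file = Files.createTempFile("DNAcopyParameter", ".data");
        try {
            writeInstance(original, file);
            double[] restored = (double[]) readInstance(file);
            System.out.println(Arrays.equals(original, restored));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
\end{verbatim}
In a generated test method, the value returned by
\code{readInstance} could be cast to the bean type and assigned to
\obj{inputVal} in place of a hand-built object.
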
Using \arg{action=load} is more useful for loading your own data files
or for loading multiple files.

The argument \arg{data.dir} has different meanings depending on the
value of \arg{action}. When \arg{action} is \verb|load|,
\arg{data.dir} is the path for both the input \R{} data files and the
output \Java{} data files. Both absolute and relative paths work, but
please make sure that all the files in \arg{data.dir} are \R{} data
files when you invoke the ant task. When \arg{action} is \verb|data|,
\arg{data.dir} is the path for the output \Java{} data file. The
argument \arg{data.name} is only used when \arg{action} is
\verb|data|, and it has to be an \R{} object name, not an \R{} data
file name.

\subsection{Writing test code -- methods}

The file
\begin{verbatim}
test/src/org/bioconductor/rserviceJms/services/.java
\end{verbatim}
contains a template for writing test methods. The methods in this
class arrange for input parameters to be provided by the developer,
and for the corresponding \R{} function to be invoked. The developer
is free to implement tests on the return value; the default is to
compare the return value with an expected value provided by the
developer.

\subsection{Running tests}

Tests require (1) a running activemq, (2) a `worker' to perform
calculations, and (3) the \Java{} program that runs the tests. The
strategy (to be refined) is:
\begin{enumerate}
\item Open a terminal window and start activemq
\begin{verbatim}
cd $JMS_HOME
bin/activemq
\end{verbatim}
  (Alternatives are described in the activemq documentation.)
\item Open another terminal window, compile the test and package
  source code, and start the worker:
\begin{verbatim}
cd ~/tmp/
ant precompile start-worker
\end{verbatim}
  Several files should be compiled, and the worker should start. The
  ant task will remain active.
\item Finally, open a third terminal window and run the test program:
\begin{verbatim}
cd ~/tmp/
ant local-test
\end{verbatim}
  The test files will be compiled and executed.
\end{enumerate}
As the test program executes, any output directed toward stderr in
\R{} (warnings or errors) will appear in the `worker' window.
Java-based messages from the test code (e.g., failed unit tests or
explicit print statements) are echoed in the local-test console, or
printed in the test output directory, \verb|test/output|.

\section{Creating web services from \Java{} templates}

The \Java{} code you have now is a standard \Java{} application.
Converting it into a web service application allows your functions to
be accessed remotely in a platform- and implementation-independent
way. This process is enabled by
\href{http://ws.apache.org/axis/}{Apache Axis}, a \Java{} platform for
creating and deploying web services applications. Please make sure
Apache Axis is correctly installed and deployed. If you have no
existing web server, use
\href{http://tomcat.apache.org/}{Apache Tomcat} as a starting point.
Please also specify the related properties in
\verb|~/tmp//RWebServicesEnv.properties|.

\subsection{Creating web services}

\begin{enumerate}
\item Create WSDL from \Java{} code and \Java{} templates from WSDL:
\begin{verbatim}
cd ~/tmp/
ant gen-wsdl
\end{verbatim}
  The outputs in \verb|~/tmp/| are:
\begin{verbatim}
wsdl/*.wsdl
wsdl/org/bioconductor/packages/*/*.java
wsdl/org/bioconductor/rservicesJms/services/*/*
\end{verbatim}
  The file \verb|*.wsdl| is written in WSDL, the
  \href{http://www.w3.org/TR/wsdl}{Web Service Description Language}.
  It specifies the type information of your functions, and defines all
  related data types. It is the agreement between the web service
  server and client for service invocations. The file is generated by
  a tool called Java2WSDL from Axis by extracting information from
  your \Java{} code. Advanced users can customize the WSDL style via
  the properties \arg{wsdl.style} and \arg{wsdl.use} in
  \verb|~/tmp//RWebServicesTuning.properties|. The default is
  \code{Document/literal wrapped}.
  \href{http://www-128.ibm.com/developerworks/webservices/library/ws-whichwsdl/}{More information}
  about WSDL style is available.

  All other \Java{} files in the directory \verb|wsdl| are generated
  by a tool called WSDL2Java from Axis by extracting information from
  the WSDL file.
  \file{wsdl/org/bioconductor/rservicesJms/services/*/*} contains
  server binding skeletons, client binding stubs and a template for
  tests. The stubs and skeletons handle all the low-level details of
  the remote method invocation. They allow seamless interactions
  between your \Java{} application, Axis and web service clients.
  \file{wsdl/org/bioconductor/packages/*/*.java} are \Java{}
  implementations of the data type definitions in the WSDL.
\item Create the web service server and the web service client.

  The outputs from WSDL2Java need to be connected with your \Java{}
  code.
\begin{verbatim}
cd ~/tmp/
ant mkserver
ant mkclient
\end{verbatim}
  Two directories are created, \verb|server| and \verb|client|, to
  hold all data for the web service server and client respectively.
  The client is only for testing purposes. Any user of your web
  service can create a client from the WSDL file, using a tool and
  programming language of their choice.
\end{enumerate}

The ant tasks gen-wsdl, mkserver and mkclient can also be invoked in
one composite task:
\begin{verbatim}
cd ~/tmp/
ant ws
\end{verbatim}

\subsection{Deploying the web service to Axis}

To deploy the service:
\begin{verbatim}
cd ~/tmp/
ant deploy-serv
\end{verbatim}
If it fails, check the Tomcat log files for error messages. Please
also access your Axis instance from a browser, and view the list of
deployed web services. Sometimes the service does not appear on the
list even if the above ant call returns no error information. In that
case, try the ant call again. You may also want to restart the Tomcat
server after deploying the service.

The deployment step copies
\verb|wsdl/org/bioconductor/rservicesJms/services/*/deploy.wsdd| to
the file \verb|/WEB-INF/server-config.wsdd|.
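
Checking whether the service is listed can also be scripted. The
following sketch assumes the conventional Axis 1.x layout, in which
the WSDL of a deployed service is served below the Axis base address
at \verb|services/NAME?wsdl|. The base address
\verb|http://localhost:8080/axis|, the service name \code{caDNAcopy},
and the class \code{WsdlCheckSketch} are placeholders chosen for
illustration and are not taken from \RWebServices{}; adjust them to
your own installation.
\begin{verbatim}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class WsdlCheckSketch {
    // Builds the conventional Axis 1.x WSDL address of a service.
    public static String wsdlUrl(String axisBase, String serviceName) {
        String base = axisBase.endsWith("/") ? axisBase : axisBase + "/";
        return base + "services/" + serviceName + "?wsdl";
    }

    // Returns true if an HTTP GET of the WSDL address answers with
    // status 200; any I/O problem is reported as "not deployed".
    public static boolean isDeployed(String axisBase, String serviceName) {
        try {
            URL url = new URL(wsdlUrl(axisBase, serviceName));
            HttpURLConnection conn =
                (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            try {
                return conn.getResponseCode() == HttpURLConnection.HTTP_OK;
            } finally {
                conn.disconnect();
            }
        } catch (IOException ex) {
            return false; // server not reachable or not answering
        }
    }

    public static void main(String[] args) {
        // Assumed defaults; replace with your own values.
        String axisBase =
            args.length > 0 ? args[0] : "http://localhost:8080/axis";
        String service = args.length > 1 ? args[1] : "caDNAcopy";
        System.out.println(wsdlUrl(axisBase, service) + " deployed: "
                           + isDeployed(axisBase, service));
    }
}
\end{verbatim}
A negative answer only indicates that the WSDL could not be fetched;
the Tomcat log files remain the place to look for the cause.
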
Always remember to undeploy the service afterwards:
\begin{verbatim}
cd ~/tmp/
ant undeploy-serv
\end{verbatim}

\subsection{Testing the web service}

Add test code to
\begin{verbatim}
client/*/src/org/bioconductor/rservicesJms/services/*/*TestCase.java
\end{verbatim}
Make sure activemq, the `worker', and Tomcat are all running, and
then perform the tests:
\begin{verbatim}
cd ~/tmp/
ant web-test
\end{verbatim}
Test output is collated in \verb|client/test_output|.

\section{Adding \Java{} code to \R{} packages for redistribution}

After \R{} methods have been exposed and working tests developed, an
optional next step is to add the \Java{} code to the original \R{}
package. In this way, the combined \R{} and \Java{} code can be
redistributed for others to use or deploy as web services.

The approach is to add \Java{} files to the directory
\verb+inst/rservices+. The commands
\begin{verbatim}
ant map-package unpack-package -Dpkg=
\end{verbatim}
will then create an \RWebServices{} skeleton as outlined for
\verb+map-package+, and then copy the files in the
\verb+inst/rservices+ folder into their corresponding locations in
the skeleton. The typical contents of \verb+inst/rservices+ might be
\Java{} source files and perhaps data instances used for implementing
tests or simple clients.

\section{Alternative deployments: caGrid services}

\RWebServices{} packages can be used as traditional web services, or
integrated into other projects. One example of the latter involves
\href{http://cabig.nci.nih.gov}{\caBIG{}} and
\href{http://www.cagrid.org}{\caGrid{}}. \caBIG{} is an effort by the
US National Cancer Institute to develop standardized software that
uses strongly typed data. \caGrid{} builds on this foundation to offer
analytic and data services in a grid-based computing environment built
on top of the \href{http://www.globus.org}{\Globus{}} toolkit.

Here is how one might proceed to create a \caGrid{} analytic service
based on an \RWebServices{}-enabled package. The assumption is that
\pkg{caSurvey} contains functions with \func{typeInfo} applied, and
that \pkg{caSurvey} has been built with
\code{R CMD build --clean caSurvey}. To start the project, one can
then evaluate
\begin{verbatim}
tar xzf caSurvey_1.0.tar.gz
R CMD INSTALL --clean caSurvey
echo "library(RWebServices);unpackAntScript('caSurveyImpl')" | \
    R --vanilla
cd caSurveyImpl
ant map-package -Dpkg=caSurvey
\end{verbatim}
Just as described above, this creates src/ and test/ directories. The
test directories are meant to be populated with unit tests to ensure
that data are being translated between \R{} and \Java{} correctly
(RWorkerDataTest.java) and that the service is invoked correctly
(caSurveyTest.java). The worker tests require \RWebServices{},
\SJava{}, and \pkg{caSurvey} to work correctly; the service tests also
require \activeMQ{} and a worker to be working correctly. The tests
are constructed and run as described above.

You can go on to create and deploy a web service
(\code{ant ws deploy-serv}), but for the workflow we want, the next
step is to use \caGrid{} and the \introduce{} tool to create a grid
service. We will forward grid service requests to the \pkg{caSurvey}
application created by \RWebServices{}' map-package.

Creating a \caGrid{} analytic service is documented in this
\href{http://gforge.nci.nih.gov/plugins/scmcvs/cvsweb.php/archvcdebpsig/analytical_services/building_analytical_services_bp.doc?cvsroot=archvcdebpsig}{best practices}
document. Think of the application produced by \code{map-package} as a
`silver level' application (chapter 4), with the goal being to reach
`gold level' (chapter 5). The basic steps involved are
\begin{enumerate}
\item Create xsd from the \Java{} data beans produced by
  \RWebServices{}.
\item Create a \caGrid{} / \introduce{} `project' based on the xsd
  and services to be exposed.
\item Add relevant components from the \RWebServices{} project to the
  \caGrid{} / \introduce{} project.
\item Translate grid service requests to requests handled by the
  \RWebServices{} project.
\end{enumerate}

The first two steps are necessary when bringing any \Java{} project to
\caGrid{}, and are described in the \caGrid{} best practices document.

Components of the \RWebServices{} project need to be added to the
\file{lib} directory of the \caGrid{} project. These are:
\begin{enumerate}
\item A jar file of compiled classes, e.g.,
\begin{verbatim}
ant precompile
jar -cf caSurvey.jar -C bin .
\end{verbatim}
\item \file{rservices.jar} from \RWebServices{}, and
  \file{activemq-core-4.02.jar} and \file{geronimo-jms} from
  \activeMQ{}.
\end{enumerate}

The best practices document suggests that \caGrid{} services use
\file{Impl} to wrap the underlying business logic. For us, this means
\begin{enumerate}
\item Import data packages and the service provider, e.g.,
\begin{verbatim}
import org.bioconductor.packages.caSurvey.*;
import org.bioconductor.rserviceJms.services.caSurvey.caSurvey;
\end{verbatim}
\item Create a persistent service when the grid service is
  initialized, e.g.,
\begin{verbatim}
public class CaSurveyImpl extends CaSurveyImplBase {
    private caSurvey caService = null;

    public CaSurveyImpl() throws RemoteException {
        super();
        // Start our service; the service has a lifetime
        // equal to that of this instance.
        try {
            // logs/catalina.out
            System.out.println("Starting caSurvey");
            caService = new caSurvey();
        } catch (Exception ex) {
            throw new RemoteException(ex.getMessage());
        }
        System.out.println("Start caSurvey successful");
    }
    ...
\end{verbatim}
\item Forward service requests. The \code{Impl} class contains
  methods, each of which represents a grid service. We map each
  method to a \code{caSurvey} service, perhaps using \code{get}
  methods to access the grid data types.
Generally: \begin{verbatim} ... public () { // map input types, i.e., create // from var = new (); // invoke service = null; try { = caService.(); } catch (RemoteException ex) { // maybe log? throw (ex); } // map from to return() } ... \end{verbatim} \end{enumerate} \section{More information} The vignette ``Installing and testing RWebServices and enabled packages'' provides guidance on package and software installation. Additional vignettes contain thoughts and `lessons learned' from this project, and are not essential reading. \end{document}