%\VignetteIndexEntry{From R to Java}
%\VignetteKeywords{Web services}
%\VignettePackage{RWebServices}
\documentclass[]{article}

\usepackage[colorlinks,linkcolor=blue,pagecolor=blue,urlcolor=blue]{hyperref}
\usepackage{graphicx}

\newcommand{\lang}[1]{{\texttt{#1}}}
\newcommand{\pkg}[1]{{\textsf{#1}}}
\newcommand{\code}[1]{\texttt{#1}}
\newcommand{\func}[1]{{\texttt{#1}}}
\newcommand{\method}[1]{{\texttt{#1}}}
\renewcommand{\arg}[1]{{\texttt{#1}}}
\newcommand{\ret}[1]{{\texttt{#1}}}
\newcommand{\obj}[1]{{\texttt{#1}}}
\newcommand{\class}[1]{{\textit{#1}}}

\newcommand{\R}{\lang{R}}
\newcommand{\Java}{\lang{Java}}

\newcommand{\RWebServices}{\pkg{RWebServices}}
\newcommand{\TypeInfo}{\pkg{TypeInfo}}
\newcommand{\SJava}{\pkg{SJava}}

\newcommand{\STS}{\code{SimultaneousTypeSpecification}}
\newcommand{\ITS}{\code{IndependentTypeSpecification}}
\newcommand{\TypedSignature}{\code{TypedSignature}}
\newcommand{\STT}{\code{StrictIsTypeTest}}
\newcommand{\DTT}{\code{DynamicTypeTest}}
\newcommand{\ITT}{\code{InheritsTypeTest}}
\newcommand{\oneWayAnova}{\code{oneWayAnova}}

%% \newcommand{\STS}{\func{SimultaneousTypeSpecification}}
%% \newcommand{\ITS}{\func{IndependentTypeSpecification}}
%% \newcommand{\TypedSignature}{\func{TypedSignature}}
%% \newcommand{\STT}{\func{StrictIsTypeTest}}
%% \newcommand{\DTT}{\func{DynamicTypeTest}}
%% \newcommand{\ITT}{\func{InheritsTypeTest}}

\begin{document}

\title{From \R{} to \Java{}: the \TypeInfo{} and \RWebServices{} paradigm}
\author{
  Nianhua Li\footnote{Fred Hutchinson Cancer Research Center, 1100
    Fairview Ave.\ N., PO Box 19024 Seattle, WA 98109},
  Martin T. Morgan, Seth Falcon,\\
  Robert Gentleman,
  Duncan Temple Lang\footnote{Department of Statistics, 4210
    Mathematical Sciences Building, One Shields Avenue, Davis, CA 95616}
}
\date{14 July, 2006}
\maketitle

\begin{abstract}
  Web services are most effective on statically typed objects exposed
  in a well-developed infrastructure.
  This document summarizes our approach to exposing \R{} objects and
  functionality in a \Java{} class hierarchy of statically typed
  methods. The approach is to use \R{}'s formal (S4) class system to
  strongly type \R{} functions using \TypeInfo{}. We then convert
  strongly typed functions to \Java{} objects and methods for exposure
  as \Java-based web services. Exposing and implementing the web
  service in \Java{} involves the package \SJava{}. Documentation for
  these steps will be provided later.
\end{abstract}

<<>>=
options(width=60)
@

\section{Introduction}

Exposing \R{} objects and functions as web services poses several
challenges. First, \R{} has both informal `classes' and a formal (S4)
class system, whereas web services are most effective with
well-defined objects. Second, \R{} functions are not strongly typed,
whereas web services deploy statically typed functions. Finally,
well-developed infrastructure supports \Java{}-based web services,
whereas web services client and server functionality for \R{} requires
substantial \emph{de novo} development.

\TypeInfo{} and \RWebServices{} are packages that combine to provide a
paradigm for exposing \R{} functions as effective web services in a
\Java-based web services context. Here we document the paradigm of
using \TypeInfo{} and \RWebServices{} for type mapping between \R{}
and \Java.

%% TODO: why auto-generation between R and Java? Java has structure to
%% deploy web services in a common framework. Server model easy to set
%% up. Need to take data from the web to R via Java, relying on
%% established web services infrastructure to perform most of the
%% web-based communication. SJava provides a two-way street -- both R
%% to Java and reverse.

\section{Steps for describing \R{} objects in \Java}

\subsection{Adding \TypeInfo{} to \R{} functions}

The main purpose of \TypeInfo{} is to provide type specification for
function arguments and return values.
By `type specification' we mean definition of argument and return
types in terms of defined \R{} objects. The named objects are defined
in \R{}, and objects and function definitions are translated to
equivalent \Java{} objects and methods using \RWebServices{}.

To illustrate, the following defines and invokes a hypothetical \R{}
function \func{square} taking an untyped argument \arg{x} and
returning an untyped value.
<<>>=
square <- function(x) {
    return(x^2)
}
square(10)
@
%
The function evaluates correctly when provided with a numeric
argument; non-numeric arguments result in a run-time error.
Importantly, there is no way to query the function to determine its
argument or return type.

Type specification is applied by loading the \TypeInfo{} package and
annotating the definition of \func{square}:
<<>>=
library(TypeInfo)
STS <- SimultaneousTypeSpecification
TS <- TypedSignature
typeInfo(square) <-
    STS(TS(x = "numeric"),
        returnType = "numeric")
@
(The symbols \code{STS} and \code{TS} are defined for convenience as
synonyms for the longer function names from the \TypeInfo{} package.)

Applying \TypeInfo{} makes two important changes to the behavior of
\func{square}, without altering the body of the function. First, the
argument \arg{x} and the return value \emph{must} be objects of type
\code{numeric} (approximately, \code{double[]} in \Java{}). Attempts
to invoke \func{square} with non-numeric arguments result in an
error. Programming errors that return non-numeric values also cause an
error.

The second important consequence of applying \TypeInfo{} is that
functions annotated in this way can be queried for their argument and
return types:
<<>>=
typeInfo(square)
@
This information can be readily extracted and transformed
programmatically.

\R{} functionality is usually organized into \emph{packages}. The
intention is that package authors, or individuals responsible for
exposing \R{} functionality as web services, apply \TypeInfo{} to
functions in the package.
Thus type-specified functions are defined within packages.

Full documentation of \TypeInfo{} is available with the
package. Entering \code{library(help=TypeInfo)} at the \R{} prompt
provides a synopsis of available commands. Documentation of each
command is available from the \R{} prompt, e.g., by typing
\code{?typeInfo}. Additional illustration of \TypeInfo, written for a
general audience, is distributed with the package as the PDF file
TypeInfoNews.

\subsection{Using \RWebServices{} to create \Java{} mappings}

The main purpose of \RWebServices{} is to translate \R{} object and
function definitions into equivalent \Java{} class definitions. Note
that there are two components to translation. The focus here is on
\emph{describing} \R{} objects in \Java{}. The process of moving data
from \R{} to \Java{} and vice versa is implicit in this description,
but the software for performing this translation (\SJava) is not part
of the paradigm being described here.

\RWebServices{} operates on type-specified functions. \RWebServices{}
extracts information about argument and return types. It determines
the underlying structure of potentially complicated \R{} objects
specified in the type definition. Based on this information,
\RWebServices{} produces \Java{} class hierarchies reflecting data
objects, and composes \Java{} method signatures appropriate for the
functions.

From the \R{} perspective, the process of producing web services
templates for a function, e.g., \func{caAffy} with \TypeInfo{}
applied in the package \pkg{CaAffy}, is straightforward:
<<>>=
library(CaAffy)
RJavaSignature(c(caAffy))
@
%
\func{RJavaSignature} queries \func{caAffy} for its argument types. It
then uses the standard S4 object type definitions specified in
\pkg{CaAffy} (or other \R{} packages), and function definitions in
\pkg{CaAffy}, to construct \Java{} signatures.
\func{RJavaSignature} then produces documented \Java{} beans
representing the \R{} data objects and functions, organized in a
hierarchy reflecting the package structure. Suppose \func{caAffy}
takes arguments \arg{magePlaceholder} and \arg{caAffyTuningParam} of
classes \class{MagePlaceholder} and \class{CaAffyTuningParam}, and
returns an object of class \class{MagePlaceholder}. The \Java{} beans
and methods are packaged as described below.

Full documentation of \RWebServices{} is available with the
package. Entering \code{library(help=RWebServices)} at the \R{} prompt
provides a synopsis of available commands. Documentation of each
command is available from the \R{} prompt, e.g., by typing
\code{?RJavaSignature}. Although the \RWebServices{} package depends
on \SJava{} for performing web services, the functionality described
here does not use the facilities of \SJava.

\section{Understanding \Java{} representations of \R{} objects and
  functions}

\RWebServices{} provides two main pieces of functionality. First,
\RWebServices{} generates \Java{} representations of \R{} functions
and data objects. Second, \RWebServices{} allows \R{} functions to be
evaluated from within \Java{}, including \Java{}-based web or analytic
services. This section describes in detail how \RWebServices{}
generates \Java{} representations.

The main interface to \RWebServices{} is provided through the \R{}
function \method{RJavaSignature}. Starting with a list (provided by
the user or programmatically extracted from the package) of
\TypeInfo{}-annotated functions, \RWebServices{} parses the functions
for data types, and creates \Java{} representations of each data type
and method. The \Java{} representations of methods and parsed data
types are then collated into \Java{} packages with a layout
consistent with the \R{} package structure.
\RWebServices{} also generates \Java{} service APIs and adapters for
the \R{} functions. Internally, the function
\method{RWebServices:::generateFunctionMap} is responsible for these
steps.

The \Java{} data and method representations are written to disk as a
file hierarchy reflecting the structure of the corresponding \R{}
objects, including the packages in which the \R{} data types and
methods were defined. Details are provided below, but a simple example
is:
\begin{verbatim}
package / CaAffy    / data       (Java data objects)
                    / functions  (Java methods for R functions)
        / CaPROcess / data
                    / functions
        / CaDNAcopy / data
                    / functions
service / bioconductor           (Java service API)
\end{verbatim}
The \R{} packages in this example are \pkg{CaAffy},
\pkg{CaPROcess}, and \pkg{CaDNAcopy}.

\subsection{\Java{} representations of \R{} data objects}

\Java{} representations of \R{} data objects are generated by the
internal function \method{RWebServices:::generateDataMap}. This
function operates by creating a hash of the \R{} data types used in
the \R{} functions. The function then creates \Java{} class
definitions representing the \R{} data types (limitations concerning
multiple inheritance are described below). The representations reflect
the underlying \R{} data type structure, for instance, capturing slots
present in S4 classes. Part of this process is to identify functions
required for low-level data conversion (e.g., \R{} \verb|numeric| to
\Java{} \verb|RDouble|); details of the low-level conversion process
are presented below. \R{} class names are mangled to reflect \Java{}
conventions (e.g., \R{} \verb|class.name| becomes \Java{}
\verb|className|) and to avoid \Java{} keyword conflicts. The \Java{}
representations are written to disk in a folder \verb|data| contained
inside the corresponding package folder, e.g.,
\verb|biocJavaMap/CaAffy/data|.
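
As an illustration only, the following \Java{} sketch shows the kind
of name mangling and bean layout described above. The class names
\verb|NameMangleSketch| and \verb|SampleInfo|, the method
\verb|mangle|, the S4 class \verb|sample.info| with its slot
\verb|values|, and the trailing-underscore rule for keyword conflicts
are all assumptions made for this example; they are not taken from the
code that \RWebServices{} generates.
\begin{verbatim}
// Hypothetical sketch; not code generated by RWebServices.
final class NameMangleSketch {
    // A few Java reserved words, enough for illustration.
    private static final java.util.Set<String> KEYWORDS =
        new java.util.HashSet<>(java.util.Arrays.asList(
            "class", "int", "double", "for", "while",
            "new", "return", "package"));

    /** Turn a dotted R name ("class.name") into camel case. */
    static String mangle(String rName) {
        StringBuilder out = new StringBuilder();
        boolean upperNext = false;
        for (char c : rName.toCharArray()) {
            if (c == '.') {
                // A leading dot is dropped without capitalizing.
                upperNext = out.length() > 0;
            } else if (upperNext) {
                out.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                out.append(c);
            }
        }
        String name = out.toString();
        // Assumed convention: append "_" to avoid a reserved word.
        return KEYWORDS.contains(name) ? name + "_" : name;
    }
}

/** Hypothetical bean for an S4 class "sample.info" whose single
    slot "values" is numeric (approximately double[] in Java). */
final class SampleInfo {
    private double[] values = new double[0];

    public double[] getValues() { return values; }

    public void setValues(double[] values) { this.values = values; }
}
\end{verbatim}
In this sketch each S4 slot becomes a private field with a getter and
a setter, which is the usual \Java{} bean convention.
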

\subsection{Generating \Java{} representations of \R{} function
  signatures}

\method{RWebServices:::generateDataMap} uses the \R{} function
signature to generate \Java{} class methods. Methods are constructed
by mapping input and output \R{} data types to their corresponding
\Java{} representations. Argument names are mangled to be consistent
with \Java{} conventions. \Java{} method names correspond to \R{}
function names, except when several \R{} functions have the same name
but different return types. In this case simple aliases (e.g.,
\verb|foo_1|, \verb|foo_2|) are created in the \Java{}
representation. The \Java{} representations are written to disk in a
folder \verb|function| containing a single class with methods
corresponding to all \R{} functions defined in the \R{} package.

\subsection{Generating the \Java{} API and adapters}

\RWebServices{} creates an API that is the main entry point for
invoking \R{} functionality from \Java{}. In its simplest form, the
API consists of a single \Java{} class (e.g.,
service.bioconductor.java) with a method for each \R{} function. Each
method in the API invokes the corresponding method in the individual
\Java{} packages. For example, the \verb|affy| method in the main
service API might invoke
\verb|biocJavaMap.CaAffy.function.caAffy()|. Multiple web services can
also be defined, with each service API dispatching to one or several
\Java{} packages encapsulating \R{} methods. \RWebServices{} also
creates a naive client interface to be used during testing, and an
adapter to implement the web service interface generated by Axis or
other web service facilities. The \Java{} API, client, and adapters
are written to disk in the folder \texttt{service / bioconductor} (or
as defined by the user).

\section{Understanding how \Java{} invokes \R{} functions}

Invoking \R{} functions from \Java{} relies on the \SJava{}
package. There are two main tasks. The first is conversion of data
types between \Java{} and \R{}.
The second is evaluation of the \R{} functions, using an \R{} session
embedded in the \Java{} virtual machine.

\subsection{Data types and conversions}

\SJava{} allows C code to interface between native \Java{} types
(accessible through JNI) and native \R{} types (\R{} native types are
C data structures that define S-expressions, or SEXPs). Each data type
conversion is performed by converter functions, written in C or
\R{}. Converters for basic data types are provided by
\SJava. Additional converters can extend or override the basic
converters, and can be registered with \SJava{} for dynamic dispatch.

\subsubsection{Data models}

\RWebServices{} uses the flexible infrastructure of \SJava{} to
convert basic \R{} types to \Java{} primitive types (\verb|int|,
\verb|double|) or classes (e.g., \verb|Integer[]|, \verb|Double[]|,
etc.), and to convert structured S4 \R{} objects to corresponding
\Java{} classes. This basic mapping provides sufficient flexibility
for data transfer between languages, while promoting interoperability
through reuse of common data types. \RWebServices{} also supports a
richer object model, capturing the use of \R{} \emph{attributes} to
convey object information, e.g., about dimensions or missing
values. This richer model is not exposed in caBIG.

The \Java{} representations of complex \R{} objects (e.g., S4 objects)
are programmatically generated using \R{} language reflection to
identify object structure (\R{} slots) in terms of basic \R{}
types. Limitations to this approach are indicated below.

Additional \R{} class structures can also be represented in
\Java. For instance, class unions are an \R{} concept where members of
the class union form a single class, even though they are otherwise
unrelated.
<<>>=
setClass("A", "logical")
setClass("B", "character")
setClassUnion("C", c("A","B"))
@
%
An instance of class \class{C} can be assigned either logical or
character values.
This pattern of inheritance cannot be represented as a single \Java{}
object, but \RWebServices{} implements \Java{} representations of
class unions using inspiration from the
\href{http://java.sun.com/blueprints/corej2eepatterns/Patterns/DataAccessObject.html}{Abstract
  Factory} pattern.

\subsubsection{Converters}

A converter handles conversion between a specific pair of \R{} and
\Java{} objects. There are two components to \RWebServices{}
converters. A `match' function (e.g.,
\method{RWebServices:::cvtIntegerToJava}) is used for dynamic
dispatch. A convert function (e.g.,
\method{RWebServices:::cvtIntegerToJava}) in \RWebServices{} is
written in \R{}; converters rely on calls to underlying C code (e.g.,
\verb|RIntegerVector_JavaIntArray|) or on \SJava{} functionality to
copy data between \R{} SEXPs and \Java{} native representations. A
converter for each complex \R{} object is programmatically generated
by recursively visiting the object slots (corresponding to \Java{}
fields) until basic \R{} types are encountered. Converters are
included in the \verb|data| output directory (e.g.,
\verb|biocJavaMap/CaAffy/data/TypeConverter.R|) and loaded in the
embedded \R{}.

\subsubsection{Limitations}

There are several limitations to the object model and conversion
process outlined here. \R{} objects can have arbitrary attributes, but
the \RWebServices{} implementation only recognizes attributes
essential for representing data structures to web or analytic services
(e.g., \verb|dim| to describe \verb|RArray| dimensions). The main
reason for restricting \RWebServices{} in this way is that a \Java{}
representation of arbitrary attributes is unlikely to be used
often. The implementation is flexible enough that future extensions
are possible.
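
To make the role of the \verb|dim| attribute concrete, the following
\Java{} sketch pairs a flat data vector with its dimensions. The class
\verb|DimArraySketch| and its methods are hypothetical and written for
this illustration; the internals of \verb|RArray| are not shown in
this document and may differ. The sketch relies only on the fact that
\R{} stores array data in column-major order (first index varying
fastest).
\begin{verbatim}
/** Hypothetical holder for an R array: flat data plus 'dim'. */
final class DimArraySketch {
    private final double[] data; // column-major, as in R
    private final int[] dim;     // extent of each dimension

    DimArraySketch(double[] data, int[] dim) {
        int n = 1;
        for (int d : dim) {
            n *= d;
        }
        if (n != data.length) {
            throw new IllegalArgumentException(
                "dim does not match data length");
        }
        this.data = data.clone();
        this.dim = dim.clone();
    }

    /** Element at zero-based indices, first index fastest. */
    double get(int... index) {
        if (index.length != dim.length) {
            throw new IllegalArgumentException(
                "wrong number of indices");
        }
        int offset = 0;
        int stride = 1;
        for (int k = 0; k < dim.length; k++) {
            if (index[k] < 0 || index[k] >= dim[k]) {
                throw new IndexOutOfBoundsException(
                    "index " + k + " out of range");
            }
            offset += index[k] * stride;
            stride *= dim[k];
        }
        return data[offset];
    }

    int[] getDim() { return dim.clone(); }
}
\end{verbatim}
For the \R{} matrix \code{m <- matrix(1:6, nrow=2)}, \verb|data| would
hold the values 1 through 6 and \verb|dim| would be \verb|{2, 3}|;
\verb|get(0, 2)| then returns 5, matching \code{m[1, 3]} in \R.
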

Classes from the informal `S3' object system of \R{} do not contain
sufficient information about class structure for programmatic
transformation between \R{} and \Java; these objects can be defined
more formally as S4 objects, and the S4 objects used with \TypeInfo{}
to specify argument and return types.

S4 classes consist of slots specific to the class, and relationships
to other classes; the class system is similar to but richer than that
in \Java{}, allowing multiple inheritance, class unions,
etc. \RWebServices{} captures the entire data representation of S4
objects, but does not contain information about class relations. For
instance, consider the following class definitions.
<<>>=
setClass("D", representation=representation(x="numeric"))
setClass("E", contains="D",
         representation=representation(y="numeric"))
@
%
An \R{} instance of class \class{E} has two slots, x and y;
information about the inheritance of x is contained in the class
definition of \class{E}, but the structure of instances of \class{E}
does not include this information. The \Java{} representation of class
\class{E} created by \RWebServices{} has two fields x and y, but no
knowledge of the class hierarchy that these slots represent in \R{}
because \Java{} requires single inheritance. This is a satisfactory
solution for present purposes, since the data contained in the \Java{}
instance is sufficient for data transformation. A future development
might make fuller use of single inheritance in \Java{} to represent
classes with only single inheritance in \R.

\RWebServices{} allows \R{} objects to be represented in \Java, but
does not provide facilities for automatically representing \Java{}
objects as \R{} classes. This is satisfactory for the goal of exposing
\R{} functions and data objects as web or analytic services.

\subsection{Function invocation}

Invocation of \R{} functions is initiated in the \Java{} API created
by \RWebServices{} (e.g., \texttt{service / bioconductor}).
This API initializes and uses \SJava{} facilities. \SJava{} embeds
\R{} in the \Java{} virtual machine as a shared library. \SJava{}
mediates interactions with the embedded \R{} through instances of the
\Java{} classes \class{ROmegahatInterpreter} and
\class{REvaluator}. The \Java{} API uses \class{REvaluator} to
establish the environment for \R{} function evaluation, including
loading \R{} packages required for function evaluation and installing
converter functions. The \Java{} virtual machine is then able to
invoke \R{} functions.

The interface to \R{} functions starts at the main API. The main API
invokes the package-level (e.g., \pkg{CaAffy}) \Java{} representation
of the \R{} function. The package-level representation invokes
\method{REvaluator.call()}. This method takes as arguments a character
string representing the \R{} function name and a \Java{}
\class{Object[]} containing \Java{} representations of the input
parameters, and returns a \Java{}
\class{Object}. \method{REvaluator.call()} invokes the data
translators needed for data transfer to and from \R{}, and arranges
for evaluation of the \R{} function with the appropriate
arguments. Input parameters and return types of
\method{REvaluator.call()} are generic; type coercion takes place in
the package-level \Java{} representations.

Error handling facilities are available. Errors triggering the
exception handling system in \R{} during function evaluation or type
conversion are propagated as \Java{} exceptions, and returned to the
\Java{} virtual machine. Serious \R{} faults (e.g., segmentation
faults) trigger \Java{} exceptions that are also propagated.

The implementation has several limitations. Callbacks to \Java{} from
\R{} are not yet tested. \SJava{} implements the concept of foreign
language references, where functions in one language operate on
references to complex data types in the other language, rather than on
the data itself. The \RWebServices{} implementation has not yet taken
advantage of this feature.
Finally, \R{} is not thread safe, so each \Java{} virtual machine can
have at most one instance of \R{}. This requires that several
functions be evaluated sequentially. One solution is to use multiple
\Java{} processes in a coordinated fashion, e.g., using the \Java{}
Message Service.

\section{Next steps: Exposing \R{} as web and analytic services}

The foregoing sections have described how \R{} data types and
functions are exposed to \Java{} applications. There are
well-established mechanisms to facilitate the transformation of
stand-alone \Java{} applications to web or analytic services. For
example, Apache Axis tools generate WSDL from stand-alone
applications, and web services layers from WSDL. Likewise, the caGrid
tool Introduce, coupled with caDSR tools for semantic annotation,
allows generation of analytic services from stand-alone \Java{}
applications.
%

%% TODO: (selected) items that cannot be translated.
%% TODO: indicate ability to pass (nearly) arbitrary objects (e.g.,
%% binary objects representing images).
%% TODO: cleaner ending

%% \section{Deploying web services}

%% This portion of the documentation is in preparation.

%% TODO: Patrick McConnell

%% TODO: suck text from TypeInfoNews,
%% modify intro to stress importance of type info for easy exposure as web service, drop hints & tips

\end{document}