%\VignetteIndexEntry{Mirsynergy}
%
\documentclass[12pt]{article}
\usepackage[left=1in,top=1in,right=1in, bottom=1in]{geometry}
\usepackage{Sweave}
\usepackage{times}
\usepackage{hyperref}
\usepackage{subfig}
\usepackage{natbib}
\usepackage{graphicx}

\hypersetup{
  colorlinks,
  citecolor=black,
  filecolor=black,
  linkcolor=black,
  urlcolor=black
}

\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
\newcommand{\Rcode}[1]{{\texttt{#1}}}
\newcommand{\software}[1]{\textsf{#1}}
\newcommand{\R}{\textsf{R}}
\newcommand{\TopHat}{\software{TopHat}}
\newcommand{\Bowtie}{\software{Bowtie}}
\newcommand{\bs}{\boldsymbol}
\newcommand{\mf}{\mathbf}

\setkeys{Gin}{height=0.6\textheight}

\bibliographystyle{plain}

\title{Mirsynergy: detect synergistic miRNA regulatory modules by overlapping neighbourhood expansion}
\author{Yue Li \\ \texttt{yueli@cs.toronto.edu}}
\date{\today}

\begin{document}
\SweaveOpts{concordance=TRUE}

\maketitle

\section{Introduction}
MicroRNAs (miRNAs) are $\sim$22-nucleotide small noncoding RNAs that base-pair with mRNA primarily at the 3$'$ untranslated region (UTR) to cause mRNA degradation or translational repression \cite{Bartel:2009fh}. Aberrant miRNA expression is implicated in tumorigenesis \cite{Spizzo:2009fx}. Constructing microRNA regulatory modules (MiRM) will aid in deciphering aberrant transcriptional regulatory networks in cancer but is computationally challenging. Existing methods are stochastic or require a fixed number of regulatory modules. We propose \emph{Mirsynergy}, a deterministic overlapping clustering algorithm adapted from a recently developed framework.
Briefly, Mirsynergy operates in two stages: it first forms MiRM based on co-occurring miRNAs and then expands each MiRM by greedily including (excluding) mRNAs into (from) the MiRM to maximize the synergy score, which is a function of miRNA-mRNA and gene-gene interactions (manuscript in prep).

\section{Demonstration}
In the following example, we first load a simulated dataset of 20 mRNA and 20 miRNA and the interactions among them, and then apply \Rfunction{mirsynergy} to the simulated data to produce module assignments. We then visualize the module assignments in Fig.~\ref{fig:toy}.

<>=
library(Mirsynergy)

load(system.file("extdata/toy_modules.RData", package="Mirsynergy"))

# run mirsynergy clustering
V <- mirsynergy(W, H, verbose=FALSE)

summary_modules(V)
@

\begin{figure}[htbp]
\begin{center}
<>=
load(system.file("extdata/toy_modules.RData", package="Mirsynergy"))

plot_modules(V,W,H)
@
\caption{Module assignment on a toy example.}
\label{fig:toy}
\end{center}
\end{figure}

We can also export the module assignments in a Cytoscape-friendly format, as two separate files containing the edges and the nodes, using the function \texttt{tabular\_module} (see the function manual for details).

\section{Real test}
In this section, we demonstrate the utility of \Rpackage{Mirsynergy} in constructing miRNA regulatory modules from real breast cancer tumor samples. Specifically, we downloaded the test data in units of RPKM (reads per kilobase of exon per million mapped reads) and RPM (reads per million miRNA mapped) for 13306 mRNA and 710 miRNA across 15 individuals from TCGA (The Cancer Genome Atlas). We further log2-transformed and mean-centred the data. For demonstration purposes, we used 20\% of the expression data, containing 2661 mRNA and 142 miRNA.
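
As a reading aid, the preprocessing just described can be written out explicitly. The formula below is only a sketch under two assumptions that the text does not state: a small pseudocount $\epsilon \geq 0$ guards against $\log_2 0$, and centring is done per gene (or per miRNA) across the $T = 15$ samples. The symbol $x_{it}$ is notation introduced here for illustration and denotes the RPKM or RPM value of gene or miRNA $i$ in sample $t$:
\[
  \tilde{x}_{it} \;=\; \log_2\left(x_{it} + \epsilon\right) \;-\; \frac{1}{T}\sum_{s=1}^{T} \log_2\left(x_{is} + \epsilon\right),
  \qquad t = 1, \ldots, T .
\]
Under this reading, each row of the transformed expression matrix has mean zero across samples.
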
The corresponding sequence-based miRNA-target site matrix $\mf{C}$ (the object \Robject{C} in the code below) was downloaded from the TargetScanHuman 6.2 database \cite{Friedman:2009km}, and the gene-gene interaction (GGI) matrix $\mf{H}$, which includes transcription factor binding site (TFBS) and protein-protein interaction (PPI) data, was processed from TRANSFAC \cite{Wingender:2000tk} and BioGrid \cite{Stark:2011ii}, respectively.

<>=
load(system.file("extdata/tcga_brca_testdata.RData", package="Mirsynergy"))
@

Given as input the $2661\times 15$ mRNA and $142\times 15$ miRNA expression matrices along with the $2661\times 142$ target site matrix, we first construct an expression-based miRNA-mRNA interaction score (MMIS) matrix $\mf{W}$ using LASSO from \Rpackage{glmnet}, treating mRNA as the response and miRNA as the input variables \cite{Friedman:2010wm}.

<>=
library(glmnet)

ptm <- proc.time()

# lasso across all samples
# X: N x T mRNA expression (response variables)
# Z: M x T miRNA expression (input variables)
obs <- t(Z) # T x M

# run LASSO to construct W
W <- lapply(1:nrow(X), function(i) {

  # one row of scores for mRNA i, one column per miRNA
  pred <- matrix(rep(0, nrow(Z)), nrow=1,
                 dimnames=list(rownames(X)[i], rownames(Z)))

  # replicate the target-site counts of mRNA i across the T samples
  c_i <- t(matrix(rep(C[i,,drop=FALSE], nrow(obs)), ncol=nrow(obs)))

  c_i <- (c_i > 0) + 0 # convert to binary matrix

  inp <- obs * c_i

  # use only miRNA with at least one non-zero entry across T samples
  inp <- inp[, apply(abs(inp), 2, max)>0, drop=FALSE]

  if(ncol(inp) >= 2) {
    # NOTE: negative coef means potential target (remove intercept)
    # x <- coef(cv.glmnet(inp, X[i,], nfolds=3), s="lambda.min")[-1]
    x <- as.numeric(coef(glmnet(inp, X[i,]), s=0.1)[-1])
    pred[, match(colnames(inp), colnames(pred))] <- x
  }

  pred[pred>0] <- 0 # discard positive coefficients

  pred <- abs(pred)

  pred[pred>1] <- 1 # cap scores at 1

  pred
})

W <- do.call("rbind", W)

dimnames(W) <- dimnames(C)

print(sprintf("Time elapsed for LASSO: %.3f (min)",
              (proc.time() - ptm)[3]/60))
@

Given the resulting MMIS matrix $\mf{W}$ and the GGI matrix $\mf{H}$, we can now apply \Rfunction{mirsynergy} to obtain MiRM assignments.
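
Before running the clustering, it may help to spell out how the LASSO chunk above turns regression coefficients into $\mf{W}$. The following is a sketch of what that code computes, not a statement of the published method. The symbols $x_{it}$, $z_{jt}$, $c_{ij}$ and $\beta_{ij}$ are notation introduced here for illustration: $x_{it}$ and $z_{jt}$ are the expression of mRNA $i$ and miRNA $j$ in sample $t$ (rows of \Robject{X} and \Robject{Z}), and $c_{ij}$ is the corresponding entry of \Robject{C}. For each mRNA $i$, and up to the default internal standardization of predictors in \Rpackage{glmnet}, the fit solves
\[
  \min_{\beta_{i0},\,\beta_{i\cdot}} \;
  \frac{1}{2T}\sum_{t=1}^{T}
  \Bigl(x_{it} - \beta_{i0} - \sum_{j:\,c_{ij}>0} \beta_{ij}\, z_{jt}\Bigr)^{2}
  \;+\; \lambda \sum_{j:\,c_{ij}>0} |\beta_{ij}| ,
\]
with the coefficients read off the fitted path at $\lambda = 0.1$ (the argument \Rfunarg{s=0.1}). The intercept is dropped, and each remaining coefficient is mapped to a score
\[
  w_{ij} \;=\; \min\bigl(1,\; \max(0,\, -\hat{\beta}_{ij})\bigr),
\]
so that, for example, $\hat{\beta}_{ij} = -0.3$ gives $w_{ij} = 0.3$, $\hat{\beta}_{ij} = 0.2$ gives $w_{ij} = 0$, and $\hat{\beta}_{ij} = -1.7$ gives $w_{ij} = 1$. Entries remain $0$ for miRNAs without a target site on mRNA $i$, and for every miRNA when fewer than two candidate miRNAs are available for that mRNA. The resulting $\mf{W}$, together with $\mf{H}$, is what \Rfunction{mirsynergy} receives:
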
<>=
V <- mirsynergy(W, H, verbose=FALSE)

print_modules2(V)

print(sprintf("Time elapsed (LASSO+Mirsynergy): %.3f (min)",
              (proc.time() - ptm)[3]/60))
@

The package implements several convenience functions that generate summary information such as Fig.~\ref{fig:brca}. In particular, the plot depicts the mRNA and miRNA distributions across modules (upper panels) as well as the synergy distribution by itself and as a function of the number of miRNA (bottom panels).

\begin{figure}[htbp]
\begin{center}
<>=
plot_module_summary(V)
@
\caption{Summary information on MiRM using test data from TCGA-BRCA. Top panels: mRNA and miRNA distributions across modules; bottom panels: the synergy distribution by itself and as a function of the number of miRNA.}
\label{fig:brca}
\end{center}
\end{figure}

For more details, please refer to our paper (manuscript in prep.).

\section{Session Info}
<>=
sessionInfo()
@

\bibliography{Mirsynergy}

\end{document}