%\VignetteIndexEntry{Introduction to turner} %\VignetteEngine{knitr::knitr} %\VignetteEncoding{UTF-8} \documentclass[12pt]{article} \usepackage{upquote} \usepackage{geometry} \geometry{verbose,tmargin=2.5cm,bmargin=2.5cm,lmargin=2.5cm,rmargin=2.5cm} \setlength{\parindent}{0in} \usepackage{color} \definecolor{darkgray}{rgb}{0.3,0.3,0.3} \definecolor{lightgray}{rgb}{0.5,0.5,0.5} \definecolor{tomato}{rgb}{0.87,0.32,0.24} \definecolor{myblue}{rgb}{0.066,0.545,0.890} \definecolor{linkcolor}{rgb}{0.87,0.32,0.24} \usepackage{hyperref} \hypersetup{ colorlinks=true, urlcolor=linkcolor, linkcolor=linkcolor } \begin{document} \title{Introduction to \texttt{turner}} \author{ \textbf{\textcolor{darkgray}{G}}\textcolor{lightgray}{aston} \textbf{\textcolor{darkgray}{S}}\textcolor{lightgray}{anchez} \\ \small \texttt{\href{http://www.gastonsanchez.com} {www.gastonsanchez.com}} } \date{} \maketitle \section{Introduction and Motivation} \texttt{turner} is an R package designed to provide a set of handy functions for manipulating vectors and lists of vectors. The main idea is to make it easier to \textbf{turn} vectors (and lists of vectors) into other data R structures. \subsection{Why \texttt{turner}?} The package \texttt{turner} was born out of necessity from my involvement with \textbf{Multiblock Methods} and other multivariate data analysis methods (eg PLS Path Modeling). Although \texttt{turner} is intended to be used as a lower-level package (i.e. for developing other packages), I hope that you may find it useful for your own computations. \subsection{A little bit of multiblocks} R is great for working with data in tabular format such as matrices and data frames. However, there's no data structure for representing the abstract concept of a \textit{multiblock}. Basically, a multiblock could be seen as a matrix divided by blocks (or submatrices). This is a very informal and simplistic description, but it helps to understand the notion of a multiblock. \vspace{2mm} So how can we work with multiblocks in R? The trivial solution is to work with several matrices (one matrix per block). Another solution is to work with arrays. A third solution is to work with lists of matrices. \vspace{2mm} A different approach ---the one I use--- is to work parallelly with one matrix (or a data frame) and one list. In this case, all the blocks are in a single matrix (or data frame), while the list contains the information about the blocks. The main advantage of this approach is that you keep the data in one single object, while the relevant information about the structure of the blocks is kept in one list. \vspace{2mm} If we decide to work with the matrix-list duet, we need to be able to \textit{extract} the information of the list, and \textbf{turn} it into indixed structures (or other objects) for manipulating the blocks in the data matrix. \texttt{turner} is my attempt to make it easier (at least for me) to perform such manipulations. \section{Indexification} To use \texttt{turner} (once you have installed it), simply load it with the function \texttt{library()}: <>= # load package turner library(turner) @ \subsection*{Data in Blocks} To see how we can apply \texttt{turner}, we need to consider some data under a multiblock perspective. First let's start by creating a data matrix with 10 observations and 9 variables. <>= # create a matrix set.seed = 21 some_data = round(matrix(rnorm(90), 10, 9), 3) rownames(some_data) = 1:10 colnames(some_data) = paste("X", 1:9, sep='') # take a peek head(some_data, n=5) @ Now, let's suppose that our data can be divided in 3 blocks. The first block is formed by variables \texttt{X1, X2, X3}. The second block is formed by variables \texttt{X4, X5}. And the third block is formed by variables \texttt{X6, X7, X8, X9}. All this information can be stored in a list: <>= # list of blocks blocks = list(B1 = 1:3, B2 = 4:5, B3 = 6:9) blocks @ \subsection*{Indexed Structures} \texttt{turner} has been designed to work with lists (preferable of vectors) in order to turn them into \textit{indexed structures}. Such structures are mostly vectors that map the position indices of the elements in the list. \subsubsection*{\texttt{indexify()}} One common task is to \textbf{indexify} the list of blocks. The idea is to get a vector of indices representing the membership of the variables to their corresponding block. This is better understood with the following example: <>= # get indices of blocks indices = indexify(blocks) indices @ The \textit{indexification} of \texttt{blocks} allows us to get an indexed vector \texttt{indices}. This vector contains as many elements as variables in \texttt{some\_data}. Moreover, it tells us that: the first three elements belong to one block, the fourth and fifth elements belong to block 2, and the rest of the elements belong to block 3. \subsubsection*{\texttt{list\_to\_dummy()}} Another interesting task is to produce a dummy matrix based on the blocks. This is done by using the funciton \texttt{list\_to\_dummy()}: <>= # get dummy matrix based on blocks dummy = list_to_dummy(blocks) dummy @ As you can tell, \texttt{dummy} is matrix with as many rows as elements in blocks, and with as many columns as number of blocks. In turn, the columns of \texttt{dummy} are dummy indicators (hence the name). \subsubsection*{\texttt{from\_to()}} Sometimes, it is also useful to know the starting and ending positions of the blocks. This can be done by using the function \texttt{from\_to()}. <>= # get starting and ending positions start_end = from_to(blocks) start_end # vectors from and to from = start_end$from to = start_end$to @ \texttt{from\_to()} provides a list with two vectors: \texttt{\$from} and \texttt{\$to}. The first vector \texttt{\$from} contains the indices of the starting positions; in turne the second vector \texttt{\$to} contains the indices of the ending positions: We can extract the first block in \texttt{some\_data}: <>= # extract first block some_data[,from[1]:to[1]] @ Obviously we can argue that there is no need to use \texttt{from} and \texttt{to}. We can extract the first block by just typing: <>= # get first block some_data[,blocks[[1]]] @ Yes, we can use \texttt{blocks} to manipulate the data. But the advantage of \texttt{from\_to()} comes when you work with string lists like the following one: <>= # string list str_list = list(c("a","b","c"), c("d", "e"), c("f","g","h","i")) @ In this case you cannot extract the first block by simply typing: <>= # failed attempt some_data[,str_list[[1]]] @ You solve this problem by using \texttt{from\_to()}: <>= # start-end position for 'str_list' fromto_aux = from_to(str_list) from1 = fromto_aux$from to1 = fromto_aux$to # successful attempt some_data[,from1[1]:to1[1]] @ \subsection{Working with lists} Among other interesting features of \texttt{turner} is the set of functions for working with lists of vectors. \subsubsection*{lengths()} The function \texttt{lengths()} allows us to get the length of each vector inside a list. This function has an argument \texttt{out} to specify whether the output is in vector format (default behavior) or in list format: <>= # say you have some list some_list = list(1:3, 4:5, 6:9) # length of each vector (vector output) lengths(some_list, out='vector') # length of each vector (list output) lengths(some_list, out='list') @ It is important not to confuse \texttt{lengths()} with \texttt{length()}; the latter only gives you the number of elements in the list: <>= # compared to 'length()' length(some_list) @ \subsubsection*{funlist()} Another interesting function in \texttt{turner} is \texttt{funlist()} which takes as arguments a list and a function. The purpose of \texttt{funlist()} is to apply the given function to the unlisted elements in the list: <>= # sum of all elements in 'some_list' funlist(some_list, sum) # maixmum of all elements in 'some_list' funlist(some_list, max) # product of all elements in 'some_list' funlist(some_list, prod) # mean value of all elements in 'some_list' funlist(some_list, mean) @ \subsubsection*{listsize()} The function \texttt{listsize()} ---as well as \texttt{sizelist()}--- is another related function to work with lists. It allows us to get the total number of elements contained in a list: <>= # number of elements in 'some_list' listsize(some_list) @ \subsubsection*{indexify()} Another handy function is \texttt{listify()} which creates a list from a vector of integers: <>= # vector of indices number_elements = c(3, 1, 5) # list of index vectors based on 'number_elements' listify(number_elements) @ \subsection{Other Functions} The following table shows other functions available in \texttt{turner}: \begin{center} \begin{tabular}{l l} \hline Function & Description \\ \hline \texttt{df\_to\_blocks()} & splits a data frame into blocks \\ \texttt{matrix\_to\_blocks()} & splits a matrix into blocks \\ \texttt{vector\_to\_dummy()} & creates a dummy matrix from the elements in a vector \\ \texttt{factor\_to\_dummy()} & creates a dummy matrix from the elements in a factor \\ \texttt{list\_to\_dummy()} & creates a dummy matrix from the elements in a list \\ \texttt{dummy\_to\_list()} & creates an indexed list from a dummy matrix \\ \texttt{list\_to\_matrix()} & creates a design-type matrix from the elements in a list \\ \hline \end{tabular} \end{center} \end{document}