---
title: "Converting to & from `tf`"
author: "Jeff Goldsmith, Fabian Scheipl"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    fig_width: 7
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{Converting to & from `tf`}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "##",
  fig.width = 6,
  fig.height = 4,
  out.width = "90%"
)

library(tidyverse)
library(viridisLite)
library(lme4)
library(gridExtra)

theme_set(theme_minimal() + theme(legend.position = "bottom"))
options(
  ggplot2.continuous.colour = "viridis",
  ggplot2.continuous.fill = "viridis"
)
scale_colour_discrete <- scale_colour_viridis_d
scale_fill_discrete <- scale_fill_viridis_d

library("tidyfun")

pal_5 <- viridis(7)[-(1:2)]
set.seed(1221)
```

Functional data have often been stored in matrices or data frames. Although these structures have sufficed for some purposes, they are cumbersome or impossible to use with modern tools for data wrangling.

In this vignette, we illustrate how to convert data from common structures to `tf` objects. Throughout, functional data vectors are stored as columns in a data frame to facilitate subsequent wrangling and analysis.

# Conversion from matrices

One of the most common structures for storing functional data has been a matrix. Especially when subjects are observed over the same (regular or irregular) grid, it is natural to store the observations on a subject in rows (or columns) of a matrix. Matrices, however, are difficult to wrangle along with data in a data frame, leading to confusing and easy-to-break subsetting across several objects.

In the following examples, we'll use `tfd` to get a `tf` vector from matrices. The `tfd` function expects data to be organized so that each row is the functional observation for a single subject. It's possible to focus only on the resulting `tf` vector, but in keeping with the broader goals of **`tidyfun`** we'll add these as columns to a data frame.

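Before turning to real data, the row-per-subject convention can be sketched on a small made-up matrix. The names `toy_grid`, `toy_mat`, `toy_tf` and `toy_df` are hypothetical and used only for illustration; the three curves are arbitrary.

```{r}
library("tidyfun")

# hypothetical toy data: 3 subjects (rows) observed on a shared grid of 5 points (columns)
toy_grid <- seq(0, 1, length.out = 5)
toy_mat <- rbind(
  sin(2 * pi * toy_grid),
  cos(2 * pi * toy_grid),
  toy_grid^2
)

# each row of the matrix becomes one function in the resulting tf vector
toy_tf <- tfd(toy_mat, arg = toy_grid)
length(toy_tf)

# store the tf vector as a column next to a scalar covariate
toy_df <- tibble::tibble(id = 1:3, curve = toy_tf)
toy_df
```

If the matrix instead held subjects in columns, it would need to be transposed with `t()` before the call to `tfd`.
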
The `DTI` data in the **`refund`** package has been a popular example in functional data analysis. In the code below, we create a data frame (or `tibble`) containing scalar covariates, and then add columns for the `cca` and `rcst` track profiles. This code was used to create the `tidyfun::dti_df` dataset included in the package.

```{r}
dti_df <- tibble(
  id = refund::DTI$ID,
  visit = refund::DTI$visit,
  sex = refund::DTI$sex,
  case = factor(ifelse(refund::DTI$case, "MS", "control"))
)

dti_df$cca <- tfd(refund::DTI$cca, arg = seq(0, 1, length.out = 93))
dti_df$rcst <- tfd(refund::DTI$rcst, arg = seq(0, 1, length.out = 55))
```

In `tfd`, the first argument is a matrix; `arg` defines the grid over which functions are observed. The output of `tfd` is a vector, which we include in the `dti_df` data frame.

```{r}
dti_df
```

Finally, we'll make a quick spaghetti plot to illustrate that the complete functional data are included in each `tf` column.

```{r}
dti_df |>
  tf_ggplot(aes(tf = cca, col = case, alpha = 0.2 + 0.4 * (case == "control"))) +
  geom_line() +
  facet_wrap(~sex) +
  scale_alpha(guide = "none", range = c(0.2, 0.4))
```

We'll repeat the same basic process using a second, and probably even-more-perennial, functional data example: the Canadian weather data in the **`fda`** package. Here, functional data are stored in a three-dimensional array, with dimensions corresponding to day, station, and outcome (temperature, precipitation, and log10 precipitation).

In the following, we first create a `tibble` with scalar covariates, then use `tfd` to create functional data vectors, and finally include the resulting vectors in the data frame. In this case, our `arg`s are days of the year, and we use `tf_smooth` to smooth the precipitation outcome.
Because the original data matrices record the different observations in the columns instead of the rows, we have to use their transpose in the call to `tfd`:

```{r}
canada <- tibble(
  place = fda::CanadianWeather$place,
  region = fda::CanadianWeather$region,
  lat = fda::CanadianWeather$coordinates[, 1],
  lon = -fda::CanadianWeather$coordinates[, 2]
) |>
  mutate(
    temp = t(fda::CanadianWeather$dailyAv[, , 1]) |>
      tfd(arg = 1:365),
    precipl10 = t(fda::CanadianWeather$dailyAv[, , 3]) |>
      tfd(arg = 1:365) |>
      tf_smooth()
  )
```

The resulting data frame is shown below.

```{r}
canada
```

A plot containing both functional observations is shown below.

```{r}
temp_panel <- canada |>
  tf_ggplot(aes(tf = temp, color = region)) +
  geom_line()

precip_panel <- canada |>
  tf_ggplot(aes(tf = precipl10, color = region)) +
  geom_line()

gridExtra::grid.arrange(temp_panel, precip_panel, nrow = 1)
```

# Conversion to `tf` from a data frame

### ... in "long" format

"Long" format data frames containing functional data include columns containing a subject identifier, the functional argument, and the value each subject's function takes at each argument. There are also often (but not always) non-functional covariates that are repeated within a subject. For data in this form, we use `tf_nest` to produce a data frame containing a single row for each subject.

A first example is the `sleepstudy` data from the **`lme4`** package, which is a nice example from longitudinal data analysis. This includes columns for `Subject`, `Days`, and `Reaction` -- which correspond to the subject, argument, and value.

```{r}
data("sleepstudy", package = "lme4")

sleepstudy <- as_tibble(sleepstudy)

sleepstudy
```

We create `sleepstudy_tf` by nesting `Reaction` within subjects.
The result is a data frame containing a single row for each curve (one per `Subject`) that holds two columns: the `Reaction` function and the `Subject` ID:

```{r}
sleepstudy_tf <- sleepstudy |>
  tf_nest(Reaction, .id = Subject, .arg = Days)

sleepstudy_tf
```

We'll make a quick plot to show the result.

```{r}
sleepstudy_tf |>
  tf_ggplot(aes(tf = Reaction)) +
  geom_line()
```

Alternatively, for this simple example that does not contain any additional time-varying or time-constant covariates besides the values that define the functions themselves, we could have simply done:

```{r}
tibble(
  Subject = unique(sleepstudy$Subject),
  Reaction = tfd(sleepstudy, id = "Subject", arg = "Days", value = "Reaction")
)
```

A second example uses the `ALA::fev1` dataset. **`ALA`** is not on CRAN, so we do not show an install command here. The code below is illustrative only and is not evaluated when the vignette is built.

In this dataset, both `height` and `logFEV1` are observed at multiple ages for each child; that is, there are two functions observed simultaneously, over a shared argument. We can use `tf_nest` to create a data frame with a single row for each subject, which includes both non-functional covariates (like age and height at baseline) and the functional observations `logFEV1` and `height`.

```{r, eval = FALSE}
ALA::fev1 |>
  group_by(id) |>
  mutate(n_obs = n()) |>
  filter(n_obs > 1) |>
  ungroup() |>
  tf_nest(logFEV1, height, .arg = age) |>
  glimpse()
```

### ... in "wide" format

In some cases functional data are stored in "wide" format, meaning that there are separate columns for each argument, and values are stored in these columns. In this case, `tf_gather` can be used to collapse across columns to produce a function for each subject.

The example below again uses the `refund::DTI` dataset. We use `tf_gather` to transfer the `cca` observations from a matrix column (with `NA`s) into a column of irregularly observed functions (`tfd_irreg`).

```{r}
dti_df <- refund::DTI |>
  janitor::clean_names() |>
  select(-starts_with("rcst")) |>
  glimpse()

dti_df |>
  tf_gather(starts_with("cca")) |>
  glimpse()
```

# Changing representation with `tf_rebase`

Sometimes you need to make different `tf` objects compatible -- for example, to combine raw observations with a basis representation, or to ensure two functional data vectors are expressed on the same grid or in the same basis. `tf_rebase` re-expresses one `tf` object in the representation of another:

```{r}
# reload the tidyfun version of the DTI data
data(dti_df, package = "tidyfun")

# raw functional data
cca_raw <- dti_df$cca[1:5]
cca_raw

# represent in a spline basis
cca_basis <- tfb(dti_df$cca[1:5], k = 25)
cca_basis

# re-express the raw data in the same basis representation
cca_rebased <- tf_rebase(cca_raw, basis_from = cca_basis)
cca_rebased

# or convert a spline-based representation to a grid-based one for a specific grid:
tf_rebase(cca_basis, basis_from = cca_raw)
```

This is useful when you want to ensure that operations between `tf` objects (e.g., addition, comparison) use a common representation, or when you want to convert between `tfd` and `tfb` representations while matching a specific basis configuration. It is required for many operations that would otherwise not be well-defined.

# Splitting and combining functions

`tf_split` separates each function into fragments defined on sub-intervals of its domain, and `tf_combine` joins fragments back together. This is useful for analyzing specific parts of a function separately or for stitching together functional observations from different sources.

```{r}
# split CCA profiles at their midpoint
cca_halves <- tf_split(dti_df$cca[1:10], splits = 0.5)

# result is a list of tf vectors, one per segment
cca_halves[[1]]
cca_halves[[2]]

# recombine
cca_recombined <- tf_combine(cca_halves[[1]], cca_halves[[2]])
cca_recombined
```

# Conversion from `fda` objects

The **`fda`** package represents functional data as `fd` objects (basis function coefficients + basis definition). **`tf`** can convert these directly using `tfb_spline`, which re-expresses the `fd` basis in `tf`'s spline framework. This also works for `fdSmooth` objects returned by `fda::smooth.basis`.

```{r}
# create an fd object from the Canadian weather data
weather_basis <- fda::create.fourier.basis(c(0, 365), nbasis = 65)
weather_fd <- fda::smooth.basis(
  argvals = 1:365,
  y = fda::CanadianWeather$dailyAv[, , 1],
  fdParobj = weather_basis
)

# convert fdSmooth to tfb
weather_tf <- tfb_spline(weather_fd)
weather_tf[1:3]
```

The resulting `tfb` object can then be used with all **`tidyfun`** tools:

```{r}
tibble(
  place = fda::CanadianWeather$place,
  region = fda::CanadianWeather$region,
  temp = weather_tf
) |>
  tf_ggplot(aes(tf = temp, color = region)) +
  geom_line(alpha = 0.5)
```

# Reversing the conversion

**`tidyfun`** includes a wide range of tools for exploratory analysis and visualization, but many analysis approaches require data to be stored in more traditional formats. Several functions are available to aid in this conversion.

## Conversion from `tf` to data frames

The functions `tf_unnest` and `tf_spread` reverse the operations in `tf_nest` and `tf_gather`, respectively -- that is, they take a data frame with a functional observation and produce long or wide data frames. We'll illustrate these with the `sleepstudy_tf` data set.
First, to produce a long-format data frame, one can use `tf_unnest`:

```{r}
sleepstudy_tf |>
  tf_unnest(cols = Reaction) |>
  glimpse()
```

To produce a wide-format data frame, one can use `tf_spread`:

```{r}
sleepstudy_tf |>
  tf_spread() |>
  glimpse()
```

## Converting back to a matrix or data frame

To convert a `tf` vector to a matrix in which each row contains the evaluations of one function, use `as.matrix`:

```{r}
reaction_matrix <- sleepstudy_tf |>
  pull(Reaction) |>
  as.matrix()

head(reaction_matrix)

# argument values of input data saved in `arg`-attribute:
attr(reaction_matrix, "arg")
```

To convert a `tf` vector to a standalone data frame with `"id"`, `"arg"`, and `"value"` columns, use `as.data.frame()` with `unnest = TRUE`:

```{r}
sleepstudy_tf |>
  pull(Reaction) |>
  as.data.frame(unnest = TRUE) |>
  head()
```
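
The matrix conversion can also be checked as a round trip. The sketch below uses a small simulated matrix (`sim_mat`, `sim_tf` and `sim_back` are made-up names, not part of the package) and assumes that `as.matrix` evaluates a regular `tfd` vector on its original grid, as in the `reaction_matrix` example.

```{r}
library("tidyfun")

# hypothetical data: 3 curves observed at 4 shared argument values
sim_mat <- matrix(rnorm(12), nrow = 3)
sim_tf <- tfd(sim_mat, arg = 1:4)

# convert back; dimnames and the "arg" attribute are ignored in the comparison
sim_back <- as.matrix(sim_tf)
all.equal(sim_mat, sim_back, check.attributes = FALSE)
```

A comparison like this is only meaningful for functions on a common grid; for irregular data the matrix returned by `as.matrix` would typically contain `NA`s where a function was not observed.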