---
title: "Data Frames"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Data Frames}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

Data frames are the workhorse of data analysis in R. In HDF5, data frames are stored as **Compound Datasets**. This allows different columns to have different data types (e.g., integer, float, string) within the same dataset, much like a SQL table.

This vignette explains how `h5lite` handles data frames, including row names, factors, and missing values.

```{r setup}
library(h5lite)
file <- tempfile(fileext = ".h5")
```

## Basic Usage

Writing a data frame is as simple as writing any other object. `h5lite` automatically maps each column to its appropriate HDF5 type.

```{r}
# Create a standard data frame
df <- data.frame(
  id     = 1:5,
  group  = c("A", "A", "B", "B", "C"),
  score  = c(10.5, 9.2, 8.4, 7.1, 6.0),
  passed = c(TRUE, TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Write to HDF5
h5_write(df, file, "study_data/results")

# Fetch the column names
h5_names(file, "study_data/results")

# Read back
df_in <- h5_read(file, "study_data/results")
head(df_in)
```

## Customizing Column Types

You can use the `as` argument to control the storage type for specific columns. This is passed as a named vector where the names correspond to the column names.

This is particularly useful for optimizing storage (e.g., saving space by storing small integers as `int8` or single characters as `ascii[1]`).

```{r}
df_small <- data.frame(
  id   = 1:10,
  code = rep("A", 10)
)

# Force 'id' to be uint16 and 'code' to be an ascii string
h5_write(df_small, file, "custom_df",
         as = c(id = "uint16", code = "ascii[]"))
```

## Row Names

Standard HDF5 Compound Datasets do not have a concept of "row names". However, `h5lite` preserves them using **Dimension Scales**.
When you write a data frame with row names, `h5lite` creates a separate dataset (usually named `_rownames`) and links it to the main table. When reading, `h5lite` automatically restores these as the `row.names` of the data frame.

```{r}
mtcars_subset <- head(mtcars, 3)
h5_write(mtcars_subset, file, "cars")

h5_str(file)

# Read back
result <- h5_read(file, "cars")
print(row.names(result))
```

```{r, include=FALSE}
unlink(file)
```