---
title: "Installation and Basic Usage"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Installation and Basic Usage}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
skip_vignette <- !reticulate::py_module_available("kuzu")
```

This vignette is a guide to `kuzuR` and its basic operations: creating a database, defining a schema, loading data from various sources, and executing Cypher queries.

## 1. Connecting to a Database

First, load the `kuzuR` package and create a connection to a Kuzu database. You can create an in-memory database or specify a path to a database file on disk.

```{r, eval=!skip_vignette}
library(kuzuR)
con <- kuzu_connection(":memory:")
```

## 2. Data Types and Schema Definition

Kuzu supports a rich set of data types. When creating a schema, you need to map your R data types to the corresponding Kuzu `LogicalTypeID`.

### Kuzu to R Data Type Mapping

| **Kuzu `LogicalTypeID`** | **R Type Equivalent** | **Description** |
|-------------------|-------------------|----------------------------------|
| `BOOL` | `logical` | `TRUE`/`FALSE` values |
| `INT64` | `integer` | 64-bit signed integer |
| `DOUBLE` | `numeric` | Double-precision floating-point number |
| `STRING` | `character` | UTF-8 encoded string |
| `TIMESTAMP` | `POSIXct` | Date and time with timezone, stored as microseconds since epoch |
| `DATE` | `Date` | Date (year, month, day) |
| `INTERVAL` | `difftime` | Time interval (e.g., "1 year 2 months 3 days") |
| `UUID` | `character` | Universally Unique Identifier, stored as a string |
| `LIST` | `list` | Ordered collection of values of the same type |
| `MAP` | `list` (named list) | Unordered collection of key-value pairs |

### Creating a Complex Schema

You can define a schema with node and relationship tables using `kuzu_execute()`.
Here's an example of a more complex schema:

```{r, eval=!skip_vignette}
# Create a node table for users with various data types
kuzu_execute(con, paste("CREATE NODE TABLE User(userID UUID, name STRING,",
                        "age INT64, is_active BOOL, created_at TIMESTAMP,",
                        "last_login DATE, notes STRING[],",
                        "PRIMARY KEY (userID))"))

# Create a node table for products
kuzu_execute(con, "CREATE NODE TABLE Product(productID INT64, name STRING, PRIMARY KEY (productID))")

# Create a relationship table for user purchases
kuzu_execute(con, "CREATE REL TABLE Buys(FROM User TO Product, purchase_date DATE)")
```

## 3. Loading Data

You can load data into your Kuzu tables from R data frames or external files like CSV.

### Loading from a Data Frame

Use `kuzu_copy_from_df()` to load data from an R `data.frame`.

```{r, eval=!skip_vignette}
library(jsonlite)

# Create data frames for nodes and relationships
users <- data.frame(
  userID = c("a1b2c3d4-e5f6-7890-1234-567890abcdef",
             "b2c3d4e5-f6a7-8901-2345-67890abcdef0"),
  name = c("Alice", "Bob"),
  age = c(35, 45),
  is_active = c(TRUE, FALSE),
  created_at = as.POSIXct(c("2023-01-15 10:30:00", "2022-11-20 14:00:00")),
  last_login = as.Date(c("2023-10-25", "2023-09-30")),
  stringsAsFactors = FALSE
)

# LIST types should be formatted as JSON strings
users$notes <- c(toJSON(c("note1", "note2")), toJSON("note3"))

products <- data.frame(
  productID = c(101, 102),
  name = c("Laptop", "Mouse")
)

buys <- data.frame(
  from_user = c("a1b2c3d4-e5f6-7890-1234-567890abcdef",
                "b2c3d4e5-f6a7-8901-2345-67890abcdef0"),
  to_product = c(101, 102),
  purchase_date = as.Date(c("2023-02-20", "2023-03-15"))
)

# Load data into Kuzu
kuzu_copy_from_df(con, users, "User")
kuzu_copy_from_df(con, products, "Product")
kuzu_copy_from_df(con, buys, "Buys")
```

### Loading from a CSV File

Use `kuzu_copy_from_csv()` to load data from a CSV file. For this to work, the file should be in the current working directory.

```{r, eval=!skip_vignette}
# Create a CSV file in the current working directory
csv_filename <- "products.csv"
write.csv(data.frame(productID = c(103, 104), name = c("Keyboard", "Monitor")),
          csv_filename, row.names = FALSE)

# Load data from the CSV file using just the filename
kuzu_copy_from_csv(con, csv_filename, "Product")

# Clean up the created file
unlink(csv_filename)
```

## 4. Executing Queries and Converting Results

You can execute Cypher queries using `kuzu_execute()` and convert the results into various R formats.

```{r, eval=!skip_vignette}
# Execute a query to get users and their purchases
query_result <- kuzu_execute(con, "MATCH (u:User)-[b:Buys]->(p:Product) RETURN u.name, p.name, b.purchase_date")
```

### A Note on Query Results

The `QueryResult` object returned by `kuzu_execute()` acts as an iterator over the results, so it can only be consumed once. Functions like `as.data.frame()`, `as_tibble()`, `kuzu_get_all()`, and the graph conversion functions exhaust this iterator. Run the query again before each new conversion.

### Convert to Data Frame or Tibble

```{r, eval=!skip_vignette}
# Convert to a data frame (this consumes query_result)
df_result <- as.data.frame(query_result)
print(df_result)

# Run the query again, because the previous result has been consumed,
# then convert to a tibble
library(tibble)
query_result <- kuzu_execute(con, "MATCH (u:User)-[b:Buys]->(p:Product) RETURN u.name, p.name, b.purchase_date")
tibble_result <- as_tibble(query_result)
print(tibble_result)
```

### Retrieve Query Results as a List

```{r, eval=!skip_vignette}
# Fetch all results
query_result <- kuzu_execute(con, "MATCH (u:User)-[b:Buys]->(p:Product) RETURN u.name, p.name, b.purchase_date")
result <- kuzu_get_all(query_result)
print(result)

# Fetch only the first result
query_result <- kuzu_execute(con, "MATCH (u:User)-[b:Buys]->(p:Product) RETURN u.name, p.name, b.purchase_date")
result <- kuzu_get_n(query_result, 1)
print(result)

# Fetch the next result from the same iterator
result <- kuzu_get_next(query_result)
print(result)
```

### Convert to Graph Objects

For queries that return graph structures, you can convert the results into graph objects from packages like `igraph` or `tidygraph`.
To do this, the query must return the node and relationship variables themselves, not just their properties.

```{r, eval=!skip_vignette}
# Execute a query that returns a graph structure
graph_query <- "MATCH (u:User)-[b:Buys]->(p:Product) RETURN u, p, b"
graph_query_result <- kuzu_execute(con, graph_query)

# Convert to an igraph object (this consumes graph_query_result)
igraph_obj <- as_igraph(graph_query_result)
print(igraph_obj)
plot(igraph_obj)

# Run the query again, then convert to a tidygraph object
graph_query_result <- kuzu_execute(con, graph_query)
tidygraph_obj <- as_tidygraph(graph_query_result)
print(tidygraph_obj)
plot(tidygraph_obj)
```
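
The single-use behaviour described in "A Note on Query Results" applies to graph conversions as well. The following toy sketch in base R illustrates why a consumed result yields nothing on a second pass. It is only an illustration: `make_result()` and `collect_all()` are hypothetical helpers written for this example, not part of `kuzuR`, and they do not reflect how `QueryResult` is implemented internally.

```r
# Toy stand-in for a single-use result iterator (NOT kuzuR's implementation).
make_result <- function(rows) {
  position <- 0L
  list(
    has_next = function() position < length(rows),
    get_next = function() {
      if (position >= length(rows)) stop("result already consumed")
      position <<- position + 1L
      rows[[position]]
    }
  )
}

# Drain every remaining row, as a conversion function would.
collect_all <- function(result) {
  out <- list()
  while (result$has_next()) {
    out[[length(out) + 1L]] <- result$get_next()
  }
  out
}

rows <- list(
  c(user = "Alice", product = "Laptop"),
  c(user = "Bob", product = "Mouse")
)

res <- make_result(rows)
first_pass <- collect_all(res)   # reads both rows
second_pass <- collect_all(res)  # nothing left to read

length(first_pass)   # 2
length(second_pass)  # 0

# Creating a fresh result (analogous to calling kuzu_execute() again)
# makes the rows available once more.
third_pass <- collect_all(make_result(rows))
length(third_pass)   # 2
```

In practice this means storing the Cypher query in a variable and calling `kuzu_execute()` once per conversion, rather than reusing one result object for several conversions.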