---
title: "Introduction to the BiocAzul package"
author:
- name: Marcel Ramos
  affiliation: CUNY Graduate School of Public Health and Health Policy
  email: marcel.ramos@sph.cuny.edu
date: "`r format(Sys.time(), '%B %d, %Y')`"
package: BiocAzul
output:
  BiocStyle::html_document
abstract: |
  The BiocAzul package provides an interface to the Azul API, which is a service
  that allows users to search for and access data from the Human Cell Atlas (HCA)
  and the AnVIL Data Explorer. This vignette introduces the BiocAzul package,
  demonstrating how to use it to query data and integrate it with Terra
  workspaces for further analysis.
vignette: |
  %\VignetteIndexEntry{Introduction to the BiocAzul package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# BiocAzul

```{r setup, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    cache = TRUE
)
```

# Installation

Install the development version of the `BiocAzul` package from GitHub using the
following:

```{r install, eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("Bioconductor/BiocAzul")
```

# Package Loading

```{r load-packages, message=FALSE, eval=TRUE, cache=FALSE}
library(BiocAzul)
```

# Introduction

The `BiocAzul` package provides an interface to the Azul API, which indexes
the data behind the Human Cell Atlas (HCA) Data Explorer and the AnVIL Data
Explorer. Azul provides a query interface for searching and retrieving data
from these projects.

# Basic Usage

To get started, create an `Azul` service object. By default, it connects to the
Human Cell Atlas service.

```{r azul-hca-init}
hca <- Azul()
hca
```

## Connecting to the AnVIL Data Explorer

To connect to the AnVIL Data Explorer instead, specify the provider when
creating the `Azul` object.

```{r azul-anvil}
anvil <- Azul(provider = "anvil")
anvil
```

Note that the `host` field in the object's printed output changes to reflect
the AnVIL Data Explorer service.

## Listing Catalogs

Azul organizes data into catalogs. You can list the available catalogs using
`listCatalogs()`.

```{r list-catalogs}
catalogs <- listCatalogs(hca)
catalogs
latest <- head(catalogs, n = 1)
latest
```
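The chunk above treats the first listed catalog as the latest, which depends on
the order in which `listCatalogs()` returns catalogs. If that order is not
guaranteed, a minimal sketch like the one below selects the highest-numbered
catalog explicitly. It assumes the catalog names are a character vector of the
form `dcp58` (as in the catalog passed to `importToTerra()` later); both the
example names and the helper `latest_catalog()` are hypothetical and are not
part of `BiocAzul`.

```{r latest-catalog-sketch}
## Hypothetical catalog names, for illustration only
catalog_names <- c("dcp57", "lm10", "dcp58", "dcp9")

## Hypothetical helper (not part of BiocAzul): return the catalog whose name
## is `prefix` followed by the largest release number
latest_catalog <- function(x, prefix = "dcp") {
    pattern <- paste0("^", prefix, "([0-9]+)$")
    hits <- grep(pattern, x, value = TRUE)
    if (!length(hits))
        stop("no catalog names match prefix '", prefix, "'")
    ## compare release numbers numerically so that "dcp58" beats "dcp9"
    releases <- as.integer(sub(pattern, "\\1", hits))
    hits[which.max(releases)]
}

latest_catalog(catalog_names)
```

A plain `sort()` on these names orders them lexicographically and places
`"dcp9"` after `"dcp58"`, which is why the sketch compares release numbers as
integers.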

## Exploring Projects

To get a quick overview of the projects in a catalog, use `projectTable()`.
This returns a `tibble` with project names and their corresponding IDs.

```{r project-table}
projects <- projectTable(hca, catalog = latest)
head(projects)
```

## Exploring Facets

Azul data is organized by facets, which are attributes you can use to filter
and group data. You can list the available facets for a catalog using
`availableFacets()`.

```{r available-facets}
facets <- availableFacets(hca, catalog = latest)
head(facets)
```

You can also get a summary of values for a specific facet using `facetTable()`.

```{r facet-table}
facetTable(hca, facet = "genusSpecies", catalog = latest)
```

# Filtering and Queries

The `makeFilter()` function provides a convenient way to create filters for
querying the Azul API. It uses a formula-based syntax to define the filter
criteria.

```{r make-filter}
filter <- makeFilter(
    ~ specimenOrgan == "brain" &
        genusSpecies == "Mus musculus" &
        fileFormat == "h5"
)
filter
```

The filter above matches projects that have specimens from the brain, come from
the species *Mus musculus*, and include files in the h5 format. This filter can
be passed to `importToTerra()` to import data that matches these criteria. The
image below shows the same filter applied in the HCA Data Explorer interface.

```{r filter_sidebar, echo=FALSE, out.width="100%"}
img_path <-
    if (knitr::opts_knit$get("child")) "man/figures/" else "../man/figures/"
knitr::include_graphics(paste0(img_path, "filter_sidebar.png"))
```
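To make the formula syntax more concrete, the chunk below is a minimal sketch of
how a formula like the one passed to `makeFilter()` could be translated into a
nested list, assuming Azul's filter convention in which each facet maps to an
`is` list of accepted values. The helper `formula_to_filter()` is hypothetical:
it is not necessarily how `makeFilter()` is implemented, and its support for
`==`, `%in%`, and `&` is a property of this sketch only.

```{r formula-filter-sketch}
## Hypothetical helper (not part of BiocAzul): translate a one-sided formula
## into a nested list shaped like an Azul filter, assuming the convention
## list(<facet> = list(is = list(<values>)))
formula_to_filter <- function(formula) {
    stopifnot(inherits(formula, "formula"), length(formula) == 2L)
    env <- environment(formula)
    collect <- function(expr) {
        if (!is.call(expr))
            stop("expected a comparison, got: ", deparse(expr))
        op <- as.character(expr[[1L]])
        if (identical(op, "&")) {
            ## combine the facet terms on both sides of `&`
            c(collect(expr[[2L]]), collect(expr[[3L]]))
        } else if (identical(op, "(")) {
            ## strip redundant parentheses
            collect(expr[[2L]])
        } else if (op %in% c("==", "%in%")) {
            ## left side names the facet; right side gives the value(s)
            facet <- as.character(expr[[2L]])
            values <- eval(expr[[3L]], envir = env)
            stats::setNames(list(list(is = as.list(values))), facet)
        } else {
            stop("unsupported operator: ", op)
        }
    }
    collect(formula[[2L]])
}

sketch <- formula_to_filter(
    ~ specimenOrgan == "brain" &
        genusSpecies == "Mus musculus" &
        fileFormat %in% c("h5", "loom")
)
str(sketch)
```

Because `==` binds more tightly than `&` in R, each comparison arrives as a
separate branch of the parsed expression, so the recursion only needs to split
on `&` and read the facet name and values from each comparison.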


# Integration with Terra

One of the main features of `BiocAzul` is the ability to import data directly
into a Terra workspace. This is done using the `importToTerra()` function.

Note: This step requires a Terra workspace and appropriate permissions. The
following code is for demonstration purposes and is not executed in this
vignette.

```{r import-to-terra, eval=FALSE}
importToTerra(
    hca,
    namespace = "your-terra-namespace",
    name = "your-terra-workspace",
    catalog = "dcp58",
    filters = filter
)
```

The equivalent operation in the web interface involves selecting the data to
export in the Data Explorer and clicking the "Request Link" button. See the
image below for an example.

```{r terra_import, echo=FALSE, out.width="100%"}
knitr::include_graphics(paste0(img_path, "request_link.png"))
```

Once the link is requested, users can import the data into a Terra workspace.
The image below shows the "Create a new workspace" option, which imports the
data into a new Terra workspace.


```{r create_workspace, echo=FALSE, out.width="100%"}
knitr::include_graphics(paste0(img_path, "create_workspace.png"))
```


# Conclusion

The `importToTerra()` function simplifies the data import process. Given the
desired filters and workspace information, it programmatically creates a
manifest, initiates the import job in Terra, and polls for its completion, all
without requiring interaction with the Terra UI.

# Session Information

<details>
<summary>Click to see session information</summary>

```{r session-info}
sessionInfo()
```

</details>