---
title: "Using stars_proxy Objects"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using stars_proxy Objects}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
eval_chunks <-
  CopernicusMarine:::has_blosc &&
  curl::has_internet() &&
  CopernicusMarine::cms_get_password() != "" &&
  sf::st_drivers("raster", "^HDF5$")$vsi
```

Using [`stars_proxy` objects](https://r-spatial.github.io/stars/articles/stars2.html#stars-proxy-objects)
in combination with the CopernicusMarine package makes it possible to
work efficiently with data from Copernicus Marine services on the fly.

The great thing about these proxy objects is that they do not read any data
until it is needed. You can therefore connect to a dataset on the Copernicus server
without reading any raster data. Initially, only metadata about
the raster's dimensions and bands (attributes) is collected. The actual raster data is only
downloaded when you need it.
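
The lazy behaviour described above can be tried offline. The sketch below is
not part of the Copernicus workflow: it assumes the small example GeoTIFF that
ships with the stars package and uses it as a stand-in for a remote dataset.
The object names `local_proxy` and `local_data` are only illustrative.

```r
library(stars, quietly = TRUE)

# A small GeoTIFF that ships with the stars package (stand-in for remote data)
tif <- system.file("tif/L7_ETMs.tif", package = "stars")

# proxy = TRUE collects only the metadata; no pixel values are loaded
local_proxy <- read_stars(tif, proxy = TRUE)
class(local_proxy)  # includes "stars_proxy"
dim(local_proxy)    # dimension sizes are known without reading pixels

# The pixel values are only read when the proxy is converted
local_data <- st_as_stars(local_proxy)
class(local_data)   # a regular "stars" object
```

The same pattern applies to the proxies created below, except that converting
them triggers a download instead of a read from disk.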

## Setting Up a Proxy Object

You can set up a proxy object by calling either `cms_native_proxy()` or
`cms_zarr_proxy()`. The first uses the 'native' service. In that case the
data is already structured in chunked files, so the added value of proxy objects
is less obvious. Therefore, in this vignette, we focus on objects created
with `cms_zarr_proxy()`, which connects to an entire layer in a product.

```{r create_proxy, eval=eval_chunks}
library(CopernicusMarine)
library(stars, quietly = TRUE)

my_proxy_gc <- cms_zarr_proxy(
  product       = "GLOBAL_ANALYSISFORECAST_PHY_001_024",
  layer         = "cmems_mod_glo_phy-thetao_anfc_0.083deg_P1D-m",
  asset         = "geoChunked")

my_proxy_tc <- cms_zarr_proxy(
  product       = "GLOBAL_ANALYSISFORECAST_PHY_001_024",
  layer         = "cmems_mod_glo_phy-thetao_anfc_0.083deg_P1D-m",
  asset         = "timeChunked")

print(my_proxy_tc)
```

## Selecting an Asset Type

The only downside of working with proxy objects is that you need
to know which asset type you wish to use. When subsetting data with
`cms_download_subset()`, this selection is automated based on your
selection criteria. But when working with a proxy object, you may not
know in advance which slices you wish to select.

In general, if you wish to work with long time series in a small geographical
area, it is most efficient to work with `"geoChunked"` data. If, on the other hand,
you want to work with a short time period on a large geographical scale,
it is better to use `"timeChunked"` data.
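
The reasoning behind this rule of thumb can be sketched by counting how many
chunks a request touches under each layout. The chunk shapes below are
hypothetical and chosen purely for illustration (the real shapes differ per
dataset), and `chunks_touched()` and `count_chunks()` are helper names
introduced here, not functions from the CopernicusMarine package.

```r
# Number of chunks touched along one dimension by the 1-based index
# range start:end, for a given chunk length
chunks_touched <- function(start, end, chunk_len) {
  (end - 1) %/% chunk_len - (start - 1) %/% chunk_len + 1
}

# Total number of chunks touched by a request across all its dimensions
count_chunks <- function(request, chunk_shape) {
  per_dim <- mapply(
    function(rng, len) chunks_touched(rng[1], rng[2], len),
    request, chunk_shape[names(request)]
  )
  prod(per_dim)
}

# HYPOTHETICAL chunk shapes (cells per chunk); not taken from a real dataset
geo_chunked  <- c(x = 32,  y = 32,  time = 512)
time_chunked <- c(x = 512, y = 512, time = 1)

# A long time series at a single cell ...
point_series  <- list(x = c(2000, 2000), y = c(1000, 1000), time = c(1, 400))
# ... and a large area at a single time step
area_snapshot <- list(x = c(2000, 2500), y = c(1500, 1750), time = c(500, 500))

count_chunks(point_series,  geo_chunked)
count_chunks(point_series,  time_chunked)
count_chunks(area_snapshot, geo_chunked)
count_chunks(area_snapshot, time_chunked)
```

Under these assumed shapes, the point time series touches far fewer chunks in
the geo-chunked layout, and the area snapshot touches far fewer in the
time-chunked layout. Fewer chunks generally means less data to transfer.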

## Slicing a Proxy Object

As you can see from the proxy object printed above, its dimensions are very
large: it holds daily data for nearly four years, in 50 depth
layers, with global coverage. If you tried to read all of this raster data at once, it
would almost certainly fail, as it would require thousands of GB of memory,
which is simply not available on most devices.
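
A back-of-the-envelope estimate illustrates the order of magnitude. The grid
sizes below are assumptions for a global 1/12 degree grid, not values read
from the proxy; the actual sizes can be obtained with `dim(my_proxy_tc)`.

```r
# ASSUMED dimension sizes, for illustration only
n_x    <- 4320     # longitude cells (assumption)
n_y    <- 2041     # latitude cells (assumption)
n_elev <- 50       # depth layers, as mentioned in the text
n_time <- 4 * 365  # roughly four years of daily data

n_cells <- n_x * n_y * n_elev * n_time

# R stores numeric values as 8-byte doubles
bytes <- n_cells * 8
gib   <- bytes / 1024^3
round(gib)
```

Under these assumptions the full array would need several thousand GiB, which
is why slicing before reading is essential.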

Fortunately, the proxy object can easily be sliced by selecting index values with
the bracket operator (`[`). The first index represents the band (attribute),
which we skip. Next are the `x` and `y` coordinates, followed by the elevation.
The last dimension is time, where we select the first four hundred records.

```{r slice_time, eval=eval_chunks}
time_slice <- my_proxy_gc[,2000, 1000, 48, 1:400]
show(time_slice)
```

As you may notice, this slicing is very fast, because no actual data
has been transferred yet.
The data is not downloaded until `st_as_stars()` is called.
Since in this particular case we have selected only a single raster cell,
it makes sense to treat the result as a time series rather than a raster
(it could also be cast to a `data.frame`). We can then plot the
time series.

```{r plot_time_slice, eval=eval_chunks}
time_slice <- st_as_stars(time_slice)
plot(st_get_dimension_values(time_slice, "time"), time_slice$thetao,
     xlab = "date", ylab = "temperature", type = "l")
```
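
Casting a single-cell slice to a `data.frame` can be sketched offline as well.
This example assumes the example GeoTIFF bundled with the stars package, which
has a `band` dimension where the Copernicus data has elevation and time; the
object names are only illustrative.

```r
library(stars, quietly = TRUE)

tif <- system.file("tif/L7_ETMs.tif", package = "stars")
local_proxy <- read_stars(tif, proxy = TRUE)

# Index order: attribute (skipped), x, y, band.
# Select one cell and keep all six bands; nothing is read yet.
cell_slice <- local_proxy[, 10, 10, 1:6]

# The data is read here, after which the object can be cast
cell_data <- st_as_stars(cell_slice)
cell_df   <- as.data.frame(cell_data)
cell_df
```

The resulting `data.frame` has one row per selected band, with the dimension
values and the attribute as columns. For the Copernicus time slice above, the
rows would correspond to the selected time steps instead.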

We can also select a specific area at a single elevation and time step, for
which we use the time-chunked proxy.

```{r slice_area, eval=eval_chunks}
geo_slice <- my_proxy_tc[,2000:2500, 1500:1750, 50, 500]
plot(geo_slice, col = hcl.colors(10))
```