Detailed billing information is essential for cost and resource planning in cloud-based analysis projects, but it can be difficult to obtain. The goal of this software is to help users of the Terra/AnVIL platform get access to this data as easily as possible.
The Google Cloud Platform console can be used to acquire information at varying levels of detail. For example, it is simple to generate a display like the following.
Cost track from Google console.
However, the cost track here sums up charges for various activities related to CPU usage, storage, and network use. Our aim is to provide R functions to help present information on charges arising in the use of AnVIL.
To whet the appetite, we will show how to run an exploratory app that looks like:
Early view of reckoning app.
Install the AnVILBilling package with
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager", repos = "https://cran.r-project.org")
BiocManager::install("AnVILBilling")
Once installed, load the package with
library(AnVILBilling)
The functions in this vignette require a user to connect the billing export in the Google Cloud Platform project associated with Terra/AnVIL to a BigQuery dataset.
General information on this process can be found here:
https://cloud.google.com/billing/docs/how-to/export-data-bigquery
In order to set this up with the AnVIL/Terra system:
Once this is accomplished you will be able to see
exportview
with values appropriate to your project and account configuration substituted for ‘landmarkMark2’ (the compute project name), ‘bjbilling’ (the Google project with BigQuery scope that is used to transmit cost data on landmarkMark2 to BigQuery), and ‘anvilbilling’ (the BigQuery dataset name where next-day cost values are stored).
Billing data is generally available one day after charges are incurred. Billing data is stored in BigQuery in a partitioned table, and is queryable using the bigrquery package.
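To make the BigQuery step concrete, here is a minimal sketch of how a one-day window of billing records might be requested directly. It is not the package's own implementation. The helper names billing_sql and run_billing_query and the table name my_export_table are hypothetical; the project and dataset names are the example values used in this vignette, and usage_start_time is one of the columns of the billing table. Only the query string is built here, because running the query requires Google credentials.

```r
# Build a standard-SQL query for one window of billing records.
# The table name used below is a placeholder, not a real export table.
billing_sql <- function(project, dataset, table, start, end) {
  # Normalize both dates to YYYY-MM-DD.
  start <- format(as.Date(start), "%Y-%m-%d")
  end <- format(as.Date(end), "%Y-%m-%d")
  sprintf(
    paste(
      "SELECT * FROM `%s.%s.%s`",
      "WHERE usage_start_time >= TIMESTAMP('%s')",
      "AND usage_start_time < TIMESTAMP('%s')"
    ),
    project, dataset, table, start, end
  )
}

sql <- billing_sql("bjbilling", "anvilbilling", "my_export_table",
                   "2020-01-28", "2020-01-29")
cat(sql, "\n")

# Defined but not called here: executing the query needs the bigrquery
# package and an authenticated Google identity.
run_billing_query <- function(sql, billing_project) {
  tbl <- bigrquery::bq_project_query(billing_project, sql)
  bigrquery::bq_table_download(tbl)
}
```

With credentials in place, run_billing_query(sql, "bjbilling") would be expected to return the matching rows as a tibble.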
In order to generate a request you need:
Then you can use the function:
setup_billing_request(start, end, project, dataset, table, billing_code)
to create a request.
Once you have a request object, you can get the billing data associated with it by calling the reckon() function on the request.
The result of a reckoning on a billing request is an instance of avReckoning.
We took a snapshot of usage in a project we work on, and it is available as demo_rec. This request represents one day of usage in AnVIL/Terra.
suppressPackageStartupMessages({
library(AnVILBilling)
library(dplyr)
library(magrittr)
library(BiocStyle)
})
demo_rec
## AnVIL reckoning info for project  bjbilling 
##   starting 2020-01-28, ending 2020-01-29.
## There are  1599  records.
## Available keys:
##  [1] "goog-resource-type"               "goog-metric-domain"              
##  [3] "goog-dataproc-cluster-name"       "goog-dataproc-cluster-uuid"      
##  [5] "goog-dataproc-location"           "cromwell-workflow-id"            
##  [7] "goog-pipelines-worker"            "terra-submission-id"             
##  [9] "wdl-task-name"                    "security"                        
## [11] "ad-anvil_devs"                    "ad-auth_anvil_anvil_gtex_v8_hg38"
## --- 
## Use ab_reckoning() for full table
The available keys for the billing object are shown.
For Terra, 3 of the most useful keys are:
The code, to be used while the cluster is in use, would look like this:
library(AnVIL)
leo = Leonardo()
leo$getRuntime(clustername)
Given a key type, we want to know associated values.
## [1] "terra-196d3163-4eef-46e8-a7e6-e71c0012003d"
To understand activities associated with this submission, we subset the table.
## # A tibble: 955 × 17
##    billing_a…¹ service      sku          usage_start_time    usage_end_time     
##    <chr>       <list>       <list>       <dttm>              <dttm>             
##  1 015E39-385… <named list> <named list> 2020-01-28 20:00:00 2020-01-28 21:00:00
##  2 015E39-385… <named list> <named list> 2020-01-28 20:00:00 2020-01-28 21:00:00
##  3 015E39-385… <named list> <named list> 2020-01-28 20:00:00 2020-01-28 21:00:00
##  4 015E39-385… <named list> <named list> 2020-01-28 20:00:00 2020-01-28 21:00:00
##  5 015E39-385… <named list> <named list> 2020-01-28 20:00:00 2020-01-28 21:00:00
##  6 015E39-385… <named list> <named list> 2020-01-28 20:00:00 2020-01-28 21:00:00
##  7 015E39-385… <named list> <named list> 2020-01-28 20:00:00 2020-01-28 21:00:00
##  8 015E39-385… <named list> <named list> 2020-01-28 20:00:00 2020-01-28 21:00:00
##  9 015E39-385… <named list> <named list> 2020-01-28 20:00:00 2020-01-28 21:00:00
## 10 015E39-385… <named list> <named list> 2020-01-28 20:00:00 2020-01-28 21:00:00
## # … with 945 more rows, 12 more variables: project <list>, labels <list>,
## #   system_labels <list>, location <list>, export_time <dttm>, cost <dbl>,
## #   currency <chr>, currency_conversion_rate <dbl>, usage <list>,
## #   credits <list>, invoice <list>, cost_type <chr>, and abbreviated variable
## #   name ¹billing_account_id
The following data is available in this object:
##  [1] "billing_account_id"       "service"                 
##  [3] "sku"                      "usage_start_time"        
##  [5] "usage_end_time"           "project"                 
##  [7] "labels"                   "system_labels"           
##  [9] "location"                 "export_time"             
## [11] "cost"                     "currency"                
## [13] "currency_conversion_rate" "usage"                   
## [15] "credits"                  "invoice"                 
## [17] "cost_type"
You can drill down further to see which products were used during the submission:
##  [1] "Storage PD Capacity"                                                   
##  [2] "SSD backed PD Capacity"                                                
##  [3] "Network Inter Zone Ingress"                                            
##  [4] "Network Intra Zone Ingress"                                            
##  [5] "External IP Charge on a Standard VM"                                   
##  [6] "Custom Instance Ram running in Americas"                               
##  [7] "Custom Instance Core running in Americas"                              
##  [8] "Licensing Fee for Shielded COS (CPU cost)"                             
##  [9] "Licensing Fee for Shielded COS (RAM cost)"                             
## [10] "Network Internet Ingress from APAC to Americas"                        
## [11] "Network Internet Ingress from EMEA to Americas"                        
## [12] "Network Google Egress from Americas to Americas"                       
## [13] "Network Internet Ingress from China to Americas"                       
## [14] "Network Google Ingress from Americas to Americas"                      
## [15] "Network Internet Egress from Americas to Americas"                     
## [16] "Network Internet Ingress from Americas to Americas"                    
## [17] "Network Internet Ingress from Australia to Americas"                   
## [18] "Network HTTP Load Balancing Ingress from Load Balancer"                
## [19] "Network Inter Region Ingress from Americas to Americas"                
## [20] "Network Egress via Carrier Peering Network - Americas Based"           
## [21] "Network Ingress via Carrier Peering Network - Americas Based"          
## [22] "Licensing Fee for Container-Optimized OS from Google (CPU cost)"       
## [23] "Licensing Fee for Container-Optimized OS from Google (RAM cost)"       
## [24] "Licensing Fee for Container-Optimized OS - PCID Whitelisted (CPU cost)"
## [25] "Licensing Fee for Container-Optimized OS - PCID Whitelisted (RAM cost)"
You can also get the cost for a workflow using:
data(demo_rec) # loads the demo_rec object
v = getValues(demo_rec@reckoning, "terra-submission-id")[1] # for instance
getSubmissionCost(demo_rec@reckoning, v)
## [1] 0.054044
And the RAM usage as well:
data(demo_rec) # loads the demo_rec object
v = getValues(demo_rec@reckoning, "terra-submission-id")[1] # for instance
getSubmissionRam(demo_rec@reckoning, v)
##                                 submissionID      workflow
## 1 terra-196d3163-4eef-46e8-a7e6-e71c0012003d runterratrial
## 2 terra-196d3163-4eef-46e8-a7e6-e71c0012003d runterratrial
## 3 terra-196d3163-4eef-46e8-a7e6-e71c0012003d runterratrial
## 4 terra-196d3163-4eef-46e8-a7e6-e71c0012003d runterratrial
## 5 terra-196d3163-4eef-46e8-a7e6-e71c0012003d runterratrial
##                                      cromwellID
## 1 cromwell-4dde8ce1-a8e5-47ba-a261-120ae8c7556c
## 2 cromwell-8edb23d6-a7d2-4b1f-96b4-496e96c1d707
## 3 cromwell-b1dcdbe1-b4ec-4428-8570-e2ab883087d0
## 4 cromwell-664b3519-7c7a-42ba-bb00-562ea1f650fd
## 5 cromwell-176a9f09-483c-4eb1-abf0-e5de2fdf36a7
##                                       sku       amount         unit
## 1 Custom Instance Ram running in Americas 1.676111e+12 byte-seconds
## 2 Custom Instance Ram running in Americas 1.737314e+12 byte-seconds
## 3 Custom Instance Ram running in Americas 1.607392e+12 byte-seconds
## 4 Custom Instance Ram running in Americas 1.573032e+12 byte-seconds
## 5 Custom Instance Ram running in Americas 1.582695e+12 byte-seconds
##     pricingUnit amountInPricingUnit
## 1 gibibyte hour           0.4336111
## 2 gibibyte hour           0.4494444
## 3 gibibyte hour           0.4158333
## 4 gibibyte hour           0.4069444
## 5 gibibyte hour           0.4094444
To simplify some of the aspects of reporting on costs, we have introduced
browse_reck, which will authenticate the user to Google BigQuery, and
use user-specified inputs to identify an interval of days between which
usage data are sought. This function can be called with no arguments,
or you can supply the email address for the Google identity to be used
in working with Google Cloud Platform projects and BigQuery.
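As a side check on the getSubmissionRam() table shown earlier, the relationship between the amount column (byte-seconds) and the amountInPricingUnit column (gibibyte hours) can be reproduced by hand. The sketch below assumes the usual definitions of 2^30 bytes per gibibyte and 3600 seconds per hour; the helper name byte_seconds_to_gib_hours is ours and is not part of AnVILBilling.

```r
# Convert byte-seconds (the usage unit) to gibibyte-hours (the pricing unit).
byte_seconds_to_gib_hours <- function(byte_seconds) {
  bytes_per_gib <- 2^30
  seconds_per_hour <- 3600
  byte_seconds / (bytes_per_gib * seconds_per_hour)
}

# Amounts copied from the first two rows of the RAM usage table.
amounts <- c(1.676111e12, 1.737314e12)
gib_hours <- byte_seconds_to_gib_hours(amounts)
print(round(gib_hours, 4))
```

The converted values should agree with the reported 0.4336111 and 0.4494444 to within the rounding of the printed amounts.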
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] magrittr_2.0.3     dplyr_1.0.10       AnVILBilling_1.8.0 BiocStyle_2.26.0  
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.9          lubridate_1.8.0     tidyr_1.2.1        
##  [4] assertthat_0.2.1    digest_0.6.30       utf8_1.2.2         
##  [7] mime_0.12           R6_2.5.1            evaluate_0.17      
## [10] httr_1.4.4          ggplot2_3.3.6       pillar_1.8.1       
## [13] rlang_1.0.6         lazyeval_0.2.2      data.table_1.14.4  
## [16] jquerylib_0.1.4     DT_0.26             rmarkdown_2.17     
## [19] stringr_1.4.1       htmlwidgets_1.5.4   bit_4.0.4          
## [22] munsell_0.5.0       shiny_1.7.3         compiler_4.2.1     
## [25] httpuv_1.6.6        xfun_0.34           pkgconfig_2.0.3    
## [28] htmltools_0.5.3     tidyselect_1.2.0    tibble_3.1.8       
## [31] bookdown_0.29       fansi_1.0.3         viridisLite_0.4.1  
## [34] dbplyr_2.2.1        later_1.3.0         grid_4.2.1         
## [37] jsonlite_1.8.3      xtable_1.8-4        bigrquery_1.4.1    
## [40] gtable_0.3.1        lifecycle_1.0.3     DBI_1.1.3          
## [43] scales_1.2.1        cli_3.4.1           stringi_1.7.8      
## [46] cachem_1.0.6        fs_1.5.2            promises_1.2.0.1   
## [49] bslib_0.4.0         ellipsis_0.3.2      generics_0.1.3     
## [52] vctrs_0.5.0         tools_4.2.1         bit64_4.0.5        
## [55] glue_1.6.2          shinytoastr_2.1.1   purrr_0.3.5        
## [58] fastmap_1.1.0       yaml_2.3.6          colorspace_2.0-3   
## [61] gargle_1.2.1        BiocManager_1.30.19 plotly_4.10.0      
## [64] knitr_1.40          sass_0.4.2