--- title: "Intro to healthiar" output: rmarkdown::html_vignette: toc: true toc_depth: 2 vignette: > %\VignetteIndexEntry{intro_to_healthiar} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo=TRUE, eval=TRUE, include=TRUE) ``` ------------------------------------------------------------------------ ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) options(rmarkdown.html_vignette.check_title = FALSE) options(knitr.kable.max_rows = 10) set.seed(1) ## Avoid using pacman here, as it causes error in installation if it's not installed already library(healthiar) library(tibble) library(dplyr) library(purrr) library(tidyr) library(knitr) ``` Hi there! This vignette will tell you about `healthiar` and show you how to use `healthiar` with the help of examples. *NOTE*: By using healthiar you agree to the [terms of use](https://github.com/SwissTPH/healthiar?tab=readme-ov-file#readme) and confirm you have read the [disclaimer](https://github.com/SwissTPH/healthiar?tab=readme-ov-file#readme). *NOTE*: the development of `healthiar` is still ongoing. Any feedback, e.g. regarding bugs or unclear documentation, is welcome and highly appreciated. Please provide feedback via a [GitHub issue](https://github.com/SwissTPH/healthiar/issues). ------------------------------------------------------------------------ # About `healthiar` The `healthiar` functions allow you to quantify and monetize the health impacts of environmental stressors (air pollution & noise) `healthiar` core *family members* (= functions) - `attribute_health()` quantifies health impact attributable to environmental exposure (via relative or absolute risk) - `summarize_uncertainty()` performs a Monte Carlo simulation - `compare()` compares health impacts of two scenarios (via PAF or PIF approaches) - `monetize()` monetizes health impacts - `attribute_lifetable()` quantifies health impacts by applying the life table method (RR & AR) - `attribute_mod()` modifies an existing `attribute_health()` assessment - `cba()` performs a cost-benefit analysis - `get_daly()` adds up YLL & YLD to obtain DALY - `multiexpose()` considers exposure to 2 exposures (e.g. two different air pollutants) at the same time to quantify the attributable health impacts - `socialize()` exposes and quantifies social inequalities in health impacts - `prepare_mdi()` creates the BEST-COST MDI (Multidimensional Deprivation Index) ## Package overview ![Figure: `healthiar` overview](images/package_overview.png){width="700"} ------------------------------------------------------------------------ # Abbreviations BHD/bhd - baseline health data CI - confidence interval CBA/cba - cost-benefit analysis exp - exposure RR/rr - relative risk YLL/yll - years of life lost # `healthiar` Examples *Note*: most `healthiar` functions have examples that you can execute (with `healthiar` loaded) by running `example("function_name")`, e.g. `example("attribute_health")` *Note*: healthiar functions are easiest to use if your data is prepared in a tidy format: - Each variable is a column; each column is a variable. - Each observation is a row; each row is an observation. - Each value is a cell; each cell is a single value. Source: Hadley Wickham (2014). Tidy Data. [Link](https://doi.org/10.18637/jss.v059.i10). For additional information see this [informal explanation of tidy data](https://cran.r-project.org/package=tidyr) (by the author). ## Example: `attribute_health()` with relative risk Goal: attribute COPD cases to PM2.5 air pollution exposure ### Refresher - Burden of disease with relative risk ![](images/bod_rr.png){width="700"} ### Function call - Hard coded ```{r} results_pm_copd <- attribute_health( erf_shape = "log_linear", # shape of the exposure-response function (ERF) rr_central = 1.369, # relative risk (RR) central estimate rr_increment = 10, # increment for which relative risk is valid (in \\mu g / m^3) exp_central = 8.85, # PM2.5 exposure (in \\mu g / m^3) (here: population-weighted) cutoff_central = 5, # cutoff (in \\mu g / m^3) below which no health effects occur bhd_central = 30747 # baseline health data (BHD; here: COPD incidence) ) ``` For alternative ERF shapes see the function documentation of `attribute_health()`. ### Function call - Pre-loaded data `healthiar` comes with some example data that start with `exdat_` that allow you to test functions. Some of these example data will be used in some examples in this vignette. Now we call `attribute_health` with input data from the `healthiar` example data. Note that we can easily provide input data to the function argument using the `$` operator. ```{r, eval=TRUE, include=TRUE, echo=TRUE} results_pm_copd <- attribute_health( erf_shape = "log_linear", rr_central = exdat_pm$relative_risk, rr_increment = 10, exp_central = exdat_pm$mean_concentration, cutoff_central = exdat_pm$cut_off_value, bhd_central = exdat_pm$incidence ) ``` ### Output structure Every `attribute_health()` output consists of two lists ("folders") - `health_main` contains the main results - `health_detailed` detailed results and additional info about the assessment - `results_raw` tibble that contains the detailed results - `input_args` contains a complete list of all arguments used in the assessment (including some internal arguments) - `input_table` tibble that contains all input data entered by the user or derived for the calculations - `results_by_...` - `geo_id_micro`: contains results by the inputted or default micro (e.g. high resolution) geo IDs - `sex`: contains results by sex (only present if function argument `sex` specified) - `age_group`: contains results by age groups (only present if function argument `age_group` sepcified) *NOTE*: `attribute_lifetable()` creates additional output that is specific to life table calculations ### Main results Let's inspect the main results There exist different, equivalent ways of accessing the output - With `$` operator: `results_pm_copd$health_main$impact_rounded` (as in the example above) - By mouse: go to the *Environment* tab in RStudio and click on the variable you want to inspect, and then open the `health_main` results table - With `[[]]` operator `results_pm_copd[["health_main"]]` - With `pluck()` & `pull()`: use the `purrr::pluck` function to select a list and then the `dplyr::pull` function extract values from a specified column, e.g. `results_pm_copd |> purrr::pluck("health_main") |> dplyr::pull("impact_rounded")` ```{r, include=TRUE, eval=TRUE, echo=TRUE} results_pm_copd$health_main ``` It is a table of the format `tibble` (very similar to a `data.frame`) of 3 rows and 23 columns. Let's zoom in on some relevant aspects ```{r} results_pm_copd$health_main |> dplyr::select(exp, bhd, rr, erf_ci, pop_fraction, impact_rounded) |> knitr::kable() # For formatting reasons only: prints tibble in nice layout ``` Interpretation: this table shows us that exposure was 8.85 $\mu g/m^3$, the baseline health data (`bhd_central`) was 30747 (COPD incidence in this instance). The 1st row further shows that the impact attributable to this exposure using the central relative risk (`rr_central`) estimate of 1.369 is 3502 COPD cases, or \~11% of all baseline cases. Some of the most results columns include: - *impact_rounded* rounded attributable health impact/burden - *impact* raw impact/burden - *pop_fraction* population attributable fraction (PAF) or population impact fraction (PIF) *NOTE*: the main output contains more columns that provide additional information about the assessment, such as intermediate results and assessment specifications, ... ## Example: `attribute_health()` with relative risk & uncertainty Goal: attribute COPD cases to PM2.5 exposure Now we will make a similar function call, but include uncertainty in several input arguments ### Function call ```{r} results_pm_copd <- attribute_health( erf_shape = "log_linear", rr_central = 1.369, rr_lower = 1.124, # lower 95% confidence interval (CI) bound of RR rr_upper = 1.664, # upper 95% CI bound of RR rr_increment = 10, exp_central = 8.85, exp_lower = 8, # lower 95% CI bound of exposure exp_upper = 10, # upper 95% CI bound of exposure cutoff_central = 5, bhd_central = 30747, bhd_lower = 28000, # lower 95% confidence interval estimate of BHD bhd_upper = 32000 # upper 95% confidence interval estimate of BHD ) ``` ### Detailed results Let's inspect the detailed results: ```{r eval=FALSE, echo=TRUE, include=FALSE} results_pm_copd$health_detailed$results_raw ``` ```{r echo=FALSE} results_pm_copd$health_detailed$results_raw |> dplyr::select(erf_ci, exp_ci, bhd_ci, impact_rounded) |> dplyr::slice(1:9) |> knitr::kable() # Prints tibble in a minimal layout ``` Each row contains the estimated attributable cases (`impact_rounded`) obtained by the input data specified in the columns ending in "\_ci" and the other calculation pathway specifications in that row (not shown). - The 1st contains the estimated attributable impact when using the central estimates of relative risk, exposure and baseline health data - The 2nd row shows the impact when using the central estimates of the relative risk, exposure in combination with the lower estimate of the baseline health data - ... *NOTE*: only `r length(1:9)` of the `r length(results_pm_copd$health_detailed$results_raw$impact)` possible combinations are displayed due to space constraints *NOTE*: only a selection of columns are shown ## Example: `attribute_health()` with absolute risk Goal: attribute cases of high annoyance to (road traffic) noise exposure ### Refresher - Burden of disease with absolute risk ![](images/bod_ar.png){width="700"} ### Function call ```{r} results_noise_ha <- attribute_health( approach_risk = "absolute_risk", # default is "relative_risk" exp_central = c(57.5, 62.5, 67.5, 72.5, 77.5), # mean of the exposure categories pop_exp = c(387500, 286000, 191800, 72200, 7700), # population exposed per exposure category erf_eq_central = "78.9270-3.1162*c+0.0342*c^2" # exposure-response function ) ``` The `erf_eq_central` argument can digest other types of functions (see section [Example: attribute_health() with user-defined ERF]) ### Main results ```{r echo=FALSE} results_noise_ha |> purrr::pluck("health_main") |> dplyr::select(erf_eq, erf_ci, impact_rounded) |> knitr::kable() # Prints tibble in a minimal layout ``` ### Results per noise exposure band ```{r eval=FALSE, include=TRUE, echo=TRUE} results_noise_ha$health_detailed$results_raw ``` ```{r echo=FALSE, include=TRUE, eval=TRUE} results_noise_ha[["health_detailed"]][["results_raw"]] |> select(exp_category, exp, pop_exp, impact) |> knitr::kable() ``` Alternatively, it's also possible to only assess the impacts for a single noise exposure band ```{r} results_noise_ha <- attribute_health( approach_risk = "absolute_risk", exp_central = 57.5, pop_exp = 387500, erf_eq_central = "78.9270-3.1162*c+0.0342*c^2" ) ``` ```{r eval=FALSE, include=FALSE, echo=TRUE} results_noise_ha$health_main ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_noise_ha[["health_main"]] |> dplyr::select(exp_category, impact) |> knitr::kable() ``` ```{r eval=TRUE, include=FALSE, echo=FALSE} rm(results_noise_ha) ``` ## Example: `attribute_health()` for multiple geographic units with relative risk Goal: attribute disease cases to PM2.5 exposure in multiple geographic units - Enter unique ID's as a vector (`numeric` or `character`) to the `geo_id_micro` argument (e.g. municipality names or province abbrevations) - Optional: aggregate unit-specific results by providing higher-level ID's (e.g. region names or country abbreviations) as a vector (`numeric` or `character`) to the `geo_id_macro` argument Input to the other function arguments is specified as usual, either as a vector or a single values (which will be recycled to match the length of the other input vectors). ### Function call ```{r} results_iteration <- attribute_health( # Names of Swiss cantons geo_id_micro = c("Zurich", "Basel", "Geneva", "Ticino", "Jura"), # Names of languages spoken in the selected Swiss cantons geo_id_macro = c("German","German","French","Italian","French"), rr_central = 1.369, rr_increment = 10, cutoff_central = 5, erf_shape = "log_linear", exp_central = c(11, 11, 10, 8, 7), bhd_central = c(4000, 2500, 3000, 1500, 500) ) ``` In this example we want to aggregate the lower-level geographic units (municipalities) by the higher-level language region (`"German", "French", "Italian"`) ### Main results The main output contains aggregated results ```{r eval=FALSE, include=FALSE, echo=TRUE} results_iteration$health_main ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_iteration[["health_main"]] |> dplyr::select(geo_id_macro, impact_rounded, erf_ci, exp_ci, bhd_ci) |> knitr::kable() ``` In this case `health_main` contains the cumulative / summed number of stroke cases attributable to PM2.5 exposure in the 5 geo units, which is `r sum(results_iteration[["health_main"]]$impact_rounded)` (using a relative risk of 1.369). ### Detailed results The geo unit-specific information and results are stored under `health_detailed`\>`results_raw` ```{r eval=FALSE, include=FALSE, echo=TRUE} results_iteration$health_detailed$results_raw ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_iteration[["health_detailed"]][["results_raw"]] |> dplyr::select(geo_id_micro, impact_rounded, geo_id_macro) |> knitr::kable() ``` `health_detailed` also contains impacts obtained through all combinations of input data central, lower and upper estimates (as usual), besides the results per geo unit (not shown above) ```{r eval=TRUE, include=FALSE, echo=FALSE} rm(results_iteration) ``` ## Example: `attribute_health()` for multiple geographic units with absolute risk (See [Example: attribute_health() for multiple geographic units with relative risk] for a more detailed explanation of multiple geo unit assessments) Goal: attribute high annoyance cases to noise exposure in rural and urban areas ### Function call ```{r} data <- exdat_noise |> ## Filter for urban and rural regions dplyr::filter(region == "urban" | region == "rural") ``` ```{r} results_iteration_ar <- attribute_health( # Both the rural and urban areas belong to the higher-level "total" region geo_id_macro = "total", geo_id_micro = data$region, approach_risk = "absolute_risk", exp_central = data$exposure_mean, pop_exp = data$exposed, erf_eq_central = "78.9270-3.1162*c+0.0342*c^2" ) ``` *NOTE*: the length of the input vectors fed to `geo_id_micro`, `exp_central`, `pop_exp` must match and must be (number of geo units) x (number of exposure categories) = 2 x 5 = **10**, because we have 2 geo units (`"rural"` and `"urban"`) and 5 exposure categories. ### Main results `health_main` contains the aggregated results (i.e. sum of impacts in rural and urban areas) ```{r eval=FALSE, include=FALSE, echo=TRUE} results_iteration_ar$health_main ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_iteration_ar[["health_main"]] |> dplyr::select(geo_id_macro, impact_rounded, erf_ci, exp_ci) |> knitr::kable() ``` ### Detailed results Impact by geo unit, in this case impact in the rural and in the urban area ```{r eval=FALSE, include=FALSE, echo=TRUE} results_iteration_ar$health_detailed$results_by_geo_id_micro ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_iteration_ar[["health_detailed"]][["results_by_geo_id_micro"]] |> dplyr::select(geo_id_micro, geo_id_macro, impact) |> knitr::kable() ``` ```{r eval=TRUE, include=FALSE, echo=FALSE} rm(results_iteration_ar, data) ``` ## Example: `attribute_health()` and sub-group analysis (by age group) Goal: determine mean attributable health impacts by age group *NOTE*: to obtain age-group-specific results, the baseline health data (and possibly exposure) must be available by age group. If the `age` argument was specified, age-group-specific results are available under `health_detailed` in the sub-folder `results_by_age_group`. ```{r} results_age_group <- attribute_health( approach_risk = "relative_risk", age = c("below_65", "65_plus"), exp_central = c(8, 7), cutoff_central = c(5, 5), bhd_central = c(1000, 5000), rr_central = 1.06, rr_increment = 10, erf_shape = "log_linear" ) results_age_group$health_detailed$results_by_age_group |> dplyr::select(age_group, impact_rounded, exp, bhd) |> knitr::kable() ``` ```{r eval=TRUE, include=FALSE, echo=FALSE} rm(results_age_group) ``` ## Example: `attribute_health()` and sub-group analysis (by sex) Goal: determine mean attributable health impacts by sex *NOTE*: to obtain sex-specific results, the baseline health data (and possibly exposure) must be entered by sex. If the `sex` argument was specified, sex-specific results are available under `health_detailed` in the sub-folder `results_by_sex`. ```{r eval=TRUE, include=TRUE, echo=TRUE} results_sex <- attribute_health( approach_risk = "relative_risk", sex = c("female", "male"), exp_central = c(8, 8), cutoff_central = c(5, 5), bhd_central = c(1000, 1100), rr_central = 1.06, rr_increment = 10, erf_shape = "log_linear" ) results_sex$health_detailed$results_by_sex |> dplyr::select(sex, impact_rounded, exp, bhd) |> knitr::kable() ``` ```{r eval=TRUE, include=FALSE, echo=FALSE} rm(results_sex) ``` ## Example: `attribute_health()` and sub-group analysis (by education level) Goal: determine mean attributable health impacts by education level *Note*: a (random) ID vector of unique values must be entered to the argument `geo_id_micro`, to indicate that each row is an observation on its own (and not just exposure categories) A single vector (or a data frame / tibble with multiple columns) to group the results by can be entered to the `info` argument. In this exmaple, this will be information about the education level. In a second step one can group the results based on one or more columns and so summarize the results by the preferred sub-groups ```{r} info <- data.frame( education = rep(c("secondary", "bachelor", "master"), each = 5) # education level ) output_attribute <- attribute_health( rr_central = 1.063, rr_increment = 10, erf_shape = "log_linear", cutoff_central = 0, exp_central = sample(6:10, 15, replace = TRUE), bhd_central = sample(100:500, 15, replace = TRUE), geo_id_micro = c(1:nrow(info)), # (random) ID must be entered info = info ) output_stratified <- output_attribute$health_detailed$results_raw |> dplyr::group_by(info_column_1) |> dplyr::summarize(mean_impact = mean(impact)) |> print() ``` ```{r eval=TRUE, include=FALSE, echo=FALSE} rm(info, output_attribute, output_stratified) ``` ## Example: `attribute_health()` with user-defined ERF Goal: attribute COPD cases to air pollution exposure by applying a user-defined exposure response function (e.g. MR-BRT curves from Global Burden of Disease study) *Note*: in this case, the function arguments `erf_eq_...` require a function as input, so we use an auxiliary function (`splinefun()`) to transform the points on the ERF into type `fucntion`. ```{r} results_pm_copd_mr_brt <- attribute_health( exp_central = 8.85, bhd_central = 30747, cutoff_central = 0, # Specify the function based on x-y point pairs that lie on the ERF erf_eq_central = splinefun( x = c(0, 5, 10, 15, 20, 25, 30, 50, 70, 90, 110), y = c(1.00, 1.04, 1.08, 1.12, 1.16, 1.20, 1.23, 1.35, 1.45, 1.53, 1.60), method = "natural") ) ``` The ERF curve created looks as follows ```{r eval=TRUE, include=TRUE, echo=FALSE} x <- c(0, 5, 10, 15, 20, 25, 30, 50, 70, 90, 110) y <- c(1.00, 1.04, 1.08, 1.12, 1.16, 1.20, 1.23, 1.35, 1.45, 1.53, 1.60) spline_func <- stats::splinefun( x = x, y = y, method = "natural") ## Generate finer x-values x_dense <- seq(min(x), max(x), length.out = 200) ## Evaluate the spline function at these points y_dense <- spline_func(x_dense) ## Plot plot(x, y, # main = "User-defined ERF", main = "", xlab = "Exposure", ylab = "Relative risk", col = "blue", pch = 19) lines(x_dense, y_dense, col = "red", lwd = 2) legend("topleft", legend = c("Original points", "Spline curve"), col = c("blue", "red"), pch = c(19, NA), lty = c(NA, 1), lwd = c(NA, 2)) ``` Alternatively, other functions (e.g. `approxfun()`) can be used to create the ERF ## Example: `compare()` Goal: comparison of two scenarios ### Function call 1. Use `attribute_health()` to calculate burden of scenarios A & B ```{r} scenario_A <- attribute_health( exp_central = 8.85, # EXPOSURE 1 cutoff_central = 5, bhd_central = 25000, approach_risk = "relative_risk", erf_shape = "log_linear", rr_central = 1.118, rr_increment = 10) ``` ```{r} scenario_B <- attribute_health( exp_central = 6, # EXPOSURE 2 cutoff_central = 5, bhd_central = 25000, approach_risk = "relative_risk", erf_shape = "log_linear", rr_central = 1.118, rr_increment = 10) ``` Alternatively, the function `attribute_mod()` can be used to modify an existing scenario, e.g. `scenario_A` ```{r} scenario_B <- attribute_mod( output_attribute = scenario_A, exp_central = 6 ) ``` 2. Use `compare()` to compare scenarios A & B ```{r} results_comparison <- compare( approach_comparison = "delta", # or "pif" (population impact fraction) output_attribute_scen_1 = scenario_A, output_attribute_scen_2 = scenario_B ) ``` The default value for the argument `approach_comparison` is `"delta"`. The alterntive is `"pif"` (population impact fraction). See the function documentation of `compare()` for more details. ### Main results ```{r eval=FALSE, include=FALSE, echo=TRUE} results_comparison$health_main ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_comparison[["health_main"]] |> dplyr::select( impact, impact_rounded, impact_scen_1, impact_scen_2, bhd, dplyr::starts_with("exp_"), -dplyr::starts_with("exp_ci"), # remove col "exp_ci" dplyr::starts_with("rr_con")) |> dplyr::slice_head() |> knitr::kable() ``` ### Detailed results The `compare()` results contain two additional outputs in addition to those we have already seen - `health_detailed` - `scen_1` contains results of scenario 1 (scenario A in our case) - `scen_2` contains results of scenario 2 (scenario B in our case) ```{r, echo=FALSE, eval=TRUE, include=FALSE} rm(results_comparison, scenario_A, scenario_B) ``` ## Example: `summarize_uncertainty()` You can do a Monte Carlo uncertainty analysis via the `summarize_uncertainty` function. The outcome of the Monte Carlo analysis is added to the variable entered as the `results` argument, which is `results_pm_copd` in our case. Two lists ("folders") are added: - `uncertainty_main` contains the central estimate and the corresponding 95% confidence intervals obtained through the Monte Carlo assessment - `uncertainty_detailed` contains all `n_sim` simulations of the Monte Carlo assessment ### Function call ```{r} results_pm_copd_summarized <- summarize_uncertainty( output_attribute = results_pm_copd, n_sim = 100 ) ``` ### Main results ```{r eval=FALSE, include=FALSE, echo=TRUE} results_pm_copd_summarized$uncertainty_main ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_pm_copd_summarized$uncertainty_main |> knitr::kable() ``` ### Detailed results The folder `uncertainty_detailed` contains all single simulations. Let's look at the impact of the first 10 simulations. ```{r eval=FALSE, include=FALSE, echo=TRUE} results_pm_copd_summarized$uncertainty_detailed$impact_by_sim ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_pm_copd_summarized$uncertainty_detailed$impact_by_sim |> knitr::kable() ``` ```{r, echo=FALSE, eval=TRUE, include=FALSE} rm(results_pm_copd_summarized) ``` ## Example: `monetize()` Goal: determine monetized attributable health impact of a policy that will have health benefits five years from now The outcome of the monetization is added to the variable entered to the `output_attribute` argument, which is `results_pm_copd` in our case. Two folders are added: - `monetization_main` contains the central monetization estimate and the corresponding 95% confidence intervals obtained through the specified monetization - `monetization_detailed` contains the monetized results for each unique combination of the input variable estimates that were provided to the initial `attribute_health()` call ### Function call ```{r} results_pm_copd <- monetize( output_attribute = results_pm_copd, discount_shape = "exponential", discount_rate = 0.03, n_years = 5, valuation = 50000 # E.g. EURO ) ``` ### Main results ```{r eval=FALSE, include=FALSE, echo=TRUE} results_pm_copd$monetization_main ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_pm_copd$monetization_main |> select(erf_ci, monetized_impact) |> knitr::kable() ``` We see that the monetized impact (discounted) is more than 160 million EURO. Alternatively, you can also monetize (attributable) health impacts from a non-`healthiar` source. ```{r} results <- monetize( impact = 1151, valuation = 100 ) ``` ## Example: `cba()` Goal: determine the net-benefit of a health policy via cost-benefit analysis (CBA) using previous results Let's imagine we design a policy that would reduce air pollution to 5 $\mu g/m^3$, which is the concentration specified in the `cutoff_central` argument in the initial `attribute_health()` call. So we could avoid all COPD cases attributed to air pollution. Considering the cost to implement the policy (estimated at 100 million EURO), what would be the monetary net benefit of such a policy, ? We can find out using `healthiar`'s `cba()` function. The outcome of the CBA is contained in two folders, which are added to the existing assessment: - `cba_main` contains the central estimate and the corresponding 95% confidence intervals obtained - `cba_detailed` contains additional intermediate results for both cost and benefit - `benefit` contains results `by_year` and raw results `health_raw` - `cost` contains the costs of the policy at the end of the period specified in the `n_years_cost` argument ### Function call ```{r} cba <- cba( output_attribute = results_pm_copd, valuation = 50000, cost = 100000000, discount_shape = "exponential", discount_rate_benefit = 0.03, discount_rate_cost = 0.03, n_years_benefit = 5, n_years_cost = 5 ) ``` ### Main results ```{r} cba$cba_main |> dplyr::select(benefit, cost, net_benefit) |> knitr::kable() ``` ```{r, echo=FALSE, eval=TRUE, include=FALSE} rm(cba) ``` We see that the central and upper 95% confidence interval estimates of avoided attributable COPD cases result in a net monetary benefit of the policy, while the lower 95% confidence interval estimate results in a net cost! ## Example: `attribute_lifetable()` for YLL Goal: determine years of life lost (YLL) due to deaths from COPD attributable to PM2.5 exposure during one year We can use `attribute_lifetable()` combined with life table input data to determine YLL attributable to an environmental stressor ### Function call ```{r} results_pm_yll <- attribute_lifetable( year_of_analysis = 2019, health_outcome = "yll", rr_central = 1.118, rr_increment = 10, erf_shape = "log_linear", exp_central = 8.85, cutoff_central = 5, min_age = 20, # age from which population is affected by the exposure # Life table information age_group = exdat_lifetable$age_group, sex = exdat_lifetable$sex, population = exdat_lifetable$midyear_population, # In the life table case, BHD refers to deaths bhd_central = exdat_lifetable$deaths ) ``` ### Main results Total YLL attributable to exposure (sum of sex-specific impacts) ```{r eval=FALSE, include=FALSE, echo=TRUE} results_pm_yll$health_main ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_pm_yll$health_main |> dplyr::slice_head() |> dplyr::select(sex, age_group, impact_rounded, erf_ci, exp_ci, bhd_ci) |> knitr::kable() ``` Use the two arguments `approach_exposure` and `approach_newborns` to modify the life table calculation - `approach_exposure` - `"single_year"` (default) population is exposed for a single year and the impacts of that exposure are calculated - `"constant"` population is exposed every year - `approach_newborns` - `"without_newborns"` (default) assumes that the population in the year of analysis is followed over time, without considering newborns being born - `"with_newborns"` assumes that for each year after the year of analysis n babies are born, with n being equal to the (male and female) population aged 0 that is provided in the argument population ### Detailed results Attributable YLL results - per year - per age (group) - per sex (if sex-specific life table data entered) are available *Note*: we will inspect the results for females; male results are also available ### Results per year *NOTE*: only a selection of years is shown ```{r} results_pm_yll$health_detailed$results_raw |> dplyr::summarize( .by = year, impact = sum(impact, na.rm = TRUE) ) ``` ```{r} results_pm_yll$health_detailed$results_raw |> dplyr::summarize( .by = year, impact = sum(impact, na.rm = TRUE)) |> knitr::kable() ``` ### Results by age #### YLL TODO ### Results by year and age *NOTE*: only a selection of years and ages is shown #### YLL ```{r eval=FALSE, include=FALSE, echo=TRUE} results_pm_yll$health_detailed$intermediate_calculations |> dplyr::filter(sex == "female") |> dplyr::pull(impact_by_age_and_year) |> purrr::pluck(1) ``` ```{r echo=FALSE} results_pm_yll$health_detailed$intermediate_calculations |> dplyr::filter(sex == "female") |> dplyr::pull(impact_by_age_and_year) |> purrr::pluck(1) |> dplyr::select(age_start, age_end, impact_2019) |> dplyr::slice( ( dplyr::n() - 8 ) : dplyr::n() ) |> knitr::kable() ``` #### Population (baseline scenario) Baseline scenario refers to the scenario with exposure (i.e. the scenario specified in the assessment) ```{r eval=FALSE, include=FALSE, echo=TRUE} results_pm_yll$health_detailed$intermediate_calculations|> dplyr::filter(sex == "female") |> dplyr::pull(projection_if_exposed_by_age_and_year) |> purrr::pluck(1) ``` ```{r echo=FALSE} results_pm_yll$health_detailed$intermediate_calculations|> dplyr::filter(sex == "female") |> dplyr::pull(projection_if_exposed_by_age_and_year) |> purrr::pluck(1) |> dplyr::select(age_start, midyear_population_2019, midyear_population_2020, midyear_population_2021, midyear_population_2022) |> dplyr::slice( ( dplyr::n() - 8 ) : dplyr::n() ) |> knitr::kable() ``` #### Population (unexposed scenario) Impacted scenario refers to the scenario without exposure ```{r eval=FALSE, include=FALSE, echo=TRUE} results_pm_yll$health_detailed$intermediate_calculations|> dplyr::filter(sex == "female") |> dplyr::pull(projection_if_unexposed_by_age_and_year) |> purrr::pluck(1) ``` ```{r echo=FALSE} results_pm_yll$health_detailed$intermediate_calculations|> dplyr::filter(sex == "female") |> dplyr::pull(projection_if_unexposed_by_age_and_year) |> purrr::pluck(1) |> dplyr::select(age_start, midyear_population_2019, midyear_population_2020, midyear_population_2021, midyear_population_2022) |> dplyr::slice( ( dplyr::n() - 8 ) : dplyr::n() ) |> knitr::kable() ``` ## Example: `attribute_lifetable()` for premature deaths Goal: determine premature deaths from COPD attributable to PM2.5 exposure during one year See example above [Example: attribute_lifetable() for YLL] for additional info on `attribute_lifetable()` calculations and its output ### Function call ```{r} results_pm_deaths <- attribute_lifetable( health_outcome = "deaths", year_of_analysis = 2019, rr_central = 1.118, rr_increment = 10, erf_shape = "log_linear", exp_central = 8.85, cutoff_central = 5, min_age = 20, # age from which population is affected by the exposure # Life table information age_group = exdat_lifetable$age_group, sex = exdat_lifetable$sex, population = exdat_lifetable$midyear_population, bhd_central = exdat_lifetable$deaths ) ``` ### Main results Total premature deaths attributable to exposure (sum of sex-specific impacts) ```{r eval=FALSE, include=FALSE, echo=TRUE} results_pm_deaths$health_main$impact ``` ```{r, eval=TRUE, include=TRUE, echo=FALSE} results_pm_deaths$health_main |> dplyr::slice_head() |> dplyr::select(sex, age_group, impact_rounded, erf_ci, exp_ci, bhd_ci) |> knitr::kable() ``` ### Detailed results Attributable premature deaths results - per year (if argument `approach_exposure = "constant"`) - per age (group) - per sex (if sex-specific life table data entered) are available *Note*: we will inspect the results for females; male results are also available *Note*: because we set the function argument `approach_exposure = "constant"` in the function call results are available for one year (the year of analysis) ```{r eval=FALSE, include=FALSE, echo=TRUE} results_pm_deaths$health_detailed$intermediate_calculations|> dplyr::filter(sex == "female") |> dplyr::pull(projection_if_unexposed_by_age_and_year) |> purrr::pluck(1) ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_pm_yll$health_detailed$intermediate_calculations|> dplyr::filter(sex == "female") |> dplyr::pull(projection_if_unexposed_by_age_and_year) |> purrr::pluck(1) |> dplyr::slice( ( dplyr::n() - 8 ) : dplyr::n() ) |> knitr::kable() ``` ```{r eval=TRUE, include=FALSE, echo=FALSE} rm(results_pm_deaths) ``` ## Example: `attribute_health()` for YLD ### Function call ```{r} results_pm_copd_yld <- attribute_health( rr_central = 1.1, rr_increment = 10, erf_shape = "log_linear", exp_central = 8.85, cutoff_central = 5, bhd_central = 1000, duration_central = 10, dw_central = 0.2 ) ``` ### Main results ```{r eval=FALSE, include=FALSE, echo=TRUE} results_pm_copd_yld$health_main ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_pm_copd_yld$health_main |> dplyr::select(erf_ci, impact) |> knitr::kable() ``` ## Example: `daly()` ### Function call ```{r} results_daly <- daly( output_attribute_yll = results_pm_yll, output_attribute_yld = results_pm_copd_yld ) ``` ### Main results YLL, YLD & DALY ```{r eval=FALSE, include=FALSE, echo=TRUE} results_daly$health_main |> select(impact_yll_rounded, impact_yld_rounded, impact_rounded) ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_daly$health_main |> dplyr::select(impact_yll_rounded, impact_yld_rounded, impact_rounded) |> knitr::kable() ``` ```{r eval=TRUE, include=FALSE, echo=FALSE} rm(results_daly, results_pm_yll, results_pm_copd_yld) ``` ## Example: `multiexpose()` ### Function call ```{r} results_pm_copd <- attribute_health( erf_shape = "log_linear", rr_central = 1.369, rr_increment = 10, exp_central = 8.85, cutoff_central = 5, bhd_central = 30747 ) results_no2_copd <- attribute_mod( output_attribute = results_pm_copd, exp_central = 10.9, rr_central = 1.031 ) results_multiplicative <- multiexpose( output_attribute_exp_1 = results_pm_copd, output_attribute_exp_2 = results_no2_copd, exp_name_1 = "pm2.5", exp_name_2 = "no2", approach_multiexposure = "multiplicative" ) ``` ```{r, echo=FALSE, eval=TRUE, include=FALSE} rm(results_no2_copd, results_pm_copd) ``` ### Main results ```{r eval=FALSE, include=TRUE, echo=TRUE} results_multiplicative$health_main ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_multiplicative$health_main |> dplyr::select(impact_rounded) |> knitr::kable() ``` ```{r eval=TRUE, include=FALSE, echo=FALSE} rm(results_multiplicative) ``` ## Example: `prepare_mdi` ### Function call ```{r eval=TRUE, include=TRUE, echo=TRUE} mdi <- prepare_mdi( geo_id_micro = exdat_prepare_mdi$id, edu = exdat_prepare_mdi$edu, unemployed = exdat_prepare_mdi$unemployed, single_parent = exdat_prepare_mdi$single_parent, pop_change = exdat_prepare_mdi$pop_change, no_heating = exdat_prepare_mdi$no_heating, n_quantile = 10, verbose = FALSE ) ``` *Note*: `verbose = FALSE` suppresses any output to the console (default: `verbose = TRUE`, i.e. with printing turned on). ### Main results Function output includes - `mdi_main`, a tibble containing the BEST-COST MDI ```{r eval=FALSE, include=TRUE, echo=TRUE} mdi$mdi_main |> select(geo_id_micro, MDI, MDI_index) ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} mdi$mdi_main |> dplyr::select(geo_id_micro, MDI, MDI_index) |> knitr::kable() ``` ### Detailed results - `mdi_detailed` - DESCRIPTIVE STATISTICS - PEARSON'S CORRELATION COEFFICIENTS - CRONBACH'S α, including the reliability rating this value indicates - Code for boxplots of the single indicators - Code for histogram of the MDI's for the geo units with a normal distribution curve To reproduce the boxlots run ```{r eval=TRUE, include=TRUE, echo=TRUE} eval(mdi$mdi_detailed$boxplot) ``` Analogeously, to reproduce the histogram run ```{r eval=TRUE, include=TRUE, echo=TRUE} eval(mdi$mdi_detailed$histogram) ``` ```{r, eval=TRUE, include=FALSE, echo=FALSE} rm(mdi) ``` # Post-`healthiar` workflow ## Export results Export as `.csv` file ```{r eval=FALSE, include=FALSE, echo=TRUE} write.csv(x = results_pm_copd$health_main, file = "exported_results/results_pm_copd.csv") ``` Save as `.Rdata` file ```{r eval=FALSE, include=FALSE, echo=TRUE} save(results_pm_copd, file = "exported_results/results_pm_copd.Rdata") ``` Export to Excel (as `.xlsx` file) ```{r eval=FALSE, include=FALSE, echo=TRUE} openxlsx::write.xlsx(x = results_pm_copd$health_main, file = "exported_results/results_pm_copd.xlsx") ``` ## Visualize results Visualization is out of scope of `healthiar`. You can visualize in - R, e.g. with the `ggplot2` package ([online book by the creator](https://ggplot2-book.org/)) - Excel (export results first) - Other tools