---
title: "Using OdysseusPathwayModule"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using OdysseusPathwayModule}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Overview

OdysseusPathwayModule provides cohort pathway analysis for pre-instantiated
OMOP cohorts. The package focuses on one core workflow:

1. create or identify a target cohort and one or more event cohorts,
2. run pathway analysis with `executeCohortPathways()`, and
3. inspect pathway sequences, counts, and event-code mappings.

The package supports two analysis modes:

1. **Post-index** (`analysisType = "post-index"`, default): events occurring 
   *after* the target cohort index date.
2. **Pre-index** (`analysisType = "pre-index"`): events occurring *before* the 
   target cohort index date in a configurable lookback window.

Target and event cohorts can reside in the same cohort table or in separate
tables and schemas.

## Setup

```{r setup, eval=FALSE}
library(OdysseusPathwayModule)
library(Eunomia)

connectionDetails <- Eunomia::getEunomiaConnectionDetails()
```

## Create Example Cohorts with Eunomia

```{r cohorts, eval=FALSE}
generationSet <- Eunomia::createCohorts(connectionDetails)

generationSet
```

In the standard Eunomia example database, `createCohorts()` materializes four
cohorts in `main.cohort`:

- `1`: Celecoxib
- `2`: Diclofenac
- `3`: GiBleed
- `4`: NSAIDs

For the examples below, use `NSAIDs` (`cohortId = 4`) as the **target cohort**
and `Celecoxib`, `Diclofenac`, and `GiBleed` (`cohortId = 1:3`) as **event cohorts**.

## Run Post-Index Pathway Analysis

This is the default mode. It asks: what events happen after entry into the target cohort?

```{r post_index, eval=FALSE}
postIndexResults <- executeCohortPathways(
  connectionDetails = connectionDetails,
  cohortDatabaseSchema = "main",
  cohortTableName = "cohort",
  targetCohortIds = 4,
  eventCohortIds = c(1, 2, 3),
  maxDepth = 3,
  collapseWindow = 30
)
```

The result is a named list of analysis outputs:

```{r post_index_names, eval=FALSE}
names(postIndexResults)
```

The two most useful tables to inspect first are pathway-level counts and the
event-code mapping used to decode combinations:

```{r post_index_inspect, eval=FALSE}
head(postIndexResults$pathwaysAnalysisPathsData)
head(postIndexResults$pathwayAnalysisCodesLong)
```
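To read `pathwaysAnalysisPathsData` at a glance, it can help to collapse the `step1`, `step2`, ... columns into a single sequence string and express each `countValue` as a share of all persons with a pathway. The sketch below does this on a small hand-made table with the same column layout. `formatPathways()` is a hypothetical helper, not part of OdysseusPathwayModule. It leaves the codes undecoded and uses the total over all rows as the denominator, so with several target cohorts or generations you would group the rows first.

```{r format_pathways_sketch, eval=FALSE}
# Hypothetical helper (not part of OdysseusPathwayModule): collapse the
# step columns of a pathway table into "a -> b -> c" strings and add the
# share of persons following each pathway.
formatPathways <- function(pathsData) {
  stepCols <- grep("^step[0-9]+$", names(pathsData), value = TRUE)
  # Order step columns numerically (step1, step2, ..., step10)
  stepCols <- stepCols[order(as.integer(sub("step", "", stepCols)))]
  steps <- as.matrix(pathsData[stepCols])
  pathway <- apply(steps, 1, function(codes) {
    paste(codes[!is.na(codes)], collapse = " -> ")
  })
  data.frame(
    pathway          = pathway,
    countValue       = pathsData$countValue,
    share            = pathsData$countValue / sum(pathsData$countValue),
    stringsAsFactors = FALSE
  )
}

# Small illustrative table with the documented column layout
paths <- data.frame(
  step1      = c(2L, 2L, 4L),
  step2      = c(4L, NA, 8L),
  countValue = c(50L, 30L, 20L)
)
formatPathways(paths)
```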

## Run Pre-Index Pathway Analysis

Pre-index mode asks: what events occurred before the target cohort entry date?
The lookback window is set with `lookbackStartDay` and `lookbackEndDay`, given
as day offsets relative to the index date.

```{r pre_index, eval=FALSE}
preIndexResults <- executeCohortPathways(
  connectionDetails = connectionDetails,
  cohortDatabaseSchema = "main",
  cohortTableName = "cohort",
  targetCohortIds = 4,
  eventCohortIds = c(1, 2, 3),
  analysisType = "pre-index",
  lookbackStartDay = -365,
  lookbackEndDay = -1,
  maxDepth = 3,
  collapseWindow = 30
)
```
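With `lookbackStartDay = -365` and `lookbackEndDay = -1`, the window covers the year before index and excludes the index date itself. The base-R sketch below shows that arithmetic on a few made-up event dates. It assumes both bounds are inclusive and that an event qualifies when its start date falls inside the window. The package's SQL may treat edge cases differently, and `inLookbackWindow()` is a hypothetical helper.

```{r lookback_sketch, eval=FALSE}
# Illustrative sketch, not the package's implementation: keep events whose
# start date lies within [lookbackStartDay, lookbackEndDay] days of the
# target index date (both bounds assumed inclusive).
inLookbackWindow <- function(indexDate, eventStartDate,
                             lookbackStartDay = -365, lookbackEndDay = -1) {
  offset <- as.integer(eventStartDate - indexDate)  # days relative to index
  offset >= lookbackStartDay & offset <= lookbackEndDay
}

indexDate  <- as.Date("2020-06-01")
eventDates <- as.Date(c("2019-05-01", "2019-06-03", "2020-03-15", "2020-06-01"))

inLookbackWindow(indexDate, eventDates)
# FALSE  TRUE  TRUE FALSE  (offsets -397, -364, -78, 0)
inLookbackWindow(indexDate, eventDates, lookbackStartDay = -90)
# FALSE FALSE  TRUE FALSE
```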

You can narrow the lookback window without changing any other arguments:

```{r pre_index_90, eval=FALSE}
preIndex90 <- executeCohortPathways(
  connectionDetails = connectionDetails,
  cohortDatabaseSchema = "main",
  cohortTableName = "cohort",
  targetCohortIds = 4,
  eventCohortIds = c(1, 2, 3),
  analysisType = "pre-index",
  lookbackStartDay = -90,
  lookbackEndDay = -1,
  maxDepth = 3,
  collapseWindow = 30
)
```

## Understanding the Returned Objects

`executeCohortPathways()` returns several tables, each serving a different purpose:

- `pathwayAnalysisStatsData`: summary-level analysis metadata and counts.
- `pathwaysAnalysisPathsData`: pathway sequences with `step1`, `step2`, ... and person counts.
- `pathwaysAnalysisEventsData`: event-level counts.
- `pathwaycomboIds`: unique event-combination codes observed in the pathways.
- `pathwayAnalysisCodesLong`: long-form decoding of combination codes into event cohorts.
- `isCombo`: identifies whether a code represents a single event or a multi-event combination.
- `pathwayAnalysisCodesData`: compact code lookup table.

For example, to inspect only the decoded event combinations:

```{r code_mapping, eval=FALSE}
subset(
  postIndexResults$pathwayAnalysisCodesLong,
  select = c(pathwayAnalysisGenerationId, code, targetCohortId,
             eventCohortId, isCombo, numberOfEvents)
)
```

## Using Separate Target and Event Cohort Tables

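The core function also supports separate target and event tables: the target
table is passed through `cohortDatabaseSchema` and `cohortTableName`, and the
event table through `outcomeDatabaseSchema` and `outcomeTableName`. In the
Eunomia SQLite example, you can create both tables directly from `main.cohort`: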

```{r separate_tables, eval=FALSE}
connection <- DatabaseConnector::connect(connectionDetails)

DatabaseConnector::executeSql(connection, "DROP TABLE IF EXISTS target_cohorts;")
DatabaseConnector::executeSql(connection, "DROP TABLE IF EXISTS event_cohorts;")

DatabaseConnector::executeSql(
  connection,
  "CREATE TABLE target_cohorts AS
     SELECT *
     FROM main.cohort
     WHERE cohort_definition_id = 4;"
)

DatabaseConnector::executeSql(
  connection,
  "CREATE TABLE event_cohorts AS
     SELECT *
     FROM main.cohort
     WHERE cohort_definition_id IN (1, 2, 3);"
)

resultsSeparateTables <- executeCohortPathways(
  connectionDetails = connectionDetails,
  cohortDatabaseSchema = "main",
  cohortTableName = "target_cohorts",
  outcomeDatabaseSchema = "main",
  outcomeTableName = "event_cohorts",
  targetCohortIds = 4,
  eventCohortIds = c(1, 2, 3)
)

DatabaseConnector::disconnect(connection)
```

This is useful when target cohorts and event cohorts are managed by different
ETL or cohort-generation steps.

## Building an Event Sequence Graph

The raw pathway output uses bitmask-encoded combo IDs. Use
`buildEventSequenceGraph()` to decode these into a directed **igraph** graph
with human-readable event names, transition edges, and probabilities.
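Bitmask encoding gives each event cohort its own power-of-two code and represents a combination of events as the bitwise OR (equivalently, the sum) of its members' codes. Assuming the codes used in the mock data below (2 = Celecoxib, 4 = Diclofenac, 8 = GiBleed), a combo code of 6 decodes to Celecoxib plus Diclofenac. The sketch uses base R's `bitwAnd()`; `eventCodes`, `decodeCombo()`, and `isComboCode()` are hypothetical names for illustration. With real output, the code-to-cohort mapping should come from `pathwayAnalysisCodesLong`.

```{r decode_combo_sketch, eval=FALSE}
# Hypothetical lookup: one power-of-two code per event cohort
# (values match the mock data used in this section).
eventCodes <- c(Celecoxib = 2L, Diclofenac = 4L, GiBleed = 8L)

# Return the event names whose bit is set in a combo code
decodeCombo <- function(comboId, codes = eventCodes) {
  names(codes)[bitwAnd(comboId, codes) != 0L]
}

decodeCombo(4L)   # "Diclofenac"
decodeCombo(6L)   # "Celecoxib" "Diclofenac"
decodeCombo(14L)  # "Celecoxib" "Diclofenac" "GiBleed"

# A code is a combination when more than one bit is set
isComboCode <- function(comboId, codes = eventCodes) {
  length(decodeCombo(comboId, codes)) > 1L
}
isComboCode(6L)   # TRUE
```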

**Note:** The simplified Eunomia example database produces only single-step
pathways (each patient has exactly one event after the target index date).
`buildEventSequenceGraph()` requires at least two steps to construct transition
edges.  On real-world OMOP data with richer treatment histories, the pathway
output from `executeCohortPathways()` will typically contain multiple steps and
can be passed directly to `buildEventSequenceGraph()`.
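Because `buildEventSequenceGraph()` needs at least two steps, it can be worth checking a result before passing it in. A minimal sketch, assuming the steps live in `step1`, `step2`, ... columns of `pathwaysAnalysisPathsData` and that an unused step is `NA` (as in the mock data below). `hasMultiStepPathways()` is a hypothetical helper, not a package function.

```{r multistep_check_sketch, eval=FALSE}
# Hypothetical helper: TRUE if any pathway has an event at step 2 or later.
# Assumes unused steps are stored as NA.
hasMultiStepPathways <- function(cpResults) {
  paths <- cpResults$pathwaysAnalysisPathsData
  stepCols <- grep("^step[0-9]+$", names(paths), value = TRUE)
  laterSteps <- setdiff(stepCols, "step1")
  length(laterSteps) > 0 && any(!is.na(as.matrix(paths[laterSteps])))
}

singleStep <- list(pathwaysAnalysisPathsData = data.frame(
  step1 = c(2L, 4L), countValue = c(10L, 5L)
))
multiStep <- list(pathwaysAnalysisPathsData = data.frame(
  step1 = c(2L, 4L), step2 = c(8L, NA), countValue = c(10L, 5L)
))

hasMultiStepPathways(singleStep)  # FALSE
hasMultiStepPathways(multiStep)   # TRUE
```

With real data, a call such as `hasMultiStepPathways(postIndexResults)` could guard the graph-building step.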

The example below constructs a small mock pathway result set that mirrors the
structure returned by `executeCohortPathways()`, so you can see the full
graph-building workflow in action:

```{r esg_mock_data, eval=FALSE}
# --- Mock cpResults with multi-step pathways ---
# Bitmask combo codes: 2 = Celecoxib, 4 = Diclofenac, 8 = GiBleed
mockPathsData <- data.frame(
  pathwayAnalysisGenerationId = rep(1L, 5),
  targetCohortId              = rep(4L, 5),
  step1      = c( 2L,  2L,  4L,  4L,  2L),
  step2      = c( 4L,  8L,  2L,  8L,  NA),
  step3      = c( 8L,  NA,  8L,  NA,  NA),
  countValue = c(120L, 80L, 95L, 65L, 40L)
)

mockCodesLong <- data.frame(
  pathwayAnalysisGenerationId = rep(1L, 3),
  code           = c(2L, 4L, 8L),
  targetCohortId = rep(4L, 3),
  eventCohortId  = c(1L, 2L, 3L),
  isCombo        = rep(0L, 3),
  numberOfEvents = rep(1L, 3)
)

mockIsCombo <- data.frame(
  targetCohortId = rep(4L, 3),
  comboId        = c(2L, 4L, 8L),
  numberOfEvents = rep(1L, 3),
  isCombo        = rep(0L, 3)
)

mockCpResults <- list(
  pathwayAnalysisStatsData   = data.frame(
    pathwayAnalysisGenerationId = 1L,
    targetCohortId = 4L,
    countValue = 400L
  ),
  pathwaysAnalysisPathsData  = mockPathsData,
  pathwaysAnalysisEventsData = data.frame(
    eventCohortId = 1:3,
    countValue    = c(335L, 280L, 360L)  # consistent with mockPathsData counts
  ),
  pathwaycomboIds            = data.frame(comboIds = c(2L, 4L, 8L)),
  pathwayAnalysisCodesLong   = mockCodesLong,
  isCombo                    = mockIsCombo,
  pathwayAnalysisCodesData   = data.frame(
    pathwayAnalysisGenerationId = rep(1L, 3),
    code    = c(2L, 4L, 8L),
    isCombo = rep(0L, 3)
  )
)
```

Now build the graph using a generation set that maps cohort IDs to names:

```{r event_sequence_graph, eval=FALSE}
# Map cohort IDs to human-readable names
generationSet <- data.frame(
  cohortId   = c(1L, 2L, 3L, 4L),
  cohortName = c("Celecoxib", "Diclofenac", "GiBleed", "NSAIDs")
)

esg <- buildEventSequenceGraph(
  cpResults     = mockCpResults,
  generationSet = generationSet,
  maxSteps      = 3,
  minCount      = 1
)

# Print a summary
esg
```

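To use real `executeCohortPathways()` output instead, pass it as `cpResults`.
With Eunomia, the data frame returned by `Eunomia::createCohorts()` can serve as
the generation set once its `name` column is copied to `cohortName`; note that
the single-step Eunomia pathways will not yield transition edges: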

```{r esg_real_note, eval=FALSE}
# With real data:
# generationSet <- Eunomia::createCohorts(connectionDetails)
# generationSet$cohortName <- generationSet$name
#
# esg <- buildEventSequenceGraph(
#   cpResults     = postIndexResults,
#   generationSet = generationSet,
#   maxSteps      = 3,
#   minCount      = 5
# )
```

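The returned object is a list of class `"event_sequence_graph"` with four
components (`names(esg)` lists them all). The graph, the decoded pathways, and
the summary statistics are accessed as follows: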

```{r graph_components, eval=FALSE}
# List all components of the returned object
names(esg)

# The igraph object: vertices are (event, step) pairs, edges are transitions
ig <- esg$graph

# Vertex attributes
igraph::V(ig)$eventName   # human-readable event names
igraph::V(ig)$step        # pathway step number
igraph::V(ig)$count       # patient count at this node
igraph::V(ig)$share       # share within the step (sums to 1)

# Edge attributes
igraph::E(ig)$weight      # patient count crossing this transition
igraph::E(ig)$probability # transition probability (sums to 1 per source)
igraph::E(ig)$sourceStep
igraph::E(ig)$targetStep

# Decoded pathways
head(esg$sequences)

# Summary statistics
esg$summary
```
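The vertex `count` and `share` attributes can be cross-checked by hand from the wide pathway table. The base-R sketch below assumes that a node's count is the number of persons whose pathway has that event (code) at that step, and that `share` divides by the total for the step. It reuses the step columns of the mock data above; this is an illustration of those assumed definitions, not the package's internal code.

```{r vertex_share_sketch, eval=FALSE}
# Step columns and counts copied from the mock pathway data above
paths <- data.frame(
  step1      = c( 2L,  2L,  4L,  4L,  2L),
  step2      = c( 4L,  8L,  2L,  8L,  NA),
  step3      = c( 8L,  NA,  8L,  NA,  NA),
  countValue = c(120L, 80L, 95L, 65L, 40L)
)

# Assumed definitions: persons with a given code at a given step, and that
# count's share of all persons with an event at the same step
nodeCounts <- do.call(rbind, lapply(1:3, function(s) {
  col  <- paste0("step", s)
  keep <- !is.na(paths[[col]])
  agg  <- aggregate(list(count = paths$countValue[keep]),
                    by = list(code = paths[[col]][keep]), FUN = sum)
  agg$step  <- s
  agg$share <- agg$count / sum(agg$count)
  agg
}))
nodeCounts
# Step 1: code 2 -> 240 (0.6), code 4 -> 160 (0.4)
```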

### Quick visualization

The package provides a `plot()` method for the returned object. It draws a
layered graph using igraph's Sugiyama layout, sizes nodes by patient count,
colors them by event identity (the same event keeps the same color across
steps), and draws edge widths proportional to transition weights.

```{r plot_esg, eval=FALSE}
# Default plot
plot(esg)

# Customized plot
plot(esg,
  colorPalette    = c("#1b9e77", "#d95f02", "#7570b3"),
  edgeWidthRange  = c(1, 10),
  vertexSizeRange = c(10, 30),
  main            = "Post-Index Treatment Pathways"
)
```
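`vertexSizeRange` and `edgeWidthRange` set the smallest and largest sizes drawn. One common way to map counts into such a range is linear rescaling. The sketch below shows that idea with a hypothetical `rescaleToRange()` helper; it is an assumption for illustration, and the package's `plot()` method may scale differently.

```{r rescale_sketch, eval=FALSE}
# Hypothetical helper: linearly map values onto [to[1], to[2]].
# If all values are equal, every element gets the midpoint of the range.
rescaleToRange <- function(x, to) {
  rng <- range(x, na.rm = TRUE)
  if (diff(rng) == 0) return(rep(mean(to), length(x)))
  to[1] + (x - rng[1]) / diff(rng) * diff(to)
}

# Node counts mapped to vertex sizes between 10 and 30:
# the smallest count gets 10, the largest gets 30
rescaleToRange(c(95, 120, 145, 240), to = c(10, 30))
```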

### Transition probabilities

Each edge carries a `probability` attribute: the fraction of patients at a
source event (within a step) who transition to each target event in the next
step:

```{r transition_probs, eval=FALSE}
# Extract edge data frame
edgeDf <- igraph::as_data_frame(esg$graph, what = "edges")

# View transitions from Step 1 to Step 2
edgeDf[edgeDf$sourceStep == 1, ]
```
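The transition weights and probabilities can be reproduced from the wide pathway table as a sanity check. The base-R sketch below assumes that an edge's weight is the number of persons with that (source, target) pair in consecutive steps, and that probability divides by all persons leaving the same source, so persons whose pathway ends at the source are not counted and the probabilities sum to 1 per source. `stepTransitions()` is a hypothetical helper, not a package function.

```{r transition_sketch, eval=FALSE}
# Pathway rows copied from the mock data above
paths <- data.frame(
  step1      = c( 2L,  2L,  4L,  4L,  2L),
  step2      = c( 4L,  8L,  2L,  8L,  NA),
  step3      = c( 8L,  NA,  8L,  NA,  NA),
  countValue = c(120L, 80L, 95L, 65L, 40L)
)

# Edges from step `fromStep` to the next step: weight = persons making the
# transition; probability = weight / all persons leaving the same source
stepTransitions <- function(pathsData, fromStep) {
  fromCol <- paste0("step", fromStep)
  toCol   <- paste0("step", fromStep + 1)
  keep <- !is.na(pathsData[[fromCol]]) & !is.na(pathsData[[toCol]])
  edges <- aggregate(
    list(weight = pathsData$countValue[keep]),
    by  = list(source = pathsData[[fromCol]][keep],
               target = pathsData[[toCol]][keep]),
    FUN = sum
  )
  edges$probability <- edges$weight / ave(edges$weight, edges$source, FUN = sum)
  edges <- edges[order(edges$source, edges$target), ]
  rownames(edges) <- NULL
  edges
}

stepTransitions(paths, fromStep = 1)
# source 2 -> 4: 120 / 200 = 0.6; source 4 -> 2: 95 / 160 = 0.59375
```

The 40 persons whose pathway stops at Celecoxib in step 1 do not enter the step-1 denominators under this assumed definition.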

### Downstream igraph analysis

Because the result is a standard igraph object, the full igraph API is
available for network analysis:

```{r igraph_analysis, eval=FALSE}
ig <- esg$graph

# Out-degree: how many distinct next-step events each node leads to
igraph::degree(ig, mode = "out")

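# Weighted betweenness, using 1 / weight as edge length so that
# high-traffic transitions count as short connections
igraph::betweenness(ig, weights = 1 / igraph::E(ig)$weight)

# Weighted shortest-path distances between all node pairs, following edge
# direction (mode = "out"); unreachable pairs are Inf
igraph::distances(ig, mode = "out", weights = 1 / igraph::E(ig)$weight)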

# Identify hubs and authorities (HITS)
igraph::hub_score(ig, weights = igraph::E(ig)$weight)$vector
igraph::authority_score(ig, weights = igraph::E(ig)$weight)$vector

# Export to data frames for use outside igraph
vertDf <- igraph::as_data_frame(ig, what = "vertices")
edgeDf <- igraph::as_data_frame(ig, what = "edges")
```