--- title: "Retinal Organoids counts data: Gene and Isoform Level Exploration" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Retinal Organoids counts data: Gene and Isoform Level Exploration} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r knitr-setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE) ``` ## Required packages ```{r load-packages} library(HumanRetinaLRSData) library(SummarizedExperiment) library(ggplot2) library(ggrepel) ``` ## Overview `HumanRetinaLRSData` is a companion data package for our long-read sequencing study characterising isoform diversity, splicing dynamics, and allele-specific expression in developing human retinal organoids. In this study we used nanopore-based long-read RNA sequencing to profile human stem cell-derived retinal organoids across multiple developmental stages, as well as purified retinal ganglion cells. These datasets enable exploration of gene- and isoform-level expression patterns, detection of differential isoform usage, identification of neuron-specific splicing programs, and quantification of allelic imbalance. All raw and processed data generated in the study have been deposited in an Open Science Framework (OSF) [repository](https://osf.io/z2yvs/overview). The `HumanRetinaLRSData` package provides convenient access to these files directly from the OSF node using the `osfr` package. For ease of use, `HumanRetinaLRSData` is designed to: - query and retrieve study datasets from the OSF project, - cache downloaded archives via `BiocFileCache`, and - load gene- and isoform-level counts into R for downstream visualisation and analysis. ## Load data ```{r load-data} se_gene <- ROGeneLevelData() se_isoform <- ROIsoformLevelData() ``` ## Visualise gene-level expression We perform PCA on log2-transformed CPM values and colour samples by developmental stage. ```{r plot-gene-pca, fig.width=8, fig.height=6} stage_colors <- c( "stage 1" = "orange", "stage 2" = "seagreen", "stage 3" = "purple" ) expr_gene <- assay(se_gene, "cpm") pca_gene <- prcomp(t(log2(expr_gene + 1)), scale. = TRUE) var_gene <- round( pca_gene$sdev^2 / sum(pca_gene$sdev^2) * 100, 1 ) pca_df_gene <- data.frame( PC1 = pca_gene$x[, 1], PC2 = pca_gene$x[, 2], sample = colnames(expr_gene), stage = colData(se_gene)[["stage"]] ) ggplot(pca_df_gene, aes(x = PC1, y = PC2, color = stage)) + geom_point(size = 3) + geom_label_repel( aes(label = sample), size = 2.5, show.legend = FALSE ) + scale_color_manual(values = stage_colors, name = "Stage") + labs( title = "PCA of Gene-Level Expression", x = paste0("PC1 (", var_gene[1], "%)"), y = paste0("PC2 (", var_gene[2], "%)") ) + theme_bw(base_size = 12) ``` ## Visualise isoform-level expression ```{r plot-isoform-pca, fig.width=8, fig.height=6} expr_iso <- assay(se_isoform, "cpm") pca_iso <- prcomp(t(log2(expr_iso + 1)), scale. = TRUE) var_iso <- round( pca_iso$sdev^2 / sum(pca_iso$sdev^2) * 100, 1 ) pca_df_iso <- data.frame( PC1 = pca_iso$x[, 1], PC2 = pca_iso$x[, 2], sample = colnames(expr_iso), stage = colData(se_isoform)[["stage"]] ) ggplot(pca_df_iso, aes(x = PC1, y = PC2, color = stage)) + geom_point(size = 3) + geom_label_repel( aes(label = sample), size = 2.5, show.legend = FALSE ) + scale_color_manual(values = stage_colors, name = "Stage") + labs( title = "PCA of Isoform-Level Expression", x = paste0("PC1 (", var_iso[1], "%)"), y = paste0("PC2 (", var_iso[2], "%)") ) + theme_bw(base_size = 12) ``` ## Interpretation The PCA plots reveal: - **Separation by stage**: Samples cluster by developmental stage, indicating distinct expression profiles. - **Variance explained**: PC1 and PC2 capture the major sources of variation in the dataset. - **Gene vs. isoform**: While samples cluster similarly by developmental stage at both levels, the proportion of variance explained by each PC differs. This reflects that isoform-level analysis may capture additional complexity from alternative splicing and transcript variation beyond what is seen at the gene level. ## Session information ```{r session-info} sessionInfo() ```