Building a minimal genome browser with h5vc and shiny ======================================================== In this vignette we will have a look at another powerful use-case for `h5vc` which is interactive data exploration. This can be a great tool to communicate your results to cour colaborators. If you want to see the end result first, head over to the rstudio shiny server hosting service, where we have put an example shiny application demonstrating this. * [The minimal genome browser](http://spark.rstudio.com/pyl/YeastGenomeBrowser) * [The minimal variant browser](http://spark.rstudio.com/pyl/YeastVariantBrowser) Prerequisites ------------- For this vignette you will have to know about: * creating tallies (see the respective vignette) * writing simple shiny applications (have a look at their [documentation](http://rstudio.github.io/shiny/tutorial/#hello-shiny)) The Data -------- We will use the yeast tally file that we created in the `h5vc.creating.tallies` vignette as our single data input. The Shiny App ------------- The shiny app that we write is the most basic genome broser imaginable, it will have controls for the chromosome, genomic position and the windowsize (i.e. zoom level) of the current view and it will generate a `mismatchPlot` from the tally file and the inputs that shows us the respective region of the genome in all samples present in the tally file. The shiny app consists of two files: `ui.R` defining the user interface and `server.R` defining the calculations / plots that the UI will be populated with. ### ui.R In the user interface file we define a `sliderInput` for the windowsize and two other inputs `chromSelect` and `genomicPositionSelect` that will be generated by the server. We do this since the available chromosomes depend on the tally file and the available positions depend on the selected chromosome. ```{r, eval=FALSE} library(shiny) shinyUI(pageWithSidebar( # Application title headerPanel("Simple Genome Browser"), # Sidebar with a slider inputs and selection boxes sidebarPanel( sliderInput("windowsize", "Windowsize:", min = 10, max = 200, value = 50, step = 5), uiOutput("chromSelect"), uiOutput("genomicPositionSelect") ), # Show a plot of the region mainPanel( plotOutput("mismatchPlot", height=800) ) )) ``` ### server.R In `server.R` the logic of our application is stored, basically we load the required packages, have a look into the tally file to check for chromosomes and chromosome lengths and then generate the controls for the user to select chromosome and position. The main calculations happen in `output$mismatchPlot`, were we take the chromosome and position and windowsize selected by the user and get the respective data from the tallyFile. That data is then handed to the `mismatchPlot` function and creates the plot we send back to the user (i.e. the web browser). ```{r, eval=FALSE} library(shiny) library(h5vc) library(rhdf5) tallyFile <- "yeast.hfs5" study <- "/yeast" h5ls( tallyFile ) chromosomes <- h5ls( tallyFile ) chromlengths <- as.numeric(subset( chromosomes, otype == "H5I_DATASET" & name == "Reference" )$dim) chromosomes <- subset( chromosomes, otype == "H5I_GROUP" & name != "yeast" )$name names(chromlengths) = chromosomes # Define server logic required to generate and plot a random distribution shinyServer(function(input, output) { output$chromSelect <- renderUI({ selectInput( "chrom", "Chromosome", choices = c("I","II","III","IV","V","VI","VII","VIII","IX","X","XI","XII","XIII","XIV","XV","XVI", "Mito")) }) output$genomicPositionSelect <- renderUI({ sliderInput( "gpos", "Genomic Position:", min = 10, max = chromlengths[input$chrom] - 10, value = 200 ) }) group <- reactive({ paste( study, input$chrom, sep="/" ) }) sampleData <- reactive({ sd = getSampleData( tallyFile, group() ); sd$Sample = c("YB210","s288c"); sd }) pos <- reactive({ min( max( input$windowsize + 1, input$gpos ), chromlengths[input$chrom] - input$windowsize - 1 ) }) data <- reactive({ h5dapply( tallyFile, group(), blocksize = input$windowsize*3, names = c("Coverages","Counts","Deletions"), range = c( max( pos() - input$windowsize, 0 ), min( pos() + input$windowsize, chromlengths[input$chrom] ) ) )[[1]] }) output$mismatchPlot <- renderPlot({ p = mismatchPlot( data(), sampleData(), samples = c("s288c","YB210"), input$windowsize, pos() ) print(p) }) }) ``` Including Variant Calls ----------------------- Obviously this genome browser has limited use in its current state but by adding a file of variants as an input and making them selectabe we can turn it into a browser for potential variant calls and can then easily go through those and inspect the region to get an idea of the quality of our calls. There is a lot of potential for creating tremendously usefull and visually impressive ways of communicating your reserach to your collaborators and peers. ### Calling the variants We use the following code to call variants using the build-in functionality of `h5vc`. We call variants from the perspective of the *YB210* strain, taking the *s288c* strain as our control. ```{r, eval=FALSE} library(rhdf5) library(h5vc) setwd("~/ShinyApps/Yeast") tallyFile <- "yeast.hfs5" study <- "/yeast" h5ls( tallyFile ) chromosomes <- h5ls( tallyFile ) chromosomes <- subset( chromosomes, otype == "H5I_GROUP" & name != "yeast" )$name variantCalls <- list() for( chrom in chromosomes ){ group <- paste( study, chrom, sep = "/" ) sdat <- getSampleData( tallyFile, group ) sdat$Group <- "yeast" variantCalls[[chrom]] <- h5dapply( filename = tallyFile, group = group, blocksize = 100000, FUN = callVariantsPairedFancy, sampledata = sdat, cl = vcConfParams( returnDataPoints = TRUE, minStrandAltSupport = 4 ), names = c( "Counts", "Coverages", "Reference" ), verbose = TRUE ) } for( chrom in chromosomes ){ variantCalls[[chrom]] <- do.call( rbind, variantCalls[[chrom]] ) } variantCalls <- do.call( rbind, variantCalls ) rownames(variantCalls) = NULL variantCalls$Support = variantCalls$caseCountFwd + variantCalls$caseCountRev variantCalls$Coverage = variantCalls$caseCoverageFwd + variantCalls$caseCoverageRev variantCalls$AF = variantCalls$Support / variantCalls$Coverage save( variantCalls, file = "yeast.variants.RDa" ) ``` ### Extending the functionality of the genome browser Now we have to make some changes to the genome browser shiny app so that it loads the list of variants and allows us to browse and filter them. The result of the efforts described below is linked here: * [The minimal variant browser](http://spark.rstudio.com/pyl/YeastVariantBrowser) #### Changes to the interface We will make three changes to the interface: 1. We move the `mismatchPlot` to a separate tab and add another tab which will contain a table of all the variants fulfilling current filtering criteria as well as a tab showing summary pots of the filtered variants (a histogram of the allele frequency for now). 2. We add filter selectors to the sidebar, we will allow filtering on allele freqeuncy (`AF` column), Support and Coverage 3. We replace the genomic position selector with a variant selector The new `ui.R` looks like this: ```{r, eval=FALSE} library(shiny) shinyUI(pageWithSidebar( # Application title headerPanel("Simple Variant Browser - Yeast Strains Example"), # Sidebar with a slider inputs and selection boxes sidebarPanel( sliderInput("windowsize", "Windowsize:", min = 10, max = 200, value = 50, step = 5), uiOutput("chromSelect"), sliderInput("af", "Allele Frequency:", min = 0, max = 1, value = c(0.1,1.0), step = 0.01), sliderInput( "minSupport", "Minimum Support", min = 2, max = 200, value = 2), sliderInput( "minCoverage", "Minimum Coverage", min = 10, max = 500, value = 50, step = 5), uiOutput("variantSelect"), textOutput("diag") ), # Show a plot of the region mainPanel( tabsetPanel( tabPanel( title = "Region Mismatch Plot", plotOutput("mismatchPlot", height=800) ), tabPanel( title = "Variant Table", tableOutput("variantTable") ), tabPanel( title = "Variant Summary Plots", plotOutput("afHist") ) ) ) )) ``` ### Changes to the server-side computations Here we have to add the code to load the variant calls, dynamically generate the variant selector and update some methods for generating the plots / calculating the current position to plot (since this now depends on the sleected variant.) The new `server.R` looks like this: ```{r, eval=FALSE} library(shiny) library(h5vc) library(rhdf5) library(ggplot2) library(grid) tallyFile <- "yeast.hfs5" study <- "/yeast" h5ls( tallyFile ) chromosomes = h5ls( tallyFile ) chromlengths = as.numeric(subset( chromosomes, otype == "H5I_DATASET" & name == "Reference" )$dim) chromosomes = subset( chromosomes, otype == "H5I_GROUP" & name != "yeast" )$name names(chromlengths) = chromosomes load(file="yeast.variants.RDa") variantCalls$start <- variantCalls$start + 1 #fixing difference in counting 0-based vs. 1-based variantCalls$end <- variantCalls$end + 1 # Define server logic required to generate and plot a random distribution shinyServer(function(input, output) { output$chromSelect <- renderUI({ selectInput( "chrom", "Chromosome", choices = c("I","II","III","IV","V","VI","VII","VIII","IX","X","XI","XII","XIII","XIV","XV","XVI", "Mito")) }) group <- reactive({ paste( study, input$chrom, sep="/" ) }) sampleData <- reactive({ sd = getSampleData( tallyFile, group() ); sd$Sample = c("YB210","s288c"); sd }) variants <- reactive({ subset( variantCalls, seqnames == input$chrom & AF >= input$af[1] & AF <= input$af[2] & Support >= input$minSupport & Coverage >= input$minCoverage ) }) output$variantSelect <- renderUI({ tmp = seq(nrow(variants())) names(tmp) = paste( variants()$start, " - ", variants()$refAllele, "/", variants()$altAllele, sep="" ) selectInput( "var", "Variant:", choices = tmp ) }) pos <- reactive({ variants()$start[as.numeric(input$var)] }) data <- reactive({ if( nrow(variants()) > 0 ){ h5dapply( filename = tallyFile, group = group(), blocksize = input$windowsize*3, names = c("Coverages","Counts","Deletions"), range = c( max( pos() - input$windowsize, 0 ), min( pos() + input$windowsize, chromlengths[input$chrom] ) ) )[[1]] }else{ NULL } }) output$mismatchPlot <- renderPlot({ if( nrow(variants()) > 0 ){ p = mismatchPlot( data(), sampleData(), samples = c("s288c","YB210"), input$windowsize, pos() ) print(p) } }) output$variantTable <- renderTable({ variants()[,c("seqnames", "start", "refAllele", "altAllele", "AF", "Support", "Coverage")] }) output$afHist = renderPlot({ hist( variants()$AF, breaks = seq(0,1,0.01) ) }) }) ``` Conclusions ----------- So there you have it, your own variant browser ready to be deployed on the web for your collaborators or the readers of your paper to access, browse and filter as they like. And we did all of this in a bit over `100` lines of code. There is a whole bunch of other useful plots, filters and functionalities that could be added to this tool, downloading filtered variants or generated plots would be one of the more obvious ones.