---
title: 'SWATH2stats example script'
output: 
  pdf_document: 
    keep_tex: yes
vignette: >
  %\VignetteIndexEntry{SWATH2stats example script}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

Example R code showing the usage of the SWATH2stats package. The data processed is the publicly available dataset of *S.pyogenes* (Röst et al. 2014) (http://www.peptideatlas.org/PASS/PASS00289). The results file 'rawOpenSwathResults\_1pcnt\_only.tsv' can be found on PeptideAtlas (ftp://PASS00289@ftp.peptideatlas.org/../Spyogenes/results/).
This is a R Markdown file, showing the result of processing this data. The lines shaded in grey represent the R code executed during this analysis.

The stable release package SWATH2stats can be directly installed from Bioconductor using the commands below. This file here was generated using the current development release SWATH2stats v.1.1.14 that can be downloaded from http://bioconductor.org/packages/devel/bioc/html/SWATH2stats.html.


```{r, eval=FALSE}
## try http:// if https:// URLs are not supported
source('https://bioconductor.org/biocLite.R')
biocLite('SWATH2stats')
```

# Part 1: Loading and annotation

Load the SWATH-MS example data from the package, this is a reduced file in order to limit the file size of the package. 

```{r}
library(SWATH2stats)
library(data.table)
data('Spyogenes', package = 'SWATH2stats')
```

Alternatively the original file downloaded from the Peptide Atlas can be loaded from the working directory.

```{r, eval=FALSE}
data <- data.frame(fread('rawOpenSwathResults_1pcnt_only.tsv', sep='\t', header=TRUE))
```

Extract the study design information from the file names. Alternatively, the study design table can  be provided as an external table.

```{r, tidy=TRUE}
Study_design <- data.frame(Filename = unique(data$align_origfilename))
Study_design$Filename <- gsub('.*strep_align/(.*)_all_peakgroups.*', '\\1', Study_design$Filename)
Study_design$Condition <- gsub('(Strep.*)_Repl.*', '\\1', Study_design$Filename)
Study_design$BioReplicate <- gsub('.*Repl([[:digit:]])_.*', '\\1', Study_design$Filename)
Study_design$Run <- seq(1:nrow(Study_design))
head(Study_design)
```

The SWATH-MS data is annotated using the study design table.
```{r}
data.annotated <- sample_annotation(data, Study_design)
```

Remove the decoy peptides for a subsequent inspection of the data.
```{r}
data.annotated.nodecoy <- subset(data.annotated, decoy==FALSE)
```


\newpage

# Part 2: Analyze correlation, variation and signal
Count the different analytes for the different injections.
```{r}
count_analytes(data.annotated.nodecoy)
```

Plot the correlation of the signal intensity.
```{r, fig.height=2.5, fig.width = 6}
correlation <- plot_correlation_between_samples(data.annotated.nodecoy, column.values = 'Intensity')
```

Plot the correlation of the delta_rt, which is the deviation of the retention time from the expected retention time.
```{r, fig.height=2.5, fig.width = 6}
correlation <- plot_correlation_between_samples(data.annotated.nodecoy, column.values = 'delta_rt')
```
\newpage 


Plot the variation of the signal across replicates.
```{r, fig.height=2.5, fig.width = 5.5}
variation <- plot_variation(data.annotated.nodecoy)
variation[[2]]
```

Plot the total variation versus variation within replicates.
```{r, fig.height=2.5, fig.width = 5.5}
variation_total <- plot_variation_vs_total(data.annotated.nodecoy)
variation_total[[2]]
```

Calculate the summed signal per peptide and protein across samples.
```{r}
peptide_signal <- write_matrix_peptides(data.annotated.nodecoy)
protein_signal <- write_matrix_proteins(data.annotated.nodecoy)
head(protein_signal)
```
\newpage


# Part 3: FDR estimation
Estimate the overall FDR across runs using a target decoy strategy.
```{r, fig.height = 3.5}
par(mfrow = c(1, 3))
fdr_target_decoy <- assess_fdr_overall(data.annotated, n.range = 10, FFT = 0.25, output = 'Rconsole')

```
According to this FDR estimation one would need to filter the data with a lower mscore threshold to reach an overall protein FDR of 5%. 
```{r}
mscore4protfdr(data, FFT = 0.25, fdr_target = 0.05)
```

# Part 4: Filtering
Filter data for values that pass the 0.001 mscore criteria in at least two replicates of one condition.
```{r}
data.filtered <- filter_mscore_condition(data.annotated, 0.001, n.replica = 2)
```

Select only the 10 peptides showing strongest signal per protein.
```{r}
data.filtered2 <- filter_on_max_peptides(data.filtered, n_peptides = 10)
```
\newpage 


Filter for proteins that are supported by at least two peptides.
```{r}
data.filtered3 <- filter_on_min_peptides(data.filtered2, n_peptides = 2)
```


# Part 5: Conversion
Convert the data into a transition-level format (one row per transition measured). 

```{r}
data.transition <- disaggregate(data.filtered3)
```

Convert the data into the format required by MSstats.
```{r}
MSstats.input <- convert4MSstats(data.transition)
head(MSstats.input)
```

Convert the data into the format required by mapDIA.
```{r}
mapDIA.input <- convert4mapDIA(data.transition)
head(mapDIA.input)
```

Convert the data into the format required by aLFQ.
```{r}
aLFQ.input <- convert4aLFQ(data.transition)
head(aLFQ.input)
```

Session info on the R version and packages used.
```{r}
sessionInfo()
```