% For an example vignette, see
% browseVignettes("network")

%\documentclass{article}
\documentclass[article,nojss]{jss}
% or \documentclass{jss} for the Journal of Statistical Software

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
% RJournal
%\usepackage{RJournal}
\usepackage{amsmath,amssymb,array}
%\usepackage{booktabs}

\usepackage{Sweave}

% The following lines must remain commented out, so that they are used by R and not by LaTeX
% !Rnw weave = Sweave
%!\SweaveUTF8
%\VignetteIndexEntry{About the stemmatology package}
%\VignetteIndexEntry{stemmatology Vignette}


\author{Jean-Baptiste Camps, Florian Cafiero}

\title{\pkg{stemmatology}: An \proglang{R} Stemmatology Package}
\Plaintitle{stemmatology: An R Stemmatology Package}
\Shorttitle{\pkg{stemmatology}: An \proglang{R} Stemmatology Package}

\Abstract{
\emph{Stemmatology} is the field dedicated to studying the genealogy of texts
and to establishing tree-like genealogical graphs known as \emph{stemmata codicum}.

This package includes various functions for stemmatological analysis. In particular, it implements functions following the Poole-Camps-Cafiero method, as well as functions to import data.
}

\Keywords{stemmatology, philology, network, graphs}

\Address{
JB Camps\\
École nationale des chartes\\
\href{mailto:jbcamps@hotmail.com}{jbcamps@hotmail.com}\\
URL: \url{www.chartes.psl.eu/jean-baptiste-camps}
}

\begin{document}

\definecolor{Sinput}{rgb}{0.19,0.19,0.75}
\definecolor{Soutput}{rgb}{0.2,0.3,0.2}
\definecolor{Scode}{rgb}{0.75,0.19,0.19}
\DefineVerbatimEnvironment{Sinput}{Verbatim}{formatcom = {\color{Sinput}}} 
\DefineVerbatimEnvironment{Soutput}{Verbatim}{formatcom = {\color{Soutput}}}
\DefineVerbatimEnvironment{Scode}{Verbatim}{formatcom = {\color{Scode}}} 
\renewenvironment{Schunk}{}{}

\SweaveOpts{concordance=TRUE}

% This file provides general documentation of the package in the form of an article, which can be more complete than the documentation proper in the .Rd files
% See in particular http://www.stats.uwo.ca/faculty/murdoch/ism2013/5Vignettes.pdf
% as well as http://cran.r-project.org/doc/manuals/R-exts.html#Documenting-packages

\maketitle


\section{Input}

Most functions take as input a \emph{numeric matrix}, with witnesses in columns, variant locations in rows, and each reading coded by a number, e.g.

\begin{table}[!h]
\begin{tabular}{lllllllllll}
  & A & B & C & D & E & H & I & J & K & O\\ \hline \hline 
1 & 0 & 1 & 1 & 1 & NA & 1 & 1 & NA & 1 & 1\\
2 & 1 & 1 & 1 & 1 & NA & 1 & 1 & NA & 1 & 1\\
3 & 1 & 1 & 1 & 1 & NA & 1 & 1 & NA & 1 & 1\\
4 & 1 & 1 & 1 & 2 & NA & 1 & 1 & NA & 1 & 1\\
5 & 1 & 1 & 1 & 2 & NA & 1 & 1 & NA & 1 & 1\\
6 & 1 & 1 & 1 & 1 & NA & 1 & 1 & NA & 1 & 1
\end{tabular}
\end{table}
where $A, B, \ldots, O$ are the witnesses (in columns), $1, \ldots, 6$ are the variant locations (in rows), and the different readings are coded either $0$ (omission) or $1, 2, \ldots, n$. \code{NA} is used for a lack of information (physical lacuna, absence of observation, variant location not applicable to a given witness, etc.).
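
As an illustration, such a matrix can be built directly in \proglang{R}. The following is a minimal sketch using base \proglang{R} only: the object name \code{myTradition} is hypothetical, and the values simply reproduce the table above.

<<eval=FALSE>>=
# Hypothetical object name; values copied from the table above
myTradition <- matrix(
  c(0, 1, 1, 1, NA, 1, 1, NA, 1, 1,
    1, 1, 1, 1, NA, 1, 1, NA, 1, 1,
    1, 1, 1, 1, NA, 1, 1, NA, 1, 1,
    1, 1, 1, 2, NA, 1, 1, NA, 1, 1,
    1, 1, 1, 2, NA, 1, 1, NA, 1, 1,
    1, 1, 1, 1, NA, 1, 1, NA, 1, 1),
  nrow = 6, byrow = TRUE,
  dimnames = list(as.character(1:6),
    c("A", "B", "C", "D", "E", "H", "I", "J", "K", "O")))
# Readings of witness D at each variant location
myTradition[, "D"]
@

Row names identify the variant locations and column names the witnesses, so that a single reading can be retrieved with, e.g., \code{myTradition["4", "D"]}.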

Alternatively, if \code{alternateReadings = TRUE}, the input can be a \emph{character matrix}, with witnesses in columns, variant locations in rows, and, in each cell, one or several readings coded by numbers and separated by commas (e.g., \code{"1,2,3"} if the witness has three different readings at that location), for instance
\begin{table}[!h]
\begin{tabular}{llllll}
 & A & D & F & T & P \\ \hline \hline
1 & "1" & "2" & "2" & "2" & "1,2" \\
2 & "1" & "2" & "1,2" & "2" & "1" \\
3 & "1" & "1" & "1" & "1" & "2" \\
4 & "1,3" & "1,2" & "1" & "2" & "3"
\end{tabular}
\end{table}

Notice how a witness can bear several readings at the same variant location (VL), e.g., P at VL 1.
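
The same kind of character matrix can be sketched in base \proglang{R} as follows. The object name \code{myAltTradition} is hypothetical and the values reproduce the table above. The call to \code{strsplit} only illustrates the cell format; it is not meant to describe how the package handles such cells internally.

<<eval=FALSE>>=
# Hypothetical object name; values copied from the table above
myAltTradition <- matrix(
  c("1",   "2",   "2",   "2", "1,2",
    "1",   "2",   "1,2", "2", "1",
    "1",   "1",   "1",   "1", "2",
    "1,3", "1,2", "1",   "2", "3"),
  nrow = 4, byrow = TRUE,
  dimnames = list(as.character(1:4), c("A", "D", "F", "T", "P")))
# Split one cell into its individual readings (base R only)
strsplit(myAltTradition["1", "P"], ",")[[1]]
@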


% Import functions
\subsection{Create or import data}
 
Data can be created inside \proglang{R} or imported. They can be imported by reading a CSV file (e.g., with \code{read.csv}). They can also be imported from a TEI-encoded apparatus using the parallel-segmentation method, either with an XSL stylesheet or with the built-in function \code{import.TEIApparatus}.

The function \code{import.TEIApparatus} imports a TEI P5 encoded apparatus into a stemmatological matrix usable with the other functions. It has parameters to refine the import (variant types, etc.), and can read either from disk or from a URL.
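
For the CSV route, a minimal sketch with base \proglang{R} could look as follows. It assumes a file whose first column holds the variant location labels and whose header row holds the witness sigla; the file content and the object names (\code{csvFile}, \code{myTable}, \code{myTradition}) are hypothetical, and a temporary file is written first only to keep the sketch self-contained.

<<eval=FALSE>>=
# Write a small example file so that the sketch is self-contained;
# in practice, csvFile would be the path to an existing file
csvFile <- tempfile(fileext = ".csv")
writeLines(c(",A,B,D", "1,0,1,1", "2,1,1,2", "3,1,NA,1"), csvFile)
# The first column holds the variant location labels
myTable <- read.csv(csvFile, row.names = 1)
# Convert the data frame to a matrix (witnesses in columns)
myTradition <- as.matrix(myTable)
@

By default, \code{read.csv} reads the string \code{NA} as a missing value, which matches the coding of missing information described above.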

% Functions

\section{PCC Method}

Functions are made available for the PCC method (see Camps and Cafiero 2015 or the help page of \code{PCC} for more details). The most important are
\begin{description}
\item[\code{PCC}] global shell for the PCC functions;
\item[\code{PCC.Exploratory}] global function for exploratory methods of the PCC family;
\item[\code{PCC.Stemma}] function building the \emph{stemma codicum}.
\end{description}

\section{Other functions}
The package also contains various other functions, aimed in particular at
detecting contamination, for instance the function \code{PCC.contam}.%\code{\link{VL.pValues}} (presented in Camps 2013 unpublished)

% Other methods
The package aims to make available various other stemmatological methods, including further functions for contamination detection
or for theoretical stemmatology.

%The package should include a function to generate arbitrary stemmata based on a set of parameters (fecundity, decimation rate at each level, number of generations, ...).

%It should also have a function for the analysis of stemmata shapes, and making hypotheses on the parameters of the original tradition

\section*{References}

Camps, Jean-Baptiste, and Florian Cafiero. ‘Stemmatology: An R Package for the Computer-Assisted Analysis of Textual Traditions’. \emph{Proceedings of the Second Workshop on Corpus-Based Research in the Humanities (CRH-2)}, edited by Andrew U. Frank et al., 2018, pp. 65–74, \url{https://halshs.archives-ouvertes.fr/hal-01695903v1}.

Camps, Jean-Baptiste. ‘Detecting Contaminations in Textual Traditions:
Computer Assisted and Traditional Methods’. Leeds, International Medieval Congress, 2013, unpublished paper, \url{https://www.academia.edu/3825633/Detecting_Contaminations_in_Textual_Traditions_Computer_Assisted_and_Traditional_Methods}.

Camps, Jean-Baptiste, and Florian Cafiero. ‘Genealogical Variant Locations and Simplified Stemma: A Test Case’. \emph{Analysis of Ancient and Medieval Texts and Manuscripts: Digital Approaches}, edited by Tara Andrews and Caroline Macé, Brepols, 2015, pp. 69–93, \url{https://halshs.archives-ouvertes.fr/halshs-01435633}, DOI: \href{http://dx.doi.org/10.1484/M.LECTIO-EB.5.102565}{10.1484/M.LECTIO-EB.5.102565}.

Poole, Eric. ‘L’analyse stemmatique des textes documentaires’. \emph{La pratique des ordinateurs dans la critique des textes}, Paris, 1979, pp. 151–161.

Poole, Eric. ‘The Computer in Determining Stemmatic Relationships’. \emph{Computers and the Humanities}, vol. 8, no. 4, 1974, pp. 207–216.

\section*{Bugs and Issues}

Please report issues with this package to \url{https://github.com/Jean-Baptiste-Camps/stemmatology}.

\section*{Example of use}

<<eval=FALSE>>=
# Interactive mode
# Load data
data(fournival)
# or alternatively, import it
fournival = import.TEIApparatus(file = "myFournival.xml", 
    appTypes = c("substantive"))
# Analyse it with the PCC functions
PCC(fournival)

# Complete step-by-step non interactive use
data("fournival")
# look for conflicts
myConflicts = PCC.conflicts(fournival)
# remove conflicting VL
myConflicts = PCC.overconflicting(myConflicts, ask = FALSE, threshold = 0.06)
myNewData = PCC.elimination(myConflicts)
# look for competing genealogies
myConflicts = PCC.conflicts(myNewData)
myNewData = PCC.equipollent(myConflicts, ask = FALSE, scope = "W", wits = "D")
# build a stemma
PCC.Stemma(myNewData$databases[[3]], ask = FALSE)
@


\end{document}