\documentclass[fleqn]{article} \usepackage[round,longnamesfirst]{natbib} \usepackage{graphicx,keyval,a4wide,thumbpdf,makeidx,color,colordvi} \usepackage{amsfonts,hyperref,graphicx} \newcommand\R{\textsf{R}} \newcommand{\pkg}[1]{{\normalfont\fontseries{b}\selectfont #1}} \newcommand{\sQuote}[1]{`{#1}'} \newcommand{\dQuote}[1]{``{#1}''} \newcommand{\file}[1]{\sQuote{\textsf{#1}}} \newcommand{\data}[1]{\texttt{#1}} \newcommand{\var}[1]{\textit{#1}} \newcommand{\class}[1]{\textsf{#1}} \newcommand{\proglang}[1]{\textsf{#1}} %% \code without `-' ligatures \def\nohyphenation{\hyphenchar\font=-1 \aftergroup\restorehyphenation} \def\restorehyphenation{\hyphenchar\font=`-} {\catcode`\-=\active% \global\def\code{\bgroup% \catcode`\-=\active \let-\codedash% \Rd@code}} \def\codedash{-\discretionary{}{}{}} \def\Rd@code#1{\texttt{\nohyphenation#1}\egroup} \newcommand{\codefun}[1]{\code{#1()}} \newcommand{\codefunind}[1]{\codefun{#1}\index{\texttt{#1}}} \newcommand{\codeind}[1]{\code{#1}\index{\texttt{#1}}} \SweaveOpts{strip.white=true} \AtBeginDocument{\setkeys{Gin}{width=0.5\textwidth}} \definecolor{Blue}{rgb}{0,0,0.8} \definecolor{Red}{rgb}{0.7,0,0} % \date{2018-01-03} \title{Knowledge Space Theory} \author{Christina Stahl \and Cord Hockemeyer} %\VignetteIndexEntry{KST} %\VignetteDepends{kst} %\VignetteKeywords{knowledge structure, knowledge space} %\VignettePackage{kst} \makeindex{} \sloppy{} \begin{document} \maketitle \begin{abstract} This document explains algorithms and basic operations of knowledge structures and knowledge spaces available in \R{} through the \pkg{kst} package. \end{abstract} \tableofcontents % \clearpage <>= options(width = 80) library("kst") @ % \section{Introduction} \label{sec:introduction} \noindent Knowledge Space Theory \citep{Doignon+Falmagne:1999} is a set- and order-theoretical framework, which proposes mathematical formalisms to operationalize knowledge structures in a particular domain. The most basic assumption of knowledge space theory is that every knowledge \emph{domain} can be represented in terms of a set of domain problems or items. Moreover, knowledge space theory assumes dependencies between these items in that knowledge of a given item or a subset of items may be a prerequisite for knowledge of another, more difficult or complex item. These prerequisite \emph{relations} are realized by surmise relations, which create a quasi-order between different items. One advantage of these surmise relations is that they reduce the quantity of all possible solution patterns to a more manageable amount of knowledge states. Each of these knowledge states represents the subset of items an individual is capable of solving. The collection of all knowledge states captures the organization of the domain and is referred to as \emph{knowledge structure}. \section{Classes introduced in the \pkg{kst} package} \label{sec:classes} The \pkg{kst} package makes heavily usage of \R{} classes. Figure~\ref{fig:classes} gives an overview of these classes and the relationships between them. \begin{figure}[htbp] \begin{center} \includegraphics{kst-classes.png} \caption{\R{} classes in the \pkg{kst} package}\label{fig:classes} \end{center} \end{figure} The basic class is the \codeind{kfamset} class which describes quite genereally a family of sets. All other classes are effectively sub-classes of \code{kfamset}. The \codeind{kstructure} class describes knowledge structures. The difference between a \code{kfamset} and a \code{kstructure} is that the latter contains the empty set $\emptyset$ and the full item set. A special case of knowledge structures are the knowldge spaces described by the \code{kspace} class: they are closed under union. The fourth class defined in \pkg{kst} ist the \codeind{kbasis} class which describes the basis of a knowledge space. Please note that a \code{kbasis} does not contain the empty set $\emptyset$ and often neither the full item set. \section{Knowledge Structures} \label{sec:kstructure} The \codefunind{kstructure} function in package \pkg{kst} is the basic constructor for knowledge structures. It takes an endorelation representing a surmise relation or a set of sets each representing one knowledge state (e.g., one clause of a surmise system) and returns the corresponding knowledge structure: <>= # An endorelation representing a surmise relation kst <- endorelation(graph=set(tuple(1,1), tuple(2,2), tuple(3,3), tuple(4,4), tuple(2,1), tuple(3,1), tuple(4,1), tuple(3,2), tuple(4,2))) kstructure(kst) # A set of sets representing knowledge states (e.g., clauses of a surmise system) kst <- kstructure(set(set("a"), set("a","b"), set("a","c"), set("d","e"), set("a","b","d","e"), set("a","c","d","e"), set("a","b","c","d","e"))) kst @ Note that by default the quotes indicate the fact that the items are represented by characters. For displaying purposes, these quotes may be turned off: <>= sets_options("quote",FALSE) kst @ On the resulting knowledge structure several operations can be performed. Firstly, the knowledge \emph{domain} of the knowledge structure can be determined by means of the \codefunind{kdomain} function: <>= kdomain(kst) @ Secondly, the \emph{notions} of the knowledge structure can be determined by means of the \codefunind{knotions} function. A notion is a set of items always jointly contained in all knowledge states. Consequently, these items carry the same information and may therefore be considered equivalent: <>= knotions(kst) @ Thirdly, the \emph{atoms} of the knowledge structure can be determined by means of the \codefunind{katoms} function. For any item of the knowledge domain, an atom is a minimal knowledge state containing the respective item, where minimal refers to the fact that the respective knowledge state is not the union of any other knowledge states: <>= katoms(kst, items=set("a","b","c")) @ Forthly, the \emph{trace} of the knowledge structure can be determined by means of the \codefunind{ktrace} function. The trace of a knowledge structure on a set of items is the substructure of the knowledge structure on these items, i.e., the substructure resulting from restricting the knowledge structure to the specified item(s): <>= ktrace(kst, items=set("c","d","e")) @ Fifthly, the \codefunind{knneighbourhood} and \codefunind{kneighbourhood} functions compute the n-neighbourhood and neighbourhood of a knowledge state. <>= knneighbourhood(kst, state=set("a", "b"), distance=2) kneighbourhood(kst, state=set("a", "b")) @ Finally, the \codefunind{kfringe} function allows for determining the fringe of a knowledge state. Fringes determine the symmetric difference between a given knowledge state and its neighbouring states. <>= kfringe(kst, state=set("a", "b")) @ In addition, several properties of knowledge structures may be tested. Currently, only the functions \codefunind{kstructure\_is\_wellgraded} and the \codefunind{kstructure\_is\_space} are implemented (see Section 2 for the latter). A knowledge structure is considered \emph{well-graded} if any two of its states are connected by a bounded path, i.e., each knowledge state (except the empty set \emph{\{\}}) has at least one predecessor state that contains the same domain items with the exception of exactly one and each knowledge state (except the state for the full set of domain problems \emph{Q}) has at least one immediate successor state that comprises the same domain items plus exactly one. <>= kstructure_is_wellgraded(kst) @ Apart from these basic operations, the \pkg{kst} package also provides plotting functionalities for knowledge structures (see README for details). The \codefunind{plot} method takes an arbitrary knowledge structure and plots a Hasse Diagram of the respective knowledge structure (see Figure~\ref{fig:hasse-kst}): <>= if(requireNamespace("Rgraphviz")) {Rgraphviz::plot(kst)} @ \begin{figure}[h] \begin{center} <>= <> @ \caption{Knowledge Structure} \label{fig:hasse-kst} \end{center} \end{figure} In order to allow for plotting the surmise relation underlying a knowledge structure, the \pkg{kst} package provides the \codefunind{as.relation} method, which computes its underlying surmise relation, i.e., the set of item pairs corresponding to the knowledge dependencies. Antisymmetric and transitive surmise relations may then be plotted as a Hasse diagram: <>= as.relation(kst) @ In those cases where individuals' response patterns are available, they may be used to assess individuals or validate a knowledge structure. \subsection{Assessment and Validation} The \codefunind{kassess} function assigns individuals to their corresponding knowledge state in a knowledge structure. Currently only \dQuote{deterministic} assessment is implemented. Assessing individuals based on a deterministic procedure starts by determining an item \emph{a}, which is contained in approximately half of the available knowledge states. If the individual being assessed has successfully solved the respective item \emph{a}, all knowledge states that do not contain item \emph{a} are removed from the set of potential knowledge states of the individual. If, on the other hand, the individual has not solved the respective item \emph{a}, all knowledge states that do contain item \emph{a} are removed from the set of potential knowledge states of the individual. From the remaining knowledge states an item \emph{b}, which again is contained in approximately half of the still available knowledge states, is selected. If the individual has successfully solved the respective item \emph{b}, all knowledge states that do not contain item \emph{b} are removed from the set of potential knowledge states of the individual. If, on the other hand, the individual has solved the respective item \emph{b}, all knowledge states that do contain item \emph{b} are removed from the set of potential knowledge states of the individual. This procedure is repeated until only one knowledge state is left. This is the knowledge state the individual is currently located in. <>= rp <- data.frame(a=c(1,1,0,1,1,1,1,0,0,0),b=c(0,1,0,1,0,1,0,1,0,0), c=c(0,0,0,0,1,1,1,0,1,0),d=c(0,0,1,1,1,1,0,0,0,1), e=c(0,0,1,1,1,1,0,0,0,0)) kassess(kst, rpatterns=rp) @ The \codefunind{kvalidate} function on the other hand calculates validity coefficients for prerequisite relations and knowledge structures. The \emph{$\gamma$-Index} \citep{Goodman+Kruskal:1972} validates the prerequisite relation underlying a knowledge structure and assumes that not every response pattern is represented by a prerequisite relation. For this purpose it compares the number of response patterns that are represented by a prerequisite relation (i.e., concordant pairs) with the number of response patterns that are not represented by a prerequisite relation (i.e., discordant pairs). Formally, the $\gamma$-Index is defined as \[\gamma = \frac{N_c - N_d}{N_c + N_d}\] where $N_c$ is the number of concordant pairs and $N_d$ the number of discordant pairs. Generally, a positive $\gamma$-value supports the validity of prerequisite relations. The validation method \emph{percent} likewise validates prerequisite relations and assumes that more difficult or complex items are solved less frequently than less difficult or complex items. For this purpose it calculates the relative solution frequency for each of the items in the domain. The \emph{Violational Coefficient} \citep{Schrepp+Held+Albert:1999} also validates prerequisite relations. For this purpose, the number of violations (i.e., the earlier mentioned discordant pairs) against a prerequisite relation are calculated. Formally, the VC is defined as \[VC = \frac{1}{n(|S| - m)} \sum_{x,y} v_{xy}\] where $n$ denotes the number of response vectors, $|S|$ refers to the number of pairs in the relation, $m$ denotes the number of items, and $v_{xy}$ again refers to the number of discordant pairs. Generally, a low VC supports the validity of prerequisite relations. In contrast to the other three indices, the \emph{Distance Agreement Coefficient} \citep{Schrepp:1999} validates the resulting knowledge structure. For this purpose it compares the average symmetric distance between the knowledge structure and respone patterns (referred to as \emph{ddat}) to the average symmetric distance between the knowledge structure and the power set of response patterns (referred to as \emph{dpot}). By calculating the ratio of \emph{ddat} and \emph{dpot}, the DA is determined. Generally, a lower DA-value indicates a better fit between a knowledge structure and a set of response patterns. <>= kvalidate(kst, rpatterns=rp, method="gamma") kvalidate(kst, rpatterns=rp, method="percent") kvalidate(kst, rpatterns=rp, method="VC") kvalidate(kst, rpatterns=rp, method="DA") @ \subsection{Set--related Methods} Apart from these kst-specific functions, the \pkg{kst} package also provides general set-related methods. In particular, these include methods pertaining to the \emph{closure} and \emph{reduction} of sets. The \codefunind{closure} method for objects of class \codeind{kstructure} performs the closure of a knowledge structure by computing the union or intersection of any two knowledge states. \codefunind{union} is also used as a basis for the \codefunind{kspace} function (see next section). <>= closure(kst, operation="union") @ The \codefunind{reduction} method performs the reduction of a knowledge structure by computing the minimal subset having the same closure as the knowledge structure. Additionally, it allows for computing the \emph{discriminative} reduction of a knowledge structure. Such a discriminative reduction is a knowledge structure in which each notion contains a single item. <>= reduction(kst, operation="discrimination") @ \subsection{Families of Sets}\label{sec:famset} A more general concept than the knowledge structures represented by \codeind{kstructure}s are the families of (sub)sets represented by the \codeind{kfamset} class. The \codefunind{kfamset} constructor takes the same arguments as the \codefunind{kstructure} constructor. <>= # An endorelation representing a surmise relation # A set of sets representing knowledge states (e.g., clauses of a surmise system) kfs <- kfamset(set(set("a"), set("a","b"), set("a","c"), set("d","e"), set("a","b","d","e"), set("a","c","d","e"), set("a","b","c","d","e"))) kfs @ Figure~\ref{fig:hasse-kfamset} shows the Hasse diagram of this family of sets. Please note the difference to Fig.~\ref{fig:hasse-kst} which also contains the empty set $\emptyset$. <>= if(requireNamespace("Rgraphviz")) {Rgraphviz::plot(kfs)} @ \begin{figure}[h] \begin{center} <>= <> @ \caption{Family of sets} \label{fig:hasse-kfamset} \end{center} \end{figure} \section{Knowledge Spaces} \label{sec:kspace} Apart from knowledge structures, knowledge space theory also suggests the concept of \emph{knowledge spaces}. A knowledge structure is considered a knowledge space if it includes one state for the empty set \emph{\{\}}, one state for the full set of domain items, and a state for the union of any two knowledge states (i.e., the closure under union). The basic constructor for creating knowledge spaces is the \codefunind{kspace} function. It takes an arbitrary knowledge structure and returns the corresponding knowledge space: <>= ksp <- kspace(kst) ksp @ In order to test for the space property of a knowledge structure, the \pkg{kst} package provides the function \codefunind{kstructure\_is\_space}: <>= kstructure_is_kspace(ksp) @ Apart from the functions described in the previous section, which can likewise be performed on knowledge spaces, the package \pkg{kst} provides the additional function \codefunind{kbase}, which is only applicable to knowledge spaces. The \codefunind{kbase} function takes an arbitrary knowledge space and computes its \emph{base}. A base for a knowledge space is a minimal family of knowledge states spanning the knowledge space, i.e., the base includes the minimal states sufficient to reconstruct the full knowledge space. A knowledge structure has a base only if it is a knowledge space. <>= kbase(ksp) @ \section{Learning Paths} \label{sec:lpath} Both, knowledge structures and knowledge spaces, involve the concept of \emph{learning paths}. A learning path is a maximal sequence of knowledge states, which allows learners to gradually traverse a knowledge structure or space from the empty set \emph{\{\}} (or any other bottom state) to the full set of domain problems \emph{Q}. The basic constructor for creating learning paths is the \codefunind{lpath} function. It takes an arbitrary knowledge structure or space and computes all possible learning paths in the respective knowledge structure or space. The result is a list where each element represents one learning path: <>= lp <- lpath(ksp) lp @ A learning path is considered a \emph{gradation} if each state in the respective learning path differs from its predecessor and/or successor state by a single item/notion. The \codefunind{lpath\_is\_gradation} function allows for testing the gradation property of a list of learning paths: <>= lpath_is_gradation(lp) @ \section{Utilities} Other KST related \R\ packages (\pkg{DAKS} and \pkg{pks}) use matrices as representation for knowledge structures and data sets. The \codefunind{as.famset} function converts such matrices into families (i.e. \code{sets}) of \code{sets}. <>= m <- matrix(c(1, 0, 0, 1, 1, 0), nrow = 2, ncol = 3) m as.famset(m) as.famset(m, as.letters = FALSE) @ The inverse function is given by the \codefunind{as.binaryMatrix} function which returns the matrix representation of a \code{kstructure}. <>= as.binaryMatrix(ksp) @ {\small \bibliographystyle{abbrvnat} \bibliography{kst} } \printindex{} \end{document}