---
title: "`vws`: Vertical Weighted Strips in R using C++"
author: Andrew M. Raim
format:
  pdf:
    mathspec: true
    fontsize: 12pt
    indent: false
    toc: true
    number-sections: true
    colorlinks: true
    link-citations: true
    prompt: false
    include-in-header:
      text: |
        \usepackage{common}
pdf-engine: pdflatex
vignette: >
  %\VignetteIndexEntry{vws}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{quarto::pdf}
bibliography: references.bib
editor_options:
  chunk_output_type: console
classoption: abstract
abstract: |
  The `vws` package facilitates the vertical weighted strips (VWS) method in R. VWS is an approach to constructing proposals for rejection sampling which is explored in @VWS2025. The `vws` package provides an API to program the necessary components for the proposal and carry out rejection sampling. The API of the `vws` package is in C++ and makes use of templates and object-oriented programming; a working understanding of these techniques is assumed. The primary intended usage is that sampler logic is implemented in C++ as a variate generation function; this is exposed in R for use in applications. This vignette presents the API and several complete examples; multiple variations of VWS samplers are presented with each example. The `vws` package and this vignette have been written primarily for univariate targets; however, it is intended that the framework can be used to handle multivariate problems as well.
thanks: |
  **For correspondence**`:` . Document was compiled `{r} format(Sys.time(), "%Y-%m-%d %H:%M:%S %Z")`.
geometry:
  - left=0.75in
  - right=0.75in
  - top=0.75in
  - bottom=1.00in
execute:
  eval: true
code-block-bg: "#FAFAFA"
code-block-border-left: "#B2DEED"
filters:
  - include-code-files
callout-icon: false
# Center for Statistical Research & Methodology, U.S. Census Bureau,
# Washington, DC, 20233, U.S.A.
---

```{r}
#| include: false
library(vws)
library(tidyverse)
set.seed(1234)
```

# Disclaimer and Acknowledgments {.unnumbered}

Although there are no guarantees of correctness of the `vws` package, reasonable efforts will be made to address shortcomings. Comments, questions, corrections, and possible improvements can be communicated through the project's GitHub repository ().

# Introduction

The `vws` package facilitates vertical weighted strips (VWS) sampling [@VWS2025]. VWS is an approach to constructing proposal distributions for rejection sampling. Here, the target density is regarded as a weighted density: the product of a nonnegative weight function and a base density. The weight function is majorized by a given function (which bounds it above) to yield an envelope, which is recombined with the base density to form a proposal distribution. If it is convenient to generate draws from the proposal, and the distance between the original and majorized weight functions is not too large, such a proposal can be used effectively in rejection sampling. When the weight function is majorized in a piecewise manner over the support, the resulting proposal can be regarded as a finite mixture. Such a mixture can be refined to reduce the rejection probability to an acceptable tolerance.

The `vws` package provides tools to construct and utilize VWS proposal distributions. Variate generation functions are constructed in C++ and may be exposed for use in R via the Rcpp package [@Eddelbuettel2103]. C++ is taken as the primary programming language because of its efficiency as a compiled language and its formal approach to object orientation and template programming. The `fntl` package [@fntl] is integral to `vws` usage; it provides a simplified C++ interface to useful routines in the R application programming interface (API), which is described in @RExtensions, as well as other useful numerical routines, where functions are supplied as lambdas.
The `vws` package has been designed primarily to generate from univariate distributions (both continuous and discrete). It is also possible to construct proposals for multivariate distributions using the VWS approach; the `vws` package is intended to support code development for such settings as well.

This vignette presents an overview of the package, its API, and several examples of working samplers. An `R>` prompt is shown in some code displays to emphasize interaction via the console. Examples make liberal use of the `tidyverse` [@Tidyverse2019], especially `ggplot2` to plot and `dplyr` to manipulate tables. Some of the lengthier code listings for the examples are not presented in this document; however, complete code is provided with the package. It can be found in the `doc/examples` folder of the installed `vws` package. This path can be obtained in R with the following command.

```{r}
#| eval: false
#| prompt: true
file.path(path.package("vws"), "doc", "examples")
```

The remainder of the vignette proceeds as follows. @sec-vws briefly reviews VWS sampling. @sec-overview gives an overview of the package, its major components, and important preliminaries for programming. @sec-api provides a detailed specification of the C++ API. @sec-vmf develops several VWS samplers to generate from the von Mises-Fisher (VMF) distribution following the application in @VWS2025. @sec-ln-norm develops samplers in a disclosure avoidance setting considered by @DirectSamplingDAS2021, @DPSimulation2022, and @DASMethods2025. @sec-bessel develops VWS samplers for the Bessel count distribution [@Devroye2002].

::: {.callout-tip title="Quick Start"}
Readers looking for a quick start can go directly to the first example in @sec-vmf-const-default, whose contents are described in the greatest detail.
:::

# A Brief Review of Vertical Weighted Strips {#sec-vws}

The objective of VWS is to sample from a weighted target density
\begin{align}
f(x) = f_0(x) / \psi, \quad
f_0(x) = w(x) g(x), \quad
\psi = \int_\Omega f_0(x) d\nu(x),
\label{eqn:weighted-target}
\end{align}
where $\Omega$ is the support, $\nu$ is a dominating measure, $g$ is assumed to be a normalized density, $w(x)$ is a nonnegative weight function, and $\psi$ is a normalizing constant. We will construct a proposal of the form
\begin{align*}
h(x) = h_0(x) / \psi_N, \quad
h_0(x) = \overline{w}(x) g(x), \quad
\psi_N = \int_\Omega h_0(x) d\nu(x).
\end{align*}
We will say that a function $\phi$ *majorizes* $w$ on $A \subseteq \Omega$ if $\phi(x) \geq w(x)$ for $x \in A$. Let $\ind\{ x \in A \}$ denote the indicator function for $x \in A$. Suppose $\Omega$ is partitioned into regions $\mathscr{D}_1, \ldots, \mathscr{D}_N$ and there are corresponding functions $\overline{w}_1, \ldots, \overline{w}_N$ where $\overline{w}_j$ majorizes $w$ on $\mathscr{D}_j$. Then $\overline{w}(x) = \sum_{j=1}^N \overline{w}_j(x) \ind\{x \in \mathscr{D}_j\}$ majorizes $w$ on $\Omega$ and the unnormalized proposal becomes
\begin{align*}
h_0(x) = g(x) \sum_{j=1}^N \overline{w}_j(x) \ind\{x \in \mathscr{D}_j\}.
\end{align*}
With this construction, $f_0(x) \leq h_0(x)$ for all $x \in \Omega$. Therefore, classical rejection sampling can be carried out by drawing $u$ from $\text{Uniform}(0,1)$, $x$ from $h$, and accepting $x$ as a draw from $f$ if $u \leq f_0(x) / h_0(x)$. A benefit of this construction is that the functional form of the density can help to guide proposal selection.
The normalized $h$ can be obtained by defining
\begin{align*}
\overline{\xi}_j = \int_{\mathscr{D}_j} \overline{w}_j(x) g(x) d\nu(x)
\end{align*}
and
\begin{align*}
\psi_N = \sum_{j=1}^N \overline{\xi}_j,
\end{align*}
giving
\begin{align}
h(x) = h_0(x) / \psi_N = \sum_{j=1}^N \pi_j g_j(x),
\label{eqn:fmm-proposal}
\end{align}
a finite mixture with mixing weights and component densities,
\begin{align*}
\pi_j = \frac{\overline{\xi}_j}{\sum_{\ell=1}^N \overline{\xi}_\ell},
\quad \text{and} \quad
g_j(x) = \overline{w}_j(x) g(x) \ind\{x \in \mathscr{D}_j\} / \overline{\xi}_j,
\end{align*}
respectively. The $g_j$ are truncated and reweighted versions of the base distribution $g$.

In addition to the majorizer, suppose that $\underline{w}_j$ is a *minorizer* of $w$ so that $0 \leq \underline{w}_j(x) \leq w(x)$ for all $x \in \mathscr{D}_j$, and let
\begin{math}
\underline{\xi}_j = \int_{\mathscr{D}_j} \underline{w}_j(x) g(x) d\nu(x).
\end{math}
Note that the majorizer and minorizer need not have the same functional form. When $h$ is used as a proposal in rejection sampling, an upper bound for the probability of rejection is
\begin{align}
\rho_+ = \sum_{j=1}^N \rho_j, \quad \text{where} \quad
\rho_j = \frac{ \overline{\xi}_j - \underline{\xi}_j }{ \sum_{\ell=1}^N \overline{\xi}_\ell }.
\label{eqn:bound}
\end{align}
The quantities $\rho_1, \ldots, \rho_N$ are the contributions of each region to $\rho_+$. The bound $\rho_+$ can be used to determine whether a VWS proposal will be viable for rejection sampling; if it is not much less than 1, rejections may be too frequent for practical use. The proposal may be refined by altering the partitioning or considering a different majorizer.

Several specific choices of majorizer are considered by @VWS2025 and will be reviewed in the remainder of this section. @sec-vws-constant discusses the use of a constant majorizing function. A linear majorizing function is discussed in @sec-vws-linear.
@sec-vws-knots reviews several knot selection methods.

## Constant Majorizer {#sec-vws-constant}

Suppose $\Omega = (\alpha_0, \alpha_N]$ is an interval whose endpoints may or may not be finite and $w(x) < \infty$ on $\Omega$. We can majorize $w$ using a constant $\overline{w}_j \geq \sup_{x \in \mathscr{D}_j} w(x)$ on each $\mathscr{D}_j$ so that the majorizer is $\overline{w}(x) = \sum_{j=1}^N \overline{w}_j \ind\{ x \in \mathscr{D}_j \}$. Letting $T$ denote a random variable with density $g$, we obtain component densities of finite mixture \eqref{eqn:fmm-proposal} as
\begin{align*}
g_j(x) = g(x) \ind\{x \in \mathscr{D}_j\} / \Prob(T \in \mathscr{D}_j),
\quad \text{with} \quad
\overline{\xi}_j = \overline{w}_j \Prob(T \in \mathscr{D}_j).
\end{align*}
Furthermore, if we assume a constant minorizer $\underline{w}_j \leq \inf_{x \in \mathscr{D}_j} w(x)$, then $\underline{\xi}_j = \underline{w}_j \Prob(T \in \mathscr{D}_j)$.

## Linear Majorizer {#sec-vws-linear}

Suppose $\Omega = (\alpha_0, \alpha_N]$ is an interval whose endpoints may or may not be finite and $0 < w(x) < \infty$ on $\Omega$. Furthermore, suppose $\Omega$ is partitioned into regions $\mathscr{D}_j = (\alpha_{j-1}, \alpha_j]$ where $w$ is either entirely log-convex or entirely log-concave on each region, for $j = 1, \ldots, N$. An exponentiated linear function $\overline{w}_j(x) = \exp\{ \overline{\beta}_{j0} + \overline{\beta}_{j1} x \}$ can be used to majorize $w$ on $\mathscr{D}_j$ with an appropriate choice of coefficients $\overline{\beta}_{j0}$ and $\overline{\beta}_{j1}$. Therefore, the piecewise linear function $\overline{w}(x) = \sum_{j=1}^N \overline{w}_j(x) \ind\{ x \in \mathscr{D}_j \}$ is a majorizer on $\Omega$.
Here the component densities of finite mixture \eqref{eqn:fmm-proposal} are
\begin{align*}
g_j(x) = \exp\{ \overline{\beta}_{j0} + \overline{\beta}_{j1} x \} g(x) \ind\{ x \in \mathscr{D}_j \} / \overline{\xi}_j,
%
\quad \text{with} \quad
%
\overline{\xi}_j = e^{ \overline{\beta}_{j0} } \int_{\alpha_{j-1}}^{\alpha_j} e^{ \overline{\beta}_{j1} \cdot x } g(x) d\nu(x).
\end{align*}
In certain cases, such as where $g$ is in an exponential family, the integral in $\overline{\xi}_j$ can be simplified and $g_j$ is seen to be a familiar distribution. We may also assume a linear minorizer $\underline{w}(x) = \sum_{j=1}^N \underline{w}_j(x) \ind\{ x \in \mathscr{D}_j \}$ with $\underline{w}_j(x) = \exp\{ \underline{\beta}_{j0} + \underline{\beta}_{j1} x \}$ for appropriate coefficients $\underline{\beta}_{j0}$ and $\underline{\beta}_{j1}$. To identify coefficients $\overline{\beta}_{j0}$, $\overline{\beta}_{j1}$, $\underline{\beta}_{j0}$, and $\underline{\beta}_{j1}$, consider the following remarks corresponding to the log-concave and log-convex cases, respectively.

::: {#rem-linear-concave}
Suppose $w$ is log-concave on $\mathscr{D}_j$. For any $c \in \mathscr{D}_j$,
\begin{align}
\log w(x)
&\leq \log w(c) + (x - c) \nabla(c) \nonumber \\
&= \overline{\beta}_{j0} + \overline{\beta}_{j1} \cdot x,
\label{eqn:linear-concave-majorizer}
\end{align}
so that
\begin{math}
\overline{w}_j(x) = \exp\{ \overline{\beta}_{j0} + \overline{\beta}_{j1} \cdot x \}
\end{math}
with
\begin{math}
\overline{\beta}_{j0} = \log w(c) - c \cdot \nabla(c),
\end{math}
\begin{math}
\overline{\beta}_{j1} = \nabla(c),
\end{math}
and
\begin{math}
\nabla(x) = \frac{d}{dx} \log w(x).
\end{math}
For an accompanying minorizer, let $\lambda \in [0,1]$ for a given $x \in \mathscr{D}_j$ so that $x = (1 - \lambda) \alpha_{j-1} + \lambda \alpha_j$.
Concavity of $\log w(x)$ gives
\begin{align}
\log w(x)
&\geq (1-\lambda) \log w(\alpha_{j-1}) + \lambda \log w(\alpha_j) \nonumber \\
&= \log w(\alpha_{j-1}) + \frac{x - \alpha_{j-1}}{\alpha_j - \alpha_{j-1}} [ \log w(\alpha_j) - \log w(\alpha_{j-1})] \nonumber \\
&= \underline{\beta}_{j0} + \underline{\beta}_{j1} \cdot x,
\label{eqn:linear-concave-minorizer}
\end{align}
so that
\begin{math}
\underline{w}_j(x) = \exp\{ \underline{\beta}_{j0} + \underline{\beta}_{j1} \cdot x \}
\end{math}
with
\begin{math}
\underline{\beta}_{j0} = \log w(\alpha_{j-1}) - \alpha_{j-1} \underline{\beta}_{j1}
\end{math}
and
\begin{math}
\underline{\beta}_{j1} = \{\log w(\alpha_j) - \log w(\alpha_{j-1}) \} / \{ \alpha_j - \alpha_{j-1} \}.
\end{math}
\remarkqed
:::

::: {#rem-linear-convex}
Suppose $w$ is log-convex on $\mathscr{D}_j$. A majorizer and minorizer are obtained from \eqref{eqn:linear-concave-minorizer} and \eqref{eqn:linear-concave-majorizer}, respectively. Note that they have switched roles from @rem-linear-concave.
\remarkqed
:::

The following remark describes one possible choice for the expansion point $c$ which will be utilized in the examples in Sections [-@sec-vmf-linear], [-@sec-ln-norm-linear], and [-@sec-bessel-linear].
::: {#rem-expansion-point}
For a majorizer in the log-concave case, we choose $c$ to minimize the L1 distance between unnormalized densities,
\begin{align}
c^* &= \argmin_{c \in \mathscr{D}_j} \int_{\mathscr{D}_j} [\overline{w}_j(x) - w(x) ] g(x) d\nu(x) \nonumber \\
%
&= \argmin_{c \in \mathscr{D}_j} \Big\{ w(c) \exp\{ -c \nabla(c) \} \int_{\mathscr{D}_j} \exp\{ x \nabla(c) \} g(x) d\nu(x) \Big\} \nonumber \\
%
&= \argmin_{c \in \mathscr{D}_j} \Big\{ \log w(c) - c \nabla(c) + \log M_j(\nabla(c)) \Big\},
\label{eqn:concave-majorize-choice}
\end{align}
where
\begin{math}
\overline{w}_j(x) = \exp\{ \overline{\beta}_{j0} + \overline{\beta}_{j1} \cdot x \},
\end{math}
\begin{math}
\overline{\beta}_{j0} = \log w(c) - c \cdot \nabla(c),
\end{math}
\begin{math}
\overline{\beta}_{j1} = \nabla(c),
\end{math}
\begin{math}
\nabla(x) = \frac{d}{dx} \log w(x),
\end{math}
and
\begin{math}
M_j(s) = \int_{\alpha_{j-1}}^{\alpha_j} e^{xs} g(x) d\nu(x).
\end{math}
A similar choice of $c$ for a minorizer in the log-convex case is
\begin{align}
c^* &= \argmin_{c \in \mathscr{D}_j} \int_{\mathscr{D}_j} [w(x) - \underline{w}_j(x) ] g(x) d\nu(x) \nonumber \\
%
&= \argmax_{c \in \mathscr{D}_j} \int_{\mathscr{D}_j} \underline{w}_j(x) g(x) d\nu(x) \nonumber \\
%
&= \argmax_{c \in \mathscr{D}_j} \Big\{ \log w(c) - c \nabla(c) + \log M_j(\nabla(c)) \Big\},
\label{eqn:convex-minorize-choice}
\end{align}
where
\begin{math}
\underline{w}_j(x) = \exp\{ \underline{\beta}_{j0} + \underline{\beta}_{j1} \cdot x \},
\end{math}
\begin{math}
\underline{\beta}_{j0} = \log w(c) - c \cdot \nabla(c),
\end{math}
\begin{math}
\underline{\beta}_{j1} = \nabla(c),
\end{math}
and
\begin{math}
\nabla(x) = \frac{d}{dx} \log w(x).
\end{math}
\remarkqed
:::

## Knot Selection {#sec-vws-knots}

After selecting a decomposition of the target into weighted form \eqref{eqn:weighted-target}, the choice of knots is important to obtain an efficient proposal distribution. One option is to specify knots at known locations.
For example, we can consider using evenly spaced points on a given interval within the support; however, this may not take into account important features of the target and may add wasteful components to the finite mixture which contribute little to the quality of the approximation. An automated option that often achieves much better results is the rule of thumb proposed by @VWS2025, shown here as Algorithm \ref{alg:refine}. This method sequentially refines an initial partition of the support consisting of one or more regions. At each step a region is selected and bifurcated at its midpoint. The selection is probabilistic, with probabilities proportional to $\rho_1, \ldots, \rho_N$. A deterministic ("greedy") variation can be considered by instead always selecting a region $\ell$ whose value $\rho_\ell$ is largest.

Bifurcation at a midpoint is mostly straightforward in univariate settings where the support is a subset of the real line. However, there may be multiple ways to characterize this operation when the support is multivariate, perhaps with more structure like a multinomial sample space. The necessary operations can be coded for such settings within the `vws` framework; however, this document does not consider such cases in detail.

\begin{algorithm}
\caption{Probabilistic rule of thumb for sequential knot selection.}
\label{alg:refine}
\hspace*{\algorithmicindent}\textbf{Input}: maximum number of knots to add $N$. \\
\hspace*{\algorithmicindent}\textbf{Input}: initial vector of internal knots $\alpha_1, \ldots, \alpha_{N_0-1}$; may be empty with $N_0 = 0$. \\
\hspace*{\algorithmicindent}\textbf{Input}: tolerance $\epsilon > 0$.
\begin{algorithmic}[1]
\State $j \leftarrow 0$
\While{$j \leq N$}
\State Let $\mathscr{D}_\ell$ with $\rho_\ell$ for $\ell \in \{ 1, \ldots, N_0 + j \}$ be current regions.
\State If $\sum_{\ell=1}^{N_0 + j} \rho_\ell < \epsilon$, break from the loop.
\State Draw $\ell \in \{ 1, \ldots, N_0 + j \}$ from $\text{Discrete}(\rho_1, \ldots, \rho_{N_0 + j})$. \label{line:selection}
\State Let $\alpha^*$ be the midpoint of $\mathscr{D}_\ell$; add $\alpha^*$ to the vector of knots. \label{line:midpoint}
\State Let $j \leftarrow j + 1$.
\EndWhile
\State \Return $(\alpha_0, \ldots, \alpha_{N_0 + j})$.
\end{algorithmic}
\end{algorithm}

# Overview of Package {#sec-overview}

The `vws` package aims to support the methodology described in the previous section. The present section describes tools in the package which can be used to formulate a problem, construct a proposal, and generate samples.

Use of the `vws` package centers on two classes. The `FMMProposal` class represents a finite mixture of the form \eqref{eqn:fmm-proposal} and encapsulates the operations needed for rejection sampling. The `Region` class represents region $\mathscr{D}_j$ and the operations required for VWS. `Region` itself is an abstract base class which defines an interface needed by the framework. All problem-specific logic is coded within a subclass of `Region`: either by using one of the provided subclasses or by coding a new subclass. The subclasses `RealConstRegion` and `IntConstRegion` provide implementations of the constant majorizer from @sec-vws-constant for continuous and discrete univariate distributions, respectively. The linear majorizer described in @sec-vws-linear requires more customization than the constant majorizer; here, users should create a subclass of `Region`.

An `FMMProposal` object is created from $N \geq 1$ `Region` objects that represent the partition $\mathscr{D}_1, \ldots, \mathscr{D}_N$ of $\Omega$. The `rejection` function takes an object `h` of class `FMMProposal` and carries out rejection sampling to obtain $n$ draws. The `rejection` function returns a vector of accepted draws along with information about the number of rejections. @fig-software-design displays a diagram of the high-level design just described.
![Design of `vws` package.](software-design.pdf){#fig-software-design}

The following display outlines a typical workflow for VWS rejection sampling in C++. Suppose this code is saved in a file named `sample.cpp`.

```{cpp}
// [[Rcpp::depends(vws, fntl)]] // <1>

#include "vws.h" // <2>
#include "MyRegion.h"

// [[Rcpp::export]] // <3>
Rcpp::List sample(unsigned int n)
{
    MyRegion supp( ... ); // <4>
    vws::FMMProposal<double, MyRegion> h(supp); // <5>
    h.refine(N - 1, 0.01); // <6>
    auto out = vws::rejection(h, n); // <7>
    return Rcpp::List::create( // <8>
        Rcpp::Named("draws") = out.draws,
        Rcpp::Named("rejects") = out.rejects
    );
}
```

1. This annotation ensures that the `vws` and `fntl` packages are linked during compilation. `fntl` is not explicitly used in this example, but it is needed for linking by the `vws` library regardless.
2. Include the main header for the `vws` package so that the API is accessible.
3. This annotation exports the `sample` function for use in R. It is not needed if the `sample` function will be used only in C++.
4. Construct a region of type `MyRegion` which we have presumably defined in `MyRegion.h`. The symbol `...` here is a placeholder for any arguments needed by the constructor.
5. Construct an `FMMProposal` based on a single region `supp`. Note that it is a template class that takes two template arguments: the first (`double`) indicates the data type of the distribution and the second (`MyRegion`) indicates the type of region it will contain.
6. Refine the proposal using Algorithm \ref{alg:refine} up to $N-1$ times or until tolerance `0.01` is achieved for \eqref{eqn:bound}.
7. Carry out rejection sampling with the refined proposal `h`.
8. Pack elements of the struct `out` into an `Rcpp::List` and return them to the caller.

Note that the `FMMProposal` class, the `rejection` function, and other elements of the `vws` API are accessed in the `vws` namespace. Details for the API are given in @sec-api.
Once the `sample` function is successfully exported as an R function, we may invoke it as usual.

```{r}
#| eval: false
#| prompt: true
Rcpp::sourceCpp("sample.cpp")
out = sample(n = 100)
head(out$draws)
head(out$rejects)
```

::: {.callout-tip title="Working Examples"}
A complete VWS example is given in @sec-vmf-const-default; this demonstrates the use of `RealConstRegion` to implement a sampler with a minimal amount of coding. @sec-vmf-const-custom is an improvement which requires more coding; here we provide functions to determine values of the constant majorizer and minorizer analytically. Finally, @sec-vmf-linear demonstrates how to subclass `Region` to implement a linear majorizer, which requires more intricate coding. Subsequent examples are given in @sec-ln-norm and @sec-bessel, but with less explanation.
:::

It is useful to mention several other preliminaries for the `vws` package; see the following remarks.

::: {#rem-lambdas}
Lambdas in C++ are functions which can be defined within the course of a program, regarded as objects which can be passed to other functions, and which capture variables in the environment so that they do not need to be passed as arguments. They should feel very familiar to R programmers, who likely use R functions in this way. Here is a snippet with several examples of lambdas.

```{cpp}
#include <functional>
using std::function;

double z = 3;
typedef function<double(double, double)> dfdd;
dfdd f1 = [&](double x, double y) -> double { return x*y*z; };
dfdd f2 = [&](double x, double y) { return x*y*z; };
dfdd f3 = [=](double x, double y) { return x*y*z; };
dfdd f4 = [](double x, double y) { return x*y*3; };
```

These four functions carry out the same operation and may be invoked in the usual way.

```{cpp}
double out = f1(1, 2);
```

The typedef `dfdd` is given as a shorthand for a Standard Template Library (STL) [`function`](https://en.cppreference.com/w/cpp/utility/functional/function.html) which takes two `double` variables as arguments and returns a `double`.
An STL `function` can be passed to other functions like any other variable. Function `f1` explicitly states the return type of the lambda using `-> double`, while it is omitted in the others to be inferred by the compiler. The variable `z` is *captured* by `f1`, `f2`, and `f3`; in other words, it is "baked in" to the function definition without needing to be passed as an argument. Functions `f1` and `f2` capture `z` by reference while `f3` captures `z` by value, indicated with the syntaxes `[&]` and `[=]` respectively. Function `f4` uses empty square brackets `[]` to indicate that it does not capture any outside variables. The `fntl` package defines several function typedefs used in the `vws` package, and also provides access to numerical tools such as integration, differentiation, and optimization which take STL `function`s as arguments.
\remarkqed
:::

::: {#rem-logscale}
Many of the computations in this vignette and within the `vws` package are carried out on the logarithmic scale. This is to avoid loss of precision when working with numbers of very large or very small magnitude. Several functions to support log-scale arithmetic are given in @sec-api-logscale.
\remarkqed
:::

::: {#rem-optimization}
To help automate computation of constant majorizers and minorizers for use with `RealConstRegion` and `IntConstRegion`, the default is to use numerical optimization to find appropriate constants on each region in the partition. The particular optimization method is presented in @sec-api-opt. This can be somewhat of a computational burden; it can sometimes be avoided when there is a closed form solution or another informed way to arrive at maximizing / minimizing constants. The API for `RealConstRegion` and `IntConstRegion` accepts user-specified lambdas to specify alternative methods.
\remarkqed
:::

# C++ API {#sec-api}

This section documents functions, classes, and other components in the C++ API.
Note that some functions in the C++ API are also exposed as R functions; see the `vws` package manual for these.

## Typedefs {#sec-api-typedefs}

The following typedef is used to represent functions that take `double` and `bool` arguments and return a `double`. It is used as the type for weight functions in particular, with `x` the argument of the weight function and `log = true` if the result is to be returned on the log-scale.

```{cpp}
typedef std::function<double(double, bool)> dfdb;
```

The following typedef represents a function that optimizes (i.e., maximizes or minimizes) a weight function on a given interval.

```{cpp}
typedef std::function<double(
    const dfdb& w, // <1>
    double lo,     // <2>
    double hi,     // <3>
    bool log       // <4>
)> optimizer;
```

1. A weight function.
2. Lower bound of the interval; may be `R_NegInf`.
3. Upper bound of the interval; may be `R_PosInf`.
4. Logical; `log = true` specifies that the result should be the optimal value $\log w(x^*)$, given on the log-scale. Otherwise, the result should be the optimal value $w(x^*)$ on the original scale.

The following typedef represents a function that computes the midpoint of a given interval. Note that this type name can clash with the `midpoint` function in the STL; therefore, it is referenced using its namespace as `vws::midpoint` within the `vws` package.

```{cpp}
typedef std::function<double(double, double)> midpoint;
```

## Finite Mixture Proposal {#sec-api-proposal}

The class `FMMProposal` represents a VWS proposal in the form of \eqref{eqn:fmm-proposal}.

### Class Definition {.unlisted}

`FMMProposal` has two template arguments: `T` is the data type for the underlying distribution (e.g., `T = double` for a real-valued univariate random variable) and `R` is the type of region which the proposal will be based upon.

```{cpp}
template <typename T, typename R>
class FMMProposal { ... };
```

### Constructor {.unlisted}

The primary constructor takes a vector of regions of type `R`. There must be at least one region in the vector. These regions are expected to be a partition of the support $\Omega$.
```{cpp}
FMMProposal(const std::vector<R>& regions);
```

A second constructor takes a single region. This is a special case of the previous one which is provided for convenience.

```{cpp}
FMMProposal(const R& region);
```

### Distribution Methods {.unlisted}

The following functions make use of the proposal as a distribution. They are especially utilized in rejection sampling.

```{cpp}
std::vector<T> r(unsigned int n = 1) const; // <1>
std::pair<std::vector<T>, std::vector<unsigned int>> // <2>
r_ext(unsigned int n = 1) const;
double d(const T& x, bool normalize = true, bool log = false) const; // <3>
double w_major(const T& x, bool log = true) const; // <4>
double d_target_unnorm(const T& x, bool log = true) const; // <5>
```

1. Draw $n$ variates of type `T` from the proposal.
2. Draw $n$ variates of type `T` from the proposal and retain the indices of the regions used in each draw. The result is an STL pair whose first element is the vector of draws and second element is the vector of indices.
3. Evaluate the density $h$ at the given $x$. If `normalize = false`, $h_0(x)$ is computed; otherwise $h(x)$ is computed.
4. Evaluate the majorized weight function $\overline{w}(x)$.
5. Evaluate the unnormalized target $f_0(x) = w(x) g(x)$.

For the `log` argument, the value on the log-scale is returned if `true`; otherwise, the value on the original scale is returned.

### Accessors {.unlisted #sec-api-proposal-accessors}

The following accessors are provided.

```{cpp}
Rcpp::NumericVector xi_upper(bool log = true) const; // <1>
Rcpp::NumericVector xi_lower(bool log = true) const; // <2>
Rcpp::LogicalVector bifurcatable() const; // <3>
Rcpp::NumericVector pi(bool log = false) const; // <4>
Rcpp::NumericVector bound_contrib(bool log = false) const; // <5>
double bound(bool log = false) const; // <6>
double nc(bool log = false) const; // <7>
unsigned int size() const; // <8>
```

1. Get the constants $\overline{\xi}_1, \ldots, \overline{\xi}_N$.
2. Get the constants $\underline{\xi}_1, \ldots, \underline{\xi}_N$.
3. Get a vector of $N$ logical values indicating whether the corresponding regions can be bifurcated. For example, when the support `T` is `int`, a region $(a,b]$ containing one integer should not be bifurcated because one of the two resulting regions will not contain any points of the support.
4. Get the mixing proportions $\pi_1, \ldots, \pi_N$.
5. Get the contributions $\rho_1, \ldots, \rho_N$ to bound \eqref{eqn:bound} corresponding to the $N$ regions.
6. Get the overall rejection bound \eqref{eqn:bound}.
7. Get the normalizing constant $\psi_N$.
8. Get the number of regions $N$.

Methods above with a `log` argument return values on the log-scale when `log = true`; values are returned on the original scale otherwise.

### Iterators {.unlisted}

The following methods can be used to get (read-only) iterators to internal data structures. These can be more efficient than the accessors in @sec-api-proposal-accessors because they do not make a copy of the data.

```{cpp}
std::set<R>::const_iterator regions_begin() const; // <1>
std::set<R>::const_iterator regions_end() const;
Rcpp::NumericVector::const_iterator log_xi_upper_begin() const; // <2>
Rcpp::NumericVector::const_iterator log_xi_upper_end() const;
Rcpp::NumericVector::const_iterator log_xi_lower_begin() const; // <3>
Rcpp::NumericVector::const_iterator log_xi_lower_end() const;
Rcpp::LogicalVector::const_iterator bifurcatable_begin() const; // <4>
Rcpp::LogicalVector::const_iterator bifurcatable_end() const;
```

1. The start and end of the set of regions (of template type `R`) in the proposal.
2. The start and end of the vector $(\overline{\xi}_1, \ldots, \overline{\xi}_N)$.
3. The start and end of the vector $(\underline{\xi}_1, \ldots, \underline{\xi}_N)$.
4. The start and end of the vector of indicators for whether regions are bifurcatable.

### Refining the Proposal {.unlisted}

Two functions are provided to refine the proposal from $\mathscr{D}_1, \ldots, \mathscr{D}_N$ into a finer partition.
Both variants return a vector $\rho_+^{(0)}, \ldots, \rho_+^{(N_0)}$ which represents values of the bound \eqref{eqn:bound} at each refinement step; $N_0$ is the number of steps taken, $\rho_+^{(j)}$ represents the value at refinement $j = 0, \ldots, N_0$, and $\rho_+^{(0)}$ represents the initial value.

The first variant partitions at a given vector of knots.

```{cpp}
Rcpp::NumericVector refine(
    const std::vector<T>& knots, // <1>
    bool log = true // <2>
);
```

1. A vector of knots.
2. If `log = true`, return bound values on the log-scale.

The second variant uses the rule of thumb for sequential knot selection from @VWS2025. Refining halts when \eqref{eqn:bound} falls below `tol`, which is possible only when `tol` is positive; otherwise, refining halts after `N` steps.

```{cpp}
Rcpp::NumericVector refine(
    unsigned int N, // <1>
    double tol = 0, // <2>
    bool greedy = false, // <3>
    unsigned int report = fntl::uint_max, // <4>
    bool log = true // <5>
);
```

1. Number of refinements to make.
2. Tolerance for \eqref{eqn:bound}; refinement will halt if $\rho_+$ reaches this.
3. If `greedy = true`, the region with the largest $\rho_\ell$ is always selected for partitioning; otherwise, regions are selected with probabilities proportional to $\rho_1, \ldots, \rho_N$.
4. The period at which progress is reported to the console. E.g., use `report = 2` to report progress every two selections.
5. If `log = true`, return bound values $\rho_+^{(0)}, \ldots, \rho_+^{(N)}$ on the log-scale.

The `seq` function is provided as a convenience to generate equally-spaced knots for univariate real-valued intervals.

```{cpp}
std::vector<double> seq(double lo, double hi, unsigned int N, bool endpoints = false);
```

### Summary Methods {.unlisted}

Several methods are provided to summarize the regions in the proposal. @tbl-summary describes the columns in the summary; rows correspond to the $N$ regions.

```{cpp}
Rcpp::DataFrame summary() const; // <1>
void print(unsigned int n = 5) const; // <2>
```

1.
Get a data frame with the summary.
2. Print the summary to the console.

| Column | Description |
|-----|-----|
`Region` | String describing region $j$.
`log_xi_upper` | $\log \overline{\xi}_j$
`log_xi_lower` | $\log \underline{\xi}_j$
`log_volume` | $\log \rho_j$

: Data frame returned by the summary method of `FMMProposal`. {#tbl-summary tbl-colwidths="\[30,70\]"}

## Region Base Class {#sec-api-region}

`Region` is an abstract base class whose interface represents the problem-specific logic that must be coded to implement VWS. Users create a subclass of this class to construct a proposal for a given problem. However, for the most common application of VWS (univariate support with a constant majorizer), users may be able to leverage the provided subclasses in Sections [-@sec-api-realconstregion] and [-@sec-api-intconstregion]. The class has one template argument `T`, which is the data type for the underlying distribution.

```{cpp}
template <typename T>
class Region { ... };
```

The interface consists of the following public methods. These are abstract and must be implemented in a subclass.

```{cpp}
virtual double d_base(const T& x, bool log = false) const = 0; // <1>
virtual std::vector<T> r(unsigned int n) const = 0;            // <2>
virtual bool s(const T& x) const = 0;                          // <3>
virtual double w(const T& x, bool log = true) const = 0;       // <4>
virtual double w_major(const T& x, bool log = true) const = 0; // <5>
virtual bool is_bifurcatable() const = 0;                      // <6>
virtual double xi_upper(bool log = true) const = 0;            // <7>
virtual double xi_lower(bool log = true) const = 0;            // <8>
virtual std::string description() const = 0;                   // <9>
```
1. Evaluate the density function $g$ of the base distribution.
2. Generate a vector of $n$ draws from $g_j$ specific to this region.
3. Indicator of whether $x$ is in the support of $g_j$ specific to this region.
4. The weight function $w$.
5. Majorized weight function $\overline{w}_j$ for this region.
6.
Indicator of whether this region is bifurcatable into two smaller regions. This is used when refining a proposal; see @sec-vws-knots. One reason that a region should not be bifurcated is when one of the resulting regions would not contain any points of the support.
7. The quantity $\overline{\xi}_j$ for this region.
8. The quantity $\underline{\xi}_j$ for this region.
9. A string that describes this region; used for printing to the console.

The argument `log = true` in the methods above requests values to be returned on the log-scale.

Subclasses of `Region` are expected to provide several additional functions which are not specified in the interface. Suppose `MyRegion` is a subclass of `Region` whose elements are variables of type `type` (e.g., `double`); its definition should include the following.

```{cpp}
class MyRegion : public Region<type>
{
public:
    std::pair<MyRegion, MyRegion> bifurcate() const;              // <1>
    std::pair<MyRegion, MyRegion> bifurcate(const type& x) const; // <2>
    MyRegion singleton(const type& x) const;                      // <3>
    bool operator<(const MyRegion& x) const;                      // <4>
    bool operator==(const MyRegion& x) const;                     // <5>
    const MyRegion& operator=(const MyRegion& x);                 // <6>
    ...
};
```
1. Bifurcate the current region into two regions; this version takes no arguments and should include logic to decide where to make the split.
2. Bifurcate the current region into two regions; the argument `x` is used as the location to make the split.
3. Return a region on the singleton set $(x,x]$ with the information in the current object.
4. Comparison operator; used to order regions of this type.
5. Equality operator; check whether two regions of this type are equal.
6. Assignment operator; assign information from `x` to this object.
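For orientation, the following self-contained sketch mimics the shape of such a subclass using only plain C++ (no `vws` or Rcpp types). The names `RegionLike` and `Interval` are illustrative stand-ins and are not part of the package; the actual `Region` interface has the additional pure virtual methods listed above.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Illustrative stand-in for the Region<T> interface; not the vws definition.
struct RegionLike
{
    virtual bool is_bifurcatable() const = 0;
    virtual std::string description() const = 0;
    virtual ~RegionLike() { }
};

// A toy region on the interval (a, b], mirroring the extra methods a
// subclass is expected to provide (bifurcate, singleton, comparisons).
class Interval : public RegionLike
{
public:
    Interval(double a, double b) : m_a(a), m_b(b) { }
    // A degenerate interval (x, x] contains no points, so it cannot be split.
    bool is_bifurcatable() const override { return m_b > m_a; }
    std::string description() const override {
        return "(" + std::to_string(m_a) + ", " + std::to_string(m_b) + "]";
    }
    // Split at the arithmetic midpoint into two disjoint intervals whose
    // union is the current interval.
    std::pair<Interval, Interval> bifurcate() const {
        double m = (m_a + m_b) / 2;
        return std::make_pair(Interval(m_a, m), Interval(m, m_b));
    }
    // Degenerate interval (x, x] carrying the current object's information.
    Interval singleton(double x) const { return Interval(x, x); }
    // Ordering: (a1, b1] < (a2, b2] if b1 <= a2.
    bool operator<(const Interval& x) const { return m_b <= x.m_a; }
    bool operator==(const Interval& x) const {
        return m_a == x.m_a && m_b == x.m_b;
    }
private:
    double m_a;
    double m_b;
};
```

The ordering operator allows disjoint regions to be stored in an `std::set`, which is how `FMMProposal` exposes its regions through iterators.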
## Region on Real-Valued Support with Constant Majorizer {#sec-api-realconstregion}

`RealConstRegion` is a subclass of `Region`, defined in @sec-api-region, specifically for univariate problems with continuous support where $\overline{w}(x) = \sum_{j=1}^N \overline{w}_j \ind\{x \in \mathscr{D}_j\}$ is constructed from constants $\overline{w}_1, \ldots, \overline{w}_N$. It assumes a constant minorizer $\underline{w}(x) = \sum_{j=1}^N \underline{w}_j \ind\{x \in \mathscr{D}_j\}$ constructed from constants $\underline{w}_1, \ldots, \underline{w}_N$. The $\overline{w}_j$ and $\underline{w}_j$ are obtained using numerical optimization by default; however, user-supplied functions can be provided to identify these values when it is possible to do so directly.

### Constructors {.unlisted #sec-api-realconstregion-constructors}

`RealConstRegion` has the following constructors. The first creates a region based on the interval $(a,b]$; the second creates a region based on the singleton set $\{a\}$, which is intended primarily for internal use.

```{cpp}
RealConstRegion(
    double a,                                 // <1>
    double b,                                 // <2>
    const dfdb& w,                            // <3>
    const UnivariateHelper& helper,           // <4>
    const optimizer& maxopt = maxopt_default, // <5>
    const optimizer& minopt = minopt_default, // <6>
    const midpoint& mid = midpoint_default    // <7>
);

RealConstRegion(
    double a,                                 // <1>
    const dfdb& w,                            // <3>
    const UnivariateHelper& helper,           // <4>
    const optimizer& maxopt = maxopt_default, // <5>
    const optimizer& minopt = minopt_default, // <6>
    const midpoint& mid = midpoint_default    // <7>
);
```
1. Lower limit of the interval that defines this region.
2. Upper limit of the interval that defines this region.
3. Weight function $w$ for the target distribution.
4. A container with operations of the base distribution $g$.
5. A function of type `optimizer` that maximizes `w` on the given region.
6. A function of type `optimizer` that minimizes `w` on the given region.
7. A function of type `midpoint` to compute the midpoint of the region's interval.
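To make the `w` argument concrete: as used later in this vignette, a `dfdb` is invoked as `w(x, log)`, returning the weight at `x` on the log-scale when `log = true`. A minimal self-contained sketch of such a weight function, substituting `std::function` for the package's typedef (the name `dfdb_like` and the example weight $w(x) = e^{-x^2}$ are illustrative only), is:

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Stand-in for the vws::dfdb typedef: a callable f(x, log) -> double.
using dfdb_like = std::function<double(double, bool)>;

// Weight function w(x) = exp(-x^2), computed on the log-scale internally
// to avoid underflow for large |x|; exponentiated only on request.
dfdb_like w = [](double x, bool log_scale) {
    double out = -x * x; // log w(x)
    return log_scale ? out : std::exp(out);
};
```

Writing the lambda so that the log-scale value is the native representation follows the package's convention of carrying out calculations on the log-scale wherever possible.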
The default optimizers, `maxopt_default` and `minopt_default`, use the hybrid numerical optimization method described in @sec-api-opt to optimize `w`. The function `midpoint_default` is a version of the arithmetic midpoint with special handling for infinite endpoints. It is used primarily to bifurcate a given region into two smaller regions. In particular, the current implementation is
\begin{align*}
\text{midpoint}(a,b) =
\begin{cases}
0, & \text{if $a = -\infty$ and $b = \infty$}, \\
b \cdot 2^{-\text{sign}(b)} - 1, & \text{if $a = -\infty$ and $b < \infty$}, \\
a \cdot 2^{\text{sign}(a)} + 1, & \text{if $a > -\infty$ and $b = \infty$}, \\
(a + b) / 2, & \text{otherwise},
\end{cases}
\end{align*}
where $\text{sign}(x) = \ind(x > 0) - \ind(x < 0)$. If both endpoints are infinite, zero is returned. When exactly one endpoint is infinite, the finite endpoint is moved toward the infinite endpoint, first multiplicatively (by a factor of two or one half, depending on its sign) and then additively (by one unit). The multiplicative factor helps to avoid large numbers of wasted regions during knot selection. The additive factor accommodates endpoints that are near zero; without it, $\text{midpoint}(a,b)$ would barely move a finite endpoint that is close to zero, which is likely to create wasteful bifurcations.

### Methods {.unlisted}

The `midpoint` method returns a midpoint for the current region.

```{cpp}
double RealConstRegion::midpoint() const
```

The `bifurcate` method returns two disjoint regions whose union is the current region. The result is given as an STL pair. The first version bifurcates at the midpoint of the current region, determined by the `midpoint` method. The second version partitions at the given `x`.

```{cpp}
std::pair<RealConstRegion, RealConstRegion> bifurcate() const;
std::pair<RealConstRegion, RealConstRegion> bifurcate(const double& x) const;
```

The `is_bifurcatable` function always returns `true` in the case of a continuous support.

```{cpp}
bool is_bifurcatable() const;
```

The `singleton` method returns a singleton interval $(x, x]$, using the current object's member data.
```{cpp}
RealConstRegion singleton(const double& x) const;
```

The following methods determine an ordering of the current region and another region specified as argument `x`. Region $(a_1, b_1]$ is considered "less than" $(a_2, b_2]$ if $b_1 \leq a_2$. The regions are considered equal if $a_1 = a_2$ and $b_1 = b_2$. Note that other elements such as $w$ and $g$ are not explicitly checked, and are assumed to be the same.

```{cpp}
bool operator<(const RealConstRegion& x) const;
bool operator==(const RealConstRegion& x) const;
```

The following method assigns the current region to be equal to the argument `x`.

```{cpp}
const RealConstRegion& operator=(const RealConstRegion& x);
```

## Region on Integer-Valued Support with Constant Majorizer {#sec-api-intconstregion}

The class `IntConstRegion` is a subclass of `RealConstRegion` that is specialized to integer-valued distributions.

```{cpp}
class IntConstRegion : public RealConstRegion { ... };
```

### Constructors {.unlisted}

`IntConstRegion` has the following constructors.

```{cpp}
IntConstRegion(
    double a,                                 // <1>
    double b,                                 // <2>
    const dfdb& w,                            // <3>
    const UnivariateHelper& helper,           // <4>
    const optimizer& maxopt = maxopt_default, // <5>
    const optimizer& minopt = minopt_default, // <6>
    const midpoint& mid = midpoint_default    // <7>
);

IntConstRegion(
    double a,                                 // <1>
    const dfdb& w,                            // <3>
    const UnivariateHelper& helper,           // <4>
    const optimizer& maxopt = maxopt_default, // <5>
    const optimizer& minopt = minopt_default, // <6>
    const midpoint& mid = midpoint_default    // <7>
);
```
1. Lower limit of the interval that defines this region.
2. Upper limit of the interval that defines this region.
3. Weight function $w$ for the target distribution.
4. A container with operations of the base distribution $g$.
5. A function of type `optimizer` that maximizes `w` on the given region.
6. A function of type `optimizer` that minimizes `w` on the given region.
7. A function of type `midpoint` to compute the midpoint of the region's interval.
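Because the endpoints `a` and `b` are `double`s, deciding whether an integer region is worth splitting comes down to how many integers fall in $(a, b]$. As a self-contained sketch of that check (illustrative only; this is not the package's implementation, and the helper names are hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Number of integers in the half-open interval (a, b], where a and b
// need not themselves be integers. E.g., (0.5, 2.5] contains {1, 2}.
long count_integers(double a, double b)
{
    return static_cast<long>(std::floor(b) - std::floor(a));
}

// A bifurcation is only useful if each resulting piece can receive at
// least one support point, i.e. (a, b] contains at least two integers.
bool is_bifurcatable_int(double a, double b)
{
    return count_integers(a, b) >= 2;
}
```

A region containing a single integer fails this check: one of the two pieces produced by a split would contain no points of the support.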
The arguments here are analogous to those in @sec-api-realconstregion-constructors. Note that `a` and `b` need not be integers here.

### Methods {.unlisted}

The following methods are specialized from `RealConstRegion`.

```{cpp}
std::pair<IntConstRegion, IntConstRegion> bifurcate() const;
std::pair<IntConstRegion, IntConstRegion> bifurcate(const double& x) const;
```

The `is_bifurcatable` function returns `false` for regions whose width is smaller than $1$.

```{cpp}
bool is_bifurcatable() const;
```

The `singleton` method returns a singleton interval $(x, x]$, using the current object's weight function, base distribution, etc.

```{cpp}
IntConstRegion singleton(const double& x) const;
```

The following methods determine an ordering of the current region and another region specified as argument `x`. Region $(a_1, b_1]$ is considered "less than" $(a_2, b_2]$ if $b_1 \leq a_2$. The regions are considered equal if $a_1 = a_2$ and $b_1 = b_2$. Note that other elements such as $w$ and $g$ are not explicitly checked, and are assumed to be the same.

```{cpp}
bool operator<(const IntConstRegion& x) const;
bool operator==(const IntConstRegion& x) const;
```

The following method assigns the current region to be equal to the argument `x`.

```{cpp}
const IntConstRegion& operator=(const IntConstRegion& x);
```

## Univariate Helper {#sec-api-helper}

The class `UnivariateHelper` is intended for use with `RealConstRegion` and `IntConstRegion`. It encapsulates several operations needed from the base distribution $g$. These operations are specified as lambdas in the constructor.

```{cpp}
UnivariateHelper(
    const fntl::density& d,  // <1>
    const fntl::cdf& p,      // <2>
    const fntl::quantile& q  // <3>
);
```
1. A function to evaluate the density $g$.
2. A function to evaluate the cumulative distribution function (CDF) $G$.
3. A function to evaluate the quantile function $G^{-}$.

The following methods on `UnivariateHelper` utilize the lambdas specified above.
```{cpp}
double d(double x, bool log = false) const;                    // <1>
double p(double q, bool lower = true, bool log = false) const; // <2>
double q(double p, bool lower = true, bool log = false) const; // <3>
```
1. Evaluate the density function at argument $x$. Result is on the log-scale if `log = true`.
2. Evaluate the CDF at argument $q$. Result is on the log-scale if `log = true`. Result represents $P(X \leq q)$ if `lower = true` and $P(X > q)$ otherwise.
3. Evaluate the quantile function at argument $p$. Assume $p$ is specified on the log-scale if `log = true`. Request the $p$ quantile if `lower = true` and the $1-p$ quantile otherwise.

## Rejection Sampling {#sec-api-rejection}

The following functions perform rejection sampling using a VWS proposal described in @sec-api-proposal.

```{cpp}
template <typename T, typename R>
rejection_result<T> rejection(
    const FMMProposal<T, R>& h, // <1>
    unsigned int n,             // <2>
    const rejection_args& args  // <3>
);

template <typename T, typename R>
rejection_result<T> rejection(
    const FMMProposal<T, R>& h, // <1>
    unsigned int n              // <2>
);
```
1. An `FMMProposal` object to use as the proposal.
2. The number of desired draws.
3. Additional arguments for rejection sampling. Default values are assumed in the second form.

Template arguments `T` and `R` correspond to the given proposal `h` and are described in @sec-api-proposal. Additional arguments to `rejection` are provided via the following struct.

```{cpp}
struct rejection_args
{
    unsigned int max_rejects = std::numeric_limits<unsigned int>::max(); // <1>
    unsigned int report = std::numeric_limits<unsigned int>::max();      // <2>
    double ratio_ub = std::exp(1e-5);                                    // <3>
    fntl::error_action action = fntl::error_action::STOP;                // <4>
    rejection_args() { };                                                // <5>
    rejection_args(SEXP obj);                                            // <6>
    operator SEXP() const;                                               // <7>
};
```
1. Maximum number of rejections to tolerate overall (among all $n$ attempted draws) before bailing out.
2. Determines the period at which progress is logged to the console.
3.
Maximum allowed value for the ratio $f_0(x) / h_0(x)$, which may be slightly larger than 1 due to numerical precision. An error is thrown if the ratio is larger than this value.
4. The action to take if `max_rejects` is exceeded; see @tbl-action. The definition of the enum `fntl::error_action` is given in Section 2 of @fntl.
5. Constructor that takes no arguments and initializes to default values.
6. Convert an `Rcpp::List` to a `rejection_args` struct.
7. Return an `Rcpp::List` from a `rejection_args` struct.

| Action | Effect |
|-----|-----|
`fntl::error_action::NONE` | Error condition is ignored.
`fntl::error_action::MESSAGE` | A message is emitted without halting.
`fntl::error_action::WARNING` | A warning is emitted without halting.
`fntl::error_action::STOP` | Halts when an error condition is encountered.

: Value of `action` used in `rejection_args.action` and its effect in `rejection`. {#tbl-action tbl-colwidths="\[35,65\]"}

The return value of `rejection` is a struct of the following type.

```{cpp}
template <typename T>
struct rejection_result
{
    std::vector<T> draws;              // <1>
    std::vector<unsigned int> rejects; // <2>
    operator SEXP() const;             // <3>
};
```
1. Vector of draws.
2. Vector of rejection counts; the $i$th element represents the number of rejections observed before accepting the $i$th draw.
3. Return an `Rcpp::List` from a `rejection_result` struct.

If the maximum number of rejections is reached and `action` is not `fntl::error_action::STOP`, the lengths of the vectors `draws` and `rejects` will be less than $n$.

## Optimization on an Interval {#sec-api-opt}

The function `optimize_hybrid` is a hybrid optimization method for univariate functions $f : [a,b] \rightarrow \mathbb{R}$ whose bounds $a$ and $b$ may be finite or infinite. It uses Brent's method if both bounds are finite and BFGS otherwise; in the latter case, the bounds are enforced via a transformation.
```{cpp}
optimize_hybrid_result optimize_hybrid(
    const fntl::dfd& f,       // <1>
    double init,              // <2>
    double lower,             // <3>
    double upper,             // <4>
    bool maximize,            // <5>
    unsigned maxiter = 100000 // <6>
);
```
1. Objective function.
2. Initial value used with BFGS.
3. Lower bound $a$ of domain $[a,b]$.
4. Upper bound $b$ of domain $[a,b]$.
5. Logical; if `true`, optimization is a maximization. Otherwise it is a minimization.
6. Maximum number of iterations.

The result is an `optimize_hybrid_result` struct, which is defined as follows.

```{cpp}
struct optimize_hybrid_result
{
    double par;            // <1>
    double value;          // <2>
    std::string method;    // <3>
    int status;            // <4>
    operator SEXP() const; // <5>
};
```
1. Final value of the optimization variable $x$.
2. Final value of the objective function $f(x)$.
3. Description of the method used to obtain the result; see @tbl-method for possible values.
4. A status code from BFGS when it is used as the `method`; otherwise $0$.
5. Return an `Rcpp::List` from an `optimize_hybrid_result` struct.

| Method | Description |
|-----|-----|
| `"Brent"` | Brent optimization method was used. |
| `"BFGS"` | BFGS method was used. |
| `"Lower Limit Inf"` | For maximization, lower limit taken as `par` because its value is `inf`. |
| `"Upper Limit Inf"` | For maximization, upper limit taken as `par` because its value is `inf`. |
| `"Lower Limit NegInf"` | For minimization, lower limit taken as `par` because its value is `-inf`. |
| `"Upper Limit NegInf"` | For minimization, upper limit taken as `par` because its value is `-inf`. |
| `"Max at Lower Limit"` | Numerical maximization used, but larger value found at lower limit. |
| `"Max at Upper Limit"` | Numerical maximization used, but larger value found at upper limit. |
| `"Min at Lower Limit"` | Numerical minimization used, but smaller value found at lower limit. |
| `"Min at Upper Limit"` | Numerical minimization used, but smaller value found at upper limit. |
: Possible values for `method` field of `optimize_hybrid_result` struct. {#tbl-method tbl-colwidths="\[28,72\]"}

## Log-Scale Arithmetic {#sec-api-logscale}

As mentioned in @rem-logscale, calculations in the package are carried out on the log-scale, where possible, to avoid issues from floating point numbers with very small or very large magnitudes. Users may want to follow this convention when implementing their own sampling problems. Several included functions help to avoid explicit exponentiation.

The following computes the scalar $f(x) = \log\{ \sum_{i=1}^n \exp(x_i) \}$ from a vector $x \in \mathbb{R}^n$ using the method described in [this StackExchange post](https://stats.stackexchange.com/a/381937).

```{cpp}
double log_sum_exp(const Rcpp::NumericVector& x);
```

The following functions carry out addition on the log-scale using the property
\begin{displaymath}
\log(e^x + e^y) = t + \log\{1 + \exp[s - t]\},
\quad s = \min(x,y),
\quad t = \max(x,y).
\end{displaymath}
The first form takes scalar arguments $x$ and $y$. The second and third forms take $x, y \in \mathbb{R}^n$ and produce a vector with elements $\log(e^{x_i} + e^{y_i})$, $i = 1, \ldots, n$.

```{cpp}
double log_add2_exp(double x, double y);

std::vector<double> log_add2_exp(
    const std::vector<double>& x,
    const std::vector<double>& y
);

Rcpp::NumericVector log_add2_exp(
    const Rcpp::NumericVector& x,
    const Rcpp::NumericVector& y
);
```

The following functions carry out subtraction on the log-scale using the property
\begin{displaymath}
\log(e^x - e^y) = x + \log\{1 - \exp[y - x]\}.
\end{displaymath}
When $x$ is smaller than $y$, the result is `nan`. The first form takes scalar arguments $x$ and $y$. The second and third forms take $x, y \in \mathbb{R}^n$ and produce a vector with elements $\log(e^{x_i} - e^{y_i})$, $i = 1, \ldots, n$.
```{cpp}
double log_sub2_exp(double x, double y);

std::vector<double> log_sub2_exp(
    const std::vector<double>& x,
    const std::vector<double>& y
);

Rcpp::NumericVector log_sub2_exp(
    const Rcpp::NumericVector& x,
    const Rcpp::NumericVector& y
);
```

## Generating from a Discrete Distribution {#sec-api-discrete}

We make use of the Gumbel trick [e.g., @HuijbenEtAl2023] to draw from a discrete distribution with probabilities $p_1, \ldots, p_k$. This approach allows probabilities to be specified on the log-scale without the need to exponentiate or normalize them so that they sum to $1$. The Gumbel trick generates a draw from the desired discrete distribution via
$$
X = \argmax \{ Z_1 + \log p_1, \ldots, Z_k + \log p_k \},
\quad Z_1, \ldots, Z_k \iid \text{Gumbel}(0,1),
$$
where $\text{Gumbel}(0,1)$ is a standard Gumbel distribution with density $f(x) = e^{-(x + e^{-x})}$, $x \in \mathbb{R}$.

### Categorical Distribution {.unlisted}

The following functions generate a draw of $X$. The first form generates a single variate and the second form generates a sample of size $n$.

```{cpp}
unsigned int r_categ(
    const Rcpp::NumericVector& p, // <2>
    bool log = false,             // <3>
    bool one_based = false        // <4>
);

Rcpp::IntegerVector r_categ(
    unsigned int n,               // <1>
    const Rcpp::NumericVector& p, // <2>
    bool log = false,             // <3>
    bool one_based = false        // <4>
);
```
1. Desired sample size.
2. Vector of probabilities $p_1, \ldots, p_k$.
3. Logical; if `true`, argument `p` is interpreted as $\log p_1, \ldots, \log p_k$. Otherwise, it is interpreted as $p_1, \ldots, p_k$ on the original scale.
4. Logical; if `true`, the support is assumed to be $\{ 1, \ldots, k \}$, where $k$ is the length of the given `p`. Otherwise it is assumed to be $\{ 0, \ldots, k-1 \}$. The latter is useful for generating indices in C++, while the former is useful for indices in R.

### Gumbel Distribution {.unlisted}

The following functions are provided for the Gumbel distribution with location parameter $\mu$ and scale $\sigma$.
They compute the density, CDF, quantile, and variate generation, respectively. The first group operates on scalars; the second group contains vectorized versions which operate on an independent and identically distributed sample.

```{cpp}
double d_gumbel(
    double x,          // <1>
    double mu = 0,     // <5>
    double sigma = 1,  // <6>
    bool log = false   // <7>
);

double p_gumbel(
    double q,          // <2>
    double mu = 0,     // <5>
    double sigma = 1,  // <6>
    bool lower = true, // <8>
    bool log = false   // <7>
);

double q_gumbel(
    double p,          // <3>
    double mu = 0,     // <5>
    double sigma = 1,  // <6>
    bool lower = true, // <8>
    bool log = false   // <7>
);

double r_gumbel(
    double mu = 0,     // <5>
    double sigma = 1   // <6>
);

Rcpp::NumericVector d_gumbel(
    const Rcpp::NumericVector& x, // <1>
    double mu = 0,                // <5>
    double sigma = 1,             // <6>
    bool log = false              // <7>
);

Rcpp::NumericVector p_gumbel(
    const Rcpp::NumericVector& q, // <2>
    double mu = 0,                // <5>
    double sigma = 1,             // <6>
    bool lower = true,            // <8>
    bool log = false              // <7>
);

Rcpp::NumericVector q_gumbel(
    const Rcpp::NumericVector& p, // <3>
    double mu = 0,                // <5>
    double sigma = 1,             // <6>
    bool lower = true,            // <8>
    bool log = false              // <7>
);

Rcpp::NumericVector r_gumbel(
    unsigned int n,   // <4>
    double mu = 0,    // <5>
    double sigma = 1  // <6>
);
```
1. Argument to the density.
2. Argument to the CDF.
3. Argument to the quantile function.
4. Number of draws.
5. Location parameter.
6. Scale parameter.
7. Logical; if `true`, probability arguments and return values are on the log-scale.
8. Logical; if `true`, probability arguments and return values are lower-tailed in the form $P[X \leq x]$; otherwise, $P[X > x]$.

# Example: Von Mises-Fisher Distribution {#sec-vmf}

Let us consider generating from the von Mises-Fisher (VMF) distribution as in @VWS2025. VMF is useful in modeling directional data whose support is the sphere $\mathbb{S}^{d-1} = \{ \vec{v} \in \mathbb{R}^d : \vec{v}^\top \vec{v} = 1 \}$ [e.g., @MardiaJupp1999].
A random variable $\vec{V}$ with distribution $\text{VMF}_d(\vec{\mu}, \kappa)$ has density \begin{align*} f_{\text{VMF}}(\vec{v}) = \frac{ \kappa^{d / 2 - 1} }{ (2 \pi)^{d/2} I_{d/2 - 1}(\kappa) } \exp(\kappa \cdot \vec{\mu}^\top \vec{v}) \cdot \ind\{\vec{v} \in \mathbb{S}^{d-1}\}, \end{align*} with modified Bessel function of the first kind \begin{math} I_{\nu}(x) = \sum_{m=0}^\infty \{m! \cdot \Gamma(m + \nu + 1)\}^{-1} (\frac{x}{2})^{2m + \nu} \end{math} and gamma function $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t} dt$. Parameters $\vec{\mu} \in \mathbb{S}^{d-1}$ and $\kappa > 0$ determine the orientation on the sphere and the concentration, respectively. First consider $\vec{\mu}_0 = (1, 0, \ldots, 0)$. A random variable $\vec{V}_0 \sim \text{VMF}_d(\vec{\mu}_0, \kappa)$ can be obtained from the transformation \begin{align} \vec{V}_0 = \left( X, \sqrt{1 - X^2} \cdot \vec{U} \right), \label{eqn:vmf-tx} \end{align} where $\vec{U}$ is a uniform random variable on the sphere $\mathbb{S}^{d-2}$ and $X$ has density \begin{align} f(x) = \frac{ (\kappa / 2)^{d/2 - 1} (1 - x^2)^{(d-3)/2} \exp(\kappa x) }{ \sqrt{\pi} \cdot I_{d/2 - 1}(\kappa) \cdot \Gamma((d-1)/2) } \cdot \ind(-1 < x < 1). \label{eqn:vmf-target} \end{align} To obtain a draw from $\vec{V} \sim \text{VMF}_d(\vec{\mu}, \kappa)$ with an arbitrary $\vec{\mu}$, we can rotate $\vec{V} = \vec{Q} \vec{V}_0$ using an orthonormal matrix $\vec{Q}$ whose first column is $\vec{\mu}$. To generate from $\vec{V}_0$ in \eqref{eqn:vmf-tx}, we may draw $\vec{U} = \vec{Z} / \sqrt{\vec{Z}^\top \vec{Z}}$ from $\vec{Z} \sim \text{N}(\vec{0}, \vec{I}_{d-1})$ and $X$ independently from \eqref{eqn:vmf-target}. In the following, we consider the use of VWS to draw the univariate random variable $X$. Before proceeding, we identify a useful distribution. 
::: {#def-texp}
Denote $X \sim \text{Exp}_{(a,b)}(\kappa)$ as a "doubly truncated" exponential random variable with density
\begin{align*}
g(x) = \frac{\kappa e^{\kappa x}}{e^{\kappa b} - e^{\kappa a}} \cdot \ind(a < x < b),
\end{align*}
where $-\infty < a < b < \infty$ and the rate $\kappa$ may be any real number. The CDF and quantile function corresponding to $g$ are
\begin{align*}
&G(x) = \frac{e^{\kappa x} - e^{\kappa a}}{e^{\kappa b} - e^{\kappa a}}, \quad x \in (a, b), \\
%
&G^{-1}(\varphi) = \frac{1}{\kappa} \log\left[e^{\kappa a} + \varphi (e^{\kappa b} - e^{\kappa a}) \right], \quad \varphi \in [0,1].
\end{align*}
\remarkqed
:::

Returning to target \eqref{eqn:vmf-target}, let us decompose $f$ into
\begin{align*}
f(x) \propto \underbrace{(1 - x^2)^{(d-3)/2}}_{w(x)} \underbrace{\exp(\kappa x) \cdot \ind(-1 < x < 1)}_{g_0(x)},
\end{align*}
excluding terms from the normalizing constant, so that $w$ is the weight function and $g_0$ is proportional to the density of $\text{Exp}_{(-1,1)}(\kappa)$,
\begin{align*}
g(x) = \frac{\kappa e^{\kappa x}}{e^\kappa - e^{-\kappa}} \cdot \ind(-1 < x < 1).
\end{align*}
@sec-vmf-const-default obtains a VWS sampler with this decomposition using a constant majorizer. @sec-vmf-const-custom replaces the default numerical optimization with custom code, which reduces the amount of computational overhead. @sec-vmf-linear considers a linear majorizer, which is more involved but obtains substantially lower rejection rates with a moderate number of regions. Code for this example is in the folder `examples/vmf`. C++ functions for the $\text{Exp}_{(a,b)}(\kappa)$ distribution from @def-texp are given in the file `examples/vmf/texp.h`.

::: {#rem-vmf-caveat}
A caveat of this decomposition is that, in the $d < 3$ case, $w(x) \rightarrow \infty$ as $x$ approaches $\pm 1$. One way to avoid this is by truncating the support to $(\alpha_0, \alpha_N] = (-1 + \epsilon, 1 - \epsilon]$ for a small $\epsilon > 0$.
Rejection sampling can proceed using the truncated support if the exclusion of $(-1, -1 + \epsilon] \cup (1 - \epsilon, 1]$ is known to have a negligible impact on the result. Otherwise, @VWS2025 mention another strategy where the support is initially truncated and gradually expanded as rejections are encountered. In this document, we assume $(\alpha_0, \alpha_N] = (-1, 1]$ for $d \geq 3$ and a fixed truncation $(\alpha_0, \alpha_N] = (-1 + \epsilon, 1 - \epsilon]$ for $d < 3$.
:::

```{r}
#| echo: false
#| prompt: true
Rcpp::sourceCpp("examples/vmf/functions.cpp")
```

## Constant Majorizer with Numerical Optimization {#sec-vmf-const-default}

We now give our first example demonstrating the `vws` package. We consider a constant majorizer for $w$ which uses the default numerical optimization routine to identify appropriate constants $\overline{w}_j$ and $\underline{w}_j$. Because this is our first example, the source file for the sampler (`examples/vmf/vmf-v1.cpp`) is displayed in its entirety as follows.

``` {.cpp include="examples/vmf/vmf-v1.cpp" code-line-numbers="true"}
```

1. Ensure that this code links with the `vws` and `fntl` packages during compilation.
2. Include the header for the C++ framework in `vws`.
3. Include `texp.h`, which provides functions described in @def-texp.
4. Define a C++ function which invokes the sampler and export it for use in R.
5. Prepare a struct with extra arguments for rejection sampling.
6. Define the weight function as a lambda. We are careful to avoid `nan` values that can occur when $x = \pm 1$. Computations are carried out on the log-scale to avoid numerical loss of precision.
7. Specify the density, CDF, and quantile function of the base distribution $\text{Exp}_{(-1,1)}(\kappa)$. The density, CDF, and quantile function types are defined in @sec-api-typedefs.
8. Create a "helper" object as a container for the distribution functions.
9. Construct a `RealConstRegion` which contains all problem-specific logic of the sampler.
We construct one initial region which contains the entire support $(-1,1]$.
10. Construct an `FMMProposal` based on our initial region `supp`. The first template argument specifies that the data type of the support is `double`; the second specifies that regions (which include the logic of the sampler) are of type `RealConstRegion`.
11. Request the proposal `h` to refine itself `N-1` times using Algorithm \ref{alg:refine} so that there are `N` regions.
12. Carry out rejection sampling with proposal `h`.
13. Assemble an `Rcpp::List` to return to the caller. It contains the draws in element `draws`, a vector of rejection counts in `rejects` where the $i$th element represents the number of rejections for the $i$th draw, and a vector in element `lbdd` with the $N$ bounds $\rho_+^{(1)}, \ldots, \rho_+^{(N)}$, where $\rho_+^{(j)}$ is the value achieved with $j$ regions.

The name of the function `r_vmf_pre_v1` reflects that this is our first version of the sampler for target \eqref{eqn:vmf-target}, which is a precursor to transformation \eqref{eqn:vmf-tx} to obtain a VMF random variable. The following R snippet builds the C++ code and invokes the sampling function.

```{r}
#| prompt: true
Rcpp::sourceCpp("examples/vmf/vmf-v1.cpp")
out1 = r_vmf_pre_v1(n = 1000, kappa = 5, d = 4, N = 50, tol = 0.10)
head(out1$draws)
```

@fig-vmf-const-refine plots the bound for the rejection probability during the refinement process, which is captured in the variable `out1$lbdd`. @fig-vmf-const-draws plots the empirical distribution of the draws and compares them to the density.

```{r}
#| out-width: 60%
#| fig-cap: |
#|   Refinement for VMF example with constant majorizer.
#| label: fig-vmf-const-refine
#| echo: false
df = data.frame(bdd = exp(out1$lbdd)) %>% mutate(step = row_number() - 1)
ggplot(df) +
    geom_line(aes(x = step, y = bdd)) +
    scale_x_continuous(breaks = df$step[df$step %% 2 == 0]) +
    scale_y_continuous(n.breaks = 10) +
    xlab("Step") +
    ylab("Bound") +
    theme_minimal()
```

```{r}
#| out-width: 60%
#| fig-cap: |
#|   Empirical distribution of draws (solid) versus target (dashed) for VMF
#|   example with constant majorizer.
#| label: fig-vmf-const-draws
#| echo: false
data.frame(x = out1$draws) %>%
    ggplot() +
    geom_density(aes(x = x), col = "black") +
    geom_function(fun = d_target, args = list(kappa = 5, d = 4), lty = 2) +
    ylab("Empirical Density") +
    theme_minimal()
```

## Constant Majorizer with Custom Optimization {#sec-vmf-const-custom}

It is not difficult to find the minimum and maximum of the function $w$ on a given interval for this example. We can reduce some of the computational burden by providing functions to compute these two values. We have
\begin{align*}
&\log w(x) = \frac{d-3}{2} \log(1 - x^2), \\
&\frac{d}{dx} \log w(x) = -(d-3) \frac{x}{1 - x^2}, \\
&\frac{d^2}{dx^2} \log w(x) = -(d-3) \frac{1 + x^2}{(1 - x^2)^2}.
\end{align*}
Therefore, it is seen that $\log w(x)$ is concave when $d > 3$, convex when $d < 3$, and constant otherwise. When $d > 3$, $\frac{d}{dx} \log w(x)$ is positive for $x \in (-1, 0)$, negative for $x \in (0, 1)$, and has root $x = 0$; therefore, $\log w(x)$ is unimodal on $(-1, 1)$ with a maximum at $x = 0$. When $d < 3$, the point $x = 0$ is instead a minimum of $\log w(x)$. Finally, $w$ is a constant in the case $d = 3$. The following code implements these maximization and minimization routines as lambdas.

```{cpp}
vws::optimizer opt1 = [&](const vws::dfdb& w, double lo, double hi, bool log) {
    double x = 0;
    if (hi < 0) { x = hi; } else if (lo > 0) { x = lo; }
    double out = w(x, true);
    return log ?
out : std::exp(out);
};
```

```{cpp}
vws::optimizer opt2 = [&](const vws::dfdb& w, double lo, double hi, bool log) {
	double w_lo = w(lo, true);
	double w_hi = w(hi, true);
	double out = std::min(w_lo, w_hi);
	return log ? out : std::exp(out);
};
```

The following snippet creates pointers `maxopt` and `minopt` for the maximizer and minimizer functions. The condition $d \geq 3$ is checked to determine which of `opt1` and `opt2` is the maximizer and which is the minimizer.

```{cpp}
vws::optimizer* maxopt;
vws::optimizer* minopt;
if (d >= 3) {
	maxopt = &opt1;
	minopt = &opt2;
} else {
	minopt = &opt1;
	maxopt = &opt2;
}
```

Finally, we create the initial region `supp`. The `maxopt` and `minopt` functions are provided to the constructor as additional arguments. Note that we dereference our pointers and pass the function objects themselves (which are taken as references by the constructor).

```{cpp}
vws::UnivariateHelper helper(df, pf, qf);
vws::RealConstRegion supp(-1, 1, w, helper, *maxopt, *minopt);
```

The following R snippet builds the C++ code and invokes the sampling function.

```{r}
#| prompt: true
Rcpp::sourceCpp("examples/vmf/vmf-v2.cpp")
out2 = r_vmf_pre_v2(n = 1000, kappa = 5, d = 4, N = 50, tol = 0.10)
head(out2$draws)
```

## Linear Majorizer {#sec-vmf-linear}

We noted in @sec-vmf-const-custom that $w$ is log-convex when $d < 3$, log-concave when $d > 3$, and a constant otherwise. Therefore, we can majorize $w$ with exponentiated linear functions of the form $\overline{w}_j(x) = \exp\{ \overline{\beta}_{j0} + \overline{\beta}_{j1} x \}$. This yields the expression
\begin{align*}
\overline{\xi}_j = \int_{\alpha_{j-1}}^{\alpha_j} \overline{w}_j(x) g(x) dx
= \frac{\kappa \exp\{ \overline{\beta}_{j0} \}}{(\kappa + \overline{\beta}_{j1}) (e^{\kappa \alpha_N} - e^{\kappa \alpha_0})} \left\{ \exp\{(\kappa + \overline{\beta}_{j1}) \alpha_j\} - \exp\{(\kappa + \overline{\beta}_{j1}) \alpha_{j-1} \} \right\}.
\end{align*} The proposal $h$ is then a finite mixture $h(x) = \sum_{j=1}^N \pi_j g_j(x)$ with \begin{align*} g_j(x) &= \overline{w}_j(x) g(x) \ind\{ x \in \mathscr{D}_j \} / \overline{\xi}_j \\ &= \frac{ (\kappa + \overline{\beta}_{j1}) \exp\{(\kappa + \overline{\beta}_{j1}) x\} }{ \exp\{(\kappa + \overline{\beta}_{j1}) \alpha_j\} - \exp\{(\kappa + \overline{\beta}_{j1}) \alpha_{j-1} \} } \cdot \ind(\alpha_{j-1} < x \leq \alpha_j), \end{align*} the density of $\text{Exp}_{(\alpha_{j-1}, \alpha_j]}(\kappa + \overline{\beta}_{j1})$. @rem-expansion-point is used to select the expansion point $c$ to determine coefficients for the majorizer in the log-concave case, with \begin{align*} M_j(s) = \int_{\alpha_{j-1}}^{\alpha_j} e^{sx} g(x) dx = \frac{e^{s\alpha_j} - e^{s\alpha_{j-1}}}{s(\alpha_j - \alpha_{j-1})}. \end{align*} To compute \eqref{eqn:bound}, we assume the "trivial" minorizer $\underline{w}_j(x) = w(x)$ so that \begin{align*} \underline{\xi}_j = \int_{\alpha_{j-1}}^{\alpha_j} w(x) g(x) dx = \int_{\alpha_{j-1}}^{\alpha_j} \frac{(1-x^2)^{(d-3)/2} \kappa e^{\kappa x}}{e^{\kappa \alpha_N} - e^{\kappa \alpha_0}} dx, \end{align*} which we compute using numerical integration. The proposal is implemented with the `vws` package by inheriting from the abstract `Region` base class (@sec-api-region) and implementing each of the functions using the expressions above. We name the resulting subclass `LinearVWSRegion`; its complete code is given in `examples/vmf/LinearVWSRegion.h`. The code in `examples/vmf/vmf-v3.cpp` instantiates a proposal with regions of type `LinearVWSRegion`, invokes rejection sampling, and returns an `Rcpp::List` with the results. This code is displayed below. ``` {.cpp include="examples/vmf/vmf-v3.cpp" code-line-numbers="true"} ``` The following R snippet builds the C++ code and invokes the sampling function. 
```{r}
#| prompt: true
Rcpp::sourceCpp("examples/vmf/vmf-v3.cpp")
out3 = r_vmf_pre_v3(n = 1000, kappa = 5, d = 4, N = 50, tol = 0.01)
head(out3$draws)
```

::: {#rem-vmf-endpoints}
This code will fail if we attempt to use it with $d = 2$. Here, $\log w(x) \rightarrow \infty$ as $x \rightarrow \pm 1$ so that the first and last regions cannot be bounded by an exponentiated linear form. This can be averted by truncating the support to $(-1 + \epsilon, 1 - \epsilon]$ for a small $\epsilon > 0$.
\remarkqed
:::

@fig-vmf-linear-draws plots the empirical distribution of the draws and compares them to the density. @fig-vmf-refine plots the bound for the rejection probability for this sampler after each step of refining, along with those from the previous two versions. A substantial improvement in efficiency is seen here; fewer regions are needed to achieve a small rejection probability. Although versions 1 and 2 of the sampler are based on the same proposal, the changes to their bounds are seen to differ due to the randomness in knot selection.

```{r}
#| out-width: 60%
#| fig-cap: |
#|   Empirical distribution of draws (solid) versus target (dashed) for VMF
#|   example with linear majorizer.
#| label: fig-vmf-linear-draws
#| echo: false
data.frame(x = out3$draws) %>%
  ggplot() +
  geom_density(aes(x = x), col = "black") +
  geom_function(fun = d_target, args = list(kappa = 5, d = 4), lty = 2) +
  ylab("Empirical Density") +
  theme_minimal()
```

```{r}
#| out-width: 60%
#| fig-cap: |
#|   Refinement for VMF example, comparing the three sampler versions.
#| label: fig-vmf-refine #| echo: false df1 = data.frame(sampler = "v1", bdd = exp(out1$lbdd)) %>% mutate(N = row_number()) df2 = data.frame(sampler = "v2", bdd = exp(out2$lbdd)) %>% mutate(N = row_number()) df3 = data.frame(sampler = "v3", bdd = exp(out3$lbdd)) %>% mutate(N = row_number() + 1) df = bind_rows(df1, df2, df3) %>% mutate(N = as.integer(N)) ggplot(df) + geom_line(aes(x = N, y = bdd, color = sampler), lwd = 1.02, alpha = 1) + geom_point(aes(x = N, y = bdd, pch = sampler, color = sampler), cex = 3, alpha = 1) + scale_x_continuous(breaks = df$N[df$N %% 2 == 0]) + scale_y_continuous(n.breaks = 10) + xlab("N") + ylab("Bound") + theme_minimal() ``` # Example: Lognormal-Normal Conditional Distribution {#sec-ln-norm} Suppose an official statistics agency collects sensitive data from respondents in a survey or census. The agency then seeks to disseminate a tabulation to data users while protecting the privacy of the respondents. The agency can consider adding random noise $\gamma$ to the tabulation $Y$ and releasing $Z = Y + \gamma$. This setting is the basis of a modeling scenario considered by @DirectSamplingDAS2021, @DPSimulation2022, and @DASMethods2025 where the distribution for $\gamma$ has been selected by the agency and the data have been disseminated; the objective in these works is to carry out inference on the unobserved $Y$ given observed $Z = z$. The field of differential privacy studies the design of noise mechanisms (including distributions of $\gamma$ in the present setting) to satisfy mathematical criteria for privacy [e.g., @DworkRoth2014]. @AbowdEtAl2022 describe work by the U.S. Census Bureau to implement differential privacy in the release of data from the decennial census. Our present motivation is to consider a simple but nontrivial sampling problem that arises in analysis of the released $z$. Suppose $Y$ and $\gamma$ are independently distributed with $Y \sim \text{Lognormal}(\mu, \sigma^2)$ and $\gamma \sim \text{N}(0, \lambda^2)$. 
The variance $\lambda^2$ is assumed to be known and provided with the noisy data. Such transparency into the noise mechanism is often featured in differential privacy. Suppose the target distribution is the conditional $[Y \mid Z = z, \mu, \sigma^2]$ whose density is given by \begin{align} f(y \mid z, \mu, \sigma^2) % &\propto \frac{1}{\lambda \sqrt{2\pi}} \exp\left\{ -\frac{1}{2\lambda^2} (z - y)^2 \right\} \cdot \frac{1}{y\sigma \sqrt{2\pi}} \exp\left\{ -\frac{1}{2\sigma^2} (\log y - \mu)^2 \right\} \ind(y > 0) \nonumber \\ % &\propto \underbrace{\frac{1}{\lambda \sqrt{2\pi}} \exp\left\{ -\frac{1}{2\lambda^2} (z - y)^2 \right\}}_{g(y)} \cdot \underbrace{\frac{1}{y} \exp\left\{ -\frac{1}{2\sigma^2} (\log y - \mu)^2 \right\} \ind(y > 0)}_{w(y)}. \label{eqn:ln-norm} \end{align} We have decomposed $f$ into weight function \begin{math} w(y) = \frac{1}{y} \exp\left\{ -\frac{1}{2\sigma^2} (\log y - \mu)^2 \right\} \ind\{ y \in (0,\infty) \}, \end{math} from the Lognormal component---dropping some of the terms in its normalizing constant---and base distribution whose density $g$ is $\text{N}(z, \lambda^2)$. The conditional $[\vec{y} \mid \sigma^2, \vec{\mu}, \vec{z}]$ in \eqref{eqn:ln-norm} would be encountered in a Gibbs sampler for the posterior distribution $[\vec{y}, \vec{\mu}, \sigma^2 \mid \vec{z}]$, based on an observed sample $\vec{z} = (z_1, \ldots, z_n)$ with augmented data $\vec{y} = (y_1, \ldots, y_n)$. The remaining conditionals $[\vec{\mu} \mid \sigma^2, \vec{y}, \vec{z}]$ and $[\sigma^2 \mid \vec{\mu}, \vec{y}, \vec{z}]$ may be straightforward to generate if a convenient prior distribution is selected; therefore, we will focus on \eqref{eqn:ln-norm}. Before proceeding, let us fix the following values for the parameters. ```{r} #| prompt: true mu = 5 sigma = sqrt(0.5) lambda = 10 ``` Jointly draw values of $Y$ and $Z$ from the model; $Z$ will be observed by users of the data while $Y$ is unobserved and the objective for inference. 
```{r}
#| prompt: true
source("examples/ln-norm/functions.R")
y_true = rlnorm(1, mu, sigma)
z = rnorm(1, y_true, lambda)
vws::printf("y_true = %g and z = %g\n", y_true, z)
```

Coding the target density is helpful to evaluate the distribution of the draws. To compute the normalizing constant, we use Gauss-Hermite quadrature via the `gauss.quad` function in the `statmod` package [@Smyth2005]. The integral $\psi = \int_{-\infty}^\infty q(x) e^{-x^2} dx$ is approximated as $\psi \approx \sum_{j=1}^Q \omega_j q(x_j)$ using quadrature points $x_1, \ldots, x_Q$ and weights $\omega_1, \ldots, \omega_Q$; to identify the function $q$, we have
\begin{align*}
\psi &= \int_{-\infty}^\infty w(y) g(y) dy \\
%
&= \int_{-\infty}^\infty \ind\{y > 0\} \cdot \frac{1}{y} \exp\left\{ -\frac{1}{2\sigma^2} (\log y - \mu)^2 \right\} \frac{1}{\lambda \sqrt{2\pi}} \exp\left\{ -\frac{1}{2\lambda^2} (z - y)^2 \right\} dy \\
%
&= \int_{-\infty}^\infty \ind\left( z > \sqrt{2} \lambda x \right) \cdot \frac{1}{z - \sqrt{2} \lambda x} \exp\left\{ -\frac{1}{2\sigma^2} \left[\log\left( z - \sqrt{2} \lambda x \right) - \mu \right]^2 \right\} \frac{1}{\sqrt{\pi}} e^{-x^2} dx,
\end{align*}
using the transformation $y = z - \sqrt{2} \lambda x$, so that
\begin{align*}
q(x) = \ind\left( z > \sqrt{2} \lambda x \right) \cdot \frac{1}{z - \sqrt{2} \lambda x} \exp\left\{ -\frac{1}{2\sigma^2} \left[\log\left( z - \sqrt{2} \lambda x \right) - \mu \right]^2 \right\} \frac{1}{\sqrt{\pi}}
\end{align*}
is used with `gauss.quad`. We will consider three variations of a VWS sampler, progressing from easier-to-implement to more computationally efficient. @sec-ln-norm-const-default considers a constant majorizer where the constant for each region is obtained by numerical optimization. @sec-ln-norm-const-custom replaces numerical optimization with code for a closed-form solution. @sec-ln-norm-linear makes use of a linear majorizer which subclasses the abstract `Region` class.
All codes for this example are in the folder `examples/ln-norm`. The function `d_target` defined in `examples/ln-norm/functions.R` computes the target density.

## Constant Majorizer with Numerical Optimization {#sec-ln-norm-const-default}

We may adapt the code from @sec-vmf-const-default to the target density in the present problem. The present weight function may be coded as follows.

```{cpp}
const vws::dfdb& w = [&](double x, bool log = true) {
	double out = R_NegInf;
	if (x > 0) {
		double sigma2 = std::pow(sigma, 2);
		out = -std::log(x) - std::pow(std::log(x) - mu, 2) / (2 * sigma2);
	}
	return log ? out : std::exp(out);
};
```

The base distribution's density, CDF, and quantile function are coded as lambdas and packaged into a `UnivariateHelper` object. Here we can make use of the implementations of the normal distribution in R's API.

```{cpp}
fntl::density df = [&](double x, bool log = false) {
	return R::dnorm(x, z, lambda, log);
};
fntl::cdf pf = [&](double q, bool lower = true, bool log = false) {
	return R::pnorm(q, z, lambda, lower, log);
};
fntl::quantile qf = [&](double p, bool lower = true, bool log = false) {
	return R::qnorm(p, z, lambda, lower, log);
};
vws::UnivariateHelper helper(df, pf, qf);
```

The following R snippet builds the C++ code and invokes the sampling function.

```{r}
#| prompt: true
Rcpp::sourceCpp("examples/ln-norm/ln-norm-v1.cpp")
out1 = r_ln_norm_v1(n = 1000, z, mu, sigma, lambda, N = 50, tol = 0.10)
head(out1$draws)
```

@fig-ln-norm-const-refine plots the bound for the rejection probability during the refinement process, which is captured in the variable `out$lbdd`. @fig-ln-norm-const-draws plots the empirical distribution of the draws and compares them to the density. It displays an interval based on the $0.025$ and $0.975$ quantiles of the distribution $[Y \mid Z = z]$ approximated from the empirical quantiles of the draws. The values of the observed $z$ and the latent $y$ are also highlighted for reference.
```{r} #| out-width: 60% #| fig-cap: | #| Refinement for lognormal-normal example with constant majorizer. #| label: fig-ln-norm-const-refine #| echo: false df = data.frame(bdd = exp(out1$lbdd)) %>% mutate(step = row_number() - 1) ggplot(df) + geom_line(aes(x = step, y = bdd)) + scale_x_continuous(breaks = df$step[df$step %% 2 == 0]) + scale_y_continuous(n.breaks = 10) + xlab("Step") + ylab("Bound") + theme_minimal() ``` ```{r} #| out-width: 60% #| fig-cap: | #| Empirical distribution of draws (solid black) versus target (dashed black) #| for lognormal-normal example with constant majorizer. Observed value of #| $z$ (blue line) and latent value of $y$ (red line) are displayed along #| with a 95\% interval (blue ribbon) based on draws from $[Y \mid Z = z]$. #| label: fig-ln-norm-const-draws #| echo: false interval_lo = quantile(out1$draws, probs = 0.025) interval_hi = quantile(out1$draws, probs = 0.975) data.frame(x = out1$draws) %>% ggplot() + geom_density(aes(x = x), col = "black") + geom_function(fun = d_target, lty = 2, args = list(mu = mu, sigma = sigma, z = z, lambda = lambda)) + annotate("rect", xmin = interval_lo, xmax = interval_hi, ymin = 0, ymax = Inf, alpha = 0.1, fill = "blue") + geom_vline(xintercept = z, col = "blue", lwd = 1.05) + geom_vline(xintercept = y_true, col = "red", lwd = 1.05) + ylab("Empirical Density") + theme_minimal() ``` ## Constant Majorizer with Custom Optimization {#sec-ln-norm-const-custom} The log-weight function can be both maximized and minimized in closed form. Coding it explicitly reduces computational overhead and avoids convergence issues from numerical optimization. This section will demonstrate how to override the default `optimize` method of `RealConstRegion`. We have first derivative \begin{align*} \frac{d}{dy} \log w(y) = -\frac{1}{y}\left( 1 + \frac{\log y - \mu}{\sigma^2} \right), \end{align*} for $y \in (0, \infty)$. 
Let $y^* = \exp(\mu - \sigma^2)$; it is seen that $\frac{d}{dy} \log w(y)$ is positive when $y < y^*$, negative when $y > y^*$, and takes value zero at $y = y^*$. Therefore, $\log w(y)$ is unimodal and $y^*$ maximizes $\log w(y)$ with \begin{align*} \log w(y^*) &= -\log[\exp(\mu - \sigma^2)] - \frac{(\log [\exp(\mu - \sigma^2)] - \mu)^2}{2 \sigma^2} \\ &= -\mu + \sigma^2 - \frac{(\sigma^2)^2}{2 \sigma^2} \\ &= -\mu + \sigma^2 / 2. \end{align*} Therefore, on a region $\mathscr{D}_j = (\alpha_{j-1}, \alpha_j]$ where both endpoints are smaller than $y^*$, the maximum of $\log w(y)$ is $\log w(\alpha_j)$ and the minimum is $\log w(\alpha_{j-1})$. On the other hand, for a region where both endpoints are larger than $y^*$, the maximum of $\log w(y)$ is $\log w(\alpha_{j-1})$ and the minimum is $\log w(\alpha_j)$. @fig-ln-norm-weight displays a plot of $\log w(y)$ with our selected $\mu$ and $\sigma^2$ values. ```{r} #| prompt: true y_star = exp(mu - sigma^2) w_star = -mu + sigma^2 / 2 vws::printf("Maximizer y = %g obtains value log w(%g) = %g.\n", y_star, y_star, w_star) ``` ```{r} #| out-width: 60% #| fig-cap: | #| Weight function for Lognormal-Normal example on the log-scale, with the #| maximizer $y^* = \exp(\mu - \sigma^2)$ highlighted. #| label: fig-ln-norm-weight #| echo: false xlim = y_star + c(-10,10) w_one = function(y, log = T) { out = -Inf if (y > 0) { out = -log(y) - (log(y) - mu)^2 / (2*sigma^2) } if (log) { return(out) } else { return(exp(out)) } } w = Vectorize(w_one, vectorize.args = "y") ggplot() + geom_function(fun = w, xlim = xlim) + geom_vline(xintercept = y_star, lty = 2) + geom_hline(yintercept = w_star, lty = 2) + xlab("y") + ylab(expression("log w(y)")) + theme_minimal() ``` The maximizer may be coded as follows. 
```{cpp}
const vws::optimizer& maxopt = [&](const vws::dfdb& w, double lo, double hi, bool log) {
	double y_star = std::exp(mu - sigma2);
	double y = y_star;
	if (y_star > hi) { y = hi; } else if (y_star < lo) { y = lo; }
	double out = w(y, true);
	return log ? out : std::exp(out);
};
```

Here is code for the minimizer.

```{cpp}
const vws::optimizer& minopt = [&](const vws::dfdb& w, double lo, double hi, bool log) {
	double lwa = w(lo, true);
	double lwb = w(hi, true);
	double out = std::min(lwa, lwb);
	return log ? out : std::exp(out);
};
```

We can construct the proposal using a single `RealConstRegion` that represents the support; the `maxopt` and `minopt` arguments are specified here.

```{cpp}
vws::RealConstRegion supp(0, R_PosInf, w, helper, maxopt, minopt);
```

The following R snippet builds the C++ code and invokes the sampling function.

```{r}
#| prompt: true
Rcpp::sourceCpp("examples/ln-norm/ln-norm-v2.cpp")
out2 = r_ln_norm_v2(n = 1000, z, mu, sigma, lambda, N = 50, tol = 0.10)
head(out2$draws)
```

## Linear Majorizer {#sec-ln-norm-linear}

We can obtain a linear majorizer by noting the convexity of $\log w(y)$. Its second derivative is seen to be
\begin{align*}
&\frac{d^2}{dy^2} \log w(y) = \frac{1}{y^2} \left( 1 + \frac{\log y - \mu - 1}{\sigma^2} \right),
\end{align*}
which has root $y_0 = \exp\{ \mu - \sigma^2 + 1 \}$. The weight function $w$ is log-concave for $y < y_0$ and log-convex for $y > y_0$. This is plotted in @fig-ln-norm-weight-convexity. We will assume that there are two initial regions $\mathscr{D}_1 = (0, y_0]$ and $\mathscr{D}_2 = (y_0, \infty)$ so that all partitions considered thereafter consist of regions on which $\log w(y)$ is entirely concave or convex.

```{r}
#| prompt: true
y0 = exp(mu - sigma^2 + 1)
vws::printf("Convexity changes at y = %g.\n", y0)
```

```{r}
#| out-width: 60%
#| fig-cap: |
#|   Weight function for Lognormal-Normal example on the log-scale,
#|   highlighting $y_0 = \exp(\mu - \sigma^2 + 1)$ where there is a change in
#|   convexity.
#| label: fig-ln-norm-weight-convexity #| echo: false xlim = c(0, 2000) ggplot() + geom_function(fun = w, xlim = xlim) + geom_vline(xintercept = y0, lty = 2) + xlab("y") + ylab(expression("log w(y)")) + theme_minimal() ``` As in @sec-vmf-linear, we can majorize $w$ with exponentiated linear functions of the form $\overline{w}_j(y) = \exp\{ \overline{\beta}_{j0} + \overline{\beta}_{j1} y \}$. Here, we also assume a minorizer of the form $\underline{w}_j(y) = \exp\{ \underline{\beta}_{j0} + \underline{\beta}_{j1} y \}$. Before proceeding, the following remark gives an integral that will be used several times. ::: {#rem-normal-mgf-integral} Let $\phi(\cdot \mid \mu, \sigma^2)$ and $\Phi(\cdot \mid \mu, \sigma^2)$ be the density and CDF of $Y \sim \text{N}(\mu, \sigma^2)$, respectively. If $a < b$ are scalars (possibly infinite), then \begin{align*} \int_a^b e^{ty} \phi(y \mid \mu, \sigma^2) dy &= \exp(\mu t + t^2 \sigma^2 / 2) \left\{ \Phi(b \mid \mu + t \sigma^2, \sigma^2) - \Phi(a \mid \mu + t \sigma^2, \sigma^2) \right\} \\ &= \exp(\mu t + t^2 \sigma^2 / 2) \Prob(a < T \leq b). \end{align*} where $T \sim \text{N}(\mu + t \sigma^2, \sigma^2)$. The special case $a = -\infty$ and $b = \infty$ yields the moment-generating function $M(t) = \exp(\mu t + t^2 \sigma^2 / 2)$ of $Y$. 
\remarkqed
:::

@rem-normal-mgf-integral yields the expressions
\begin{align*}
\overline{\xi}_j &= \int_{\alpha_{j-1}}^{\alpha_j} \overline{w}_j(y) g(y) dy \\
%
&= \exp\{ \overline{\beta}_{j0} \} \int_{\alpha_{j-1}}^{\alpha_j} \exp\{ \overline{\beta}_{j1} y \} \frac{1}{\lambda \sqrt{2\pi}} \exp\left\{ -\frac{1}{2 \lambda^2} (y - z)^2 \right\} dy \\
%
&= \exp\left\{ \overline{\beta}_{j0} + z \overline{\beta}_{j1} + \frac{1}{2} \overline{\beta}_{j1}^2 \lambda^2 \right\} \Prob(\alpha_{j-1} < \overline{T} \leq \alpha_j),
\end{align*}
where $\overline{T} \sim \text{N}(z + \overline{\beta}_{j1} \lambda^2, \lambda^2)$, and similarly,
\begin{align*}
\underline{\xi}_j &= \exp\left\{ \underline{\beta}_{j0} + z \underline{\beta}_{j1} + \frac{1}{2} \underline{\beta}_{j1}^2 \lambda^2 \right\} \Prob(\alpha_{j-1} < \underline{T} \leq \alpha_j),
\end{align*}
where $\underline{T} \sim \text{N}(z + \underline{\beta}_{j1} \lambda^2, \lambda^2)$. The proposal $h$ is a finite mixture $h(y) = \sum_{j=1}^N \pi_j g_j(y)$ with
\begin{align*}
g_j(y) &= \overline{w}_j(y) g(y) \ind\{ y \in \mathscr{D}_j \} / \overline{\xi}_j \\
%
&= \exp\{ \overline{\beta}_{j0} + \overline{\beta}_{j1} y \} \frac{1}{\lambda \sqrt{2\pi}} \exp\left\{ -\frac{1}{2 \lambda^2} (y - z)^2 \right\} \cdot \ind(\alpha_{j-1} < y \leq \alpha_j) / \overline{\xi}_j \\
%
&= \frac{ \frac{1}{\lambda \sqrt{2\pi}} \exp\left\{ -\frac{1}{2 \lambda^2} (y - z - \lambda^2 \overline{\beta}_{j1})^2 \right\} }{ \Prob(\alpha_{j-1} < \overline{T} \leq \alpha_j) } \cdot \ind(\alpha_{j-1} < y \leq \alpha_j),
\end{align*}
which is the density of $\text{N}(z + \lambda^2 \overline{\beta}_{j1}, \lambda^2)$ truncated to the interval $(\alpha_{j-1}, \alpha_j]$.
@rem-expansion-point is used to select the expansion point $c$ to determine coefficients for the majorizer in the log-concave case and the minorizer in the log-convex case, with
\begin{align*}
M_j(s) = \exp(z s + s^2 \lambda^2 / 2) \frac{ \Phi(\alpha_j \mid z + s \lambda^2, \lambda^2) - \Phi(\alpha_{j-1} \mid z + s \lambda^2, \lambda^2) }{ \Phi(\alpha_j \mid z, \lambda^2) - \Phi(\alpha_{j-1} \mid z, \lambda^2) }.
\end{align*}
Adapting the code from @sec-vmf-linear to the present problem, we implement `LinearVWSRegion` as a subclass of the abstract `Region` base class (@sec-api-region). See the file `examples/ln-norm/LinearVWSRegion.h`. The code in `examples/ln-norm/ln-norm-v3.cpp` instantiates a proposal from this region type and proceeds with rejection sampling. This code is displayed below.

``` {.cpp include="examples/ln-norm/ln-norm-v3.cpp" code-line-numbers="true"}
```

The following R snippet builds the C++ code and invokes the sampling function.

```{r}
#| prompt: true
Rcpp::sourceCpp("examples/ln-norm/ln-norm-v3.cpp")
out3 = r_ln_norm_v3(n = 1000, z, mu, sigma, lambda, lo = 1e-8, hi = 1e8, N = 50, tol = 0.10)
head(out3$draws)
```

@fig-ln-norm-linear-draws plots the empirical distribution of the draws and compares them to the density. @fig-ln-norm-refine plots the bound for the rejection probability for this sampler after each step of refining, along with those from the previous two versions. A substantial improvement in efficiency is seen here; fewer regions are needed to achieve a small rejection probability. Although versions 1 and 2 of the sampler are based on the same proposal, the changes to their bounds are seen to differ due to randomness in knot selection.

```{r}
#| out-width: 60%
#| fig-cap: |
#|   Empirical distribution of draws (solid) versus target (dashed) for
#|   lognormal-normal example with linear majorizer.
#| label: fig-ln-norm-linear-draws
#| echo: false
data.frame(x = out3$draws) %>%
  ggplot() +
  geom_density(aes(x = x), col = "black") +
  geom_function(fun = d_target, lty = 2, args = list(mu = mu, sigma = sigma, z = z, lambda = lambda)) +
  ylab("Empirical Density") +
  theme_minimal()
```

```{r}
#| out-width: 60%
#| fig-cap: |
#|   Refinement for lognormal-normal example, comparing the three sampler
#|   versions.
#| label: fig-ln-norm-refine
#| echo: false
df1 = data.frame(sampler = "v1", bdd = exp(out1$lbdd)) %>% mutate(N = row_number())
df2 = data.frame(sampler = "v2", bdd = exp(out2$lbdd)) %>% mutate(N = row_number())
df3 = data.frame(sampler = "v3", bdd = exp(out3$lbdd)) %>% mutate(N = row_number() + 1)
df = bind_rows(df1, df2, df3) %>% mutate(N = as.integer(N))
ggplot(df) +
  geom_line(aes(x = N, y = bdd, color = sampler), lwd = 1.02, alpha = 1) +
  geom_point(aes(x = N, y = bdd, pch = sampler, color = sampler), cex = 3, alpha = 1) +
  scale_x_continuous(breaks = df$N[df$N %% 2 == 0]) +
  scale_y_continuous(n.breaks = 10) +
  xlab("N") +
  ylab("Bound") +
  theme_minimal()
```

# Example: Bessel Count Distribution {#sec-bessel}

Where Sections [-@sec-vmf] and [-@sec-ln-norm] considered continuous distributions, we now generate from a discrete target. The Bessel distribution [@YuanKalbfleisch2000] has density
\begin{align*}
f(x) = \frac{(\lambda/2)^{2x+\nu}}{I_\nu(\lambda) \cdot x! \cdot \Gamma(x + \nu + 1)} \cdot \ind\{ x \in \mathbb{N} \}, \quad \nu > -1, \quad \lambda > 0,
\end{align*}
where $\mathbb{N} = \{ 0, 1, 2, \ldots \}$ is the set of nonnegative integers and where
\begin{align*}
I_\nu(\lambda) = \sum_{x=0}^\infty \frac{(\lambda/2)^{2x+\nu}}{x! \cdot \Gamma(x + \nu + 1)}
\end{align*}
is a modified Bessel function of the first kind. Let $X \sim \text{Bessel}(\lambda, \nu)$ denote a random variable with density $f$. @Devroye2002 develops a rejection sampling method to generate draws of $X$ using properties of the distribution.
Here we implement several VWS samplers which do not require as much insight. Our approach is to decompose the density into
\begin{align*}
f(x) &= \frac{(\lambda/2)^{2x+\nu}}{I_\nu(\lambda) \cdot x! \cdot \Gamma(x + \nu + 1)} \cdot \ind\{ x \in \mathbb{N} \} \\
%
&\propto \underbrace{\frac{1}{\Gamma(x + \nu + 1)}}_{w(x)} \underbrace{\frac{(\lambda/2)^{2x} e^{-\lambda^2 / 4}}{x!} \ind\{ x \in \mathbb{N} \}}_{g(x)},
\end{align*}
disregarding several terms in the normalizing constant, so that $g$ is the density of a $\text{Poisson}(\lambda^2 / 4)$ distribution and the weight function is specified by $\log w(x) = -\log \Gamma(x + \nu + 1)$. Before proceeding, let us fix values of $\lambda$ and $\nu$. Let us also load some useful R functions from the script `examples/bessel/bessel.R`; primarily, `d_bessel` which computes the target density.

```{r}
#| prompt: true
source("examples/bessel/bessel.R")
lambda = 10
nu = 2
```

```{r}
#| echo: false
source("examples/common/plots.R")
xseq = seq(0, 15)
fseq = d_bessel(xseq, lambda, nu)
lwseq = w(xseq, log = T)
```

Three variations of VWS samplers are considered in subsections. @sec-bessel-const-default uses a constant majorizer obtained by numerical optimization. @sec-bessel-const-custom replaces numerical optimization with a closed-form solution. @sec-bessel-linear uses a linear majorizer and subclasses the abstract `Region` class. All codes for this example are in the folder `examples/bessel`.

## Constant Majorizer with Numerical Optimization {#sec-bessel-const-default}

The weight function may be coded as a lambda in C++ as follows.

```{cpp}
const vws::dfdb& w = [&](double x, bool log = true) {
	double out = -std::lgamma(x + nu + 1);
	return log ? out : std::exp(out);
};
```

The base distribution's density, CDF, and quantile function may be supplied as a `UnivariateHelper` object using Poisson functions in the R API.
```{cpp}
fntl::density df = [&](double x, bool log = false) {
	return R::dpois(x, mean, log);
};
fntl::cdf pf = [&](double q, bool lower = true, bool log = false) {
	return R::ppois(q, mean, lower, log);
};
fntl::quantile qf = [&](double p, bool lower = true, bool log = false) {
	return R::qpois(p, mean, lower, log);
};
vws::UnivariateHelper helper(df, pf, qf);
```

The following R snippet builds the C++ code and invokes the sampling function.

```{r}
#| prompt: true
Rcpp::sourceCpp("examples/bessel/bessel-v1.cpp")
out1 = r_bessel_v1(n = 1000, lambda, nu, N = 50, tol = 0.10)
head(out1$draws)
```

@fig-bessel-const-refine plots the bound for the rejection probability during the refinement process, which is captured in the variable `out$lbdd`. @fig-bessel-const-draws plots the empirical distribution of the draws and compares them to the target probability mass function (pmf).

```{r}
#| out-width: 60%
#| fig-cap: |
#|   Refinement for Bessel example with constant majorizer.
#| label: fig-bessel-const-refine
#| echo: false
df = data.frame(bdd = exp(out1$lbdd)) %>% mutate(step = row_number() - 1)
ggplot(df) +
  geom_line(aes(x = step, y = bdd)) +
  scale_x_continuous(breaks = df$step[df$step %% 2 == 0]) +
  scale_y_continuous(n.breaks = 10) +
  xlab("Step") +
  ylab("Bound") +
  theme_minimal()
```

```{r}
#| out-width: 60%
#| fig-cap: |
#|   Empirical distribution of draws (bars) versus target (points) for Bessel
#|   example with constant majorizer.
#| label: fig-bessel-const-draws
#| echo: false
plot_pmf(out1$draws) +
  geom_point(data = data.frame(x = xseq, y = fseq), aes(x, y)) +
  scale_x_continuous(breaks = xseq[xseq %% 2 == 0])
```

## Constant Majorizer with Custom Optimization {#sec-bessel-const-custom}

We can make several improvements to the default numerical majorizer by coding it explicitly. Note that $\log w(x) = -\log \Gamma(x + \nu + 1)$ is a decreasing function in $x$ so that numerical optimization is not necessary.
Therefore,
\begin{displaymath}
\log w(\alpha_j) \leq \log w(x) \leq \log w(\alpha_{j-1}), \quad x \in \mathscr{D}_j.
\end{displaymath}
If the endpoints are not integers, we can improve the bounds by restricting consideration to integers, using $\lfloor \alpha_{j-1} \rfloor + 1$ and $\lfloor \alpha_j \rfloor$ to obtain the maximizer and minimizer, respectively. The following functions put this into practice.

```{cpp}
const vws::optimizer& maxopt = [](const vws::dfdb& w, double lo, double hi, bool log) {
	if (lo < 0 && hi < 0) { Rcpp::stop("Did not code this case"); }
	double x = (lo < 0) ? 0 : std::floor(lo) + 1;
	double out = w(x, true);
	return log ? out : std::exp(out);
};

const vws::optimizer& minopt = [](const vws::dfdb& w, double lo, double hi, bool log) {
	if (lo < 0 && hi < 0) { Rcpp::stop("Did not code this case"); }
	double x = std::isinf(hi) ? hi : std::floor(hi);
	double out = w(x, true);
	return log ? out : std::exp(out);
};
```

We can construct the proposal using a single `IntConstRegion` that represents the support; the `maxopt` and `minopt` arguments are specified here. Also note that we have specified a lower endpoint $\alpha_0$ below zero to ensure that the support $(\alpha_0, \alpha_N]$ contains zero.

```{cpp}
vws::IntConstRegion supp(-0.1, R_PosInf, w, helper, maxopt, minopt);
```

The following R snippet builds the C++ code and invokes the sampling function.

```{r}
#| prompt: true
Rcpp::sourceCpp("examples/bessel/bessel-v2.cpp")
out2 = r_bessel_v2(n = 1000, lambda, nu, N = 50, tol = 0.10)
head(out2$draws)
```

## Linear Majorizer {#sec-bessel-linear}

Our decomposition of $f$ into the given $w$ and $g$ lends itself well to a linear majorizer because $w$ is log-concave: $\log w(x) = -\log \Gamma(x + \nu + 1)$ is a concave function of $x$. In fact, a theorem attributed to @BohrMollerup1922 states that the function $q(x) = \Gamma(x)$ is uniquely characterized as the only function on $(0, \infty)$ which is log-convex with $q(1) = 1$ and $q(x+1) = x q(x)$.
The following remark is also useful to formulate the linear majorizer.

::: {#rem-incgamma}
By rearranging equation 6.5.13 of @AbramowitzStegun1972, the cdf of $T \sim \text{Poisson}(\lambda)$ can be written as
\begin{displaymath}
F(x \mid \lambda) = e^{-\lambda} \sum_{k=0}^{x} \frac{\lambda^k}{k!} = \frac{\Gamma(x + 1, \lambda)}{\Gamma(x + 1)}, \quad x \in \{ 0, 1, 2, \ldots \},
\end{displaymath}
where $\Gamma(c,z) = \int_z^\infty t^{c-1} e^{-t} dt$ and $\Gamma(c) = \Gamma(c,0)$. Therefore, for $a, b \in \mathbb{R}$ with $0 \leq a \leq b$ and $T \sim \text{Poisson}(\lambda)$,
\begin{align*}
\Prob(a < T \leq b)
&= \Prob(\lfloor a \rfloor + 1 \leq T \leq \lfloor b \rfloor) \\
&= \sum_{x = \lfloor a \rfloor + 1}^{\lfloor b \rfloor} \frac{e^{-\lambda} \lambda^x}{x!} \\
&= \sum_{x = 0}^{\lfloor b \rfloor} \frac{e^{-\lambda} \lambda^x}{x!} - \sum_{x = 0}^{\lfloor a \rfloor} \frac{e^{-\lambda} \lambda^x}{x!} \\
&= \frac{\Gamma(\lfloor b \rfloor + 1, \lambda)}{\Gamma(\lfloor b \rfloor + 1)} - \frac{\Gamma(\lfloor a \rfloor + 1, \lambda)}{\Gamma(\lfloor a \rfloor + 1)}.
\end{align*}
To verify the first equality, if $a$ is an integer then
\begin{align*}
\Prob(a < T \leq b) &= \Prob(T \in \{a + 1, \ldots, \lfloor b \rfloor \} ) \\
&= \Prob(T \in \{ \lfloor a \rfloor + 1, \ldots, \lfloor b \rfloor \}) = \Prob(\lfloor a \rfloor + 1 \leq T \leq \lfloor b \rfloor);
\end{align*}
otherwise, if $a$ is not an integer then
\begin{align*}
\Prob(a < T \leq b) &= \Prob(T \in \{ \lceil a \rceil, \ldots, \lfloor b \rfloor \} ) \\
&= \Prob(T \in \{ \lfloor a \rfloor + 1, \ldots, \lfloor b \rfloor \}) = \Prob(\lfloor a \rfloor + 1 \leq T \leq \lfloor b \rfloor).
\end{align*}
\remarkqed
:::

We have the expression
\begin{align*}
\overline{\xi}_j
&= \int_{\alpha_{j-1}}^{\alpha_j} \overline{w}_j(x) g(x) d\nu(x) \\
%
&= \exp\{ \overline{\beta}_{j0} \}
\sum_{x \in (\alpha_{j-1}, \alpha_j]}
\exp\{ \overline{\beta}_{j1} x \}
\frac{(\lambda^2 / 4)^x e^{-\lambda^2 / 4} }{x!} \\
%
&= \exp\left( \overline{\beta}_{j0} - \frac{\lambda^2}{4} + \frac{\lambda^2}{4} e^{ \overline{\beta}_{j1} } \right)
\Prob(\alpha_{j-1} < \overline{T} \leq \alpha_j),
\end{align*}
where $\overline{T} \sim \text{Poisson}(e^{ \overline{\beta}_{j1}} \lambda^2 / 4)$, and similarly,
\begin{align*}
\underline{\xi}_j
&= \exp\left( \underline{\beta}_{j0} - \frac{\lambda^2}{4} + \frac{\lambda^2}{4} e^{ \underline{\beta}_{j1} } \right)
\Prob(\alpha_{j-1} < \underline{T} \leq \alpha_j),
\end{align*}
where $\underline{T} \sim \text{Poisson}(e^{ \underline{\beta}_{j1}} \lambda^2 / 4)$. The proposal $h$ is a finite mixture $h(x) = \sum_{j=1}^N \pi_j g_j(x)$ with
\begin{align*}
g_j(x)
&= \overline{w}_j(x) g(x) \ind\{ x \in \mathscr{D}_j \} / \overline{\xi}_j \\
%
&= \frac{ \exp\{ \overline{\beta}_{j0} + \overline{\beta}_{j1} x \} (\lambda^2 / 4)^x e^{-\lambda^2 / 4} / x! }{ \exp\left( \overline{\beta}_{j0} - \frac{\lambda^2}{4} + \frac{\lambda^2}{4} e^{ \overline{\beta}_{j1} } \right) \Prob(\alpha_{j-1} < \overline{T} \leq \alpha_j) }
\ind\{ x \in \mathscr{D}_j \} \\
%
&= \frac{1}{x!} \left(\frac{\lambda^2}{4} e^{ \overline{\beta}_{j1} } \right)^x
\exp\left( -\frac{\lambda^2}{4} e^{ \overline{\beta}_{j1} } \right)
\frac{1}{ \Prob(\alpha_{j-1} < \overline{T} \leq \alpha_j) }
\ind(\alpha_{j-1} < x \leq \alpha_j),
\end{align*}
which is the density of $\text{Poisson}(\frac{\lambda^2}{4} e^{ \overline{\beta}_{j1}})$ truncated to the interval $(\alpha_{j-1}, \alpha_j]$.
@rem-expansion-point is used to select the expansion point $c$ to determine coefficients for the majorizer in the log-concave case, with
\begin{align*}
M_j(s)
&= \int_{\alpha_{j-1}}^{\alpha_j} e^{sx} g(x) d\nu(x) \\
&= \exp( -\lambda^2 / 4 ) \exp( e^s \lambda^2 / 4 )
\sum_{x \in (\alpha_{j-1}, \alpha_j]}
\frac{1}{x!} \exp\{ -e^s \lambda^2 / 4 \} (e^s \lambda^2 / 4)^x \\
&= \exp\left\{ -\frac{\lambda^2}{4} + e^s \frac{\lambda^2}{4} \right\}
\Prob(\alpha_{j-1} < T \leq \alpha_j),
\end{align*}
where $T \sim \text{Poisson}(e^s \lambda^2 / 4)$. The following R snippet builds the C++ code and invokes the sampling function.

```{r}
#| prompt: true
Rcpp::sourceCpp("examples/bessel/bessel-v3.cpp")
out3 = r_bessel_v3(n = 1000, lambda, nu, lo = -0.1, hi = 1e5, N = 50, tol = 0.10)
head(out3$draws)
```

@fig-bessel-refine plots the bound on the rejection probability for this sampler after each refinement step, along with the bounds for the previous two versions. A substantial improvement in efficiency is seen here: fewer regions are needed to achieve a small rejection probability. Although versions 1 and 2 of the sampler are based on the same proposal, their bounds evolve differently due to randomness in knot selection. @fig-bessel-majorizers and @fig-bessel-proposals display the three versions of the majorized weight functions and proposal distributions, respectively, after refinement. The majorizer from @sec-bessel-const-default does not provide a tight bound, even with many knots, because its numerical optimization is not restricted to the integers in the support. This is rectified by the manual optimization from @sec-bessel-const-custom. The linear majorizer provides a tight bound with fewer knots. Because the mass of our selected $\text{Bessel}(\lambda, \nu)$ distribution is concentrated on a relatively small set of integers, the second and third versions of the proposal can be made practically equivalent to the target with a sufficient number of regions.
After being refined to this point, candidates are rarely discarded in rejection sampling.

```{r}
#| out-width: 60%
#| fig-cap: |
#|   Refinement for Bessel example with three majorizers.
#| label: fig-bessel-refine
#| echo: false
df1 = data.frame(sampler = "v1", bdd = exp(out1$lbdd)) %>% mutate(N = row_number())
df2 = data.frame(sampler = "v2", bdd = exp(out2$lbdd)) %>% mutate(N = row_number())
df3 = data.frame(sampler = "v3", bdd = exp(out3$lbdd)) %>% mutate(N = row_number())
df = bind_rows(df1, df2, df3) %>%
	mutate(N = as.integer(N)) %>%
	filter(N <= 18)
ggplot(df) +
	geom_line(aes(x = N, y = bdd, color = sampler), lwd = 1.02, alpha = 1) +
	geom_point(aes(x = N, y = bdd, pch = sampler, color = sampler), cex = 3, alpha = 1) +
	scale_x_continuous(breaks = df$N[df$N %% 2 == 0]) +
	scale_y_continuous(n.breaks = 10) +
	xlab("N") +
	ylab("Bound") +
	theme_minimal()
```

```{r}
#| fig-cap: |
#|   Majorizers (on the log-scale) for Bessel example for three sampler
#|   versions. The blue curve is the majorizer, with a blue dot marking the
#|   right endpoint of each region. The black triangles display the values of
#|   $\log w(x)$ at integers $x \in \mathbb{N}$.
#| fig-subcap:
#|   - v1.
#|   - v2.
#|   - v3.
#| label: fig-bessel-majorizers
#| layout-ncol: 3
#| fig-width: 3
#| fig-height: 2.5
#| echo: false
ggplot(out1$df_weight) +
	geom_segment(aes(x = lo, xend = hi, y = lwmax, yend = lwmax), col = "blue") +
	geom_point(aes(x = hi, y = lwmax), col = "blue") +
	geom_point(data = data.frame(x = xseq, w = lwseq), aes(x, w), pch = 2) +
	coord_cartesian(xlim = c(NA, 15), ylim = c(min(lwseq), NA)) +
	scale_x_continuous(breaks = xseq[xseq %% 2 == 0]) +
	xlab("x") +
	ylab("Logarithm of Weight") +
	theme_minimal()

ggplot(out2$df_weight) +
	geom_segment(aes(x = lo, xend = hi, y = lwmax, yend = lwmax), col = "blue") +
	geom_point(aes(x = hi, y = lwmax), col = "blue") +
	geom_point(data = data.frame(x = xseq, w = lwseq), aes(x, w), pch = 2) +
	coord_cartesian(xlim = c(NA, 15), ylim = c(min(lwseq), NA)) +
	scale_x_continuous(breaks = xseq[xseq %% 2 == 0]) +
	xlab("x") +
	ylab("Logarithm of Weight") +
	theme_minimal()

out3$df_weight %>%
	mutate(w_lo = beta0_max + beta1_max * lo) %>%
	mutate(w_hi = beta0_max + beta1_max * hi) %>%
	ggplot() +
	geom_segment(aes(x = lo, xend = hi, y = w_lo, yend = w_hi), col = "blue") +
	geom_point(aes(x = hi, y = w_hi), col = "blue") +
	geom_point(data = data.frame(x = xseq, y = lwseq), aes(x, y), pch = 2) +
	coord_cartesian(xlim = c(NA, 15), ylim = c(min(lwseq), NA)) +
	scale_x_continuous(breaks = xseq[xseq %% 2 == 0]) +
	xlab("x") +
	ylab("Logarithm of Weight") +
	theme_minimal()
```

```{r}
#| fig-cap: |
#|   Proposal density (bars) for Bessel example for three sampler versions.
#|   Triangles display the values of $f(x)$, $x \in \mathbb{N}$.
#| fig-subcap:
#|   - v1.
#|   - v2.
#|   - v3.
#| label: fig-bessel-proposals
#| layout-ncol: 3
#| fig-width: 3
#| fig-height: 2
#| echo: false
out1 = r_bessel_v1(0, lambda, nu, N = 50, tol = 0.1, x = xseq)
out2 = r_bessel_v2(0, lambda, nu, N = 50, tol = 0.1, x = xseq)
out3 = r_bessel_v3(0, lambda, nu, lo = -0.1, hi = 1e5, N = 50, tol = 0.1, x = xseq)

data.frame(x = xseq, h = out1$hx) %>%
	ggplot() +
	geom_bar(aes(x, h), stat = "identity", fill = NA, col = "black") +
	geom_point(aes(x, fseq), pch = 2) +
	scale_x_continuous(breaks = xseq[xseq %% 2 == 0]) +
	xlab("x") +
	ylab("Density") +
	theme_minimal()

data.frame(x = xseq, h = out2$hx) %>%
	ggplot() +
	geom_bar(aes(x, h), stat = "identity", fill = NA, col = "black") +
	geom_point(aes(x, fseq), pch = 2) +
	scale_x_continuous(breaks = xseq[xseq %% 2 == 0]) +
	xlab("x") +
	ylab("Density") +
	theme_minimal()

data.frame(x = xseq, h = out3$hx) %>%
	ggplot() +
	geom_bar(aes(x, h), stat = "identity", fill = NA, col = "black") +
	geom_point(aes(x, fseq), pch = 2) +
	scale_x_continuous(breaks = xseq[xseq %% 2 == 0]) +
	xlab("x") +
	ylab("Density") +
	theme_minimal()
```

\clearpage

# References