---
title: "PostgreSQL Setup for taskqueue"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{PostgreSQL Setup for taskqueue}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Overview

The `taskqueue` package uses PostgreSQL to manage tasks, projects, and workers. This vignette shows how to install and configure PostgreSQL on Ubuntu for HPC environments.

PostgreSQL should be installed on a server that all worker nodes can access.

## Why PostgreSQL?

PostgreSQL is chosen for `taskqueue` because:

- **Concurrent Access**: Handles large numbers of concurrent requests from multiple workers
- **ACID Compliance**: Ensures data integrity for task status updates
- **Reliability**: Proven track record for production workloads
- **Performance**: Efficient handling of read/write operations
- **Open Source**: Free and widely supported

## Installation on Ubuntu

```bash
# Update and install PostgreSQL
sudo apt update
sudo apt install postgresql postgresql-contrib

# Start PostgreSQL
sudo systemctl start postgresql
sudo systemctl enable postgresql
```

## Create Database and User

```bash
# Switch to postgres user and create database
sudo -u postgres psql

# Run these commands in the PostgreSQL prompt:
CREATE USER taskqueue_user WITH PASSWORD 'your_password';
CREATE DATABASE taskqueue_db OWNER taskqueue_user;
GRANT ALL PRIVILEGES ON DATABASE taskqueue_db TO taskqueue_user;
\q
```

## Configure Remote Access

Allow worker nodes to connect to PostgreSQL.

### 1. Edit postgresql.conf

```bash
# Edit configuration file (adjust version number if needed)
sudo nano /etc/postgresql/14/main/postgresql.conf

# Find and change:
listen_addresses = '*'
```

### 2. Edit pg_hba.conf

```bash
# Edit authentication file
sudo nano /etc/postgresql/14/main/pg_hba.conf

# Add this line:
# For specific HPC network (recommended):
host    taskqueue_db    taskqueue_user    10.0.0.0/8    md5

# Or for all IPs (less secure):
host    taskqueue_db    taskqueue_user    0.0.0.0/0     md5
```

### 3. Restart PostgreSQL

```bash
sudo systemctl restart postgresql
```

### 4. Open Firewall if Needed

```bash
# Allow PostgreSQL port from HPC network
sudo ufw allow from 10.0.0.0/8 to any port 5432

# Or allow from all IPs (less secure):
sudo ufw allow 5432/tcp
```

## R Configuration

On all machines (daily working machines, login nodes and compute nodes), add these environment variables to `~/.Renviron`:

```
PGHOST=your.database.server.com
PGPORT=5432
PGUSER=taskqueue_user
PGPASSWORD=your_password
PGDATABASE=taskqueue_db
```

**Edit .Renviron:**

```bash
nano ~/.Renviron
# Add the variables above, then save and exit
```

Restart R after editing `.Renviron`.

## Install R Packages

```r
# install from CRAN
install.packages("taskqueue")
```

```r
# or install the latest development version from GitHub
remotes::install_github("byzheng/taskqueue")
```

## Test Connection

```r
library(taskqueue)
db_connect()
```

## Initialize taskqueue

```r
library(taskqueue)

# Create required tables
db_init()
```

## Security Notes

- Use strong passwords
- Restrict IP ranges in `pg_hba.conf` to your HPC network only
- Protect `.Renviron`: `chmod 600 ~/.Renviron`

## Clean Database

If needed, remove all taskqueue data:

```r
library(taskqueue)
db_clean()  # Removes all tables
```

The package will recreate tables automatically when needed.
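
## Check .Renviron Entries

Because the same five variables from the R Configuration section have to be present on every machine, a quick file check can catch a missing entry before a worker fails to connect. The sketch below is one possible way to do that. The function name `check_renviron` is hypothetical and not part of `taskqueue`; it only reads the file and does not attempt a database connection.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of taskqueue): report which of the PG*
# variables listed under "R Configuration" are missing from an
# Renviron-style file. Defaults to ~/.Renviron when no path is given.
check_renviron() {
  local file="${1:-$HOME/.Renviron}"
  local missing=0
  local var

  if [ ! -f "$file" ]; then
    echo "File not found: $file" >&2
    return 2
  fi

  for var in PGHOST PGPORT PGUSER PGPASSWORD PGDATABASE; do
    # Look for a line of the form NAME=value with a non-empty value
    if ! grep -Eq "^[[:space:]]*${var}=.+" "$file"; then
      echo "Missing: $var"
      missing=1
    fi
  done

  if [ "$missing" -eq 0 ]; then
    echo "All PostgreSQL variables are set in $file"
  fi

  # Return 0 when everything is present, 1 when something is missing
  return "$missing"
}
```

After sourcing the function, running `check_renviron` with no argument checks `~/.Renviron`, and `check_renviron /path/to/file` checks another file. A non-zero return status means at least one variable is absent or the file does not exist.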