
You have an R analysis that runs on your laptop. Maybe it takes a while. Maybe you need to run it many times — once per species, once per county, once per simulation parameter, once per experimental condition. Maybe both.
The Center for High Throughput Computing (CHTC) at UW-Madison can run many independent jobs across a large pool of compute resources. The barrier is rarely whether that computing is worth having. The barrier is the handoff: turning a local analysis into something a scheduler can run somewhere else.
That handoff requires several pieces to line up at once. Your R code needs to run without relying on the interactive session where you developed it. Your software environment needs to be portable. Your files need to move to a submit node. HTCondor needs a submit file. The execute node needs a shell script. Your results need to come back.
submitr is designed to make that handoff easier. It
generates the HTCondor submit file, generates the executable script,
wraps the SSH and SCP commands that move files to and from the submit
node, submits the job, checks status, and downloads results — all from
R.
If you are new to CHTC, submitr gives you a guided path
to your first successful submission. If you already use CHTC,
submitr reduces repetitive setup work and makes common
submission patterns easier to reproduce, review, and share.
Use submitr when you are:

- `condor_submit`
- command-line work;

submitr is useful on its own if your project is already
organized and containerized. It also fits into a broader workflow for
moving from a literate analysis document to a portable, scalable
computation.
submitr is the third step in the From the Notebook to the Cluster package family:

```text
toolero            organize, scaffold, split
  └─ containr      freeze the software environment in a container
       └─ submitr  send the analysis to CHTC and retrieve results
```
Each package is useful on its own. Together, they form a path from a local R project to a completed high-throughput computing run.
- toolero helps you start with a maintainable project structure, use Quarto as a source of truth, and split data into job-sized pieces.
- containr helps you build a container image from your `renv.lock` so the software environment can travel with the analysis.
- submitr helps you send the containerized analysis to CHTC, monitor the job, and bring results back.

You can adopt these packages one at a time. submitr does not require toolero, and toolero does not require submitr. The family exists so that each step prepares cleanly for the next when your project is ready to scale.
submitr assumes your project is already organized and containerized. Before using it, confirm that:

- `Rscript analysis.R` outside RStudio;
- `ap2002.chtc.wisc.edu`.

Set up SSH connection reuse before anything else. Every submitr function that touches CHTC opens an SSH connection, which can trigger a Duo MFA prompt. SSH's ControlMaster feature keeps one authenticated connection open and lets later connections reuse it, so you approve Duo once instead of on every call. The setup takes two minutes and is worth doing before your first `htc_config()` call. Full instructions appear right after the `htc_config()` step below.
Install the development version from GitHub:

```r
# install.packages("pak")
pak::pak("erwinlares/submitr")
```

The complete single-job workflow, from configuration to downloaded results:

```r
library(submitr)

# 1. Configure your CHTC connection
cfg <- htc_config()

# 2. Generate the submit file
htc_gen_submit(
  output_file = "analysis.sub",
  container_image = "docker://registry.doit.wisc.edu/your.netid/my-analysis:1.0.0",
  executable = "analysis.sh",
  input_files = c("analysis.R", "data.csv"),
  output_files = "results.tar.gz",
  resources = "small",
  comments = TRUE
)

# 3. Generate the executable script
htc_gen_executable(
  r_script = "analysis.R",
  output_file = "analysis.sh",
  results_folder = "results",
  comments = TRUE
)

# 4. Upload files to the submit node
htc_upload(
  files = c("analysis.sub", "analysis.sh", "analysis.R", "data.csv"),
  config = cfg
)

# 5. Submit the job
cluster_id <- htc_submit(submit_file = "analysis.sub", config = cfg)

# 6. Check progress
htc_status(cluster_id = cluster_id, config = cfg)

# 7. Download results
htc_download(files = "*.tar.gz", config = cfg, local_path = "results/")
```

### `htc_config()`

On first use, `htc_config()` prompts for your NetID and submit node, writes `htc.cfg` to your project directory, and displays ControlMaster setup instructions. Subsequent calls read the existing config and validate the connection.
```r
cfg <- htc_config()
#> Reading HTC config from ./htc.cfg
#> ✔ Connected to "ap2002.chtc.wisc.edu" as "your.netid".
```

Before continuing, take two minutes to set up ControlMaster. Add this block to `~/.ssh/config`:

```text
Host *.chtc.wisc.edu
  ControlMaster auto
  ControlPersist 2h
  ControlPath ~/.ssh/connections/%r@%h:%p
```

Then create the directory used by `ControlPath`:

```sh
mkdir -p ~/.ssh/connections
```

With ControlMaster in place, later SSH connections (uploads, submits, status checks, downloads) reuse the same authenticated session without prompting for Duo MFA. With `ControlPersist 2h`, that shared session stays open until it has been idle for two hours; after that, the next connection prompts again. Full documentation is at https://chtc.cs.wisc.edu/uw-research-computing/configure-ssh.
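If you prefer to script the ControlMaster setup, the two manual steps above can also be done from R. The helper below, `add_chtc_controlmaster()`, is hypothetical and not part of submitr: a minimal base-R sketch that assumes a Unix-like system, appends the block only if no `Host *.chtc.wisc.edu` entry exists yet, and never edits an existing entry.

```r
# Hypothetical helper, not part of submitr: append the CHTC ControlMaster
# block to an SSH config file and create the socket directory for ControlPath.
add_chtc_controlmaster <- function(config_path = "~/.ssh/config",
                                   socket_dir = "~/.ssh/connections") {
  block <- c(
    "Host *.chtc.wisc.edu",
    "  ControlMaster auto",
    "  ControlPersist 2h",
    # %r = remote user, %h = host, %p = port; sprintf needs %% for a literal %
    sprintf("  ControlPath %s/%%r@%%h:%%p", socket_dir)
  )
  config_path <- path.expand(config_path)
  existing <- if (file.exists(config_path)) {
    readLines(config_path, warn = FALSE)
  } else {
    character(0)
  }
  # Append only when no CHTC host block is present, so repeat calls are safe
  if (!any(trimws(existing) == block[1])) {
    dir.create(dirname(config_path), recursive = TRUE, showWarnings = FALSE)
    writeLines(c(existing, "", block), config_path)
  }
  dir.create(path.expand(socket_dir), recursive = TRUE, showWarnings = FALSE)
  invisible(config_path)
}
```

Run `add_chtc_controlmaster()` once, then inspect the result with `cat(readLines("~/.ssh/config"), sep = "\n")` before your first `htc_config()` call.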
### `htc_gen_submit()`

Generates the HTCondor `.sub` submit file. It tells HTCondor which container to use, which executable to run, which files to transfer, what resources to request, and what output files to expect.

```r
htc_gen_submit(
  output_file = "analysis.sub",
  container_image = "docker://registry.doit.wisc.edu/your.netid/my-analysis:1.0.0",
  executable = "analysis.sh",
  input_files = c("analysis.R", "data.csv"),
  output_files = "results.tar.gz",
  resources = "small",
  comments = TRUE
)
```

Use `comments = TRUE` on a first submission. The generated file includes explanations of each section, making it useful both as a working submit file and as a learning document.
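To make those ingredients concrete, here is a hand-rolled base-R sketch that writes a minimal HTCondor submit description containing a container, an executable, file transfer, resource requests, and log files. It is an illustration under assumptions, not the text `htc_gen_submit()` produces: the function name `write_minimal_submit()` is hypothetical, the `job.out` filename is assumed, and the defaults mirror the `small` preset.

```r
# Hypothetical sketch, not submitr's implementation: write a minimal HTCondor
# submit description for one containerized job.
write_minimal_submit <- function(path, container_image, executable,
                                 input_files, output_files,
                                 cpus = 1, memory = "4GB", disk = "4GB") {
  lines <- c(
    "universe = container",
    paste("container_image =", container_image),
    paste("executable =", executable),
    # Ship inputs to the execute node and bring the named outputs back
    "should_transfer_files = YES",
    "when_to_transfer_output = ON_EXIT",
    paste("transfer_input_files =", paste(input_files, collapse = ", ")),
    paste("transfer_output_files =", paste(output_files, collapse = ", ")),
    # The HTCondor log is where actual resource usage is reported
    "log = job.log",
    "output = job.out",
    "error = job.err",
    paste("request_cpus =", cpus),
    paste("request_memory =", memory),
    paste("request_disk =", disk),
    "queue 1"
  )
  writeLines(lines, path)
  invisible(lines)
}
```

Reading a file like this next to the commented output of `htc_gen_submit()` is a quick way to see which lines carry the essentials.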
Resource presets:
| preset | cpus | memory | disk | when to use |
|---|---|---|---|---|
| small | 1 | 4 GB | 4 GB | first test jobs, lightweight scripts, quick summaries |
| medium | 4 | 16 GB | 15 GB | moderate analyses, multiple input files, model fitting |
| large | 8 | 64 GB | 32 GB | memory-intensive work, large datasets, parallel computation |
Start with "small" for a first test regardless of what
your eventual job will need. The HTCondor log file reports actual
resource usage after each run, which is the best guide for tuning future
submissions. Requesting too little causes jobs to fail; requesting much
more than you need makes jobs harder to match with available resources.
The log is the ground truth.
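The advice to let the log guide tuning can be made concrete. The sketch below copies the preset table into a data frame and picks the smallest preset that covers usage you read off the log, plus a safety margin. The helper name `smallest_fitting_preset()` and the `headroom` factor of 1.25 are assumptions for illustration, not submitr defaults, and the log reports memory and disk in its own units, so convert them to GB first.

```r
# Preset values copied from the resource preset table
presets <- data.frame(
  preset    = c("small", "medium", "large"),
  cpus      = c(1, 4, 8),
  memory_gb = c(4, 16, 64),
  disk_gb   = c(4, 15, 32),
  stringsAsFactors = FALSE
)

# Hypothetical helper: smallest preset whose requests cover observed usage
# times a safety margin (headroom = 1.25 means 25% extra, an assumed value).
smallest_fitting_preset <- function(cpus, memory_gb, disk_gb, headroom = 1.25) {
  fits <- presets$cpus >= cpus &
    presets$memory_gb >= memory_gb * headroom &
    presets$disk_gb >= disk_gb * headroom
  if (!any(fits)) return(NA_character_)
  # Rows are ordered small -> large, so the first fit is the smallest
  presets$preset[which(fits)[1]]
}

smallest_fitting_preset(cpus = 1, memory_gb = 2.5, disk_gb = 1)
#> [1] "small"
smallest_fitting_preset(cpus = 2, memory_gb = 10, disk_gb = 8)
#> [1] "medium"
```

An `NA` result means no preset fits with that margin, which is a cue to set requests by hand rather than to pick `"large"` blindly.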
### `htc_gen_executable()`

Generates the `.sh` script that HTCondor runs inside the container. The generated script creates the results directory, runs your R script with `Rscript`, and archives the results as a `.tar.gz` file.
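Those three steps can be sketched as the lines of a Bash script assembled in R. This is a hypothetical approximation, not the script `htc_gen_executable()` writes: the helper name `executable_lines()` and the `set -euo pipefail` line are assumptions. It forwards any job arguments to the R script, which is how a subset filename would reach your code in multiple-job mode.

```r
# Hypothetical sketch of an executable script's contents; not submitr output.
executable_lines <- function(r_script, results_folder) {
  c(
    "#!/bin/bash",
    # Stop at the first failing command so HTCondor sees a non-zero exit
    "set -euo pipefail",
    # 1. Create the results directory
    paste("mkdir -p", results_folder),
    # 2. Run the analysis; "$@" forwards any job arguments to the R script
    paste("Rscript", r_script, '"$@"'),
    # 3. Archive the results folder into one file for transfer back
    sprintf("tar -czf %s.tar.gz %s", results_folder, results_folder)
  )
}

sh <- executable_lines("analysis.R", "results")
cat(sh, sep = "\n")
```

The archive name `results.tar.gz` matches the `output_files` value used in the submit-file examples, which is what lets HTCondor find and return it.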
```r
htc_gen_executable(
  r_script = "analysis.R",
  output_file = "analysis.sh",
  results_folder = "results",
  comments = TRUE
)
```

### `htc_upload()`

Copies files to the CHTC submit node via `scp`. Use `dry_run = TRUE` to preview the command before running it.
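A dry run amounts to building the command and printing it instead of executing it. The sketch below shows that pattern in base R. `scp_upload_sketch()` is hypothetical, not submitr's implementation; it assumes `scp` is on your `PATH` and that files go to your home directory on the submit node.

```r
# Hypothetical sketch of the dry-run pattern; not submitr's implementation.
scp_upload_sketch <- function(files, user, host, remote_dir = "~/",
                              dry_run = TRUE) {
  # Quote local paths so spaces or shell metacharacters survive the shell
  args <- c(shQuote(files), paste0(user, "@", host, ":", remote_dir))
  cmd <- paste("scp", paste(args, collapse = " "))
  if (dry_run) {
    message("Dry run -- command that would be executed:\n", cmd)
    return(invisible(cmd))
  }
  status <- system2("scp", args)
  if (status != 0) stop("scp exited with status ", status)
  invisible(cmd)
}

scp_upload_sketch(c("analysis.sub", "analysis.sh", "analysis.R", "data.csv"),
                  user = "your.netid", host = "ap2002.chtc.wisc.edu")
```

Returning the command string in both branches makes the dry run and the real run easy to compare.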
```r
# Preview first
htc_upload(
  files = c("analysis.sub", "analysis.sh", "analysis.R", "data.csv"),
  config = cfg,
  dry_run = TRUE
)
#> ✔ Dry run -- command that would be executed:
#> `scp analysis.sub analysis.sh analysis.R data.csv your.netid@ap2002.chtc.wisc.edu:~/`

# Then upload
htc_upload(
  files = c("analysis.sub", "analysis.sh", "analysis.R", "data.csv"),
  config = cfg
)
```

### `htc_submit()`

Runs `condor_submit` on the remote server via SSH and returns the cluster ID.
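The cluster ID comes from the text `condor_submit` prints. A minimal base-R sketch of extracting it, assuming the `N job(s) submitted to cluster ID.` line format seen in `condor_submit` output; `parse_cluster_id()` is a hypothetical helper, not part of submitr.

```r
# Hypothetical helper: pull the cluster ID out of condor_submit's output lines.
parse_cluster_id <- function(output) {
  matches <- regmatches(output, regexec("submitted to cluster ([0-9]+)", output))
  # A matching line yields the full match plus one capture group (length 2)
  hits <- Filter(function(m) length(m) == 2, matches)
  if (length(hits) == 0) stop("no cluster ID found in condor_submit output")
  as.integer(hits[[1]][2])
}

parse_cluster_id(c("Submitting job(s).", "1 job(s) submitted to cluster 6302860."))
#> [1] 6302860
```

Failing loudly when no ID is found is deliberate: a silent `NA` would only surface later, when a status check asks about a cluster that does not exist.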
```r
cluster_id <- htc_submit(
  submit_file = "analysis.sub",
  config = cfg,
  verbose = TRUE
)
#> Submitting "analysis.sub" on "ap2002.chtc.wisc.edu"...
#> 1 job(s) submitted to cluster 6302860.
#> ✔ Job submitted successfully.
```

### `htc_status()`

Runs `condor_q` on the remote server. Use `watch = TRUE` to poll until all jobs in the cluster leave the queue.
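Watching a cluster is a polling loop: ask how many jobs remain, stop at zero, otherwise wait and ask again. The sketch below passes the job count in as a function so the loop can be exercised offline. `poll_until_done()` and its 60-second default interval are assumptions, not submitr internals; a real count could come from something like the number of lines `condor_q <cluster_id> -af ProcId` prints over SSH.

```r
# Hypothetical sketch of a watch loop; not submitr's implementation.
# `count_jobs` is any function returning the number of jobs still queued.
poll_until_done <- function(count_jobs, interval = 60, max_polls = Inf) {
  polls <- 0
  repeat {
    remaining <- count_jobs()
    polls <- polls + 1
    if (remaining == 0) return(invisible(polls))
    if (polls >= max_polls) {
      stop(remaining, " job(s) still queued after ", polls, " polls")
    }
    message(remaining, " job(s) remaining; checking again in ", interval, "s")
    Sys.sleep(interval)
  }
}

# Offline example: a fake queue that drains over three checks
queue_sizes <- c(3, 1, 0)
check <- 0
fake_count <- function() {
  check <<- check + 1
  queue_sizes[check]
}
poll_until_done(fake_count, interval = 0)
```

A generous interval matters on a shared submit node: each check is a `condor_q` call, and polling every few seconds adds load without making jobs finish sooner.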
```r
# One-shot check
htc_status(cluster_id = cluster_id, config = cfg)

# Watch until complete
htc_status(cluster_id = cluster_id, config = cfg, watch = TRUE)
```

### `htc_download()`

Copies files back from the submit node via `scp`. Supports glob patterns.
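A glob in a remote path such as `*.tar.gz` is meant to be expanded on the submit node, so it should be quoted locally; otherwise your own shell may try to expand it first. A hypothetical base-R sketch of building such a download command follows; the helper name, the home-directory default, and the use of `shQuote()` are assumptions, not submitr's implementation.

```r
# Hypothetical sketch, not submitr's implementation.
scp_download_sketch <- function(files, user, host, local_path = ".",
                                remote_dir = "~/", dry_run = TRUE) {
  # Quote each remote spec so the local shell leaves globs like *.tar.gz
  # alone; expansion happens on the remote side
  remote <- shQuote(paste0(user, "@", host, ":", remote_dir, files))
  args <- c(remote, shQuote(local_path))
  cmd <- paste("scp", paste(args, collapse = " "))
  if (dry_run) return(cmd)
  dir.create(local_path, recursive = TRUE, showWarnings = FALSE)
  status <- system2("scp", args)
  if (status != 0) stop("scp exited with status ", status)
  invisible(cmd)
}

scp_download_sketch("*.tar.gz", "your.netid", "ap2002.chtc.wisc.edu",
                    local_path = "results/")
```

Each requested file or pattern becomes its own quoted remote argument, so downloading `c("job.log", "job.err")` works the same way as a single glob.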
```r
# Download results
htc_download(files = "*.tar.gz", config = cfg, local_path = "results/")

# Download logs
htc_download(files = c("job.log", "job.err"), config = cfg, local_path = "logs/")
```

Once a single job works, scaling up is mostly a matter of changing the queue. Use `toolero::write_by_group()` to split your dataset and produce a manifest. Then switch both generators to multiple-job mode with `mode = "multiple"` and point `queue_from` at the manifest.
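To see what a split-plus-manifest step involves, the base-R sketch below writes one CSV per group and a manifest listing the subset paths, one per row. It is not `toolero::write_by_group()`, whose interface and manifest format may differ; the `write_subsets_sketch()` name, the `file` column, the output layout, and the toy `counties` data are all assumptions for illustration.

```r
# Hypothetical sketch, not toolero::write_by_group(): split a data frame by a
# grouping column, write one CSV per group, and write a manifest of the paths.
write_subsets_sketch <- function(data, group_col, out_dir = "data/jobs") {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  pieces <- split(data, data[[group_col]])
  paths <- file.path(out_dir, paste0(make.names(names(pieces)), ".csv"))
  for (i in seq_along(pieces)) {
    write.csv(pieces[[i]], paths[i], row.names = FALSE)
  }
  # One row per subset; in multiple-job mode each row would map to one job
  manifest <- data.frame(file = paths, stringsAsFactors = FALSE)
  write.csv(manifest, file.path(out_dir, "manifest.csv"), row.names = FALSE)
  invisible(manifest)
}

# Toy data for illustration only
counties <- data.frame(county = c("Dane", "Dane", "Iowa"), value = c(1, 2, 3))
write_subsets_sketch(counties, "county", out_dir = file.path(tempdir(), "jobs"))
```

Keeping each subset in its own file means every job transfers only the rows it needs, rather than the full dataset.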
```r
htc_gen_submit(
  output_file = "analysis.sub",
  container_image = "docker://registry.doit.wisc.edu/your.netid/my-analysis:1.0.0",
  executable = "analysis.sh",
  input_files = "analysis.R",
  mode = "multiple",
  queue_from = "data/jobs/manifest.csv",
  resources = "medium",
  comments = TRUE
)

htc_gen_executable(
  r_script = "analysis.R",
  output_file = "analysis.sh",
  results_folder = "results",
  mode = "multiple",
  comments = TRUE
)
```

In multiple-job mode, HTCondor passes each subset filename to your R script as a positional argument. Your script should read that argument explicitly:
```r
# Arguments after the script name, as passed by HTCondor
args <- commandArgs(trailingOnly = TRUE)
if (length(args) < 1) stop("expected the input file path as the first argument")
input_file <- args[[1]]

# Read only this job's subset of the data
data <- readr::read_csv(input_file)
```

| Function | What it does |
|---|---|
| `htc_config()` | Create or read `htc.cfg`, validate connection |
| `htc_gen_submit()` | Generate the HTCondor `.sub` submit file |
| `htc_gen_executable()` | Generate the `.sh` executable script |
| `htc_upload()` | Copy files to the submit node via `scp` |
| `htc_submit()` | Run `condor_submit` on the submit node |
| `htc_status()` | Check job progress via `condor_q` |
| `htc_download()` | Copy results back from the submit node |
submitr reduces friction. It does not replace
understanding.
The CHTC facilitation team is the right resource for complex workflow questions.
The package vignette walks through a complete first submission step by step, with annotated output at each stage:
submitr is part of the From the Notebook to the
Cluster package family:
MIT © Erwin Lares