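The SymLink Tool is an R6 object-oriented tool that helps a researcher manage pipeline outputs in a standard way. It falls under the category of ‘standard tooling’. It will mark versioned output folders as best, keep, or remove using symlinks, log each marking action, and build reports from those logs.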
This assumes you have a large set of versioned output folders for your pipeline.
If you’re already using a database to manage your output versions, you probably don’t need this.
If you have a big mess of folders you’re having difficulty tracking, this tool may help you out!
SLT (short for SymLink Tool) is an R6 object generator, or “R6ClassGenerator”. When you want to make a new ‘instance’ of a tool, call the new method on the tool’s class.
Let’s start by calling $new() with no arguments.
library(vmTools)
library(data.table)
slt <- try(SLT$new())
#>
#>
#> This tool expects `user_root_list` to be a named list of root directories for pipeline outputs.
#>
#> e.g.
#> list(
#> input_root = '/mnt/share/my_team/input_data',
#> output_root = '/mnt/share/my_team/output_data'
#> )
#>
#> This tool assumes each root will have a `version_name` output folder (same folder name in each root).
#> You may track outputs in one root, or across many roots in parallel (as long as the version_name is the same).
#> It's recommended to create these folders with the tool so they get a log at time of creation.
#>
#> Each output folder is defined by `clean_path(user_root_list, version_name)`.
#> The user can 'mark' or 'unmark' any `version_name` folder as best/keep/remove.
#> This folder receives a log entry for all *demotion* and *promotion* actions (marking and unmarking).
#> All the `version_name` folder logs are used for report generation.
#>
#>
#>
#>
#> This tool expects `user_central_log_root` to be a single directory for the central log.
#>
#> e.g.
#> '/mnt/share/my_team'
#>
#> The central log receives all the marking actions of all the version_name logs across all roots,
#> but is not used for report generation.
#>
#> The central log is *created* on initialization i.e. when calling `SLT$new()`.
#>
#>
#> Error in initialize(...) :
#> You must provide both user_root_list and user_central_log_root
OK, now that we know what the tool expects, let’s feed it this information and try using it in earnest.
In my pipeline, I divert outputs to two folders: to_model (pre-modeled data, my input root) and modeled (post-modeled data, my output root).
I want to have the same version_name of my pipeline outputs in both roots so I can correlate pre- and post-modeled data.
If you need to handle roots independently, instantiate a separate copy of the tool for each independent root, giving each instance a unique name, e.g. slt_input and slt_output.
version_name is simply a string like “2024_02_02_new_covariates” that’s important to you, the modeler, to tell you when and why the pipeline was run. There is no requirement for this to include a date, but it’s good practice.
In addition, the tool needs a location for a central log. I’ll set that one level above both my output folders, since the central log will be shared between them.
The tool works at the version_name level, not the folder level, so one ‘best’ promotion affects folders in both my output roots.
# a safe temporary directory every user has access to, that we'll clean up later
root_base <- file.path(tempdir(), "slt")
root_input <- file.path(root_base, "to_model")
root_output <- file.path(root_base, "modeled")
PATHS <- list(
log_cent = file.path(root_base, "log_symlinks_central.csv"),
log_2024_02_02 = file.path(root_input, "2024_02_02", "logs/log_version_history.csv"),
log_2024_02_10 = file.path(root_input, "2024_02_10", "logs/log_version_history.csv")
)
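To make the ‘same version_name in every root’ idea concrete, here is a minimal base-R sketch of how one version_name maps to one folder per root. The root paths are the illustrative ones from the tool’s own help message, and version_paths is a name chosen here for illustration; the tool resolves these paths internally, so this is not its implementation.

```r
# Illustrative roots (not real paths on your system)
user_root_list <- list(
  input_root  = "/mnt/share/my_team/input_data",
  output_root = "/mnt/share/my_team/output_data"
)
version_name <- "2024_02_02_new_covariates"

# One folder per root, all sharing the same version_name
version_paths <- vapply(
  user_root_list,
  function(root) file.path(root, version_name),
  character(1)
)
print(version_paths)
```

Because every root uses the same folder name, a single version_name is enough to address the matching outputs in all roots at once.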
Try to make the tool naively.
slt <- try(SLT$new(
user_root_list = list(
root_input = root_input,
root_output = root_output
)
, user_central_log_root = root_base
))
#> Error in FUN(X[[i]], ...) :
#> root does not exist: /tmp/Rtmp3tDGBK/slt/to_model
# We need to ensure all output folders exist first
dir.create(root_input, recursive = TRUE, showWarnings = FALSE)
dir.create(root_output, recursive = TRUE, showWarnings = FALSE)
# Now everything should work
suppressWarnings({ # idiosyncratic and benign cluster message
slt <- SLT$new(
user_root_list = list(
root_input = root_input,
root_output = root_output
)
, user_central_log_root = root_base
)
})
What do we have in our root_base folder?
We should now have a central log, and two output folders.
You can mark any output folder as ‘best’, and give it a ‘best’ symlink in each output root.
Only one version_name can be ‘best’, and the SLT will demote the current ‘best’ version if you promote a new version_name.
Each marking action writes a log entry in the version_name folder, and an entry in the central log.
NOTE: the version_name log records both ‘demote’ and ‘promote’ actions.
First we’ll create two version_name folders to play with in each root.
dir.create(file.path(root_input, "2024_02_02"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root_output, "2024_02_02"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root_input, "2024_02_10"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root_output, "2024_02_10"), recursive = TRUE, showWarnings = FALSE)
print_tree(root_base)
#> |-- log_symlinks_central.csv
#> |-- modeled
#> | |-- 2024_02_02
#> | `-- 2024_02_10
#> `-- to_model
#> |-- 2024_02_02
#> `-- 2024_02_10
Then we’ll mark one as best.
slt$mark_best(version_name = "2024_02_02", user_entry = list(comment = "testing mark_best"))
#> Marking best: 2024_02_02
#> No existing symlinks found - moving on
#> No 'best' symlink found - moving on: /tmp/Rtmp3tDGBK/slt/to_model/best
#> Promoting to 'best': /tmp/Rtmp3tDGBK/slt/to_model/2024_02_02
#> Writing log to /tmp/Rtmp3tDGBK/slt/to_model/2024_02_02/logs/log_version_history.csv
#> Writing central log to /tmp/Rtmp3tDGBK/slt/log_symlinks_central.csv
#> No existing symlinks found - moving on
#> No 'best' symlink found - moving on: /tmp/Rtmp3tDGBK/slt/modeled/best
#> Promoting to 'best': /tmp/Rtmp3tDGBK/slt/modeled/2024_02_02
#> Writing log to /tmp/Rtmp3tDGBK/slt/modeled/2024_02_02/logs/log_version_history.csv
#> Writing central log to /tmp/Rtmp3tDGBK/slt/log_symlinks_central.csv
Look at the folder structure again.
print_tree(root_base)
#> |-- log_symlinks_central.csv
#> |-- modeled
#> | |-- 2024_02_02
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2024_02_10
#> | |-- best
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | `-- report_key_versions.csv
#> `-- to_model
#> |-- 2024_02_02
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2024_02_10
#> |-- best
#> | `-- logs
#> | `-- log_version_history.csv
#> `-- report_key_versions.csv
Where does the ‘best’ folder point to?
print_symlink("best")
#> [1] "lrwxrwxrwx 1 ssbyrne Domain Users 39 Jul 24 11:13 best -> /tmp/Rtmp3tDGBK/slt/to_model/2024_02_02"
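For intuition, a ‘best’ symlink is an ordinary filesystem symlink that gets re-pointed on promotion. The following base-R sketch shows that mechanic on a Unix-like system. It is not the tool’s implementation: sketch_root and point_best_at() are hypothetical names, and no logging is done here.

```r
# A throwaway root with two version folders
sketch_root <- file.path(tempdir(), "symlink_sketch")
dir.create(file.path(sketch_root, "2024_02_02"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(sketch_root, "2024_02_10"), recursive = TRUE, showWarnings = FALSE)
best_link <- file.path(sketch_root, "best")

# Hypothetical helper: re-point the 'best' symlink at a version folder
point_best_at <- function(version_name) {
  current_target <- Sys.readlink(best_link)
  # Remove the old link itself (never its target) before creating the new one
  if (!is.na(current_target) && nzchar(current_target)) {
    file.remove(best_link)
  }
  file.symlink(from = file.path(sketch_root, version_name), to = best_link)
}

point_best_at("2024_02_02")
point_best_at("2024_02_10")
best_target <- basename(Sys.readlink(best_link))
print(best_target)
```

Only the link moves; both version folders stay where they are, which is why promotion and demotion are cheap and reversible.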
Now let’s mark the other one as best, and see what happens to the symlinks.
# The tool is chatty by default at the console, but it's easy to make it quiet if it's part of a pipeline.
suppressMessages({
slt$mark_best(version_name = "2024_02_10", user_entry = list(comment = "testing mark_best"))
})
print_tree(root_base)
#> |-- log_symlinks_central.csv
#> |-- modeled
#> | |-- 2024_02_02
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2024_02_10
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- best
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | `-- report_key_versions.csv
#> `-- to_model
#> |-- 2024_02_02
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2024_02_10
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- best
#> | `-- logs
#> | `-- log_version_history.csv
#> `-- report_key_versions.csv
In this trio, we see the central log, and each versioned folder’s log of best promotion events.
data.table::fread(PATHS$log_cent)
#> log_id timestamp user version_name version_path action comment
#> <int> <char> <char> <char> <char> <char> <char>
#> 1: 0 2025_07_24_111338 ssbyrne CENTRAL_LOG /tmp/Rtmp3tDGBK/slt/log_symlinks_central.csv create log created
#> 2: 1 2025_07_24_111338 ssbyrne 2024_02_02 /tmp/Rtmp3tDGBK/slt/to_model/2024_02_02 promote_best testing mark_best
#> 3: 2 2025_07_24_111338 ssbyrne 2024_02_02 /tmp/Rtmp3tDGBK/slt/modeled/2024_02_02 promote_best testing mark_best
#> 4: 3 2025_07_24_111339 ssbyrne 2024_02_02 /tmp/Rtmp3tDGBK/slt/to_model/2024_02_02 demote_best testing mark_best
#> 5: 4 2025_07_24_111339 ssbyrne 2024_02_10 /tmp/Rtmp3tDGBK/slt/to_model/2024_02_10 promote_best testing mark_best
#> 6: 5 2025_07_24_111339 ssbyrne 2024_02_02 /tmp/Rtmp3tDGBK/slt/modeled/2024_02_02 demote_best testing mark_best
#> 7: 6 2025_07_24_111339 ssbyrne 2024_02_10 /tmp/Rtmp3tDGBK/slt/modeled/2024_02_10 promote_best testing mark_best
data.table::fread(PATHS$log_2024_02_02)
#> log_id timestamp user version_name version_path action comment
#> <int> <char> <char> <char> <char> <char> <char>
#> 1: 0 2025_07_24_111338 ssbyrne 2024_02_02 /tmp/Rtmp3tDGBK/slt/to_model/2024_02_02 create log created
#> 2: 1 2025_07_24_111338 ssbyrne 2024_02_02 /tmp/Rtmp3tDGBK/slt/to_model/2024_02_02 promote_best testing mark_best
#> 3: 2 2025_07_24_111339 ssbyrne 2024_02_02 /tmp/Rtmp3tDGBK/slt/to_model/2024_02_02 demote_best testing mark_best
data.table::fread(PATHS$log_2024_02_10)
#> log_id timestamp user version_name version_path action comment
#> <int> <char> <char> <char> <char> <char> <char>
#> 1: 0 2025_07_24_111339 ssbyrne 2024_02_10 /tmp/Rtmp3tDGBK/slt/to_model/2024_02_10 create log created
#> 2: 1 2025_07_24_111339 ssbyrne 2024_02_10 /tmp/Rtmp3tDGBK/slt/to_model/2024_02_10 promote_best testing mark_best
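The log layout shown above (one row per action, with log_id counting up from 0) can be sketched in base R as follows. This is only an illustration of the record structure: append_log_row() and empty_log are hypothetical names, the log is kept in memory rather than written to CSV, and the tool’s own logging code may differ.

```r
# An empty log with the same columns as the logs shown above
empty_log <- data.frame(
  log_id = integer(), timestamp = character(), user = character(),
  version_name = character(), version_path = character(),
  action = character(), comment = character(),
  stringsAsFactors = FALSE
)

# Hypothetical helper: append one action to an in-memory log
append_log_row <- function(log, version_name, version_path, action, comment) {
  new_row <- data.frame(
    log_id       = nrow(log),  # ids start at 0 and count up
    timestamp    = format(Sys.time(), "%Y_%m_%d_%H%M%S"),
    user         = Sys.info()[["user"]],
    version_name = version_name,
    version_path = version_path,
    action       = action,
    comment      = comment,
    stringsAsFactors = FALSE
  )
  rbind(log, new_row)
}

version_path <- file.path(tempdir(), "to_model", "2024_02_02")
log <- append_log_row(empty_log, "2024_02_02", version_path, "create", "log created")
log <- append_log_row(log, "2024_02_02", version_path, "promote_best", "testing mark_best")
log <- append_log_row(log, "2024_02_02", version_path, "demote_best", "testing mark_best")
print(log[, c("log_id", "version_name", "action", "comment")])
```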
You’ve probably also noticed a report. It shows the current state of all tool-created symlinks, built from the version_name folder logs (not the central log).
The report is refreshed whenever you mark a folder.
data.table::fread(file.path(root_input, "report_key_versions.csv"))
#> log_id timestamp user version_name version_path action comment
#> <int> <char> <char> <char> <char> <char> <char>
#> 1: 1 2025_07_24_111339 ssbyrne 2024_02_10 /tmp/Rtmp3tDGBK/slt/to_model/2024_02_10 promote_best testing mark_best
data.table::fread(file.path(root_output, "report_key_versions.csv"))
#> log_id timestamp user version_name version_path action comment
#> <int> <char> <char> <char> <char> <char> <char>
#> 1: 1 2025_07_24_111339 ssbyrne 2024_02_10 /tmp/Rtmp3tDGBK/slt/modeled/2024_02_10 promote_best testing mark_best
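One way to picture how such a report relates to the folder logs: take the last row of each version_name log and keep the versions whose latest action is a promotion. The base-R sketch below uses small hand-built logs that mirror the ones above. It is an assumption about the logic for illustration, not the tool’s actual report code.

```r
# Hand-built stand-ins for two version_name logs
logs <- list(
  data.frame(
    log_id = 0:2, version_name = "2024_02_02",
    action = c("create", "promote_best", "demote_best"),
    stringsAsFactors = FALSE
  ),
  data.frame(
    log_id = 0:1, version_name = "2024_02_10",
    action = c("create", "promote_best"),
    stringsAsFactors = FALSE
  )
)

# Last row of each log = current state of that version
last_rows <- do.call(rbind, lapply(logs, function(log) log[nrow(log), ]))

# Keep only versions whose current state is a promotion
key_versions <- last_rows[startsWith(last_rows$action, "promote_"), ]
print(key_versions)
```

Here 2024_02_02 drops out because its most recent action is a demotion, which matches the single 2024_02_10 row in the reports above.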
This tool was designed to allow the researcher to use it ‘mid-stream’ during a modeling round. I.e. you may mark existing output folders as best, and all the logging takes care of itself.
In addition, the researcher may choose to have this tool manage folder creation. This is useful if you want to ensure that all your output folders are managed by the same tool, and that the tool is aware of all the version_name folders that exist. Further, there are more reports you can run against your version_name logs that are more informative if you create all your folders with the tool. (See ‘Other Features’)
# Let's use a programmatic example to build a new `version_name.`
version_name_input <- get_output_dir(root_input, "today")
version_name_output <- get_output_dir(root_output, "today")
if(!version_name_input == version_name_output) {
stop("version_name_input and version_name_output must be the same")
}
version_name_today <- intersect(version_name_input, version_name_output)
# This creates folders safely, and will not overwrite existing folders if called twice.
slt$make_new_version_folder(version_name = version_name_today)
slt$make_new_version_folder(version_name = version_name_today)
#> Directory already exists: /tmp/Rtmp3tDGBK/slt/to_model/2025_07_24.01
#> Directory already exists: /tmp/Rtmp3tDGBK/slt/modeled/2025_07_24.01
Now we can see the new folders, with their logs and creation date-time stamps.
New version names take the form YYYY_MM_DD.VV, which allows for reruns on the same day if necessary, e.g. 2024_02_10.01, 2024_02_10.02, 2024_02_10.03, etc.
print_tree(root_base)
#> |-- log_symlinks_central.csv
#> |-- modeled
#> | |-- 2024_02_02
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2024_02_10
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2025_07_24.01
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- best
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | `-- report_key_versions.csv
#> `-- to_model
#> |-- 2024_02_02
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2024_02_10
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2025_07_24.01
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- best
#> | `-- logs
#> | `-- log_version_history.csv
#> `-- report_key_versions.csv
data.table::fread(file.path(root_input, version_name_today, "logs/log_version_history.csv"))
#> log_id timestamp user version_name version_path action comment
#> <int> <char> <char> <char> <char> <char> <char>
#> 1: 0 2025_07_24_111339 ssbyrne 2025_07_24.01 /tmp/Rtmp3tDGBK/slt/to_model/2025_07_24.01 create log created
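The YYYY_MM_DD.VV naming scheme can be sketched as a small function that looks at existing folder names and picks the next free suffix for a given date. next_version_name() is a hypothetical helper written for this illustration; it is not how get_output_dir() is implemented.

```r
# Hypothetical helper: next free YYYY_MM_DD.VV name for a date
next_version_name <- function(existing, date = Sys.Date()) {
  stamp <- format(date, "%Y_%m_%d")
  # Match names for this date and capture the two-digit suffix
  pattern <- paste0("^", stamp, "\\.(\\d{2})$")
  same_day <- grep(pattern, existing, value = TRUE)
  used <- as.integer(sub(pattern, "\\1", same_day))
  next_vv <- if (length(used) == 0) 1L else max(used) + 1L
  sprintf("%s.%02d", stamp, next_vv)
}

first_run <- next_version_name(character(0), as.Date("2024-02-10"))
third_run <- next_version_name(
  c("2024_02_02", "2024_02_10.01", "2024_02_10.02"),
  as.Date("2024-02-10")
)
print(c(first_run, third_run))
```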
This folder can be marked best just like the others, and the prior best will be demoted.
suppressMessages({
slt$mark_best(version_name = version_name_today, user_entry = list(comment = "testing mark_best"))
})
data.table::fread(PATHS$log_cent)
#> log_id timestamp user version_name version_path action comment
#> <int> <char> <char> <char> <char> <char> <char>
#> 1: 0 2025_07_24_111338 ssbyrne CENTRAL_LOG /tmp/Rtmp3tDGBK/slt/log_symlinks_central.csv create log created
#> 2: 1 2025_07_24_111338 ssbyrne 2024_02_02 /tmp/Rtmp3tDGBK/slt/to_model/2024_02_02 promote_best testing mark_best
#> 3: 2 2025_07_24_111338 ssbyrne 2024_02_02 /tmp/Rtmp3tDGBK/slt/modeled/2024_02_02 promote_best testing mark_best
#> 4: 3 2025_07_24_111339 ssbyrne 2024_02_02 /tmp/Rtmp3tDGBK/slt/to_model/2024_02_02 demote_best testing mark_best
#> 5: 4 2025_07_24_111339 ssbyrne 2024_02_10 /tmp/Rtmp3tDGBK/slt/to_model/2024_02_10 promote_best testing mark_best
#> 6: 5 2025_07_24_111339 ssbyrne 2024_02_02 /tmp/Rtmp3tDGBK/slt/modeled/2024_02_02 demote_best testing mark_best
#> 7: 6 2025_07_24_111339 ssbyrne 2024_02_10 /tmp/Rtmp3tDGBK/slt/modeled/2024_02_10 promote_best testing mark_best
#> 8: 7 2025_07_24_111339 ssbyrne 2024_02_10 /tmp/Rtmp3tDGBK/slt/to_model/2024_02_10 demote_best testing mark_best
#> 9: 8 2025_07_24_111339 ssbyrne 2025_07_24.01 /tmp/Rtmp3tDGBK/slt/to_model/2025_07_24.01 promote_best testing mark_best
#> 10: 9 2025_07_24_111339 ssbyrne 2024_02_10 /tmp/Rtmp3tDGBK/slt/modeled/2024_02_10 demote_best testing mark_best
#> 11: 10 2025_07_24_111339 ssbyrne 2025_07_24.01 /tmp/Rtmp3tDGBK/slt/modeled/2025_07_24.01 promote_best testing mark_best
data.table::fread(PATHS$log_2024_02_10)
#> log_id timestamp user version_name version_path action comment
#> <int> <char> <char> <char> <char> <char> <char>
#> 1: 0 2025_07_24_111339 ssbyrne 2024_02_10 /tmp/Rtmp3tDGBK/slt/to_model/2024_02_10 create log created
#> 2: 1 2025_07_24_111339 ssbyrne 2024_02_10 /tmp/Rtmp3tDGBK/slt/to_model/2024_02_10 promote_best testing mark_best
#> 3: 2 2025_07_24_111339 ssbyrne 2024_02_10 /tmp/Rtmp3tDGBK/slt/to_model/2024_02_10 demote_best testing mark_best
Now let’s say you review your results and no version of outputs should be ‘best’. You can run unmark() to remove the ‘best’ status.
The version_name log shows the demotion.
The version_name logs are the acid test of what’s current.
suppressMessages({
slt$unmark(version_name = version_name_today, user_entry = list(comment = "testing unmark_best"))
})
print_tree(root_base)
#> |-- log_symlinks_central.csv
#> |-- modeled
#> | |-- 2024_02_02
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2024_02_10
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2025_07_24.01
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | `-- report_key_versions.csv
#> `-- to_model
#> |-- 2024_02_02
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2024_02_10
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2025_07_24.01
#> | `-- logs
#> | `-- log_version_history.csv
#> `-- report_key_versions.csv
data.table::fread(PATHS$log_cent)
#> log_id timestamp user version_name version_path action comment
#> <int> <char> <char> <char> <char> <char> <char>
#> 1: 0 2025_07_24_111338 ssbyrne CENTRAL_LOG /tmp/Rtmp3tDGBK/slt/log_symlinks_central.csv create log created
#> 2: 1 2025_07_24_111338 ssbyrne 2024_02_02 /tmp/Rtmp3tDGBK/slt/to_model/2024_02_02 promote_best testing mark_best
#> 3: 2 2025_07_24_111338 ssbyrne 2024_02_02 /tmp/Rtmp3tDGBK/slt/modeled/2024_02_02 promote_best testing mark_best
#> 4: 3 2025_07_24_111339 ssbyrne 2024_02_02 /tmp/Rtmp3tDGBK/slt/to_model/2024_02_02 demote_best testing mark_best
#> 5: 4 2025_07_24_111339 ssbyrne 2024_02_10 /tmp/Rtmp3tDGBK/slt/to_model/2024_02_10 promote_best testing mark_best
#> 6: 5 2025_07_24_111339 ssbyrne 2024_02_02 /tmp/Rtmp3tDGBK/slt/modeled/2024_02_02 demote_best testing mark_best
#> 7: 6 2025_07_24_111339 ssbyrne 2024_02_10 /tmp/Rtmp3tDGBK/slt/modeled/2024_02_10 promote_best testing mark_best
#> 8: 7 2025_07_24_111339 ssbyrne 2024_02_10 /tmp/Rtmp3tDGBK/slt/to_model/2024_02_10 demote_best testing mark_best
#> 9: 8 2025_07_24_111339 ssbyrne 2025_07_24.01 /tmp/Rtmp3tDGBK/slt/to_model/2025_07_24.01 promote_best testing mark_best
#> 10: 9 2025_07_24_111339 ssbyrne 2024_02_10 /tmp/Rtmp3tDGBK/slt/modeled/2024_02_10 demote_best testing mark_best
#> 11: 10 2025_07_24_111339 ssbyrne 2025_07_24.01 /tmp/Rtmp3tDGBK/slt/modeled/2025_07_24.01 promote_best testing mark_best
#> 12: 11 2025_07_24_111339 ssbyrne 2025_07_24.01 /tmp/Rtmp3tDGBK/slt/to_model/2025_07_24.01 demote_best testing unmark_best
#> 13: 12 2025_07_24_111339 ssbyrne 2025_07_24.01 /tmp/Rtmp3tDGBK/slt/modeled/2025_07_24.01 demote_best testing unmark_best
data.table::fread(file.path(root_input, version_name_today, "logs/log_version_history.csv"))
#> log_id timestamp user version_name version_path action comment
#> <int> <char> <char> <char> <char> <char> <char>
#> 1: 0 2025_07_24_111339 ssbyrne 2025_07_24.01 /tmp/Rtmp3tDGBK/slt/to_model/2025_07_24.01 create log created
#> 2: 1 2025_07_24_111339 ssbyrne 2025_07_24.01 /tmp/Rtmp3tDGBK/slt/to_model/2025_07_24.01 promote_best testing mark_best
#> 3: 2 2025_07_24_111339 ssbyrne 2025_07_24.01 /tmp/Rtmp3tDGBK/slt/to_model/2025_07_24.01 demote_best testing unmark_best
We love our ‘best’ version of outputs as long as it’s best, but time passes and we get new ‘best’ versions.
When it’s time to remove those old folders, we can use this tool to do that safely, in two stages.
Think of the two-step process a bit like git add and git commit.
First, mark the folder with mark_remove, which puts it in the ‘deletion staging area’.
Then call delete_version_folders() to actually delete the folder and update the central log.
Let’s demonstrate on the folder we made programmatically with today’s date.
suppressMessages({
slt$mark_remove(version_name = version_name_today, user_entry = list(comment = "testing mark_remove"))
})
And let’s look at the folder structure.
Since more than one folder can be marked remove_ at a time, we combine ‘remove_’ with the version_name to make the folder name unique.
print_tree(root_base)
#> |-- log_symlinks_central.csv
#> |-- modeled
#> | |-- 2024_02_02
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2024_02_10
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2025_07_24.01
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- remove_2025_07_24.01
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | `-- report_key_versions.csv
#> `-- to_model
#> |-- 2024_02_02
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2024_02_10
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2025_07_24.01
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- remove_2025_07_24.01
#> | `-- logs
#> | `-- log_version_history.csv
#> `-- report_key_versions.csv
print_symlink("remove")
#> [1] "lrwxrwxrwx 1 ssbyrne Domain Users 42 Jul 24 11:13 remove_2025_07_24.01 -> /tmp/Rtmp3tDGBK/slt/to_model/2025_07_24.01"
And finally look at the log.
data.table::fread(file.path(root_input, version_name_today, "logs/log_version_history.csv"))
#> log_id timestamp user version_name version_path action comment
#> <int> <char> <char> <char> <char> <char> <char>
#> 1: 0 2025_07_24_111339 ssbyrne 2025_07_24.01 /tmp/Rtmp3tDGBK/slt/to_model/2025_07_24.01 create log created
#> 2: 1 2025_07_24_111339 ssbyrne 2025_07_24.01 /tmp/Rtmp3tDGBK/slt/to_model/2025_07_24.01 promote_best testing mark_best
#> 3: 2 2025_07_24_111339 ssbyrne 2025_07_24.01 /tmp/Rtmp3tDGBK/slt/to_model/2025_07_24.01 demote_best testing unmark_best
#> 4: 3 2025_07_24_111339 ssbyrne 2025_07_24.01 /tmp/Rtmp3tDGBK/slt/to_model/2025_07_24.01 promote_remove testing mark_remove
The report should also update to show the new symlink.
data.table::fread(file.path(root_input, "report_key_versions.csv"))
#> log_id timestamp user version_name version_path action comment
#> <int> <char> <char> <char> <char> <char> <char>
#> 1: 3 2025_07_24_111339 ssbyrne 2025_07_24.01 /tmp/Rtmp3tDGBK/slt/to_model/2025_07_24.01 promote_remove testing mark_remove
First, let’s naively try to delete a folder we haven’t marked as ready for removal.
Note: the tool checks for a remove_ symlink in each of the roots.
slt$delete_version_folders(
version_name = "2024_02_02",
user_entry = list(comment = "testing delete_version_folders")
)
#>
#> No valid `remove_` symlink found:
#> for: 2024_02_02
#> in root: /tmp/Rtmp3tDGBK/slt/to_model
#>
#> No valid `remove_` symlink found:
#> for: 2024_02_02
#> in root: /tmp/Rtmp3tDGBK/slt/modeled
Now let’s delete the folder we have marked as ready for removal.
For each root, it will ask if you’re sure you want to delete the folder, unless you set require_user_input = FALSE as we do here.
slt$delete_version_folders(
version_name = version_name_today,
user_entry = list(comment = "testing delete_version_folders"),
require_user_input = FALSE
)
#>
#> Writing central log to /tmp/Rtmp3tDGBK/slt/log_symlinks_central.csv
#> Deleting /tmp/Rtmp3tDGBK/slt/to_model/2025_07_24.01
#> Deleting /tmp/Rtmp3tDGBK/slt/to_model/remove_2025_07_24.01
#>
#> Writing central log to /tmp/Rtmp3tDGBK/slt/log_symlinks_central.csv
#> Deleting /tmp/Rtmp3tDGBK/slt/modeled/2025_07_24.01
#> Deleting /tmp/Rtmp3tDGBK/slt/modeled/remove_2025_07_24.01
Let’s look at the folder structure.
The version folder and its remove_ symlink should be gone.
print_tree(root_base)
#> |-- log_symlinks_central.csv
#> |-- modeled
#> | |-- 2024_02_02
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2024_02_10
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | `-- report_key_versions.csv
#> `-- to_model
#> |-- 2024_02_02
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2024_02_10
#> | `-- logs
#> | `-- log_version_history.csv
#> `-- report_key_versions.csv
Since we no longer have a version_name log, we can’t look at it. But we can look at the central log.
data.table::fread(PATHS$log_cent)
#> log_id timestamp user version_name version_path action comment
#> <int> <char> <char> <char> <char> <char> <char>
#> 1: 0 2025_07_24_111338 ssbyrne CENTRAL_LOG /tmp/Rtmp3tDGBK/slt/log_symlinks_central.csv create log created
#> 2: 1 2025_07_24_111338 ssbyrne 2024_02_02 /tmp/Rtmp3tDGBK/slt/to_model/2024_02_02 promote_best testing mark_best
#> 3: 2 2025_07_24_111338 ssbyrne 2024_02_02 /tmp/Rtmp3tDGBK/slt/modeled/2024_02_02 promote_best testing mark_best
#> 4: 3 2025_07_24_111339 ssbyrne 2024_02_02 /tmp/Rtmp3tDGBK/slt/to_model/2024_02_02 demote_best testing mark_best
#> 5: 4 2025_07_24_111339 ssbyrne 2024_02_10 /tmp/Rtmp3tDGBK/slt/to_model/2024_02_10 promote_best testing mark_best
#> 6: 5 2025_07_24_111339 ssbyrne 2024_02_02 /tmp/Rtmp3tDGBK/slt/modeled/2024_02_02 demote_best testing mark_best
#> 7: 6 2025_07_24_111339 ssbyrne 2024_02_10 /tmp/Rtmp3tDGBK/slt/modeled/2024_02_10 promote_best testing mark_best
#> 8: 7 2025_07_24_111339 ssbyrne 2024_02_10 /tmp/Rtmp3tDGBK/slt/to_model/2024_02_10 demote_best testing mark_best
#> 9: 8 2025_07_24_111339 ssbyrne 2025_07_24.01 /tmp/Rtmp3tDGBK/slt/to_model/2025_07_24.01 promote_best testing mark_best
#> 10: 9 2025_07_24_111339 ssbyrne 2024_02_10 /tmp/Rtmp3tDGBK/slt/modeled/2024_02_10 demote_best testing mark_best
#> 11: 10 2025_07_24_111339 ssbyrne 2025_07_24.01 /tmp/Rtmp3tDGBK/slt/modeled/2025_07_24.01 promote_best testing mark_best
#> 12: 11 2025_07_24_111339 ssbyrne 2025_07_24.01 /tmp/Rtmp3tDGBK/slt/to_model/2025_07_24.01 demote_best testing unmark_best
#> 13: 12 2025_07_24_111339 ssbyrne 2025_07_24.01 /tmp/Rtmp3tDGBK/slt/modeled/2025_07_24.01 demote_best testing unmark_best
#> 14: 13 2025_07_24_111339 ssbyrne 2025_07_24.01 /tmp/Rtmp3tDGBK/slt/to_model/2025_07_24.01 promote_remove testing mark_remove
#> 15: 14 2025_07_24_111339 ssbyrne 2025_07_24.01 /tmp/Rtmp3tDGBK/slt/modeled/2025_07_24.01 promote_remove testing mark_remove
#> 16: 15 2025_07_24_111340 ssbyrne 2025_07_24.01 /tmp/Rtmp3tDGBK/slt/to_model/2025_07_24.01 delete_remove_folder testing delete_version_folders
#> 17: 16 2025_07_24_111340 ssbyrne 2025_07_24.01 /tmp/Rtmp3tDGBK/slt/modeled/2025_07_24.01 delete_remove_folder testing delete_version_folders
Let’s look at the report.
We no longer see the version_name in the report, since it’s been deleted.
data.table::fread(file.path(root_input, "report_key_versions.csv"))
#> Empty data.table (0 rows and 7 cols): log_id,timestamp,user,version_name,version_path,action...
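The two-stage removal walked through above can be sketched in base R: a folder is only deleted if a matching remove_ symlink exists. This is a simplified illustration for a Unix-like system, with hypothetical names (stage_root, stage_for_removal(), delete_if_staged()) and no logging or user prompt; it is not the tool’s implementation.

```r
stage_root <- file.path(tempdir(), "remove_sketch")
version_dir <- file.path(stage_root, "2024_02_02")
dir.create(version_dir, recursive = TRUE, showWarnings = FALSE)

# Path of the staging symlink for a version
remove_link <- function(version_name) {
  file.path(stage_root, paste0("remove_", version_name))
}

# Stage 1: mark for removal by creating a remove_<version_name> symlink
stage_for_removal <- function(version_name) {
  file.symlink(from = file.path(stage_root, version_name), to = remove_link(version_name))
}

# Stage 2: delete only if the folder was staged first
delete_if_staged <- function(version_name) {
  link <- remove_link(version_name)
  target <- Sys.readlink(link)
  if (is.na(target) || !nzchar(target)) {
    message("No remove_ symlink found for: ", version_name)
    return(invisible(FALSE))
  }
  unlink(target, recursive = TRUE)  # delete the version folder
  file.remove(link)                 # then the now-dangling symlink
  invisible(TRUE)
}

not_staged  <- delete_if_staged("2024_02_02")  # refused: nothing staged yet
stage_for_removal("2024_02_02")
was_deleted <- delete_if_staged("2024_02_02")  # staged, so it is deleted
```

Requiring the staging step means a mistyped version_name in the delete call does nothing, rather than removing the wrong folder.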
Other available features will be covered briefly, and will assume the reader has already read the Symlink Tool Intro section.
It’s likely you’ll have other output versions you want to keep, but not as ‘best’. You can mark these as ‘keep’.
Each kept version gets a symlink of the form keep_<version_name>.
suppressMessages(
slt$mark_keep(version_name = "2024_02_10", user_entry = list(comment = "testing mark_keep"))
)
print_tree(root_base)
#> |-- log_symlinks_central.csv
#> |-- modeled
#> | |-- 2024_02_02
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2024_02_10
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- keep_2024_02_10
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | `-- report_key_versions.csv
#> `-- to_model
#> |-- 2024_02_02
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2024_02_10
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- keep_2024_02_10
#> | `-- logs
#> | `-- log_version_history.csv
#> `-- report_key_versions.csv
In addition to the report_key_versions.csv file, there are other reports available. These will show the status of the last log row for each version_name folder in each root folder.
You can view things like all logs, logs for folders with an active symlink, and logs for folders with no active symlink.
NOTE: This includes a discrepancy report that shows if logs do not conform to expected standards.
# Show the types of reports currently available
slt$make_reports
#> function ()
#> {
#> private$msg_sometimes("Writing last-row log reports for:\n")
#> for (root in private$DICT$ROOTS) {
#> private$msg_sometimes(" ", root)
#> private$report_all_logs(root = root)
#> private$report_all_logs_symlink(root = root)
#> private$report_all_logs_tool_symlink(root = root)
#> private$report_all_logs_non_symlink(root = root)
#> private$report_discrepancies(root = root, verbose = FALSE)
#> private$msg_sometimes(" ", root)
#> }
#> }
#> <environment: 0x55d711ad6088>
# Run the reports
suppressMessages({
slt$make_reports()
})
print_tree(root_base)
#> |-- log_symlinks_central.csv
#> |-- modeled
#> | |-- 2024_02_02
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2024_02_10
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- keep_2024_02_10
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- report_all_logs.csv
#> | |-- report_all_logs_non_symlink.csv
#> | |-- report_all_logs_symlink.csv
#> | `-- report_key_versions.csv
#> `-- to_model
#> |-- 2024_02_02
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2024_02_10
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- keep_2024_02_10
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- report_all_logs.csv
#> |-- report_all_logs_non_symlink.csv
#> |-- report_all_logs_symlink.csv
#> `-- report_key_versions.csv
# View an example report - logs for folders with no active symlink
# - you can see this folder was previously marked 'best'
data.table::fread(file.path(root_input, "report_all_logs_non_symlink.csv"))
#> log_id timestamp user version_name version_path action comment
#> <int> <char> <char> <char> <char> <char> <char>
#> 1: 2 2025_07_24_111339 ssbyrne 2024_02_02 /tmp/Rtmp3tDGBK/slt/to_model/2024_02_02 demote_best testing mark_best
# Expect this to be absent for the vignette
try(data.table::fread(file.path(root_input, "REPORT_DISCREPANCIES.csv")))
#> Error in data.table::fread(file.path(root_input, "REPORT_DISCREPANCIES.csv")) :
#> File '/tmp/Rtmp3tDGBK/slt/to_model/REPORT_DISCREPANCIES.csv' does not exist or is non-readable. getwd()=='/tmp/RtmpBjZYIf/Rbuild3a7bce4ab1354e/vmTools/vignettes'
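Let’s say you have a set of folders you want to keep or remove, and you want to do it all at once.
We’ll demonstrate by making a set of dummy folders, marking some as remove_, rounding up the remove_ folders for deletion, and finally rounding up folders by date and marking them keep_.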
# Make a set of dummy folders
dv1 <- get_output_dir(root_input, "today")
slt$make_new_version_folder(dv1)
dv2 <- get_output_dir(root_input, "today")
slt$make_new_version_folder(dv2)
dv3 <- get_output_dir(root_input, "today")
slt$make_new_version_folder(dv3)
dv4 <- get_output_dir(root_input, "today")
slt$make_new_version_folder(dv4)
print_tree(root_base)
#> |-- log_symlinks_central.csv
#> |-- modeled
#> | |-- 2024_02_02
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2024_02_10
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2025_07_24.01
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2025_07_24.02
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2025_07_24.03
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2025_07_24.04
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- keep_2024_02_10
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- report_all_logs.csv
#> | |-- report_all_logs_non_symlink.csv
#> | |-- report_all_logs_symlink.csv
#> | `-- report_key_versions.csv
#> `-- to_model
#> |-- 2024_02_02
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2024_02_10
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2025_07_24.01
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2025_07_24.02
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2025_07_24.03
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2025_07_24.04
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- keep_2024_02_10
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- report_all_logs.csv
#> |-- report_all_logs_non_symlink.csv
#> |-- report_all_logs_symlink.csv
#> `-- report_key_versions.csv
# Mark some as 'remove_'
suppressMessages({
for(dv in c(dv1, dv2)){
slt$mark_remove(dv, user_entry = list(comment = "mark_remove for roundup"))
}
})
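# Round up all folders marked 'remove_', then delete them
# (assumes the SLT roundup_remove() method, which returns one table per root)
roundup_remove_list <- slt$roundup_remove()
suppressMessages({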
for(dv in roundup_remove_list$root_input$version_name){
slt$delete_version_folders(
version_name = dv,
user_entry = list(comment = "roundup_remove"),
require_user_input = FALSE
)
}
})
print_tree(root_base)
#> |-- log_symlinks_central.csv
#> |-- modeled
#> | |-- 2024_02_02
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2024_02_10
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2025_07_24.03
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2025_07_24.04
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- keep_2024_02_10
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- report_all_logs.csv
#> | |-- report_all_logs_non_symlink.csv
#> | |-- report_all_logs_symlink.csv
#> | `-- report_key_versions.csv
#> `-- to_model
#> |-- 2024_02_02
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2024_02_10
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2025_07_24.03
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2025_07_24.04
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- keep_2024_02_10
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- report_all_logs.csv
#> |-- report_all_logs_non_symlink.csv
#> |-- report_all_logs_symlink.csv
#> `-- report_key_versions.csv
Use the log creation date (first row of each log) to round up folders created on, before, or after a given date.
my_date <- format(Sys.Date(), "%Y_%m_%d")
roundup_date_list <- slt$roundup_by_date(
user_date = my_date,
date_selector = "lte" # less than or equal to today's date
)
#> Finding all folders with log creation dates that are 'lte' 2025_07_24.
#> NOTE! Log creation dates are used as the file-system does not record creation times.
#> roundup_by_date: Formatting date with time-zone: America/Los_Angeles
#> Folders with symlinks will have duplicate rows by `version_name` (one row for each unique `dir_name`) - showing all for completeness.
# mark all our dummy folders (with the ".VV" pattern) as keepers
dv_keep <- grep(
  pattern = "\\.\\d\\d"
  , x = roundup_date_list$root_input$version_name
  , value = TRUE
)
suppressMessages({
  for(dv in dv_keep){
    slt$mark_keep(dv, user_entry = list(comment = "roundup_by_date"))
  }
})
print_tree(root_base)
#> |-- log_symlinks_central.csv
#> |-- modeled
#> | |-- 2024_02_02
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2024_02_10
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2025_07_24.03
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2025_07_24.04
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- keep_2024_02_10
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- keep_2025_07_24.03
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- keep_2025_07_24.04
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- report_all_logs.csv
#> | |-- report_all_logs_non_symlink.csv
#> | |-- report_all_logs_symlink.csv
#> | `-- report_key_versions.csv
#> `-- to_model
#> |-- 2024_02_02
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2024_02_10
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2025_07_24.03
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2025_07_24.04
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- keep_2024_02_10
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- keep_2025_07_24.03
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- keep_2025_07_24.04
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- report_all_logs.csv
#> |-- report_all_logs_non_symlink.csv
#> |-- report_all_logs_symlink.csv
#> `-- report_key_versions.csv
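The date selection above can be pictured with a small base R sketch. This is not the tool's implementation: select_by_date is a hypothetical helper, the creation dates are made up, and only the "lte" selector appears in this vignette (the other selector names are assumptions for illustration). It also compares whole days only, whereas the tool reports formatting dates with a time zone.

```r
# Hypothetical helper (not part of vmTools): return the version names whose
# log creation date satisfies the comparison named by `date_selector`.
# Only "lte" is shown in the vignette; the other names are assumed.
select_by_date <- function(log_dates, user_date, date_selector = "lte",
                           date_format = "%Y_%m_%d") {
  created <- as.Date(log_dates, format = date_format)
  cutoff  <- as.Date(user_date, format = date_format)
  keep <- switch(
    date_selector,
    lt  = created <  cutoff,
    lte = created <= cutoff,
    gt  = created >  cutoff,
    gte = created >= cutoff,
    eq  = created == cutoff,
    stop("Unknown date_selector: ", date_selector)
  )
  names(log_dates)[keep]
}

# Made-up creation dates keyed by version name (illustration only)
log_dates <- c(
  "2024_02_02"    = "2024_02_02",
  "2024_02_10"    = "2024_02_10",
  "2025_07_24.03" = "2025_07_24"
)

older <- select_by_date(log_dates, user_date = "2024_02_10", date_selector = "lte")
print(older)
#> [1] "2024_02_02" "2024_02_10"
```

With slt$roundup_by_date() itself, the comparison runs against the first row of each folder's log rather than a named vector like the one above.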
The date roundup relies on the log creation date (recall that the Linux filesystem does not record folder creation, or 'birth', dates). If you've made your own folders without the SymLink Tool, you can easily make a blank log for them, and you can hand-edit its creation date if you know when the folder was made.
Note: the tool handles each root independently. So even though we're not creating a folder in both our roots, the tool will create as many logs as it can.
# Make a naive folder without a log
dir.create(file.path(root_output, "2024_02_10_naive"))
try(slt$make_new_log(version_name = "2024_02_10_naive"))
print_tree(root_base)
#> |-- log_symlinks_central.csv
#> |-- modeled
#> | |-- 2024_02_02
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2024_02_10
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2024_02_10_naive
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2025_07_24.03
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- 2025_07_24.04
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- keep_2024_02_10
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- keep_2025_07_24.03
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- keep_2025_07_24.04
#> | | `-- logs
#> | | `-- log_version_history.csv
#> | |-- report_all_logs.csv
#> | |-- report_all_logs_non_symlink.csv
#> | |-- report_all_logs_symlink.csv
#> | `-- report_key_versions.csv
#> `-- to_model
#> |-- 2024_02_02
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2024_02_10
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2024_02_10_naive
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2025_07_24.03
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- 2025_07_24.04
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- keep_2024_02_10
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- keep_2025_07_24.03
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- keep_2025_07_24.04
#> | `-- logs
#> | `-- log_version_history.csv
#> |-- report_all_logs.csv
#> |-- report_all_logs_non_symlink.csv
#> |-- report_all_logs_symlink.csv
#> `-- report_key_versions.csv
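If you have many hand-made folders, it may help to list the ones that still lack a log before calling make_new_log(). The sketch below is one possible approach in base R, not a vmTools feature: find_unlogged_versions is a hypothetical helper, the demo root is a throwaway temp folder, and the only detail taken from the tool is the logs/log_version_history.csv location shown in the trees above.

```r
# Build a throwaway root with one logged and one un-logged version folder
root_demo <- file.path(tempdir(), "slt_demo_root")
dir.create(file.path(root_demo, "2024_02_02", "logs"),
           recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root_demo, "2024_02_02", "logs", "log_version_history.csv"))
dir.create(file.path(root_demo, "2024_02_10_naive"), showWarnings = FALSE)

# Hypothetical helper (not part of vmTools): names of the folders directly
# under `root` that have no logs/log_version_history.csv file
find_unlogged_versions <- function(root) {
  version_dirs <- list.dirs(root, full.names = TRUE, recursive = FALSE)
  has_log <- file.exists(
    file.path(version_dirs, "logs", "log_version_history.csv")
  )
  basename(version_dirs[!has_log])
}

unlogged <- find_unlogged_versions(root_demo)
print(unlogged)
#> [1] "2024_02_10_naive"
```

Each name returned this way could then be passed to slt$make_new_log(version_name = ...), as in the example above.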
You can audit the internal state of the tool with the return_ functions.
# Print all static fields (output truncated)
slt$return_dictionaries()
#> $FLAGS
#> $FLAGS$allow_schema_repair
#> [1] TRUE
#>
#>
#> $ROOTS
#> $ROOTS$root_input
#> [1] "/tmp/Rtmp3tDGBK/slt/to_model"
#>
#> $ROOTS$root_output
#> [1] "/tmp/Rtmp3tDGBK/slt/modeled"
....
#> [1] "^best"
#>
#> $symlink_regex_extract$keep
#> [1] "^keep_"
#>
#> $symlink_regex_extract$remove
#> [1] "^remove_"
#>
#>
#> $verbose
#> [1] TRUE
# ROOTS are likely most interesting to the user.
slt$return_dictionaries(item_names = "ROOTS")
#> $ROOTS
#> $ROOTS$root_input
#> [1] "/tmp/Rtmp3tDGBK/slt/to_model"
#>
#> $ROOTS$root_output
#> [1] "/tmp/Rtmp3tDGBK/slt/modeled"
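The symlink_regex_extract patterns in the full dictionary output above hint at how a folder name maps to a mark. As a rough illustration only, the sketch below applies those three patterns to some folder names; classify_folder and the "unmarked" label are hypothetical, the folder names are examples, and the tool's own matching logic may differ.

```r
# Patterns copied from the symlink_regex_extract output above
symlink_regex <- list(best = "^best", keep = "^keep_", remove = "^remove_")

# Hypothetical helper (not part of vmTools): name of the first pattern that
# matches `folder_name`, or "unmarked" when none match
classify_folder <- function(folder_name, regex_list = symlink_regex) {
  hits <- vapply(regex_list, function(rx) grepl(rx, folder_name), logical(1))
  if (any(hits)) names(hits)[hits][1] else "unmarked"
}

# Example folder names (illustration only)
folders <- c("best", "keep_2024_02_10", "remove_2025_07_24.01", "2024_02_02")
marks <- vapply(folders, classify_folder, character(1))
print(unname(marks))
#> [1] "best"     "keep"     "remove"   "unmarked"
```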
# Show the last 'action' the tool performed
# - these fields are set as part of each new 'marking' action.
slt$return_dynamic_fields()
#> $LOG
#> $LOG$version_name
#> [1] "2024_02_10_naive"
#>
#> $LOG$action
#> [1] "promote_keep"
#>
#>
#> $VERS_PATHS
#> $VERS_PATHS$root_input
#> [1] "/tmp/Rtmp3tDGBK/slt/to_model/2024_02_10_naive"
#>
#> $VERS_PATHS$root_output
#> [1] "/tmp/Rtmp3tDGBK/slt/modeled/2024_02_10_naive"
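The VERS_PATHS entries are each root joined with the version_name. The sketch below reproduces that relationship in base R, assuming a plain file.path() join; the tool's help text describes this step as clean_path(user_root_list, version_name), so treat build_version_paths as a hypothetical stand-in rather than the tool's code.

```r
# Roots copied from the example output above; your temp paths will differ
roots <- list(
  root_input  = "/tmp/Rtmp3tDGBK/slt/to_model",
  root_output = "/tmp/Rtmp3tDGBK/slt/modeled"
)

# Hypothetical helper (not part of vmTools): one version folder path per root
build_version_paths <- function(roots, version_name) {
  lapply(roots, function(root) file.path(root, version_name))
}

vers_paths <- build_version_paths(roots, "2024_02_10_naive")
print(vers_paths$root_output)
#> [1] "/tmp/Rtmp3tDGBK/slt/modeled/2024_02_10_naive"
```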