Design

This page documents the general design of fastreg. It covers the core requirements, the public-facing interface, and diagrams showing the general flow of the main functions.

Requirements

The core requirements of fastreg are to:

  1. Convert Danish register data from SAS files to the modern and efficient Parquet format.
  2. Read register Parquet files into R as a DuckDB table.
  3. Provide a targets pipeline template to convert multiple registers in parallel.
  4. Provide functions to list available SAS or Parquet register files directly from R.

Interface

The interface (the functions and objects exposed to users) follows a naming convention: functions are generally named by the action they perform and the object(s) they act on, in the format {action}_{object}(). Actions are verbs that describe what a function does, while objects are nouns that represent what the function operates on. Below is an overview of the main actions and objects within fastreg.

The actions are:

While the objects are:

Tip

For a list of all the public functions, see the Reference page.

Converting SAS files from a single register

flowchart TD
    identify_paths("Identify register path(s)<br>with list_sas_files(path)")
    path[/"path<br>[Character vector]"/]
    output_dir[/"output_dir<br>[Character scalar]"/]
    chunk_size[/"chunk_size<br>[Integer scalar]"/]
    convert_register("convert_register()")
    output[/"Parquet file(s)<br>written to output_dir"/]

    %% Edges
    identify_paths -.-> path --> convert_register
    output_dir & chunk_size --> convert_register
    convert_register --> output

    %% Style
    style identify_paths fill:#FFFFFF, color:#000000, stroke-dasharray: 5 5
Figure 1: Expected workflow for converting SAS files from a single register using convert_register().
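
The workflow in Figure 1 could look like the following in R. This is only a minimal sketch: the directory paths and the chunk size are hypothetical placeholders, and the argument names are assumed to match the input labels in the diagram.

```r
library(fastreg)

# Hypothetical directory holding the SAS files for one register.
sas_dir <- "path/to/sas/register"

# Optional step: identify the SAS file path(s) for the register.
sas_paths <- list_sas_files(sas_dir)

# Convert the SAS file(s) to Parquet, written to `output_dir`.
# Argument names are assumed from the diagram labels; the chunk size
# is an arbitrary illustrative value, not a recommendation.
convert_register(
  path = sas_paths,
  output_dir = "path/to/parquet",
  chunk_size = 1000000L
)
```

The list_sas_files() step is optional, as the dashed edge in the diagram indicates: a character vector of known paths can be passed to convert_register() directly.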

Converting multiple registers in parallel

flowchart TD
    copy_pipeline("use_targets_template()")
    edit["Edit _targets.R as needed"]
    run_pipeline("targets::tar_make()")
    output[/"Parquet file(s)<br>written to directory<br>specified in _targets.R"/]

    %% Edges
    copy_pipeline --> edit --> run_pipeline --> output

    %% Style
    style edit fill:#FFFFFF, color:#000000, stroke-dasharray: 5 5
Figure 2: Expected workflow for converting multiple registers using the targets pipeline.
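
As a rough sketch, the steps in Figure 2 might be run as follows. use_targets_template() is called without arguments here, as in the diagram; whether it accepts arguments is not covered on this page, so treat the call as an assumption.

```r
# Copy the pipeline template into the project.
# Called without arguments as in the diagram (an assumption).
fastreg::use_targets_template()

# Manual step: edit _targets.R as needed, for example to set the
# directory that the Parquet files are written to.

# Run the pipeline to convert the registers.
targets::tar_make()
```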

Reading a Parquet register

flowchart TD
    path[/"path<br>[Character scalar]"/]
    read_register("read_register()")
    output[/"Output<br>[DuckDB table]"/]

    %% Edges
    path --> read_register --> output

Figure 3: Expected workflow for reading a Parquet register as a DuckDB table using read_register().
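
A minimal sketch of the workflow in Figure 3, assuming a hypothetical path to a converted register:

```r
library(fastreg)

# Hypothetical path to a Parquet register (character scalar).
register <- read_register("path/to/parquet/register")

# Per Figure 3, `register` is a DuckDB table.
register
```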