rank

Lifecycle: experimental

Rank provides a customizable alternative to the built-in rank() function. The package offers the following features:

  1. Frequency-based ranking of categorical variables: choose whether to rank based on alphabetic order or element frequency.

  2. Control over sorting order: use desc = TRUE to rank in descending order; the default is ascending order.

Installation

To install rank from CRAN run:

install.packages("rank")

You can install the development version of rank like so:

# install.packages('remotes')
remotes::install_github("selkamand/rank")

Usage

Categorical Input

library(rank)

fruits <- c("Apple", "Orange", "Apple", "Pear", "Orange")

# rank alphabetically
smartrank(fruits)
#> [1] 1.5 3.5 1.5 5.0 3.5

# rank based on frequency
smartrank(fruits, sort_by = "frequency")
#> [1] 2.5 4.5 2.5 1.0 4.5

# rank based on descending order of frequency
smartrank(fruits, sort_by = "frequency", desc = TRUE)
#> [1] 3.5 1.5 3.5 5.0 1.5
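The frequency-based outputs above can be reproduced with a few lines of base R, which may help clarify what sort_by = "frequency" is doing. This sketch is not the package's implementation: frequency_rank_sketch() is a hypothetical helper, and breaking frequency ties alphabetically is an assumption inferred from the outputs shown above.

```r
# Hypothetical helper (not part of the rank package): rank elements by how
# often they occur, breaking frequency ties alphabetically (an assumption).
frequency_rank_sketch <- function(x, desc = FALSE) {
  values <- sort(unique(x))                        # distinct values, alphabetical
  counts <- tabulate(match(x, values), nbins = length(values))
  ordered_values <- values[order(counts, values)]  # least frequent value first
  position <- match(x, ordered_values)             # 1 = least frequent value
  if (desc) position <- -position                  # flip the whole ordering
  rank(position)                                   # tied elements share an average rank
}

fruits <- c("Apple", "Orange", "Apple", "Pear", "Orange")

freq_ranks <- frequency_rank_sketch(fruits)
print(freq_ranks)
#> [1] 2.5 4.5 2.5 1.0 4.5

freq_ranks_desc <- frequency_rank_sketch(fruits, desc = TRUE)
print(freq_ranks_desc)
#> [1] 3.5 1.5 3.5 5.0 1.5
```

Negating the position, rather than re-sorting, is what makes ties resolve in reverse alphabetical order when desc = TRUE, which matches the third output above.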

Numeric Input

# rank numerically
smartrank(c(1, 3, 2))
#> [1] 1 3 2

# rank numerically based on descending order
smartrank(c(1, 3, 2), desc = TRUE)
#> [1] 3 1 2
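For plain numeric vectors, the two results above coincide with what base R returns from rank(x) and rank(-x). The comparison below uses only base R, so it says nothing about how smartrank() is implemented internally.

```r
x <- c(1, 3, 2)

ascending <- rank(x)    # smallest value gets rank 1
descending <- rank(-x)  # negating the input gives the largest value rank 1

print(ascending)
#> [1] 1 3 2
print(descending)
#> [1] 3 1 2
```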

Sorting By Rank

We can use order() to sort vectors based on their ranks. For example, we can sort the fruits vector based on the frequency of each element.

fruits <- c("Apple", "Orange", "Apple", "Pear", "Orange")
ranks <- smartrank(fruits, sort_by = "frequency")
fruits[order(ranks)]
#> [1] "Pear"   "Apple"  "Apple"  "Orange" "Orange"
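order() returns the positions that would sort its input, so indexing with the result rearranges the original vector. The sketch below hard-codes the frequency ranks printed earlier (so it runs without the package) and also shows the reverse ordering via the decreasing argument of order().

```r
fruits <- c("Apple", "Orange", "Apple", "Pear", "Orange")
ranks <- c(2.5, 4.5, 2.5, 1.0, 4.5)  # copied from the frequency ranks shown earlier

idx <- order(ranks)                  # positions of the elements, lowest rank first
print(idx)
#> [1] 4 1 3 2 5

least_frequent_first <- fruits[idx]
most_frequent_first <- fruits[order(ranks, decreasing = TRUE)]
print(most_frequent_first)
#> [1] "Orange" "Orange" "Apple"  "Apple"  "Pear"
```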

Ranking and reordering by priority values

rank_by_priority() assigns the top ranks (1, 2, ...) to the specified priority values, in the order given, while all remaining values tie for a single lower rank.
reorder_by_priority() uses those ranks to move priority values to the front of the vector, leaving the remaining values in their original order.

# Prioritise D first, then C; A and B tie for the remaining rank
rank_by_priority(c("A", "B", "C", "D"), priority_values = c("D", "C"))
#> [1] 3.5 3.5 2.0 1.0

# Reorder so priorities come first; A and B follow in original order
reorder_by_priority(c("A", "B", "C", "D"), priority_values = c("D", "C"))
#> [1] "D" "C" "A" "B"
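The idea behind priority ranking can be sketched in base R with match(). The helper below is hypothetical and is not the package's code; it only assumes the behaviour visible in the outputs above, namely that non-priority values tie for one shared rank.

```r
# Hypothetical helper (not part of the rank package).
priority_rank_sketch <- function(x, priority_values) {
  pos <- match(x, priority_values)                 # 1, 2, ... for priority values, NA otherwise
  pos[is.na(pos)] <- length(priority_values) + 1   # everything else goes in one tied group
  rank(pos)                                        # tied elements share an average rank
}

x <- c("A", "B", "C", "D")
priority_ranks <- priority_rank_sketch(x, priority_values = c("D", "C"))
print(priority_ranks)
#> [1] 3.5 3.5 2.0 1.0

# order() keeps tied elements in their original order, so A stays before B
reordered <- x[order(priority_ranks)]
print(reordered)
#> [1] "D" "C" "A" "B"
```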

Stratified / hierarchical ranking

rank_stratified() computes a single combined rank across all columns of a data frame, where each column is ranked within groups defined by all previous columns. This produces a true hierarchical ordering.

data <- data.frame(
  gender = c("male", "male", "male", "male", "female", "female", "male", "female"),
  pet    = c("cat", "cat", "magpie", "magpie", "giraffe", "cat", "giraffe", "cat")
)

# Hierarchical ranking:
# 1. Rank gender (globally, by frequency)
# 2. Within each gender, rank pet by within-gender frequency
r <- rank_stratified(
  data,
  sort_by = c("frequency", "frequency"),
  desc    = TRUE
)

data[order(r), ]
#>   gender     pet
#> 3   male  magpie
#> 4   male  magpie
#> 1   male     cat
#> 2   male     cat
#> 7   male giraffe
#> 6 female     cat
#> 8 female     cat
#> 5 female giraffe
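To illustrate what "ranked within groups defined by all previous columns" means, here is a base-R sketch for the two-column case above. It is not how rank_stratified() is implemented: frequency_rank_sketch() is a hypothetical helper, alphabetical tie-breaking is an assumption inferred from the output, and the sketch stops at a row ordering rather than producing a single combined rank.

```r
# Hypothetical helper (not part of the rank package): frequency ranks with
# alphabetical tie-breaking (an assumption), optionally reversed.
frequency_rank_sketch <- function(x, desc = FALSE) {
  values <- sort(unique(x))
  counts <- tabulate(match(x, values), nbins = length(values))
  position <- match(x, values[order(counts, values)])
  if (desc) position <- -position
  rank(position)
}

data <- data.frame(
  gender = c("male", "male", "male", "male", "female", "female", "male", "female"),
  pet    = c("cat", "cat", "magpie", "magpie", "giraffe", "cat", "giraffe", "cat")
)

# Level 1: rank gender across the whole data frame
gender_key <- frequency_rank_sketch(data$gender, desc = TRUE)

# Level 2: rank pet separately inside each gender group
pet_key <- numeric(nrow(data))
for (g in unique(data$gender)) {
  rows <- which(data$gender == g)
  pet_key[rows] <- frequency_rank_sketch(data$pet[rows], desc = TRUE)
}

# The first key decides the order; the second only separates rows within a gender
row_order <- order(gender_key, pet_key)
print(row_order)
#> [1] 3 4 1 2 7 6 8 5
```

The resulting row order matches the row names in the output above. Note that "cat" is ranked using its count within each gender, not its count over the whole data frame.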

rank_stratified() can be used to arrange data frames by one or more columns, with full control over how each column contributes to the final row order.

Base R

For example, we can sort the following data frame based on the frequency of fruits, breaking any ties by the alphabetical order of the picker.

data <- data.frame(
  fruits = c("Apple", "Orange", "Apple", "Pear", "Orange"),
  picker = c("Elizabeth", "Damian", "Bob", "Cameron", "Alice")
)

# rank_stratified():
# 1. Rank fruits by frequency (globally)
# 2. Within each fruit, rank pickers alphabetically
strat_ranks <- rank_stratified(
  data,
  cols = c("fruits", "picker"),
  sort_by = c("frequency", "alphabetical"),
  desc = c(TRUE, FALSE)
)

data[order(strat_ranks), ]
#>   fruits    picker
#> 5 Orange     Alice
#> 2 Orange    Damian
#> 3  Apple       Bob
#> 1  Apple Elizabeth
#> 4   Pear   Cameron
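For this two-column case, the same row order can also be reached with base R alone by passing two keys to order(): the fruit ranks first, then the picker names. In the sketch below, fruit_rank is hard-coded from the descending frequency ranks shown earlier in this README, so the block runs without the package.

```r
data <- data.frame(
  fruits = c("Apple", "Orange", "Apple", "Pear", "Orange"),
  picker = c("Elizabeth", "Damian", "Bob", "Cameron", "Alice")
)

# Descending frequency ranks of the fruits column, copied from the earlier output
fruit_rank <- c(3.5, 1.5, 3.5, 5.0, 1.5)

# The second key (picker) only matters among rows with the same fruit rank
row_order <- order(fruit_rank, data$picker)
sorted <- data[row_order, ]
print(rownames(sorted))
#> [1] "5" "2" "3" "1" "4"
```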

Tidyverse Integration

An equivalent way to hierarchically sort data frames is to pass the ranks to dplyr's arrange() function:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

arrange(
  data,
  rank_stratified(
    data,
    cols = c("fruits", "picker"),
    sort_by = c("frequency", "alphabetical"),
    desc = c(TRUE, FALSE)
  )
)
#>   fruits    picker
#> 1 Orange     Alice
#> 2 Orange    Damian
#> 3  Apple       Bob
#> 4  Apple Elizabeth
#> 5   Pear   Cameron

Contributing

See CONTRIBUTING.md.