---
title: "Application to simple datasets"
output: 
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Application to simple datasets}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(grouper)
library(ompr)
library(ompr.roi)
#library(ROI.plugin.gurobi)
library(ROI.plugin.glpk)
```

# Introduction 

This vignette illustrates the use of the package on simple datasets, for which
the optimal solutions are apparent from inspection.

# Diversity-Based Assignment

## Dataset 001 (diversity only)

The first dataset comprises just 4 students. Its name indicates that it is
intended for the diversity-based assignment (dba) model and that it holds group
composition (gc) information. The full dataset is shown below.

```{r}
dba_gc_ex001
```

It is intuitive that an assignment into two groups of size two, based on the
diversity of majors alone, should assign students 1 and 2 into the first 
group and the remaining two students into another group.

The corresponding YAML file for this exercise, `dba_params_ex001.yml`, consists
of the following lines:

```{r echo=FALSE, comment=''}
cat(readLines(system.file("extdata",  "dba_params_ex001.yml",   
                          package = "grouper")),  sep = '\n')
```

To run the assignment using only the primary major (ignoring the skill), we 
set `w1 = 1` and `w2 = 0` in the commands below. Either the gurobi solver or 
the glpk solver can be used; on a problem this small, both are equally fast.

```{r}
# indicate appropriate columns using integer ids.
df_ex001_list <- extract_student_info(dba_gc_ex001, "diversity",
                                  demographic_cols = 2, skills = 3, 
                                  self_formed_groups = 4)
yaml_ex001_list <- extract_params_yaml(system.file("extdata", 
                                             "dba_params_ex001.yml",  
                                             package = "grouper"),
                                       "diversity")
m1 <- prepare_model(df_ex001_list, yaml_ex001_list, assignment="diversity",
                    w1=1.0, w2=0.0)
#result3 <- solve_model(m1, with_ROI(solver="gurobi"))
result3 <- solve_model(m1, with_ROI(solver="glpk"))
assign_groups(result3, assignment = "diversity", dframe=dba_gc_ex001, 
              group_names="groups")
```

We can see that students 1 and 2 have been assigned to topic 1, repetition 1. Students 
3 and 4 have been assigned to topic 2, repetition 1.

## Dataset 001 (skills only)

Here we reverse the weights (`w1 = 0`, `w2 = 1`) so that only the skill column
contributes to the objective, and the primary major is ignored.

```{r}
# indicate appropriate columns using integer ids.
df_ex001_list <- extract_student_info(dba_gc_ex001, "diversity",
                                  demographic_cols = 2, skills = 3, 
                                  self_formed_groups = 4)
yaml_ex001_list <- extract_params_yaml(system.file("extdata", 
                                             "dba_params_ex001.yml",  
                                             package = "grouper"),
                                       "diversity")
m1a <- prepare_model(df_ex001_list, yaml_ex001_list, assignment="diversity",
                    w1=0.0, w2=1.0)
#result3 <- solve_model(m1a, with_ROI(solver="gurobi"))
result3 <- solve_model(m1a, with_ROI(solver="glpk"))

assign_groups(result3, assignment = "diversity", dframe=dba_gc_ex001, 
              group_names="groups")

get_solution(result3, smin)
get_solution(result3, smax)
```

We can see that students 1 and 2 have been assigned to topic 1, repetition 1. Students 
3 and 4 have been assigned to topic 2, repetition 1.

## Dataset 003

This dataset demonstrates the use of a custom dissimilarity matrix instead of
using the default Gower distance from the
[cluster](https://cran.r-project.org/package=cluster) package.

```{r}
dba_gc_ex003
```

Now suppose we wish to treat years 1 and 2 as different from years 3 and 4, and
`math` and `dsds` (STEM majors) as different from `elts` and `history`
(non-STEM majors). For each such difference, we assign a score of 1.

This means that students 1 and 2 would have a dissimilarity score of 1 due to 
their difference in majors. Students 1 and 3 would also have a score of 1, but 
due to their difference in years. Students 1 and 4 would have a score of 2, due
to their differences in both majors and years. The overall dissimilarity matrix
would be:

```{r}
d_mat <- matrix(c(0, 1, 1, 2,
                  1, 0, 2, 1,
                  1, 2, 0, 1,
                  2, 1, 1, 0), nrow=4, byrow = TRUE)
```
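
The same matrix can also be built programmatically, which guards against typing
errors when there are more students. The sketch below is illustrative only: the
`students` data frame, its column names and its values are assumptions chosen
to match the description above, not the contents of `dba_gc_ex003`.

```{r}
# Hypothetical students (assumed values, for illustration only)
students <- data.frame(
  year  = c(1, 2, 3, 4),
  major = c("math", "history", "dsds", "elts")
)

# Collapse each variable into the two categories described above
senior <- students$year >= 3                     # years 3-4 vs years 1-2
stem   <- students$major %in% c("math", "dsds")  # STEM vs non-STEM

# Score 1 for each variable on which a pair of students falls in
# different categories; summing the two indicators gives values in {0, 1, 2}
d_mat_check <- outer(senior, senior, "!=") + outer(stem, stem, "!=")
d_mat_check
```

Under these assumed values, `d_mat_check` has the same entries as the
hand-typed `d_mat`. Each term is a symmetric 0/1 indicator, so the result is
symmetric with a zero diagonal.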

To run the optimisation for this model, we can execute the following code:

```{r}
df_ex003_list <- extract_student_info(dba_gc_ex003, "diversity",
                                skills = NULL,
                                self_formed_groups = 3,
                                d_mat=d_mat)
yaml_ex003_list <- extract_params_yaml(system.file("extdata",   
                                                   "dba_params_ex003.yml",   
                                                   package = "grouper"), 
                                       "diversity")
m3 <- prepare_model(df_ex003_list, yaml_ex003_list, w1=1.0, w2=0.0)
result <- solve_model(m3, with_ROI(solver="glpk", verbose=TRUE))

assign_groups(result, "diversity", dba_gc_ex003, group_names="self_groups")
```

As you can see, the two members of each group are maximally dissimilar: they
differ both in their year and in their major.

## Dataset 004

In this example, we demonstrate that `grouper` provides the flexibility to constrain
group sizes for individual topics. This could be useful in a situation where a 
particular project topic may require a larger group.

The dataset we use contains only skill levels (Python skills, with higher values
corresponding to greater skill).

```{r}
dba_gc_ex004
```

Suppose we wish to assign the students to two topics, but the second topic requires
3 members, and the first requires only 2. In this example, we only utilise the 
skill levels; no demographic variables are included in the objective function.

```{r}
df_ex004_list <- extract_student_info(dba_gc_ex004, 
                                      skills = 2, 
                                      self_formed_groups = 3, 
                                      d_mat=matrix(0, 5, 5))
yaml_ex004_list <- extract_params_yaml(system.file("extdata",    
                                                   "dba_params_ex004.yml",   
                                                   package = "grouper"),  
                                       "diversity")
m4 <- prepare_model(df_ex004_list, yaml_ex004_list, w1=0.0, w2=1.0)
result <- solve_model(m4, with_ROI(solver="glpk", verbose=TRUE))
assign_groups(result, "diversity", dba_gc_ex004, group_names="self_groups")
```

Due to the constraints, topic 2 was assigned 3 members, while the total skill
level in each group was kept equal (at 4).
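
For a problem this small, a balanced split can be checked by brute force. The
sketch below uses a hypothetical `skill` vector (not the values in
`dba_gc_ex004`) and assumes, as a simplification, that the skill component of
the objective favours a small gap between group totals. It enumerates every way
of choosing the 2-member group and computes the gap in total skill between the
two groups.

```{r}
# Hypothetical skill levels for 5 students (assumed, for illustration only)
skill <- c(1, 1, 2, 2, 2)

# Each column of `pairs` holds the members of the 2-member group;
# the remaining three students form the 3-member group
pairs <- combn(length(skill), 2)

# Absolute difference in total skill between the two groups
gap <- apply(pairs, 2, function(idx) {
  abs(sum(skill[idx]) - sum(skill[-idx]))
})

# Keep the splits with the smallest gap
best <- pairs[, gap == min(gap), drop = FALSE]
min(gap)
best
```

With these assumed values, a gap of zero is attainable despite the unequal
group sizes. Enumeration of this kind is only practical for very small classes,
which is why the package formulates the problem as an optimisation model.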

# Preference-Based Assignment

## Dataset 002

The second dataset comprises 8 students. Here is a listing of the dataset:

```{r}
pba_gc_ex002
```

Each student is in a self-formed group of size 2, indicated via the `grouping`
column. Suppose that, for this set of students, the instructor wishes to assign
students to two topics, with each topic having two sub-groups. This requires
the preference matrix to have 4 columns, one for each topic-subgroup
combination. Remember that the columns should be ordered as follows, where `T`
denotes the topic and `S` the sub-group:

T1S1, T2S1, T1S2, T2S2

There should be 4 rows in the preference matrix, one for each self-formed
group.
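
The column ordering can be generated rather than typed by hand. The following
sketch uses only base R. The names `n_topics`, `n_sub`, `col_labels` and
`pref_template` are hypothetical and introduced for illustration, and the zero
entries are placeholders rather than real preference scores. The labels are for
readability only; no assumption is made that `grouper` reads column names.

```{r}
n_topics <- 2  # number of topics
n_sub    <- 2  # number of sub-groups per topic

# The topic index varies fastest, then the sub-group index
col_labels <- paste0("T", rep(seq_len(n_topics), times = n_sub),
                     "S", rep(seq_len(n_sub), each = n_topics))
col_labels

# Empty template: one row per self-formed group, one column per combination
pref_template <- matrix(0, nrow = 4, ncol = n_topics * n_sub,
                        dimnames = list(paste0("group", 1:4), col_labels))
pref_template
```

Filling such a template column by column makes it harder to place a preference
score under the wrong topic-subgroup combination.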

```{r}
pba_prefmat_ex002
```

It is possible to assign each self-formed group to its optimal choice of
topic-subgroup combination. In our solution, we should see that group 1 is
assigned to sub-group 1 of topic 1, group 2 is assigned to sub-group 1 of topic
2, and so on.

```{r}
df_ex002_list <- extract_student_info(pba_gc_ex002, "preference", 
                                      self_formed_groups = 2, 
                                      pref_mat = pba_prefmat_ex002)
yaml_ex002_list <- extract_params_yaml(system.file("extdata", 
                                             "pba_params_ex002.yml",  
                                             package = "grouper"),
                                       "preference")
m2 <- prepare_model(df_ex002_list, yaml_ex002_list, "preference")

#result2 <- solve_model(m2, with_ROI(solver="gurobi"))
result2 <- solve_model(m2, with_ROI(solver="glpk"))
assign_groups(result2, assignment = "preference", 
              dframe=pba_gc_ex002, yaml_ex002_list, 
              group_names="grouping")
```