Technical note on Cumulative Link Mixed Models (CLMMs) in R with the package 'ordinal'

Abstract

Background

Cumulative Link Mixed Models (CLMMs) make it possible to analyse ordinal response variables while allowing the use of random effects.

Findings

In the following case study on groups of mallards, the ordinal response variable is the order in which individuals arrive at the end of a maze (first, second or third). Fourteen different explanatory variables were included in the full model, as well as two random variables (individual identity and trial number). Then, by backward stepwise model selection, I produced a minimal model only containing significant explanatory variables. I illustrate the outcome of the minimal model by plotting the expected probabilities of being at a specific rank (first, second or third) for a combination of the explanatory variables.

Conclusions

CLMMs represent an effective, yet still little used, tool for analysing ordinal data. Moreover, this tool facilitates the production of clear graphs, thus allowing to illustrate the results in a convincing way.

Keywords

Cumulative Link Mixed Models, CLMM, Ordinal regression models, ordinal R package

Full version

A full version of the tutorial is available here.

Introduction

The example presented here takes the data published by Bousquet et al. in Behaviour (2017) and shows how to model the effects of different parameters on the order of individuals during collective movements of female mallards.

The retained variables for this experiment are:

  • Individual characteristics:
    • Scaled Mass Index of each bird [high values indicate higher needs]
    • Average exploration score of each bird in the personality tests (3 novel environment tests and 3 novel object tests) [high values indicate more exploratory birds]
    • Frequency of being the nearest neighbour of another bird [high values indicate social birds]
    • Average crossing time of the maze during the learning phase [high values indicate slower birds]
    • Number of passages in the maze, to control for maze exposure


  • Pre-departure behaviours (performed in the area where the mallards were grouped together for 5 minutes before the opening of the sliding door into the maze):

    • Mobility of each bird during the pre-departure period [high values indicate more mobile birds]
    • Surface explored by each bird [high values indicate that the bird explored a higher portion of the pre-departure area]
    • Number of vocalizations [high values indicate more vocal birds]
    • Time spent foraging
    • Number of attacks emitted towards other birds
    • Number of attacks received from other birds
    • Distance to the door


  • Specific characteristics:

    • Side where the reward was present during learning phase
    • Whether the individual is in the majority or in the minority


As all individuals are females that hatched on the same day, sex and age are not taken into account.

As we are interested in the relative effects of these variables in groups of birds, we first compute for each experimental group the difference between each bird’s values and the average values for the corresponding group.

In order to improve the interpretability of the estimates, we scale all numeric variables before adding them to the model.

Note: when the ordinal variable has only two levels, it is equivalent to use either a CLMM approach or a random-effect logistic regression (see the full version of the tutorial if interested).

A - Setup

1. Load the necessary packages to run the script

(use install.packages(“package”) before if the package is not yet installed on your machine)

library(dplyr)
library(tidyr)
library(ordinal)
library(lme4)
library(ggplot2)
library(car)
library(MuMIn)

2. Import the data

load the files “Leadership_RawResults.csv” and “Leadership_IndividualCharacteristics.csv”

Leadership <- read.csv("Leadership_RawResults.csv", header = TRUE)
Ind_Charac <- read.csv("Leadership_IndividualCharacteristics.csv", header = TRUE)

3. Join the two datasets

Leadership <- Leadership %>%
  left_join(Ind_Charac, by = "Individual")

4. Specify which variables are factors

Treating OrderEnd as a factor is essential to run CLMMs: their response variable has to be a (ordered) factor

Leadership$OrderEnd <- factor(Leadership$OrderEnd, ordered = TRUE)
Leadership$Individual <- factor(Leadership$Individual)
Leadership$Group <- factor(Leadership$Group)
Leadership$OrderAll <- factor(Leadership$OrderAll)
Leadership$Side <- factor(Leadership$Side)
Leadership$MajMin <- factor(Leadership$MajMin)

5. Form subset for the experiment investigated here

The analysis of OrderEnd takes into account only events in which the group did not split up.

In order to produce predictions on new data, it is important to limit the levels of OrderEntrance and OrderEnd to the levels existing in each set. Hence the use of the droplevels() function

Set2_nf <- Leadership %>%
  filter(Set == 2)
Set2_nf$OrderEnd <- droplevels(Set2_nf$OrderEnd)

B - Analysis of the end order CLMMs in Set2

This set focuses on trios of female mallards with identical rewards but different spatial information on where to find the rewards.

1. create group-level variables (differences between individuals and group average)

Set2_nf <- Set2_nf %>%
  group_by(OrderAll) %>%
  mutate(PrD_Mob_av = mean(Mobility),
         PrD_Exp_av = mean(Explored_Surface),
         PrD_Voc_av = mean(PrD_voca),
         PrD_For_av = mean(PrD_foraging),
         PrD_Emi_av = mean(Att_Emitted),
         PrD_Suf_av = mean(Att_Suffered),
         PrD_Dis_av = mean(DistDoor),
         SMI_av = mean(SMI),
         Exp_av = mean(Exploration),
         LTi_av = mean(LearningTime),
         FNN_av = mean(FreqNN),
         NPa_av = mean(NumPas),
         PrD_Mob_rel = Mobility - PrD_Mob_av,
         PrD_Exp_rel = Explored_Surface - PrD_Exp_av,
         PrD_Voc_rel = PrD_voca - PrD_Voc_av,
         PrD_For_rel = PrD_foraging - PrD_For_av,
         PrD_Emi_rel = Att_Emitted - PrD_Emi_av,
         PrD_Suf_rel = Att_Suffered - PrD_Suf_av,
         PrD_Dis_rel = DistDoor - PrD_Dis_av,
         SMI_rel = SMI - SMI_av,
         Exp_rel = Exploration - Exp_av,
         LTi_rel = LearningTime - LTi_av,
         FNN_rel = FreqNN - FNN_av,
         NPa_rel = NumPas - NPa_av)

2. scale numerical variables

The as.numeric() function avoids creating lists.

Set2_nf$PrD_Mob_rel_s <- as.numeric(scale(Set2_nf$PrD_Mob_rel))
Set2_nf$PrD_Exp_rel_s <- as.numeric(scale(Set2_nf$PrD_Exp_rel))         
Set2_nf$PrD_Voc_rel_s <- as.numeric(scale(Set2_nf$PrD_Voc_rel))         
Set2_nf$PrD_For_rel_s <- as.numeric(scale(Set2_nf$PrD_For_rel))         
Set2_nf$PrD_Emi_rel_s <- as.numeric(scale(Set2_nf$PrD_Emi_rel))         
Set2_nf$PrD_Suf_rel_s <- as.numeric(scale(Set2_nf$PrD_Suf_rel))         
Set2_nf$PrD_Dis_rel_s <- as.numeric(scale(Set2_nf$PrD_Dis_rel))         
Set2_nf$SMI_rel_s <- as.numeric(scale(Set2_nf$SMI_rel))         
Set2_nf$Exp_rel_s <- as.numeric(scale(Set2_nf$Exp_rel))         
Set2_nf$LTi_rel_s <- as.numeric(scale(Set2_nf$LTi_rel))         
Set2_nf$FNN_rel_s <- as.numeric(scale(Set2_nf$FNN_rel))         
Set2_nf$NPa_rel_s <- as.numeric(scale(Set2_nf$NPa_rel))

3. Analysis of the end order in the maze

3a. Modelling

First, we create a null model containing only the random effects, here the identifier of the experimental group (OrderAll) and the identifier of each individual (Individual)

## End Order
# Null CLMM
CLMM_End_S2_Null <- clmm(OrderEnd ~ 1 +
                            (1|OrderAll) + (1|Individual), data = Set2_nf)
summary(CLMM_End_S2_Null)
## Cumulative Link Mixed Model fitted with the Laplace approximation
## 
## formula: OrderEnd ~ 1 + (1 | OrderAll) + (1 | Individual)
## data:    Set2_nf
## 
##  link  threshold nobs logLik AIC    niter   max.grad cond.H 
##  logit flexible  90   -87.26 182.51 94(291) 1.03e-06 1.1e+01
## 
## Random effects:
##  Groups     Name        Variance  Std.Dev. 
##  OrderAll   (Intercept) 6.518e-16 2.553e-08
##  Individual (Intercept) 2.532e+00 1.591e+00
## Number of groups:  OrderAll 30,  Individual 10 
## 
## No Coefficients
## 
## Threshold coefficients:
##     Estimate Std. Error z value
## 1|2  -0.9383     0.5792  -1.620
## 2|3   1.0818     0.5805   1.864

We can see that the random effect of the experimental group (OrderAll) does not explain any variance, but we keep it to respect the experimental protocol.

Next, we implement the full model with all relevant (scaled) variables for Set2.

# Full CLMM
CLMM_End_S2_Full <- clmm(OrderEnd ~ PrD_Mob_rel_s + PrD_Exp_rel_s + PrD_Voc_rel_s + PrD_For_rel_s + PrD_Emi_rel_s +
                                PrD_Suf_rel_s + PrD_Dis_rel_s + SMI_rel_s + Exp_rel_s + LTi_rel_s + FNN_rel_s + NPa_rel_s +
                                Side + MajMin +
                                (1|OrderAll) + (1|Individual), data = Set2_nf)
summary(CLMM_End_S2_Full)
## Cumulative Link Mixed Model fitted with the Laplace approximation
## 
## formula: 
## OrderEnd ~ PrD_Mob_rel_s + PrD_Exp_rel_s + PrD_Voc_rel_s + PrD_For_rel_s +  
##     PrD_Emi_rel_s + PrD_Suf_rel_s + PrD_Dis_rel_s + SMI_rel_s +  
##     Exp_rel_s + LTi_rel_s + FNN_rel_s + NPa_rel_s + Side + MajMin +  
##     (1 | OrderAll) + (1 | Individual)
## data:    Set2_nf
## 
##  link  threshold nobs logLik AIC    niter      max.grad cond.H 
##  logit flexible  75   -49.48 134.97 1380(2801) 3.28e-05 1.9e+02
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  OrderAll   (Intercept) 0.0000   0.0000  
##  Individual (Intercept) 0.4849   0.6964  
## Number of groups:  OrderAll 25,  Individual 10 
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## PrD_Mob_rel_s   1.3284     0.8474   1.568   0.1170    
## PrD_Exp_rel_s  -0.2305     0.4255  -0.542   0.5880    
## PrD_Voc_rel_s  -1.0112     0.7287  -1.388   0.1652    
## PrD_For_rel_s  -0.7188     0.4513  -1.593   0.1112    
## PrD_Emi_rel_s  -0.3871     0.4844  -0.799   0.4243    
## PrD_Suf_rel_s  -0.6252     0.5662  -1.104   0.2695    
## PrD_Dis_rel_s   0.1599     0.3481   0.460   0.6459    
## SMI_rel_s      -2.1614     1.0325  -2.093   0.0363 *  
## Exp_rel_s       1.2972     0.7235   1.793   0.0730 .  
## LTi_rel_s      -0.7494     0.6488  -1.155   0.2481    
## FNN_rel_s       2.3422     0.5962   3.928 8.56e-05 ***
## NPa_rel_s       0.6103     0.3300   1.849   0.0644 .  
## SideG           2.1551     1.6245   1.327   0.1846    
## MajMin1        -0.7692     0.7328  -1.050   0.2938    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Threshold coefficients:
##     Estimate Std. Error z value
## 1|2  -0.8786     1.1600  -0.757
## 2|3   2.2758     1.0643   2.138
## (15 observations deleted due to missingness)

By conducting backwards elimination of non-significant terms, we arrive to a minimal model.

# backward elimination of non-significant term, only the first step and the minimal model are shown here
CLMM_End_S2_Full_drop <- drop1(CLMM_End_S2_Full)
CLMM_End_S2_Full_drop[CLMM_End_S2_Full_drop$AIC == min(CLMM_End_S2_Full_drop$AIC), ]
## Single term deletions
## 
## Model:
## OrderEnd ~ PrD_Mob_rel_s + PrD_Exp_rel_s + PrD_Voc_rel_s + PrD_For_rel_s + 
##     PrD_Emi_rel_s + PrD_Suf_rel_s + PrD_Dis_rel_s + SMI_rel_s + 
##     Exp_rel_s + LTi_rel_s + FNN_rel_s + NPa_rel_s + Side + MajMin + 
##     (1 | OrderAll) + (1 | Individual)
##               Df    AIC
## PrD_Dis_rel_s  1 133.18
CLMM_End_S2_min <- clmm(OrderEnd ~ FNN_rel_s +
                          (1|OrderAll) + (1|Individual), data = Set2_nf)
summary(CLMM_End_S2_min)
## Cumulative Link Mixed Model fitted with the Laplace approximation
## 
## formula: OrderEnd ~ FNN_rel_s + (1 | OrderAll) + (1 | Individual)
## data:    Set2_nf
## 
##  link  threshold nobs logLik AIC    niter    max.grad cond.H 
##  logit flexible  90   -78.22 166.43 133(402) 4.23e-05 5.3e+00
## 
## Random effects:
##  Groups     Name        Variance  Std.Dev. 
##  OrderAll   (Intercept) 4.618e-11 6.795e-06
##  Individual (Intercept) 9.903e-01 9.951e-01
## Number of groups:  OrderAll 30,  Individual 10 
## 
## Coefficients:
##           Estimate Std. Error z value Pr(>|z|)    
## FNN_rel_s   1.5626     0.3842   4.067 4.75e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Threshold coefficients:
##     Estimate Std. Error z value
## 1|2  -1.0420     0.4371  -2.384
## 2|3   1.1924     0.4350   2.741

Only the frequency of being a nearest neighbour of another individual outside in daily life has a significant effect on the probability to arrive first at the end of the maze. Being frequently a nearest neighbour (i.e., being more socially connected) decreases the probability to arrive first at the end of the maze.

3b. Visualization of the effects

To visualize the effects, we cannot use the clmm() function (at the time of writing), but we have to use the clmm2() function, which authorizes only 1 random effect to be specified. Its syntax is slightly different. We have to specify Hess = TRUE to get a summary.

# in order to obtain predicted values, switch to clmm2()
CLMM2_End_S2_min <- clmm2(OrderEnd ~ FNN_rel_s,
                               random = Individual, Hess = TRUE, data = Set2_nf)
summary(CLMM2_End_S2_min)
## Cumulative Link Mixed Model fitted with the Laplace approximation
## 
## Call:
## clmm2(location = OrderEnd ~ FNN_rel_s, random = Individual, data = Set2_nf, 
##     Hess = TRUE)
## 
## Random effects:
##                  Var   Std.Dev
## Individual 0.9880973 0.9940308
## 
## Location coefficients:
##           Estimate Std. Error z value Pr(>|z|)  
## FNN_rel_s  1.5624   0.3840     4.0684 4.7342e-05
## 
## No scale coefficients
## 
## Threshold coefficients:
##     Estimate Std. Error z value
## 1|2 -1.0418   0.4368    -2.3850
## 2|3  1.1923   0.4348     2.7424
## 
## log-likelihood: -78.21655 
## AIC: 164.4331 
## Condition number of Hessian: 5.319765

We can now predict values on new data, which will cover the whole parameter space of FNN_rel_s.

# New data for Set2
newdata_Set2_End <- expand.grid(FNN_rel = seq(from = -0.036,
                                              to = 0.036,
                                              by = 0.001))

# the next line enables to report the data on the scaled units
newdata_Set2_End$FNN_rel_s <- (newdata_Set2_End$FNN_rel - mean(Set2_nf$FNN_rel)) / sd(Set2_nf$FNN_rel)

Now, we predict the values for the new data based on the modelling approach. The sapply() is important to get the results for each of the three outcomes (first, second or third in the end order).

predict_Set2_End <- sapply(as.character(1:3),
                            function(x) {
                              newdata1 = expand.grid(FNN_rel_s = seq(from = min(newdata_Set2_End$FNN_rel_s),
                                                                     to = max(newdata_Set2_End$FNN_rel_s),
                                                                     by =  (0.001 - mean(Set2_nf$FNN_rel)) / sd(Set2_nf$FNN_rel)),
                                                     OrderEnd = factor(x, levels = levels(Set2_nf$OrderEnd)))
                              predict(CLMM2_End_S2_min, newdata = newdata1) })


# bind together the new data and the predicted values
predict_Set2_End <- cbind(newdata_Set2_End, predict_Set2_End)

# pass the dataframe in a long format
predict_Set2_End <- predict_Set2_End %>%
  gather("1", "2", "3", key = "Rank", value = "prob")

Once predicted values have been computed, it is easy to plot the effects by using ggplot2.

predict_Set2_End %>%
  ggplot(aes(x = FNN_rel, y = prob, colour = Rank)) +
  geom_point() +
  geom_line() +
  labs(x = "Frequency of being a nearest neighbour\nrelative to the average of the group (in percent)",
       y = "Probability to attain a specific rank") +
  theme_gray(base_size = 15)

By plotting the random effect of the identity of the mallards, we can verify that all but one individual have a 95% confidence interval that contains 0: only individual 5 behaved very differently from the others. As its individual effect is positive, we can conclude that this individual had consistently a higher arrival order than all other individuals: if individual 5 is in a group, it is very likely to arrive among the last.

# random effect for Set2 End order
ci_Set2_End <- CLMM2_End_S2_min$ranef + qnorm(0.975) * sqrt(CLMM2_End_S2_min$condVar) %o% c(-1, 1)
ord.re_Set2_End <- order(CLMM2_End_S2_min$ranef)
ci_Set2_End <- ci_Set2_End[order(CLMM2_End_S2_min$ranef), ]
plot(1:10, CLMM2_End_S2_min$ranef[ord.re_Set2_End], axes = FALSE, ylim = range(ci_Set2_End),
     xlab = "Individual", ylab = "Individual effect")
axis(1, at = 1:10, labels = ord.re_Set2_End)
axis(2)
for(i in 1:10) segments(i, ci_Set2_End[i, 1], i, ci_Set2_End[i, 2])
abline(h = 0, lty = 2)

4. Data availability

All codes and data present in this tehcnical note and in the full version of the tutorial are freely available on GitHub or upon request [chr_bousquet [at] protonmail.com]

5. References

Bousquet, Ahr, Sueur & Petit (2017). Determinants of leadership in groups of female mallards. Behaviour, 154:4, 467-507

Christensen (2019). ordinal - Regression Models for Ordinal Data. R package version 2019.12-10. https://CRAN.R-project.org/package=ordinal