---
title: "Variable importance methods"
author: "Alex Zwanenburg"
date: "2026-04-22"
output:
  rmarkdown::html_vignette:
    includes:
      in_header: familiar_logo.html
      after_body: license.html
    toc: TRUE
  rmarkdown::github_document:
    html_preview: FALSE
    includes:
      in_header: familiar_logo.html
      after_body: license.html
    toc: TRUE
bibliography: "refs.bib"
vignette: >
  %\VignetteIndexEntry{Variable importance methods}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

Variable importance methods score the importance of each feature. This information is subsequently used to determine which, and how many, features should be included in a model.

| **method** | **tag** | **binomial** | **multinomial** | **continuous** | **survival** |
|:------------|:------------|:-----:|:-----:|:-----:|:-----:|
| **correlation** | | | | | |
| Pearson's *r* | `pearson` | | | × | × |
| Spearman's *ρ* | `spearman` | | | × | × |
| Kendall's *τ* | `kendall` | | | × | × |
| **concordance** | | | | | |
| concordance^a^ | `concordance` | × | × | × | × |
| **CORElearn** | | | | | |
| information gain ratio | `gain_ratio` | × | × | | |
| gini-index | `gini` | × | × | | |
| minimum description length | `mdl` | × | × | | |
| ReliefF with exponential weighting of distance ranks | `relieff_exp_rank` | × | × | × | |
| **mutual information** | | | | | |
| mutual information maximisation | `mim` | × | × | × | × |
| mutual information feature selection | `mifs` | × | × | × | × |
| minimum redundancy maximum relevance | `mrmr` | × | × | × | × |
| **univariate regression** | | | | | |
| univariate regression | `univariate_regression` | × | × | × | × |
| **multivariate regression** | | | | | |
| multivariate regression | `multivariate_regression` | × | × | × | × |
| **lasso regression** | | | | | |
| general^a^ | `lasso` | × | × | × | × |
| logistic | `lasso_binomial` | × | | | |
| multi-logistic | `lasso_multinomial` | | × | | |
| normal (gaussian) | `lasso_gaussian` | | | × | |
| cox | `lasso_cox` | | | | × |
| **ridge regression** | | | | | |
| general^a^ | `ridge` | × | × | × | × |
| logistic | `ridge_binomial` | × | | | |
| multi-logistic | `ridge_multinomial` | | × | | |
| normal (gaussian) | `ridge_gaussian` | | | × | |
| cox | `ridge_cox` | | | | × |
| **elastic net regression** | | | | | |
| general^a,b^ | `elastic_net` | × | × | × | × |
| logistic^b^ | `elastic_net_binomial` | × | | | |
| multi-logistic^b^ | `elastic_net_multinomial` | | × | | |
| normal (gaussian)^b^ | `elastic_net_gaussian` | | | × | |
| cox^b^ | `elastic_net_cox` | | | | × |
| **random forest (RFSRC) variable importance** | | | | | |
| permutation^b^ | `random_forest_permutation` | × | × | × | × |
| permutation (unoptimised) | `random_forest_permutation_default` | × | × | × | × |
| hold-out^b^ | `random_forest_holdout` | × | × | × | × |
| hold-out (unoptimised) | `random_forest_holdout_default` | × | × | × | × |
| **random forest (ranger) variable importance** | | | | | |
| permutation^b^ | `random_forest_ranger_permutation` | × | × | × | × |
| permutation (unoptimised) | `random_forest_ranger_permutation_default` | × | × | × | × |
| hold-out permutation^b^ | `random_forest_ranger_holdout_permutation` | × | × | × | × |
| hold-out permutation (unoptimised) | `random_forest_ranger_holdout_permutation_default` | × | × | × | × |
| impurity^b^ | `random_forest_ranger_impurity` | × | × | × | × |
| impurity (unoptimised) | `random_forest_ranger_impurity_default` | × | × | × | × |
| **special methods** | | | | | |
| no selection | `none` | × | × | × | × |
| random selection | `random` | × | × | × | × |
| signature only | `signature_only` | × | × | × | × |

Table: Overview of variable importance methods. ^a^ This is a general method: an appropriate specific method is chosen, or multiple distributions or linking families are tested in an attempt to find the best option. ^b^ This method requires hyperparameter optimisation.

## Configuration options

Variable importance methods and related options can be set using the `variable_importance` parameter in the *xml* file or as function arguments.

| **tag** / **argument** | **description** | **default** |
|:----------:|:-------------------------|:-----------:|
| `vimp_method` | The desired variable importance method. Multiple methods may be provided at the same time. This setting has no default and must be provided. | -- (required) |
| `vimp_method_parameter` | Several variable importance methods have hyperparameters that can be set and/or optimised. | -- (optional) |
| `vimp_aggregation_method` | The aggregation method used to aggregate feature ranks over different bootstraps. | `borda` |
| `vimp_aggregation_rank_threshold` | Several aggregation methods count features if their rank is at or below the threshold, i.e. if they are among the most important features. If `NULL`, a dynamic threshold is determined through Otsu-thresholding. | `5` |
| `parallel_vimp` | Enables parallel processing for variable importance computation. Ignored if `parallel = FALSE`. Setting `parallel_vimp = FALSE` disables parallelisation for this step even when `parallel = TRUE`. | `TRUE` |

## Providing parameters for variable importance computation

Some of the variable importance methods, notably those based on random forests and (penalised) regression, have parameters that can be configured. These parameters are mentioned under the respective entries in the [Overview of variable importance methods] section. Moreover, some of these parameters are model parameters, in which case they are optimised using hyperparameter optimisation, as described in the *learning algorithms and hyperparameter optimisation* vignette. The syntax for such parameters is the same as for hyperparameter optimisation.

For the `multivariate_regression` variable importance method, the `alpha` parameter (which determines feature elimination during forward selection) may be provided as follows in the configuration file:

```
<multivariate_regression>
  <alpha>0.05</alpha>
</multivariate_regression>
```

Or as a nested list passed as the `vimp_method_parameter` argument to `summon_familiar`:

```
vimp_method_parameter = list("multivariate_regression"=list("alpha"=0.05))
```
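Putting these options together, the sketch below shows how variable importance settings could be passed as arguments to `summon_familiar`. This is a hedged, minimal sketch: `my_data`, the outcome column and the experiment directory are hypothetical placeholders, and the arguments follow the tags listed under [Configuration options].

```
# Minimal sketch, not a complete analysis: two variable importance methods
# combined with borda rank aggregation.
library(familiar)

summon_familiar(
  data = my_data,                           # hypothetical data.frame with features and outcome
  experiment_dir = "~/familiar_experiment", # placeholder output directory
  outcome_type = "binomial",
  outcome_column = "outcome",               # hypothetical outcome column name
  learner = "glm_logistic",
  vimp_method = c("mrmr", "mim"),           # multiple methods may be provided
  vimp_aggregation_method = "borda",
  vimp_aggregation_rank_threshold = 5,
  parallel_vimp = FALSE                     # disable parallelisation of this step
)
```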
# Overview of variable importance methods

The variable importance methods implemented in familiar are described in more detail in this section.

## Correlation methods

Correlation methods determine variable importance by assessing the correlation between a feature and the outcome of interest. High (anti-)correlation indicates an important feature, whereas low (anti-)correlation indicates that a feature is not directly related to the outcome. Correlation-based variable importance is determined using the `cor` function of the `stats` package that is part of the R core distribution [@rcore2018]. Three correlation coefficients can be computed:

- `pearson`: Pearson's $r$
- `spearman`: Spearman's $\rho$
- `kendall`: Kendall's $\tau$

To compute the correlation of features with survival outcomes, only samples with an event are considered.

## Concordance methods

Concordance methods assess how well the ordering of feature values corresponds to the ordering of the outcome. For continuous outcomes, the method internally defers to the `kendall` method. For categorical outcomes, i.e. `binomial` and `multinomial`, concordance is assessed using the area under the receiver operating characteristic curve. For survival outcomes, concordance is measured using the `concordance_index` metric.

## CORElearn methods

Familiar provides an interface to several variable importance methods implemented in the `CORElearn` package. These methods are the *Information Gain Ratio* (`gain_ratio`), the *Gini-index* (`gini`), *Minimum Description Length* (`mdl`), and *ReliefF* and *rReliefF* with exponential distance rank weighting (`relieff_exp_rank`).

## Mutual information-based methods

Mutual information $I$ is a measure of the interdependency between two variables $x$ and $y$. In the context of variable importance, $x$ is a feature vector and $y$ is the outcome vector. Computing mutual information requires that the probability distributions of $x$ and $y$ are known, but in practice neither is known. For categorical $x$ and $y$, sample estimates can be used instead. For continuous or mixed data, the situation is more complex, and is often resolved through binning. In familiar we therefore use the following two approaches to compute mutual information:

1. For binomial, multinomial and continuous outcomes, mutual information is computed using the `praznik` package.
2. For survival outcomes, we adapted the approximation proposed by De Jay et al. after Gel'fand and Yaglom [@Gelfand1959-de; @De_Jay2013-yl] for use with a concordance index: $I = -\frac{1}{2} \log\left(1 - (2(C - 0.5))^2 + \epsilon\right)$, with $C$ the concordance index; see the sketch after this list.
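To make the survival approximation concrete, the sketch below implements the formula directly. The helper `mi_from_concordance` and the value of $\epsilon$ are illustrative assumptions, not part of familiar's API.

```
# Concordance-based mutual information approximation:
# I = -0.5 * log(1 - (2 * (C - 0.5))^2 + epsilon).
# A small epsilon keeps the logarithm finite for C = 0 or C = 1.
mi_from_concordance <- function(concordance, epsilon = 1e-8) {
  -0.5 * log(1.0 - (2.0 * (concordance - 0.5))^2 + epsilon)
}

mi_from_concordance(0.50)  # uninformative feature: approximately 0
mi_from_concordance(0.75)  # moderately informative feature
```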
### Mutual information maximisation

The `mim` method is a univariate method that ranks each feature by its mutual information with the outcome.

### Mutual information feature selection

Mutual information feature selection (MIFS) finds a feature set that maximises mutual information [@Battiti1994-ja]. This is done using forward selection. As in mutual information maximisation, the mutual information $I_{y,j}$ between each feature and the outcome is computed. Starting from a pool of all candidate features, the feature with the highest mutual information is selected and removed from the pool. The process then proceeds iteratively. The mutual information $I_{s,1j}$ between the previously selected feature and each of the remaining features is computed. This mutual information is also called *redundancy*. The feature with the highest mutual information with the outcome and the least redundancy, i.e. maximum $I_{y,j} - I_{s,1j}$, is selected next and removed from the pool of remaining features. Then the mutual information $I_{s,2j}$ between this feature and the remaining features is computed, and the feature that maximises $I_{y,j} - I_{s,1j} - I_{s,2j}$ is selected, and so forth. The iterative process stops if there is no feature $j$ for which $I_{y,j} - \sum_{i\in S} I_{s,ij} > 0$, with $S$ the subset of selected features, or if all features have been exhausted.

To reduce the number of required computations, the implementation in familiar actively eliminates any feature $j$ for which $I_{y,j} - \sum_{i\in S} I_{s,ij} \leq 0$ at the earliest instance, as the $\sum_{i\in S} I_{s,ij}$ term increases monotonically as $S$ grows.

### Minimum redundancy maximum relevance

Minimum redundancy maximum relevance (mRMR) is similar to MIFS, but differs in the way redundancy is used during optimisation [@Peng2005-oo]. Whereas for MIFS the optimisation criterion is $I_{y,j} - \sum_{i\in S} I_{s,ij}$, in mRMR the optimisation criterion is $I_{y,j} - \frac{1}{\left| S \right|} \sum_{i\in S} I_{s,ij}$, with $\left| S \right|$ the number of features already selected. To limit computational complexity, features for which $I_{y,j} - \frac{1}{\left| S \right|} \sum_{i\in S} I_{s,ij} \leq 0$ are eliminated. A sketch of this greedy selection scheme is shown below.
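The sketch assumes that mutual information values have been precomputed: `relevance[j]` stands for $I_{y,j}$ and `redundancy[i, j]` for the mutual information between features $i$ and $j$. It demonstrates the selection criterion only and is not familiar's implementation.

```
# Greedy mRMR forward selection from precomputed mutual information values.
mrmr_select <- function(relevance, redundancy) {
  remaining <- seq_along(relevance)
  selected <- integer(0L)

  while (length(remaining) > 0L) {
    # Mean redundancy of each remaining feature with the selected subset.
    penalty <- if (length(selected) == 0L) {
      rep(0.0, times = length(remaining))
    } else {
      colMeans(redundancy[selected, remaining, drop = FALSE])
    }

    score <- relevance[remaining] - penalty

    # Stop when no feature has a positive criterion value; such features
    # are also the ones that are eliminated to limit complexity.
    if (max(score) <= 0.0) break

    best <- remaining[which.max(score)]
    selected <- c(selected, best)
    remaining <- remaining[remaining != best]
  }

  selected  # feature indices in order of decreasing importance
}
```

Replacing `colMeans` with `colSums` in this sketch would yield the MIFS criterion instead.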
## Univariate and multivariate regression methods

Univariate and multivariate regression use a single feature or a set of features, respectively, as predictors in a regression model. The performance of the regression model is then measured using a metric. Training and testing of regression models are repeated multiple times using bootstraps. For each bootstrap, the in-bag samples are used for training and the out-of-bag samples are used for testing. This also defines the parameters of both methods, which are shown in the table below.

| **parameter** | **tag** | **values** | **optimised** | **comments** |
|:--------------|:--------|:----------:|:----------:|:-------------|
| regression learner | `learner` | dependent on outcome | no | Any generalised linear regression model from the *learning algorithms and hyperparameter optimisation* vignette can be selected. Default values are `glm_logistic` for binomial, `glm_multinomial` for multinomial, `glm_gaussian` for continuous, and `cox` for survival outcomes. |
| performance metric | `metric` | dependent on outcome | no | Any metric from the *performance metrics* vignette can be selected. Default values are `auc_roc` for binomial and multinomial, `mse` for continuous, and `concordance_index` for survival outcomes. |
| number of bootstraps | `n_bootstrap` | $\mathbb{Z} \in \left[1, \infty\right)$ | no | The default value is $10$. |
| drop-out alpha level | `alpha` | $\mathbb{R} \in \left[0, 1\right]$ | no | The default value is $0.33$. Only used in multivariate regression. |

### Univariate regression

In the univariate regression method, a regression model is built for each feature separately using the in-bag data of a bootstrap. This model is then evaluated on the out-of-bag data using the metric, expressed as an objective score with a standardised optimum (see *computing the objective score* in the *learning algorithms and hyperparameter optimisation* vignette). The median of the objective scores on out-of-bag data over all bootstraps is used as the variable importance of each feature.

### Multivariate regression

The procedure described for univariate regression forms the first step in multivariate regression. The rest follows forward selection. The most important feature is assigned to the subset of selected features and removed from the set of available features. Separate regression models are then built with each remaining feature plus the feature(s) in the selected subset as predictors. Thus, the subset of selected features iteratively increases in size until no features remain or the objective score no longer increases. To limit mostly redundant computation, features that are unlikely to be selected are actively removed. To do so, the standard deviation of the objective score over the bootstraps is computed for each feature, and the fraction `alpha` of the lowest-scoring features is eliminated at each iteration.

## Lasso, ridge and elastic net regression

Penalised regression is itself a form of feature selection, as it selects an 'optimal' set of features to create a regression model. As features are usually normalised as part of pre-processing, the magnitude of each model coefficient can be interpreted as the importance of the corresponding feature. All three shrinkage methods are implemented using the `glmnet` package [@Hastie2009-ed; @Simon2011-ih].

Only elastic net regression has a model hyperparameter that requires optimisation, but other parameters may be set as well, as shown in the table below:

| **parameter** | **tag** | **values** | **optimised** | **comments** |
|:--------------|:--------|:----------:|:----------:|:-------------|
| family | `family` | `gaussian`, `binomial`, `poisson`, `multinomial`, `cox` | continuous outcomes | For continuous outcomes, `gaussian` and `poisson` may be tested. The family is not optimised when it is specified, e.g. `lasso_gaussian`. For other outcomes, only one applicable family exists. |
| elastic net penalty | `alpha` | $\mathbb{R} \in \left[0,1\right]$ | elastic net | This penalty is fixed for ridge regression (`alpha = 0`) and lasso (`alpha = 1`). |
| optimal lambda | `lambda_min` | `lambda.1se`, `lambda.min` | no | Default is `lambda.min`. |
| number of CV folds | `n_folds` | $\mathbb{Z} \in \left[3,n\right]$ | no | Default is $3$ if $n<30$, $\lfloor n/10\rfloor$ if $30\leq n \leq 200$, and $20$ if $n>200$. |
| normalisation | `normalise` | `FALSE`, `TRUE` | no | Default is `FALSE`, as normalisation is part of pre-processing in familiar. |
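As an illustration of coefficient-magnitude importance, the following sketch fits a cross-validated lasso on simulated data with `glmnet` and ranks features by the absolute size of their coefficients at `lambda.min`. It mirrors the idea described above, but is not familiar's internal code.

```
library(glmnet)

set.seed(19)
x <- matrix(rnorm(100L * 10L), ncol = 10L,
            dimnames = list(NULL, paste0("feature_", 1:10)))
y <- 2.0 * x[, 1L] - x[, 3L] + rnorm(100L)

# Cross-validated lasso (alpha = 1); ridge regression would use alpha = 0.
fit <- cv.glmnet(x, y, family = "gaussian", alpha = 1.0, nfolds = 3L)

# Coefficient magnitudes at lambda.min act as importance scores.
coefficients <- coef(fit, s = "lambda.min")[-1L, 1L]  # drop the intercept
sort(abs(coefficients), decreasing = TRUE)
```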
## Random forest-based methods

Several variable importance methods are based on random forests, and all of them require a trained random forest. Hence, familiar trains a random forest on the training data prior to determining variable importance. Random forest learners have a set of hyperparameters that can be optimised prior to training. These parameters, which differ slightly between `ranger`-based and `randomForestSRC`-based methods, are shown below.

| **parameter** | **tag** | **values** | **optimised** | **comments** |
|:--------------|:--------|:----------:|:----------:|:-------------|
| number of trees | `n_tree` | $\mathbb{Z} \in \left[0,\infty\right)$ | yes | This parameter is expressed on the $\log_{2}$ scale, i.e. the actual input value will be $2^{\texttt{n\_tree}}$ [@Oshiro2012-mq]. The default range is $\left[4, 10\right]$. |
| subsampling fraction | `sample_size` | $\mathbb{R} \in \left(0, 1.0\right]$ | yes | Fraction of the available data that is used to create a single tree. The default range is $\left[2 / m, 1.0\right]$, with $m$ the number of samples. |
| number of features at each node | `m_try` | $\mathbb{R} \in \left[0.0, 1.0\right]$ | yes | Expressed as a fraction of the available features. Familiar ensures that there is always at least one candidate feature. |
| node size | `node_size` | $\mathbb{Z} \in \left[1, \infty\right)$ | yes | Minimum number of unique samples in terminal nodes. The default range is $\left[5, \lfloor m / 3\rfloor\right]$, with $m$ the number of samples. |
| maximum tree depth | `tree_depth` | $\mathbb{Z} \in \left[1,\infty\right)$ | yes | Maximum depth to which trees are allowed to grow. The default range is $\left[1, 10\right]$. |
| number of split points | `n_split` | $\mathbb{Z} \in \left[0, \infty\right)$ | no | The default is $0$, in which case splitting is deterministic. |
| splitting rule (`randomForestSRC` only) | `split_rule` | `gini`, `auc`, `entropy`, `mse`, `quantile.regr`, `la.quantile.regr`, `logrank`, `logrankscore`, `bs.gradient` | no | Default splitting rules are `gini` for `binomial` and `multinomial` outcomes, `mse` for `continuous` outcomes, and `logrank` for `survival` outcomes. |
| splitting rule (`ranger` only) | `split_rule` | `gini`, `hellinger`, `extratrees`, `beta`, `variance`, `logrank`, `C`, `maxstat` | no | Default splitting rules are `gini` for `binomial` and `multinomial` outcomes and `maxstat` for `continuous` and `survival` outcomes. |
| significance split threshold (`ranger` only) | `alpha` | $\mathbb{R} \in \left(0.0, 1.0\right]$ | `maxstat` | Minimum significance level for further splitting. The default range is $\left[10^{-6}, 1.0\right]$. |

The unoptimised methods do not require hyperparameter optimisation and use the default values from the `ranger` and `randomForestSRC` packages.

### Permutation importance

The permutation importance method is implemented by `random_forest_permutation` and `random_forest_permutation_default` (`randomForestSRC` package), and by `random_forest_ranger_permutation` and `random_forest_ranger_permutation_default` (`ranger` package). In short, this method functions as follows [@Ishwaran2007-va]. As usual, each tree in the random forest is constructed using the in-bag samples of a bootstrap of the data. The predictive performance of each tree is first measured using the out-of-bag data. Subsequently, the out-of-bag values of each feature are randomly permuted, and predictive performance is assessed again. The difference between the normal performance and the performance after permutation is used as a measure of variable importance. For important features this difference is large, whereas for irrelevant features it is negligible or even negative.
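A minimal example of permutation importance using the `ranger` package directly is shown below. Familiar optimises the forest's hyperparameters before computing importance; this sketch simply uses `ranger`'s defaults on a built-in data set.

```
library(ranger)

# Train a classification forest and compute permutation importance from the
# drop in out-of-bag performance after permuting each feature.
fit <- ranger(
  Species ~ .,
  data = iris,
  num.trees = 500L,
  importance = "permutation",
  seed = 19L
)

sort(fit$variable.importance, decreasing = TRUE)
```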
### Hold-out permutation importance

This variant of permutation importance (`random_forest_ranger_holdout_permutation` and `random_forest_ranger_holdout_permutation_default`) is implemented using `ranger::holdoutRF`. Instead of using out-of-bag data to compute feature importance, two cross-validation folds are used. A random forest is trained on each fold, and variable importance is determined on the respective other fold [@Janitza2018-kl].

The hold-out variable importance method in the `randomForestSRC` package (`random_forest_holdout` and `random_forest_holdout_default`) is implemented using `randomForestSRC::holdout.vimp`. It is similar to the previous variant, but does not use cross-validation folds. Instead, out-of-bag prediction errors for models trained with and without each feature are compared.

### Minimum depth variable selection

Important features tend to appear closer to the root of the trees in a random forest. Minimum depth variable selection therefore assesses the position of each feature within the trees [@Ishwaran2010-zv].

### Impurity importance

At each node, the data is split into (two) subsets, each corresponding to a branch. After splitting, each subset is purer than the parent data set. As a concrete example, in regression problems the variance within each of the subsets is lower than that of the data prior to splitting. The decrease in variance specifically, or the decrease in impurity generally, is then used to assess feature importance. Familiar uses the `impurity_corrected` importance measure, which is unbiased with respect to the number of split points of a feature and its distribution [@Nembrini2018-ay].

## Special methods

Familiar offers several special methods. These are not variable importance methods in the strict sense, as they do not determine a variable importance that can be used to establish feature rankings.

### No variable importance

As the name suggests, the `none` method avoids computing variable importance altogether. All features are passed into a model. The feature order is randomly shuffled prior to building a model, to avoid any influence of the order in which features are provided.

### Random variable importance

The `random` method randomly draws features prior to model building; it does not assign a random variable importance to each feature. New features are drawn each time a model is built. All features are available for the draw, but only $m$ features are drawn. Here $m$ is the signature size, which is usually optimised through hyperparameter optimisation.

### Signature only

When configuring familiar, any number of features can be set as a model signature using the `signature` configuration parameter. However, more features may be added to this signature after computing variable importance. To ensure that only the provided features enter a model, the `signature_only` method may be used.

# Aggregating variable importance

When variable importance is determined, or models are built, in the presence of subsampling (e.g. bootstraps), the ranks of features may need to be aggregated across the different instances [@Wald2012-zk]. The rank aggregation methods shown in the table below can be used for this purpose. Several methods require a threshold that indicates the size of the set of most highly ranked features, which can be set by specifying the `vimp_aggregation_rank_threshold` configuration parameter.

| **aggregation method** | **tag** | **comments** |
|:--------------|:--------|:----------:|
| none | `none` | |
| mean rank | `mean` | |
| median rank | `median` | |
| best rank | `best` | |
| worst rank | `worst` | |
| stability selection | `stability` | uses threshold |
| exponential selection | `exponential` | uses threshold |
| borda ranking | `borda` | |
| enhanced borda ranking | `enhanced_borda` | uses threshold |
| truncated borda ranking | `truncated_borda` | uses threshold |
| enhanced truncated borda ranking | `enhanced_truncated_borda` | uses threshold |

## Notation

Let $N$ be the number of ranking experiments that should be aggregated. Feature $i$ in experiment $j$ of $N$ then has rank $r_{ij}$. A lower rank indicates a more important feature. Some features may not receive a rank during a ranking experiment, for example because a multivariate variable importance method such as lasso regression did not select them, or through the use of a threshold $\tau$. This is designated by $\delta_{ij}$, which is $0$ if the feature is absent and $1$ if it is present. In case a threshold is used, $\delta_{ij} = 1$ if $r_{ij} \leq \tau$, and $0$ otherwise. Thus, in each experiment $m_j = \sum^M_{i=1} \delta_{ij}$ features are ranked, out of $M$ features in total. $m_j$ is then also the maximum rank found in experiment $j$. Aggregating ranks for each feature results in an aggregate rank score $s_i$. Features are subsequently ranked according to this method-specific score to arrive at an aggregate feature rank $r_i$. A toy example of this notation is sketched below.
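The toy example constructs a rank matrix for $M = 4$ features over $N = 3$ experiments, where `NA` marks a feature that was not ranked in an experiment ($\delta_{ij} = 0$).

```
# Rank matrix: rows are features (i), columns are experiments (j).
ranks <- matrix(
  c(1,  2,  1,
    2,  1,  3,
    3, NA,  2,
    NA, 3, NA),
  nrow = 4L, byrow = TRUE,
  dimnames = list(paste0("feature_", 1:4), paste0("experiment_", 1:3))
)

delta <- !is.na(ranks)  # delta_ij: feature i was ranked in experiment j
m_j <- colSums(delta)   # number of ranked features per experiment: 3, 3, 3
```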
## No rank aggregation

The `none` option does not aggregate ranks. Rather, scores are aggregated by computing the average score of a feature over all experiments that contain it. Ranks are then computed from the aggregated scores.

## Mean rank aggregation

The mean rank aggregation method ranks features by computing the mean rank of each feature across all experiments that contain it:

$$s_i = \frac{\sum^{N}_{j=1} \delta_{ij} r_{ij}}{\sum^{N}_{j=1} \delta_{ij}}$$

The aggregate rank of the features is then determined by sorting the aggregate scores $s_i$ in ascending order.

## Median rank aggregation

The median rank aggregation method ranks features by computing the median rank of each feature across all experiments that contain it:

$$s_i = \underset{j \in N, \, \delta_{ij}=1}{\textrm{median}}(r_{ij})$$

The aggregate rank of the features is then determined by sorting the aggregate scores $s_i$ in ascending order.

## Best rank aggregation

The best rank aggregation method ranks features by the best rank that a feature attains across all experiments that contain it:

$$s_i = \underset{j \in N, \, \delta_{ij}=1}{\textrm{min}} (r_{ij})$$

The aggregate rank of the features is then determined by sorting the aggregate scores $s_i$ in ascending order.

## Worst rank aggregation

The worst rank aggregation method ranks features by the worst rank that a feature attains across all experiments that contain it:

$$s_i = \underset{j \in N, \, \delta_{ij}=1}{\textrm{max}} (r_{ij})$$

The aggregate rank of the features is then determined by sorting the aggregate scores $s_i$ in ascending order.

## Stability rank aggregation

The stability aggregation method ranks features by their occurrence within the set of highly ranked features across all experiments. Our implementation generalises the method originally proposed by Meinshausen and Bühlmann [@Meinshausen2010-do]. This method uses threshold $\tau$ to designate the highly ranked features. Thus $\delta_{ij} = 1$ if $r_{ij} \leq \tau$, and $0$ otherwise. The aggregate rank score is computed as:

$$s_i = \frac{1}{N} \sum^N_{j=1} \delta_{ij}$$

The aggregate rank of the features is then determined by sorting the aggregate scores $s_i$ in descending order, as more commonly occurring features are considered more important.

## Exponential rank aggregation

The exponential aggregation method ranks features by the sum of the negative exponentials of their normalised ranks in the experiments where they occur within the set of highly ranked features. This method was originally suggested by Haury et al. [@Haury2011-zd]. It uses threshold $\tau$ to designate the highly ranked features. Thus $\delta_{ij} = 1$ if $r_{ij} \leq \tau$, and $0$ otherwise.

$$s_i = \sum^N_{j=1} \delta_{ij} \exp(-r_{ij} / \tau)$$

The aggregate rank of the features is then determined by sorting the aggregate scores $s_i$ in descending order.

## Borda rank aggregation

Borda rank aggregation ranks a feature by the sum of its normalised ranks (the borda score) across all experiments that contain it. In case every experiment contains all features, the result is equivalent to the mean rank aggregation method [@Wald2012-zk].

$$s_i = \sum^N_{j=1} \frac{m_j - r_{ij} + 1}{m_j}$$

The aggregate rank of the features is then determined by sorting the aggregate scores $s_i$ in descending order.
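Continuing the toy example from the [Notation] section, borda scores can be computed as follows. This is a sketch of the formula above, not familiar's implementation.

```
# Per-experiment borda contribution: (m_j - r_ij + 1) / m_j, zero when absent.
contribution <- sweep(sweep(-ranks + 1, 2L, m_j, "+"), 2L, m_j, "/")
borda_score <- rowSums(contribution, na.rm = TRUE)

# Higher borda scores indicate more important features.
sort(borda_score, decreasing = TRUE)
```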
## Enhanced borda rank aggregation

Enhanced borda rank aggregation combines borda rank aggregation with stability rank aggregation. The borda score is multiplied by the occurrence of the feature within the set of highly ranked features across all experiments [@Wald2012-zk]. This method uses threshold $\tau$ to designate the highly ranked features for the purpose of computing the occurrence. Thus $\delta_{ij} = 1$ if $r_{ij} \leq \tau$, and $0$ otherwise.

$$s_i = \left( \frac{1}{N} \sum^N_{j=1} \delta_{ij} \right) \left( \sum^N_{j=1} \frac{m_j - r_{ij} + 1}{m_j} \right)$$

The aggregate rank of the features is then determined by sorting the aggregate scores $s_i$ in descending order.

## Truncated borda rank aggregation

Truncated borda rank aggregation is borda rank aggregation performed using only the set of most highly ranked features in each experiment. This method uses threshold $\tau$ to designate the highly ranked features. Thus $\delta_{ij} = 1$ if $r_{ij} \leq \tau$, and $0$ otherwise.

$$s_i = \sum^N_{j=1} \delta_{ij} \frac{\tau - r_{ij} + 1}{\tau}$$

Note that compared to the borda method, the number of ranked features in an experiment, $m_j$, is replaced by the threshold $\tau$. The aggregate rank of the features is then determined by sorting the aggregate scores $s_i$ in descending order.

## Enhanced truncated borda rank aggregation

Enhanced truncated borda rank aggregation is enhanced borda rank aggregation performed using only the set of most highly ranked features in each experiment. This method uses threshold $\tau$ to designate the highly ranked features. Thus $\delta_{ij} = 1$ if $r_{ij} \leq \tau$, and $0$ otherwise.

$$s_i = \left( \frac{1}{N} \sum^N_{j=1} \delta_{ij} \right) \left( \sum^N_{j=1} \delta_{ij} \frac{\tau - r_{ij} + 1}{\tau} \right)$$

The aggregate rank of the features is then determined by sorting the aggregate scores $s_i$ in descending order.

# References