Feature Selection

This section has been contributed by James M. Hughes.

It is often the case in machine learning problems that we wish to reduce a feature space of high dimensionality into something more manageable by selecting only those features that contribute most to classification performance. Feature selection methods attempt to achieve this goal in an algorithmic fashion.

PyMVPA’s flexible framework allows various feature selection methods to be implemented within a small block of code. FeatureSelectionClassifier extends the basic classifier framework to allow the use of arbitrary feature selection methods, according to whatever ranking metric, selection criterion, and stopping criterion the user chooses for a given application. Examples of the code/classification algorithms presented here can be found in mvpa/clfs/warehouse.py.

More formally, a FeatureSelectionClassifier is a meta-classifier. That is, it is not a classifier itself: it takes any slave Classifier, performs some feature selection in advance, selects those features, and then trains the provided slave Classifier on them. Externally, however, it looks like a Classifier, since it implements the interface of the Classifier base class. The relevant arguments to the constructor of such a classifier are:

clf: Classifier
the slave classifier that is trained on the selected features
feature_selection: FeatureSelection
the feature selection method to apply prior to training
testdataset: Dataset (optional)
dataset that is passed to feature_selection when it is called

Let us turn our attention to the second argument, FeatureSelection. As noted above, this feature selection can be arbitrary and should be chosen appropriately for the task at hand. For example, we could use a one-way ANOVA statistic to rank features, then keep only the most important 5% of them. It is crucial to note that, in PyMVPA, the way in which features are selected (in this example, by keeping only 5% of them) is wholly independent of the way features are ranked (in this example, by using a one-way ANOVA). Feature selection using this method could be accomplished with the following code (from mvpa/clfs/warehouse.py):

>>> from mvpa.suite import *
>>> FeatureSelection = SensitivityBasedFeatureSelection(
...     OneWayAnova(),
...     FractionTailSelector(0.05, mode='select', tail='upper'))
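The object constructed above can then be passed as the feature_selection argument of a FeatureSelectionClassifier. A minimal sketch, reusing the FeatureSelection object just created (the descr label is arbitrary, and any other Classifier could stand in for the LinearCSVMC):

>>> fclf = FeatureSelectionClassifier(
...     LinearCSVMC(),
...     FeatureSelection,
...     descr='LinSVM on 5%(ANOVA)')

When fclf is trained on a dataset, it first computes the ANOVA scores on the training data, retains the top 5% of features, and only then trains the internal LinearCSVMC on that reduced feature set.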

A more interesting analysis is one in which we use the classifier’s weights (hyperplane coefficients) to rank features. This allows us to use the same type of classifier to train on the selected features as was used to select them:
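A sketch of such a setup is given below. It assumes the 0.4-era API in which a classifier’s getSensitivityAnalyzer() accepts a transformer argument (here Absolute, so that features are ranked by the magnitude of their weights rather than by their sign); treat these details as assumptions rather than the canonical warehouse.py code:

>>> sample_svm = LinearCSVMC()
>>> fclf = FeatureSelectionClassifier(
...     sample_svm,
...     SensitivityBasedFeatureSelection(
...         # rank features by the magnitude of the SVM weights
...         sample_svm.getSensitivityAnalyzer(transformer=Absolute),
...         # and keep only the top 5% of them
...         FractionTailSelector(0.05, mode='select', tail='upper')),
...     descr='LinSVM on 5%(SVM)')

Here the same LinearCSVMC instance is used both to compute the sensitivities and, after selection, to learn the final decision boundary on the surviving features.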

It bears mentioning at this point that caution must be exercised when selecting features. Feature selection must be performed using only an independent training dataset: it is not valid to select features using the entire dataset, re-train a classifier on a subset of the original data (but using only the selected features), and then test on a held-out testing dataset, since the testing data has already influenced the feature selection. Doing so results in an optimistic bias in classification performance. PyMVPA allows for easy dataset splitting, however, so creating independent training and testing datasets is easily accomplished, for instance using an NFoldSplitter or an OddEvenSplitter, as sketched below.
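For illustration, a split into independent training and testing datasets might look like the following sketch. The synthetic dataset from normalFeatureDataset() and the exact keyword arguments are assumptions used only to make the example self-contained:

>>> dataset = normalFeatureDataset(perlabel=10, nlabels=2,
...                                nfeatures=100, nchunks=5)
>>> splitter = NFoldSplitter(cvtype=1)
>>> for training_ds, testing_ds in splitter(dataset):
...     # feature selection and classifier training should use training_ds
...     # only; testing_ds is reserved for estimating generalization
...     pass

Each iteration leaves one chunk out as the testing dataset, so any feature selection performed inside the loop never sees the data it will later be evaluated on.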

Recursive Feature Elimination

Recursive feature elimination (RFE), applied to fMRI data in Hanson et al. (2008), is a technique that falls under the larger umbrella of feature selection. It specifically attempts to reduce the number of selected features used for classification in the following way:

  • A classifier is trained on a subset of the data and features are ranked according to an arbitrary metric.
  • Some fraction of those features is either selected or discarded according to a pre-selected rule.
  • The classifier is retrained and features are once again ranked; this process continues until some criterion determined a priori (such as classification error) is reached.
  • One or more classifiers trained only on the final set of selected features are used on a generalization dataset and performance is calculated.

PyMVPA’s flexible framework allows each of these steps to take place within a small block of code. To actually perform recursive feature elimination, we consider two separate analysis scenarios that deal with a pre-selected training dataset:

  • We split the training dataset into an arbitrary number of independent datasets and perform RFE on each of these; the sensitivity analysis of features is performed independently for each split and features are selected based on those independent measures.
  • We split the training dataset into an arbitrary number of independent datasets (as before), but we average the feature sensitivities and select which features to prune/select based on that one average measure.

We will concentrate on the second approach. The following code can be used to perform such an analysis:

>>> rfesvm_split = SplitClassifier(LinearCSVMC())
>>> clf = \
...  FeatureSelectionClassifier(
...   clf=LinearCSVMC(),
...   # on features selected via RFE
...   feature_selection=RFE(
...       # based on the sensitivity of a classifier which does
...       # splitting internally
...       sensitivity_analyzer=rfesvm_split.getSensitivityAnalyzer(),
...       # and whose internal error we use as the transfer error
...       transfer_error=ConfusionBasedError(
...           rfesvm_split,
...           confusion_state="confusion"),
...       # remove 20% of features at each step
...       feature_selector=FractionTailSelector(
...           0.2, mode='discard', tail='lower'),
...       # update sensitivity at each step
...       update_sensitivity=True),
...   descr='LinSVM+RFE(splits_avg)')

The code above introduces the SplitClassifier, which in this case is yet another meta-classifier. It takes a Classifier (here a LinearCSVMC) and an arbitrary Splitter object, so that the dataset can be split in whatever way the user desires. Prior to training, the SplitClassifier splits the training dataset, dedicates a separate classifier to each split, trains each on the training part of its split, and then computes the transfer error on the testing part of that split. If a SplitClassifier instance is later asked to predict new data, it uses (by default) the MaximalVote strategy to derive an answer. A summary of the performance of a SplitClassifier on each internal split of the training dataset is available by accessing its confusion state variable.
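On its own, a SplitClassifier could be used roughly as in the following sketch. The explicit NFoldSplitter, the enable_states argument, and the reuse of the synthetic dataset from the splitting sketch above are assumptions about the 0.4-era API rather than a verbatim recipe:

>>> sclf = SplitClassifier(LinearCSVMC(),
...                        splitter=NFoldSplitter(cvtype=1),
...                        enable_states=['confusion'])
>>> sclf.train(dataset)      # trains one classifier per split
>>> print sclf.confusion     # summary over the testing parts of all splits

The confusion state variable gathered here is exactly what ConfusionBasedError consults in the RFE example above.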

To summarize somewhat: RFE is just one method of feature selection, so we use a FeatureSelectionClassifier to facilitate it. The RFE process in the example above is parameterized by the following arguments:

sensitivity_analyzer
in this case just the default sensitivity for a linear C-SVM (the SVM weights), averaged over all splits (in accordance with the second scenario above)
transfer_error
confusion-based error that relies on the confusion matrices computed during splitting of the dataset by the SplitClassifier; this is used to provide a value that can be compared against a stopping criterion to stop eliminating features
feature_selector
in this example we simply discard the 20% of features deemed least important
update_sensitivity
True to retrain the classifiers each time features are eliminated; this should be False if a non-classifier-based sensitivity measure (such as a one-way ANOVA) is used
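With clf defined as above, generalization performance can be estimated by ordinary cross-validation, which re-runs the entire RFE procedure inside every training fold so that feature elimination never sees the corresponding testing data. A sketch, again reusing the synthetic dataset from earlier and assuming the 0.4-era CrossValidatedTransferError API:

>>> cv = CrossValidatedTransferError(
...          TransferError(clf),
...          NFoldSplitter(cvtype=1),
...          enable_states=['confusion'])
>>> error = cv(dataset)     # mean transfer error across the folds
>>> print cv.confusion      # per-fold and summary confusion matrices

Because clf is a FeatureSelectionClassifier, each training call inside the cross-validation loop performs the full recursive elimination before the final classifier is fit.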

As has been shown, recursive feature elimination is an easy-to-implement, flexible, and powerful tool within the PyMVPA framework. Various methods for ranking and selecting features have been discussed. Additionally, several analysis scenarios have been presented, along with enough of the requisite background for the user to plug in whatever classifiers, error metrics, or sensitivity measures are most appropriate for the task at hand.