Multivariate Pattern Analysis in Python
This section has been contributed by James M. Hughes.
It is often the case in machine learning problems that we wish to reduce a feature space of high dimensionality into something more manageable by selecting only those features that contribute most to classification performance. Feature selection methods attempt to achieve this goal in an algorithmic fashion.
PyMVPA’s flexible framework allows various feature selection methods to be expressed within a small block of code. FeatureSelectionClassifier extends the basic classifier framework to allow for arbitrary methods of feature selection according to whatever ranking metric, feature selection criterion, and stopping criterion the user chooses for a given application. Examples of the code and classification algorithms presented here can be found in mvpa/clfs/warehouse.py.
More formally, a FeatureSelectionClassifier is a meta-classifier: rather than classifying data itself, it takes an arbitrary slave Classifier, performs some feature selection in advance, selects those features, and then trains the provided slave Classifier on them. Externally, however, it looks like any other Classifier, since it implements the interface of the Classifier base class. Its two relevant constructor arguments are clf, the slave Classifier to be trained on the selected features, and feature_selection, a FeatureSelection instance that determines which features are kept.
Let us turn our attention to the second argument, FeatureSelection. As noted above, this feature selection can be arbitrary and should be chosen appropriately for the task at hand. For example, we could use a one-way ANOVA statistic to rank features and then keep only the most important 5% of them. It is crucial to note that, in PyMVPA, the way in which features are selected (in this example, by keeping only 5% of them) is wholly independent of the way features are ranked (in this example, by a one-way ANOVA). Feature selection using this method can be accomplished with the following code (from mvpa/clfs/warehouse.py):
>>> from mvpa.suite import *
>>> FeatureSelection = SensitivityBasedFeatureSelection(
... OneWayAnova(),
... FractionTailSelector(0.05, mode='select', tail='upper'))
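This FeatureSelection instance can then be handed to a FeatureSelectionClassifier. The following is a minimal sketch of that step; the choice of LinearCSVMC as the slave classifier and the descr string are illustrative rather than taken from warehouse.py:

>>> clf = FeatureSelectionClassifier(
...     clf=LinearCSVMC(),
...     # keep only the features chosen by the ANOVA-based selection above
...     feature_selection=FeatureSelection,
...     descr='LinSVM on 5%(ANOVA)')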
A more interesting analysis is one in which we use the classifier's weights (hyperplane coefficients) to rank features. This allows us to train on the selected features with the same type of classifier that was used to select them:
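A sketch of such a configuration, assuming the default sensitivity analyzer of LinearCSVMC (the exact entry in mvpa/clfs/warehouse.py may differ, for instance by taking the absolute value of the weights before ranking):

>>> weight_clf = LinearCSVMC()
>>> clf = FeatureSelectionClassifier(
...     clf=weight_clf,
...     feature_selection=SensitivityBasedFeatureSelection(
...         # rank features by the SVM hyperplane weights
...         weight_clf.getSensitivityAnalyzer(),
...         # keep the 5% of features with the highest ranking
...         FractionTailSelector(0.05, mode='select', tail='upper')),
...     descr='LinSVM on 5%(SVM)')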
It bears mentioning at this point that caution must be exercised when selecting features. Feature selection must be performed on an independent training dataset: it is not valid to select features using the entire dataset, re-train a classifier on a subset of the original data (using only the selected features), and then test on a held-out testing dataset, since doing so introduces an obvious positive bias in classification performance. PyMVPA makes dataset splitting easy, however, so independent training and testing datasets can be created with, for instance, an NFoldSplitter or an OddEvenSplitter.
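As a minimal sketch of this workflow, assuming a labeled dataset named dataset (hypothetical) and the FeatureSelectionClassifier clf defined above, feature selection and training are confined to the training portion of each split:

>>> for training_ds, testing_ds in NFoldSplitter(cvtype=1)(dataset):
...     # feature selection and training happen only on the training split
...     clf.train(training_ds)
...     # the held-out split is used exclusively for evaluation
...     predictions = clf.predict(testing_ds.samples)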
Recursive feature elimination (RFE, applied to fMRI data in (Hanson et al., 2008)) is a technique that falls under the larger umbrella of feature selection. It attempts to reduce the number of features used for classification in the following way: a classifier is trained on the current set of features; the features are ranked according to some sensitivity measure (for example, the classifier's weights); the lowest-ranked portion of the features is discarded; and the classifier is re-trained on the reduced feature set. This process repeats until some stopping criterion is met, for instance when the transfer error begins to increase.
PyMVPA’s flexible framework allows each of these steps to take place within a small block of code. To actually perform recursive feature elimination, we consider two separate analysis scenarios that both operate on a pre-selected training dataset: in one, the transfer error guiding the elimination is estimated on a single dedicated portion of the training data; in the other, the training data is split multiple times and the sensitivities and transfer errors are averaged across splits.
We will concentrate on the second approach. The following code can be used to perform such an analysis:
>>> rfesvm_split = SplitClassifier(LinearCSVMC())
>>> clf = \
...     FeatureSelectionClassifier(
...         clf=LinearCSVMC(),
...         # on features selected via RFE
...         feature_selection=RFE(
...             # based on sensitivity of a clf which does splitting internally
...             sensitivity_analyzer=rfesvm_split.getSensitivityAnalyzer(),
...             # and whose internal error we use
...             transfer_error=ConfusionBasedError(
...                 rfesvm_split,
...                 confusion_state="confusion"),
...             # remove 20% of features at each step
...             feature_selector=FractionTailSelector(
...                 0.2, mode='discard', tail='lower'),
...             # update sensitivity at each step
...             update_sensitivity=True),
...         descr='LinSVM+RFE(splits_avg)')
The code above introduces the SplitClassifier, yet another meta-classifier. It takes a Classifier (here a LinearCSVMC) and an arbitrary Splitter object, so that the dataset can be split in whatever way the user desires. Prior to training, the SplitClassifier splits the training dataset, dedicates a separate classifier to each split, trains each one on the training part of its split, and then computes the transfer error on the testing part of that split. If a SplitClassifier instance is later asked to predict new data, it uses (by default) the MaximalVote strategy to derive an answer. A summary of the SplitClassifier's performance on each split of the training dataset is available through its confusion state variable.
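As a brief, hedged illustration of that last point, assuming a labeled training dataset named training_ds (hypothetical) and that the confusion state has been enabled on the classifier, the per-split summary could be inspected like this:

>>> rfesvm_split = SplitClassifier(LinearCSVMC(),
...                                enable_states=['confusion'])
>>> rfesvm_split.train(training_ds)
>>> # per-split performance summary accumulated during training
>>> print rfesvm_split.confusion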
To summarize, RFE is just one method of feature selection, so we use a FeatureSelectionClassifier to facilitate it. The RFE process itself is parameterized by the arguments shown in the code above: sensitivity_analyzer, which ranks the features (here, sensitivities derived from the internal splits of rfesvm_split); transfer_error, which monitors generalization performance (here, a ConfusionBasedError computed from the confusion state of rfesvm_split); feature_selector, which determines how many features are discarded at each step (here, the lowest-ranked 20%); and update_sensitivity, which, when True, recomputes the sensitivities after every elimination step.
As has been shown, recursive feature elimination is an easy-to-implement, flexible, and powerful tool within the PyMVPA framework. Various ranking methods for selecting features have been discussed. Additionally, several analysis scenarios have been presented, along with enough requisite knowledge that the user can plug in whatever classifiers, error metrics, or sensitivity measures are most appropriate for the task at hand.