This content refers to the previous stable release of PyMVPA. Please visit www.pymvpa.org for the most recent version of PyMVPA and its documentation.

clfs.meta

Module: clfs.meta

Inheritance diagram for mvpa.clfs.meta:

Classes for meta classifiers – classifiers which use other classifiers

Meta Classifiers can be grouped according to their function as

group BoostedClassifiers:
 CombinedClassifier MulticlassClassifier SplitClassifier
group ProxyClassifiers:
 ProxyClassifier BinaryClassifier MappedClassifier FeatureSelectionClassifier
group PredictionsCombiners for CombinedClassifier:
 PredictionsCombiner MaximalVote MeanPrediction

Classes

BinaryClassifier

class mvpa.clfs.meta.BinaryClassifier(clf, poslabels, neglabels, **kwargs)

Bases: mvpa.clfs.meta.ProxyClassifier

ProxyClassifier which maps two sets of labels onto +1 and -1

Note

Available state variables:

  • feature_ids: Feature IDs which were used for the actual training.
  • predicting_time+: Time (in seconds) it took the classifier to predict
  • predictions+: Most recent set of predictions
  • trained_dataset: The dataset it has been trained on
  • trained_labels+: Set of unique labels it has been trained on
  • trained_nsamples+: Number of samples it has been trained on
  • training_confusion: Confusion matrix of learning performance
  • training_time+: Time (in seconds) it took the classifier to train
  • values+: Internal classifier values the most recent predictions are based on

(States enabled by default are listed with +)

See also

Please refer to the documentation of the base class for more information:

ProxyClassifier

Parameters:
  • clf (Classifier) – classifier to use
  • poslabels (list) – list of labels which are treated as +1 category
  • neglabels (list) – list of labels which are treated as -1 category
  • enable_states (None or list of basestring) – Names of the state variables which should be enabled additionally to default ones
  • disable_states (None or list of basestring) – Names of the state variables which should be disabled
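
The label mapping described by poslabels and neglabels can be illustrated without PyMVPA. The following is a minimal sketch, not the library's implementation: the helper name to_binary_labels is hypothetical, and skipping samples whose label appears in neither list is an assumption made for illustration.

```python
def to_binary_labels(labels, poslabels, neglabels):
    """Map labels to +1/-1 and return (kept_indices, binary_labels).

    Hypothetical helper, for illustration only. Samples whose label is in
    neither list are skipped (an assumption of this sketch).
    """
    pos, neg = set(poslabels), set(neglabels)
    if pos & neg:
        raise ValueError("poslabels and neglabels must not overlap")
    kept, binary = [], []
    for index, label in enumerate(labels):
        if label in pos:
            kept.append(index)
            binary.append(+1)
        elif label in neg:
            kept.append(index)
            binary.append(-1)
    return kept, binary


labels = ['face', 'house', 'cat', 'shoe', 'face']
kept, binary = to_binary_labels(labels,
                                poslabels=['face', 'cat'],
                                neglabels=['house'])
# kept   == [0, 1, 2, 4]   ('shoe' belongs to neither group)
# binary == [1, -1, 1, 1]
```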

BoostedClassifier

class mvpa.clfs.meta.BoostedClassifier(clfs=None, propagate_states=True, harvest_attribs=None, copy_attribs='copy', **kwargs)

Bases: mvpa.clfs.base.Classifier, mvpa.misc.state.Harvestable

Classifier containing a farm of other classifiers.

Should rarely be used directly. Use one of its subclasses instead.

Note

Available state variables:

  • feature_ids: Feature IDs which were used for the actual training.
  • harvested: Store specified attributes of classifiers at each split
  • predicting_time+: Time (in seconds) it took the classifier to predict
  • predictions+: Most recent set of predictions
  • raw_predictions: Predictions obtained from each classifier
  • raw_values: Values obtained from each classifier
  • trained_dataset: The dataset it has been trained on
  • trained_labels+: Set of unique labels it has been trained on
  • trained_nsamples+: Number of samples it has been trained on
  • training_confusion: Confusion matrix of learning performance
  • training_time+: Time (in seconds) it took the classifier to train
  • values+: Internal classifier values the most recent predictions are based on

(States enabled by default are listed with +)

See also

Please refer to the documentation of the base classes for more information:

Classifier, Harvestable

Initialize the instance.

Parameters:
  • clfs (list) – list of classifier instances to use (slave classifiers)
  • propagate_states (bool) – whether to propagate enabled states into slave classifiers. It takes effect only when slaves get assigned, so a state enabled after construction would not necessarily propagate into slaves
  • enable_states (None or list of basestring) – Names of the state variables which should be enabled additionally to default ones
  • disable_states (None or list of basestring) – Names of the state variables which should be disabled
  • harvest_attribs (list of basestr or dicts) – Which attributes of the call to store and return within the harvested state variable. If an item is a dictionary, the following keys are used: [‘name’, ‘copy’]
  • copy_attribs (None or basestr) – Default copying. If None – no copying, ‘copy’ - shallow copying, ‘deepcopy’ – deepcopying
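
The three copy_attribs modes correspond to the usual Python copying semantics. The sketch below only illustrates that distinction with the standard copy module. The function harvest_value is hypothetical and does not reproduce how PyMVPA harvests attributes.

```python
import copy


def harvest_value(value, copy_mode='copy'):
    """Return value as-is, shallow-copied, or deep-copied.

    Hypothetical helper mirroring the documented modes:
    None -> no copying, 'copy' -> shallow copy, 'deepcopy' -> deep copy.
    """
    if copy_mode is None:
        return value
    if copy_mode == 'copy':
        return copy.copy(value)
    if copy_mode == 'deepcopy':
        return copy.deepcopy(value)
    raise ValueError("unknown copy mode: %r" % (copy_mode,))


nested = [[1, 2]]
same = harvest_value(nested, None)           # the very same object
shallow = harvest_value(nested, 'copy')      # new outer list, shared inner list
deep = harvest_value(nested, 'deepcopy')     # fully independent copy
```

With no copying, later changes to the harvested attribute are visible in the stored value. A shallow copy protects only the outer container.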
clfs

Used classifiers

getSensitivityAnalyzer(**kwargs)

Return an appropriate SensitivityAnalyzer

untrain()

Untrain BoostedClassifier

Has to untrain any known classifier

ClassifierCombiner

class mvpa.clfs.meta.ClassifierCombiner(clf, variables=None)

Bases: mvpa.clfs.meta.PredictionsCombiner

Provides a decision by training a classifier on predictions/values

TODO: implement

Note

Available state variables:

  • predictions+: Trained predictions

(States enabled by default are listed with +)

See also

Please refer to the documentation of the base class for more information:

PredictionsCombiner

Initialize ClassifierCombiner

Parameters:
  • clf (Classifier) – Classifier to train on the predictions
  • variables (list of basestring) – List of state variables stored in ‘combined’ classifiers, to be used as features for training this classifier
  • enable_states (None or list of basestring) – Names of the state variables which should be enabled additionally to default ones
  • disable_states (None or list of basestring) – Names of the state variables which should be disabled
untrain()

It might be necessary to untrain the used classifier

CombinedClassifier

class mvpa.clfs.meta.CombinedClassifier(clfs=None, combiner=None, **kwargs)

Bases: mvpa.clfs.meta.BoostedClassifier

BoostedClassifier which combines predictions using some PredictionsCombiner functor.

Note

Available state variables:

  • feature_ids: Feature IDs which were used for the actual training.
  • harvested: Store specified attributes of classifiers at each split
  • predicting_time+: Time (in seconds) it took the classifier to predict
  • predictions+: Most recent set of predictions
  • raw_predictions: Predictions obtained from each classifier
  • raw_values: Values obtained from each classifier
  • trained_dataset: The dataset it has been trained on
  • trained_labels+: Set of unique labels it has been trained on
  • trained_nsamples+: Number of samples it has been trained on
  • training_confusion: Confusion matrix of learning performance
  • training_time+: Time (in seconds) it took the classifier to train
  • values+: Internal classifier values the most recent predictions are based on

(States enabled by default are listed with +)

See also

Please refer to the documentation of the base class for more information:

BoostedClassifier

Initialize the instance.

Parameters:
  • clfs (list of Classifier) – list of classifier instances to use
  • combiner (PredictionsCombiner) – callable which takes care of combining multiple results into a single one (e.g. maximal vote for classification, MeanPrediction for regression)
  • enable_states (None or list of basestring) – Names of the state variables which should be enabled additionally to default ones
  • disable_states (None or list of basestring) – Names of the state variables which should be disabled
  • propagate_states (bool) – whether to propagate enabled states into slave classifiers. It takes effect only when slaves get assigned, so a state enabled after construction would not necessarily propagate into slaves
  • harvest_attribs (list of basestr or dicts) – Which attributes of the call to store and return within the harvested state variable. If an item is a dictionary, the following keys are used: [‘name’, ‘copy’]
  • copy_attribs (None or basestr) – Default copying. If None – no copying, ‘copy’ - shallow copying, ‘deepcopy’ – deepcopying
NB: combiner might need to operate not on the discrete ‘predictions’ labels but rather on the raw ‘class’ values the classifiers estimate (which is pretty much what is stored under values).
combiner

Used combiner to derive a single result

summary()

Provide summary for the CombinedClassifier.

untrain()

Untrain CombinedClassifier

FeatureSelectionClassifier

class mvpa.clfs.meta.FeatureSelectionClassifier(clf, feature_selection, testdataset=None, **kwargs)

Bases: mvpa.clfs.meta.ProxyClassifier

ProxyClassifier which uses some FeatureSelection prior to training.

FeatureSelection is used first to select features for the classifier to use for prediction. Internally it relies on a MappedClassifier which uses the created MaskMapper.

TODO: think about removing the overhead of retraining the same classifier if feature selection was already carried out with it. This has been addressed by adding the .trained property to the classifier, but now we should explicitly use isTrained here if we want... need to think more

Note

Available state variables:

  • feature_ids: Feature IDs which were used for the actual training.
  • predicting_time+: Time (in seconds) it took the classifier to predict
  • predictions+: Most recent set of predictions
  • trained_dataset: The dataset it has been trained on
  • trained_labels+: Set of unique labels it has been trained on
  • trained_nsamples+: Number of samples it has been trained on
  • training_confusion: Confusion matrix of learning performance
  • training_time+: Time (in seconds) it took the classifier to train
  • values+: Internal classifier values the most recent predictions are based on

(States enabled by default are listed with +)

See also

Please refer to the documentation of the base class for more information:

ProxyClassifier

Initialize the instance

Parameters:
  • clf (Classifier) – classifier based on which the mask classifier is created
  • feature_selection (FeatureSelection) – whatever FeatureSelection comes in handy
  • testdataset (Dataset) – optional dataset which is passed to feature_selection when it is called
  • enable_states (None or list of basestring) – Names of the state variables which should be enabled additionally to default ones
  • disable_states (None or list of basestring) – Names of the state variables which should be disabled
feature_selection

Used FeatureSelection

getSensitivityAnalyzer(*args_, **kwargs_)
maskclf

Used MappedClassifier

setTestDataset(testdataset)

Set testing dataset to be used for feature selection

testdataset
untrain()

Untrain FeatureSelectionClassifier

Has to untrain any known classifier

MappedClassifier

class mvpa.clfs.meta.MappedClassifier(clf, mapper, **kwargs)

Bases: mvpa.clfs.meta.ProxyClassifier

ProxyClassifier which uses some mapper prior to training/testing.

A MaskMapper can be used to train/classify on just a subset of features. With such a classifier we can easily create a set of classifiers for a BoostedClassifier, where each classifier operates on some set of features, e.g. the set of best spheres from a SearchLight, or a set of ROIs selected elsewhere. This differs from simply applying a whole mask over the dataset, since here an initial decision is made by each classifier, and the classifiers then vote for the final decision across the set.
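
The effect of restricting a classifier to a subset of features can be sketched with a plain boolean mask. This is an illustration only: apply_mask is a hypothetical helper operating on lists of lists, not the MaskMapper interface.

```python
def apply_mask(samples, mask):
    """Keep only the features (columns) whose mask entry is true.

    Hypothetical helper: `samples` is a list of equally long rows and
    `mask` has one boolean per feature.
    """
    for row in samples:
        if len(row) != len(mask):
            raise ValueError("mask length must match number of features")
    return [[value for value, keep in zip(row, mask) if keep]
            for row in samples]


samples = [[1, 2, 3],
           [4, 5, 6]]
roi_mask = [True, False, True]
masked = apply_mask(samples, roi_mask)
# masked == [[1, 3], [4, 6]]
```

Several such masks, one per ROI, would give each wrapped classifier its own view of the data before their predictions are combined.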

Note

Available state variables:

  • feature_ids: Feature IDs which were used for the actual training.
  • predicting_time+: Time (in seconds) it took the classifier to predict
  • predictions+: Most recent set of predictions
  • trained_dataset: The dataset it has been trained on
  • trained_labels+: Set of unique labels it has been trained on
  • trained_nsamples+: Number of samples it has been trained on
  • training_confusion: Confusion matrix of learning performance
  • training_time+: Time (in seconds) it took the classifier to train
  • values+: Internal classifier values the most recent predictions are based on

(States enabled by default are listed with +)

See also

Please refer to the documentation of the base class for more information:

ProxyClassifier

Initialize the instance

Parameters:
  • clf (Classifier) – classifier based on which the mask classifier is created
  • mapper – whatever Mapper comes in handy
  • enable_states (None or list of basestring) – Names of the state variables which should be enabled additionally to default ones
  • disable_states (None or list of basestring) – Names of the state variables which should be disabled
getSensitivityAnalyzer(*args_, **kwargs_)
mapper

Used mapper

MaximalVote

class mvpa.clfs.meta.MaximalVote

Bases: mvpa.clfs.meta.PredictionsCombiner

Provides a decision using maximal vote rule

Note

Available state variables:

  • all_label_counts: Counts across classifiers for each label/sample
  • predictions+: Voted predictions

(States enabled by default are listed with +)

See also

Please refer to the documentation of the base class for more information:

PredictionsCombiner

XXX Might get a parameter to use raw decision values if voting is ambiguous (i.e. two classes have an equal number of votes).

Parameters:
  • enable_states (None or list of basestring) – Names of the state variables which should be enabled additionally to default ones
  • disable_states (None or list of basestring) – Names of the state variables which should be disabled
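
The maximal vote rule can be sketched in a few lines. This is an illustration under stated assumptions, not PyMVPA's code: the function name maximal_vote is hypothetical, and ties are broken here by picking the smallest tied label, which is an arbitrary choice of this sketch.

```python
from collections import Counter


def maximal_vote(predictions_per_clf):
    """Combine label predictions by majority vote.

    `predictions_per_clf` holds one list of predicted labels per
    classifier, all of the same length. Returns the voted labels and the
    per-sample label counts (comparable in spirit to all_label_counts).
    Ties go to the smallest tied label (an assumption of this sketch).
    """
    voted, all_counts = [], []
    for sample_predictions in zip(*predictions_per_clf):
        counts = Counter(sample_predictions)
        top = max(counts.values())
        winners = sorted(label for label, n in counts.items() if n == top)
        voted.append(winners[0])
        all_counts.append(dict(counts))
    return voted, all_counts


# Three classifiers, three samples
predictions = [[1, 2, 2],
               [1, 1, 2],
               [2, 1, 2]]
voted, counts = maximal_vote(predictions)
# voted == [1, 1, 2]
```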

MeanPrediction

class mvpa.clfs.meta.MeanPrediction(descr=None, **kwargs)

Bases: mvpa.clfs.meta.PredictionsCombiner

Provides a decision by taking mean of the results

Note

Available state variables:

  • predictions+: Mean predictions

(States enabled by default are listed with +)

See also

Please refer to the documentation of the base class for more information:

PredictionsCombiner
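
Averaging across classifiers, as suited to regression, can be sketched as follows. The function name mean_prediction is hypothetical, and the sketch works on plain lists rather than on classifier objects.

```python
def mean_prediction(predictions_per_clf):
    """Average numeric predictions across classifiers, sample by sample.

    `predictions_per_clf` holds one list of numeric predictions per
    classifier, all of the same length.
    """
    n_clfs = len(predictions_per_clf)
    return [sum(sample) / float(n_clfs)
            for sample in zip(*predictions_per_clf)]


means = mean_prediction([[1.0, 2.0],
                         [3.0, 4.0]])
# means == [2.0, 3.0]
```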

MulticlassClassifier

class mvpa.clfs.meta.MulticlassClassifier(clf, bclf_type='1-vs-1', **kwargs)

Bases: mvpa.clfs.meta.CombinedClassifier

CombinedClassifier to perform multiclass classification using a list of BinaryClassifier instances,

such as 1-vs-1 (i.e. in pairs, as libsvm does) or 1-vs-all (which is yet to be thought through)

Note

Available state variables:

  • feature_ids: Feature IDs which were used for the actual training.
  • harvested: Store specified attributes of classifiers at each split
  • predicting_time+: Time (in seconds) it took the classifier to predict
  • predictions+: Most recent set of predictions
  • raw_predictions: Predictions obtained from each classifier
  • raw_values: Values obtained from each classifier
  • trained_dataset: The dataset it has been trained on
  • trained_labels+: Set of unique labels it has been trained on
  • trained_nsamples+: Number of samples it has been trained on
  • training_confusion: Confusion matrix of learning performance
  • training_time+: Time (in seconds) it took the classifier to train
  • values+: Internal classifier values the most recent predictions are based on

(States enabled by default are listed with +)

See also

Please refer to the documentation of the base class for more information:

CombinedClassifier

Initialize the instance

Parameters:
  • clf (Classifier) – classifier based on which multiple classifiers are created for multiclass
  • bclf_type – “1-vs-1” or “1-vs-all”, determines the way to generate binary classifiers
  • enable_states (None or list of basestring) – Names of the state variables which should be enabled additionally to default ones
  • disable_states (None or list of basestring) – Names of the state variables which should be disabled
  • clfs (list of Classifier) – list of classifier instances to use
  • combiner (PredictionsCombiner) – callable which takes care of combining multiple results into a single one (e.g. maximal vote for classification, MeanPrediction for regression)
  • propagate_states (bool) – whether to propagate enabled states into slave classifiers. It takes effect only when slaves get assigned, so a state enabled after construction would not necessarily propagate into slaves
  • harvest_attribs (list of basestr or dicts) – Which attributes of the call to store and return within the harvested state variable. If an item is a dictionary, the following keys are used: [‘name’, ‘copy’]
  • copy_attribs (None or basestr) – Default copying. If None – no copying, ‘copy’ - shallow copying, ‘deepcopy’ – deepcopying
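
For the “1-vs-1” scheme, one binary problem is needed per unordered pair of labels. The sketch below enumerates those pairs in the (poslabels, neglabels) form a BinaryClassifier takes. The helper one_vs_one_pairs is hypothetical, and which label of a pair is treated as positive is an assumption of this sketch.

```python
from itertools import combinations


def one_vs_one_pairs(labels):
    """Return (poslabels, neglabels) for every unordered pair of labels.

    Hypothetical helper: the first label of each sorted pair is treated
    as the +1 category (an arbitrary choice of this sketch).
    """
    unique = sorted(set(labels))
    return [([first], [second]) for first, second in combinations(unique, 2)]


pairs = one_vs_one_pairs([3, 1, 2, 1])
# pairs == [([1], [2]), ([1], [3]), ([2], [3])]
```

For n distinct labels this yields n * (n - 1) / 2 binary problems, whose predictions a combiner such as MaximalVote can then merge.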

PredictionsCombiner

class mvpa.clfs.meta.PredictionsCombiner(descr=None, **kwargs)

Bases: mvpa.misc.state.ClassWithCollections

Base class for combining decisions of multiple classifiers

train(clfs, dataset)

PredictionsCombiner might need to be trained

Parameters:
  • clfs (list of Classifier) – List of classifiers to combine. These have to be classifiers (not pure predictions), since the combiner might use some other state variables (values) instead of pure predictions
  • dataset (Dataset) – training data in this case

ProxyClassifier

class mvpa.clfs.meta.ProxyClassifier(clf, **kwargs)

Bases: mvpa.clfs.base.Classifier

Classifier which decorates another classifier

Possible uses:

  • modify data somehow prior to training/testing:
      • normalization
      • feature selection
      • modification
  • optimized classifier?
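
The first use listed above, modifying data before it reaches the wrapped classifier, follows the decorator pattern. The sketch below shows that pattern with toy classes. All names and the train(samples, labels) / predict(samples) interface are assumptions made for illustration and differ from the PyMVPA Classifier interface.

```python
class NearestMeanClassifier(object):
    """Toy 1-D classifier, present only so there is something to wrap."""

    def train(self, samples, labels):
        sums = {}
        for x, y in zip(samples, labels):
            total, count = sums.get(y, (0.0, 0))
            sums[y] = (total + x, count + 1)
        self.means = dict((y, total / count)
                          for y, (total, count) in sums.items())

    def predict(self, samples):
        return [min(self.means, key=lambda y: abs(self.means[y] - x))
                for x in samples]


class ScalingProxy(object):
    """Proxy that rescales the data before delegating to `clf`."""

    def __init__(self, clf, scale):
        self.clf = clf
        self.scale = scale

    def _transform(self, samples):
        return [x * self.scale for x in samples]

    def train(self, samples, labels):
        self.clf.train(self._transform(samples), labels)

    def predict(self, samples):
        return self.clf.predict(self._transform(samples))


proxy = ScalingProxy(NearestMeanClassifier(), scale=0.1)
proxy.train([0, 1, 9, 10], ['a', 'a', 'b', 'b'])
predicted = proxy.predict([2, 8])
# predicted == ['a', 'b']
```

The same transformation is applied at training and at prediction time, which is the point of routing both calls through the proxy.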

Note

Available state variables:

  • feature_ids: Feature IDs which were used for the actual training.
  • predicting_time+: Time (in seconds) it took the classifier to predict
  • predictions+: Most recent set of predictions
  • trained_dataset: The dataset it has been trained on
  • trained_labels+: Set of unique labels it has been trained on
  • trained_nsamples+: Number of samples it has been trained on
  • training_confusion: Confusion matrix of learning performance
  • training_time+: Time (in seconds) it took the classifier to train
  • values+: Internal classifier values the most recent predictions are based on

(States enabled by default are listed with +)

See also

Please refer to the documentation of the base class for more information:

Classifier

Initialize the instance

Parameters:
  • clf (Classifier) – classifier based on which the mask classifier is created
  • enable_states (None or list of basestring) – Names of the state variables which should be enabled additionally to default ones
  • disable_states (None or list of basestring) – Names of the state variables which should be disabled
clf

Used Classifier

getSensitivityAnalyzer(*args_, **kwargs_)
summary()
untrain()

Untrain ProxyClassifier

SplitClassifier

class mvpa.clfs.meta.SplitClassifier(clf, splitter=<mvpa.datasets.splitters.NFoldSplitter object at 0x4869450>, **kwargs)

Bases: mvpa.clfs.meta.CombinedClassifier

BoostedClassifier to work on splits of the data
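
How a dataset may be partitioned into training and testing parts, one pair per split, can be sketched as follows. This assumes leave-one-chunk-out splitting, which is an assumption about what an N-fold splitter does here and not a description of NFoldSplitter itself. The generator name nfold_splits is hypothetical.

```python
def nfold_splits(chunks):
    """Yield (train_indices, test_indices), holding out one chunk at a time.

    `chunks` assigns a chunk identifier to every sample. Leave-one-chunk-out
    is an assumption of this sketch.
    """
    for held_out in sorted(set(chunks)):
        train = [i for i, chunk in enumerate(chunks) if chunk != held_out]
        test = [i for i, chunk in enumerate(chunks) if chunk == held_out]
        yield train, test


chunks = [0, 0, 1, 1, 2]
splits = list(nfold_splits(chunks))
# splits == [([2, 3, 4], [0, 1]),
#            ([0, 1, 4], [2, 3]),
#            ([0, 1, 2, 3], [4])]
```

One classifier would be trained on the first part of each split and tested on the second part.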

Note

Available state variables:

  • confusion: Resulting confusion when the classifier is trained on the first part and tested on the second part of each split
  • feature_ids: Feature IDs which were used for the actual training.
  • harvested: Store specified attributes of classifiers at each split
  • predicting_time+: Time (in seconds) it took the classifier to predict
  • predictions+: Most recent set of predictions
  • raw_predictions: Predictions obtained from each classifier
  • raw_values: Values obtained from each classifier
  • splits: Store the actual splits of the data. Can be memory expensive
  • trained_dataset: The dataset it has been trained on
  • trained_labels+: Set of unique labels it has been trained on
  • trained_nsamples+: Number of samples it has been trained on
  • training_confusion: Confusion matrix of learning performance
  • training_time+: Time (in seconds) it took the classifier to train
  • values+: Internal classifier values the most recent predictions are based on

(States enabled by default are listed with +)

See also

Please refer to the documentation of the base class for more information:

CombinedClassifier

Initialize the instance

Parameters:
  • clf (Classifier) – classifier based on which multiple classifiers are created, one for each split
  • splitter (Splitter) – Splitter to use to split the dataset prior to training
  • enable_states (None or list of basestring) – Names of the state variables which should be enabled additionally to default ones
  • disable_states (None or list of basestring) – Names of the state variables which should be disabled
  • clfs (list of Classifier) – list of classifier instances to use
  • combiner (PredictionsCombiner) – callable which takes care of combining multiple results into a single one (e.g. maximal vote for classification, MeanPrediction for regression)
  • propagate_states (bool) – whether to propagate enabled states into slave classifiers. It takes effect only when slaves get assigned, so a state enabled after construction would not necessarily propagate into slaves
  • harvest_attribs (list of basestr or dicts) – Which attributes of the call to store and return within the harvested state variable. If an item is a dictionary, the following keys are used: [‘name’, ‘copy’]
  • copy_attribs (None or basestr) – Default copying. If None – no copying, ‘copy’ - shallow copying, ‘deepcopy’ – deepcopying
getSensitivityAnalyzer(*args_, **kwargs_)
splitter

Splitter used by SplitClassifier

TreeClassifier

class mvpa.clfs.meta.TreeClassifier(clf, groups, **kwargs)

Bases: mvpa.clfs.meta.ProxyClassifier

TreeClassifier which allows creating a hierarchy of classifiers

Functions by grouping some labels into a single “meta-label” and training classifier first to separate between meta-labels. Then each group further proceeds with classification within each group.

Possible scenarios:

TreeClassifier(SVM(),
 {'animate':  ((1,2,3,4),
               TreeClassifier(SVM(),
                   {'human': (('male', 'female'), SVM()),
                    'animals': (('monkey', 'dog'), SMLR())})),
  'inanimate': ((5,6,7,8), SMLR())})

would create a classifier which first performs binary classification to separate animate from inanimate; for an animate result it would then classify human vs. animal, and so on:

                 SVM
               /     \
          animate   inanimate
           /             \
         SVM             SMLR
       /     \          / | \ \
  human    animal      5  6 7  8
   |          |
  SVM        SVM
 /   \       /  \
male female monkey dog
1      2    3      4

If it is desired to have a trailing node with a single label and thus without any classification, such as in

        SVM
       /   \
      g1    g2
     /        \
    1         SVM
             /   \
            2     3

then just specify None as the classifier to use:

TreeClassifier(SVM(),
   {'g1':  ((1,), None),
    'g2':  ((2,3), SVM())})
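
The grouping of labels into meta-labels can be sketched by inverting a groups dictionary of the form shown above. This is an illustration only: label_to_metalabel is a hypothetical helper, the classifier slots are filled with placeholder strings instead of real classifiers, and each label is assumed to belong to exactly one group.

```python
def label_to_metalabel(groups):
    """Invert {metalabel: (labels, clf)} into {label: metalabel}.

    Hypothetical helper. Assumes every label appears in exactly one group.
    """
    mapping = {}
    for metalabel, (labels, _clf) in groups.items():
        for label in labels:
            mapping[label] = metalabel
    return mapping


# Placeholder strings stand in for classifier instances in this sketch
groups = {'g1': ((1,), None),
          'g2': ((2, 3), 'second-level classifier')}
mapping = label_to_metalabel(groups)
# mapping == {1: 'g1', 2: 'g2', 3: 'g2'}

original_labels = [1, 3, 2, 1]
meta_labels = [mapping[label] for label in original_labels]
# meta_labels == ['g1', 'g2', 'g2', 'g1']
```

The top-level classifier would be trained on the meta-labels, and each group's own classifier only on the samples whose labels fall into that group.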

Note

Available state variables:

  • feature_ids: Feature IDs which were used for the actual training.
  • predicting_time+: Time (in seconds) it took the classifier to predict
  • predictions+: Most recent set of predictions
  • trained_dataset: The dataset it has been trained on
  • trained_labels+: Set of unique labels it has been trained on
  • trained_nsamples+: Number of samples it has been trained on
  • training_confusion: Confusion matrix of learning performance
  • training_time+: Time (in seconds) it took the classifier to train
  • values+: Internal classifier values the most recent predictions are based on

(States enabled by default are listed with +)

See also

Please refer to the documentation of the base class for more information:

ProxyClassifier

Initialize TreeClassifier

Parameters:
  • clf (Classifier) – Classifier to separate between the groups
  • groups (dict of meta-label: tuple of (tuple of labels, classifier)) – Defines the groups of labels and their classifiers. See TreeClassifier for example
  • enable_states (None or list of basestring) – Names of the state variables which should be enabled additionally to default ones
  • disable_states (None or list of basestring) – Names of the state variables which should be disabled
clfs = None

Dictionary of classifiers used by the groups

summary()

Provide summary for the TreeClassifier.

untrain()

Untrain TreeClassifier