This content refers to the previous stable release of PyMVPA. Please visit the PyMVPA website for the most recent version of PyMVPA and its documentation.


Module: clfs.stats

[Inheritance diagram for mvpa.clfs.stats]

Estimator for classifier error distributions.



class mvpa.clfs.stats.AdaptiveNormal(dist, **kwargs)

Bases: mvpa.clfs.stats.AdaptiveNullDist

Adaptive normal distribution: parameters are (loc=0, scale=sqrt(1/nfeatures)).


Available state variables:

(States enabled by default are listed with +)

See also

Please refer to the documentation of the base class, mvpa.clfs.stats.AdaptiveNullDist, for more information.

Parameters:
  • dist (distribution object) – Any object that has a cdf() method reporting the values of the cumulative distribution function.
  • enable_states (None or list of basestring) – Names of the state variables which should be enabled in addition to the default ones.
  • disable_states (None or list of basestring) – Names of the state variables which should be disabled.
  • tail (str (‘left’, ‘right’, ‘any’, ‘both’)) – Which tail of the distribution to report. For ‘any’ and ‘both’, the tail is chosen based on whether the cdf at the given value lies below or above p=0.5. With ‘any’, significance is assessed as in a one-tailed test.
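One way to see where these parameters plausibly come from (our reading; the source does not say): the cosine between a uniformly random direction and a fixed direction in nfeatures dimensions has mean 0 and variance exactly 1/nfeatures, and for large nfeatures it is approximately normal. The NumPy sketch below checks the first two moments by simulation; the helper name random_cosines is hypothetical.

```python
import numpy as np

def random_cosines(nsamples, nfeatures, rng):
    """Cosine between random unit vectors and the fixed first axis (hypothetical helper)."""
    v = rng.standard_normal((nsamples, nfeatures))
    # Normalized Gaussian vectors are uniformly distributed on the unit sphere,
    # so the first coordinate of the normalized vector is the cosine to axis 0
    return v[:, 0] / np.linalg.norm(v, axis=1)

rng = np.random.default_rng(0)
nfeatures = 50
cosines = random_cosines(20000, nfeatures, rng)

print("empirical mean:   ", cosines.mean())
print("empirical std:    ", cosines.std())
print("sqrt(1/nfeatures):", np.sqrt(1.0 / nfeatures))
```

The empirical standard deviation should land close to sqrt(1/nfeatures), matching the scale used by AdaptiveNormal.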


class mvpa.clfs.stats.AdaptiveNullDist(dist, **kwargs)

Bases: mvpa.clfs.stats.FixedNullDist

Adaptive distribution that adjusts its parameters according to the data.

Work in progress: the internal implementation might change.


Available state variables:

(States enabled by default are listed with +)

See also

Please refer to the documentation of the base class, mvpa.clfs.stats.FixedNullDist, for more information.

Parameters:
  • dist (distribution object) – Any object that has a cdf() method reporting the values of the cumulative distribution function.
  • enable_states (None or list of basestring) – Names of the state variables which should be enabled in addition to the default ones.
  • disable_states (None or list of basestring) – Names of the state variables which should be disabled.
  • tail (str (‘left’, ‘right’, ‘any’, ‘both’)) – Which tail of the distribution to report. For ‘any’ and ‘both’, the tail is chosen based on whether the cdf at the given value lies below or above p=0.5. With ‘any’, significance is assessed as in a one-tailed test.
fit(measure, wdata, vdata=None)

Adapts the distribution to the dimensionality of the feature space in measure.
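A minimal sketch of the adaptive pattern, assuming the dimensionality can be read from the shape of wdata (the real fit() may instead query measure). All class names and the _adapt hook are hypothetical; only the parameter tuples are taken from the AdaptiveNormal and AdaptiveRDist descriptions.

```python
import math
import numpy as np

class AdaptiveNullDistSketch:
    """Hypothetical sketch: parameters are recomputed from the
    dimensionality of the feature space every time fit() is called."""

    def __init__(self):
        self.params = None

    def _adapt(self, nfeatures):
        raise NotImplementedError

    def fit(self, measure, wdata, vdata=None):
        # Assumption: wdata is a (nsamples, nfeatures, ...) array; measure and
        # vdata are accepted only to mirror the documented signature
        nfeatures = int(np.prod(np.shape(wdata)[1:]))
        self.params = self._adapt(nfeatures)
        return self

class AdaptiveNormalSketch(AdaptiveNullDistSketch):
    def _adapt(self, nfeatures):
        # (loc, scale) = (0, sqrt(1/nfeatures)), as documented for AdaptiveNormal
        return (0.0, math.sqrt(1.0 / nfeatures))

    def cdf(self, x):
        loc, scale = self.params
        return 0.5 * (1.0 + math.erf((x - loc) / (scale * math.sqrt(2.0))))

class AdaptiveRDistParamsSketch(AdaptiveNullDistSketch):
    def _adapt(self, nfeatures):
        # (c, loc, scale) = (nfeatures - 1, 0, 1), as documented for AdaptiveRDist
        return (nfeatures - 1, 0.0, 1.0)

wdata = np.zeros((10, 100))   # 10 samples x 100 features
normal = AdaptiveNormalSketch().fit(None, wdata)
rdist_params = AdaptiveRDistParamsSketch().fit(None, wdata).params
print(normal.params, normal.cdf(0.0), rdist_params)
```

Keeping the adaptation in a single hook lets each subclass differ only in how nfeatures maps to distribution parameters.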


class mvpa.clfs.stats.AdaptiveRDist(dist, **kwargs)

Bases: mvpa.clfs.stats.AdaptiveNullDist

Adaptive rdist: parameters are (c=nfeatures-1, loc=0, scale=1).


Available state variables:

(States enabled by default are listed with +)

See also

Please refer to the documentation of the base class, mvpa.clfs.stats.AdaptiveNullDist, for more information.

Parameters:
  • dist (distribution object) – Any object that has a cdf() method reporting the values of the cumulative distribution function.
  • enable_states (None or list of basestring) – Names of the state variables which should be enabled in addition to the default ones.
  • disable_states (None or list of basestring) – Names of the state variables which should be disabled.
  • tail (str (‘left’, ‘right’, ‘any’, ‘both’)) – Which tail of the distribution to report. For ‘any’ and ‘both’, the tail is chosen based on whether the cdf at the given value lies below or above p=0.5. With ‘any’, significance is assessed as in a one-tailed test.
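SciPy's rdist with shape c has density (1 - x^2)^(c/2 - 1) / B(1/2, c/2) on [-1, 1]. With c = nfeatures - 1 this is the density of the cosine between a uniformly random direction and a fixed direction in nfeatures dimensions, which plausibly explains the parameter choice (our reading, not stated in the source). The sketch below integrates that density with plain NumPy (so it runs without SciPy) and compares it with a simulation; the function names are hypothetical and the code assumes c >= 2 so the density stays finite at the endpoints.

```python
import math
import numpy as np

def rdist_pdf(x, c):
    """Density of rdist with shape c: (1 - x**2)**(c/2 - 1) / B(1/2, c/2)."""
    log_beta = math.lgamma(0.5) + math.lgamma(c / 2.0) - math.lgamma(0.5 + c / 2.0)
    return (1.0 - x ** 2) ** (c / 2.0 - 1.0) / math.exp(log_beta)

def rdist_cdf(x, c, npoints=20001):
    """Integrate the density from -1 to x with the trapezoid rule (assumes c >= 2)."""
    x = float(np.clip(x, -1.0, 1.0))
    grid = np.linspace(-1.0, x, npoints)
    y = rdist_pdf(grid, c)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(grid)) / 2.0)

# Cosines between random directions and the first axis in nfeatures dimensions
nfeatures = 10
rng = np.random.default_rng(1)
v = rng.standard_normal((50000, nfeatures))
cosines = v[:, 0] / np.linalg.norm(v, axis=1)

for q in (-0.5, 0.0, 0.3):
    print(q, rdist_cdf(q, nfeatures - 1), np.mean(cosines <= q))
```

The integrated CDF and the empirical fraction of simulated cosines should agree to within sampling error.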


class mvpa.clfs.stats.FixedNullDist(dist, **kwargs)

Bases: mvpa.clfs.stats.NullDist

Proxy/Adaptor class for SciPy distributions.

All distributions from SciPy’s ‘stats’ module can be used with this class.

>>> import numpy as N
>>> from scipy import stats
>>> from mvpa.clfs.stats import FixedNullDist
>>> dist = FixedNullDist(stats.norm(loc=2, scale=4))
>>> dist.p(2)
>>> dist.cdf(N.arange(5))
array([ 0.30853754,  0.40129367,  0.5       ,  0.59870633,  0.69146246])
>>> dist = FixedNullDist(stats.norm(loc=2, scale=4), tail='right')
>>> dist.p(N.arange(5))
array([ 0.69146246,  0.59870633,  0.5       ,  0.40129367,  0.30853754])


Available state variables:

(States enabled by default are listed with +)

See also

Please refer to the documentation of the base class, mvpa.clfs.stats.NullDist, for more information.

Parameters:
  • dist (distribution object) – Any object that has a cdf() method reporting the values of the cumulative distribution function.
  • enable_states (None or list of basestring) – Names of the state variables which should be enabled in addition to the default ones.
  • disable_states (None or list of basestring) – Names of the state variables which should be disabled.
  • tail (str (‘left’, ‘right’, ‘any’, ‘both’)) – Which tail of the distribution to report. For ‘any’ and ‘both’, the tail is chosen based on whether the cdf at the given value lies below or above p=0.5. With ‘any’, significance is assessed as in a one-tailed test.

cdf(x)

Return the value of the cumulative distribution function at x.

fit(measure, wdata, vdata=None)

Does nothing since the distribution is already fixed.
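The proxy idea can be sketched with duck typing: the wrapper only requires a cdf() method. This is an illustration under stated assumptions, not PyMVPA's implementation. It uses a stdlib-based FrozenNormal stand-in instead of scipy.stats.norm so that it runs without SciPy, and it requires an explicit tail, handling only ‘left’ and ‘right’ (the documented default is ‘both’). All names are hypothetical. It reproduces the cdf values and right-tail p-values shown in the doctest above.

```python
import math
import numpy as np

class FrozenNormal:
    """Stand-in for stats.norm(loc, scale); any object with a cdf() method would do."""
    def __init__(self, loc=0.0, scale=1.0):
        self.loc, self.scale = loc, scale

    def cdf(self, x):
        z = (np.asarray(x, dtype=float) - self.loc) / (self.scale * math.sqrt(2.0))
        return 0.5 * (1.0 + np.vectorize(math.erf, otypes=[float])(z))

class FixedNullDistSketch:
    """Hypothetical proxy: delegates cdf() to the wrapped object and
    derives one-tailed p-values from it."""
    def __init__(self, dist, tail):
        if not hasattr(dist, "cdf"):
            raise TypeError("dist must provide a cdf() method")
        if tail not in ("left", "right"):
            raise ValueError("this sketch only handles 'left' and 'right'")
        self.dist, self.tail = dist, tail

    def cdf(self, x):
        return self.dist.cdf(x)

    def p(self, x):
        cdf = self.cdf(x)
        # left tail: P(X <= x); right tail: P(X > x)
        return cdf if self.tail == "left" else 1.0 - cdf

    def fit(self, measure, wdata, vdata=None):
        # Nothing to estimate: the wrapped distribution is already fixed
        return self

dist = FixedNullDistSketch(FrozenNormal(loc=2, scale=4), tail="left")
right = FixedNullDistSketch(FrozenNormal(loc=2, scale=4), tail="right")
print(dist.cdf(np.arange(5)))
print(right.p(np.arange(5)))
```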


class mvpa.clfs.stats.MCNullDist(dist_class=<class 'mvpa.clfs.stats.Nonparametric'>, permutations=100, **kwargs)

Bases: mvpa.clfs.stats.NullDist

The null-hypothesis distribution is estimated from randomly permuted data labels.

The distribution is estimated by calling fit() with an appropriate DatasetMeasure or TransferError instance, a training dataset, and (in the case of a TransferError) a validation dataset. For a configurable number of cycles, the training data labels are permuted and the corresponding measure is computed. In the case of a TransferError, this is the error when predicting the correct labels of the validation dataset.

The distribution can be queried using the cdf() method, which can be configured to report probabilities/frequencies from the left or right tail, i.e. the fraction of the distribution that is lower or higher than some critical value.

This class also supports FeaturewiseDatasetMeasure. In that case, cdf() returns an array of featurewise probabilities/frequencies.
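A minimal NumPy sketch of the Monte Carlo permutation idea for a featurewise measure. Assumptions: the measure is a hypothetical absolute difference of class means per feature, data and labels are plain arrays rather than PyMVPA datasets, and the p-value is a simple empirical right-tail fraction rather than the cdf of a fitted dist_class. All function names are hypothetical.

```python
import numpy as np

def mean_difference(data, labels):
    """Hypothetical featurewise measure: |mean(class 1) - mean(class 0)| per feature."""
    return np.abs(data[labels == 1].mean(axis=0) - data[labels == 0].mean(axis=0))

def permutation_null(measure, data, labels, permutations=100, rng=None):
    """Recompute the measure after shuffling the labels `permutations` times."""
    rng = np.random.default_rng() if rng is None else rng
    return np.array([measure(data, rng.permutation(labels))
                     for _ in range(permutations)])

def right_tail_p(null_samples, observed):
    """Featurewise fraction of null samples at least as large as the observed value."""
    return (null_samples >= observed).mean(axis=0)

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 20)
data = rng.standard_normal((40, 3))
data[labels == 1, 0] += 2.0      # only feature 0 carries a class difference

observed = mean_difference(data, labels)
null = permutation_null(mean_difference, data, labels, permutations=200, rng=rng)
pvals = right_tail_p(null, observed)
print(pvals)
```

Each row of null corresponds to one permutation, mirroring the dist_samples state variable; reducing over that axis yields one p-value per feature.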


Available state variables:

  • dist_samples: Samples obtained for each permutation

(States enabled by default are listed with +)

See also

Please refer to the documentation of the base class, mvpa.clfs.stats.NullDist, for more information.

Initialize Monte Carlo permutation null-hypothesis testing.

Parameters:
  • dist_class (class) – Any class that provides a fit() method, whose parameter estimates are used to initialize an instance, and a cdf(x) method that evaluates the CDF at x. All distributions from SciPy’s ‘stats’ module can be used.
  • permutations (int) – Number of label permutations performed to estimate the distribution under the null hypothesis.
  • enable_states (None or list of basestring) – Names of the state variables which should be enabled in addition to the default ones.
  • disable_states (None or list of basestring) – Names of the state variables which should be disabled.
  • tail (str (‘left’, ‘right’, ‘any’, ‘both’)) – Which tail of the distribution to report. For ‘any’ and ‘both’, the tail is chosen based on whether the cdf at the given value lies below or above p=0.5. With ‘any’, significance is assessed as in a one-tailed test.

cdf(x)

Return the value of the cumulative distribution function at x.


clean()

Clean stored distributions.

Storing all of the distributions might be too expensive (e.g. in the case of Nonparametric), and the object might stay in scope too long to wait for it to be destroyed. clean() binds dist_samples to an empty list so that the garbage collector can reclaim the memory.

fit(measure, wdata, vdata=None)

Fit the distribution by performing multiple cycles, each of which permutes the labels in the training dataset.

Parameters:
  • measure ((Featurewise)DatasetMeasure or TransferError) – Measure or TransferError instance used to compute all measures/errors.
  • wdata (Dataset) – Dataset which gets permuted and used to compute the measure/transfer error multiple times.
  • vdata (Dataset, optional) – Dataset used for validation. If provided, measure is assumed to be a TransferError, and the working and validation datasets are passed to it.
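For the TransferError case, only the working (training) labels are permuted while the validation dataset stays intact. In this sketch a hypothetical nearest-centroid classifier plays the role of the TransferError, and the toy data generator is likewise hypothetical. Since lower errors are better, the left tail of the null distribution is the relevant one here.

```python
import numpy as np

def nearest_centroid_error(train_x, train_y, test_x, test_y):
    """Hypothetical transfer error: train a nearest-centroid classifier on the
    training data and return its error rate on the validation data."""
    classes = np.unique(train_y)
    centroids = np.array([train_x[train_y == c].mean(axis=0) for c in classes])
    # squared Euclidean distance of every test sample to every centroid
    dists = ((test_x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    predicted = classes[np.argmin(dists, axis=1)]
    return float(np.mean(predicted != test_y))

def permuted_transfer_errors(error_fn, wdata, wlabels, vdata, vlabels,
                             permutations=100, rng=None):
    """Null distribution of errors: permute only the working (training) labels."""
    rng = np.random.default_rng() if rng is None else rng
    return np.array([error_fn(wdata, rng.permutation(wlabels), vdata, vlabels)
                     for _ in range(permutations)])

def make_two_class_data(n_per_class, nfeatures, shift, rng):
    """Toy data: class 1 is shifted by `shift` along every feature."""
    labels = np.repeat([0, 1], n_per_class)
    data = rng.standard_normal((2 * n_per_class, nfeatures))
    data[labels == 1] += shift
    return data, labels

rng = np.random.default_rng(0)
wdata, wlabels = make_two_class_data(30, 5, 1.5, rng)
vdata, vlabels = make_two_class_data(30, 5, 1.5, rng)

observed = nearest_centroid_error(wdata, wlabels, vdata, vlabels)
null_errors = permuted_transfer_errors(nearest_centroid_error, wdata, wlabels,
                                       vdata, vlabels, permutations=200, rng=rng)
# Lower error is better, so use the left tail: fraction of null errors <= observed
p_left = float(np.mean(null_errors <= observed))
print(observed, null_errors.mean(), p_left)
```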


class mvpa.clfs.stats.Nonparametric(dist_samples, correction='clip')

Bases: object

Non-parametric 1D distribution: derives the cdf from stored values.

Introduced to complement parametric distributions present in scipy.stats.

  • dist_samples (ndarray) – Samples to be used to assess the distribution.
  • correction ({‘clip’} or None, optional) – Determines the behavior when .cdf is queried. If None, no correction is made. If ‘clip’, values are clipped to lie in the range [1/(N+2), (N+1)/(N+2)], where N is the number of samples, because a non-parametric estimate lacks the power to resolve the tails with higher precision; effectively, one ‘imaginary’ sample is placed in each of the two tails.

cdf(x)

Returns the cdf value at x.

static fit(dist_samples)
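A minimal sketch of an empirical CDF with the ‘clip’ correction described above. Assumptions: the empirical CDF counts samples less than or equal to x, and the static fit() simply returns the samples as a one-element parameter tuple, so that a caller could write cls(*cls.fit(samples)) in the same way that SciPy's rv_continuous.fit() returns a parameter tuple. The class name is hypothetical.

```python
import numpy as np

class NonparametricSketch:
    """Hypothetical empirical 1D distribution with optional 'clip' correction."""
    def __init__(self, dist_samples, correction="clip"):
        if correction not in ("clip", None):
            raise ValueError("correction must be 'clip' or None")
        self._samples = np.sort(np.ravel(np.asarray(dist_samples, dtype=float)))
        self._correction = correction

    def cdf(self, x):
        n = len(self._samples)
        # Assumption: fraction of stored samples <= x
        cdf = np.searchsorted(self._samples, x, side="right") / float(n)
        if self._correction == "clip":
            # one imaginary sample per tail: keep values in [1/(N+2), (N+1)/(N+2)]
            cdf = np.clip(cdf, 1.0 / (n + 2), (n + 1.0) / (n + 2))
        return cdf

    @staticmethod
    def fit(dist_samples):
        # Assumption: the samples themselves are the fitted "parameters"
        return (dist_samples,)

samples = [3.0, 1.0, 4.0, 2.0]
nonpar = NonparametricSketch(*NonparametricSketch.fit(samples))
raw = NonparametricSketch(samples, correction=None)
print(nonpar.cdf([0.0, 2.5, 10.0]))
print(raw.cdf([0.0, 2.5, 10.0]))
```

With four samples the clipped range is [1/6, 5/6], so values outside the observed range no longer report a cdf of exactly 0 or 1.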


class mvpa.clfs.stats.NullDist(tail='both', **kwargs)

Bases: mvpa.misc.state.ClassWithCollections

Base class for null-hypothesis testing.


Available state variables:

(States enabled by default are listed with +)

See also

Please refer to the documentation of the base class, mvpa.misc.state.ClassWithCollections, for more information.

Cheap initialization.

Parameters:
  • tail (str (‘left’, ‘right’, ‘any’, ‘both’)) – Which tail of the distribution to report. For ‘any’ and ‘both’, the tail is chosen based on whether the cdf at the given value lies below or above p=0.5. With ‘any’, significance is assessed as in a one-tailed test.
  • enable_states (None or list of basestring) – Names of the state variables which should be enabled in addition to the default ones.
  • disable_states (None or list of basestring) – Names of the state variables which should be disabled.

cdf(x)

Implementations return the value of the cumulative distribution function (left or right tail, depending on the setting).

fit(measure, wdata, vdata=None)

To be implemented by subclasses to fit the distribution to the data.

p(x, **kwargs)

Returns the p-value for values of x. Depending on the tail setting passed to the constructor, the values are computed from the left tail, the right tail, or whichever tail x falls in.

If a FeaturewiseDatasetMeasure was used to estimate the distribution, the method returns an array. In that case, x can be a scalar or an array of matching shape.
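The tail handling can be sketched as a function of the CDF values. Assumptions: ‘any’ and ‘both’ pick the tail by comparing each CDF value to 0.5, as the tail parameter describes, and ‘both’ is treated as two-sided by doubling the one-tailed value, which is our reading of the note that ‘any’ is one-tailed. The function names are hypothetical, and norm_cdf stands in for a SciPy distribution.

```python
import math
import numpy as np

def norm_cdf(x, loc=0.0, scale=1.0):
    """Normal CDF via math.erf, standing in for a SciPy distribution."""
    return 0.5 * (1.0 + math.erf((x - loc) / (scale * math.sqrt(2.0))))

def p_from_cdf(cdf, tail="both"):
    """Turn CDF values into p-values for the documented tail settings."""
    cdf = np.asarray(cdf, dtype=float)
    if tail == "left":
        return cdf
    if tail == "right":
        return 1.0 - cdf
    if tail in ("any", "both"):
        # each value is assigned to the tail it falls in (CDF below or above 0.5)
        one_tailed = np.where(cdf >= 0.5, 1.0 - cdf, cdf)
        # assumption: 'both' is two-sided, so the one-tailed value is doubled
        return 2.0 * one_tailed if tail == "both" else one_tailed
    raise ValueError("tail must be 'left', 'right', 'any' or 'both'")

cdf_values = [norm_cdf(x, loc=2, scale=4) for x in (0, 4)]
for tail in ("left", "right", "any", "both"):
    print(tail, p_from_cdf(cdf_values, tail))
```

Because np.asarray and np.where work elementwise, the same function handles scalars and featurewise arrays alike.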




Convenience helper for human beings: wraps dist in some NullDist if needed.

tail and other arguments are assumed to take their default values, as in NullDist/MCNullDist.

mvpa.clfs.stats.nanmean(x, axis=0)

Compute the mean over the given axis, ignoring NaNs.

Parameters:
  • x (ndarray) – input array
  • axis (int) – axis along which the mean is computed.
Returns:
  • m (float or ndarray) – the mean.
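A minimal sketch of a NaN-ignoring mean: sum the non-NaN entries along the axis and divide by their count. This is not necessarily how PyMVPA implements nanmean; current NumPy provides numpy.nanmean with the same behavior, and the sketch is compared against it. The function name is hypothetical. For an all-NaN slice the division yields nan.

```python
import numpy as np

def nanmean_sketch(x, axis=0):
    """Mean over `axis` ignoring NaNs: sum of non-NaN values / their count."""
    x = np.asarray(x, dtype=float)
    mask = np.isnan(x)
    total = np.where(mask, 0.0, x).sum(axis=axis)   # NaNs contribute 0 to the sum
    count = (~mask).sum(axis=axis)                  # number of non-NaN entries
    return total / count

x = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 8.0]])
print(nanmean_sketch(x))            # column means
print(nanmean_sketch(x, axis=1))    # row means
print(np.nanmean(x, axis=0))        # NumPy's built-in, for comparison
```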