mvpa.datasets.splitters.Splitter

Each splitter should be initialized with all of its necessary parameters. The actual splitting is done by running the splitter object on a Dataset via __call__(). This method has to be implemented as a generator, i.e. it has to return every possible split with a yield statement.

Each split has to be returned as a sequence of Datasets. The properties of the split datasets may vary between implementations. A sequence element may be declared as None.

Please note that even if only one Dataset is returned, it has to be an element of a sequence and not just the Dataset object itself!
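The calling convention described above can be illustrated with a minimal, self-contained sketch. It does not use PyMVPA: ToyDataset, ToyLeaveOneChunkOut and ToyPassThrough are hypothetical stand-ins invented for this illustration, and the leave-one-chunk-out rule is only an assumed example of a splitting scheme.

```python
class ToyDataset:
    """Hypothetical stand-in for a Dataset: samples plus one chunk id per sample."""

    def __init__(self, samples, chunks):
        self.samples = list(samples)
        self.chunks = list(chunks)

    def select(self, keep):
        """Return a new ToyDataset with the samples whose chunk id satisfies keep()."""
        pairs = [(s, c) for s, c in zip(self.samples, self.chunks) if keep(c)]
        return ToyDataset([s for s, _ in pairs], [c for _, c in pairs])


class ToyLeaveOneChunkOut:
    """Illustrative splitter (not the PyMVPA implementation)."""

    def __call__(self, dataset):
        # __call__ is a generator: every split is produced with a yield statement.
        for held_out in sorted(set(dataset.chunks)):
            training = dataset.select(lambda c: c != held_out)
            testing = dataset.select(lambda c: c == held_out)
            # Each split is a sequence of datasets, here a 2-tuple.
            yield (training, testing)


class ToyPassThrough:
    """Illustrative splitter that yields a single split with one real dataset."""

    def __call__(self, dataset):
        # Even a lone dataset is wrapped in a sequence; the unused slot is None.
        yield (None, dataset)


ds = ToyDataset(samples=["a", "b", "c", "d"], chunks=[0, 0, 1, 1])
for training, testing in ToyLeaveOneChunkOut()(ds):
    print(training.samples, testing.samples)
```

Because the splitter is a generator, splits are produced lazily, so a caller can stop iterating early without computing the remaining ones.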

__init__(self, nperlabel='all', nrunspersplit=1, permute=False, count=None, strategy='equidistant', discard_boundary=None, attr='chunks', reverse=False)
(Constructor)

Initialize splitter base.

Parameters:

nperlabel (int or str (or list of them) or float) - Number of dataset samples per label to be included in each split. If given as a float, it must be in the [0,1] range and specifies the ratio of samples selected for each label. Two special strings are recognized: 'all' uses all available samples (default) and 'equal' uses the maximum number of samples that can be provided by all of the classes. This value may also be provided as a sequence whose length matches the number of datasets per split; each element then gives the configuration for the respective dataset in each split.
nrunspersplit (int) - Number of times samples for each split are chosen. This is mostly useful if a subset of the available samples is used in each split and the subset is randomly selected for each run (see the nperlabel argument).
permute (bool) - If set to True, the labels of each generated dataset will be permuted on a per-chunk basis.
count (None or int) - Desired number of splits to be output. It is limited by the number of splits possible for a given splitter (e.g. OddEvenSplitter can have only up to 2 splits). If None, all splits are output (default).
strategy (str) - If count is not None, the following strategies are possible:
    first - The first count splits are chosen.
    random - count splits are chosen at random (without replacement).
    equidistant - Splits which are equidistant from each other are chosen.
discard_boundary (None or int or sequence of int) - If not None, the number of samples on the boundaries between parts of the split to discard. If an int, that many samples are discarded in all parts. If a sequence, the numbers to discard are given per part of the split. E.g. if the splitter splits only into (training, testing) parts, then discard_boundary=(2,0) would instruct it to discard 2 samples from training which are on the boundary with testing.
attr (str) - Sample attribute used to determine splits.
reverse (bool) - If True, the order of datasets in the split is reversed, e.g. instead of (training, testing), (testing, training) will be output.
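How count and strategy could interact is sketched below on plain split indices. This is an illustration under stated assumptions, not the library's code: select_splits is a hypothetical helper, and the index formula used for 'equidistant' (flooring a fractional step) is an assumption, since the description above does not specify one.

```python
import random


def select_splits(n_splits, count=None, strategy="equidistant", rng=None):
    """Pick which of n_splits possible splits to output (hypothetical helper)."""
    indices = list(range(n_splits))
    # count=None means "all splits"; count is also limited by what is possible.
    if count is None or count >= n_splits:
        return indices
    if count < 1:
        raise ValueError("count must be a positive integer or None")
    if strategy == "first":
        return indices[:count]
    if strategy == "random":
        # Sampling without replacement; sorted only to keep the original order.
        rng = rng or random
        return sorted(rng.sample(indices, count))
    if strategy == "equidistant":
        # Assumed rule: walk through the indices with a fractional step.
        step = n_splits / count
        return [int(i * step) for i in range(count)]
    raise ValueError("unknown strategy: %r" % (strategy,))


print(select_splits(10, count=3, strategy="first"))
print(select_splits(10, count=3, strategy="equidistant"))
print(select_splits(2, count=5))
```

In this sketch a count larger than the number of possible splits simply returns every split, mirroring the statement that count is limited by what a given splitter can produce.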

Overrides: object.__init__
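The different forms accepted by nperlabel can be made concrete with a small sketch that turns a setting into a per-label sample count. samples_per_label is a hypothetical helper written for this illustration; in particular, truncating the result of a float ratio is an assumption, as the parameter description does not say how fractional counts are rounded.

```python
from collections import Counter


def samples_per_label(labels, nperlabel="all"):
    """Map each label to the number of samples to select (hypothetical helper)."""
    counts = Counter(labels)
    if nperlabel == "all":
        # Use every available sample of every label.
        return dict(counts)
    if nperlabel == "equal":
        # The largest number that all labels can provide is the smallest count.
        smallest = min(counts.values())
        return {label: smallest for label in counts}
    if isinstance(nperlabel, float):
        if not 0.0 <= nperlabel <= 1.0:
            raise ValueError("a float nperlabel must lie in [0, 1]")
        # Assumed rounding: truncate towards zero.
        return {label: int(n * nperlabel) for label, n in counts.items()}
    # Otherwise treat the value as a fixed number of samples per label.
    return {label: int(nperlabel) for label in counts}


labels = ["a", "a", "a", "a", "b", "b"]
for setting in ("all", "equal", 0.5, 1):
    print(setting, samples_per_label(labels, setting))
```

When nperlabel is given as a sequence, a helper like this would be applied once per dataset in the split, using the corresponding element.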

Class Splitter

__init__(self, nperlabel='all', nrunspersplit=1, permute=False, count=None, strategy='equidistant', discard_boundary=None, attr='chunks', reverse=False)
(Constructor)

setNPerLabel(self, value)

__call__(self, dataset)
(Call operator)

splitDataset(self, dataset, specs)

__str__(self)
(Informal representation operator)

strategy
