Package mvpa :: Package datasets :: Module splitters :: Class Splitter

Class Splitter

Base class of dataset splitters.

Each splitter should be initialized with all of its necessary parameters. The actual splitting is done by running the splitter object on a given Dataset via __call__(). This method has to be implemented as a generator, i.e. it has to produce every possible split with a yield statement.

Each split has to be returned as a sequence of Datasets. The properties of the split datasets may vary between implementations. It is possible to declare a sequence element as 'None'.

Please note that even if only one Dataset is returned, it has to be an element of a sequence and not just the bare Dataset object!
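
The calling convention described above can be sketched without the library itself. The snippet below is only an illustration: ToyDataset, ToyLeaveOneChunkOut and ToyPassThrough are hypothetical stand-ins and are not part of mvpa. It shows a __call__() written as a generator that yields every split as a sequence, including the case of a single dataset.

```python
class ToyDataset:
    """Hypothetical stand-in for a Dataset: samples plus one chunk id per sample."""

    def __init__(self, samples, chunks):
        self.samples = list(samples)
        self.chunks = list(chunks)

    def select(self, keep):
        """Return a new ToyDataset with the samples whose chunk id satisfies keep()."""
        pairs = [(s, c) for s, c in zip(self.samples, self.chunks) if keep(c)]
        return ToyDataset([s for s, _ in pairs], [c for _, c in pairs])


class ToyLeaveOneChunkOut:
    """Hypothetical splitter: each chunk serves as the second dataset once."""

    def __call__(self, dataset):
        for chunk in sorted(set(dataset.chunks)):
            first = dataset.select(lambda c: c != chunk)
            second = dataset.select(lambda c: c == chunk)
            # Every split is yielded as a sequence, here a 2-tuple.
            yield (first, second)


class ToyPassThrough:
    """Hypothetical splitter that produces a single one-element split."""

    def __call__(self, dataset):
        # A lone dataset is still wrapped in a sequence.
        yield (dataset,)


ds = ToyDataset(["s0", "s1", "s2", "s3"], [0, 0, 1, 1])
for first, second in ToyLeaveOneChunkOut()(ds):
    print(first.samples, second.samples)
```

Because the splitter is a generator, callers can iterate over the splits lazily and never need to hold all of them in memory at once.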

Instance Methods
 
__init__(self, nperlabel='all', nrunspersplit=1, permute=False, count=None, strategy='equidistant', discard_boundary=None, attr='chunks', reverse=False)
Initialize splitter base.
 
_setStrategy(self, strategy)
Set the strategy used to select splits from those available.
 
setNPerLabel(self, value)
Set the number of samples per label in the split datasets.
 
_getSplitConfig(self, uniqueattr)
Each subclass has to implement this method. It gets a sequence with the unique attribute ids of a dataset and has to return a list of lists containing the attribute ids to split into the second dataset.
 
__call__(self, dataset)
Splits the dataset.
 
splitDataset(self, dataset, specs)
Split a dataset by separating the samples where the configured sample attribute matches an element of specs.
 
__str__(self)
String summary of the object
 
splitcfg(self, dataset)
Return splitcfg for a given dataset

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __subclasshook__
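
As a rough illustration of what a _getSplitConfig(uniqueattr) implementation might return, the sketch below uses two standalone functions instead of real subclasses. The names leave_one_out_config and odd_even_config are hypothetical, and the return shape follows the summary above: a list of lists, each holding the attribute ids that go into the second dataset. Actual subclasses may use a different structure.

```python
def leave_one_out_config(uniqueattr):
    """One split per unique id; that id alone goes into the second dataset."""
    return [[value] for value in uniqueattr]


def odd_even_config(uniqueattr):
    """Two splits: first the odd ids, then the even ids, go into the second dataset."""
    odd = [value for value in uniqueattr if value % 2 == 1]
    even = [value for value in uniqueattr if value % 2 == 0]
    return [odd, even]


print(leave_one_out_config([0, 1, 2]))
print(odd_even_config([0, 1, 2, 3]))
```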

Class Variables
  _STRATEGIES = 'first', 'random', 'equidistant'
  _NPERLABEL_STR = ['equal', 'all']
  __doc__ = enhancedDocString('Splitter', locals())
  strategy = property(fget= lambda self: self.__strategy, fset= ...
Instance Variables
  count
Number (max) of splits to output on call
Properties

Inherited from object: __class__

Method Details

__init__(self, nperlabel='all', nrunspersplit=1, permute=False, count=None, strategy='equidistant', discard_boundary=None, attr='chunks', reverse=False)
(Constructor)

Initialize splitter base.
Parameters:
  • nperlabel (int or str (or list of them) or float) - Number of dataset samples per label to be included in each split. If given as a float, it must be in the [0, 1] range and specifies the ratio of samples selected for each label. Two special strings are recognized: 'all' uses all available samples (default) and 'equal' uses the maximum number of samples that can be provided by all of the classes. This value may also be provided as a sequence whose length matches the number of datasets per split; each element then gives the configuration for the respective dataset in each split.
  • nrunspersplit (int) - Number of times samples for each split are chosen. This is mostly useful if a subset of the available samples is used in each split and the subset is randomly selected for each run (see the nperlabel argument).
  • permute (bool) - If set to True, the labels of each generated dataset will be permuted on a per-chunk basis.
  • count (None or int) - Desired number of splits to be output. It is limited by the number of splits possible for a given splitter (e.g. OddEvenSplitter can have only up to 2 splits). If None, all splits are output (default).
  • strategy (str) - If count is not None, one of the following strategies is used to select the splits:
    first
        The first count splits are chosen.
    random
        count splits are chosen at random (without replacement).
    equidistant
        count splits that are equidistant from each other are chosen.

  • discard_boundary (None or int or sequence of int) - If not None, the number of samples to discard at the boundaries between parts of the split. If an int, that many samples are discarded in all parts. If a sequence, the numbers to discard are given per part of the split. E.g. if the splitter splits only into (training, testing) parts, then discard_boundary=(2, 0) would discard the 2 samples from training that are on the boundary with testing.
  • attr (str) - Sample attribute used to determine splits.
  • reverse (bool) - If True, the order of datasets in the split is reversed, e.g. instead of (training, testing), (testing, training) will be output.
Overrides: object.__init__
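
The interplay of count and strategy can be sketched as follows. This is a minimal illustration, not the library's implementation: select_splits and its rng parameter are hypothetical, and the index formula used for 'equidistant' (integer division of i * n by count) is an assumption about how evenly spaced splits might be picked.

```python
import random


def select_splits(configs, count=None, strategy="equidistant", rng=None):
    """Pick at most `count` entries from `configs` (hypothetical helper)."""
    configs = list(configs)
    n = len(configs)
    # count=None means "all splits"; count is also capped by what is available.
    if count is None or count >= n:
        return configs
    if strategy == "first":
        indexes = range(count)
    elif strategy == "random":
        rng = rng or random.Random()
        # Sampling without replacement; sorted to keep the original order.
        indexes = sorted(rng.sample(range(n), count))
    elif strategy == "equidistant":
        # Assumed spacing rule: evenly spread indexes starting at 0.
        indexes = [(i * n) // count for i in range(count)]
    else:
        raise ValueError("unknown strategy: %r" % (strategy,))
    return [configs[i] for i in indexes]


print(select_splits(range(10), 3, "first"))
print(select_splits(range(10), 3, "equidistant"))
print(select_splits(range(10), 3, "random", rng=random.Random(0)))
```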

setNPerLabel(self, value)

Set the number of samples per label in the split datasets.

'equal' sets the sample size to the highest number of samples that every class can provide. 'all' uses all available samples (default).
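
To make the accepted values concrete, here is a small sketch of how an nperlabel setting could translate into a per-label sample count. resolve_nperlabel is a hypothetical helper, not a library function, and truncating the float ratio with int() is an assumption. Note that the highest number every class can provide is simply the size of the smallest class.

```python
from collections import Counter


def resolve_nperlabel(labels, nperlabel="all"):
    """Map each label to the number of samples to draw (hypothetical helper)."""
    counts = Counter(labels)
    if nperlabel == "all":
        return dict(counts)
    if nperlabel == "equal":
        # The largest size all classes can supply is the smallest class size.
        smallest = min(counts.values())
        return {label: smallest for label in counts}
    if isinstance(nperlabel, float):
        if not 0.0 <= nperlabel <= 1.0:
            raise ValueError("a float nperlabel must be in the [0, 1] range")
        # Assumed rounding rule: truncate towards zero.
        return {label: int(n * nperlabel) for label, n in counts.items()}
    if isinstance(nperlabel, int):
        return {label: nperlabel for label in counts}
    raise ValueError("unsupported nperlabel: %r" % (nperlabel,))


labels = ["a", "a", "a", "b", "b"]
print(resolve_nperlabel(labels, "all"))
print(resolve_nperlabel(labels, "equal"))
print(resolve_nperlabel(labels, 0.5))
```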

__call__(self, dataset)
(Call operator)

Splits the dataset.

This method behaves like a generator.

splitDataset(self, dataset, specs)

Split a dataset by separating the samples where the configured sample attribute matches an element of specs.
Parameters:
  • dataset (Dataset) - The source dataset.
  • specs (sequence of sequences) - Contains the ids of a sample attribute that shall be split into another dataset.
Returns:
Tuple of split datasets.
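
The matching step can be illustrated on plain lists. The sketch below is not the method's actual code: split_by_attr is a hypothetical function, and treating a None entry in specs as "all samples not claimed by any other spec" is an assumption made for the example.

```python
def split_by_attr(samples, attr_values, specs):
    """Split samples by matching their attribute value against each spec.

    `specs` is a sequence of sequences of attribute ids. A None entry is
    assumed here to mean "everything not claimed by the other specs".
    """
    claimed = set()
    for spec in specs:
        if spec is not None:
            claimed.update(spec)

    parts = []
    for spec in specs:
        if spec is None:
            part = [s for s, a in zip(samples, attr_values) if a not in claimed]
        else:
            wanted = set(spec)
            part = [s for s, a in zip(samples, attr_values) if a in wanted]
        parts.append(part)
    # One entry per spec, returned as a tuple like the method's return value.
    return tuple(parts)


samples = list("abcdef")
chunks = [0, 0, 1, 1, 2, 2]
print(split_by_attr(samples, chunks, [None, [1]]))
```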

__str__(self)
(Informal representation operator)

String summary of the object
Overrides: object.__str__

Class Variable Details

strategy

Value:
property(fget= lambda self: self.__strategy, fset= _setStrategy)