Package mvpa :: Package datasets :: Module splitters :: Class CustomSplitter

Class CustomSplitter


Split a dataset using an arbitrary custom rule.

The splitter is configured by passing a custom splitting rule (splitrule) to its constructor. Such a rule is a sequence of split definitions, and every element of this sequence results in exactly one split generated by the splitter. Each element is in turn a sequence of sequences of sample ids, one for each dataset that shall be generated in the split.

Example:
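A rule such as [(None, [0, 1, 2]), ([1, 2], [3], [5, 6])] describes two splits: the first yields two datasets, the second yields three. The sketch below illustrates this in plain Python and is not the mvpa implementation. It assumes that the ids in a rule are matched against a per-sample attribute (see the attr parameter) and that None stands for "all samples not claimed by another dataset of the same split". The function apply_splitrule and the chunk values are hypothetical.

```python
def apply_splitrule(attr_values, splitrule):
    """Return, for each split, one list of sample indices per dataset.

    Illustrative only. Assumes ids are matched against `attr_values`
    and that None means "everything not claimed elsewhere in this split".
    """
    splits = []
    for rule in splitrule:
        # Collect all ids explicitly claimed by some dataset of this split.
        claimed = set()
        for ids in rule:
            if ids is not None:
                claimed.update(ids)

        datasets = []
        for ids in rule:
            if ids is None:
                # Assumed meaning of None: the remaining samples.
                selected = [i for i, v in enumerate(attr_values)
                            if v not in claimed]
            else:
                wanted = set(ids)
                selected = [i for i, v in enumerate(attr_values)
                            if v in wanted]
            datasets.append(selected)
        splits.append(datasets)
    return splits


# Hypothetical per-sample attribute values (e.g. chunk ids) of 7 samples.
chunks = [0, 0, 1, 2, 3, 5, 6]
rule = [(None, [0, 1, 2]), ([1, 2], [3], [5, 6])]
splits = apply_splitrule(chunks, rule)

for number, datasets in enumerate(splits):
    print(number, datasets)
```

Each outer element of the rule produces exactly one split, and each inner sequence produces one dataset within that split, which is the structure the description above lays out.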

Instance Methods
 
__init__(self, splitrule, **kwargs)
Lightweight constructor; it only stores the configuration.
 
_getSplitConfig(self, uniqueattrs)
Return the split configuration derived from the custom splitrule.
 
__str__(self)
String summary of the object.

Inherited from Splitter: __call__, setNPerLabel, splitDataset, splitcfg

Inherited from Splitter (private): _setStrategy

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __subclasshook__

Class Variables
  __doc__ = enhancedDocString('CustomSplitter', locals(), Splitter)

Inherited from Splitter: strategy

Inherited from Splitter (private): _NPERLABEL_STR, _STRATEGIES

Instance Variables

Inherited from Splitter: count

Properties

Inherited from object: __class__

Method Details

__init__(self, splitrule, **kwargs)
(Constructor)

Lightweight constructor; it only stores the configuration.
Parameters:
  • splitrule - Sequence of split definitions, one per split to generate. Each definition is a sequence of sample id sequences, one per dataset in that split.
  • nperlabel - Number of dataset samples per label to be included in each split. If given as a float, it must be in the [0,1] range and is interpreted as the ratio of samples selected for each label. Two special strings are recognized: 'all' uses all available samples (default) and 'equal' uses the maximum number of samples that can be provided by all of the classes, i.e. the size of the smallest class. This value may also be provided as a sequence whose length matches the number of datasets per split; each element then gives the configuration for the respective dataset in each split.
  • nrunspersplit (int) - Number of times samples for each split are chosen. This is mostly useful if a subset of the available samples is used in each split and the subset is randomly selected for each run (see the nperlabel argument).
  • permute - If set to True, the labels of each generated dataset will be permuted on a per-chunk basis.
  • count - Desired number of splits to be output. It is limited by the number of splits possible for a given splitter (e.g. OddEvenSplitter can have only up to 2 splits). If None, all splits are output (default).
  • strategy - If count is not None, determines which splits are output. The following strategies are available:
    first
    The first count splits are chosen.
    random
    count splits are chosen at random (without replacement).
    equidistant
    Splits that are equidistant from each other are chosen.
  • discard_boundary - If not None, the number of samples on the boundaries between parts of the split to discard. If an int, that many samples are discarded in all parts. If a sequence, the numbers to discard are given per part of the split. E.g. if the splitter splits only into (training, testing) parts, then discard_boundary=(2,0) instructs it to discard 2 samples from training that are on the boundary with testing.
  • attr - Sample attribute used to determine splits.
  • reverse - If True, the order of datasets in the split is reversed, e.g. instead of (training, testing), (testing, training) will be output.
Overrides: object.__init__
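
The count and strategy parameters can be pictured as choosing a subset of the available split indices. The following is a minimal sketch under stated assumptions, not the mvpa implementation: select_splits is a hypothetical helper, and the rounding rule used for 'equidistant' is only one plausible reading of "equidistant from each other".

```python
import random


def select_splits(n_splits, count, strategy, rng=None):
    """Pick which of `n_splits` split indices to output (illustrative only)."""
    if count is None or count >= n_splits:
        # count is limited by the number of possible splits;
        # None means that all splits are output.
        return list(range(n_splits))
    if count < 1:
        raise ValueError("count must be a positive integer or None")

    if strategy == "first":
        return list(range(count))
    if strategy == "random":
        # Sampling without replacement; sorted to keep the original order.
        rng = rng or random.Random()
        return sorted(rng.sample(range(n_splits), count))
    if strategy == "equidistant":
        # Assumed rule: a constant (possibly fractional) step, rounded.
        step = n_splits / count
        return [int(round(step * i)) for i in range(count)]
    raise ValueError("unknown strategy: %r" % (strategy,))


first = select_splits(10, 3, "first")
spread = select_splits(10, 3, "equidistant")
sampled = select_splits(10, 3, "random", rng=random.Random(0))
print(first, spread, sampled)
```

With 10 possible splits and count=3, 'first' selects indices 0, 1 and 2, while the assumed 'equidistant' rule selects 0, 3 and 7.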

_getSplitConfig(self, uniqueattrs)

Return the split configuration derived from the custom splitrule.
Overrides: Splitter._getSplitConfig

__str__(self)
(Informal representation operator)

String summary of the object.
Overrides: object.__str__