Multivariate Pattern Analysis in Python
Inheritance diagram for mvpa.datasets.splitters:
Collection of dataset splitters.
Splitters divide the provided dataset in various ways to simplify cross-validation analysis, implement boosting of the estimates, or sample the null space via permutation testing.
Most of the splitters currently produce two-way splits: by convention, the first part is used for training and the second part for testing by CrossValidatedTransferError and SplitClassifier.
Bases: mvpa.datasets.splitters.Splitter
Split a dataset using an arbitrary custom rule.
The splitter is configured by passing a custom splitting rule (splitrule) to its constructor. Such a rule is a sequence of split definitions. Every element in this sequence results in exactly one split generated by the Splitter. Each element is itself a sequence of sequences of sample ids, one for each dataset that shall be generated in the split.
Example:
Generate two splits. In the first split the second dataset contains all samples with sample attributes corresponding to either 0, 1 or 2. The first dataset of the first split contains all samples which are not split into the second dataset.
The second split yields three datasets. The first contains all samples corresponding to sample attributes 1 and 2, the second contains only samples with attribute 3, and the last contains the samples with attributes 5 and 6.
CustomSplitter([(None, [0, 1, 2]), ([1,2], [3], [5, 6])])
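To make the rule format concrete, here is a minimal pure-Python sketch of how such a rule could be expanded into explicit groups of attribute values. It does not use PyMVPA. `resolve_splitrule` is a hypothetical helper, and reading `None` as "every value not named elsewhere in the same split" is an assumption based on the description above.

```python
def resolve_splitrule(splitrule, unique_values):
    """Expand each split definition into explicit lists of attribute values.

    Hypothetical helper for illustration only; not part of PyMVPA.
    """
    resolved = []
    for split in splitrule:
        # Collect every value explicitly named in this split definition.
        claimed = {v for spec in split if spec is not None for v in spec}
        # Assumption: None stands for all values not claimed elsewhere.
        rest = [v for v in unique_values if v not in claimed]
        resolved.append([rest if spec is None else list(spec) for spec in split])
    return resolved


rule = [(None, [0, 1, 2]), ([1, 2], [3], [5, 6])]
splits = resolve_splitrule(rule, unique_values=range(7))
for split in splits:
    print(split)
```

With attribute values 0 through 6, the first split pairs the remaining values 3 to 6 with the group [0, 1, 2], and the second split yields the three groups exactly as listed in the rule.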
Cheap init.
Bases: mvpa.datasets.splitters.Splitter
Split a dataset into two halves of the sample attribute.
The splitter yields two splits: first (1st half, 2nd half) and second (2nd half, 1st half).
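The two splits can be sketched in plain Python as follows. This is an illustration only, not PyMVPA code: `half_splits` is a hypothetical name, and placing the extra value in the second half when the count is odd is an assumption.

```python
def half_splits(unique_values):
    """Return the two (first, second) groupings over the sorted attribute values.

    Hypothetical helper for illustration only.
    """
    values = sorted(unique_values)
    mid = len(values) // 2  # assumption: an odd leftover goes to the second half
    first_half, second_half = values[:mid], values[mid:]
    return [(first_half, second_half), (second_half, first_half)]


halves = half_splits([0, 1, 2, 3])
```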
Cheap init.
Bases: mvpa.datasets.splitters.Splitter
Generic N-fold data splitter.
Provides N-fold splitting. Given a dataset with N chunks and cvtype=1 (the default), it generates N splits, where each chunk in turn is taken out (with replacement, so it is available again in later splits) for cross-validation. For example, if there are 4 chunks, the splits for cvtype=1 are:
[[1, 2, 3], [0]]
[[0, 2, 3], [1]]
[[0, 1, 3], [2]]
[[0, 1, 2], [3]]
If cvtype>1, all possible combinations of cvtype chunks are taken out for testing, so for cvtype=2 in the previous example:
[[2, 3], [0, 1]]
[[1, 3], [0, 2]]
[[1, 2], [0, 3]]
[[0, 3], [1, 2]]
[[0, 2], [1, 3]]
[[0, 1], [2, 3]]
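The enumeration above can be reproduced with a short standalone sketch built on `itertools.combinations`. It is not the library's implementation; `nfold_splits` is a hypothetical name used only for illustration.

```python
from itertools import combinations


def nfold_splits(chunks, cvtype=1):
    """List [training_chunks, testing_chunks] pairs for every held-out combination.

    Hypothetical helper for illustration only; not part of PyMVPA.
    """
    chunks = sorted(chunks)
    splits = []
    # Every combination of `cvtype` chunks is held out for testing once.
    for test in combinations(chunks, cvtype):
        train = [c for c in chunks if c not in test]
        splits.append([train, list(test)])
    return splits


one_out = nfold_splits(range(4))            # 4 splits
two_out = nfold_splits(range(4), cvtype=2)  # 6 splits
for split in two_out:
    print(split)
```

For N chunks this produces "N choose cvtype" splits, so the number of splits grows quickly as cvtype increases.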
Initialize the N-fold splitter.
Bases: mvpa.datasets.splitters.Splitter
Split a dataset into N groups of the sample attribute.
For example, NGroupSplitter(2) is the same as the HalfSplitter and yields two splits: first (1st half, 2nd half) and second (2nd half, 1st half).
Initialize the N-group splitter.
Bases: mvpa.datasets.splitters.Splitter
This is a dataset splitter that does not split. It simply returns the full dataset that it is called with.
The passed dataset is returned as the second element of the 2-tuple. The first element of that tuple will always be ‘None’.
Cheap init – nothing special
Bases: mvpa.datasets.splitters.Splitter
Split a dataset into odd and even values of the sample attribute.
The splitter yields two splits: first (odd, even) and second (even, odd).
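A rough plain-Python sketch of the two splits is shown below. It is not PyMVPA code: `odd_even_splits` is a hypothetical name, and taking parity from the attribute value itself (rather than from its position among the unique values) is an assumption.

```python
def odd_even_splits(unique_values):
    """Return the (odd, even) and (even, odd) groupings of attribute values.

    Hypothetical helper for illustration only.
    """
    values = sorted(unique_values)
    # Assumption: parity is determined by the value, not by its index.
    odd = [v for v in values if v % 2 == 1]
    even = [v for v in values if v % 2 == 0]
    return [(odd, even), (even, odd)]


parity_splits = odd_even_splits([0, 1, 2, 3])
```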
Cheap init.
Bases: object
Base class of dataset splitters.
Each splitter should be initialized with all its necessary parameters. The actual splitting is done by calling the splitter object on a Dataset via __call__(). This method has to be implemented as a generator, i.e. it has to return every possible split with a yield statement.
Each split has to be returned as a sequence of Datasets. The properties of the split datasets may vary between implementations. It is possible to declare a sequence element as ‘None’.
Please note that even if only one Dataset is returned, it has to be an element in a sequence and not just the Dataset object itself.
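The calling convention described here can be illustrated with a small stand-in class. This is a sketch under stated assumptions, not the actual Splitter base class: `SketchSplitter` is hypothetical, plain lists stand in for Dataset objects, and the rule holds index lists rather than attribute values.

```python
class SketchSplitter:
    """Hypothetical stand-in showing the generator protocol; not mvpa's Splitter."""

    def __init__(self, rule):
        # All configuration is passed at construction time.
        # `rule` is a sequence of splits; each split is a sequence of
        # index lists, where None marks an intentionally empty slot.
        self.rule = rule

    def __call__(self, samples):
        for split in self.rule:
            # Each split is yielded as a sequence, even if it holds one dataset.
            yield tuple(
                None if ids is None else [samples[i] for i in ids]
                for ids in split
            )


splitter = SketchSplitter([([0, 1], [2, 3]), (None, [0, 1, 2, 3])])
results = list(splitter(["a", "b", "c", "d"]))
```

Because `__call__` is a generator, callers can iterate over the splits lazily and stop early without computing the remaining ones.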
Initialize splitter base.
Maximum number of splits to output on call.
Set the number of samples per label in the split datasets.
‘equal’ sets the sample size to the highest possible number of samples that can be provided by each class. ‘all’ uses all available samples (default).
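The difference between the two settings comes down to a simple count. The sketch below is illustrative only: `samples_per_label` is a hypothetical helper, not a library function, and it only computes how many samples each label would contribute without selecting them.

```python
from collections import Counter


def samples_per_label(labels, nperlabel="all"):
    """Return how many samples each label contributes under the given setting.

    Hypothetical helper for illustration only.
    """
    counts = Counter(labels)
    if nperlabel == "all":
        # Every available sample is used.
        return dict(counts)
    if nperlabel == "equal":
        # The largest size every class can provide is the smallest class count.
        n = min(counts.values())
        return {label: n for label in counts}
    raise ValueError("nperlabel must be 'all' or 'equal' in this sketch")


labels = ["a", "a", "a", "b", "b"]
all_counts = samples_per_label(labels)
equal_counts = samples_per_label(labels, "equal")
```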
Split a dataset by separating the samples where the configured sample attribute matches an element of specs.
Returns: Tuple of split datasets.
Return splitcfg for a given dataset