Multivariate Pattern Analysis in Python
Bases: object
The Dataset.
This class provides a container to store all necessary data to perform MVPA analyses. These are the data samples, as well as the labels associated with the samples. Additionally, samples can be grouped into chunks.
Important: labels are assumed to be immutable, i.e. they must not be modified externally through indexed access; something like dataset.labels[1] += 100 should not be used. If a label has to be modified, obtain a full copy of the labels, operate on it, and assign it back to the dataset; otherwise dataset.uniquelabels will not work. The same applies to any other attribute that has a corresponding unique* access property.
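The reason for this rule can be illustrated with a small stand-in class. The sketch below is not PyMVPA code: `TinyDataset` and its caching scheme are hypothetical and only assume that the unique values are cached and refreshed when the attribute is reassigned.

```python
import numpy as np


class TinyDataset:
    """Hypothetical stand-in that caches unique labels on assignment."""

    def __init__(self, labels):
        self.labels = labels  # goes through the property setter below

    @property
    def labels(self):
        return self._labels

    @labels.setter
    def labels(self, value):
        self._labels = np.asarray(value)
        # The cache is refreshed only here, i.e. on assignment.
        self.uniquelabels = np.unique(self._labels)


ds = TinyDataset([0, 1, 1, 0])

# Discouraged: the in-place change bypasses the setter, so the cache goes stale.
ds.labels[1] += 100
stale = ds.uniquelabels.copy()

# Recommended: copy, modify, assign back.
new_labels = ds.labels.copy()
new_labels[2] += 100
ds.labels = new_labels
fresh = ds.uniquelabels
```

After the in-place change `stale` still lists the old unique values, while `fresh` reflects the modified labels because the assignment triggered a recomputation.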
Initialize dataset instance
There are basically two different ways to create a dataset:
Create a new dataset from samples and sample attributes. In this mode a two-dimensional ndarray has to be passed to the samples keyword argument and the corresponding samples attributes are provided via the labels and chunks arguments.
The second way is used internally to perform quick copying of datasets, e.g. when performing feature selection. In this mode the two dictionaries (data and dsattr) are required. For performance reasons this mode bypasses most of the sanity checks performed by the previous mode, as data integrity is assumed for internal operations.
Each of the keyword arguments overwrites whatever is, or might already be, in the data container.
Apply a function to each row of the samples matrix of a dataset.
The functor given as fx has to honour an axis keyword argument in the same way NumPy functions do (e.g. numpy.mean, numpy.var).
Return type: a new Dataset object with the aggregated feature(s).
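The axis requirement can be illustrated on a plain array. This is a NumPy-only sketch, not the method itself; the helper `aggregate_rows` and the array `samples` are made up for illustration.

```python
import numpy as np


def aggregate_rows(samples, fx=np.mean):
    """Collapse every row (sample) into one value using fx(..., axis=1)."""
    aggregated = fx(samples, axis=1)
    # Keep the result two-dimensional: one aggregated feature per sample.
    return aggregated[:, np.newaxis]


samples = np.array([[1.0, 2.0, 3.0],
                    [4.0, 6.0, 8.0]])
row_means = aggregate_rows(samples)          # shape (2, 1)
row_vars = aggregate_rows(samples, np.var)   # shape (2, 1)
```

Any callable that accepts `axis` in this way, such as `numpy.median` or `numpy.max`, could be passed instead.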
Obtain new dataset by applying mappers over features and/or samples.
While featuresmappers leave the sample attributes information unchanged, as the number of samples in the dataset is invariant, samplesmappers are also applied to the samples attributes themselves!
Applying a featuresmapper will destroy any feature grouping information.
Change chunking of the dataset
Group the existing chunks into larger groups to match the desired number of chunks. This makes sense if there was originally no strong grouping into chunks, or if each sample was independent and thus belonged to its own chunk.
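The idea of regrouping chunks can be sketched with NumPy alone. The function below is a simplified assumption, not the library routine: it splits the sorted unique chunk ids into nearly equal groups and does not try to balance the number of samples per new chunk. The names `coarsen_chunks` and `nchunks` are chosen for illustration.

```python
import numpy as np


def coarsen_chunks(chunks, nchunks):
    """Map the original chunk ids onto `nchunks` coarser groups."""
    chunks = np.asarray(chunks)
    unique = np.unique(chunks)
    if nchunks > len(unique):
        raise ValueError("cannot create more chunks than already exist")
    # Split the sorted unique chunk ids into nchunks nearly equal groups.
    groups = np.array_split(unique, nchunks)
    new_chunks = np.empty_like(chunks)
    for new_id, members in enumerate(groups):
        new_chunks[np.isin(chunks, members)] = new_id
    return new_chunks


# Six samples, each in its own chunk, regrouped into three chunks.
coarse = coarsen_chunks([0, 1, 2, 3, 4, 5], nchunks=3)
```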
Returns a boolean mask with all features in ids selected.
Parameters: ids (list or 1d array) – Ids of the features to be selected.
Return type: ndarray
Returns: All selected features are set to True; False otherwise.
Returns feature ids corresponding to non-zero elements in the mask.
Parameters: mask (1d ndarray) – Feature mask.
Return type: ndarray
Returns: Ids of non-zero (non-False) mask elements.
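The two conversions between feature ids and a boolean feature mask are inverses of each other, which a short NumPy sketch can show. The helper names and the explicit `nfeatures` argument are assumptions for illustration; the methods documented here take the number of features from the dataset.

```python
import numpy as np


def ids_to_mask(ids, nfeatures):
    """Boolean mask of length nfeatures with True at the given ids."""
    mask = np.zeros(nfeatures, dtype=bool)
    mask[np.asarray(ids, dtype=int)] = True
    return mask


def mask_to_ids(mask):
    """Indices of the non-zero (non-False) elements of a 1d mask."""
    return np.flatnonzero(mask)


mask = ids_to_mask([0, 3], nfeatures=5)
ids = mask_to_ids(mask)
```

Note that the round trip returns the ids in ascending order, so any ordering or duplication in the original id list is lost.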
Create a copy (clone) of the dataset by fully copying the current one.
Assign a definition to featuregroups.
XXX Feature groups were never finished to the point of being useful.
Given a dataset, detrend the data in place, either as a whole or separately for each chunk.
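As an illustration of whole-dataset versus per-chunk detrending, the following NumPy sketch removes a least-squares line from every feature. The linear model, the function name `detrend_linear` and its arguments are assumptions; the options of the actual function are not listed in this section.

```python
import numpy as np


def detrend_linear(samples, chunks=None):
    """Remove a linear trend from every feature of a float array, in place.

    If `chunks` is given, the trend is fitted separately per chunk.
    """
    if chunks is None:
        chunks = np.zeros(len(samples), dtype=int)
    chunks = np.asarray(chunks)
    for chunk in np.unique(chunks):
        rows = np.flatnonzero(chunks == chunk)
        t = np.arange(len(rows), dtype=float)
        # Least-squares line per feature; coefs has shape (2, nfeatures).
        coefs = np.polyfit(t, samples[rows], deg=1)
        samples[rows] -= np.outer(t, coefs[0]) + coefs[1]
    return samples


# Two chunks, each containing a perfect linear drift in its single feature.
data = np.array([[0.0], [1.0], [2.0], [10.0], [12.0], [14.0]])
detrend_linear(data, chunks=[0, 0, 0, 1, 1, 1])
```

Because each chunk is fitted on its own, the jump between the two chunks does not leak into the estimated slopes, and `data` ends up numerically zero.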
Stored labels map (if any)
Number of features per pattern.
Currently available number of patterns.
Select a random set of samples.
If ‘nperlabel’ is an integer, the specified number of samples is randomly chosen from each group of samples sharing a unique label value (total number of selected samples: nperlabel x len(uniquelabels)).
If ‘nperlabel’ is a list, its length has to match the number of unique label values. In this case ‘nperlabel’ specifies the number of samples to be selected from the samples with the corresponding label.
The method returns a Dataset object containing the selected samples.
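The two forms of ‘nperlabel’ can be sketched on a bare label vector. This is an illustrative NumPy version that returns sample indices rather than a Dataset; `random_sample_ids` and the `rng` argument are hypothetical names.

```python
import numpy as np


def random_sample_ids(labels, nperlabel, rng=None):
    """Pick sample indices at random, a fixed number per unique label."""
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    uniquelabels = np.unique(labels)
    if np.isscalar(nperlabel):
        # One number for all labels: repeat it for every unique label.
        nperlabel = [nperlabel] * len(uniquelabels)
    if len(nperlabel) != len(uniquelabels):
        raise ValueError("need one count per unique label")
    selected = []
    for label, n in zip(uniquelabels, nperlabel):
        candidates = np.flatnonzero(labels == label)
        selected.extend(rng.choice(candidates, size=n, replace=False))
    return np.sort(np.array(selected, dtype=int))


labels = [0, 0, 0, 1, 1, 1, 1]
same_count = random_sample_ids(labels, 2, rng=np.random.default_rng(0))
per_label = random_sample_ids(labels, [1, 3], rng=np.random.default_rng(0))
```

With an integer the result holds 2 x 2 = 4 indices; with the list it holds one index for label 0 and three for label 1.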
Returns an array with the number of samples per label in each chunk.
Array shape is (chunks x labels).
Parameters: dataset (Dataset) – Source dataset.
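A (chunks x labels) count table of this kind can be built with NumPy as sketched below. The function operates on plain label and chunk vectors instead of a Dataset, and its name is chosen for illustration only.

```python
import numpy as np


def samples_per_chunk_label(labels, chunks):
    """Count samples for every (chunk, label) pair."""
    uniquelabels, label_idx = np.unique(labels, return_inverse=True)
    uniquechunks, chunk_idx = np.unique(chunks, return_inverse=True)
    counts = np.zeros((len(uniquechunks), len(uniquelabels)), dtype=int)
    # Unbuffered add, so repeated (chunk, label) pairs are all counted.
    np.add.at(counts, (chunk_idx, label_idx), 1)
    return counts


labels = ["rest", "task", "task", "rest", "task", "task"]
chunks = [0, 0, 0, 1, 1, 1]
counts = samples_per_chunk_label(labels, chunks)
```

Rows follow the sorted unique chunks and columns the sorted unique labels, so `counts[1, 0]` is the number of "rest" samples in chunk 1.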
Used to verify whether the dataset is in the same state as it was when some other operation was performed, e.g. whether a classifier was trained on the same dataset as the one in question.
Find samples which are on the boundaries of the blocks
Such samples might need to be removed. By default (with prior=0, post=0) ids of the first samples in a ‘block’ are reported
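The default behaviour (prior=0, post=0) can be approximated as follows. The sketch assumes that a block is a run of consecutive samples sharing the same label and chunk, which is not spelled out in this section, and it does not cover non-zero prior or post values.

```python
import numpy as np


def first_ids_of_blocks(labels, chunks):
    """Indices of samples whose label or chunk differs from the previous one."""
    labels = np.asarray(labels)
    chunks = np.asarray(chunks)
    changed = (labels[1:] != labels[:-1]) | (chunks[1:] != chunks[:-1])
    # The very first sample always opens a block.
    return np.concatenate(([0], np.flatnonzero(changed) + 1))


labels = [1, 1, 2, 2, 2, 1, 1]
chunks = [0, 0, 0, 0, 1, 1, 1]
boundary_ids = first_ids_of_blocks(labels, chunks)
```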
Universal indexer to obtain indexes of interesting samples/features. See .select() for more information
Returns: tuple of (samples indexes, features indexes). Either item can also be None if no selection on samples or features was requested (to distinguish between no items being selected and no selection being made).
Permute the labels.
TODO: rename status into something closer in semantics.
Returns a new dataset with all invariant features removed.
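What counts as an invariant feature can be shown on a bare samples array: a column whose value is identical across all samples. The helper below is an illustrative NumPy sketch, not the method's implementation.

```python
import numpy as np


def remove_invariant_features(samples):
    """Return a copy of samples without constant columns."""
    samples = np.asarray(samples)
    # A feature varies if any sample differs from the first sample.
    varying = (samples != samples[0]).any(axis=0)
    return samples[:, varying]


samples = np.array([[1.0, 5.0, 0.0],
                    [2.0, 5.0, 0.0],
                    [3.0, 5.0, 1.0]])
reduced = remove_invariant_features(samples)
```

Comparing against the first row avoids the rounding issues that a test such as `std == 0` can run into with floating-point data.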
Universal selector
WARNING: if you need to select duplicate samples (e.g. samples=[5,5]), or if the order of the selected samples or features matters and is not sorted (e.g. samples=[3,2,1]), please use the selectFeatures or selectSamples functions directly.
Mimic plain selectSamples:
dataset.select([1,2,3])
dataset[[1,2,3]]
Mimic plain selectFeatures:
dataset.select(slice(None), [1,2,3])
dataset.select('all', [1,2,3])
dataset[:, [1,2,3]]
Mixed (select features and samples):
dataset.select([1,2,3], [1, 2])
dataset[[1,2,3], [1, 2]]
Select samples matching some attributes:
dataset.select(labels=[1,2], chunks=[2,4])
dataset.select('labels', [1,2], 'chunks', [2,4])
dataset['labels', [1,2], 'chunks', [2,4]]
Mixed – out of first 100 samples, select only those with labels 1 or 2 and belonging to chunks 2 or 4, and select features 2 and 3:
dataset.select(slice(0,100), [2,3], labels=[1,2], chunks=[2,4])
dataset[:100, [2,3], 'labels', [1,2], 'chunks', [2,4]]
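The attribute-matching part of these calls can be mirrored with boolean masks. The sketch below works on plain label and chunk vectors and only covers sample selection; `select_ids` and its parameter names are hypothetical. Criteria on different attributes are combined with a logical AND, as in the mixed example above.

```python
import numpy as np


def select_ids(labels, chunks, want_labels, want_chunks, samples=slice(None)):
    """Indices of samples within `samples` matching label AND chunk criteria."""
    labels = np.asarray(labels)
    chunks = np.asarray(chunks)
    candidate = np.zeros(len(labels), dtype=bool)
    candidate[samples] = True  # positional pre-selection
    matches = (candidate
               & np.isin(labels, want_labels)
               & np.isin(chunks, want_chunks))
    # flatnonzero yields sorted, unique indices.
    return np.flatnonzero(matches)


labels = [1, 2, 3, 1, 2, 3]
chunks = [2, 2, 2, 4, 5, 4]
ids = select_ids(labels, chunks, want_labels=[1, 2], want_chunks=[2, 4])
first_three = select_ids(labels, chunks, [1, 2], [2, 4], samples=slice(0, 3))
```

Since a mask can only yield sorted, unique indices, this style of selection cannot express duplicates or a custom order, which is the limitation the warning above refers to.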
Select a number of features from the current set.
WARNING: The order of ids determines the order of features in the returned dataset. This might be useful sometimes, but can also cause major headaches! The order is verified when running non-optimized code (if __debug__).
Choose a subset of samples defined by samples IDs.
Returns a new dataset object containing the selected sample subset.
TODO: yoh, we might need to sort the mask if the mask is a list of ids and is not ordered. Clarify with Michael what is our intent here!
Set labels map.
Checks for the validity of the mapping – values should cover all existing labels in the dataset
Set the data type of the samples array.
String summary over the object
Provide summary statistics over the labels and chunks
Obtain indexes of interesting samples/features. See select() for more information
XXX somewhat obsoletes idsby...
Z-Score the samples of a Dataset (in-place).
mean and std can be used to pass custom values to the z-scoring. Both may be scalars or arrays.
All computations are done in place. If necessary, the data is automatically upcast to targetdtype.
If baselinelabels is provided and mean or std is not, the corresponding statistic is computed only from the samples whose labels are in baselinelabels.
If perchunk is True, samples within the same chunk are z-scored independently of samples from other chunks, i.e. the mean and standard deviation are calculated for each chunk individually.
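The interplay of per-chunk processing and baseline labels can be sketched in NumPy. This is a simplified illustration under stated assumptions: `zscore_sketch` is a made-up name, custom mean/std values and dtype upcasting are left out, and features with zero standard deviation are left unscaled, which is a choice of this sketch rather than documented behaviour.

```python
import numpy as np


def zscore_sketch(samples, labels=None, chunks=None, baselinelabels=None):
    """Z-score a 2d float array in place, per chunk if `chunks` is given."""
    if chunks is None:
        chunks = np.zeros(len(samples), dtype=int)
    chunks = np.asarray(chunks)
    for chunk in np.unique(chunks):
        in_chunk = chunks == chunk
        # Statistics come from the baseline samples only, if requested.
        if baselinelabels is None:
            reference = in_chunk
        else:
            reference = in_chunk & np.isin(labels, baselinelabels)
        mean = samples[reference].mean(axis=0)
        std = samples[reference].std(axis=0)
        std[std == 0] = 1.0  # assumption: leave constant features unscaled
        samples[in_chunk] = (samples[in_chunk] - mean) / std
    return samples


# Per-chunk z-scoring: each chunk gets its own mean and standard deviation.
data = np.array([[1.0], [3.0], [10.0], [30.0]])
zscore_sketch(data, chunks=[0, 0, 1, 1])

# Baseline z-scoring: statistics are taken from samples labelled 0 only.
baseline_data = np.array([[0.0], [2.0], [4.0]])
zscore_sketch(baseline_data, labels=[0, 0, 1], baselinelabels=[0])
```

In the second call the sample with label 1 is scaled with the baseline statistics, so it ends up three standard deviations above the baseline mean.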
Decorator to easily bind functions to a Dataset class
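A decorator of this kind can be written in a few lines. The version below is a generic sketch: the name `bind_to`, its signature and the empty `Dataset` stand-in are hypothetical and do not reproduce the library's decorator.

```python
class Dataset:
    """Empty stand-in for the real class, for illustration only."""


def bind_to(cls):
    """Decorator factory: attach the decorated function to cls as a method."""
    def decorator(func):
        setattr(cls, func.__name__, func)
        return func  # the plain function stays usable as well
    return decorator


@bind_to(Dataset)
def describe(dataset, prefix="Dataset"):
    return "%s of type %s" % (prefix, type(dataset).__name__)


ds = Dataset()
as_method = ds.describe()
as_function = describe(ds)
```

Binding functions this way keeps each operation in its own module while still making it available as a method on every instance.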