Module mvpa.misc.data_generators

Miscellaneous data generators for unit tests and demos.
Functions
 
multipleChunks(func, n_chunks, *args, **kwargs)
Replicate a dataset multiple times, placing each replica in a different chunk
 
dumbFeatureDataset()
Create a very simple dataset with 2 features and 3 labels
 
dumbFeatureBinaryDataset()
Very simple binary (2 labels) dataset
 
normalFeatureDataset(perlabel=50, nlabels=2, nfeatures=4, nchunks=5, means=None, nonbogus_features=None, snr=3.0)
Generate a univariate dataset with normal noise and specified means.
 
pureMultivariateSignal(patterns, signal2noise=1.5, chunks=None)
Create a 2d dataset with a clear multivariate signal, but no univariate information.
 
normalFeatureDataset__(dataset=None, labels=None, nchunks=None, perlabel=50, activation_probability_steps=1, randomseed=None, randomvoxels=False)
NOT FINISHED
 
getMVPattern(s2n)
Simple multivariate dataset
 
wr1996(size=200)
Generate '6d robot arm' dataset (Williams and Rasmussen 1996)
 
sinModulated(n_instances, n_features, flat=False, noise=0.4)
Generate a (quite) complex multidimensional non-linear dataset
 
chirpLinear(n_instances, n_features=4, n_nonbogus_features=2, data_noise=0.4, noise=0.1)
Generate a simple dataset for linear regression
 
linear_awgn(size=10, intercept=0.0, slope=0.4, noise_std=0.01, flat=False)
Generate a dataset from a linear function with AWGN (Additive White Gaussian Noise).
 
noisy_2d_fx(size_per_fx, dfx, sfx, center, noise_std=1)
 
linear1d_gaussian_noise(size=100, slope=0.5, intercept=1.0, x_min=-2.0, x_max=3.0, sigma=0.2)
A straight line with some Gaussian noise.

Imports: N, Dataset, debug


Function Details

multipleChunks(func, n_chunks, *args, **kwargs)

Replicate a dataset multiple times, placing each replica in a different chunk

Given a randomized (noisy) generator of a single-chunk dataset, call the generator multiple times and place each result into a distinct chunk.
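The replication idea can be sketched with plain NumPy arrays. This is a minimal illustration, not the module's implementation: it assumes (hypothetically) that the generator returns a `(samples, labels)` pair instead of a PyMVPA `Dataset`, and the names `multiple_chunks_sketch` and `noisy_pair` are made up for the example.

```python
import numpy as np


def multiple_chunks_sketch(func, n_chunks, *args, **kwargs):
    """Call a single-chunk generator n_chunks times and stack the results.

    Assumes (hypothetically) that func(*args, **kwargs) returns a
    (samples, labels) pair of NumPy arrays describing one chunk.
    """
    samples, labels, chunks = [], [], []
    for chunk_id in range(n_chunks):
        x, y = func(*args, **kwargs)  # fresh random draw for every chunk
        samples.append(x)
        labels.append(y)
        chunks.append(np.full(len(y), chunk_id))  # tag every sample with its chunk
    return np.vstack(samples), np.concatenate(labels), np.concatenate(chunks)


def noisy_pair(n=3, seed=None):
    """Hypothetical single-chunk generator: two noisy features, two labels."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n, 2)), np.arange(n) % 2


X, y, chunks = multiple_chunks_sketch(noisy_pair, 4, n=3)
print(X.shape, chunks)  # (12, 2) [0 0 0 1 1 1 2 2 2 3 3 3]
```

Because the generator is called anew for each chunk, every chunk gets an independent noise realization, which is what makes the chunks usable as cross-validation folds.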

normalFeatureDataset(perlabel=50, nlabels=2, nfeatures=4, nchunks=5, means=None, nonbogus_features=None, snr=3.0)

Generate a univariate dataset with normal noise and specified means.

It is probably a generalization of pureMultivariateSignal, which would correspond to means=[[0, 1], [1, 0]].

Specify either means or nonbogus_features, so that the means get assigned accordingly.

Parameters:
  • perlabel (int) - Number of samples per label
  • nlabels (int) - Number of labels in the dataset
  • nfeatures (int) - Total number of features (including bogus features, which carry no label-related signal)
  • nchunks (int) - Number of chunks (perlabel should be a multiple of nchunks)
  • means (None or list of float or ndarray) - Means for each of the nfeatures features.
  • nonbogus_features (None or list of int) - Indices of the non-bogus features (one per label)
  • snr (float) - Signal-to-noise ratio, assuming that the signal has std 1.0, so the random normal noise is simply divided by snr
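A rough sketch of how these parameters could interact, using NumPy arrays rather than a PyMVPA `Dataset`. Several details are assumptions, not taken from the module: the name `normal_feature_dataset_sketch`, a mean of 1.0 on each non-bogus feature, all-zero means when neither `means` nor `nonbogus_features` is given, and the particular order in which chunk ids are assigned.

```python
import numpy as np


def normal_feature_dataset_sketch(perlabel=50, nlabels=2, nfeatures=4, nchunks=5,
                                  means=None, nonbogus_features=None, snr=3.0,
                                  seed=None):
    """Hypothetical re-creation: per-label means plus standard-normal noise / snr."""
    if perlabel % nchunks:
        raise ValueError("perlabel should be a multiple of nchunks")
    rng = np.random.default_rng(seed)
    if means is None:
        # Assumption: no signal unless nonbogus_features says otherwise
        means = np.zeros((nlabels, nfeatures))
        if nonbogus_features is not None:
            # Assumption: one informative feature per label, with mean 1.0 there
            for label, feature in enumerate(nonbogus_features):
                means[label, feature] = 1.0
    means = np.asarray(means, dtype=float)  # shape (nlabels, nfeatures)
    labels = np.repeat(np.arange(nlabels), perlabel)
    # Each label contributes perlabel // nchunks samples to every chunk
    chunks = np.tile(np.repeat(np.arange(nchunks), perlabel // nchunks), nlabels)
    # Signal is taken to have std 1.0, so noise is standard normal divided by snr
    noise = rng.standard_normal((nlabels * perlabel, nfeatures)) / snr
    return means[labels] + noise, labels, chunks


X, labels, chunks = normal_feature_dataset_sketch(nonbogus_features=[0, 1], seed=0)
print(X.shape)  # (100, 4)
```

Passing `means=[[0, 1], [1, 0]]` with `nfeatures=2` gives the configuration the text relates to pureMultivariateSignal.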

wr1996(size=200)

Generate '6d robot arm' dataset (Williams and Rasmussen 1996)

Originally created to test the correctness of the kernel ARD implementation. For full details see: http://www.gaussianprocess.org/gpml/code/matlab/doc/regression.html#ard

x_1 picked randomly in [-1.932, -0.453]
x_2 picked randomly in [0.534, 3.142]
r_1 = 2.0
r_2 = 1.3
f(x_1, x_2) = r_1 cos(x_1) + r_2 cos(x_1 + x_2) + N(0, 0.0025)
etc.

Expected relevances:
  ell_1     1.804377
  ell_2     1.963956
  ell_3     8.884361
  ell_4    34.417657
  ell_5  1081.610451
  ell_6   375.445823
  sigma_f   2.379139
  sigma_n   0.050835
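The two-input core of the robot-arm formula can be sketched as follows. The source elides how the remaining four of the six inputs are built ("etc."), so this sketch, with the hypothetical name `robot_arm_2d_sketch`, generates only x_1, x_2 and the target. It reads N(0, 0.0025) as a variance, i.e. a noise std of 0.05, which is consistent with the expected sigma_n of about 0.05; that reading is an assumption.

```python
import numpy as np


def robot_arm_2d_sketch(size=200, r1=2.0, r2=1.3, seed=None):
    """Hypothetical sketch of the relevant part of the wr1996 data."""
    rng = np.random.default_rng(seed)
    x1 = rng.uniform(-1.932, -0.453, size)  # x_1 picked randomly in [-1.932, -0.453]
    x2 = rng.uniform(0.534, 3.142, size)    # x_2 picked randomly in [0.534, 3.142]
    clean = r1 * np.cos(x1) + r2 * np.cos(x1 + x2)
    # N(0, 0.0025) interpreted as variance 0.0025, i.e. std 0.05 (assumption)
    y = clean + rng.normal(0.0, 0.05, size)
    return np.column_stack([x1, x2]), y


X, y = robot_arm_2d_sketch(size=200, seed=0)
```

The remaining inputs would have to be appended as extra columns; their construction is not specified here and is deliberately left out.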

sinModulated(n_instances, n_features, flat=False, noise=0.4)

Generate a (quite) complex multidimensional non-linear dataset

Used for regression testing. In the data, the label is the sine of x^2 plus uniform noise.
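One way to read "the sine of x^2 plus uniform noise" for multidimensional inputs is sketched below. The following are assumptions, not taken from the module: x^2 is taken as the sum of squared features, features are drawn uniformly in [0, π), the noise is uniform in [0, noise), and flat=True is read as evenly spaced values repeated across all features. The name `sin_modulated_sketch` is hypothetical.

```python
import numpy as np


def sin_modulated_sketch(n_instances, n_features, flat=False, noise=0.4, seed=None):
    """Hypothetical non-linear regression data: label = sin(sum of x^2) + noise."""
    rng = np.random.default_rng(seed)
    if flat:
        # Assumption: evenly spaced values in [0, pi), same value in every feature
        x = np.linspace(0.0, np.pi, n_instances, endpoint=False)
        data = np.repeat(x[:, None], n_features, axis=1)
    else:
        # Assumption: features drawn uniformly in [0, pi)
        data = rng.uniform(0.0, np.pi, (n_instances, n_features))
    # Sine of the squared norm of each sample, plus uniform noise in [0, noise)
    labels = np.sin((data ** 2).sum(axis=1)) + rng.uniform(0.0, noise, n_instances)
    return data, labels


data, labels = sin_modulated_sketch(50, 3, seed=0)
```

Summing the squared features makes the label depend on all inputs jointly and non-linearly, which is what makes such data a harder regression target than the linear generators in this module.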

chirpLinear(n_instances, n_features=4, n_nonbogus_features=2, data_noise=0.4, noise=0.1)

Generate a simple dataset for linear regression

Generates a chirp signal, populates n_nonbogus_features of the n_features features with it at a different noise level, and then provides the signal itself, with additional noise, as labels.
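The construction described above can be sketched as follows. The chirp form sin(10π t²), whose instantaneous frequency 10t grows linearly with t, is an assumed choice, as is the name `chirp_linear_sketch`. "Different noise level" is read here as the features using data_noise while the labels use noise; the module may define it differently.

```python
import numpy as np


def chirp_linear_sketch(n_instances, n_features=4, n_nonbogus_features=2,
                        data_noise=0.4, noise=0.1, seed=None):
    """Hypothetical chirp-based regression data."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_instances)
    signal = np.sin(10 * np.pi * t ** 2)  # assumed chirp: frequency rises linearly in t
    # Every feature starts as Gaussian noise scaled by data_noise ...
    data = rng.standard_normal((n_instances, n_features)) * data_noise
    # ... and only the first n_nonbogus_features also carry the chirp
    data[:, :n_nonbogus_features] += signal[:, None]
    # Labels are the clean chirp plus their own, separately scaled noise
    labels = signal + rng.standard_normal(n_instances) * noise
    return data, labels


data, labels = chirp_linear_sketch(100, seed=0)
```

A linear regression on such data should assign weight to the first n_nonbogus_features columns and close to zero weight to the remaining, purely noisy ones.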

linear_awgn(size=10, intercept=0.0, slope=0.4, noise_std=0.01, flat=False)

Generate a dataset from a linear function with AWGN (Additive White Gaussian Noise).

It can be multidimensional if 'slope' is a vector. If flat is True (in 1 dimension only), equally spaced samples are generated instead of random ones, which is useful for the test phase.
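A sketch of this behaviour with NumPy follows. Drawing inputs uniformly from [0, 1) and spacing flat samples over [0, 1] are assumptions, as is the name `linear_awgn_sketch`; raising an error for flat=True with a vector slope is one way of enforcing the "1 dimension only" restriction.

```python
import numpy as np


def linear_awgn_sketch(size=10, intercept=0.0, slope=0.4, noise_std=0.01,
                       flat=False, seed=None):
    """Hypothetical y = x . slope + intercept + Gaussian noise with std noise_std."""
    rng = np.random.default_rng(seed)
    slope = np.atleast_1d(np.asarray(slope, dtype=float))  # scalar or vector slope
    dim = slope.size
    if flat:
        if dim != 1:
            raise ValueError("flat=True is only supported in 1 dimension")
        x = np.linspace(0.0, 1.0, size)[:, None]  # equally spaced test inputs
    else:
        x = rng.uniform(0.0, 1.0, (size, dim))    # assumed random input range
    y = x @ slope + intercept + rng.normal(0.0, noise_std, size)
    return x, y


x, y = linear_awgn_sketch(size=10, seed=0)
```

A vector slope such as `slope=[0.4, -1.0, 2.0]` yields three input dimensions, while the flat mode gives a deterministic grid that is convenient for plotting or scoring a fitted model.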