Multivariate Pattern Analysis in Python
The following example runs Gaussian Process Regression (GPR) on a simple 1D dataset using a squared exponential (i.e., Gaussian or RBF) kernel and different sets of hyperparameters. The resulting regression solutions are finally visualized in a single figure.
As usual, we start by importing all of PyMVPA:
# Let's use LaTeX for proper rendering of Greek
from matplotlib import rc
rc('text', usetex=True)
from mvpa.suite import *
The next lines build two datasets using one of PyMVPA’s data generators.
# Generate dataset for training:
train_size = 40
F = 1
dataset = data_generators.sinModulated(train_size, F)
# Generate dataset for testing:
test_size = 100
dataset_test = data_generators.sinModulated(test_size, F, flat=True)
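If you want a quick sanity check of what the generator produced, you can print the array shapes. This optional snippet only relies on the .samples and .labels dataset attributes that are also used further below:

# Optional: inspect the dimensions of the generated datasets
print dataset.samples.shape, dataset.labels.shape
print dataset_test.samples.shape, dataset_test.labels.shape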
The last configuration step is the definition of four sets of hyperparameters to be used for GPR.
# Hyperparameters. Each row is [sigma_f, length_scale, sigma_noise]
hyperparameters = N.array([[1.0, 0.2, 0.4],
                           [1.0, 0.1, 0.1],
                           [1.0, 1.0, 0.1],
                           [1.0, 0.1, 1.0]])
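Each row parameterizes one GPR run. As a reminder, the textbook form of the squared exponential kernel is

k(x, x') = \sigma_f^2 \exp\left(-\frac{\|x - x'\|^2}{2 \ell^2}\right),

where \sigma_f scales the signal magnitude, \ell is the length scale, and \sigma_n (the third column above) is the standard deviation of the additive observation noise. The exact parameterization inside PyMVPA's kernel implementation may differ in constants, but the roles of the three hyperparameters are the same.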
The plotting of the final figure and the actual GPR runs are performed in a single loop.
rows = 2
columns = 2
P.figure(figsize=(12, 12))
for i in range(rows*columns):
    P.subplot(rows, columns, i+1)
    regression = True
    logml = True

    data_train = dataset.samples
    label_train = dataset.labels
    data_test = dataset_test.samples
    label_test = dataset_test.labels
The next lines configure a squared exponential kernel with the set of hyperparameters for the current subplot and assign the kernel to the GPR instance.
    sigma_f, length_scale, sigma_noise = hyperparameters[i, :]

    kse = KernelSquaredExponential(length_scale=length_scale,
                                   sigma_f=sigma_f)
    g = GPR(kse, sigma_noise=sigma_noise, regression=regression)
    print g

    if regression:
        g.states.enable("predicted_variances")
    if logml:
        g.states.enable("log_marginal_likelihood")
After training GPR, the predictions are queried by passing the test dataset samples, and accuracy measures are computed.
    g.train(dataset)
    prediction = g.predict(data_test)

    # print label_test
    # print prediction
    accuracy = None
    if regression:
        accuracy = N.sqrt(((prediction - label_test)**2).sum() / prediction.size)
        print "RMSE:", accuracy
    else:
        accuracy = (prediction.astype('l') == label_test.astype('l')).sum() \
                   / float(prediction.size)
        print "accuracy:", accuracy
The remaining code simply plots both training and test datasets, as well as the GPR solutions.
    if F == 1:
        P.title(r"$\sigma_f=%0.2f$, $length_s=%0.2f$, $\sigma_n=%0.2f$" \
                % (sigma_f, length_scale, sigma_noise))
    P.plot(data_train, label_train, "ro", label="train")
    P.plot(data_test, prediction, "b-", label="prediction")
    P.plot(data_test, label_test, "g+", label="test")
    if regression:
        P.plot(data_test, prediction - N.sqrt(g.predicted_variances),
               "b--", label=None)
        P.plot(data_test, prediction + N.sqrt(g.predicted_variances),
               "b--", label=None)
        P.text(0.5, -0.8, "$RMSE=%.3f$" % accuracy)
        P.text(0.5, -0.95, "$LML=%.3f$" % g.log_marginal_likelihood)
    else:
        P.text(0.5, -0.8, "$accuracy=%s$" % accuracy)
    P.legend(loc='lower right')

    print "LML:", g.log_marginal_likelihood
if cfg.getboolean('examples', 'interactive', True):
    # show all the cool figures
    P.show()
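When running the example non-interactively, the figure could be written to disk instead of being shown; the filename below is only illustrative:

# Alternative to P.show(): save the figure to a file (example filename)
P.savefig('gpr_demo.png')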
See also
The full source code of this example is included in the PyMVPA source distribution (doc/examples/gpr.py).