Package mvpa :: Package clfs :: Module distance

Module distance

Distance functions to be used in kernels and elsewhere
Functions
 
cartesianDistance(a, b)
Return Cartesian distance between a and b
 
absminDistance(a, b)
Return the distance max(|a-b|). XXX There must be a better name! XXX Actually, why is it absmin and not absmax?
 
manhattenDistance(a, b)
Return Manhattan distance between a and b
 
mahalanobisDistance(x, y=None, w=None)
Calculate Mahalanobis distance of the pairs of points.
 
squared_euclidean_distance(data1, data2=None, weight=None)
Compute weighted Euclidean distance matrix between two datasets.
 
oneMinusCorrelation(X, Y)
Return one minus the correlation matrix between the rows of two matrices.
 
pnorm_w_python(data1, data2=None, weight=None, p=2, heuristic='auto', use_sq_euclidean=True)
Weighted p-norm between two datasets (pure Python implementation)
 
pnorm_w(data1, data2=None, weight=None, p=2, heuristic='auto', use_sq_euclidean=True)
Weighted p-norm between two datasets (pure Python implementation)

Imports: N, externals, debug, warning, weave, converters


Function Details

absminDistance(a, b)

Return the distance max(|a-b|). XXX There must be a better name! XXX Actually, why is it absmin and not absmax?

Useful to select a whole cube of a given "radius"
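
The quantity max(|a-b|) is commonly known as the Chebyshev distance. The following is a minimal sketch of it, not the module's own implementation; the name chebyshev_distance and the sample grid are hypothetical and chosen only for illustration. It also shows why thresholding this distance selects a cube around a center point.

```python
import numpy as np

def chebyshev_distance(a, b):
    """Largest absolute coordinate difference between a and b (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.abs(a - b).max()

# Select every grid point whose distance from the center is at most 1.
center = (0, 0, 0)
grid = [(i, j, k)
        for i in range(-2, 3)
        for j in range(-2, 3)
        for k in range(-2, 3)]
cube = [point for point in grid if chebyshev_distance(point, center) <= 1]
```

With a "radius" of 1 the selection is the 3 x 3 x 3 block of grid points around the center, which is the cube mentioned above.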

mahalanobisDistance(x, y=None, w=None)

Calculate Mahalanobis distance of the pairs of points.

The inverse covariance matrix can be calculated with either of the following:

w = N.linalg.solve(N.cov(x.T), N.identity(x.shape[1]))

or

w = N.linalg.inv(N.cov(x.T))
Parameters:
  • x - first list of points. Rows are samples, columns are features.
  • y - second list of points (optional)
  • w (N.ndarray) - optional inverse covariance matrix between the points. It is computed if not given
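
As a rough illustration of the parameters above, here is a sketch of a pairwise Mahalanobis distance in NumPy. It is not the module's code: the name mahalanobis_sketch is hypothetical, and two choices are assumptions of this sketch, namely that a missing w is estimated from x alone and that the non-squared distance is returned.

```python
import numpy as np

def mahalanobis_sketch(x, y=None, w=None):
    """Pairwise Mahalanobis distances between the rows of x and the rows of y."""
    x = np.asarray(x, dtype=float)
    y = x if y is None else np.asarray(y, dtype=float)
    if w is None:
        # Assumption of this sketch: estimate the inverse covariance from x only.
        w = np.linalg.inv(np.cov(x.T))
    # diff[i, j] is the feature-wise difference between x[i] and y[j].
    diff = x[:, None, :] - y[None, :, :]
    # Quadratic form diff^T w diff for every pair (i, j).
    squared = np.einsum('ijk,kl,ijl->ij', diff, w, diff)
    # Clip tiny negative values caused by rounding before taking the root.
    return np.sqrt(np.maximum(squared, 0.0))

rng = np.random.RandomState(0)
points = rng.rand(10, 3)           # 10 samples, 3 features
distances = mahalanobis_sketch(points)
```

Passing an identity matrix as w reduces the result to the ordinary Euclidean distance, which is a convenient sanity check.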

squared_euclidean_distance(data1, data2=None, weight=None)

Compute weighted Euclidean distance matrix between two datasets.
Parameters:
  • data1 (N.ndarray) - first dataset
  • data2 (N.ndarray) - second dataset. If None, compute the Euclidean distance between the first dataset and itself. (Defaults to None)
  • weight (N.ndarray) - vector of weights, each one associated to each dimension of the dataset (Defaults to None)
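
One common way to compute such a matrix without an explicit loop is to expand the squared difference. The sketch below is illustrative only: the name sq_euclidean_sketch is hypothetical, and it assumes that each weight multiplies the squared difference of its dimension, which may differ from the convention the module actually uses.

```python
import numpy as np

def sq_euclidean_sketch(data1, data2=None, weight=None):
    """Weighted squared Euclidean distances between rows of data1 and data2."""
    data1 = np.asarray(data1, dtype=float)
    data2 = data1 if data2 is None else np.asarray(data2, dtype=float)
    if weight is None:
        weight = np.ones(data1.shape[1])
    weight = np.asarray(weight, dtype=float)
    # sum_k w_k (a_k - b_k)^2 = sum_k w_k a_k^2 + sum_k w_k b_k^2 - 2 sum_k w_k a_k b_k
    a2 = (weight * data1 ** 2).sum(axis=1)[:, None]
    b2 = (weight * data2 ** 2).sum(axis=1)[None, :]
    cross = np.dot(data1 * weight, data2.T)
    # Clip tiny negative values that can appear through rounding.
    return np.maximum(a2 + b2 - 2.0 * cross, 0.0)

rng = np.random.RandomState(0)
first = rng.rand(4, 3)
second = rng.rand(6, 3)
weights = np.array([1.0, 2.0, 0.5])
result = sq_euclidean_sketch(first, second, weights)   # shape (4, 6)
```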

oneMinusCorrelation(X, Y)

Return one minus the correlation matrix between the rows of two matrices.

This function computes a matrix of correlations between all pairs of rows of two matrices. Unlike NumPy's corrcoef(), this function only considers pairs across matrices and not within, i.e. the two elements of a pair never come from the same source matrix.

Both arrays need to have the same number of columns.

Example:

>>> X = N.random.rand(20,80)
>>> Y = N.random.rand(5,80)
>>> C = oneMinusCorrelation(X, Y)
>>> print(C.shape)
(20, 5)

Parameters:
  • X - 2D array
  • Y - 2D array
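
The cross-matrix behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: the name one_minus_correlation_sketch is hypothetical, and the Pearson correlation is assumed.

```python
import numpy as np

def one_minus_correlation_sketch(X, Y):
    """1 - Pearson correlation between every row of X and every row of Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.shape[1] != Y.shape[1]:
        raise ValueError("X and Y need the same number of columns")
    # Center each row, then scale it to unit length.
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    Xn = Xc / np.sqrt((Xc ** 2).sum(axis=1, keepdims=True))
    Yn = Yc / np.sqrt((Yc ** 2).sum(axis=1, keepdims=True))
    # Dot products of unit-length centered rows are the correlations.
    return 1.0 - np.dot(Xn, Yn.T)

rng = np.random.RandomState(0)
X = rng.rand(20, 80)
Y = rng.rand(5, 80)
C = one_minus_correlation_sketch(X, Y)   # shape (20, 5)
```

The same values appear in the off-diagonal block of numpy.corrcoef(X, Y), which additionally computes the within-matrix correlations that this function leaves out.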

pnorm_w_python(data1, data2=None, weight=None, p=2, heuristic='auto', use_sq_euclidean=True)

Weighted p-norm between two datasets (pure Python implementation)

||x - x'||_w = (sum_{i=1...N} (w_i*|x_i - x'_i|)**p)**(1/p)

Parameters:
  • data1 (N.ndarray) - First dataset
  • data2 (N.ndarray or None) - Optional second dataset
  • weight (N.ndarray or None) - Optional weights per 2nd dimension (features)
  • p - Power
  • heuristic (basestring) -
    Which heuristic to use:
    • 'samples' -- python sweep over 0th dim
    • 'features' -- python sweep over 1st dim
    • 'auto' -- decide automatically: if the number of features (shape[1]) is much larger than the number of samples (shape[0]), use 'samples'; otherwise use 'features'
  • use_sq_euclidean (bool) - Whether to use squared_euclidean_distance_matrix for the computation if p==2
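
The formula above can be written directly with NumPy broadcasting. This is only a sketch of the formula, not the module's implementation: the name pnorm_w_sketch is hypothetical, and the heuristic and use_sq_euclidean options are deliberately left out.

```python
import numpy as np

def pnorm_w_sketch(data1, data2=None, weight=None, p=2):
    """(sum_i (w_i * |x_i - x'_i|)**p)**(1/p) for every pair of rows."""
    data1 = np.asarray(data1, dtype=float)
    data2 = data1 if data2 is None else np.asarray(data2, dtype=float)
    if weight is None:
        weight = np.ones(data1.shape[1])
    weight = np.asarray(weight, dtype=float)
    # diff[i, j, k] = |data1[i, k] - data2[j, k]|
    diff = np.abs(data1[:, None, :] - data2[None, :, :])
    return ((weight * diff) ** p).sum(axis=2) ** (1.0 / p)

a = np.array([[0.0, 0.0]])
b = np.array([[3.0, 4.0]])
euclidean = pnorm_w_sketch(a, b, p=2)                          # [[5.]]
manhattan = pnorm_w_sketch(a, b, p=1)                          # [[7.]]
weighted = pnorm_w_sketch(a, b, weight=np.array([2.0, 1.0]), p=1)  # [[10.]]
```

Note that the broadcast builds an array of shape (samples1, samples2, features), so this form is only practical for small datasets.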

pnorm_w(data1, data2=None, weight=None, p=2, heuristic='auto', use_sq_euclidean=True)

Weighted p-norm between two datasets (pure Python implementation)

||x - x'||_w = (sum_{i=1...N} (w_i*|x_i - x'_i|)**p)**(1/p)

Parameters:
  • data1 (N.ndarray) - First dataset
  • data2 (N.ndarray or None) - Optional second dataset
  • weight (N.ndarray or None) - Optional weights per 2nd dimension (features)
  • p - Power
  • heuristic (basestring) -
    Which heuristic to use:
    • 'samples' -- python sweep over 0th dim
    • 'features' -- python sweep over 1st dim
    • 'auto' -- decide automatically: if the number of features (shape[1]) is much larger than the number of samples (shape[0]), use 'samples'; otherwise use 'features'
  • use_sq_euclidean (bool) - Whether to use squared_euclidean_distance_matrix for the computation if p==2
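
To make the two heuristics concrete, the sketch below shows one possible reading of them: a Python loop over one dimension with NumPy vectorization over the rest. The function names are hypothetical and the interpretation is an assumption, not a description of the module's code. Both loops return the same matrix and differ only in how many Python-level iterations they need.

```python
import numpy as np

def pnorm_sweep_samples(data1, data2, weight, p):
    """Python loop over the samples (0th dim) of data1."""
    out = np.empty((data1.shape[0], data2.shape[0]))
    for i in range(data1.shape[0]):
        # Distances from sample i of data1 to all samples of data2 at once.
        out[i] = ((weight * np.abs(data1[i] - data2)) ** p).sum(axis=1)
    return out ** (1.0 / p)

def pnorm_sweep_features(data1, data2, weight, p):
    """Python loop over the features (1st dim)."""
    acc = np.zeros((data1.shape[0], data2.shape[0]))
    for k in range(data1.shape[1]):
        # Contribution of feature k to every pair of samples.
        col_diff = np.abs(data1[:, k][:, None] - data2[:, k][None, :])
        acc += (weight[k] * col_diff) ** p
    return acc ** (1.0 / p)

rng = np.random.RandomState(0)
d1 = rng.rand(5, 40)
d2 = rng.rand(7, 40)
w = rng.rand(40)
by_samples = pnorm_sweep_samples(d1, d2, w, 3)
by_features = pnorm_sweep_features(d1, d2, w, 3)
```

Under this reading, the 'auto' rule amounts to looping in Python over the smaller dimension: with many features and few samples, the loop over samples runs fewer iterations.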