matchDistribution(data,
nsamples=None,
loc=None,
scale=None,
args=None,
test='kstest',
distributions=None,
**kwargs)
Determine best matching distribution.
Can be used for 'smelling' the data, as well as for choosing a
parametric distribution for data obtained from non-parametric
testing (e.g. `MCNullDist`).
WiP: use with caution, API might change
:Parameters:
data : N.ndarray
Array of the data for which to deduce the distribution. It must
be large enough to support a reliable conclusion.
nsamples : int or None
If None, all samples in data are used to estimate the parametric
distribution. Otherwise, only the specified number of samples,
randomly selected from data, is used.
loc : float or None
Location parameter of the distribution (if known)
scale : float or None
Scale parameter of the distribution (if known)
test : basestring
What kind of testing to do. Choices:
'p-roc' : detection power for a given ROC. Takes two additional
parameters: `p=0.05` and `tail='both'`
'kstest' : 'full-body' distribution comparison. The best
match is the one with the minimal reported distance after the
parameters of the distribution have been estimated. Parameter
`p=0.05` sets the threshold for rejecting the null hypothesis
that the distributions are the same.
WARNING: older versions of scipy (e.g. 0.5.2 in Debian etch) have
an incorrect kstest implementation and do not function
properly
distributions : None or list of basestring or tuple(basestring, dict)
Distributions to check. If None, all distributions known in
scipy.stats are tested. If a distribution is specified as a tuple,
it must contain the distribution name and a dictionary of
additional parameters (name, loc, scale, args). The entry 'scipy'
adds all distributions known in scipy.stats.
**kwargs
Additional arguments needed by the particular test
(see above)
:Example:
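import numpy as N

# 1000 samples drawn from a standard normal distribution
data = N.random.normal(size=(1000, 1))
matches = matchDistribution(
    data,
    distributions=['rdist',
                   # same distribution again, with loc and args fixed
                   ('rdist', {'name': 'rdist_fixed',
                              'loc': 0.0,
                              'args': (10,)})],
    nsamples=30, test='p-roc', p=0.05)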
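
To make the 'kstest' selection rule described above more concrete, the following is a minimal standalone sketch written directly against scipy.stats. It is not the implementation of matchDistribution: the helper name `best_ks_match`, its candidate list, and the shape of its return value are hypothetical and chosen only for illustration. It assumes that each candidate is fitted to the data, that candidates whose KS p-value falls at or below `p` are discarded, and that the remaining ones are ranked by the reported KS distance.

```python
import numpy as np
from scipy import stats


def best_ks_match(data, names=("norm", "expon", "uniform"), p=0.05):
    """Rank candidate scipy.stats distributions by KS distance (illustrative only)."""
    data = np.asarray(data, dtype=float).ravel()
    results = []
    for name in names:
        dist = getattr(stats, name)
        # Estimate the distribution parameters (shape args, loc, scale) from the data
        params = dist.fit(data)
        # Compare the data against the fitted distribution
        D, pvalue = stats.kstest(data, name, args=params)
        # Keep only candidates for which the null hypothesis is not rejected
        if pvalue > p:
            results.append((D, pvalue, name, params))
    # Smallest KS distance first
    results.sort(key=lambda r: r[0])
    return results


rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=3.0, size=1000)
matches = best_ks_match(sample)
```

Here `matches[0]` holds the closest candidate by KS distance. Because the parameters are estimated from the same data that is then tested, the reported p-values tend to be optimistic, which is one more reason to supply sufficiently large data.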