skdim.id.MLE

class skdim.id.MLE(dnoise=None, sigma=0, n=None, integral_approximation='Haro', unbiased=False, neighborhood_based=True, K=5)

Intrinsic dimension estimation using the Maximum Likelihood algorithm. [Haro2008] [Hill1975] [Levina2005] [IDJohnsson]

The estimators are based on the referenced paper by Haro et al. (2008) and assume that the data lie on a single manifold. The estimator from that paper is obtained with the default parameters and dnoise = ‘dnoiseGaussH’.

With integral_approximation = ‘Haro’, the Taylor expansion approximation of r^(m-1) used by Haro et al. (2008) is employed.

With integral_approximation = ‘guaranteed.convergence’, r is factored out and kept, and r^(m-2) is approximated with the corresponding Taylor expansion. This guarantees convergence of the integrals. Divergence might be an issue when the noise is not sufficiently small in comparison to the smallest distances.

With integral_approximation = ‘iteration’, five iterations are used to determine m.
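To make the underlying idea concrete, here is a minimal pure-Python sketch of the noise-free case (sigma=0) of the Levina and Bickel pointwise estimator [Levina2005]: at a point x with sorted nearest-neighbor distances T_1(x) ≤ … ≤ T_k(x), the local estimate is m̂(x) = [ (1/(k-1)) Σ_{j=1}^{k-1} log(T_k(x)/T_j(x)) ]^(-1). It is not the skdim implementation: Haro's noise model and the integral approximations described above are not reproduced, and the helpers knn_distances and local_mle are hypothetical names chosen for illustration.

```python
import math
import random


def knn_distances(points, k):
    """Sorted distances from each point to its k nearest neighbors.

    Brute force, O(n^2); the point itself is left out of its own list.
    """
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result


def local_mle(dists):
    """Levina-Bickel MLE at one point from its sorted kNN distances T_1..T_k.

    Assumes all distances are strictly positive (no duplicate points).
    """
    k = len(dists)
    t_k = dists[-1]
    log_ratios = sum(math.log(t_k / t_j) for t_j in dists[:-1])
    return (k - 1) / log_ratios


random.seed(0)
# 400 points drawn uniformly from a 2-D square embedded in 3-D (z = 0)
plane = [(random.random(), random.random(), 0.0) for _ in range(400)]
pw = [local_mle(d) for d in knn_distances(plane, k=20)]
print(sum(pw) / len(pw))  # expected to be roughly 2
```

The average is only roughly 2: points near the border of the square see fewer neighbors than an interior point would, and the estimator carries a small finite-k bias.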

Parameters:
dnoise : None or 'dnoiseGaussH'

Vector-valued function giving the transition density. ‘dnoiseGaussH’ is the one used in Haro et al. (2008).

sigma : float, default=0

Estimated standard deviation for the noise.

n : int, default=None

Dimension of the noise (at least data.shape[1])

integral_approximation : str, default='Haro'

Can take values ‘Haro’, ‘guaranteed.convergence’, ‘iteration’

unbiased : bool, default=False

Whether to correct bias or not

neighborhood_based : bool, default=True

If True, the estimation is made for each neighborhood; otherwise, it is based on distances in the entire data set.

comb : str, default='mle'

How to aggregate the pointwise estimates. Possible values: ‘mle’, ‘mean’, ‘median’.

K : int, default=5

Number of neighbors considered per data point; only used when neighborhood_based=False.
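The comb option chooses how pointwise estimates are reduced to one number; the fit signature below also takes comb='mle'. The following sketch shows the three choices on a plain list. The helper combine is hypothetical, and reading ‘mle’ as the harmonic mean (invert the local estimates, average them, invert back) is an assumption about the combination rule, not something stated on this page.

```python
import statistics


def combine(pointwise, comb="mle"):
    """Aggregate pointwise ID estimates into a single global estimate.

    Assumption: 'mle' is the harmonic mean of the pointwise estimates.
    """
    if comb == "mle":
        return statistics.harmonic_mean(pointwise)
    if comb == "mean":
        return statistics.fmean(pointwise)
    if comb == "median":
        return statistics.median(pointwise)
    raise ValueError(f"unknown comb value: {comb!r}")


estimates = [1.0, 2.0, 4.0]
for method in ("mle", "mean", "median"):
    print(method, combine(estimates, method))
```

For positive values the harmonic mean is never larger than the arithmetic mean, so this reading of ‘mle’ is less sensitive to a few unusually large local estimates.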

Methods

fit(X[, y, precomputed_knn_arrays, smooth, …]) Fitting method for local ID estimators.
fit_once(X)
fit_predict(X[, y, precomputed_knn_arrays, …]) Fit-predict method for local ID estimators.
fit_transform(X[, y, …]) Fit-transform method for local ID estimators
fit_transform_pw(X[, …]) Returns an array of pointwise ID estimates by fitting the estimator in kNN of each point.
get_params([deep]) Get parameters for this estimator.
set_params(**params) Set the parameters of this estimator.
transform([X]) Predict ID after a previous call to self.fit
transform_pw([X]) Return an array of pointwise ID estimates after a previous call to self.fit_pw
fit(X, y=None, precomputed_knn_arrays=None, smooth=False, n_neighbors=None, comb='mle', n_jobs=1)

Fitting method for local ID estimators.

Parameters:
X : {array-like}, shape (n_samples, n_features)

The training input samples.

y : dummy parameter to respect the sklearn API

precomputed_knn_arrays : tuple[ np.array (n_samples x n_neighbors), np.array (n_samples x n_neighbors) ]

Provide two precomputed arrays: (sorted nearest neighbor distances, sorted nearest neighbor indices)

n_neighbors : int, default=self._N_NEIGHBORS

Number of nearest neighbors to use (ignored when precomputed_knn_arrays is provided).

n_jobs : int

Number of processes

smooth : bool, default=False

Additionally computes a smoothed version of the pointwise estimates (self.dimension_pw_) by taking the ID of a point as the average ID of the points in its neighborhood; the result is stored in self.dimension_pw_smooth_.

Returns:

self (object) – Returns self.
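The precomputed_knn_arrays argument expects a (distances, indices) pair, each row sorted by increasing distance. The sketch below builds that pair by brute force in pure Python. The helper precompute_knn is hypothetical, and leaving each point out of its own neighbor list is an assumption about what the estimator expects.

```python
import math


def precompute_knn(points, n_neighbors):
    """Return (distances, indices), each n_samples x n_neighbors, row-sorted.

    Assumption: a point is not counted as its own neighbor.
    """
    distances, indices = [], []
    for i, p in enumerate(points):
        # (distance, index) pairs sort by distance, ties broken by index
        ranked = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )[:n_neighbors]
        distances.append([d for d, _ in ranked])
        indices.append([j for _, j in ranked])
    return distances, indices


d, idx = precompute_knn([(0.0,), (1.0,), (3.0,), (6.0,)], n_neighbors=2)
print(d)    # [[1.0, 3.0], [1.0, 2.0], [2.0, 3.0], [3.0, 5.0]]
print(idx)  # [[1, 2], [0, 2], [1, 0], [2, 1]]
```

Convert both lists with np.asarray before passing them, e.g. fit(X, precomputed_knn_arrays=(np.asarray(d), np.asarray(idx))). If scikit-learn is available, NearestNeighbors(n_neighbors=k).fit(X).kneighbors() returns the same kind of pair, and when called without a query argument it also leaves each point out of its own neighbor list.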
fit_predict(X, y=None, precomputed_knn_arrays=None, smooth=False, n_neighbors=None, comb='mle', n_jobs=1)

Fit-predict method for local ID estimators.

Parameters:
X : {array-like}, shape (n_samples, n_features)

The training input samples.

y : dummy parameter to respect the sklearn API

precomputed_knn_arrays : tuple[ np.array (n_samples x n_neighbors), np.array (n_samples x n_neighbors) ]

Provide two precomputed arrays: (sorted nearest neighbor distances, sorted nearest neighbor indices)

n_neighbors : int, default=self._N_NEIGHBORS

Number of nearest neighbors to use (ignored when precomputed_knn_arrays is provided).

n_jobs : int

Number of processes

smooth : bool, default=False

Additionally computes a smoothed version of the pointwise estimates (self.dimension_pw_) by taking the ID of a point as the average ID of the points in its neighborhood; the result is stored in self.dimension_pw_smooth_.

Returns:

dimension_ ({int, float}) – The estimated intrinsic dimension
fit_transform(X, y=None, precomputed_knn_arrays=None, smooth=False, n_neighbors=None, comb='mean', n_jobs=1)

Fit-transform method for local ID estimators

Parameters:
X : {array-like}, shape (n_samples, n_features)

The training input samples.

y : dummy parameter to respect the sklearn API

precomputed_knn_arrays : tuple[ np.array (n_samples x n_neighbors), np.array (n_samples x n_neighbors) ]

Provide two precomputed arrays: (sorted nearest neighbor distances, sorted nearest neighbor indices)

n_neighbors : int, default=self._N_NEIGHBORS

Number of nearest neighbors to use (ignored when precomputed_knn_arrays is provided).

n_jobs : int

Number of processes

smooth : bool, default=False

Additionally computes a smoothed version of the pointwise estimates (self.dimension_pw_) by taking the ID of a point as the average ID of the points in its neighborhood; the result is stored in self.dimension_pw_smooth_.

Returns:

dimension_ ({int, float}) – The estimated intrinsic dimension

fit_transform_pw(X, precomputed_knn_arrays=None, smooth=False, n_neighbors=None, n_jobs=1)

Returns an array of pointwise ID estimates by fitting the estimator in kNN of each point.

Parameters:
X : np.array (n_samples x n_features)

Dataset to fit

precomputed_knn_arrays : tuple[ np.array (n_samples x n_neighbors), np.array (n_samples x n_neighbors) ]

Provide two precomputed arrays: (sorted nearest neighbor distances, sorted nearest neighbor indices)

n_neighbors : int, default=self._N_NEIGHBORS

Number of nearest neighbors to use (ignored when precomputed_knn_arrays is provided).

n_jobs : int

Number of processes

smooth : bool, default=False

Additionally computes a smoothed version of the pointwise estimates (self.dimension_pw_) by taking the ID of a point as the average ID of the points in its neighborhood; the result is stored in self.dimension_pw_smooth_.

Returns:

  • dimension_pw (np.array) – Pointwise ID estimates
  • dimension_pw_smooth (np.array) – If smooth is True, additionally returns smoothed pointwise ID estimates
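The smoothing step behind the smooth option can be sketched in a few lines, given pointwise estimates and the neighbor indices of each point. The helper smooth_pointwise is hypothetical, and including the point itself in its own neighborhood average is an assumption, not something this page states.

```python
def smooth_pointwise(dimension_pw, knn_indices):
    """Replace each pointwise ID by the mean ID over its neighborhood.

    Assumption: the neighborhood is the point itself plus its kNN.
    """
    smoothed = []
    for i, neighbors in enumerate(knn_indices):
        ids = [dimension_pw[i]] + [dimension_pw[j] for j in neighbors]
        smoothed.append(sum(ids) / len(ids))
    return smoothed


pw = [1.0, 2.0, 3.0]
neighbors = [[1], [0], [1]]  # one nearest neighbor per point
print(smooth_pointwise(pw, neighbors))  # [1.5, 1.5, 2.5]
```

Averaging over neighborhoods trades spatial resolution for lower variance: isolated outlying local estimates are pulled toward the values of nearby points.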

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict) – Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:

self (estimator instance) – Estimator instance.

transform(X=None)

Predict ID after a previous call to self.fit

Parameters:
X : Dummy parameter

Returns:

dimension_ ({int, float}) – The estimated ID

transform_pw(X=None)

Return an array of pointwise ID estimates after a previous call to self.fit_pw

Parameters:
X : Dummy parameter

Returns:

  • dimension_pw (np.array) – Pointwise ID estimates
  • dimension_pw_smooth (np.array) – If self.fit_pw(smooth=True), additionally returns smoothed pointwise ID estimates