skdim.id.ESS

class skdim.id.ESS(ver='a', d=1, random_state=None)[source]

Intrinsic dimension estimation using the Expected Simplex Skewness (ESS) algorithm. [Johnsson2015] [IDJohnsson] The ESS method assumes that the data is local, i.e. that it is a neighborhood taken from a larger data set, such that the curvature and the noise within the neighborhood are relatively small. In the ideal case (no noise, no curvature) this is equivalent to the data being uniformly distributed over a hyperball.
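The ideal case described above can be simulated directly. The following is a minimal sketch, using only the standard library, that draws points uniformly from a d-dimensional unit ball embedded in a higher-dimensional space. The helper name sample_hyperball and its arguments are hypothetical and are not part of skdim.

```python
import math
import random


def sample_hyperball(n_points, d, ambient_dim, seed=0):
    """Draw points uniformly from the unit d-ball, zero-padded to ambient_dim."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_points):
        # A Gaussian vector has a uniformly distributed direction.
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        # A radius of u**(1/d) makes the density uniform in volume.
        r = rng.random() ** (1.0 / d)
        coords = [r * v / norm for v in g]
        # Embed in the ambient space: the extra coordinates stay at zero.
        points.append(coords + [0.0] * (ambient_dim - d))
    return points


# A noise-free, curvature-free patch of intrinsic dimension 3 in 10 dimensions.
patch = sample_hyperball(n_points=500, d=3, ambient_dim=10)
```

A patch like this has 10 features but intrinsic dimension 3, which is the situation a local estimator is meant to recover.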

Parameters:
ver : str, 'a' or 'b'

See Johnsson et al. (2015).

d : int, default=1

For ver='a', any value of d is possible; for ver='b', only d=1 is supported.

Attributes:
ess_ : float

The Expected Simplex Skewness value.

Methods

fit(X[, y, precomputed_knn_arrays, smooth, …]) Fitting method for local ID estimators
fit_once(X[, y]) Fit ESS on a single neighborhood.
fit_transform(X[, y, …]) Fit-transform method for local ID estimators
fit_transform_pw(X[, …]) Returns an array of pointwise ID estimates by fitting the estimator in kNN of each point.
get_params([deep]) Get parameters for this estimator.
set_params(**params) Set the parameters of this estimator.
transform([X]) Predict ID after a previous call to self.fit
transform_pw([X]) Return an array of pointwise ID estimates after a previous call to self.fit_pw
fit(X, y=None, precomputed_knn_arrays=None, smooth=False, n_neighbors=None, comb='mean', n_jobs=1)

Fitting method for local ID estimators

Parameters:
X : {array-like}, shape (n_samples, n_features)

The training input samples.

y : dummy parameter to respect the sklearn API

precomputed_knn_arrays : tuple[ np.array (n_samples x n_dims), np.array (n_samples x n_dims) ]

Provide two precomputed arrays: (sorted nearest neighbor distances, sorted nearest neighbor indices)

n_neighbors : int, default=self._N_NEIGHBORS

Number of nearest neighbors to use (ignored when precomputed_knn_arrays is provided).

n_jobs : int

Number of processes

smooth : bool, default = False

Additionally computes a smoothed version of the pointwise estimates by taking the ID of a point as the average ID of the points in its neighborhood (self.dimension_pw_smooth_).

Returns:

self (object) – Returns self.
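The precomputed_knn_arrays parameter expects a pair of sorted nearest-neighbor distances and indices. Below is a minimal brute-force sketch of how such a pair could be built. The function name knn_arrays is hypothetical, and excluding each point from its own neighbor list is an assumption about the expected layout. The sketch returns plain lists; since the documented type is np.array, they would presumably need converting before being passed to fit.

```python
import math


def knn_arrays(X, n_neighbors):
    """Return (distances, indices) of the n_neighbors nearest points, sorted by distance."""
    dists, idxs = [], []
    for i, xi in enumerate(X):
        # Distance from point i to every other point (the point itself is skipped).
        row = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        nearest = row[:n_neighbors]
        dists.append([d for d, _ in nearest])
        idxs.append([j for _, j in nearest])
    return dists, idxs


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
dists, idxs = knn_arrays(X, n_neighbors=2)
```

Each row of both outputs has n_neighbors entries, ordered from nearest to farthest.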

fit_once(X, y=None)[source]

Fit ESS on a single neighborhood.

Warning: not meant to be used on a complete dataset. X should be a local patch of a dataset; otherwise call .fit().

Parameters:
X : {array-like}, shape (n_samples, n_features)

The training input samples. Should be a local patch of a dataset.

y : dummy parameter to respect the sklearn API

Returns:

self (object) – Returns self.

fit_transform(X, y=None, precomputed_knn_arrays=None, smooth=False, n_neighbors=None, comb='mean', n_jobs=1)

Fit-transform method for local ID estimators

Parameters:
X : {array-like}, shape (n_samples, n_features)

The training input samples.

y : dummy parameter to respect the sklearn API

precomputed_knn_arrays : tuple[ np.array (n_samples x n_dims), np.array (n_samples x n_dims) ]

Provide two precomputed arrays: (sorted nearest neighbor distances, sorted nearest neighbor indices)

n_neighbors : int, default=self._N_NEIGHBORS

Number of nearest neighbors to use (ignored when precomputed_knn_arrays is provided).

n_jobs : int

Number of processes

smooth : bool, default = False

Additionally computes a smoothed version of the pointwise estimates by taking the ID of a point as the average ID of the points in its neighborhood (self.dimension_pw_smooth_).

Returns:

dimension_ ({int, float}) – The estimated intrinsic dimension

fit_transform_pw(X, precomputed_knn_arrays=None, smooth=False, n_neighbors=None, n_jobs=1)

Returns an array of pointwise ID estimates by fitting the estimator in kNN of each point.

Parameters:
X : np.array (n_samples x n_features)

Dataset to fit

precomputed_knn_arrays : tuple[ np.array (n_samples x n_dims), np.array (n_samples x n_dims) ]

Provide two precomputed arrays: (sorted nearest neighbor distances, sorted nearest neighbor indices)

n_neighbors : int, default=self._N_NEIGHBORS

Number of nearest neighbors to use (ignored when precomputed_knn_arrays is provided).

n_jobs : int

Number of processes

smooth : bool, default = False

Additionally computes a smoothed version of the pointwise estimates by taking the ID of a point as the average ID of the points in its neighborhood (self.dimension_pw_smooth_).

Returns:

  • dimension_pw (np.array) – Pointwise ID estimates
  • dimension_pw_smooth (np.array) – If smooth is True, additionally returns smoothed pointwise ID estimates
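The smoothing step described for the smooth parameter amounts to a neighborhood average of the pointwise estimates. The following is a minimal sketch of that idea, not the library's implementation. The function name smooth_pointwise is hypothetical, and counting a point as a member of its own neighborhood is an assumption.

```python
def smooth_pointwise(dimension_pw, neighbor_idx):
    """Average each point's estimate with the estimates of its neighbors."""
    smoothed = []
    for i, neighbors in enumerate(neighbor_idx):
        # Assumption: the point itself is included in its own neighborhood.
        members = [i] + list(neighbors)
        smoothed.append(sum(dimension_pw[j] for j in members) / len(members))
    return smoothed


# Four pointwise estimates; the last one is an outlier.
dimension_pw = [2.0, 2.2, 1.8, 5.0]
neighbor_idx = [[1, 2], [0, 2], [0, 1], [1, 2]]
dimension_pw_smooth = smooth_pointwise(dimension_pw, neighbor_idx)
```

Averaging pulls the outlying estimate of the last point toward the values of its neighbors, which is the practical reason to request smoothed output.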

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict) – Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter>, so that each component of a nested object can be updated.

Parameters:
**params : dict

Estimator parameters.

Returns:

self (estimator instance) – Estimator instance.
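The <component>__<parameter> naming convention can be illustrated with a small standalone sketch. This is not scikit-learn's implementation; the function name split_nested_params and the component name "estimator" are hypothetical and serve only to show how such keys decompose.

```python
def split_nested_params(params):
    """Group <component>__<parameter> keys by component; plain keys go under ''."""
    grouped = {}
    for key, value in params.items():
        # Split on the first double underscore only, so deeper nesting is preserved.
        component, sep, sub_key = key.partition("__")
        if sep:
            grouped.setdefault(component, {})[sub_key] = value
        else:
            grouped.setdefault("", {})[key] = value
    return grouped


grouped = split_nested_params({"ver": "a", "estimator__d": 2, "estimator__ver": "b"})
```

Here the plain key ver applies to the outer object, while the two prefixed keys are routed to the component named "estimator".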

transform(X=None)

Predict ID after a previous call to self.fit

Parameters:
X : Dummy parameter

Returns:

dimension_ ({int, float}) – The estimated ID

transform_pw(X=None)

Return an array of pointwise ID estimates after a previous call to self.fit_pw

Parameters:
X : Dummy parameter

Returns:

  • dimension_pw (np.array) – Pointwise ID estimates
  • dimension_pw_smooth (np.array) – If self.fit_pw(smooth=True), additionally returns smoothed pointwise ID estimates