skdim.id.MLE
-
class skdim.id.MLE(dnoise=None, sigma=0, n=None, integral_approximation='Haro', unbiased=False, neighborhood_based=True, K=5)
Intrinsic dimension estimation using the Maximum Likelihood algorithm. [Haro2008] [Hill1975] [Levina2005] [IDJohnsson]
The estimators follow Haro et al. (2008) and assume that the data lie on a single manifold. The estimator from that paper is obtained with the default parameters and dnoise='dnoiseGaussH'.
With integral_approximation='Haro', the Taylor expansion approximation of r^(m-1) used by Haro et al. (2008) is employed.
With integral_approximation='guaranteed.convergence', r is factored out and kept, and r^(m-2) is approximated with the corresponding Taylor expansion. This guarantees convergence of the integrals; divergence can be an issue when the noise is not sufficiently small compared with the smallest distances.
With integral_approximation='iteration', five iterations are used to determine m.
Parameters: - dnoise : None or 'dnoiseGaussH'
Vector-valued function giving the transition density. 'dnoiseGaussH' is the one used in Haro et al. (2008).
- sigma : float, default=0
Estimated standard deviation for the noise.
- n : int, default=None
Dimension of the noise (at least X.shape[1])
- integral_approximation : str, default='Haro'
Can take the values 'Haro', 'guaranteed.convergence', 'iteration'
- unbiased : bool, default=False
Whether to correct bias or not
- neighborhood_based : bool, default=True
If True, an estimate is made for each neighborhood; otherwise the estimation is based on distances in the entire data set.
- comb : str, default='mle'
How to aggregate the pointwise estimates. Possible values ‘mle’, ‘mean’, ‘median’
- K : int, default=5
Number of neighbors considered per data point; only used when neighborhood_based=False
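To make the roles of unbiased and comb more concrete, here is a minimal sketch of the noise-free pointwise maximum likelihood estimator from [Levina2005], followed by the three aggregation modes. It is not the library's implementation: the helper names (knn_distances, levina_bickel_pointwise, aggregate) are hypothetical, the bias correction is assumed to replace k - 1 with k - 2, and 'mle' is assumed to mean averaging the inverse pointwise estimates. Only the Python standard library is used so that the sketch runs without dependencies.

```python
import math
import random
import statistics


def knn_distances(points, k):
    """For each point, return the sorted distances to its k nearest neighbors."""
    rows = []
    for i, p in enumerate(points):
        # Assumption: a point is not counted as its own neighbor.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        rows.append(dists[:k])
    return rows


def levina_bickel_pointwise(dists, unbiased=False):
    """Noise-free local dimension estimate from one row of sorted kNN distances."""
    k = len(dists)
    r_k = dists[-1]
    # Sum of log(T_k / T_j) for j = 1..k-1 (the j = k term would be zero).
    log_sum = sum(math.log(r_k / r) for r in dists[:-1])
    # Assumption: the bias correction replaces k - 1 with k - 2.
    numerator = (k - 2) if unbiased else (k - 1)
    return numerator / log_sum


def aggregate(pointwise, comb="mle"):
    """Combine pointwise estimates into a single global estimate."""
    if comb == "mean":
        return statistics.fmean(pointwise)
    if comb == "median":
        return statistics.median(pointwise)
    if comb == "mle":
        # Assumption: 'mle' averages the inverse estimates (a harmonic mean).
        return 1.0 / statistics.fmean([1.0 / m for m in pointwise])
    raise ValueError(f"unknown comb: {comb!r}")


# Toy data: a 2-dimensional plane linearly embedded in 5 dimensions.
random.seed(0)
points = []
for _ in range(200):
    u, v = random.random(), random.random()
    points.append((u, v, u + v, u - v, 2.0 * u))

pointwise = [levina_bickel_pointwise(row) for row in knn_distances(points, k=10)]
estimate = aggregate(pointwise, comb="mle")
```

On this toy data the global estimate should land near the true dimension of 2, although boundary effects and the small sample size introduce some error.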
Methods
fit(X[, y, precomputed_knn_arrays, smooth, …]) – Fitting method for local ID estimators
fit_once(X)
fit_predict(X[, y, precomputed_knn_arrays, …]) – Fit-predict method for local ID estimators
fit_transform(X[, y, …]) – Fit-transform method for local ID estimators
fit_transform_pw(X[, …]) – Returns an array of pointwise ID estimates by fitting the estimator in kNN of each point.
get_params([deep]) – Get parameters for this estimator.
set_params(**params) – Set the parameters of this estimator.
transform([X]) – Predict ID after a previous call to self.fit
transform_pw([X]) – Return an array of pointwise ID estimates after a previous call to self.fit_pw
-
fit(X, y=None, precomputed_knn_arrays=None, smooth=False, n_neighbors=None, comb='mle', n_jobs=1)
Fitting method for local ID estimators
Parameters: - X : {array-like}, shape (n_samples, n_features)
The training input samples.
- y : dummy parameter to respect the sklearn API
- precomputed_knn_arrays : tuple[ np.array (n_samples x n_dims), np.array (n_samples x n_dims) ]
Provide two precomputed arrays: (sorted nearest neighbor distances, sorted nearest neighbor indices)
- n_neighbors : int, default=self._N_NEIGHBORS
Number of nearest neighbors to use (ignored when using precomputed_knn_arrays)
- n_jobs : int
Number of processes
- smooth : bool, default=False
Additionally computes a smoothed version of pointwise estimates by taking the ID of a point as the average ID of each point in its neighborhood (self.dimension_pw_smooth_)
Returns: self (object) – Returns self.
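The layout expected for precomputed_knn_arrays (sorted distances paired with the matching neighbor indices) can be illustrated with a brute-force sketch. The helper brute_force_knn is hypothetical, and it assumes that a point is not listed as its own neighbor and that each row is ordered nearest first. It returns plain lists; in practice these would be converted to NumPy arrays before being passed to the estimator.

```python
import math


def brute_force_knn(points, k):
    """Return (distances, indices); row i lists the k nearest neighbors of point i."""
    all_dists, all_idx = [], []
    for i, p in enumerate(points):
        # Assumption: a point is excluded from its own neighbor list.
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        all_dists.append([d for d, _ in ranked[:k]])
        all_idx.append([j for _, j in ranked[:k]])
    return all_dists, all_idx


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
knn_dists, knn_idx = brute_force_knn(points, k=2)
```

Both outputs have one row per sample and one column per neighbor, so entry (i, j) of the index array names the point whose distance appears at entry (i, j) of the distance array.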
-
fit_predict(X, y=None, precomputed_knn_arrays=None, smooth=False, n_neighbors=None, comb='mle', n_jobs=1)
Fit-predict method for local ID estimators
Parameters: - X : {array-like}, shape (n_samples, n_features)
The training input samples.
- y : dummy parameter to respect the sklearn API
- precomputed_knn_arrays : tuple[ np.array (n_samples x n_dims), np.array (n_samples x n_dims) ]
Provide two precomputed arrays: (sorted nearest neighbor distances, sorted nearest neighbor indices)
- n_neighbors : int, default=self._N_NEIGHBORS
Number of nearest neighbors to use (ignored when using precomputed_knn_arrays)
- n_jobs : int
Number of processes
- smooth : bool, default=False
Additionally computes a smoothed version of pointwise estimates by taking the ID of a point as the average ID of each point in its neighborhood (self.dimension_pw_smooth_)
Returns: dimension_ ({int, float}) – The estimated intrinsic dimension
-
fit_transform(X, y=None, precomputed_knn_arrays=None, smooth=False, n_neighbors=None, comb='mean', n_jobs=1)
Fit-transform method for local ID estimators
Parameters: - X : {array-like}, shape (n_samples, n_features)
The training input samples.
- y : dummy parameter to respect the sklearn API
- precomputed_knn_arrays : tuple[ np.array (n_samples x n_dims), np.array (n_samples x n_dims) ]
Provide two precomputed arrays: (sorted nearest neighbor distances, sorted nearest neighbor indices)
- n_neighbors : int, default=self._N_NEIGHBORS
Number of nearest neighbors to use (ignored when using precomputed_knn_arrays)
- n_jobs : int
Number of processes
- smooth : bool, default=False
Additionally computes a smoothed version of pointwise estimates by taking the ID of a point as the average ID of each point in its neighborhood (self.dimension_pw_smooth_)
Returns: dimension_ ({int, float}) – The estimated intrinsic dimension
-
fit_transform_pw(X, precomputed_knn_arrays=None, smooth=False, n_neighbors=None, n_jobs=1)
Returns an array of pointwise ID estimates by fitting the estimator in kNN of each point.
Parameters: - X : np.array (n_samples x n_features)
Dataset to fit
- precomputed_knn_arrays : tuple[ np.array (n_samples x n_dims), np.array (n_samples x n_dims) ]
Provide two precomputed arrays: (sorted nearest neighbor distances, sorted nearest neighbor indices)
- n_neighbors : int, default=self._N_NEIGHBORS
Number of nearest neighbors to use (ignored when using precomputed_knn_arrays).
- n_jobs : int
Number of processes
- smooth : bool, default=False
Additionally computes a smoothed version of pointwise estimates by taking the ID of a point as the average ID of each point in its neighborhood (self.dimension_pw_)
Returns: - dimension_pw (np.array) – Pointwise ID estimates
- dimension_pw_smooth (np.array) – If smooth is True, additionally returns smoothed pointwise ID estimates
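The smoothing step described for smooth=True can be sketched as a neighborhood average of the pointwise estimates. The function smooth_pointwise is a hypothetical helper, and whether the point itself is counted in its own neighborhood is an assumption made here, not something the text above states.

```python
from statistics import fmean


def smooth_pointwise(dimension_pw, knn_idx):
    """Average each point's ID estimate with those of its nearest neighbors."""
    smoothed = []
    for i, neighbors in enumerate(knn_idx):
        # Assumption: the point itself is part of its own neighborhood.
        values = [dimension_pw[j] for j in neighbors] + [dimension_pw[i]]
        smoothed.append(fmean(values))
    return smoothed


# Pointwise estimates and neighbor indices for four points (illustrative values).
dimension_pw = [1.0, 2.0, 3.0, 6.0]
knn_idx = [[1, 2], [0, 2], [0, 1], [2, 1]]
dimension_pw_smooth = smooth_pointwise(dimension_pw, knn_idx)
```

The outlying estimate of the last point is pulled toward the values of its neighbors, which is the intended effect of smoothing.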
-
get_params(deep=True)
Get parameters for this estimator.
Parameters: - deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params (dict) – Parameter names mapped to their values.
-
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters: - **params : dict
Estimator parameters.
Returns: self (estimator instance) – Estimator instance.
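The <component>__<parameter> naming convention can be illustrated by splitting each key at its first double underscore. This is only a sketch of the convention, not the estimator's own code; the function split_nested_params and the component name "id" are hypothetical.

```python
def split_nested_params(params):
    """Group parameters by component; keys without '__' go under the empty name."""
    grouped = {}
    for key, value in params.items():
        # Split at the first '__': the left part names the component.
        component, delim, sub_key = key.partition("__")
        if delim:
            grouped.setdefault(component, {})[sub_key] = value
        else:
            grouped.setdefault("", {})[key] = value
    return grouped


# "id" stands for a hypothetical step name in a nested object such as a Pipeline.
grouped = split_nested_params({"K": 10, "id__unbiased": True, "id__sigma": 0.1})
```

Keys without a double underscore apply to the object itself, while prefixed keys are routed to the named component.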
-
transform(X=None)
Predict ID after a previous call to self.fit
Parameters: - X : Dummy parameter
Returns: dimension_ ({int, float}) – The estimated ID
-
transform_pw(X=None)
Return an array of pointwise ID estimates after a previous call to self.fit_pw
Parameters: - X : Dummy parameter
Returns: - dimension_pw (np.array) – Pointwise ID estimates
- dimension_pw_smooth (np.array) – If self.fit_pw(smooth=True), additionally returns smoothed pointwise ID estimates