skdim.id.lPCA
class skdim.id.lPCA(ver='FO', alphaRatio=0.05, alphaFO=0.05, alphaFan=10, betaFan=0.8, PFan=0.95, verbose=True, fit_explained_variance=False)
Intrinsic dimension estimation using the PCA algorithm. [Cangelosi2007] [Fan2010] [Fukunaga2010] [IDJohnsson]
Version ‘FO’ (Fukunaga-Olsen) returns the number of eigenvalues larger than alphaFO times the largest eigenvalue.
Version ‘Fan’ is the method by Fan et al.
Version ‘maxgap’ returns the position of the largest relative gap in the sequence of eigenvalues.
Version ‘ratio’ returns the number of eigenvalues needed to retain at least alphaRatio of the variance.
Version ‘participation_ratio’ returns the number of eigenvalues given by PR=sum(eigenvalues)^2/sum(eigenvalues^2)
Version ‘Kaiser’ returns the number of eigenvalues above average (the average eigenvalue is 1)
Version ‘broken_stick’ returns the number of eigenvalues above corresponding values of the broken stick distribution
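The eigenvalue criteria listed above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the library's implementation: the function names are hypothetical, the 'maxgap' gap is assumed to be the ratio of consecutive eigenvalues, and 'broken_stick' is assumed to count only the leading eigenvalues that exceed their broken-stick expectation. The 'Fan' version is not covered.

```python
import numpy as np

def _descending(eigvals):
    """Return the eigenvalues as a float array sorted from largest to smallest."""
    return np.sort(np.asarray(eigvals, dtype=float))[::-1]

def fo_count(eigvals, alpha_fo=0.05):
    """'FO': number of eigenvalues larger than alpha_fo times the largest one."""
    ev = _descending(eigvals)
    return int(np.sum(ev > alpha_fo * ev[0]))

def maxgap_position(eigvals):
    """'maxgap': position of the largest gap, taken here as ev[i] / ev[i + 1] (assumption)."""
    ev = _descending(eigvals)
    gaps = ev[:-1] / ev[1:]
    return int(np.argmax(gaps)) + 1

def ratio_count(eigvals, alpha_ratio):
    """'ratio': smallest number of eigenvalues retaining at least alpha_ratio of the variance."""
    ev = _descending(eigvals)
    retained = np.cumsum(ev) / np.sum(ev)
    return int(np.argmax(retained >= alpha_ratio)) + 1

def participation_ratio(eigvals):
    """'participation_ratio': sum(ev)^2 / sum(ev^2); generally not an integer."""
    ev = _descending(eigvals)
    return float(np.sum(ev) ** 2 / np.sum(ev ** 2))

def kaiser_count(eigvals):
    """'Kaiser': number of eigenvalues above the average eigenvalue."""
    ev = _descending(eigvals)
    return int(np.sum(ev > ev.mean()))

def broken_stick_count(eigvals):
    """'broken_stick': leading eigenvalue fractions above the broken-stick expectation."""
    ev = _descending(eigvals)
    p = len(ev)
    # Expected fraction of the k-th largest piece of a stick broken into p pieces.
    expected = np.array([np.sum(1.0 / np.arange(k, p + 1)) / p for k in range(1, p + 1)])
    observed = ev / np.sum(ev)
    count = 0
    for is_above in observed > expected:
        if not is_above:
            break
        count += 1
    return count

# Illustrative eigenvalue spectrum (made up for this example).
ev = [4.0, 3.0, 2.0, 0.5, 0.3, 0.2]
estimates = {
    "FO": fo_count(ev, alpha_fo=0.1),
    "maxgap": maxgap_position(ev),
    "ratio": ratio_count(ev, alpha_ratio=0.8),
    "participation_ratio": participation_ratio(ev),
    "Kaiser": kaiser_count(ev),
    "broken_stick": broken_stick_count(ev),
}
```

The criteria can disagree on the same spectrum, which is one reason the estimator exposes the choice through ver.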
Parameters: - ver : str, default='FO'
Version. Possible values: ‘FO’, ‘Fan’, ‘maxgap’, ‘ratio’, ‘participation_ratio’, ‘Kaiser’, ‘broken_stick’.
- alphaRatio : float in (0,1)
Only for ver = ‘ratio’. ID is estimated to be the number of principal components needed to retain at least alphaRatio of the variance.
- alphaFO : float in (0,1)
Only for ver = ‘FO’. An eigenvalue is considered significant if it is larger than alphaFO times the largest eigenvalue.
- alphaFan : float
Only for ver = ‘Fan’. The alpha parameter (large gap threshold).
- betaFan : float
Only for ver = ‘Fan’. The beta parameter (total covariance threshold).
- PFan : float
Only for ver = ‘Fan’. Total covariance in non-noise.
- verbose : bool, default=True
- fit_explained_variance : bool, default=False
If True, lPCA.fit(X) expects as input a precomputed explained_variance vector: X = sklearn.decomposition.PCA().fit(X).explained_variance_
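A precomputed explained-variance vector of this kind can also be obtained without scikit-learn, since it corresponds to the eigenvalues of the sample covariance matrix sorted in decreasing order. The sketch below uses only NumPy on synthetic data; the hand-off to the estimator is shown as a comment and is an assumption based on the parameter description, not executed code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a 2-D Gaussian cloud rotated into 5-D, plus small isotropic noise.
latent = np.zeros((200, 5))
latent[:, :2] = rng.normal(size=(200, 2))
rotation, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # random orthogonal matrix
X = latent @ rotation + 0.01 * rng.normal(size=(200, 5))

# Eigenvalues of the sample covariance matrix (n - 1 normalization), largest first.
cov = np.cov(X, rowvar=False)
explained_variance = np.linalg.eigvalsh(cov)[::-1]

# Assumed usage with the estimator (not run here):
#   lPCA(fit_explained_variance=True).fit(explained_variance)

# The 'FO' rule applied by hand to the same vector, with alphaFO = 0.05.
n_significant = int(np.sum(explained_variance > 0.05 * explained_variance[0]))
```

Passing the vector directly is useful when the PCA has already been computed elsewhere and should not be repeated.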
Methods
fit(X[, y]): A reference implementation of a fitting function.
fit_pw(X[, precomputed_knn, smooth, …]): Creates an array of pointwise ID estimates (self.dimension_pw_) by fitting the estimator in kNN of each point.
fit_transform(X[, y]): Fit estimator and return ID.
fit_transform_pw(X[, precomputed_knn, …]): Returns an array of pointwise ID estimates by fitting the estimator in kNN of each point.
get_params([deep]): Get parameters for this estimator.
set_params(**params): Set the parameters of this estimator.
transform([X]): Predict dimension after a previous call to self.fit.
transform_pw([X]): Return an array of pointwise ID estimates after a previous call to self.fit_pw.
fit(X, y=None)
A reference implementation of a fitting function.
Parameters: - X : {array-like}, shape (n_samples, n_features)
A local dataset of training input samples.
- y : dummy parameter to respect the sklearn API
Returns: self (object) – Returns self.
fit_pw(X, precomputed_knn=None, smooth=False, n_neighbors=100, n_jobs=1)
Creates an array of pointwise ID estimates (self.dimension_pw_) by fitting the estimator in kNN of each point.
Parameters: - X : np.array (n_samples x n_features)
Dataset to fit
- precomputed_knn : np.array (n_samples x n_neighbors)
An array of precomputed (sorted) nearest neighbor indices
- n_neighbors
Number of nearest neighbors to use (ignored when using precomputed_knn)
- n_jobs : int
Number of processes
- smooth : bool, default = False
If True, additionally computes a smoothed version of the pointwise estimates (self.dimension_pw_smooth_) by taking the ID of a point as the average ID of the points in its neighborhood.
Returns: self (object) – Returns self
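The pointwise procedure described above (fit the estimator on the k nearest neighbors of each point, then optionally average over neighborhoods) can be sketched in NumPy as follows. This is an illustration under assumptions rather than the library's code: pointwise_fo is a hypothetical helper, neighbors are found by brute force, the local estimator is the 'FO' rule, and the smoothed value averages over a point together with its neighbors.

```python
import numpy as np

def pointwise_fo(X, n_neighbors=20, alpha_fo=0.05):
    """Return (pointwise, smoothed) ID estimates using a local 'FO' rule."""
    X = np.asarray(X, dtype=float)

    # Brute-force squared Euclidean distances; adequate for small datasets.
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    knn_idx = np.argsort(d2, axis=1)[:, :n_neighbors]

    # Local PCA: eigenvalues of the covariance of each neighborhood.
    pointwise = np.empty(len(X))
    for i, idx in enumerate(knn_idx):
        ev = np.linalg.eigvalsh(np.cov(X[idx], rowvar=False))[::-1]
        pointwise[i] = np.sum(ev > alpha_fo * ev[0])

    # Smoothing: average the estimates over each point and its neighbors (assumption).
    smoothed = np.array(
        [pointwise[np.append(idx, i)].mean() for i, idx in enumerate(knn_idx)]
    )
    return pointwise, smoothed

# Synthetic example: points on a flat 2-D sheet embedded in 3-D.
rng = np.random.default_rng(0)
sheet = np.column_stack([rng.uniform(size=(300, 2)), np.zeros(300)])
dimension_pw, dimension_pw_smooth = pointwise_fo(sheet, n_neighbors=20)
```

Smoothing trades spatial resolution for stability, which can help when individual neighborhoods are small or noisy.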
fit_transform(X, y=None)
Fit estimator and return ID.
Parameters: - X : {array-like}, shape (n_samples, n_features)
The training input samples.
Returns: dimension_ ({int, float}) – The estimated intrinsic dimension
fit_transform_pw(X, precomputed_knn=None, smooth=False, n_neighbors=100, n_jobs=1)
Returns an array of pointwise ID estimates by fitting the estimator in kNN of each point.
Parameters: - X : np.array (n_samples x n_features)
Dataset to fit
- precomputed_knn : np.array (n_samples x n_neighbors)
An array of precomputed (sorted) nearest neighbor indices
- n_neighbors
Number of nearest neighbors to use (ignored when using precomputed_knn)
- n_jobs : int
Number of processes
- smooth : bool, default = False
If True, additionally computes a smoothed version of the pointwise estimates (self.dimension_pw_smooth_) by taking the ID of a point as the average ID of the points in its neighborhood.
Returns: - dimension_pw_ (np.array with dtype {int, float}) – Pointwise ID estimates
- dimension_pw_smooth_ (np.array with dtype float) – Smoothed pointwise ID estimates returned if self.fit_pw(smooth=True)
get_params(deep=True)
Get parameters for this estimator.
Parameters: - deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params (dict) – Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Parameters: - **params : dict
Estimator parameters.
Returns: self (estimator instance) – Estimator instance.
transform(X=None)
Predict dimension after a previous call to self.fit.
Parameters: - X : Dummy parameter
Returns: dimension_ ({int, float}) – The estimated ID
transform_pw(X=None)
Return an array of pointwise ID estimates after a previous call to self.fit_pw.
Parameters: - X : Dummy parameter
Returns: - dimension_pw_ (np.array with dtype {int, float}) – Pointwise ID estimates
- dimension_pw_smooth_ (np.array with dtype float) – Smoothed pointwise ID estimates returned if self.fit_pw(smooth=True)