skdim.id.lPCA

class skdim.id.lPCA(ver='FO', alphaRatio=0.05, alphaFO=0.05, alphaFan=10, betaFan=0.8, PFan=0.95, verbose=True, fit_explained_variance=False)

Intrinsic dimension estimation using the PCA algorithm. [Cangelosi2007] [Fan2010] [Fukunaga2010] [IDJohnsson]

Version ‘FO’ (Fukunaga-Olsen) returns the number of eigenvalues larger than alphaFO times the largest eigenvalue.

Version ‘Fan’ is the method by Fan et al.

Version ‘maxgap’ returns the position of the largest relative gap in the sequence of eigenvalues.

Version ‘ratio’ returns the number of eigenvalues needed to retain at least alphaRatio of the variance.

Version ‘participation_ratio’ returns the participation ratio of the eigenvalues, PR = sum(eigenvalues)^2 / sum(eigenvalues^2).

Version ‘Kaiser’ returns the number of eigenvalues above the average eigenvalue (which equals 1 when the data are standardized).

Version ‘broken_stick’ returns the number of eigenvalues above corresponding values of the broken stick distribution
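The criteria above all operate on the sorted PCA eigenvalue spectrum. The following is a minimal NumPy sketch of four of them, written from the descriptions on this page rather than taken from the skdim source. The function names are hypothetical, and the reading of ‘maxgap’ as the largest ratio between consecutive eigenvalues is an assumption.

```python
import numpy as np


def id_fo(eigvals, alpha_fo=0.05):
    """Count eigenvalues larger than alpha_fo times the largest one."""
    return int(np.sum(eigvals > alpha_fo * eigvals[0]))


def id_maxgap(eigvals):
    """Position of the largest ratio between consecutive eigenvalues (assumed reading)."""
    gaps = eigvals[:-1] / eigvals[1:]
    return int(np.argmax(gaps) + 1)


def id_ratio(eigvals, alpha_ratio):
    """Smallest number of leading eigenvalues whose variance share reaches alpha_ratio."""
    cumulative_share = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cumulative_share, alpha_ratio) + 1)


def id_participation_ratio(eigvals):
    """PR = (sum of eigenvalues)^2 / (sum of squared eigenvalues)."""
    return float(eigvals.sum() ** 2 / np.sum(eigvals ** 2))


# Illustrative spectrum, sorted in decreasing order: three large values, two small.
eigvals = np.array([5.0, 3.0, 2.0, 0.1, 0.05])

print(id_fo(eigvals))                   # threshold is 0.05 * 5.0 = 0.25
print(id_maxgap(eigvals))               # largest ratio is 2.0 / 0.1
print(id_ratio(eigvals, 0.9))
print(id_participation_ratio(eigvals))
```

Note that the participation ratio is generally not an integer, which is why the estimated dimension is documented as either int or float.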

Parameters:

ver : str, default='FO': Version. Possible values: ‘FO’, ‘Fan’, ‘maxgap’, ‘ratio’, ‘participation_ratio’, ‘Kaiser’, ‘broken_stick’.
alphaRatio : float in (0,1): Only for ver = ‘ratio’. ID is estimated to be the number of principal components needed to retain at least alphaRatio of the variance.
alphaFO : float in (0,1): Only for ver = ‘FO’. An eigenvalue is considered significant if it is larger than alphaFO times the largest eigenvalue.
alphaFan : float: Only for ver = ‘Fan’. The alpha parameter (large gap threshold).
betaFan : float: Only for ver = ‘Fan’. The beta parameter (total covariance threshold).
PFan : float: Only for ver = ‘Fan’. Total covariance in non-noise.
verbose : bool, default=True
fit_explained_variance : bool, default=False: If True, lPCA.fit(X) expects as input a precomputed explained-variance vector instead of the data: X = sklearn.decomposition.PCA().fit(X).explained_variance_
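For orientation, the explained-variance vector mentioned above is the spectrum of the sample covariance matrix in decreasing order. The NumPy-only sketch below computes that quantity directly; the data and variable names are illustrative, and passing the result to lPCA(fit_explained_variance=True).fit(...) is left out because it has not been checked here against the library.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))  # illustrative data: 200 samples, 5 features

# Center the data, then take singular values (returned in decreasing order).
X_centered = X - X.mean(axis=0)
singular_values = np.linalg.svd(X_centered, compute_uv=False)

# Variance explained by each principal component (n_samples - 1 normalization).
explained_variance = singular_values ** 2 / (X.shape[0] - 1)

print(explained_variance)
```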

Methods

`fit`(X[, y])	Fit the estimator and store the estimated intrinsic dimension in self.dimension_.
`fit_pw`(X[, precomputed_knn, smooth, …])	Creates an array of pointwise ID estimates (self.dimension_pw_) by fitting the estimator on the k nearest neighbors of each point.
`fit_transform`(X[, y])	Fit estimator and return ID
`fit_transform_pw`(X[, precomputed_knn, …])	Returns an array of pointwise ID estimates by fitting the estimator on the k nearest neighbors of each point.
`get_params`([deep])	Get parameters for this estimator.
`set_params`(**params)	Set the parameters of this estimator.
`transform`([X])	Predict dimension after a previous call to self.fit
`transform_pw`([X])	Return an array of pointwise ID estimates after a previous call to self.fit_pw

fit(X, y=None)

Fit the estimator to X and store the estimated intrinsic dimension in self.dimension_.

Parameters:	X : {array-like}, shape (n_samples, n_features) A local dataset of training input samples. y : dummy parameter to respect the sklearn API
Returns:	self (object) – Returns self.

fit_pw(X, precomputed_knn=None, smooth=False, n_neighbors=100, n_jobs=1)

Creates an array of pointwise ID estimates (self.dimension_pw_) by fitting the estimator on the k nearest neighbors of each point.

Parameters:

X : np.array (n_samples x n_features)

Dataset to fit

precomputed_knn : np.array (n_samples x n_neighbors)

An array of precomputed (sorted) nearest neighbor indices

n_neighbors : int, default = 100

Number of nearest neighbors to use (ignored when using precomputed_knn)

n_jobs : int

Number of processes

smooth : bool, default = False

Additionally computes a smoothed version of the pointwise estimates (self.dimension_pw_smooth_) by taking the ID of a point as the average ID of the points in its neighborhood.

Returns:

self (object) – Returns self
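The smoothing step described for smooth=True can be sketched in plain NumPy as follows. This is an illustration of the averaging idea, not the library's code: the helper name is hypothetical, and including the point itself in its own neighborhood average is an assumption.

```python
import numpy as np


def smooth_pointwise(dimension_pw, knn_idx):
    """Average each point's estimate with the estimates of its neighbors.

    dimension_pw : array of shape (n_samples,), pointwise ID estimates
    knn_idx      : array of shape (n_samples, n_neighbors), neighbor indices
    """
    # Assumption: the point itself is part of its own neighborhood.
    stacked = np.concatenate([dimension_pw[:, None], dimension_pw[knn_idx]], axis=1)
    return stacked.mean(axis=1)


# Toy example: four points, two neighbors each.
dimension_pw = np.array([2.0, 2.0, 3.0, 3.0])
knn_idx = np.array([[1, 2], [0, 2], [3, 1], [2, 1]])

smoothed = smooth_pointwise(dimension_pw, knn_idx)
print(smoothed)
```

Smoothing turns integer pointwise estimates into floats, which matches the float dtype documented for dimension_pw_smooth_.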

fit_transform(X, y=None)

Fit estimator and return ID

Parameters:	X : {array-like}, shape (n_samples, n_features) The training input samples.
Returns:	dimension_ ({int, float}) – The estimated intrinsic dimension

fit_transform_pw(X, precomputed_knn=None, smooth=False, n_neighbors=100, n_jobs=1)

Returns an array of pointwise ID estimates by fitting the estimator on the k nearest neighbors of each point.

Parameters:

X : np.array (n_samples x n_features)

Dataset to fit

precomputed_knn : np.array (n_samples x n_neighbors)

An array of precomputed (sorted) nearest neighbor indices

n_neighbors : int, default = 100

Number of nearest neighbors to use (ignored when using precomputed_knn)

n_jobs : int

Number of processes

smooth : bool, default = False

Additionally computes a smoothed version of the pointwise estimates (self.dimension_pw_smooth_) by taking the ID of a point as the average ID of the points in its neighborhood.

Returns:

dimension_pw_ (np.array with dtype {int, float}) – Pointwise ID estimates
dimension_pw_smooth_ (np.array with dtype float) – Smoothed pointwise ID estimates, returned only if smooth=True

get_params(deep=True)

Get parameters for this estimator.

Parameters:	deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:	params (dict) – Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:	**params : dict Estimator parameters.
Returns:	self (estimator instance) – Estimator instance.

transform(X=None)

Predict dimension after a previous call to self.fit

Parameters:	X : Dummy parameter
Returns:	dimension_ ({int, float}) – The estimated ID

transform_pw(X=None)

Return an array of pointwise ID estimates after a previous call to self.fit_pw

Parameters:

X : Dummy parameter

Returns:

dimension_pw_ (np.array with dtype {int, float}) – Pointwise ID estimates
dimension_pw_smooth_ (np.array with dtype float) – Smoothed pointwise ID estimates returned if self.fit_pw(smooth=True)