Example usage
[1]:
import skdim
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
Generating data
Generated datasets are NumPy arrays of shape (n_points, n_dim). Here we generate two datasets that contain clusters with different intrinsic dimensions (ID).
[2]:
#generate data
data1, clusters = skdim.datasets.lineDiskBall(n = 2000, random_state = 0)
data2 = skdim.datasets.swissRoll3Sph(n_swiss=4000,n_sphere=2000, h=2, random_state = 0)
#plot
fig = make_subplots(rows=1, cols=2,specs=[[{'type': 'Scatter3d'}]*2])
trace1=go.Scatter3d(dict(zip(['x','y','z'],data1.T[:3])),
mode='markers',marker=dict(size=1.5,colorbar=dict()))
trace2=go.Scatter3d(dict(zip(['x','y','z'],data2.T[:3])),
mode='markers',marker=dict(size=1.5,colorbar=dict()))
fig.add_traces([trace1,trace2],rows=1,cols=[1,2])
fig.layout.update(height=450, width=800)
fig.show(renderer="notebook")
Estimating global ID
Estimators can compute global ID estimates (for the entire dataset).
This can be done with any of these calls:
- .fit(data).dimension_
- .fit(data).transform()
- .fit_transform(data)
[3]:
pca=skdim.id.lPCA()
#global ID
gid1=pca.fit(data1).dimension_
gid2=pca.fit(data2).dimension_
print(gid1,gid2)
2 3
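The three call patterns listed above follow the familiar scikit-learn estimator convention. As a rough illustration only, the sketch below defines a hypothetical `ToyGlobalPCA` class (not part of skdim) that counts the covariance eigenvalues exceeding a fraction `alpha` of the largest one. The class name, the `alpha` threshold and the test data are assumptions made for this example; it does not claim to reproduce what `skdim.id.lPCA` does internally.

```python
import numpy as np


class ToyGlobalPCA:
    """Hypothetical stand-in for a global ID estimator (not skdim's lPCA)."""

    def __init__(self, alpha=0.05):
        # Assumed rule: keep eigenvalues larger than alpha * largest eigenvalue
        self.alpha = alpha

    def fit(self, X):
        Xc = X - X.mean(axis=0)
        # Squared singular values of centered data are proportional
        # to the eigenvalues of the covariance matrix
        s = np.linalg.svd(Xc, compute_uv=False)
        eigvals = s**2 / (len(X) - 1)
        self.dimension_ = int(np.sum(eigvals > self.alpha * eigvals[0]))
        return self  # returning self allows .fit(X).dimension_

    def transform(self, X=None):
        return self.dimension_

    def fit_transform(self, X):
        return self.fit(X).transform()


rng = np.random.default_rng(0)
# A flat 2D plane embedded in 5 dimensions, with very small isotropic noise
latent = rng.normal(size=(1000, 2))
basis = np.linalg.qr(rng.normal(size=(5, 2)))[0]  # 5 x 2, orthonormal columns
plane = latent @ basis.T + 1e-3 * rng.normal(size=(1000, 5))

est = ToyGlobalPCA()
a = est.fit(plane).dimension_
b = est.fit(plane).transform()
c = est.fit_transform(plane)
print(a, b, c)
```

All three calls return the same value because `fit` stores the estimate on the object and the other two methods only read it back.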
Estimating local ID
Estimators can also compute local ID estimates from the k nearest neighbors (k-NN) of each point, which is useful for checking whether a dataset has regions with different IDs.
To compute local ID, call either:
- .fit_pw(data).dimension_pw_
- .fit_pw(data).transform_pw()
- .fit_transform_pw(data)
Several estimators also naturally compute the distribution of local ID to derive the global ID estimates.
These estimators do not have a .fit_pw() method since .fit() already computes local estimates, and you can simply call:
- .fit(data).dimension_pw_
- .fit(data).transform_pw()
- .fit_transform_pw(data)
[4]:
#local ID (pointwise estimates)
lid1=pca.fit_pw(data1,n_neighbors=25).dimension_pw_
lid2=pca.fit_pw(data2,n_neighbors=25).dimension_pw_
[5]:
fig.update_traces({'text':lid1,'marker.color':lid1,'marker.colorbar':dict(thickness=5,x=.42)},col=1)
fig.update_traces({'text':lid2,'marker.color':lid2,'marker.colorbar':dict(thickness=5,x=.98)},col=2)
fig.show(renderer="notebook")
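To make the k-NN idea more concrete, here is a minimal NumPy-only sketch of a pointwise estimate: for each point, take its nearest neighbors, run PCA on that patch, and count the eigenvalues above a threshold. The function name `local_pca_id`, the `alpha` rule and the toy line-plus-square dataset are assumptions chosen for illustration; skdim's own neighbor search and estimator details may differ.

```python
import numpy as np


def local_pca_id(X, n_neighbors=25, alpha=0.05):
    """Hypothetical pointwise ID: PCA on each point's k-NN patch."""
    # Brute-force pairwise squared distances (only suitable for small n)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Column 0 of the sorted indices is the point itself, so skip it
    idx = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]
    ids = np.empty(len(X), dtype=int)
    for i, nb in enumerate(idx):
        patch = X[nb] - X[nb].mean(axis=0)
        eig = np.linalg.svd(patch, compute_uv=False) ** 2
        ids[i] = np.sum(eig > alpha * eig[0])
    return ids


rng = np.random.default_rng(0)
# 300 points on a 1D segment and 300 points on a 2D square, far apart in z
line = np.column_stack([rng.uniform(0, 1, 300), np.zeros(300), np.zeros(300)])
square = np.column_stack([rng.uniform(0, 1, 300),
                          rng.uniform(0, 1, 300),
                          np.full(300, 5.0)])
X = np.vstack([line, square])

lid = local_pca_id(X, n_neighbors=25)
values, counts = np.unique(lid, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))
```

Counting how many points fall on each local ID value, as in the last two lines, is a quick way to check whether a dataset mixes regions of different dimension before plotting it.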
Benchmarking estimators
Estimators are commonly benchmarked using various synthetic manifolds with known ID.
Using datasets.BenchmarkManifolds, one can generate a set of standard manifolds used in most research papers.
[6]:
benchmark = skdim.datasets.BenchmarkManifolds(random_state=0)
#dictionary with all datasets
dict_data = benchmark.generate()
#ground truth dataframe
truth = benchmark.truth
#generate a dataset with custom parameters
M1_sphere_custom = benchmark.generate(name="M1_Sphere",n=2500,dim=10,d=5)
M1_sphere = dict_data['M1_Sphere'] #np.array (n x dim)
truth
[6]:
| | Intrinsic Dimension | Number of variables | Description |
|---|---|---|---|
| M1_Sphere | 10 | 11 | 10D sphere linearly embedded |
| M2_Affine_3to5 | 3 | 5 | Affine space |
| M3_Nonlinear_4to6 | 4 | 6 | Concentrated figure, mistakable with a 3D one |
| M4_Nonlinear | 4 | 8 | Nonlinear manifold |
| M5a_Helix1d | 1 | 3 | 1D helix |
| M5b_Helix2d | 2 | 3 | 2D helix |
| M6_Nonlinear | 6 | 36 | Nonlinear manifold |
| M7_Roll | 2 | 3 | Swiss Roll |
| M8_Nonlinear | 12 | 72 | Nonlinear (highly curved) manifold |
| M9_Affine | 20 | 20 | Affine space |
| M10a_Cubic | 10 | 11 | 10D hypercube |
| M10b_Cubic | 17 | 18 | 17D hypercube |
| M10c_Cubic | 24 | 25 | 24D hypercube |
| M10d_Cubic | 70 | 71 | 70D hypercube |
| M11_Moebius | 2 | 3 | Möebius band 10-times twisted |
| M12_Norm | 20 | 20 | Isotropic multivariate Gaussian |
| M13a_Scurve | 2 | 3 | 2D S-curve |
| M13b_Spiral | 1 | 13 | 1D helix curve |
| Mbeta | 10 | 40 | Manifold generated with a smooth nonuniform pd... |
| Mn1_Nonlinear | 18 | 72 | Nonlinearly embedded manifold of high ID (see ... |
| Mn2_Nonlinear | 24 | 96 | Nonlinearly embedded manifold of high ID (see ... |
| Mp1_Paraboloid | 3 | 12 | 3D paraboloid, nonlinearly embedded in (3(3+1)... |
| Mp2_Paraboloid | 6 | 21 | 6D paraboloid, nonlinearly embedded in (3*(6+1... |
| Mp3_Paraboloid | 9 | 30 | 9D paraboloid, nonlinearly embedded in (3*(9+1... |
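A benchmark of this kind boils down to comparing each estimate with the known ID. The sketch below shows that loop on two hand-made manifolds using only NumPy. The helper names (`pca_id`, `affine_manifold`, `helix`), the case labels and the threshold are hypothetical and are not taken from `BenchmarkManifolds`; the point is only to show why curved manifolds are included, since a global linear estimator can be exact on an affine space and still overestimate a 1D helix.

```python
import numpy as np


def pca_id(X, alpha=0.05):
    """Assumed rule: count eigenvalues above alpha * largest eigenvalue."""
    Xc = X - X.mean(axis=0)
    eig = np.linalg.svd(Xc, compute_uv=False) ** 2
    return int(np.sum(eig > alpha * eig[0]))


def affine_manifold(n, d, dim, rng):
    """n points from a d-dimensional linear subspace of R^dim."""
    basis = np.linalg.qr(rng.normal(size=(dim, d)))[0]  # dim x d
    return rng.uniform(-1, 1, size=(n, d)) @ basis.T


def helix(n, rng):
    """n points on a 1D helix embedded in R^3."""
    t = rng.uniform(0, 4 * np.pi, n)
    return np.column_stack([np.cos(t), np.sin(t), 0.1 * t])


rng = np.random.default_rng(0)
# name -> (data, true intrinsic dimension)
cases = {
    "affine_3to5": (affine_manifold(2000, 3, 5, rng), 3),
    "helix_1to3": (helix(2000, rng), 1),
}

results = {}
for name, (X, true_id) in cases.items():
    est = pca_id(X)
    results[name] = {"true_id": true_id, "estimate": est,
                     "abs_error": abs(est - true_id)}

for name, row in results.items():
    print(name, row)
```

With the real benchmark, the same loop could iterate over the dictionary returned by `benchmark.generate()` and read the ground truth from `benchmark.truth`.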