t-SNE

T-distributed Stochastic Neighbor Embedding (t-SNE) is a two-stage non-linear manifold learning algorithm based on batch gradient descent that seeks to preserve the distances between samples when they are embedded in a low-dimensional space. During the first (early) stage, the distances are exaggerated to encourage more pronounced clusters. Since the t-SNE cost function (KL divergence) has a rough gradient, momentum is employed to help escape bad local minima.
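For reference, the objective described above can be written out following the formulation in van der Maaten and Hinton (2008). This is a sketch of the standard Euclidean formulation from the paper, not a transcription of this implementation, which accepts other distance kernels. The symbols are assumptions introduced here for illustration: $x_i$ are input samples, $y_i$ are embedded points, $n$ is the number of samples, $\sigma_i$ is the per-sample bandwidth, $\eta$ is the learning rate, and $\alpha$ is the momentum coefficient.

```latex
\begin{align}
p_{j|i} &= \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n} \\
q_{ij} &= \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}} \\
C &= \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}} \\
\frac{\partial C}{\partial y_i} &= 4 \sum_{j} (p_{ij} - q_{ij})(y_i - y_j)
    \left(1 + \lVert y_i - y_j \rVert^2\right)^{-1} \\
Y^{(t)} &= Y^{(t-1)} - \eta \frac{\partial C}{\partial Y}
    + \alpha \left( Y^{(t-1)} - Y^{(t-2)} \right)
\end{align}
```

In the paper, the early stage multiplies each $p_{ij}$ by a constant factor before switching back to the unscaled values; the `exaggeration` parameter presumably plays that role here.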

Note

t-SNE is implemented using the exact method, which scales quadratically in the number of samples. Therefore, it is recommended to subsample datasets larger than a few thousand samples.
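To make the quadratic cost concrete, the sketch below counts the unique sample pairs for a given dataset size and draws a random subsample from a plain PHP array. The helpers `pairCount` and `subsample` are hypothetical names used for illustration and are not part of the library.

```php
<?php

/**
 * Number of unique sample pairs an exact method has to consider.
 * (Hypothetical helper for illustration.)
 */
function pairCount(int $n) : int
{
    return intdiv($n * ($n - 1), 2);
}

/**
 * Draw up to $size samples uniformly at random without replacement.
 * (Hypothetical helper for illustration.)
 *
 * @param list<list<int|float>> $samples
 * @return list<list<int|float>>
 */
function subsample(array $samples, int $size) : array
{
    // Arrays are passed by value, so only the local copy is shuffled.
    shuffle($samples);

    return array_slice($samples, 0, $size);
}

echo pairCount(2000), PHP_EOL;  // 1999000
echo pairCount(20000), PHP_EOL; // 199990000
```

A tenfold increase in samples produces roughly a hundredfold increase in pairs, which is why reducing the dataset before embedding pays off so quickly.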

Interfaces: Transformer, Verbose

Data Type Compatibility: Depends on distance kernel

Parameters

#	Name	Default	Type	Description
1	dimensions	2	int	The number of dimensions of the target embedding.
2	rate	100.0	float	The learning rate that controls the global step size.
3	perplexity	30	int	The number of effective nearest neighbors to refer to when computing the variance of the distribution over that sample.
4	exaggeration	12.0	float	The factor to exaggerate the distances between samples during the early stage of embedding.
5	epochs	1000	int	The maximum number of times to iterate over the embedding.
6	minGradient	1e-7	float	The minimum norm of the gradient necessary to continue embedding.
7	window	10	int	The number of epochs without improvement in the training loss to wait before considering an early stop.
8	kernel	Euclidean	Distance	The distance kernel to use when measuring distances between samples.
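The perplexity parameter is easier to reason about with a small calculation. In the t-SNE paper, perplexity is defined as 2 raised to the Shannon entropy (in bits) of a sample's neighbor distribution, so a distribution spread evenly over k neighbors has a perplexity of k. The function `perplexity` below is a hypothetical helper written for this illustration, not a library call.

```php
<?php

/**
 * Perplexity of a discrete probability distribution: 2^H, where H is the
 * Shannon entropy in bits. (Hypothetical helper for illustration.)
 *
 * @param list<float> $probabilities Values that sum to 1.
 */
function perplexity(array $probabilities) : float
{
    $entropy = 0.0;

    foreach ($probabilities as $p) {
        // Zero-probability terms contribute nothing to the entropy.
        if ($p > 0.0) {
            $entropy -= $p * log($p, 2);
        }
    }

    return 2.0 ** $entropy;
}

// Four equally likely neighbors behave like 4 effective neighbors.
$uniform = perplexity([0.25, 0.25, 0.25, 0.25]);

// A distribution dominated by one neighbor has far fewer effective neighbors.
$peaked = perplexity([0.97, 0.01, 0.01, 0.01]);

printf("uniform: %.2f, peaked: %.2f\n", $uniform, $peaked);
```

Read this way, a larger perplexity asks each sample to take a wider neighborhood into account, while a smaller one focuses on its closest neighbors.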

Example

use Rubix\ML\Transformers\TSNE;
use Rubix\ML\Kernels\Distance\Manhattan;

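// Arguments in order: dimensions, rate, perplexity, exaggeration, epochs,
// minGradient, window, kernel.
$transformer = new TSNE(3, 10.0, 30, 12.0, 500, 1e-6, 10, new Manhattan());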

Additional Methods

Return an iterable progress table with the steps from the last training session:

public steps() : iterable

use Rubix\ML\Extractors\CSV;

$extractor = new CSV('progress.csv', true);

$extractor->export($transformer->steps());

Return the magnitudes of the gradient at each epoch from the last embedding:

public losses() : float[]|null
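As an illustration of how the per-epoch values might be inspected, the sketch below finds the epoch with the lowest value in a plain array and counts how many epochs have passed since. The array is illustrative data standing in for whatever `losses()` returns, and `epochsSinceBest` is a hypothetical helper; how the library itself applies the `window` parameter may differ.

```php
<?php

/**
 * Count the epochs elapsed since the lowest value in a per-epoch series.
 * (Hypothetical helper for illustration.)
 *
 * @param list<float> $losses
 */
function epochsSinceBest(array $losses) : int
{
    if ($losses === []) {
        return 0;
    }

    $best = min($losses);

    // Index of the first occurrence of the minimum.
    $bestEpoch = array_search($best, $losses, true);

    return count($losses) - 1 - $bestEpoch;
}

// Illustrative values only; in practice these would come from $transformer->losses().
$losses = [4.2, 3.1, 2.7, 2.8, 2.9, 2.75];

echo epochsSinceBest($losses), PHP_EOL; // 3
```

Since `losses()` is declared as `float[]|null`, its result needs a null check before being passed to a helper like this one.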

References

L. van der Maaten et al. (2008). Visualizing Data using t-SNE.
L. van der Maaten. (2009). Learning a Parametric Embedding by Preserving Local Structure.

Last update: 2021-05-08