T-distributed Stochastic Neighbor Embedding (t-SNE) is a two-stage non-linear manifold learning algorithm based on batch Gradient Descent that seeks to preserve the distances between samples when they are embedded in a low dimensional space. During the first stage (early exaggeration), the distances are exaggerated to encourage more pronounced clusters. Since the t-SNE cost function (KL Divergence) has a rough gradient, additional momentum is employed to help escape bad local minima.
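For reference, the cost function mentioned above can be written out as it appears in the 2008 paper by van der Maaten and Hinton. The notation (x_i for input samples, y_i for embedded points, sigma_i for the per-sample bandwidth, n for the number of samples) comes from the paper, not from this library, and the formulas assume Euclidean distance. How this implementation applies its configurable kernel and exaggeration factor may differ in detail.

```latex
% Conditional similarity of sample j to sample i in the input space
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}

% Similarity in the embedding, using a Student t kernel with one degree of freedom
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

% Cost: KL divergence between the two joint distributions
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}

% Gradient with respect to an embedded point
\frac{\partial C}{\partial y_i}
  = 4 \sum_{j} \left(p_{ij} - q_{ij}\right)\left(y_i - y_j\right)
      \left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}
```

In the paper, early exaggeration multiplies every p_ij by a constant factor during the first stage, which pulls similar samples together more strongly before the factor is removed.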

Note: This implementation of t-SNE uses the exact method, which scales quadratically in the number of samples. Therefore, it is recommended to subsample datasets larger than a few thousand samples.
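The exact method compares every pair of samples, so 2,000 samples give 1,999,000 pairs while 100,000 samples give roughly 5 billion. A minimal sketch of subsampling before embedding follows. It works on a plain PHP array of samples, and `subsample()` is a hypothetical helper written for this illustration, not a library method.

```php
<?php

// Hypothetical helper (not part of the library): return $n rows drawn
// at random, without replacement, from an array of samples.
function subsample(array $samples, int $n) : array
{
    // The array is passed by value, so the caller's copy is not reordered.
    shuffle($samples);

    return array_slice($samples, 0, $n);
}

// Made-up data: 10,000 two-dimensional samples.
$samples = [];

for ($i = 0; $i < 10000; $i++) {
    $samples[] = [mt_rand() / mt_getrandmax(), mt_rand() / mt_getrandmax()];
}

// Keep a few thousand samples for the embedding.
$subset = subsample($samples, 2000);

echo count($subset) . PHP_EOL;
```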

Interfaces: Verbose

Data Type Compatibility: Continuous


| # | Param | Default | Type | Description |
|---|---|---|---|---|
| 1 | dimensions | 2 | int | The number of dimensions of the target embedding. |
| 2 | rate | 100.0 | float | The learning rate that controls the global step size. |
| 3 | perplexity | 30 | int | The number of effective nearest neighbors to refer to when computing the variance of the distribution over each sample. |
| 4 | exaggeration | 12.0 | float | The factor by which to exaggerate the distances between samples during the early stage of embedding. |
| 5 | epochs | 1000 | int | The maximum number of times to iterate over the embedding. |
| 6 | min gradient | 1e-7 | float | The minimum norm of the gradient necessary to continue embedding. |
| 7 | window | 10 | int | The number of epochs without improvement in the training loss to wait before considering an early stop. |
| 8 | kernel | Euclidean | object | The distance kernel to use when measuring distances between samples. |
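To make "effective nearest neighbors" concrete, the sketch below computes the perplexity of a discrete distribution as 2 raised to its Shannon entropy, which is the definition used in the t-SNE paper. The `perplexity()` function and the two example distributions are illustrative assumptions, not library code. A distribution spread evenly over 30 neighbors has a perplexity of 30, while one concentrated on a single neighbor has a much lower value.

```php
<?php

// Hypothetical helper (not part of the library): perplexity of a discrete
// probability distribution, defined as 2 to the power of its entropy in bits.
function perplexity(array $probabilities) : float
{
    $entropy = 0.0;

    foreach ($probabilities as $p) {
        // Zero-probability entries contribute nothing to the entropy.
        if ($p > 0.0) {
            $entropy -= $p * log($p, 2);
        }
    }

    return 2.0 ** $entropy;
}

// Equal weight on 30 neighbors.
$uniform = array_fill(0, 30, 1.0 / 30.0);

// Most of the weight on one neighbor, the rest shared by 29 others.
$peaked = array_merge([0.9], array_fill(0, 29, 0.1 / 29.0));

echo round(perplexity($uniform), 4) . PHP_EOL;
echo round(perplexity($peaked), 4) . PHP_EOL;
```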

Additional Methods

Return the magnitudes of the gradient at each epoch from the last embedding:

public steps() : array
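One way to inspect these magnitudes is to find where they first dropped below a threshold such as the min gradient setting. The sketch below assumes the returned array holds one magnitude per epoch in order. The `firstEpochBelow()` helper and the sample values are made up for illustration and stand in for the output of `steps()`.

```php
<?php

// Hypothetical helper (not part of the library): position of the first
// gradient magnitude below the threshold, or null if none is.
function firstEpochBelow(array $steps, float $threshold) : ?int
{
    foreach ($steps as $epoch => $magnitude) {
        if ($magnitude < $threshold) {
            return $epoch;
        }
    }

    return null;
}

// Made-up magnitudes standing in for the array returned by steps().
$steps = [0.52, 0.11, 0.03, 4.0e-4, 8.0e-8];

// Zero-based position in the array, or NULL if the threshold was never reached.
var_dump(firstEpochBelow($steps, 1e-7));
```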


use Rubix\ML\Embedders\TSNE;
use Rubix\ML\Kernels\Manhattan;

$embedder = new TSNE(3, 10.0, 30, 12.0, 500, 1e-6, 10, new Manhattan());


  • L. van der Maaten et al. (2008). Visualizing Data using t-SNE.
  • L. van der Maaten. (2009). Learning a Parametric Embedding by Preserving Local Structure.