# t-SNE
T-distributed Stochastic Neighbor Embedding is a two-stage non-linear manifold learning algorithm based on Batch Gradient Descent that seeks to maintain the distances between samples in low-dimensional space. During the first stage (the early exaggeration stage), the distances are exaggerated to encourage more pronounced clusters. Since the t-SNE cost function (KL Divergence) has a rough gradient, momentum is employed to help escape bad local minima.
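As a rough illustration of the update described above (not the library's internal implementation), the sketch below performs a single momentum-based Gradient Descent step on a toy two-dimensional embedding, where `$gradient` stands in for the gradient of the KL Divergence cost and `$rate` and `$momentum` are hypothetical hyperparameters.

```php
// Toy 2-dimensional embedding of two samples and a made-up gradient of the
// KL Divergence cost with respect to each embedded coordinate.
$embedding = [[0.5, -1.2], [1.1, 0.3]];
$gradient = [[0.04, -0.02], [-0.01, 0.03]];
$velocity = [[0.0, 0.0], [0.0, 0.0]];

$rate = 100.0;
$momentum = 0.5;

foreach ($embedding as $i => $coordinates) {
    foreach ($coordinates as $j => $coordinate) {
        // Carry over a fraction of the previous step to smooth the rough gradient.
        $velocity[$i][$j] = $momentum * $velocity[$i][$j] - $rate * $gradient[$i][$j];

        $embedding[$i][$j] += $velocity[$i][$j];
    }
}
```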
> **Note:** t-SNE is implemented using the exact method, which scales quadratically in the number of samples. Therefore, it is recommended to subsample datasets larger than a few thousand samples.
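For instance, a larger dataset could be randomly subsampled before embedding. Below is a minimal sketch in which randomly generated samples stand in for a real oversized dataset; the relevant part is the `randomize()` and `head()` calls on the dataset object.

```php
use Rubix\ML\Datasets\Unlabeled;

// Randomly generated samples standing in for a real oversized dataset.
$samples = [];

for ($i = 0; $i < 10000; ++$i) {
    $samples[] = [mt_rand() / mt_getrandmax(), mt_rand() / mt_getrandmax(), mt_rand() / mt_getrandmax()];
}

// Keep a random subset of 2,000 samples before embedding.
$dataset = Unlabeled::build($samples)->randomize()->head(2000);
```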
Interfaces: Transformer, Verbose
Data Type Compatibility: Depends on distance kernel
## Parameters
| # | Name | Default | Type | Description |
|---|---|---|---|---|
| 1 | dimensions | 2 | int | The number of dimensions of the target embedding. |
| 2 | rate | 100.0 | float | The learning rate that controls the global step size. |
| 3 | perplexity | 30 | int | The number of effective nearest neighbors to refer to when computing the variance of the distribution over that sample. |
| 4 | exaggeration | 12.0 | float | The factor to exaggerate the distances between samples during the early stage of embedding. |
| 5 | epochs | 1000 | int | The maximum number of times to iterate over the embedding. |
| 6 | minGradient | 1e-7 | float | The minimum norm of the gradient necessary to continue embedding. |
| 7 | window | 10 | int | The number of epochs without improvement in the training loss to wait before considering an early stop. |
| 8 | kernel | Euclidean | Distance | The distance kernel to use when measuring distances between samples. |
## Example
```php
use Rubix\ML\Transformers\TSNE;
use Rubix\ML\Kernels\Distance\Manhattan;

// Embed into 3 dimensions with a learning rate of 10.0, perplexity of 30,
// exaggeration of 12.0, 500 epochs max, a minimum gradient of 1e-6, a window
// of 10 epochs, and the Manhattan distance kernel.
$transformer = new TSNE(3, 10.0, 30, 12.0, 500, 1e-6, 10, new Manhattan());
```
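Once instantiated, the transformer can be applied to a dataset object of continuous samples (such as the subsampled dataset sketched in the note above) to produce the embedding:

```php
$dataset->apply($transformer);
```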
## Additional Methods
Return an iterable progress table with the steps from the last training session:
```php
public steps() : iterable
```
```php
use Rubix\ML\Extractors\CSV;

// Write the per-epoch progress table to a CSV file with a header row.
$extractor = new CSV('progress.csv', true);

$extractor->export($transformer->steps());
```
Return the magnitudes of the gradient at each epoch from the last embedding:
```php
public losses() : float[]|null
```
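For example, the losses could be inspected after embedding to gauge how far the gradient had decayed by the final epoch (a brief sketch using the `$transformer` from the example above):

```php
$losses = $transformer->losses();

if ($losses !== null) {
    // Magnitude of the gradient at the final epoch of the last embedding.
    echo 'Final gradient norm: ' . end($losses) . PHP_EOL;
}
```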