Gaussian Mixture#

A Gaussian Mixture Model (GMM) is a probabilistic model that represents clusters (sub-populations) within an overall population without requiring prior knowledge of which sub-population each sample belongs to. GMMs are similar to centroid-based clusterers such as K Means, but they learn the radii (variances) of the clusters in addition to their centers (means). For this reason, GMMs are especially useful when the clusters have different radii.
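To make the role of the learned variances concrete, the following is a minimal plain-PHP sketch of how a mixture with per-feature (diagonal) variances could score a sample against each component using Bayes' rule. The function names `gaussianDensity()` and `responsibilities()` and the toy parameters are hypothetical and are not part of the library; the sketch assumes independent features within each component.

```php
<?php

declare(strict_types=1);

/**
 * Density of a Gaussian with independent features (diagonal covariance).
 * Hypothetical helper for illustration, not a library function.
 *
 * @param float[] $sample
 * @param float[] $means
 * @param float[] $variances
 */
function gaussianDensity(array $sample, array $means, array $variances): float
{
    $density = 1.0;

    foreach ($sample as $i => $value) {
        $variance = $variances[$i];

        // Product of univariate normal densities, one per feature.
        $density *= exp(-(($value - $means[$i]) ** 2) / (2.0 * $variance))
            / sqrt(2.0 * M_PI * $variance);
    }

    return $density;
}

/**
 * Posterior probability of each component given the sample (Bayes' rule).
 *
 * @param float[] $sample
 * @param float[] $priors
 * @param float[][] $means
 * @param float[][] $variances
 * @return float[]
 */
function responsibilities(array $sample, array $priors, array $means, array $variances): array
{
    $joint = [];

    foreach ($priors as $k => $prior) {
        $joint[$k] = $prior * gaussianDensity($sample, $means[$k], $variances[$k]);
    }

    $total = array_sum($joint);

    // Normalize so the probabilities sum to 1.
    return array_map(fn (float $p): float => $p / $total, $joint);
}

// Two 1-dimensional clusters: a tight one centered at 0 and a wide one centered at 4.
$priors = [0.5, 0.5];
$means = [[0.0], [4.0]];
$variances = [[0.25], [9.0]];

// The point 1.8 is nearer to the tight cluster's center (distance 1.8 vs 2.2),
// yet the wide cluster receives the larger posterior probability.
$posterior = responsibilities([1.8], $priors, $means, $variances);
```

A purely distance-based clusterer would assign this point to the nearer center, whereas the mixture accounts for how spread out each cluster is.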

Interfaces: Estimator, Learner, Probabilistic, Verbose, Persistable

Data Type Compatibility: Continuous

Parameters#

| # | Param | Default | Type | Description |
|---|---|---|---|---|
| 1 | k | | int | The number of target clusters. |
| 2 | epochs | 100 | int | The maximum number of training rounds to execute. |
| 3 | min change | 1e-3 | float | The minimum change in the components necessary for the algorithm to continue training. |
| 4 | seeder | PlusPlus | object | The seeder used to initialize the Gaussian components. |

Additional Methods#

Return the cluster prior probabilities based on their representation over all training samples:

public priors() : array

Return the running means of each feature column for each cluster:

public means() : array

Return the variance of each feature column for each cluster:

public variances() : array
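For intuition about the three quantities returned above, here is a rough plain-PHP sketch of the standard EM maximization step, which derives priors, means, and per-feature variances from each sample's cluster membership probabilities. The function `maximize()` is a hypothetical name used for illustration; it is not taken from the library's source and may differ from how the library computes these values. It assumes list-indexed input arrays and that every cluster has non-zero total weight.

```php
<?php

declare(strict_types=1);

/**
 * One maximization step for a mixture with per-feature variances.
 * Hypothetical helper for illustration, not a library function.
 *
 * @param float[][] $samples n samples of d features each
 * @param float[][] $responsibilities n rows of k membership probabilities
 * @return array{0: float[], 1: float[][], 2: float[][]} priors, means, variances
 */
function maximize(array $samples, array $responsibilities): array
{
    $n = count($samples);
    $k = count($responsibilities[0]);
    $d = count($samples[0]);

    $priors = $means = $variances = [];

    for ($c = 0; $c < $k; $c++) {
        // Membership weight of every sample in cluster $c.
        $weights = array_column($responsibilities, $c);
        $total = array_sum($weights);

        // Prior: the cluster's share of all training samples.
        $priors[$c] = $total / $n;

        for ($j = 0; $j < $d; $j++) {
            // Weighted mean of feature column $j.
            $mean = 0.0;

            foreach ($samples as $i => $sample) {
                $mean += $weights[$i] * $sample[$j];
            }

            $mean /= $total;

            // Weighted variance of feature column $j around that mean.
            $variance = 0.0;

            foreach ($samples as $i => $sample) {
                $variance += $weights[$i] * ($sample[$j] - $mean) ** 2;
            }

            $means[$c][$j] = $mean;
            $variances[$c][$j] = $variance / $total;
        }
    }

    return [$priors, $means, $variances];
}

// Four 1-dimensional samples; the first two belong to cluster 0, the rest to cluster 1.
$samples = [[1.0], [3.0], [10.0], [14.0]];
$responsibilities = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]];

[$priors, $means, $variances] = maximize($samples, $responsibilities);
```

The example uses hard (0 or 1) memberships to keep the arithmetic easy to follow; fractional probabilities work the same way. Each returned array is indexed by cluster, and the means and variances are further indexed by feature column.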

Example#

use Rubix\ML\Clusterers\GaussianMixture;
use Rubix\ML\Clusterers\Seeders\KMC2;

$estimator = new GaussianMixture(5, 100, 1e-4, new KMC2(50));

References#

  • A. P. Dempster et al. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm.
  • J. Blomer et al. (2016). Simple Methods for Initializing the EM Algorithm for Gaussian Mixture Models.