# Choosing an Estimator

Estimators make up the core of the Rubix ML library and include classifiers, regressors, clusterers, anomaly detectors, and meta-estimators, each organized into its own namespace. They are responsible for making predictions and are usually trained with data. Most estimators can be tuned by adjusting their user-defined hyper-parameters, which are arguments to the learning algorithm that affect its behavior during training and inference. Hyper-parameter values can be chosen by intuition, by tuning, or completely at random. The defaults provided by the library are a good place to start for most problems. To instantiate a new estimator, pass the desired hyper-parameter values to the estimator's constructor, as in the example below.

```php
use Rubix\ML\Classifiers\KNearestNeighbors;
use Rubix\ML\Kernels\Distance\Minkowski;

$estimator = new KNearestNeighbors(10, false, new Minkowski(2.5));
```
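
Once instantiated, an estimator is typically trained on a dataset and then asked for predictions. The following is a minimal sketch of that flow rather than a definitive recipe: the samples and the `small`/`large` labels are invented for illustration, and it assumes the library is installed and autoloaded through Composer.

```php
use Rubix\ML\Classifiers\KNearestNeighbors;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;

// Toy training data (made up for this example): two well-separated groups.
$training = new Labeled([
    [1.0, 1.0],
    [1.0, 2.0],
    [8.0, 8.0],
    [9.0, 8.0],
], ['small', 'small', 'large', 'large']);

// Only k is set here; the remaining hyper-parameters keep their defaults.
$estimator = new KNearestNeighbors(3);

$estimator->train($training);

// Predict the labels of two unseen samples, one near each group.
$predictions = $estimator->predict(new Unlabeled([
    [1.5, 1.5],
    [8.5, 8.0],
]));
```

Starting from the defaults and changing one hyper-parameter at a time makes it easier to see which setting is responsible for a change in the predictions.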


## Classifiers

Classifiers are supervised learners that predict a categorical class label. They can be used to recognize (cat, dog, turtle), differentiate (spam, not spam), or describe (running, walking) the samples in a dataset based on the labels they were trained on. In addition, classifiers that implement the Probabilistic interface can estimate the probability of each possible class given an unclassified sample.
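
As a rough sketch of how per-class probabilities might be consumed, the snippet below assumes a Probabilistic classifier has already returned one associative array of class label to probability per sample (in Rubix ML this would typically come from the `proba()` method). The `$probabilities` values are invented for illustration and are not real library output.

```php
// Hypothetical probabilities for two samples (made-up values).
$probabilities = [
    ['cat' => 0.7, 'dog' => 0.2, 'turtle' => 0.1],
    ['cat' => 0.1, 'dog' => 0.3, 'turtle' => 0.6],
];

$predictions = [];

foreach ($probabilities as $dist) {
    // Sort the classes from most to least probable, keeping the class keys.
    arsort($dist);

    // After sorting, the first key is the most probable class.
    $predictions[] = array_key_first($dist);
}
```

Keeping the full distribution, rather than only the winning label, makes it possible to flag low-confidence predictions for manual review.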

| Name | Flexibility | Proba | Online | Ranks Features | Verbose | Data Compatibility |
|---|---|---|---|---|---|---|
| AdaBoost | High | | | | | Depends on base learner |
| Classification Tree | Medium | | | | | Categorical, Continuous |
| Extra Tree Classifier | Medium | | | | | Categorical, Continuous |
| Gaussian Naive Bayes | Medium | | | | | Continuous |
| K-d Neighbors | Medium | | | | | Depends on distance kernel |
| K Nearest Neighbors | Medium | | | | | Depends on distance kernel |
| Logistic Regression | Low | | | | | Continuous |
| Logit Boost | High | | | | | Categorical, Continuous |
| Multilayer Perceptron | High | | | | | Continuous |
| Naive Bayes | Medium | | | | | Categorical |
| Radius Neighbors | Medium | | | | | Depends on distance kernel |
| Random Forest | High | | | | | Categorical, Continuous |
| Softmax Classifier | Low | | | | | Continuous |
| SVC | High | | | | | Continuous |

## Regressors

Regressors are supervised learners that predict a continuous-valued outcome such as 1.275 or 655. They can be used to quantify a property of a sample, such as a credit score, an age, or a steering wheel position in degrees. Unlike classifiers, whose range of predictions is bounded by the number of possible classes in the training set, a regressor's range is unbounded, meaning that the number of possible values it could predict is infinite.

| Name | Flexibility | Online | Ranks Features | Verbose | Persistable | Data Compatibility |
|---|---|---|---|---|---|---|
| Extra Tree Regressor | Medium | | | | | Categorical, Continuous |
| K-d Neighbors Regressor | Medium | | | | | Depends on distance kernel |
| KNN Regressor | Medium | | | | | Depends on distance kernel |
| MLP Regressor | High | | | | | Continuous |
| Radius Neighbors Regressor | Medium | | | | | Depends on distance kernel |
| Regression Tree | Medium | | | | | Categorical, Continuous |
| Ridge | Low | | | | | Continuous |
| SVR | High | | | | | Continuous |

## Clusterers

Clusterers are unsupervised learners that predict an integer-valued cluster number such as 0, 1, ..., n. They are similar to classifiers; however, since they lack a supervised training signal, they cannot be used to recognize or describe samples. Instead, clusterers differentiate and group samples using only the information found in the structure of the samples, without labels.
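
To illustrate what can be done with cluster numbers, here is a small sketch that groups samples by their predicted cluster. It assumes the predictions arrive as a flat array of integers in the same order as the samples; both `$samples` and `$predictions` are hypothetical values invented for this example.

```php
// Hypothetical samples and the cluster number predicted for each one.
$samples = [
    [1.0, 1.1],
    [0.9, 1.0],
    [8.0, 8.2],
    [7.9, 8.1],
];

$predictions = [0, 0, 1, 1];

$clusters = [];

foreach ($predictions as $i => $cluster) {
    // Append each sample to the group keyed by its cluster number.
    $clusters[$cluster][] = $samples[$i];
}
```

The cluster numbers carry no meaning of their own, so any interpretation of a group has to come from inspecting the samples it contains.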

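| Name | Flexibility | Proba | Online | Verbose | Persistable | Data Compatibility |
|---|---|---|---|---|---|---|
| DBSCAN | High | | | | | Depends on distance kernel |
| Fuzzy C Means | Low | | | | | Continuous |
| Gaussian Mixture | Medium | | | | | Continuous |
| K Means | Low | | | | | Continuous |
| Mean Shift | Medium | | | | | Continuous |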

## Anomaly Detectors

Anomaly detectors are unsupervised learners that predict whether a sample should be classified as an anomaly or not. The value 1 indicates an outlier and 0 indicates a regular sample, so the predictions can be cast to their boolean equivalents if needed. Anomaly detectors that implement the Scoring interface can also output an anomaly score, which can be used to sort the samples by their degree of anomalousness.
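
The boolean cast and the score-based ordering described above might look like the following sketch. The `$predictions` and `$scores` arrays are hypothetical values standing in for a detector's output, and the code assumes that a higher score means a more anomalous sample.

```php
// Hypothetical detector output for four samples (made-up values).
$predictions = [0, 1, 0, 1];
$scores = [0.12, 0.95, 0.30, 0.71];

// Cast the 0/1 predictions to booleans: true marks an outlier.
$flags = array_map('boolval', $predictions);

// Sort the scores from highest to lowest while preserving sample indices.
arsort($scores);

// The keys now list the sample indices from most to least anomalous
// (under the assumption that higher scores are more anomalous).
$ranking = array_keys($scores);
```

Ranking by score is useful when only a limited number of samples can be reviewed, since the most suspicious ones come first.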

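| Name | Scope | Scoring | Online | Verbose | Persistable | Data Compatibility |
|---|---|---|---|---|---|---|
| Gaussian MLE | Global | | | | | Continuous |
| Isolation Forest | Local | | | | | Categorical, Continuous |
| Local Outlier Factor | Local | | | | | Depends on distance kernel |
| Loda | Local | | | | | Continuous |
| One Class SVM | Global | | | | | Continuous |
| Robust Z-Score | Global | | | | | Continuous |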