Most estimators have to be trained before they can make predictions. Estimators that require training are called Learners and implement the train() method, among others. Training is the process of feeding data to a learner so that it can build an internal representation (or model) of the problem. Supervised learners, such as classifiers and regressors, require a labeled dataset, with the labels acting as a guide during training. Unsupervised learners, such as clusterers and anomaly detectors, can be trained with either a labeled or an unlabeled dataset, but only the samples are used for training. Every learner approaches the problem in its own way, but no matter how it works under the hood, the training API is the same.

To begin training a learner, pass a dataset object to the train() method on the learner instance like in the example below.
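A minimal sketch of that call follows. The K Nearest Neighbors classifier and the small hand-made labeled dataset are illustrative assumptions only; any learner and dataset object would be passed to train() in the same way.

```php
use Rubix\ML\Classifiers\KNearestNeighbors;
use Rubix\ML\Datasets\Labeled;

// Assumes Rubix ML is installed via Composer and the autoloader is loaded.

// Illustrative toy data: two features per sample, one label per sample.
$samples = [
    [3.0, 1.5], [2.8, 1.2], [0.4, 4.1], [0.6, 3.9],
];

$labels = ['cat', 'cat', 'dog', 'dog'];

$dataset = new Labeled($samples, $labels);

// Illustrative learner; k = 3 is an arbitrary choice for this sketch.
$estimator = new KNearestNeighbors(3);

// Train the learner in a single session using the whole dataset.
$estimator->train($dataset);
```

Once train() returns, the learner's trained() method should report true and the learner is ready to make predictions.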



Batch vs Online Learning

Batch learning is when a learner is trained in full using a single dataset in a single session. Calling the train() method on the learner instance is an example of batch learning. In contrast, online learning occurs when a learner is trained over multiple sessions with multiple datasets, each of which can be as small as a single sample. Learners that can be partially trained like this implement the Online interface, which includes the partial() method for training in an online scheme. Each subsequent call to partial() continues training from where the learner left off at the end of the previous session.


$folds = $dataset->fold(3);
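One way the three folds could then be used is sketched here. It assumes a learner that implements the Online interface; the K Nearest Neighbors classifier and the toy data are illustrative choices, not something the text above prescribes.

```php
use Rubix\ML\Classifiers\KNearestNeighbors;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Online;

// Assumes Rubix ML is installed via Composer and the autoloader is loaded.

// Illustrative toy data: six samples, so each of the 3 folds receives two.
$dataset = new Labeled([
    [3.0, 1.5], [2.8, 1.2], [3.1, 1.4],
    [0.4, 4.1], [0.6, 3.9], [0.5, 4.3],
], ['cat', 'cat', 'cat', 'dog', 'dog', 'dog']);

// Illustrative learner that supports online training.
$estimator = new KNearestNeighbors(1);

$folds = $dataset->fold(3);

// The first session is a regular call to train().
$estimator->train($folds[0]);

// Later sessions continue training from where the previous one stopped.
if ($estimator instanceof Online) {
    $estimator->partial($folds[1]);
    $estimator->partial($folds[2]);
}
```

The instanceof check is only a guard for this sketch; it lets the same code run against a learner that does not support partial training, in which case only the first fold is used.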




Monitoring Progress

Since training is often an iterative process, it is sometimes useful to obtain real-time feedback on how the learner is progressing. For example, you may want to monitor the training loss to make sure that it is decreasing rather than increasing as training proceeds. Such early feedback can indicate model overfitting or improperly tuned hyper-parameters. Learners that implement the Verbose interface accept a PSR-3 logger instance that can be used to output training information at each time step (or epoch).

Rubix ML comes with a built-in Screen logger that does the job in most cases.


use Rubix\ML\Other\Loggers\Screen;

$estimator->setLogger(new Screen('example'));
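For context, a fuller sketch of attaching the logger before a training session is given here. The Logistic Regression classifier and the toy data are assumptions made for illustration, and the Screen namespace simply follows the snippet above, so it may differ between library versions.

```php
use Rubix\ML\Classifiers\LogisticRegression;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Other\Loggers\Screen;
use Rubix\ML\Verbose;

// Assumes Rubix ML is installed via Composer and the autoloader is loaded.

// Illustrative toy data with exactly two classes.
$dataset = new Labeled([
    [3.0, 1.5], [2.8, 1.2], [3.1, 1.4],
    [0.4, 4.1], [0.6, 3.9], [0.5, 4.3],
], ['cat', 'cat', 'cat', 'dog', 'dog', 'dog']);

// Illustrative learner, constructed with its default hyper-parameters.
$estimator = new LogisticRegression();

// Only learners that implement Verbose accept a logger.
if ($estimator instanceof Verbose) {
    $estimator->setLogger(new Screen('example'));
}

// Training information is written to the logger as the epochs run.
$estimator->train($dataset);
```

Because the logger is a standard PSR-3 instance, any other PSR-3 compatible logger could be passed to setLogger() in place of Screen.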