Isolation Forest

An ensemble of Isolation Trees, each trained on a random subset of the training set. Isolation Trees are randomized decision trees that assign anomaly scores based on the depth a sample reaches when traversing the tree from the root to a leaf node. Anomalies are few and differ from the rest of the data, so they tend to be isolated after fewer splits. They therefore land in the shallowest leaf nodes and receive the highest anomaly scores. The Isolation Forest averages over a user-specified number of trees to produce its final anomaly score.
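
The scoring idea above can be sketched in plain PHP without the library. This is a minimal illustration of the formulation in Liu et al. (2008), not Rubix ML's internal implementation. It assumes continuous features only, and every function name in it (`averagePathLength()`, `buildTree()`, `pathLength()`, `isolationForest()`, `anomalyScore()`) is hypothetical.

Each tree splits on a random feature at a random value between that feature's minimum and maximum. A sample's path length is averaged across the trees and then normalized by c(n), the average path length of an unsuccessful search in a binary search tree of n samples. The result is a score of 2^(-E[h(x)] / c(n)), which approaches 1 for anomalies.

```php
<?php

declare(strict_types=1);

const EULER_GAMMA = 0.5772156649;

// c(n): average path length of an unsuccessful search in a binary search
// tree built from n samples (Liu et al., 2008). Used to normalize depths.
function averagePathLength(int $n): float
{
    if ($n <= 1) {
        return 0.0;
    }
    if ($n === 2) {
        return 1.0;
    }
    return 2.0 * (log($n - 1) + EULER_GAMMA) - 2.0 * ($n - 1) / $n;
}

// Recursively build one isolation tree from a list of numeric samples.
// A node becomes a leaf when it holds at most one sample, the depth limit
// is reached, or the randomly chosen feature has no spread.
function buildTree(array $samples, int $depth, int $maxDepth): array
{
    $n = count($samples);
    if ($n <= 1 || $depth >= $maxDepth) {
        return ['size' => $n];
    }
    $column = mt_rand(0, count($samples[0]) - 1);
    $values = array_column($samples, $column);
    $min = min($values);
    $max = max($values);
    if ($min == $max) {
        return ['size' => $n];
    }
    // Random split value drawn uniformly from [min, max].
    $split = $min + ($max - $min) * mt_rand() / mt_getrandmax();
    $left = [];
    $right = [];
    foreach ($samples as $sample) {
        if ($sample[$column] < $split) {
            $left[] = $sample;
        } else {
            $right[] = $sample;
        }
    }
    return [
        'column' => $column,
        'split' => $split,
        'left' => buildTree($left, $depth + 1, $maxDepth),
        'right' => buildTree($right, $depth + 1, $maxDepth),
    ];
}

// Depth at which a sample lands, plus c(size) at the leaf to account for
// the part of the tree that was not grown past the depth limit.
function pathLength(array $node, array $sample, int $depth = 0): float
{
    if (!isset($node['column'])) {
        return $depth + averagePathLength($node['size']);
    }
    $child = $sample[$node['column']] < $node['split'] ? $node['left'] : $node['right'];
    return pathLength($child, $sample, $depth + 1);
}

// Train each tree on a random subsample (without replacement) of at most
// $subsample rows, with a depth limit of ceil(log2(subsample size)).
// Assumes at least two training samples.
function isolationForest(array $samples, int $estimators = 100, int $subsample = 256): array
{
    $size = min($subsample, count($samples));
    $maxDepth = (int) ceil(log($size, 2));
    $trees = [];
    for ($i = 0; $i < $estimators; $i++) {
        $subset = [];
        foreach (array_rand($samples, $size) as $key) {
            $subset[] = $samples[$key];
        }
        $trees[] = buildTree($subset, 0, $maxDepth);
    }
    return ['trees' => $trees, 'c' => averagePathLength($size)];
}

// s(x, n) = 2^(-E[h(x)] / c(n)): close to 1 for anomalies, and much lower
// for samples deep inside dense regions.
function anomalyScore(array $forest, array $sample): float
{
    $total = 0.0;
    foreach ($forest['trees'] as $tree) {
        $total += pathLength($tree, $sample);
    }
    $mean = $total / count($forest['trees']);
    return 2.0 ** (-$mean / $forest['c']);
}
```

A point far from the bulk of the training data should be isolated after one or two splits in most trees and score close to 1. A point inside a dense cluster needs many more splits and scores noticeably lower.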

Interfaces: Estimator, Learner, Ranking, Persistable

Data Type Compatibility: Categorical, Continuous

Parameters

| # | Param | Default | Type | Description |
|---|---|---|---|---|
| 1 | estimators | 100 | int | The number of isolation trees to train in the ensemble. |
| 2 | ratio | null | float | The fraction of the training set used to train each tree. If null, the subsample size is set to 256. |
| 3 | contamination | null | float | The proportion of outliers assumed to be present in the training set. If null, the anomaly score threshold is set to 0.5. |
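
The effect of the `ratio` and `contamination` parameters can be sketched with a few small helpers. These are hypothetical functions written for illustration, not part of Rubix ML's API, and they rest on several assumptions:

- A null ratio means a fixed subsample of 256. Capping it at the dataset size is our own choice.
- A non-null contamination sets the threshold to the (1 - contamination) quantile of the training scores.
- Samples scoring strictly above the threshold are flagged as anomalies (1) and the rest as inliers (0).

```php
<?php

declare(strict_types=1);

// Hypothetical helper: number of samples each tree is trained on.
// A null ratio falls back to a fixed subsample of 256 (capped at n, an
// assumption so that small datasets still work).
function subsampleSize(int $n, ?float $ratio): int
{
    if ($ratio === null) {
        return min(256, $n);
    }
    return max(1, (int) round($ratio * $n));
}

// Hypothetical helper: derive the decision threshold from a non-empty list
// of training scores. With no contamination the threshold is fixed at 0.5;
// otherwise it is the (1 - contamination) quantile of the training scores.
function threshold(array $trainingScores, ?float $contamination): float
{
    if ($contamination === null) {
        return 0.5;
    }
    sort($trainingScores);
    $position = (1.0 - $contamination) * (count($trainingScores) - 1);
    $lower = (int) floor($position);
    $upper = (int) ceil($position);
    // Linear interpolation between the two nearest order statistics.
    $fraction = $position - $lower;
    return $trainingScores[$lower]
        + $fraction * ($trainingScores[$upper] - $trainingScores[$lower]);
}

// Flag samples whose score exceeds the threshold as anomalies (1), the
// rest as inliers (0).
function flag(array $scores, float $threshold): array
{
    return array_map(fn (float $s): int => $s > $threshold ? 1 : 0, $scores);
}
```

Under these assumptions, a contamination of 0.03 on 1,000 training samples places the threshold so that roughly 30 of them score above it.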

Additional Methods

This estimator does not have any additional methods.

Example

use Rubix\ML\AnomalyDetectors\IsolationForest;

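// 100 trees, each trained on 20% of the training set, assuming that 3% of
// the training samples are outliers.
$estimator = new IsolationForest(100, 0.2, 0.03);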

References

  • F. T. Liu et al. (2008). Isolation Forest.
  • F. T. Liu et al. (2011). Isolation-based Anomaly Detection.
  • M. Garchery et al. (2018). On the influence of categorical features in ranking anomalies using mixed data.