An unsupervised imputer that replaces each missing value in a dataset with the distance-weighted average of that feature's values among the sample's k nearest neighbors. For a continuous feature, the average is the mean of the donor samples' values; for a categorical feature, it is the most frequent category among the donors.
Note: Requires a NaN-safe distance kernel, such as Safe Euclidean, for continuous features.
Data Type Compatibility: Depends on distance kernel
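To illustrate what "NaN-safe" means here, the following is a minimal sketch of a Euclidean distance that ignores missing dimensions. The function name `nanSafeEuclidean` is hypothetical and the rescaling step is an assumption for illustration. It is not the library's Safe Euclidean kernel.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not part of Rubix ML: Euclidean distance that skips
 * any dimension where either value is NaN, then rescales the squared sum by
 * (total dimensions / compared dimensions) so that samples with many missing
 * values are not made to look artificially close.
 *
 * @param array<int, int|float> $a
 * @param array<int, int|float> $b
 */
function nanSafeEuclidean(array $a, array $b): float
{
    $sum = 0.0;
    $compared = 0;

    foreach ($a as $i => $valueA) {
        $valueB = $b[$i];

        // Skip dimensions with a missing value on either side.
        if (is_nan((float) $valueA) || is_nan((float) $valueB)) {
            continue;
        }

        $sum += ($valueA - $valueB) ** 2;
        ++$compared;
    }

    // No dimension could be compared, so the distance is undefined.
    if ($compared === 0) {
        return NAN;
    }

    return sqrt(count($a) / $compared * $sum);
}

// Only the first dimension is present in both samples.
$distance = nanSafeEuclidean([1.0, NAN, 3.0], [4.0, 2.0, NAN]);
```

A plain Euclidean kernel would return NaN for this pair, which is why the imputer needs a kernel that tolerates missing values when searching for neighbors.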
| # | Param | Default | Type | Description |
|---|---|---|---|---|
| 1 | k | 5 | int | The number of nearest neighbors to consider when imputing a value. |
| 2 | weighted | true | bool | Should we use distances as weights when selecting a donor sample? |
| 3 | placeholder | '?' | string | The categorical placeholder denoting the category that contains missing values. |
| 4 | tree | BallTree | Spatial | The spatial tree used to run nearest neighbor searches. |
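As a rough sketch of how the `weighted` option can change an imputed value, the helpers below compute a continuous value from donor values and a categorical value from donor categories. The function names and the inverse-distance weighting `1 / (1 + d)` are assumptions made for this example. The library's actual weighting scheme may differ.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not the Rubix ML implementation: impute a continuous
 * value as the (optionally distance-weighted) mean of the donor values.
 *
 * @param float[] $values    donor values for the missing feature
 * @param float[] $distances distance from the incomplete sample to each donor
 */
function imputeContinuous(array $values, array $distances, bool $weighted): float
{
    $weights = [];

    foreach ($distances as $i => $distance) {
        // Assumed scheme: inverse distance, with +1 to avoid division by zero.
        $weights[$i] = $weighted ? 1.0 / (1.0 + $distance) : 1.0;
    }

    $sum = 0.0;

    foreach ($values as $i => $value) {
        $sum += $weights[$i] * $value;
    }

    return $sum / array_sum($weights);
}

/**
 * Hypothetical helper: impute a categorical value as the most frequent
 * category among the donors (unweighted in this sketch).
 *
 * @param string[] $values donor categories for the missing feature
 */
function imputeCategorical(array $values): string
{
    $counts = array_count_values($values);

    // array_search returns the first category that has the highest count.
    return (string) array_search(max($counts), $counts, true);
}

$plain = imputeContinuous([2.0, 4.0], [0.0, 1.0], false);   // 3.0
$nearer = imputeContinuous([2.0, 4.0], [0.0, 1.0], true);   // pulled toward 2.0
$color = imputeCategorical(['red', 'blue', 'red']);         // 'red'
```

With weighting enabled, the closer donor contributes more, so the result moves from the plain mean toward that donor's value.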
use Rubix\ML\Transformers\KNNImputer;
use Rubix\ML\Graph\Trees\BallTree;
use Rubix\ML\Kernels\Distance\SafeEuclidean;

$transformer = new KNNImputer(10, false, '?', new BallTree(30, new SafeEuclidean()));
This transformer does not have any additional methods.
- O. Troyanskaya et al. (2001). Missing value estimation methods for DNA microarrays.