Random Hot Deck Imputer#

A method of imputation similar to KNN Imputer, but instead of computing a weighted average of the neighbors' feature values, Random Hot Deck picks a single donor value at random from the neighborhood, with the draw weighted by distance so that closer neighbors are more likely to be chosen. This makes Random Hot Deck Imputer slightly more computationally efficient than KNN Imputer while also satisfying some balancing equations.
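
The distance-weighted draw described above can be sketched in plain PHP. This is only an illustration of the idea, not the library's implementation: the function name `sampleDonorValue`, its signature, and the small constant used to avoid division by zero are all assumptions made for this example.

```php
<?php

/**
 * Pick one donor value at random, with probability proportional to the
 * inverse of its distance. Hypothetical helper, not part of Rubix ML.
 *
 * @param array $values    candidate values taken from the k nearest donors
 * @param array $distances distance from the incomplete sample to each donor
 * @param bool  $weighted  if false, every donor is equally likely
 */
function sampleDonorValue(array $values, array $distances, bool $weighted = true)
{
    if (count($values) === 0 || count($values) !== count($distances)) {
        throw new InvalidArgumentException('Need one distance per candidate value.');
    }

    if (!$weighted) {
        // Unweighted hot deck: a uniform draw from the neighborhood.
        return $values[array_rand($values)];
    }

    // Inverse-distance weights; the small constant (an assumption) avoids
    // division by zero when a donor sits exactly on the query point.
    $weights = [];
    foreach ($distances as $i => $distance) {
        $weights[$i] = 1.0 / (1e-8 + $distance);
    }

    // Uniform draw in [0, total weight], then walk the cumulative weights.
    $threshold = array_sum($weights) * mt_rand() / mt_getrandmax();

    foreach ($weights as $i => $weight) {
        $threshold -= $weight;

        if ($threshold <= 0.0) {
            return $values[$i];
        }
    }

    // Guards against floating point round-off in the loop above.
    return end($values);
}
```

Because only one donor value is copied rather than averaged, an imputed value is always one that actually occurs in the data, which is why the same approach works for categorical features.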

Note: NaN-safe distance kernels, such as Safe Euclidean, are required for continuous features.
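
To illustrate why a NaN-safe kernel is needed, here is a minimal sketch of one way such a distance could be computed: dimensions where either value is NaN are skipped, and the result is rescaled by the share of dimensions that were usable. The function `nanSafeEuclidean` is a hypothetical helper written for this example, and the rescaling rule is an assumption rather than a statement of how Safe Euclidean is implemented.

```php
<?php

/**
 * Euclidean distance that ignores any dimension where either value is NaN.
 * Hypothetical helper for illustration, not Rubix ML source code.
 */
function nanSafeEuclidean(array $a, array $b): float
{
    if (count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must have the same length.');
    }

    $sum = 0.0;
    $used = 0;

    foreach ($a as $i => $valueA) {
        $valueB = $b[$i];

        // Skip dimensions that cannot be compared.
        if (is_nan($valueA) || is_nan($valueB)) {
            continue;
        }

        $sum += ($valueA - $valueB) ** 2;
        ++$used;
    }

    // Nothing comparable: the distance is undefined.
    if ($used === 0) {
        return NAN;
    }

    // Rescale (assumed rule) so that skipping dimensions does not make
    // incomplete samples look artificially close.
    return sqrt(count($a) / $used * $sum);
}
```

With a plain Euclidean distance, a single NaN in either vector would make the whole distance NaN, so samples with missing values could never be ranked against their potential donors.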

Interfaces: Transformer, Stateful

Data Type Compatibility: Depends on distance kernel


| # | Param | Default | Type | Description |
|---|---|---|---|---|
| 1 | k | 5 | int | The number of nearest neighbors to consider when imputing a value. |
| 2 | weighted | true | bool | Should we use the inverse distances as confidence scores when imputing values? |
| 3 | placeholder | '?' | string | The categorical placeholder denoting the category that contains missing values. |
| 4 | tree | BallTree | Spatial | The spatial tree used to run nearest neighbor searches. |

Additional Methods#

This transformer does not have any additional methods.


use Rubix\ML\Transformers\RandomHotDeckImputer;
use Rubix\ML\Graph\Trees\BallTree;
use Rubix\ML\Kernels\Distance\SafeEuclidean;

$transformer = new RandomHotDeckImputer(20, true, '?', new BallTree(50, new SafeEuclidean()));


  • C. Hasler et al. (2015). Balanced k-Nearest Neighbor Imputation.