Hot Deck Imputer
A hot deck is a set of complete donor samples that can be drawn from when imputing a missing feature value. For each sample that contains a missing value, Hot Deck Imputer finds the k nearest donors and then chooses the replacement value at random from that neighborhood.
Note
Requires a NaN-safe distance kernel, such as Safe Euclidean, for continuous features.
Interfaces: Transformer, Stateful, Persistable
Data Type Compatibility: Depends on distance kernel
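The procedure described above can be sketched in plain PHP. This is a minimal illustration for continuous features only, not the library's implementation: the function names `safeEuclidean` and `hotDeckImpute` are hypothetical, the neighbor search is brute force rather than a spatial tree, and the rescaling used to make the distance NaN-safe is an assumption.

```php
<?php

// Hypothetical NaN-safe Euclidean distance: dimensions where either value is
// NaN are skipped and the sum is rescaled by the share of dimensions compared.
function safeEuclidean(array $a, array $b): float
{
    $sum = 0.0;
    $used = 0;

    foreach ($a as $i => $x) {
        if (is_nan($x) || is_nan($b[$i])) {
            continue;
        }

        $sum += ($x - $b[$i]) ** 2;
        ++$used;
    }

    // No comparable dimensions: treat the pair as infinitely far apart.
    if ($used === 0) {
        return INF;
    }

    return sqrt($sum * count($a) / $used);
}

// Hypothetical helper: replace each NaN with the value of a donor chosen
// uniformly at random from the k nearest complete samples.
function hotDeckImpute(array $samples, int $k): array
{
    // The hot deck is every sample without a missing value.
    $donors = array_values(array_filter($samples, function (array $s): bool {
        return count(array_filter($s, 'is_nan')) === 0;
    }));

    if (count($donors) === 0) {
        throw new RuntimeException('At least one complete donor sample is required.');
    }

    foreach ($samples as &$sample) {
        $missing = array_keys(array_filter($sample, 'is_nan'));

        if (count($missing) === 0) {
            continue;
        }

        // Brute-force search: distance from this sample to every donor.
        $distances = [];

        foreach ($donors as $j => $donor) {
            $distances[$j] = safeEuclidean($sample, $donor);
        }

        asort($distances);

        $neighbors = array_slice(array_keys($distances), 0, $k);

        // Draw a random neighbor for each missing column.
        foreach ($missing as $column) {
            $pick = $neighbors[array_rand($neighbors)];

            $sample[$column] = $donors[$pick][$column];
        }
    }

    unset($sample);

    return $samples;
}

$samples = [
    [1.0, 10.0],
    [1.1, 11.0],
    [9.0, 90.0],
    [1.05, NAN],
];

// The last sample receives 10.0 or 11.0, taken from one of its 2 nearest donors.
$imputed = hotDeckImpute($samples, 2);
```

Because the imputed value is always copied from a real donor, the result stays within the set of values actually observed for that feature.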
Parameters
# | Name | Default | Type | Description |
---|---|---|---|---|
1 | k | 5 | int | The number of nearest neighbor donors to consider when imputing a value. |
2 | weighted | false | bool | Whether to use distances as weights when selecting a donor sample. |
3 | categoricalPlaceholder | '?' | string | The placeholder category that denotes a missing categorical value. |
4 | tree | BallTree | Spatial | The spatial tree used to run nearest neighbor searches. |
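
To make the `weighted` option concrete, the sketch below selects a donor with probability that decreases as distance grows. The function name `pickWeighted` is hypothetical and the weight formula `1 / (1 + distance)` is an assumption chosen for illustration; the library may weight donors differently.

```php
<?php

// Hypothetical helper: pick one donor value with probability proportional to
// 1 / (1 + distance). $values and $distances must share the same keys.
function pickWeighted(array $values, array $distances)
{
    $weights = array_map(function (float $d): float {
        return 1.0 / (1.0 + $d);
    }, $distances);

    // Draw a point in [0, total weight) and walk the cumulative weights.
    $threshold = (mt_rand() / (mt_getrandmax() + 1)) * array_sum($weights);

    foreach ($weights as $i => $weight) {
        $threshold -= $weight;

        if ($threshold < 0.0) {
            return $values[$i];
        }
    }

    // Fallback in case floating point round-off leaves a tiny remainder.
    return $values[array_key_last($values)];
}

// The two close donors are far more likely to be chosen than the distant one.
$value = pickWeighted([10.0, 11.0, 90.0], [0.1, 0.2, 11.0]);
```

With `weighted` left at `false`, the equivalent step would be a uniform draw over the same neighbors.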
Example
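use Rubix\ML\Transformers\HotDeckImputer;
use Rubix\ML\Graph\Trees\BallTree;
use Rubix\ML\Kernels\Distance\SafeEuclidean;

// Consider the 20 nearest donors, select among them without distance weighting,
// and search a ball tree that uses a NaN-safe distance kernel.
$transformer = new HotDeckImputer(20, false, '?', new BallTree(50, new SafeEuclidean()));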
Additional Methods
This transformer does not have any additional methods.
References
- C. Hasler et al. (2015). Balanced k-Nearest Neighbor Imputation.
Last update: 2021-04-11