# TF-IDF Transformer
Term Frequency-Inverse Document Frequency (TF-IDF) is a measure of how important a word is to a document within a corpus. The TF-IDF value increases with the number of times a word appears in a document (TF) and is offset by the number of documents in the corpus that contain the word (IDF), so words that occur in most documents receive low weights.
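To make the definition concrete, here is a minimal plain-PHP sketch (not the transformer's implementation) that weights raw token counts using the textbook forms tf = raw count and idf = ln(N / df), where N is the number of documents and df is the number of documents that contain the token. The function `tfIdf()` and the toy vocabulary are hypothetical.

```php
<?php

/**
 * Hypothetical helper: weight raw token counts by TF-IDF using
 * tf = raw count and idf = ln(N / df).
 *
 * @param list<list<int|float>> $counts one token count vector per document
 * @return list<list<float>>
 */
function tfIdf(array $counts): array
{
    $n = count($counts);
    $numTokens = count($counts[0]);

    // Document frequency: in how many documents does each token occur?
    $df = array_fill(0, $numTokens, 0);
    foreach ($counts as $doc) {
        foreach ($doc as $j => $tf) {
            if ($tf > 0) {
                $df[$j]++;
            }
        }
    }

    // Scale each term frequency by the token's inverse document frequency.
    $weights = [];
    foreach ($counts as $doc) {
        $row = [];
        foreach ($doc as $j => $tf) {
            $row[] = $df[$j] > 0 ? $tf * log($n / $df[$j]) : 0.0;
        }
        $weights[] = $row;
    }

    return $weights;
}

// Columns: "the", "cat", "dog". "the" appears in every document.
$counts = [
    [3, 2, 0],
    [2, 0, 1],
];

print_r(tfIdf($counts));
```

Because "the" occurs in both documents, its IDF is ln(2 / 2) = 0 and its weight vanishes everywhere, while "cat" and "dog" keep nonzero weights in the one document that contains each of them.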
Note: TF-IDF Transformer assumes that its inputs are token frequency vectors, such as those produced by Word Count Vectorizer.
Interfaces: Transformer, Stateful, Elastic, Reversible, Persistable
Data Type Compatibility: Continuous only
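Being Reversible means the weighting can be undone. As a hedged illustration, assuming dampening is off so that the transform is a plain per-column scaling, dividing each weighted value by its column's IDF recovers the original count. The helpers `applyIdfs()` and `reverseIdfs()` and the IDF values below are hypothetical, not the library's code.

```php
<?php

/**
 * Hypothetical helper: scale each column of every sample by its IDF.
 *
 * @param list<list<int|float>> $samples token count vectors
 * @param list<float> $idfs one nonzero IDF per column
 * @return list<list<float>>
 */
function applyIdfs(array $samples, array $idfs): array
{
    return array_map(
        fn (array $sample): array => array_map(
            fn ($value, $idf) => $value * $idf,
            $sample,
            $idfs
        ),
        $samples
    );
}

/**
 * Hypothetical helper: undo applyIdfs() by dividing each column by its IDF.
 * Assumes no dampening was applied and every IDF is nonzero.
 */
function reverseIdfs(array $samples, array $idfs): array
{
    return array_map(
        fn (array $sample): array => array_map(
            fn ($value, $idf) => $value / $idf,
            $sample,
            $idfs
        ),
        $samples
    );
}

$counts = [[3, 2, 0], [2, 0, 1]];
$idfs = [1.0, 1.6931, 1.6931]; // illustrative values only

$weighted = applyIdfs($counts, $idfs);
$restored = reverseIdfs($weighted, $idfs);

print_r($restored);
```

The round trip only works when every IDF is nonzero. The unsmoothed ln(N / df) is 0 for a token found in every document, which would make that column impossible to restore; a smoothed form that adds 1 to the logarithm avoids this.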
### Parameters
| # | Name | Default | Type | Description |
|---|---|---|---|---|
| 1 | smoothing | 1.0 | float | The amount of additive (Laplace) smoothing to add to the IDFs. |
| 2 | dampening | false | bool | Should we apply a sub-linear function to the term frequencies to dampen the effect of tokens that repeat within a document? |
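The effect of the two parameters can be illustrated with a short sketch. It assumes a common smoothed form, idf = ln((N + s) / (df + s)) + 1 where s is the smoothing amount, and the sub-linear dampening tf' = 1 + ln(tf) for tf > 0. Both formulas are assumptions for illustration, the library's exact formulas may differ, and the helpers `smoothedIdf()` and `termWeight()` are hypothetical.

```php
<?php

/**
 * Hypothetical helper: smoothed inverse document frequency.
 * ASSUMPTION: idf = ln((n + s) / (df + s)) + 1, a common smoothed form;
 * the library's exact formula may differ.
 */
function smoothedIdf(int $n, int $df, float $smoothing = 1.0): float
{
    return log(($n + $smoothing) / ($df + $smoothing)) + 1.0;
}

/**
 * Hypothetical helper: optionally dampen a raw term frequency with the
 * sub-linear transform 1 + ln(tf), so that 10 occurrences do not weigh
 * 10 times as much as one occurrence. Zero counts stay zero.
 */
function termWeight(float $tf, bool $dampening): float
{
    if ($tf <= 0.0) {
        return 0.0;
    }

    return $dampening ? 1.0 + log($tf) : $tf;
}

// A token seen 10 times in one document and present in 2 of 4 documents.
$idf = smoothedIdf(4, 2, 2.0);

printf("raw:      %.4f\n", termWeight(10, false) * $idf);
printf("dampened: %.4f\n", termWeight(10, true) * $idf);
```

Under these forms, a positive smoothing amount keeps the IDF finite even when df is 0, and a token that occurs in every document gets an IDF of exactly 1 rather than 0. Dampening turns a count of 10 into roughly 3.30 instead of 10.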
### Example
```php
use Rubix\ML\Transformers\TfIdfTransformer;

// Use additive smoothing of 2.0 and enable sub-linear TF dampening.
$transformer = new TfIdfTransformer(2.0, true);
```
### Additional Methods
Return the document frequencies calculated during fitting, or null if the transformer has not been fitted:

```php
public dfs() : ?array
```
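Because the transformer is Elastic, its document frequencies can presumably be updated as new batches arrive instead of being recomputed from scratch. The sketch below is a hypothetical class, `DocumentFrequencyCounter`, and not the library's code; it accumulates per-column document frequencies across batches and mirrors the nullable `dfs() : ?array` return by reporting null before any data has been seen.

```php
<?php

/**
 * Hypothetical accumulator showing how document frequencies could be kept
 * up to date across batches, in the spirit of an Elastic (partially
 * trainable) transformer. Not the library's implementation.
 */
final class DocumentFrequencyCounter
{
    /** @var list<int>|null document frequency per column, null until data arrives */
    private ?array $dfs = null;

    /** Total number of documents seen so far. */
    private int $n = 0;

    /** @param list<list<int|float>> $samples token count vectors */
    public function update(array $samples): void
    {
        foreach ($samples as $sample) {
            // Initialize the counts lazily from the first sample's width.
            $this->dfs ??= array_fill(0, count($sample), 0);

            foreach ($sample as $j => $count) {
                if ($count > 0) {
                    $this->dfs[$j]++;
                }
            }
        }

        $this->n += count($samples);
    }

    /** Same shape as the documented dfs() : ?array, null before any data. */
    public function dfs(): ?array
    {
        return $this->dfs;
    }

    public function numDocuments(): int
    {
        return $this->n;
    }
}

$counter = new DocumentFrequencyCounter();
var_dump($counter->dfs()); // NULL before the first batch

$counter->update([[3, 2, 0], [2, 0, 1]]);
$counter->update([[1, 0, 4]]);

print_r($counter->dfs());
```

After both batches the first column has been seen in 3 documents, the second in 1 and the third in 2, exactly as if all three documents had been counted at once.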
Last update: 2021-07-03