Skip-Gram
Skip-grams are a technique similar to n-grams. In addition to forming n-grams from adjacent sequences of words, the tokenizer skips over up to the next k words (the `skip` parameter) to form additional n-grams from the resulting forward-looking sequences.
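To make the description above concrete, here is a minimal sketch in plain PHP of one way skip-grams can be formed. The `skipGrams()` function is a hypothetical helper written for illustration and is not part of the library. It assumes the input is already split into a zero-indexed list of words, that `n` is at least 2, and that a token is the first word followed by the `n - 1` contiguous words found after skipping 0 to `skip` words. The tokenizer's actual output may differ in details such as ordering, separators, and sentence handling.

```php
<?php

/**
 * Hypothetical helper (not part of Rubix ML) that forms skip-grams from a
 * zero-indexed list of words. Assumes $n >= 2.
 *
 * @param string[] $words
 * @param int $n number of words per token
 * @param int $skip maximum number of words to skip after the first word
 * @return string[]
 */
function skipGrams(array $words, int $n = 2, int $skip = 2): array
{
    $tokens = [];
    $count = count($words);

    foreach ($words as $i => $first) {
        // $j = 0 yields the ordinary n-gram; $j > 0 skips $j words.
        for ($j = 0; $j <= $skip; ++$j) {
            // Index of the last word this token would need.
            $last = $i + $j + $n - 1;

            if ($last >= $count) {
                break;
            }

            // Take the n - 1 contiguous words that follow the skipped ones.
            $rest = array_slice($words, $i + $j + 1, $n - 1);

            $tokens[] = implode(' ', array_merge([$first], $rest));
        }
    }

    return $tokens;
}

$words = ['the', 'quick', 'brown', 'fox'];

print_r(skipGrams($words, 2, 2));
```

With `n = 2` and `skip = 2`, this sketch produces the ordinary bigrams `the quick`, `quick brown`, and `brown fox`, plus the skipped pairs `the brown`, `the fox`, and `quick fox`.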
Parameters
| # | Name | Default | Type | Description |
|---|---|---|---|---|
| 1 | n | 2 | int | The number of contiguous words in a single token. |
| 2 | skip | 2 | int | The number of words to skip over to form new n-gram sequences. |
Example
use Rubix\ML\Extractors\Tokenizers\SkipGram;
$tokenizer = new SkipGram(2, 2);