Regex Filter#
Filters the text features of a dataset by removing every substring that matches a pattern in a list of regular expressions.
Note
Patterns are applied in the same order in which they are passed to the constructor.
Interfaces: Transformer
Data Type Compatibility: Categorical
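Because patterns are applied in order, the same set of patterns can give different results depending on how it is arranged. The following is a minimal sketch in plain PHP, assuming the filter behaves like a sequence of `preg_replace()` calls with an empty replacement. The `filterText()` helper and both patterns are hypothetical stand-ins for illustration, not the library's own code or constants.

```php
<?php

// Hypothetical stand-in patterns (NOT the library's constants).
$hashtag = '/#\w+/';
$digits = '/\d+/';

/**
 * Remove every match of each pattern, one pattern at a time,
 * in the order the patterns are given (illustrative helper).
 */
function filterText(string $text, array $patterns): string
{
    foreach ($patterns as $pattern) {
        $text = preg_replace($pattern, '', $text);
    }

    return $text;
}

$text = 'tag #2021 ok';

// Hashtag first: the whole "#2021" token is removed.
$a = filterText($text, [$hashtag, $digits]);

// Digits first: "2021" is removed, and the bare "#" left behind
// no longer matches the hashtag pattern.
$b = filterText($text, [$digits, $hashtag]);

var_dump($a, $b);
```

In this sketch the first ordering yields `tag  ok` while the second leaves a stray `#` behind (`tag # ok`), so more specific patterns are usually best listed before more general ones.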
Parameters#
# | Name | Default | Type | Description |
---|---|---|---|---|
1 | patterns | | array | A list of regular expression patterns used to filter the text columns of the dataset. |
Example#
use Rubix\ML\Transformers\RegexFilter;

// Custom patterns can be mixed with the predefined class constants.
$transformer = new RegexFilter([
    RegexFilter::URL,
    RegexFilter::MENTION,
    '/(?<me>.+)/',
    RegexFilter::EXTRA_CHARACTERS,
]);
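To illustrate what filtering the text columns of a dataset amounts to, here is a rough sketch in plain PHP that does not use the library. It assumes that only string-valued features are filtered and that numeric features pass through unchanged. The `filterSamples()` helper and the two patterns are hypothetical and are not part of the Rubix ML API.

```php
<?php

/**
 * Remove pattern matches from every string feature of every sample.
 * Non-string features are left as they are (an assumption of this sketch).
 */
function filterSamples(array $samples, array $patterns): array
{
    foreach ($samples as $i => $sample) {
        foreach ($sample as $j => $value) {
            if (is_string($value)) {
                // An array of patterns is applied in the order given.
                $samples[$i][$j] = preg_replace($patterns, '', $value);
            }
        }
    }

    return $samples;
}

// Hypothetical stand-in patterns for mentions and hashtags.
$patterns = ['/@\w+/', '/#\w+/'];

$samples = [
    ['great post @bob', 4.5],
    ['see #php', 3],
];

$filtered = filterSamples($samples, $patterns);

var_dump($filtered);
```

Note that only the matched text is removed, so any surrounding whitespace (such as the trailing space after `great post`) stays in place unless a later pattern removes it.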
Predefined Regex Patterns#
Class Constant | Description |
---|---|
EMAIL | A pattern to match any email address. |
URL | An alias for the default URL matching pattern. |
GRUBER_1 | The original Gruber URL matching pattern. |
GRUBER_2 | The improved Gruber URL matching pattern. |
EXTRA_CHARACTERS | Matches consecutively repeated non-word, non-numeric characters such as punctuation and special characters. |
EXTRA_WORDS | Matches consecutively repeated words. |
EXTRA_WHITESPACE | Matches consecutively repeated whitespace characters. |
MENTION | A pattern that matches Twitter-style mentions (@example). |
HASHTAG | Matches Twitter-style hashtags (#example). |
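As a rough illustration of the kinds of patterns described in the table, the sketch below uses simplified approximations written for this example. They are assumptions and will differ from the actual values of the class constants. It removes mentions and hashtags first, then collapses the whitespace runs that the removals leave behind.

```php
<?php

// Simplified approximations for illustration only; the library's
// real constant values are likely more thorough than these.
$patterns = [
    'mention' => '/@\w+/',            // Twitter-style mentions
    'hashtag' => '/#\w+/',            // Twitter-style hashtags
    'extraWhitespace' => '/\s(?=\s)/', // whitespace followed by more whitespace
];

$text = 'thanks @alice  for the #php   tips';

// preg_replace() applies an array of patterns in order, so the
// whitespace cleanup runs after the mention and hashtag removals.
$filtered = preg_replace(array_values($patterns), '', $text);

var_dump($filtered);
```

Placing the whitespace pattern last is a deliberate choice in this sketch: removing a mention or hashtag tends to leave two or more adjacent spaces, which an earlier whitespace pass would miss.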
Additional Methods#
This transformer does not have any additional methods.
Last update: 2021-05-03