Labeled#

A Labeled dataset is used to train supervised learners and to test a model by providing the ground truth. In addition to the standard dataset API, a labeled dataset supports operations that rely on the label column, such as stratification and sorting.

Note

Since PHP silently converts integer strings (e.g. '1') to integers in some circumstances, such as when they are used as array keys, you should not use integer strings as class labels. Instead, use an appropriate non-integer string class name such as 'class 1', '#1', or 'first'.
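
The conversion described in the note can be seen in plain PHP, without the library. The snippet below is a minimal illustration that assumes labels end up as array keys, for example when counting class frequencies; the variable names are made up for the example.

```php
<?php
// Count occurrences of each label, using the label as the array key.
$counts = [];
foreach (['1', '2', '1'] as $label) {
    $counts[$label] = ($counts[$label] ?? 0) + 1;
}

// PHP has cast the integer string '1' to the integer key 1.
var_dump(array_keys($counts)[0]); // int(1)

// Non-integer string labels keep their type and value.
$safe = [];
foreach (['class 1', 'class 2', 'class 1'] as $label) {
    $safe[$label] = ($safe[$label] ?? 0) + 1;
}

var_dump(array_keys($safe)[0]); // string(7) "class 1"
```

After the first loop the original string labels cannot be recovered from the keys with a strict comparison, which is why non-integer class names are the safer choice.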

Parameters#

| # | Name | Default | Type | Description |
|---|---|---|---|---|
| 1 | samples | | array | A 2-dimensional array consisting of rows of samples and columns with feature values. |
| 2 | labels | | array | A 1-dimensional array of labels that correspond to each sample in the dataset. |
| 3 | verify | true | bool | Should we verify the data? |

Example#

use Rubix\ML\Datasets\Labeled;

$samples = [
    [0.1, 20, 'furry'],
    [2.0, -5, 'rough'],
    [0.01, 5, 'furry'],
];

$labels = ['not monster', 'monster', 'not monster'];

$dataset = new Labeled($samples, $labels);

Additional Methods#

Selectors#

Return the labels of the dataset in an array:

public labels() : array

Return a single label at the given row offset:

public label(int $offset) : mixed

Return all of the possible outcomes, i.e. the unique labels, in an array:

public possibleOutcomes() : array

print_r($dataset->possibleOutcomes());
Array
(
    [0] => female
    [1] => male
)

Data Types#

Return the data type of the label:

public labelType() : Rubix\ML\DataType

echo $dataset->labelType();
continuous

Stratification#

Group samples by their class label and return an array with one dataset per class:

public stratify() : array

$strata = $dataset->stratify();
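
To make the grouping concrete, here is a rough plain-PHP sketch of the idea. `stratifySketch` is a hypothetical helper written for this page, not the library's implementation; keying the result by label and returning plain arrays instead of dataset objects are both assumptions of the sketch.

```php
<?php
/**
 * Hypothetical helper, not part of Rubix ML: group rows by class label.
 *
 * @param array[] $samples rows of feature values
 * @param string[] $labels one label per row
 * @return array[] one ['samples' => ..., 'labels' => ...] entry per class
 */
function stratifySketch(array $samples, array $labels): array
{
    $strata = [];

    foreach ($labels as $i => $label) {
        // Append the row and its label to the stratum for this class.
        $strata[$label]['samples'][] = $samples[$i];
        $strata[$label]['labels'][] = $label;
    }

    return $strata;
}

$samples = [
    [0.1, 20, 'furry'],
    [2.0, -5, 'rough'],
    [0.01, 5, 'furry'],
];

$labels = ['not monster', 'monster', 'not monster'];

$strata = stratifySketch($samples, $labels);

echo count($strata['not monster']['samples']), PHP_EOL; // 2
echo count($strata['monster']['samples']), PHP_EOL;     // 1
```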

Split the dataset into left and right subsets such that the proportions of class labels remain intact:

public stratifiedSplit($ratio = 0.5) : array

[$training, $testing] = $dataset->stratifiedSplit(0.8);
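
One way to picture how a stratified split keeps class proportions intact is to apply the ratio to each class separately and then merge the pieces. The sketch below does this on row offsets in plain PHP. `stratifiedSplitSketch` is a hypothetical function, and rounding down with `floor` is an assumption of the sketch; the library may shuffle or round differently.

```php
<?php
/**
 * Hypothetical sketch, not the Rubix ML implementation: split row offsets
 * into left and right subsets, taking the same ratio from every class.
 *
 * @param string[] $labels one label per row
 * @return array[] two lists of row offsets: [left, right]
 */
function stratifiedSplitSketch(array $labels, float $ratio = 0.5): array
{
    // Collect the row offsets that belong to each class.
    $byClass = [];

    foreach ($labels as $i => $label) {
        $byClass[$label][] = $i;
    }

    $left = [];
    $right = [];

    foreach ($byClass as $offsets) {
        // Number of rows from this class that go to the left subset.
        $n = (int) floor($ratio * count($offsets));

        $left = array_merge($left, array_slice($offsets, 0, $n));
        $right = array_merge($right, array_slice($offsets, $n));
    }

    return [$left, $right];
}

// 10 rows: 8 of class 'a' and 2 of class 'b'.
$labels = array_merge(array_fill(0, 8, 'a'), array_fill(0, 2, 'b'));

[$leftIdx, $rightIdx] = stratifiedSplitSketch($labels, 0.5);

echo count($leftIdx), ' ', count($rightIdx), PHP_EOL; // 5 5
```

Each subset here holds four 'a' rows and one 'b' row, so the 4:1 class balance of the full set is preserved on both sides.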

Return k equal-sized subsets of the dataset such that class proportions remain intact:

public stratifiedFold($k = 10) : array

$folds = $dataset->stratifiedFold(3);
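
The same per-class idea extends to folds. As a minimal sketch, assuming a round-robin assignment (which is not necessarily what the library does), the hypothetical `stratifiedFoldSketch` below deals the rows of each class into k folds in turn so that every fold receives a similar share of each class.

```php
<?php
/**
 * Hypothetical sketch, not the Rubix ML implementation: deal the row
 * offsets of each class round-robin into $k folds.
 *
 * @param string[] $labels one label per row
 * @return array[] $k lists of row offsets
 */
function stratifiedFoldSketch(array $labels, int $k = 10): array
{
    // Collect the row offsets that belong to each class.
    $byClass = [];

    foreach ($labels as $i => $label) {
        $byClass[$label][] = $i;
    }

    $folds = array_fill(0, $k, []);

    foreach ($byClass as $offsets) {
        foreach ($offsets as $j => $offset) {
            // The j-th row of this class goes to fold j mod k.
            $folds[$j % $k][] = $offset;
        }
    }

    return $folds;
}

// 9 rows: 6 of class 'x' and 3 of class 'y'.
$labels = array_merge(array_fill(0, 6, 'x'), array_fill(0, 3, 'y'));

$folds = stratifiedFoldSketch($labels, 3);

foreach ($folds as $fold) {
    echo count($fold), PHP_EOL; // 3 rows per fold: two 'x' and one 'y'
}
```

The class sizes in this example are chosen to be divisible by k; with other sizes this simple scheme produces folds that differ in size by a few rows.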

Transform Labels#

Transform the labels in the dataset using a callback function and return self for method chaining:

public transformLabels(callable $fn) : self

Note

The callback function is called for each individual label and should return the transformed label as a continuous or categorical value.

$dataset->transformLabels('intval');

// Or, with a custom callback:

$dataset->transformLabels(function ($label) {
    return $label > 0.5 ? 'yes' : 'no';
});

Describe by Label#

Describe the features of the dataset broken down by categorical label:

public describeByLabel() : Report

echo $dataset->describeByLabel();
{
    "not monster": [
        {
            "type": "categorical",
            "num categories": 2,
            "probabilities": {
                "friendly": 0.75,
                "loner": 0.25
            }
        },
        {
            "type": "continuous",
            "mean": 1.125,
            "variance": 12.776875,
            "standard deviation": 3.574475485997911,
            "skewness": -1.0795676577113944,
            "kurtosis": -0.7175867765792474,
            "min": -5,
            "25%": 0.6999999999999993,
            "median": 2.75,
            "75%": 3.175,
            "max": 4
        }
    ],
    "monster": [
        {
            "type": "categorical",
            "num categories": 2,
            "probabilities": {
                "loner": 0.5,
                "friendly": 0.5
            }
        },
        {
            "type": "continuous",
            "mean": -1.25,
            "variance": 0.0625,
            "standard deviation": 0.25,
            "skewness": 0,
            "kurtosis": -2,
            "min": -1.5,
            "25%": -1.375,
            "median": -1.25,
            "75%": -1.125,
            "max": -1
        }
    ]
}

Last update: 2021-06-06