Random samples from classification dataset

Split a dataset by randomly drawing samples.

Parameters

Classification dataset [file]
Classification dataset pickle file with feature data X and target data y to draw from.
Number of samples per category [string]
Number of samples to draw from each category. Set a single value N to draw N samples for each category. Set a list of values N1, N2, … Ni, … to draw Ni samples for category i.
Draw with replacement [boolean]

Whether to draw samples with replacement.

Default: False

Draw proportional [boolean]

Whether to interpret the number of samples N or Ni as a percentage to be drawn from each category.

Default: False

Random seed [number]
Seed for the random number generator. Set a value to make the draw reproducible.
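
The following is a minimal sketch of how the parameters above could interact, not the EnMAP-Box implementation. The helper name draw_indices is hypothetical, it takes numbers rather than the string the tool expects, and two behaviours are assumptions: percentages are rounded to the nearest whole sample, and a draw without replacement is capped at the category size.

```python
import random
from collections import defaultdict


def draw_indices(y, n, replace=False, proportional=False, seed=None):
    """Return the sorted indices of the drawn samples (hypothetical helper)."""
    rng = random.Random(seed)  # the same seed gives the same draw

    # Group sample indices by their category label.
    by_category = defaultdict(list)
    for index, label in enumerate(y):
        by_category[label].append(index)
    categories = sorted(by_category)

    # A single value N applies to every category; a list gives Ni for category i.
    counts = [n] * len(categories) if isinstance(n, (int, float)) else list(n)
    if len(counts) != len(categories):
        raise ValueError("need exactly one value per category")

    drawn = []
    for label, count in zip(categories, counts):
        pool = by_category[label]
        if proportional:
            # Assumption: the value is a percentage of the category size.
            count = round(len(pool) * count / 100)
        count = int(count)
        if replace:
            drawn += rng.choices(pool, k=count)  # duplicates are possible
        else:
            # Assumption: never draw more than the category holds.
            drawn += rng.sample(pool, min(count, len(pool)))
    return sorted(drawn)
```

For example, with y = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2], calling draw_indices(y, [1, 3], seed=0) draws one sample from category 1 and three from category 2, while draw_indices(y, 50, proportional=True) draws half of each category.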

Outputs

Output dataset [fileDestination]
Pickle file destination. Stores sampled data.
Output dataset complement [fileDestination]
Pickle file destination. Stores remaining data that was not sampled.
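
To illustrate how the two outputs relate, here is a small sketch that splits a dataset into the sampled part and its complement and writes both as pickle files. It assumes the dataset is a dict with keys "X" and "y"; the actual layout of the EnMAP-Box pickle file may differ. The helper names and the file names are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def split_dataset(dataset, drawn):
    """Split a dataset into sampled part and complement (hypothetical helper)."""
    drawn_set = set(drawn)
    # The complement holds every sample that was never drawn.
    rest = [i for i in range(len(dataset["y"])) if i not in drawn_set]

    def pick(indices):
        return {
            "X": [dataset["X"][i] for i in indices],
            "y": [dataset["y"][i] for i in indices],
        }

    return pick(drawn), pick(rest)


def write_pickle(obj, path):
    """Serialise obj to path with the standard pickle module."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


# Toy data: four samples, two categories (assumed dict layout).
dataset = {"X": [[0.1], [0.2], [0.3], [0.4]], "y": [1, 1, 2, 2]}
sample, complement = split_dataset(dataset, drawn=[0, 3])

with tempfile.TemporaryDirectory() as folder:
    write_pickle(sample, Path(folder) / "sample.pkl")
    write_pickle(complement, Path(folder) / "complement.pkl")
```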

Command-line usage

>qgis_process help enmapbox:RandomSamplesFromClassificationDataset

----------------
Arguments
----------------

dataset: Classification dataset
    Argument type:  file
    Acceptable values:
            - Path to a file
n: Number of samples per category
    Argument type:  string
    Acceptable values:
            - String value
replace: Draw with replacement
    Default value:  false
    Argument type:  boolean
    Acceptable values:
            - 1 for true/yes
            - 0 for false/no
proportional: Draw proportional
    Default value:  false
    Argument type:  boolean
    Acceptable values:
            - 1 for true/yes
            - 0 for false/no
seed: Random seed (optional)
    Argument type:  number
    Acceptable values:
            - A numeric value
outputDatasetRandomSample: Output dataset
    Argument type:  fileDestination
    Acceptable values:
            - Path for new file
outputDatasetRandomSampleComplement: Output dataset complement (optional)
    Argument type:  fileDestination
    Acceptable values:
            - Path for new file

----------------
Outputs
----------------

outputDatasetRandomSample: <outputFile>
    Output dataset
outputDatasetRandomSampleComplement: <outputFile>
    Output dataset complement
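
The arguments listed above can be assembled into a qgis_process run call from Python. This is a sketch under assumptions: it uses the `--NAME=value` argument form accepted by recent QGIS versions (older versions expect `NAME=value` after a bare `--`), the helper build_command is hypothetical, and the file names are placeholders.

```python
import shlex

ALGORITHM = "enmapbox:RandomSamplesFromClassificationDataset"


def build_command(dataset, n, sample_path, complement_path=None,
                  replace=False, proportional=False, seed=None):
    """Build the argument list for qgis_process (hypothetical helper)."""
    args = [
        "qgis_process", "run", ALGORITHM,
        f"--dataset={dataset}",
        f"--n={n}",                              # string, e.g. "100" or "10, 20, 30"
        f"--replace={int(replace)}",             # 1 for true, 0 for false
        f"--proportional={int(proportional)}",
        f"--outputDatasetRandomSample={sample_path}",
    ]
    # Optional arguments are added only when set.
    if seed is not None:
        args.append(f"--seed={seed}")
    if complement_path is not None:
        args.append(f"--outputDatasetRandomSampleComplement={complement_path}")
    return args


command = build_command("dataset.pkl", "100", "sample.pkl",
                        complement_path="complement.pkl", seed=42)
print(shlex.join(command))
```

On a machine where QGIS and the EnMAP-Box plugin are installed, the list can be passed to subprocess.run(command, check=True) to execute the algorithm.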