Random samples from regression dataset

Split a dataset by randomly drawing samples.

Usage:

  1. Open the algorithm from the processing toolbox.

  2. Select or create a regression dataset, specify the number of stratification bins and the number of samples per bin, then click Run.

    [Figure: random_samples_reg.png]

Parameters

Regression dataset [file]

Regression dataset pickle file with feature data X and target data y to draw from.

Number of stratification bins [number]

Number of bins used to stratify the target range.

Default: 1
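
The stratification can be pictured with the following sketch. It assumes equal-width bins spanning the range of the target values y; this page does not state how the bin edges are chosen, so treat that as an assumption. The function name assign_bins is hypothetical and not part of the algorithm.

```python
import numpy as np

def assign_bins(y, bins):
    """Return a bin index in 0..bins-1 for each target value (assumed equal-width bins)."""
    y = np.asarray(y, dtype=float)
    # bins + 1 equally spaced edges from the smallest to the largest target value
    edges = np.linspace(y.min(), y.max(), bins + 1)
    # comparing against the inner edges only keeps the maximum value in the last bin
    return np.digitize(y, edges[1:-1])

y = [0.0, 1.0, 2.5, 4.9, 5.0, 9.9, 10.0]
bin_index = assign_bins(y, 2)  # two bins: [0, 5) and [5, 10]
```

With two bins, the first four values fall into bin 0 and the last three into bin 1.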

Number of samples per bin [string]

Number of samples to draw from each bin. Set a single value N to draw N samples for each bin. Set a list of values N1, N2, … Ni, … to draw Ni samples for bin i.
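
As an illustration of the two accepted forms, the sketch below expands the string into one value per bin. The helper samples_per_bin is hypothetical, and treating the values as integers and rejecting a list of the wrong length are assumptions, not documented behaviour.

```python
def samples_per_bin(n, bins):
    """Expand 'N' or 'N1, N2, ...' into a list with one entry per bin."""
    values = [int(v) for v in n.split(",") if v.strip()]
    if len(values) == 1:
        # a single value N applies to every bin
        values = values * bins
    if len(values) != bins:
        raise ValueError(f"expected 1 or {bins} values, got {len(values)}")
    return values

same_for_all = samples_per_bin("10", 3)       # N = 10 for each of 3 bins
one_per_bin = samples_per_bin("5, 10, 20", 3)  # Ni given per bin
```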

Draw with replacement [boolean]

Whether to draw samples with replacement.

Default: False
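
The difference between the two modes can be sketched with NumPy. This is only an illustration: the page does not say which random generator the algorithm uses, so the use of numpy.random.default_rng is an assumption.

```python
import numpy as np

rng = np.random.default_rng(42)
indices = np.arange(5)  # positions of five samples that fall into one bin

# without replacement: each sample can be drawn at most once
without = rng.choice(indices, size=5, replace=False)

# with replacement: samples may repeat, so more than five can be drawn
with_repl = rng.choice(indices, size=8, replace=True)
```

In NumPy, asking for more samples than the bin holds without replacement raises a ValueError; how the algorithm itself handles that case is not stated here.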

Draw proportional [boolean]

Whether to interpret the number of samples N or Ni as a percentage of the samples in each bin.

Default: False
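
A minimal sketch of the two interpretations follows. The helper resolve_count is hypothetical, and rounding the percentage to the nearest integer is an assumption; the page does not describe the rounding rule.

```python
def resolve_count(n, bin_size, proportional):
    """Return the number of samples to draw from a bin holding bin_size samples."""
    if proportional:
        # n is read as a percentage of the bin size (rounding is assumed)
        return round(bin_size * n / 100)
    # n is read as an absolute count
    return n

absolute = resolve_count(25, 200, proportional=False)  # 25 samples
relative = resolve_count(25, 200, proportional=True)   # 25 % of 200 samples
```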

Random seed [number]

Optional seed for the random number generator, used to make the draw reproducible.
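
The effect of a fixed seed can be shown with NumPy's generator. Whether the algorithm uses this generator internally is not stated, so the sketch only demonstrates the general principle; the function draw is hypothetical.

```python
import numpy as np

def draw(seed):
    """Draw five distinct indices out of 100 using the given seed."""
    rng = np.random.default_rng(seed)
    return rng.choice(100, size=5, replace=False).tolist()

first = draw(7)
second = draw(7)   # same seed, same selection
unseeded = draw(None)  # no seed: the selection differs between runs
```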

Outputs

Output dataset [fileDestination]

Pickle file destination. Stores the sampled data.

Output dataset complement [fileDestination]

Pickle file destination. Stores the remaining data that was not sampled.
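
The relation between the two outputs can be sketched as a partition of the input. The arrays X and y below are made-up example data, and the sketch assumes drawing without replacement; it does not reproduce the layout of the pickle files.

```python
import numpy as np

X = np.arange(20).reshape(10, 2)  # example feature data, 10 samples
y = np.arange(10, dtype=float)    # example target data

rng = np.random.default_rng(0)
sampled = rng.choice(len(y), size=4, replace=False)

# boolean mask marking the drawn samples
mask = np.zeros(len(y), dtype=bool)
mask[sampled] = True

X_sample, y_sample = X[mask], y[mask]   # would go to the output dataset
X_rest, y_rest = X[~mask], y[~mask]     # would go to the output dataset complement
```

Every input sample ends up in exactly one of the two parts.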

Command-line usage

>qgis_process help enmapbox:RandomSamplesFromRegressionDataset

----------------
Arguments
----------------

dataset: Regression dataset
    Argument type:  file
    Acceptable values:
            - Path to a file
bins: Number of stratification bins
    Default value:  1
    Argument type:  number
    Acceptable values:
            - A numeric value
            - field:FIELD_NAME to use a data defined value taken from the FIELD_NAME field
            - expression:SOME EXPRESSION to use a data defined value calculated using a custom QGIS expression
n: Number of samples per bin
    Argument type:  string
    Acceptable values:
            - String value
            - field:FIELD_NAME to use a data defined value taken from the FIELD_NAME field
            - expression:SOME EXPRESSION to use a data defined value calculated using a custom QGIS expression
replace: Draw with replacement
    Default value:  false
    Argument type:  boolean
    Acceptable values:
            - 1 for true/yes
            - 0 for false/no
            - field:FIELD_NAME to use a data defined value taken from the FIELD_NAME field
            - expression:SOME EXPRESSION to use a data defined value calculated using a custom QGIS expression
proportional: Draw proportional
    Default value:  false
    Argument type:  boolean
    Acceptable values:
            - 1 for true/yes
            - 0 for false/no
            - field:FIELD_NAME to use a data defined value taken from the FIELD_NAME field
            - expression:SOME EXPRESSION to use a data defined value calculated using a custom QGIS expression
seed: Random seed (optional)
    Argument type:  number
    Acceptable values:
            - A numeric value
            - field:FIELD_NAME to use a data defined value taken from the FIELD_NAME field
            - expression:SOME EXPRESSION to use a data defined value calculated using a custom QGIS expression
outputDatasetRandomSample: Output dataset
    Argument type:  fileDestination
    Acceptable values:
            - Path for new file
outputDatasetRandomSampleComplement: Output dataset complement (optional)
    Argument type:  fileDestination
    Acceptable values:
            - Path for new file

----------------
Outputs
----------------

outputDatasetRandomSample: <outputFile>
    Output dataset
outputDatasetRandomSampleComplement: <outputFile>
    Output dataset complement
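
The arguments listed above can be passed to qgis_process run as NAME=VALUE pairs after a "--" separator. The following sketch builds such a command from Python without executing it. The helper build_command and the file names dataset.pkl, sample.pkl and rest.pkl are hypothetical; the argument names are taken from the help output above.

```python
def build_command(dataset, output, n, bins=1, replace=False,
                  proportional=False, seed=None, complement=None):
    """Build the argument list for a qgis_process run call."""
    args = [
        "qgis_process", "run",
        "enmapbox:RandomSamplesFromRegressionDataset", "--",
        f"dataset={dataset}",
        f"bins={bins}",
        f"n={n}",
        f"replace={int(replace)}",            # 1 for true, 0 for false
        f"proportional={int(proportional)}",  # 1 for true, 0 for false
        f"outputDatasetRandomSample={output}",
    ]
    # optional arguments are only added when given
    if seed is not None:
        args.append(f"seed={seed}")
    if complement is not None:
        args.append(f"outputDatasetRandomSampleComplement={complement}")
    return args

cmd = build_command("dataset.pkl", "sample.pkl", n="10, 20", bins=2,
                    seed=42, complement="rest.pkl")
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with values such as "10, 20".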