Fit StandardScaler

Standardize features by removing the mean and scaling to unit variance. The standard score of a sample x is calculated as: z = (x - u) / s where u is the mean of the training samples or zero if with_mean=False, and s is the standard deviation of the training samples or one if with_std=False. Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Mean and standard deviation are then stored to be used on later data using transform. Standardization of a dataset is a common requirement for many machine learning estimators: they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with 0 mean and unit variance). For instance many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the L1 and L2 regularizers of linear models) assume that all features are centered around 0 and have variance in the same order. If a feature has a variance that is orders of magnitude larger that others, it might dominate the objective function and make the estimator unable to learn from other features correctly as expected.

Usage:

  1. Start the algorithm from the Processing Toolbox panel.

  2. Select a raster layer to process and click run.

    ../../../../_images/stdscalar.png

Parameters

Transformer [string]

Scikit-learn python code. See StandardScaler for information on different parameters.

Default:

from sklearn.preprocessing import StandardScaler

transformer = StandardScaler()
Raster layer with features [raster]

Raster layer with feature data X used for fitting the transformer. Mutually exclusive with parameter: Training dataset

Sample size [number]

Approximate number of samples drawn from raster. If 0, whole raster will be used. Note that this is only a hint for limiting the number of rows and columns.

Default: 1000

Training dataset [file]

Training dataset pickle file used for fitting the transformer. Mutually exclusive with parameter: Raster layer with features

Outputs

Output transformer [fileDestination]

Pickle file destination.

Command-line usage

>qgis_process help enmapbox:FitStandardscaler:

----------------
Arguments
----------------

transformer: Transformer
    Default value:  from sklearn.preprocessing import StandardScaler

transformer = StandardScaler()
    Argument type:  string
    Acceptable values:
            - String value
            - field:FIELD_NAME to use a data defined value taken from the FIELD_NAME field
            - expression:SOME EXPRESSION to use a data defined value calculated using a custom QGIS expression
featureRaster: Raster layer with features (optional)
    Argument type:  raster
    Acceptable values:
            - Path to a raster layer
sampleSize: Sample size (optional)
    Default value:  1000
    Argument type:  number
    Acceptable values:
            - A numeric value
            - field:FIELD_NAME to use a data defined value taken from the FIELD_NAME field
            - expression:SOME EXPRESSION to use a data defined value calculated using a custom QGIS expression
dataset: Training dataset (optional)
    Argument type:  file
    Acceptable values:
            - Path to a file
outputTransformer: Output transformer
    Argument type:  fileDestination
    Acceptable values:
            - Path for new file

----------------
Outputs
----------------

outputTransformer: <outputFile>
    Output transformer