Hierarchical feature clustering

Evaluate feature multicollinearity by performing hierarchical (agglomerative) clustering with Ward linkage, using squared Spearman rank-order correlation as the distance between features. The resulting report includes i) the pairwise squared Spearman rank-order correlation matrix, ii) the clustering dendrogram, iii) the inter-cluster correlation distribution, iv) the intra-cluster correlation distribution, and v) a clustering hierarchy table listing the selected cluster representatives for each cluster size n. For further analysis, all relevant results are also stored in a JSON sidecar file next to the report.
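The clustering step described above can be approximated outside the toolbox with SciPy. The following is a minimal sketch, not the algorithm's actual implementation: the helper name cluster_features is hypothetical, the conversion of squared correlation into a distance as 1 - r^2 is an assumption, and the toy data only illustrates that monotonically related features end up in the same cluster.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, ward
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr


def cluster_features(X, n_clusters):
    """Group the columns (features) of X into at most n_clusters clusters.

    Hypothetical helper for illustration; not part of the EnMAP-Box API.
    """
    # Pairwise Spearman correlation between columns. X needs at least
    # three features, otherwise spearmanr returns a scalar, not a matrix.
    corr, _ = spearmanr(X)
    corr2 = np.asarray(corr) ** 2

    # Assumption: convert the squared correlation into a distance as 1 - r^2.
    dist = np.clip(1.0 - corr2, 0.0, None)
    dist = (dist + dist.T) / 2.0  # enforce exact symmetry
    np.fill_diagonal(dist, 0.0)

    # Ward linkage operates on the condensed (upper-triangle) distance vector.
    linkage = ward(squareform(dist, checks=False))

    # Cut the hierarchy so that at most n_clusters clusters remain.
    labels = fcluster(linkage, t=n_clusters, criterion="maxclust")
    return corr2, linkage, labels


# Toy data: features 0/1 and 2/3 are monotonically related pairs.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, np.exp(a), b, b ** 3])

corr2, linkage, labels = cluster_features(X, n_clusters=2)
print(labels)
```

Because Spearman correlation is rank-based, exp(a) and b ** 3 are treated as perfectly correlated with a and b, so each pair forms one cluster. The linkage matrix can be passed to scipy.cluster.hierarchy.dendrogram to draw a dendrogram comparable to the one in the report.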

Usage:

  1. Open the algorithm from the processing toolbox.

  2. Select a dataset, then click Run.

  3. The output report will automatically open in your web browser.

Parameters

Dataset [file]

Dataset pickle file with feature data X to be evaluated.

Do not report plots [boolean]

Skip the creation of plots, which can take a long time for large feature sets.

Default: False

Open output report in webbrowser after running algorithm [boolean]

Whether to open the output report in the web browser.

Default: True

Outputs

Output report [fileDestination]

Report file destination.

Command-line usage

>qgis_process help enmapbox:HierarchicalFeatureClustering

----------------
Arguments
----------------

dataset: Dataset
    Argument type:  file
    Acceptable values:
            - Path to a file
noPlot: Do not report plots
    Default value:  false
    Argument type:  boolean
    Acceptable values:
            - 1 for true/yes
            - 0 for false/no
            - field:FIELD_NAME to use a data defined value taken from the FIELD_NAME field
            - expression:SOME EXPRESSION to use a data defined value calculated using a custom QGIS expression
openReport: Open output report in webbrowser after running algorithm
    Default value:  true
    Argument type:  boolean
    Acceptable values:
            - 1 for true/yes
            - 0 for false/no
            - field:FIELD_NAME to use a data defined value taken from the FIELD_NAME field
            - expression:SOME EXPRESSION to use a data defined value calculated using a custom QGIS expression
outputHierarchicalFeatureClustering: Output report
    Argument type:  fileDestination
    Acceptable values:
            - Path for new file

----------------
Outputs
----------------

outputHierarchicalFeatureClustering: <outputHtml>
    Output report
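The arguments listed above can be assembled into a qgis_process run call from a script. The following is a minimal sketch: the helper name build_command and the file names dataset.pkl and report.html are hypothetical, and it assumes qgis_process is on the PATH with the EnMAP-Box plugin enabled. The parameter names are taken from the help output above.

```python
import shlex

ALGORITHM_ID = "enmapbox:HierarchicalFeatureClustering"


def build_command(dataset, report, no_plot=False, open_report=True):
    """Return the qgis_process argument list for one run (hypothetical helper)."""
    return [
        "qgis_process", "run", ALGORITHM_ID, "--",
        f"dataset={dataset}",
        # Boolean arguments are passed as 1 (true) or 0 (false).
        f"noPlot={int(no_plot)}",
        f"openReport={int(open_report)}",
        f"outputHierarchicalFeatureClustering={report}",
    ]


# Placeholder paths; replace them with real files before running.
command = build_command("dataset.pkl", "report.html", open_report=False)
print(shlex.join(command))
```

To execute the command, pass the list to subprocess.run(command, check=True). Setting openReport to 0 is useful for batch runs where no browser should be opened.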