K-means clustering is performed using add_clusters.

[Experimental]

Usage

cluster_tab(
  data,
  cols,
  newcol = NULL,
  k = NULL,
  method = "kmeans",
  labels = TRUE,
  clean = TRUE,
  ...
)

Arguments

data

A tibble.

cols

A tidy selection of item columns or a single column with cluster values as a factor. If the column already contains a cluster result from add_clusters, it is used, and other parameters are ignored. If no cluster result exists, it is calculated with add_clusters.

newcol

Name of the new cluster column as a character vector. Set to NULL (default) to automatically build a name from the common column prefix, prefixed with "cls_".

k

Number of clusters to calculate. Set to NULL to output a scree plot for up to 10 clusters and automatically choose the number of clusters based on the elbow criterion. The within-sums of squares for the scree plot are calculated by stats::kmeans.

method

The clustering method as a character value. Currently, only kmeans is supported. All items are scaled with base::scale before the cluster analysis is performed.

labels

If TRUE (default), extracts labels from the attributes; see codebook.

clean

If TRUE (default), prepares the data with data_clean.

...

Placeholder to allow calling the method with unused parameters from tab_metrics.
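
The scaling and scree-plot steps described for k and method can be sketched with base R alone. This is a minimal illustration, not the package's own code: it uses the built-in iris measurements as a stand-in for item columns, and the names items, scaled, and wss are hypothetical. How cluster_tab picks the elbow from these values is not specified here, so the sketch only computes and plots them.

```r
# Stand-in item columns (assumption: any numeric columns would do).
items <- iris[, 1:4]

# Standardise each column to mean 0 and sd 1, as base::scale does by default.
scaled <- base::scale(items)

set.seed(1)  # kmeans starts from random centers, so fix the seed

# Total within-cluster sum of squares for k = 1..10, taken from stats::kmeans.
wss <- vapply(
  1:10,
  function(k) {
    stats::kmeans(scaled, centers = k, nstart = 10, iter.max = 50)$tot.withinss
  },
  numeric(1)
)

# Scree plot: the "elbow" is where adding clusters stops reducing wss much.
plot(1:10, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Within-cluster sum of squares")
```

With a single cluster, the within-cluster sum of squares equals the total sum of squares of the scaled data, so wss[1] is the reference point against which the drop for larger k is judged.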

Value

A volker list with three volker tabs: cluster centers, cluster counts, and clustering diagnostics.
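
To make the diagnostics tab easier to read, the following sketch shows how the two sums of squares relate when the items are standardised first. It is an illustration built on stats::kmeans and the built-in iris data, under the assumption that the diagnostics correspond to the tot.withinss and betweenss components of a kmeans fit; the variable names are hypothetical.

```r
# Stand-in item columns, standardised as described for the method argument.
items  <- iris[, 1:4]
scaled <- base::scale(items)

set.seed(1)
fit <- stats::kmeans(scaled, centers = 2, nstart = 10)

within_ss  <- fit$tot.withinss  # spread of cases around their own cluster center
between_ss <- fit$betweenss     # spread of cluster centers around the grand mean

# The two parts always add up to the total sum of squares.
all.equal(within_ss + between_ss, fit$totss)

# For standardised data the total is (number of items) * (number of cases - 1).
all.equal(fit$totss, ncol(scaled) * (nrow(scaled) - 1))

# Cluster sizes and shares, analogous to a cluster count table.
sizes  <- table(fit$cluster)
shares <- prop.table(sizes)
```

The example output on this page is consistent with this identity: 910.04 + 241.96 = 1152, which equals 12 items times 96 (97 complete cases minus one). A larger between-cluster share of that total indicates better separated clusters.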

Examples

library(volker)
data <- volker::chatgpt

cluster_tab(data, starts_with("cg_adoption"), k = 2)
#> 
#> 
#> |Expectations                                                |     total| Cluster 1| Cluster 2|
#> |:-----------------------------------------------------------|---------:|---------:|---------:|
#> |ChatGPT has clear advantages compared to similar offerings. | 3.4 (1.0)| 3.8 (0.9)| 3.1 (1.0)|
#> |Using ChatGPT brings financial benefits.                    | 2.7 (1.2)| 3.4 (1.0)| 2.0 (0.9)|
#> |Using ChatGPT is advantageous in many tasks.                | 3.6 (1.1)| 4.0 (0.7)| 3.1 (1.2)|
#> |Compared to other systems, using ChatGPT is more fun.       | 3.5 (1.0)| 3.9 (0.8)| 3.2 (1.0)|
#> |Much can go wrong when using ChatGPT.                       | 3.1 (1.1)| 3.0 (1.1)| 3.2 (1.1)|
#> |There are legal issues with using ChatGPT.                  | 3.1 (1.2)| 3.2 (1.1)| 3.0 (1.2)|
#> |The security of user data is not guaranteed with ChatGPT.   | 3.2 (1.0)| 3.4 (1.0)| 3.0 (1.1)|
#> |Using ChatGPT could bring personal disadvantages.           | 2.7 (1.1)| 2.9 (1.2)| 2.6 (1.0)|
#> |In my environment, using ChatGPT is standard.               | 2.5 (1.1)| 3.4 (0.9)| 1.7 (0.6)|
#> |Almost everyone in my environment uses ChatGPT.             | 2.4 (1.2)| 3.3 (0.9)| 1.6 (0.6)|
#> |Not using ChatGPT is considered being an outsider.          | 2.0 (1.2)| 2.6 (1.3)| 1.4 (0.6)|
#> |Using ChatGPT brings me recognition from my environment.    | 2.3 (1.2)| 3.0 (1.3)| 1.7 (0.8)|
#> |n                                                           |        97|        47|        50|
#> 
#> 4 missing case(s) omitted.
#> 
#> 
#> 
#> |Cluster   |  n|    p|
#> |:---------|--:|----:|
#> |Cluster 1 | 47|  48%|
#> |Cluster 2 | 50|  52%|
#> |total     | 97| 100%|
#> 
#> n=97.
#> 
#> 
#> 
#> |Statistic                      |  Value|
#> |:------------------------------|------:|
#> |Within-Cluster Sum of Squares  | 910.04|
#> |Between-Cluster Sum of Squares | 241.96|