Title: | Cluster Analysis with Trimming |
Version: | 0.2-0 |
VersionNote: | Released 0.1-6 on 2025-06-28 on CRAN |
Depends: | R (≥ 1.9.0) |
Imports: | tclust |
Suggests: | fpc |
Description: | Trimmed k-means clustering. The method is described in Cuesta-Albertos et al. (1997) <doi:10.1214/aos/1031833664>. |
Maintainer: | Valentin Todorov <valentin@todorov.at> |
License: | GPL (≥ 3) |
URL: | https://github.com/valentint/trimcluster |
BugReports: | https://github.com/valentint/trimcluster/issues |
Packaged: | 2025-07-16 20:55:24 UTC; valen |
Repository: | CRAN |
Date/Publication: | 2025-07-17 08:40:01 UTC |
NeedsCompilation: | no |
Author: | Christian Hennig [aut],
Valentin Todorov |
Trimmed k-means clustering
Description
The trimmed k-means clustering method by Cuesta-Albertos, Gordaliza and Matran (1997). It optimizes the k-means criterion over the points that remain after a given proportion of the points has been trimmed.
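The idea can be illustrated with a minimal base-R sketch of a single concentration step: assign each point to its nearest center, discard the points farthest from their center, and recompute the centers from the rest. The helper name trimmed_kmeans_step, the rule for how many points to keep, and the toy data are assumptions made for illustration; this is not the code used by the package.

```r
# Illustrative sketch only: one concentration step of trimmed k-means.
# trimmed_kmeans_step() is a hypothetical helper, not part of trimcluster.
trimmed_kmeans_step <- function(X, centers, trim = 0.1) {
  X <- as.matrix(X)
  n <- nrow(X)
  k <- nrow(centers)
  # Squared Euclidean distance from every point to every center (n x k).
  d2 <- sapply(seq_len(k), function(j) rowSums(sweep(X, 2, centers[j, ])^2))
  d2 <- matrix(d2, nrow = n)
  nearest <- max.col(-d2, ties.method = "first")
  dmin <- d2[cbind(seq_len(n), nearest)]
  # Assumed rule: keep the n - floor(n * trim) points closest to a center.
  n_keep <- n - floor(n * trim)
  keep <- rank(dmin, ties.method = "first") <= n_keep
  # Recompute each center from its retained points only.
  new_centers <- centers
  for (j in seq_len(k)) {
    idx <- keep & nearest == j
    if (any(idx)) new_centers[j, ] <- colMeans(X[idx, , drop = FALSE])
  }
  list(centers = new_centers, nearest = nearest, trimmed = !keep,
       criterion = sum(dmin[keep]))
}

set.seed(1)
X <- rbind(matrix(rnorm(40, mean = -5), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2),
           c(50, 50), c(-60, 40))        # two gross outliers (rows 41, 42)
centers <- X[c(1, 21), , drop = FALSE]   # one starting center per group
for (it in 1:10) {
  step <- trimmed_kmeans_step(X, centers, trim = 0.1)
  centers <- step$centers
}
which(step$trimmed)   # indices of the trimmed points
```

With 42 points and trim = 0.1, four points are discarded in each step, and the two gross outliers are among them because they lie far from both centers.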
Usage
trimkmeans(data,k,trim=0.1, scaling=FALSE,
runs=500, niter1=3, niter2=20, nkeep=5, points=NULL,
countmode, printcrit, maxit,
parallel=FALSE, n.cores=-1, trace=0, ...)
## S3 method for class 'tkm'
print(x, ...)
## S3 method for class 'tkm'
plot(x, data, ...)
Arguments
data |
matrix or data.frame with raw data |
k |
integer. Number of clusters. |
trim |
numeric between 0 and 1. Proportion of points to be trimmed. |
scaling |
logical. If |
runs |
The number of random initializations to be performed. |
niter1 |
The number of concentration steps to be performed for the nstart initializations. |
niter2 |
The maximum number of concentration steps to be performed for the
|
nkeep |
The number of iterated initializations (after niter1 concentration steps) with the best values in the target function that are kept for further iterations |
points |
|
countmode |
(deprecated) optional positive integer. Every |
printcrit |
(deprecated) logical. If |
maxit |
(deprecated, use the combination |
parallel |
A logical value, specifying whether the nstart initializations should be done in parallel. |
n.cores |
The number of cores to use when parallelizing; only taken into account if parallel=TRUE. |
trace |
Defines the tracing level, which is set to 0 by default. Tracing level 1 gives additional information on the stage of the iterative process. |
x |
object of class |
... |
further arguments to be transferred to |
Details
The function trimkmeans() now calls the function tkmeans() from the package tclust. This makes the procedure much faster since (a) tkmeans() is implemented in C++, (b) a new random initialization is introduced (see the parameters niter1, niter2 and nkeep, which replace the previous maxit), and (c) it is possible to run the initializations in parallel (see the arguments parallel and n.cores).
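As a hedged sketch of how the initialization arguments fit together, the call below spells out runs, niter1, niter2 and nkeep explicitly, using only argument names from the Usage section. The data and the specific values are illustrative assumptions rather than recommended settings, and the call is skipped when trimcluster is not installed.

```r
# Illustrative sketch: explicit control of the initialization scheme.
# The toy data X and all parameter values are assumptions for illustration.
set.seed(42)
X <- rbind(matrix(rnorm(100, mean = -4), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2),
           matrix(rnorm(20, sd = 10), ncol = 2))   # scattered noise points

if (requireNamespace("trimcluster", quietly = TRUE)) {
  fit <- trimcluster::trimkmeans(
    X, k = 2, trim = 0.1,
    runs = 20,        # random initializations
    niter1 = 3,       # concentration steps applied to every initialization
    nkeep = 5,        # best initializations kept after niter1 steps
    niter2 = 20,      # maximum further concentration steps
    parallel = FALSE  # set to TRUE (with n.cores) to run initializations in parallel
  )
  print(fit)
}
```

Keeping parallel=FALSE makes the sketch run on any machine; whether parallel=TRUE pays off depends on the data size and the number of runs.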
plot.tkm calls plotcluster if the dimensionality of the data p is 1, shows a scatterplot with non-trimmed regions if p=2 and discriminant coordinates computed from the clusters (ignoring the trimmed points) if p>2.
Value
An object of class 'tkm', which is a list with components
classification |
integer vector coding cluster membership with trimmed
observations coded as |
means |
numerical matrix giving the mean vectors of the k classes. |
disttom |
vector of squared Euclidean distances of all points to the closest mean. |
ropt |
maximum value of |
k |
see above. |
trim |
see above. |
runs |
see above. |
scaling |
see above. |
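The components listed above can be inspected directly on the returned object. The following is a minimal sketch on synthetic data, assuming only the component names documented here; it is skipped when trimcluster is not installed.

```r
# Illustrative sketch: inspecting the components of a 'tkm' object.
# The toy data X is an assumption made for illustration.
set.seed(7)
X <- rbind(matrix(rnorm(60, mean = -3), ncol = 2),
           matrix(rnorm(60, mean = 3), ncol = 2))

if (requireNamespace("trimcluster", quietly = TRUE)) {
  fit <- trimcluster::trimkmeans(X, k = 2, trim = 0.1, runs = 10)
  print(table(fit$classification))  # cluster sizes, including the trimmed code
  print(fit$means)                  # mean vectors of the k clusters
  print(summary(fit$disttom))       # squared distances to the closest mean
}
```

Tabulating classification is a quick way to confirm how many observations were trimmed without relying on the specific code used for them.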
Author(s)
Christian Hennig chrish@stats.ucl.ac.uk http://www.homepages.ucl.ac.uk/~ucakche/
References
Cuesta-Albertos, J. A., Gordaliza, A., and Matran, C. (1997) Trimmed k-Means: An Attempt to Robustify Quantizers, Annals of Statistics, 25, 553-576.
See Also
Examples
set.seed(10001)
n1 <- 60   # size of cluster 1
n2 <- 60   # size of cluster 2
n3 <- 70   # size of cluster 3
n0 <- 10   # number of noise points
nn <- n1 + n2 + n3 + n0
pp <- 2
X <- matrix(0, nrow = nn, ncol = pp)
ii <- 0
for (i in 1:n1) {
  ii <- ii + 1
  X[ii, ] <- c(5, -5) + rnorm(2)
}
for (i in 1:n2) {
  ii <- ii + 1
  X[ii, ] <- c(5, 5) + rnorm(2) * 0.75
}
for (i in 1:n3) {
  ii <- ii + 1
  X[ii, ] <- c(-5, -5) + rnorm(2) * 0.75
}
for (i in 1:n0) {
  ii <- ii + 1
  X[ii, ] <- rnorm(2) * 8
}
tkm1 <- trimkmeans(X, k = 3, trim = 0.1, runs = 5)
## runs=5 is used to save computing time; runs must be >= nkeep
print(tkm1)
plot(tkm1, X)