PuSH - Publication Server of Helmholtz Zentrum München

Behzadi, S.* ; Müller, N.S. ; Plant, C.* ; Böhm, C.*

Clustering of mixed-type data considering concept hierarchies: Problem specification and algorithm.

Int. J. Data Sci. Anal. 10, 233–248 (2020)
Open Access Gold (Paid Option)
Creative Commons license
Most clustering algorithms have been designed only for purely numerical or purely categorical data sets, while many applications nowadays generate mixed data. This raises the question of how to integrate various types of attributes so that objects can be grouped efficiently without loss of information. It is already well understood that simply converting categorical attributes into a numerical domain is not sufficient, since it artificially introduces relationships between values, such as a certain order. Leveraging the natural conceptual hierarchy among categorical information, concept trees summarize the categorical attributes. In this paper, we introduce the algorithm ClicoT (CLustering mixed-type data Including COncept Trees), as reported by Behzadi et al. (Advances in Knowledge Discovery and Data Mining, Springer, Cham, 2019), which is based on the minimum description length (MDL) principle. Exploiting the conceptual hierarchies, ClicoT integrates categorical and numerical attributes by means of an MDL-based objective function. The result of ClicoT is readily interpretable since concept trees provide insights into categorical data. Extensive experiments on synthetic and real data sets illustrate that ClicoT is noise-robust and yields well-interpretable results in a short runtime. Moreover, we investigate the impact of concept hierarchies as well as various data characteristics in this paper.
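The point about artificially introduced order can be illustrated with a minimal sketch. This is not ClicoT's MDL-based objective function; it only contrasts naive integer coding with a distance measured along a concept tree. The hierarchy, the attribute values, and the helper names `PARENT`, `path_to_root`, and `tree_distance` are hypothetical and chosen purely for illustration.

```python
# Hypothetical concept hierarchy for a categorical "color" attribute,
# stored as child -> parent links. Not taken from the paper.
PARENT = {
    "crimson": "red", "scarlet": "red",
    "navy": "blue", "sky": "blue",
    "red": "color", "blue": "color",
}

def path_to_root(value):
    """Return the list of nodes from `value` up to the root of the tree."""
    path = [value]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def tree_distance(a, b):
    """Number of edges between a and b via their lowest common ancestor."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    depth_in_b = {node: i for i, node in enumerate(path_b)}
    for i, node in enumerate(path_a):
        if node in depth_in_b:
            return i + depth_in_b[node]
    raise ValueError("values do not share a common ancestor")

# Naive conversion: assign integer codes in alphabetical order.
leaves = sorted(["crimson", "scarlet", "navy", "sky"])
codes = {value: i for i, value in enumerate(leaves)}

# Integer codes place "crimson" closer to "navy" than to "scarlet",
# an order that exists only because of the encoding.
naive_navy = abs(codes["crimson"] - codes["navy"])
naive_scarlet = abs(codes["crimson"] - codes["scarlet"])

# The concept tree keeps the two shades of red closer together.
tree_navy = tree_distance("crimson", "navy")
tree_scarlet = tree_distance("crimson", "scarlet")
```

Under these assumptions the integer codes rank the navy pair as more similar than the scarlet pair, while the tree distance ranks them the other way round, which matches the intuition that siblings under the same concept belong together.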
Publication type Article: Journal article
Document type Review
Keywords Information-theoretic Clustering ; Mixed-type Data
ISSN (print) / ISBN 2364-415X
e-ISSN 2364-4168
Source Volume: 10, Pages: 233–248
Publisher Springer
Place of publication Cham (ZG)