PuSH - Publication Server of Helmholtz Zentrum München

Metric factorization for exploratory analysis of complex data.

In: Proceedings (IEEE International Conference on Data Mining (ICDM), 14 - 17 December 2014, Shenzhen, China). 2014. 510-519
as soon as is submitted to ZB.
How to explore complex data? Often, several representations for each data object are available, the data are described by attributes of heterogeneous data type and/or each data object is characterized by many features. It is difficult to choose a suitable similarity measure and an appropriate data mining technique to get an unbiased overview on the information contained in complex data. In this paper, we introduce Metric Factorization as a novel data mining task. The goal of Metric Factorization is to discover the major alternative views of complex data. Our novel algorithm MF extends matrix factorization techniques to support metric data. We do not need to choose a single similarity measure but can just input any available metric. Metric Factorization builds automatically interesting basis spaces from a large variety of input metrics. Due to metric properties, the basis spaces can be further explored with standard techniques like Multidimensional Scaling. We relate the Metric Factorization task to data compression and demonstrate how ideas from information theory (Minimum Description Length principle) make the parametrization of MF optional. We further introduce the idea of landmark points to effectively compress and thus support large data sets. Extensive experiments demonstrate the benefits of our approach.
Additional Metrics?
Edit extra informations Login
Publication type Article: Conference contribution
Reviewing status