PuSH - Publication Server of Helmholtz Zentrum München

Factorizing complex discrete data “with Finesse”.

In: (IEEE International Conference on Data Mining (ICDM), 13 December 2016, Barcelona). 2016. 1-6 (Conf. Proc. IEE)
Can we mine latent patterns from discrete, non-numerical, heterogeneous data? Many modern data sets contain heterogeneous non-numerical information measured over Boolean, ordinal and ternary scales. Values for features like these are “mixable” in the sense that they have intuitive non-linear analogs to classical “addition” (e.g. logical OR for Boolean data). We present a novel, general and extensible matrix factorization framework for any such “mixable” features. The framework lets us support heterogeneous data and encourages us to deduce other interesting “mixable” features, like those which encapsulate sub-trees over an ontology. We present FINESSE, an algorithm whose run-time complexity is linear in the size of the data. In the special cases, FINESSE outperforms state-of-the-art techniques in both effectiveness and efficiency, and it yields insightful patterns from its novel application to large real-world heterogeneous data.
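To make the idea of “mixing” concrete, the sketch below illustrates only the Boolean special case named in the abstract, where logical OR takes the place of addition in a matrix product. It is not the FINESSE algorithm; the function name `boolean_product` and the toy factors `U` and `V` are hypothetical and chosen purely for illustration.

```python
def boolean_product(U, V):
    """Boolean matrix product: entry (i, j) is the OR over k of U[i][k] AND V[k][j]."""
    n_rows, rank, n_cols = len(U), len(V), len(V[0])
    return [
        [int(any(U[i][k] and V[k][j] for k in range(rank))) for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Hypothetical toy factors: 3 objects, 2 latent patterns, 4 Boolean features.
U = [[1, 0],   # object 0 uses pattern 0 only
     [0, 1],   # object 1 uses pattern 1 only
     [1, 1]]   # object 2 uses both patterns
V = [[1, 1, 0, 0],   # pattern 0 switches on features 0 and 1
     [0, 1, 1, 0]]   # pattern 1 switches on features 1 and 2

X = boolean_product(U, V)
for row in X:
    print(row)
```

For object 2, which uses both patterns, feature 1 is switched on by each of them. Under OR the reconstructed value stays 1, whereas an ordinary numerical product would give 2, a value that has no meaning on a Boolean scale.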
Publication type Article: Conference contribution