Factorizing complex discrete data “with Finesse”.
In: (IEEE International Conference on Data Mining (ICDM), 13 December 2016, Barcelona). 2016. 1-6 (Conf. Proc. IEE)
Can we mine latent patterns from discrete, non-numerical, heterogeneous data? Many modern data sets contain heterogeneous non-numerical information measured over Boolean, ordinal and ternary scales. Values for features like these are “mixable” in the sense that they have intuitive non-linear analogs to classical “addition” (e.g. logical OR for Boolean data). We present a novel, general and extensible matrix factorization framework for any such “mixable” features. The framework lets us support heterogeneous data and encourages us to deduce other interesting “mixable” features, like those which encapsulate sub-trees over an ontology. We present FINESSE, an algorithm whose run-time complexity is linear in the size of the data. FINESSE outperforms state-of-the-art techniques in the special cases, in terms of both effectiveness and efficiency, and yields insightful patterns from its novel application to large real-world heterogeneous data.
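To illustrate what “mixing” with logical OR means in the Boolean special case, the following minimal sketch computes a Boolean matrix product, where overlapping patterns combine by OR rather than by ordinary addition. It is not the FINESSE algorithm; the function name `boolean_product` and the toy factors `U` and `V` are assumptions introduced here for illustration only.

```python
def boolean_product(U, V):
    """Boolean matrix product: entry (i, j) is OR over k of (U[i][k] AND V[k][j])."""
    rows, inner, cols = len(U), len(V), len(V[0])
    return [
        [int(any(U[i][k] and V[k][j] for k in range(inner))) for j in range(cols)]
        for i in range(rows)
    ]


# Hypothetical toy factors: 3 objects x 2 patterns, and 2 patterns x 4 features.
U = [[1, 0],
     [1, 1],
     [0, 1]]
V = [[1, 1, 0, 0],
     [0, 1, 1, 0]]

# The second object uses both patterns; where they overlap (second feature),
# OR yields 1, whereas ordinary addition would yield 2.
X = boolean_product(U, V)
for row in X:
    print(row)
```

Under ordinary matrix multiplication the overlapping entry would fall outside the Boolean scale, which is why a non-linear analog of addition is needed for such data.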
Publication type Article: Conference contribution
Institute(s) Institute of Computational Biology (ICB)