About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Ines Chami
- sl:arxiv_num : 2010.00402
- sl:arxiv_published : 2020-10-01T13:43:19Z
- sl:arxiv_summary : Similarity-based Hierarchical Clustering (HC) is a classical unsupervised
machine learning algorithm that has traditionally been solved with heuristic
algorithms like Average-Linkage. Recently, Dasgupta reframed HC as a discrete
optimization problem by introducing a global cost function measuring the
quality of a given tree. In this work, we provide the first continuous
relaxation of Dasgupta's discrete optimization problem with provable quality
guarantees. The key idea of our method, HypHC, is showing a direct
correspondence from discrete trees to continuous representations (via the
hyperbolic embeddings of their leaf nodes) and back (via a decoding algorithm
that maps leaf embeddings to a dendrogram), allowing us to search the space of
discrete binary trees with continuous optimization. Building on analogies
between trees and hyperbolic space, we derive a continuous analogue for the
notion of lowest common ancestor, which leads to a continuous relaxation of
Dasgupta's discrete objective. We can show that after decoding, the global
minimizer of our continuous relaxation yields a discrete tree with a (1 +
epsilon)-factor approximation for Dasgupta's optimal tree, where epsilon can be
made arbitrarily small and controls optimization challenges. We experimentally
evaluate HypHC on a variety of HC benchmarks and find that even approximate
solutions found with gradient descent have superior clustering quality than
agglomerative heuristics or other gradient based algorithms. Finally, we
highlight the flexibility of HypHC using end-to-end training in a downstream
classification task.@en
- sl:arxiv_title : From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering@en
- sl:arxiv_updated : 2020-10-01T13:43:19Z
- sl:bookmarkOf : https://arxiv.org/abs/2010.00402
- sl:creationDate : 2020-10-03
- sl:creationTime : 2020-10-03T14:46:20Z
Documents with similar tags (experimental)