Bing Cai¹, Xiaoli Wang², Gui-Fu Lu³, Zechao Li¹*
¹Nanjing University of Science and Technology
²Nanjing Forestry University
³Anhui Polytechnic University
{bingcai, zechao.li}@njust.edu.cn, xiaoliwang@njfu.edu.cn, lu-guifu@ahpu.edu.cn
Multi-view contrastive clustering has emerged as a powerful paradigm for learning comprehensive representations from heterogeneous data sources. However, prevailing approaches typically overlook the intrinsic geometric and clustering structures, rendering them structure-agnostic. In this paper, we propose a novel framework that performs Multi-Hierarchical Contrastive Spectral Fusion (MCSF) to address these limitations. MCSF integrates deep spectral embedding into the encoder to preserve local manifold structure, guiding the learned representations to be clustering-friendly. To enhance cross-view consistency, MCSF introduces a multi-hierarchical contrastive loss jointly optimizing (1) view-specific structure preservation, (2) view-consensus alignment, and (3) consensus structure refinement. This mechanism enables the construction of an accurate and semantically consistent consensus representation, effectively fusing multi-view information and uncovering authentic cluster structures. Extensive experiments on benchmarks validate the effectiveness of multi-hierarchical contrastive spectral fusion in clustering accuracy and representation quality.
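To make the view-consensus alignment term (2) more concrete, here is a minimal NumPy sketch of a cross-view contrastive objective. It is not the authors' implementation. The InfoNCE form, the temperature value, the use of a plain average of the view embeddings as the consensus, and the function names `info_nce` and `view_consensus_alignment` are all assumptions made for illustration. The spectral embedding and the two structure-related terms are not covered.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def info_nce(anchor, positive, temperature=0.5):
    """InfoNCE loss: row i of `anchor` should match row i of `positive`.

    All other rows of `positive` act as negatives for row i.
    """
    a = l2_normalize(anchor)
    p = l2_normalize(positive)
    logits = a @ p.T / temperature                       # (n, n) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def view_consensus_alignment(views, temperature=0.5):
    """Average InfoNCE loss between each view embedding and a consensus.

    Assumption: the consensus is the element-wise mean of the view
    embeddings; the paper may construct it differently.
    """
    consensus = np.mean(views, axis=0)
    losses = [info_nce(v, consensus, temperature) for v in views]
    return float(np.mean(losses)), consensus


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    view_a = rng.normal(size=(8, 4))                      # 8 samples, 4-dim embedding
    view_b = view_a + 0.1 * rng.normal(size=view_a.shape)  # noisy second view
    loss, consensus = view_consensus_alignment([view_a, view_b])
    print("alignment loss:", loss)
```

In this sketch, each sample's embedding in a view is pulled toward the same sample's consensus embedding and pushed away from the consensus embeddings of other samples, which is one common way to encourage cross-view consistency.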
pytorch>=2.1.0
numpy>=1.23.0
scikit-learn>=1.5.2
munkres>=1.1.4
/MCSF/main.py
If you have any questions, please contact bingly@foxmail.com.
@inproceedings{cai2026multi,
title={Multi-Hierarchical Contrastive Spectral Fusion for Multi-View Clustering},
author={Cai, Bing and Wang, Xiaoli and Lu, Gui-Fu and Li, Zechao},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={39617--39626},
year={2026}
}