CALICO: Part-Focused Semantic Co-Segmentation with Large Vision-Language Models [CVPR 2025]

PLAN Lab, University of Illinois Urbana-Champaign

Paper PDF Paper arXiv Model Dataset Project Website


📢 Latest Updates


CALICO Overview

CALICO is a pixel-grounded Large Vision-Language Model for part-focused semantic co-segmentation. Given multiple images, CALICO identifies, labels, and segments common objects, common object parts, and unique object parts. This enables fine-grained visual comparison across images rather than single-image segmentation alone.

Figure: CALICO task overview.


🏆 Contributions

  • New Task. We introduce part-focused semantic co-segmentation: segmenting and labeling common objects, common parts, and unique parts across images.
  • CALICO Model. We propose a multi-image LVLM with a Correspondence Extraction Module and Correspondence Adaptation Modules for part-level reasoning.
  • Efficient Visual Tokens. CALICO uses a Q-Former visual interface to reduce image-token cost while preserving segmentation-grounded reasoning.
  • Mixed Parts Dataset. We curate Mixed Parts, a large-scale benchmark built from public part segmentation datasets with logically comparable object pairs.
  • Strong Results. CALICO outperforms adapted LVLM baselines on Mixed Parts while fine-tuning only about 0.3% of model parameters.
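
To give a feel for why adapter-based fine-tuning touches only a small share of a model's parameters, here is a minimal arithmetic sketch of the LoRA parameter count. The layer count, hidden size, rank, and base model size below are hypothetical placeholders, not CALICO's actual configuration, and the result is not meant to reproduce the 0.3% figure, which presumably also covers CALICO's other trainable modules.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds to one (d_out x d_in) weight matrix.

    LoRA learns a low-rank update B @ A with A of shape (rank, d_in)
    and B of shape (d_out, rank), so it adds rank * (d_in + d_out) values.
    """
    return rank * (d_in + d_out)


def trainable_fraction(frozen_params: int, adapted_shapes, rank: int) -> float:
    """Share of all parameters that are trainable when only LoRA is tuned."""
    added = sum(lora_param_count(d_in, d_out, rank) for d_in, d_out in adapted_shapes)
    return added / (frozen_params + added)


# Hypothetical setup (assumption, not from the CALICO repository):
# 32 transformer layers, two adapted 4096x4096 projections per layer,
# rank 8, and a 7B-parameter frozen base model.
shapes = [(4096, 4096)] * (32 * 2)
frac = trainable_fraction(7_000_000_000, shapes, rank=8)
print(f"trainable share: {frac:.4%}")
```

The share grows linearly with the rank and with the number of adapted matrices, which is why it stays small even for large base models.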

🚀 Dive Deeper: Code, Data, and Checkpoints

  • Installation: Environment setup, CUDA/PyTorch notes, and a smoke test.
  • Data Preparation: Instructions for preparing ADE20KPart234, PartImageNet, COCO2017, PACO-LVIS, and the Mixed Parts annotation bundle.
  • Evaluation: Official Mixed Parts evaluation command, outputs, metrics, and useful flags.
  • Training: Fine-tuning command, training flags, distributed launch notes, and resume behavior.
  • Model Checkpoint: Released merged CALICO checkpoint for evaluation and inference.
  • Mixed Parts Dataset: Released annotation bundle for training and evaluation.

😸 CALICO Architecture

CALICO combines multi-image LVLM reasoning with pixel-level segmentation. Images are encoded through an EVA-CLIP/Q-Former visual interface, text outputs include [SEG] tokens, and the corresponding token embeddings are decoded into segmentation masks by a SAM-based mask decoder. CALICO should be loaded through this repository's local model code using the released checkpoint, rather than through generic AutoModel loading.

Key components:

  • Q-Former visual interface: queries compact visual tokens from EVA-CLIP image features.
  • SAM mask decoder: decodes [SEG] token embeddings into segmentation masks.
  • Correspondence Extraction Module (CEM): extracts semantic correspondences between object parts across images.
  • Correspondence Adaptation Modules (CAMs): inject correspondence information into selected LLM layers.
  • LoRA adapters: fine-tune a small subset of the language model parameters.
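
The [SEG]-to-mask path described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the token id `SEG_TOKEN_ID`, the function names, and the dot-product "decoder" are hypothetical stand-ins, and the real model uses a SAM-based mask decoder rather than a thresholded dot product.

```python
SEG_TOKEN_ID = 32000  # hypothetical vocabulary id for the [SEG] token


def select_seg_embeddings(token_ids, hidden_states, seg_token_id=SEG_TOKEN_ID):
    """Keep the hidden state at every position where the output token is [SEG]."""
    if len(token_ids) != len(hidden_states):
        raise ValueError("token_ids and hidden_states must have the same length")
    return [h for t, h in zip(token_ids, hidden_states) if t == seg_token_id]


def decode_masks(seg_embeddings, pixel_features, threshold=0.0):
    """Toy stand-in for a mask decoder.

    Scores each pixel by the dot product between a [SEG] embedding and that
    pixel's feature vector, then thresholds the score into a binary mask.
    """
    masks = []
    for emb in seg_embeddings:
        mask = [
            [int(sum(e * f for e, f in zip(emb, pix)) > threshold) for pix in row]
            for row in pixel_features
        ]
        masks.append(mask)
    return masks


# Four output tokens, two of which are [SEG]; hidden size 2 for readability.
token_ids = [5, SEG_TOKEN_ID, 7, SEG_TOKEN_ID]
hidden_states = [[0, 0], [1, 0], [0, 0], [0, 1]]

# A 2x2 "image" with one 2-d feature vector per pixel.
pixel_features = [
    [[1, -1], [-1, 1]],
    [[1, 1], [-1, -1]],
]

seg_embeddings = select_seg_embeddings(token_ids, hidden_states)
masks = decode_masks(seg_embeddings, pixel_features)
print(masks)
```

Each [SEG] token in the text output yields one mask, so the number of masks follows from the number of segments the language model chooses to name.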

Figure: CALICO architecture overview.


🪑 Mixed Parts Dataset

Mixed Parts contains multi-image object-part comparison samples for three subtasks: common object co-segmentation, common part co-segmentation, and unique part segmentation. It is curated from ADE20KPart234, PartImageNet, and PACO-LVIS image assets. Prepare the dataset by following docs/DATA.md.
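
At the label level, the common-part and unique-part subtasks amount to set operations over the parts of the two objects being compared. The sketch below illustrates only that logic; the part names and the `compare_parts` helper are hypothetical and do not reflect the Mixed Parts annotation format, and the model itself works on pixels rather than label sets.

```python
def compare_parts(parts_a, parts_b):
    """Split two objects' part labels into common and unique groups."""
    a, b = set(parts_a), set(parts_b)
    return {
        "common": sorted(a & b),    # parts present in both objects
        "unique_a": sorted(a - b),  # parts only the first object has
        "unique_b": sorted(b - a),  # parts only the second object has
    }


# Hypothetical example pair: a chair and a stool.
chair = ["seat", "back", "leg", "arm"]
stool = ["seat", "leg"]

result = compare_parts(chair, stool)
print(result)
```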

Figure: Mixed Parts dataset examples.


⚡ Quick Start

Install CALICO by following docs/INSTALL.md, prepare the data by following docs/DATA.md, then run evaluation on the official Mixed Parts test split:

python evaluate.py \
  --merged_ckpt_path PLAN-Lab/CALICO \
  --dataset_dir ./data \
  --output_save_path ./evaluate_results/calico_mixed_parts \
  --val_dataset "MixedPartsObjectVal|MixedPartsPartVal" \
  --multi_image_filepath_prefix ./data/mixed_parts_data/mixed_parts_test.json \
  --mode test \
  --compute_metrics

For more options, see docs/EVALUATION.md. For fine-tuning, see docs/TRAINING.md.
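
If you prefer to launch the evaluation from a Python script, the command above can be assembled as an argument list. This is a minimal sketch: the helper names `build_eval_command` and `missing_inputs` are hypothetical, while the flags and default paths are copied from the command above.

```python
import os
import shlex


def build_eval_command(data_dir="./data",
                       output_dir="./evaluate_results/calico_mixed_parts",
                       ckpt="PLAN-Lab/CALICO"):
    """Assemble the Quick Start evaluation command as an argv list."""
    test_json = f"{data_dir}/mixed_parts_data/mixed_parts_test.json"
    return [
        "python", "evaluate.py",
        "--merged_ckpt_path", ckpt,
        "--dataset_dir", data_dir,
        "--output_save_path", output_dir,
        "--val_dataset", "MixedPartsObjectVal|MixedPartsPartVal",
        "--multi_image_filepath_prefix", test_json,
        "--mode", "test",
        "--compute_metrics",
    ]


def missing_inputs(data_dir="./data"):
    """Return the local input paths the command expects but that do not exist."""
    expected = [data_dir, f"{data_dir}/mixed_parts_data/mixed_parts_test.json"]
    return [p for p in expected if not os.path.exists(p)]


cmd = build_eval_command()
# shlex.join quotes the "|" in --val_dataset so the line is safe to paste into a shell.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` from the repository root would start the evaluation without going through a shell; checking `missing_inputs()` first gives an early, readable error when the data has not been prepared yet.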


📜 Citation

@article{nguyen2025calico,
  title={CALICO: Part-Focused Semantic Co-Segmentation with Large Vision-Language Models},
  author={Nguyen, Kiet A. and Juvekar, Adheesh and Yu, Tianjiao and Wahed, Muntasir and Lourentzou, Ismini},
  journal={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}

🙏 Acknowledgement

We thank LLaVA, GLaMM, LISA, and SAM for releasing models and code that supported this project.
