
CoCoA-Mix: Confusion-and-Confidence-Aware Mixture Model for Context Optimization

Korea Advanced Institute of Science and Technology
ICML 2025

TL;DR

CoCoA-Mix enhances specialization and generalization in prompt tuning using CoA-loss for refined decision boundaries and CoA-weights for confidence-based scaling.

Abstract

Prompt tuning, which adapts vision-language models by freezing model parameters and optimizing only the prompt, has proven effective for task-specific adaptations. The core challenge in prompt tuning is improving specialization for a specific task and generalization for unseen domains. However, frozen encoders often produce misaligned features, leading to confusion between classes and limiting specialization. To overcome this issue, we propose a confusion-aware loss (CoA-loss) that improves specialization by refining the decision boundaries between confusing classes. Additionally, we mathematically demonstrate that a mixture model can enhance generalization without compromising specialization. This is achieved using confidence-aware weights (CoA-weights), which adjust the weights of each prediction in the mixture model based on its confidence within the class domains. Extensive experiments show that CoCoA-Mix, a mixture model with CoA-loss and CoA-weights, outperforms state-of-the-art methods by enhancing specialization and generalization. Our code is publicly available at https://github.com/url-kaist/CoCoA-Mix.

Method


The learnable prompt \( \boldsymbol{t}_i \) is optimized with CoA-loss to specialize in distinguishing confusing classes within the training domain. CoA-weights adjust prediction confidence by increasing \(\pi_i^\text{in}\) for in-class samples and decreasing \(\pi_i^\text{out}\) for out-of-class samples. At inference, the specialized predictions \(\hat{p}_{\boldsymbol{t}_i}\), adjusted via CoA-weights, are combined with the generalized predictions \(\hat{p}_{\boldsymbol{t}_0}\) to ensure generalization while preserving specialization.
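To make the inference-time combination concrete, the following is a minimal PyTorch sketch, not the authors' released implementation: it assumes the specialized probabilities \(\hat{p}_{\boldsymbol{t}_i}\) and the generalized probabilities \(\hat{p}_{\boldsymbol{t}_0}\) are mixed by a per-class convex combination, with the function name `mix_predictions` and the weight parameterization chosen here purely for illustration.

```python
# Illustrative sketch of a confidence-weighted mixture at inference.
# The exact CoA-weight parameterization in the paper may differ; this only
# shows the general idea of blending specialized and generalized predictions.
import torch

def mix_predictions(p_specialized: torch.Tensor,
                    p_generalized: torch.Tensor,
                    coa_weights: torch.Tensor) -> torch.Tensor:
    """Blend specialized and generalized class probabilities.

    p_specialized: (N, C) probabilities from the tuned prompt t_i.
    p_generalized: (N, C) probabilities from the frozen prompt t_0.
    coa_weights:   (C,) per-class weights in [0, 1]; larger values trust the
                   specialized prediction more for that class.
    """
    # Convex combination per class (weights broadcast over the batch),
    # followed by renormalization so each row is a valid distribution.
    mixed = coa_weights * p_specialized + (1.0 - coa_weights) * p_generalized
    return mixed / mixed.sum(dim=-1, keepdim=True)

if __name__ == "__main__":
    torch.manual_seed(0)
    p_spec = torch.softmax(torch.randn(4, 10), dim=-1)    # specialized predictions
    p_gen = torch.softmax(torch.randn(4, 10), dim=-1)     # generalized (zero-shot) predictions
    weights = torch.sigmoid(torch.randn(10))               # hypothetical per-class CoA-weights
    print(mix_predictions(p_spec, p_gen, weights).shape)   # torch.Size([4, 10])
```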

Experiment Results

CoCoA-Mix achieves average harmonic-mean improvements of 15.28% and 3.28% over zero-shot CLIP in base-to-new generalization and cross-dataset transfer, respectively; it also improves average accuracy in few-shot class-incremental learning by 5.6 percentage points.

Poster

BibTeX

@article{hong2025cocoa,
  title={CoCoA-Mix: Confusion-and-Confidence-Aware Mixture Model for Context Optimization},
  author={Hong, Dasol and Lee, Wooju and Myung, Hyun},
  journal={arXiv preprint arXiv:2506.07484},
  year={2025}
}