COSALPURE: Learning Concept from Group Images for Robust Co-Saliency

Jiayi Zhu¹, Qing Guo², Felix Juefei-Xu³, Yihao Huang⁴, Yang Liu⁴, Geguang Pu¹

¹East China Normal University, ²IHPC & CFAR, A*STAR,
³New York University, ⁴Nanyang Technological University
Teaser Image

Examples of CosalPure and comparative results before and after purification.

Abstract

We propose a novel robustness enhancement framework for co-salient object detection (CoSOD). It first learns the concept of the co-salient objects from the input group images and then leverages this concept to purify adversarially perturbed images; the purified images are subsequently fed to CoSOD methods (CoSODs) for robustness enhancement.

Specifically, we propose CosalPure, which contains two modules: group-image concept learning and concept-guided diffusion purification. In the first module, we adopt a pre-trained text-to-image diffusion model to learn the concept of the co-salient objects within the group images; the learned concept is robust to adversarial examples. In the second module, we map the adversarial image to the latent space and then perform diffusion generation, embedding the learned concept into the noise prediction function as an extra condition.

We show that CosalPure effectively alleviates the influence of the state-of-the-art (SOTA) adversarial attack, which combines different adversarial patterns, including exposure and noise. Extensive results demonstrate that our method significantly enhances the robustness of CoSODs.

Method

Group-image concept learning module:
utilizing a group of input images for learning the text-aligned embedding of common objects.
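
To make the idea concrete, the following is a minimal toy sketch of learning a single concept vector from a group while the noise predictor stays frozen. It is not the CosalPure implementation: the names `frozen_noise_predictor` and `learn_concept` are hypothetical, the predictor is a closed-form stand-in for a pre-trained text-to-image diffusion model, and images are represented as short latent vectors. The only point it illustrates is that the loss is a noise-prediction error and that the concept vector is the only quantity being optimized.

```python
import math
import random


def frozen_noise_predictor(z_t, alpha_bar, concept):
    """Toy stand-in for a frozen, pre-trained noise predictor (hypothetical).

    It treats the concept vector as the clean latent and returns the noise
    that would explain z_t under that assumption.
    """
    scale = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [(z - scale * c) / sigma for z, c in zip(z_t, concept)]


def learn_concept(group_latents, steps=1000, lr=0.02, seed=0):
    """Fit one concept vector to a group; the predictor is never updated."""
    rng = random.Random(seed)
    concept = [0.0] * len(group_latents[0])
    for _ in range(steps):
        x0 = rng.choice(group_latents)      # pick one image of the group
        alpha_bar = rng.uniform(0.2, 0.8)   # pick a noise level
        scale = math.sqrt(alpha_bar)
        sigma = math.sqrt(1.0 - alpha_bar)
        eps = [rng.gauss(0.0, 1.0) for _ in x0]
        z_t = [scale * x + sigma * e for x, e in zip(x0, eps)]
        eps_hat = frozen_noise_predictor(z_t, alpha_bar, concept)
        # Gradient of sum((eps - eps_hat)^2) with respect to the concept.
        grad = [2.0 * (e - eh) * scale / sigma for e, eh in zip(eps, eps_hat)]
        concept = [c - lr * g for c, g in zip(concept, grad)]
    return concept
```

In this toy setting the learned vector settles near the component shared by all group members, because image-specific deviations pull it in different directions and largely cancel out. That is only an intuition for why a concept learned from the whole group can be less sensitive to perturbations of individual images.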

Concept-guided diffusion purification module:
reconstructing the group images based on the learned concept.
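
A minimal toy sketch of concept-guided purification follows, under loudly stated assumptions. The adversarial latent is partially noised to an intermediate level and then denoised with deterministic DDIM-style updates, with the concept passed to the noise predictor as an extra condition. The function names, the Gaussian prior centered on the concept (`prior_std`), the linear schedule, and the choice of a DDIM-style sampler are all hypothetical choices made for illustration; they are not taken from the CosalPure implementation.

```python
import math
import random


def concept_noise_predictor(z_t, alpha_bar, concept, prior_std):
    """Toy concept-conditioned noise predictor (hypothetical stand-in).

    Assumes clean latents follow N(concept, prior_std^2 I), for which the
    posterior-mean estimate of the clean latent has a closed form.
    """
    var_t = alpha_bar * prior_std ** 2 + (1.0 - alpha_bar)
    gain = math.sqrt(alpha_bar) * prior_std ** 2 / var_t
    scale = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    eps_hat = []
    for z, c in zip(z_t, concept):
        x0_hat = c + gain * (z - scale * c)
        eps_hat.append((z - scale * x0_hat) / sigma)
    return eps_hat


def purify(adv_latent, concept, alpha_bar_start=0.5, n_steps=20,
           prior_std=0.2, seed=0):
    """Noise the input to an intermediate level, then denoise with the concept."""
    rng = random.Random(seed)
    # Linear schedule from the starting noise level to the clean level (1.0).
    alpha_bars = [alpha_bar_start + (1.0 - alpha_bar_start) * i / n_steps
                  for i in range(n_steps + 1)]
    # Forward step: partially diffuse the adversarial latent.
    a0 = alpha_bars[0]
    z = [math.sqrt(a0) * x + math.sqrt(1.0 - a0) * rng.gauss(0.0, 1.0)
         for x in adv_latent]
    # Reverse steps: deterministic DDIM-style updates with the concept condition.
    for a_cur, a_next in zip(alpha_bars[:-1], alpha_bars[1:]):
        eps_hat = concept_noise_predictor(z, a_cur, concept, prior_std)
        x0_hat = [(zi - math.sqrt(1.0 - a_cur) * e) / math.sqrt(a_cur)
                  for zi, e in zip(z, eps_hat)]
        noise_scale = math.sqrt(max(0.0, 1.0 - a_next))
        z = [math.sqrt(a_next) * x + noise_scale * e
             for x, e in zip(x0_hat, eps_hat)]
    return z
```

In this sketch, `alpha_bar_start` controls how much of the input survives: values closer to 1 keep more of the original latent (and more of the perturbation), while smaller values rely more heavily on the concept condition.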

Results

The visual comparison shows that CosalPure purifies adversarial images more effectively than DiffPure and DDA for co-salient object detection.

Ablation

To validate the effect of the learned concepts on CoSOD results, we conduct ablation studies on Cosal2015 and CoSOD3k. "w/o concept inversion" uses only the continuous representation module and skips the subsequent purification process. "w/ None concept" passes a meaningless "None" as the concept during purification. "w/ learned concept" is the complete pipeline.