COSALPURE: Learning Concept from Group Images for Robust Co-Saliency

Abstract

We propose a novel robustness enhancement framework that first learns the concept of the co-salient objects from the input group images and then leverages this concept to purify the adversarial perturbations in those images. The purified images are subsequently fed to co-salient object detection (CoSOD) methods, referred to as CoSODs, for robustness enhancement.

Specifically, we propose CosalPure, which contains two modules: group-image concept learning and concept-guided diffusion purification. In the first module, we adopt a pre-trained text-to-image diffusion model to learn the concept of the co-salient objects within the group images; the learned concept is robust to adversarial examples. In the second module, we map the adversarial image to the latent space and then perform diffusion generation, embedding the learned concept into the noise prediction function as an extra condition.

We show that CosalPure can effectively alleviate the influence of the state-of-the-art (SOTA) adversarial attack with different adversarial patterns, including exposure and noise. Extensive results demonstrate that our method can significantly enhance the robustness of CoSODs.

Method

Group-image concept learning module:
Uses a group of input images to learn the text-aligned embedding of the common objects.
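
To make the idea concrete, the following is a minimal toy sketch of concept learning, not the CosalPure implementation. It assumes a textual-inversion-style setup in which the noise predictor stays frozen and only one embedding vector is optimised on the denoising loss over the whole group. Everything named here is hypothetical: `U` is a made-up linear decoder standing in for the pre-trained text-to-image diffusion model, images are 16-dimensional vectors, and `learn_concept`, `predict_noise` and all hyperparameters are illustrative choices.

```python
import numpy as np

D, E = 16, 4                               # toy image size and concept size (assumed)
rng = np.random.default_rng(0)
U = rng.normal(size=(D, E)) / np.sqrt(D)   # hypothetical frozen decoder, never updated


def predict_noise(x_t, alpha_bar, concept):
    """Toy frozen noise predictor conditioned on a concept embedding."""
    x0_hat = concept @ U.T                 # clean image implied by the concept
    return (x_t - np.sqrt(alpha_bar) * x0_hat) / np.sqrt(1.0 - alpha_bar)


def learn_concept(group, steps=2000, lr=0.05, seed=1):
    """Optimise only the concept embedding on the denoising loss of the group."""
    noise_rng = np.random.default_rng(seed)
    concept = np.zeros(E)
    losses = []
    for _ in range(steps):
        a = noise_rng.uniform(0.3, 0.7)                  # random noise level
        eps = noise_rng.normal(size=group.shape)         # noise to be predicted
        x_t = np.sqrt(a) * group + np.sqrt(1.0 - a) * eps
        residual = predict_noise(x_t, a, concept) - eps  # shape (N, D)
        losses.append(float(np.mean(np.sum(residual ** 2, axis=1))))
        # Analytic gradient: d residual / d concept = -sqrt(a / (1 - a)) * U
        grad = -2.0 * np.sqrt(a / (1.0 - a)) * residual.mean(axis=0) @ U
        concept -= lr * grad
    return concept, losses


# Five toy "images" sharing one underlying object plus small per-image variation.
true_concept = np.array([2.0, -1.0, 0.5, 1.5])
group = U @ true_concept + 0.05 * rng.normal(size=(5, D))
concept, losses = learn_concept(group)
```

In this toy, the learned vector converges to the least-squares fit of the group's shared content, which is why a single embedding can summarise what the images have in common. A real pipeline would optimise a token embedding through a text encoder and a U-Net instead of a linear map.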

Concept-guided diffusion purification module:
Reconstructs the group images based on the learned concept.
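
The purification step can be sketched in the same toy setting; this is an illustration under stated assumptions, not the authors' code. It assumes that the adversarial input is noised to an intermediate step `t_star` and then denoised with deterministic DDIM-style updates, with the concept passed to the noise predictor as a condition. The sketch works directly on vectors and skips the latent encoder. The decoder `U`, the schedule `ALPHA_BAR`, `PRIOR_STD`, the closed-form `predict_noise` (exact only for a Gaussian image prior centred on the decoded concept) and the fixed `concept` vector are all hypothetical.

```python
import numpy as np

D, E, T = 16, 4, 50                        # toy image size, concept size, steps
rng = np.random.default_rng(0)
U = rng.normal(size=(D, E)) / np.sqrt(D)   # hypothetical frozen concept decoder
ALPHA_BAR = np.linspace(1.0, 0.2, T + 1)   # cumulative signal level, index 0 = clean
PRIOR_STD = 0.1                            # assumed spread of clean images


def predict_noise(x_t, t, concept):
    """Toy noise predictor that takes the concept as an extra condition."""
    a = ALPHA_BAR[t]
    mu = U @ concept                       # image content implied by the concept
    gain = PRIOR_STD ** 2 * np.sqrt(a) / (a * PRIOR_STD ** 2 + 1.0 - a)
    x0_hat = mu + gain * (x_t - np.sqrt(a) * mu)   # posterior mean of clean image
    return (x_t - np.sqrt(a) * x0_hat) / np.sqrt(1.0 - a)


def purify(x_adv, concept, t_star=30, seed=1):
    """Noise the input to step t_star, then denoise with deterministic DDIM steps."""
    noise = np.random.default_rng(seed).normal(size=x_adv.shape)
    a = ALPHA_BAR[t_star]
    x = np.sqrt(a) * x_adv + np.sqrt(1.0 - a) * noise
    for t in range(t_star, 0, -1):
        a_t, a_prev = ALPHA_BAR[t], ALPHA_BAR[t - 1]
        eps = predict_noise(x, t, concept)
        x0_hat = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        x = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps
    return x


concept = np.array([6.0, -3.0, 2.0, 4.0])            # stands in for a learned concept
clean = U @ concept + PRIOR_STD * rng.normal(size=D)
x_adv = clean + 0.5 * np.sign(rng.normal(size=D))    # toy sign-pattern perturbation
purified = purify(x_adv, concept)
purified_none = purify(x_adv, np.zeros(E))           # uninformative concept
```

Comparing `purified` with `purified_none` shows why the condition matters in this toy: with the concept, the reverse process is pulled toward the shared object content, whereas with an all-zero concept it is pulled toward an unrelated point.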

Results

The visual comparison shows that CosalPure performs better than DiffPure and DDA at purification for co-salient object detection.

Ablation

To validate the effect of the learned concepts on CoSOD results, we conduct ablation studies on Cosal2015 and CoSOD3k. "w/o concept inversion" denotes using only the continuous representation module, without applying the subsequent purification process. "w/ None concept" denotes passing a meaningless "None" as the concept during purification. "w/ learned concept" denotes the complete pipeline.