
Dual Progressive Transformations for Weakly Supervised Semantic Segmentation
Supplementary Material


I. Effectiveness of co-learning

We observe that the multi-branch networks used by existing methods are limited to a single kind of basic operation: their branches attend to similar regions and therefore cannot learn complementary information from one another. We therefore propose the CRT network, whose branches are built on different basic operations.
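The coupling between the two heterogeneous branches is not spelled out in this excerpt. A minimal sketch of one plausible co-learning term, assuming min-max-normalized CAMs and an L1 discrepancy as the cross-branch consistency loss (the tensor shapes and loss choice here are illustrative, not the paper's exact formulation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical CAMs from the two heterogeneous branches for one image:
# C = 20 classes on a 32 x 32 spatial grid (made-up values for illustration).
cam_cnn = rng.random((20, 32, 32))          # CNN branch: local receptive fields
cam_transformer = rng.random((20, 32, 32))  # Transformer branch: global attention

def normalize(cam):
    """Min-max normalize each class map to [0, 1], as is common for CAMs."""
    lo = cam.min(axis=(1, 2), keepdims=True)
    hi = cam.max(axis=(1, 2), keepdims=True)
    return (cam - lo) / (hi - lo + 1e-8)

def co_learning_loss(cam_a, cam_b):
    """Mean L1 discrepancy between the two branches' normalized CAMs.

    Minimizing this term pulls each branch toward the regions the other
    branch activates, so neither is confined to the bias of its own
    basic operation.
    """
    return np.abs(normalize(cam_a) - normalize(cam_b)).mean()

loss = co_learning_loss(cam_cnn, cam_transformer)
```

Because the two branches rely on different operations, their CAMs disagree in different ways, which is what makes such a consistency term informative rather than redundant.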

To verify that the branches indeed benefit from our CRT method, we conduct experiments on the PASCAL VOC 2012 training set. We train two multi-branch networks with identical settings except for the hyperparameter λ, and after training we visually compare the CAMs they generate.

As Figure 1 shows, when λ = 0 (which is equivalent to training the two branches independently, without any information exchange), both branches suffer from over-activation; the problem is especially severe in the Transformer branch. When λ ≠ 0, over-activation is significantly suppressed.
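The role of λ described above can be made explicit with a small sketch of the training objective, assuming the total loss is each branch's classification loss plus a λ-weighted cross-branch consistency term (the loss values below are made up for illustration):

```python
# Hypothetical per-branch loss values, for illustration only.
loss_cls_cnn = 0.40    # classification loss, CNN branch
loss_cls_trans = 0.55  # classification loss, Transformer branch
loss_co = 0.30         # cross-branch CAM consistency term

def total_loss(lam):
    """Assumed overall objective: classification losses plus weighted coupling.

    With lam = 0 the coupling term vanishes, so each branch is trained
    only on its own classification loss -- exactly the independent-training
    baseline with no information exchange between branches.
    """
    return loss_cls_cnn + loss_cls_trans + lam * loss_co

independent = total_loss(0.0)  # reduces to the sum of classification losses
coupled = total_loss(0.5)      # consistency term now shapes both branches
```

This makes clear why λ = 0 is a clean ablation: it removes only the information exchange while leaving every other training setting untouched.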

II. Additional qualitative experiments

To further demonstrate the effectiveness of CRT, we conduct additional visualization experiments on the PASCAL VOC 2012 training set, as shown in Figure 3, and on the PASCAL VOC 2012 val set, as shown in Figure 2.

Figure 1: Independent training versus mutual learning. a) Input image; b) CAM of the CNN branch; c) CAM of the Transformer branch; d) CAM of the Transformer branch using our CRT method.
Figure 2: Visualization of pseudo-segmentation masks on the PASCAL VOC 2012 val set. a) Input image; b) Ground truth; c) CRT.
Figure 3: Visualization of pseudo-segmentation masks on the PASCAL VOC 2012 training set. a) Input image; b) Ground truth; c) IRNet; d) TS-CAM; e) CRT.