Dual Progressive Transformations for Weakly Supervised Semantic Segmentation
Supplementary Material
I. Effectiveness of co-learning
We found that the multi-branch networks used by existing methods are limited to a single kind of basic operation. As a result, their branches attend to similar regions and have little complementary information to learn from each other. We therefore propose the CRT network, whose branches are built on different basic operations.
To verify that the different branches indeed benefit from our CRT method, we conduct experiments on the PASCAL VOC 2012 training set. We train two multi-branch networks with identical settings except for the hyperparameter , and then visually compare their outputs in the CAM generation stage.
Figure 1 shows that when (note that is equivalent to training the two branches independently, without any information exchange), both branches of the network suffer from over-activation, and the problem is especially severe on the Transformer branch. In contrast, when , over-activation is significantly suppressed.
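The comparison above is purely visual. As a rough illustration of what over-activation means for a CAM, the sketch below computes the fraction of activated pixels that fall outside the ground-truth object mask. This measure is not part of our experiments; the function name `over_activation_ratio`, the threshold of 0.5, and the toy 4x4 inputs are all assumptions made only for illustration.

```python
def over_activation_ratio(cam, mask, threshold=0.5):
    """Fraction of activated pixels that lie outside the object mask.

    cam:       2D list of non-negative activation scores (hypothetical CAM).
    mask:      2D list of 0/1 values, 1 marking object pixels.
    threshold: assumed cut-off applied after dividing by the CAM maximum.
    """
    peak = max(max(row) for row in cam)
    if peak <= 0:
        return 0.0  # nothing is activated at all

    activated = 0
    outside = 0
    for cam_row, mask_row in zip(cam, mask):
        for score, is_object in zip(cam_row, mask_row):
            if score / peak >= threshold:
                activated += 1
                if not is_object:
                    outside += 1
    return outside / activated if activated else 0.0


# Toy data: a 2x2 object in the centre of a 4x4 image.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
# Activation concentrated on the object.
tight_cam = [
    [0.0, 0.1, 0.1, 0.0],
    [0.1, 0.9, 1.0, 0.1],
    [0.1, 0.8, 0.9, 0.1],
    [0.0, 0.1, 0.0, 0.0],
]
# Activation spilling onto the background (over-activation).
spread_cam = [
    [0.6, 0.7, 0.7, 0.6],
    [0.7, 0.9, 1.0, 0.7],
    [0.6, 0.8, 0.9, 0.6],
    [0.2, 0.6, 0.6, 0.2],
]

tight_ratio = over_activation_ratio(tight_cam, mask)
spread_ratio = over_activation_ratio(spread_cam, mask)
print(tight_ratio, spread_ratio)
```

A ratio near 0 means the activation stays on the object, while a ratio near 1 means most of the activated area is background. In the toy data, the spread map scores far higher than the tight one.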
II. Additional qualitative experiments
To further demonstrate the effectiveness of CRT, we provide additional visualization results on the PASCAL VOC 2012 training set (Figure 3) and on the PASCAL VOC 2012 val set (Figure 2).


