
Supplementary Material: FSL Framework to Reduce Inter-Observer Variability

I The ParESN Model

The Parallel ESN framework presented in this work is inspired by the previous works in [sohiniesn] and [esn2]. The primary difference from the model in [sohiniesn] is the variation in importance assigned to the hidden states with respect to a previous pixel in the same image vs. the same pixel in the previous image. With respect to [esn2], the parallel branches in this work generate similarly trained regional proposals (RPs), instead of the cumulative combination of the three parallel streams in [esn2]. Also, the choice of three parallel layers in this work is optimal among the candidate set {2, 3, 4, 5} of parallel RPs, based on a grid search and leave-one-out cross-validation across vendor image stacks. The detailed system setup is shown in Fig. 1.

Refer to caption
Figure 1: System setup of the proposed ParESN model. Each image and its 3 pre-processed planes are converted into an input matrix \mathbf{U} and fed to the 3 parallel ESN branches to update the reservoir state matrix \mathbf{X} per branch. At the end of the training process, a c-dimensional vector is output per pixel location, where c represents the number of classes to be predicted (c = 2 for binary segmentation). The per-pixel output p(k) \in \{P_1, P_2, P_3\} is the class label with maximal probability. The regional proposals (RPs), i.e. P_1, P_2, P_3 from the 3 parallel arms, represent cyst-like pixels that are trained from the same images and similar training setups. To demonstrate qualitative overlap between the RPs, the RPs from the top, middle and bottom ESN arms are visualized in the red, blue and green image planes, respectively.

The OCT cyst images per vendor stack represent a volumetric-level scan. This implies that if a large cyst appears in a scan, there is a high probability of the same cyst appearing in some shape and form in the previous and succeeding scans as well. The proposed setup is analogous to the video processing setup in [sohiniesn]. Thus, the reservoir states per pixel location of subsequent images will be affected by the previous and current image, as represented in (1).

\mathbf{x}_{\nu}(k) = (1-\alpha)\,\mathbf{x}_{\nu}(k-1) + \alpha\, f\big(\mathbf{W}_{\mathrm{in},\nu}\,[1;\mathbf{u}(k)] + \mathbf{W}_{\nu}\,\mathbf{x}_{\nu}(k-1)\big)    (1)
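As a concrete illustration, the leaky state update in (1) can be sketched in NumPy as below. The choice of tanh for the activation f is an assumption (a common ESN default, not stated here), and the weight shapes are illustrative:

```python
import numpy as np

def reservoir_update(x_prev, u, W_in, W, alpha=0.95):
    """One leaky-integrator reservoir state update, following Eq. (1).

    x_prev : previous reservoir state x_nu(k-1), shape (n,)
    u      : current input vector u(k), shape (m,)
    W_in   : input weights W_in,nu, shape (n, m+1); the +1 absorbs the bias
    W      : recurrent reservoir weights W_nu, shape (n, n)
    alpha  : leak rate (0.95 was found optimal in the cross-validation)
    """
    z = np.concatenate(([1.0], u))      # the extended input [1; u(k)]
    pre = W_in @ z + W @ x_prev         # pre-activation of the reservoir
    return (1 - alpha) * x_prev + alpha * np.tanh(pre)
```

With alpha close to 1, the state is dominated by the current input rather than the carried-over memory, matching the low "leaky-memory" requirement reported for this data set.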

At the end of the training stage, the output weights \mathbf{w}_{\mathrm{out},\nu}, \nu = \{1, 2, 3\}, are computed for each parallel layer using (2).

\mathbf{w}_{\mathrm{out},\nu} = \Big(\sum_{l=1}^{L} \mathbf{z}_{\nu,l}(k)\,\mathbf{z}_{\nu,l}^{\mathrm{T}}(k) + \lambda\,\mathds{1}\Big)^{-1}\Big(\sum_{l=1}^{L} \mathbf{z}_{\nu,l}(k)\,y(k)\Big)    (2)

where \mathbf{z}_{\nu,l}(k) = [1; \mathbf{u}(k); \mathbf{x}_{\nu}(k)] are the extended system states for the 3 parallel layers, evaluated over l = \{1, 2, \dots, L\} training images, y(k) represents the target label at pixel location k, and \mathds{1} represents the identity matrix.
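The closed-form ridge readout in (2) can be sketched as follows. Stacking the extended states column-wise into a single matrix Z is an implementation convenience assumed here, not a detail stated in the text:

```python
import numpy as np

def compute_readout(Z, y, lam=1e-5):
    """Closed-form ridge-regression readout, following Eq. (2).

    Z   : extended states z = [1; u(k); x(k)] stacked column-wise over
          all training samples, shape (d, L)
    y   : target labels y(k), shape (L,)
    lam : Tikhonov regularizer lambda (1e-5 was found optimal)
    """
    d = Z.shape[0]
    A = Z @ Z.T + lam * np.eye(d)   # sum of z z^T plus lambda * identity
    b = Z @ y                       # sum of z y(k)
    return np.linalg.solve(A, b)    # solve A w = b; avoids explicit inversion
```

Using np.linalg.solve rather than forming the inverse directly is a standard numerical-stability choice for this normal-equations form.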

The leave-one-out cross-validation experiment across vendor stacks helps identify the optimal parameter set \{\alpha, \lambda\} in (1) and (2), respectively. We observe \alpha = 0.95 to be optimal from the search set [0.30, 0.99] in increments of 0.02. This implies a very low "leaky-memory" requirement for the data set [DeepESN]. Also, the sensitivity to \lambda is found to be very low, with \lambda = 10^{-5} being optimal for the setup within the search set [10^{-10}, 0.1] in order-of-magnitude increments.
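A minimal sketch of the leave-one-stack-out grid search described above. Here train_eval is a hypothetical placeholder for training the ParESN on the retained stacks and scoring segmentation quality on the held-out stack; only the search loop itself is shown:

```python
import numpy as np
from itertools import product

def grid_search_loo(stacks, train_eval, alphas, lambdas):
    """Leave-one-vendor-stack-out grid search for (alpha, lambda).

    stacks     : list of vendor image stacks
    train_eval : hypothetical callable(train_stacks, held_out, alpha, lam)
                 returning a validation score (higher is better)
    """
    best, best_score = None, -np.inf
    for alpha, lam in product(alphas, lambdas):
        scores = []
        for i, held_out in enumerate(stacks):
            train = stacks[:i] + stacks[i + 1:]   # leave one stack out
            scores.append(train_eval(train, held_out, alpha, lam))
        mean_score = float(np.mean(scores))
        if mean_score > best_score:
            best, best_score = (alpha, lam), mean_score
    return best, best_score

# Search grids as reported: alpha over [0.30, 0.99] in steps of 0.02,
# lambda over [1e-10, 0.1] in order-of-magnitude steps.
alphas = np.arange(0.30, 0.99 + 1e-9, 0.02)
lambdas = [10.0 ** e for e in range(-10, 0)]
```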

II Qualitative Analysis of the Target Label Selection Algorithm (TLSA)

The key contribution of our work is the use of the RPs to detect the "best" manual annotation at the image level. Examples of the TLSA for TL vs. noisy-TL selection and for G_1 vs. G_2 selection are shown below.
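A minimal sketch of how such an image-level selection could work, assuming Dice is used as the overlap metric (the specific metric is not restated here) and that candidate TLs are ranked by their mean overlap with the three RPs:

```python
import numpy as np

def dice(a, b):
    """Dice overlap between two binary masks."""
    inter = np.logical_and(a, b).sum()
    denom = a.sum() + b.sum()
    return 2.0 * inter / denom if denom else 1.0

def select_target_label(rps, candidates):
    """Pick the candidate TL with the highest mean overlap against
    the regional proposals P1, P2, P3.

    rps        : list of 3 binary RP masks from the parallel ESN arms
    candidates : list of candidate TL masks (e.g. [G1, G2])
    Returns the index of the selected candidate.
    """
    mean_overlap = [np.mean([dice(rp, tl) for rp in rps]) for tl in candidates]
    return int(np.argmax(mean_overlap))
```

The mean over the three RPs, rather than any single RP, is what makes the selection robust when individual arms disagree, as illustrated in the figures below.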

II-A Examples of TLSA against noisy labels

For this experiment, the Random Crop and Paste (RCAP) function is invoked to generate noisy TLs. In Fig. 2, the actual TLs are shown in red, the noisy generations using RCAP in blue, and their intersections as white regions. Also, the RPs P_1, P_2, P_3 are represented in the red, green and blue planes, respectively. Fig. 2 provides qualitative explanations for manual interventions and for automatic selection of the actual TL over its noisy counterparts.
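A hedged sketch of what an RCAP-style perturbation could look like: a random patch of the annotation mask is cut and pasted at a random offset. The patch size and paste policy here are assumptions for illustration, not the paper's exact implementation:

```python
import numpy as np

def random_crop_and_paste(tl, crop_size=16, rng=None):
    """Illustrative Random-Crop-and-Paste perturbation of a binary TL mask.

    tl        : 2D boolean annotation mask
    crop_size : side length of the square patch to relocate (assumed)
    rng       : optional numpy Generator for reproducibility
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = tl.copy()
    h, w = tl.shape
    r = rng.integers(0, h - crop_size)
    c = rng.integers(0, w - crop_size)
    patch = tl[r:r + crop_size, c:c + crop_size].copy()
    noisy[r:r + crop_size, c:c + crop_size] = False        # crop out the patch
    r2 = rng.integers(0, h - crop_size)
    c2 = rng.integers(0, w - crop_size)
    noisy[r2:r2 + crop_size, c2:c2 + crop_size] |= patch   # paste it elsewhere
    return noisy
```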

Refer to caption
(a) Left: TL and noisy TLs. Right: RPs from the ParESN model. The very small difference between the TL and noisy TLs (left), together with the high variability in regional overlap between RPs (right), requires manual intervention for all these images.
Refer to caption
(b) Left: TL and noisy TLs. Right: RPs from the ParESN model. In all these cases, a different RP is found to have the most overlap with the TL or noisy TL, respectively. However, the mean overlap metric between the RPs and the TLs is the decisive metric for selecting the correct TL over its noisy counterparts in each case.
Figure 2: TL vs. noisy TL assessment.

II-B Examples of TLSA for best TL selection

In Fig. 3, qualitative examples of best-TL selection are shown. The left columns represent the TLs, with G_1 in red, G_2 in blue and their intersection in white. The right columns represent the RPs in the red, green and blue planes, respectively. In Fig. 3(a), the few-shot learning (FSL) models are trained on G_1, but the RPs agree more with G_2; hence, G_2 is selected as the best label for all these images. In Fig. 3(b), the FSL models are trained on G_2, but the RPs agree more with G_1; hence, G_1 is selected as the best label for all these images. The examples in Fig. 3 demonstrate the extent of variability between the manual target labels G_1 and G_2.

For the images in Fig. 3(a), the G_1 annotations represented many contiguous small cyst regions, based on which the RPs are trained to identify small contiguous cysts in test images. However, G_1 annotates some larger cyst areas in the test images, as opposed to G_2, which detects smaller cysts. Thus, G_2 is preferable in all such examples in Fig. 3(a). For the images in Fig. 3(b), the RPs trained on G_2 have an affinity for detecting the large cysts that appear in G_1. Hence, in all these examples, G_1 is the preferred TL in spite of the FSL models being trained on G_2. From the examples in Fig. 3, we are able to qualitatively assess the importance of the TLSA for standardizing cyst segmentation to overcome inter-observer variability.

Refer to caption
(a) Left: TLs. Right: RPs from the ParESN model. RPs are trained on G_1, but agree more with G_2 (in blue).
Refer to caption
(b) Left: TLs. Right: RPs from the ParESN model. RPs are trained on G_2, but agree more with G_1 (in red).
Figure 3: Assessment of best TL selection.