Email: {ntkhoa,jnggh,wkjeong}@korea.ac.kr, [email protected]
RLCorrector: Reinforced Proofreading for Cell-level Microscopy Image Segmentation
Abstract
Segmentation of nanoscale electron microscopy (EM) images is crucial but still challenging in connectomics research. One reason for this is that none of the existing segmentation methods are error-free, so they require proofreading, which is typically implemented as an interactive, semi-automatic process via manual intervention. Herein, we propose a fully automatic proofreading method based on reinforcement learning that mimics the human decision process of detection, classification, and correction of segmentation errors. We systematically design the proposed system by combining multiple reinforcement learning agents in a hierarchical manner, where each agent focuses only on a specific task while preserving dependency between agents. Furthermore, we demonstrate that the episodic task setting of reinforcement learning can efficiently manage a combination of merge and split errors concurrently presented in the input. We demonstrate the efficacy of the proposed system by comparing it with conventional proofreading methods over various testing cases.
Keywords:
Cell Segmentation · Proofreading · Reinforcement Learning
1 Introduction
Connectomics is a research field investigating cellular-level neural connections in the brain [4]. Nanoscale electron microscopy (EM) images are typically used to resolve cell-level neuronal structures (e.g., dendritic spine necks and synapses) of only tens of nanometers in size [11]. Because the raw data size of EM serial sections of a small tissue sample can easily reach hundreds of terabytes, the need to develop high-throughput, automatic image processing algorithms has been growing in the past decade.

Recent advances in deep learning have demonstrated significant potential for the high-throughput, automatic segmentation of connectome images. Many existing segmentation methods are based on pixel-level classification using convolutional neural networks (CNNs) and instance clustering [16, 2, 13]. However, such methods are not error-free, particularly when applied to real data, and therefore require manual proofreading by humans. Existing proofreading methods are primarily based on the interactive manual correction of either merge or split errors (see Figure 1) using an intuitive user interface and visualization [9, 12, 6]. Even with the support of such interactive tools, manual proofreading is a time-consuming and labor-intensive task, resulting in a bottleneck in the connectome analysis workflow.
To address the issues outlined above, fully automated proofreading approaches have been developed. Zung et al. [19] designed an automatic proofreading algorithm at the level of the neuron reconstruction. The authors assumed that split errors are rare in the initial segmentation result, so they formulated the algorithm to prune wrongly merged super-voxels iteratively. Haehn et al. [8] designed a ranking system from a CNN-based error detector for cell-level segmentation. The above methods consist of multiple sub-tasks, such as detecting erroneous locations, classifying error types, and correcting errors, applied in a prioritized brute-force manner. Moreover, an image typically contains many erroneous regions, each of which requires a different combination of the above sub-tasks, making the proofreading process inefficient.
To address these issues, we propose a novel automatic proofreading method based on reinforcement learning (RL), RLCorrector, for cell-level microscopy image segmentation. The main motivation of this work stems from the following observation: Unlike general pixel-level image editing, our proofreading task can be regarded as an iterative decision-making process that consists of error location and error corrector selections. By designing the discrete action space and environment to model this decision process, the human proofreading process can be successfully mimicked by RL agents. To the best of our knowledge, this is the first RL-based proofreading system that operates fully automatically without human intervention, which can be a novel addition to the recent effort of using RL in various image processing problems [5, 18, 1, 17]. We show that our method outperforms conventional CNN-based methods in both error correction performance and execution time.
2 Method
2.1 System Overview

The proofreading process of RLCorrector consists of identifying a patch containing errors (the locator agent), selecting an error correction method (the selector agent), and correcting errors (the merger and splitter agents), as depicted in Figure 2. The locator's task is to identify an error location (i.e., a patch) on a coarse grid, and this identification is repeated until no more erroneous regions are left. On the erroneous patch selected by the locator, the selector repeatedly chooses either the splitter or the merger to fix the error until no further error is found. As the reward metric, we employ the circuit reconstruction from EM images (CREMI) score [3] from the MICCAI 2016 challenge. While evaluating performance on our test set, the environment can also terminate an episode when there is no notable change in the label map over time.
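The hierarchy described above can be sketched as two nested episode loops. This is a minimal illustration, not the authors' implementation: the agents are placeholder callables, and the function and argument names are our own.

```python
def proofread(image, labels, locator, selector, correctors, max_steps=50):
    """Hierarchical proofreading loop: the locator picks an erroneous
    patch; on that patch the selector repeatedly dispatches a corrector
    (merger or splitter) until it emits 'stop'; control then returns
    to the locator until it finds no more erroneous patches."""
    for _ in range(max_steps):
        patch = locator(image, labels)        # a grid location, or None for stop
        if patch is None:                     # locator's stop action
            break
        while True:
            choice = selector(image, labels, patch)  # 'merger' | 'splitter' | 'stop'
            if choice == "stop":
                break
            labels = correctors[choice](image, labels, patch)
    return labels
```

With stub agents, the loop runs one locator step and one corrector call before both agents stop.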
2.2 Details of RL Agents
The architecture of the RL agents is based on the asynchronous advantage actor–critic method [14], shown in Figure 3. The input consists of three channels: an EM image, a point map, and a label map. The point map keeps track of previous action points, and the label map shows the current segmentation result after applying the error-correcting actions. The neural network for each agent consists of one convolutional layer (orange arrow) followed by five residual units (green arrows) with full pre-activation [10], and a fully connected layer is added after flattening (violet arrow). The large blue box represents an output feature vector from a layer or residual unit, and the small blue box indicates a logit. The actor outputs multiple logits, one per action, so its output vector size depends on each agent's action space, whereas the critic outputs a single logit, as shown in Figure 3.
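The three-channel input can be assembled as below. The Gaussian point map follows the description given for the locator; the kernel width `sigma` and the clipping are our illustrative assumptions.

```python
import numpy as np

def gaussian_point_map(shape, points, sigma=8.0):
    """Render previous action points as Gaussian blobs so the network
    receives a soft field-of-view around each visited location."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    pmap = np.zeros(shape, dtype=np.float32)
    for (r, c) in points:
        pmap += np.exp(-((yy - r) ** 2 + (xx - c) ** 2) / (2 * sigma ** 2))
    return np.clip(pmap, 0.0, 1.0)

def agent_input(em_image, point_map, label_map):
    """Stack the three channels (EM image, point map, label map)
    into a single channels-first input tensor."""
    return np.stack([em_image, point_map, label_map], axis=0)
```

For agents without a point map (e.g., the selector), the stack would simply omit that channel, giving the 128×128×2 input mentioned below.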
2.2.1 Locator.
We used a 7×7 two-dimensional (2D) grid to define a finite action space with 49 locations, where the selected point collects the four adjacent squares shown with dashed lines (see Figure 2). We generated the point map by applying a Gaussian kernel to each action point to provide a decent field-of-view to the neural network. As for the reward, we sought to evaluate the quality of merge and split error correction, measured by the CREMI score, so as to encourage the selection of patches with errors. When an action is performed at time step $t$, the selected patch $p_t$ and its label map are forwarded to the selector. When $p_t \in P_e$ (the set of erroneous patches) and the CREMI score of the resulting label map returned from the selector is reduced, a positive reward of 1 is received, which promotes the locator to select patches with errors. Training the stop signal is important because avoiding wrong corrections on patches with no errors is critical to the overall performance. Thus, we give a high reward of 2 when the locator properly stops and a penalty of −2 when wrong corrections are made on a non-error area. There are two proper termination conditions during training: one is when there is no further CREMI score improvement, and the other is when the CREMI score is already very low because the selected patch lacks errors. The entire reward function for the locator is described in Equation 1:
\[
r^{\text{loc}}_t =
\begin{cases}
+1 & \text{if } p_t \in P_e \text{ and the CREMI score is reduced} \\
+2 & \text{if the stop action is taken and a condition in (2) holds} \\
-2 & \text{otherwise (a correction is applied to an error-free patch)}
\end{cases}
\tag{1}
\]

\[
\text{stop at } t \text{ is proper} \iff \Delta\text{CREMI}_t = 0 \;\;\text{or}\;\; \text{CREMI}_t < \varepsilon,
\tag{2}
\]

where $\Delta\text{CREMI}_t$ is the change in the CREMI score at time step $t$ and $\varepsilon$ is a small threshold.
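The locator reward described above can be sketched as a small function. This is a minimal illustration following the text; the argument names and the membership test for erroneous patches are our own.

```python
def locator_reward(action, erroneous_patches, cremi_before, cremi_after,
                   stop_is_proper):
    """Reward for one locator step: +1 for picking an erroneous patch
    whose CREMI score the downstream correctors reduce, +2 for a proper
    stop, and -2 otherwise (e.g., correcting an error-free patch)."""
    if action == "stop":
        return 2 if stop_is_proper else -2
    if action in erroneous_patches and cremi_after < cremi_before:
        return 1
    return -2
```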

2.2.2 Selector.
The selector performs a series of selections among the merger, the splitter, and a stop action. The selected corrector launches a new episode to fix the corresponding errors. The stop action ends the current selector episode, and execution returns to the locator. The input size for this agent is 128×128×2, as there is no point map. We scaled the stop rewards down to 1 and −1 so that the selector focuses more on choosing a proper error corrector; a positive reward of 1 is likewise given when the selected corrector reduces the CREMI score, as no grid actions are involved. The whole reward function is shown in Equation 3. We set the maximum length of a selector episode to six to encourage exploration during training (four for inference). See the supplementary material for an example of a selector episode (Supp-Fig. 1). Each label map under a selector time step is the result of a merger or splitter episode: in the example, a wrong pair of segments is first merged, but by applying another merger episode and a splitter episode, the errors are completely fixed. This demonstrates how the selector explores the action space and solves the problem.
\[
r^{\text{sel}}_t =
\begin{cases}
+1 & \text{if the selected corrector reduces the CREMI score} \\
+1 & \text{if the stop action is taken properly} \\
-1 & \text{otherwise}
\end{cases}
\tag{3}
\]
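As with the locator, the selector reward can be sketched in a few lines; this is an illustrative reading of the scaled-down reward described above, with names of our own choosing.

```python
def selector_reward(action, cremi_before, cremi_after, stop_is_proper):
    """Selector reward, scaled down from the locator's: +1/-1 for the
    stop decision, and +1 when the chosen corrector reduced the CREMI
    score of the patch (−1 when it did not)."""
    if action == "stop":
        return 1 if stop_is_proper else -1
    return 1 if cremi_after < cremi_before else -1
```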
2.2.3 Splitter.
The splitter agent is similar to the locator except that a finer grid space is employed (i.e., a 15×15 grid of actions for the grid selection). For a selected grid point, the watershed algorithm is applied to split the given segment into two. The altitude map for the watershed is generated by Gaussian smoothing of the EM image (note that the cell membrane is darker than other regions, which makes a natural boundary for the watershed). We set the maximum episode length to six (see Supp-Fig. 2 for an example of a splitter episode). Note that the label map is updated with the new action at the end of each time step. The reward function of the splitter is similar to that of the locator except for the stop signal:
\[
r^{\text{spl}}_t =
\begin{cases}
+1 & \text{if the split reduces the CREMI score} \\
-1 & \text{otherwise}
\end{cases}
\tag{4}
\]
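The watershed step used by the splitter can be illustrated with a minimal priority-flood watershed. This is a pure-Python stand-in, not the authors' implementation: in the paper the altitude map is the Gaussian-smoothed EM image and the seeds come from grid actions; here they are passed in directly.

```python
import heapq
import numpy as np

def watershed_split(altitude, seed_a, seed_b):
    """Split a region into two labels by flooding from two seed points
    in order of increasing altitude (4-connected). High-altitude ridges
    (e.g., dark cell membranes) are reached last and form the boundary."""
    h, w = altitude.shape
    labels = np.zeros((h, w), dtype=int)
    heap = []
    for label, (r, c) in ((1, seed_a), (2, seed_b)):
        labels[r, c] = label
        heapq.heappush(heap, (altitude[r, c], r, c))
    while heap:
        _, r, c = heapq.heappop(heap)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and labels[nr, nc] == 0:
                labels[nr, nc] = labels[r, c]   # claimed by first flood to arrive
                heapq.heappush(heap, (altitude[nr, nc], nr, nc))
    return labels
```

On a toy altitude map with a high ridge down the middle, the two floods meet at the ridge and the segment splits cleanly into two.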
2.2.4 Merger.
The merger's reward function is almost identical to that of the splitter, but the episode formulation of the merger is different because the merging operation requires at least one pair of neighboring segments. Each action selects a grid point, and the segment labeled by the selected point becomes part of the set of segments to be merged. We apply the merging operation once every two time steps (i.e., after selecting two grid points). An example of a single merge operation after two time steps of a merger episode is shown in the supplementary data (Supp-Fig. 3). A negative reward is given to prevent incorrect merges that increase the CREMI score.
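The merge operation applied after each pair of grid selections can be sketched as a simple relabeling; the function name and the convention of keeping the first segment's id are our illustrative assumptions.

```python
import numpy as np

def merge_segments(label_map, point_a, point_b):
    """Merge the two segments under the two selected grid points by
    relabeling the second segment with the first segment's id.
    In the merger episode this runs once every two time steps."""
    id_a = label_map[point_a]
    id_b = label_map[point_b]
    merged = label_map.copy()
    if id_a != id_b:
        merged[merged == id_b] = id_a
    return merged
```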
3 Results
3.1 Experiment setup
We compared our scheme with the method of Haehn et al. [8] on the CREMI dataset [3], which consists of three sets (A, B, and C), each comprising 125 slices of size 1250×1250. The first 92 slices were used for training, 23 for validation, and 10 for testing. Training was performed in a bottom-up manner: the merger and splitter were trained first, followed by the selector, and finally the locator. For training the agents, we used the Adam optimizer. We used the asynchronous advantage actor–critic method [14] based on the source code from [7].
For a fair comparison with Haehn et al. [8], we used our agents' backbone network for Haehn's proofreading classifier to give a similar model capacity. In addition, we set the input image size to be the same as that of our splitter and selector. For fully automated proofreading of a single image, we applied a sliding-window method to Haehn's proofreading algorithm.
3.2 Performance comparisons
3.2.1 Patch-level Performance.
The first experiment evaluated the patch-level performance of the agents. Each 128×128 patch was cropped centered at a randomly chosen grid-action point on the image-level 7×7 grid. Table 1 shows the per-patch error correction performance measured on 1000 patches. We used three synthetically generated test sets with different error types: merge errors only, split errors only, and both. In our method, "Static" is the version without the selector, in which the merger and splitter are applied once in a pre-defined order, as in Haehn's; "Selector" is the full version of RLCorrector with dynamic selection of error correctors by the selector agent. As shown in Table 1, our method significantly improves error correction performance even without the selector (although Haehn's merger has an edge over ours by a tiny margin in the "split error only" case). It should also be noted that our splitter (i.e., merge error corrector) is much stronger than Haehn's. Performance improves further when the selector kicks in, which makes our method outperform Haehn's in all cases.
Test set | Initial | Haehn's [8] | Ours (Static) | Ours (Selector)
---|---|---|---|---
Merge error only | 0.173 | 0.081 | 0.032 | 0.028
Split error only | 0.152 | 0.019 | 0.020 | 0.014
Combined | 0.272 | 0.111 | 0.045 | 0.040
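The patch extraction used in this experiment can be sketched as follows. Mapping a grid action to a top-left corner with a stride of 64 is our simplifying assumption (the paper crops patches centered at grid points); the stride and patch size match the image-level setup described in the next section.

```python
import numpy as np

def grid_point_to_patch(image, action, patch=128, stride=64):
    """Map a locator grid action (row, col) on the 7x7 grid to a
    128x128 crop; the stride of 64 makes neighboring patches overlap,
    so a 7x7 grid exactly covers a 512x512 sub-image."""
    r, c = action
    top, left = r * stride, c * stride
    return image[top:top + patch, left:left + patch]
```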
3.2.2 Image-level Performance.
In this experiment, we assessed performance at the image level with real segmentation errors. For this, we used the attention U-Net [15] to generate the initial segmentation result on the CREMI dataset. Since the locator action space is 7×7 for a patch size (i.e., the input size of the error correctors) of 128×128 with a stride of 64, the input image is split into sub-images of size 512×512 with a stride of 256. In each sub-image, we applied a sliding-window scheme to fix errors, except for our Locator-Selector variant, which can directly select the location of an erroneous patch. As shown in Table 2, all three variations of our method outperformed Haehn's method in both error correction performance and speed. On average, our method achieved a CREMI score lower than Haehn's by 26.5% in set A, 54.6% in set B, and 50.7% in set C. Execution times were reduced by 95.7%, 94.6%, and 94.7% for our Locator-Selector, Sliding-Static, and Sliding-Selector methods, respectively. It should be noted that the selector contributes to improving error correction performance (a lower CREMI score), whereas the locator contributes to reducing the execution time over the static method, as we expected.
Test set | Metric | Initial | Haehn's [8] Sliding | Ours Sliding (Static) | Ours Sliding (Selector) | Ours Locator (Selector)
---|---|---|---|---|---|---
CREMI A | Ave. score | 0.481 | 0.312 | 0.228 | 0.219 | 0.241
CREMI A | Ex. time (min.) | N/A | 324.5 | 19.3 | 18.6 | 14.9
CREMI B | Ave. score | 1.429 | 0.535 | 0.270 | 0.250 | 0.208
CREMI B | Ex. time (min.) | N/A | 358.3 | 18.8 | 18.4 | 14.7
CREMI C | Ave. score | 0.967 | 0.706 | 0.358 | 0.337 | 0.349
CREMI C | Ex. time (min.) | N/A | 388.9 | 19.7 | 19.7 | 16.7
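The sliding-window scheme applied to the baseline and to our Sliding variants can be sketched as a simple generator; the sizes are taken from the setup described above.

```python
def sliding_windows(size=512, patch=128, stride=64):
    """Yield top-left corners of every patch position visited by the
    sliding-window scheme over one 512x512 sub-image. With these sizes
    the windows coincide with the locator's 7x7 action grid, which is
    why the locator can skip error-free positions that the sliding
    window must still visit."""
    for top in range(0, size - patch + 1, stride):
        for left in range(0, size - patch + 1, stride):
            yield top, left
```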

3.2.3 Visual Assessment.
To better understand our method, we visually inspected how the selector and locator contribute to performance improvement. Figure 4 compares several example cases in which the error correctors behaved differently; these cases are described as follows:
- Case 1: The locator spotted the erroneous areas correctly. In these areas, the selector could choose agents adequately to fix the errors, whereas the static methods (Haehn's and our static method) failed. A more detailed step-by-step description can be found in the supplementary material (see Supp-Fig. 4).
- Case 2: The locator missed the error location, so the error was not fixed; the selector, however, was able to fix the error.
- Case 3: Some errors were newly created during the process (because of false error corrections), and the sliding-window methods could not fix them. However, the locator's selective visiting of erroneous areas in a proper order avoids this issue.
- Case 4: Another case, in the CREMI C set, where errors could not be fixed without help from the locator.
4 Conclusion and Future Work
Inspired by the human decision-making process, we introduced a novel, fully automatic proofreading method based on reinforcement learning for cell-level microscopy image segmentation. In this work, we modeled each task in the proofreading process using a reinforcement learning agent and hierarchically combined them to design a multi-agent system. We demonstrated that the dynamic nature of our system significantly improved segmentation performance while reducing execution time compared with conventional proofreading methods.
Despite the performance benefit of the proposed method, there is still room for improvement. Because of the coarse and discrete nature of the action space, handling small fragments is difficult. We plan to address this problem by employing a continuous action space. Furthermore, extension to three dimensions is another interesting future research direction.
4.0.1 Acknowledgements.
TBD
References
- [1] Araslanov, N., Rothkopf, C.A., Roth, S.: Actor-critic instance segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 8237–8246 (2019)
- [2] Beier, T., Pape, C., Rahaman, N., Prange, T., Berg, S., Bock, D., Cardona, A., Knott, G.W., Plaza, S.M., Scheffer, L.K., Köthe, U., Kreshuk, A., Hamprecht, F.A.: Multicut brings automated neurite segmentation closer to human performance. Nature Methods 14, 101–102 (2017). https://doi.org/10.1038/nmeth.4151, http://rdcu.be/oVDQ
- [3] CREMI: MICCAI Challenge on Circuit Reconstruction from Electron Microscopy Images (2016), https://cremi.org
- [4] DeWeerdt, S.: How to map the brain. Nature 571(7766), S6–S6 (2019)
- [5] Furuta, R., Inoue, N., Yamasaki, T.: PixelRL: Fully Convolutional Network With Reinforcement Learning for Image Processing. IEEE Transactions on Multimedia 22(7), 1704–1719 (Jul 2020). https://doi.org/10.1109/TMM.2019.2960636
- [6] Gonda, F., Wang, X., Beyer, J., Hadwiger, M., Lichtman, J.W., Pfister, H.: VICE: Visual Identification and Correction of Neural Circuit Errors. arXiv:2105.06861 [cs] (May 2021), http://arxiv.org/abs/2105.06861
- [7] Griffis, D.: GPU/CPU Architecture of A3C (2019), https://github.com/dgriff777/rl_a3c_pytorch
- [8] Haehn, D., Kaynig, V., Tompkin, J., Lichtman, J.W., Pfister, H.: Guided proofreading of automatic segmentations for connectomics. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 9319–9328 (2018)
- [9] Haehn, D., Knowles-Barley, S., Roberts, M., Beyer, J., Kasthuri, N., Lichtman, J.W., Pfister, H.: Design and evaluation of interactive proofreading tools for connectomics. IEEE transactions on visualization and computer graphics 20(12), 2466–2475 (2014)
- [10] He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: European conference on computer vision. pp. 630–645. Springer (2016)
- [11] Helmstaedter, M.: Cellular-resolution connectomics: challenges of dense neural circuit reconstruction. Nature methods 10(6), 501–7 (2013)
- [12] Knowles-Barley, S., Roberts, M., Kasthuri, N., Lee, D., Pfister, H., Lichtman, J.W.: Mojo 2.0: Connectome annotation tool. Frontiers in Neuroinformatics 60, 1 (2013)
- [13] Meirovitch, Y., Mi, L., Saribekyan, H., Matveev, A., Rolnick, D., Shavit, N.: Cross-classification clustering: An efficient multi-object tracking technique for 3-d instance segmentation in connectomics. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (June 2019)
- [14] Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., Kavukcuoglu, K.: Asynchronous methods for deep reinforcement learning. In: International conference on machine learning. pp. 1928–1937 (2016)
- [15] Oktay, O., Schlemper, J., Folgoc, L.L., Lee, M., Heinrich, M., Misawa, K., Mori, K., McDonagh, S., Hammerla, N.Y., Kainz, B., et al.: Attention U-Net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999 (2018)
- [16] Quan, T.M., Hildebrand, D.G.C., Jeong, W.K.: FusionNet: A Deep Fully Residual Convolutional Neural Network for Image Segmentation in Connectomics. Frontiers in Computer Science 3 (2021). https://doi.org/10.3389/fcomp.2021.613981, https://www.frontiersin.org/article/10.3389/fcomp.2021.613981
- [17] Tuan, T.A., Khoa, N.T., Quan, T.M., Jeong, W.K.: ColorRL: Reinforced Coloring for End-to-End Instance Segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 16727–16736 (June 2021)
- [18] Uzkent, B., Ermon, S.: Learning when and where to zoom with deep reinforcement learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (June 2020)
- [19] Zung, J., Tartavull, I., Lee, K., Seung, H.S.: An error detection and correction framework for connectomics. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. pp. 6821–6832. NIPS’17, Curran Associates Inc., Red Hook, NY, USA (Dec 2017)