Autonomous Robotic Drilling System for Mice Cranial Window Creation: An Evaluation with an Egg Model
Abstract
Robotic assistance for experimental manipulation in the life sciences is expected to enable the precise manipulation of valuable samples, regardless of the skill of the scientist. Experimental specimens in the life sciences are subject to individual variability and deformation, and therefore require autonomous robotic control. As an example, we are studying the creation of a cranial window in mice. This operation requires cutting and removing a circular patch of skull approximately 8 mm in diameter from bone that is only about 300 µm thick, but the shape of the mouse skull varies with the strain, sex, and age of the mouse. The thickness of the skull is not uniform, with some areas thin and others thicker. It is also difficult to ensure that the skulls of the mice are kept in the same position for each operation. It is not realistically possible to measure all these features and pre-program a robotic trajectory for each individual mouse. This paper therefore proposes an autonomous robotic drilling method. The proposed method consists of drilling trajectory planning and image-based task completion level recognition. The trajectory planning adjusts the z-position of the drill according to the task completion level at each discrete point and forms the 3D drilling path via constrained cubic spline interpolation while avoiding overshoot. The task completion level recognition uses a DSSD-inspired deep learning model to estimate the task completion level of each discrete point. Since an egg has characteristics similar to those of a mouse skull in terms of shape, thickness, and mechanical properties, removing the eggshell without damaging the membrane underneath was chosen as the simulation task. The proposed method was evaluated using a 6-DOF robotic arm holding a drill and achieved a success rate of 80% over 20 trials.
I Introduction
A cranial window is a transparent observation window created in the skull of a mouse through which scientists can observe the mouse brain. For example, human organoids implanted into the mouse brain can be observed in this way [1]. In this procedure, a microdrill is used to remove an 8 mm circular patch of the mouse skull under a microscope, which is then replaced with a cover glass. Any damage to the underlying brain results in a failed procedure.
Many devices and robotic systems have been developed to assist in cranial window creation. Phuong et al. [2] developed a robotic stereotaxic platform for small rodents by combining a skull profiler sub-system and a six-degrees-of-freedom robotic platform. Ghanbari et al. [3] developed a cranial microsurgery platform (Craniobot), based on a modified desktop computer numerical controlled mill, which was used for precise microsurgical procedures. Pak et al. [4] implemented a cranial drilling robot that judges whether the mouse cranium has been penetrated using conductance measurements to perform automated craniotomies. Our group is exploring the concept of autonomously creating cranial windows using a robotic manipulator retrofitted with a micro-drill [5], as shown in Fig. 1, and cranial window installation is one of the target tasks to be demonstrated.
Obviously, it is not ethically acceptable to use mice to study autonomous robot control at this stage of the research, so we have chosen to simulate the removal of the mouse skull without damaging the underlying dura by removing a chicken eggshell without damaging the underlying membrane. The characteristics of the eggshell are similar to those of the mouse skull, and eggshell drilling is in fact used for training in mouse surgery [6].
I-A Related works
In order to develop an autonomous robotic drilling system, perception is an important topic. There have been many works on the estimation of drilling penetration and drilling status from different signals during drilling procedures on human bones. Hu et al. [7], by analyzing the force signal during the screw-path-drilling process, and Dai et al. [8], by analyzing the electrical impedance signal in the bone-drilling process, succeeded in estimating the relative position between the screw or drill and the bone being drilled, and the drilling penetration, respectively. Ying et al. [9] found that it is more effective to measure both force and sound signals during the drilling process. With the help of a neural network, they achieved a high-precision estimate of the progress of the drilling procedure.
Considering the similarities and differences between drilling human bones and mouse craniums, several works have aimed to enhance robot perception in drilling surgery on the mouse cranium. Pohl et al. [10] designed a module for assisting craniotomy in mice by measuring force and sound signals; the drilling penetration was estimated and the drill feed speed was controlled accordingly. Pak et al. [4] showed a penetration detection strategy using a measurement circuit that detects the electrical conductance between the drill and the mouse. They reported that a sudden increase in the electrical conductance could indicate when the skull was penetrated.
The aforementioned works can only estimate the drilling completion progress at the point of contact between the drill and the skull, because the signals of force, sound, and vibration are generated only during contact. In this work, we aim to perceive the drilling completion progress of the entire drilled region at once. To do so, we rely on image processing using a neural network. Such a network needs to serve two functions. First, the drilled area must be detected. Second, the pixel-wise completion level must be given. It is shown in [11] that the two tasks benefit from each other in terms of accuracy during training.
Related to those goals, Fast R-CNN and Faster R-CNN [12, 13] can be used to obtain the target’s bounding box using a CNN, which is then filtered using non-maximum suppression. Building on this, Mask R-CNN [14] extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding-box recognition. It offers high accuracy but has a comparatively slow inference speed. Moreover, the popular object detector Single Shot Detector (SSD) [15] classifies all boxes directly using a sliding-window technique. To find items of varying sizes in a single forward pass, SSD builds a scale pyramid. The Deconvolutional Single Shot Detector (DSSD) [16] is a network created on the foundation of SSD that combines a state-of-the-art classifier (ResNet-101) with the fast SSD detection framework, and then adds deconvolution layers to SSD+ResNet-101 to introduce additional large-scale context for object detection and recognition.
Considering that the DSSD architecture consists of deconvolutional layers, it is natural to try to do the semantic segmentation in the same network because the purpose of a sequence of deconvolutional layers is to increase the resolution of the output feature maps. Based on this characteristic, other network architectures [17, 18, 19, 11] also aim to simultaneously perform semantic segmentation and object detection. Our work is inspired by these previous attempts, where the task of semantic segmentation is replaced by the completion level map prediction.
I-B Statement of contributions
In this work, we develop an autonomous robotic drilling system with only image feedback. For this, we (1) propose a trajectory planning algorithm based on constrained splines that is adjusted in real time by image feedback; (2) propose a neural network to detect the drilling area and obtain the pixel-wise completion level from the microscopic image to update the planned trajectory; and (3) evaluate our method in robotic drilling experiments using eggshells.
II Problem Statement
Consider the setup shown in Fig. 1, using one of the robotic arms of our robot platform for scientific exploration [5]. Let the 6-DoF robotic arm (CVR038, Densowave, Japan), with joint values $\boldsymbol{q} \in \mathbb{R}^{6}$, hold the micro drill (MD1200, Braintree Scientific, USA). The micro drill is used to drill the egg, which is fixed by a clamping device. Images are obtained from above through a 4K microscope system (STC-HD853HDMI, Omron-Sentech, Japan).
II-A Goal
Our goal is to autonomously drill the eggshell along a circular path with the robot system, using only images as feedback.
II-B Premises
Our robot controller was designed under the following premises:
• The joint velocities of the robot, $\dot{\boldsymbol{q}}$, are defined by the output of the controller (Premise I).
• The solution must always be feasible. That is, at worst the robot can stop moving (Premise II).
• The robot has a kinematic model that is precise enough for the task (Premise III).
III Proposed image-based autonomous drilling method
The overview of the proposed method is shown in Fig. 2. In our strategy, the drill is autonomously controlled along the planned trajectory (introduced in Section III-A) in the xy-plane, while the z coordinate of each point is updated by the proposed image-based completion level recognition system (introduced in Section III-B).
III-A Trajectory Planning
Without loss of generality, let the goal be to plan a circular trajectory in the xy-plane with the depth (z coordinate) modulated by the image-based estimation algorithm.
We discretize the circular path into $n$ equal intervals, resulting in the trajectory points $\boldsymbol{p}_i = (x_i, y_i, z_i)$, $i \in \{0, \dots, n-1\}$. In addition, let $x_i$ and $y_i$ be constant while only $z_i$ varies in time. At $t = 0$, all the points have the same coordinate value $z_i(0) = z_{\mathrm{init}}$. We modulate the velocity at which each trajectory point goes down, $v_{z,i}$, using the completion level of each point estimated from the image.
With this, we expect to autonomously drill a circular patch without making further assumptions on the drilling surface. Then, we interpolate between the trajectory points to obtain a smooth trajectory using a constrained spline interpolation algorithm.
III-A1 Z-coordinate Value Calculation
Our proposal is to modulate the lowering velocity of each trajectory point as
$v_{z,i}(t) = v_0\left(1 - \lambda_i(t)\right)$   (1)
where $v_0$ is the initial downward velocity and $\lambda_i(t) \in [0, 1]$ is the drilling completion level at point $\boldsymbol{p}_i$.
The position $z_i$ is then calculated through simple integration with a sampling time $T$ as
$z_i(t + T) = z_i(t) - v_{z,i}(t)\,T.$   (2)
The completion level map gives us enough information to estimate the completion level of the $n$ trajectory points. We interpolate between them with a constrained cubic spline to obtain a smooth trajectory.
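As a rough illustration of (1) and (2), the following sketch updates the z coordinates of the discretized trajectory points from the estimated completion levels. The function name `update_z` and the array-based interface are assumptions made for illustration only, not the implementation used in this work.

```python
import numpy as np

def update_z(z, completion, v0, T):
    """Lower each trajectory point according to its completion level.

    z          : (n,) current z coordinates of the trajectory points
    completion : (n,) completion levels in [0, 1] estimated from the image
    v0         : initial downward speed, as in (1)
    T          : sampling time in seconds, as in (2)
    """
    v = v0 * (1.0 - np.clip(completion, 0.0, 1.0))  # (1): speed goes to zero at 100% completion
    return z - v * T                                 # (2): integrate downwards over one step
```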
III-A2 Constrained Cubic Spline Interpolation
Cubic spline interpolation has been widely applied to path planning, but it is, by itself, unsuitable for our application, as it might overshoot between two trajectory points and rupture the membrane (see Fig. 3). Instead, we propose the use of constrained cubic spline interpolation, which eliminates the requirement for equal second-order derivatives at every point and replaces it with specified first-order derivatives [20].
Consider any two contiguous trajectory points $\boldsymbol{p}_i$ and $\boldsymbol{p}_{i+1}$, $i \in \{0, \dots, n-1\}$. Let $s \in [0, 1]$ be a free parameter used to generate a third-degree polynomial curve $\boldsymbol{f}_i(s)$ between these two points. Let $\boldsymbol{p}_i$ be the beginning of the curve, $\boldsymbol{f}_i(0) = \boldsymbol{p}_i$, and $\boldsymbol{p}_{i+1}$ be the end of the curve, $\boldsymbol{f}_i(1) = \boldsymbol{p}_{i+1}$. Notice that when $i = n-1$, $\boldsymbol{p}_i$ is the last point on the trajectory, so $\boldsymbol{f}_{n-1}(s)$ will be the curve between the last point $\boldsymbol{p}_{n-1}$ and the first point $\boldsymbol{p}_0$. The curves are then defined by
$\boldsymbol{f}_i(s) = \boldsymbol{a}_i + \boldsymbol{b}_i s + \boldsymbol{c}_i s^2 + \boldsymbol{d}_i s^3$   (3)
where $\boldsymbol{a}_i$, $\boldsymbol{b}_i$, $\boldsymbol{c}_i$, and $\boldsymbol{d}_i$ are coefficient vectors to be determined.
The first-order and second-order derivatives of the curve are
$\boldsymbol{f}_i'(s) = \boldsymbol{b}_i + 2\boldsymbol{c}_i s + 3\boldsymbol{d}_i s^2$   (4)
and
$\boldsymbol{f}_i''(s) = 2\boldsymbol{c}_i + 6\boldsymbol{d}_i s.$   (5)
In order to calculate the parameters of the curve, we have three conditions. First, the curve must pass through points $\boldsymbol{p}_i$ and $\boldsymbol{p}_{i+1}$, that is,
$\boldsymbol{f}_i(0) = \boldsymbol{p}_i, \quad \boldsymbol{f}_i(1) = \boldsymbol{p}_{i+1}.$   (6)
Second, the first-order derivative of the whole trajectory must be continuous, which means the first-order derivatives must be the same at each interval end, that is,
$\boldsymbol{f}_i'(1) = \boldsymbol{f}_{i+1}'(0).$   (7)
The third condition differs between traditional cubic spline interpolation and constrained cubic spline interpolation.
For traditional cubic spline interpolation, the second-order derivative of the whole trajectory must be continuous, which means the second-order derivative on the left side of a point, $\boldsymbol{f}_i''(1)$, must be the same as that on its right side, $\boldsymbol{f}_{i+1}''(0)$. As a result, the generated trajectory is smooth, but overshoot might occur.
For constrained cubic spline interpolation, we aim to solve the overshoot problem by sacrificing this smoothness, so that the second-order derivative does not need to be continuous. Instead, the value of the first-order derivative at each point, $\boldsymbol{t}_i$, is specified, that is,
$\boldsymbol{f}_i'(0) = \boldsymbol{t}_i, \quad \boldsymbol{f}_i'(1) = \boldsymbol{t}_{i+1}.$   (8)
By combining (3), (4), (6), (7), and (8), we obtain the coefficients $\boldsymbol{a}_i = \boldsymbol{p}_i$, $\boldsymbol{b}_i = \boldsymbol{t}_i$, $\boldsymbol{c}_i = 3(\boldsymbol{p}_{i+1} - \boldsymbol{p}_i) - 2\boldsymbol{t}_i - \boldsymbol{t}_{i+1}$, and $\boldsymbol{d}_i = 2(\boldsymbol{p}_i - \boldsymbol{p}_{i+1}) + \boldsymbol{t}_i + \boldsymbol{t}_{i+1}$.
With the constrained cubic spline interpolation described above, we generate the curve depicted in green in Fig. 3 and compare it with a traditional cubic spline interpolation in red. We can see in Fig. 3 that the green curve is enveloped by the tangent planes while the red curve is not, hence solving the problem of overshoot.
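A minimal sketch of the constrained spline is given below, assuming a closed path with unit parameter spacing per segment and the harmonic-mean tangent rule of [20] applied per coordinate; the function name, the tangent adaptation, and the sampling scheme are illustrative assumptions, not the exact implementation used in this work.

```python
import numpy as np

def closed_constrained_spline(points, samples_per_segment=20):
    """Interpolate a closed path through `points` (n x 3 array) with a
    constrained cubic spline: the first derivatives at the points are fixed
    so that no segment overshoots its endpoints.
    """
    p = np.asarray(points, dtype=float)
    n = len(p)
    d_prev = p - np.roll(p, 1, axis=0)       # p_i - p_{i-1} (wraps around, closed path)
    d_next = np.roll(p, -1, axis=0) - p      # p_{i+1} - p_i

    # Harmonic mean of adjacent differences; zero where they disagree in sign.
    # This choice of tangent is what suppresses overshoot between points.
    m = np.zeros_like(p)
    ok = d_prev * d_next > 0.0
    m[ok] = 2.0 / (1.0 / d_prev[ok] + 1.0 / d_next[ok])

    # Cubic Hermite basis over s in [0, 1), equivalent to (3) with the
    # coefficients obtained from (6) and (8).
    s = np.linspace(0.0, 1.0, samples_per_segment, endpoint=False)
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2

    curve = []
    for i in range(n):                        # segment from p_i to p_{i+1}
        j = (i + 1) % n
        seg = (np.outer(h00, p[i]) + np.outer(h10, m[i])
               + np.outer(h01, p[j]) + np.outer(h11, m[j]))
        curve.append(seg)
    return np.vstack(curve)
```

For instance, `closed_constrained_spline(trajectory_points)` returns a densely sampled closed curve that stays within the envelope of the tangents at the trajectory points.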
III-B Completion Level Recognition
In the completion level recognition system, we aim to detect the drilling area and predict the drilling completion level simultaneously using a single neural network. It is known that detection and semantic segmentation benefit from each other in training.
III-B1 Network architecture
The overview of the architecture is shown in Fig. 4. The structure of the network is inspired by DSSD [16], an object detection network with an encoder-decoder structure that provides a reasonable trade-off between accuracy and speed. One notable difference is that we crop and concatenate the activations of the downscale stream and the upscale stream together as a completion level prediction branch.
Backbone of the network
The input image is resized to a fixed resolution and processed with a convolutional neural network in order to obtain a map that carries high-level features. ResNet-50 is used as the feature extractor.
The ResNet-50 feature extractor is followed by a series of convolutional and pooling layers.
Bounding-box detection branch
For bounding-box detection, we apply a similar approach to DSSD which adds several residual structures before using multi-scale feature maps for classification and regression. A single conv-skip connection is applied to process the output of feature layers and obtain the bounding box of the drilling area.
Completion prediction branch
In our completion prediction branch, the desired output is a two-channel tensor whose spatial size matches that of the output picture. In order to use the feature information of each layer as much as possible, we upsample the outputs of conv11, conv12, conv13, conv14, and conv15 to the size of conv16, concatenate them together, and follow that with a convolutional layer to reduce the number of channels to 2. Finally, we crop the 2-channel output according to the relative position of the bounding box within the overall input image and resize it to a fixed resolution. One channel is the drill channel and the other is the completion level map, in which the grayscale value of each pixel indicates the completion percentage of that pixel.
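The following is a hedged PyTorch sketch of the completion prediction branch described above: multi-scale feature maps are upsampled to a common size, concatenated, and reduced to two channels. The class name, channel counts, and layer shapes are illustrative assumptions rather than the network's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompletionHead(nn.Module):
    """Upsample multi-scale feature maps to a common resolution, concatenate
    them, and reduce to two channels (drill mask + completion level map)."""
    def __init__(self, in_channels, out_size):
        super().__init__()
        self.out_size = out_size                      # e.g., the spatial size of conv16
        self.fuse = nn.Conv2d(sum(in_channels), 2, kernel_size=1)

    def forward(self, feature_maps):
        ups = [F.interpolate(f, size=self.out_size, mode="bilinear",
                             align_corners=False) for f in feature_maps]
        fused = torch.cat(ups, dim=1)                 # concatenate along channels
        return self.fuse(fused)                       # 2-channel output

# Example with dummy multi-scale features (shapes are assumptions):
feats = [torch.randn(1, c, s, s) for c, s in [(256, 8), (256, 16), (128, 32)]]
head = CompletionHead(in_channels=[256, 256, 128], out_size=(64, 64))
out = head(feats)   # -> tensor of shape (1, 2, 64, 64)
```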
III-B2 Training Dataset
Generally, ground truth images with full manual annotation are required for multi-task learning. In this work, we expect our network to output a bounding box of the drilling area and a completion level map. It would therefore be necessary to manually annotate, as ground truth, the coordinates of the four corners of the bounding box of the drilling area and, for the completion level prediction branch, a cropped completion level map within the drilling area.
To generate the completion level maps, one approach would be to manually annotate each pixel with a grayscale value ranging from 0 to 100. Considering the complexity of this approach, we propose a more tractable annotation strategy.
Given an image of eggshell drilling, we first define six classes for a semantic segmentation classifier: the drill, the undrilled background, and four increasing levels of penetration. The percentage of penetration assigned to each class is defined subjectively based on our experience in pilot studies. Second, at a higher level in the hierarchy, we separate the image into two confidence maps. The first is the confidence map for pixels of the drill, with 255 corresponding to full confidence, so that we can filter the drill out. The second corresponds to the completion level map, where 0 corresponds to undrilled pixels and 100 corresponds to completely drilled pixels. Lastly, taking advantage of the expected continuity in drilled regions, a Gaussian filter is applied to smooth the completion level map channel so that its grayscale values vary continuously from 0 to 100.
A 4-dimensional vector $(x_{\min}, y_{\min}, x_{\max}, y_{\max})$ is used as the ground truth of the bounding-box detection branch; it is generated from the completion level map channel by taking the minimum and maximum coordinates, along the width and height directions, of the pixels whose grayscale values are not 0. After that, we crop both the completion level map channel and the drill channel using the corner coordinates of the bounding box and resize them to a fixed resolution. Merging the two channels, we obtain the ground truth of the completion level prediction branch.
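A sketch of this ground-truth generation is shown below, assuming the completion level map stores values from 0 to 100 and that at least one drilled pixel exists; the helper name `make_ground_truth`, the output size, and the Gaussian sigma are illustrative choices, not the values used in this work.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def make_ground_truth(completion_map, drill_mask, out_size=64, sigma=3.0):
    """completion_map : (H, W) array, 0 (undrilled) .. 100 (fully drilled)
    drill_mask     : (H, W) array, 255 where the drill occludes the shell
    Returns the bounding box (x_min, y_min, x_max, y_max) and a 2-channel
    (out_size, out_size, 2) target for the completion prediction branch."""
    # Smooth the completion channel so that values vary continuously 0..100.
    smooth = gaussian_filter(completion_map.astype(float), sigma=sigma)

    # Bounding box from the non-zero pixels of the completion channel.
    ys, xs = np.nonzero(smooth)
    x_min, x_max = xs.min(), xs.max()
    y_min, y_max = ys.min(), ys.max()

    def crop_resize(channel):
        crop = channel[y_min:y_max + 1, x_min:x_max + 1]
        fy = out_size / crop.shape[0]
        fx = out_size / crop.shape[1]
        return zoom(crop, (fy, fx), order=1)   # bilinear resize of the crop

    target = np.stack([crop_resize(drill_mask.astype(float)),
                       crop_resize(smooth)], axis=-1)
    return (x_min, y_min, x_max, y_max), target
```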
At the training stage, we used common image augmentation methods: random flips, rotations, random crops, and random changes of brightness and contrast. We manually labeled 518 images, yielding 16,576 images after augmentation. In total, 13,260 images were used for training, 1,658 for validation, and 1,658 for testing.
III-B3 Network training
Loss function
The loss function of our training procedure contains three parts: the loss function of the bounding-box detection of the drilling area, $L_{\mathrm{det}}$; the loss function of the drill semantic segmentation, $L_{\mathrm{drill}}$; and the loss function of the completion level prediction, $L_{\mathrm{com}}$.
$L_{\mathrm{det}}$ is a weighted sum of the localization loss $L_{\mathrm{loc}}$ and the confidence loss $L_{\mathrm{conf}}$, given by
$L_{\mathrm{det}} = \frac{1}{N}\left(L_{\mathrm{conf}} + \alpha L_{\mathrm{loc}}\right)$   (9)
where $N$ is the number of matched default boxes. If the number of matched default boxes is 0, we set $N$ to 1.
$L_{\mathrm{drill}}$ is a weighted sum of the dice loss $L_{\mathrm{dice}}$ and the boundary loss $L_{\mathrm{boundary}}$, given by
$L_{\mathrm{drill}} = (1 - \beta)\,L_{\mathrm{dice}} + \beta\,L_{\mathrm{boundary}}$   (10)
where the weight $\beta \in [0, 1]$ is scheduled to grow with the number of training epochs. We can see that as the training process progresses, the weight of $L_{\mathrm{boundary}}$ continuously increases.
$L_{\mathrm{com}}$ is an L2 loss function computed pixel-wise between the predicted and ground-truth completion level maps.
The overall loss function is given by
$L = L_{\mathrm{det}} + L_{\mathrm{drill}} + L_{\mathrm{com}}.$   (11)
We could not find noticeable improvements in preliminary experiments in which we tested different weights for the summation of loss functions.
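The sketch below illustrates how the three losses could be combined, under the assumptions that the weight $\alpha$ in (9) is 1 and that the boundary-loss weight in (10) grows linearly with the epoch; the function name, arguments, and the weighting schedule are illustrative and do not reproduce the exact configuration used in this work.

```python
import torch

def total_loss(l_conf, l_loc, n_matched, l_dice, l_boundary,
               pred_com, gt_com, epoch, max_epoch):
    """Combine the detection, drill-segmentation, and completion losses."""
    n = max(n_matched, 1)                      # N set to 1 when no default box matches
    l_det = (l_conf + l_loc) / n               # (9), with alpha = 1 (assumed)

    beta = min(epoch / max_epoch, 1.0)         # weight grows during training (assumed schedule)
    l_drill = (1.0 - beta) * l_dice + beta * l_boundary   # (10)

    l_com = torch.mean((pred_com - gt_com) ** 2)          # pixel-wise L2 loss

    return l_det + l_drill + l_com             # (11), equal weights
```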
Training condition
During the training stage, the network is trained for a total of 6,000 epochs with the Adam optimizer. The learning rate is scheduled as a function of the epoch number, so it changes as training progresses. Moreover, we used a batch size of 16. All architectures were implemented in Python 3.8 using PyTorch 1.10.1 and CUDA 11.3 and executed on Ubuntu 20.04 with an NVIDIA RTX A5000 graphics card.
III-B4 Training Result and Evaluation
An example of the output is shown in Fig. 5. The network outputs a bounding box for the drilling area (Fig. 5-(a)) and the completion level map for that area (Fig. 5-(b)). The completion levels of parts that are drilled and not occluded by the drill are shown in the completion level map, while undrilled or occluded parts are suppressed. We can visualize the completion level map as a heatmap (Fig. 5-(c)), where the color is red where the completion level is higher and blue where it is lower. By summing and normalizing the corresponding pixel values, we can calculate a value between 0 and 1 for each angle along the circle and generate a drilling progress bar (Fig. 5-(d)) based on the completion level map to show the drilling progress. It is important to note that the progress bar is generated not only from the completion level map at the current moment but also from those in the past, because we expect the drilling progress of an area to be monotonically increasing. By setting $n$ discrete points evenly along the circle on the progress bar, the value at point $\boldsymbol{p}_i$, which is the completion level $\lambda_i$, is used to calculate the lowering velocity of the drill through (1).
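The progress-bar computation can be sketched as follows, assuming the completion level map is expressed as percentages and that the drilling centre in the image is known; the class interface and the angular binning scheme are illustrative assumptions.

```python
import numpy as np

class DrillingProgress:
    """Bin completion-map pixels by their angle around the drilling centre,
    sum and normalize the values per bin, and keep the per-angle progress
    monotonically increasing with an element-wise max over time."""
    def __init__(self, n_points):
        self.progress = np.zeros(n_points)   # one value in [0, 1] per trajectory point

    def update(self, completion_map, center):
        ys, xs = np.nonzero(completion_map)
        if len(xs) == 0:
            return self.progress
        # Angle of every non-zero pixel around the drilling centre (cx, cy).
        angles = np.arctan2(ys - center[1], xs - center[0])
        bins = ((angles + np.pi) / (2 * np.pi) * len(self.progress)).astype(int)
        bins = np.clip(bins, 0, len(self.progress) - 1)
        # Sum and normalize the completion values falling into each angular bin.
        sums = np.zeros_like(self.progress)
        counts = np.zeros_like(self.progress)
        np.add.at(sums, bins, completion_map[ys, xs] / 100.0)
        np.add.at(counts, bins, 1.0)
        current = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
        # Progress is expected to be monotonically increasing over time.
        self.progress = np.maximum(self.progress, current)
        return self.progress
```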
mAP (mean Average Precision) is used as the evaluation metric for the object detection task, while MAPE (Mean Absolute Percentage Error) is used as the evaluation metric for the completion level prediction task. Our network architecture reaches 78.5 in mAP for the detection task, and its MAPE and inference speed meet our requirements for real-time operation, providing a good trade-off between accuracy and speed.
TABLE I: Parameters used in the experiment (the initial downward speed of the drill $v_0$, the radius of the circular drilling path, and the frequency at which the calculated positions are sent to the robot) and their selected values.
IV Experiment
An experiment was conducted to evaluate the feasibility of the autonomous robotic drilling system.
The system is set up as shown in Fig. 1. An egg is fixed stably using a clamping device. The micro drill is held by the robot arm, and a 4K camera is set vertically above the center of the egg to observe the drilling procedure. An air compressor with a silicone tube is used to blow away the shell dust produced while drilling the surface of the eggshell. Communication with the robot was enabled by the SmartArmStack (https://github.com/SmartArmStack). The quaternion algebra and robot kinematics were implemented using DQ Robotics [21] with Python 3.
Before the experiment starts, the drill is teleoperated by an operator so that it remains within the field of view of the camera throughout the entire circular path. We then plan the drilling trajectory with the method described in Sec. III-A.
We performed a total of 20 trials with 20 different eggs of varying shape, size, and shell thickness. After the autonomous drilling algorithm stopped, we tried to remove the resected circular shell piece manually using tweezers. If the eggshell could be removed without damaging the membrane beneath it, the experiment was deemed successful. On the other hand, if the membrane broke during the drilling or removal procedure, or the eggshell could not be easily removed, the experiment was counted as a failure. If the robot over-drilled the egg and ruptured the membrane, this was also considered a failure.
IV-A Preparation
The parameters used in the experiment, namely the initial speed of the drill $v_0$ in (1), the radius of the circular drilling path, and the frequency of the transmission of the calculated positions to the robot, are summarized in Table I. In addition, the drilling stops autonomously when the system judges the drilling procedure to be complete. This happens when a sufficient fraction of the points on the drilling path achieve at least a threshold completion level.
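A minimal sketch of this stopping rule is given below; the two threshold values are placeholders for illustration, not the values used in our experiments.

```python
import numpy as np

def drilling_complete(progress, point_fraction=0.95, level_threshold=0.95):
    """Stop when a sufficient fraction of the trajectory points has reached
    a completion-level threshold (both thresholds are placeholders)."""
    reached = progress >= level_threshold          # points past the completion threshold
    return np.mean(reached) >= point_fraction      # enough of the path is complete
```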
IV-B Results and discussion
In the experiment, 16 out of 20 cases were successful, resulting in a success ratio of 80%. Fig. 6 shows a typical case of success, in which the drilling procedure stops automatically when it is judged as complete and the circular patch can be removed without breaking the membrane. These results provide initial evidence of the feasibility of the autonomous robotic drilling scheme.
The average time required for successful drilling was acceptable, but we still expect it to be shorter, considering that the drilling time in 7 of the 16 successful cases was excessive. The main reason for the excessive drilling time could be a tilted initial pose of the egg. Since the drill trajectory in the xy-plane is a circle and the z coordinates of all points on the trajectory are the same at the initial moment, a tilted initial pose of the egg causes the distances from the initial positions of the trajectory points to their projections on the eggshell surface to differ considerably. As a result, the drill touches the eggshell at some points on the drilling trajectory sooner than at others, and may even reach a 100% completion level at one point before touching the eggshell at another, resulting in a long drilling time. On the other hand, if the initial pose of the egg is not tilted, the drill touches all points of the circular trajectory at almost the same time and finishes drilling almost simultaneously, leading to a short drilling time.
A tilted initial pose of the egg may also cause drilling failure, as shown in Fig. 7-(a). In these failure cases, although the drill retains its z-position at points that have reached a 100% completion level, the drill bit rubs against the membrane surface at those points while the drill works on the other points that have not yet reached completion. Such failures accounted for 2 of the 4 total failure cases. In this initial proposal, the autonomous drilling process was sensitive to the initial tilt of the eggs, and this is being addressed as part of future work.
The other 2 failure cases occurred because areas whose completion level had already reached 100% were predicted as less than 100% by the prediction model, causing the drill to keep descending in those areas and rupture the membrane, as shown in Fig. 7-(b). This indicates that we should further refine the neural network for higher recognition accuracy while remaining real-time.
Furthermore, the experimental results also demonstrate the possibility of generalizing the system to mouse skull drilling experiments. It should be noted that the trajectory planning block of the system can be carried over directly to experiments on mouse skulls, whereas transfer learning will additionally be necessary to generalize the completion level recognition block, since the training datasets differ.
V Conclusion
In this paper, we proposed an autonomous robotic drilling system for cranial window creation. To achieve this, a new trajectory planning algorithm that avoids overshoot was introduced. In order to adjust the generated trajectory in real time, we applied a neural network to recognize the completion level of drilling in real time. In the experiments, we showed that our robotic drilling system is able to achieve autonomous drilling at a reasonable success rate and speed with an egg model, demonstrating the potential of the system for the mouse cranial window creation procedure.
Future work includes increasing the drilling speed by improving the trajectory planner to accommodate initially tilted eggs, improving the success rate of drilling by incorporating multimodal information such as contact force, and applying the system to drilling mouse skulls.
References
- [1] H. Koike, K. Iwasawa, R. Ouchi, M. Maezawa, K. Giesbrecht, N. Saiki, A. Ferguson, M. Kimura, W. L. Thompson, J. M. Wells, A. M. Zorn, and T. Takebe, “Modelling human hepato-biliary-pancreatic organogenesis from the foregut–midgut boundary,” Nature, vol. 574, no. 7776, pp. 112–116, sep 2019.
- [2] P. T. Ly, A. Lucas, S. H. Pun, A. Dondzillo, C. Liu, A. Klug, and T. C. Lei, “Robotic stereotaxic system based on 3D skull reconstruction to improve surgical accuracy and speed,” jan 2020.
- [3] L. Ghanbari, M. L. Rynes, J. Hu, D. S. Schulman, G. W. Johnson, M. Laroque, G. M. Shull, and S. B. Kodandaramaiah, “Craniobot: A computer numerical controlled robot for cranial microsurgeries,” Scientific Reports, vol. 9, no. 1, jan 2019.
- [4] N. Pak, J. H. Siegle, J. P. Kinney, D. J. Denman, T. J. Blanche, and E. S. Boyden, “Closed-loop, ultraprecise, automated craniotomies,” Journal of Neurophysiology, vol. 113, no. 10, pp. 3943–3953, jun 2015.
- [5] M. Marques Marinho, J. José Quiroz-Omaña, and K. Harada, “Design and validation of a multi-arm robotic platform for scientific exploration,” arXiv e-prints, pp. arXiv–2210, 2022.
- [6] L. Andreoli, H. Simplício, and E. Morya, “Egg model training protocol for stereotaxic neurosurgery and microelectrode implantation,” World Neurosurgery, vol. 111, pp. 243–250, 2018.
- [7] Y. Hu, H. Jin, L. Zhang, P. Zhang, and J. Zhang, “State recognition of pedicle drilling with force sensing in a robotic spinal surgical system,” IEEE/ASME Transactions on Mechatronics, vol. 19, no. 1, pp. 357–365, feb 2014.
- [8] Y. Dai, Y. Xue, and J. Zhang, “Drilling electrode for real-time measurement of electrical impedance in bone tissues,” Annals of Biomedical Engineering, vol. 42, no. 3, pp. 579–588, nov 2013.
- [9] Z. Ying, L. Shu, and N. Sugita, “Autonomous penetration perception for bone cutting during laminectomy,” in 2020 8th IEEE RAS/EMBS International Conference for Biomedical Robotics and Biomechatronics (BioRob). IEEE, nov 2020.
- [10] B. M. Pohl, A. Schumacher, and U. G. Hofmann, “Towards an automated, minimal invasive, precision craniotomy on small animals,” in 2011 5th International IEEE/EMBS Conference on Neural Engineering. IEEE, apr 2011.
- [11] N. Dvornik, K. Shmelkov, J. Mairal, and C. Schmid, “Blitznet: A real-time deep network for scene understanding,” 2017.
- [12] R. Girshick, “Fast R-CNN,” 2015.
- [13] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” 2015.
- [14] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” 2017.
- [15] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “SSD: Single shot multibox detector,” 2015.
- [16] C.-Y. Fu, W. Liu, A. Ranga, A. Tyagi, and A. C. Berg, “DSSD: Deconvolutional single shot detector,” 2017.
- [17] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 2014.
- [18] H. Noh, S. Hong, and B. Han, “Learning deconvolution network for semantic segmentation,” 2015.
- [19] I. Kokkinos, “Ubernet: Training a ‘universal’ convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory,” 2016.
- [20] C. Kruger, “Constrained cubic spline interpolation,” Chemical Engineering Applications, vol. 1, no. 1, 2003.
- [21] B. V. Adorno and M. M. Marinho, “DQ Robotics: a library for robot modeling and control,” IEEE Robotics and Automation Magazine (RAM), vol. 28, no. 3, pp. 102–116, Sep. 2021, invited for presentation at IROS’21. [Online]. Available: https://arxiv.org/pdf/1910.11612