
Autonomous Robotic Drilling System for Mice Cranial Window Creation: An Evaluation with an Egg Model

Enduo Zhao, Murilo M. Marinho, and Kanako Harada. This work was supported by JST Moonshot R&D JPMJMS2033. (Corresponding author: Murilo M. Marinho) Enduo Zhao, Murilo M. Marinho, and Kanako Harada are with the Department of Mechanical Engineering, the University of Tokyo, Tokyo, Japan. Emails: {endowzhao1996, murilo, kanakoharada}@g.ecc.u-tokyo.ac.jp.
Abstract

Robotic assistance for experimental manipulation in the life sciences is expected to enable the precise manipulation of valuable samples regardless of the scientist's skill. Experimental specimens in the life sciences are subject to individual variability and deformation and therefore require autonomous robotic control. As an example, we are studying the creation of a cranial window in a mouse. This procedure requires cutting a circular patch, 8 mm in diameter, out of the skull, which is approximately 300 µm thick, but the shape of the mouse skull varies with strain, sex, and age in weeks. The thickness of the skull is not uniform, with some areas thinner and others thicker. It is also difficult to ensure that the skull is held in the same position for each operation. It is not realistically possible to measure all of these features and pre-program a robotic trajectory for each individual mouse. This paper therefore proposes an autonomous robotic drilling method. The proposed method consists of drilling trajectory planning and image-based task completion level recognition. The trajectory planning adjusts the z-position of the drill according to the task completion level at each discrete point and forms the 3D drilling path via constrained cubic spline interpolation while avoiding overshoot. The task completion level recognition uses a DSSD-inspired deep learning model to estimate the completion level at each discrete point. Since an egg has characteristics similar to a mouse skull in terms of shape, thickness, and mechanical properties, removing the eggshell without damaging the membrane underneath was chosen as the simulation task. The proposed method was evaluated using a 6-DOF robotic arm holding a drill and achieved a success rate of 80% over 20 trials.

I Introduction

A cranial window is a transparent observation window created in the skull of a mouse through which scientists can observe the mouse brain, for example, after human organoids are implanted into the mouse brain [1]. In this procedure, a microdrill is used to remove an 8 mm circular patch of the mouse skull under a microscope, which is then replaced with a cover glass. Any damage to the underlying brain results in a failed procedure.

Many devices and robotic systems have been developed to assist in cranial window creation. Phuong et al. [2] developed a robotic stereotaxic platform for small rodents by combining a skull profiler sub-system and a six-degrees-of-freedom robotic platform. Ghanbari et al. [3] developed a cranial microsurgery platform (Craniobot), based on a modified desktop computer numerical controlled mill, which was used for precise microsurgical procedures. Pak et al. [4] implemented a cranial drilling robot that judges whether the mouse cranium has been penetrated using conductance measurements to perform automated craniotomies. Our group is exploring the concept of autonomously creating cranial windows using a robotic manipulator retrofitted with a micro-drill [5], as shown in Fig. 1, and cranial window installation is one of the target tasks to be demonstrated.

Obviously, it is not ethically acceptable to use mice to study autonomous robot control at this stage of the research, so we have chosen to simulate the removal of the mouse skull without damaging the underlying dura by removing a chicken eggshell without damaging the underlying membrane. The characteristics of the eggshell are similar to those of the mouse skull, and in fact eggshell drilling is used for training in mouse surgery [6].

I-A Related works

To develop an autonomous robotic drilling system, perception is an important topic. There have been many works on estimating drilling penetration and drilling status from different signals during the drilling of human bones. Hu et al. [7], by analyzing the force signal during the screw-path-drilling process, and Dai et al. [8], by analyzing the electrical impedance signal in the bone-drilling process, succeeded in estimating the relative position between the screw or drill and the bone being drilled and the drilling penetration, respectively. Ying et al. [9] found it more effective to measure force and sound signals during the drilling process. With the help of a neural network, they achieved a high-precision estimate of the progress of the drilling procedure.

Considering the similarities and differences between drilling human bones and mouse craniums, several works have aimed to enhance robot perception in drilling surgery for the mouse cranium. Pohl et al. [10] designed a module for assisting craniotomy in mice by measuring force and sound signals; the drilling penetration was estimated and the drill feed speed was controlled. Pak et al. [4] showed a penetration detection strategy using a measurement circuit that detects electrical conductance between the drill and the mouse. They reported that a sudden increase in the electrical conductance could indicate when the skull was penetrated.

The aforementioned works can only estimate the drilling completion progress at the point of contact between the drill and the skull, because the force, sound, and vibration signals are only generated during contact. In this work, we aim to perceive the drilling completion progress of the entire drilled region at once. To do so, we rely on image processing using a neural network. Such a network needs to serve two functions. First, the drilled area must be detected. Second, a pixel-wise completion level must be estimated. It is shown in [11] that the two tasks benefit from each other in terms of accuracy during training.

Related to those goals, Fast R-CNN and Faster R-CNN [12, 13] can be used to obtain the target's bounding box using a CNN whose outputs are then filtered using non-maximum suppression. Building on them, Mask R-CNN [14] extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. It offers high accuracy but has a comparatively slow inference speed. Moreover, the popular object detector Single Shot Detector (SSD) [15] classifies a fixed set of default boxes directly and builds a scale pyramid to find objects of varying sizes in a single forward pass. The Deconvolutional Single Shot Detector (DSSD) [16] is a network created on the foundation of SSD that combines a strong classifier (ResNet-101) with the fast SSD detection framework, and then adds deconvolution layers to SSD+ResNet-101 to introduce additional large-scale context for object detection and recognition.

Considering that the DSSD architecture contains deconvolutional layers, it is natural to perform semantic segmentation in the same network, because the purpose of a sequence of deconvolutional layers is to increase the resolution of the output feature maps. Based on this characteristic, other network architectures [17, 18, 19, 11] also aim to perform semantic segmentation and object detection simultaneously. Our work is inspired by these previous attempts, with the semantic segmentation task replaced by completion level map prediction.

Figure 1: The system setup used in this work, consisting of one of the robotic arms of our AI-robot platform for scientific exploration [5]. For this work, we used the arm with a micro drill as an end effector, a 4K camera, and a clamping device for the egg.

I-B Statement of contributions

In this work, we develop an autonomous robotic drilling system with only image feedback. To this end, we (1) propose a trajectory planning algorithm based on constrained splines that is adjusted in real time by the image feedback; (2) propose a neural network to detect the drilling area and obtain the pixel-wise completion level from the microscopic image to update the planned trajectory; and (3) evaluate our method in robotic drilling experiments using eggshells.

Figure 2: Block diagram of the autonomous robotic drilling system. The robotic system interacts with the egg, and a 4K camera provides images to the completion level recognition block. The completion level recognition block outputs the completion level of the trajectory points to the trajectory planning block. Lastly, the trajectory planning block uses this information to update the trajectory and sends setpoints to the robotic system, closing the loop.

II Problem Statement

Consider the setup shown in Fig. 1, using one of the robotic arms of our robot platform for scientific exploration [5]. Let R1 be the robotic arm (CVR038, Densowave, Japan) with joint values $\boldsymbol{q}_{\text{R1}}\in\mathbb{R}^{6}$ holding the micro drill (MD1200, Braintree Scientific, USA). The micro drill is used to drill the egg fixed by a clamping device. Images are obtained through a 4K microscope system (STC-HD853HDMI, Omron-Sentech, Japan) from above.

II-A Goal

Our goal is to autonomously drill the eggshell along a circular path with respect to $\mathcal{F}_{\text{R1}}$ with the robot system using only images as feedback.

II-B Premises

Our robot controller was designed under the following premises:

• The joint velocities of the robot, $\dot{\boldsymbol{q}}\in\mathbb{R}^{6}$, are defined by the output of the controller $\boldsymbol{u}_{\text{R1}}\in\mathbb{R}^{6}$ (Premise I).

• The solution $\boldsymbol{u}_{\text{R1}}=\boldsymbol{0}_{6}$ must always be feasible. That is, at worst the robots can stop moving (Premise II).

• The robot has a kinematic model that is precise enough for the task (Premise III).

III Proposed image-based autonomous drilling method

The overview of the proposed method is shown in Fig. 2. In our strategy, the drill is autonomously controlled along a planned trajectory (introduced in Section III-A) in the $x\text{--}y$ plane, while the $z$-axis coordinate of each point is updated by the proposed image-based completion level recognition system (introduced in Section III-B).

III-A Trajectory Planning

Without loss of generality, let the goal be to plan a circular trajectory in the $x\text{--}y$ plane with the depth modulated by the image-based estimation algorithm.

We discretize the circular path into $n$ equal intervals, resulting in the trajectory points $\boldsymbol{p}_{i}\left(t\right)\triangleq\begin{bmatrix}p_{x,i}&p_{y,i}&p_{z,i}\left(t\right)\end{bmatrix}^{\text{T}}\in\mathbb{R}^{3}$, $i=1,\cdots,n\in\mathbb{N}$. In addition, let $p_{x,i},p_{y,i}$ be constant while only $p_{z,i}\left(t\right)$ varies over time. At $t=0$, all the points have the same $z$-coordinate value, $p_{z,i}\left(0\right)=p_{z}\left(0\right)~\forall~i$. We modulate the velocity at which each trajectory point goes down, $v_{z,i}\left(t\right)=\dot{p}_{z,i}\left(t\right)$, using the completion level of each point estimated from the image.

With this, we expect to autonomously drill a circular patch without making further assumptions on the drilling surface. Then, we interpolate between the $n$ trajectory points to obtain a smooth trajectory using a constrained spline interpolation algorithm.

III-A1 Z-coordinate Value Calculation

Our proposal is to modulate the lowering velocity of each trajectory point as

$v_{z,i}\left(t\right)=\left(1-c_{i}\right)v_{z}\left(0\right),$   (1)

where $v_{z}\left(0\right)$ is the initial downward velocity and $0\leqslant c_{i}\leqslant 1$ is the drilling completion level at point $i$.

The position is then calculated through simple integration with a sampling time $T$ as

$p_{z,i}\left(t+T\right)=p_{z,i}\left(t\right)-v_{z,i}\left(t\right)T.$   (2)

The completion level map gives us enough information to estimate the completion level of the $n$ points. We interpolate between them with a constrained cubic spline to obtain a smooth trajectory.
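For concreteness, the update rules (1) and (2) can be sketched in a few lines of Python. This is an illustrative reimplementation rather than the authors' code; the numeric values below match those used later in the experiments (Table I) and Fig. 3, but are otherwise only for illustration.

```python
# Minimal sketch of the depth update in (1)-(2): each of the n trajectory points
# on the circular path keeps its x-y position and lowers its z-coordinate at a
# rate scaled by (1 - completion level).
import numpy as np

n = 30                 # number of discrete trajectory points (as in Fig. 3)
R = 8e-3               # radius of the circular path in metres (Table I)
v_z0 = 6e-6            # initial downward velocity v_z(0) in m/s (Table I)
T = 1.0 / 30.0         # sampling time in seconds (f = 30 Hz, Table I)

theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
p_x = R * np.cos(theta)          # constant x-coordinates
p_y = R * np.sin(theta)          # constant y-coordinates
p_z = np.zeros(n)                # all points start at the same z-coordinate

def update_depths(p_z, completion, v_z0=v_z0, T=T):
    """Apply (1)-(2): v_{z,i} = (1 - c_i) v_z(0), then p_{z,i} <- p_{z,i} - v_{z,i} T."""
    completion = np.clip(completion, 0.0, 1.0)   # enforce 0 <= c_i <= 1
    v_z = (1.0 - completion) * v_z0
    return p_z - v_z * T

# Example: fully drilled points stop descending, undrilled points keep going down.
c = np.zeros(n); c[:5] = 1.0
p_z = update_depths(p_z, c)
```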

III-A2 Constrained Cubic Spline Interpolation

Cubic spline interpolation has been widely applied to path planning, but it is, by itself, unsuitable for our application, as it might overshoot between two trajectory points and rupture the membrane (see Fig. 3). Instead, we propose the use of constrained spline interpolation, which eliminates the requirement for equal second-order derivatives at every point and replaces it with specified first-order derivatives [20].

Consider any two contiguous trajectory points $\boldsymbol{p}_{i}\triangleq\begin{bmatrix}p_{x,i}&p_{y,i}&p_{z,i}\end{bmatrix}^{\text{T}}$ and $\boldsymbol{p}_{i+1}\triangleq\begin{bmatrix}p_{x,i+1}&p_{y,i+1}&p_{z,i+1}\end{bmatrix}^{\text{T}}$, $i=1,\cdots,n\in\mathbb{N}$. Let $u$ be a free parameter used to generate a third-degree polynomial curve $\boldsymbol{s}_{i}\left(u\right)$ between these two points. Let $u=0$ be the beginning of the curve, $\boldsymbol{p}_{i}$, and $u=1$ be the end of the curve, $\boldsymbol{p}_{i+1}$. Notice that when $i=n$, $\boldsymbol{p}_{n}$ is the last point on the trajectory, so $\boldsymbol{s}_{n}\left(u\right)$ is the curve between the last point $\boldsymbol{p}_{n}$ and the first point $\boldsymbol{p}_{1}$. The curves for $0\leqslant u\leqslant 1$ are defined by

$\boldsymbol{s}_{i}\left(u\right)=\begin{bmatrix}s_{x,i}\left(u\right)&s_{y,i}\left(u\right)&s_{z,i}\left(u\right)\end{bmatrix}^{\text{T}}=\boldsymbol{a}_{i}^{\text{T}}+\boldsymbol{b}_{i}^{\text{T}}u+\boldsymbol{c}_{i}^{\text{T}}u^{2}+\boldsymbol{d}_{i}^{\text{T}}u^{3},$   (3)

where

$\boldsymbol{a}_{i}^{\text{T}}=\begin{bmatrix}a_{x,i}&a_{y,i}&a_{z,i}\end{bmatrix}^{\text{T}}$, $\boldsymbol{b}_{i}^{\text{T}}=\begin{bmatrix}b_{x,i}&b_{y,i}&b_{z,i}\end{bmatrix}^{\text{T}}$, $\boldsymbol{c}_{i}^{\text{T}}=\begin{bmatrix}c_{x,i}&c_{y,i}&c_{z,i}\end{bmatrix}^{\text{T}}$, $\boldsymbol{d}_{i}^{\text{T}}=\begin{bmatrix}d_{x,i}&d_{y,i}&d_{z,i}\end{bmatrix}^{\text{T}}$.

The first order derivative and second order derivative of the curve are

$\boldsymbol{s}_{i}^{\prime}\left(u\right)=\begin{bmatrix}s_{x,i}^{\prime}\left(u\right)&s_{y,i}^{\prime}\left(u\right)&s_{z,i}^{\prime}\left(u\right)\end{bmatrix}^{\text{T}}=\boldsymbol{b}_{i}^{\text{T}}+2\boldsymbol{c}_{i}^{\text{T}}u+3\boldsymbol{d}_{i}^{\text{T}}u^{2}$   (4)

and

$\boldsymbol{s}_{i}^{\prime\prime}\left(u\right)=\begin{bmatrix}s_{x,i}^{\prime\prime}\left(u\right)&s_{y,i}^{\prime\prime}\left(u\right)&s_{z,i}^{\prime\prime}\left(u\right)\end{bmatrix}^{\text{T}}=2\boldsymbol{c}_{i}^{\text{T}}+6\boldsymbol{d}_{i}^{\text{T}}u.$   (5)

In order to calculate the parameters of the curve, we have three conditions. First, the curve must pass through points 𝒑i\boldsymbol{p}_{i} and 𝒑i+1\boldsymbol{p}_{i+1}, that is,

$\boldsymbol{s}_{i}\left(0\right)=\begin{bmatrix}p_{x,i}&p_{y,i}&p_{z,i}\end{bmatrix}^{\text{T}}$, $\boldsymbol{s}_{i}\left(1\right)=\begin{bmatrix}p_{x,i+1}&p_{y,i+1}&p_{z,i+1}\end{bmatrix}^{\text{T}}$.   (6)

Second, the first order derivative of the whole trajectory must be continuous, which means the first order derivatives must be the same at each interval end, that is,

$\boldsymbol{s}_{i}^{\prime}\left(1\right)=\boldsymbol{s}_{i+1}^{\prime}\left(0\right).$   (7)

The third condition differs between traditional cubic spline interpolation and constrained cubic spline interpolation.

For traditional cubic spline interpolation, the second-order derivative of the whole trajectory must be continuous, which means the second-order derivative on the left side of a point, $\boldsymbol{s}_{i}^{\prime\prime}\left(1\right)$, must be the same as that on the right side, $\boldsymbol{s}_{i+1}^{\prime\prime}\left(0\right)$. As a result, the generated trajectory is smooth, but overshoot might occur.

For constrained cubic spline interpolation, we aim to solve the overshoot problem by sacrificing smoothness, so the second-order derivative does not need to be continuous. Instead, the value of the first-order derivative at each point is specified, that is,

$\boldsymbol{s}_{i}^{\prime}\left(0\right)=\begin{bmatrix}p_{x,i}^{\prime}&p_{y,i}^{\prime}&p_{z,i}^{\prime}\end{bmatrix}^{\text{T}}$, $\boldsymbol{s}_{i}^{\prime}\left(1\right)=\begin{bmatrix}p_{x,i+1}^{\prime}&p_{y,i+1}^{\prime}&p_{z,i+1}^{\prime}\end{bmatrix}^{\text{T}}$.   (8)

By combining (3), (4), (6), (7), and (8), we get

$\boldsymbol{a}_{i}^{\text{T}}=\begin{bmatrix}p_{x,i}\\ p_{y,i}\\ p_{z,i}\end{bmatrix}$, $\boldsymbol{b}_{i}^{\text{T}}=\begin{bmatrix}p_{x,i}^{\prime}\\ p_{y,i}^{\prime}\\ p_{z,i}^{\prime}\end{bmatrix}$,

$\boldsymbol{c}_{i}^{\text{T}}=\begin{bmatrix}3\left(p_{x,i+1}-p_{x,i}\right)-\left(p_{x,i+1}^{\prime}+2p_{x,i}^{\prime}\right)\\ 3\left(p_{y,i+1}-p_{y,i}\right)-\left(p_{y,i+1}^{\prime}+2p_{y,i}^{\prime}\right)\\ 3\left(p_{z,i+1}-p_{z,i}\right)-\left(p_{z,i+1}^{\prime}+2p_{z,i}^{\prime}\right)\end{bmatrix}$,

$\boldsymbol{d}_{i}^{\text{T}}=\begin{bmatrix}2\left(p_{x,i}-p_{x,i+1}\right)+p_{x,i+1}^{\prime}+p_{x,i}^{\prime}\\ 2\left(p_{y,i}-p_{y,i+1}\right)+p_{y,i+1}^{\prime}+p_{y,i}^{\prime}\\ 2\left(p_{z,i}-p_{z,i+1}\right)+p_{z,i+1}^{\prime}+p_{z,i}^{\prime}\end{bmatrix}$.

With the constrained cubic spline interpolation described above, we generate the curve depicted in green in Fig. 3 and compare it with a traditional cubic spline interpolation in red. We can see in Fig. 3 that the green curve is enveloped by the tangent planes while the red curve is not, hence solving the overshoot problem.
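The closed-form coefficients above translate directly into code. The following is a minimal sketch, not the authors' implementation, of one constrained spline segment given its endpoints and the specified endpoint derivatives (computed as in [20]).

```python
# Per-segment coefficients derived from (3)-(8): given the two endpoints and the
# specified first derivatives at each endpoint, the cubic a + b u + c u^2 + d u^3
# follows in closed form.
import numpy as np

def constrained_segment(p_i, p_ip1, dp_i, dp_ip1):
    """Coefficients of s_i(u) = a + b u + c u^2 + d u^3 for u in [0, 1].

    p_i, p_ip1   : endpoints p_i, p_{i+1} (3-vectors)
    dp_i, dp_ip1 : specified first derivatives p'_i, p'_{i+1} (3-vectors)
    """
    p_i, p_ip1 = np.asarray(p_i, float), np.asarray(p_ip1, float)
    dp_i, dp_ip1 = np.asarray(dp_i, float), np.asarray(dp_ip1, float)
    a = p_i
    b = dp_i
    c = 3.0 * (p_ip1 - p_i) - (dp_ip1 + 2.0 * dp_i)
    d = 2.0 * (p_i - p_ip1) + dp_ip1 + dp_i
    return a, b, c, d

def evaluate(a, b, c, d, u):
    """Evaluate s_i(u) along the segment."""
    u = np.atleast_1d(u)[:, None]
    return a + b * u + c * u ** 2 + d * u ** 3

# Example: a segment whose end slopes match the chord stays between its endpoints
# (no overshoot), unlike a traditional spline with imposed curvature continuity.
a, b, c, d = constrained_segment([0, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0])
curve = evaluate(a, b, c, d, np.linspace(0, 1, 11))
```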

Figure 3: Generated curve with n=30n=30 points along the circular path. The blue points are the sampling points, the red line is the result of traditional spline interpolation, and the green line is the result of constrained spline interpolation. We can see the red line overshoots while the green does not.

III-B Completion Level Recognition

In the completion level recognition system, we aim to detect the drilling area and predict the drilling completion level simultaneously using a single neural network. It is known that detection and semantic segmentation benefit from each other in training.

III-B1 Network architecture

Figure 4: The overall network architecture for the drilling area detection and completion level prediction. On the left, ResNet-50 is applied as a feature extractor. It is followed by the downscale-stream and the upscale-stream, which consist of a sequence of deconvolution layers interleaved with conv-skip connections. The localization and classification of bounding boxes (top) and the pixel-wise completion level map (bottom) are performed in a multiscale fashion by single convolutional layers operating on the output of the deconvolution layers.

The overview of the architecture is shown in Fig. 4. The structure of the network is inspired by DSSD [16], an object detection network with an encoder-decoder structure that provides a reasonable trade-off between accuracy and speed. One notable difference is that we crop and concatenate the activations of the downscale stream and the upscale stream to form a completion level prediction branch.

Backbone of the network

The input image is resized to $300\times 300$ and processed with a convolutional neural network to obtain a map that carries high-level features. ResNet-50 is applied as the feature extractor.

The ResNet-50 feature extractor is followed by a series of additional convolutional and pooling layers.

Bounding-box detection branch

For bounding-box detection, we apply a similar approach to DSSD which adds several residual structures before using multi-scale feature maps for classification and regression. A single conv-skip connection is applied to process the output of feature layers and obtain the bounding box of the drilling area.

Completion prediction branch

In our completion prediction branch, the desired output is a $128\times 128\times 2$ tensor, where $128\times 128$ is the size of the output image and 2 is the number of channels. To use the feature information of each layer as much as possible, we upsample the outputs of conv11, conv12, conv13, conv14, and conv15 to the size of conv16, concatenate them together, and follow that with a convolutional layer to change the number of channels to 2. Finally, we crop the 2-channel output according to the relative position of the bounding box within the overall input image and resize it to $128\times 128\times 2$. One channel is the drill channel and the other is the completion level map, whose grayscale value ($0\text{--}1$) at each pixel refers to the percentage of completion for that pixel.
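A minimal PyTorch sketch of this branch is given below. The channel counts and the $38\times 38$ target resolution (conv16 in Fig. 4) are assumptions made for illustration; this is not the trained network.

```python
# Sketch of the completion prediction head: upsample multi-scale features to one
# resolution, concatenate, reduce to 2 channels, then crop by the bounding box
# and resize to 128x128.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompletionHead(nn.Module):
    def __init__(self, in_channels_list, out_size=38):
        super().__init__()
        self.out_size = out_size                      # assumed conv16 resolution
        self.fuse = nn.Conv2d(sum(in_channels_list), 2, kernel_size=3, padding=1)

    def forward(self, feature_maps, box):
        # Upsample every feature map to the common resolution and concatenate.
        ups = [F.interpolate(f, size=(self.out_size, self.out_size),
                             mode='bilinear', align_corners=False)
               for f in feature_maps]
        fused = self.fuse(torch.cat(ups, dim=1))      # (N, 2, 38, 38)
        # Crop with the bounding box (relative coordinates) and resize to 128x128.
        x1, y1, x2, y2 = box                          # fractions of the image in [0, 1]
        h, w = fused.shape[-2:]
        crop = fused[..., int(y1 * h):int(y2 * h), int(x1 * w):int(x2 * w)]
        return F.interpolate(crop, size=(128, 128), mode='bilinear', align_corners=False)

# Example with dummy multi-scale features (channel counts are placeholders).
feats = [torch.randn(1, 256, s, s) for s in (1, 3, 5, 10, 19, 38)]
head = CompletionHead([256] * 6)
out = head(feats, box=(0.3, 0.3, 0.7, 0.7))           # (1, 2, 128, 128)
```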

III-B2 Training Dataset

Generally, ground truth images with full manual annotation are required for multi-task learning. In this work, we expect our network to output a bounding box of the drilling area and a completion level map. Therefore, it would be necessary to manually annotate the coordinates of the four corners of the bounding box of the drilling area, and a cropped completion level map within the drilling area, as ground truth for the completion level prediction branch.

To generate the completion level maps, one approach would be to manually annotate each pixel with a grayscale value ranging from 0 to 100. Considering the complexity of this approach, we propose a more tractable annotation strategy.

Given an image of eggshell drilling, we first define 6 classes for a semantic segmentation classifier: drill, 0% drilled (background), 25% penetrated, 50% penetrated, 75% penetrated, and 100% penetrated. The percentage of penetration is defined subjectively based on our experience in pilot studies. Second, at a higher level in the hierarchy, we separate the annotation into confidence maps. The first is the confidence map for drill pixels, with 255 corresponding to full confidence, so that the drill can be filtered out. The second confidence map corresponds to the completion level map, where 0 corresponds to undrilled pixels and 100 corresponds to completely drilled pixels. Lastly, taking advantage of the expected continuity in drilled regions, a Gaussian filter is applied to smooth the completion level map channel so that its pixel grayscale values vary continuously from 0 to 100.

A 4-dimensional vector $(x_{1},\,y_{1},\,x_{2},\,y_{2})$ is used as the ground truth of the bounding box detection branch, which is generated from the completion level map channel by taking the minimum and maximum coordinates, along the width and height directions, of the pixels whose grayscale values are not 0. After that, we crop both the completion level map channel and the drill channel using the bounding box corners $(x_{1},\,y_{1})$ and $(x_{2},\,y_{2})$ and resize them to $128\times 128$. Merging the two channels together, we get a $128\times 128\times 2$ matrix as the ground truth of the completion level prediction branch.
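The following sketch illustrates this ground-truth generation, under the assumption that the class annotation has already been converted into a drill mask and a raw completion map with values in 0–100; OpenCV (`cv2`) is assumed here only for the Gaussian smoothing and resizing.

```python
# Illustrative ground-truth generation: smooth the completion channel, derive the
# bounding box from its non-zero pixels, then crop and resize both channels.
import numpy as np
import cv2  # assumed available for smoothing and resizing

def make_ground_truth(completion_map, drill_mask, sigma=5):
    # Gaussian smoothing so the completion values vary continuously from 0 to 100.
    smooth = cv2.GaussianBlur(completion_map.astype(np.float32), (0, 0), sigma)
    # Bounding box from the non-zero pixels of the completion level channel.
    ys, xs = np.nonzero(smooth > 0)
    x1, x2 = xs.min(), xs.max()
    y1, y2 = ys.min(), ys.max()
    # Crop both channels with the bounding box and resize to 128x128.
    crop_c = cv2.resize(np.ascontiguousarray(smooth[y1:y2 + 1, x1:x2 + 1]), (128, 128))
    crop_d = cv2.resize(np.ascontiguousarray(
        drill_mask[y1:y2 + 1, x1:x2 + 1].astype(np.float32)), (128, 128))
    gt_map = np.stack([crop_d, crop_c], axis=-1)      # (128, 128, 2)
    return (x1, y1, x2, y2), gt_map

# Example with a synthetic 300x300 annotation.
cmap = np.zeros((300, 300)); cmap[120:180, 100:200] = 75.0
dmask = np.zeros((300, 300)); dmask[140:160, 140:160] = 255.0
bbox, gt = make_ground_truth(cmap, dmask)
```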

At the training stage, we used common image augmentation methods: random flip, rotation, random crop, and random changes of brightness and contrast. We manually labeled 518 images, resulting in 16,576 images after augmentation. In total, 13,260 images were used for training, 1,658 for validation, and 1,658 for testing.

Figure 5: Output image of the network. (a) Result of bounding box detection; (b) The completion level map; (c) The heatmap of completion level map for visualization; (d) The progress bar for visualization.

III-B3 Network training

Loss function

The loss function of our training procedure contains three parts: the bounding box detection loss $L_{\text{bb}}$ for the drilling area, the semantic segmentation loss $L_{\text{drill}}$ for the drill, and the completion level prediction loss $L_{\text{com}}$.

$L_{\text{bb}}$ is a weighted sum of the localization loss $L_{\text{loc}}$ and the confidence loss $L_{\text{conf}}$, given by

$L_{\text{bb}}=\frac{1}{N}\left(L_{\text{loc}}+L_{\text{conf}}\right),$   (9)

where $N$ is the number of matched default boxes. If the number of matched default boxes is 0, we set $N$ to 1.

$L_{\text{drill}}$ is a weighted sum of the dice loss $L_{\text{dice}}$ and the boundary loss $L_{\text{b}}$, given by

$L_{\text{drill}}=\alpha L_{\text{dice}}+\left(1-\alpha\right)L_{\text{b}},$   (10)

where $\alpha=\max\left(0.01,\,1-0.01\times\text{epoch}\right)$. As the training process progresses, the weight of $L_{\text{b}}$ therefore continuously increases.

$L_{\text{com}}$ is an L2 loss computed over the pixel-wise differences between the predicted and ground-truth completion level maps.

The overall loss function LL is given by

$L=L_{\text{bb}}+L_{\text{drill}}+L_{\text{com}}.$   (11)

We could not find noticeable improvements in preliminary experiments in which we tested different weights for the summation of loss functions.
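How the terms in (9)–(11) can be combined is sketched below. The detection loss $L_{\text{bb}}$ is assumed to come from an SSD-style multibox loss, and `boundary_loss` is a simplified stand-in for the boundary loss used in the paper; only the $\alpha$ weighting scheme and the overall sum follow the text.

```python
# Illustrative combination of the loss terms; not the authors' implementation.
import torch

def dice_loss(pred, target, eps=1e-6):
    # Soft dice loss over the drill channel.
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def boundary_loss(pred, target):
    # Placeholder stand-in: penalize disagreement of image gradients (edges).
    def grad(x):
        return torch.abs(x[..., 1:, :] - x[..., :-1, :]).mean() + \
               torch.abs(x[..., :, 1:] - x[..., :, :-1]).mean()
    return torch.abs(grad(pred) - grad(target))

def total_loss(l_bb, pred_map, gt_map, epoch):
    # pred_map, gt_map: (N, 2, 128, 128); channel 0 = drill, channel 1 = completion.
    alpha = max(0.01, 1.0 - 0.01 * epoch)               # dice weight decays with epochs
    l_drill = alpha * dice_loss(pred_map[:, 0], gt_map[:, 0]) + \
              (1.0 - alpha) * boundary_loss(pred_map[:, 0], gt_map[:, 0])
    l_com = torch.mean((pred_map[:, 1] - gt_map[:, 1]) ** 2)   # pixel-wise L2 term
    return l_bb + l_drill + l_com                        # overall loss (11)

# Example with dummy tensors and a dummy detection loss.
pred = torch.rand(2, 2, 128, 128); gt = torch.rand(2, 2, 128, 128)
loss = total_loss(torch.tensor(0.5), pred, gt, epoch=100)
```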

Training condition

During the training stage, the network is trained for a total of 6,000 epochs with the Adam optimizer. The learning rate is set to $0.01\times 0.9995^{n}$, where $n$ is the number of epochs, so it decays as training progresses. Moreover, we used a batch size of 16. All architectures were implemented in Python 3.8 using PyTorch 1.10.1 and CUDA 11.3 and executed on Ubuntu 20.04 with an NVIDIA RTX A5000 graphics card.
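The stated schedule ($0.01\times 0.9995^{n}$) corresponds to an exponential decay per epoch, which can be expressed, for example, with PyTorch's `ExponentialLR` scheduler; the model below is only a placeholder.

```python
# Sketch of the stated learning-rate schedule: lr = 0.01 * 0.9995^n per epoch.
import torch

model = torch.nn.Linear(10, 2)                          # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9995)

for epoch in range(6000):
    # ... one training pass over the dataset with batch size 16 would go here ...
    scheduler.step()                                    # decay the learning rate per epoch
```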

III-B4 Training Result and Evaluation

An example of the output is shown in Fig. 5. The network outputs a bounding box for the drilling area (Fig. 5-(a)) and the completion level map for that area (Fig. 5-(b)). The completion levels of parts that are drilled and not occluded by the drill appear in the completion level map, while undrilled or occluded parts are suppressed. We can visualize the completion level map as a heatmap (Fig. 5-(c)), where the color is red if the completion level is higher and blue if it is lower. By summing and normalizing the corresponding pixel values, we can calculate a value between 0 and 1 for each angle along the circle and generate a drilling progress bar (Fig. 5-(d)) based on the completion level map to show the drilling progress. It is important to note that the progress bar is generated not only from the current completion level map but also from those in the past, because we expect the drilling progress of an area to be monotonically increasing. By setting $n$ discrete points evenly along the circle on the progress bar, the value $c_{i}$ of point $i$, which is the completion level at point $i$, is used to calculate the lowering velocity of the drill through (1).
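As an illustration, the per-point completion levels $c_{i}$ can be extracted from the completion level map roughly as follows; the angular binning, the normalization by 255, and the running maximum are assumptions made for this sketch rather than the exact implementation.

```python
# Sketch: bin completion-map pixels by angle around the circle centre, average,
# normalize to [0, 1], and keep the values monotonically non-decreasing over time.
import numpy as np

def completion_per_point(level_map, center, n=30, previous=None):
    h, w = level_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    angles = np.arctan2(ys - center[1], xs - center[0]) % (2.0 * np.pi)
    bins = (angles / (2.0 * np.pi) * n).astype(int) % n
    c = np.zeros(n)
    for i in range(n):
        sel = level_map[bins == i]
        c[i] = sel.mean() / 255.0 if sel.size else 0.0   # normalize grayscale to [0, 1]
    if previous is not None:
        c = np.maximum(c, previous)                      # progress never decreases
    return np.clip(c, 0.0, 1.0)

# Example: the returned c feeds the velocity law (1) for each trajectory point.
c = completion_per_point(np.random.randint(0, 255, (128, 128)).astype(float), (64, 64))
```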

mAP (mean Average Precision) is used as the evaluation metric for the object detection task, while MAPE (Mean Absolute Percentage Error) is used as the evaluation metric for the completion level prediction task. Our network architecture reaches 78.5 mAP on the detection task and 15.05% MAPE on the prediction task at a speed of 72 Hz. We targeted $\text{mAP}>75$, $\text{MAPE}<20\%$, and real-time operation ($\text{fps}>60$), so our network meets these requirements with a good trade-off between accuracy and speed.

TABLE I: The selected parameters during the experiment.
Parameter | $v_{0}$ (m/s) | $R$ (mm) | $f$ (Hz)
Selected value | $6\times 10^{-6}$ | 8 | 30

IV Experiment

An experiment was conducted to evaluate the feasibility of the autonomous robotic drilling system.

The system is set up as shown in Fig. 1. An egg is fixed stably using a clamping device. The micro drill is held by the robot arm, and a 4K camera is set vertically above the center of the egg to observe the drilling procedure. An air compressor with a silicone tube is used to blow away the shell dust emitted while drilling the surface of the eggshell. Communication with the robot was enabled by the SmartArmStack (https://github.com/SmartArmStack). The quaternion algebra and robot kinematics were implemented using DQ Robotics [21] with Python 3.

Before the experiment starts, the drill is teleoperated by an operator so that it is within the field-of-view of the camera throughout the entire circular path. We planned the drilling trajectory with the method mentioned in Sec. III-A.

We performed a total of 20 trials with 20 different eggs of varying shape, size, and shell thickness. After the autonomous drilling algorithm stopped, we tried to remove the resected circular shell piece manually using tweezers. If the eggshell could be removed without damaging the membrane beneath it, the trial was deemed successful. On the other hand, if the membrane broke during the drilling or removal procedure, or the eggshell was not easily removed, the trial was counted as a failure. If the robot over-drilled the egg and ruptured the membrane, this was also considered a failure.

IV-A Preparation

The parameters used in the experiment, namely the initial speed $v_{0}$ of the drill in (1), the radius of the circular drilling path $R$, and the frequency $f$ at which the calculated $z$-positions are transmitted to the robot, are summarized in Table I. In addition, the drilling stops autonomously when the system judges the drilling procedure to be complete. This happens when 80% of the points on the drilling path achieve at least an 85% completion level.
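This stopping rule can be expressed in one line, as in the following sketch.

```python
# Sketch of the stopping rule: drilling terminates when 80 % of the points on the
# path have reached at least 85 % completion.
import numpy as np

def drilling_complete(c, point_ratio=0.8, level_threshold=0.85):
    c = np.asarray(c)
    return np.mean(c >= level_threshold) >= point_ratio
```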

IV-B Results and discussion

Figure 6: A series of snapshots of a case of success from the start of drilling to the removal of resected circular shell piece.

In the experiment, 16 out of 20 cases were successful, resulting in a success rate of 80%. Fig. 6 shows a typical successful case, where the drilling procedure stops automatically when judged complete and the circular patch can be removed without breaking the membrane. This result provides initial evidence of the feasibility of the autonomous robotic drilling scheme.

The average time required for successful drilling was 16.8 min, which is acceptable, but we still expect it to be shorter considering that the drilling time of 7 of the 16 successful cases exceeded 25 min. The main reason for the excessive drilling time could be a tilted initial pose of the egg. Since the drill trajectory in the $x\text{--}y$ plane is a circle and the $z$-coordinates of all points on the trajectory are the same at the initial moment, a tilted initial pose of the egg causes the distance from each point's initial position to its projection on the outside of the eggshell to differ considerably among points. As a result, the drill touches the eggshell at one point on the drilling trajectory sooner than at another, and may even reach a 100% completion level at one point before touching the eggshell at another, resulting in a long drilling time. On the other hand, if the initial pose of the egg is not tilted, the drill touches every point of the circular trajectory at almost the same time and finishes drilling almost simultaneously, leading to a short drilling time.

A tilted initial pose of the egg may also cause drilling failure, as shown in Fig. 7-(a). These failures are due to the drill bit rubbing the membrane surface at points that have already reached a 100% completion level (and therefore retain their z-position) while the drill is still working on the other points that have not. Such failures accounted for 2 of the 4 failure cases. In this initial proposal, the autonomous drilling process was sensitive to the initial tilt of the eggs, and this is being addressed as part of future work.

The reason for the other 2 failure cases is that areas whose completion level was already 100% were predicted as less than 100% by the prediction model, resulting in a continuous descent of the drill in those areas and rupturing the membrane, as shown in Fig. 7-(b). This indicates that we should further refine the neural network for higher recognition accuracy while keeping it real-time.

Furthermore, the experimental results also demonstrate the possibility of generalizing the system to mouse skull drilling experiments. It should be noted that the trajectory planning block of the system can be transferred directly to experiments on mouse skulls, whereas transfer learning is additionally necessary to generalize the completion level recognition block since the training datasets differ.

Figure 7: Two failure cases from the experiments. (a) A failure case caused by a tilted initial pose; (b) a failure case caused by prediction errors of the completion level.

V Conclusion

In this paper, we proposed an autonomous robotic drilling system for cranial window creation. To achieve this, a new trajectory planning algorithm that avoids overshoot was introduced. To adjust the generated trajectory in real time, we applied a neural network that recognizes the drilling completion level in real time. In the experiments, we showed that our robotic drilling system is able to achieve autonomous drilling at a reasonable success rate and speed with an egg model, demonstrating the potential of the system for the mouse cranial window creation procedure.

Future work includes increasing the drilling speed by improving the trajectory planner to accommodate initially tilted eggs, improving the success rate of drilling by including multimodal information such as contact force, and applying the system to drilling mouse skulls.

References

  • [1] H. Koike, K. Iwasawa, R. Ouchi, M. Maezawa, K. Giesbrecht, N. Saiki, A. Ferguson, M. Kimura, W. L. Thompson, J. M. Wells, A. M. Zorn, and T. Takebe, “Modelling human hepato-biliary-pancreatic organogenesis from the foregut–midgut boundary,” Nature, vol. 574, no. 7776, pp. 112–116, sep 2019.
  • [2] P. T. Ly, A. Lucas, S. H. Pun, A. Dondzillo, C. Liu, A. Klug, and T. C. Lei, “Robotic stereotaxic system based on 3d skull reconstruction to improve surgical accuracy and speed,” jan 2020.
  • [3] L. Ghanbari, M. L. Rynes, J. Hu, D. S. Schulman, G. W. Johnson, M. Laroque, G. M. Shull, and S. B. Kodandaramaiah, “Craniobot: A computer numerical controlled robot for cranial microsurgeries,” Scientific Reports, vol. 9, no. 1, jan 2019.
  • [4] N. Pak, J. H. Siegle, J. P. Kinney, D. J. Denman, T. J. Blanche, and E. S. Boyden, “Closed-loop, ultraprecise, automated craniotomies,” Journal of Neurophysiology, vol. 113, no. 10, pp. 3943–3953, jun 2015.
  • [5] M. Marques Marinho, J. José Quiroz-Omaña, and K. Harada, “Design and validation of a multi-arm robotic platform for scientific exploration,” arXiv e-prints, pp. arXiv–2210, 2022.
  • [6] L. Andreoli, H. Simplício, and E. Morya, “Egg model training protocol for stereotaxic neurosurgery and microelectrode implantation,” World Neurosurgery, vol. 111, pp. 243–250, 2018.
  • [7] Y. Hu, H. Jin, L. Zhang, P. Zhang, and J. Zhang, “State recognition of pedicle drilling with force sensing in a robotic spinal surgical system,” IEEE/ASME Transactions on Mechatronics, vol. 19, no. 1, pp. 357–365, feb 2014.
  • [8] Y. Dai, Y. Xue, and J. Zhang, “Drilling electrode for real-time measurement of electrical impedance in bone tissues,” Annals of Biomedical Engineering, vol. 42, no. 3, pp. 579–588, nov 2013.
  • [9] Z. Ying, L. Shu, and N. Sugita, “Autonomous penetration perception for bone cutting during laminectomy,” in 2020 8th IEEE RAS/EMBS International Conference for Biomedical Robotics and Biomechatronics (BioRob).   IEEE, nov 2020.
  • [10] B. M. Pohl, A. Schumacher, and U. G. Hofmann, “Towards an automated, minimal invasive, precision craniotomy on small animals,” in 2011 5th International IEEE/EMBS Conference on Neural Engineering.   IEEE, apr 2011.
  • [11] N. Dvornik, K. Shmelkov, J. Mairal, and C. Schmid, “Blitznet: A real-time deep network for scene understanding,” 2017.
  • [12] R. Girshick, “Fast r-cnn,” 2015.
  • [13] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” 2015.
  • [14] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask r-cnn,” 2017.
  • [15] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “Ssd: Single shot multibox detector,” 2015.
  • [16] C.-Y. Fu, W. Liu, A. Ranga, A. Tyagi, and A. C. Berg, “Dssd : Deconvolutional single shot detector,” 2017.
  • [17] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 2014.
  • [18] H. Noh, S. Hong, and B. Han, “Learning deconvolution network for semantic segmentation,” 2015.
  • [19] I. Kokkinos, “Ubernet: Training a ‘universal’ convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory,” 2016.
  • [20] C. Kruger, “Constrained cubic spline interpolation,” Chemical Engineering Applications, vol. 1, no. 1, 2003.
  • [21] B. V. Adorno and M. M. Marinho, “DQ Robotics: a library for robot modeling and control,” IEEE Robotics and Automation Magazine (RAM), vol. 28, no. 3, pp. 102–116, Sep. 2021, invited for presentation at IROS’21. [Online]. Available: https://arxiv.org/pdf/1910.11612