
Visualization of fully connected layer weights in deep learning CT reconstruction

Qiyang Zhang1,2, Dong Liang1,2 1 Paul C Lauterbur Research Center for Biomedical Imaging, Research Center for Medical Artificial Intelligence, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China 2 Paul C Lauterbur Research Center for Biomedical Imaging, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China [email protected],[email protected]
Abstract

Recently, the use of deep learning techniques to reconstruct computed tomography (CT) images has become a hot research topic, including sinogram domain methods, image domain methods, and sinogram domain to image domain methods, all of which have achieved favorable results. In this article, we study the important function of the fully connected layers used in the sinogram domain to image domain approach. First, we present a simple domain mapping neural network. Then, we analyze the role of the fully connected layers of this network and visually analyze their weights. By visualizing the weights of the fully connected layer, we find that its main role is to implement the back projection step of CT reconstruction. This finding has important implications for the use of deep learning techniques to reconstruct CT images. For example, since fully connected layer weights consume huge memory resources, the back projection step can instead be implemented with an analytical algorithm embedded in the network, avoiding this resource consumption.

Keywords: fully connected layer, computed tomography, neural network (NN)

1 Introduction

Currently, neural networks and deep learning have completely changed the traditional ways of signal processing, image processing, image recognition, and related fields. Many of these successes have been applied to the medical field (????????). The authors of a recent article (?) also described the vision of using machine learning to create new CT image reconstruction algorithms that improve upon conventional analytical and iterative methods. More and more CT reconstruction research now focuses on deep learning (DL) methods, including sinogram domain methods (??), image domain methods (??) and sinogram domain to image domain methods (??). In particular, the domain transformation approach has great potential to remove noise and artifacts simultaneously during image reconstruction (?). However, there is currently no clear mathematical explanation for each part of the domain mapping network. In this article, we use visualization techniques to study the function of the fully connected layer used in the domain mapping network.

2 Methodology

2.1 Prepare and train the simple network

In order to explain what the fully connected layer of a network like AUTOMAP (?) learns in CT reconstruction, we built a simple network, as illustrated in Fig. 1, with only one fully connected layer followed by five convolutional layers. Each convolutional layer has 128 filters, except for the final layer, which has only 1 filter. A 3×3 filter with a stride of 1 is used for all convolutional layers. The activation function for all layers is tanh. The loss function is a simple squared loss between the network output and the target image intensity values. Since we only want to explain the role of the fully connected layer, all CT images were downsampled to 64×64. A fan-beam CT imaging geometry was simulated for this simple network. The source to detector distance was 1500.00 mm, and the source to rotation center distance was 1000.00 mm. There were 128 detector elements, each with a dimension of 1.6 mm. To make the CT images fit this simulated imaging geometry, we further assumed that all CT images have the same pixel dimension of 1.0 mm × 1.0 mm. Forward projections, i.e., Radon projections, were collected from 90 views with a 4.00 degree angular interval. Notice that this imaging geometry is simulated only to explain the role of the fully connected layer. Also be aware that no noise is added in these simulations.

Figure 1: The architecture of the simple network. The network has only one fully connected layer followed by five convolutional layers; each convolutional layer has 128 filters, except for the final layer, which has only 1 filter. The weight kernels for all convolutional layers are 3×3 with a stride of 1. The three numbers in each box denote the image column number, image row number and the channel number, respectively. The activation function for all layers is tanh.
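For reference, a minimal sketch of such a network is given below in TensorFlow/Keras. The paper only states that TensorFlow was used; the specific layer API, the "same" padding that keeps the feature maps at 64×64, and the variable names are our assumptions.

```python
import tensorflow as tf

# A minimal sketch of the network in Fig. 1. The exact layer API is our
# assumption; the paper only states that TensorFlow was used.
def build_simple_network():
    sino = tf.keras.Input(shape=(90 * 128,))                      # flattened sinogram (11520)
    x = tf.keras.layers.Dense(64 * 64, activation="tanh")(sino)   # FC1: 11520 -> 4096
    x = tf.keras.layers.Reshape((64, 64, 1))(x)                   # back to a 64x64 image
    for _ in range(4):                                            # four conv layers, 128 filters each
        x = tf.keras.layers.Conv2D(128, 3, strides=1, padding="same",
                                   activation="tanh")(x)
    out = tf.keras.layers.Conv2D(1, 3, strides=1, padding="same",
                                 activation="tanh")(x)            # final conv layer, 1 filter
    return tf.keras.Model(sino, out)

model = build_simple_network()
```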

As illustrated in Fig. 1, the input sinogram, before being fed to the fully connected layer, needs to be reshaped from two dimensions (90×128) to one dimension (11520). Likewise, the output of the fully connected layer (FC1), before being fed to the convolutional layers, needs the reverse change from one dimension (4096) to two dimensions (64×64).

The network was trained with the Adam algorithm (?) with a starting learning rate of 10⁻⁵. The learning rate was exponentially decayed by a factor of 0.96 after every 1000 steps. The mini-batch size was 60, and batch shuffling was turned on to increase the randomness of the training data. The network was trained for 200 epochs on the TensorFlow deep learning framework using a single graphics processing unit (GPU, NVIDIA GeForce GTX 1080Ti) with 11 GB of memory.
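The training schedule described above might be expressed roughly as follows; `model` refers to the network from the previous sketch, and the exact training script from the paper is not available, so treat this as an approximation.

```python
import tensorflow as tf

# Learning-rate schedule and optimizer as described above (a sketch).
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-5,
    decay_steps=1000,
    decay_rate=0.96,
    staircase=True)                      # decay by 0.96 after every 1000 steps
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)

model.compile(optimizer=optimizer, loss="mse")   # simple squared loss
# model.fit(train_sinograms, train_images, batch_size=60, epochs=200, shuffle=True)
```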

The numerical experimental results of this simple neural network are shown in Fig. 2. A sinogram image is fed into the network and a CT reconstruction image is generated from it. Because we only use this simple network to analyze what the fully connected layer has learned, we do not perform a quantitative analysis. However, the intermediate result of the network, reshaped to be human-readable, is also shown (FC1 out). As shown in Fig. 2, the FC1 output is already somewhat close to the label.

Figure 2: Numerical experimental results of the simple neural network.

2.2 Change the shape of the fully connected layer weights

In order to observe the physical meaning of the weights learned by the network, we need to recombine the shape of the fully connected layer weights. As illustrated in Fig. 1, S_i denotes the reshaped projection data and V_j denotes the output of the fully connected layer. W_{i,j} refers to one weight of FC1 (the fully connected layer), connecting input end i to output end j. There are 11520×4096 weights.

The reshape rule from two dimensions (90×128) to one dimension (11520) of the input sinogram image I can be defined by

S_{i} = I_{p,q}, \quad \text{where } i = (p-1)\times Q + q, \quad 1 \leq p \leq P;\ 1 \leq q \leq Q \qquad (1)

where I_{p,q} represents the pixel value of the p-th row and q-th column of the input sinogram image, and P and Q represent the total number of pixels along the row and column directions. Physically, P and Q represent the total number of acquisition angles and detector cells, respectively. Here P = 90 and Q = 128.

Because the convolutional layers that follow FC1 form an end-to-end mapping network, the output data shape of the FC1 layer is the same as that of the label image. The reshape rule for the FC1 output from one dimension (4096) to two dimensions (64×64) can then be defined by

M_{c,t} = V_{j}, \quad \text{where } j = (c-1)\times T + t, \quad 1 \leq c \leq C;\ 1 \leq t \leq T \qquad (2)

where M_{c,t} represents the value of the c-th row and t-th column of the two-dimensional feature data fed to the convolutional layers, and C and T represent the total numbers of rows and columns of the feature data. Here C = 64 and T = 64.
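Equations (1) and (2) are simply row-major flattening and unflattening; a small NumPy sketch (0-based indices, variable names ours):

```python
import numpy as np

P, Q = 90, 128    # total acquisition angles and detector cells
C, T = 64, 64     # rows and columns of the FC1 output image

sinogram = np.random.rand(P, Q)      # stand-in for a real sinogram I
S = sinogram.reshape(P * Q)          # Eq. (1): S[i] = I[p, q] with i = p*Q + q (0-based)

V = np.random.rand(C * T)            # stand-in for the FC1 output vector
M = V.reshape(C, T)                  # Eq. (2): M[c, t] = V[j] with j = c*T + t (0-based)

# Spot-check the index rule of Eq. (1), written 0-based:
p, q = 10, 37
assert S[p * Q + q] == sinogram[p, q]
```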

Figure 3: The reshape rules of the fully connected layer weights.

Following the reshape rules described above, we can transform the shape of the fully connected layer weights to match the rules of back projection in CT reconstruction. In this simulated fan-beam CT imaging geometry the sinogram data were collected from 90 views with a 4.00 degree angular interval, the detector has 128 cells and the CT image has dimensions of 64×64. We can therefore define a cell matrix H with dimensions K×L, as shown in Fig. 3, where K denotes the total number of columns of the CT image and L denotes the total number of collected views. Here K = 64 and L = 90. Each cell has dimensions A×B, where A denotes the total number of rows of the CT image and B denotes the total number of detector cells. Here A = 64 and B = 128.

We use H^{k,l} to represent each element of the cell matrix H and H^{k,l}_{a,b} to represent each element of cell H^{k,l}. Thus, H^{k,l}_{a,b} can represent each element of the fully connected layer weights:

H^{k,l}_{a,b} = W_{i,j}, \quad \text{where } i = (l-1)\times B + b,\ j = (a-1)\times K + k, \quad 1 \leq k \leq K;\ 1 \leq l \leq L;\ 1 \leq a \leq A;\ 1 \leq b \leq B \qquad (3)
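Under the index convention of Eq. (3), the 11520×4096 weight matrix can be rearranged into the cell structure H with plain array reshapes. The following NumPy sketch assumes the common Keras kernel layout of (input dimension, output dimension) and uses 0-based indices:

```python
import numpy as np

K, L, A, B = 64, 90, 64, 128   # image columns, views, image rows, detector cells

# W is the FC1 weight matrix with shape (11520, 4096) = (inputs i, outputs j);
# in Keras it could be read with model.get_layer(...).get_weights()[0] (assumption).
W = np.random.rand(L * B, A * K)   # random stand-in for the trained weights

# Rearrange W[i, j] into the cell structure H[k, l][a, b] of Eq. (3), 0-based:
#   i = l*B + b,   j = a*K + k
H = W.reshape(L, B, A, K).transpose(3, 0, 2, 1)   # shape (K, L, A, B)

# One weight map: fix a view l and a detector b, look at the map over pixels (a, k).
weight_map = H[:, 11, :, 63].T     # 12th view, 64th detector (0-based indices)
print(weight_map.shape)            # (A, K) = (64, 64)
```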

3 Visualization of weights

In this section, we only look at what the weight maps (feature maps) look like; we do not perform a quantitative analysis. Therefore, the display windows of the images shown in this section are not identical, but are adjusted so that the feature shapes are comfortable to see.

First, let us visualize the weights that connect one fixed detector unit to the pixels of the final reconstructed CT image at different acquisition angles. To make the structure more obvious, we selected the 64th detector unit (H^{k,l}_{a,b}, b=64). Let (H^{k,l}_{a,b}, b=64; l=1; k=1,2,...,K; a=1,2,...,A) be the first (l=1) back projection view of the CT reconstruction weight maps, (H^{k,l}_{a,b}, b=64; l=2; k=1,2,...,K; a=1,2,...,A) be the second (l=2), ..., and (H^{k,l}_{a,b}, b=64; l=90; k=1,2,...,K; a=1,2,...,A) be the last (l=90) back projection view of the CT reconstruction weight maps. These maps are displayed in the right image of Fig. 4. The image on the left of Fig. 4 is the weight map directly calculated by the analytic back projection algorithm under the same rules.

Figure 4: Weight maps connecting the 64th detector unit to the pixels of the final reconstructed CT image at different acquisition angles. The left weight map is calculated by the analytic algorithm under the same rules as the right fully connected layer weight map. In both maps, from top to bottom and from left to right, the panels show the 1st, 2nd, ..., 90th back projection view weights.

Second, let us visualize the weights that connect different detector units to the pixels of the final reconstructed CT image at a fixed acquisition angle. Here we select the 12th view for visualization (H^{k,l}_{a,b}, l=12); of course, other views could also be chosen. Let (H^{k,l}_{a,b}, b=1; l=12; k=1,2,...,K; a=1,2,...,A) be the first (b=1) detector unit weight map at the 12th back projection view of the CT reconstruction, (H^{k,l}_{a,b}, b=2; l=12; k=1,2,...,K; a=1,2,...,A) be the second (b=2), ..., and (H^{k,l}_{a,b}, b=128; l=12; k=1,2,...,K; a=1,2,...,A) be the last (b=128) detector unit weight map at the 12th back projection view. These maps are displayed in the right image of Fig. 5. The image on the left of Fig. 5 is the weight map directly calculated by the analytic back projection algorithm under the same rules.

Figure 5: Weight maps connecting different detector units to the pixels of the final reconstructed CT image at a fixed acquisition angle (the 12th view). The left weight map is calculated by the analytic algorithm under the same rules as the right weight map. In both maps, from top to bottom and from left to right, the panels show the 1st, 2nd, ..., 128th detector unit weights at the 12th back projection view of the CT reconstruction.
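A sketch of how montages such as those in Fig. 4 and Fig. 5 could be produced from the rearranged weight array H of the previous snippet; the grid layout, colormap and per-map display windows are our choices, not the paper's.

```python
import numpy as np
import matplotlib.pyplot as plt

# H: the (K, L, A, B) weight array from the previous snippet; a random
# stand-in is used here so the sketch runs on its own.
H = np.random.rand(64, 90, 64, 128)

def show_weight_maps(H, fixed_detector=None, fixed_view=None, cols=10):
    """Montage of 64x64 weight maps: all views for one detector (Fig. 4 style)
    or all detectors for one view (Fig. 5 style)."""
    if fixed_detector is not None:
        maps = [H[:, l, :, fixed_detector].T for l in range(H.shape[1])]
    else:
        maps = [H[:, fixed_view, :, b].T for b in range(H.shape[3])]
    rows = (len(maps) + cols - 1) // cols
    fig, axes = plt.subplots(rows, cols, figsize=(cols, rows))
    for ax in axes.ravel():
        ax.axis("off")
    for ax, m in zip(axes.ravel(), maps):
        ax.imshow(m, cmap="gray")    # display window auto-scaled per map
    plt.show()

show_weight_maps(H, fixed_detector=63)   # Fig. 4: 64th detector unit, all 90 views
show_weight_maps(H, fixed_view=11)       # Fig. 5: 12th view, all 128 detector units
```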

From Fig. 4 and Fig. 5, we can see that the weight maps of the fully connected layer of this trained network closely match those of the analytic algorithm; both are very sparse. The fully connected layer implements the mapping from the sinogram domain to the image domain and also performs some filtering. In Fig. 2, the FC1 output image appears to be numerically inverted compared with the label image and the network output image. This phenomenon can also be seen in Fig. 4 and Fig. 5. Because the back propagation process of deep learning affects all weight parameters, including both fully connected layer weights and convolutional layer weights, such inverted and partially filtered weights may be learned at the fully connected layer. From the above analysis, we can conclude that the main role of the fully connected layer is to implement the back projection function in sinogram domain to image domain mapping CT reconstruction networks.

4 Discussion

The main problem of using neural networks to realize back projection is that it consumes huge memory resources. Even for this simple network, which can only be used to reconstruct CT images of 64×64 size, the number of weights in the fully connected layer is about 45M; if each weight is a 4-byte float, the memory consumption is about 180 MB. For a clinical CT image size of 512×512 with 360 collection views and 768 detector units, the number of weights in a single fully connected layer would exceed 69G. With today's graphics memory technology, it is impossible to store such a large amount of data for high-speed computation. Therefore, this approach is currently impractical for clinical application.
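For concreteness, these memory figures can be checked with a few lines of arithmetic; we assume 4-byte (float32) weights, and note that binary (GiB) versus decimal (GB) units shift the exact numbers slightly.

```python
# Back-of-the-envelope check of the memory figures above (float32 = 4 bytes).
toy_weights = (90 * 128) * (64 * 64)            # 11,520 x 4,096 = 47,185,920 (~45M)
toy_mbytes = toy_weights * 4 / 2**20            # ~180 MiB

# Hypothetical clinical geometry quoted in the text:
# 512x512 image, 360 views, 768 detector units.
clinical_weights = (360 * 768) * (512 * 512)    # ~7.2e10 weights
clinical_gbytes = clinical_weights * 4 / 2**30  # ~270 GiB for a single FC layer

print(f"toy FC layer:      {toy_weights:,} weights, {toy_mbytes:.0f} MiB")
print(f"clinical FC layer: {clinical_weights:,} weights, {clinical_gbytes:.0f} GiB")
```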

Visualizing the weights of the fully connected layer shows that back projection can be learned by neural networks. This means that we can instead implement back projection directly with analytic algorithms, so that the network can focus on learning more complex problems, such as noise reduction and artifact suppression.

References


  • Chen H, Zhang Y, Kalra M K, Lin F, Chen Y, Liao P, Zhou J & Wang G 2017 IEEE Transactions on Medical Imaging 36(12), 2524–2535.
  • Gulshan V, Peng L, Coram M, Stumpe M C, Wu D, Narayanaswamy A, Venugopalan S, Widner K, Madams T, Cuadros J et al. 2016 JAMA 316(22), 2402–2410.
  • Han Y & Ye J C 2018 IEEE Transactions on Medical Imaging 37(6), 1418–1429.
  • Jin K H, McCann M T, Froustey E & Unser M 2017 IEEE Transactions on Image Processing 26(9), 4509–4522.
  • Kang E, Chang W, Yoo J & Ye J C 2018 IEEE Transactions on Medical Imaging 37(6), 1358–1369.
  • Kingma D P & Ba J 2014 arXiv preprint arXiv:1412.6980.
  • Krishnan A, Zhang R, Yao V, Theesfeld C L, Wong A K, Tadych A, Volfovsky N, Packer A, Lash A & Troyanskaya O G 2016 Nature Neuroscience 19(11), 1454.
  • Lee H, Lee J, Kim H, Cho B & Cho S 2018 arXiv preprint arXiv:1803.00694.
  • Pelt D M & Batenburg K J 2013 IEEE Transactions on Image Processing 22(12), 5238–5251.
  • Shan H, Zhang Y, Yang Q, Kruger U, Kalra M K, Sun L, Cong W & Wang G 2018 IEEE Transactions on Medical Imaging 37(6), 1522–1534.
  • Wang G 2016 arXiv preprint arXiv:1609.04375.
  • Wang G, Ye J C, Mueller K & Fessler J A 2018 IEEE Transactions on Medical Imaging 37(6), 1289–1296.
  • Würfl T, Hoffmann M, Christlein V, Breininger K, Huang Y, Unberath M & Maier A K 2018 IEEE Transactions on Medical Imaging 37(6), 1454–1463.
  • Yang Q, Yan P, Zhang Y, Yu H, Shi Y, Mou X, Kalra M K, Zhang Y, Sun L & Wang G 2018 IEEE Transactions on Medical Imaging.
  • Zhang Z, Liang X, Dong X, Xie Y & Cao G 2018 IEEE Transactions on Medical Imaging 37(6), 1407–1417.
  • Zhu B, Liu J Z, Cauley S F, Rosen B R & Rosen M S 2018 Nature 555(7697), 487.