\cftpagenumbersoff

figure \cftpagenumbersofftable

Spatiotemporal Image Reconstruction to Enable High-Frame Rate Dynamic Photoacoustic Tomography with Rotating-Gantry Volumetric Imagers

Refik Mert Cam University of Illinois Urbana-Champaign, Department of Electrical & Computer Engineering, 306 North Wright Street, Urbana, IL, United States, 61801 Chao Wang Department of Statistics and Data Science, National University of Singapore, Singapore Weylan Thompson PhotoSound Technologies Inc., Houston, TX 77036, USA Sergey A. Ermilov PhotoSound Technologies Inc., Houston, TX 77036, USA Mark A. Anastasio University of Illinois Urbana-Champaign, Department of Bioengineering, 1406 West Green Street, Urbana, IL, United States, 61801 Umberto Villa The University of Texas at Austin, Oden Institute for Computational Engineering & Sciences, 201 East 24th Street, Austin, TX, United States, 78712

Abstract

Significance: Dynamic photoacoustic computed tomography (PACT) is a valuable imaging technique for monitoring physiological processes. However, current dynamic PACT imaging techniques are often limited to two-dimensional (2D) spatial imaging. While volumetric PACT imagers are commercially available, these systems typically employ a rotating measurement gantry in which the tomographic data are sequentially acquired, as opposed to being acquired simultaneously at all views. Because the dynamic object varies during the data-acquisition process, the sequential data-acquisition process poses substantial challenges to image reconstruction associated with data incompleteness. The proposed image reconstruction method is highly significant in that it will address these challenges and enable volumetric dynamic PACT imaging with existing preclinical imagers.

Aim: The aim of this study is to develop a spatiotemporal image reconstruction method for dynamic PACT that can be applied to commercially available volumetric PACT imagers that employ a sequential scanning strategy. The proposed reconstruction method aims to overcome the challenges caused by the limited number of tomographic measurements acquired per frame.

Approach: A low-rank matrix estimation-based spatiotemporal image reconstruction method (LRME-STIR) is proposed to enable dynamic volumetric PACT. The LRME-STIR method leverages the spatiotemporal redundancies in the dynamic object to accurately reconstruct a four-dimensional (4D) spatiotemporal image.

Results: The conducted numerical studies substantiate the LRME-STIR method’s efficacy in reconstructing 4D dynamic images from tomographic measurements acquired with a rotating measurement gantry. The experimental study demonstrates the method’s ability to faithfully recover the flow of a contrast agent at a frame rate of 0.1 seconds, even when only a single tomographic measurement per frame is available.

Conclusions: The proposed LRME-STIR method offers a promising solution to the challenges faced by enabling 4D dynamic imaging using commercially available volumetric PACT imagers. By enabling accurate spatiotemporal image reconstructions, this method has the potential to significantly advance preclinical research and facilitate the monitoring of critical physiological biomarkers.

keywords:

Photoacoustic computed tomography, Optoacoustic tomography, Spatiotemporal image reconstruction, Dynamic imaging, Small animal imaging, Low-rank matrix estimation

†Refik Mert Cam and Chao Wang are equal contributors to this work and designated as co-first authors.

*Umberto Villa, \linkable[email protected]

1 Introduction

Photoacoustic computed tomography (PACT), also referred to as optoacoustic tomography, is an emerging and promising imaging modality with broad applications in the field of biomedical imaging [1, 2, 3]. By combining the high spatial resolution of ultrasound imaging with the high soft tissue contrast of optical imaging, PACT offers unique advantages for imaging biological structures while avoiding the use of ionizing radiation. In PACT, a fast laser pulse in the near infrared range illuminates the object. Absorption of optical energy by various molecules within the object (chromophore) induces a localized increase of acoustic pressure through the photoacoustic effect. The acoustic wavefield propagating through the object and coupling medium (water) is subsequently detected by ultrasonic transducers. The measured wavefield data can then be utilized to reconstruct an image that depicts the initial induced pressure distribution within the object.

In preclinical and clinical research, the ability to monitor dynamic physiological processes is of utmost importance for comprehending the progression of diseases and developing new treatments [4, 5, 6]. For example, tumor vascular perfusion is one such dynamic processes that is critical in the study of cancers. High vascular perfusion is indicative of angiogenesis, a well-established hallmark of cancerous growth[7, 8]. Due to its noninvasive nature and the combination of optical contrast and spatial resolution at depths beyond the optical diffusion limit, PACT represents a promising imaging modality for monitoring critical dynamic physiological processes in preclinical and clinical research [9, 10, 11, 12, 13].

Despite its considerable promise, current dynamic PACT technologies suffer from fundamental limitations. They often target two-dimensional (2D) spatial imaging due to shorter data acquisition times and computationally less demanding image reconstruction compared to 3D imaging [14, 15, 16, 17, 11, 18]. Most existing 3D PACT imagers developed to-date utilize a rotating measurement geometry in which the tomographic data are sequentially acquired [19, 2, 20, 21], as opposed to being acquired simultaneously at all views. This design is advantageous because it reduces system costs by employing a limited number of acoustic transducers and associated electronics. However, data-acquisition times for a complete tomographic scan can be tens of seconds. Due to the relatively slow rotational speed, the temporal resolution is significantly limited. While enhancing temporal resolution by using sparsely sampled tomographic data is possible, the associated dynamic image reconstruction problem becomes ill-posed and highly challenging.

Previous studies on dynamic PACT[14, 15, 16, 17, 11, 18, 10, 22], have primarily focused on scenarios where the sufficiently sampled tomographic data can be rapidly acquired. In such cases, a straightforward approach is to employ a frame-by-frame image reconstruction (FBFIR) method[14, 15, 16, 17, 11, 18, 10]. These techniques utilize conventional static image reconstruction methods to estimate a sequence of images from sufficiently sampled tomographic data. The temporal resolution is limited by the duration of the complete data acquisition process. Rapid data acquisition is feasible either with 2D PACT imaging or by leveraging a dense, albeit expensive, static transducer array in 3D imaging. However, for volumetric imagers with sequential scanning strategy, FBFIR methods are not applicable, primarily due to the extended time required to accumulate the complete set of tomographic measurements.

On the other hand, spatiotemporal image reconstruction (STIR) methods estimate a sequence of images simultaneously instead of frame-by-frame, and they have demonstrated their efficacy in accurately reconstructing dynamic objects from sparsely sampled data in various medical imaging modalities, including computed tomography[23], positron emission tomography[24], single photon emission computed tomography[25], and magnetic resonance imaging[26]. While a few STIR techniques have been proposed for PACT, some of them still presume the availability of sufficiently sampled tomographic data and strive for enhanced accuracy and/or reduced computational complexity compared to FBFIR techniques[27]. The STIR methods[28, 29, 30] considering sparsely sampled tomographic data, have relied on the principles of compressed sensing and required specific data sampling schemes that are different than the sequential sampling schemes employed by currently available volumetric imagers.

This study introduces a novel spatiotemporal image reconstruction method based on low-rank matrix estimation (LRME-STIR), which is applicable to currently available sequential 3D PACT imaging systems without requiring hardware modifications. By employing the LRME-STIR technique, it becomes possible to overcome the challenges caused by the sparsely sampled tomographic data. The proposed approach holds the potential to advance the field by enabling accurate and efficient spatiotemporal image reconstruction, thereby facilitating the monitoring of dynamic physiological changes using PACT.

The remainder of the paper is organized as follows. Section 2 presents an imaging model for sequential volumetric imagers and introduces the inverse problem formulation. The proposed LRME-STIR method is described in Section 3. The conducted numerical and experimental studies, and the results are provided in Sections 4 and 5, respectively. Finally, the paper concludes with a discussion in Section 6.

2 Imaging Model for Sequential Volumetric Imagers and Inverse Problem Formulation

In the context of PACT, the sequential data acquisition strategy commonly involves utilizing one or more rotating or translating ultrasonic transducer arrays within single or multiple acoustic probes[19, 2, 20]. This approach facilitates data collection by rotating the probes along a fixed axis, resulting in the acquisition of a few tomographic measurements at each step, as depicted in Figure 1. These measurements are accumulated sequentially to form a complete tomographic measurement set.

Refer to caption — Figure 1: Schematic illustrating the sequential scanning strategy employing a rotating acoustic probe around the object.

During each step of sequential data acquisition, the object can be considered as static (quasi-static assumption), which is justified by the negligible data acquisition time during each step (on the order of $10^{-4}$ s), significantly shorter than the time between consecutive measurements (on the order of $10^{-1}$ s). An object frame is defined as the short period of time when the object is considered static. The sequence of object frames constitutes the dynamic object. Essentially, each data acquisition step, hereafter referred to as an imaging frame, corresponds to an object frame. Although object and imaging frames may seem interchangeable, the distinction lies in the fact that the set of imaging frames is essentially a subset of the set of object frames because the dynamic object might not be imaged throughout all object frames. Under the quasi-static assumption, the time-dependent object function at the $k$ -th imaging frame, specifically the dynamic induced initial pressure distribution, can be represented as $f_{k}(\boldsymbol{r})=f(\boldsymbol{r},k\Delta t)$ for $k=1,...,K$ . Here, $K$ represents the number of imaging frames, and $\Delta t$ is the time interval between consecutive imaging frames, which is equal to the laser repetition rate. Finally, the rotation speed of the gantry determines the angular spacing between imaging frames.

Under the quasi-static assumption, the data acquisition process at each imaging frame can be described by a continuous-to-discrete (C-D) imaging model as[31, 27]:

[\boldsymbol{g}_{k}]_{(q-1)P+p}=\frac{1}{\Omega_{qk}}\int_{\Omega_{qk}}d\boldsymbol{r}_{qk}^{\prime}\frac{1}{4\pi}\int_{V}d\boldsymbol{r}\frac{f_{k}(\boldsymbol{r})}{c_{0}^{2}}\frac{d}{d\tau}\frac{\delta(\tau-\frac{|\boldsymbol{r}_{qk}^{\prime}-\boldsymbol{r}|}{c_{0}})}{|\boldsymbol{r}_{qk}^{\prime}-\boldsymbol{r}|}\Bigg{|}_{\tau=p\Delta T},\ \genfrac{}{}{0.0pt}{}{p=1,2,\dots,P}{q=1,2,\dots,Q,}

(1)

where $\tau$ denotes the fast-time (i.e., the arrival time of acoustic signals to the transducer), and $\Delta T$ corresponds to the fast-time sampling interval. The object function at the $k$ -th imaging frame, $f_{k}(\boldsymbol{r})$ , is assumed to be bounded and contained within the volume $V$ . The scalar $c_{0}$ denotes the speed of sound, which is assumed to be constant throughout the volume $V$ . The quantities $\boldsymbol{r}$ and $\boldsymbol{r}_{qk}^{\prime}$ specify the spatial coordinates within $V$ and the location of the $q$ -th transducer at the $k$ -th imaging frame, respectively. The vector $\boldsymbol{g}_{k}\in\mathbb{R}^{PQ}$ represents lexicographically ordered pressure traces measured by the transducers at the $k$ -th imaging frame. Here, $Q$ stands for the number of transducers, and $P$ represents the number of electrical signals recorded by each transducer. The notation $[\boldsymbol{g}_{k}]_{(q-1)P+p}$ refers to the $(q-1)P+p$ -th entry of the measurement vector $\boldsymbol{g}_{k}$ . Here, the integer-valued indices $q$ and $p$ denote the transducer index and temporal sample, and $\Omega_{qk}$ denotes the detection area of the $q$ -th transducer at the $k$ -th imaging frame. When the transducer size is small and/or the object is located near the center of a relatively large measurement geometry, an idealized point-like transducer model can be assumed and the surface integral over $\Omega_{qk}$ can be neglected[27].

To facilitate the implementation of an iterative image reconstruction algorithm, a discrete-to-discrete (D-D) imaging model is defined as follow. The spatially continuous object functions $f_{k}$ corresponding to the $k$ -th imaging frame are approximated by use of a finite linear combination of spatial expansion functions $\{\psi_{n}(\boldsymbol{r})\}_{n=1}^{N}$ :

f_{k}^{(N)}(\boldsymbol{r})=\sum_{n=1}^{N}\alpha_{n}^{k}\psi_{n}(\boldsymbol{r}),\quad k=1,\dots,K,

(2)

where $N$ denotes the number of spatial expansion functions. In this study, the expansion functions are piecewise trilinear Lagrangian functions defined on a uniform Cartesian grid. Their expression is given by[31]:

\psi_{n}(\boldsymbol{r})=\begin{cases}\left(1-\frac{|x-x_{n}|}{\Delta s}\right)\left(1-\frac{|y-y_{n}|}{\Delta s}\right)\left(1-\frac{|z-z_{n}|}{\Delta s}\right),&\text{if $|x-x_{n}|$, $|y-y_{n}|$, $|z-z_{n}|\leq\Delta s$}\\ 0,&\text{otherwise.}\end{cases}

(3)

Here, $\boldsymbol{r}=(x,y,z)$ denotes the spatial coordinate and $\boldsymbol{r}_{n}=(x_{n},y_{n},z_{n})$ specifies the location of the $n$ -th node of the uniform Cartesian grid. The parameter $\Delta s$ indicates the distance between adjacent grid points.

The coefficients $\{\alpha_{n}^{k}\}_{n=1,k=1}^{NK}$ can be organized into a matrix $\boldsymbol{F}\in\mathbb{R}^{N\times K}$ with entries $[\boldsymbol{F}]_{nk}\equiv\alpha_{n}^{k}$ . The $k$ -th column of $\boldsymbol{F}$ is denoted by $\boldsymbol{f}_{k}\in\mathbb{R}^{N}$ and represents the discrete approximation of the object function at the $k$ -th imaging frame. That is,

\boldsymbol{F}=\sum_{k=1}^{K}\boldsymbol{f}_{k}\otimes\boldsymbol{e}_{k}=\begin{bmatrix}\alpha_{1}^{1}&\dots&\alpha_{1}^{K}\\ \vdots&\ddots&\vdots\\ \alpha_{N}^{1}&\dots&\alpha_{N}^{K}\end{bmatrix},

(4)

where $\boldsymbol{e}_{k}\in\mathbb{R}^{K}$ represents the $k$ -th column of the identity matrix in $\mathbb{R}^{K}$ and $\otimes$ denotes the vector outer product. Correspondingly, a frame-dependent D-D imaging model accounting for measurement noise can be expressed as:

\underline{\boldsymbol{g}}_{k}=\boldsymbol{H}_{k}\boldsymbol{f}_{k}+\boldsymbol{\eta}_{k},\quad k=1,2,\dots K,

(5)

where $\underline{\boldsymbol{g}}_{k}$ represents the (noisy) tomographic measurements at the $k$ -th imaging frame and $\boldsymbol{\eta}_{k}$ accounts for measurement noise and modeling and discretization errors. The operator $\boldsymbol{H}_{k}\in\mathbb{R}^{QP\times N}$ stems from discretization of the C-D PACT imaging operator associated with the $k$ -th imaging frame. In particular, the vector $\boldsymbol{g}_{k}\in\mathbb{R}^{QP}$ representing the action of $\boldsymbol{H}_{k}$ on $\boldsymbol{f}_{k}$ is defined as in Eqn. (1) but with the continuous in space object function $f_{k}(\boldsymbol{r})$ replaced by its finite dimensional approximation $f_{k}^{(N)}(\boldsymbol{r})$ defined in Eqn. (2).

Given the data matrix $\underline{\boldsymbol{G}}=\sum_{k=1}^{K}\underline{\boldsymbol{g}}_{k}\otimes\boldsymbol{e}_{k}\in\mathbb{R}^{PQ\times K}$ and the set of imaging operators $\boldsymbol{H_{k}}$ ( $k=1,\ldots,K$ ) corresponding to each imaging frame, the goal of dynamic PACT image reconstruction is to find a matrix $\boldsymbol{\hat{F}}=\sum_{k=1}^{K}\boldsymbol{\hat{f}}_{k}\otimes\boldsymbol{e}_{k}\in\mathbb{R}^{N\times K}$ , whose column $\boldsymbol{\hat{f}}_{k}$ represents an object estimate at the $k$ -th imaging frame; for simplicity hereafter $\boldsymbol{\hat{f}}_{k}$ is referred to as the $k$ -th frame of the spatiotemporal object estimate. Given the sparse tomographic measurements acquired for each imaging frame, the task of dynamic image reconstruction constitutes a significantly ill-posed inverse problem. A unique estimator, $\boldsymbol{\hat{F}}$ , can be obtained by solving the following penalized least squares optimization problem:

\boldsymbol{\hat{F}}=\operatornamewithlimits{argmin}\limits_{\boldsymbol{F}\in\mathbb{R}^{N\times K}}\text{{J}}(\boldsymbol{F})\coloneqq\text{{L}}(\boldsymbol{F})+\text{{R}}(\boldsymbol{F})=\sum_{k=1}^{K}\text{{L}}_{k}(\boldsymbol{F})+\text{{R}}(\boldsymbol{F}),

(6)

where $\text{{R}}(\boldsymbol{F})$ is the regularization term, which is convex but possibly non-smooth. The total data fidelity term $\text{{L}}(\boldsymbol{F})=\sum_{k=1}^{K}\text{{L}}_{k}(\boldsymbol{F})$ is the sum of data fidelity terms $\text{{L}}_{k}(\boldsymbol{F})$ associated with the $k$ -th imaging frame. These quantities are defined as

\text{{L}}(\boldsymbol{F})=\frac{1}{2}\left\lVert\left(\sum_{k=1}^{K}(\boldsymbol{H}_{k}\boldsymbol{f}_{k})\otimes\boldsymbol{e}_{k}\right)-\underline{\boldsymbol{G}}\right\rVert^{2}_{F}\quad\text{and}\quad\text{{L}}_{k}(\boldsymbol{F})=\frac{1}{2}\left\lVert\boldsymbol{H}_{k}\boldsymbol{f}_{k}-\boldsymbol{g}_{k}\right\rVert^{2}_{2},

(7)

respectively, where $\lVert\cdot\lVert_{F}$ denotes the Frobeniuos norm.

In addition to the inherent ill-posed nature of the considered dynamic image reconstruction problem, significant implementation challenges exist. Firstly, unlike the relatively straightforward task of static image reconstruction in which a single image is estimated, spatiotemporal image reconstruction involves dealing with a considerably higher computational burden as all frames are reconstructed simultaneously, particularly when each frame is 3D in space. Secondly, as the number of frames increases, there is a growing need for memory space during computation, which necessitates effective computational strategies with minimal memory usage.

3 Low-rank Matrix Estimation-based Spatiotemporal Image Reconstruction

In numerous biomedical applications, the spatiotemporal object function of interest has been demonstrated to be effectively approximated by a small set of weights $\sigma_{r}$ , and functions $u_{r}(\boldsymbol{r})$ and $v_{r}(t)$ , depending only on the space or time variables, known as the semiseparable approximation[32, 33, 34, 35, 36, 26, 37]. Consequently, the object function can be expressed as $f(\boldsymbol{r},t)\approx\sum_{r=1}^{R}\sigma_{r}u_{r}(\boldsymbol{r})v_{r}(t)$ [32, 33, 34, 35, 36, 26, 37]. Leveraging the semiseparable approximation, the spatiotemporal reconstruction problem can be reduced to the problem of estimating $R$ weights, and $R$ spatial and temporal functions. In a discretized formulation, this is algebraically equivalent to enforcing a low-rank structure on the matrix $\boldsymbol{F}$ .

A penalty scheme can then be imposed on the nuclear norm of the spatiotemporal reconstruction matrix to promote low-rankness. Specifically, the regularization term stemming from the nuclear norm of $\boldsymbol{F}$ is defined as

\text{{R${}^{nn}$}}(\boldsymbol{F})=\|\boldsymbol{F}\|_{*}=\sum_{r=1}^{\min(N,K)}\sigma_{r},

(8)

where $\sigma_{r}$ ( $r=1,\ldots,\min(N,K)$ ) represent the singular values of $\boldsymbol{F}$ . This approach not only effectively regularizes the ill-posed inverse problem [38] but also reduces memory demands without sacrificing accuracy. Rather than explicitly storing $\boldsymbol{F}$ in memory, its truncated singular value decomposition $\boldsymbol{U}_{R}\boldsymbol{\Sigma}_{R}\boldsymbol{V}_{R}^{T}$ is stored, resulting in decreased memory requirement. Here, $R<\min(N,K)$ denotes the truncation index, $\boldsymbol{\Sigma}_{R}$ is the diagonal matrix comprising the largest $R$ singular values, $\boldsymbol{U}_{R}$ and $\boldsymbol{V}_{R}$ are matrices with orthonormal columns collecting the left and right singular vectors, respectively, corresponding to the $R$ largest singular vaslues. This approach not only facilitates an efficient approximation of $\boldsymbol{F}$ but also enforces the space-time semiseparability. Specifically, the columns of $\boldsymbol{U}_{R}$ and $\boldsymbol{V}_{R}$ are the algebraic counterpart of the functions $u_{r}(\boldsymbol{r})$ and $v_{r}(t)$ , respectively.

To further regularize the inverse problem, under the assumption that the object undergoes a smooth and slow temporal change, another regularization scheme penalizing the difference between two consecutive frames can be employed. This can be accomplished by penalizing the squared Frobenius norm of the temporal difference matrix, which is expressed through the temporal (forward) difference operator, denoted as $\boldsymbol{D}\in\mathbb{R}^{K\times(K-1)}$ , applied to $\boldsymbol{F}$ :

\text{{R${}^{t}$}}(\boldsymbol{F})=\frac{1}{2}\left\lVert\boldsymbol{F}\boldsymbol{D}\right\rVert^{2}_{F},

(9)

where, $\boldsymbol{D}$ is defined as:

\boldsymbol{D}=\big{[}\begin{array}[]{c|c|c|c|c}\boldsymbol{d}_{1}&\cdots&\boldsymbol{d}_{k}&\cdots&\boldsymbol{d}_{K-1}\end{array}\big{]}=\begin{bmatrix}-1&0&0&0&\dots\\ 1&-1&0&0&\dots\\ 0&1&-1&0&\dots\\ \vdots&\vdots&\vdots&\vdots&\vdots\\ 0&\dots&0&0&-1\\ 0&\dots&0&0&1\end{bmatrix}.

(10)

Here, $\boldsymbol{d_{k}}$ corresponds to the $k$ -th column of the temporal (forward) difference operator, $\boldsymbol{D}$ . Within this column, the $k$ -th and $(k+1)$ -th elements possess values of $-1$ and $1$ , respectively, while all other elements are set to $0$ .

In this way, the sought after estimate of the dynamic reconstruction problem with consideration of a maximum rank constraint and temporal and nuclear norm penalties can be formulated as:

\displaystyle\hat{\boldsymbol{F}}

\displaystyle=\operatornamewithlimits{argmin}\limits_{\boldsymbol{F}\in\mathbb{R}^{N\times K}_{R_{\max}}}\text{{J}}(\boldsymbol{F})\coloneqq\text{{L}}(\boldsymbol{F})+\gamma\text{{R${}^{t}$}}(\boldsymbol{F})+\lambda\text{{R${}^{nn}$}}(\boldsymbol{F}),

(11)

where $\mathbb{R}^{N\times K}_{R_{\max}}$ denotes the set of all $N$ -by- $K$ matrices of rank at most $R_{\max}$ and the cost function $\text{{J}}(\boldsymbol{F})$ is comprised of the data fidelity term $\text{{L}}(\boldsymbol{F})$ in Eqn. (7), the temporal regularization term $\text{{R${}^{t}$}}(\boldsymbol{F})$ in Eqn. (9), and the nuclear norm regularization term $\text{{R${}^{nn}$}}(\boldsymbol{F})$ in Eqn. (8). Here, the parameters $\gamma\geq 0$ and $\lambda\geq 0$ control the strength of the temporal and nuclear norm regularization terms, respectively. Similarly to the data fidelity computation in Eqn. (7), the temporal regularization term $\text{{R${}^{t}$}}(\boldsymbol{F})=\sum_{k=1}^{K}\text{{R${}_{k}^{t}$}}(\boldsymbol{F})$ can written as the sum of contributions R ${}_{k}^{t}$ from each imaging frame. Each term is defined as

\text{{R${}_{k}^{t}$}}(\boldsymbol{F})\coloneqq\begin{cases}\left\lVert\boldsymbol{F}\boldsymbol{d}_{k}\right\rVert^{2}_{2}=\lVert\boldsymbol{f}_{k+1}-\boldsymbol{f}_{k}\rVert^{2}_{2}&\text{if }k<K\\ \left\lVert\boldsymbol{F}\boldsymbol{d}_{K}\right\rVert^{2}_{2}=0&\text{if }k=K,\\ \end{cases}

(12)

where, for uniformity of notation, $\boldsymbol{d}_{K}\in\mathbb{R}^{K}$ is the zero vector. To highlight the contribution from each frame, the minimization problem in Eqn. (11) can then be rewritten as

	$\displaystyle\hat{\boldsymbol{F}}$	$\displaystyle=\operatornamewithlimits{argmin}\limits_{\boldsymbol{F}\in\mathbb{R}^{N\times K}_{R_{\max}}}\sum_{k=1}^{K}\text{{L}${}_{k}$}(\boldsymbol{F})+\gamma\sum_{k=1}^{K}\text{{R${}^{t}_{k}$}}(\boldsymbol{F})+\lambda\text{{R${}^{nn}$}}(\boldsymbol{F})$		(13)
		$\displaystyle=\operatornamewithlimits{argmin}\limits_{\boldsymbol{F}\in\mathbb{R}^{N\times K}_{R_{\max}}}\sum_{k=1}^{K}\lVert\boldsymbol{H}_{k}\boldsymbol{f}_{k}-\boldsymbol{g}_{k}\rVert^{2}_{2}+\frac{\gamma}{2}\sum_{k=1}^{K}\left\lVert\boldsymbol{F}\boldsymbol{d}_{k}\right\rVert^{2}_{2}+\lambda\lVert\boldsymbol{F}\rVert_{*}.$		(13)

In the formulated minimization problem, the data fidelity and temporal penalty terms are convex and smooth, while the nuclear norm penalty term is convex but non-smooth. Accordingly, the minimization problem can be solved by use of a proximal gradient descent (PGD) method [39, 40].

The convergence speed of the PGD method can be improved with momentum schemes [41, 42]. To further improve the convergence speed, especially in the early iterations, an ordered subsets (OS) approach[43] can be incorporated with momentum schemes[44, 45], despite the lack of theoretical guarantees of convergence. In addition to acceleration, applying the OS approach to find an approximate solution to Eqn. (13) offers several other significant benefits. Notably, it substantially reduces memory requirements by a factor proportional to the number $M$ of subset used. While the gradient with respect to all imaging frames requires $\mathcal{O}(NK)$ memory usage, the gradient corresponding to each subset only requires $\mathcal{O}(NK/M)$ storage. Furthermore, it preserve the low-rank structure of the reconstructed object function estimate at each step of the proximal gradient iteration.

For the optimization problem given by Eqn. (11), the OS-based cost function can be expressed as[44, 45]:

\text{{J}}_{\mathcal{K}_{j}}(\boldsymbol{F})=M\sum_{k\in\mathcal{K}_{j}}\text{{L}${}_{k}$}(\boldsymbol{F})+M\gamma\sum_{k\in\mathcal{K}_{n}}\text{{R${}^{t}_{k}$}}(\boldsymbol{F})+\lambda\text{{R${}^{nn}$}}(\boldsymbol{F}),

(14)

where, $\mathcal{K}_{j}$ represents the set of frame indices in the $j$ -th ordered subset. The OS approach relies on the “subset balance” approximation[44, 45], implying that $\textbf{J}_{\mathcal{K}_{j}}(\boldsymbol{F})\approx\textbf{J}(\boldsymbol{F})$ . The update procedure in PGD with OS consists of two main steps. Initially, a gradient descent step is performed moving in the negative gradient of the smooth components of the OS-based cost function, which yields

	$\displaystyle\boldsymbol{F}_{j+\frac{1}{2}}$	$\displaystyle=\boldsymbol{F}_{j}-\eta\nabla\Bigl{(}M\sum_{k\in\mathcal{K}_{j}}\text{{L}${}_{k}$}(\boldsymbol{F}_{j})+M\gamma\sum_{k\in\mathcal{K}_{j}}\text{{R${}^{t}_{k}$}}(\boldsymbol{F}_{j})\Bigr{)}$		(15)
		$\displaystyle=\boldsymbol{F}_{j}-\eta M\sum_{k\in\mathcal{K}_{j}}\Bigl{(}\bigl{(}\boldsymbol{H_{k}}^{T}(\boldsymbol{H_{k}}\boldsymbol{f_{k}}-\boldsymbol{g_{k}})\bigr{)}\otimes\boldsymbol{e}_{k}+\gamma\boldsymbol{F}_{j}\bigl{(}\boldsymbol{d}_{k}\otimes\boldsymbol{d}_{k}\bigr{)}\Bigr{)}.$		(15)

Above, $\eta$ is the step size and $\boldsymbol{F}_{j}$ denotes the spatiotemporal object estimate at the beginning of the $j$ -th update. The first component of the gradient, associated with the data fidelity term for the $k$ -th frame, results in an outer product that produces a rank- $1$ matrix. Similarly, the second component of the gradient, related to the temporal regularization term for the $k$ -th frame, also yields a rank- $1$ matrix. Thus, the maximum rank of the gradient of the OS-based cost function $\text{{J}}_{\mathcal{K}_{j}}$ is bounded by $2|\mathcal{K}_{j}|$ , where $|\mathcal{K}_{j}|$ denotes the number of imaging frames in the ordered subset.

Following the gradient descent step, the proximal step is executed, where the proximal operator is applied to account for the nonsmooth component of the objective function. The proximal step corresponds to the solution of the following minimization problem:

\boldsymbol{F}_{j+1}=\operatorname{prox}_{\eta\lambda\|\cdot\|_{*}}\left(\boldsymbol{F}_{j+\frac{1}{2}}\right)\coloneqq\operatornamewithlimits{argmin}\limits_{\boldsymbol{F}\in\mathbb{R}^{N\times K}_{R_{\max}}}\frac{1}{2}\lVert\boldsymbol{F}-\boldsymbol{F}_{j+\frac{1}{2}}\rVert^{2}_{2}+\eta\lambda\lVert\boldsymbol{F}\rVert_{*},

(16)

whose solution can be efficiently implemented via a truncated SVD factorization of $\boldsymbol{F}_{j+\frac{1}{2}}$ and the application of the soft-thresholding operator, $\mathcal{S}(.)$ , to its singular values $\{\sigma_{i}\}$ .[46] The solution of the Eqn. (16) is expressed as following:

F_{j+1}=\boldsymbol{U}_{j+\frac{1}{2}}\mathcal{S}_{\eta\lambda}\left(\boldsymbol{{\Sigma}}_{j+1/2}\right)\boldsymbol{V}_{j+\frac{1}{2}}^{T}=\boldsymbol{U}_{j+\frac{1}{2}}\boldsymbol{\tilde{\Sigma}}_{j+\frac{1}{2}}\boldsymbol{V}_{j+\frac{1}{2}}^{T},

(17)

where $\boldsymbol{U}_{j+\frac{1}{2}}$ , $\boldsymbol{\Sigma}_{j+\frac{1}{2}}$ , $\boldsymbol{V}_{j+\frac{1}{2}}$ stem from the truncated SVD of $\boldsymbol{F}_{j+\frac{1}{2}}$ with maximum rank $R_{\max}$ and $\boldsymbol{\tilde{\Sigma}}_{j+\frac{1}{2}}\coloneqq\mathcal{S}_{\eta\lambda}\left(\boldsymbol{{\Sigma}}_{j+1/2}\right)$ . The soft-thresholding operator, $\mathcal{S}(.)$ , is defined (component-wise) as below:

\mathcal{S}_{\eta\lambda}(\sigma_{i})=\tilde{\sigma_{i}}=\begin{cases}\sigma_{i}-\eta\lambda&\text{if }\sigma_{i}>\eta\lambda\\ 0,&\text{if }\sigma_{i}\leq\eta\lambda.\end{cases}

(18)

The soft-thresholding is computationally efficient and enforces a low-rank structure effectively attenuating the singular values.

Algorithm 1 Low-rank matrix estimation-based spatiotemporal image reconstruction

Input:

R_{\max}

\epsilon

\gamma

\lambda

\eta

M

Output:

\boldsymbol{F}

Initialization:

\boldsymbol{F}^{(0)}_{1}=\boldsymbol{0}

t_{1}^{(0)}=1

b=\lceil K/M\rceil

i=0

while

||\boldsymbol{F}^{(i)}-\boldsymbol{F}^{(i-1)}||_{F}^{2}/\max_{l\leq i}||\boldsymbol{F}^{(l)}-\boldsymbol{F}^{(l-1)}||_{F}^{2}\leq\epsilon

\triangleright

Convergence check

\mathcal{K}=\text{shuffle}(1,2,...,K)

\triangleright

Randomly shuffle frames

for

j=1

M

\mathcal{K}_{j}=\mathcal{K}[(j-1)b+1:\min(jb,K)]

\triangleright

Select the current subset of frames

Compute gradient descent for the current subset of frames:

\boldsymbol{F}^{(i)}_{j+\frac{1}{2}}=\boldsymbol{\bar{F}}^{(i)}_{j}-\eta\nabla\Bigl{(}M\sum_{k\in\mathcal{K}_{j}}\text{{L}${}_{k}$}(\boldsymbol{\bar{F}}_{j}^{(i)})+M\gamma\sum_{k\in\mathcal{K}_{j}}\text{{R${}^{t}_{k}$}}(\boldsymbol{\bar{F}}_{j}^{(i)})\Bigr{)}

\triangleright

\text{rank}(\boldsymbol{F}^{(i)}_{j+\frac{1}{2}})\leq 2R_{max}+2b

Perform proximal operator:[46]

\boldsymbol{U}_{j+\frac{1}{2}}^{(i)},\boldsymbol{\Sigma}_{j+\frac{1}{2}}^{(i)},\boldsymbol{V}_{j+\frac{1}{2}}^{(i)}=\text{SVD}(\boldsymbol{F}_{j+\frac{1}{2}}^{(i)},R_{max})

\triangleright

Compute the truncated SVD with maximum rank

R_{max}

\boldsymbol{\tilde{\Sigma}}_{j+\frac{1}{2}}^{(i)}=\mathcal{S}_{\eta\gamma}(\boldsymbol{\Sigma}_{j+\frac{1}{2}}^{(i)})

\triangleright

Soft thresholding for nuclear norm regularization

\boldsymbol{F}^{(i)}_{j+1}=\boldsymbol{U}_{j+\frac{1}{2}}^{(i)}\boldsymbol{\tilde{\Sigma}}_{j+\frac{1}{2}}^{(i)}(\boldsymbol{V}_{j+\frac{1}{2}}^{(i)})^{T}

\triangleright

\text{rank}(\boldsymbol{F}^{(i)}_{j+1})\leq R_{max}

Implement FISTA acceleration[41]:

t_{j+1}^{(i)}=\frac{1+\sqrt{1+4(t_{j}^{(i)})^{2}}}{2}

\triangleright

Update momentum term

\boldsymbol{\bar{F}}_{j+1}^{(i)}=\boldsymbol{F}_{j+1}^{(i)}+\frac{t_{j}^{(i)}-1}{t_{j}^{(i)}}(\boldsymbol{F}_{j+1}^{(i)}-\boldsymbol{F}_{j}^{(i)})

\triangleright

FISTA update for the current iteration;

\text{rank}(\boldsymbol{\bar{F}}_{j+1}^{(i)})\leq 2R_{max}

end for

\boldsymbol{F}^{(i+1)}=\boldsymbol{F}_{N+1}^{(i)}

\boldsymbol{\bar{F}}_{1}^{(i+1)}=\boldsymbol{\bar{F}}_{N+1}^{(i)}

t_{1}^{(i+1)}=t_{N+1}^{(i)}

i=i+1

\triangleright

Increment outer loop iteration

end while

Algorithm 1 summarizes the proposed accelerated proximal gradient descent algorithm, integrating both momentum and ordered subset techniques to efficiently find an approximate solution to the minimization problem in Eqn. (11). The algorithm takes the following parameters as input: the maximum allowed rank $R_{\max}$ ; the threshold $\epsilon$ for the stopping criterion; the regularization parameter $\gamma$ for temporal regularization; the regularization parameter $\lambda$ for nuclear-norm regularization; the step size $\eta$ ; and the number $M$ of subsets. At the beginning of each iteration, the sequence of frame indices undergoes random shuffling. Within the subsequent inner loop, these shuffled indices are partitioned into $M$ subsets. The gradient pertaining to both the data fidelity and temporal regularization components is computed for these subset frame indices, and the gradient descent step is executed. Then, the proximal mapping associated with nuclear norm regularization is evaluated efficiently by using randomized singular value decomposition (SVD)[47] and soft thresholding. Subsequent to this step, the FISTA [41] acceleration is deployed to enhance the convergence rate. The algorithm terminates when the squared Frobenius norm of the difference between two successive iterations $||\boldsymbol{F}^{(i)}-\boldsymbol{F}^{(i-1)}||_{F}^{2}$ , normalized by its maximum $\max_{l\leq i}||\boldsymbol{F}^{(l)}-\boldsymbol{F}^{(l-1)}||_{F}^{2}\bigr{)}$ over the previous iterations, falls below the threshold $\epsilon$ defined by the user. This metric is equivalent to monitoring the norm of the gradient in smooth optimization.[40]

4 Study Description

4.1 3D PACT Imaging System Specifications for Experimental and Numerical Studies

The TriTom preclinical imaging system[48, 49] developed by PhotoSound was employed for the experimental studies and emulated for the numerical in-silico studies. It integrates photoacoustic (PA) and fluorescence (FL) imaging modalities, harnessing their individual strengths. The system comprises a central rotary scanning stage, an optical excitation setup for PA and FL imaging, a curvilinear 96-element PA transducer array, and a fluorescence-enabled scientific complementary metal-oxide-semiconductor (sCMOS) camera. The configuration of the TriTom setup is depicted in the left panel of Figure 2, and the imaging chamber is shown in the right panel.

During imaging, the object is immersed in water and continuously rotated while optically stimulated by a short laser pulse with a 10Hz repetition rate. The maximum rotation speed is 10 degrees per second, and therefore completing a full 360-degree scan requires 36 seconds. Four optical fiber bundles are located on the outer circumference of the cylindrical imaging chamber, perpendicular to the scanning plane of the PA array. The laser emits pulses in the 670 to 1064 nm wavelength range at a frequency of 10 Hz, with each pulse lasting 5 ns. Acoustic waves generated by the laser excitation are detected by the PA transducer array, which comprises piezoelectric transducer elements, each measuring 1.3 $\times$ 1.3 mm². The center frequency of the transducer elements is 6 MHz $\pm$ 10 $\%$ (at -6 dB) with bandwidth $\geq$ 55 $\%$ . The array is vertically oriented and cylindrically focused, with the central element positioned 65 mm from the center of the imaging chamber. For fluorescence imaging, an sCMOS camera with a 2048 $\times$ 2040 pixel resolution and a 40 $\times$ 40 mm² field-of-view is placed outside the imaging chamber.

In the acoustic modeling of the TriTom imaging system, the transducer array was assumed to consist of idealized point-like transducers positioned at the central locations of the transducer elements. The acoustic simulation was implemented with a GPU-accelerated D-D imaging model[31] assuming acoustically homogeneous medium. Although the TriTom system has a single transducer arc, other existing 3D PACT designs (e.g. [50, 20]) feature multiple transducer arcs, thus the numerical studies also explored scenarios where multiple tomographic measurements were acquired per imaging frame. Specifically, measurement configurations in which 2 transducer arcs separated by 90^∘ and 4 transducer arcs separated by 45^∘ were considered, as illustrated in Figure 3. The sampling rate for both the experimental and numerical studies was set to 31.25 MHz, with 2048 temporal samples collected per imaging frame.

4.2 Inverse Crime Validation Study

To verify that Algorithm 1 was correctly implemented, an inverse crime validation study was conducted in silico in which a simple rank-4 dynamic phantom was employed. The phantom consisted of 40 $\times$ 40 $\times$ 3 spatial voxels and 360 object frames, with a voxel size of 0.4 $\times$ 0.4 $\times$ 0.4 mm³. The induced pressure in the phantom was assumed constant along the z-axis and piecewise constant within the xy-plane at each object frame. The phantom was structured into four distinct regions, each characterized by varying temporal activities as shown in Figure 4. The left panel of Figure 4 illustrates the central z-slice of the phantom at the 120-th object frame, while the right panel displays the time activity curves corresponding to the numbered regions. For each imaging frame (360 in total), 4 tomographic measurements separated by 45^∘ were considered as depicted in the right panel of Figure 3. Simulated acoustic pressure data were generated using a grid voxel size of 0.4 $\times$ 0.4 $\times$ 0.4 mm³, assuming the phantom was centered within the imaging system [31]. The speed of sound was assumed constant at 1495 m/s, and no noise was added to the simulated measurement data.

During image reconstruction, the same computational grid that was used for generating the measurement data was employed. In the algorithm, the maximum allowed rank, $R_{max}$ , was set to 4, and no temporal or nuclear norm penalties were applied ( $\lambda=\gamma=0$ ). The step-size, $\eta$ , was tuned empirically to ensure convergence. To explore the impact of the number of ordered subsets used in the randomized evaluation of the data fidelity term, three different number of ordered subsets were investigated: $M\in\{1,2,6\}$ .

4.3 Numerical Phantom Study

A dynamic numerical phantom was utilized to evaluate the performance of the proposed method through in silico experiments. The phantom consisted of 360 object frames and contained four convex ellipsoidal blobs within a larger ellipsoidal blob, along with a vasculature mimicking structure as shown in the bottom panel of Figure 5. The size of the numerical phantom was 40 $\times$ 40 $\times$ 30 mm³, with a voxel size of 0.4 $\times$ 0.4 $\times$ 0.4 mm³. The time activity at each voxel was designed to mimic a contrast agent’s flow along the paths from ellipsoidal blobs 1 to 2 and 3 to 4. The top right panel of Figure 5 illustrates the time activity at the center of each ellipsoidal blob, and the blobs are numbered in the bottom right panel. The singular values of the dynamic numerical phantom are shown in Figure 5, where a rapid singular value decay is observed. The rapid singular decay indicates that the phantom can be accurately approximated with a low-rank representation, as demonstrated by the mean squared error (MSE) vs. rank plot depicted in the top middle panel of Figure 5.

To generate the synthetic measurement data, the grid voxel size was set to 0.2 $\times$ 0.2 $\times$ 0.2 mm³, and 360 imaging frames (corresponding to a complete rotation of the system) were simulated. The speed of sound was assumed to be constant at 1495 m/s. Zero-mean Gaussian noise with a standard deviation equivalent to a specified percentage of the maximum value of the simulated data (1%, 3%, or 5%) was added to the pressure signals.

The primary objective of this numerical study was to investigate the effect of the following physical factors on the proposed method:

•

Number of tomographic measurements per imaging frame: In this study, the noise level was fixed at 1%, and three scenarios were examined with 1, 2, and 4 tomographic measurements per imaging frame, as illustrated in Figure 3.
•

Measurement noise level: In this study, the number of tomographic measurements per imaging frame was set to 2, and three different noise levels were considered: 1%, 3%, and 5%.

For image reconstruction, the maximum allowed rank in the algorithm was set to $R_{max}=40$ , as it allows for an accurate approximation of the dynamic numerical phantom with MSE around $10^{-8}$ , as shown in Figure 5. To avoid the discretization inverse crime, a coarser grid with voxel size 0.4 $\times$ 0.4 $\times$ 0.4 mm³ was used for the reconstruction. The speed of sound value was the same used for simulating the data. The threshold $\epsilon$ for the stopping criterion of Algorithm 1 was set to $2.5\times 10^{-1}$ . The step-size, $\eta$ , was tuned empirically to ensure convergence. The number of subsets, $M$ , was set to 18. To select appropriate regularization parameters, the balancing principle [51] was employed to reduce the number of tunable regularization parameters from two to one. The balancing principle rescales each regularization term by an estimate of their value at the object function $\boldsymbol{F^{true}}$ . While an iterative procedure is proposed in [51] to estimate such values, for simplicity, this work assumes direct knowledge of the actual nuclear norm and the Frobenius norm of the temporal difference of $\boldsymbol{F^{true}}$ . Specifically, in this study, the solution to the dynamic image reconstruction problem was defined as:

\hat{\boldsymbol{F}}=\operatornamewithlimits{argmin}\limits_{\boldsymbol{F}}\frac{1}{2}\sum_{k=1}^{K}\lVert\boldsymbol{H_{k}}\boldsymbol{f_{k}}-\boldsymbol{g_{k}}\rVert^{2}+\kappa\left(\frac{\sum_{k=1}^{K-1}\lVert\boldsymbol{f_{k+1}}-\boldsymbol{f_{k}}\rVert^{2}}{\sum_{k=1}^{K-1}\lVert\boldsymbol{f_{k+1}^{true}}-\boldsymbol{f_{k}^{true}}\rVert^{2}}+\frac{\lVert\boldsymbol{F}\rVert_{*}}{\lVert\boldsymbol{F^{true}}\rVert_{*}}\right).

(19)

For each case, four different values of the parameter $\kappa$ were explored: $\kappa\in\{10^{-4}\lVert\boldsymbol{G}\rVert_{F}^{2},5\times 10^{-4}\lVert\boldsymbol{G}\rVert_{F}^{2},2.5\times 10^{-3}\lVert\boldsymbol{G}\rVert_{F}^{2},1.25\times 10^{-2}\lVert\boldsymbol{G}\rVert_{F}^{2}\}$ . The $\kappa$ parameter yielding the best average normalized squared error (nSE) over frames was selected [52]. The nSE for each frame was computed as:

\text{nSE}=||\boldsymbol{f_{k}^{true}}-\hat{\boldsymbol{f_{k}}}||_{2}^{2}/\operatornamewithlimits{max}\limits_{\boldsymbol{k}}||\boldsymbol{f_{k}^{true}}||_{2}^{2}.

(20)

Specifically, the regularization parameter value that yielded the optimal results was $\kappa=5\times 10^{-4}\lVert\boldsymbol{G}\rVert_{F}^{2}$ when 4 tomographic measurements per frame were available, and $\kappa=2.5\times 10^{-3}\lVert\boldsymbol{G}\rVert_{F}^{2}$ for all other cases.

4.4 Experimental Study

An open-ended dynamic flow phantom was constructed by bending a silicone tube into a U-shaped structure, as depicted in Figure 6. The left panel of the figure provides an illustration of the phantom within the imaging chamber, while the right panel shows an actual photograph of the physical phantom. The inner diameter of the silicone tube was 0.0635 cm, and the outer diameter was 0.1194 cm. The dynamic phantom, positioned at the center of the imaging system, was illuminated by a laser pulse with a wavelength of 770 nm and an energy of 100 mJ (before entering the fiber optic light delivery unit). PACT data were acquired with the TriTom imaging system during the injection of a photoacoustic-fluorescent contrast agent (PAtIR[53]) through one end of the tube. Over a span of 36 seconds, the dynamic phantom underwent scanning which resulted in 360 imaging frames, with a frame rate of 0.1 seconds and a single tomographic measurement per imaging frame. Two-dimensional fluorescence images were concurrently gathered, which serve as reference for the time evolution of the contrast agent concentration within the tube.

Based on region illuminated by laser, the reconstruction volume was set to a region of 30 $\times$ 30 $\times$ 30 mm³ located at the center of the imaging system. The voxel size was set to 0.4 $\times$ 0.4 $\times$ 0.4 mm³, resulting in 75 $\times$ 75 $\times$ 75 spatial voxels. The maximum allowed rank, $R_{max}$ , was set to 40, and the number of subsets, $M$ , was set to 18. For the stopping criterion of Algorithm 1, the threshold $\epsilon$ was set to $5\times 10^{-1}$ . The step-size, $\eta$ , was tuned empirically to ensure convergence. To calibrate the speed of sound, multiple static estimates of the dynamic object were reconstructed using the Universal Back-Projection (UBP) algorithm[54] assuming speed of sound values in the range of 1480-1520 m/s, with 5 m/s increments. The speed of sound value of 1495 m/s resulted in the best visual appearance and was selected to perform the dynamic image reconstruction. To choose the regularization parameters, 9 dynamic image reconstructions where performed for all possible combinations of $\gamma\in\{5.5\times 10^{0},5.5\times 10^{1},5.5\times 10^{2}\}$ , and $\lambda\in\{10^{-1},10^{0},10^{1}\}$ . Ultimately, the spatiotemporal object estimate that most closely captured the observed dynamic changes in the reference fluorescence images, as determined through visual examination, was selected. The corresponding regularization parameters were $\gamma=5.5\times 10^{1}$ and $\lambda=10^{0}$ .

5 Results

5.1 Inverse Crime Validation Study Results

In this inverse crime study, the step sizes $\eta$ for $M=1,2,6$ were empirically tuned to ensure convergence, resulting in values of $\{10^{-4},10^{-4},3.3\times 10^{-5}\}$ , respectively. The algorithm was allowed to run for 2500 iterations in each case to ensure convergence of the solution up to numerical precision. For the purpose of this validation study, Algorithm 1 was modified to re-evaluate the total data fidelity term $\text{{L}}(\boldsymbol{F}_{i})$ , which accumulates the contributions from all time frames, at each iteration.

The results of the validation study are depicted in Figure 7. The left panel exhibits the data fidelity $\text{{L}}(\boldsymbol{F}_{i})$ vs. iteration count, while the right panel presents the average nSE vs. iteration count. For all values of $M$ , A significant decrease of approximately 11 and 13 orders of magnitude can be observed in the data fidelity term and average nSE, respectively. This indicates that the proposed method using momentum-acceleration in combination with ordered subsets (case $M\in\{2,6\}$ ), although lacking theoretical guarantee of converge, can achieve (up to machine precision) to the same object estimate produced by the momentum-accelerated proximal gradient descent without subsampling method ( $M=1$ ). It is also evident that a larger number of subsets improves the convergence speed, particularly in the early iterations. This also translate in possibly faster time to solution, as the cost of each iteration is dominated by the evaluation of the imaging operator, while the time spent in performing the truncated SVD factorization using randomized method is neglegible. In the numerical studies presented here, the computational time required per iteration was approximately 15 minutes on a workstation (AMD EPYC 7702P 64-Core processor, 32GB RAM, one Nvidia Geforce RTX 2080 graphic processing unit) independently of the number $M$ of subsets used.

In summary, this validation study demonstrates the correct implementation, efficiency, and robust convergence, despite lack of theoretical guarantees, of the proposed method.

5.2 Numerical Phantom Study Results

Sensitivity to the number of tomographic measurements per imaging frame:

Figure 8 displays a selection of MIP images depicting the numerical phantom and the spatiotemporal estimates from simulated data with varying numbers of tomographic measurements per imaging frame. A video featuring the spatiotemporal evolution of the numerical phantom and its corresponding estimates is available as supplemental material[55]. Upon careful inspection, it is apparent that increasing the number of tomographic measurements per imaging frame leads to more accurate estimates of the dynamic object. For instance, object estimates reconstructed from data with 2 and 4 measurements per imaging frame correctly capture the increase in intensity of blob-4 after frame-301. However, this intensity change is less prominent in the dynamic estimate reconstructed from data with only one tomographic measurement per imaging frame, which also exhibits artifacts around blob-4. In addition, the estimate reconstructed using 4 measurements per imaging frame better captures the intensity difference between the two ends of the blob-1 in frame-1. These observations are more evident in Figure 9, which shows the time-activity curves (TACs) at each ellipsoidal blob’s center in both the numerical phantom and the spatiotemporal estimates from data with different numbers of tomographic measurements per imaging frame. Notably, the TACs for the reconstruction from 4 tomographic measurements per imaging frame closely aligns with those of the numerical phantom. The normalized squared error (nSE) vs. imaging frame plot for the reconstructions with varying numbers of tomographic measurements per imaging frame is depicted in the left panel of Figure 11. Both the TACs and nSE vs. frame plots confirm the observation that a higher number of tomographic measurements per imaging frame leads to more accurate spatiotemporal reconstructions.

Sensitivity to measurement noise:

Figure 10 displays the TACs at each ellipsoidal blob’s center in both the numerical phantom and its spatiotemporal estimates from data with varying noise levels, when the number of tomographic measurements per frame is kept at two. Notably, the estimated TACs remain similar across all noise levels, highlighting the algorithm’s robustness in capturing dynamic changes even in the presence of increased noise. Figure 11 (right panel) shows the nSE vs. imaging frame plot for estimates reconstructed from data with varying noise levels. As anticipated, the nSE exhibits an upward trend with increasing noise levels.

The numerical phantom study results underscore the effectiveness of the proposed method in accurately estimating dynamic changes, when limited tomographic measurements per frame are available. The study reveals that the ill-posed nature of the dynamic reconstruction problem diminishes significantly as the number of tomographic measurements per imaging frame increases. Notably, the method’s robustness in faithfully capturing the temporal dynamics of the object remains evident even in the presence of increased noise levels. These findings collectively underscore the algorithm’s reliability and potential significance in addressing challenges associated with dynamic imaging scenarios.

5.3 Experimental Study Results

Figure 12 displays dynamic PACT images reconstructed from experimental data (bottom row), along with the corresponding reference fluorescence images (top row). The fluorescence images were processed to suppress the background, and the contrast agent is highlighted in yellow. This enhancement was accomplished through manual segmentation of the tube and contrast agent from the raw fluorescence images. The spatiotemporal PACT object estimates were visualized using ParaView [56]. To ensure a qualitatively close alignment of the field-of-view between the ParaView visualization of the spatiotemporal PACT object estimate and the 2D fluorescence images, the view angle in ParaView was adjusted manually at each frame; however, a slight misalignment remains. A video featuring the raw images, visually enhanced images through the segmentation of the tube and contrast agent, and ParaView visualizations of the spatiotemporal PACT object estimate, is provided in the supplemental materials[55].

In particular, when examining frames 30 to 120, one can readily observe the accurate recovery of dynamic contrast flow within the tube. A similar trend is evident in frames 210 to 300, with the lower right section of frame-240 showing the presence of the contrast agent, albeit with reduced contrast. This reduced contrast is also noticeable in specific frames such as 330 and 360, possibly due to light obstruction by other tube segments. Nevertheless, the overall effective recovery of dynamic flow remains apparent, as convincingly demonstrated in the supplementary video. This highlights the effectiveness and practicality of the proposed method for experimental data beyond simulated measurements, even when only one tomographic measurement per imaging frame is available.

Furthermore, it is worth noting that the spatiotemporal reconstruction offers a frame rate of 0.1 seconds. In contrast, an FBFIR technique would provide a frame rate of 36 seconds, which is the time required for a complete tomographic measurement. This underscores the reconstruction method’s potential for monitoring dynamic physiological processes that demand enhanced frame rate. This can enhance the value of 3D PACT systems that involve rotating measurement gantries for preclinical research, enabling spatiotemporal image reconstruction from limited tomographic measurements per imaging frame.

6 Conclusion and Discussion

This study presents an accurate and computationally efficient LRME-STIR method for dynamic PACT attuned for commercially available volumetric imagers that employ a rotating measurement gantry in which the tomographic data are sequentially acquired. The implementation of the method was verified by an inverse-crime numerical validation study. The effect of varying number of tomographic measurements per imaging frame and the noise level on the method’s accuracy was investigated in an in-silico numerical study. The experimental study demonstrated the LRME-STIR method’s ability to reconstruct the flow of a contrast agent at a frame rate of 0.1 s, even when only a single tomographic measurement per imaging frame was available. Numerical and experimental studies confirm the accuracy of the proposed technique. Thus, this work will potentially have an immediate and sustained positive impact by creating a new capacity to perform dynamic 3D PACT.

The LRME-STIR method proposed in this study employs an accelerated proximal gradient descent method combined with an ordered subsets approach, distinguishing it from previously proposed LRME-STIR methods[32, 33, 26, 37]. This approach yields a rapid and computationally efficient algorithm with reduced memory demands. Nonetheless, it should be noted that the proposed approach lacks theoretical convergence guarantees. While in theory the method may exhibit instabilities, numerical and experimental studies showcase the robustness of the proposed method when the regularization strength and step size are properly chosen.

Future enhancements to the method may involve exploring different heuristics to enhance its stability[44, 45]. Subsequent investigations may focus on evaluating the method’s efficacy in imaging complex dynamic physiological processes and quantifying relevant parameters, such as wash-in and wash-out rates for tumor vascular perfusion [57, 34]. These evaluations might encompass both in-vivo and in-silico experiments, further advancing the understanding and application of the proposed method.

Disclosures

The author S. Ermilov discloses ownership interest in Photosound Technologies, Inc. Other authors have no relevant financial interests and no potential conflicts of interest to disclose.

Acknowledgements

This work was supported in part by the National Institutes of Health (NIH Grant Nos. OD023029 and EB031585).

Code, Data, and Materials Availability

Data and code are available upon request. Please contact Dr. Villa at \linkable[email protected].

References

[1] X. Wang, Y. Pang, G. Ku, et al., “Noninvasive laser-induced photoacoustic tomography for structural and functional in vivo imaging of the brain,” Nature biotechnology 21(7), 803–806 (2003).
[2] H.-P. Brecht, R. Su, M. Fronheiser, et al., “Whole-body three-dimensional optoacoustic tomography system for small animals,” Journal of biomedical optics 14(6), 064007–064007 (2009).
[3] S. Yang, D. Xing, Q. Zhou, et al., “Functional imaging of cerebrovascular activities in small animals using high-resolution photoacoustic tomography,” Medical physics 34(8), 3294–3301 (2007).
[4] G. C. Kagadis, G. Loudos, K. Katsanos, et al., “In vivo small animal imaging: current status and future prospects,” Medical physics 37(12), 6421–6442 (2010).
[5] G. Loudos, G. C. Kagadis, and D. Psimadas, “Current status and future perspectives of in vivo small animal imaging using radiolabeled nanoparticles,” European journal of radiology 78(2), 287–295 (2011).
[6] B. L. Franc, P. D. Acton, C. Mari, et al., “Small-animal spect and spect/ct: important tools for preclinical investigation,” Journal of nuclear medicine 49(10), 1651–1663 (2008).
[7] P. Carmeliet, “Angiogenesis in life, disease and medicine,” Nature 438(7070), 932–936 (2005).
[8] P. Carmeliet and R. K. Jain, “Angiogenesis in cancer and other diseases,” Nature 407(6801), 249–257 (2000).
[9] J. Xia and L. V. Wang, “Small-animal whole-body photoacoustic tomography: A review,” IEEE Transactions on Biomedical Engineering 61(5), 1380–1389 (2014).
[10] L. Li, L. Zhu, C. Ma, et al., “Single-impulse panoramic photoacoustic computed tomography of small-animal whole-body dynamics at high spatiotemporal resolution,” Nature biomedical engineering 1(5), 0071 (2017).
[11] P. K. Upputuri and M. Pramanik, “Dynamic in vivo imaging of small animal brain using pulsed laser diode-based photoacoustic tomography system,” Journal of biomedical optics 22(9), 090501–090501 (2017).
[12] S. Shi, X. Wen, T. Li, et al., “Thermosensitive biodegradable copper sulfide nanoparticles for real-time multispectral optoacoustic tomography,” ACS applied bio materials 2(8), 3203–3211 (2019).
[13] S. Manohar and M. Dantuma, “Current and future trends in photoacoustic breast imaging,” Photoacoustics 16, 100134 (2019).
[14] L. Lin, P. Hu, J. Shi, et al., “Single-breath-hold photoacoustic computed tomography of the breast,” Nature communications 9(1), 2352 (2018).
[15] J. Gamelin, A. Maurudis, A. Aguirre, et al., “A real-time photoacoustic tomography system for small animals,” Optics express 17(13), 10489–10498 (2009).
[16] W. He, K. Ai, C. Jiang, et al., “Plasmonic titanium nitride nanoparticles for in vivo photoacoustic tomography imaging and photothermal cancer therapy,” Biomaterials 132, 37–47 (2017).
[17] P. K. Upputuri, V. Periyasamy, S. K. Kalva, et al., “A high-performance compact photoacoustic tomography system for in vivo small-animal brain imaging,” JoVE (Journal of Visualized Experiments) (124), e55811 (2017).
[18] T. Shan, Y. Zhao, S. Jiang, et al., “In-vivo hemodynamic imaging of acute prenatal ethanol exposure in fetal brain by photoacoustic tomography,” Journal of biophotonics 13(5), e201960161 (2020).
[19] H. P. Brecht, V. Ivanov, D. S. Dumani, et al., “A 3D imaging system integrating photoacoustic and fluorescence orthogonal projections for anatomical, functional and molecular assessment of rodent models,” in Photons Plus Ultrasound: Imaging and Sensing 2018, 10494, 81–88, SPIE (2018).
[20] L. Lin, P. Hu, X. Tong, et al., “High-speed three-dimensional photoacoustic computed tomography for preclinical research and clinical translation,” Nature communications 12(1), 882 (2021).
[21] S. A. Ermilov, T. Khamapirad, A. Conjusteau, et al., “Laser optoacoustic imaging system for detection of breast cancer,” Journal of biomedical optics 14(2), 024007–024007 (2009).
[22] L. Xiang, B. Wang, L. Ji, et al., “4-D photoacoustic tomography,” Scientific reports 3(1), 1–8 (2013).
[23] J.-F. Cai, X. Jia, H. Gao, et al., “Cine cone beam CT reconstruction using low-rank matrix factorization: algorithm and a proof-of-principle study,” IEEE transactions on medical imaging 33(8), 1581–1591 (2014).
[24] M. Wernick, E. Infusino, and M. Milosevic, “Fast spatio-temporal image reconstruction for dynamic PET,” IEEE Transactions on Medical Imaging 18(3), 185–195 (1999).
[25] Q. Ding, M. Burger, and X. Zhang, “Dynamic SPECT reconstruction with temporal edge correlation,” Inverse Problems 34(1), 014005 (2017).
[26] J. P. Haldar and Z.-P. Liang, “Spatiotemporal imaging with partially separable functions: A matrix recovery approach,” in 2010 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 716–719, IEEE (2010).
[27] K. Wang, J. Xia, C. Li, et al., “Fast spatiotemporal image reconstruction based on low-rank matrix estimation for dynamic photoacoustic computed tomography,” Journal of biomedical optics 19(5), 056007–056007 (2014).
[28] S. Arridge, P. Beard, M. Betcke, et al., “Accelerated high-resolution photoacoustic tomography via compressed sensing,” Physics in Medicine & Biology 61(24), 8908 (2016).
[29] A. Özbek, X. L. Deán-Ben, and D. Razansky, “Compressed optoacoustic sensing of volumetric cardiac motion,” IEEE Transactions on Medical Imaging 39(10), 3250–3255 (2020).
[30] F. Lucka, N. Huynh, M. Betcke, et al., “Enhancing compressed sensing 4D photoacoustic tomography by simultaneous motion estimation,” SIAM Journal on Imaging Sciences 11(4), 2224–2253 (2018).
[31] K. Wang, C. Huang, Y.-J. Kao, et al., “Accelerating image reconstruction in three-dimensional optoacoustic tomography on graphics processing units,” Medical physics 40(2), 023301 (2013).
[32] Z.-P. Liang, “Spatiotemporal imagingwith partially separable functions,” in 2007 4th IEEE international symposium on biomedical imaging: from nano to macro, 988–991, IEEE (2007).
[33] C. Brinegar, H. Zhang, Y.-J. L. Wu, et al., “Real-time cardiac mri using prior spatial-spectral information,” in 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 4383–4386, IEEE (2009).
[34] R. M. Cam, C. Wang, S. Park, et al., “Dynamic image reconstruction to monitor tumor vascular perfusion in small animals using 3d photoacoustic computed-tomography imagers with rotating gantries,” in Photons Plus Ultrasound: Imaging and Sensing 2023, 12379, 78–83, SPIE (2023).
[35] R. M. Cam, C. Wang, W. Thompson, et al., “Low-rank matrix estimation-based spatiotemporal image reconstruction from few tomographic measurements per frame for dynamic photoacoustic computed tomography,” in Medical Imaging 2023: Physics of Medical Imaging, 12463, 124630R, SPIE (2023).
[36] L. Lozenski, M. A. Anastasio, and U. Villa, “A memory-efficient self-supervised dynamic image reconstruction method using neural fields,” IEEE Transactions on Computational Imaging 8, 879–892 (2022).
[37] S. G. Lingala, Y. Hu, E. DiBella, et al., “Accelerated dynamic mri exploiting sparsity and low-rank structure: kt slr,” IEEE transactions on medical imaging 30(5), 1042–1054 (2011).
[38] S. Gu, L. Zhang, W. Zuo, et al., “Weighted nuclear norm minimization with application to image denoising,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2862–2869 (2014).
[39] R. Tibshirani et al., “Proximal gradient descent and acceleration,” Lecture Notes (2010).
[40] P. L. Combettes and J.-C. Pesquet, “Proximal splitting methods in signal processing,” Fixed-point algorithms for inverse problems in science and engineering , 185–212 (2011).
[41] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM journal on imaging sciences 2(1), 183–202 (2009).
[42] Y. Nesterov, “Gradient methods for minimizing composite functions,” Mathematical programming 140(1), 125–161 (2013).
[43] H. M. Hudson and R. S. Larkin, “Accelerated image reconstruction using ordered subsets of projection data,” IEEE transactions on medical imaging 13(4), 601–609 (1994).
[44] D. Kim, S. Ramani, and J. A. Fessler, “Combining ordered subsets and momentum for accelerated x-ray ct image reconstruction,” IEEE transactions on medical imaging 34(1), 167–178 (2014).
[45] V. Haase, K. Stierstorfer, K. Hahn, et al., “An improved combination of ordered subsets and momentum for fast model-based iterative ct reconstruction,” in Medical Imaging 2020: Physics of Medical Imaging, 11312, 391–397, SPIE (2020).
[46] J.-F. Cai, E. J. Candès, and Z. Shen, “A singular value thresholding algorithm for matrix completion,” SIAM Journal on optimization 20(4), 1956–1982 (2010).
[47] N. Halko, P.-G. Martinsson, and J. A. Tropp, “Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions,” SIAM review 53(2), 217–288 (2011).
[48] H. P. Brecht, V. Ivanov, D. S. Dumani, et al., “A 3D imaging system integrating photoacoustic and fluorescence orthogonal projections for anatomical, functional and molecular assessment of rodent models,” in Photons Plus Ultrasound: Imaging and Sensing 2018, A. A. Oraevsky and L. V. Wang, Eds., 10494, 81 – 88, International Society for Optics and Photonics, SPIE (2018).
[49] W. R. Thompson, H.-P. F. Brecht, V. Ivanov, et al., “Characterizing a photoacoustic and fluorescence imaging platform for preclinical murine longitudinal studies,” Journal of Biomedical Optics 28(3), 036001 (2023).
[50] S. M. Schoustra, D. Piras, R. Huijink, et al., “Twente Photoacoustic Mammoscope 2: system overview and three-dimensional vascular network images in healthy breasts,” Journal of Biomedical Optics 24(12), 121909 (2019).
[51] K. Ito, B. Jin, and T. Takeuchi, “Multi-parameter tikhonov regularization—an augmented approach,” Chinese Annals of Mathematics, Series B 35(3), 383–398 (2014).
[52] S. Pereverzev and E. Schock, “On the adaptive selection of the parameter in regularization of ill-posed problems,” SIAM Journal on Numerical Analysis 43(5), 2060–2076 (2005).
[53] W. Thompson, A. Yu, D. S. Dumani, et al., “A preclinical small animal imaging platform combining multi-angle photoacoustic and fluorescence projections into co-registered 3D maps,” in Photons Plus Ultrasound: Imaging and Sensing 2020, A. A. Oraevsky and L. V. Wang, Eds., 11240, 112400L, International Society for Optics and Photonics, SPIE (2020).
[54] M. Xu and L. V. Wang, “Universal back-projection algorithm for photoacoustic computed tomography,” Physical Review E 71(1), 016706 (2005).
[55] “dynamic pact videos.” https://uofi.box.com/s/97eqmnzctj7ctrz1rxa91gnwwg9pt8wt. Accessed: 2023-09-28.
[56] J. Ahrens, B. Geveci, and C. Law, ParaView: An End-User Tool for Large Data Visualization, 717–731. Elsevier Butterworth-Heinemann, Burlington, MA, USA (2005). [doi:10.1016/B978-012387582-2/50038-1].
[57] C. W. Hupple, S. Morscher, N. C. Burton, et al., “A light-fluence-independent method for the quantitative analysis of dynamic contrast-enhanced multispectral optoacoustic tomography (dce msot),” Photoacoustics 10, 54–64 (2018).