

Electromagnetic Property Sensing and Channel Reconstruction Based on Diffusion Schrödinger Bridge in ISAC

Yuhua Jiang, Feifei Gao, and Shi Jin Y. Jiang and F. Gao are with Institute for Artificial Intelligence, Tsinghua University (THUAI), State Key Lab of Intelligent Technologies and Systems, Tsinghua University, Beijing National Research Center for Information Science and Technology (BNRist), Beijing, P.R. China (email: [email protected], [email protected]). S. Jin is with the National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, China (e-mail: [email protected]).
Abstract

Integrated sensing and communications (ISAC) has emerged as a transformative paradigm for next-generation wireless systems. In this paper, we present a novel ISAC scheme that leverages the diffusion Schrödinger bridge (DSB) to realize the sensing of the electromagnetic (EM) property of a target as well as the reconstruction of the wireless channel. The DSB framework connects EM property sensing and channel reconstruction by establishing a bidirectional process: the forward process transforms the distribution of the EM property into the channel distribution, while the reverse process reconstructs the EM property from the channel. To handle the difference in dimensionality between the high-dimensional sensing channel and the lower-dimensional EM property, we generate latent representations using an autoencoder network. The autoencoder compresses the sensing channel into a latent space that retains essential features and incorporates positional embeddings to capture spatial context. The simulation results demonstrate the effectiveness of the proposed DSB framework, which achieves superior reconstruction of the target's shape, relative permittivity, and conductivity. Moreover, the proposed method can also realize high-fidelity channel reconstruction given the EM property of the target. The dual capability of accurately sensing the EM property and reconstructing the channel across various positions within the sensing area underscores the versatility and potential of the proposed approach for broad application in future ISAC systems.

Index Terms:
Electromagnetic (EM) property sensing, channel reconstruction, integrated sensing and communications (ISAC), diffusion Schrödinger bridge (DSB), generative artificial intelligence (GAI)

I Introduction

Integrated sensing and communications (ISAC) has recently garnered considerable attention from both academic and industrial experts, particularly due to its promising implications for the sixth-generation (6G) wireless networks [1, 2]. Unlike the conventional frequency-division sensing and communications (FDSAC) paradigm, which necessitates distinct frequency bands and infrastructure for each operational function, ISAC enables the simultaneous sharing of time, frequency, power, and hardware resources for both communications and sensing functionalities. It is anticipated that ISAC will surpass FDSAC in terms of spectrum efficiency, energy conservation, and hardware demands [3, 4, 5]. Moreover, ISAC has the potential to be integrated with other innovative technologies, such as reconfigurable intelligent surfaces (RISs), to augment the efficacy of sensing and communications systems [6]. Given its myriad advantages, ISAC is expected to play a pivotal role in various emerging applications [7, 8, 9], including digital twins that effectively bridge the physical world with its virtual equivalent in the communications domain [10]. In contrast to image-centric digital twins that prioritize shape and spatial orientation, digital twins designed for communications systems are responsible for the complex reconstruction of communications pathways and the management of channel-specific issues.

Electromagnetic (EM) property sensing represents a groundbreaking advancement in ISAC systems, which leverages the unique properties of EM waves to simultaneously sense the environment and enable communications. This novel approach, as discussed in [11], introduces a paradigm shift by using orthogonal frequency division multiplexing (OFDM) signals to acquire the target's EM property and identify the material of the target. The integration of multiple base stations (BSs) enhances the performance and accuracy of EM property sensing, as explored in [12], where sensing algorithms and pilots are meticulously designed to optimize the sensing process. Additionally, diffusion models have been employed to refine EM property sensing in ISAC systems, which offers a robust framework to accurately detect and interpret environmental EM characteristics [13].

Meanwhile, wireless channel reconstruction is also a critical aspect of modern wireless systems, which enables accurate signal processing and improved system performance. Recent advancements, such as deep learning and variational Bayesian methods, have significantly enhanced the accuracy and efficiency of channel reconstruction techniques. For instance, deep plug-and-play priors have been proposed to facilitate multitask channel reconstruction in massive multiple-input multiple-output (MIMO) systems, demonstrating notable improvement in handling complex channel conditions [14]. Additionally, variational Bayesian learning has been leveraged to optimize localization and channel reconstruction in RIS-aided systems, offering robust performance in the face of channel uncertainties [15]. In the realm of MIMO systems with doubly selective channels, diagonally reconstructed channel estimation techniques have been developed to mitigate inter-Doppler interference, ensuring more reliable communications [16]. Moreover, near-field MIMO channel reconstruction has shown promise in enhancing the performance of future wireless systems by utilizing limited geometry feedback [17]. Finally, deep learning frameworks have been explored for wireless radiation field reconstruction and channel prediction, pushing the boundaries that can be achieved in wireless systems [18].

In fact, the fields of EM property sensing and channel reconstruction are intrinsically linked through a shared goal of optimizing the performance and reliability of ISAC systems. The close relationship between EM property sensing and channel reconstruction suggests that advances in one area could significantly benefit the other, particularly by enhancing the precision and efficiency of data acquisition and processing. However, no direct attempt has been made to integrate the two fields within a unified framework.

In this paper, we utilize the diffusion Schrödinger bridge (DSB) to realize the sensing of the EM property of a target and the reconstruction of the wireless channel in ISAC systems. As a powerful tool recently explored in generative artificial intelligence (GAI), DSB offers a framework for transitioning between probability distributions in a controlled manner [19, 20, 21]. The DSB framework connects EM property sensing and channel reconstruction by establishing a bidirectional process: the forward process transforms the distribution of the EM property into the channel distribution, while the reverse process reconstructs the EM property from the channel. To handle the difference in dimensionality between the high-dimensional sensing channel and the lower-dimensional EM property, we use an autoencoder network to generate the latent representations of the channel. The autoencoder compresses the sensing channel into a latent space that retains essential features and incorporates positional embeddings to capture spatial context. The latent is then used within the DSB framework to iteratively generate the EM property. The simulation results demonstrate the effectiveness of the proposed DSB framework, which achieves superior reconstruction of the target's shape, relative permittivity, and conductivity. Moreover, the proposed method can also realize high-fidelity channel reconstruction given the EM property of the target. The dual capability of accurately sensing the EM property and reconstructing the channel across various positions within the sensing area underscores the versatility and potential of the proposed approach for broad application in future ISAC systems.

The rest of this paper is organized as follows. Section II presents the ISAC system model. Section III describes the DSB model for EM property sensing and channel reconstruction. Section IV proposes the approach to generating the latent in DSB. Section V provides the numerical simulation results, and Section VI draws the conclusion.

Notations: Boldface denotes a vector or a matrix; $j$ corresponds to the imaginary unit; $(\cdot)^{H}$, $(\cdot)^{\top}$, and $(\cdot)^{*}$ represent the Hermitian, transpose, and conjugate operations, respectively; $\otimes$ denotes the Kronecker product; $\mathrm{vec}(\cdot)$ and $\mathrm{unvec}(\cdot)$ denote the vectorization and unvectorization operations; $\nabla$ denotes the nabla operator; $\mathbf{I}$ denotes the identity matrix with compatible dimensions; $\left\|\mathbf{a}\right\|_{2}$ denotes the $\ell_{2}$-norm of the vector $\mathbf{a}$; $\left\|\mathbf{A}\right\|_{F}$ denotes the Frobenius norm of the matrix $\mathbf{A}$; $\left|\cdot\right|$ denotes the element-wise absolute value of complex vectors or matrices; $\Re(\cdot)$ and $\Im(\cdot)$ denote the real and imaginary parts of complex vectors or matrices, respectively; $\mathscr{P}_{N}$ denotes the space of $N$-state path measures on a finite time horizon for any $N\in\mathbb{N}$ in discrete stochastic processes; the Kullback-Leibler (KL) divergence between distributions $p$ and $q$ is defined by $\mathrm{KL}(p\,\|\,q)=\int p(x)\log\frac{p(x)}{q(x)}\,\mathrm{d}x$; the distribution of a real-valued Gaussian random vector with mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ is denoted as $\mathcal{N}(\boldsymbol{\mu},\boldsymbol{\Sigma})$; the distribution of a circularly symmetric complex Gaussian (CSCG) random vector with mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ is denoted as $\mathcal{CN}(\boldsymbol{\mu},\boldsymbol{\Sigma})$.


Figure 1: Multi-antenna mono-static ISAC system for the target’s EM property sensing and the channel reconstruction.

II System Model

As illustrated in Fig. 1, consider a multi-antenna mono-static ISAC system for the target's EM property sensing and the channel reconstruction. The system includes a BS with $N_{t}$ transmitting antennas and $N_{r}$ receiving antennas. We suppose that the BS senses only one target at a time. If there are multiple targets, they can be sensed one by one using a time-division approach. Since target positioning has been widely studied in ISAC [22, 23, 24], we assume that the target's location is accurately known by the BS. We consider two distinct scenarios in this ISAC system. Scenario 1: The BS transmits OFDM pilot signals and utilizes the received echo signals to sense the EM property of the target. Scenario 2: The BS knows the EM property of the target and reconstructs the OFDM channels without sending any pilot signals.

II-A Scenario 1

In the channel estimation stage, the BS adopts a fully digital precoding structure where the number of radio frequency (RF) chains $N_{RF}$ is equal to the number of transmitting antennas $N_{t}$. The central frequency of the signals is denoted by $f_{c}$ with the corresponding wavelength $\lambda_{c}$. The number of subcarriers is denoted by $K$, and the frequency spacing between adjacent subcarriers is denoted by $\Delta_{f}$. The number of transmitted symbols on each subcarrier is denoted by $I$. We assume a quasi-static environment where the channels remain unchanged throughout the sensing period.

Since only the signals scattered by the target carry the information of its EM property, we may send the pilot signals towards the target by beamforming properly at the transmitter. Let the subscript $k$ indicate that a physical quantity is associated with the $k$-th subcarrier. Denote $\mathbf{H}_{k}\in\mathbb{C}^{N_{r}\times N_{t}}$ as the overall echo channel from the transmitter to the receiver. Thus, the received signals can be formulated as

\[
\mathbf{y}_{k}=\mathbf{H}_{k}\mathbf{w}_{k}+\mathbf{n}_{k}, \tag{1}
\]

where $\mathbf{w}_{k}\in\mathbb{C}^{N_{t}\times 1}$ is the pilot symbol on the $k$-th subcarrier, and $\mathbf{n}_{k}\sim\mathcal{CN}\left(\mathbf{0},\sigma_{k}^{2}\mathbf{I}_{N_{r}}\right)$ is the CSCG noise at the receiver on the $k$-th subcarrier. Denote $\mathbf{W}_{k}=\left[\mathbf{w}_{k,1},\mathbf{w}_{k,2},\cdots,\mathbf{w}_{k,I}\right]\in\mathbb{C}^{N_{t}\times I}$ as the pilot matrix stacked over time, and denote $\mathbf{N}_{k}=\left[\mathbf{n}_{k,1},\mathbf{n}_{k,2},\cdots,\mathbf{n}_{k,I}\right]\in\mathbb{C}^{N_{r}\times I}$. Then the overall received pilot signals $\mathbf{Y}_{k}\in\mathbb{C}^{N_{r}\times I}$ can be formulated in a compact form as

\[
\mathbf{Y}_{k}\triangleq[\mathbf{y}_{k,1},\mathbf{y}_{k,2},\cdots,\mathbf{y}_{k,I}]=\mathbf{H}_{k}\mathbf{W}_{k}+\mathbf{N}_{k}. \tag{2}
\]

In order to extract the EM property of the target, the BS first needs to estimate $\mathbf{H}_{k}$ by applying the least squares (LS) method as

\[
\hat{\mathbf{H}}_{k}=\arg\min_{\mathbf{H}_{k}}\left\|\mathbf{Y}_{k}-\mathbf{H}_{k}\mathbf{W}_{k}\right\|_{F}^{2}=\mathbf{Y}_{k}\mathbf{W}_{k}^{H}\left(\mathbf{W}_{k}\mathbf{W}_{k}^{H}\right)^{-1}, \tag{3}
\]

where $\hat{\mathbf{H}}_{k}$ is the minimum variance unbiased (MVU) estimate of the sensing channel. We then stack $\hat{\mathbf{H}}_{k}$ for $k\in\{1,\cdots,K\}$ into a third-order tensor $\hat{\mathcal{H}}\in\mathbb{C}^{K\times N_{r}\times N_{t}}$ as

\[
\hat{\mathcal{H}}=\left\{\hat{\mathbf{H}}_{k}\right\}_{k=1}^{K}\in\mathbb{C}^{K\times N_{r}\times N_{t}}. \tag{4}
\]
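As a quick sanity check, the LS estimator (3) and the stacking (4) can be sketched in a few lines of NumPy; the array sizes and noise level below are illustrative assumptions, not the paper's simulation setting:

```python
import numpy as np

rng = np.random.default_rng(0)
Nt, Nr, I, K = 4, 4, 16, 8  # illustrative sizes (assumptions, not the paper's setup)

H_true, H_hat = [], []
for k in range(K):
    # draw a random channel, pilot matrix, and noise for the k-th subcarrier
    H = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
    W = (rng.standard_normal((Nt, I)) + 1j * rng.standard_normal((Nt, I))) / np.sqrt(2)
    V = 0.01 * (rng.standard_normal((Nr, I)) + 1j * rng.standard_normal((Nr, I)))
    Y = H @ W + V                                        # received pilots, cf. (2)
    # LS / MVU estimate: Y W^H (W W^H)^{-1}, cf. (3)
    H_hat.append(Y @ W.conj().T @ np.linalg.inv(W @ W.conj().T))
    H_true.append(H)

H_tensor = np.stack(H_hat)   # third-order tensor of shape (K, Nr, Nt), cf. (4)
```

With $I\geq N_{t}$ the pilot Gram matrix $\mathbf{W}_{k}\mathbf{W}_{k}^{H}$ is almost surely invertible, so the closed-form estimate applies directly.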

According to [11, 12, 13], the EM property of the target is implicitly encoded in the received echo signals that are transmitted through the sensing channel. Thus, we can leverage ^\hat{\mathcal{H}} as the prior information to reconstruct the EM property of the target.

II-B Scenario 2

Suppose that the EM property of the target exhibits isotropy [25, 26, 27]. The process of sensing the EM characteristic is essentially about determining the contrast function $\chi_{k}(\mathbf{r})$, which represents the discrepancy in the complex relative permittivity of the target as compared to the surrounding air. Considering the relative permittivity and conductivity of air to be nearly $1$ and $0$ Siemens per meter (S/m), respectively, the contrast function can be defined as [28, 29]

\[
\chi_{k}(\mathbf{r})=\epsilon_{r}(\mathbf{r})-\frac{j\sigma(\mathbf{r})}{\epsilon_{0}\omega_{k}}-1, \tag{5}
\]

where $\epsilon_{r}(\mathbf{r})$ denotes the real relative permittivity at point $\mathbf{r}$, $\sigma(\mathbf{r})$ denotes the conductivity at point $\mathbf{r}$, $\omega_{k}=2\pi f_{k}$ denotes the angular frequency of the EM waves, and $\epsilon_{0}$ is the vacuum permittivity.
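For concreteness, the contrast function (5) is straightforward to evaluate numerically; the material values below are arbitrary illustrative choices:

```python
import numpy as np

EPS0 = 8.8541878128e-12  # vacuum permittivity (F/m)

def contrast(eps_r, sigma, f_k):
    # chi_k(r) = eps_r(r) - j*sigma(r)/(eps0*omega_k) - 1, cf. (5)
    omega_k = 2.0 * np.pi * f_k
    return eps_r - 1j * sigma / (EPS0 * omega_k) - 1.0

chi_air = contrast(1.0, 0.0, 3e9)   # air: zero contrast at any frequency
chi = contrast(4.0, 0.01, 3e9)      # a mildly lossy dielectric at 3 GHz
```

Note that lossy materials ($\sigma>0$) always produce a negative imaginary part under the $e^{-j\omega_{k}t}$ time convention.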

Throughout this paper, it is presumed that the electric fields exhibit a harmonic time variation characterized by $e^{-j\omega_{k}t}$ [30]. Let $\lambda_{k}=c/f_{k}$ represent the wavelength and $k_{k}=2\pi/\lambda_{k}$ represent the wave number within the ambient medium. The total electric field and the incident electric field, which propagate through the medium in the $x$, $y$, and $z$ dimensions, are represented by the complex vectors $\mathbf{E}_{k}^{t}(\mathbf{r})\in\mathbb{C}^{3\times 1}$ and $\mathbf{E}_{k}^{i}(\mathbf{r})\in\mathbb{C}^{3\times 1}$, respectively. Since the incident electric field is linearly induced by the currents on the transmitting antennas, there is a matrix $\tilde{\mathbf{H}}_{1,k}(\mathbf{r})\in\mathbb{C}^{3\times N_{t}}$ that linearly maps $\mathbf{w}_{k}$ to $\mathbf{E}_{k}^{i}(\mathbf{r})$, i.e.,

\[
\mathbf{E}_{k}^{i}(\mathbf{r})=\tilde{\mathbf{H}}_{1,k}(\mathbf{r})\mathbf{w}_{k}. \tag{6}
\]

Upon exposure to the incident field $\mathbf{E}_{k}^{i}(\mathbf{r})$, the fields $\mathbf{E}_{k}^{i}(\mathbf{r})$ and $\mathbf{E}_{k}^{t}(\mathbf{r})$ are governed by the homogeneous wave equation for the incident field and the inhomogeneous wave equation for the total field, respectively [31], i.e.,

\[
\nabla\times\nabla\times\mathbf{E}_{k}^{i}(\mathbf{r})-k_{k}^{2}\mathbf{E}_{k}^{i}(\mathbf{r})=\mathbf{0}, \tag{7}
\]
\[
\nabla\times\nabla\times\mathbf{E}_{k}^{t}(\mathbf{r})-k_{k}^{2}\mathbf{E}_{k}^{t}(\mathbf{r})=k_{k}^{2}\chi_{k}(\mathbf{r})\mathbf{E}_{k}^{t}(\mathbf{r}). \tag{8}
\]

Suppose that the BS knows the target is positioned in the region $D$ through prior localization. To address the solutions for (7) and (8), the total electric field within $D$ can be formulated by the 3D Lippmann-Schwinger equation [31, 32, 33]

\[
\mathbf{E}_{k}^{t}\left(\mathbf{r}\right)=\mathbf{E}_{k}^{i}\left(\mathbf{r}\right)+k_{k}^{2}\iiint_{D}\overline{\overline{\mathbf{G}}}_{k}\left(\mathbf{r},\mathbf{r}^{\prime}\right)\chi_{k}\left(\mathbf{r}^{\prime}\right)\mathbf{E}_{k}^{t}\left(\mathbf{r}^{\prime}\right)\mathrm{d}\mathbf{r}^{\prime}, \tag{9}
\]

where $\overline{\overline{\mathbf{G}}}_{k}\left(\mathbf{r},\mathbf{r}^{\prime}\right)\in\mathbb{C}^{3\times 3}$ is the dyadic electric field Green's function that satisfies

\[
\nabla\times\nabla\times\overline{\overline{\mathbf{G}}}_{k}\left(\mathbf{r},\mathbf{r}^{\prime}\right)-k_{k}^{2}\overline{\overline{\mathbf{G}}}_{k}\left(\mathbf{r},\mathbf{r}^{\prime}\right)=\mathbf{I}_{3}\delta\left(\mathbf{r}-\mathbf{r}^{\prime}\right). \tag{10}
\]

Meanwhile, $\overline{\overline{\mathbf{G}}}_{k}\left(\mathbf{r},\mathbf{r}^{\prime}\right)$ can be formulated as [34]

\[
\overline{\overline{\mathbf{G}}}_{k}\left(\mathbf{r},\mathbf{r}^{\prime}\right)=\left(\mathbf{I}_{3}+\frac{\nabla\nabla}{k_{k}^{2}}\right)g_{k}\left(\mathbf{r},\mathbf{r}^{\prime}\right)
=\left[\left(\frac{3}{k_{k}^{2}R^{\prime 2}}-\frac{3j}{k_{k}R^{\prime}}-1\right)\hat{\mathbf{r}}\hat{\mathbf{r}}^{\top}-\left(\frac{1}{k_{k}^{2}R^{\prime 2}}-\frac{j}{k_{k}R^{\prime}}-1\right)\mathbf{I}_{3}\right]g_{k}\left(\mathbf{r},\mathbf{r}^{\prime}\right), \tag{11}
\]

where $R^{\prime}$ is the distance defined as $R^{\prime}\triangleq\|\mathbf{r}-\mathbf{r}^{\prime}\|_{2}$, $\hat{\mathbf{r}}\in\mathbb{R}^{3\times 1}$ is the unit vector from $\mathbf{r}^{\prime}$ to $\mathbf{r}$, and $g_{k}\left(\mathbf{r},\mathbf{r}^{\prime}\right)$ is the scalar Green's function defined as $g_{k}\left(\mathbf{r},\mathbf{r}^{\prime}\right)\triangleq\frac{\exp(jk_{k}R^{\prime})}{4\pi R^{\prime}}$ [34].
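A direct implementation of (11) under these definitions might look as follows; the evaluation points and wavelength are arbitrary illustrative choices:

```python
import numpy as np

def scalar_green(R, k):
    # g_k(r, r') = exp(j k R) / (4 pi R)
    return np.exp(1j * k * R) / (4.0 * np.pi * R)

def dyadic_green(r, rp, k):
    # dyadic Green's function of (11) at observation point r and source point r'
    Rvec = np.asarray(r, dtype=float) - np.asarray(rp, dtype=float)
    R = np.linalg.norm(Rvec)
    rhat = (Rvec / R)[:, None]            # unit vector from r' to r
    kR = k * R
    t1 = 3.0 / kR**2 - 3.0j / kR - 1.0    # coefficient of rhat @ rhat^T
    t2 = 1.0 / kR**2 - 1.0j / kR - 1.0    # coefficient of I_3
    return (t1 * (rhat @ rhat.T) - t2 * np.eye(3)) * scalar_green(R, k)

G = dyadic_green([1.0, 0.5, 0.2], [0.0, 0.0, 0.0], 2.0 * np.pi)  # lambda = 1 m
```

Because $\hat{\mathbf{r}}\hat{\mathbf{r}}^{\top}$ is unchanged when $\mathbf{r}$ and $\mathbf{r}^{\prime}$ are swapped, this form is a symmetric matrix and reciprocal in its two arguments, which serves as a quick consistency check.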

The echo electric field at the BS's receiver scattered back from the target can then be formulated as [32, 33]

\[
\mathbf{E}_{k}^{s}\left(\mathbf{r}_{n}\right)=k_{k}^{2}\iiint_{D}\overline{\overline{\mathbf{G}}}_{k}\left(\mathbf{r}_{n},\mathbf{r}^{\prime}\right)\chi_{k}\left(\mathbf{r}^{\prime}\right)\mathbf{E}_{k}^{t}\left(\mathbf{r}^{\prime}\right)\mathrm{d}\mathbf{r}^{\prime}, \tag{12}
\]

where $\mathbf{r}_{n}$ denotes the position of the $n$-th receiving antenna. Suppose the receiver can only measure the scalar electric field component in the direction represented by the unit vector $\mathbf{q}\in\mathbb{R}^{3\times 1}$. The received echo signals can also be formulated as

\[
\mathbf{y}_{k}=\tilde{G}_{r}\left[\mathbf{E}_{k}^{s}\left(\mathbf{r}_{1}\right),\cdots,\mathbf{E}_{k}^{s}\left(\mathbf{r}_{N_{r}}\right)\right]^{\top}\mathbf{q}+\mathbf{n}_{k}, \tag{13}
\]

where $\tilde{G}_{r}$ denotes the receiving antenna gain.

Remark 1.

In accordance with (9), (12), and (13), the echo signals that are transmitted through the sensing channel can also be derived through the EM property of the target. As the mapping from $\mathbf{w}_{k}$ to $\mathbf{y}_{k}$, $\mathbf{H}_{k}$ in (1) depends on the EM property of the target and is actually the composite mapping consisting of (6), (9), (12), and (13). Specifically, (6) maps $\mathbf{w}_{k}$ to $\mathbf{E}_{k}^{i}(\mathbf{r})$; (9) maps $\mathbf{E}_{k}^{i}(\mathbf{r})$ to $\mathbf{E}_{k}^{t}(\mathbf{r})$; (12) maps $\mathbf{E}_{k}^{t}(\mathbf{r})$ to $\mathbf{E}_{k}^{s}(\mathbf{r})$; (13) maps $\mathbf{E}_{k}^{s}(\mathbf{r})$ to $\mathbf{y}_{k}$.

III DSB for EM Property Sensing and Channel Reconstruction

III-A Point Cloud Representation

We utilize the point cloud representation to concisely and vividly represent the distribution of the target's EM property. Define the $m$-th normalized 5D point $\mathbf{x}_{m}\in\mathbb{R}^{5\times 1}$ that comprises both the 3D location information and the 2D EM property as

\[
\mathbf{x}_{m}=\left[\frac{x_{m}-x_{c}}{x_{d}},\frac{y_{m}-y_{c}}{y_{d}},\frac{z_{m}-z_{c}}{z_{d}},\frac{\epsilon_{m}-\epsilon_{c}}{\epsilon_{d}},\frac{\sigma_{m}-\sigma_{c}}{\sigma_{d}}\right]^{\top}, \tag{14}
\]

where $x_{m}$, $y_{m}$, and $z_{m}$ denote the coordinates of the $m$-th point in each dimension; $x_{c}$, $y_{c}$, and $z_{c}$ denote the coordinates of the center of the target; $x_{d}$, $y_{d}$, and $z_{d}$ denote the corresponding standard deviations. Here, $\epsilon_{m}$ and $\sigma_{m}$ represent the dielectric constant and conductivity at the $m$-th point, respectively; $\epsilon_{c}$ and $\sigma_{c}$ represent their respective central values; $\epsilon_{d}$ and $\sigma_{d}$ denote their respective standard deviations. Suppose a total of $M$ points $\mathbf{x}_{m}$ in (14) constitute the point cloud that represents the target, defined as $\mathbf{X}_{\text{data}}\in\mathbb{R}^{M\times 5}$.
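The normalization in (14) is a per-column centering and scaling of the raw 5D points; a minimal sketch with synthetic values (the coordinate and material ranges are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
M = 256
# raw points: columns are [x, y, z, eps_r, sigma]; ranges invented for illustration
raw = rng.uniform([0.0, 0.0, 0.0, 1.0, 0.0], [2.0, 2.0, 2.0, 6.0, 0.05], size=(M, 5))

centers = raw.mean(axis=0)          # [x_c, y_c, z_c, eps_c, sigma_c]
scales = raw.std(axis=0)            # [x_d, y_d, z_d, eps_d, sigma_d]
X_data = (raw - centers) / scales   # normalized 5D point cloud, cf. (14)
```

After this step every column has zero mean and unit standard deviation, so position and material coordinates enter the diffusion process on comparable scales.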

The implementation of the point cloud approach presents a more efficient and uncomplicated alternative for discerning the EM property. Point clouds intrinsically facilitate the distinction between the background medium and the target, thereby eliminating the necessity to analyze the known background medium and significantly reducing the computational burden. Additionally, the representation of data through point clouds permits a clear and prompt visualization of the 3D target, which thereby enhances the intuitive interpretation of the inversion outcomes.

III-B DSB Linking EM Property and Sensing Channel

Let $p_{\text{data}}$ denote the distribution of the EM property and $p_{\text{prior}}$ denote the distribution of the latent extracted from the sensing channel. The latent shares the same dimensions as $\mathbf{X}_{\text{data}}$ and is denoted as $\mathbf{X}_{\text{prior}}\in\mathbb{R}^{M\times 5}$. To estimate the EM property from the sensing channel or to reconstruct the sensing channel from the EM property, we need to link $p_{\text{data}}$ and $p_{\text{prior}}$ using DSB. Specifically, DSB establishes the link through a bidirectional process: the forward process gradually transforms $p_{\text{data}}$ into $p_{\text{prior}}$, while the reverse process maps $p_{\text{prior}}$ back to $p_{\text{data}}$. Both processes can be represented via Markov chains.

Denote $\mathbf{X}_{0:N}$ as the set $\{\mathbf{X}_{0},\cdots,\mathbf{X}_{N}\}$ that is sequentially generated from $\mathbf{X}_{0}=\mathbf{X}_{\text{data}}$ to $\mathbf{X}_{N}=\mathbf{X}_{\text{prior}}$ over a sequence of $N-1$ intermediate states. The forward transition $p_{i+1\mid i}\left(\mathbf{X}_{i+1}\mid\mathbf{X}_{i}\right)$ is constructed to progressively guide the distribution from $p_{0}=p_{\text{data}}$ to approximate $p_{N}=p_{\text{prior}}$. The joint probability density of $\mathbf{X}_{0:N}$ is

\[
p\left(\mathbf{X}_{0:N}\right)=p_{0}\left(\mathbf{X}_{0}\right)\prod_{i=0}^{N-1}p_{i+1\mid i}\left(\mathbf{X}_{i+1}\mid\mathbf{X}_{i}\right). \tag{15}
\]

Similarly, the reverse process can be formulated as a Markovian sequence, with the reverse joint density being

\[
q\left(\mathbf{X}_{0:N}\right)=p_{N}\left(\mathbf{X}_{N}\right)\prod_{i=0}^{N-1}p_{i\mid i+1}\left(\mathbf{X}_{i}\mid\mathbf{X}_{i+1}\right), \tag{16}
\]

where the conditional probability $p_{i\mid i+1}\left(\mathbf{X}_{i}\mid\mathbf{X}_{i+1}\right)$ represents the probability of transitioning from state $\mathbf{X}_{i+1}$ at time $i+1$ to state $\mathbf{X}_{i}$ at time $i$. We can decompose the conditional probability $p_{i\mid i+1}\left(\mathbf{X}_{i}\mid\mathbf{X}_{i+1}\right)$ using Bayes' theorem as

\[
p_{i\mid i+1}\left(\mathbf{X}_{i}\mid\mathbf{X}_{i+1}\right)=\frac{p_{i+1\mid i}\left(\mathbf{X}_{i+1}\mid\mathbf{X}_{i}\right)p_{i}\left(\mathbf{X}_{i}\right)}{p_{i+1}\left(\mathbf{X}_{i+1}\right)}, \tag{17}
\]

where $p_{i+1\mid i}\left(\mathbf{X}_{i+1}\mid\mathbf{X}_{i}\right)$ is the conditional probability of the forward process, and $p_{i}\left(\mathbf{X}_{i}\right)$ and $p_{i+1}\left(\mathbf{X}_{i+1}\right)$ are the marginal distributions of states $\mathbf{X}_{i}$ and $\mathbf{X}_{i+1}$, respectively.

However, directly computing the conditional probability $p_{i\mid i+1}\left(\mathbf{X}_{i}\mid\mathbf{X}_{i+1}\right)$ using (17) is generally quite challenging due to the complexity of the involved distributions and the recursive nature of the computation. In practice, score-based generative models (SGMs) adopt a more tractable approach that simplifies the forward process by modeling it as the gradual addition of Gaussian noise to the states over time. The forward process can be represented as

\[
p_{i+1\mid i}\left(\mathbf{X}_{i+1}\mid\mathbf{X}_{i}\right)=\mathcal{N}\left(\mathbf{X}_{i}+\gamma_{i+1}f_{i}\left(\mathbf{X}_{i}\right),\,2\gamma_{i+1}\mathbf{I}\right), \tag{18}
\]

where $\gamma_{i+1}$ is the noise level parameter at time step $i+1$, and $f_{i}\left(\mathbf{X}_{i}\right)$ is the forward drift term that governs the deterministic part of the state evolution.

For a sufficiently large number of time steps $N+1$, the distribution of the state at the final time step converges to $p_{\text{prior}}$, which serves as the starting point for the reverse-time generative process.
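The convergence of the forward chain (18) can be illustrated with a toy Ornstein-Uhlenbeck drift $f_i(\mathbf{X})=-\mathbf{X}$, whose stationary law is the standard normal; the drift choice, step size, and cloud size below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_step(X, gamma, drift):
    # one transition of (18): X_{i+1} ~ N(X_i + gamma * f_i(X_i), 2 * gamma * I)
    return X + gamma * drift(X) + np.sqrt(2 * gamma) * rng.standard_normal(X.shape)

# Ornstein-Uhlenbeck drift f(X) = -X pulls any initial cloud toward N(0, I)
X = 5.0 * np.ones((2000, 5))      # point cloud started far from the prior
for _ in range(1000):
    X = forward_step(X, gamma=0.01, drift=lambda Z: -Z)
```

After many small steps the empirical mean and variance of the cloud approach $0$ and $1$, i.e., the terminal distribution approximates the prior regardless of the starting point.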

The forward process (15) and the reverse process (16) can also be described in a continuous-time framework using stochastic differential equations (SDEs). Specifically, the forward process can be modeled by an SDE as

\[
\mathrm{d}\mathbf{X}_{t}=f_{t}\left(\mathbf{X}_{t}\right)\mathrm{d}t+g_{t}\,\mathrm{d}\mathbf{B}_{t}, \tag{19}
\]

where $t\in[0,T]$, $f_{t}(\mathbf{X}_{t}):\mathbb{R}^{M\times 5}\rightarrow\mathbb{R}^{M\times 5}$ is the drift term function, $g_{t}$ represents the diffusion coefficient, and $\mathbf{B}_{t}$ denotes the standard Brownian motion. The reverse process, on the other hand, involves solving the time-reversed version of the SDE in (19), and is given by

\[
\mathrm{d}\mathbf{X}_{t}=\left[-f_{t}\left(\mathbf{X}_{t}\right)+2\nabla_{\mathbf{X}_{t}}\log p_{T-t}\left(\mathbf{X}_{t}\right)\right]\mathrm{d}t+g_{t}\,\mathrm{d}\mathbf{B}_{t}. \tag{20}
\]
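A toy Euler-Maruyama discretization of the reverse SDE (20) can be checked in the one case where the score is known in closed form: assuming $f_t(x)=-x$, $g_t=\sqrt{2}$, and the stationary density $p=\mathcal{N}(\mathbf{0},\mathbf{I})$, the score is $-x$ and the reverse dynamics should leave the prior invariant (all choices below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
dt, T = 0.01, 2.0

def score(x):
    # analytic score of the stationary density p = N(0, I): grad log p(x) = -x
    return -x

X = rng.standard_normal((5000, 5))   # start the reverse-time process at the prior
for _ in range(int(T / dt)):
    # Euler-Maruyama step of (20) with f_t(x) = -x and g_t = sqrt(2):
    # drift = -f + 2 * score = x - 2x = -x, so N(0, I) is preserved
    X = X + (-(-X) + 2 * score(X)) * dt + np.sqrt(2 * dt) * rng.standard_normal(X.shape)
```

In practice the score is of course not available analytically, which is exactly what the learned drift networks of the DSB replace.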

III-C Iterative Proportional Fitting

The DSB is an extension of the Schrödinger bridge (SB) problem that incorporates diffusion models to capture uncertainty and variability in dynamic systems. In the context of the SB problem, we aim to find an optimal distribution $p^{*}\in\mathscr{P}_{N+1}$ that minimizes the KL divergence from a reference path measure $p^{\text{ref}}\in\mathscr{P}_{N+1}$. The optimization problem is defined as

\[
p^{*}=\operatorname*{argmin}_{p\in\mathscr{P}_{N+1}}\left\{\mathrm{KL}\left(p\mid p^{\text{ref}}\right):p_{0}=p_{\text{data}},\,p_{N}=p_{\text{prior}}\right\}, \tag{21}
\]

where the marginal distributions $p_{0}$ and $p_{N}$ correspond to the distributions at the start and end points of the process, respectively. Typically, the reference path measure $p^{\text{ref}}$ is generated using the same form of the forward SDE as in (19).

After determining the optimal solution $p^{*}$, we can sample $\mathbf{X}_{0}\sim p_{\text{data}}$ by first generating $\mathbf{X}_{N}\sim p_{\text{prior}}$ and then iteratively applying the backward transition $p_{i\mid i+1}\left(\mathbf{X}_{i}\mid\mathbf{X}_{i+1}\right)$. Alternatively, we can sample $\mathbf{X}_{N}\sim p_{\text{prior}}$ by initially drawing $\mathbf{X}_{0}\sim p_{\text{data}}$ and subsequently applying the forward transition $p_{i+1\mid i}\left(\mathbf{X}_{i+1}\mid\mathbf{X}_{i}\right)$. The SB formulation enables these bidirectional transitions without relying on closed-form expressions of $p_{\text{data}}$ and $p_{\text{prior}}$.

Although the SB problem does not have a closed-form solution, it can be tackled using iterative proportional fitting (IPF), which iteratively solves the following optimization problems:

\[
p^{2n+1}=\operatorname*{argmin}_{p\in\mathscr{P}_{N+1}}\left\{\mathrm{KL}\left(p\mid p^{2n}\right):p_{N}=p_{\text{prior}}\right\}, \tag{22}
\]
\[
p^{2n+2}=\operatorname*{argmin}_{p\in\mathscr{P}_{N+1}}\left\{\mathrm{KL}\left(p\mid p^{2n+1}\right):p_{0}=p_{\text{data}}\right\}, \tag{23}
\]

where the superscript $n$ denotes the number of iterations, and the initialization $p^{0}$ is set as $p^{\text{ref}}$. However, implementing IPF in real-world scenarios can be computationally prohibitive, as it requires the calculation and optimization of joint densities.
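The alternation (22)-(23) has a classical finite-dimensional analogue: when the path measure is replaced by a two-marginal coupling on a discrete grid, IPF reduces to Sinkhorn-style row/column scaling against the reference kernel. The sketch below is this simplified analogue, not the full path-space DSB; the grid, kernel, and marginals are invented for illustration:

```python
import numpy as np

n = 50
x = np.linspace(-3, 3, n)

# reference coupling: a Gaussian transition kernel between the two end times
K_ref = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.5)
K_ref /= K_ref.sum()

p0 = np.exp(-(x + 1) ** 2); p0 /= p0.sum()            # stand-in for p_data
pN = np.exp(-(x - 1) ** 2 / 0.2); pN /= pN.sum()      # stand-in for p_prior

u, v = np.ones(n), np.ones(n)
for _ in range(5000):
    u = p0 / (K_ref @ v)        # project onto the p_data marginal constraint
    v = pN / (K_ref.T @ u)      # project onto the p_prior marginal constraint
coupling = u[:, None] * K_ref * v[None, :]   # approximate IPF fixed point
```

Each half-iteration enforces one marginal exactly while perturbing the other, mirroring how (22) pins $p_{N}=p_{\text{prior}}$ and (23) pins $p_{0}=p_{\text{data}}$.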

DSB can be viewed as an approximate method for IPF, which simplifies the optimization of the joint density by decomposing it into a sequence of conditional density optimization tasks. Specifically, the distribution $p$ is split into the forward and backward conditional distributions $p_{i+1\mid i}$ and $p_{i\mid i+1}$, respectively:

\[
p^{2n+1}=\operatorname*{argmin}_{p\in\mathscr{P}_{N+1}}\left\{\mathrm{KL}\left(p_{i\mid i+1}\mid p_{i\mid i+1}^{2n}\right):p_{N}=p_{\text{prior}}\right\}, \tag{24}
\]
\[
p^{2n+2}=\operatorname*{argmin}_{p\in\mathscr{P}_{N+1}}\left\{\mathrm{KL}\left(p_{i+1\mid i}\mid p_{i+1\mid i}^{2n+1}\right):p_{0}=p_{\text{data}}\right\}. \tag{25}
\]

It can be shown that optimizing the conditional distributions $p_{i+1\mid i}$ and $p_{i\mid i+1}$ in (24) and (25) leads to the optimization of the joint distribution $p$ in (22) and (23) [21]. We assume that the conditional distributions $p_{i+1\mid i}$ and $p_{i\mid i+1}$ are Gaussian, following the assumption commonly used in SGMs and allowing DSB to analytically handle the time-reversal process. Consequently, we employ two separate neural networks to model the forward and backward dynamics.

The forward process, which governs the transition of the state from one time step to the next, is mathematically expressed as

pi+1in(𝐗i+1𝐗i)=𝒩(𝐗i+γi+1fin(𝐗i),2γi+1𝐈),p_{i+1\mid i}^{n}(\mathbf{X}_{i+1}\mid\mathbf{X}_{i})=\mathcal{N}\left(\mathbf{X}_{i}+\gamma_{i+1}f_{i}^{n}(\mathbf{X}_{i}),2\gamma_{i+1}\mathbf{I}\right), (26)

where $p_{i+1\mid i}^{n}(\mathbf{X}_{i+1}\mid\mathbf{X}_{i})$ denotes the conditional probability distribution of the state $\mathbf{X}_{i+1}$ given the state $\mathbf{X}_{i}$ at the previous time step. In the multivariate Gaussian distribution $\mathcal{N}\left(\mathbf{X}_{i}+\gamma_{i+1}f_{i}^{n}(\mathbf{X}_{i}),\,2\gamma_{i+1}\mathbf{I}\right)$, $\gamma_{i+1}$ represents the diffusion coefficient that controls the spread of the distribution, and $f_{i}^{n}(\mathbf{X}_{i})$ represents the drift term that accounts for the deterministic part of the state transition in the forward process. For brevity, we define the forward estimation $F_{i}^{n}(\mathbf{X}_{i})$ as

$F_{i}^{n}(\mathbf{X}_{i})=\mathbf{X}_{i}+\gamma_{i+1}f_{i}^{n}(\mathbf{X}_{i}).$ (27)
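For concreteness, one transition of the forward chain in (26) with the forward estimation (27) can be sketched as follows; this is a minimal NumPy sketch in which the drift function is a hypothetical placeholder, not the trained network:

```python
import numpy as np

def forward_step(X_i, f_i, gamma_next, rng):
    """Sample X_{i+1} ~ N(F_i(X_i), 2*gamma_{i+1} I) as in (26), with the
    forward estimation F_i(X_i) = X_i + gamma_{i+1} * f_i(X_i) as in (27)."""
    F = X_i + gamma_next * f_i(X_i)               # forward estimation (27)
    noise = rng.standard_normal(X_i.shape)
    return F + np.sqrt(2.0 * gamma_next) * noise  # Gaussian transition (26)

# Usage on an M x 5 point-cloud state with a placeholder (hypothetical) drift
rng = np.random.default_rng(0)
X = rng.standard_normal((2048, 5))
X_next = forward_step(X, lambda x: -0.5 * x, 0.001, rng)
```

With a small diffusion coefficient, each step perturbs the state only slightly, so the chain of $N$ such steps gradually transports the data distribution toward the prior.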

Conversely, the backward process, which describes the reverse-time evolution of (26), is given by

$p_{i\mid i+1}^{n}(\mathbf{X}_{i}\mid\mathbf{X}_{i+1})=\mathcal{N}\left(\mathbf{X}_{i+1}+\gamma_{i+1}b_{i+1}^{n}(\mathbf{X}_{i+1}),\,2\gamma_{i+1}\mathbf{I}\right),$ (28)

where $p_{i\mid i+1}^{n}(\mathbf{X}_{i}\mid\mathbf{X}_{i+1})$ represents the conditional probability distribution of the state $\mathbf{X}_{i}$ given the state $\mathbf{X}_{i+1}$ at the subsequent time step. Similar to the forward process, the backward process is modeled as a multivariate Gaussian distribution $\mathcal{N}\left(\mathbf{X}_{i+1}+\gamma_{i+1}b_{i+1}^{n}(\mathbf{X}_{i+1}),\,2\gamma_{i+1}\mathbf{I}\right)$. The term $b_{i+1}^{n}(\mathbf{X}_{i+1})$ in (28) is the drift term specific to the backward process, which plays a role analogous to $f_{i}^{n}(\mathbf{X}_{i})$ in the forward process and can be computed according to (26) as

$b_{i+1}^{n}\left(\mathbf{X}_{i+1}\right)=-f_{i}^{n}\left(\mathbf{X}_{i+1}\right)+2\nabla_{\mathbf{X}_{i+1}}\log p_{i+1}^{n}\left(\mathbf{X}_{i+1}\right).$ (29)

For brevity, we define the backward estimation $B_{i}^{n}(\mathbf{X}_{i})$ as

$B_{i}^{n}(\mathbf{X}_{i})=\mathbf{X}_{i}+\gamma_{i}b_{i}^{n}(\mathbf{X}_{i}).$ (30)

According to (26) and (27), we can compute the probability density function $p_{i+1}^{n}(\mathbf{X}_{i+1})$ at time step $i+1$ as

$p_{i+1}^{n}(\mathbf{X}_{i+1})=(4\pi\gamma_{i+1})^{-\frac{5M}{2}}\int p_{i}^{n}(\mathbf{X}_{i})\exp\left[-\frac{\|F_{i}^{n}(\mathbf{X}_{i})-\mathbf{X}_{i+1}\|_{F}^{2}}{4\gamma_{i+1}}\right]\mathrm{d}\mathbf{X}_{i},$ (31)

where the integral normalization factor $(4\pi\gamma_{i+1})^{-\frac{5M}{2}}$ follows from $\mathbf{X}_{i},\mathbf{X}_{i+1}\in\mathbb{R}^{M\times 5}$.

Taking the logarithm of $p_{i+1}^{n}(\mathbf{X}_{i+1})$ and applying the gradient with respect to $\mathbf{X}_{i+1}$, we obtain

$\nabla_{\mathbf{X}_{i+1}}\log p_{i+1}^{n}(\mathbf{X}_{i+1})=\int\frac{F_{i}^{n}(\mathbf{X}_{i})-\mathbf{X}_{i+1}}{2\gamma_{i+1}}\,p_{i\mid i+1}^{n}(\mathbf{X}_{i}\mid\mathbf{X}_{i+1})\,\mathrm{d}\mathbf{X}_{i},$ (32)

where we apply Bayes' rule and then use (26) as

$p_{i\mid i+1}^{n}(\mathbf{X}_{i}\mid\mathbf{X}_{i+1})=\frac{p_{i+1\mid i}^{n}(\mathbf{X}_{i+1}\mid\mathbf{X}_{i})\,p_{i}^{n}(\mathbf{X}_{i})}{p_{i+1}^{n}(\mathbf{X}_{i+1})}=\frac{\exp\left[-\frac{\|F_{i}^{n}(\mathbf{X}_{i})-\mathbf{X}_{i+1}\|_{F}^{2}}{4\gamma_{i+1}}\right]p_{i}^{n}(\mathbf{X}_{i})}{(4\pi\gamma_{i+1})^{\frac{5M}{2}}\,p_{i+1}^{n}(\mathbf{X}_{i+1})}.$ (33)

Substituting (32) into (29) and utilizing (27), we obtain

$b_{i+1}^{n}(\mathbf{X}_{i+1})=\int\frac{F_{i}^{n}(\mathbf{X}_{i})-F_{i}^{n}(\mathbf{X}_{i+1})}{\gamma_{i+1}}\,p_{i\mid i+1}^{n}(\mathbf{X}_{i}\mid\mathbf{X}_{i+1})\,\mathrm{d}\mathbf{X}_{i}.$ (34)

Now, considering the Bayesian update step and substituting (34) into (30), we can derive the backward estimation $B_{i+1}^{n}(\mathbf{X}_{i+1})$ as

$B_{i+1}^{n}(\mathbf{X}_{i+1})=\mathbb{E}\left[\mathbf{X}_{i+1}+F_{i}^{n}(\mathbf{X}_{i})-F_{i}^{n}(\mathbf{X}_{i+1})\mid\mathbf{X}_{i+1}\right],$ (35)

where the expectation is taken over the joint distribution $p_{i,i+1}^{n}(\mathbf{X}_{i},\mathbf{X}_{i+1})$.


Figure 2: Schematic diagram of DSB pipeline with forward and backward processes, where the latent refers to the compressed features extracted from the sensing channel.

To minimize the difference between $B_{i+1}^{n}(\mathbf{X}_{i+1})$ and the conditional expectation on the right-hand side of (35), we define the loss function $\mathcal{L}_{B_{i+1}^{n}}$ as

$\mathcal{L}_{B_{i+1}^{n}}=\mathbb{E}_{(\mathbf{X}_{i},\mathbf{X}_{i+1})\sim p_{i,i+1}^{n}}\left[\left\|B_{i+1}^{n}(\mathbf{X}_{i+1})-\left(\mathbf{X}_{i+1}+F_{i}^{n}(\mathbf{X}_{i})-F_{i}^{n}(\mathbf{X}_{i+1})\right)\right\|_{F}^{2}\right].$ (36)

Similarly, the loss function $\mathcal{L}_{F_{i}^{n+1}}$ for the mapping function $F_{i}^{n+1}$ can be given by

$\mathcal{L}_{F_{i}^{n+1}}=\mathbb{E}_{(\mathbf{X}_{i},\mathbf{X}_{i+1})\sim p_{i,i+1}^{n}}\left[\left\|F_{i}^{n+1}(\mathbf{X}_{i})-\left(\mathbf{X}_{i}+B_{i+1}^{n}(\mathbf{X}_{i+1})-B_{i+1}^{n}(\mathbf{X}_{i})\right)\right\|_{F}^{2}\right].$ (37)

The loss functions (36) and (37) are derived under the assumption that the distribution $p_{i,i+1}^{n}$ accurately models the underlying dynamics of the system and that the error introduced by approximating $B_{i+1}^{n}$ and $F_{i}^{n+1}$ is minimized in the Frobenius norm sense. The schematic diagram of the DSB pipeline with forward and backward processes is shown in Fig. 2, where the latent refers to the compressed features extracted from the sensing channel.

In practical applications, the DSB methodology employs two neural networks to approximate the forward estimation (27) and the backward estimation (30). Let $\alpha^{n}$ and $\beta^{n}$ represent the trainable parameters of the two neural networks $F_{\alpha^{n}}(i,\mathbf{X}_{i})$ and $B_{\beta^{n}}(i,\mathbf{X}_{i})$ in the $n$-th iteration. Specifically, the network $B_{\beta^{n}}(i,\mathbf{X}_{i})$ approximates the backward estimation $B_{i}^{n}(\mathbf{X}_{i})$, and the network $F_{\alpha^{n}}(i,\mathbf{X}_{i})$ approximates the forward estimation $F_{i}^{n}(\mathbf{X}_{i})$. The iterative optimization of $F_{\alpha^{n}}(i,\mathbf{X}_{i})$ and $B_{\beta^{n}}(i,\mathbf{X}_{i})$ is crucial for the DSB methodology.

The DSB methodology proceeds by alternately training the backward network $B_{\beta^{n}}(i,\mathbf{X}_{i})$ and the forward network $F_{\alpha^{n}}(i,\mathbf{X}_{i})$ across multiple iterations indexed by $n$. Specifically, during the $(2n+1)$-th epoch, we train the backward network $B_{\beta^{n}}(i,\mathbf{X}_{i})$, while during the $(2n+2)$-th epoch, we train the forward network $F_{\alpha^{n}}(i,\mathbf{X}_{i})$. The alternating optimization of the two networks is designed to minimize (24) and (25), which ensures that the forward and backward processes are accurately modeled. Moreover, both $B_{\beta^{n}}(i,\mathbf{X}_{i})$ and $F_{\alpha^{n}}(i,\mathbf{X}_{i})$ are composed of the same concatsquash layers as in [13].

The convergence of IPF has been rigorously proven in the literature, as demonstrated in [21]. The proof underpins the theoretical soundness of the DSB approach, which confirms that the method will, under appropriate conditions, converge to a solution that satisfies the desired characteristics of DSB.

III-D DSB Training Scheme

Algorithm 1 Initial Forward Model Training Scheme
1: Require: rounds $R$, time steps $N$, data distribution $p_{\text{data}}$, prior distribution $p_{\text{prior}}$, and learning rate $\eta$
2: Ensure: initial forward model $F_{\alpha^{0}}(i,\mathbf{X}_{i})=M_{\theta}(i,\mathbf{X}_{i})$
3: for $r\in\{0,\dots,R\}$ do
4:  while not converged do
5:   Sample paired data $(\mathbf{X}_{0},\mathbf{X}_{N})$ with $\mathbf{X}_{0}\sim p_{\text{data}}$, $\mathbf{X}_{N}\sim p_{\text{prior}}$, and $i\in\{0,1,\dots,N-1\}$
6:   Compute intermediate sample $\mathbf{X}_{i}=\left(1-\frac{i}{N}\right)\mathbf{X}_{0}+\frac{i}{N}\mathbf{X}_{N}$
7:   Compute prediction objective $\mathbf{Y}_{i}=\mathbf{X}_{N}-\mathbf{X}_{0}$
8:   Compute loss function $\mathcal{L}=\|M_{\theta}(i,\mathbf{X}_{i})-\mathbf{Y}_{i}\|_{F}^{2}$
9:   Update $\theta\leftarrow\theta-\eta\nabla_{\theta}\mathcal{L}$
10:  end while
11: end for
12: $F_{\alpha^{0}}(i,\mathbf{X}_{i})\leftarrow M_{\theta}(i,\mathbf{X}_{i})$

III-D1 Initial Forward Model

As highlighted in the previous subsection, DSB and SGM are inherently aligned in their training objectives. Specifically, both methodologies aim to model and approximate the target data distribution through a series of learned transformations that progressively refine an initial noise distribution into a more structured and complex distribution that resembles the data. Given the shared objective of DSB and SGM, it is natural to set the reference distribution $p^{\text{ref}}$ in DSB to mirror the noise schedule employed by SGM. By doing so, the training process in the first epoch of DSB becomes theoretically equivalent to the standard training procedure of SGM.

Therefore, rather than initiating the training process of DSB from scratch, we leverage a pre-trained SGM as the starting forward model to train the first backward model. The pre-trained SGM provides a strong foundation from which DSB can iteratively enhance the quality of generated data through multiple rounds of forward and backward training. Each subsequent epoch in DSB builds upon the results of the previous one, progressively refining the model’s ability to capture the complexities of the target distribution. Consequently, DSB develops a new generative model that surpasses the initial SGM in its capacity to accurately model the target data distribution.

The training process for the first forward epoch in DSB, which corresponds to the second epoch in the overall training procedure, is specifically designed to train a neural network to transform the data distribution $p_{\text{data}}$ into the prior distribution $p_{\text{prior}}$. The purpose of the transformation is to ensure that the intermediate states generated during the forward process adhere to the KL divergence constraint, which is imposed by the trajectories learned in the first backward epoch, as outlined in (24) and (25).

We employ flow matching (FM) models [35] to train the initial forward model in DSB. FM models work by ensuring that the flow of the generated data matches the flow of the real data, which effectively captures the underlying dependencies and relationships within the system [36]. By aligning the trajectories of data points, FM models provide a robust framework to model complex systems, which leads to precise and reliable generation [37]. The entire procedure to train the FM model as the initial forward model within the DSB framework is summarized in Algorithm 1.
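As a minimal illustration of Algorithm 1, the sketch below replaces the network $M_{\theta}$ with a constant matrix $\theta$ (an assumption made purely for brevity, so a real network consuming $(i,\mathbf{X}_{i})$ is not modeled); under this stand-in, the flow-matching regression converges to the mean displacement $\mathbb{E}[\mathbf{X}_{N}-\mathbf{X}_{0}]$:

```python
import numpy as np

def train_initial_forward(X0_batch, XN_batch, N=100, steps=200, lr=0.05, seed=0):
    """Sketch of Algorithm 1: regress the flow-matching target Y_i = X_N - X_0
    on linear interpolants X_i = (1 - i/N) X_0 + (i/N) X_N.  M_theta is a
    constant matrix theta here (a toy stand-in for the neural network), so
    the optimum is the mean displacement E[X_N - X_0]."""
    rng = np.random.default_rng(seed)
    theta = np.zeros_like(X0_batch[0])
    for _ in range(steps):
        k = int(rng.integers(len(X0_batch)))  # pick a (X_0, X_N) pair
        i = int(rng.integers(N))              # pick a time step
        X0, XN = X0_batch[k], XN_batch[k]
        Xi = (1 - i / N) * X0 + (i / N) * XN  # intermediate sample (line 6);
                                              # a real network would consume (i, Xi)
        Yi = XN - X0                          # prediction objective (line 7)
        grad = 2.0 * (theta - Yi)             # gradient of ||M - Y_i||_F^2
        theta = theta - lr * grad             # gradient step (line 9)
    return theta
```

For instance, with all $\mathbf{X}_{0}$ drawn as zeros and all $\mathbf{X}_{N}$ as ones, the learned $\theta$ approaches the all-ones displacement.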



Figure 3: Schematic diagram of the latent generation using an autoencoder network. The encoding phase is employed in Scenario 1, while the decoding phase is employed in Scenario 2.
Algorithm 2 DSB Training Scheme
1: Require: epochs $E$, time steps $N$, data distribution $p_{\text{data}}$, and prior distribution $p_{\text{prior}}$
2: Ensure: forward network $F_{\alpha^{E+1}}(i,\mathbf{X}_{i})$ and backward network $B_{\beta^{E}}(i,\mathbf{X}_{i})$
3: for $n\in\{0,\dots,E\}$ do
4:  while not converged do
5:   Sample $\{\mathbf{X}_{i}\}_{i=0}^{N}$, where $\mathbf{X}_{0}\sim p_{\text{data}}$ and $\mathbf{X}_{i+1}=F_{\alpha^{n}}(i,\mathbf{X}_{i})+\sqrt{2\gamma_{i+1}}\,\epsilon$, $\epsilon\sim\mathcal{N}(\mathbf{0},\mathbf{I})$
6:   $\mathcal{L}_{b_{i+1}^{n}}^{\prime}\leftarrow\left\|B_{\beta^{n}}(i+1,\mathbf{X}_{i+1})-\mathbf{X}_{i}\right\|_{F}^{2}$
7:   $\beta^{n+1}\leftarrow$ take gradient step $\left(\nabla_{\beta^{n}}\mathcal{L}_{b_{i+1}^{n}}^{\prime}\right)$
8:  end while
9:  while not converged do
10:   Sample $\{\mathbf{X}_{i}\}_{i=0}^{N}$, where $\mathbf{X}_{N}\sim p_{\text{prior}}$ and $\mathbf{X}_{i-1}=B_{\beta^{n}}(i,\mathbf{X}_{i})+\sqrt{2\gamma_{i}}\,\epsilon$, $\epsilon\sim\mathcal{N}(\mathbf{0},\mathbf{I})$
11:   $\mathcal{L}_{f_{i}^{n+1}}^{\prime}\leftarrow\left\|F_{\alpha^{n+1}}(i,\mathbf{X}_{i})-\mathbf{X}_{i+1}\right\|_{F}^{2}$
12:   $\alpha^{n+2}\leftarrow$ take gradient step $\left(\nabla_{\alpha^{n+1}}\mathcal{L}_{f_{i}^{n+1}}^{\prime}\right)$
13:  end while
14: end for
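To make the alternating structure of Algorithm 2 concrete, the toy below runs it on one-dimensional Gaussians; the distributions, step counts, and per-step constant-shift parameterization are illustrative assumptions rather than the paper's setup, but they let each inner "while not converged" loop collapse to a closed-form least-squares fit of the regression targets in lines 6 and 11:

```python
import numpy as np

def dsb_toy(mu0=3.0, N=10, n_ipf=3, batch=4096, seed=0):
    """Toy instance of Algorithm 2 with p_data = N(mu0, 1), p_prior = N(0, 1).
    The networks are replaced by per-step constant shifts F(i, x) = x + a[i]
    and B(i+1, x) = x + b[i], fitted in closed form over each batch."""
    rng = np.random.default_rng(seed)
    gamma = np.full(N, 0.02)
    a = np.full(N, -mu0 / N)   # initial forward model: mean displacement
                               # E[X_N - X_0] / N from flow matching (Alg. 1)
    b = np.zeros(N)
    for _ in range(n_ipf):
        # Backward half: roll forward from p_data, regress X_i from X_{i+1}.
        x = rng.normal(mu0, 1.0, batch)
        xs = [x]
        for i in range(N):
            x = x + a[i] + np.sqrt(2 * gamma[i]) * rng.standard_normal(batch)
            xs.append(x)
        for i in range(N):     # argmin_b E||(X_{i+1} + b) - X_i||^2
            b[i] = np.mean(xs[i] - xs[i + 1])
        # Forward half: roll backward from p_prior, regress X_{i+1} from X_i.
        x = rng.normal(0.0, 1.0, batch)
        ys = [x]               # built from time step N downwards
        for i in range(N - 1, -1, -1):
            x = x + b[i] + np.sqrt(2 * gamma[i]) * rng.standard_normal(batch)
            ys.append(x)
        ys = ys[::-1]          # now ys[i] corresponds to time step i
        for i in range(N):     # argmin_a E||(X_i + a) - X_{i+1}||^2
            a[i] = np.mean(ys[i + 1] - ys[i])
    return ys[0].mean()        # backward samples should approach mu0

mu_hat = dsb_toy()
```

After a few IPF iterations, samples generated backward from the prior concentrate near the data mean, mirroring how the trained backward network transports the prior to the data distribution.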

III-D2 Loss Function Simplification

In standard DSB, the loss functions for both the forward and backward models have relatively high computational complexity, as detailed in (36) and (37), which also renders their physical interpretation challenging. To reduce the complexity, the training losses of DSB can be simplified as

$\mathcal{L}^{\prime}_{b_{i+1}^{n}}=\mathbb{E}_{\left(\mathbf{X}_{i},\mathbf{X}_{i+1}\right)\sim p_{i,i+1}^{n}}\left[\left\|B_{\beta^{n}}(i+1,\mathbf{X}_{i+1})-\mathbf{X}_{i}\right\|_{F}^{2}\right],$ (38)
$\mathcal{L}^{\prime}_{f_{i}^{n+1}}=\mathbb{E}_{\left(\mathbf{X}_{i},\mathbf{X}_{i+1}\right)\sim p_{i,i+1}^{n}}\left[\left\|F_{\alpha^{n+1}}(i,\mathbf{X}_{i})-\mathbf{X}_{i+1}\right\|_{F}^{2}\right].$ (39)

The rationale for the simplification is as follows. Since the drift term $f_{i}^{n}\left(\mathbf{X}_{i}\right)$ generally changes mildly and $\mathbf{X}_{i}$ is in close proximity to $\mathbf{X}_{i+1}$ according to [21], we can assume $f_{i}^{n}\left(\mathbf{X}_{i}\right)\approx f_{i}^{n}\left(\mathbf{X}_{i+1}\right)$. Thus, we have

$\left[\mathbf{X}_{i+1}+F_{i}^{n}\left(\mathbf{X}_{i}\right)-F_{i}^{n}\left(\mathbf{X}_{i+1}\right)\right]-\mathbf{X}_{i}=\mathbf{X}_{i+1}+\mathbf{X}_{i}+\gamma_{i+1}f_{i}^{n}\left(\mathbf{X}_{i}\right)-\left[\mathbf{X}_{i+1}+\gamma_{i+1}f_{i}^{n}\left(\mathbf{X}_{i+1}\right)\right]-\mathbf{X}_{i}=\gamma_{i+1}f_{i}^{n}\left(\mathbf{X}_{i}\right)-\gamma_{i+1}f_{i}^{n}\left(\mathbf{X}_{i+1}\right)\approx 0.$ (40)

According to (36) and (40), we have $\mathcal{L}^{\prime}_{b_{i+1}^{n}}\approx\mathcal{L}_{B_{i+1}^{n}}$; analogously, $\mathcal{L}^{\prime}_{f_{i}^{n+1}}\approx\mathcal{L}_{F_{i}^{n+1}}$ can also be proved. With the simplified loss functions (38) and (39), the overall procedure to train DSB is summarized in Algorithm 2. Once the training of DSB is completed, the sampling of DSB can be conducted using (26) and (28).
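The two samplers are simple Markov-chain rollouts of (26) and (28); a hedged sketch is given below, where an identity function merely stands in for the trained networks:

```python
import numpy as np

def sample_backward(X_N, B, gammas, rng):
    """Reverse sampling with a trained backward network B(i, X), following
    (28): X_{i-1} ~ N(B(i, X_i), 2*gamma_i I), from i = N down to 1."""
    X = X_N
    for i in range(len(gammas), 0, -1):
        X = B(i, X) + np.sqrt(2.0 * gammas[i - 1]) * rng.standard_normal(X.shape)
    return X

def sample_forward(X_0, F, gammas, rng):
    """Forward sampling with a trained network F(i, X), following (26):
    X_{i+1} ~ N(F(i, X_i), 2*gamma_{i+1} I), from i = 0 up to N - 1."""
    X = X_0
    for i in range(len(gammas)):
        X = F(i, X) + np.sqrt(2.0 * gammas[i]) * rng.standard_normal(X.shape)
    return X

rng = np.random.default_rng(1)
gammas = np.full(100, 1e-3)
prior = rng.standard_normal((64, 5))       # latent-shaped prior sample
data = sample_backward(prior, lambda i, X: X, gammas, rng)  # identity stand-in for B
```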

IV Latent Generation

Throughout the DSB process, the dimension of the variables remains unchanged, which means that the dimensions of $\mathbf{X}_{\text{data}}$ and $\mathbf{X}_{\text{prior}}$ must be the same. However, in MIMO systems with multiple subcarriers, the dimension of the sensing channel is typically much higher than that of the point cloud representing the EM property of the target. Thus, we need to compress the sensing channel into a latent whose dimension equals that of the 5D point cloud.

In order to generate the latent with the estimated sensing channel, we adopt an autoencoder network with positional embedding information, as shown in Fig. 3. The encoding phase is employed in Scenario 1, while the decoding phase is employed in Scenario 2. The autoencoder is designed to capture the relevant features of the sensing channel $\mathcal{H}$ while considering the position of the target through positional embedding.

In Scenario 1, the goal is to compress the estimated sensing channel $\hat{\mathcal{H}}$ into a latent representation that captures the essential features needed for EM property sensing. First, the estimated sensing channel $\hat{\mathcal{H}}$ is passed through the channel transferring module, in which the positional information of the target is embedded into a high-dimensional third-order tensor using sinusoidal functions inspired by [38]. The positional embedding tensor provides additional context that is crucial for accurate encoding, particularly because the channel features may vary significantly with the target's position. The positional embedding tensor is then integrated with the channel and processed by a neural network. The resulting combined data $\hat{\mathcal{H}}_{\text{ref},1}$ is then compressed by the downscaling module, which reduces the data size and retains only the most relevant features. The output of the downscaling module is a latent that shares the same dimensions as the 5D point cloud $\mathbf{X}_{\text{data}}$, i.e., $\mathbb{R}^{M\times 5}$. The latent is a compressed representation of the target's features, which will be used in DSB to reconstruct the EM property of the target.

In the autoencoder's encoding process, both the channel transferring module and the downscaling module mainly consist of convolutional layers. Assume that the dimension of the positional embedding tensor is $\mathbb{R}^{D_{p}\times N_{r}\times N_{t}}$. The channel transferring module first splits the real and imaginary parts of the channel and then concatenates them with the positional embedding tensor into the dimension of $\mathbb{R}^{(2K+D_{p})\times N_{r}\times N_{t}}$. Next, the downscaling module reduces the spatial dimensions of the data through striding and pooling, which compresses the input into a more compact form. The dual role of feature extraction and dimensionality reduction is essential to create an efficient latent representation that retains critical information while minimizing redundancy. The last layer of the downscaling module is a fully connected layer that transforms the flattened input into a vector of length $5M$ to align with the dimension of the 5D point cloud.
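A compact sketch of this encoding path is given below; the scalar position summary, the random linear projection, and the small dimensions are all illustrative stand-ins for the convolutional and fully connected layers described above, not the paper's architecture:

```python
import numpy as np

def positional_embedding(pos, D_p, Nr, Nt):
    """Sinusoidal embedding of the target position, broadcast over the
    Nr x Nt antenna grid (a simplified stand-in for the embedding of [38])."""
    freqs = 10000.0 ** (-2.0 * np.arange(D_p // 2) / D_p)
    phase = np.linalg.norm(pos) * freqs    # hypothetical scalar summary: range
    emb = np.concatenate([np.sin(phase), np.cos(phase)])
    return np.tile(emb[:, None, None], (1, Nr, Nt))

def encode_channel(H, pos, M=64, D_p=16, seed=0):
    """Channel-transferring + downscaling sketch: stack real/imag parts of the
    K x Nr x Nt channel with the positional tensor into (2K + D_p) x Nr x Nt,
    then map the flattened tensor to a 5M-dim latent with a fixed random
    linear projection (a stand-in for the conv + fully connected layers)."""
    K, Nr, Nt = H.shape
    x = np.concatenate([H.real, H.imag, positional_embedding(pos, D_p, Nr, Nt)])
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((5 * M, x.size)) / np.sqrt(x.size)
    return (W @ x.ravel()).reshape(M, 5)   # latent matches the 5D point cloud

rng = np.random.default_rng(2)
H = rng.standard_normal((4, 8, 8)) + 1j * rng.standard_normal((4, 8, 8))
latent = encode_channel(H, pos=np.array([15.0, 0.0, 0.0]))
```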

In Scenario 2, the objective is to reconstruct the channel $\mathcal{H}$ from the latent representation produced by DSB, where we incorporate the same positional information as the encoder to ensure consistency in the reconstruction. The latent representation is first passed through the upscaling module, which reverses the compression applied in the encoding phase and expands the latent to $\hat{\mathcal{H}}_{\text{ref},2}$ with the same size as the sensing channel $\mathcal{H}$. Similar to the encoding phase, the same positional embedding tensor is combined with the upscaled data $\hat{\mathcal{H}}_{\text{ref},2}$, which ensures that the positional information is also considered in the decoding phase. The combined data is then processed by the reverse channel transferring module, which reverts the transformations applied during the encoding phase and aims to reconstruct the sensing channel $\tilde{\mathcal{H}}$ as accurately as possible.

In the autoencoder's decoding process, the upscaling module reverses the encoding process by taking the flattened latent representation as input and using a fully connected layer followed by a series of transposed convolutional layers. The reverse channel transferring module concatenates $\hat{\mathcal{H}}_{\text{ref},2}$ and the positional embedding tensor and then employs a series of convolutional layers to transform the dimension of the data to $\mathbb{R}^{2K\times N_{r}\times N_{t}}$. The final output is the reconstructed channel $\tilde{\mathcal{H}}$ with stacked real and imaginary parts.

Since the magnitude of the sensing channel varies with the location of the target, we need to normalize the loss function. The training process of the latent generation autoencoder is guided by the normalized mean square error (NMSE) loss function, which measures the difference between the real channel $\mathcal{H}$ and the reconstructed channel $\tilde{\mathcal{H}}$ as

$\mathcal{L}_{\mathrm{NMSE}}=\frac{\left\|\mathcal{H}-\tilde{\mathcal{H}}\right\|_{F}^{2}}{\left\|\mathcal{H}\right\|_{F}^{2}}.$ (41)

By minimizing the NMSE of the reconstructed channel $\tilde{\mathcal{H}}$, the autoencoder learns to produce accurate latent representations that can faithfully reconstruct the sensing channel.
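The loss in (41) is straightforward to compute; a small sketch on a toy channel tensor (the dimensions are illustrative) is:

```python
import numpy as np

def nmse_db(H, H_rec):
    """NMSE of (41), returned in dB: ||H - H_rec||_F^2 / ||H||_F^2.
    np.linalg.norm on a flattened complex array gives the Frobenius norm."""
    ratio = np.linalg.norm(H - H_rec) ** 2 / np.linalg.norm(H) ** 2
    return 10.0 * np.log10(ratio)

H = np.ones((16, 32, 32), dtype=complex)   # toy K x Nr x Nt channel
val = nmse_db(H, 0.9 * H)                  # uniform 10% amplitude error -> -20 dB
```

The normalization by $\|\mathcal{H}\|_{F}^{2}$ is what makes the loss comparable across target locations with very different channel magnitudes.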

V Simulation Results and Analysis

Suppose all possible targets can be contained within a cubic region $D$ of size $1~\mathrm{m}\times 1~\mathrm{m}\times 1~\mathrm{m}$. We designate the number of scatter points that form the target as $M=2048$. Assume that the BS is positioned at $(0,0,0)$ m and needs to sense the target in a 30 m radius sector on the horizontal plane, characterized by $S=\{(x,y,0)\mid\arctan\frac{y}{x}\in[-60^{\circ},60^{\circ}],\ \sqrt{x^{2}+y^{2}}\leq 30~\mathrm{m}\}$. The BS is equipped with a uniform linear array (ULA) with $N_{t}=32$ transmitting antennas and a ULA with $N_{r}=32$ receiving antennas. The transmitting and receiving ULAs are both centered at $(0,0,0)$ m and are parallel to the $y$ and $z$ directions, respectively, which is analogous to the Mills-Cross configuration [39]. The transmitting and receiving antennas are set as dipoles polarized along the $z$ and $y$ directions, respectively. The central carrier frequency is set as $f_{c}=30$ GHz, and the corresponding central wavelength is $\lambda_{c}=0.01$ m. The inter-antenna spacing for both the transmitting and receiving ULAs is set as $\lambda_{c}/2=0.005$ m. We assume there are a total of $K=16$ subcarriers whose spacing is set as $\Delta_{f}=800$ kHz.

In DSB, we set the number of intermediate time steps as $N=100$. The diffusion coefficients $\gamma_{i}$ linearly increase from $\gamma_{0}=0.001$ to $\gamma_{50}=0.05$ and then linearly decrease from $\gamma_{50}=0.05$ to $\gamma_{100}=0.001$. In order to train DSB, we select 100000 targets from the ShapeNet dataset [40], which are split into training, testing, and validation sets by the ratio 80%, 10%, and 10%, respectively. All the targets in the dataset are uniformly and randomly located in the sector $S$. During the training process, we utilize the Adam optimizer and set the batch size as 128. In order to compute the forward scattering, we convert (9) into a discrete form by the method of moments (MoM). Then the unknown total electric field $\mathbf{E}_{i}^{t}(\mathbf{r})$ is determined with the stabilized biconjugate gradient fast Fourier transform (BCGS-FFT) technique [41].
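The triangular schedule of $\gamma_{i}$ described above can be generated as follows:

```python
import numpy as np

def gamma_schedule(N=100, g_min=0.001, g_max=0.05):
    """Triangular schedule of the diffusion coefficients used in the
    simulations: gamma rises linearly from g_min at step 0 to g_max at
    step N/2, then falls linearly back to g_min at step N."""
    up = np.linspace(g_min, g_max, N // 2 + 1)
    return np.concatenate([up, up[::-1][1:]])  # gamma_0, ..., gamma_N

g = gamma_schedule()
```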

Figure 4: EM property sensing results versus SNR: (a) target real relative permittivity; (b) reconstructed relative permittivity with SNR = 5 dB; (c) reconstructed relative permittivity with SNR = 30 dB; (d) target real conductivity; (e) reconstructed conductivity with SNR = 5 dB; (f) reconstructed conductivity with SNR = 30 dB. The center of the target is $(15,0,0)$ m. The target is shown in the coordinate system relative to its center. Unit of conductivity is mS/m.

Additionally, we propose the mean Chamfer distance (MCD) between the ground truth and the estimated point clouds as a metric to quantitatively assess the performance of EM property sensing, which is defined as

$\mathrm{MCD}=10\log_{10}\left[\frac{1}{|\mathcal{T}|}\sum_{\mathbf{X}_{\text{data}}\in\mathcal{T}}\left(\frac{1}{M}\sum_{\mathbf{x}\in\mathbf{X}_{\text{data}}}\min_{\mathbf{y}\in\hat{\mathbf{X}}_{\text{data}}}\|\mathbf{x}-\mathbf{y}\|_{2}^{2}+\frac{1}{M}\sum_{\mathbf{y}\in\hat{\mathbf{X}}_{\text{data}}}\min_{\mathbf{x}\in\mathbf{X}_{\text{data}}}\|\mathbf{x}-\mathbf{y}\|_{2}^{2}\right)\right],$ (42)

where $\mathcal{T}$ denotes the test dataset, $|\mathcal{T}|$ denotes the number of samples in the test dataset, and $\hat{\mathbf{X}}_{\text{data}}$ denotes the estimated value of $\mathbf{X}_{\text{data}}$.
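A direct implementation of the metric in (42) for small point clouds might look like the following sketch (brute-force pairwise distances; a large $M$ would call for a KD-tree):

```python
import numpy as np

def chamfer(X, Y):
    """Symmetric Chamfer distance between point clouds X and Y (rows are
    points), i.e. the bracketed term of (42) for one sample."""
    d2 = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

def mcd_db(truth, estimates):
    """MCD of (42): Chamfer distance averaged over the test set, in dB."""
    return 10.0 * np.log10(np.mean([chamfer(X, Y) for X, Y in zip(truth, estimates)]))

X = np.array([[0.0, 0.0], [1.0, 0.0]])
Y = X + np.array([0.1, 0.0])        # estimate shifted by 0.1 along x
val = mcd_db([X], [Y])              # Chamfer = 0.01 + 0.01 = 0.02 -> about -17 dB
```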

Figure 5: MCD of 5D point clouds versus SNR. The center of the target is $(15,0,0)$ m.

V-A Performance of EM Property Sensing

In Scenario 1, a total of $I=256$ pilot symbols are transmitted in each subcarrier to estimate the sensing channel. The signal-to-noise ratio (SNR) at the receiver determines the accuracy of the estimated sensing channel. The estimated channel is then compressed to generate the latent, which is employed to estimate the EM property of the target.

V-A1 Performance versus SNR at the Receiver

To illustrate the EM property sensing results, we present the reconstructed 5D point cloud of the target in Fig. 4. The center of the target is $(15,0,0)$ m, and the target is shown in the coordinate system relative to its center. It is seen from Fig. 4 that the reconstructed point clouds reflect the general shape of the target. The EM property values reconstructed with SNR = 30 dB are much more accurate than those reconstructed with SNR = 5 dB. Moreover, a higher SNR leads to a more precise reconstruction of the target's shape.

We explore the MCD of the reconstructed 5D point clouds versus SNR in Fig. 5. We set the center of the target as $(5,0,0)$ m, $(15,0,0)$ m, or $(25,0,0)$ m, respectively. It is seen from Fig. 5 that the MCD decreases with increasing SNR down to an error floor. The MCD is larger when the target is farther from the BS. The phenomenon can be attributed to the fact that when the target is closer to the BS, the sensing channel benefits from a higher number of effective degrees of freedom (EDoF) [42]. As a result, more diverse spatial features of the estimated sensing channel can be extracted, leading to a more accurate reconstruction of the point clouds. Consequently, the error floor of MCD is significantly lower when the target is near the BS, whereas the error grows with the distance between the target and the BS, reflecting the reduced EDoF and less diverse spatial features at greater distances.

V-A2 Performance versus Location of the Target

Figure 6: EM property sensing results versus location of the target with SNR = 15 dB: (a) target relative permittivity; (b) reconstructed relative permittivity with target at $(25,0,0)$ m; (c) reconstructed relative permittivity with target at $(5,0,0)$ m; (d) target conductivity; (e) reconstructed conductivity with target at $(25,0,0)$ m; (f) reconstructed conductivity with target at $(5,0,0)$ m. The target is shown in the coordinate system relative to its center. Unit of conductivity is mS/m.


Figure 7: MCD of 5D point clouds versus location of the target, with SNR = 30 dB.

To illustrate the EM property sensing results, we present the reconstructed point cloud of the target with SNR = 15 dB in Fig. 6. The target is shown in the coordinate system relative to its center. It is seen from Fig. 6 that the reconstructed 5D point clouds reflect the general shape of the target. The EM property values reconstructed with the target at $(5,0,0)$ m are more accurate than those reconstructed with the target at $(25,0,0)$ m. Moreover, a closer distance results in a more accurately reconstructed shape of the target.

We investigate the MCD of the reconstructed 5D point clouds in relation to the target's location with an SNR of 30 dB, as illustrated in Fig. 7. The figure reveals that the MCD tends to be lower when the target is positioned closer to the BS. The observation can be explained by the fact that a closer target results in a sensing channel with a greater number of EDoF [42], allowing for the extraction of more diverse spatial features from the estimated sensing channel. Additionally, the MCD shows minimal variation with changes in the angle, indicating that the proposed method is capable of effectively sensing the EM property of the target from any direction within the sector $S$.

V-B Performance of Channel Reconstruction

In Scenario 2, the sensing channel is reconstructed given the EM property and the location of the target. We assume the EM property may not be accurate due to measurement errors.

V-B1 Performance versus SNR of EM Property


Figure 8: NMSE of channel reconstruction versus SNR of EM property.

We add Gaussian noise to the 5D point cloud that represents the real EM property of the target and explore the NMSE of channel reconstruction versus the SNR of the EM property. It is seen from Fig. 8 that, as the SNR increases, the NMSE decreases for all target locations. The error floor reaches approximately -34 dB for the target at $(5,0,0)$ m, around -27 dB for the target at $(15,0,0)$ m, and about -20 dB for the target at $(25,0,0)$ m. However, in the low SNR region, there is an abnormal phenomenon where the NMSE for the target at $(25,0,0)$ m is smaller than the NMSE for the targets at $(15,0,0)$ m and $(5,0,0)$ m. This behavior suggests that when the target is farther from the BS, the reconstructed channel becomes more dependent on the target's location and less dependent on its EM property. As a result, the reconstructed channel is less sensitive to the noise affecting the EM property, which leads to lower NMSE values at low SNR levels compared to the closer targets.

V-B2 Performance versus Location of the Target


Figure 9: NMSE of channel reconstruction versus location of the target with accurately known EM property.

We explore the NMSE of channel reconstruction versus the location of the target with accurately known EM property in Fig. 9. The NMSE values range from approximately -60 dB near the BS to about -20 dB at the farthest points in the sector. The trend indicates that the NMSE increases as the distance from the BS grows, while the variation with angle is not significant. Hence, the channel reconstruction generally becomes more reliable when the target is closer to the BS.

VI Conclusion

This paper introduces a cutting-edge ISAC scheme that applies DSB to Bayesian EM property sensing and channel reconstruction within a specific area. The DSB framework facilitates a bidirectional transformation, converting the sensed EM property distribution into a channel distribution and vice versa, while an autoencoder network addresses the dimensionality discrepancy by creating latent representations that maintain crucial spatial features. The latent representations are then used in DSB to progressively generate the EM property of the target. Simulation results highlight the superiority of the DSB framework in reconstructing the target's shape, relative permittivity, and conductivity. Besides, the proposed method is capable of achieving precise channel reconstruction based on the EM property of the target. The method's ability to accurately sense the EM property and reconstruct channels at different locations within the sensing region highlights its adaptability and promise for widespread use in future ISAC systems.
