
A convolutional plane wave model for sound field reconstruction

Manuel Hahmann and Efren Fernandez-Grande ([email protected]), Acoustic Technology Group, Department of Electrical Engineering, Technical University of Denmark, Building 352, Ørsteds Plads, 2800 Kgs. Lyngby, Denmark
Abstract

Spatial sound field interpolation relies on suitable models that both conform to the available measurements and predict the sound field in the domain of interest. A suitable model can be difficult to determine when the spatial domain of interest is large compared to the wavelength, when both spherical and planar wavefronts are present, or when the sound field is complex, as in the near field. To span such complex sound fields, the global reconstruction task can be partitioned into local subdomain problems. Previous studies have shown that partitioning approaches rely on sufficient measurements within each subdomain, due to the higher number of model coefficients. This study proposes a joint analysis of all local subdomains, while enforcing self-similarity between neighbouring partitions. More specifically, the coefficients of local plane wave representations are sought to have spatially smooth magnitudes. A convolutional model of the sound field in terms of plane wave filters is formulated, and the inverse reconstruction problem is solved via the alternating direction method of multipliers. The experiments on simulated and measured sound fields suggest that the proposed method retains the flexibility of local models to conform to complex sound fields while preserving the global structure needed to reconstruct from fewer measurements.

I Introduction

Sound field reconstruction methods enable spatial interpolation of sound fields from a set of discrete measurements. Such spatial characterization of sound fields is key in applications such as sound field analysis, jacobsen2010a ; verburg2018a ; haneda1999a ; mignot2013a ; mignot2014a ; nolan2019a ; witew2017a ; brandao2022a sound field control, heuchel2020a ; caviedes2019a ; heuchel2018a ; betlehem2005a ; moller2019a in simulation software (interpolation from a coarser to a finer grid),borrel2021a and for navigation of a sound field in auralization and spatial audio.tylka2017a ; tylka2020a ; winter2014a ; schultz2013data ; fernandez2021a Often, the sound field at hand is dominated by wavefronts of specific geometry, and a matching propagation model is sufficient to approximate the measurements and interpolate the sound field. Typical approaches use for example plane waves verburg2018a ; mignot2014a ; nolan2019a ; jacobsen2013a ; moiola2011a or spherical harmonics caviedes2019a ; betlehem2005a ; pezzoli2022sparsity (in free or near field).

When considering areas significantly larger than the acoustic wavelength, parts of the sound field can show a strong influence of reflections, scattering, diffraction or varying wavefront curvature.witew2017a ; heuchel2020a ; brandao2022a ; fernandez2021a To model fields across such large domains, typical approaches include wave expansions with a high number of terms, or dividing the global domain into smaller local subdomains. For example, plane waves have been proposed to interpolate between two local spherical harmonic decompositions.tylka2020b Another approach is to analyse the sound field in terms of independent overlapping subdomains, as is common in other disciplines such as image processing.elad2006a In acoustic field analysis, subdomain representations using plane waves (also called ray space analysis) have been explored.markovic2016a ; markovic2013a ; hahmann2021spatial Sparse subdomain representations have been examined for beamforming,jin2017a and sound field reconstruction using plane wavesyu2021upscaling and functions learned from measured sound fields.hahmann2021spatial

Such locally variant representations increase the number of model coefficients to span more complex observations. However, independent local representations of sound fields ignore the continuous nature of wave propagation. Even diffuse fields, which exhibit the shortest possible correlation length, are commonly described as a superposition of infinitely many plane waves with random incidence angles and phases.morse1944a ; schroeder1954a ; pierce1981a It is therefore reasonable to assume spatial similarity between sound field representations in overlapping subdomains.

Convolutional models express a given field in terms of a set of subdomain-size filters, convolved with a spatial coefficient map. The spatial coefficient map preserves the spatial context of subdomains and allows for joint analysis of neighbouring partitions, thereby capturing both the fine structure and large-scale features of the field.papyan2017a ; wohlberg2018a ; bianco2018a Such convolutional approaches often exploit sparsity to find optimal local filter coefficients. They are then known as convolutional or shift-invariant sparse coding in audio and image processing,grosse2007a ; m2008a ; batenkov2017a and parallels to convolutional neural networks exist.papyan2017b Convolutional analysis has previously been proposed for beamforming,cohen2018sparse also in the form of neural networks for spatial sound field interpolationllu2020a and source localization.grumiaux2021a

In this study, we express a monochromatic sound field in terms of a locally variant planar wave model. In a convolutional formulation, we enforce continuity between local representations to exploit observations in neighboring subdomains. In this way, we not only accommodate local phenomena, but also capture the global structure of the sound field (by context between neighbours). Specifically, we enforce spatially smooth coefficients in a joint analysis of all local plane wave representations. In addition to global continuity, we also require sparsity within each local subdomain. Because the spatial frequencies are bandlimited, sparse approximations enable reconstructions even from few observations within each local subdomain.verburg2018a ; gerstoft2018a ; candes2008a

To test the proposed continuous convolutional plane wave model, we reconstruct: in Sec. III.1, a simulated sound field in the near field of a monopole, radially across a linear array; in Sec. III.2, a simulated field of a monopole interfering with a plane wave across a 2D aperture; and in Sec. III.3, an experimentally captured reverberant high-frequency sound field in a classroom across a large 2D aperture. The reconstruction is formulated as an alternating direction method of multipliers (ADMM) problem.boyd2010admm Where applicable, steps are solved in the frequency domain, where the convolution transforms to a multiplication.heide-2015-fast ; wohlberg2016b To enable reconstruction across a limited and sparsely sampled aperture, mask decoupling is applied.wohlberg2016a ; wohlberg2017a As benchmarks, global and independent local plane wave reconstructions are included in the tests.

II Theory

This section explains the sound field modelling and reconstruction approaches included in this study: the conventional linear superposition model in II.1, the partitioning of the reconstruction domain into independent, overlapping subdomains in II.2, the convolutional model to facilitate spatially smooth local coefficients in II.3, its solution via ADMM in II.4, and the assessment of reconstructed sound fields in II.5.

II.1 Global sound field model and reconstruction

The true acoustic pressure $\mathbf{p}\in\mathbb{C}^{N}$ at frequency $f$ and $N$ positions $\mathbf{r}\in\mathbb{R}^{3}$ within a domain $\Omega$ is assumed to be modelled as a linear combination of basis functions

$\mathbf{p}=\mathbf{Hx}\,,$  (1)

where $\mathbf{x}\in\mathbb{C}^{M}$ are the coefficients and $\mathbf{H}\in\mathbb{C}^{N\times M}$ contains the $M$ basis functions. For example, propagating plane waves $e^{-\mathrm{j}\mathbf{k}^{\sf T}\mathbf{r}}$ are often used to model reverberant or far fields, where $\mathbf{k}$ is the wavenumber vector ($\|\mathbf{k}\|_{2}=2\pi/\lambda$, with $\lambda$ the wavelength). In the case of plane waves, the $(n,m)^{\text{th}}$ element of $\mathbf{H}$ is $e^{-\mathrm{j}\mathbf{k}_{m}^{\sf T}\mathbf{r}_{n}}$, with $\mathbf{r}_{n}$ denoting the $n^{\text{th}}$ position and $\mathbf{k}_{m}$ the wavenumber vector from the $m^{\text{th}}$ incidence direction.

For the equation to hold, $\mathbf{H}$ must span the observed sound pressure field $\mathbf{p}$ over the complete domain $\Omega$. An observation of the sound pressure field is

$\mathbf{p}_{\mathrm{obs}}=\mathbf{Mp}+\mathbf{n}=\mathbf{H}_{\mathrm{obs}}\mathbf{x}+\mathbf{n}\,,$  (2)

where $\mathbf{M}\in\{0,1\}^{N_{\mathrm{obs}}\times N}$ is a binary mask selecting the $N_{\mathrm{obs}}$ available observations from the sound field $\mathbf{p}$, $\mathbf{H}_{\mathrm{obs}}=\mathbf{MH}$ is the model at the observed positions, and $\mathbf{n}$ is an error vector that accounts for measurement noise and model error.

An optimal set of coefficients $\hat{\mathbf{x}}$ is found by inversion of Eq. (2). To arrive at a stable solution, regularization is necessary, as $\mathbf{H}$ is typically ill-conditioned or rank-deficient.hansen1998a Typically, a structure in the coefficients is imposed, such as in

$\hat{\mathbf{x}}=\arg\min_{\mathbf{x}}\;\left\lVert\mathbf{H}_{\mathrm{obs}}\mathbf{x}-\mathbf{p}_{\mathrm{obs}}\right\rVert_{2}^{2}+\beta\,\|\mathbf{x}\|_{1}\,,$  (3)

in which case an $\ell_{1}$-norm penalty is applied to promote a sparse structure in the coefficients and $\hat{\cdot}$ denotes an estimate. The sound field is then reconstructed as

$\hat{\mathbf{p}}=\mathbf{H}\hat{\mathbf{x}}\,,$  (4)

where $\hat{\mathbf{p}}\in\mathbb{C}^{N}$ is the reconstructed sound pressure at the $N$ reconstruction positions.
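As an illustration of Eqs. (1)-(4), the following minimal Python sketch builds a two-dimensional plane wave dictionary and estimates sparse coefficients with a basic iterative shrinkage (ISTA) solver. The function names, the number of directions, and the ISTA solver are assumptions for illustration; the supplementary code accompanying the paper implements the actual method.

```python
# Minimal sketch of the global plane wave model (Eq. 1) and the sparse
# coefficient estimate of Eq. (3). Names and the ISTA solver are assumptions.
import numpy as np

def plane_wave_dictionary(r, wavelength, n_directions=21):
    """H[n, m] = exp(-j k_m^T r_n) for directions on a semicircle (2D positions r)."""
    k = 2 * np.pi / wavelength
    angles = np.linspace(0, np.pi, n_directions)
    k_vecs = k * np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (M, 2)
    return np.exp(-1j * r @ k_vecs.T)                                # (N, M)

def sparse_coefficients(H_obs, p_obs, beta=1e-3, n_iter=500):
    """ISTA for  ||H_obs x - p_obs||_2^2 + beta ||x||_1  with complex x."""
    Lf = 2 * np.linalg.norm(H_obs, 2) ** 2           # Lipschitz constant of the gradient
    x = np.zeros(H_obs.shape[1], dtype=complex)
    for _ in range(n_iter):
        g = 2 * H_obs.conj().T @ (H_obs @ x - p_obs)   # gradient of the data term
        z = x - g / Lf
        mag = np.maximum(np.abs(z) - beta / Lf, 0.0)   # complex soft threshold
        x = mag * z / np.maximum(np.abs(z), 1e-12)
    return x

# Reconstruction as in Eq. (4):  p_hat = H @ x_hat
```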

II.2 Local subdomain sound field model

Figure 1: (Color online) Sound field reconstruction via redundant subdomain decomposition in one spatial dimension (x-axis). The true signal $\mathbf{p}$ of length $N$ is partitioned into $S=N-N_{s}+1$ overlapping subdomains. All subdomains are of length $N_{s}$ and collected in $\mathbf{P}$. When the observed noisy signal $\mathbf{p}_{\mathrm{obs}}$ is partitioned (and $N_{\mathrm{obs}}<N$), $\mathbf{P}_{\mathrm{obs}}$ contains unknown values. Reconstruction is applied within each subdomain to estimate $\hat{\mathbf{P}}$. For each position, the corresponding elements in $\hat{\mathbf{P}}$ are averaged to yield the reconstructed sound field $\hat{\mathbf{p}}$.

When reconstructing the sound field over large spatial domains (i.e., much larger than the acoustic wavelength), it is useful to partition the global domain $\Omega$ into smaller, overlapping subdomains $\Omega_{\mathrm{sub}}$. Correspondingly, the sound field $\mathbf{p}$ can be described by a collection of $S$ subdomain sound fields $\mathbf{p}_{s}\in\mathbb{C}^{N_{s}}$

$\mathbf{P}=[\mathbf{R}_{1}\mathbf{p}\cdots\mathbf{R}_{S}\mathbf{p}]=[\mathbf{p}_{1}\cdots\mathbf{p}_{S}]\,,\quad\mathbf{P}\in\mathbb{C}^{N_{s}\times S}.$  (5)

$\mathbf{R}_{s}\in\{0,1\}^{N_{s}\times N}$ is a binary extraction operator that selects the $N_{s}$ positions contained in the $s^{\text{th}}$ subdomain $\Omega_{s}$. The partitioning is illustrated on the left side of Fig. 1. For simplicity, all subdomains within a sound field are considered to have the same extent, such that $N_{s}$ is constant. Specifically, this study considers subdomains with an extent of one wavelength $\lambda$ in each dimension of the aperture. Note that Eq. (5) yields a redundant representation of the sound field if the subdomains overlap ($N_{s}S\geq N$).

A sound field can then be reconstructed within each subdomain of the partitioned observations $\mathbf{P}_{\mathrm{obs}}$, for example by applying the procedure in Sec. II.1 and finding coefficients $\hat{\mathbf{x}}_{s}$ via Eq. (3) to estimate each $\hat{\mathbf{p}}_{s}$. This yields the collection of reconstructed subdomain sound fields [see center right in Fig. 1]

$\hat{\mathbf{P}}=\mathbf{H}_{s}\hat{\mathbf{X}}\,,$  (6)

where $\mathbf{H}_{s}$ contains the model functions at the desired positions within each local subdomain, for example $\mathbf{H}_{s}=\mathbf{R}_{1}\mathbf{H}$, and

$\hat{\mathbf{X}}=[\hat{\mathbf{x}}_{1}\cdots\hat{\mathbf{x}}_{S}]\,,\quad\hat{\mathbf{X}}\in\mathbb{C}^{M\times S}$  (7)

the estimated local coefficients. The reconstructed field $\hat{\mathbf{p}}$ is reassembled from $\hat{\mathbf{P}}$ as the mean of the overlapping subdomain representations [see right side of Fig. 1],

$\hat{\mathbf{p}}=\mathbf{W}\,\sum_{s}\mathbf{R}_{s}^{\sf T}\hat{\mathbf{p}}_{s}\,,$  (8)

where the diagonal matrix $\mathbf{W}=\mathrm{diag}(\sum_{s}\mathbf{R}_{s}^{\sf T}\mathbf{1}_{N_{s}})^{-1}$ normalizes by the spatial overlap of the subdomains and $\mathbf{1}_{N_{s}}$ denotes a vector of $N_{s}$ ones.

Such partitioning approaches counteract model mismatch, which occurs if a global model is suboptimal. For example, in the global sound field model of Eq. (1), the chosen model functions in $\mathbf{H}$ might not span observations of a sound field across a large spatial domain.

Estimating independent coefficients for each subdomain allows for arbitrarily different wave components in each subdomain. Such independent treatment of local representations disregards the similarity or even redundancy between sound fields in nearby or overlapping local subdomains.

The coefficients in the local subdomain sound field model can also be understood as a collection of coefficient maps $\mathbf{x}_{m}\in\mathbb{C}^{S}$, the rows in $\mathbf{X}=[\mathbf{x}^{\sf T}_{1}\cdots\mathbf{x}^{\sf T}_{M}]^{\sf T}$. The $m^{\text{th}}$ row in $\mathbf{X}$ contains a coefficient for the $m^{\text{th}}$ local model function across all $S$ subdomain locations. When plane waves are used, the coefficient map $\mathbf{x}_{m}$ contains the spatial distribution of coefficients for the $m^{\text{th}}$ plane wave across subdomain representations.
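For concreteness, a minimal one-dimensional sketch of the partitioning and overlap-averaging in Eqs. (5)-(8) could look as follows; the helper names are assumptions, and the extraction operators $\mathbf{R}_{s}$ are realized implicitly by slicing.

```python
# Minimal 1D sketch of Eqs. (5)-(8): extract overlapping subdomains and
# reassemble by averaging. Function names are assumptions for illustration.
import numpy as np

def partition(p, Ns):
    """Collect the S = N - Ns + 1 overlapping subdomains as columns of P (Eq. 5)."""
    N = len(p)
    return np.stack([p[s:s + Ns] for s in range(N - Ns + 1)], axis=1)

def reassemble(P_hat, N):
    """Average the overlapping subdomain reconstructions (Eq. 8)."""
    Ns, S = P_hat.shape
    acc = np.zeros(N, dtype=P_hat.dtype)      # running sum of R_s^T p_s
    count = np.zeros(N)                       # overlap counter, i.e. diag(W)^-1
    for s in range(S):
        acc[s:s + Ns] += P_hat[:, s]
        count[s:s + Ns] += 1
    return acc / count
```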

II.3 Convolutional sound field model

This study explores how the spatial variations across $\mathbf{x}_{m}$, the rows of $\mathbf{X}$, can be taken into account for reconstruction, such that the global structure (or topology) of the sound field is preserved. For the remainder of the paper, we consider reconstruction positions on a regular grid (which contains the measured positions as a subset). Further, we assume subdomains of equal size and with full overlap, such that $S=N$, i.e., the number of subdomains equals the number of positions in $\mathbf{p}$ (circular boundary conditions).

The true sound field across an $L$-dimensional aperture can be rewritten as a sum of $M$ convolutions:

$\mathbf{p}=\sum_{m=1}^{M}\,\mathbf{h}_{m}\circledast\overset{L}{\cdots}\circledast\mathbf{x}_{m}\,,$  (9)

where $\mathbf{h}_{m}\in\mathbb{C}^{N_{s}}$ describes the $m^{\text{th}}$ local filter (the $m^{\text{th}}$ column of $\mathbf{H}_{s}$), for example a plane wave. $\circledast\overset{L}{\cdots}\circledast$ denotes a circular convolution along $L\in\{1,2,3\}$ spatial dimensions. For example, for $L=1$, the $n^{\text{th}}$ element of $\mathbf{p}$ is

$\mathbf{p}(n)=\sum_{m}(\mathbf{h}_{m}\circledast\mathbf{x}_{m})(n)=\sum_{m}\left(\sum_{k=0}^{N_{s}}\mathbf{h}_{m}(k)\,\mathbf{x}_{m}\big((N+n-k)\,\%\,N\big)\right)\,,$

where $n=1,\ldots,N$ and $\%$ is the modulo operation.

Compared to the collection of local sound fields in Sec. II.2, this convolutional model reflects the spatial relations of coefficients and enables a joint analysis of all local representations in the global field. For example, each subdomain representation can take the coefficients in nearby subdomains into account. We propose to estimate the coefficients of Eq. (9) as

$\hat{\mathbf{X}}=\arg\min_{\mathbf{X}}\;\frac{1}{2}\left\lVert\mathbf{M}\left(\sum_{m}\mathbf{h}_{m}\circledast\overset{L}{\cdots}\circledast\mathbf{x}_{m}\right)-\mathbf{p}_{\mathrm{obs}}\right\rVert_{2}^{2}+\frac{\mu}{2}\sum_{m}\sum_{l}\left\lVert\Delta_{l}\mathbf{x}_{m}\right\rVert_{2}^{2}+\beta\sum_{m}\left\lVert\mathbf{x}_{m}\right\rVert_{1}\,.$  (10)

The $\ell_{1}$ penalty promotes sparse coefficients, notably applied to the global coefficient vector. A penalty on the spatial differences of the coefficients promotes smooth coefficient maps $\hat{\mathbf{x}}_{m}$, weighted by the regularization parameter $\mu$. Specifically, $\Delta_{l}\mathbf{x}_{m}$ are the first-order finite differences of the $m^{\text{th}}$ coefficient map along the $l^{\text{th}}$ dimension. For example, when the considered aperture (and therefore $\mathbf{x}_{m}$) is one-dimensional, $\Delta\mathbf{x}_{m}=[x_{m1}-x_{m2},\,x_{m2}-x_{m3},\,\cdots,\,x_{mS}-x_{m1}]^{\sf T}$. Smooth coefficient maps $\hat{\mathbf{x}}_{m}$ enforce similarity between nearby and overlapping representations, which seems particularly suitable for sound fields.
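The forward model in Eq. (9) can be evaluated efficiently by transforming the circular convolutions into products in the spatial frequency domain, which is also how the solver in Sec. II.4 operates. The following is a minimal sketch for $L=1$ with zero-padded local filters; the function name and array shapes are assumptions for illustration.

```python
# Minimal sketch of the convolutional synthesis in Eq. (9) for L = 1.
# H_s: (Ns, M) local plane wave filters, X: (M, N) coefficient maps; names assumed.
import numpy as np

def conv_synthesis(H_s, X):
    """p = sum_m h_m (circularly convolved with) x_m, evaluated via the FFT."""
    Ns, M = H_s.shape
    _, N = X.shape
    H_pad = np.zeros((N, M), dtype=complex)
    H_pad[:Ns, :] = H_s                          # zero-pad the local filters to length N
    D_tilde = np.fft.fft(H_pad, axis=0)          # filters in the spatial frequency domain
    X_tilde = np.fft.fft(X, axis=1)              # coefficient maps, transformed
    p_tilde = np.sum(D_tilde * X_tilde.T, axis=1)  # multiplication replaces convolution
    return np.fft.ifft(p_tilde)
```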

II.4 Convolutional reconstruction via ADMM

To reconstruct a sound field via the convolutional model, we solve Eq. (10) with the alternating direction method of multipliers (ADMM)boyd2010admm and rewrite it in matrix form as

$\hat{\mathbf{x}}=\arg\min_{\mathbf{x}}\;\frac{1}{2}\left\lVert\mathbf{M}\mathbf{D}_{*}\mathbf{x}-\mathbf{p}_{\mathrm{obs}}\right\rVert_{2}^{2}+\frac{\mu}{2}\sum_{m}\sum_{l}\left\lVert\mathbf{G}_{*l}\mathbf{x}_{m}\right\rVert_{2}^{2}+\beta\left\lVert\mathbf{x}\right\rVert_{1}\,,$  (11)

where $\mathbf{x}\in\mathbb{C}^{MN}$ are the stacked columns of $\mathbf{X}$ and $\mathbf{D}_{*}\in\mathbb{C}^{N\times MN}$ is the convolutional dictionary matrix such that $\mathbf{D}_{*}\mathbf{x}=\sum_{m}\mathbf{h}_{m}\circledast\overset{L}{\cdots}\circledast\mathbf{x}_{m}$ (e.g., for $L=1$, $\mathbf{D}_{*}$ is block-circulant with block $\mathbf{H}_{s}$). $\mathbf{G}_{*l}\mathbf{x}_{m}$ calculates the first-order finite differences of the $m^{\text{th}}$ coefficient map along the $l^{\text{th}}$ dimension. In the one-dimensional case, $\mathbf{G}_{*}$ is a circulant matrix with first row $[1,-1,\mathbf{0}_{1\times S-2}]$, where $\mathbf{0}_{1\times S-2}$ is a row vector of $S-2$ zeros.
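In the one-dimensional case, applying $\mathbf{G}_{*}$ to a coefficient map thus reduces to a circular first-order difference; a minimal sketch with an assumed function name:

```python
# Circular first-order finite differences of a 1D coefficient map,
# matching the circulant G with first row [1, -1, 0, ..., 0].
import numpy as np

def circ_diff(x_m):
    """Delta x_m = [x_1 - x_2, x_2 - x_3, ..., x_S - x_1]."""
    return x_m - np.roll(x_m, -1)
```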

To solve Eq. (11), we split the variables and reformulate the joint problem wohlberg2016a ; wohlberg2017a ; wohlberg2018a as

$\underset{\mathbf{x},\mathbf{y}_{0},\mathbf{y}_{1}}{\text{minimize}}\quad\frac{1}{2}\left\lVert\mathbf{M}\mathbf{y}_{0}-\mathbf{p}_{\mathrm{obs}}\right\rVert_{2}^{2}+\frac{\mu}{2}\sum_{m}\sum_{l}\left\lVert\mathbf{G}_{*l}\mathbf{x}\right\rVert_{2}^{2}+\beta\left\lVert\mathbf{y}_{1}\right\rVert_{1}$  (12)

$\text{subject to}\quad\mathbf{Ax}-\mathbf{y}=0\,,\quad\text{where}\quad\mathbf{A}=\begin{bmatrix}\mathbf{D}_{*}\\ \mathbf{I}\end{bmatrix}\quad\text{and}\quad\mathbf{y}=\begin{bmatrix}\mathbf{y}_{0}\\ \mathbf{y}_{1}\end{bmatrix}.$

The ADMM steps in the $k^{\text{th}}$ iteration are

$\mathbf{x}^{k+1}=\arg\min_{\mathbf{x}}\;\frac{\mu}{2}\sum_{m}\sum_{l}\left\lVert\mathbf{G}_{*l}\mathbf{x}_{m}\right\rVert_{2}^{2}+\frac{\rho}{2}\left\lVert\mathbf{A}\mathbf{x}-\mathbf{y}^{k}+\mathbf{u}^{k}\right\rVert_{2}^{2}$  (13)

$\mathbf{y}^{k+1}=\arg\min_{\mathbf{y}}\;\frac{1}{2}\left\lVert\mathbf{M}\mathbf{y}_{0}-\mathbf{p}_{\mathrm{obs}}\right\rVert_{2}^{2}+\beta\left\lVert\mathbf{y}_{1}\right\rVert_{1}+\frac{\rho}{2}\left\lVert\mathbf{A}\mathbf{x}^{k+1}-\mathbf{y}+\mathbf{u}^{k}\right\rVert_{2}^{2}$  (14)

$\mathbf{u}^{k+1}=\mathbf{u}^{k}+\mathbf{A}\mathbf{x}^{k+1}-\mathbf{y}^{k+1}\,,$  (15)

where $\mathbf{u}=[\mathbf{u}_{0},\mathbf{u}_{1}]^{\sf T}$ is the dual variable. The superscript $k$ indicates the state before the $k^{\text{th}}$ iteration (omitted in the following for readability). In the spatial frequency domain, the convolutional matrices reduce to their transformed filters and Eq. (13) becomes heide2015a

$\left(\mu\tilde{\mathbf{G}}^{\sf H}\tilde{\mathbf{G}}+\rho\tilde{\mathbf{D}}^{\sf H}\tilde{\mathbf{D}}\right)\tilde{\mathbf{x}}^{k+1}=\tilde{\mathbf{D}}^{\sf H}\left(\tilde{\mathbf{y}}_{0}-\tilde{\mathbf{u}}_{0}\right)+\left(\tilde{\mathbf{y}}_{1}-\tilde{\mathbf{u}}_{1}\right)\,,$  (16)

which can be efficiently solved via the Sherman-Morrison formula wohlberg-2014-efficient ; wohlberg2016b and where $\tilde{\cdot}$ indicates frequency domain quantities. Also, $\tilde{\mathbf{G}}^{\sf H}\tilde{\mathbf{G}}=\sum_{l}\tilde{\mathbf{G}}_{l}^{\sf H}\tilde{\mathbf{G}}_{l}$, where $\tilde{\mathbf{G}}_{l}$ is the frequency-transformed finite difference matrix in the $l^{\text{th}}$ dimension.
To align the dimensions of $\tilde{\mathbf{D}}$ to the sound field $\tilde{\mathbf{p}}$, zero-padding is typically applied to the local filters $\mathbf{h}_{m}$ (corresponding to $[\mathbf{H}_{s}^{\sf T},\mathbf{0}_{M\times N-N_{s}}]^{\sf T}$, i.e., the first $M$ columns of $\mathbf{D}_{*}$). Instead, we obtain $\tilde{\mathbf{D}}$ from plane wave functions evaluated over the complete sound field (i.e., the global plane wave expansion $\mathbf{H}$).
Equations (14) and (15) are separable, such that

$\mathbf{y}_{0}^{k+1}=\arg\min_{\mathbf{y}_{0}}\;\frac{1}{2}\left\lVert\mathbf{M}\mathbf{y}_{0}-\mathbf{p}_{\mathrm{obs}}\right\rVert_{2}^{2}+\frac{\rho}{2}\left\lVert\mathbf{y}_{0}-\left(\mathbf{D}_{*}\mathbf{x}^{k+1}+\mathbf{u}_{0}^{k}\right)\right\rVert_{2}^{2}$  (17)

$\mathbf{y}_{1}^{k+1}=\arg\min_{\mathbf{y}_{1}}\;\beta\left\lVert\mathbf{y}_{1}\right\rVert_{1}+\frac{\rho}{2}\left\lVert\mathbf{y}_{1}-\left(\mathbf{x}^{k+1}+\mathbf{u}_{1}^{k}\right)\right\rVert_{2}^{2}$  (18)

$\mathbf{u}_{0}^{k+1}=\mathbf{u}_{0}^{k}+\mathbf{D}_{*}\mathbf{x}^{k+1}-\mathbf{y}_{0}^{k+1}$  (19)

$\mathbf{u}_{1}^{k+1}=\mathbf{u}_{1}^{k}+\mathbf{x}^{k+1}-\mathbf{y}_{1}^{k+1}\,,$  (20)

where the $\mathbf{y}_{0}$ update, Eq. (17), has an efficient closed-form solution in the frequency domain

$\tilde{\mathbf{y}}_{0}^{k+1}=\left(\tilde{\mathbf{M}}^{\sf H}\tilde{\mathbf{M}}+\rho\mathbf{I}\right)^{-1}\left(\tilde{\mathbf{M}}\tilde{\mathbf{p}}_{\mathrm{obs}}+\rho\left(\tilde{\mathbf{D}}\tilde{\mathbf{x}}^{k+1}+\tilde{\mathbf{u}}_{0}^{k}\right)\right)\,.$  (21)
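Because $\mathbf{M}^{\sf T}\mathbf{M}$ is diagonal with ones at the observed positions and zeros elsewhere, the same update can also be carried out elementwise in the spatial domain. The following is a minimal sketch of that variant (the spatial-domain form and the names are assumptions; the implementation described here solves Eq. (21) in the frequency domain):

```python
# Minimal spatial-domain sketch of the masked y0 update in Eq. (17):
# y0 = (M^T M + rho I)^{-1} (M^T p_obs + rho (D x + u0)), elementwise since
# M^T M is diagonal. Names are assumptions for illustration.
import numpy as np

def update_y0(obs_mask, p_obs_scattered, Dx, u0, rho):
    """obs_mask: 0/1 array of length N; p_obs_scattered: observations placed at
    their positions in a length-N vector (zeros elsewhere)."""
    return (p_obs_scattered + rho * (Dx + u0)) / (obs_mask + rho)
```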

$\mathbf{y}_{1}$ is updated by soft-thresholding, separable along the elements of $\mathbf{y}_{1}$,

$\mathbf{y}_{1}^{k+1}=\mathcal{S}_{\beta/\rho}\left(\mathbf{x}^{k+1}+\mathbf{u}_{1}^{k}\right)\,,$  (22)

where $\mathcal{S}$ is the shrinkage operator

$\mathcal{S}_{\alpha}(\mathbf{z})=\mathrm{sign}(\mathbf{z})\odot\max(0,|\mathbf{z}|-\alpha)\,.$  (23)
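For complex-valued coefficients, the shrinkage of Eq. (23) reduces the magnitude and keeps the phase; a minimal sketch with an assumed function name:

```python
# Complex soft-thresholding operator of Eq. (23), used for the y1 update (Eq. 22).
import numpy as np

def shrink(z, alpha):
    """S_alpha(z): shrink magnitudes by alpha, keep the complex phase."""
    mag = np.maximum(np.abs(z) - alpha, 0.0)
    return mag * z / np.maximum(np.abs(z), np.finfo(float).tiny)

# y1 update of Eq. (22):  y1_next = shrink(x_next + u1, beta / rho)
```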

II.5 Assessment of reconstructed sound fields

To assess a reconstructed pressure field $\hat{\mathbf{p}}$, it is compared to the true field $\mathbf{p}$ in terms of the normalized mean square error (NMSE) and the spatial similarity $C$. The NMSE is

$\text{NMSE}=20\log_{10}\left(\frac{\|\hat{\mathbf{p}}-\mathbf{p}\|_{2}}{\|\mathbf{p}\|_{2}}\right)\,.$  (24)

The spatial similarity $C$ is assessed as

$C=\frac{\left|\hat{\mathbf{p}}^{\sf H}\mathbf{p}\right|^{2}}{\left(\hat{\mathbf{p}}^{\sf H}\hat{\mathbf{p}}\right)\left(\mathbf{p}^{\sf H}\mathbf{p}\right)}\,,$  (25)

such that $C=0$ indicates no similarity and $C=1$ means the fields are indistinguishable.
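Both metrics translate directly into code; a minimal sketch with assumed function names:

```python
# Reconstruction metrics of Eqs. (24) and (25).
import numpy as np

def nmse_db(p_hat, p):
    """Normalized mean square error in dB (Eq. 24)."""
    return 20 * np.log10(np.linalg.norm(p_hat - p) / np.linalg.norm(p))

def spatial_similarity(p_hat, p):
    """Normalized magnitude-squared correlation C in [0, 1] (Eq. 25)."""
    return np.abs(np.vdot(p_hat, p)) ** 2 / (np.vdot(p_hat, p_hat).real * np.vdot(p, p).real)
```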

III Results

To demonstrate the proposed approach, we reconstruct simulated and measured sound fields. The proposed method is implemented with the help of the SPORCO Python package wohlberg-2017-sporco and is included as supplementary material.

III.1 Reconstruction along the radial distance from a monopole

Figure 2: (Color online) Reconstruction of the sound field from a monopole (simulation), along a linear array in the radial dimension, placed at a distance of $0.5\,\lambda$. From top to bottom: normalized pressure and decimated measurements; reconstructions via the convolutional model, independent local reconstructions and global plane waves; reconstruction errors. Shown is the real part of the pressure in (a) and (b) and the absolute error in (c), normalized to the maximum pressure within the domain, arbitrary units.

The first experiment reconstructs the sound field generated by a monopole placed at the end of a linear microphone array [see Fig. 2(a)]. The sound pressure is simulated radially across ten wavelengths ($\lambda$), spanning the near field and the far field of the monopole. The linear array consists of 31 microphones at radial distances from $0.5\,\lambda$ to $10.5\,\lambda$ with a spacing of $\lambda/3$. Three reconstruction methods are compared:

i) global plane waves with least squares (no regularization in Eq. (3));

ii) local independent plane waves using compressive sensing, i.e., finding sparse representations via Eq. (3) for each local partition separately, then overlapping and averaging;

iii) convolutional sparse plane waves with smooth coefficients via Eq. (10).

All three reconstruction methods use the same set of plane waves, with wavenumber $\|\mathbf{k}\|_{2}=2\pi/\lambda$ and 21 incidence angles equally spaced along a semicircle $[0,\pi]$. This experiment interpolates from observations with a resolution of $\lambda/3$ to a grid spacing of $\lambda/24$. The aperture of length $10\lambda$ contains $N=241$ reconstruction positions. The local approaches use subdomains of size one wavelength. In this example, each subdomain contains $N_{s}=25$ reconstruction points and at most 4 observations. Both sides of the domain are padded with $N_{s}-1$ zeros to reduce artifacts from circular wrapping. The zero-padded domain has size $N^{\prime}=N+2(N_{s}-1)=289$ and is partitioned into $S=N^{\prime}$ fully overlapping subdomains. After reconstruction, the sound field is cropped again to the original size $N$. For comparison, the local approaches determine $MS=21\times 289$ coefficients compared to $M=21$ in the global model.

The reconstructions $\hat{\mathbf{p}}$ are shown in Fig. 2(b) and the error $\hat{\mathbf{p}}-\mathbf{p}$ in Fig. 2(c). All methods yield good reconstructions. The error is lowest for the smooth convolutional model, where smooth coefficients (plane wave magnitude and direction) among neighbouring representations are enforced. Methods using locally variant plane wave coefficients are flexible enough to approximate all measurements (the error is zero at the measurement positions). Still, they rely on sufficient measurements within a local partition to yield good predictions. The global plane wave model cannot conform to all measurements across the array, because of the mismatch between the radial decay of the sound field and propagating plane waves.

Figure 3: (Color online) NMSE for reconstructions across a linear microphone array for varying distance between the monopole and array (see Fig. 2).

The experiment is repeated for varying distances between the monopole source and the microphone array. The normalized mean square error, Eq. (24), is shown in Fig. 3. When the array is close to the monopole, local representations improve the reconstructions due to the field's high curvature and strong decay with distance. The proposed smooth convolutional approach gives the most accurate reconstructions up to a distance of $0.7\,\lambda$. The error decreases with distance for all methods, due to the less pronounced magnitude decay in the field. The further the array is placed into the far field, the more the true field approaches planar characteristics, such that the global model also fits the observations well and yields the best predictions. Note that in an application scenario, the reconstruction quality depends on many factors, such as the aperture size, the curvature of the wavefronts within the aperture, the number and distribution of available measurements and, not least, the evaluation criteria.

III.2 2D reconstruction: monopole and plane wave

Figure 4: (Color online) Reconstructions of the sound pressure generated by a monopole interfering with a plane wave. Reconstructions from 100 (top row) and 300 microphones (bottom row). Left to right: measurements, reconstructions using global plane waves, independent local sparse representations, joint analysis with global sparsity, joint analysis with global sparsity and continuity, true reference. The aligned color scale is in dB SPL.

The approach is tested for a two-dimensional aperture of size $(5\,\lambda)^{2}$ in the plane $z=0$. The sound field is generated by the interference of a monopole at $(0,0,\lambda/8)$ with a plane wave propagating in the direction $\mathbf{k}/k=(k_{x},k_{y},k_{z})/k=(0.38,-0.76,0.52)$. Four plane wave reconstructions are tested:

i) global plane waves with ridge regression ($\ell_{2}$-norm regularization in Eq. (3), parameter $\beta$ via leave-one-out cross-validation);

ii) local independent, sparse plane waves (solving Eq. (3) via least-angle regression, regularization via leave-one-out cross-validation);

iii) convolutional sparse plane waves without smooth coefficients ($\mu=0$ in Eq. (10));

iv) convolutional sparse plane waves with smooth coefficients via Eq. (10).

All models use plane waves with propagation angles distributed in a Fibonacci grid ($k_{z}\geq 0$ hemisphere). The global method i) uses $M=1000$ propagation angles, the local methods ii)-iv) $M=100$ angles. The reconstruction grid is regular with a spacing of $d=\lambda/10$ between positions ($N=51^{2}$). The local subdomains and filters in methods ii)-iv) have size $\lambda^{2}$ (i.e., $N_{s}=(\lfloor\lambda/d\rfloor+1)^{2}=11^{2}$ discrete positions). For methods iii) and iv), the domain is zero-padded (by $\sqrt{N_{s}}-1$ in each direction) to avoid artifacts from circular convolutions ($N^{\prime}=(\sqrt{N}+2(\sqrt{N_{s}}-1))^{2}=3481$). The regularization parameters are tuned to $\beta=1\times 10^{-5}$, $\mu=1\times 10^{-3}$ ($0$ for iii)), $\rho=1\times 10^{-5}$, and the ADMM is stopped after 500 iterations. Note that this demonstration case exhibits a high signal-to-noise ratio. In less favourable conditions, the regularization would likely need to be adjusted.

The results are shown in Fig. 4. For reconstructions from 100 microphones, the global and the proposed approach with smooth local coefficients capture the spatial phenomena better than the two other approaches ii) and iii), which only rely on local and global sparsity. The spatially invariant (global) and slowly varying (proposed) models prescribe the necessary spatial structure to reconstruct from few measurements. Both methods exploit measurements across a larger spatial range and capture the global structure of the sound field, which is critical in sparsely sampled scenarios [see Fig. 4(b) and (e) vs. (c)-(d)]. When more microphones are available, spatially variant (local) models benefit from their flexibility to model complex sound fields. The global approach exhibits artefacts around the monopole due to the strong decay and curvature of the wavefronts in this region. The proposed approach balances local flexibility with global structure and yields good reconstructions in both cases. Note that method iii) and the proposed method iv) apply the same shrinkage threshold $\beta/\rho$ to all local coefficients [see Eq. (22)]. Dynamic regularization could further improve the results, as observed for the independent local approach ii) (which uses cross-validation to find an optimal $\beta$ for each subdomain).

Figure 5: (Color online) Particle velocity and intensity fields for the test using 300 microphones (second row of Fig. 4) to reconstruct the sound field of a monopole and an interfering plane wave (see Sec. III.2). From top to bottom: true field, reconstructions using a global, local independent, and convolutional smooth model. Vector fields show the particle velocity $\Re\{\mathbf{u}_{xy}\}$ and intensity $\Re\{\mathbf{p}\odot\mathbf{u}_{xy}^{*}\}$. For readability, vector norms are clipped, $|\mathbf{u}|$ to $150\,\mu\mathrm{m\,s}^{-1}$ and $|\mathbf{I}|$ to $2\,\mu\mathrm{W\,m}^{-2}$, and only every fourth intensity vector is shown. Color indicates particle velocity and intensity levels in dB, where $u_{\mathrm{ref}}=50$ nm s$^{-1}$ and $I_{\mathrm{ref}}=1$ pW m$^{-2}$, and bilinear interpolation is used for readability.

The particle velocity and sound intensity xy-vector fields of the reconstructions from 300 microphones (second row of Fig. 4) are shown in Fig. 5. Global representations cannot conform to the drastic spatial variations close to the monopole; all particle velocity vectors point outwards from $(x,y)=(0,0)$. Local approaches recover the fine structure of the particle velocity and intensity, also around the monopole.

III.3 Experimental reconstruction with real data: classroom measurement

Figure 6: (Color online) Furnished classroom (room 019 in DTU building 352, Lyngby, Denmark) with absorbing ceiling and wooden floor (left, picture from Ref. hahmann2021spatial, ) and robotic arm (right).
Figure 7: (Color online) Reconstructions of the sound pressure field in a classroom, from 98 (top row) and 295 microphones (bottom row). Left to right: measurements, reconstructions using global plane waves, independent local sparse representations, joint analysis with global sparsity, joint analysis with global sparsity and continuity, true reference. The aligned color scale is in dB, relative to the spatial mean of the squared true pressure field, $\langle\mathbf{p}_{\mathrm{true}}^{2}\rangle$.

The same methods i)-iv) from Sec. III.2 (with the same parameters) are used to reconstruct the enclosed reverberant sound field in a classroom (DTU building 352, Lyngby, Denmark), shown in Fig. 6. The room dimensions are $(l_{x},l_{y},l_{z})=(6.63,9.45,2.97)$ m, the reverberation time approximately $T_{60}=0.5$ s, and the Schröder frequency $f_{S}\approx 240$ Hz. The room is furnished and its walls are somewhat irregular, with a wooden floor, scattering elements on the walls and an absorbing ceiling. A loudspeaker (BM6, Dynaudio, Skanderborg, Denmark) placed in a room corner was used to excite the room with 10 s logarithmic sweeps from 20 Hz to 20 kHz. A total of 4761 frequency responses were measured using a robotic arm (UR5, Universal Robots, Odense, Denmark) with a 1/2 inch free-field condenser microphone (Brüel & Kjær, Nærum, Denmark). The positions are distributed over a $1.7\times 1.7$ m$^{2}$ planar aperture with $N=69^{2}$ positions on a regular grid with 2.5 cm spacing. We refer the reader to Ref. hahmann2021spatial, for more information on the room and measurements and to Ref. hahmann2021b, for the dataset.

The sound field in the classroom is reconstructed at 1000 Hz from 98 and 295 measurements, distributed with uniform probability (and a minimum distance of 7 cm) across the aperture. Reconstructions using methods i)-iv) and the measured reference ("true") of the classroom sound field are shown in Fig. 7 for the two cases with $N_{\mathrm{obs}}=98$ and 295. To reconstruct from few measurements, it is necessary to capture the global structure of the sound field, as in the global and the proposed approach [Fig. 7(b) and (d)]. Models based on local representations conform more easily to many measurements due to their higher number of coefficients [Fig. 7(i-k)]. Specifically, in this test, subdomains of size $\lambda^{2}$ contain $N_{s}=24^{2}$ discrete reconstruction positions, such that the local models use a total of ii) $MN=476100$ and, with padding, iii)+iv) $MN^{\prime}=M(\sqrt{N}+2(\sqrt{N_{s}}-1))^{2}=1322500$ coefficients, compared to 1000 in the global model. The proposed approach combines both local flexibility and joint global analysis. As a consequence, it yields the highest similarity and the lowest reconstruction errors when compared to the measured field.

The benefit of smooth coefficients becomes apparent when comparing the two convolutional approaches. Both seek a sparse approximation of the measurements, but spatial continuity is required to reconstruct sound fields successfully via local representations. The local independent approach also yields smooth reconstructions, namely by the averaging of overlapping partitions. However, a joint analysis of nearby representations is needed to align nearby coefficients and hence indirectly exploit nearby measurements. In this study, the sparsity constraint enables feasible reconstructions even when only few measurements are available within a local subdomain and the inverse problem is severely underdetermined. As such, the goal is not to represent the sound field using the fewest number of coefficients; rather, local sparsity is a means of exploiting the available measurements.

Figure 8: (Color online) Reconstruction error (NMSE) of the sound field in a classroom across frequency from (a) 80, (b) 160 and (c) 320 microphones. Shown are the NMSE mean $\pm$ one standard deviation, obtained from 12 reconstructions from pseudo-randomly distributed measurements. The vertical line indicates the frequency where the average distance between measurements is $\overline{d_{m}}=\lambda/2$.

The experiment in Fig. 7 is extended to other frequencies, reconstructing the sound field in the classroom from 500 Hz to 2 kHz using a fixed number of microphones. The NMSE results in Fig. 8 show that the proposed approach yields good reconstructions when the average distance between measurements is smaller than $\lambda/2$. The proposed approach yields a significant improvement, with errors close to [see Fig. 8(a)] or lower than those of global plane waves [for sufficient measurements, see Fig. 8(b,c)].

IV Conclusion

This study formulates a sound field model as a spatial convolution between a global coefficient map and local plane wave filters. This model leads to a joint analysis of all local representations, while keeping their spatial relation (and thereby the global structure of the field) intact. By penalizing the spatial differences of the plane wave coefficients, continuity between neighboring representations is enforced in terms of the amplitude and direction of the plane waves. In this way, each local representation has to be consistent with its neighbours and can therefore utilise nearby observations. The experiments indicate that the proposed approach both conforms to complex spatial sound fields and preserves the global structure of the sound field. Compared to other local models using locally sparse coding in terms of plane waves, the proposed approach attains better reconstructions of sound fields when few measurements are available. When measurements are very sparsely distributed, an expansion of the entire global field in terms of plane waves yields the best reconstructions. However, when sufficient measurements are available, the experiments indicate that local representation models conform best to fields of higher complexity. This is shown for the reconstruction of the sound pressure, as well as for the reconstruction of the particle velocity and sound intensity vector fields, where the improvements are even more substantial.

Acknowledgements.
This work is funded by VILLUM Fonden through VILLUM Young Investigator grant number 19179 for the project ‘Large-scale Acoustic Holography’.

References

  • (1) F. Jacobsen and E. Tiana Roig, “Measurement of the sound power incident on the walls of a reverberation room with near field acoustic holography,” Acta Acustica united with Acustica 96(1), 76–81 (2010).
  • (2) S. A. Verburg and E. Fernandez-Grande, “Reconstruction of the sound field in a room using compressive sensing,” J. Acoust. Soc. Am. 143(6), 3770–3779 (2018).
  • (3) Y. Haneda, Y. Kaneda, and N. Kitawaki, “Common-acoustical-pole and residue model and its application to spatial interpolation and extrapolation of a room transfer function,” IEEE Trans. Sp. Audio Proc. 7(6), 709–717 (1999).
  • (4) R. Mignot, L. Daudet, and F. Ollivier, “Room reverberation reconstruction: Interpolation of the early part using compressed sensing,” IEEE/ACM Trans. Audio, Speech, Lang. Process. 21(11), 6562745, 2301–2312 (2013).
  • (5) R. Mignot, G. Chardon, and L. Daudet, “Low frequency interpolation of room impulse responses using compressed sensing,” IEEE/ACM Trans. Audio, Speech, Lang. Process. 22(1), 205–216 (2014).
  • (6) M. Nolan, S. A. Verburg, J. Brunskog, and E. Fernandez-Grande, “Experimental characterization of the sound field in a reverberation room,” J. Acoust. Soc. Am. 145(4), 2237–2246 (2019).
  • (7) I. B. Witew, M. Vorländer, and N. Xiang, “Sampling the sound field in auditoria using large natural-scale array measurements,” J. Acoust. Soc. Am. 141(3), EL300–EL306 (2017).
  • (8) E. Brandão and E. Fernandez-Grande, “Analysis of the sound field above finite absorbers in the wave-number domain,” J. Acoust. Soc. Am. 151(5), 3019–3030 (2022).
  • (9) F. M. Heuchel, D. Caviedes-Nozal, J. Brunskog, and F. T. Agerkvist, “Large-scale outdoor sound field control,” J. Acoust. Soc. Am. 148(4), 2392–2402 (2020).
  • (10) D. Caviedes-Nozal, F. M. Heuchel, J. Brunskog, N. A. B. Riis, and E. Fernandez-Grande, “A bayesian spherical harmonics source radiation model for sound field control,” J. Acoust. Soc. Am. 146(5), 3425–3435 (2019).
  • (11) F. M. Heuchel, E. Fernandez-Grande, F. T. Agerkvist, and E. Shabalina, “Active room compensation for sound reinforcement using sound field separation techniques,” J. Acoust. Soc. Am. 143(3), 1346–1354 (2018).
  • (12) T. Betlehem and T. D. Abhayapala, “Theory and design of sound field reproduction in reverberant rooms,” J. Acoust. Soc. Am. 117(4), 2100–2111 (2005).
  • (13) M. B. Møller, J. K. Nielsen, E. Fernandez-Grande, and S. K. Olesen, “On the influence of transfer function noise on sound zone control in a room,” IEEE/ACM Trans. Audio, Speech, Lang. Process. 27(9), 1405–1418 (2019).
  • (14) N. Borrel-Jensen, A. P. Engsig-Karup, and C.-H. Jeong, “Physics-informed neural networks for one-dimensional sound field predictions with parameterized sources and impedance boundaries,” JASA Express Letters 1(12), 122402 (2021).
  • (15) J. G. Tylka and E. Y. Choueiri, “Evaluation of techniques for navigation of higher-order ambisonics,” J. Acoust. Soc. Am. 141(5), 3511–3511 (2017).
  • (16) J. G. Tylka and E. Y. Choueiri, “Fundamentals of a parametric method for virtual navigation within an array of ambisonics microphones,” J. Audio Eng. Soc. 68(3), 120–137 (2020).
  • (17) F. Winter, F. Schultz, and S. Spors, “Localization properties of data-based binaural synthesis including translatory head-movements,” Proceedings of Forum Acusticum 2014- (2014).
  • (18) F. Schultz and S. Spors, “Data-based binaural synthesis including rotational and translatory head-movements,” in Audio Eng. Soc. Conf.: Sound Field Control - Eng. and Percep. (2013).
  • (19) E. Fernandez-Grande, D. Caviedes-Nozal, M. Hahmann, X. Karakonstantis, and S. A. Verburg, “Reconstruction of room impulse responses over extended domains for navigable sound field reproduction,” in Proceed. Int. Conf. Immers. 3D Audio, IEEE (2021), p. 8 pp.
  • (20) F. Jacobsen and P. M. Juhl, Fundamentals of general linear acoustics (Wiley, London, 2013).
  • (21) A. Moiola, R. Hiptmair, and I. Perugia, “Plane wave approximation of homogeneous helmholtz solutions,” Zeitschrift Fur Angewandte Mathematik Und Physik 62(5), 809–837 (2011).
  • (22) M. Pezzoli, M. Cobos, F. Antonacci, and A. Sarti, “Sparsity-based sound field separation in the spherical harmonics domain,” in IEEE Int. Conf. Acoust. Sp. Sig. Process. (ICASSP) (2022), pp. 1051–1055.
  • (23) J. G. Tylka and E. Y. Choueiri, “Performance of linear extrapolation methods for virtual sound field navigation,” J. Audio Eng. Soc. 68(3), 138–156 (2020).
  • (24) M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Trans. Image Process. 15(12), 3736–3745 (2006).
  • (25) D. Markovic, L. Bianchi, S. Tubaro, and A. Sarti, “Extraction of acoustic sources through the processing of sound field maps in the ray space,” IEEE/ACM Trans. Audio, Speech, Lang. Process. 24(12), 2481–2494 (2016).
  • (26) D. Markovic, F. Antonacci, A. Sarti, and S. Tubaro, “Soundfield imaging in the ray space,” IEEE/ACM Trans. Audio, Speech, Lang. Process. 21(12), 2493–2505 (2013).
  • (27) M. Hahmann, S. A. Verburg, and E. Fernandez-Grande, “Spatial reconstruction of sound fields using local and data-driven functions,” J. Acoust. Soc. Am. 150(6), 4417–4428 (2021).
  • (28) C. Jin, F. Antonacci, and A. Sarti, “Ray space analysis with sparse recovery,” in 2017 IEEE Works. Appl. Si. Process. Aud. Acous. (WASPAA) (2017), pp. 239–243.
  • (29) S. Yu, C. Jin, F. Antonacci, and A. Sarti, “Sparse recovery beamforming and upscaling in the ray space,” in IEEE Int. Conf. Acoust. Sp. Sig. Process. (ICASSP) (2021), pp. 776–780.
  • (30) P. Morse and R. Bolt, “Sound waves in rooms,” Reviews of Modern Physics 16(2), 0069–0150 (1944).
  • (31) M. Schröder, “Eigenfrequenzstatistik und anregungsstatistik in räumen - modellversuche mit elektrischen wellen,” Acustica 4(4), 456–468 (1954).
  • (32) A. D. Pierce, Acoustics. An introduction to its physical principles and applications (McGraw-Hill, New York, 1981).
  • (33) V. Papyan, J. Sulam, and M. Elad, “Working locally thinking globally: Theoretical guarantees for convolutional sparse coding,” IEEE Trans. Signal Process. 65(21), 7997798, 5687–5701 (2017).
  • (34) B. Wohlberg, “Convolutional sparse representations with gradient penalties,” ICASSP, IEEE Int. Conf. Acoust., Speech and Sig. Proc. - Proceedings 2018-, 8462151 (2018).
  • (35) M. J. Bianco and P. Gerstoft, “Travel time tomography with adaptive dictionaries,” IEEE Trans. Comput. Imaging 4(4), 499–511 (2018).
  • (36) R. Grosse, R. Raina, H. Kwong, and A. Y. Ng, “Shift-invariant sparse coding for audio classification,” Proc. Conf. on Uncert. in Art. Int. 149–158 (2007).
  • (37) M. Mørup, L. K. Hansen, S. M. Arnfred, L.-H. Lim, and K. H. Madsen, “Shift invariant multi-linear decomposition of neuroimaging data,” Neuroimage 42(4), 1439–1450 (2008).
  • (38) D. Batenkov, Y. Romano, and M. Elad, “On the global-local dichotomy in sparsity modeling,” Applied and Numerical Harmonic Analysis 1–53 (2017).
  • (39) V. Papyan, Y. Romano, and M. Elad, “Convolutional neural networks analyzed via convolutional sparse coding,” J. Machine Learning Research 18, 1–52 (2017).
  • (40) R. Cohen and Y. C. Eldar, “Sparse convolutional beamforming for ultrasound imaging,” IEEE Trans. Ultras, Ferroel., and Freq. Control 65(12), 2390–2406 (2018).
  • (41) F. Lluís, P. Martínez-Nuevo, M. Bo Møller, and S. Ewan Shepstone, “Sound field reconstruction in rooms: Inpainting meets super-resolution,” J. Acoust. Soc. Am. 148(2), 649 (2020).
  • (42) P.-A. Grumiaux, S. Kitić, L. Girin, and A. Guérin, “A survey of sound source localization with deep learning methods,” J. Acoust. Soc. Am. 152(1), 107–151 (2022).
  • (43) P. Gerstoft, C. F. Mecklenbräuker, W. Seong, and M. Bianco, “Introduction to compressive sensing in acoustics,” J. Acoust. Soc. Am. 143(6), 3731–3736 (2018).
  • (44) E. J. Candes and M. B. Wakin, “An introduction to compressive sampling,” IEEE Signal Process. Mag. 25(2), 21–30 (2008).
  • (45) S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends in Machine Learning 3(1), 1–122 (2010).
  • (46) F. Heide, W. Heidrich, and G. Wetzstein, “Fast and flexible convolutional sparse coding,” in Proc. IEEE Conf. on Comp. Vision and Pattern Rec. (CVPR) (2015), pp. 5135–5143.
  • (47) B. Wohlberg, “Efficient algorithms for convolutional sparse representations,” IEEE Trans. Image Processing 25(1), 7308045 (2016).
  • (48) B. Wohlberg, “Boundary handling for convolutional sparse representations,” Proceedings - International Conference on Image Processing, Icip 2016-, 7532675, 1833–1837 (2016).
  • (49) B. Wohlberg and P. Rodriguez, “Convolutional sparse coding: Boundary handling revisited,” (2017).
  • (50) P. C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion (SIAM, Philadelphia, 1998), pp. 1–16.
  • (51) See the code repository https://github.com/manvhah/convolutional_plane_waves to run experiments.
  • (52) F. Heide, W. Heidrich, and G. Wetzstein, “Fast and flexible convolutional sparse coding,” Proc. IEEE Conf. on Comp. Vision and Pattern Rec. (CVPR) 07-12-, 7299149, 5135–5143 (2015).
  • (53) B. Wohlberg, “Efficient convolutional sparse coding,” in Proc. IEEE Int. Conf. on Acoust., Speech, and Sig. Process. (ICASSP) (2014), pp. 7173–7177.
  • (54) B. Wohlberg, “SPORCO: A Python package for standard and convolutional sparse representations,” in Proceed. of the 15th Python in Science Conf., Austin, TX, USA (2017), pp. 1–8.
  • (55) M. Hahmann, S. A. Verburg, and E. Fernandez-Grande, “Acoustic frequency responses in a conventional classroom” Dataset doi: 10.11583/DTU.13315286 (2021).