
A convolutional plane wave model for sound field reconstruction

Manuel Hahmann and Efren Fernandez-Grande ([email protected]), Acoustic Technology Group, Department of Electrical Engineering, Technical University of Denmark, Building 352, Ørsteds Plads, 2800 Kgs. Lyngby, Denmark
Abstract

Spatial sound field interpolation relies on suitable models that both conform to the available measurements and predict the sound field in the domain of interest. A suitable model can be difficult to determine when the spatial domain of interest is large compared to the wavelength, when both spherical and planar wavefronts are present, or when the sound field is complex, as in the near field. To span such complex sound fields, the global reconstruction task can be partitioned into local subdomain problems. Previous studies have shown that partitioning approaches rely on sufficient measurements within each subdomain, due to the higher number of model coefficients. This study proposes a joint analysis of all local subdomains, while enforcing self-similarity between neighbouring partitions. More specifically, the coefficients of local plane wave representations are sought to have spatially smooth magnitudes. A convolutional model of the sound field in terms of plane wave filters is formulated, and the inverse reconstruction problem is solved via the alternating direction method of multipliers. The experiments on simulated and measured sound fields suggest that the proposed method retains the flexibility of local models to conform to complex sound fields while preserving the global structure needed to reconstruct from fewer measurements.

I Introduction

Sound field reconstruction methods enable spatial interpolation of sound fields from a set of discrete measurements. Such spatial characterization of sound fields is key in applications such as sound field analysis, jacobsen2010a ; verburg2018a ; haneda1999a ; mignot2013a ; mignot2014a ; nolan2019a ; witew2017a ; brandao2022a sound field control, heuchel2020a ; caviedes2019a ; heuchel2018a ; betlehem2005a ; moller2019a in simulation software (interpolation from a coarser to a finer grid),borrel2021a and for navigation of a sound field in auralization and spatial audio.tylka2017a ; tylka2020a ; winter2014a ; schultz2013data ; fernandez2021a Often, the sound field at hand is dominated by wavefronts of specific geometry, and a matching propagation model is sufficient to approximate the measurements and interpolate the sound field. Typical approaches use for example plane waves verburg2018a ; mignot2014a ; nolan2019a ; jacobsen2013a ; moiola2011a or spherical harmonics caviedes2019a ; betlehem2005a ; pezzoli2022sparsity (in free or near field).

When considering areas significantly larger than the acoustic wavelength, parts of the sound field can show a strong influence of reflections, scattering, diffraction or varying wavefront curvature.witew2017a ; heuchel2020a ; brandao2022a ; fernandez2021a To model fields across such large domains, typical approaches include wave expansions with a high number of terms, or dividing the global domain into smaller local subdomains. For example, plane waves have been proposed to interpolate between two local spherical harmonic decompositions.tylka2020b Another approach is to analyse the sound field in terms of independent overlapping subdomains, as is common in other disciplines such as image processing.elad2006a In acoustic field analysis, subdomain representations using plane waves (also called ray space analysis) have been explored.markovic2016a ; markovic2013a ; hahmann2021spatial Sparse subdomain representations have been examined for beamforming,jin2017a and sound field reconstruction using plane wavesyu2021upscaling and functions learned from measured sound fields.hahmann2021spatial

Such locally variant representations increase the number of model coefficients to span more complex observations. However, independent local representations of sound fields ignore the continuous nature of wave propagation. Even diffuse fields, which exhibit the shortest possible correlation length, are commonly described as a superposition of infinitely many plane waves with random incidence angles and phases.morse1944a ; schroeder1954a ; pierce1981a It is therefore reasonable to assume spatial similarity between sound field representations in overlapping subdomains.

Convolutional models express a given field in terms of a set of subdomain-size filters, convolved with a spatial coefficient map. The spatial coefficient map preserves the spatial context of subdomains and allows for joint analysis of neighbouring partitions, thereby capturing both the fine structure and large-scale features of the field.papyan2017a ; wohlberg2018a ; bianco2018a Such convolutional approaches often exploit sparsity to find optimal local filter coefficients. They are then known as convolutional or shift-invariant sparse coding in audio and image processing,grosse2007a ; m2008a ; batenkov2017a and parallels to convolutional neural networks exist.papyan2017b Convolutional analysis has previously been proposed for beamforming,cohen2018sparse also in the form of neural networks for spatial sound field interpolationllu2020a and source localization.grumiaux2021a

In this study, we express a monochromatic sound field in terms of a locally variant planar wave model. In a convolutional formulation, we enforce continuity between local representations to exploit observations in neighboring subdomains. In this way, we not only accommodate local phenomena, but also capture the global structure of the sound field (by context between neighbours). Specifically, we enforce spatially smooth coefficients in a joint analysis of all local plane wave representations. In addition to global continuity, we also require sparsity within each local subdomain. Because the spatial frequencies are bandlimited, sparse approximations enable reconstructions even from few observations within each local subdomain.verburg2018a ; gerstoft2018a ; candes2008a

To test the proposed continuous convolutional plane wave model, we reconstruct: in Sec. III.1, a simulated sound field in the near field of a monopole, radially across a linear array; in Sec. III.2, a simulated field of a monopole interfering with a plane wave across a 2D aperture; and in Sec. III.3, an experimentally captured reverberant high-frequency sound field in a classroom across a large 2D aperture. The reconstruction is formulated as an alternating direction method of multipliers (ADMM) problem.boyd2010admm Where applicable, steps are solved in the frequency domain, where the convolution transforms to a multiplication.heide-2015-fast ; wohlberg2016b To enable reconstruction across a limited and sparsely sampled aperture, mask decoupling is applied.wohlberg2016a ; wohlberg2017a As benchmarks, global and independent local plane wave reconstructions are included in the tests.

II Theory

This section explains the sound field modelling and reconstruction approaches included in this study: the conventional linear superposition model in II.1, the partitioning of the reconstruction domain into independent, overlapping subdomains in II.2, the convolutional model to facilitate spatially smooth local coefficients in II.3, its solution via ADMM in II.4, and the assessment of reconstructed sound fields in II.5.

II.1 Global sound field model and reconstruction

The true acoustic pressure $\mathbf{p}\in\mathbb{C}^{N}$ at frequency $f$ and $N$ positions $\mathbf{r}\in\mathbb{R}^{3}$ within a domain $\Omega$ is assumed to be modelled as a linear combination of basis functions

$\mathbf{p}=\mathbf{Hx}\,,$  (1)

where $\mathbf{x}\in\mathbb{C}^{M}$ are the coefficients and $\mathbf{H}\in\mathbb{C}^{N\times M}$ contains the $M$ basis functions. For example, propagating plane waves $e^{-\mathrm{j}\mathbf{k}^{\sf T}\mathbf{r}}$ are often used to model reverberant or far fields, where $\mathbf{k}$ is the wavenumber vector ($\|\mathbf{k}\|_{2}=2\pi/\lambda$, with $\lambda$ the wavelength). In the case of plane waves, the $(n,m)^{\text{th}}$ element of $\mathbf{H}$ is $e^{-\mathrm{j}\mathbf{k}_{m}^{\sf T}\mathbf{r}_{n}}$, with $\mathbf{r}_{n}$ denoting the $n^{\text{th}}$ position and $\mathbf{k}_{m}$ the wavenumber vector from the $m^{\text{th}}$ incidence direction.

For the equation to hold, $\mathbf{H}$ must span the observed sound pressure field $\mathbf{p}$ over the complete domain $\Omega$. An observation of the sound pressure field is

$\mathbf{p}_{\mathrm{obs}}=\mathbf{Mp}+\mathbf{n}=\mathbf{H}_{\mathrm{obs}}\mathbf{x}+\mathbf{n}\,,$  (2)

where $\mathbf{M}\in\{0,1\}^{N_{\mathrm{obs}}\times N}$ is a binary mask selecting the $N_{\mathrm{obs}}$ available observations from the sound field $\mathbf{p}$, $\mathbf{H}_{\mathrm{obs}}=\mathbf{MH}$ is the model at the observed positions, and $\mathbf{n}$ is an error vector that accounts for measurement noise and model error.

An optimal set of coefficients $\hat{\mathbf{x}}$ is found by inversion of Eq. (2). To arrive at a stable solution, regularization is necessary, as $\mathbf{H}$ is typically ill-conditioned or rank-deficient.hansen1998a Typically, a structure in the coefficients is imposed, such as in

$\hat{\mathbf{x}}=\arg\min_{\mathbf{x}}\;\left\lVert\mathbf{H}_{\mathrm{obs}}\mathbf{x}-\mathbf{p}_{\mathrm{obs}}\right\rVert_{2}^{2}+\beta\,\|\mathbf{x}\|_{1}\,,$  (3)

in which case an $\ell_{1}$-norm penalty is applied to promote a sparse structure in the coefficients and $\hat{\cdot}$ denotes an estimate. The sound field is then reconstructed as

$\hat{\mathbf{p}}=\mathbf{H}\hat{\mathbf{x}}\,,$  (4)

where $\hat{\mathbf{p}}\in\mathbb{C}^{N}$ is the reconstructed sound pressure at the $N$ reconstruction positions.
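As an illustration of Eqs. (1)-(4), the following minimal Python sketch builds a two-dimensional plane wave dictionary and estimates sparse coefficients with a basic iterative shrinkage (ISTA) solver. The function names, the number of directions, and the ISTA solver are assumptions for illustration; the supplementary code accompanying the paper implements the actual method.

```python
# Minimal sketch of the global plane wave model (Eq. 1) and the sparse
# coefficient estimate of Eq. (3). Names and the ISTA solver are assumptions.
import numpy as np

def plane_wave_dictionary(r, wavelength, n_directions=21):
    """H[n, m] = exp(-j k_m^T r_n) for directions on a semicircle (2D positions r)."""
    k = 2 * np.pi / wavelength
    angles = np.linspace(0, np.pi, n_directions)
    k_vecs = k * np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (M, 2)
    return np.exp(-1j * r @ k_vecs.T)                                # (N, M)

def sparse_coefficients(H_obs, p_obs, beta=1e-3, n_iter=500):
    """ISTA for  ||H_obs x - p_obs||_2^2 + beta ||x||_1  with complex x."""
    Lf = 2 * np.linalg.norm(H_obs, 2) ** 2           # Lipschitz constant of the gradient
    x = np.zeros(H_obs.shape[1], dtype=complex)
    for _ in range(n_iter):
        g = 2 * H_obs.conj().T @ (H_obs @ x - p_obs)   # gradient of the data term
        z = x - g / Lf
        mag = np.maximum(np.abs(z) - beta / Lf, 0.0)   # complex soft threshold
        x = mag * z / np.maximum(np.abs(z), 1e-12)
    return x

# Reconstruction as in Eq. (4):  p_hat = H @ x_hat
```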

II.2 Local subdomain sound field model

Figure 1: (Color online) Sound field reconstruction via redundant subdomain decomposition in one spatial dimension (x-axis). The true signal $\mathbf{p}$ of length $N$ is partitioned into $S=N-N_{s}+1$ overlapping subdomains. All subdomains are of length $N_{s}$ and collected in $\mathbf{P}$. When the observed noisy signal $\mathbf{p}_{\mathrm{obs}}$ is partitioned (and $N_{\mathrm{obs}}<N$), $\mathbf{P}_{\mathrm{obs}}$ contains unknown values. Reconstruction is applied within each subdomain to estimate $\hat{\mathbf{P}}$. For each position, the corresponding elements in $\hat{\mathbf{P}}$ are averaged to yield the reconstructed sound field $\hat{\mathbf{p}}$.

When reconstructing the sound field over large spatial domains (i.e., much larger than the acoustic wavelength), it is useful to partition the global domain $\Omega$ into smaller, overlapping subdomains $\Omega_{\mathrm{sub}}$. Correspondingly, the sound field $\mathbf{p}$ can be described by a collection of $S$ subdomain sound fields $\mathbf{p}_{s}\in\mathbb{C}^{N_{s}}$

$\mathbf{P}=[\mathbf{R}_{1}\mathbf{p}\cdots\mathbf{R}_{S}\mathbf{p}]=[\mathbf{p}_{1}\cdots\mathbf{p}_{S}]\,,\quad\mathbf{P}\in\mathbb{C}^{N_{s}\times S}.$  (5)

$\mathbf{R}_{s}\in\{0,1\}^{N_{s}\times N}$ is a binary extraction operator that selects the $N_{s}$ positions contained in the $s^{\text{th}}$ subdomain $\Omega_{s}$. The partitioning is illustrated on the left side of Fig. 1. For simplicity, all subdomains within a sound field are considered to have the same extent, such that $N_{s}$ is constant. Specifically, this study considers subdomains with an extent of one wavelength $\lambda$ in each dimension of the aperture. Note that Eq. (5) yields a redundant representation of the sound field if the subdomains overlap ($N_{s}S\geq N$).

A sound field can then be reconstructed within each subdomain of the partitioned observations $\mathbf{P}_{\mathrm{obs}}$, for example by applying the procedure in Sec. II.1 and finding coefficients $\hat{\mathbf{x}}_{s}$ via Eq. (3) to estimate each $\hat{\mathbf{p}}_{s}$. This yields the collection of reconstructed subdomain sound fields [see center right in Fig. 1]

$\hat{\mathbf{P}}=\mathbf{H}_{s}\hat{\mathbf{X}}\,,$  (6)

where $\mathbf{H}_{s}$ contains the model functions at the desired positions within each local subdomain, for example $\mathbf{H}_{s}=\mathbf{R}_{1}\mathbf{H}$, and

$\hat{\mathbf{X}}=[\hat{\mathbf{x}}_{1}\cdots\hat{\mathbf{x}}_{S}]\,,\quad\hat{\mathbf{X}}\in\mathbb{C}^{M\times S}$  (7)

the estimated local coefficients. The reconstructed field $\hat{\mathbf{p}}$ is reassembled from $\hat{\mathbf{P}}$ as the mean of the overlapping subdomain representations [see right side of Fig. 1],

$\hat{\mathbf{p}}=\mathbf{W}\,\sum_{s}\mathbf{R}_{s}^{\sf T}\hat{\mathbf{p}}_{s}\,,$  (8)

where the diagonal matrix $\mathbf{W}=\mathrm{diag}(\sum_{s}\mathbf{R}_{s}^{\sf T}\mathbf{1}_{N_{s}})^{-1}$ normalizes by the spatial overlap of the subdomains and $\mathbf{1}_{N_{s}}$ denotes a vector of $N_{s}$ ones.

Such partitioning approaches counteract model mismatch, which occurs if a global model is suboptimal. For example, in the global sound field model of Eq. (1), the chosen model functions in $\mathbf{H}$ might not span observations of a sound field across a large spatial domain.

Estimating independent coefficients for each subdomain allows for arbitrarily different wave components in each subdomain. Such independent treatment of local representations disregards the similarity or even redundancy between sound fields in nearby or overlapping local subdomains.

The coefficients in the local subdomain sound field model can also be understood as a collection of coefficient maps $\mathbf{x}_{m}\in\mathbb{C}^{S}$, the rows in $\mathbf{X}=[\mathbf{x}^{\sf T}_{1}\cdots\mathbf{x}^{\sf T}_{M}]^{\sf T}$. The $m^{\text{th}}$ row in $\mathbf{X}$ contains a coefficient for the $m^{\text{th}}$ local model function across all $S$ subdomain locations. When plane waves are used, the coefficient map $\mathbf{x}_{m}$ contains the spatial distribution of coefficients for the $m^{\text{th}}$ plane wave across subdomain representations.
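For concreteness, a minimal one-dimensional sketch of the partitioning and overlap-averaging in Eqs. (5)-(8) could look as follows; the helper names are assumptions, and the extraction operators $\mathbf{R}_{s}$ are realized implicitly by slicing.

```python
# Minimal 1D sketch of Eqs. (5)-(8): extract overlapping subdomains and
# reassemble by averaging. Function names are assumptions for illustration.
import numpy as np

def partition(p, Ns):
    """Collect the S = N - Ns + 1 overlapping subdomains as columns of P (Eq. 5)."""
    N = len(p)
    return np.stack([p[s:s + Ns] for s in range(N - Ns + 1)], axis=1)

def reassemble(P_hat, N):
    """Average the overlapping subdomain reconstructions (Eq. 8)."""
    Ns, S = P_hat.shape
    acc = np.zeros(N, dtype=P_hat.dtype)      # running sum of R_s^T p_s
    count = np.zeros(N)                       # overlap counter, i.e. diag(W)^-1
    for s in range(S):
        acc[s:s + Ns] += P_hat[:, s]
        count[s:s + Ns] += 1
    return acc / count
```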

II.3 Convolutional sound field model

This study explores how the spatial variations across $\mathbf{x}_{m}$, the rows of $\mathbf{X}$, can be taken into account for reconstruction, such that the global structure (or topology) of the sound field is preserved. For the remainder of the paper, we consider reconstruction positions on a regular grid (which contains the measured positions as a subset). Further, we assume subdomains of equal size and with full overlap, such that $S=N$, i.e., the number of subdomains equals the number of positions in $\mathbf{p}$ (circular boundary conditions).

The true sound field across an $L$-dimensional aperture can be rewritten as a sum of $M$ convolutions:

$\mathbf{p}=\sum_{m=1}^{M}\,\mathbf{h}_{m}\circledast\overset{L}{\cdots}\circledast\mathbf{x}_{m}\,,$  (9)

where $\mathbf{h}_{m}\in\mathbb{C}^{N_{s}}$ describes the $m^{\text{th}}$ local filter (the $m^{\text{th}}$ column of $\mathbf{H}_{s}$), for example a plane wave. $\circledast\overset{L}{\cdots}\circledast$ denotes a circular convolution along $L\in\{1,2,3\}$ spatial dimensions. For example, for $L=1$, the $n^{\text{th}}$ element of $\mathbf{p}$ is

$\mathbf{p}(n)=\sum_{m}(\mathbf{h}_{m}\circledast\mathbf{x}_{m})(n)=\sum_{m}\left(\sum_{k=0}^{N_{s}}\mathbf{h}_{m}(k)\,\mathbf{x}_{m}\big((N+n-k)\,\%\,N\big)\right)\,,$

where $n=1,\ldots,N$ and $\%$ is the modulo operation.

Compared to the collection of local sound fields in Sec. II.2, this convolutional model reflects the spatial relations of coefficients and enables a joint analysis of all local representations in the global field. For example, each subdomain representation can take the coefficients in nearby subdomains into account. We propose to estimate the coefficients of Eq. (9) as

$\hat{\mathbf{X}}=\arg\min_{\mathbf{X}}\;\frac{1}{2}\left\lVert\mathbf{M}\left(\sum_{m}\mathbf{h}_{m}\circledast\overset{L}{\cdots}\circledast\mathbf{x}_{m}\right)-\mathbf{p}_{\mathrm{obs}}\right\rVert_{2}^{2}+\frac{\mu}{2}\sum_{m}\sum_{l}\left\lVert\Delta_{l}\mathbf{x}_{m}\right\rVert_{2}^{2}+\beta\sum_{m}\left\lVert\mathbf{x}_{m}\right\rVert_{1}\,.$  (10)

The $\ell_{1}$ penalty promotes sparse coefficients, notably applied to the global coefficient vector. A penalty on the spatial differences of the coefficients promotes smooth coefficient maps $\hat{\mathbf{x}}_{m}$, weighted by the regularization parameter $\mu$. Specifically, $\Delta_{l}\mathbf{x}_{m}$ are the first-order finite differences of the $m^{\text{th}}$ coefficient map along the $l^{\text{th}}$ dimension. For example, when the considered aperture (and therefore $\mathbf{x}_{m}$) is one-dimensional, $\Delta\mathbf{x}_{m}=[x_{m1}-x_{m2},\,x_{m2}-x_{m3},\,\cdots,\,x_{mS}-x_{m1}]^{\sf T}$. Smooth coefficient maps $\hat{\mathbf{x}}_{m}$ enforce similarity between nearby and overlapping representations, which seems particularly suitable for sound fields.
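The forward model in Eq. (9) can be evaluated efficiently by transforming the circular convolutions into products in the spatial frequency domain, which is also how the solver in Sec. II.4 operates. The following is a minimal sketch for $L=1$ with zero-padded local filters; the function name and array shapes are assumptions for illustration.

```python
# Minimal sketch of the convolutional synthesis in Eq. (9) for L = 1.
# H_s: (Ns, M) local plane wave filters, X: (M, N) coefficient maps; names assumed.
import numpy as np

def conv_synthesis(H_s, X):
    """p = sum_m h_m (circularly convolved with) x_m, evaluated via the FFT."""
    Ns, M = H_s.shape
    _, N = X.shape
    H_pad = np.zeros((N, M), dtype=complex)
    H_pad[:Ns, :] = H_s                          # zero-pad the local filters to length N
    D_tilde = np.fft.fft(H_pad, axis=0)          # filters in the spatial frequency domain
    X_tilde = np.fft.fft(X, axis=1)              # coefficient maps, transformed
    p_tilde = np.sum(D_tilde * X_tilde.T, axis=1)  # multiplication replaces convolution
    return np.fft.ifft(p_tilde)
```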

II.4 Convolutional reconstruction via ADMM

To reconstruct a sound field via the convolutional model, we solve Eq. (10) with the alternating direction method of multipliers (ADMM)boyd2010admm and rewrite it in matrix form as

$\hat{\mathbf{x}}=\arg\min_{\mathbf{x}}\;\frac{1}{2}\left\lVert\mathbf{M}\mathbf{D}_{*}\mathbf{x}-\mathbf{p}_{\mathrm{obs}}\right\rVert_{2}^{2}+\frac{\mu}{2}\sum_{m}\sum_{l}\left\lVert\mathbf{G}_{*l}\mathbf{x}_{m}\right\rVert_{2}^{2}+\beta\left\lVert\mathbf{x}\right\rVert_{1}\,,$  (11)

where $\mathbf{x}\in\mathbb{C}^{MN}$ are the stacked columns of $\mathbf{X}$ and $\mathbf{D}_{*}\in\mathbb{C}^{N\times MN}$ is the convolutional dictionary matrix such that $\mathbf{D}_{*}\mathbf{x}=\sum_{m}\mathbf{h}_{m}\circledast\overset{L}{\cdots}\circledast\mathbf{x}_{m}$ (e.g., for $L=1$, $\mathbf{D}_{*}$ is block-circulant with block $\mathbf{H}_{s}$). $\mathbf{G}_{*l}\mathbf{x}_{m}$ calculates the first-order finite differences of the $m^{\text{th}}$ coefficient map along the $l^{\text{th}}$ dimension. In the one-dimensional case, $\mathbf{G}_{*}$ is a circulant matrix with first row $[1,-1,\mathbf{0}_{1\times S-2}]$, where $\mathbf{0}_{1\times S-2}$ is a row vector of $S-2$ zeros.
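In the one-dimensional case, applying $\mathbf{G}_{*}$ to a coefficient map thus reduces to a circular first-order difference; a minimal sketch with an assumed function name:

```python
# Circular first-order finite differences of a 1D coefficient map,
# matching the circulant G with first row [1, -1, 0, ..., 0].
import numpy as np

def circ_diff(x_m):
    """Delta x_m = [x_1 - x_2, x_2 - x_3, ..., x_S - x_1]."""
    return x_m - np.roll(x_m, -1)
```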

To solve Eq. (11), we split the variables and reformulate the joint problem wohlberg2016a ; wohlberg2017a ; wohlberg2018a as

$\underset{\mathbf{x},\mathbf{y}_{0},\mathbf{y}_{1}}{\text{minimize}}\quad\frac{1}{2}\left\lVert\mathbf{M}\mathbf{y}_{0}-\mathbf{p}_{\mathrm{obs}}\right\rVert_{2}^{2}+\frac{\mu}{2}\sum_{m}\sum_{l}\left\lVert\mathbf{G}_{*l}\mathbf{x}\right\rVert_{2}^{2}+\beta\left\lVert\mathbf{y}_{1}\right\rVert_{1}$  (12)

$\text{subject to}\quad\mathbf{Ax}-\mathbf{y}=0\,,\quad\text{where}\quad\mathbf{A}=\begin{bmatrix}\mathbf{D}_{*}\\ \mathbf{I}\end{bmatrix}\quad\text{and}\quad\mathbf{y}=\begin{bmatrix}\mathbf{y}_{0}\\ \mathbf{y}_{1}\end{bmatrix}.$

The ADMM steps in the $k^{\text{th}}$ iteration are

$\mathbf{x}^{k+1}=\arg\min_{\mathbf{x}}\;\frac{\mu}{2}\sum_{m}\sum_{l}\left\lVert\mathbf{G}_{*l}\mathbf{x}_{m}\right\rVert_{2}^{2}+\frac{\rho}{2}\left\lVert\mathbf{A}\mathbf{x}-\mathbf{y}^{k}+\mathbf{u}^{k}\right\rVert_{2}^{2}$  (13)

$\mathbf{y}^{k+1}=\arg\min_{\mathbf{y}}\;\frac{1}{2}\left\lVert\mathbf{M}\mathbf{y}_{0}-\mathbf{p}_{\mathrm{obs}}\right\rVert_{2}^{2}+\beta\left\lVert\mathbf{y}_{1}\right\rVert_{1}+\frac{\rho}{2}\left\lVert\mathbf{A}\mathbf{x}^{k+1}-\mathbf{y}+\mathbf{u}^{k}\right\rVert_{2}^{2}$  (14)

$\mathbf{u}^{k+1}=\mathbf{u}^{k}+\mathbf{A}\mathbf{x}^{k+1}-\mathbf{y}^{k+1}\,,$  (15)

where $\mathbf{u}=[\mathbf{u}_{0},\mathbf{u}_{1}]^{\sf T}$ is the dual variable. The superscript $k$ indicates the state before the $k^{\text{th}}$ iteration (omitted in the following for readability). In the spatial frequency domain, the convolutional matrices reduce to their transformed filters and Eq. (13) becomes heide2015a

$\left(\mu\tilde{\mathbf{G}}^{\sf H}\tilde{\mathbf{G}}+\rho\tilde{\mathbf{D}}^{\sf H}\tilde{\mathbf{D}}\right)\tilde{\mathbf{x}}^{k+1}=\tilde{\mathbf{D}}^{\sf H}\left(\tilde{\mathbf{y}}_{0}-\tilde{\mathbf{u}}_{0}\right)+\left(\tilde{\mathbf{y}}_{1}-\tilde{\mathbf{u}}_{1}\right)\,,$  (16)

which can be efficiently solved via the Sherman-Morrison formula wohlberg-2014-efficient ; wohlberg2016b and where $\tilde{\cdot}$ indicates frequency domain quantities. Also, $\tilde{\mathbf{G}}^{\sf H}\tilde{\mathbf{G}}=\sum_{l}\tilde{\mathbf{G}}_{l}^{\sf H}\tilde{\mathbf{G}}_{l}$, where $\tilde{\mathbf{G}}_{l}$ is the frequency-transformed finite difference matrix in the $l^{\text{th}}$ dimension.
To align the dimensions of $\tilde{\mathbf{D}}$ to the sound field $\tilde{\mathbf{p}}$, zero-padding is typically applied to the local filters $\mathbf{h}_{m}$ (corresponding to $[\mathbf{H}_{s}^{\sf T},\mathbf{0}_{M\times N-N_{s}}]^{\sf T}$, i.e., the first $M$ columns of $\mathbf{D}_{*}$). Instead, we obtain $\tilde{\mathbf{D}}$ from plane wave functions evaluated over the complete sound field (i.e., the global plane wave expansion $\mathbf{H}$).
Equations (14) and (15) are separable, such that

$\mathbf{y}_{0}^{k+1}=\arg\min_{\mathbf{y}_{0}}\;\frac{1}{2}\left\lVert\mathbf{M}\mathbf{y}_{0}-\mathbf{p}_{\mathrm{obs}}\right\rVert_{2}^{2}+\frac{\rho}{2}\left\lVert\mathbf{y}_{0}-\left(\mathbf{D}_{*}\mathbf{x}^{k+1}+\mathbf{u}_{0}^{k}\right)\right\rVert_{2}^{2}$  (17)

$\mathbf{y}_{1}^{k+1}=\arg\min_{\mathbf{y}_{1}}\;\beta\left\lVert\mathbf{y}_{1}\right\rVert_{1}+\frac{\rho}{2}\left\lVert\mathbf{y}_{1}-\left(\mathbf{x}^{k+1}+\mathbf{u}_{1}^{k}\right)\right\rVert_{2}^{2}$  (18)

$\mathbf{u}_{0}^{k+1}=\mathbf{u}_{0}^{k}+\mathbf{D}_{*}\mathbf{x}^{k+1}-\mathbf{y}_{0}^{k+1}$  (19)

$\mathbf{u}_{1}^{k+1}=\mathbf{u}_{1}^{k}+\mathbf{x}^{k+1}-\mathbf{y}_{1}^{k+1}\,,$  (20)

where the $\mathbf{y}_{0}$ update, Eq. (17), has an efficient closed-form solution in the frequency domain

$\tilde{\mathbf{y}}_{0}^{k+1}=\left(\tilde{\mathbf{M}}^{\sf H}\tilde{\mathbf{M}}+\rho\mathbf{I}\right)^{-1}\left(\tilde{\mathbf{M}}\tilde{\mathbf{p}}_{\mathrm{obs}}+\rho\left(\tilde{\mathbf{D}}\tilde{\mathbf{x}}^{k+1}+\tilde{\mathbf{u}}_{0}^{k}\right)\right)\,.$  (21)
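Because $\mathbf{M}^{\sf T}\mathbf{M}$ is diagonal with ones at the observed positions and zeros elsewhere, the same update can also be carried out elementwise in the spatial domain. The following is a minimal sketch of that variant (the spatial-domain form and the names are assumptions; the implementation described here solves Eq. (21) in the frequency domain):

```python
# Minimal spatial-domain sketch of the masked y0 update in Eq. (17):
# y0 = (M^T M + rho I)^{-1} (M^T p_obs + rho (D x + u0)), elementwise since
# M^T M is diagonal. Names are assumptions for illustration.
import numpy as np

def update_y0(obs_mask, p_obs_scattered, Dx, u0, rho):
    """obs_mask: 0/1 array of length N; p_obs_scattered: observations placed at
    their positions in a length-N vector (zeros elsewhere)."""
    return (p_obs_scattered + rho * (Dx + u0)) / (obs_mask + rho)
```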

$\mathbf{y}_{1}$ is updated by soft-thresholding, separable along the elements of $\mathbf{y}_{1}$,

$\mathbf{y}_{1}^{k+1}=\mathcal{S}_{\beta/\rho}\left(\mathbf{x}^{k+1}+\mathbf{u}_{1}^{k}\right)\,,$  (22)

where $\mathcal{S}$ is the shrinkage operator

$\mathcal{S}_{\alpha}(\mathbf{z})=\mathrm{sign}(\mathbf{z})\odot\max(0,|\mathbf{z}|-\alpha)\,.$  (23)
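For complex-valued coefficients, the shrinkage of Eq. (23) reduces the magnitude and keeps the phase; a minimal sketch with an assumed function name:

```python
# Complex soft-thresholding operator of Eq. (23), used for the y1 update (Eq. 22).
import numpy as np

def shrink(z, alpha):
    """S_alpha(z): shrink magnitudes by alpha, keep the complex phase."""
    mag = np.maximum(np.abs(z) - alpha, 0.0)
    return mag * z / np.maximum(np.abs(z), np.finfo(float).tiny)

# y1 update of Eq. (22):  y1_next = shrink(x_next + u1, beta / rho)
```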

II.5 Assessment of reconstructed sound fields

To assess a reconstructed pressure field $\hat{\mathbf{p}}$, it is compared to the true field $\mathbf{p}$ in terms of the normalized mean square error (NMSE) and the spatial similarity $C$. The NMSE is

$\text{NMSE}=20\log_{10}\left(\frac{\|\hat{\mathbf{p}}-\mathbf{p}\|_{2}}{\|\mathbf{p}\|_{2}}\right)\,.$  (24)

The spatial similarity $C$ is assessed as

$C=\frac{\left|\hat{\mathbf{p}}^{\sf H}\mathbf{p}\right|^{2}}{\left(\hat{\mathbf{p}}^{\sf H}\hat{\mathbf{p}}\right)\left(\mathbf{p}^{\sf H}\mathbf{p}\right)}\,,$  (25)

such that $C=0$ indicates no similarity and $C=1$ means the fields are indistinguishable.
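Both metrics translate directly into code; a minimal sketch with assumed function names:

```python
# Reconstruction metrics of Eqs. (24) and (25).
import numpy as np

def nmse_db(p_hat, p):
    """Normalized mean square error in dB (Eq. 24)."""
    return 20 * np.log10(np.linalg.norm(p_hat - p) / np.linalg.norm(p))

def spatial_similarity(p_hat, p):
    """Normalized magnitude-squared correlation C in [0, 1] (Eq. 25)."""
    return np.abs(np.vdot(p_hat, p)) ** 2 / (np.vdot(p_hat, p_hat).real * np.vdot(p, p).real)
```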

III Results

To demonstrate the proposed approach, we reconstruct simulated and measured sound fields. The proposed method is implemented with the help of the SPORCO Python package wohlberg-2017-sporco and is included as supplementary material.

III.1 Reconstruction along the radial distance from a monopole

Figure 2: (Color online) Reconstruction of the sound field from a monopole (simulation), along a linear array in the radial dimension, placed at a distance of $0.5\,\lambda$. From top to bottom: normalized pressure and decimated measurements; reconstructions via the convolutional model, independent local reconstructions and global plane waves; reconstruction errors. Shown is the real part of the pressure in (a) and (b) and the absolute error in (c), normalized to the maximum pressure within the domain, arbitrary units.

The first experiment reconstructs the sound field generated by a monopole placed at the end of a linear microphone array [see Fig. 2(a)]. The sound pressure is simulated radially across ten wavelengths ($\lambda$), spanning the near field and the far field of the monopole. The linear array consists of 31 microphones at radial distances from $0.5\,\lambda$ to $10.5\,\lambda$ with a spacing of $\lambda/3$. Three reconstruction methods are compared:

i) global plane waves with least squares (no regularization in Eq. (3));

ii) local independent plane waves using compressive sensing, i.e., finding sparse representations via Eq. (3) for each local partition separately, then overlapping and averaging;

iii) convolutional sparse plane waves with smooth coefficients via Eq. (10).

All three reconstruction methods use the same set of plane waves, with wavenumber $\|\mathbf{k}\|_{2}=2\pi/\lambda$ and 21 incidence angles equally spaced along a semicircle $[0,\pi]$. This experiment interpolates from observations with a resolution of $\lambda/3$ to a grid spacing of $\lambda/24$. The aperture of length $10\lambda$ contains $N=241$ reconstruction positions. The local approaches use subdomains of size one wavelength. In this example, each subdomain contains $N_{s}=25$ reconstruction points and at most 4 observations. Both sides of the domain are padded with $N_{s}-1$ zeros to reduce artifacts from circular wrapping. The zero-padded domain has size $N^{\prime}=N+2(N_{s}-1)=289$ and is partitioned into $S=N^{\prime}$ fully overlapping subdomains. After reconstruction, the sound field is cropped again to the original size $N$. For comparison, the local approaches determine $MS=21\times 289$ coefficients compared to $M=21$ in the global model.

The reconstructions $\hat{\mathbf{p}}$ are shown in Fig. 2(b) and the error $\hat{\mathbf{p}}-\mathbf{p}$ in Fig. 2(c). All methods yield good reconstructions. The error is lowest for the smooth convolutional model, where smooth coefficients (plane wave magnitude and direction) among neighbouring representations are enforced. Methods using locally variant plane wave coefficients are flexible enough to approximate all measurements (the error is zero at the measurement positions). Still, they rely on sufficient measurements within a local partition to yield good predictions. The global plane wave model cannot conform to all measurements across the array, because of the mismatch between the radial decay of the sound field and propagating plane waves.

Figure 3: (Color online) NMSE for reconstructions across a linear microphone array for varying distance between the monopole and array (see Fig. 2).

The experiment is repeated for varying distances between the monopole source and the microphone array. The normalized mean square error, Eq. (24), is shown in Fig. 3. When the array is close to the monopole, local representations improve the reconstructions due to the field's high curvature and strong decay with distance. The proposed smooth convolutional approach gives the most accurate reconstructions up to a distance of $0.7\,\lambda$. The error decreases with distance for all methods, due to the less pronounced magnitude decay in the field. The further the array is placed into the far field, the more the true field approaches planar characteristics, such that the global model also fits the observations well and yields the best predictions. Note that in an application scenario, the reconstruction quality depends on many factors, such as the aperture size, the curvature of the wavefronts within the aperture, the number and distribution of available measurements and, not least, the evaluation criteria.

III.2 2D reconstruction: monopole and plane wave

Figure 4: (Color online) Reconstructions of the sound pressure generated by a monopole interfering with a plane wave. Reconstructions from 100 (top row) and 300 microphones (bottom row). Left to right: measurements, reconstructions using global plane waves, independent local sparse representations, joint analysis with global sparsity, joint analysis with global sparsity and continuity, true reference. The aligned color scale is in dB SPL.

The approach is tested for a two-dimensional aperture of size $(5\,\lambda)^{2}$ in the plane $z=0$. The sound field is generated by the interference of a monopole at $(0,0,\lambda/8)$ with a plane wave propagating in the direction $\mathbf{k}/k=(k_{x},k_{y},k_{z})/k=(0.38,-0.76,0.52)$. Four plane wave reconstructions are tested:

i) global plane waves with ridge regression ($\ell_{2}$-norm regularization in Eq. (3), parameter $\beta$ via leave-one-out cross-validation);

ii) local independent, sparse plane waves (solving Eq. (3) via least-angle regression, regularization via leave-one-out cross-validation);

iii) convolutional sparse plane waves without smooth coefficients ($\mu=0$ in Eq. (10));

iv) convolutional sparse plane waves with smooth coefficients via Eq. (10).

All models use plane waves with propagation angles distributed in a Fibonacci grid ($k_{z}\geq 0$ hemisphere). The global method i) uses $M=1000$ propagation angles, the local methods ii)-iv) $M=100$ angles. The reconstruction grid is regular with a spacing of $d=\lambda/10$ between positions ($N=51^{2}$). The local subdomains and filters in methods ii)-iv) have size $\lambda^{2}$ (i.e., $N_{s}=(\lfloor\lambda/d\rfloor+1)^{2}=11^{2}$ discrete positions). For methods iii) and iv), the domain is zero-padded (by $\sqrt{N_{s}}-1$ in each direction) to avoid artifacts from circular convolutions ($N^{\prime}=(\sqrt{N}+2(\sqrt{N_{s}}-1))^{2}=3481$). The regularization parameters are tuned to $\beta=1\times 10^{-5}$, $\mu=1\times 10^{-3}$ ($0$ for iii)), $\rho=1\times 10^{-5}$, and the ADMM is stopped after 500 iterations. Note that this demonstration case exhibits a high signal-to-noise ratio. In less favourable conditions, the regularization would likely need to be adjusted.

The results are shown in Fig. 4. For reconstructions from 100 microphones, the global and the proposed approach with smooth local coefficients capture the spatial phenomena better than the two other approaches ii) and iii), which only rely on local and global sparsity. The spatially invariant (global) and slowly varying (proposed) models prescribe the necessary spatial structure to reconstruct from few measurements. Both methods exploit measurements across a larger spatial range and capture the global structure of the sound field, which is critical in sparsely sampled scenarios [see Fig. 4(b) and (e) vs. (c)-(d)]. When more microphones are available, spatially variant (local) models benefit from their flexibility to model complex sound fields. The global approach exhibits artefacts around the monopole due to the strong decay and curvature of the wavefronts in this region. The proposed approach balances local flexibility with global structure and yields good reconstructions in both cases. Note that method iii) and the proposed method iv) apply the same shrinkage threshold $\beta/\rho$ to all local coefficients [see Eq. (22)]. Dynamic regularization could further improve the results, as observed for the independent local approach ii) (which uses cross-validation to find an optimal $\beta$ for each subdomain).

Figure 5: (Color online) Particle velocity and intensity fields for the test using 300 microphones (second row of Fig. 4) to reconstruct the sound field of a monopole and an interfering plane wave (see Sec. III.2). From top to bottom: true field, reconstructions using a global, local independent, and convolutional smooth model. Vector fields show the particle velocity $\Re\{\mathbf{u}_{xy}\}$ and intensity $\Re\{\mathbf{p}\odot\mathbf{u}_{xy}^{*}\}$. For readability, vector norms are clipped, $|\mathbf{u}|$ to $150\,\mu\mathrm{m\,s}^{-1}$ and $|\mathbf{I}|$ to $2\,\mu\mathrm{W\,m}^{-2}$, and only every fourth intensity vector is shown. Color indicates particle velocity and intensity levels in dB, where $u_{\mathrm{ref}}=50$ nm s$^{-1}$ and $I_{\mathrm{ref}}=1$ pW m$^{-2}$, and bilinear interpolation is used for readability.

The particle velocity and sound intensity xy-vector fields of the reconstructions from 300 microphones (second row of Fig. 4) are shown in Fig. 5. Global representations cannot conform to the drastic spatial variations close to the monopole; all particle velocity vectors point outwards from $(x,y)=(0,0)$. Local approaches recover the fine structure of the particle velocity and intensity, also around the monopole.

III.3 Experimental reconstruction with real data: classroom measurement

Figure 6: (Color online) Furnished classroom (room 019 in DTU building 352, Lyngby, Denmark) with absorbing ceiling and wooden floor (left, picture from Ref. hahmann2021spatial, ) and robotic arm (right).
Figure 7: (Color online) Reconstructions of the sound pressure field in a classroom, from 98 (top row) and 295 microphones (bottom row). Left to right: measurements, reconstructions using global plane waves, independent local sparse representations, joint analysis with global sparsity, joint analysis with global sparsity and continuity, true reference. The aligned color scale is in dB, relative to the spatial mean of the squared true pressure field, $\langle\mathbf{p}_{\mathrm{true}}^{2}\rangle$.

The same methods i)-iv) from Sec. III.2 (with the same parameters) are used to reconstruct the enclosed reverberant sound field in a classroom (DTU building 352, Lyngby, Denmark), shown in Fig. 6. The room dimensions are $(l_{x},l_{y},l_{z})=(6.63,9.45,2.97)$ m, the reverberation time approximately $T_{60}=0.5$ s, and the Schröder frequency $f_{S}\approx 240$ Hz. The room is furnished and its walls are somewhat irregular, with a wooden floor, scattering elements on the walls and an absorbing ceiling. A loudspeaker (BM6, Dynaudio, Skanderborg, Denmark) placed in a room corner was used to excite the room with 10 s logarithmic sweeps from 20 Hz to 20 kHz. A total of 4761 frequency responses were measured using a robotic arm (UR5, Universal Robots, Odense, Denmark) with a 1/2 inch free-field condenser microphone (Brüel & Kjær, Nærum, Denmark). The positions are distributed over a $1.7\times 1.7$ m$^{2}$ planar aperture with $N=69^{2}$ positions on a regular grid with 2.5 cm spacing. We refer the reader to Ref. hahmann2021spatial, for more information on the room and measurements and to Ref. hahmann2021b, for the dataset.

The sound field in the classroom is reconstructed at 1000 Hz from 98 and 295 measurements, distributed with uniform probability (and a minimum distance of 7 cm) across the aperture. Reconstructions using methods i)-iv) and the measured reference ("true") of the classroom sound field are shown in Fig. 7 for the two cases with $N_{\mathrm{obs}}=98$ and 295. To reconstruct from few measurements, it is necessary to capture the global structure of the sound field, as in the global and the proposed approach [Fig. 7(b) and (d)]. Models based on local representations conform more easily to many measurements due to their higher number of coefficients [Fig. 7(i-k)]. Specifically, in this test, subdomains of size $\lambda^{2}$ contain $N_{s}=24^{2}$ discrete reconstruction positions, such that the local models use a total of ii) $MN=476100$ and, with padding, iii)+iv) $MN^{\prime}=M(\sqrt{N}+2(\sqrt{N_{s}}-1))^{2}=1322500$ coefficients, compared to 1000 in the global model. The proposed approach combines both local flexibility and joint global analysis. As a consequence, it yields the highest similarity and the lowest reconstruction errors when compared to the measured field.

The benefit of smooth coefficients becomes apparent when comparing the two convolutional approaches. Both seek a sparse approximation of the measurements, but spatial continuity is required to reconstruct sound fields successfully via local representations. The local independent approach also yields smooth reconstructions, namely by the averaging of overlapping partitions. However, a joint analysis of nearby representations is needed to align nearby coefficients and hence indirectly exploit nearby measurements. In this study, the sparsity constraint enables feasible reconstructions even when only few measurements are available within a local subdomain and the inverse problem is severely underdetermined. As such, the goal is not to represent the sound field using the fewest number of coefficients; rather, local sparsity is a means of exploiting the available measurements.

Figure 8: (Color online) Reconstruction error (NMSE) of the sound field in a classroom across frequency from (a) 80, (b) 160 and (c) 320 microphones. Shown are the NMSE mean $\pm$ one standard deviation, obtained from 12 reconstructions from pseudo-randomly distributed measurements. The vertical line indicates the frequency where the average distance between measurements is $\overline{d_{m}}=\lambda/2$.

The experiment in Fig. 7 is extended to other frequencies, reconstructing the sound field in the classroom from 500 Hz to 2 kHz using a fixed number of microphones. The NMSE results in Fig. 8 show that the proposed approach yields good reconstructions when the average distance between measurements is smaller than $\lambda/2$. The proposed approach yields a significant improvement, with errors close to [see Fig. 8(a)] or lower than those of global plane waves [for sufficient measurements, see Fig. 8(b,c)].

IV Conclusion

This study formulates a sound field model as a spatial convolution between a global coefficient map and local plane wave filters. This model leads to a joint analysis of all local representations, while keeping their spatial relation (and thereby the global structure of the field) intact. By penalizing the spatial differences of the plane wave coefficients, continuity between neighboring representations is enforced in terms of the amplitude and direction of the plane waves. In this way, each local representation has to be consistent with its neighbours and can therefore utilise nearby observations. The experiments indicate that the proposed approach both conforms to complex spatial sound fields and preserves the global structure of the sound field. Compared to other local models using locally sparse coding in terms of plane waves, the proposed approach attains better reconstructions of sound fields when few measurements are available. When measurements are very sparsely distributed, an expansion of the entire global field in terms of plane waves yields the best reconstructions. However, when sufficient measurements are available, the experiments indicate that local representation models conform best to fields of higher complexity. This is shown for the reconstruction of the sound pressure, as well as for the reconstruction of the particle velocity and sound intensity vector fields, where the improvements are even more substantial.

Acknowledgements.
This work is funded by VILLUM Fonden through VILLUM Young Investigator grant number 19179 for the project ‘Large-scale Acoustic Holography’.

References

  • (1) F. Jacobsen and E. Tiana Roig, “Measurement of the sound power incident on the walls of a reverberation room with near field acoustic holography,” Acta Acustica united with Acustica 96(1), 76–81 (2010).
  • (2) S. A. Verburg and E. Fernandez-Grande, “Reconstruction of the sound field in a room using compressive sensing,” J. Acoust. Soc. Am. 143(6), 3770–3779 (2018).
  • (3) Y. Haneda, Y. Kaneda, and N. Kitawaki, “Common-acoustical-pole and residue model and its application to spatial interpolation and extrapolation of a room transfer function,” IEEE Trans. Sp. Audio Proc. 7(6), 709–717 (1999).
  • (4) R. Mignot, L. Daudet, and F. Ollivier, “Room reverberation reconstruction: Interpolation of the early part using compressed sensing,” IEEE/ACM Trans. Audio, Speech, Lang. Process. 21(11), 6562745, 2301–2312 (2013).
  • (5) R. Mignot, G. Chardon, and L. Daudet, “Low frequency interpolation of room impulse responses using compressed sensing,” IEEE/ACM Trans. Audio, Speech, Lang. Process. 22(1), 205–216 (2014).
  • (6) M. Nolan, S. A. Verburg, J. Brunskog, and E. Fernandez-Grande, “Experimental characterization of the sound field in a reverberation room,” J. Acoust. Soc. Am. 145(4), 2237–2246 (2019).
  • (7) I. B. Witew, M. Vorländer, and N. Xiang, “Sampling the sound field in auditoria using large natural-scale array measurements,” J. Acoust. Soc. Am. 141(3), EL300–EL306 (2017).
  • (8) E. Brandão and E. Fernandez-Grande, “Analysis of the sound field above finite absorbers in the wave-number domain,” J. Acoust. Soc. Am. 151(5), 3019–3030 (2022).
  • (9) F. M. Heuchel, D. Caviedes-Nozal, J. Brunskog, and F. T. Agerkvist, “Large-scale outdoor sound field control,” J. Acoust. Soc. Am. 148(4), 2392–2402 (2020).
  • (10) D. Caviedes-Nozal, F. M. Heuchel, J. Brunskog, N. A. B. Riis, and E. Fernandez-Grande, “A bayesian spherical harmonics source radiation model for sound field control,” J. Acoust. Soc. Am. 146(5), 3425–3435 (2019).
  • (11) F. M. Heuchel, E. Fernandez-Grande, F. T. Agerkvist, and E. Shabalina, “Active room compensation for sound reinforcement using sound field separation techniques,” J. Acoust. Soc. Am. 143(3), 1346–1354 (2018).
  • (12) T. Betlehem and T. D. Abhayapala, “Theory and design of sound field reproduction in reverberant rooms,” J. Acoust. Soc. Am. 117(4), 2100–2111 (2005).
  • (13) M. B. Møller, J. K. Nielsen, E. Fernandez-Grande, and S. K. Olesen, “On the influence of transfer function noise on sound zone control in a room,” IEEE/ACM Trans. Audio, Speech, Lang. Process. 27(9), 1405–1418 (2019).
  • (14) N. Borrel-Jensen, A. P. Engsig-Karup, and C.-H. Jeong, “Physics-informed neural networks for one-dimensional sound field predictions with parameterized sources and impedance boundaries,” JASA Express Letters 1(12), 122402 (2021).
  • (15) J. G. Tylka and E. Y. Choueiri, “Evaluation of techniques for navigation of higher-order ambisonics,” J. Acoust. Soc. Am. 141(5), 3511–3511 (2017).
  • (16) J. G. Tylka and E. Y. Choueiri, “Fundamentals of a parametric method for virtual navigation within an array of ambisonics microphones,” J. Audio Eng. Soc. 68(3), 120–137 (2020).
  • (17) F. Winter, F. Schultz, and S. Spors, “Localization properties of data-based binaural synthesis including translatory head-movements,” Proceedings of Forum Acusticum 2014- (2014).
  • (18) F. Schultz and S. Spors, “Data-based binaural synthesis including rotational and translatory head-movements,” in Audio Eng. Soc. Conf.: Sound Field Control - Eng. and Percep. (2013).
  • (19) E. Fernandez-Grande, D. Caviedes-Nozal, M. Hahmann, X. Karakonstantis, and S. A. Verburg, “Reconstruction of room impulse responses over extended domains for navigable sound field reproduction,” in Proceed. Int. Conf. Immers. 3D Audio, IEEE (2021), p. 8 pp.
  • (20) F. Jacobsen and P. M. Juhl, Fundamentals of general linear acoustics (Wiley, London, 2013).
  • (21) A. Moiola, R. Hiptmair, and I. Perugia, “Plane wave approximation of homogeneous helmholtz solutions,” Zeitschrift Fur Angewandte Mathematik Und Physik 62(5), 809–837 (2011).
  • (22) M. Pezzoli, M. Cobos, F. Antonacci, and A. Sarti, “Sparsity-based sound field separation in the spherical harmonics domain,” in IEEE Int. Conf. Acoust. Sp. Sig. Process. (ICASSP) (2022), pp. 1051–1055.
  • (23) J. G. Tylka and E. Y. Choueiri, “Performance of linear extrapolation methods for virtual sound field navigation,” J. Audio Eng. Soc. 68(3), 138–156 (2020).
  • (24) M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Trans. Image Process. 15(12), 3736–3745 (2006).
  • (25) D. Markovic, L. Bianchi, S. Tubaro, and A. Sarti, “Extraction of acoustic sources through the processing of sound field maps in the ray space,” IEEE/ACM Trans. Audio, Speech, Lang. Process. 24(12), 2481–2494 (2016).
  • (26) D. Markovic, F. Antonacci, A. Sarti, and S. Tubaro, “Soundfield imaging in the ray space,” IEEE/ACM Trans. Audio, Speech, Lang. Process. 21(12), 2493–2505 (2013).
  • (27) M. Hahmann, S. A. Verburg, and E. Fernandez-Grande, “Spatial reconstruction of sound fields using local and data-driven functions,” J. Acoust. Soc. Am. 150(6), 4417–4428 (2021).
  • (28) C. Jin, F. Antonacci, and A. Sarti, “Ray space analysis with sparse recovery,” in 2017 IEEE Works. Appl. Si. Process. Aud. Acous. (WASPAA) (2017), pp. 239–243.
  • (29) S. Yu, C. Jin, F. Antonacci, and A. Sarti, “Sparse recovery beamforming and upscaling in the ray space,” in IEEE Int. Conf. Acoust. Sp. Sig. Process. (ICASSP) (2021), pp. 776–780.
  • (30) P. Morse and R. Bolt, “Sound waves in rooms,” Reviews of Modern Physics 16(2), 0069–0150 (1944).
  • (31) M. Schröder, “Eigenfrequenzstatistik und anregungsstatistik in räumen - modellversuche mit elektrischen wellen,” Acustica 4(4), 456–468 (1954).
  • (32) A. D. Pierce, Acoustics. An introduction to its physical principles and applications (McGraw-Hill, New York, 1981).
  • (33) V. Papyan, J. Sulam, and M. Elad, “Working locally thinking globally: Theoretical guarantees for convolutional sparse coding,” IEEE Trans. Signal Process. 65(21), 7997798, 5687–5701 (2017).
  • (34) B. Wohlberg, “Convolutional sparse representations with gradient penalties,” ICASSP, IEEE Int. Conf. Acoust., Speech and Sig. Proc. - Proceedings 2018-, 8462151 (2018).
  • (35) M. J. Bianco and P. Gerstoft, “Travel time tomography with adaptive dictionaries,” IEEE Trans. Comput. Imaging 4(4), 499–511 (2018).
  • (36) R. Grosse, R. Raina, H. Kwong, and A. Y. Ng, “Shift-invariant sparse coding for audio classification,” Proc. Conf. on Uncert. in Art. Int. 149–158 (2007).
  • (37) M. Mørup, L. K. Hansen, S. M. Arnfred, L.-H. Lim, and K. H. Madsen, “Shift invariant multi-linear decomposition of neuroimaging data,” Neuroimage 42(4), 1439–1450 (2008).
  • (38) D. Batenkov, Y. Romano, and M. Elad, “On the global-local dichotomy in sparsity modeling,” Applied and Numerical Harmonic Analysis 1–53 (2017).
  • (39) V. Papyan, Y. Romano, and M. Elad, “Convolutional neural networks analyzed via convolutional sparse coding,” J. Machine Learning Research 18, 1–52 (2017).
  • (40) R. Cohen and Y. C. Eldar, “Sparse convolutional beamforming for ultrasound imaging,” IEEE Trans. Ultras, Ferroel., and Freq. Control 65(12), 2390–2406 (2018).
  • (41) F. Lluís, P. Martínez-Nuevo, M. Bo Møller, and S. Ewan Shepstone, “Sound field reconstruction in rooms: Inpainting meets super-resolution,” J. Acoust. Soc. Am. 148(2), 649 (2020).
  • (42) P.-A. Grumiaux, S. Kitić, L. Girin, and A. Guérin, “A survey of sound source localization with deep learning methods,” J. Acoust. Soc. Am. 152(1), 107–151 (2022).
  • (43) P. Gerstoft, C. F. Mecklenbräuker, W. Seong, and M. Bianco, “Introduction to compressive sensing in acoustics,” J. Acoust. Soc. Am. 143(6), 3731–3736 (2018).
  • (44) E. J. Candes and M. B. Wakin, “An introduction to compressive sampling,” IEEE Signal Process. Mag. 25(2), 21–30 (2008).
  • (45) S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends in Machine Learning 3(1), 1–122 (2010).
  • (46) F. Heide, W. Heidrich, and G. Wetzstein, “Fast and flexible convolutional sparse coding,” in Proc. IEEE Conf. on Comp. Vision and Pattern Rec. (CVPR) (2015), pp. 5135–5143.
  • (47) B. Wohlberg, “Efficient algorithms for convolutional sparse representations,” IEEE Trans. Image Processing 25(1), 7308045 (2016).
  • (48) B. Wohlberg, “Boundary handling for convolutional sparse representations,” Proceedings - International Conference on Image Processing, Icip 2016-, 7532675, 1833–1837 (2016).
  • (49) B. Wohlberg and P. Rodriguez, “Convolutional sparse coding: Boundary handling revisited,” (2017).
  • (50) P. C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion (SIAM, Philadelphia, 1998), pp. 1–16.
  • (51) See the code repository https://github.com/manvhah/convolutional_plane_waves to run experiments.
  • (52) F. Heide, W. Heidrich, and G. Wetzstein, “Fast and flexible convolutional sparse coding,” Proc. IEEE Conf. on Comp. Vision and Pattern Rec. (CVPR) 07-12-, 7299149, 5135–5143 (2015).
  • (53) B. Wohlberg, “Efficient convolutional sparse coding,” in Proc. IEEE Int. Conf. on Acoust., Speech, and Sig. Process. (ICASSP) (2014), pp. 7173–7177.
  • (54) B. Wohlberg, “SPORCO: A Python package for standard and convolutional sparse representations,” in Proceed. of the 15th Python in Science Conf., Austin, TX, USA (2017), pp. 1–8.
  • (55) M. Hahmann, S. A. Verburg, and E. Fernandez-Grande, “Acoustic frequency responses in a conventional classroom” Dataset doi: 10.11583/DTU.13315286 (2021).