Multiple scattering ambisonics: three-dimensional sound field estimation using interacting spheres

Shoken Kaneko, Perceptual Interfaces and Reality Laboratory, Computer Science and UMIACS, University of Maryland, College Park, MD 20742, USA
Ramani Duraiswami, Perceptual Interfaces and Reality Laboratory, Computer Science and UMIACS, University of Maryland, College Park, MD 20742, USA

Abstract

Rigid spherical microphone arrays (RSMAs) have been widely used in ambisonics [1] sound field recording. Although it is desirable to combine the information captured by a grid of densely arranged RSMAs to expand the area of accurate reconstruction, i.e. the sweet-spot, doing so is not trivial because of inter-array interference. Here we propose multiple scattering ambisonics, a method for three-dimensional ambisonics sound field recording using multiple acoustically interacting RSMAs. Numerical experiments demonstrate the sweet-spot expansion realized by the proposed method. The proposed method can use existing RSMAs as building blocks and opens possibilities including spatial audio with higher degrees of freedom.

1 Introduction

Audio is indispensable in immersive technologies such as mixed reality (MR) and virtual reality (VR), which are receiving much attention. For these applications, it is essential to develop technologies to capture, process, and render spatial sound fields with high precision in order to present truly realistic and immersive MR/VR experiences. Ambisonics [1] and higher-order ambisonics (HOA) [2] are established spatial audio frameworks that capture, process, and reproduce spatial sound fields based on their representation in the spherical harmonics domain. They are receiving much attention due to the popularization of MR/VR platforms [3, 4] and their high compatibility with first-person view MR/VR. Ambisonics capture and processing consists of a microphone array and signal processing that encodes the raw microphone array signal into a spatial description format in the spherical harmonics domain, which is referred to as the ambisonics signal. This ambisonics signal is decoded to the signals that are fed to loudspeaker arrays to render the spatial sound field. Such loudspeaker arrays are often virtualized by means of binaural technologies [5, 6, 7] and played back via headphones, which explains the high compatibility of ambisonics with MR/VR applications that usually use headphones for audio playback. Because of its formulation in the spherical harmonics domain, a typical ambisonics recording device employs a spherical microphone array (SMA) [1, 8, 2, 9, 10]. SMAs are often mounted on sound-hard spherical scatterers, both to avoid the instability that arises in the encoding filters of hollow microphone arrays due to singularities originating from the roots of the spherical Bessel function [2], and for mechanical stability as hardware. This form of SMA is referred to as a rigid SMA (RSMA).
Despite its success in first-person immersive audio with only three degrees-of-freedom (DoF) which are associated with the rotation of the listener, ambisonics suffers from the diminishing size of the accurate reconstruction area, referred to as the sweet-spot, as the frequency increases, hence limiting its efficacy in higher DoF spatial audio reproduction allowing translation of the listener. This is visualized in Fig. 1 (left), showing the resulting reconstruction sweet-spots for incident plane waves with various frequencies. Here, the sweet-spot is defined as the region where the signal-to-distortion ratio (SDR) of the estimated field with respect to the ground truth incident field is above 30 dB.

Figure 1: Left: examples of reconstruction sweet-spots using a 252-channel RSMA for estimating incident fields with various frequencies. Details of the RSMA are described in Section 4. The truncation degree of HOA which provides the largest sweet-spot is chosen for each frequency independently. Right: illustration of a sound field recording setup using MS-HOA (top) and its reproduction setup over headphones allowing translation of the listener (bottom).

The simplest way to expand the sweet-spot of ambisonics reproduction is to develop RSMAs with a larger number of microphones. Although this approach is effective, it comes with significant development and device costs. An alternative approach is to combine multiple existing RSMAs and integrate the captured information. However, this is not a trivial task because of inter-array interference. Here, multiple scattering higher-order ambisonics (MS-HOA) is proposed: a three-dimensional (3D) sound field capturing scheme using multiple RSMAs that fully accounts for the inter-array interaction due to multiple scattering [11]. Numerical experiments show that MS-HOA successfully creates sound field representations with expanded sweet-spots even when the RSMAs are densely arranged with small spacing, which is not achieved when inter-array interaction is ignored. An example sound field recording and reproduction setup allowing translation of the listener is illustrated in Fig. 1 (right).

2 Conventional ambisonics encoding using a single RSMA

The conventional framework of ambisonics encoding using a single RSMA is briefly reviewed. Ambisonics encoding and decoding can be performed by either relying on solving a linear system using least squares [2] or relying on spherical harmonic transformation using numerical integration [12]. Since the first approach allows more flexibility of the microphone array configuration, this approach is adopted here. In the present work, all formulations are presented in the frequency-domain, which can be converted to time-domain representations by inverse Fourier transform. All individual microphone capsules are assumed to be omnidirectional. The spherical harmonics used are defined as

$$Y_{n}^{m}(\theta,\varphi)\equiv\sqrt{\frac{(2n+1)}{4\pi}\frac{(n-m)!}{(n+m)!}}\,P_{n}^{m}(\cos\theta)e^{im\varphi}, \tag{1}$$

with $\theta$ and $\varphi$ the polar and azimuthal angle, respectively, and $P_{n}^{m}(x)$ and $P_{n}(x)$ respectively the associated and regular Legendre polynomials:

$$P_{n}^{m}(x)\equiv(-1)^{m}\left(1-x^{2}\right)^{m/2}\frac{d^{m}}{dx^{m}}\left(P_{n}(x)\right),\quad P_{n}(x)\equiv\frac{1}{2^{n}n!}\frac{d^{n}}{dx^{n}}\left(x^{2}-1\right)^{n}. \tag{2}$$

The above definition of spherical harmonics provides an orthonormal basis:

$$\int_{\theta=0}^{\pi}\int_{\varphi=0}^{2\pi}Y_{n}^{m}(\theta,\varphi)Y_{n^{\prime}}^{m^{\prime}}(\theta,\varphi)^{*}d\Omega=\delta_{nn^{\prime}}\delta_{mm^{\prime}}, \tag{3}$$

with $\delta_{ij}$ the Kronecker delta.
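
As an illustrative check of Eqs. (1)-(3), the following Python sketch evaluates the spherical harmonics as defined above and verifies their orthonormality numerically. It is not part of the original method. The function names (`assoc_legendre`, `sph_harm_nm`, `gram_matrix`), the quadrature rule (Gauss-Legendre in $\cos\theta$, uniform in $\varphi$), and the handling of negative orders through $Y_{n}^{-m}=(-1)^{m}(Y_{n}^{m})^{*}$ are assumptions made for this example.

```python
import numpy as np
from math import factorial


def assoc_legendre(n, m, x):
    """P_n^m(x) for 0 <= m <= n, including the (-1)^m phase of Eq. (2)."""
    x = np.asarray(x, dtype=float)
    # Start from P_m^m(x) = (-1)^m (2m-1)!! (1 - x^2)^(m/2).
    pmm = np.ones_like(x)
    if m > 0:
        root = np.sqrt(1.0 - x * x)
        odd = 1.0
        for _ in range(m):
            pmm = -pmm * odd * root
            odd += 2.0
    if n == m:
        return pmm
    # P_{m+1}^m(x) = x (2m+1) P_m^m(x), then the upward recurrence in degree.
    pmmp1 = x * (2 * m + 1) * pmm
    for l in range(m + 2, n + 1):
        pll = ((2 * l - 1) * x * pmmp1 - (l + m - 1) * pmm) / (l - m)
        pmm, pmmp1 = pmmp1, pll
    return pmmp1


def sph_harm_nm(n, m, theta, phi):
    """Y_n^m(theta, phi) following Eq. (1); theta is the polar angle."""
    am = abs(m)
    norm = np.sqrt((2 * n + 1) / (4 * np.pi) * factorial(n - am) / factorial(n + am))
    y = norm * assoc_legendre(n, am, np.cos(theta)) * np.exp(1j * am * phi)
    if m < 0:
        # Assumed symmetry for negative orders: Y_n^{-m} = (-1)^m conj(Y_n^m).
        y = (-1) ** am * np.conj(y)
    return y


def gram_matrix(n_max, n_theta=8, n_phi=16):
    """Numerical version of the integral in Eq. (3) for all degrees n <= n_max."""
    x, w = np.polynomial.legendre.leggauss(n_theta)   # nodes in cos(theta)
    theta = np.arccos(x)
    phi = 2 * np.pi * np.arange(n_phi) / n_phi
    T, P = np.meshgrid(theta, phi, indexing="ij")
    weights = np.outer(w, np.full(n_phi, 2 * np.pi / n_phi)).ravel()
    # Column n^2 + n + m (0-based) holds Y_n^m sampled on the quadrature grid.
    Y = np.stack(
        [sph_harm_nm(n, m, T, P).ravel()
         for n in range(n_max + 1) for m in range(-n, n + 1)],
        axis=1,
    )
    return Y.conj().T @ (weights[:, None] * Y)


G = gram_matrix(3)
print(np.allclose(G, np.eye(G.shape[0]), atol=1e-10))
```

The default grid sizes are chosen so that the quadrature is exact for the products of harmonics up to degree 3; larger `n_max` values need correspondingly larger `n_theta` and `n_phi`.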

The process of obtaining the ambisonics signal $A_{n}^{m}(k)$, i.e. the weights of the spherical basis functions of the three-dimensional sound field representing an arbitrary incident field of wavenumber $k$, from the signal captured by the microphone array is referred to as ambisonics encoding. An arbitrary incident field can be expanded in terms of the regular spherical basis functions $j_{n}(kr)Y_{n}^{m}(\theta,\varphi)$ of the three-dimensional Helmholtz equation in the spherical coordinate system $(r,\theta,\varphi)$:

$$p_{\mathrm{in}}=\sum_{n=0}^{\infty}\sum_{m=-n}^{n}A_{n}^{m}(k)j_{n}(kr)Y_{n}^{m}(\theta,\varphi), \tag{4}$$

with $j_{n}(x)$ the spherical Bessel function of degree $n$. The total field $p_{\mathrm{tot}}$, which is the sum of the incident field and the field scattered by a rigid sphere with radius $R$ located at the origin $O$, is given by:

$$p_{\mathrm{tot}}=\sum_{n=0}^{\infty}\sum_{m=-n}^{n}A_{n}^{m}(k)\left\{j_{n}(kr)-h_{n}(kr)\frac{j_{n}^{\prime}(kR)}{h_{n}^{\prime}(kR)}\right\}Y_{n}^{m}(\theta,\varphi), \tag{5}$$

with $h_{n}(x)$ the spherical Hankel function of the first kind of degree $n$ [13]. On the surface of the rigid sphere, i.e. $r=R$, this total field is evaluated as:

$$p_{\mathrm{tot}}|_{r=R}=\sum_{n=0}^{\infty}\sum_{m=-n}^{n}A_{n}^{m}(k)\frac{i}{(kR)^{2}h_{n}^{\prime}(kR)}Y_{n}^{m}(\theta,\varphi). \tag{6}$$
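
The step from Eq. (5) to Eq. (6) is not spelled out in the text. One way to see it is through the Wronskian of the spherical Bessel functions, which is a standard identity. In the sketch below, $x=kR$ and $h_{n}=j_{n}+iy_{n}$, where $y_{n}$ denotes the spherical Bessel function of the second kind (a symbol introduced here only for this derivation):

```latex
\begin{aligned}
j_{n}(x)\,h_{n}^{\prime}(x)-j_{n}^{\prime}(x)\,h_{n}(x)
  &= i\left[j_{n}(x)\,y_{n}^{\prime}(x)-j_{n}^{\prime}(x)\,y_{n}(x)\right]
   = \frac{i}{x^{2}},\\
j_{n}(x)-h_{n}(x)\frac{j_{n}^{\prime}(x)}{h_{n}^{\prime}(x)}
  &= \frac{j_{n}(x)\,h_{n}^{\prime}(x)-j_{n}^{\prime}(x)\,h_{n}(x)}{h_{n}^{\prime}(x)}
   = \frac{i}{x^{2}\,h_{n}^{\prime}(x)}.
\end{aligned}
```

Setting $r=R$ in the braces of Eq. (5) and applying the second line gives the radial factor $i/\left((kR)^{2}h_{n}^{\prime}(kR)\right)$ of Eq. (6).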

The total field captured by the $q$-th microphone on the surface of the RSMA located at $(R,\theta_{q},\varphi_{q})$ is therefore given by:

$$p_{\mathrm{tot}}^{(q)}=\sum_{n=0}^{\infty}\sum_{m=-n}^{n}\frac{i}{(kR)^{2}h_{n}^{\prime}(kR)}Y_{n}^{m}(\theta_{q},\varphi_{q})A_{n}^{m}(k). \tag{7}$$

By truncating the infinite series at $n=N_{\mathrm{c}}$, this result can be represented in the following vector form:

$$\mathbf{p}_{\mathrm{tot}}=\Lambda\mathbf{A}, \tag{8}$$

where $\mathbf{p}_{\mathrm{tot}}$ is a vector holding $p_{\mathrm{tot}}^{(q)}$ in its $q$-th entry, $\mathbf{A}$ is a vector holding $A_{n}^{m}(k)$ in its $(n^{2}+n+m+1)$-th entry, and $\Lambda$ is the matrix holding $\frac{i}{(kR)^{2}h_{n}^{\prime}(kR)}Y_{n}^{m}(\theta_{q},\varphi_{q})$ in its $(q,n^{2}+n+m+1)$ entry. The goal of ambisonics encoding is to obtain $A_{n}^{m}(k)$ for all $n$ and $m$ up to the truncation degree $N_{\mathrm{c}}$, i.e. $0\leq n\leq N_{\mathrm{c}}$ and $|m|\leq n$, from the observation $\mathbf{p}_{\mathrm{tot}}$. This problem can be solved by regularized least squares with the minimization objective:

$$L_{\mathrm{enc}}=||\mathbf{p}_{\mathrm{tot}}-\Lambda\mathbf{A}||_{2}^{2}+\sigma||\mathbf{A}||_{2}^{2}, \tag{9}$$

with $\sigma$ a regularization parameter, and the solution given by:

$$\mathbf{A}^{\mathrm{(est)}}=\operatorname*{argmin}_{\mathbf{A}}L_{\mathrm{enc}}=(\Lambda^{H}\Lambda+\sigma I)^{-1}\Lambda^{H}\mathbf{p}_{\mathrm{tot}}=E\mathbf{p}_{\mathrm{tot}}, \tag{10}$$

where $E\equiv(\Lambda^{H}\Lambda+\sigma I)^{-1}\Lambda^{H}$ is the regularized encoding matrix.
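
The regularized least-squares solution of Eqs. (9)-(10) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation. The function names `encoding_matrix` and `encode` are hypothetical, and the matrix `Lam` in the toy example is filled with random numbers instead of the physical entries of $\Lambda$ described above.

```python
import numpy as np


def encoding_matrix(Lam, sigma):
    """Regularized encoding matrix E = (Lam^H Lam + sigma I)^(-1) Lam^H, as in Eq. (10)."""
    Lam = np.asarray(Lam, dtype=complex)
    n_coeffs = Lam.shape[1]
    gram = Lam.conj().T @ Lam + sigma * np.eye(n_coeffs)
    # Solve the normal equations rather than forming an explicit inverse.
    return np.linalg.solve(gram, Lam.conj().T)


def encode(Lam, p_tot, sigma):
    """Estimate the coefficient vector A from the microphone signals p_tot."""
    return encoding_matrix(Lam, sigma) @ p_tot


# Toy example with a random stand-in for Lambda (NOT a physical array model).
rng = np.random.default_rng(0)
n_mics, N_c = 32, 2
n_coeffs = (N_c + 1) ** 2
Lam = rng.standard_normal((n_mics, n_coeffs)) + 1j * rng.standard_normal((n_mics, n_coeffs))
A_true = rng.standard_normal(n_coeffs) + 1j * rng.standard_normal(n_coeffs)
p_tot = Lam @ A_true

A_est = encode(Lam, p_tot, sigma=1e-10)
print(np.allclose(A_est, A_true, atol=1e-6))
```

Using `np.linalg.solve` on the normal equations mirrors the closed form of Eq. (10). For ill-conditioned $\Lambda$, an SVD-based solver may be numerically preferable, which is a design choice outside the scope of the text.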

3 Proposed method

In the proposed method, a grid of multiple RSMAs is used to estimate $p_{\mathrm{in}}$ in Eq. (4). The goal of ambisonics encoding in MS-HOA is to estimate $A_{n}^{m}(k)$ in Eq. (4) for $0\leq n\leq N_{\mathrm{c}}$ and $|m|\leq n$ from observations of the sound pressure at discrete microphone capsule positions mounted on the surfaces of multiple RSMAs. In the following, a system of $N_{\mathrm{S}}\geq 2$ RSMAs is considered, where the RSMA with index $s$ has radius $a_{s}$. Hereafter, the argument $k$ is omitted from $A_{n}^{m}(k)$.
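
The coefficient vectors in this paper store $A_{n}^{m}$ at the 1-based entry $n^{2}+n+m+1$, so a vector truncated at degree $N_{\mathrm{c}}$ has $(N_{\mathrm{c}}+1)^{2}$ entries. The helper functions below sketch this bookkeeping with 0-based indices. Their names are hypothetical and are introduced only for illustration.

```python
from math import isqrt


def coeff_index(n, m):
    """0-based position of A_n^m; the 1-based entry used in the text is n^2 + n + m + 1."""
    if n < 0 or abs(m) > n:
        raise ValueError("requires n >= 0 and |m| <= n")
    return n * n + n + m


def coeff_degree_order(index):
    """Inverse of coeff_index: recover (n, m) from a 0-based position."""
    n = isqrt(index)
    return n, index - n * n - n


def num_coeffs(n_trunc):
    """Length of a coefficient vector truncated at degree n_trunc."""
    return (n_trunc + 1) ** 2


print(coeff_index(2, -1), coeff_degree_order(5), num_coeffs(2))
```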

3.1 The forward problem: multiple scattering due to an arbitrary incident field

It is known that the problem of multiple scattering in a system of multiple spherical scatterers, i.e. computing the scattered field $p_{\mathrm{scat}}$ given $p_{\mathrm{in}}$ and the configuration of the scattering spheres, can be solved analytically [14, 11]. This problem is referred to as the forward problem. The procedure of solving the forward problem is briefly described here. First, $A_{n}^{m}$, the expansion coefficients at $O$ in Eq. (4) truncated at degree $n=N_{\mathrm{c}}^{\mathrm{(in)}}$, are translated to the positions of the RSMAs using the translation operators $T^{(s,O)}_{R|R}$, resulting in the expansions $\mathbf{A}^{(s)}=T^{(s,O)}_{R|R}\mathbf{A}^{\mathrm{(in)}}$, where $\mathbf{A}^{(s)}$ and $\mathbf{A}^{\mathrm{(in)}}$ are vectors of length $L^{\mathrm{(in)}}\equiv(N_{\mathrm{c}}^{\mathrm{(in)}}+1)^{2}$ holding $A_{n}^{m(s)}$ and $A_{n}^{m}$ in their $(n^{2}+n+m+1)$-th entries, respectively. The $\mathbf{A}^{(s)}$ coefficients are then further truncated at degree $n=N_{\mathrm{c}}^{\mathrm{(fwd)}}$. Two distinct truncation numbers $N_{\mathrm{c}}^{\mathrm{(in)}}$ and $N_{\mathrm{c}}^{\mathrm{(fwd)}}$ are introduced here in order to achieve sufficient accuracy of the translation operation while limiting numerical error in the computation of the scattered field. Given the set of expansions $A_{n}^{m(s)}$ at each RSMA position, the contribution of each RSMA to the scattered field, $B_{n}^{m(s)}$, can be computed by solving the linear system:

$$\mathbf{A}^{\prime}=S\mathbf{B}^{\prime}, \tag{11}$$

where $\mathbf{A}^{\prime}$ and $\mathbf{B}^{\prime}$ are concatenations of the $N_{\mathrm{S}}$ vectors $\{\mathbf{A}^{(1)},\mathbf{A}^{(2)},...,\mathbf{A}^{(N_{\mathrm{S}})}\}$ and $\{\mathbf{B}^{(1)},\mathbf{B}^{(2)},...,\mathbf{B}^{(N_{\mathrm{S}})}\}$, respectively, and $\mathbf{B}^{(s)}$ are vectors of length $L^{\mathrm{(fwd)}}\equiv(N_{\mathrm{c}}^{\mathrm{(fwd)}}+1)^{2}$ holding $B_{n}^{m(s)}$ in their $(n^{2}+n+m+1)$-th entries. $S$ is referred to as the system matrix, which is a block matrix holding the inter-sphere translation operator $T^{(s,t)}_{S|R}$ from the $t$-th sphere to the $s$-th sphere in its off-diagonal $(s,t)$-block and the “single scattering matrix” $\Lambda^{(s)}$ in its diagonal blocks:

$$S=\left(\begin{array}{cccc}\Lambda^{(1)}&-T^{(1,2)}_{S|R}&\dots&-T^{(1,N_{\mathrm{S}})}_{S|R}\\ -T^{(2,1)}_{S|R}&\Lambda^{(2)}&\dots&-T^{(2,N_{\mathrm{S}})}_{S|R}\\ \vdots&\vdots&\ddots&\vdots\\ -T^{(N_{\mathrm{S}},1)}_{S|R}&-T^{(N_{\mathrm{S}},2)}_{S|R}&\dots&\Lambda^{(N_{\mathrm{S}})}\end{array}\right), \tag{12}$$

where $\Lambda^{(s)}$ is a diagonal matrix holding $-\frac{h^{\prime}_{n}(ka_{s})}{j^{\prime}_{n}(ka_{s})}$ in its $(l,l)$ entry with $l=n^{2}+n+m+1$. The translation operators $T^{(s,O)}_{R|R}$ and $T^{(s,t)}_{S|R}$ can be computed by various methods, including explicit expressions based on Clebsch-Gordan coefficients or Wigner 3-j symbols [15], or methods based on recurrence relations [16]. The total field $p_{\mathrm{tot}}$ evaluated at $\mathbf{r}_{q}^{(s)}$, the $q$-th microphone position belonging to the $s$-th RSMA, is the sum of the scattered field contributions from all the RSMAs and the incident field $p_{\mathrm{in}}$:

$$\begin{aligned}p_{\mathrm{tot}}(\mathbf{r}_{q}^{(s)})&=p_{\mathrm{scat}}(\mathbf{r}_{q}^{(s)})+p_{\mathrm{in}}(\mathbf{r}_{q}^{(s)})\\ &=\sum_{n=0}^{N_{\mathrm{c}}^{\mathrm{(fwd)}}}\sum_{m=-n}^{n}\left(\sum_{t=1}^{N_{\mathrm{S}}}B_{n}^{m(t)}S_{n}^{m(t)}(\mathbf{r}_{q}^{(s)})+A_{n}^{m(s)}R_{n}^{m(s)}(\mathbf{r}_{q}^{(s)})\right),\end{aligned} \tag{13}$$

where $R_{n}^{m(s)}(\mathbf{r}_{q}^{(s)})$ and $S_{n}^{m(t)}(\mathbf{r}_{q}^{(s)})$ are the regular and singular spherical basis functions expanded at the location of the $s$-th and $t$-th sphere, respectively, and evaluated at the position of the $q$-th microphone capsule belonging to the $s$-th RSMA:

$$\begin{aligned}R_{n}^{m(s)}(\mathbf{r}_{q}^{(s)})&=j_{n}(ka_{s})Y_{n}^{m}(\mathbf{r}_{q}^{(s)}-\mathbf{r}_{s}),\\ S_{n}^{m(t)}(\mathbf{r}_{q}^{(s)})&=h_{n}(k|\mathbf{r}_{q}^{(s)}-\mathbf{r}_{t}|)Y_{n}^{m}(\mathbf{r}_{q}^{(s)}-\mathbf{r}_{t}).\end{aligned} \tag{14}$$

Alternatively, $p_{\mathrm{in}}(\mathbf{r}_{q}^{(s)})$ could be evaluated directly using $A_{n}^{m}$ instead of the translated $A_{n}^{m(s)}$ coefficients. The whole procedure of the forward problem can be expressed by a linear operator $T_{\mathrm{F}}$ which is referred to as the forward operator:

$$\mathbf{p}_{\mathrm{tot}}=T_{\mathrm{F}}\mathbf{A}^{\mathrm{(in)}}, \tag{15}$$

where $\mathbf{p}_{\mathrm{tot}}$ is a vector holding the values of $p_{\mathrm{tot}}(\mathbf{r}_{q}^{(s)})$.
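
The block structure of Eqs. (11)-(12) can be sketched as follows. The sketch covers only the assembly and solution of the linear system. Computing the translation operators and the diagonal entries of $\Lambda^{(s)}$ is outside its scope, so both are passed in as inputs, and the toy example fills them with random placeholder values that have no physical meaning. The function names and the dictionary layout of `T_blocks` are assumptions made for this example.

```python
import numpy as np


def assemble_system_matrix(lam_diags, T_blocks):
    """Build S of Eq. (12).

    lam_diags[s]    : 1-D array with the diagonal of Lambda^(s)
    T_blocks[(s,t)] : square array standing in for T_{S|R}^{(s,t)}, s != t
    """
    n_spheres = len(lam_diags)
    rows = []
    for s in range(n_spheres):
        row = []
        for t in range(n_spheres):
            if s == t:
                row.append(np.diag(lam_diags[s]))   # single scattering block
            else:
                row.append(-T_blocks[(s, t)])       # inter-sphere coupling block
        rows.append(row)
    return np.block(rows)


def solve_scattering_coefficients(lam_diags, T_blocks, A_local):
    """Solve A' = S B' (Eq. (11)) and return one B^(s) vector per sphere."""
    S = assemble_system_matrix(lam_diags, T_blocks)
    B_prime = np.linalg.solve(S, np.concatenate(A_local))
    return np.split(B_prime, len(lam_diags))


# Toy example with random placeholders (NOT physical operators).
rng = np.random.default_rng(1)
n_spheres, n_coeffs = 3, 4
lam_diags = [5.0 + 0.1 * rng.standard_normal(n_coeffs) + 0j for _ in range(n_spheres)]
T_blocks = {
    (s, t): 0.1 * (rng.standard_normal((n_coeffs, n_coeffs))
                   + 1j * rng.standard_normal((n_coeffs, n_coeffs)))
    for s in range(n_spheres) for t in range(n_spheres) if s != t
}
A_local = [rng.standard_normal(n_coeffs) + 0j for _ in range(n_spheres)]
B_local = solve_scattering_coefficients(lam_diags, T_blocks, A_local)
print(len(B_local), B_local[0].shape)
```

Row $s$ of the solved system reads $\mathbf{A}^{(s)}=\Lambda^{(s)}\mathbf{B}^{(s)}-\sum_{t\neq s}T^{(s,t)}_{S|R}\mathbf{B}^{(t)}$. Setting all off-diagonal blocks to zero reduces it to independent single-sphere problems.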

3.2 The inverse problem: MS-HOA encoding

The matrix representing $T_{\mathrm{F}}$ can be constructed by applying the operator to all bases up to $n\leq N_{\mathrm{c}}^{\mathrm{(in)}}$. The estimate of the incident field can then be obtained via regularized least squares:

$$\mathbf{A}^{\mathrm{(est)}}=(T_{\mathrm{F}}^{H}T_{\mathrm{F}}+\sigma I)^{-1}T_{\mathrm{F}}^{H}\mathbf{p}_{\mathrm{tot}}=T_{\mathrm{I}}\mathbf{p}_{\mathrm{tot}}, \tag{16}$$

where $\mathbf{A}^{\mathrm{(est)}}$ is a vector holding the estimated coefficients $A_{n}^{m(\mathrm{est})}$ in its $(n^{2}+n+m+1)$-th entry up to $n\leq N_{\mathrm{c}}^{\mathrm{(in)}}$ and $T_{\mathrm{I}}\equiv(T_{\mathrm{F}}^{H}T_{\mathrm{F}}+\sigma I)^{-1}T_{\mathrm{F}}^{H}$ is the encoding matrix for MS-HOA with $\sigma$ a regularization parameter. The scheme of the forward and inverse problem is summarized in Fig. 2.

Figure 2: The scheme of the forward problem, i.e. computation of the total sound field at the microphone positions given the incident field, and the inverse problem, i.e. estimation of the incident field given the sound pressure at the microphone positions.
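
Constructing the matrix of $T_{\mathrm{F}}$ by applying the forward operator to each basis vector, followed by the inversion of Eq. (16), can be sketched as below. The callable `apply_forward` stands for any linear routine that maps $\mathbf{A}^{\mathrm{(in)}}$ to $\mathbf{p}_{\mathrm{tot}}$. In the toy example it is a hidden random matrix rather than an actual multiple scattering solver, and all names are hypothetical.

```python
import numpy as np


def operator_to_matrix(apply_forward, n_coeffs):
    """Build the matrix of a linear operator column by column from unit vectors."""
    columns = []
    for l in range(n_coeffs):
        basis = np.zeros(n_coeffs, dtype=complex)
        basis[l] = 1.0                      # l-th basis function only
        columns.append(apply_forward(basis))
    return np.stack(columns, axis=1)


def ms_hoa_encoding_matrix(T_F, sigma):
    """T_I = (T_F^H T_F + sigma I)^(-1) T_F^H, as in Eq. (16)."""
    gram = T_F.conj().T @ T_F + sigma * np.eye(T_F.shape[1])
    return np.linalg.solve(gram, T_F.conj().T)


# Toy forward operator: a hidden random matrix (NOT a physical model).
rng = np.random.default_rng(2)
n_mics, n_coeffs = 20, 4
hidden = rng.standard_normal((n_mics, n_coeffs)) + 1j * rng.standard_normal((n_mics, n_coeffs))


def apply_forward(a_in):
    return hidden @ a_in


T_F = operator_to_matrix(apply_forward, n_coeffs)
T_I = ms_hoa_encoding_matrix(T_F, sigma=1e-10)
A_in = np.array([1.0, -0.5j, 0.25, 2.0 + 1.0j])
print(np.allclose(T_I @ apply_forward(A_in), A_in, atol=1e-6))
```

Because the forward operator is linear, the column-by-column construction reproduces it exactly, and the resulting encoding matrix can be precomputed once per frequency and array configuration.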

4 Numerical experiments

MS-HOA recording and encoding into HOA coefficients were validated by numerical experiments. Grids of RSMAs are considered in which each individual RSMA is a 252-channel SMA mounted on a rigid spherical scatterer with a radius of 8 cm. The spherical Fibonacci grid [17, 10] of 252 points was used for the microphone capsule positions. A real-world implementation of a 252-channel RSMA of similar size has been demonstrated in the past [9]. Two RSMA grids were used in the experiments: a linear grid of 6 RSMAs and a regular Cartesian grid of 9 RSMAs. The spacing between nearest-neighbour RSMAs was set to 25 cm. The sound field generated by a monopole source located at $\mathbf{r}_{\mathrm{s}}=(10\,\mathrm{m},10\,\mathrm{m},10\,\mathrm{m})$ was used as the incident field. The signal captured by the grid of RSMAs was encoded into the HOA coefficients $\mathbf{A}^{\mathrm{(est)}}$ with the proposed method (MS-HOA). While prior works on the forward problem report heuristics for choosing the parameter $N_{\mathrm{c}}^{\mathrm{(fwd)}}$, e.g. $N_{\mathrm{c}}^{\mathrm{(fwd)}}=\left\lfloor eka\right\rfloor$ [14], here $N_{\mathrm{c}}^{\mathrm{(fwd)}}$ was treated as a free hyper-parameter. The case where inter-sphere interaction is switched off, i.e. the method which only considers single scattering (Single), and the case of conventional HOA encoding using only one building block RSMA (HOA) were also computed as baselines. The analytical reconstruction of the estimated incident field was computed by Eq. (4) and was compared to the ground truth incident field $p_{\mathrm{in}}$ in terms of the SDR and the size of the reconstruction sweet-spot area (SSA) measured in the $xy$-plane or the $yz$-plane depending on the configuration of the RSMA grid. The SSA is defined here as the total area where the SDR surpasses 30 dB, measured on the regular Cartesian grid points on a plane which correspond to the pixels in Figs. 3 and 4.
The regularization hyperparameter was optimized by grid search independently for both the Single baseline and the proposed MS-HOA. Regularization was not applied to the single-sphere HOA baseline because its effect is minor in this case, while the truncation number $N_{\mathrm{c}}$ was chosen as the one providing the largest SSA for the given RSMA. The results for the linear 6-sphere RSMA grid with an incident field frequency of 4 kHz are shown in Fig. 3. The sweet-spot of reconstruction is successfully expanded with MS-HOA, while the SSA and SDR are significantly degraded if only single scattering is considered. The results for the regular Cartesian 9-sphere grid are shown in Fig. 4, demonstrating planar expansion of the sweet-spot.
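
The evaluation metrics can be sketched as follows. The text does not state the SDR formula explicitly, so the per-pixel definition used here, $10\log_{10}\left(|p_{\mathrm{in}}|^{2}/|p_{\mathrm{est}}-p_{\mathrm{in}}|^{2}\right)$, is an assumption, as are the function names and the small floor `eps` that avoids division by zero. The last helper evaluates the heuristic $\left\lfloor eka\right\rfloor$ of [14], which the experiments mention but do not use.

```python
import numpy as np


def sdr_db(p_est, p_true, eps=1e-30):
    """Per-pixel signal-to-distortion ratio in dB (assumed definition)."""
    signal = np.maximum(np.abs(p_true) ** 2, eps)
    distortion = np.maximum(np.abs(p_est - p_true) ** 2, eps)
    return 10.0 * np.log10(signal / distortion)


def sweet_spot_area(p_est, p_true, pixel_area, threshold_db=30.0):
    """Total area of the pixels whose SDR exceeds the threshold."""
    n_pixels = np.count_nonzero(sdr_db(p_est, p_true) > threshold_db)
    return n_pixels * pixel_area


def heuristic_truncation(k, a):
    """Heuristic truncation degree floor(e * k * a) for wavenumber k and radius a."""
    return int(np.floor(np.e * k * a))


# Toy field on a 4 x 4 pixel grid: the left half is accurate, the right half is not.
p_true = np.ones((4, 4), dtype=complex)
rel_error = np.full((4, 4), 0.1)
rel_error[:, :2] = 1e-3
p_est = p_true * (1.0 + rel_error)

print(sweet_spot_area(p_est, p_true, pixel_area=0.01))
```

For example, with an assumed speed of sound of 343 m/s, a 4 kHz field and a sphere radius of 0.08 m give $ka\approx 5.86$ and therefore $\left\lfloor eka\right\rfloor=15$.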

Figure 3: Results for the linear 6-sphere RSMA grid for an incident field of 4 kHz. Top row from left to right: real part of the sound pressure for the ground truth incident field, incident field estimated by HOA, Single, and MS-HOA, respectively. Bottom row from left to right: SDR map of the estimated fields with respect to the ground truth field using HOA, Single, and MS-HOA, respectively. The blue circles represent the positions and sizes of the RSMAs. The truncation number is $N_{\mathrm{c}}=14$ in HOA, and $N_{\mathrm{c}}^{\mathrm{(in)}}=55$ and $N_{\mathrm{c}}^{\mathrm{(fwd)}}=20$ in Single and MS-HOA.

Figure 4: Results for the regular Cartesian 9-sphere RSMA grid. The estimated incident field (top row) and SDR map (bottom row) for Single (left column) and MS-HOA (right column), respectively. The blue circles represent the positions and sizes of the RSMAs. The same incident field as Fig. 3 is used. The truncation numbers are $N_{\mathrm{c}}^{\mathrm{(in)}}=45$ and $N_{\mathrm{c}}^{\mathrm{(fwd)}}=16$.

5 Related work and discussion

Multiple scattering ambisonics, a method to capture 3D sound fields using multiple acoustically interacting RSMAs, was proposed. MS-HOA integrates the information captured by multiple densely arranged RSMAs and can be used to expand the reconstruction sweet-spots in 3D sound field reproduction. The numerical experiments demonstrated that the proposed method successfully captures spatial sound fields with expanded reconstruction sweet-spots, which was not possible without considering the inter-array interaction due to multiple scattering.

A related method using the translation of multipoles was introduced in [18]. This method was based on the assumption that the SMAs do not physically interact with each other, i.e. the SMAs do not cause scattering that affects other SMAs. This assumption is violated if the SMAs are densely arranged RSMAs, which scatter the incident field and interact with each other by multiple scattering. As shown in the numerical experiments, the approach that does not consider inter-array interaction becomes inaccurate if the RSMAs are arranged with small spacing. Recently, the consideration of inter-array multiple scattering has been demonstrated to improve the reconstruction accuracy in a two-dimensional sound field reconstruction problem using multiple circular microphone arrays [19]. Two-dimensional modeling, however, is insufficient for modern spatial audio applications such as MR/VR, where 3D audio representation and rendering are essential. Our work enables the use of interacting rigid microphone arrays for 3D spatial audio.

The expanded reconstruction sweet-spots with linear or planar spreads realized by the proposed method could be useful in applications including sound field reproduction in theaters or in meeting rooms, where the sweet-spot should cover multiple listeners sitting next to each other, or higher DoF MR/VR, where the translation of the listener needs to be supported. Developing techniques to reduce the cost of MS-HOA recording in terms of hardware, computation, and bandwidth is important for practical applications and is a subject of future research.

Acknowledgments

Shoken Kaneko thanks the Japan Student Services Organization and the Watanabe Foundation for support via scholarships.

References

  • [1] Michael A Gerzon. Periphony: With-height sound reproduction. Journal of the Audio Engineering Society, 21:2–10, 1973.
  • [2] Jérôme Daniel, Sebastien Moreau, and Rozenn Nicol. Further investigations of high-order ambisonics and wavefield synthesis for holophonic sound imaging. In Audio Engineering Society Convention 114. Audio Engineering Society, 2003.
  • [3] YouTube. https://support.google.com/youtube/answer/639596.
  • [4] Facebook. https://facebookincubator.github.io/facebook-360-spatial-workstation/KB/CreatingVideosSpatialAudioFacebook360.html.
  • [5] Markus Noisternig, Thomas Musil, Alois Sontacchi, and Robert Holdrich. 3D binaural sound reproduction using a virtual ambisonic approach. In IEEE International Symposium on Virtual Environments, Human-Computer Interfaces and Measurement Systems, 2003. VECIMS’03. 2003, pages 174–178. IEEE, 2003.
  • [6] Dmitry N Zotkin, Ramani Duraiswami, and Larry S Davis. Rendering localized spatial audio in a virtual auditory space. IEEE Transactions on multimedia, 6:553–564, 2004.
  • [7] Shoken Kaneko, Tsukasa Suenaga, Mai Fujiwara, Kazuya Kumehara, Futoshi Shirakihara, and Satoshi Sekine. Ear shape modeling for 3D audio and acoustic virtual reality: The shape-based average HRTF. In Audio Engineering Society Conference: 61st International Conference: Audio for Games. Audio Engineering Society, 2016.
  • [8] Jens Meyer and Gary Elko. A highly scalable spherical microphone array based on an orthonormal decomposition of the soundfield. In 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 2, pages II–1781. IEEE, 2002.
  • [9] Shuichi Sakamoto, Satoshi Hongo, Takuma Okamoto, Yukio Iwaya, and Yôiti Suzuki. Sound-space recording and binaural presentation system based on a 252-channel microphone array. Acoustical Science and Technology, 36:516–526, 2015.
  • [10] Shoken Kaneko, Tsukasa Suenaga, Hitoshi Akiyama, Yoshiro Miyake, Satoshi Tominaga, Futoshi Shirakihara, and Hiraku Okumura. Development of a 64-channel spherical microphone array and a 122-channel loudspeaker array system for 3D sound field capturing and reproduction technology research. In Audio Engineering Society Convention 144. Audio Engineering Society, 2018.
  • [11] Paul A Martin. Multiple scattering: interaction of time-harmonic waves with N obstacles. Cambridge University Press, 2006.
  • [12] Mark A Poletti. Three-dimensional surround sound systems based on spherical harmonics. Journal of the Audio Engineering Society, 53:1004–1025, 2005.
  • [13] Philip McCord Morse and K Uno Ingard. Theoretical acoustics. Princeton university press, 1986.
  • [14] Nail A Gumerov and Ramani Duraiswami. Computation of scattering from n spheres using multipole reexpansion. Journal of the Acoustical Society of America, 112:2688–2701, 2002.
  • [15] Michael A Epton and Benjamin Dembart. Multipole translation theory for the three-dimensional Laplace and Helmholtz equations. SIAM Journal on Scientific Computing, 16:865–897, 1995.
  • [16] WC Chew. Recurrence relations for three-dimensional scalar addition theorem. Journal of Electromagnetic Waves and Applications, 6:133–142, 1992.
  • [17] Richard Swinbank and R James Purser. Fibonacci grids: A novel approach to global modelling. Quarterly Journal of the Royal Meteorological Society: A journal of the atmospheric sciences, applied meteorology and physical oceanography, 132:1769–1793, 2006.
  • [18] Prasanga Samarasinghe, Thushara Abhayapala, and Mark Poletti. Wavefield analysis over large areas using distributed higher order microphones. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22:647–658, 2014.
  • [19] Masahiro Nakanishi, Natsuki Ueno, Shoichi Koyama, and Hiroshi Saruwatari. Two-dimensional sound field recording with multiple circular microphone arrays considering multiple scattering. In 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 368–372. IEEE, 2019.