^†^†thanks: These two authors contributed equally.

Covariance Analysis of Impulsive Streaking

Jun Wang^1,2,3 Zhaoheng Guo^2,3,4 Erik Isele^1,2,3 Philip H. Bucksbaum^1,2,3 Agostino Marinelli^1,3 James P. Cryan^1,3 [email protected] Taran Driver^1,3 [email protected] ¹Stanford PULSE Institute, SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA ²Department of Applied Physics, Stanford University, Stanford, CA 94305, USA ³SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA ⁴Paul Scherrer Institute, Villigen, Switzerland

Abstract

We present a comprehensive framework for modeling covariance in angular streaking experiments. Within the impulsive streaking regime, the displacement of the electron momentum distribution (MD) provides a tight connection between the dressing-free MD and the dressed MD. This connection establishes universal structures in the composition of streaking covariance that are common across different MDs, regardless of their exact shape. Building on this robust framework, we have developed methods for retrieving temporal information from angular streaking measurements. By providing a detailed understanding of the covariance structure in angular streaking experiments, our work enables more accurate and robust temporal measurements in a wide range of experimental scenarios.

I Introduction

The motion of electrons in molecules and condensed phase systems takes place on the attosecond timescale. It is now possible to generate light pulses and trains of pulses with sub-femtosecond (i.e. attosecond) duration. These technical developments have launched the field of experimental attosecond science [1]. Even with access to attosecond pulses, measuring electron motion with attosecond time resolution is a significant technical challenge. Time-resolved measurements require the ability to synchronize an attosecond pulse with a second event, such as the interaction with a second laser pulse, with sub-femtosecond precision. One method that has proven highly successful involves combining the attosecond pulse with a longer duration infrared laser pulse, and using the oscillating electric field of the infrared laser pulse to map time onto a measurable quantity such as the momentum of an emitted electron [2]. The term ‘attosecond streaking’ has been coined to describe this class of experiments, as the action of the field on the electron is analogous to the action of the time-varying voltage in a streak camera [3, 2]. Since the period of oscillation of an infrared laser pulse is on the few-femtosecond timescale and the action of the field on an electron depends on the phase of the oscillation, attosecond streaking experiments have emerged as a powerful probe of attosecond electron dynamics [4], including measurements of photoemission delays [5], Auger-Meitner decay [6], and characterization of attosecond pulses [7, 8].

The principle of laser streaking measurements relies on a time reference for the ultrafast process, which can be provided either by the precise timing stability (few-attosecond) between the attosecond pulse and the infrared field, or through a single-shot self-referencing technique. The timing stability can be achieved when the attosecond pulse has been produced by high harmonic generation (HHG), since the HHG emission is naturally synchronized with the infrared driving field. Meanwhile, attosecond x-ray free-electron lasers (XFELs) have many advantageous properties: continuous tunability of photon energy, peak powers that can approach one terawatt, and roughly Fourier-limited pulse durations [9, 10]. The inherent timing jitter between an infrared laser and the XFEL pulses is typically larger than the infrared period, in contrast to HHG-based sources. To make use of the exceptional properties of XFELs in experiments approaching the attosecond regime, approaches that employ a single-shot time reference signal are required. Such approaches have been explored recently, demonstrating the ability to achieve time resolution better than an optical cycle [11, 12, 13, 14, 15, 16]. The single-shot reference signal can be provided by photoelectrons produced from prompt ionization. When the duration of the XFEL pulse is much shorter than the infrared laser period, the streaking interaction can be treated in the impulsive regime, in which the streaking laser imparts the same momentum shift to all photoelectrons. The momentum shift of the prompt electrons therefore provides a single-shot reference for the relative x-ray/laser arrival time. Such a time reference has been employed to achieve attosecond timing resolution for measurements of Auger-Meitner decay by characterizing the photoelectrons’ momentum shift on a single-shot basis [11, 12, 17]. The single-shot quantification of the momentum shift places stringent conditions on the single-shot data and risks systematic error.

Correlation-based approaches have also been employed to overcome the inherent timing jitter in laser/x-ray measurements [18, 14, 15]. Such a technique has recently been used to extract the photoemission delay in x-ray ionization [16] and measure the delay between two attosecond pulses [13]. Here we provide a formal treatment for this covariance analysis, focusing on measurements with a circularly polarized infrared dressing field, or “angular streaking”. In addition to circumventing the requirement for single-shot analysis, this correlation-based analysis bypasses the requirement for the complex modeling of the streaking process in the retrieval of photoemission delay.

In section II, we introduce our mathematical model for the covariance analysis of impulsive angular streaking, in the presence of large jitter in the x-ray/infrared relative timing. In section III, we describe and compare several methods for extracting the time delay between two impulsive processes. These methods can be used to measure the relative photoemission delay between photoionization events produced by the same pulse, or the delay between attosecond pulses. In section IV we discuss additional considerations for the interpretation and design of attosecond angular streaking measurements. In section V we generalize our analysis to the case of an emission process with a complex (non-impulsive) profile in the time domain.

II Model

To describe the x-ray-matter interaction in the presence of an intense laser-field with vector potential $\bm{A}(t)$ , we employ the strong-field approximation to write the probability amplitude for observing a photoelectron with momentum $\bm{p}$ [19]:

b\left(\bm{p};\bm{A}\right)=\int_{t_{0}}^{\infty}dte^{-i\Phi(t;\bm{p},\bm{A})}G(t;\bm{p}-e\bm{A}(t))~{},

(1)

where $\Phi(t;\bm{p},\bm{A})\equiv\int_{t}^{+\infty}dt^{\prime}(\bm{p}-e\bm{A}(t^{\prime}))^{2}/(2m\hbar)$ is the so-called Volkov phase [20], $e<0$ and $m$ are the charge and mass of the electron, respectively, $G(t;\bm{p}^{\prime})$ describes the electron source term for electrons with momentum $\bm{p}^{\prime}$ entering the continuum at time $t$ , and $t_{0}$ is a fixed initial time before the onset of the x-ray pulse. Equation (1) can be simplified using the stationary phase approximation to yield

~{}b(\bm{p};\bm{A})\simeq\sum_{t_{\mathrm{s}}\in T(\bm{p},\bm{A})}C_{\bm{p}}(t_{\mathrm{s}})G(t_{\mathrm{s}},\bm{p}-e\bm{A}(t_{\mathrm{s}}))~{},

(2)

where $T(\bm{p},\bm{A})$ is the collection of all $t_{\mathrm{s}}$ that satisfy the stationary phase condition $0=(\bm{p}-e\bm{A}(t_{\mathrm{s}}))^{2}/2+\left.\frac{d}{dt}\right|_{t=t_{s}}\arg G(t,\bm{p}-e\bm{A}(t))$ , and $C_{\bm{p}}(t_{s})$ is a weighting factor for each stationary point. If the duration of source-term $G$ is much shorter than the streaking laser period $T_{L}$ , a unique $t_{\mathrm{s}}$ dominates for all $\bm{p}$ . Since the derivative of the phase $\arg[G]$ describes an energy, we can write $\left.\frac{d}{dt}\right|_{t=t_{s}}\arg[G]=-E_{0}$ . With this substitution, the stationary phase condition becomes $(\bm{p}-e\bm{A}(t_{\mathrm{s}}))^{2}/2=E_{0}$ , which we recognize as an equation of motion for a classical electron in an external field, with kinetic energy $E_{0}$ at time $t_{s}$ . Revisiting Eqn. (2) and the stationary phase condition, this uniqueness of $t_{\mathrm{s}}$ enables an approximation, demonstrating that the dressed photoelectron momentum distribution (MD) $|b(\bm{p};\bm{A})|^{2}$ can be approximated by a displacement of the dressing-free (i.e. $\bm{A}=0$ ) MD:

|b(\bm{p};\bm{A})|^{2}\simeq|b(\bm{p}-\bm{k};\bm{A}=0)|^{2}~{},

(3)

where $\bm{k}\equiv e(\bm{A}(t_{s})+\tau\left.\frac{d\bm{A}}{dt}\right|_{t_{s}})$ is the momentum shift, and $\tau=\langle\frac{\partial^{2}}{\partial\bm{p}^{\prime 2}}\left.\arg G\right|_{t_{s}}\rangle$ is the momentum-averaged photoemission delay. We call this the impulsive streaking regime and label $\bm{k}$ the streaking vector.

Figure 1: Illustration of impulsive regime of angular streaking. (a) In the presence of the circularly polarized streaking laser field (red), a pulse (purple) much shorter than the laser period ionizes the sample molecules, from which electrons are emitted (black dashed). (b) Streaked photoelectron momentum distribution (MD) at the $p_{z}=0$ slice simulated with strong-field approximation (SFA). A hydrogen atom $I_{P}{=}13.6$ eV is ionized by a $40.8$ eV, $200$ as Gaussian pulse in a $\lambda=1.85\,\mathrm{\mu m},|\bm{A}|=0.06\,\mathrm{a.u.}$ dressing field. The arrow indicates $e\bm{A}$ at the ionizing pulse’s peak instant, with 10x-magnified length. (c) Difference of (b) from the dressing-free MD $I^{0}$ , i.e. the MD when the streaking field is mistimed from the ionization pulse. (d) The dressing-free MD $I^{0}$ displaced by $\bm{k}$ . (e) Difference between (b) and (d).

For the remainder of this work, we consider the case of a circularly polarized dressing laser propagating along the $\hat{z}$ -direction,

\bm{A}(t)=A_{0}(t)\left[\cos\left(\omega_{L}t\right)\hat{x}+\sin\left(\omega_{L}t\right)\hat{y}\right],

(4)

where $A_{0}(t)$ is the slowly varying envelope of the laser field assumed to be much longer than the laser period $T_{L}$ , and $\omega_{L}=2\pi/T_{L}$ is the angular frequency. We consider the two-dimensional (2D) momentum distribution $I(\bm{r})$ of the electrons, where $\bm{r}$ denotes the momentum in the $xy$ -plane. Depending on the measurement scheme, $I(\bm{r})$ can either be the projection of $|b(\bm{p})|^{2}$ along the $\hat{z}$ direction [21] or the slice of $|b(\bm{p})|^{2}$ at the $p_{z}=0$ plane [8]. The model and methods introduced in this work are applicable to both measurement schemes, and a quantitative comparison is given in Sec. IV.

The defining property of the impulsive regime (Eqn. (3)) is the displacement of the dressing-free MD $I^{0}(\bm{r})$ by the streaking vector: $I(\bm{r};\bm{A})\simeq I^{0}(\bm{r}-\bm{k})\equiv\mathcal{D}_{\bm{k}}I^{0}(\bm{r})$ , where we define the displacement operator $\mathcal{D}_{\bm{k}}$ . Throughout this work, the superscript “0” indicates the MD is dressing-free ( $\bm{A}=0$ ), e.g. $I^{0}(\bm{r})$ . An example MD simulated using Eqn. (1) is shown in Fig. 1(b). In this case, the photoelectrons are produced from direct ionization $G(t,\bm{p}^{\prime})=-ie^{iI_{P}t/\hbar}D(\bm{p}^{\prime})E_{X}(t)$ , where $I_{P}$ is the ionization potential, $D(\bm{p}^{\prime})$ the dipole along the polarization of the ionizing pulse $E_{X}(t)$ . As shown in Fig. 1(c-d), the effect of the vector potential on the photoelectron MD is well approximated by the displacement operator, to the 1% level in this case (panel (e)).

As discussed in Sec. I, most experimental photoelectron MDs contain multiple photoemission features. We generally denote two non-overlapping features in the MD as $X(\bm{r})$ and $Y(\bm{r})$ , such as the photoelectrons produced by two ionizing pulses with different wavelengths, separated in time by less than $T_{L}$ [13]. Due to instabilities in the relative delay between the ionizing and streaking pulses, both $X$ and $Y$ vary randomly. To extract the relative timing information between the features, we compute the covariance between $X$ and $Y$ , i.e. the two-point function defined as

C[X,Y](\bm{r}_{q},\bm{r}_{p})\equiv\mathbb{E}\left[X(\bm{r}_{q})Y(\bm{r}_{p})\right]-\mathbb{E}\left[X(\bm{r}_{q})\right]\mathbb{E}\left[Y(\bm{r}_{p})\right]~{},

(5)

where $\mathbb{E}[\cdot]$ refers to the expectation over all fluctuations, which is also written as $\langle\cdot\rangle{\equiv}\mathbb{E}\left[\cdot\right]$ for simplicity. We choose the feature $X(\bm{r})$ to be the result of an impulsive process (Eqn. 3), which provides a timing reference for any general ionization feature, $Y(\bm{r})$ . We denote the streaking vector of $X$ as $\bm{k}=\bm{k}_{X}=k(\cos\kappa\,\bm{e}_{x}+\sin\kappa\,\bm{e}_{y})$ , with amplitude $k$ and direction $\kappa$ .

The key to encoding the relative timing between $X$ and $Y$ into the covariance is the randomness of the streaking direction $\kappa$ . The distribution of $\kappa$ is generally considered to be uniform $\mathcal{U}[0,2\pi]$ , since the arrival time jitter spans many periods of the dressing laser [22]. Although $\kappa$ is random, the relative timing between $X$ and $Y$ is determined by the underlying atomic or molecular physics. The streaking amplitude $k$ is determined by the spatial overlap and the intensity of dressing laser, which can be assumed statistically independent from $\kappa$ . The arrival time jitters may also vary $k$ , but since the duration of $A_{0}(t)$ is much longer than $T_{L}$ , we can still assume $k$ and $\kappa$ are independent. In the following, we refer to the covariance that purely results from the variations in the streaking vector $\bm{k}$ as the “streaking covariance”:

K[X,Y]\equiv C[\mathbb{E}\left[X|\bm{k}\right],\mathbb{E}\left[Y|\bm{k}\right]]~{},

(6)

where $\mathbb{E}\left[\cdot|\bm{k}\right]$ is the conditional expectation operator given the streaking vector, i.e. the average over all other parameters except $\bm{k}$ . In this way, $\mathbb{E}\left[X|\bm{k}\right]$ and $\mathbb{E}\left[Y|\bm{k}\right]$ are two random functions that only vary with $\bm{k}$ , thus $K[X,Y]\neq 0$ if and only if $\bm{k}$ fluctuates.

In Sec. II.1-IV, we demonstrate the covariance analysis in the situation where $Y$ is also impulsive. The angle $\phi$ between the two streaking vectors, $\bm{k}_{Y}$ and $\bm{k}_{X}=\bm{k}$ , corresponds to a time delay $\phi/\omega_{L}$ of the feature $Y$ referenced to $X$ . If we assume a counter-clockwise rotation of the vector potential as specified in Eqn. (4), $\phi>0$ means that $Y$ is delayed from $X$ . Typically the time delay is shorter than $T_{L}$ , much shorter than the evolution of the envelope function $A_{0}(t)$ , so the two features share the same magnitude of vector potential $|\bm{A}(t_{s})|$ . An interesting consequence of the stationary phase approximation in Eqn. (3) is the difference in streaking amplitudes $|\bm{k}_{X}|{=}k$ and $|\bm{k}_{Y}|$ for two photoelectron features dressed by the same streaking field. Combining Eqn. (4), which describes the vector potential of a circularly polarized dressing field, with the momentum shift from the stationary phase approximation $\bm{k}=e(\bm{A}(t_{s})+\tau\left.\frac{d\bm{A}}{dt}\right|_{t_{s}})$ , we find that $|\bm{k}_{Y}|=\lambda k$ , where the factor $\lambda\equiv\sqrt{(1+(\omega_{L}\tau_{Y})^{2})/(1+(\omega_{L}\tau_{X})^{2})}$ accounts for the difference between $|\bm{k}_{X}|$ and $|\bm{k}_{Y}|$ resulting from the relative photoemission delay $\tau_{Y}-\tau_{X}$ [23]. When $|\tau_{Y}-\tau_{X}|\omega_{L}\ll 1$ rad, this factor is $\lambda\simeq 1$ , and as $|\tau_{X}-\tau_{Y}|$ increases, the two streaking amplitudes diverge. In this way, the streaking vector of $Y$ is given by $\bm{k}_{Y}=\lambda k(\cos(\kappa+\phi)\,\bm{e}_{x}+\sin(\kappa+\phi)\,\bm{e}_{y})$ .
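As a minimal numerical sketch of the relations above (not taken from the original analysis), the following Python function evaluates the amplitude ratio $\lambda$ and the two streaking vectors $\bm{k}_{X}$, $\bm{k}_{Y}$. The function name `streaking_vectors` and the example parameter values are illustrative assumptions; the angle $\phi$ is passed in as a given input.

```python
import numpy as np

def streaking_vectors(k, kappa, phi, omega_L, tau_X, tau_Y):
    """Return (k_X, k_Y, lam) for two impulsive features dressed by one field.

    k, kappa : amplitude and direction of the reference streaking vector k_X
    phi      : angle between k_Y and k_X (phi > 0 means Y is delayed)
    omega_L  : dressing-laser angular frequency (same time unit as tau_X, tau_Y)
    """
    # lambda = sqrt((1 + (omega_L tau_Y)^2) / (1 + (omega_L tau_X)^2))
    lam = np.sqrt((1.0 + (omega_L * tau_Y) ** 2) / (1.0 + (omega_L * tau_X) ** 2))
    k_X = k * np.array([np.cos(kappa), np.sin(kappa)])
    k_Y = lam * k * np.array([np.cos(kappa + phi), np.sin(kappa + phi)])
    return k_X, k_Y, lam

# Illustrative (assumed) numbers: 1.85 um dressing field, 100 as delay for Y.
T_L_fs = 1.85e-6 / 2.99792458e8 * 1e15      # laser period in fs
omega_L = 2.0 * np.pi / T_L_fs              # rad / fs
k_X, k_Y, lam = streaking_vectors(k=0.05, kappa=0.3, phi=np.pi / 3,
                                  omega_L=omega_L, tau_X=0.0, tau_Y=0.1)
print(f"lambda = {lam:.5f}")
```

For delays well below $1/\omega_{L}$ the printed ratio stays within a fraction of a percent of unity, consistent with $\lambda\simeq 1$ in that limit.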

II.1 Impulsive Streaking Covariance

We first derive the impulsive model for $K[X,Y]$ in the absence of machine fluctuations, i.e. when the only variation in the measurement is $\bm{k}$ , in which case $K[X,Y]=C[X,Y]$ . As illustrated in Eqn. (3), in the impulsive regime $X(\bm{r})\simeq X^{0}(\bm{r}-\bm{k}_{X})$ , and $Y(\bm{r})\simeq Y^{0}(\bm{r}-\bm{k}_{Y})$ . Substituting the Taylor expansions of $X$ and $Y$ into the definition of covariance Eqn. (5), we find the leading order is

C[X,Y]\simeq C[{\mathcal{D}}_{\bm{k}_{X}}{X^{0}},{\mathcal{D}}_{\bm{k}_{Y}}{Y^{0}}]\\ =\frac{\langle k^{2}\rangle\lambda}{2}(\nabla{X^{0}})^{T}~{}R(-\phi)~{}\nabla{Y^{0}}+o(k^{4})~{},

(7)

where $\nabla$ is the 2D momentum gradient operator, and $R(-\phi)$ is the 2-by-2 matrix representing a counterclockwise rotation by $-\phi$ in the $xy$ -plane. This leading order is the inner-product between $\nabla{X^{0}}$ and $R(-\phi)\nabla{Y^{0}}$ , also referred to as the gradient inner-product (GIP), and the delay is encoded in the rotation.
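The leading-order result can be checked numerically. The sketch below is an illustration under stated assumptions: the dressing-free features are taken to be 2D Gaussians (a stand-in, not the simulated MDs of Fig. 1), the streaking amplitude $k$ is fixed, and $\kappa$ is averaged on a uniform grid. It compares the covariance of the displaced distributions at one pair of momentum points with the GIP expression.

```python
import numpy as np

def gauss(r, center, sigma):
    """Isotropic 2D Gaussian used as a toy dressing-free MD."""
    d = r - center
    return np.exp(-np.sum(d ** 2, axis=-1) / (2.0 * sigma ** 2))

def grad_gauss(r, center, sigma):
    """Analytic gradient of the toy MD at a single point r."""
    return -(r - center) / sigma ** 2 * gauss(r, center, sigma)

def rot(angle):
    """Counterclockwise rotation matrix by `angle`."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def streaking_cov(r_q, r_p, k, phi, lam, c_X, c_Y, sigma, n_kappa=720):
    """Covariance of X0(r_q - k_X) and Y0(r_p - k_Y) for uniformly random kappa."""
    kappa = 2.0 * np.pi * np.arange(n_kappa) / n_kappa
    k_X = k * np.stack([np.cos(kappa), np.sin(kappa)], axis=-1)
    k_Y = lam * k * np.stack([np.cos(kappa + phi), np.sin(kappa + phi)], axis=-1)
    X = gauss(r_q - k_X, c_X, sigma)
    Y = gauss(r_p - k_Y, c_Y, sigma)
    return np.mean(X * Y) - np.mean(X) * np.mean(Y)

def gip(r_q, r_p, k, phi, lam, c_X, c_Y, sigma):
    """Gradient inner-product: (k^2 lam / 2) grad X0^T R(-phi) grad Y0."""
    g_X = grad_gauss(r_q, c_X, sigma)
    g_Y = grad_gauss(r_p, c_Y, sigma)
    return 0.5 * k ** 2 * lam * g_X @ rot(-phi) @ g_Y

# Assumed toy parameters, with k much smaller than the feature width sigma.
args = dict(r_q=np.array([1.0, 0.2]), r_p=np.array([-0.3, 0.9]), k=0.05,
            phi=np.pi / 3, lam=1.0, c_X=np.zeros(2), c_Y=np.zeros(2), sigma=1.0)
exact, model = streaking_cov(**args), gip(**args)
print(f"relative deviation of GIP: {abs(model - exact) / abs(exact):.2e}")
```

Increasing `k` towards `sigma` in this toy example makes the deviation grow, in line with the role of the higher-order terms.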

In the impulsive regime, the streaking covariance defined in Eqn. (6) is given by a bilinear operation on the unstreaked distributions, $C[{\mathcal{D}}_{\bm{k}_{X}},{\mathcal{D}}_{\bm{k}_{Y}}]{X^{0}}{Y^{0}}$ . As shown in Eqn. (7), the leading order in this bilinear operator $C[{\mathcal{D}}_{\bm{k}_{X}},{\mathcal{D}}_{\bm{k}_{Y}}]$ can be written as

M_{\mathrm{GIP}}\equiv\frac{\langle k^{2}\rangle\lambda}{2}(\nabla{X^{0}})^{T}\,R(-\phi)\,\nabla{Y^{0}}=\frac{\langle k^{2}\rangle}{2}\hat{G}(\phi)X^{0}Y^{0},

(8)

where $\hat{G}(\phi)\equiv\lambda\sum_{i,j}R_{ij}(-\phi)\partial_{i}\otimes\partial_{j}$ , with the direct product defined as $\partial_{i}\otimes\partial_{j}\equiv\frac{\partial}{\partial r_{q,i}}\frac{\partial}{\partial r_{p,j}}$ to perform partial differentiation with respect to momentum coordinates $\bm{r}_{q}$ and $\bm{r}_{p}$ , respectively. The full expansion of $C[{\mathcal{D}}_{\bm{k}_{X}},{\mathcal{D}}_{\bm{k}_{Y}}]$ is

C[{\mathcal{D}}_{\bm{k}_{X}},{\mathcal{D}}_{\bm{k}_{Y}}]=\sum_{N=0}^{+\infty}2^{N}\gamma_{N}(\hat{G}(\phi)+\hat{H})^{N}-\sum_{n=0}^{+\infty}\sum_{m=0}^{+\infty}\gamma_{n}\gamma_{m}(\nabla^{2})^{n}\otimes(\lambda^{2}\nabla^{2})^{m}~{},

(9)

where $\hat{H}\equiv\left(\nabla^{2}\otimes\hat{I}+\hat{I}\otimes\lambda^{2}\nabla^{2}\right)/2$ , $\hat{I}$ is the identity operator, and $\gamma_{n}\equiv\langle k^{2n}\rangle/(2^{2n}(n!)^{2})$ . According to Eqn. (9), the streaking covariance expands into multiple direct products between the $n_{X}$ -th order partial derivative of $X$ and the $n_{Y}$ -th order partial derivative of $Y$ . We number them by the differentiation orders $(n_{X}+n_{Y})$ , e.g. the GIP term is the (1+1) order. The next orders are (1+3), (3+1), and (2+2), since all terms’ $(n_{X}+n_{Y})$ are even numbers, as shown by Eqn. (9).

In a measurement, we record electron yield with finite momentum resolution. We consider the covariance between the electron yield measured in two arbitrary regions $Q$ and $P$ . Due to the bilinearity of covariance, this is identical to the regional integral of the covariance between the densities $X$ and $Y$ :

C\left[\int_{Q}d^{2}\bm{r}X(\bm{r}),\int_{P}d^{2}\bm{r}Y(\bm{r})\right]\\ =\int_{Q}d^{2}\bm{r}^{\prime}\int_{P}d^{2}\bm{r}^{\prime\prime}C[X,Y](\bm{r}^{\prime},\bm{r}^{\prime\prime})~{}.

(10)

As illustrated in Fig. 2, we define two sets of 2D momentum regions of interest (ROIs): $\{Q_{q}\}_{q=1}^{N_{Q}}$ for $X$ and $\{P_{p}\}_{p=1}^{N_{P}}$ for $Y$ . We use subscripts $q,p$ on $X,Y$ as a shorthand for the regional integrals, e.g. $X_{q}\equiv\int_{Q_{q}}X(\bm{r})d^{2}\bm{r}$ . Fig. 3(a) shows an exemplary visualization of $K[X,Y]$ , where the covariance has been calculated between $N_{Q}{=}N_{P}{=}180$ ROIs on the $p_{z}{=}0$ plane, each $2^{\circ}$ -wide. The 180 ROIs lie on a ring $p_{\mathrm{min}}<|\bm{r}|<p_{\mathrm{max}}$ and are labelled by their central angular positions $\theta_{q}$ or $\theta_{p}$ ; they are therefore also referred to as angular bins. In Fig. 3, both photoelectron features $X$ and $Y$ are simulated in the same ionization process as Fig. 1 but detected separately, and we set $\lambda=1$ to simulate the scenarios where the difference in photoemission delays is much shorter than the dressing field period, $\omega_{L}|\tau_{X}-\tau_{Y}|\ll 1$ . As shown in the $\phi=\pi/3$ example in Fig. 3(a), one feature of $K[X,Y]$ is that the most positive part is around $\theta_{p}-\theta_{q}\sim\phi$ . At small streaking amplitudes, $K[X,Y]$ is well approximated by the GIP, as shown in Fig. 3(b-c). The relative error $\|M_{\mathrm{GIP}}-K[X,Y]\|_{2}/\|K[X,Y]\|_{2}$ between panels (b) and (a) is evaluated as $0.03$ , which we use to quantify the accuracy of the GIP. The accuracy of the GIP model is insensitive to $\phi$ , but it degrades as streaking amplitude $k$ increases, due to the increased contribution from the higher order terms. As shown by the solid curve in Fig. 3(c), the root-mean-squared (rms) relative error over $\phi\in[-\pi,+\pi]$ becomes comparable to unity once $k$ exceeds the width of the photoelectron feature in momentum $\Delta p$ (quantified as the full-width at half-maximum, FWHM).
For panels (a-b), the lower boundary $p_{\mathrm{min}}$ of the ROIs is set to the maximum point of the unstreaked MD gradient $p_{\mathrm{MG}}\equiv\arg\max_{r}\int d\theta|\nabla I^{0}|$ (red solid line in Fig. 3(d)), and the upper boundary is set to where the streaked MD has fallen to zero. This choice of limit for $p_{\min}$ optimizes the accuracy of the GIP model, because $p_{\mathrm{MG}}$ is close to the zero point of $\nabla^{2}I^{0}$ . As shown in Fig. 3(c-d), when we shift $p_{\min}$ away from $p_{\mathrm{MG}}$ by a fraction of $\Delta p$ , the relative error increases in the regime $k/\Delta p<1$ .
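The regional integrals $X_{q}$ over angular bins can be sketched as follows. This is a minimal illustration assuming the MD is sampled on a regular Cartesian momentum grid; the function name `angular_bin_yields` and the toy ring-shaped image are assumptions, not part of the original analysis.

```python
import numpy as np

def angular_bin_yields(image, px, py, p_min, p_max, n_bins=180):
    """Integrate a 2D MD over equal angular bins on the ring p_min < |r| < p_max.

    image : array of shape (len(py), len(px)), density sampled on a regular grid
    Returns (bin-center angles in [0, 2*pi), yields per bin).
    """
    PX, PY = np.meshgrid(px, py, indexing="xy")
    radius = np.hypot(PX, PY)
    theta = np.mod(np.arctan2(PY, PX), 2.0 * np.pi)
    on_ring = (radius > p_min) & (radius < p_max)
    # Map each pixel to its angular bin; guard against theta rounding up to 2*pi.
    idx = np.minimum((theta / (2.0 * np.pi) * n_bins).astype(int), n_bins - 1)
    pixel_area = (px[1] - px[0]) * (py[1] - py[0])
    yields = np.bincount(idx[on_ring], weights=image[on_ring],
                         minlength=n_bins) * pixel_area
    centers = (np.arange(n_bins) + 0.5) * 2.0 * np.pi / n_bins
    return centers, yields

# Toy ring-shaped MD (assumed), binned into 180 bins of 2 degrees each.
px = np.linspace(-2.0, 2.0, 201)
py = np.linspace(-2.0, 2.0, 201)
PX, PY = np.meshgrid(px, py, indexing="xy")
toy_md = np.exp(-(np.hypot(PX, PY) - 1.2) ** 2 / (2.0 * 0.1 ** 2))
theta_q, X_q = angular_bin_yields(toy_md, px, py, p_min=1.0, p_max=1.5)
```

Because the binning is a plain sum over pixels, applying it shot by shot and then computing the covariance is equivalent to integrating the pixel-level covariance over the two regions, as in Eqn. (10).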

For the photoelectron features produced in direct ionization, the discrepancy between the GIP model and the streaking covariance $K[X,Y]$ generally depends on three characteristic dimensionless quantities: (1) the size of the momentum shift normalized to the momentum spread of the photoelectron feature $k/\Delta p$ , (2) the normalized radius $p_{c}/\Delta p$ , with $p_{c}$ denoting the central momentum of the photoelectron feature, and (3) the ratio of the ionizing pulse duration to the streaking laser period $\Delta t_{X}/T_{L}$ . In the impulsive regime $\Delta t_{X}\ll T_{L}$ , we find that the relative error is predominantly a function of $k/\Delta p$ , with a much weaker dependence on $p_{c}/\Delta p$ , as shown in Fig. 3(f). In the case shown in panels (a-e), both $X$ and $Y$ have a normalized radius of $p_{c}/\Delta p=6.6$ , and when we vary $p_{c}/\Delta p$ by changing $\Delta t_{X}$ and/or the photon energy of the ionizing pulse, the relative error curve does not significantly change as long as we remain in the impulsive regime ( $\Delta t_{X}\ll T_{L}$ ). The two features $X$ and $Y$ can have different $\Delta p_{X},\Delta p_{Y}$ , but we note that by defining $\Delta p$ as the geometric mean $\Delta p=\sqrt{\Delta p_{X}\Delta p_{Y}}$ , the accuracy of the GIP can be well described by $k/\Delta p$ .

II.2 Contribution from Machine Fluctuations

We differentiate the shot-to-shot fluctuations in angular streaking experiments into two categories, the fluctuation of the streaking vector $\bm{k}$ , and everything else. The fluctuations of these other parameters (or “machine fluctuations”), such as pulse energy and central photon energy of the ionizing pulse, also affect the single-shot MD, but the relative timing between the ionizing and dressing pulses typically does not depend on these parameters. Thus in this work, machine fluctuations are assumed to be statistically independent from the random streaking vector. The streaking covariance $K[X,Y]$ defined in Eqn. (6) is therefore simplified by the impulsive condition as $K[X,Y]\simeq C[\mathcal{D}_{\bm{k}_{X}}\langle X^{0}\rangle,\mathcal{D}_{\bm{k}_{Y}}\langle Y^{0}\rangle]$ . We note that the expected dressing-free MDs $\langle X^{0}\rangle,\langle Y^{0}\rangle$ are fixed, free from shot-to-shot variations, so the treatment of $K[X,Y]$ introduced in Sec. II.1 is also applicable when machine fluctuations are present, by simply replacing the stable $X^{0},Y^{0}$ with $\langle X^{0}\rangle,\langle Y^{0}\rangle$ respectively.

Both machine fluctuations and fluctuations in $\bm{k}$ contribute to the covariance $C[X,Y]$ defined in Eqn. (5). According to the law of total covariance [24], $C[X,Y]=K[X,Y]+L[X,Y]$ consists of the streaking covariance $K[X,Y]$ defined in Eqn. (6) and the contribution from machine fluctuations:

L[X,Y]\equiv\mathbb{E}\left[C[X,Y|\bm{k}]\right]\simeq\mathbb{E}\left[{\mathcal{D}}_{\bm{k}_{X}}\otimes{\mathcal{D}}_{\bm{k}_{Y}}\right]C[X^{0},Y^{0}]~{},

(11)

where in the approximation, we used the impulsive streaking condition and the statistical independence between machine fluctuations and streaking vector.
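The decomposition $C=K+L$ can be illustrated with a toy model that is small enough to enumerate exactly. In the sketch below, all specifics are assumptions made for illustration: the only machine fluctuation is a multiplicative pulse-energy factor `a` drawn from three discrete values, the streaking amplitude is fixed so that conditioning on $\bm{k}$ reduces to conditioning on $\kappa$, and the regional yields depend on $\kappa$ through simple cosines.

```python
import numpy as np

def covariance_split(X, Y, w):
    """Exact C, K, L for a discrete joint distribution.

    X, Y, w have shape (n_machine, n_kappa); w holds joint probabilities
    and each column corresponds to one value of the streaking direction.
    """
    p_k = w.sum(axis=0)                       # marginal probability of each kappa
    EX_k = (w * X).sum(axis=0) / p_k          # E[X | kappa]
    EY_k = (w * Y).sum(axis=0) / p_k          # E[Y | kappa]
    EXY_k = (w * X * Y).sum(axis=0) / p_k     # E[XY | kappa]
    C = (w * X * Y).sum() - (w * X).sum() * (w * Y).sum()
    K = (p_k * EX_k * EY_k).sum() - (p_k * EX_k).sum() * (p_k * EY_k).sum()
    L = (p_k * (EXY_k - EX_k * EY_k)).sum()   # E[ C[X, Y | kappa] ]
    return C, K, L

# Assumed toy model: yields X = a f(kappa), Y = a g(kappa), a independent of kappa.
kappa = 2.0 * np.pi * np.arange(360) / 360
phi = np.pi / 3
f = 1.0 + 0.1 * np.cos(kappa)
g = 2.0 + 0.2 * np.cos(kappa + phi)
a_vals = np.array([0.8, 1.0, 1.3])
a_prob = np.array([0.3, 0.5, 0.2])

X = a_vals[:, None] * f[None, :]
Y = a_vals[:, None] * g[None, :]
w = a_prob[:, None] * np.full((1, kappa.size), 1.0 / kappa.size)

C, K, L = covariance_split(X, Y, w)
print(f"C = {C:.5f}, K = {K:.5f}, L = {L:.5f}, K + L = {K + L:.5f}")
```

In this toy case the machine term `L` is an order of magnitude larger than the streaking term `K`, which is one way to see why the machine contribution has to be removed before the delay is fitted.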

III Methods to Retrieve Delay

Figure 4 illustrates three approaches for retrieving the relative angle $\phi$ , which is directly proportional to the time delay between $X$ and $Y$ , from the measured momentum distribution. Measuring $N_{s}$ laser shots, we obtain a sample of electron yield MDs: $X^{(i)},Y^{(i)},1{\leq}i{\leq}N_{s}$ . The sample covariance between the regional yields $X_{q}$ and $Y_{p}$ , $(\mathrm{C}_{XY})_{qp}\equiv(\overline{X_{q}Y_{p}}-\overline{X_{q}}~{}\overline{Y_{p}})N_{s}/(N_{s}-1)$ , with $\overline{\bullet}$ the sample mean over shots, gives an estimate of the underlying covariance $C[X_{q},Y_{p}]$ . As introduced in Sec. I, a key feature of the covariance analysis is leveraging the angular isotropy of the $\bm{k}$ distribution to circumvent the need for single-shot knowledge of $\bm{k}$ . While this feature avoids errors introduced by the control or quantification of $\bm{k}$ , it prevents forming a sample estimator according to $K[X,Y]\equiv C[\mathbb{E}\left[X|\bm{k}\right],\mathbb{E}\left[Y|\bm{k}\right]]$ , as an estimator of the conditional expectation $\mathbb{E}\left[\cdot|\bm{k}\right]$ would require binning shots by vector $\bm{k}$ . Thus our general strategy is to remove the contribution of machine fluctuations from the sample covariance $\mathrm{C}_{XY},\mathrm{C}_{XX}$ and $\mathrm{C}_{YY}$ to obtain a sample estimate of the streaking covariance (or “K-estimators”) denoted as $\mathrm{K}_{XY},\mathrm{K}_{XX}$ and $\mathrm{K}_{YY}$ . These modelling and removal procedures are detailed in Sec. IV.1. Throughout this work, the sample estimates are denoted with subscripts (e.g. $\mathrm{C}_{XY}$ ), whereas the underlying covariance and streaking covariance are with brackets (e.g. $C[X,Y]$ ).
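The sample covariance $\mathrm{C}_{XY}$ defined above can be computed for all pairs of ROIs at once. The following is a minimal sketch, assuming the regional yields are stored as arrays with one row per shot; the function name `sample_cross_covariance` and the random test data are illustrative only.

```python
import numpy as np

def sample_cross_covariance(X, Y):
    """Unbiased sample covariance between regional yields.

    X : array of shape (N_s, N_Q), yields X_q for each shot
    Y : array of shape (N_s, N_P), yields Y_p for each shot
    Returns the (N_Q, N_P) matrix (mean(X_q Y_p) - mean(X_q) mean(Y_p)) N_s / (N_s - 1).
    """
    n_shots = X.shape[0]
    mean_xy = X.T @ Y / n_shots
    outer_means = np.outer(X.mean(axis=0), Y.mean(axis=0))
    return (mean_xy - outer_means) * n_shots / (n_shots - 1)

# Illustrative random data standing in for measured yields.
rng = np.random.default_rng(0)
X_shots = rng.normal(size=(500, 6))
Y_shots = 0.5 * X_shots[:, :4] + rng.normal(size=(500, 4))
C_XY = sample_cross_covariance(X_shots, Y_shots)
```

The same function applied to `(X, X)` or `(Y, Y)` gives $\mathrm{C}_{XX}$ and $\mathrm{C}_{YY}$.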

The delay retrieval methods are generally separated into two classes, gradient-based (involving reconstruction of the MD gradient) and gradient-free. In the gradient-based methods, we reconstruct the density gradients of the dressing-free MD $\nabla X_{q}\equiv\int_{Q_{q}}\nabla\langle X^{0}\rangle d^{2}\bm{r}$ and $\nabla Y_{p}\equiv\int_{P_{p}}\nabla\langle Y^{0}\rangle d^{2}\bm{r}$ , in order to fit the GIP model (Eqn. (8)) to the sample streaking covariance $\mathrm{K}_{XY}$ . In the gradient-free method, we use the K-estimators to calculate a sample correlation matrix $\mathrm{Corr}_{XY}$ and leverage part of $\mathrm{Corr}_{XY}$ to circumvent reconstruction of the gradients. The procedures and benefits of these different approaches are described below. As introduced in Sec. I, one use case of these approaches is to measure the relative photoemission delay between two features produced by the same ionizing pulse $\tau_{Y}-\tau_{X}$ . Although $\lambda$ also depends on $\tau_{Y}-\tau_{X}$ , it is much less sensitive than $\phi$ when $|\tau_{Y}-\tau_{X}|\lesssim 1/\omega_{L}$ , so we focus on retrieving $\phi$ .

III.1 Gradient-based Methods

To retrieve the delay, the GIP model of $K[X,Y]$ relies on the gradients $\nabla X_{q},\nabla Y_{p}$ :

\phi_{\mathrm{fit}}=\mathrm{argmin}\sum_{q,p}\left|\frac{\langle k^{2}\rangle\lambda}{2}(\nabla X_{q})^{T}R(-\phi)\nabla Y_{p}-(\mathrm{K}_{XY})_{qp}\right|^{2}~{}.

(12)

One way to reconstruct the gradients $\nabla X_{q},\nabla Y_{p}$ is through numerical differentiation (ND). Dressing-free shots provide the sample mean of the measured distributions $\overline{X^{0}}$ and $\overline{Y^{0}}$ to estimate the expectations $\langle X^{0}\rangle$ and $\langle Y^{0}\rangle$ , respectively. With adequate momentum resolution, we can use a finite difference scheme to numerically differentiate $\overline{X^{0}}$ , and then integrate over $Q_{q}$ to measure the gradient $\nabla X_{q}$ . An example is shown in Fig. 5(a) for the case in Figs. 1 and 3. Obtaining the gradients $\nabla X_{q},\nabla Y_{p}$ from numerical differentiation, we fit the GIP model to the sample streaking covariance $\mathrm{K}_{XY}$ , with the factor $\langle k^{2}\rangle\lambda$ treated as a free parameter in Eqn. (12). Because the amplitude ratio $\lambda$ has been absorbed into the free scaling parameter, not knowing $\lambda$ does not affect this method. We benchmark the accuracy of the ND method in the noiseless limit: i.e. with a sufficient number of shots $N_{s}\to\infty$ to suppress statistical noise, complete removal of machine fluctuations, and ignoring readout noise in the electron yield. Thus we equate $\mathrm{K}_{XY}$ to the underlying $K[X,Y]$ and obtain the delay retrieval error $\Delta\phi=\phi_{\mathrm{fit}}-\phi$ at various “ground-truth” $\phi$ , Fig. 5(b). This systematic error is zero when $\phi$ is a multiple of $\pi/2$ and reaches a maximum at $\phi\,{\sim}\pm\pi/4$ . The rms error over $\phi$ increases with $k$ due to the increase in higher-order terms, as shown in panel (c), but within $k<\Delta p$ this rms error is $<0.03\,\mathrm{rad}{=}1.7^{\circ}$ , which converts to a time delay error $<30\,\mathrm{as}$ for a $2\,\mathrm{\mu m}$ -wavelength dressing field.
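One possible way to carry out the fit in Eqn. (12) with a free scaling factor is sketched below. It relies on the identity $R(-\phi)=\cos\phi\,I+\sin\phi\,J$ with $J=\left(\begin{smallmatrix}0&1\\-1&0\end{smallmatrix}\right)$, which turns the fit into a linear least-squares problem in $(s\cos\phi,\,s\sin\phi)$ with $s$ the free scale. This reformulation, the function name `fit_phi`, and the synthetic gradients are assumptions made for illustration and are not necessarily how the fit was implemented in the original work.

```python
import numpy as np

def fit_phi(grad_X, grad_Y, K_XY):
    """Least-squares fit of s * grad_X R(-phi) grad_Y^T to K_XY.

    grad_X : (N_Q, 2) regional gradients of the dressing-free X feature
    grad_Y : (N_P, 2) regional gradients of the dressing-free Y feature
    K_XY   : (N_Q, N_P) estimate of the streaking covariance
    Returns (phi_fit, scale), where scale plays the role of <k^2> lambda / 2.
    """
    J = np.array([[0.0, 1.0], [-1.0, 0.0]])   # R(-phi) = cos(phi) I + sin(phi) J
    A = grad_X @ grad_Y.T                     # coefficient of s cos(phi)
    B = grad_X @ J @ grad_Y.T                 # coefficient of s sin(phi)
    design = np.stack([A.ravel(), B.ravel()], axis=1)
    (a, b), *_ = np.linalg.lstsq(design, K_XY.ravel(), rcond=None)
    return np.arctan2(b, a), np.hypot(a, b)

# Synthetic check with assumed gradients on a ring and a known phi.
theta = (np.arange(180) + 0.5) * 2.0 * np.pi / 180
g_X = -np.stack([np.cos(theta), np.sin(theta)], axis=1) * (1.0 + 0.3 * np.cos(theta))[:, None]
g_Y = -np.stack([np.cos(theta), np.sin(theta)], axis=1) * (1.0 + 0.2 * np.sin(theta))[:, None]
phi_true, scale_true = np.pi / 3, 0.5 * 0.05 ** 2
c, s = np.cos(phi_true), np.sin(phi_true)
K_model = scale_true * g_X @ np.array([[c, s], [-s, c]]) @ g_Y.T
phi_fit, scale_fit = fit_phi(g_X, g_Y, K_model)
```

Because the scale is fitted jointly with the angle, the unknown product $\langle k^{2}\rangle\lambda$ does not need to be supplied.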

Another way to reconstruct the gradient is “rank-reduction” (RR), described in Algorithm 1, which employs a low-rank approximation of $K[X,X]$ and $K[Y,Y]$ to estimate the gradient. This method can be used when each set of ROIs completes a loop on the high-momentum side of the corresponding photoelectron feature, e.g. the rings delineated in Fig. 4. Within the GIP model, $K[X_{q},X_{q^{\prime}}]=\langle k^{2}/2\rangle(\nabla X_{q})^{T}\nabla X_{q^{\prime}}$ , whose rank is at most 2 since $\nabla X_{q}$ is 2D. Thus it appears we can solve the following optimization problem given $\mathrm{K}_{XX}$ :


\text{minimize }f_{\mathrm{RR}}(\xi)\equiv\sum_{q,q^{\prime}}\left\|\xi_{q}^{T}\xi_{q^{\prime}}-(\mathrm{K}_{XX})_{qq^{\prime}}\right\|^{2}W_{qq^{\prime}}~{},

(13a)

\text{subject to }l_{\mathrm{RR}}(\xi)\equiv\sum_{q=1}^{N_{Q}}\left(\begin{matrix}-\sin\theta_{q},&\cos\theta_{q}\end{matrix}\right)\xi_{q}a_{q}=0~{},

(13b)

where $\xi=(\xi_{1},\cdots,\xi_{N_{Q}})\in\mathbb{R}^{2{\times}N_{Q}}$ represents the gradient field, $a_{q}$ is the arc length of the angular bin at $\theta_{q}$ , and $W$ is an $N_{Q}\times N_{Q}$ matrix of weights, which can be configured to combat the effect of readout noise as described below. The reconstructed gradient field is given by the optimal point $\xi^{*}$ of Eqn. (13) up to a global factor, $\xi_{q}^{*}=\sqrt{\langle k^{2}/2\rangle}\nabla X_{q}$ . Applying the same procedure to $\mathrm{K}_{YY}$ , we obtain the optimal point $\eta^{*}_{p}=\sqrt{\langle k^{2}/2\rangle}\lambda\nabla Y_{p}$ . Thus the GIP model described in Eqn. (7) can be rewritten as $M_{\mathrm{GIP}}=(\xi^{*})^{T}R(-\phi)\eta^{*}$ , which we then substitute into Eqn. (12) to retrieve $\phi_{\mathrm{fit}}$ .

The minimization of the objective function $f_{\mathrm{RR}}$ is closely related to principal component analysis (PCA). The weight matrix can be configured as uniform $W_{qq^{\prime}}=1$ when the signal-to-noise ratio is high in the electron yield readout process. When the readout signal-to-noise ratio is low, we recommend ignoring the diagonal band of $\mathrm{K}_{XX}$ by setting the diagonal band in $W$ to zero (e.g. $W_{qq^{\prime}}=1-\delta_{qq^{\prime}}$ ). Guidance for defining $W$ is detailed in Sec. IV.2. With a uniform $W$ , a minimal point of $f_{\mathrm{RR}}$ is given by the PCA result $\xi^{\mathrm{P}}=\left(\sqrt{s^{(1)}}v^{(1)},\sqrt{s^{(2)}}v^{(2)}\right)^{T}$ [25], where $s^{(\alpha)}$ is the $\alpha$ -th largest eigenvalue of $\mathrm{K}_{XX}$ and $v^{(\alpha)}$ is the corresponding eigenvector (in column vector form). When $W$ is non-uniform, e.g. $W_{qq^{\prime}}=1-\delta_{qq^{\prime}}$ , we can minimize $f_{\mathrm{RR}}$ by optimizing $\xi^{\mathrm{P}}$ using gradient descent.

The objective function $f_{\mathrm{RR}}$ is invariant under any global orthogonal transformation $O{:}~{}\xi_{q}\mapsto O\xi_{q}$, but only certain minimal points satisfy the zero-loop constraint Eqn. (13b), which expresses a general property of a gradient field, $\oint\nabla X^{0}\cdot d\bm{l}=0$. Thus, having obtained one minimal point $\xi^{\mathrm{P}}$, we solve for an orthogonal matrix $O$ such that $l_{\mathrm{RR}}(O\xi^{\mathrm{P}})=0$. This is straightforwardly achieved by parameterizing $O$ with the rotation angle and parity. An alternative to the zero-loop constraint is maximizing the gradient flux $j_{\mathrm{RR}}(O\xi^{\mathrm{P}})\equiv-\sum_{q}(\cos\theta_{q},~{}\sin\theta_{q})O\xi^{\mathrm{P}}_{q}a_{q}$ over $O$, which necessarily satisfies the zero-loop constraint and additionally breaks the remaining discrete degeneracy, as proven in Supplemental Sec. 2.
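
The flux maximization can be sketched in Python as below. Note the substitution: instead of scanning the rotation angle and parity as described above, this sketch uses the closed-form singular value decomposition solution of the trace maximization (the orthogonal Procrustes construction), which searches the same set of $2\times 2$ orthogonal matrices. It relies on the identity $j_{\mathrm{RR}}(O\xi)=\mathrm{tr}(OB)$ with $B=-\sum_{q}a_{q}\,\xi_{q}\,(\cos\theta_{q},\sin\theta_{q})$. The function names and the toy radial gradient field are assumptions made for illustration.

```python
import numpy as np

def loop_integral(xi, theta, arc):
    """Discrete zero-loop functional l_RR: sum_q a_q * (tangent_q . xi_q)."""
    tangent = np.vstack([-np.sin(theta), np.cos(theta)])
    return float(np.sum(arc * np.sum(tangent * xi, axis=0)))

def flux(xi, theta, arc):
    """Discrete gradient flux j_RR: -sum_q a_q * (normal_q . xi_q)."""
    normal = np.vstack([np.cos(theta), np.sin(theta)])
    return float(-np.sum(arc * np.sum(normal * xi, axis=0)))

def align_by_flux(xi_p, theta, arc):
    """Orthogonal O maximizing flux(O @ xi_p); covers both rotation and parity."""
    normal = np.vstack([np.cos(theta), np.sin(theta)])
    B = -(xi_p * arc) @ normal.T            # flux(O xi) = trace(O @ B)
    U, _, Vt = np.linalg.svd(B)
    return Vt.T @ U.T                       # trace(O B) is maximal at O = V U^T

# Toy check (hypothetical field): scramble a radial gradient, then undo the scrambling.
n_bins = 36
theta = np.linspace(0.0, 2.0 * np.pi, n_bins, endpoint=False)
arc = np.full(n_bins, 2.0 * np.pi / n_bins)              # unit-radius ring, equal bins
normal = np.vstack([np.cos(theta), np.sin(theta)])
g_true = -(1.0 + 0.5 * np.cos(theta) ** 2) * normal      # inward-pointing radial gradient
rotation = np.array([[np.cos(1.1), -np.sin(1.1)], [np.sin(1.1), np.cos(1.1)]])
reflection = np.array([[0.0, 1.0], [1.0, 0.0]])
xi_p = reflection @ rotation @ g_true                    # stand-in for a PCA minimal point

O_star = align_by_flux(xi_p, theta, arc)
xi_star = O_star @ xi_p
```

In this toy case the aligned field coincides with the original radial field and its discrete loop integral vanishes, consistent with the statement that the flux-maximizing point satisfies the zero-loop constraint.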

Algorithm 1 Rank-reduction gradient reconstruction

Perform PCA on $\mathrm{K}_{XX}$, obtaining $\xi^{\mathrm{P}}\leftarrow(\sqrt{s^{(1)}}v^{(1)},\sqrt{s^{(2)}}v^{(2)})^{T}$;
if $W$ is not uniform then
    Update $\xi^{\mathrm{P}}$ to minimize $f_{\mathrm{RR}}(\xi^{\mathrm{P}})$ using gradient descent;
end if
$O^{*}\leftarrow\arg\max_{O}j_{\mathrm{RR}}(O\xi^{\mathrm{P}})$ with $\xi^{\mathrm{P}}$ fixed;
return $\xi^{*}=O^{*}\xi^{\mathrm{P}}$;

Similar to the results in Fig. 5(b-c), we benchmark the RR method under various $\phi$ and $k$ in the noiseless limit as mentioned above, as shown in panels (d-f). The gradient fields $\nabla X_{q}$ and $\nabla Y_{p}$ are reconstructed from $\mathrm{K}_{XX}$ and $\mathrm{K}_{YY}$ , respectively, and we show the resultant $\nabla X_{q}$ in panel (d). Similar to the ND method, the delay retrieval error of the RR method is minimized when $\phi$ is a multiple of $\pi/2$ . The rms error increases with $k$ , but the magnitude of the error is notably smaller than with the ND method. As shown in Fig. 5(f), the rms error is ${<}1$ mrad within $k{<}\Delta p$ , corresponding to ${<}1$ as for a $2\,\mathrm{\mu m}$ dressing field.

The main reason for the higher accuracy of the RR method lies in the reconstructed “gradient”. The ND gradients give the first-order derivatives of the dressing-free MD $\nabla X_{q}$ and $\nabla Y_{p}$ , independent from the streaking amplitude $k$ . The gradient reconstructed by the RR procedure, in contrast, deviates from the first-order derivative as $k$ increases. To the next lowest order, the RR gradients are


$\displaystyle\xi_{q}$	$\displaystyle=\sqrt{\frac{\langle k^{2}\rangle}{2}}\int_{Q_{q}}d^{2}\bm{r}\nabla\left(1+\frac{\langle k^{4}\rangle}{8\langle k^{2}\rangle}\nabla^{2}\right)\langle X^{0}\rangle+o(k^{4})~{},$	(14a)
$\displaystyle\eta_{p}$	$\displaystyle=\sqrt{\frac{\langle k^{2}\rangle}{2}}\lambda\int_{P_{p}}d^{2}\bm{r}\nabla\left(1+\frac{\langle k^{4}\rangle\lambda^{2}}{8\langle k^{2}\rangle}\nabla^{2}\right)\langle Y^{0}\rangle+o(k^{4})~{},$	(14b)

which include the third-order derivatives in addition to the first-order (see Supplementary Sec. 1.2). As a result, when using the RR gradients, the inner product $\xi_{q}^{T}R(\phi)\eta_{p}$ not only captures the (1+1) order of $K[X_{q},Y_{p}]$ , but also the (1+3) and (3+1) orders. In contrast when using the ND gradient, only the (1+1) order is modelled. Since the errors in the noiseless limit arise from the finite accuracy of the GIP model and are insensitive to the normalized radius $p_{c}/\Delta p$ , the result in Fig. 5 is general across $p_{c}/\Delta p$ .

III.2 Gradient-free Method

In some experiments, gradient reconstruction may be challenging. For example, the numerical differentiation becomes infeasible when the detector lacks angular resolution, and RR becomes infeasible when the available ROIs cannot complete a loop. In this case, it is often possible to identify a part of the streaking correlation matrix $R[X_{q},Y_{p}]\equiv{K[X_{q},Y_{p}]}/{\sqrt{K[X_{q},X_{q}]K[Y_{p},Y_{p}]}}$ that can be used to extract the photoemission delay without requiring knowledge of the gradient.

In the angular regions where the gradient is predominantly along the radial direction $|r^{-1}\partial_{\theta}X_{q}|\ll|\partial_{r}X_{q}|$ , $|r^{-1}\partial_{\theta}Y_{p}|\ll|\partial_{r}Y_{p}|$ (also referred to as the “radial regions”), the streaking covariance is approximated as

$\displaystyle K[X_{q},Y_{p}]\approx\frac{\langle k^{2}\rangle}{2}\cos(\phi-\theta_{p}+\theta_{q})(\partial_{r}X_{q})(\lambda\partial_{r}Y_{p})~{}.$	(15)

The $\cos(\phi-\theta_{p}+\theta_{q})$ factor explains the positive ridge around $\theta_{p}-\theta_{q}{\sim}\phi$ in the covariance matrix. The radial gradients in Eqn. (15), including the factor $\lambda$ , cancel out, i.e. $R\left[X_{q},Y_{p}\right]\approx\cos(\phi-\theta_{p}+\theta_{q})$ . Therefore we define the gradient-free model as $M_{\mathrm{GF}}\equiv a\cos(\phi-\theta_{p}+\theta_{q})$ , with free parameters $a$ and $\phi$ . From the measured K-estimators, we calculate the sample correlation matrix $\mathrm{Corr}_{XY}=\mathrm{K}_{XY}/\sqrt{\mathrm{K}_{XX}\mathrm{K}_{YY}}$ , and minimize the mean-squared error in the radial region between $\mathrm{Corr}_{XY}$ and the model $M_{\mathrm{GF}}$ .
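
A possible implementation of this fit is sketched below in Python. It is a sketch under assumptions: the function names, the boolean `mask` selecting the radial region, and the synthetic correlation matrix are introduced here for illustration. Writing $a\cos(\phi-\Delta)=a\cos\phi\cos\Delta+a\sin\phi\sin\Delta$ with $\Delta=\theta_{p}-\theta_{q}$ makes the model linear in $(a\cos\phi,\,a\sin\phi)$, so the mean-squared error is minimized by a linear least-squares solve.

```python
import numpy as np

def correlation_matrix(K_xy, K_xx, K_yy):
    """Corr[q, p] = K_xy[q, p] / sqrt(K_xx[q, q] * K_yy[p, p])."""
    return K_xy / np.sqrt(np.outer(np.diag(K_xx), np.diag(K_yy)))

def fit_gradient_free(corr, theta_q, theta_p, mask=None):
    """Fit corr[q, p] ~ a*cos(phi - theta_p + theta_q) on masked entries; returns (phi, a)."""
    delta = theta_p[None, :] - theta_q[:, None]
    if mask is None:
        mask = np.ones_like(corr, dtype=bool)
    design = np.column_stack([np.cos(delta[mask]), np.sin(delta[mask])])
    (c, s), *_ = np.linalg.lstsq(design, corr[mask], rcond=None)
    return float(np.arctan2(s, c)), float(np.hypot(c, s))

# Synthetic usage (hypothetical numbers): antinodes assumed at theta = 0 and pi.
theta_q = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
theta_p = theta_q.copy()
corr_demo = 0.9 * np.cos(0.4 - theta_p[None, :] + theta_q[:, None])
radial = np.abs(np.cos(theta_q)) > np.cos(np.deg2rad(69.0))   # illustrative radial window
mask_demo = radial[:, None] & radial[None, :]
phi_fit, a_fit = fit_gradient_free(corr_demo, theta_q, theta_p, mask_demo)
```

The mask is the only place where knowledge of the dressing-free MD enters; how it should be chosen for a given feature is discussed in the next paragraph.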

The radial region is determined from the dressing-free MD. For the case shown in Figs. 3 and 5, the anisotropy parameter is $\beta_{2}=2$, and the radial region defined by $|r^{-1}\partial_{\theta}X_{q}|<0.2|\partial_{r}X_{q}|$ is $138^{\circ}$ wide around each antinode of the dipole feature. For lower anisotropy parameters, the radial region is larger, since an isotropic feature has no angular gradient component. As shown in Fig. 6(a), within the radial region, the correlation matrix is well described by the gradient-free model $M_{\mathrm{GF}}$. By fitting $M_{\mathrm{GF}}$ to the measured $\mathrm{Corr}_{XY}$ in the radial region, we obtain $\phi_{\mathrm{fit}}$, and the error $\phi_{\mathrm{fit}}-\phi$ is shown in Fig. 6(b & c). Similar to the gradient-based methods, the error is zero at $\phi=0,\pm\pi/2$, but the rms error across $\phi\in[0,2\pi]$ does not vanish when $k\to 0$. This residual error at small streaking amplitude arises from ignoring the non-radial regions, and in the case shown in Fig. 6, it amounts to $20$ mrad rms. The systematic error of the gradient-free method also increases with $k$, but the increase is slower than for the ND method. In this case, comparing Fig. 6(c) to Fig. 5(c), we find the gradient-free method is more accurate than the ND method when $k\gtrsim\Delta p$, but remains less accurate than the RR method. Another difference between the gradient-free and gradient-based methods is the dependence on $p_{c}/\Delta p$: the gradient-free method is generally more accurate with higher $p_{c}/\Delta p$, since the magnitude of the radial gradient relative to the angular gradient increases.

IV Discussion

IV.1 Machine Fluctuations

When implementing analysis procedures that make use of fluctuations in the measured data, it is important to understand how instabilities in the measurement affect the measured correlation, in this case $\mathrm{C}_{XY}$ . When using an attosecond x-ray free electron laser there may be fluctuations in the x-ray parameters, for example the pulse energy and/or the central photon energy. In this section we describe how to manage the effect of these additional fluctuations in the delay extraction procedure.

IV.1.1 Additional Contribution from Machine Fluctuations in the Impulsive Regime

As discussed in Sec. II.2, the measured covariance is the arithmetic sum of the streaking covariance and the machine fluctuations, $C[X,Y]=K[X,Y]+L[X,Y]$. In the impulsive regime, $L[X,Y]\simeq\mathbb{E}\left[{\mathcal{D}}_{\bm{k}_{X}}\otimes{\mathcal{D}}_{\bm{k}_{Y}}\right]C[X^{0},Y^{0}]$, as written in Eqn. (11). Similar to $K[X,Y]$ in the impulsive regime, $\mathbb{E}\left[{\mathcal{D}}_{\bm{k}_{X}}\otimes{\mathcal{D}}_{\bm{k}_{Y}}\right]$ also expands into direct products of partial derivative operators numbered by the orders $(n_{X}+n_{Y})$. In contrast to $K[X,Y]$, $L[X,Y]$ has a leading order of (0+0) given by $L_{0+0}\equiv C[X^{0},Y^{0}]$, i.e. the covariance of the unstreaked MD. The (1+1) order of $L[X,Y]$ is given by the $\langle k^{2}/2\rangle\hat{G}(\phi)$ operator acting on the two-point function $C[X^{0},Y^{0}]$, and the sum of (0+2) and (2+0) terms amounts to $\langle k^{2}/2\rangle\hat{H}L_{0+0}$. The decomposition of the total covariance $C[X,Y]$ in the presence of machine fluctuations is shown in Fig. 7. The parameters used in Fig. 7 are the same as those used in Fig. 3, but machine fluctuations have been introduced in the following manner. The central photon energy is normally distributed with 2 eV FWHM around 40.8 eV, the ionizing pulse energy follows a Gamma distribution $\Gamma(\alpha{=}\beta{=}2)$, and the streaking amplitude follows a Rayleigh distribution with an expected $\langle k\rangle=0.06\,\mathrm{a.u.}$. As shown in Fig. 7(a), the total covariance is mostly positive, with a positive ridge at roughly $\theta_{p}-\theta_{q}\sim\phi$. The streaking covariance $K[X,Y]$, shown in panel (b), is nearly identical to Fig. 3(a), since the average machine condition remains the same. The composition of $L[X,Y]$ is shown in Fig. 7(c-f). The unstreaked covariance $L_{0+0}$ is positive, which explains why the total covariance is overall more positive than the streaking covariance. At the same time, $L_{0+0}$ is independent of the relative timing between the two features. The next order consists of the (1+1) term $\langle k^{2}/2\rangle\hat{G}(\phi)L_{0+0}$ and the sum of (0+2) and (2+0) terms $\langle k^{2}/2\rangle\hat{H}L_{0+0}$, as shown in panels (d-e). The remaining higher-order terms in $L$ are negligible in this case, as shown in panel (f).

IV.1.2 Accounting for Machine Fluctuations in Delay Retrieval

As mentioned in Sec. III.1, the general strategy for obtaining the K-estimators ($\mathrm{K}_{XY}$, $\mathrm{K}_{XX}$, and $\mathrm{K}_{YY}$) is to remove the contribution of machine fluctuations from the corresponding sample covariances $\mathrm{C}_{XY},\mathrm{C}_{XX}$, and $\mathrm{C}_{YY}$. Here we discuss two procedures for this removal: (1) using partial covariance with respect to one or more fluctuating global parameters, and (2) subtracting the unstreaked covariance.

In the linear regime of light-matter interactions, the electron yield depends on the pulse energy of the ionizing radiation, the sample density, and the detector gain. The combined effect of all these parameters can be described by a global scaling factor $F$ , which is uniform across momentum space and fluctuates from shot to shot. As long as $F$ is measured on a single-shot basis, taking the partial covariance with respect to $F$ removes the correlation resulting from the linear dependence of $X,Y$ on $F$ :

$\displaystyle\mathrm{PCov}[X,Y;F]\equiv C[X,Y]-C[X,F]C[F,F]^{-1}C[F,Y]~{},$	(16)

whose corresponding sample estimate is $\mathrm{PC}_{XY;F}\equiv\mathrm{C}_{XY}-\mathrm{C}_{XF}\mathrm{C}_{FF}^{-1}\mathrm{C}_{FY}$ [26]. In this case, we can decompose the two features $X,Y$ as the product of normalized distributions $U$ and $V$ with $F$ , $X=FU,Y=FV$ . $U$ and $V$ depend on the streaking vector $\bm{k}$ but are statistically independent from $F$ . Combining Eqn. (16) with the law of total covariance, the partial covariance is $\mathrm{PCov}[X,Y;F]=\langle F^{2}\rangle C[U,V]=\langle F^{2}\rangle(K[U,V]+L[U,V])$ . The term $\langle F^{2}\rangle K[U,V]$ can be recognized as the streaking covariance $K[X,Y]=\langle F\rangle^{2}K[U,V]$ scaled by the factor $\langle F^{2}\rangle/\langle F\rangle^{2}\geq 1$ . The second term $\langle F^{2}\rangle L[U,V]$ is smaller than the machine fluctuation $L[X,Y]$ by a positive definite value:

$\displaystyle L[X,Y]-\langle F^{2}\rangle L[U,V]=C[F,F]~{}\mathbb{E}\left[\mathbb{E}\left[U|\bm{k}\right]\mathbb{E}\left[V|\bm{k}\right]\right]~{}.$	(17)

Therefore taking partial covariance has removed this difference that is proportional to $C[F,F]$ and revealed the streaking covariance.
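
The identity $\mathrm{PCov}[X,Y;F]=\langle F^{2}\rangle C[U,V]$ can be checked numerically with a toy model, sketched below in Python. Everything specific in this sketch is an assumption made for illustration: the function names, the cosine modulation of depth 0.2 standing in for the streaked yield, the four angular bins, and the choice of a Gamma-distributed $F$ with $\langle F\rangle=1$ and $\langle F^{2}\rangle=1.5$. The sample partial covariance itself follows the estimator $\mathrm{PC}_{XY;F}$ defined above.

```python
import numpy as np

def sample_cov(A, B):
    """Sample cross-covariance of shot-indexed arrays A (n x a) and B (n x b)."""
    A0 = A - A.mean(axis=0)
    B0 = B - B.mean(axis=0)
    return A0.T @ B0 / (A.shape[0] - 1)

def partial_cov(X, Y, F):
    """PC_{XY;F} = C_XY - C_XF C_FF^{-1} C_FY for a per-shot global factor F."""
    F = np.asarray(F, dtype=float).reshape(len(F), -1)
    c_ff = sample_cov(F, F)
    return sample_cov(X, Y) - sample_cov(X, F) @ np.linalg.solve(c_ff, sample_cov(F, Y))

# Toy data: X = F*U and Y = F*V, with F independent of the streaking direction kappa.
rng = np.random.default_rng(0)
n_shots, phi = 200_000, 0.5
theta = np.linspace(0.0, 2.0 * np.pi, 4, endpoint=False)
kappa = rng.uniform(0.0, 2.0 * np.pi, n_shots)
F = rng.gamma(shape=2.0, scale=0.5, size=n_shots)          # <F> = 1, <F^2> = 1.5
U = 1.0 + 0.2 * np.cos(kappa[:, None] - theta)
V = 1.0 + 0.2 * np.cos(kappa[:, None] + phi - theta)
X, Y = F[:, None] * U, F[:, None] * V

raw = sample_cov(X, Y)                                     # dominated by fluctuations of F
pc = partial_cov(X, Y, F)                                  # F-driven correlation removed
# Analytic <F^2> C[U_q, V_p] for this toy model: 1.5 * 0.02 * cos(phi - theta_p + theta_q)
expected = 1.5 * 0.02 * np.cos(phi - theta[None, :] + theta[:, None])
```

In this toy model the raw covariance is positive everywhere because of the shared factor $F$, while the partial covariance recovers the cosine ridge of the streaking term scaled by $\langle F^{2}\rangle$.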

Another method for removing the effect of machine fluctuations involves calculating the difference between measurements made with the dressing field and measurements made in the absence of the dressing field. If the sample covariance has been calculated, we use the difference $\mathrm{C}_{XY}-\mathrm{C}_{X^{0}Y^{0}}$ as $\mathrm{K}_{XY}$. In this case $\mathrm{K}_{XY}$ becomes the sum of $K\left[X,Y\right]$ and all terms of the machine fluctuation covariance $L\left[X,Y\right]$ higher than $L_{0+0}=\mathrm{C}_{X^{0}Y^{0}}$, as shown in Fig. 7. An improved estimate of $K\left[X,Y\right]$ can be calculated using the partial covariance, $\mathrm{K}_{XY}=\mathrm{PC}_{XY;F}-\mathrm{PC}_{X^{0}Y^{0};F}$. In Fig. 8, we demonstrate the performance of $\mathrm{K}_{XY}$ as an estimate of the underlying streaking covariance $K[X,Y]$ in this case. Panels (a) and (b) compare the sample estimates $\mathrm{K}_{XY},\mathrm{K}_{XX}$ to the respective underlying streaking covariance $K[X,Y],K[X,X]$. As shown in panel (c), the accuracy of all three delay retrieval methods (solid curves) described in Sec. III is generally worse than without machine fluctuations (dashed curves), but the change in the error is less than 20 mrad within $\langle k\rangle/\Delta p<2$. This change in systematic error arises from the residual machine fluctuation contribution in the K-estimators. We note that to apply this subtraction procedure, it is important to acquire a number of unstreaked shots comparable to the number of streaked shots, so that the statistical noise from the unstreaked covariance does not dominate the noise in the subtraction result.

IV.1.3 Gradient Validation in Rank Reduction

The residual machine fluctuation contribution in $\mathrm{K}_{XX}$ is a potential systematic issue for the RR method, because the optimization problem Eqn. (13) finds a rank-2 approximation of $\mathrm{K}_{XX}$. This is motivated by the fact that the gradient inner-product in $K[X,X]$ has a rank of 2, but the residual machine fluctuations mix with the streaking covariance. Therefore, in the presence of significant machine fluctuations, the gradient reconstructed with RR requires validation. If the RR gradient differs significantly from the ND gradient, it should be considered invalid, and delay retrieval should not proceed using such a gradient. We can quantify the similarity between the RR gradient $\xi^{*}$ and the ND gradient $g=(\nabla X_{1},\cdots,\nabla X_{N_{Q}})$ by the average cosine similarity of the components, $\mathcal{S}\equiv(S_{C}(\xi^{*x},g^{x})+S_{C}(\xi^{*y},g^{y}))/2$, where $S_{C}(a,b)\equiv|ab^{T}|/(\|a\|\|b\|)$. For instance, applying the RR procedure to the $\mathrm{C}_{XX}$ shown in Fig. 9(a) without any removal of the machine fluctuations, the resultant RR gradient is significantly different from the ND gradient, as shown in panel (b). The average cosine similarity is $\mathcal{S}=0.5$, resulting from similarity in the $x$ component but orthogonality in $y$. Given that there are two components ($x$ and $y$) in the gradient field, we suggest a suitable threshold at $\mathcal{S}>(1/2+1)/2=3/4$ for the validity of the RR gradient. A higher threshold is less desirable, because a valid RR gradient deviates from the first-order partial derivative and outperforms the ND method in delay retrieval accuracy, as pointed out by Eqn. (14).
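
The validation test can be written in a few lines of Python, sketched here with hypothetical function names and toy fields. The second toy field is built so that its $x$ component matches the reference while its $y$ component is orthogonal to it, mimicking the $\mathcal{S}=0.5$ situation described above.

```python
import numpy as np

def cosine_similarity(a, b):
    """S_C(a, b) = |a . b| / (|a| |b|) for two 1D component vectors."""
    return abs(float(a @ b)) / (np.linalg.norm(a) * np.linalg.norm(b))

def gradient_similarity(xi, g):
    """Average of the x- and y-component cosine similarities of two 2 x N_Q fields."""
    return 0.5 * (cosine_similarity(xi[0], g[0]) + cosine_similarity(xi[1], g[1]))

def rr_gradient_is_valid(xi, g, threshold=0.75):
    """Apply the suggested validity threshold S > 3/4."""
    return gradient_similarity(xi, g) > threshold

# Toy fields on 36 equally spaced angular bins (hypothetical shapes).
theta = np.linspace(0.0, 2.0 * np.pi, 36, endpoint=False)
g_nd = np.vstack([np.cos(theta), np.sin(theta)])             # stand-in for the ND gradient
xi_good = 3.0 * g_nd                                         # same shape, different scale
xi_bad = np.vstack([np.cos(theta), np.cos(2.0 * theta)])     # y component orthogonal to g_nd

s_good = gradient_similarity(xi_good, g_nd)
s_bad = gradient_similarity(xi_bad, g_nd)
```

Because the cosine similarity is insensitive to the overall scale and sign of each component, the test does not penalize the global factor $\sqrt{\langle k^{2}/2\rangle}$ carried by the RR gradient.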

When an acceptable similarity between the RR gradient and the ND gradient cannot be obtained, it is possible to generalize to a rank-3 approximation, i.e. to minimize $f_{\mathrm{RR}}(\zeta)$ with $\zeta\in\mathbb{R}_{3\times N_{Q}}$. We obtain a minimal point $\zeta^{\mathrm{P}}$, subject to the constraint that the rows of $\zeta^{\mathrm{P}}$ are orthogonal. We refer to the row vectors in $\zeta^{\mathrm{P}}$ as principal components. For each pair of principal components, we stack them as $\xi^{\mathrm{P}}\in\mathbb{R}_{2\times N_{Q}}$ and maximize the flux $j_{\mathrm{RR}}(O\xi^{\mathrm{P}})$ as in Algorithm 1. The reconstructed gradient $\xi^{*}$ should then be validated against the ND gradient. When no pair of principal components results in a valid $\xi^{*}$, the RR method is not recommended. If multiple pairs result in a valid $\xi^{*}$, we choose the valid $\xi^{*}$ with the largest spectral norm.

When the measured dressing-free MD has inversion symmetry, $X^{0}(\bm{r})=X^{0}(-\bm{r})$ , each principal component is either even $f(-\bm{r})=f(\bm{r})$ or odd $f(-\bm{r})=-f(\bm{r})$ . This parity serves as a useful guide for selecting the pair of principal components: both $x$ and $y$ components of the gradient field must be odd. As shown in Fig. 9(c-d), the largest principal component of $\mathrm{C}_{XX}$ (blue) has even parity, whereas the second (orange) and third (green) are odd. After removing the machine fluctuation contribution from $\mathrm{C}_{XX}$ , the top two components of the resultant $\mathrm{K}_{XX}$ both have odd parity, and strongly resemble the second and third components of $\mathrm{C}_{XX}$ . A rank-3 approximation of $\mathrm{C}_{XX}$ with the aforementioned validation procedure results in the same RR gradient field as the reconstruction with $\mathrm{K}_{XX}$ .

IV.2 Shot Noise

Shot noise is ubiquitous in electron spectroscopy, arising from the particle nature of electrons. The measured electron yield in a momentum region $Q$ follows a Poisson distribution, with expectation $\int_{Q}I(\bm{r})d\bm{r}$, where the MD, $I(\bm{r})$, varies from shot to shot. To understand the effect of shot noise on our measurement scheme, we denote $E_{\mathrm{fx}}$ as the expected electron yield at each set of ROIs. We also refer to this quantity as the “flux”, since it represents the average number of electron counts per shot. Here we investigate the impact of shot noise on the delay retrieval methods, assuming both sets of ROIs $\{Q_{q}\}$ and $\{P_{p}\}$ receive the same electron flux. We simulate $N_{s}$ measured shots by Poisson-sampling the MD, and apply the delay retrieval methods to this simulated data set. We repeat the Poisson sampling 10 times for each $E_{\mathrm{fx}}$, and the mean $\overline{\Delta\phi}{=}\overline{\phi_{\mathrm{fit}}}-\phi$ and standard deviation $\sigma[\Delta\phi]{=}\sigma[\phi_{\mathrm{fit}}]$ over the repetitions quantify the systematic error and the statistical error, respectively.

Since shot noise is typically independent across non-intersecting regions, it does not generally induce systematic error in $\mathrm{K}_{XY}$ . There is an exception in the vicinity of the diagonal entries in $\mathrm{K}_{XX}$ and $\mathrm{K}_{YY}$ , since the corresponding pairs of ROIs have correlated shot noise. These entries in $\mathrm{K}_{XX}$ and $\mathrm{K}_{YY}$ are positively biased because the shot noise is unaccounted for in our model. For example, when regions $Q_{1}$ and $Q_{2}$ both include a contribution from the same detector pixel, $(\mathrm{K}_{XX})_{12}$ is affected by such positive bias, i.e. $\langle(\mathrm{K}_{XX})_{12}\rangle-K[X_{1},X_{2}]>0$ . The critical distance between the pair of ROIs is the noise correlation length of the detector. When the distance between two regions is shorter than this noise correlation length, a bias can be introduced.
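
The diagonal bias can be illustrated with a short Poisson-sampling experiment in Python. This is a toy sketch, not the simulation used for Fig. 10: the six angular bins, the mean count of 20 per bin, and the cosine modulation of depth 0.2 are hypothetical choices, and each bin is sampled independently, so the noise correlation length is shorter than the bin spacing. By the law of total variance, Poisson sampling then adds the mean count to each diagonal entry of the covariance and leaves the off-diagonal entries unbiased.

```python
import numpy as np

rng = np.random.default_rng(0)
n_shots, n_bins, mean_count = 100_000, 6, 20.0
theta = np.linspace(0.0, 2.0 * np.pi, n_bins, endpoint=False)
kappa = rng.uniform(0.0, 2.0 * np.pi, n_shots)       # streaking direction of each shot

# Toy per-shot expected yield in each angular bin (hypothetical modulation).
rate = mean_count * (1.0 + 0.2 * np.cos(kappa[:, None] - theta[None, :]))
counts = rng.poisson(rate)                           # independent Poisson draw per bin

cov_rate = np.cov(rate, rowvar=False)                # covariance without shot noise
cov_counts = np.cov(counts, rowvar=False)            # covariance of the noisy readout
bias = cov_counts - cov_rate

diag_bias = np.diag(bias)                            # close to the mean count per bin
offdiag_bias = bias[~np.eye(n_bins, dtype=bool)]     # scatters around zero
```

A weight matrix such as $W_{qq^{\prime}}=1-\delta_{qq^{\prime}}$ discards exactly the entries collected in `diag_bias`; for a detector with a longer noise correlation length, a wider band would have to be excluded.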

The ND method relies on $\overline{X^{0}},\overline{Y^{0}}$, and $\mathrm{K}_{XY}$, as illustrated in Fig. 4, all of which are free from bias induced by shot noise. Thus the systematic error of the ND method is independent of shot noise and remains at the noiseless limit, as shown by the green trace in Fig. 10(a). On the other hand, the RR method is affected by the positive bias in part of the K-estimators that arises from shot noise, as visualized in the inset of Fig. 10(a). The impact of the bias can be mitigated by assigning zero weights to these positively biased entries of $\mathrm{K}_{XX}$ in the RR procedure. In simulation, we can set $W_{qq^{\prime}}=1-\delta_{qq^{\prime}}$, since it is straightforward to ensure the independence of the Poisson-sampling process between non-intersecting regions. We refer to this variant of the RR method, which is also free from the systematic errors arising from shot noise, as the diagonal-agnostic rank-reduction (daRR) method. This variation improves the accuracy of the delay extraction in low-count scenarios. In high-count scenarios, the daRR method is less accurate than the RR method, due to the small induced bias in the reconstructed gradient. As shown in Fig. 10(a), the point at which the RR method becomes more accurate than the daRR method is $E_{\mathrm{fx}}/N_{Q}\approx 10$ for the parameters considered. As $E_{\mathrm{fx}}\to\infty$, the systematic error of the RR method diminishes as $\propto 1/E_{\mathrm{fx}}$, approaching the noiseless limit.

The bias in $\mathrm{K}_{XX}$ and $\mathrm{K}_{YY}$ also affects the gradient-free delay retrieval method. Here, shot noise affects the sample correlation matrix $\mathrm{Corr}_{XY}$ via the positive bias in $(\mathrm{K}_{XX})_{qq}$ and $(\mathrm{K}_{YY})_{pp}$ . In the case of $k/\Delta p=0.5,p_{c}/\Delta p=6.6$ , the systematic error of the gradient-free method exhibits two plateaus at both the low-count and the high-count ends, with a dip in between, as shown in Fig. 10(a). At high counts we approach the noiseless limit, where the error $\Delta\phi=\phi_{\mathrm{fit}}-\phi$ has the same sign as $\phi$ , as shown in Fig. 6. In the low-count region, in contrast, the error $\Delta\phi$ exhibits opposite sign to $\phi$ . Between the two plateaus, when the sign of $\Delta\phi$ flips, a minimum systematic error is found. However, the exact $E_{\mathrm{fx}}$ for which the minimal systematic error occurs strongly depends on $k/\Delta p$ and $p_{c}/\Delta p$ . Therefore, the flux providing the minimal error can only be accurately identified when a thorough characterization of the system parameters is possible.

The accuracy of the delay retrieval, quantified by the systematic error $\overline{\Delta\phi}$, is insensitive to the number of shots $N_{s}$. This is because the accuracy of $\overline{X^{0}},\overline{Y^{0}}$, and the K-estimators depends on the electron flux and not on $N_{s}$, when only shot noise is considered. In contrast, the precision of the delay retrieval, quantified by the statistical error $\sigma[\Delta\phi]$, depends on $N_{s}$. This is because the variance of $\overline{X^{0}},\overline{Y^{0}}$, and the K-estimators scales as ${\propto}1/N_{s}$ when the measurement is repeated over a large number of shots $N_{s}$. Shot noise contributes to the variance in a way that is inversely proportional to the total electron counts detected, $N_{s}E_{\mathrm{fx}}$. As shown by the colored traces in Fig. 10(b), with shot noise alone, the statistical error of the retrieved delay scales as $\propto 1/\sqrt{N_{s}E_{\mathrm{fx}}}$. In addition to shot noise, the variance of $\overline{X^{0}},\overline{Y^{0}}$, and the K-estimators also includes the sampling noise of the streaking direction $\kappa$, i.e. the non-uniformity of the distribution of a finite number of measured $\kappa$, which is $\propto 1/N_{s}$ for large $N_{s}$. Therefore the statistical error shrinks as $\sqrt{\mathcal{C}_{\kappa}/N_{s}+\mathcal{C}_{E}/(N_{s}E_{\mathrm{fx}})}$ when both $N_{s}$ and $E_{\mathrm{fx}}$ are large, where $\mathcal{C}_{\kappa}$ and $\mathcal{C}_{E}$ are constants independent of $N_{s}$ and $E_{\mathrm{fx}}$. In the case shown in Fig. 10(a & b), $k/\Delta p=0.5$, and these constants are $\mathcal{C}_{E}=2.0\,\mathrm{rad}^{2}$ and $\mathcal{C}_{\kappa}=0.045\,\mathrm{rad}^{2}$. The sampling noise of $\kappa$ does not introduce systematic error in the delay retrieval, as it preserves the accuracy of $\overline{X^{0}},\overline{Y^{0}}$, and the K-estimators.
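
For planning purposes, the asymptotic scaling can be evaluated directly, as in the Python sketch below. The constants are the ones quoted above for $k/\Delta p=0.5$; the function names and the example inputs are assumptions made here, and the formula only applies when both $N_{s}$ and $E_{\mathrm{fx}}$ are large.

```python
import numpy as np

C_E = 2.0         # rad^2, shot-noise constant quoted for k / Delta p = 0.5
C_KAPPA = 0.045   # rad^2, kappa-sampling constant quoted for k / Delta p = 0.5

def statistical_error(n_shots, flux):
    """Asymptotic sigma[Delta phi] in rad for n_shots shots at electron flux `flux`."""
    return np.sqrt(C_KAPPA / n_shots + C_E / (n_shots * flux))

def shots_needed(target_sigma, flux):
    """Number of shots at which the asymptotic formula reaches target_sigma (rad)."""
    return (C_KAPPA + C_E / flux) / target_sigma**2

# Flux at which shot noise and kappa-sampling noise contribute equally.
crossover_flux = C_E / C_KAPPA

# Example inputs (hypothetical): 10^4 shots at a flux of 300 electrons per shot.
sigma_example = statistical_error(1.0e4, 300.0)
shots_for_10_mrad = shots_needed(0.010, 300.0)
```

Above the crossover flux, adding more electrons per shot gives diminishing returns in precision, and the number of shots becomes the quantity that matters.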

The total error of the retrieved delay, defined as $\sqrt{\overline{(\Delta\phi)^{2}}}=\sqrt{(\overline{\Delta\phi})^{2}+\sigma[\Delta\phi]^{2}}$, combines the systematic and statistical errors and also depends on the streaking amplitude $k$. Figure 10(c) shows how this total error varies with $k/\Delta p$ at a fixed electron flux $E_{\mathrm{fx}}=3\times 10^{2}$. At large values of $k/\Delta p$, the total error grows with $k$ due to the increasing systematic errors discussed in Sec. III. However, as $k/\Delta p$ approaches zero, the total error increases due to the presence of noise. This is because the streaking covariance, which scales as $k^{2}$ in the small amplitude limit (Eqn. 8), is overwhelmed by the noise. The dependence on $k/\Delta p$ shown in Fig. 10(c) represents the typical behavior of the total error in delay retrieval, although the exact optimal streaking amplitude also depends on other parameters such as $E_{\mathrm{fx}}$ and $N_{s}$.

IV.3 Delay Fluctuation

When employing angular streaking to measure the time delay between two ionizing pulses, instabilities in the delay impact the measured covariance between the two photoelectron features produced by the respective pulses. The delay fluctuation results in variation of $\phi$. In our framework, it is straightforward to incorporate fluctuations in $\phi$, so long as the fluctuations are independent of the remaining machine fluctuations. Since $\phi$ is the relative angle between the streaking directions of the two features, each individual MD is independent of $\phi$, thus $\mathbb{E}\left[I|\phi\right]=\mathbb{E}\left[I\right]$ and $C[\mathbb{E}\left[X|\phi\right],\mathbb{E}\left[Y|\phi\right]]=0$. Applying the law of total covariance:

	$\displaystyle C[X,Y]$	$\displaystyle=\mathbb{E}\left[C[X,Y|\phi]\right]+C[\mathbb{E}\left[X|\phi\right],\mathbb{E}\left[Y|\phi\right]]$
		$\displaystyle=\mathbb{E}\left[C[X,Y|\phi]\right]~{},$		(18)

we only retain the conditional covariance $C[X,Y|\phi]$ averaged over $\phi$ . Then the measured covariance is simply the ensemble average of the covariance for each $\phi$ .

To further illustrate the effect of a small delay jitter in the covariance analysis, we look into the scenario where $\phi$ follows a normal distribution with mean value $\phi_{0}$ and standard deviation $\delta\phi$. We find that this normal distribution of delay results in a damping factor on the GIP model, $\mathbb{E}\left[M_{\mathrm{GIP}}(\phi)\right]=e^{-\delta\phi^{2}/2}M_{\mathrm{GIP}}(\phi_{0})$. This damping factor $e^{-\delta\phi^{2}/2}$ has an intuitive interpretation: increasing the delay jitter reduces the correlation between the streaking directions of the two features $X$ and $Y$, which reduces the magnitude of the streaking covariance $K[X,Y]$. On the other hand, $K[X,X]$ and $K[Y,Y]$ are not affected by the delay jitter $\delta\phi$, since they originate from a single ionizing pulse and thus have no dependence on $\phi$. Incorporating a global scaling factor as a free parameter of the gradient-based and/or gradient-free model, we can retrieve the delay. We can also estimate the delay jitter by comparing the magnitude of $\mathrm{K}_{XY}$ to $\mathrm{K}_{XX}$ and $\mathrm{K}_{YY}$ [13].
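
The damping factor follows from $\mathbb{E}[\cos\phi]=e^{-\delta\phi^{2}/2}\cos\phi_{0}$ and $\mathbb{E}[\sin\phi]=e^{-\delta\phi^{2}/2}\sin\phi_{0}$ for normally distributed $\phi$, since the GIP model is linear in $\cos\phi$ and $\sin\phi$. The Python sketch below checks this numerically with Gauss-Hermite quadrature. The function names, the rotation convention, the quadrature order, and the random stand-ins for the gradient fields are assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def rot(phi):
    """Counter-clockwise 2x2 rotation matrix (convention assumed for this sketch)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]])

def gip_model(xi, eta, phi):
    """M_GIP[q, p] = xi_q^T R(-phi) eta_p."""
    return xi.T @ rot(-phi) @ eta

def jitter_averaged_model(xi, eta, phi0, dphi, order=40):
    """E[M_GIP(phi)] for phi ~ Normal(phi0, dphi), via Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(order)            # weight function exp(-z^2 / 2)
    weights = weights / np.sqrt(2.0 * np.pi)      # normalize to the standard normal density
    return sum(w * gip_model(xi, eta, phi0 + dphi * z) for z, w in zip(nodes, weights))

# Random stand-ins for the gradient fields (hypothetical data).
rng = np.random.default_rng(2)
xi_demo = rng.normal(size=(2, 8))
eta_demo = rng.normal(size=(2, 8))
phi0, dphi = 0.6, 0.5

averaged = jitter_averaged_model(xi_demo, eta_demo, phi0, dphi)
damped = np.exp(-dphi**2 / 2.0) * gip_model(xi_demo, eta_demo, phi0)
```

The two arrays agree to numerical precision, and the same factor appears if the gradient-free model $a\cos(\phi-\theta_{p}+\theta_{q})$ is averaged instead, which is why a free global scale absorbs the jitter in either fit.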

IV.4 Instrument Specific Issues

IV.4.1 Momentum Projection

The framework presented above for interpreting photoelectron momentum distributions in the impulsive streaking regime can be applied to both projected and sliced MDs. A projected MD can be measured with a VMI-type spectrometer [21], and a sliced MD can be measured with an array of Time-of-Flight spectrometers (ToFs) [27]. The two schemes perform nearly identically, in terms of the accuracy of the model and the performance of the delay retrieval methods. The GIP model is slightly more accurate for the projected MD, as shown in Fig. 11(a). As a result, the systematic error of delay retrieval methods is slightly lower, as shown in panel (b). The primary reason for the improved accuracy is that the higher-order ($(n_{X}+n_{Y}){=}4$) terms in the streaking covariance, beyond the GIP term, are reduced by the projection along $p_{z}$. The reduction of the higher-order terms is a result of the reduced curvature of the dressing-free MD (e.g. $\nabla^{2}\langle X^{0}\rangle$) when the MD is projected. At the same time, the differences in accuracy and precision between the two detection modalities are minimal.

IV.4.2 Angular Sparsity

Measurement of the sliced (i.e. $p_{z}=0$) MD is often done using an array of time-of-flight spectrometers (ToFs), as in the aforementioned device, MRCO, which consists of $16$ ToFs, each collecting ${\sim}0.2$% of the full $4\pi$ solid angle [27]. We consider the impact of angular sparsity in the sampling of the MD on the delay retrieval methods. The ${\sim}0.2$% angular sampling is notably more sparse than the measurement schemes employed above. To simulate this angular sampling scheme, we integrate the MD over an angular window representing each ToF; these windows are shown for the dressing-free MD in Fig. 12(a). As in the previous tests, we use the optimal lower bound for the ROI of the momentum, $p_{\min}=p_{\mathrm{MG}}$, to optimize the accuracy of the GIP model. We can then calculate the K-estimators $\mathrm{K}_{XY},\mathrm{K}_{XX},\mathrm{K}_{YY}$ with the limited angular resolution. We compare two measurement schemes: one where both $X$ and $Y$ are detected by all 16 ToFs, $N_{Q}{=}N_{P}{=}16$, denoted MRCO16, and a second scheme where the ToFs are split into two interleaving subsets, $N_{Q}{=}N_{P}{=}8$ for $X$ and $Y$ respectively, denoted MRCO8v8. As a reference we use a densely sampled measurement scheme, denoted as “N180” in Figs. 3-10. Examples of the computed $\mathrm{K}_{XY}$ are shown in Fig. 12(b) and (c), for the MRCO16 and MRCO8v8 schemes, respectively. In both cases, the positive ridge around $\theta_{p}{-}\theta_{q}{\sim}\phi$ persists regardless of sampling scheme.

We characterize the impact of the angular sparsity in the measurement on the delay retrieval methods described in Sec. III. We focus specifically on the RR and daRR methods, initially in the noiseless limit. We find that the systematic error of the RR method is independent of the sampling scheme ($<0.1$ mrad difference), as shown by the solid curves in Fig. 12(d). In contrast, the systematic error of the daRR method increases with increasing angular sparsity, as shown by the dashed curves in Fig. 12(d). The number of elements in the $\mathrm{K}_{XX}$ matrix scales quadratically with $N_{Q}$, whereas the number of diagonal elements scales linearly. As a result, when the daRR method ignores the diagonal elements in $\mathrm{K}_{XX}$, a smaller value of $N_{Q}$ means that a relatively larger fraction of the information in $\mathrm{K}_{XX}$ is ignored. For the N180 scheme, the daRR method performs nearly the same as the RR method in the noiseless limit, within $0.2$ mrad difference. Another observation from panel (d) is that the systematic error of daRR is roughly proportional to $N_{Q}^{-1}k/\Delta p$ within $k/\Delta p<0.6$, where the error of the daRR method is approximately linear in $k$.

When we introduce shot noise to the simulations, the three measurement schemes (N180, MRCO16, MRCO8v8) share several other similarities in terms of the delay retrieval performance. Decreasing the electron flux in each set of ROIs, $E_{\mathrm{fx}}$, from the noiseless limit, we see a region where the systematic error of the RR method scales as $\propto 1/E_{\mathrm{fx}}$ and then exceeds that of the daRR method, as shown by the solid curves in Fig. 12(e). This behavior is exhibited under all three schemes, although the values of $E_{\mathrm{fx}}$ below which the daRR method outperforms the RR method are different. As $E_{\mathrm{fx}}$ decreases further, the systematic error of the daRR method remains insensitive to $E_{\mathrm{fx}}$ until the electron counts become excessively scarce, $E_{\mathrm{fx}}\lesssim 5$. On the other hand, the statistical error of both the RR and the daRR methods follows the $1/\sqrt{N_{s}E_{\mathrm{fx}}}$ scaling, as shown by the colored traces in panel (f). Remarkably, in this asymptotic scaling region the statistical error curves for the different measurement schemes overlap each other, rather than having an offset between them. Meanwhile, in the systematic error curves of the RR method, the $\propto 1/E_{\mathrm{fx}}$ scaling regions also overlap with each other when compared at the same $E_{\mathrm{fx}}$. Note that $E_{\mathrm{fx}}$ represents the total electron counts detected in a set of angular bins, thus achieving the same $E_{\mathrm{fx}}$ requires a higher total number of electrons generated per shot when the measurement scheme is more sparse. Another common feature of the measurement schemes is the value of $E_{\mathrm{fx}}/N_{Q}$ where the RR and daRR methods are equally accurate. For the $k/\Delta p=0.5$ case shown in panel (e), this cross-over is at $E_{\mathrm{fx}}/N_{Q}\approx 10$. This agreement is universal so long as $k/\Delta p<0.6$ and the RR method has not reached the noiseless limit.

Next, we consider sampling noise of $\kappa$ in addition to shot noise. Due to this sampling noise of $\kappa$ , when the number of shots $N_{s}$ is fixed and $E_{\mathrm{fx}}$ is high, the statistical error of the retrieved delay approaches the lower bound $\sigma[\Delta\phi]\approx\sqrt{\mathcal{C}_{\kappa}/N_{s}}$ for both the RR and daRR methods, similar to the behavior in Fig. 10(b). As shown by the dot-dashed lines in Fig. 12(f), this lower bound at $\sqrt{\mathcal{C}_{\kappa}/N_{s}}$ is roughly the same under different levels of angular sparsity. Meanwhile, the overlap of the colored curves in Fig. 12(f) indicates that $\mathcal{C}_{E}$ is also almost independent of the angular sparsity. Thus, in the presence of both shot noise and sampling noise of $\kappa$ , the asymptote of the statistical error $\sigma[\Delta\phi]\approx\sqrt{\mathcal{C}_{E}/(N_{s}E_{\mathrm{fx}})+\mathcal{C}_{\kappa}/N_{s}}$ is not significantly affected by the angular sparsity.
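As a minimal sketch of how this asymptote behaves, the snippet below evaluates $\sqrt{\mathcal{C}_{E}/(N_{s}E_{\mathrm{fx}})+\mathcal{C}_{\kappa}/N_{s}}$ for a range of electron fluxes. The numerical values chosen for $\mathcal{C}_{E}$ , $\mathcal{C}_{\kappa}$ and $N_{s}$ are purely hypothetical placeholders, and the function names are ours.

```python
import math


def delay_statistical_error(n_shots, e_fx, c_e, c_kappa):
    """Asymptotic statistical error: sqrt(C_E / (N_s * E_fx) + C_kappa / N_s)."""
    return math.sqrt(c_e / (n_shots * e_fx) + c_kappa / n_shots)


def sampling_noise_floor(n_shots, c_kappa):
    """High-flux limit of the error, set by the sampling noise of kappa alone."""
    return math.sqrt(c_kappa / n_shots)


# Hypothetical constants, chosen only to illustrate the two regimes.
C_E, C_KAPPA, N_SHOTS = 4.0, 0.01, 10_000

# Flux at which the shot-noise and sampling-noise terms are equal.
crossover_flux = C_E / C_KAPPA

for e_fx in (10.0, 100.0, crossover_flux, 1e4, 1e6):
    err = delay_statistical_error(N_SHOTS, e_fx, C_E, C_KAPPA)
    floor = sampling_noise_floor(N_SHOTS, C_KAPPA)
    print(f"E_fx = {e_fx:9.1f}: error = {err:.3e}, error / floor = {err / floor:.3f}")
```

In this model the error follows the $1/\sqrt{N_{s}E_{\mathrm{fx}}}$ scaling well below the flux $\mathcal{C}_{E}/\mathcal{C}_{\kappa}$ and saturates at the sampling-noise floor well above it; increasing $N_{s}$ lowers both contributions by the same factor.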

Since systematic errors are insensitive to $N_{s}$ , the overlap in part of the systematic error curves indicates that, in order to retrieve the delay accurately, it is crucial to control the electron counts per shot. Acquiring more shots would improve only the precision of the delay retrieval, not its accuracy.

V Encoding Arbitrary Signals

In this section, we extend our discussion to the situation where the emission process approaches or exceeds the period of the dressing laser field, and thus the emission is no longer impulsive. We will remain in the limit that the emission process is shorter than the duration of the dressing field envelope. For this method to work, we still require an impulsive emission process to provide a reference feature. Moreover, these two emission processes should be correlated for the instantaneous process to serve as a reference. Similar to the conventions used above, we will refer to the impulsive feature as feature $X$ , and the longer timescale emission process will produce feature $Y$ .

Owing to the periodic nature of the interaction of the laser field, the time-dependence of the emission is encoded in the Fourier coefficients of the measured electron momentum distribution (MD):

\mathcal{Y}_{m}(\bm{r};k)\equiv\int_{0}^{2\pi}\mathbb{E}\left[Y(\bm{r};\bm{k})|\bm{k}\right]e^{-im\kappa}\frac{d\kappa}{2\pi}

(19)

Here $Y(\bm{r};\bm{k})$ is the yield of electrons belonging to feature $Y$ at detector coordinate $\bm{r}$ for a streaking vector $\bm{k}$ , which is similar to the conventions used above. In writing this expression, we have taken into account the machine fluctuations by using $\mathbb{E}\left[Y|\bm{k}\right]$ . Applying the law of total covariance, we can write $K[X,Y]$ as

K[X,Y]\simeq\sum_{m=1}^{+\infty}\left(\langle\mu_{m}(k)\mathcal{Y}_{m}(k)\rangle+\mathrm{c.c.}\right)+C\left[\mu_{0}(k),\mathcal{Y}_{0}(k)\right]

(20)

\mu_{m}(k)\equiv\frac{1}{m!}\left(\frac{-k\partial_{+}}{\sqrt{2}}\right)^{m}{}_{0}F_{1}\left(m+1,\frac{k^{2}\nabla^{2}}{4}\right)\langle X^{0}\rangle,\quad m\geq 0

(21)

where $\langle X^{0}\rangle$ is the dressing-free MD averaged over machine fluctuations, $\partial_{+}=(\frac{\partial}{\partial x}+i\frac{\partial}{\partial y})/\sqrt{2}$ is an operator that acts on an MD, and ${}_{0}F_{1}(z,x)\equiv\sum_{n=0}^{\infty}((z-1)!x^{n})/((z+n-1)!n!)$ is the confluent hypergeometric limit function [28]. It is clear from Eqn. (21) that all Fourier components, $\mathcal{Y}_{m}$ , are encoded in $K[X,Y]$ , but not always with the same sensitivity. This indicates that any signal which is periodic in $\kappa$ is encoded in the streaking covariance with the impulsive reference feature $X$ in the same way as $\mathbb{E}\left[Y(\bm{r};\bm{k})|\bm{k}\right]$ is encoded in Eqn. (20). This encoding in all Fourier components is a general property of a periodic signal.
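The structure of the sum over $m$ in Eqn. (20) can be illustrated with a toy calculation. The sketch below assumes a fixed streaking amplitude (so the $C[\mu_{0},\mathcal{Y}_{0}]$ term is absent), a uniformly distributed $\kappa$ , and two made-up periodic signals standing in for the conditional means of $X$ and $Y$ ; all amplitudes and phases are hypothetical. In this toy picture the complex conjugate of the Fourier coefficient of the reference signal plays the role of the sensitivity $\mu_{m}$ .

```python
import cmath
import math


def fourier_coefficient(samples, m):
    """Approximate (1/2pi) * integral of f(kappa) * exp(-i*m*kappa) on a uniform grid."""
    n = len(samples)
    return sum(f * cmath.exp(-2j * math.pi * m * j / n) for j, f in enumerate(samples)) / n


N_GRID = 256
kappas = [2.0 * math.pi * j / N_GRID for j in range(N_GRID)]

# Toy conditional means versus streaking direction (hypothetical numbers).
x_mean = [1.0 + 0.5 * math.cos(kap - 0.3) + 0.1 * math.cos(2 * kap + 1.0) for kap in kappas]
y_mean = [2.0 + 0.4 * math.cos(kap + 0.2) + 0.2 * math.cos(7 * kap) for kap in kappas]

# Direct covariance over a uniformly distributed streaking direction.
x_bar = sum(x_mean) / N_GRID
y_bar = sum(y_mean) / N_GRID
cov_direct = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_mean, y_mean)) / N_GRID

# The same covariance written as a sum over orders m >= 1 of (product + c.c.).
cov_fourier = sum(
    2.0 * (fourier_coefficient(x_mean, m).conjugate() * fourier_coefficient(y_mean, m)).real
    for m in range(1, 9)
)

print(f"direct covariance : {cov_direct:.12f}")
print(f"Fourier-sum value : {cov_fourier:.12f}")
```

Only the $m=1$ order contributes here, because it is the only order present in both signals: the $m=7$ modulation of the toy $Y$ signal is invisible in the covariance when the reference signal carries no $m=7$ content, which is the sense in which the sensitivity depends on the reference feature alone.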

In applying the law of total covariance to partition $K[X,Y]$ , the conditional variable was the streaking amplitude $k$ . In writing Eqn. (20) we have partitioned the result into two terms, one that results from the shot-to-shot variations of $\kappa$ (the sum over $m{\neq}0$ ) and the other that results from variations in $k$ . The streaking covariance $K[X,Y]$ depends linearly on the $|m|{\geq}1$ components, with sensitivity $\mu_{m}(k)$ given by Eqn. (21). In contrast, the $m{=}0$ component of $Y$ only affects the second term $C\left[\mu_{0}(k),\mathcal{Y}_{0}({k})\right]$ , which indicates that $K[X,Y]$ is sensitive to the covariance of $\mathcal{Y}_{0}$ and $\mu_{0}$ arising from the variation of the streaking amplitude $k$ . Note that in the small streaking regime ( $k<\Delta p_{X}$ of the reference feature), the leading order of the sensitivity, obtained by setting the hypergeometric factor in Eqn. (21) to unity, is $\mu_{m}\approx\frac{(-k)^{m}}{m!2^{m/2}}\partial_{+}^{m}\langle X^{0}\rangle$ , where the lowest order of differentiation acting on $\langle X^{0}\rangle$ is $m$ . Thus only the components $\mathcal{Y}_{\pm 1}$ are coupled to the gradient of the reference feature $\nabla\langle X^{0}\rangle$ . This indicates that the delay retrieval methods discussed in the preceding sections are only sensitive to the $|m|=1$ components. In the cases where feature $Y$ is in the impulsive regime, as discussed in the preceding sections, Eqn. (20) reduces to Eqn. (9) after rearranging the terms, as shown in Supplemental Sec. 4.1.
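The small-streaking limit of Eqn. (21) can be checked numerically on a toy reference feature. The sketch below assumes that streaking displaces the MD rigidly, $X(\bm{r};\bm{k})=X^{0}(\bm{r}-\bm{k})$ with $\bm{k}=k(\cos\kappa,\sin\kappa)$ , and takes $X^{0}$ to be an isotropic Gaussian of unit width; both choices, and all function names, are illustrative assumptions. Under this convention the sensitivity is the coefficient of $e^{-im\kappa}$ in the dressed MD, which is compared with the leading-order term $\frac{1}{m!}(-k\partial_{+}/\sqrt{2})^{m}X^{0}$ obtained by setting the hypergeometric factor to unity.

```python
import cmath
import math

SIGMA = 1.0  # width of the toy Gaussian reference feature (assumed, arbitrary units)


def x0(px, py):
    """Toy dressing-free MD: an isotropic Gaussian centred at the origin."""
    return math.exp(-(px * px + py * py) / (2.0 * SIGMA**2))


def mu_numeric(m, px, py, k, n=512):
    """(1/2pi) * integral of X0(r - k*e(kappa)) * exp(+i*m*kappa) over kappa."""
    total = 0j
    for j in range(n):
        kappa = 2.0 * math.pi * j / n
        shifted = x0(px - k * math.cos(kappa), py - k * math.sin(kappa))
        total += shifted * cmath.exp(1j * m * kappa)
    return total / n


def mu_leading(m, px, py, k):
    """Leading-order term (1/m!) * (-k*d_plus/sqrt(2))**m applied to the Gaussian.

    For this Gaussian, d_plus**m X0 = (-(px + i*py) / (sqrt(2)*SIGMA**2))**m * X0.
    """
    d_plus_m = (-(px + 1j * py) / (math.sqrt(2.0) * SIGMA**2)) ** m * x0(px, py)
    return (-k / math.sqrt(2.0)) ** m / math.factorial(m) * d_plus_m


POINT = (0.8, 0.5)  # detector coordinate used for the comparison (arbitrary)

for k in (0.02, 0.5):
    for m in (1, 2, 3):
        exact = mu_numeric(m, *POINT, k)
        approx = mu_leading(m, *POINT, k)
        rel = abs(exact - approx) / abs(exact)
        print(f"k = {k:4.2f}, m = {m}: |mu_m| = {abs(exact):.3e}, relative deviation = {rel:.2e}")
```

For $k$ much smaller than the width of the toy feature the two expressions agree closely, while at larger $k$ the higher-order terms of the hypergeometric factor become noticeable. The comparison is made at a single detector coordinate and does not include the integration over ROIs.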

To illustrate Eqn. (20), we simulate the MD for Auger-Meitner (AM) decay of a molecule in a superposition of two core-ionized states, prepared by $K$ -shell ionization with a $0.2$ fs FWHM pulse [17]. The two intermediate cationic states have an energetic separation of $\Delta I_{\mathrm{P}}=7\hbar\omega_{L}$ , and the AM decay process couples these two states to a common dication state. In our simulation, the dressing-free AM electrons have central momenta of $4.34$ a.u. and $4.38$ a.u. The emitted electrons are dressed by a $|\bm{A}|=0.1$ a.u., $cT_{L}=1.85\,\mathrm{\mu m}$ dressing field (equivalent to Fig. 10) and the MD considered is the $p_{z}=0$ slice. Other details of the simulation of the AM distribution are provided in Supplemental Sec. VI. The decay of this superposition of core-ionized states results in a characteristic modulation of $Y$ over $\kappa$ with a period of $2\pi/7$ , which results from interference between the two pathways via different intermediate states [17]. This modulation appears across a wide range of AM momenta. Here we have assumed that $Y^{0}$ is angularly isotropic. This means that the MDs of AM electrons at different angular positions $\theta_{p}$ are equivalent up to an offset in $\kappa$ , i.e. $Y(\theta_{p},r;\kappa,k)=Y(0,r;\kappa-\theta_{p},k)$ , so the yield is also modulated over angular position $\theta_{p}$ .
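A toy version of this modulation shows how the Fourier components of Eqn. (19) behave under the isotropy assumption. The yield model below, a constant plus a $2\pi/7$ -period cosine with a made-up depth and phase, is a hypothetical stand-in for the simulated AM yield and not the simulation itself; it depends on $\kappa$ and $\theta_{p}$ only through $\kappa-\theta_{p}$ .

```python
import cmath
import math

M_BEAT = 7  # order of the modulation, matching the 2*pi/7 period in kappa


def toy_yield(theta_p, kappa):
    """Hypothetical isotropic yield: depends on kappa and theta_p only via kappa - theta_p."""
    return 1.0 + 0.3 * math.cos(M_BEAT * (kappa - theta_p) - 0.4)


def yield_coefficient(theta_p, m, n=128):
    """m-th Fourier coefficient over kappa, (1/2pi) * integral of Y * exp(-i*m*kappa)."""
    total = 0j
    for j in range(n):
        kappa = 2.0 * math.pi * j / n
        total += toy_yield(theta_p, kappa) * cmath.exp(-1j * m * kappa)
    return total / n


y7_at_zero = yield_coefficient(0.0, M_BEAT)
theta = 0.9  # an arbitrary angular bin position in radians
y7_at_theta = yield_coefficient(theta, M_BEAT)

print(f"|Y_7| at theta_p = 0   : {abs(y7_at_zero):.6f}")
print(f"|Y_7| at theta_p = 0.9 : {abs(y7_at_theta):.6f}")
print(f"phase difference       : {cmath.phase(y7_at_theta / y7_at_zero):.6f} rad")
```

In this toy model the magnitude of the $m=7$ component is the same in every angular bin, and moving to a bin at $\theta_{p}$ multiplies the component by $e^{-im\theta_{p}}$ , so the modulation over $\kappa$ reappears as a modulation over $\theta_{p}$ .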

In Fig. 13 we analyze the MD by partitioning the feature $Y$ into $N_{P}=180$ angular bins, between radial boundaries $p_{\mathrm{min}}{=}4.40$ a.u. and $p_{\mathrm{max}}{=}4.50$ a.u. Figure 13(a) shows the integral of $Y(\bm{r};\bm{k})$ , in the angular bin at $\theta_{p}=0$ , as a function of the streaking direction $\kappa$ of the reference feature. As shown in Fig. 13(b), the yield in these angular bins $Y_{p}$ is clearly modulated with both $\kappa$ and $\theta_{p}$ . We compute the streaking covariance $K[X,Y]$ between $Y$ and the impulsive reference feature $X$ . This reference feature $X$ is simulated in the same dressing field ( $k=0.1\,\mathrm{a.u.}\approx 0.5\Delta p_{X}$ ), and the ROIs $\{Q_{q}\}$ remain between $p_{\mathrm{MG}}$ and $p_{\max}{=}2.15$ a.u. as in Fig. 10. The $2\pi/7$ -period modulation is barely visible in $K[X,Y]$ , as shown in Fig. 13(c). This attenuation of the $2\pi/7$ -period modulation results from the low sensitivity of $K[X,Y]$ to $\mathcal{Y}_{7}$ . We can study the sensitivity of $K[X,Y]$ to the components $\mathcal{Y}_{m}$ by computing $\mu_{m}$ according to Eqn. (21) and integrating over the angular bins $Q_{q}$ . We show the magnitude of the resultant sensitivities in Fig. 13(d), which decreases with increasing $m$ . This dependence on $m$ is determined by $k$ , $\Delta p_{X}$ , and the ROIs on feature $X$ , as described below. The products of these calculated $\mu_{m}$ with the corresponding Fourier components $\mathcal{Y}_{m}$ agree well with the Fourier coefficients of $K[X,Y]$ with respect to $\theta_{p}$ , as visualized for the $\theta_{q}=0$ slice in panel (e), corroborating Eqn. (21).

We notice that in the small streaking regime, according to Eqns. (20) and (21), the sensitivity is given by $\int_{Q_{q}}\mu_{m}d^{2}\bm{r}\approx\frac{(-k)^{m}}{m!2^{m/2}}\int_{Q_{q}}\partial_{+}^{m}\langle X^{0}\rangle d^{2}\bm{r}$ , which is proportional to the regional integral of an $m$ -th order derivative of the reference feature. Therefore, the scaling of the sensitivity over $m$ depends on the streaking amplitude relative to the characteristic momentum scale $\delta p$ far below which the variations in $\langle X^{0}\rangle$ become negligible, e.g. the width $\Delta p_{X}$ in the example shown in Fig. 13. In the limit of small streaking, the magnitude of sensitivity $|\mu_{m}|$ scales as $o((k/\delta p)^{m}/m!)$ , which rapidly decreases with increasing $m$ . Even in cases where $k$ is comparable to $\delta p$ , as is the case in Fig. 13, the streaking covariance tends to understate the high-order Fourier components due to the $1/m!$ factor in the sensitivity. This also indicates that, to improve the sensitivity to high-order components, it is advisable to intensify the dressing field and to reduce the smoothness of the reference feature, both of which increase $k/\delta p$ .
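To give a feeling for how quickly this envelope falls off, the sketch below tabulates $(k/\delta p)^{m}/m!$ for the first few orders. It deliberately ignores every other prefactor (the $2^{-m/2}$ factor, the actual size of the derivatives, and the ROI integration), so the numbers indicate only the trend; the two ratios are illustrative choices, with $0.5$ echoing the $k\approx 0.5\Delta p_{X}$ of the example above.

```python
import math


def sensitivity_envelope(ratio, m):
    """Envelope (k / delta_p)**m / m! with all other prefactors omitted."""
    if m < 0:
        raise ValueError("m must be non-negative")
    return ratio**m / math.factorial(m)


for ratio in (0.5, 1.0):
    print(f"k / delta_p = {ratio}")
    for m in range(1, 8):
        value = sensitivity_envelope(ratio, m)
        relative = value / sensitivity_envelope(ratio, 1)
        print(f"  m = {m}: envelope = {value:.3e}, relative to m = 1: {relative:.3e}")
```

Doubling $k/\delta p$ multiplies the order- $m$ envelope by $2^{m}$ , so the gain from a stronger dressing field or a sharper reference feature is largest for the high-order components that are otherwise most suppressed.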

The bilinearity of covariance also means that the sensitivity $\mu_{m}$ is independent of feature $Y$ . Rather, $\mu_{m}$ is determined by the reference feature $X$ and the streaking amplitude $k$ , as shown in Eqn. (21). Therefore, when detecting $Y$ as a function of $\kappa$ by analyzing the covariance with an impulsive reference feature $X$ , the sensitivity to certain Fourier components $\mathcal{Y}_{m}$ can be designed in simulation, by choosing the streaking amplitude $k$ , the shape of $X^{0}$ and the position of the ROIs $\{Q_{q}\}$ , without detailed prior knowledge of $Y$ .

VI Concluding Remarks

In this work, we present a comprehensive analysis of modeling covariance in the impulsive regime of angular streaking experiments. The displacement of the electron momentum distribution (MD) provides a tight connection between the dressing-free MD and the dressed MD. Such connection establishes universal structures in the composition of streaking covariance that are common across different MDs, regardless of their exact shape. Building on this robust framework, we have developed methods for retrieving temporal information from angular streaking measurements. Specifically for the situations where both features $X$ and $Y$ in the MD are impulsive, we have proposed and evaluated three methods for retrieving relative time delays between $X$ and $Y$ : numerical differentiation (ND), rank-reduction (RR), and a gradient-free approach. Each method offers certain advantages while also depending on experimental conditions and data quality. Several key insights are obtained in our study: (1) Both the streaking effect and the machine fluctuations contribute to the covariance that can be estimated from measurements. Proper removal of the machine fluctuation contribution is crucial for isolating the streaking covariance. (2) Shot noise of the electrons impacts the delay retrieval precision, but does not significantly affect the accuracy of the ND method or the diagonal-agnostic variant of the RR method. (3) Our investigation of angular sparsity in electron detection has shown that in high-count scenarios, achieving accurate delay retrieval is possible even with limited angular sampling. (4) An arbitrary periodic $Y$ signal can be encoded in the streaking covariance with a concurrent impulsive streaking signal $X$ , where the sensitivity to different Fourier components of $Y$ is determined only by the dressing-free MD $X^{0}$ and the dressing field.

By providing a detailed understanding of the covariance structure in angular streaking experiments, our work enables more accurate and robust temporal measurements in a wide range of experimental scenarios. Future work could focus on (1) developing delay retrieval methods that incorporate higher orders into the model of streaking covariance, in addition to the GIP model; (2) higher-order corrections for the removal of the machine fluctuation contribution, e.g. the $n_{X}+n_{Y}=2$ terms shown in Fig. 7(c-d), which can be measured by applying finite difference schemes to $\mathrm{C}_{X^{0}Y^{0}}$ or $\mathrm{PC}_{X^{0}Y^{0};F}$ , provided there is adequate momentum resolution in all four dimensions of $(\bm{r}_{q},\bm{r}_{p})$ ; (3) developing adaptive algorithms that can optimize sensitivity to specific Fourier components of the non-impulsive signals. In conclusion, this work provides a solid framework for leveraging covariance analysis in angular streaking experiments, paving the way for advances in ultrafast science and attosecond metrology.

Acknowledgements

This work is primarily supported by the US DOE, Office of Science, Office of Basic Energy Sciences (BES), Chemical Sciences, Geosciences, and Biosciences Division (CSGB). Z.G. and A.M. acknowledge support from the Accelerator and Detector Research Program of the Department of Energy, Basic Energy Sciences division. Z.G. also acknowledges support from the Robert Siemann Fellowship of Stanford University.

References

The Nobel Committee for Physics [2023] The Nobel Committee for Physics, Scientific background to the nobel prize in physics 2023 (2023).
Itatani et al. [2002] J. Itatani, F. Quéré, G. L. Yudin, M. Y. Ivanov, F. Krausz, and P. B. Corkum, Attosecond Streak Camera, Physical Review Letters 88, 173903 (2002).
Bradley et al. [1971] D. Bradley, B. Liddy, and W. Sleat, Direct linear measurement of ultrashort light pulses with a picosecond streak camera, Optics Communications 2, 391 (1971).
Thumm et al. [2015] U. Thumm, Q. Liao, E. M. Bothschafter, F. Süßmann, M. F. Kling, and R. Kienberger, Attosecond physics: Attosecond streaking spectroscopy of atoms and solids, in Photonics (John Wiley & Sons, Ltd, 2015) Chap. 13, pp. 387–441, https://onlinelibrary.wiley.com/doi/pdf/10.1002/9781119009719.ch13 .
Schultze et al. [2010] M. Schultze, M. Fieß, N. Karpowicz, J. Gagnon, M. Korbman, M. Hofstetter, S. Neppl, A. L. Cavalieri, Y. Komninos, T. Mercouris, C. A. Nicolaides, R. Pazourek, S. Nagele, J. Feist, J. Burgdörfer, A. M. Azzeer, R. Ernstorfer, R. Kienberger, U. Kleineberg, E. Goulielmakis, F. Krausz, and V. S. Yakovlev, Delay in Photoemission, Science 328, 1658 (2010).
Drescher et al. [2002] M. Drescher, M. Hentschel, R. Kienberger, M. Uiberacker, V. Yakovlev, A. Scrinzi, T. Westerwalbesloh, U. Kleineberg, U. Heinzmann, and F. Krausz, Time-resolved atomic inner-shell spectroscopy, Nature 419, 803 (2002).
Eckle et al. [2008] P. Eckle, A. N. Pfeiffer, C. Cirelli, A. Staudte, R. Dörner, H. G. Muller, M. Büttiker, and U. Keller, Attosecond Ionization and Tunneling Delay Time Measurements in Helium, Science 322, 1525 (2008).
Hartmann et al. [2018] N. Hartmann, G. Hartmann, R. Heider, M. S. Wagner, M. Ilchen, J. Buck, A. O. Lindahl, C. Benko, J. Grünert, J. Krzywinski, J. Liu, A. A. Lutman, A. Marinelli, T. Maxwell, A. A. Miahnahri, S. P. Moeller, M. Planas, J. Robinson, A. K. Kazansky, N. M. Kabachnik, J. Viefhaus, T. Feurer, R. Kienberger, R. N. Coffee, and W. Helml, Attosecond time–energy structure of X-ray free-electron laser pulses, Nature Photonics 12, 215 (2018).
Duris et al. [2020] J. Duris, S. Li, T. Driver, E. G. Champenois, J. P. MacArthur, A. A. Lutman, Z. Zhang, P. Rosenberger, J. W. Aldrich, R. Coffee, G. Coslovich, F.-J. Decker, J. M. Glownia, G. Hartmann, W. Helml, A. Kamalov, J. Knurr, J. Krzywinski, M.-F. Lin, J. P. Marangos, M. Nantel, A. Natan, J. T. O’Neal, N. Shivaram, P. Walter, A. L. Wang, J. J. Welch, T. J. A. Wolf, J. Z. Xu, M. F. Kling, P. H. Bucksbaum, A. Zholents, Z. Huang, J. P. Cryan, and A. Marinelli, Tunable isolated attosecond X-ray pulses with gigawatt peak power from a free-electron laser, Nature Photonics 14, 30 (2020).
Franz et al. [2024] P. Franz, S. Li, T. Driver, R. R. Robles, D. Cesar, E. Isele, Z. Guo, J. Wang, J. P. Duris, K. Larsen, J. M. Glownia, X. Cheng, M. C. Hoffmann, X. Li, M.-F. Lin, A. Kamalov, R. Obaid, A. Summers, N. Sudar, E. Thierstein, Z. Zhang, M. F. Kling, Z. Huang, J. P. Cryan, and A. Marinelli, Terawatt-scale attosecond X-ray pulses from a cascaded superradiant free-electron laser, Nature Photonics 18, 698 (2024).
Haynes et al. [2021] D. C. Haynes, M. Wurzer, A. Schletter, A. Al-Haddad, C. Blaga, C. Bostedt, J. Bozek, H. Bromberger, M. Bucher, A. Camper, S. Carron, R. Coffee, J. T. Costello, L. F. DiMauro, Y. Ding, K. Ferguson, I. Grguraš, W. Helml, M. C. Hoffmann, M. Ilchen, S. Jalas, N. M. Kabachnik, A. K. Kazansky, R. Kienberger, A. R. Maier, T. Maxwell, T. Mazza, M. Meyer, H. Park, J. Robinson, C. Roedig, H. Schlarb, R. Singla, F. Tellkamp, P. A. Walker, K. Zhang, G. Doumy, C. Behrens, and A. L. Cavalieri, Clocking Auger electrons, Nature Physics, 1 (2021).
Li et al. [2022] S. Li, T. Driver, P. Rosenberger, E. G. Champenois, J. Duris, A. Al-Haddad, V. Averbukh, J. C. T. Barnard, N. Berrah, C. Bostedt, P. H. Bucksbaum, R. N. Coffee, L. F. DiMauro, L. Fang, D. Garratt, A. Gatton, Z. Guo, G. Hartmann, D. Haxton, W. Helml, Z. Huang, A. C. LaForge, A. Kamalov, J. Knurr, M.-F. Lin, A. A. Lutman, J. P. MacArthur, J. P. Marangos, M. Nantel, A. Natan, R. Obaid, J. T. O’Neal, N. H. Shivaram, A. Schori, P. Walter, A. L. Wang, T. J. A. Wolf, Z. Zhang, M. F. Kling, A. Marinelli, and J. P. Cryan, Attosecond coherent electron motion in Auger-Meitner decay, Science 375, 285 (2022).
Guo et al. [2024] Z. Guo, T. Driver, S. Beauvarlet, D. Cesar, J. Duris, P. L. Franz, O. Alexander, D. Bohler, C. Bostedt, V. Averbukh, X. Cheng, L. F. DiMauro, G. Doumy, R. Forbes, O. Gessner, J. M. Glownia, E. Isele, A. Kamalov, K. A. Larsen, S. Li, X. Li, M.-F. Lin, G. A. McCracken, R. Obaid, J. T. O’Neal, R. R. Robles, D. Rolles, M. Ruberti, A. Rudenko, D. S. Slaughter, N. S. Sudar, E. Thierstein, D. Tuthill, K. Ueda, E. Wang, A. L. Wang, J. Wang, T. Weber, T. J. A. Wolf, L. Young, Z. Zhang, P. H. Bucksbaum, J. P. Marangos, M. F. Kling, Z. Huang, P. Walter, L. Inhester, N. Berrah, J. P. Cryan, and A. Marinelli, Experimental demonstration of attosecond pump–probe spectroscopy with an X-ray free-electron laser, Nature Photonics 18, 691 (2024).
Maroju et al. [2020] P. K. Maroju, C. Grazioli, M. Di Fraia, M. Moioli, D. Ertel, H. Ahmadi, O. Plekan, P. Finetti, E. Allaria, L. Giannessi, G. De Ninno, C. Spezzani, G. Penco, S. Spampinati, A. Demidovich, M. B. Danailov, R. Borghes, G. Kourousias, C. E. Sanches Dos Reis, F. Billé, A. A. Lutman, R. J. Squibb, R. Feifel, P. Carpeggiani, M. Reduzzi, T. Mazza, M. Meyer, S. Bengtsson, N. Ibrakovic, E. R. Simpson, J. Mauritsson, T. Csizmadia, M. Dumergue, S. Kühn, H. Nandiga Gopalakrishna, D. You, K. Ueda, M. Labeye, J. E. Bækhøj, K. J. Schafer, E. V. Gryzlova, A. N. Grum-Grzhimailo, K. C. Prince, C. Callegari, and G. Sansone, Attosecond pulse shaping using a seeded free-electron laser, Nature 578, 386 (2020).
Maroju et al. [2023] P. K. Maroju, M. Di Fraia, O. Plekan, M. Bonanomi, B. Merzuk, D. Busto, I. Makos, M. Schmoll, R. Shah, P. R. Ribič, L. Giannessi, G. De Ninno, C. Spezzani, G. Penco, A. Demidovich, M. Danailov, M. Coreno, M. Zangrando, A. Simoncig, M. Manfredda, R. J. Squibb, R. Feifel, S. Bengtsson, E. R. Simpson, T. Csizmadia, M. Dumergue, S. Kühn, K. Ueda, J. Li, K. J. Schafer, F. Frassetto, L. Poletto, K. C. Prince, J. Mauritsson, C. Callegari, and G. Sansone, Attosecond coherent control of electronic wave packets in two-colour photoionization using a novel timing tool for seeded free-electron laser, Nature Photonics 17, 200 (2023).
Driver et al. [2024] T. Driver, M. Mountney, J. Wang, L. Ortmann, A. Al-Haddad, N. Berrah, C. Bostedt, E. G. Champenois, L. F. DiMauro, J. Duris, D. Garratt, J. M. Glownia, Z. Guo, D. Haxton, E. Isele, I. Ivanov, J. Ji, A. Kamalov, S. Li, M.-F. Lin, J. P. Marangos, R. Obaid, J. T. O’Neal, P. Rosenberger, N. H. Shivaram, A. L. Wang, P. Walter, T. J. A. Wolf, H. J. Wörner, Z. Zhang, P. H. Bucksbaum, M. F. Kling, A. S. Landsman, R. R. Lucchese, A. Emmanouilidou, A. Marinelli, and J. P. Cryan, Attosecond delays in X-ray molecular ionization, Nature 632, 762 (2024).
Wang et al. [2024] J. Wang, T. Driver, P. L. Franz, P. Kolorenč, E. Thierstein, R. R. Robles, E. Isele, Z. Guo, D. Cesar, O. Alexander, S. Beauvarlet, K. Borne, X. Cheng, L. F. DiMauro, J. Duris, J. M. Glownia, M. Graßl, P. Hockett, M. Hoffman, A. Kamalov, K. A. Larsen, S. Li, X. Li, M.-F. Lin, R. Obaid, P. Rosenberger, P. Walter, T. J. Wolf, J. P. Marangos, M. F. Kling, P. H. Bucksbaum, A. Marinelli, and J. P. Cryan, Probing electronic coherence between core-level vacancies at different atomic sites, Physical Review X (accepted) (2024).
Haynes et al. [2020] D. C. Haynes, M. Wurzer, A. Schletter, A. Al-Haddad, C. Blaga, C. Bostedt, J. Bozek, M. Bucher, A. Camper, S. Carron, R. Coffee, J. T. Costello, L. F. DiMauro, Y. Ding, K. Ferguson, I. Grguraš, W. Helml, M. C. Hoffmann, M. Ilchen, S. Jalas, N. M. Kabachnik, A. K. Kazansky, R. Kienberger, A. R. Maier, T. Maxwell, T. Mazza, M. Meyer, H. Park, J. S. Robinson, C. Roedig, H. Schlarb, R. Singla, F. Tellkamp, K. Zhang, G. Doumy, C. Behrens, and A. L. Cavalieri, Clocking Auger Electrons, arXiv:2003.10398 [physics] (2020).
Kitzler et al. [2002] M. Kitzler, N. Milosevic, A. Scrinzi, F. Krausz, and T. Brabec, Quantum Theory of Attosecond XUV Pulse Measurement by Laser Dressed Photoionization, Physical Review Letters 88, 173904 (2002).
Wolkow [1935] D. M. Wolkow, Über eine Klasse von Lösungen der Diracschen Gleichung, Zeitschrift für Physik 94, 250 (1935).
Li et al. [2018] S. Li, E. G. Champenois, R. Coffee, Z. Guo, K. Hegazy, A. Kamalov, A. Natan, J. O’Neal, T. Osipov, M. Owens, D. Ray, D. Rich, P. Walter, A. Marinelli, and J. P. Cryan, A co-axial velocity map imaging spectrometer for electrons, AIP Advances 8, 115308 (2018).
Glownia et al. [2010] J. M. Glownia, J. Cryan, J. Andreasson, A. Belkacem, N. Berrah, C. I. Blaga, C. Bostedt, J. Bozek, L. F. DiMauro, L. Fang, J. Frisch, O. Gessner, M. Gühr, J. Hajdu, M. P. Hertlein, M. Hoener, G. Huang, O. Kornilov, J. P. Marangos, A. M. March, B. K. McFarland, H. Merdji, V. S. Petrovic, C. Raman, D. Ray, D. A. Reis, M. Trigo, J. L. White, W. White, R. Wilcox, L. Young, R. N. Coffee, and P. H. Bucksbaum, Time-resolved pump-probe experiments at the LCLS, Optics Express 18, 17620 (2010).
Kheifets et al. [2022] A. S. Kheifets, R. Wielian, V. V. Serov, I. A. Ivanov, A. L. Wang, A. Marinelli, and J. P. Cryan, Ionization phase retrieval by angular streaking from random shots of XUV radiation, Physical Review A 106, 033106 (2022).
DeGroot and Schervish [2010] M. H. DeGroot and M. J. Schervish, Probability and Statistics, 4th ed. (Pearson, Upper Saddle River, NJ, 2010) p. 261.
Eckart and Young [1936] C. Eckart and G. Young, The approximation of one matrix by another of lower rank, Psychometrika 1, 211 (1936).
Johnson and Wichern [2007] R. A. Johnson and D. W. Wichern, in Applied Multivariate Statistical Analysis (Pearson, Upper Saddle River, NJ, 2007) 6th ed., pp. 407–410.
Walter et al. [2021] P. Walter, A. Kamalov, A. Gatton, T. Driver, D. Bhogadi, J.-C. Castagna, X. Cheng, H. Shi, R. Obaid, J. Cryan, W. Helml, M. Ilchen, and R. N. Coffee, Multi-resolution electron spectrometer array for future free-electron laser experiments, Journal of Synchrotron Radiation 28, 1364 (2021).
Petkovšek et al. [1996] M. Petkovšek, H. S. Wilf, and D. Zeilberger, A=B (A K Peters, Wellesley, MA, 1996) p. 38.