Quantifying power use in silicon photonic neural networks

Alexander N. Tait, Physical Measurement Laboratory, National Institute of Standards and Technology,
Boulder, CO 80305, USA

Abstract

Due to challenging efficiency limits facing conventional and unconventional electronic architectures, information processors based on photonics have attracted renewed interest. Research communities have yet to settle on definitive techniques to describe the performance of this class of information processors. Photonic systems are different from electronic ones, so the existing concepts of computer performance measurement do not necessarily apply. In this manuscript, we attempt to quantify the power use of photonic neural networks with state-of-the-art and future hardware. We derive scaling laws, physical limits, and new platform performance metrics. We find that overall performance is regime-like, which means that energy efficiency characteristics of a photonic processor can be completely described by no fewer than seven performance numbers. The introduction of these analytical strategies provides a much needed foundation for quantitative roadmapping and commercial value assignment for silicon photonic neural networks.


I Introduction

The computational requirements and, therefore, energy expenditures of machine learning are so staggering that its prevalence is becoming a climate issue Strubell:19. In the pursuit of energy efficiency, massively distributed hardware has been developed Jouppi:17. These pursuits have come to include non-digital signaling Murmann:15a; Jain:20 and post-CMOS platforms Sengupta:16, including photonic platforms Shen:17; Prucnal:17. One of the value propositions of photonic neural networks and vector-matrix multipliers (VMM) is reduced energy use in performing linear operations.

It is essential to have a rigorous understanding of power scaling laws and limits in order to support that proposition. This understanding is foundational to system design and roadmapping, recognizing what the limiting factors are, and prioritizing technological developments that address those factors. In this manuscript, we study the energy consumption and computational efficiency of two silicon photonic neural network architectures Tait:14; Shen:17 described in Fig. 1. We attempt to answer the key questions of how efficient they are in the worst case, how efficient they could be, and which technologies have the greatest impact on getting there.

Energy efficiency of photonic neural networks has been studied in prior works in depth Nahmias:20; Totovic:20. Other works have proposed scaling laws that are only applicable to specific regimes Shen:17; Williamson:20; Tait:17; Hamerly:19. Some of these scaling laws have been extrapolated with gratuitous optimism to make predictions that idealized optical systems can perform MACs for free in the limit of large matrices. We refute these predictions using energy conservation arguments. Large matrices are subject to different scaling laws that dictate that MAC efficiency approach finite values. This finding is true for both architectures analyzed: multiwavelength based on WDM weight banks and coherent based on Mach-Zehnder interferometers. These two architectures, despite distinct theories of operation (Fig. 1), are found to share several identical scaling laws.

Despite what might be seen as pessimistic performance predictions (relative to past work), we find that photonic neural networks and VMMs can be highly competitive compared to state-of-the-art electronics. That being said, improving energy efficiency is not the only value proposition of photonic neural networks. Their bandwidth and latency can enable new real-time applications that are unaddressable by foreseeable electronic processors. The scope of this manuscript is limited to power use; this restriction is not intended to minimize the pivotal role of quantitative studies of bandwidth, latency, and real-time applications.

This manuscript takes a strategy of identifying invariant quantities that grant insight into the interplay of dominant power contributors. Total power use can be described as a sum of terms of the form $P\propto EN^{x}f^{y}z^{B}$ . $N$ is number of channels, $f$ is bandwidth, and $B$ is resolution in bits – these are termed functional parameters. Each term represents one power contributor. Power use ( $P$ ) is described by scaling parameters ( $x$ , $y$ , $z$ ) and an energetic scaling coefficient ( $E$ ) that is invariant for each power contributor. The power contributors dominate in different regimes of ( $N$ , $f$ ). By deconstructing these regimes, this approach provides more detailed insight than approaches that simply calculate the overall power or approaches that do not account for all of the contributors.

The novel aspects of this manuscript can be organized by section. Section II includes the first quantification of expected power to counteract fabrication variation in a square matrix of microring resonator (MRR) weights. We propose a path to reduce this power by 4 orders-of-magnitude by pairing two technologies. Sec. III derives resolution-determined scaling laws and closed-form expressions of their energetic scaling coefficients. Prior works have studied physical sources of noise in a single neuron Lima:20 and network scaling behavior in an abstract sense Semenova:19. To the author’s knowledge, this is the first physics-based derivation of resolution concepts in any type of multi-channel photonic information hardware Bogaerts:20. Previously unrealized insights about noise in photonic information processing result, including a physical limit on MAC efficiency and a hard limit on bandwidth.

Sec. LABEL:sec:gain-limited analyzes gain-determined power as set by cascadability and/or digitization requirements. Arguments based only on energy conservation and functional generality result in a scaling law that refutes free-lunch notions presented in numerous prior works. Sec. LABEL:sec:optoelectronic-switching examines O/E/O transduction in analog neurons and proposes a role for photoelectric amplifiers. Scaling laws and energetic coefficients relating to noise, gain, and detection are found to be nearly identical between multiwavelength MRR-based (Fig. 1b) and coherent MZI-based (Fig. 1c) architectures for both photonic neural networks and VMMs. They are described in terms of the same set of physical variables. These similarities are surprising because of the fundamental differences in how the architectures employ different properties of light. Section LABEL:sec:summary compares all of these contributors in terms of dominant regimes, in the process providing a roadmap for device technologies and a walkthrough of thought processes for future system design.

Figure 1: Silicon photonic neural network architectures. Optical pumps and electrical inputs/outputs are shown. a) A vector-matrix operation central to neural interconnects. The weight matrix, $W$ , can be broken down into elements, $w_{ij}$ , or into two unitary ( $U$ , $V^{*}$ ) and one diagonal ( $\Sigma$ ) matrices. b) A multiwavelength broadcast-and-weight network Tait:14. Laser pumps are wavelength-division multiplexed (WDM). Each wavelength ( $\lambda_{1}\ldots\lambda_{N}$ ) is modulated by one electrical input, and all are broadcast. Microring weights in a grid layout each represent one element of the weight matrix. Balanced photodiodes detect the sum of each row. MRR color corresponds to the wavelength it acts upon. c) A silicon coherent nanophotonic network Shen:17. One laser pump provides power to every input beam. Each beam is modulated by one input. Tunable MZI meshes implement the unitary transforms. The singular, diagonal matrix corresponds to an element-wise multiplication, which is implemented by an array of attenuators. The optical output is converted back to the electrical domain. Depending on use case, the output can cascade to another layer (feedforward network, shown in dashed boxes), connect back to the original inputs (recurrent network), or merely be digitized and used elsewhere (vector-matrix multiplier). To function, both require some amount of optical power: $P_{N\text{pumps}}$ . Low-bandwidth electronics for weight configuration are not shown. MRR: microring resonator; MZI: Mach-Zehnder interferometer.

II Weight Control

Photonic weighted addition schemes depend on the ability to configure the transmission state of passive elements. All proposals for programming weights so far employ thermooptic tuning to achieve the necessary phase shifts. The required power breaks down into a static and a configurable component and is proportional to the number of weights.

$$P_{wei} = N^{2}\cdot\left(P_{lock}+P_{conf}\right)$$

The static component is needed to lock MRR weights onto their resonances, counteracting fabrication variations. The configurable component is the power needed to tune the MRR on and off resonance in order to program the desired weight value. When MRR modulator neurons are used, these same principles apply to the neurons. We will leave out this contributor below because it scales linearly with number of neurons, so it will not contribute as much as weights.

II.1 Weight locking power

Weight locking power is the electrical power needed to bias a weight. MZI architectures do not need biasing because MZIs are wavelength independent. MRRs, on the other hand, must be held close to the on-resonance condition with a WDM carrier. Fabrication nonidealities result in a wide variability in fabricated resonant wavelength. The only tuning effects strong enough to counteract this variability are thermal. Thermal locking dissipates a large and static amount of heat on chip. In Ref. Narayana:17jrnl, it was found that, in some operating regimes of a photonic network-on-chip, static MRR heating accounts for up to 80% of total power. Static locking can dominate in simple communication links, such as in Ref. Zheng:14 (80% of total), but not always Timurdogan:14 (10^-4–23% depending on temperature).

Over the chip area, a given resonance can vary more than an FSR from fabrication target, given current fabrication abilities. At most one FSR of tuning range is needed to put one resonance onto a given wavelength target. The standard deviation of resonance offset is correlated with distance, such that nearby MRRs are likely to vary relatively little from one another as compared to their absolute variation. For resonators spaced by $r$ [mm], total standard deviation is

$$\sigma_{[FSR]}(r) = \sigma_{[FSR]}^{(0)} + \sigma_{[FSR]}^{(1)}\, r$$

where the subscript $[FSR]$ means in free spectral range units. Reference Chrostowski:14 measured these parameters for the IME A*STAR process to be $\sigma_{[FSR]}^{(0)}=0.050$ and $\sigma_{[FSR]}^{(1)}=0.060$ mm^-1 with the FSR at 7 nm in that work. Variances can also be stated in wavelength units (denoted with subscript $[\lambda]$ ) by multiplying by the FSR in wavelength units.

We introduce a term $\Omega$ to indicate the expected value of resonant shift per MRR needed to bring a square array of MRRs onto resonance.

$$\begin{aligned} \Omega(N) &= \min\left[\sigma_{[FSR]}(Nd),\ \tfrac{1}{2}\right] \\ P_{lock} &= K\,\Omega(N) \end{aligned}$$

where $d$ is the MRR pitch, and $Nd$ is the side length of the square MRR array. $\Omega$ is in FSR units. $K$ is the tuning efficiency in mW per FSR units. The MRR pitch, $d$ , is taken here to be 20 $\mu$ m.
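As a rough numerical illustration of the locking-power expressions above, the following Python sketch evaluates $\Omega(N)$ and $P_{lock}$ for a few array sizes. It assumes the cap inside the minimum is one half of an FSR, uses the variability values quoted from Chrostowski:14 and the 20 $\mu$ m pitch, and takes the embedded-heater efficiency of 28 mW/FSR from Table 1 as the default. The function names are hypothetical and chosen only for this example.

```python
def sigma_fsr(r_mm, sigma0=0.050, sigma1=0.060):
    """Std. dev. of resonance offset (FSR units) for MRRs spaced r_mm apart."""
    return sigma0 + sigma1 * r_mm


def omega(n, pitch_um=20.0, sigma0=0.050, sigma1=0.060, cap=0.5):
    """Expected resonant shift per MRR (FSR units) in an n x n array.

    The cap of 0.5 FSR is an assumption of this sketch: a resonance never
    needs to move more than half an FSR to reach the nearest target.
    """
    side_mm = n * pitch_um * 1e-3  # side length N*d of the square array
    return min(sigma_fsr(side_mm, sigma0, sigma1), cap)


def p_lock_mw(n, k_mw_per_fsr=28.0, **omega_kwargs):
    """Expected locking power per weight in mW: P_lock = K * Omega(N)."""
    return k_mw_per_fsr * omega(n, **omega_kwargs)


for n in (10, 100, 1000):
    print(f"N={n:5d}  Omega={omega(n):.3f} FSR  P_lock={p_lock_mw(n):.2f} mW")
```

With these inputs, the expected shift saturates at half an FSR once the array side length exceeds a few millimeters, so larger arrays pay the full per-weight locking cost.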

II.2 Weight configuration power

Weight configuration power is used to program the weight value. Applying heat tunes the MRR from on-resonance (weight –1) to slightly off-resonance (weight +1). Supposing that tuning over a full-width half maximum (FWHM) is required, this power is

$$P_{conf} = \frac{K}{2\mathcal{F}}$$

where $\mathcal{F}$ is finesse. We state $K$ in FSR units, so finesse converts it to FWHM units. The factor of 2 results from averaging over the range of possible states from on-resonance to off-resonance by one FWHM. We can approximate finesse as roughly 100 for typical silicon MRRs, although optimized traveling wave resonators have achieved finesse up to 1140 Soltani:10. The vertical junction depletion modulators discussed below had a finesse of 277. Typical values for $K$ are given in Table 1.

The MRR resonance has a sharp wavelength dependence, meaning that – once locked – there is a small incremental power needed to configure the weight. If we are unable to control where the resonance falls as fabricated, then locking power will be greater than configuration power by a factor of the finesse.
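The balance between locking and configuration power can be checked with a short Python sketch. It assumes $P_{conf}=K/(2\mathcal{F})$ and $P_{lock}=K\Omega$ with the worst-case $\Omega=1/2$ FSR (uncontrolled resonance placement), and combines them into $P_{wei}=N^{2}(P_{lock}+P_{conf})$. The default numbers (28 mW/FSR, finesse 100) are taken from the text and Table 1; the function names are illustrative only.

```python
def p_conf_mw(k_mw_per_fsr, finesse):
    """Average configuration power per weight (mW): K / (2 * finesse)."""
    return k_mw_per_fsr / (2.0 * finesse)


def p_lock_mw(k_mw_per_fsr, omega_fsr):
    """Locking power per weight (mW): K * Omega."""
    return k_mw_per_fsr * omega_fsr


def p_weights_mw(n, k_mw_per_fsr, omega_fsr, finesse):
    """Total weight control power (mW) for an n x n weight matrix."""
    per_weight = p_lock_mw(k_mw_per_fsr, omega_fsr) + p_conf_mw(k_mw_per_fsr, finesse)
    return n ** 2 * per_weight


K, FINESSE, OMEGA_WORST = 28.0, 100.0, 0.5
lock = p_lock_mw(K, OMEGA_WORST)
conf = p_conf_mw(K, FINESSE)
print(f"P_lock = {lock:.2f} mW, P_conf = {conf:.2f} mW, ratio = {lock / conf:.0f}")
print(f"P_wei for N=100: {p_weights_mw(100, K, OMEGA_WORST, FINESSE) / 1e3:.1f} W")
```

In this worst case the ratio of locking to configuration power equals the finesse, consistent with the statement above.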

MZI weight configuration

The opposite power balance is found in MZI mesh architectures. MZIs are less sensitive to fabrication variation, and they are correspondingly less sensitive to desirable tuning effects. MZI tuning power is quantified by $\pi$ -power, $P_{\pi}$ : the power needed for a thermal phase shifter to impart an optical phase shift of $\pi$ . In the above architecture from Shen:17, there are four phase shifters per matrix element whose average expected power is halfway between minimum phase ( $0$ ) and maximum phase ( $\pi$ ) states.

$$P_{conf,MZI} = 2\,P_{\pi,MZI}$$

MZI thermal $\pi$ -powers are on the order of 10 mW Annoni:17. Prior work on MZI meshes has calculated system power to be $\sim 1\,\text{mW}\cdot N$ Shen:17 or $\sim 100\,\text{mW}\cdot N$ Williamson:20, but those estimates neglected weight configuration power, which would severely dominate at $10\,\text{mW}\cdot N^{2}$ .
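To make the comparison between linear and quadratic scaling concrete, the sketch below evaluates the MZI configuration power for an $N\times N$ mesh, assuming four phase shifters per matrix element each drawing $P_{\pi}/2$ on average, and compares it with the linear-in-$N$ system estimates quoted above. The function names and the choice of $N=100$ are illustrative assumptions.

```python
def mzi_conf_power_mw(n, p_pi_mw=10.0, shifters_per_element=4):
    """Total MZI weight configuration power (mW) for an n x n matrix.

    Each shifter is assumed to draw P_pi / 2 on average (halfway between
    the 0 and pi phase states), giving 2 * P_pi per matrix element.
    """
    per_element_mw = shifters_per_element * p_pi_mw / 2.0
    return n ** 2 * per_element_mw


def linear_estimate_mw(n, per_channel_mw):
    """System power estimate (mW) that scales only linearly with n."""
    return per_channel_mw * n


n = 100
print(f"Thermal configuration power: {mzi_conf_power_mw(n) / 1e3:.0f} W")
print(f"Linear estimate at 1 mW/channel: {linear_estimate_mw(n, 1.0) / 1e3:.1f} W")
print(f"Linear estimate at 100 mW/channel: {linear_estimate_mw(n, 100.0) / 1e3:.0f} W")
# A 100 nW/pi shifter (Table 1) is 1e-4 mW/pi.
print(f"Configuration power at 100 nW/pi: {mzi_conf_power_mw(n, 1e-4):.1f} mW")
```

Under these assumptions, thermal configuration power exceeds both linear estimates at $N=100$, while a 100 nW/$\pi$ shifter would bring it down to the milliwatt level.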

MZI meshes do not need static weight locking power; however, MZI configuration power and MRR locking power are both in the mW range. They stem from different needs. The first is due to an essential need to program the weights, and the second is due to fabrication non-ideality. This means that MZI meshes have a fundamental need for strong tuning effects, while MRR weights can reduce tuning power by addressing fabricated resonance variability.

II.3 Weight reconfiguration energy

Weight reconfiguration energy is additional energy needed to change weights on a fast timescale, which is distinct from weight tuning. Typically, fast tuning requires non-thermal tuning, such as depletion modulators that do not draw continuous power. The associated reconfiguration power is

$$P_{reconfig} = N^{2}\, f_{reconfig}\cdot E_{reconfig}$$

where $E_{reconfig}$ is the same as the energy-per-bit value when considering each tuning element as a modulator. The reconfiguration rate, $f_{reconfig}$ , is an expected, averaged rate that is less than the maximum reconfiguration bandwidth. $f_{reconfig}$ is highly application dependent.

In terms of technology, larger tuning elements usually have higher capacitance and consequently higher reconfiguration energy. At the same time, those devices with higher capacitances – or, in the case of MEMS, mechanical time constants – have lower maximum switching frequencies. The higher switching energies and longer switching times of large devices have opposing effects on this power contribution.

In many neural networks, the reconfiguration of weights happens at much slower timescales than signal timescales. In those cases, the power needed to change weights can generally be neglected. In other applications, such as general VMMs, weights must change at timescales similar to the signals. Since it is application-dependent, we largely leave reconfiguration energy out of the rest of the analysis.

II.4 Foreseeable technology

Resonator locking metrics are listed in Table 1. We identify critical technologies that impact the locking power and configuration power: trench isolation for heaters Dong:10opex; Cunningham:10, photonic microelectromechanical systems (MEMS) ErrandoHerranz:20, resonator variability reduction Alipour:15, and/or interleaved junction modulators Timurdogan:13; Timurdogan:14. Thermal isolation trenches simply improve the thermal tuning efficiency by one order of magnitude. Several more orders of magnitude reduction could be realized with low-power, non-thermal tuning effects.

Tuning power can be reduced by approximately five orders-of-magnitude using one of two approaches. One – applicable only to MZIs – is using MEMS tuning. The MEMS approach would lead to weights that can be changed on the 100 kHz–10 MHz scale and would require a wet undercut etch sometime before metallization steps ErrandoHerranz:20. The other approach – applicable only to MRRs – is to combine low-power tuning with a variability reduction technique. This approach, introduced in Fig. 2, could be favorable because weak tuning devices are highly developed and already present on mainstream silicon photonic platforms.

Both MZIs and MRRs benefit from strong, low-power tuning effects. Low-power tuning technologies can be distinguished based on whether they provide a complete tuning range – $\geq\pi$ phase shift for MZIs, or $\geq$ FSR for MRRs. We will refer to them as strong vs. weak effects. On mainstream silicon photonics platforms, thermal tuning is the only strong effect; depletion-mode tuning with a lateral diode junction is a weak effect, not able to cover a complete tuning range. Barium titanate (BTO) is another promising candidate for ultralow-power Eltes:19 phase tuning despite the exotic processes needed to integrate its crystalline form with silicon photonics. BTO tuners are typically longer than thermal tuners, increasing propagation loss per neuron; however, recent work has shown that a 220 $\mu$ m shifter could be possible Abel:19 ( $V_{\pi}L/\Delta V=4.5/20=0.22$ mm). Strong, low-power tuning can also be achieved with MEMS ErrandoHerranz:20, waveguide structures that are suspended in air. MEMS phase shifters are released by a wet underetch. Although they are short and therefore low-loss, their drive mechanisms take up significant area that cannot be used for waveguide and metal routing. MEMS mechanical responses are faster than thermal at around 100 kHz–10 MHz ErrandoHerranz:20.

To illustrate the calculation of weak tuning threshold, we can consider variability values obtained by Alipour et al. Alipour:15 and tuning range values for a vertical junction microdisk obtained by Timurdogan et al. Timurdogan:14. The devices had similar geometries, fundamental modes, and FSRs, making them easier to compare and potentially compatible. In Timurdogan:14, a 1.1 V bias resulted in a 270 pm resonance shift and 0.7 $\mu$ A leakage current. Given the radius of 2.4 $\mu$ m, this leads to an extrapolated FSR efficiency of $K=0.13$ mW/FSR. The device survived a 680 pm shift, but efficiency degraded due to reverse leakage current. In Alipour:15, the microtoroids also had an FSR of 45 nm. Post-fabrication trimming (i.e., permanent parameter changes applied after non-ideal devices are fabricated) reduced their initial resonance standard deviation of 290 pm to 25 pm. Furthermore, post-fabrication trimming removes any spatial correlation represented by $\sigma^{(1)}(r)$ . The conclusion is that this variation reduction technique crossed a threshold; it makes this weak tuning device viable for locking. The net result would be a five orders-of-magnitude reduction in expected locking power compared to thermal tuning. An important direction for device research will be demonstrating variability reduction together with depletion modulation in resonators.
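The extrapolated tuning efficiency quoted above can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: electrical power is taken as bias voltage times leakage current, the shift is extrapolated linearly to a full FSR, and the FSR of the microdisk is taken to be 45 nm (inferred from the statement that the microtoroids "also" had that FSR). The function name is hypothetical.

```python
def tuning_efficiency_mw_per_fsr(bias_v, leakage_a, shift_m, fsr_m):
    """Linearly extrapolate electrical tuning power to one full FSR (mW/FSR)."""
    power_w = bias_v * leakage_a           # static electrical power at this bias
    shifts_per_fsr = fsr_m / shift_m       # how many such shifts span one FSR
    return power_w * shifts_per_fsr * 1e3  # convert W to mW


k_depletion = tuning_efficiency_mw_per_fsr(
    bias_v=1.1,        # V, from the text
    leakage_a=0.7e-6,  # A, from the text
    shift_m=270e-12,   # m, from the text
    fsr_m=45e-9,       # m, assumed equal to the microtoroid FSR
)
print(f"K = {k_depletion:.2f} mW/FSR")
```

The result rounds to the 0.13 mW/FSR entry listed for vertical junction depletion tuning in Table 1.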

Table 1: Platform values for weight configuration

Name	Variable	Value	Description
Variation	$\sigma_{[FSR]}^{(0)}$	0.050	Standard deviation of resonance offset between MRRs spaced close together (FSR-units) Chrostowski:14
Variation	$\sigma_{[FSR]}^{(0)}$	0.0055	Reduced MRR variability using trimming Alipour:15
Covariation	$\sigma_{[FSR]}^{(1)}$	0.060 mm^-1	Distance dependence of standard deviation resonance offset (FSR-units) Chrostowski:14
Covariation	$\sigma_{[FSR]}^{(1)}$	0 mm^-1	Reduced MRR variability using trimming Alipour:15
MRR tuning efficiency	$K$	28 mW/FSR	Embedded N-doped heater Jayatilleka:15opex
		2.4 mW/FSR	Trench etched Dong:10opex; Cunningham:10
		0.13 mW/FSR	Vertical junction depletion Timurdogan:14
MZI tuning efficiency	$P_{\pi}$	10 mW/ $\pi$	Baseline thermal phase shifter Khanna:15
		1.2 mW/ $\pi$	Trench etched Dong:10opex
		100 nW/ $\pi$	Barium titanate Eltes:19
		$<$ 100 nW/ $\pi$	MEMS phase shifter Edinger:19

III Signal Resolution

In this section, we consider the laser pump power needed to achieve a certain signal frequency, $f$ , and resolution, $B$ , in effective number of bits. The signal frequency means the bandwidth of the waveform modulating optical power envelopes, which encodes the values of analog variables. We build upon analog photonic link theory from Marpaung:09 to derive analytical power use expressions and extend them to multiple channels. The analysis – besides relative intensity noise – applies identically to multiwavelength and coherent architectures because they are both based on power-encoded signals.

III.1 Analog photonic links

Further analysis rests on an understanding of laser power, resolution, gain, and nonlinearity in a single analog photonic link (APL) consisting of a modulator connected to a detector. Every photonic processor, including $N\times N$ weight matrices, has an electrical input to a modulator, a linear subsystem, and an electrical output from a photodetector. This analysis was performed by Marpaung in Ref. Marpaung:09. Since it is foundational, basic APL theory is rederived in Appendix LABEL:sec:analog_photonic_links.

A key feature of APLs is the existence of operating regimes where different sources of noise dominate the signal-to-noise ratio, or, more precisely, the spurious-free dynamic range (SFDR). In the low power regime, thermal noise originating in the photodetector dominates because the received signal is weak. At the highest powers, relative intensity noise (RIN) from the laser dominates. These regimes and the net SFDR are plotted in Fig. LABEL:fig:sfdr_withAPD, derived in the appendix. Whereas Marpaung sought to maximize SFDR performance in a single-channel link, here, we are interested in energy efficiency of a multi-channel link. Subsequently, we extend APL theory to multi-channel architectures.

In digital systems, each added bit increases resolution in proportion, so the required energies are a polynomial function of bit resolution. On the other hand, the power needed to generate an analog signal scales exponentially with its bit resolution Bankman:15. In other words, the endeavor for energy efficiency rapidly becomes futile around 6 bits and then strictly non-viable around 10 bits. Striking this balance, we focus on the 2-6 bit regime of APLs. For reference, the TrueNorth neuromorphic electronic processor uses 4-bit weights Akopyan:15.

III.2 Single channel regimes

Appendix LABEL:sec:analog_photonic_links rederives SFDR as a function of optical pump power subject to three sources of noise: thermal, shot, and relative intensity. Each noise component is critical to consider because they scale differently and therefore dominate in different operating regimes. Here, we extend this analysis to arrive at coefficients relating pump power to frequency and effective bits. Typical values of these coefficients are found in Table 2.

Analog signal resolution is stated in terms of spurious-free dynamic range (SFDR), which, roughly speaking, is the ratio of maximum to minimum resolvable signals. Unlike for digital signals, analog resolution does not depend on the number of wires or serial bit slots; however, analog signal resolution can be stated in terms of an effective number of bits corresponding to an equivalent digital signal. The conversion of SFDR spectral density to bits is (from Eq. (III.2) and Eq. (LABEL:eq:sfdr-spectral-density))

$$\begin{aligned} B\ [\text{bits}] &= \frac{1}{10\log 2}\cdot\frac{\text{SFDR}\ [\text{dB}] - 10\log(3/2)}{2} \\ &= \frac{\text{SFDR} - 1.76}{6.02} \end{aligned}$$

where variables are defined in Appendix LABEL:sec:analog_photonic_links. The first term converts base 10 to base 2, and the factor of 2 comes from the fact that SFDR is an electrical power and resolution is measured in terms of voltage. The 1.76 arises due to fundamental quantization error. The SFDR from system parameters is derived in the Appendix. Here, we convert SFDR to effective bits and connect it to link power.
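The conversion between SFDR and effective bits is simple enough to sketch in Python. The snippet below assumes the relation $B=(\text{SFDR}-10\log(3/2))/(20\log 2)$, with SFDR given in dB over the signal bandwidth (not as a spectral density). The function names are illustrative.

```python
import math

QUANTIZATION_DB = 10 * math.log10(3 / 2)  # about 1.76 dB
DB_PER_BIT = 20 * math.log10(2)           # about 6.02 dB per bit


def sfdr_db_to_bits(sfdr_db):
    """Effective number of bits for a given SFDR in dB."""
    return (sfdr_db - QUANTIZATION_DB) / DB_PER_BIT


def bits_to_sfdr_db(bits):
    """SFDR in dB needed to support a given effective number of bits."""
    return DB_PER_BIT * bits + QUANTIZATION_DB


for b in (2, 4, 6, 8):
    print(f"{b} bits -> {bits_to_sfdr_db(b):.2f} dB")
```

For example, a 4-bit signal corresponds to roughly 25.8 dB of SFDR under this relation.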

Thermal regime

Thermal noise is due to the random motions of electrons in the receiver circuitry. From Eq. (LABEL:eq:sfdr_thermal), the SFDR in the thermal regime is

$$\text{SFDR}\ [\text{dB Hz}^{2/3}] = \frac{2}{3}\left[20\log P_{1\text{pump}} + 10\log\left(\frac{R_{b}}{k_{b}T}\,\frac{\eta_{net}^{2}M^{2}R_{PD}^{2}}{4}\right)\right]$$

where variables are defined in the Appendix. This equation represents a ratio of signal to noise per unit of spectrum. Combining this thermal SFDR equation with the effective bits equation, Eq. (III.2), results in an expression of pump power needed for a given bit value

$$10\log P_{1\text{pump}} = \frac{30}{2}\log 2\cdot B + \frac{30}{4}\log(3/2) + \frac{10\log f}{2} - \frac{10}{2}\log\left(\frac{R_{b}}{k_{b}T}\,\frac{\eta_{net}^{2}M^{2}R_{PD}^{2}}{4}\right)$$

In linear units, this one-channel power expression is

$$\begin{aligned} P_{1\text{pump}}(B,f) &= \sqrt{f}\cdot\frac{J^{*}(B,R_{b})}{\eta_{net}} \\ \text{where}\quad J^{*}(B,R_{b}) &\equiv 2^{\frac{3}{2}B}\left(\frac{3}{2}\right)^{\frac{3}{4}}\sqrt{\frac{4k_{b}T}{R_{b}}}\,\frac{1}{M R_{PD}} \end{aligned}$$

where we have introduced a new term, $J^{*}(B,R_{b})$ , that links power, frequency, and resolution in the thermal noise (a.k.a. Johnson-Nyquist noise) regime for a particular receiver impedance. The link loss, $\eta_{net}$ , is separated because it will later become a function of network size. $J^{*}$ has units of energy-per-root-frequency.

The impedance, $R_{b}$ , is an argument because it is a free design parameter. It can be designed to take on a wide range of resistance values, although its value is fixed at fabrication for a particular chip. When a network is meant to operate at a particular bandwidth, $f$ , there is an optimal design of the junction impedance such that it allows no more than the required signal bandwidth: $R_{b}=(2\pi fC_{pd})^{-1}$ . The capacitance is not a free design parameter, rather a circuit parasitic determined by the layer thicknesses and device sizes available on a particular fabrication platform. This means there is an additional relation for the optimal design:

$$\begin{aligned} E_{thrm}(B) &\equiv \left.\frac{J^{*}(B,R_{b})}{\sqrt{f}}\right|_{R_{b}=(2\pi fC_{pd})^{-1}} \\ &= 2^{\frac{3}{2}B}\left(\frac{3}{2}\right)^{\frac{3}{4}}\frac{\sqrt{8\pi k_{b}TC_{pd}}}{M R_{PD}} \end{aligned}$$

where we have introduced a new term, $E_{thrm}(B)$ , describing the laser pump power needed to support an APL of a given frequency and resolution, supposing an optimal receiver design. Like $J^{*}$ , this term links power, frequency, and resolution in the thermal regime; unlike $J^{*}$ , it has units of energy – hence our choice of the variable $E$ – resulting in an intuitive power relation:

$$P_{1\text{pump}}(B,f) = f\cdot\frac{E_{thrm}(B)}{\eta_{net}}$$

where $P_{1\text{pump}}$ is the pump laser power for a single APL, and $\eta_{net}$ is the transmission efficiency of the APL.

This equation has several notable features that will carry through to the multi-channel system. Firstly, power scales exponentially with number of bits, which is characteristic of analog signaling. Strikingly, the power-resolution scaling rate of $15\log 2=4.5$ dB/bit is less than that of any analog electrical link: $20\log 2=6.0$ dB/bit. This difference is explained by the fact that output electrical signal power is the square of the received photocurrent and thus optical pump power. The quadratic relation between signal power and supply power also explains the square-root dependence on bandwidth in Eq. (III.2). For a fixed receiver resistance (Eq. (III.2)), the photonic system transmits more information per Joule as its bandwidth increases; however, for a resistance that varies optimally with operating bandwidth (Eq. (III.2)), the information per Joule does not vary. Finally, the APD gain, $M$ , plays a prominent role. We will discuss APDs as a key technology below.
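The thermal-regime coefficients can be evaluated numerically as a sanity check. The Python sketch below assumes the closed forms $J^{*}=2^{3B/2}(3/2)^{3/4}\sqrt{4k_{b}T/R_{b}}/(MR_{PD})$ and $E_{thrm}=2^{3B/2}(3/2)^{3/4}\sqrt{8\pi k_{b}TC_{pd}}/(MR_{PD})$, with the parameter values listed for Table 2 ($R_{PD}=0.8$ A/W, $C_{pd}=35$ fF, $M=1$, $T=300$ K). Function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def j_star(bits, r_b, temp=300.0, gain=1.0, r_pd=0.8):
    """Thermal coefficient J* (W per root-Hz) for a fixed receiver resistance r_b."""
    scale = 2 ** (1.5 * bits) * 1.5 ** 0.75
    return scale * math.sqrt(4 * K_B * temp / r_b) / (gain * r_pd)


def e_thrm(bits, c_pd=35e-15, temp=300.0, gain=1.0, r_pd=0.8):
    """Thermal coefficient E_thrm (J) for an optimally designed receiver."""
    scale = 2 ** (1.5 * bits) * 1.5 ** 0.75
    return scale * math.sqrt(8 * math.pi * K_B * temp * c_pd) / (gain * r_pd)


def p_pump_thermal(bits, f_hz, eta_net=1.0):
    """Single-link pump power (W) in the thermal regime with optimal R_b."""
    return f_hz * e_thrm(bits) / eta_net


for b in (2, 4, 6, 8):
    print(f"B={b}: J*(50 ohm) = {j_star(b, 50.0) * 1e9:.2f} nW/rtHz, "
          f"E_thrm = {e_thrm(b) * 1e15:.2f} fJ")
```

Each added bit multiplies both coefficients by $2^{3/2}\approx 2.83$, which is the 4.5 dB/bit scaling noted above.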

Shot noise regime

Shot noise is due to the randomness in the detection times of quantized photons. From Eq. (LABEL:eq:sfdr_shot), the SFDR in the shot noise regime is

$$\text{SFDR}\ [\text{dB Hz}^{2/3}] = \frac{2}{3}\left[10\log P_{1\text{pump}} + 10\log\left(\frac{\eta_{net}R_{PD}}{q F_{A}}\right)\right]$$

Combining again with Eq. (III.2), we arrive at power needed for a given bit value,

$$10\log P_{1\text{pump}} = 30\log 2\cdot B + \frac{30}{2}\log(3/2) + 10\log f - 10\log\left(\frac{\eta_{net}R_{PD}}{q F_{A}}\right)$$

In linear units, the equation is

$$\begin{aligned} P_{1\text{pump}}(B,f) &= f\cdot\frac{E_{shot}(B)}{\eta_{net}} \\ E_{shot}(B) &\equiv 2^{3B}\left(\frac{3}{2}\right)^{\frac{3}{2}}\frac{q F_{A}}{R_{PD}} \end{aligned}$$

where we have introduced a new term, $E_{shot}$ , that links power, frequency, and resolution in the shot noise regime. [Footnote 1: $E_{shot}(B)$ is always a function of bits, but we will sometimes drop the argument for brevity, referring to it as $E_{shot}$ . The same goes for $J^{*}$ , $E_{thrm}$ , and $F_{RIN}$ .] All of these terms have a physical limit since $F_{A}$ is strictly greater than one, $\eta_{net}$ is strictly less than one, and $R_{PD}$ is strictly less than $\lambda q/(hc)$ , which is approximately 1.25 A/W at 1550 nm.

Like in the thermal regime, received signal power increases with optical pump power squared, but, now, the noise component also increases with optical power. The result is a strong resolution scaling of $30\log 2=9.0$ dB/bit instead of $20\log 2=6.0$ dB/bit in analog electronics. Another notable feature of the expression for $E_{shot}$ is that APD gain does not appear explicitly. The excess noise, $F_{A}$ , increases with $M$ meaning that APDs strictly increase the power needed to achieve a given resolution in the shot noise regime.
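The shot-noise coefficient can be evaluated in the same spirit. The sketch below assumes $E_{shot}=2^{3B}(3/2)^{3/2}\,qF_{A}/R_{PD}$ and compares a typical responsivity of 0.8 A/W with the ideal responsivity $\lambda q/(hc)$ at 1550 nm, taking $F_{A}=1$ in both cases. Function names are illustrative.

```python
Q_E = 1.602176634e-19   # elementary charge, C
H = 6.62607015e-34      # Planck constant, J*s
C_LIGHT = 299792458.0   # speed of light, m/s


def ideal_responsivity(wavelength_m=1550e-9):
    """Responsivity (A/W) of a detector with unity quantum efficiency."""
    return wavelength_m * Q_E / (H * C_LIGHT)


def e_shot(bits, r_pd=0.8, excess_noise=1.0):
    """Shot-noise coefficient E_shot (J) for a given resolution."""
    return 2 ** (3 * bits) * 1.5 ** 1.5 * Q_E * excess_noise / r_pd


def p_pump_shot(bits, f_hz, eta_net=1.0, **kwargs):
    """Single-link pump power (W) in the shot-noise regime."""
    return f_hz * e_shot(bits, **kwargs) / eta_net


r_max = ideal_responsivity()
print(f"Ideal responsivity at 1550 nm: {r_max:.3f} A/W")
print(f"E_shot(4 bits), typical: {e_shot(4) * 1e15:.2f} fJ")
print(f"E_shot(4 bits), limit:   {e_shot(4, r_pd=r_max) * 1e15:.2f} fJ")
print(f"Pump power at 1 GHz, limit: {p_pump_shot(4, 1e9, r_pd=r_max) * 1e6:.2f} uW")
```

Each added bit multiplies $E_{shot}$ by 8, which is the 9.0 dB/bit scaling discussed above.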

RIN regime

RIN is due to random changes in the power output from carrier lasers. For the relative intensity noise (RIN) relation, we combine Eq. (III.2) and Eq. (LABEL:eq:sfdr_rin):

$$20\log 2\cdot B + 10\log(3/2) = \frac{2}{3}\left[-\text{RIN} - 10\log F_{A} + 10\log 4 - 10\log f\right]$$

There is no power in this expression, so we rearrange in terms of frequency. This is the maximum frequency that can be obtained at a given bit value.

$$10\log f \leq -30\log 2\cdot B - \frac{30}{2}\log(3/2) - \text{RIN} - 10\log F_{A} + 10\log 4$$

In linear units, it is

$$\begin{aligned} f &\leq F_{\text{RIN}}(B) \\ F_{\text{RIN}}(B) &\equiv 2^{-3B}\left(\frac{2}{3}\right)^{\frac{3}{2}}\frac{4}{F_{A}}\,10^{-\frac{\text{RIN}}{10}} \end{aligned}$$

where we have defined a term $F_{\text{RIN}}(B)$ , the maximum viable bandwidth of an APL of a given resolution. Using typical values ( $M=1$ , $F_{A}=1$ , and $RIN=-155$ dB/Hz), that maximum bandwidth is approximately: $F_{\text{RIN}}(B)\approx 2^{-3B}\ \ 6.9\times 10^{15}\ \text{Hz}$ .

RIN imposes a hard limit on the ability of laser light to represent analog signals. This limit applies regardless of how signals are generated or detected, or how powerful the laser is. The resolution limit is 7.5 effective bits at 1 GHz and 5.3 bits at 100 GHz. Stated as a bandwidth limit: at 4 bits, the maximum frequency is 1.7 THz; at 6 bits, it is 26 GHz; at 8 bits, it is 410 MHz.
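The RIN-limited bandwidth and resolution figures quoted above can be checked with a short Python sketch. It assumes $F_{\text{RIN}}(B)=2^{-3B}(2/3)^{3/2}(4/F_{A})\,10^{-\text{RIN}/10}$ with RIN of $-155$ dB/Hz and $F_{A}=1$, and inverts the same expression to obtain the maximum resolution at a given bandwidth. Function names are hypothetical.

```python
import math


def f_rin(bits, rin_db_hz=-155.0, excess_noise=1.0):
    """Maximum signal bandwidth (Hz) permitted by laser RIN at a given resolution."""
    prefactor = (2 / 3) ** 1.5 * 4 / excess_noise * 10 ** (-rin_db_hz / 10)
    return 2 ** (-3 * bits) * prefactor


def max_bits(f_hz, rin_db_hz=-155.0, excess_noise=1.0):
    """Maximum effective bits permitted by laser RIN at a given bandwidth."""
    prefactor = (2 / 3) ** 1.5 * 4 / excess_noise * 10 ** (-rin_db_hz / 10)
    return math.log2(prefactor / f_hz) / 3


for b in (4, 6, 8):
    print(f"B={b}: F_RIN = {f_rin(b) / 1e9:.3g} GHz")
for f in (1e9, 100e9):
    print(f"f={f / 1e9:.0f} GHz: max resolution = {max_bits(f):.2f} bits")
```

Because bandwidth falls by a factor of 8 per added bit, a 10 dB improvement in laser RIN buys only about 1.1 bits at fixed bandwidth under this model.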

Table 2 calculates typical values for the metrics describing required laser power as limited by thermal, shot, and relative intensity noise. The first row pertains to a chip that is fabricated with a 50 $\Omega$ junction resistor, while the second row uses optimally-designed junction resistors whose optimum value depends on the operating bandwidth. $E_{thrm}$ and $E_{shot}$ describe situations where more power is needed to support more bandwidth. To be invariant quantities, they therefore must have units of energy, even though it is not obvious how they correspond to a physical packet of light or electricity. These quantities are useful because they can be compared directly to energies of detection and digitization that might be present around an APL. As an example, $E_{shot}$ (physical limit) means that, for a 1 GHz system, the APL would require a minimum 0.96 $\mu$ W of optical power to support a 4-bit signal resolution.

Table 2: Laser pump power metrics for single-channel analog photonic links. Parameter values: $R_{PD}=0.8$ A/W, $C_{pd}=35$ fF, $M=1$ , $T=300$ K, $\lambda=1550$ nm.

Regime	Coefficient	$B=$ 2-bit	$B=$ 4-bit	$B=$ 6-bit	$B=$ 8-bit	Units
Thermal	$\left.J^{*}(B,R_{b})\right\|_{R_{b}=50\Omega}$	250 $\times 10^{-3}$	2.0	16	130	nW.Hz ${}^{-\frac{1}{2}}$
Thermal	$E_{thrm}$	820 $\times 10^{-3}$	6.5	52	420	fJ
Shot	$E_{shot}$ (typical)	24 $\times 10^{-3}$	1.5	96	6.2 $\times 10^{3}$	fJ
Shot	$E_{shot}$ (physical limit)	15 $\times 10^{-3}$	0.96	61	3.9 $\times 10^{3}$	fJ
RIN	$F_{\text{RIN}}$	110 $\times 10^{3}$	1.7 $\times 10^{3}$	26	0.41	GHz
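As a hedged illustration of how the energy metrics in Table 2 translate into power, the sketch below multiplies an energy coefficient by the operating bandwidth. The dictionary and function names are hypothetical; the numbers are copied from the $E_{thrm}$ and $E_{shot}$ (physical limit) rows.

```python
# Table 2 energy metrics in femtojoules, keyed by resolution in bits.
E_THRM_FJ = {2: 0.82, 4: 6.5, 6: 52.0, 8: 420.0}
E_SHOT_LIMIT_FJ = {2: 0.015, 4: 0.96, 6: 61.0, 8: 3.9e3}


def min_power_w(energy_fj, bandwidth_hz):
    """Minimum power (W) implied by an energy metric at a given bandwidth."""
    return energy_fj * 1e-15 * bandwidth_hz


if __name__ == "__main__":
    for bits in (2, 4, 6, 8):
        p_shot = min_power_w(E_SHOT_LIMIT_FJ[bits], 1e9)
        p_thrm = min_power_w(E_THRM_FJ[bits], 1e9)
        print(f"{bits}-bit at 1 GHz: shot {p_shot:.2e} W, thermal {p_thrm:.2e} W")
```

For the 4-bit, 1 GHz case this reproduces the 0.96 $\mu$W figure quoted in the text.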

III.3 Multiple channels

In photonic neural networks and vector-matrix multiplication (VMM), each input channel has a corresponding laser and modulator. Each signal must have the potential to fan-out to the different outputs, whether fan-out occurs in a broadcast splitter (multiwavelength architecture) or within a MZI mesh (coherent architecture). By energy conservation, fan-out carries an attenuation factor of $1/N$ Goodman:1985. Fan-out in MZI architectures is revisited in more detail in Sec. LABEL:sec:network-power. At the same time, analog summation means that output signal power can recover some of this fan-out attenuation, leading to an apparent fan-in gain. Fan-in gain is dependent on the signals’ cross-correlation, so we must introduce a term to quantify cases of signal correlation.

Fan-in with correlated signals

As a result of additive fan-in, the root-mean-squared (RMS) power of the electrical output signal depends on the values of the inputs, meaning that SFDR and resolution become signal-dependent. Three special cases mark out the range of possibilities: (worst or singular case) all received signals, after weighting, are zero except for one; (uncorrelated case) all inputs have the same RMS and are statistically independent; (best or identical case) all inputs are the same. Every situation lies somewhere on the continuum spanned by these cases. The three special cases are illustrated in Fig. 3. The effects of fan-out and fan-in can be stated as modifications to the total received photocurrent $I_{rec}$ from Eq. (LABEL:eq:i_rec). We use $\left.I_{rec}\right|_{N=1}$ to indicate the baseline value.

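\begin{IEEEeqnarray}{rCll}
\text{RMS}\left(I_{rec}(N)\right) &=& N^{-1} \left.I_{rec}\right|_{N=1} & \quad \text{(singular case)} \\
&=& N^{-1/2} \left.I_{rec}\right|_{N=1} & \quad \text{(uncorrelated case)} \\
&=& \left.I_{rec}\right|_{N=1} & \quad \text{(identical case)}
\end{IEEEeqnarray}

where the output signal amplitude is a statistical value given by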

\begin{IEEEeqnarray}{rCl}
\text{RMS}(I_{rec}) &\equiv& \sqrt{\left\langle \left(I_{rec} - \langle I_{rec}\rangle\right)^{2} \right\rangle}
\end{IEEEeqnarray}

We can introduce a similarity variable, $s$, to cover these cases such that $s=0$ is singular, $s=0.5$ is uncorrelated, and $s=1$ is identical.³

³ The $s$ variable describes a concept similar to that described by the $\rho$ variable introduced in Agarwal:16 and used in Nahmias:20. In the case referred to as “fixed output precision, only positive inputs/weights” (Agarwal:16, Table 1), there is an embedded, unstated assumption that all signals must be identical, which is always a trivial computation.

In general,

\begin{IEEEeqnarray}{rCl}
I_{rec}(N) &=& N^{s-1} \left.I_{rec}\right|_{N=1}
\end{IEEEeqnarray}

where $s$ can take on continuous values between 0 and 1, between the extreme cases shown in Fig. 3. The exact value of $s$ depends on the situation, specifically, on the cross-correlation of input signals and the weight matrix. We leave its general expression for further work.
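The three special cases can be illustrated with a small Monte Carlo sketch. It models each input as a zero-mean, unit-RMS Gaussian deviation about its average (a modeling assumption, not taken from the original work), applies a $1/N$ fan-out attenuation, sums the channels, and solves $\text{RMS}(N)=N^{s-1}\,\text{RMS}(1)$ for $s$. All function names are hypothetical.

```python
import math
import random


def rms(values):
    """Root-mean-square deviation of `values` from their mean."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def summed_output(n, case, samples=20000, seed=0):
    """Simulate 1/n fan-out attenuation followed by additive fan-in."""
    rng = random.Random(seed)
    out = []
    for _ in range(samples):
        if case == "singular":        # only one channel carries a signal
            inputs = [rng.gauss(0, 1)] + [0.0] * (n - 1)
        elif case == "uncorrelated":  # independent inputs with equal RMS
            inputs = [rng.gauss(0, 1) for _ in range(n)]
        elif case == "identical":     # every channel carries the same signal
            inputs = [rng.gauss(0, 1)] * n
        else:
            raise ValueError(f"unknown case: {case}")
        out.append(sum(x / n for x in inputs))
    return out


def estimate_s(n, case):
    """Solve RMS(N) = N**(s - 1) * RMS(1) for the similarity variable s."""
    baseline = rms(summed_output(1, "identical"))
    return 1.0 + math.log(rms(summed_output(n, case)) / baseline) / math.log(n)


if __name__ == "__main__":
    for case in ("singular", "uncorrelated", "identical"):
        print(f"{case}: s = {estimate_s(16, case):.3f}")
```

Partially correlated inputs, which this sketch does not model, would yield intermediate values of $s$.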

The multichannel modification to average received photocurrent has effects both on signal and noise. In all noise regimes, adding channels increases average photocurrent and therefore maximum signal amplitude. Noise amplitude can grow at the same rate or slower than the signal amplitude, depending on the type of noise.

Thermal regime

No matter how many channels are present, there is still only one photodetector per channel, so thermal noise does not depend on $N$ . Correlation-dependent fan-in only improves SFDR. We can calculate this effect by substituting the new $I_{rec}$ in Eqs. (LABEL:eq:oip_vs_irec), (LABEL:eq:full-sfdr), and (LABEL:eq:p_thrm), and so on to reach a new version of Eq. (III.2).

\begin{IEEEeqnarray}{rCl}
P_{1\text{pump}}(N, f, B) &=& N^{1-s}\, P_{1\text{pump}}(1, f, B) \quad \text{(thermal)}
\end{IEEEeqnarray}

That means, to maintain the SFDR and thus resolution, in the worst case, the laser power must increase in proportion to fan-out, $N$. In the uncorrelated signal case, it must increase by only $\sqrt{N}$.

In an $N\times N$ photonic network, total laser pump power carries an additional factor of $N$ to provide power to all of the input channels. The total laser power required by the entire network is thus

\begin{IEEEeqnarray}{rCl}
P_{N\text{pumps, thrm-limit}} &=& N^{(2-s)}\, f \cdot \frac{E_{thrm}(B)}{\eta_{net}}
\end{IEEEeqnarray}

The result depends significantly on the signal correlation variable, $s$. Thermal noise-limited system power therefore lies on a situation-dependent continuum somewhere between activity-proportional (best case, $s=1$) and MAC-proportional (worst case, $s=0$).
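To give a sense of scale, the following sketch evaluates the thermal-regime scaling for a hypothetical $100\times 100$ network at 1 GHz. It uses the 4-bit $E_{thrm}$ value of 6.5 fJ from Table 2 and assumes a lossless network ($\eta_{net}=1$). The network size, the efficiency, and the function names are illustrative assumptions, not values from the original work.

```python
def single_pump_power(n, s, p_single):
    """Per-channel laser power needed to hold resolution at fan-out n."""
    return n ** (1.0 - s) * p_single


def total_pump_power_thermal(n, s, f_hz, e_thrm_j, eta_net=1.0):
    """Total laser power (W) of an n x n network in the thermal regime."""
    return n ** (2.0 - s) * f_hz * e_thrm_j / eta_net


if __name__ == "__main__":
    for s in (0.0, 0.5, 1.0):
        # Assumed example: N = 100, f = 1 GHz, E_thrm = 6.5 fJ, eta_net = 1.
        p = total_pump_power_thermal(n=100, s=s, f_hz=1e9, e_thrm_j=6.5e-15)
        print(f"s = {s}: total pump power = {p * 1e3:.3g} mW")
```

Under these assumptions the worst and best cases differ by a factor of $N$, which is why the value of $s$ matters so much for system-level estimates.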

Shot regime

Shot noise depends on the total received power regardless of its wavelength, so shot noise is not independent of $N$ . From Eq. (LABEL:eq:p_shot), we see that shot noise power is proportional to $I_{rec}$ , which means that it scales as

\begin{IEEEeqnarray}{rCl}
p_{shot}(N) &=& N^{s-1} \left.p_{shot}\right|_{N=1}
\end{IEEEeqnarray}

Making a similar rearrangement to get needed power for a single-channel APL and an $N\times N$ network,

{IEEEeqnarray}

rClP_1pump(N, f, B) &= N^1-s/2 P_1pump(1, f, B) (shot)
P_Npumps, shot-limit = N