
Optical neural network architecture for deep learning with the temporal synthetic dimension

Bo Peng1,†, Shuo Yan1,†, Dali Cheng2, Danying Yu1, Zhanwei Liu1, Vladislav V. Yakovlev3, Luqi Yuan1,∗, and Xianfeng Chen1,4,5 1State Key Laboratory of Advanced Optical Communication Systems and Networks, School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai 200240, China
2Ginzton Laboratory and Department of Electrical Engineering, Stanford University,
Stanford, CA 94305, USA
3Texas A&M University, College Station, Texas 77843, USA
4Shanghai Research Center for Quantum Sciences, Shanghai 201315, China
5Collaborative Innovation Center of Light Manipulation and Applications, Shandong Normal University, Jinan 250358, China
†These authors contributed equally to this work. ∗Corresponding author: [email protected]
Abstract

The physical concept of synthetic dimensions has recently been introduced into optics, but its fundamental physics and applications are not yet fully explored. Here we theoretically propose an approach to optical neural networks based on a synthetic dimension in the time domain, using a single resonator network in which the arrival times of optical pulses are interconnected to construct a temporal synthetic dimension. The set of pulses in each roundtrip provides the sites of one layer of the optical neural network, and can be linearly transformed with splitters, delay lines, and phase modulators as the pulses circulate inside the network. Such linear transformations can be arbitrarily controlled by the applied modulation phases, which, together with a nonlinear component acting on the pulses, serve as the building blocks of the neural network. We validate the deep-learning functionality of the proposed optical neural network with examples of handwritten digit recognition and optical pulse train distribution classification. This proof-of-principle computational work explores the new concept of photonics-based machine learning in a single ring network with synthetic dimensions, which allows flexibility and ease of reconfiguration with complex functionality in achieving desired optical tasks.

PACS: 42.15.Eq; 42.30.Lr; 42.79.Sz; 42.79.Ta

I Introduction

Optical neural networks (ONNs) have been under extensive study recently, with the ultimate goal of achieving machine learning in a photonic system rosenbluth09 ; tait14 ; shenNP17 ; tait17 ; linS18 ; yingOL18 ; feldmann19 ; zuoO19 ; hamerly19 ; khoram19 ; zhang19newa ; zhang21newa . Recent advancements have revealed that ONNs exhibit important computational capability with photonic tools nahmias20 ; wetzstein20 ; bogaerts20 ; xuN21 ; feldmann21 and in training optical fields for specific optimization purposes jiangNRM20 . On the other hand, realizations of ONNs on different platforms also attract great interest from theoretical and computational perspectives; for example, ONNs can be trained through in situ back propagation hughes18 ; zhouPR20 , and quantum ONNs can conduct non-classical tasks stenbrecher20 . In addition, the recurrent neural network I. Goodfellow 17 ; K. Yao11 ; G. Dorffner13 ; M. Husken12 ; J. T. Connor14 , an important machine learning model, has been studied with optics-based technologies Hugeyyy . Nevertheless, most ONN designs scale with the number of photonic devices in each layer as well as with the total number of layers, so an ONN system requires $N^2$ photonic devices with tunable, externally controlled components, which makes its practical implementation rather complex and limits the freedom and options for further reconfiguration and miniaturization nahmias20 ; wetzstein20 ; bogaerts20 . It is therefore important to investigate alternative photonic ONN architectures that can potentially offer enough freedom to realize arbitrary functionality. Thus, it is essential to explore novel physical principles, and the approach based on synthetic dimensions offers an intriguing opportunity to overcome some of the existing challenges and limitations.

Synthetic dimension is a rapidly emerging concept in photonics, which exploits different degrees of freedom of light to simplify experimental arrangements and make the most of them yuanoptica18 ; ozawaNRP19 ; Topological photonics ; Chinese Optics Letters ; APL photonics 6 . Recently, it has been suggested that ONNs with synthetic dimensions can potentially provide a simpler design for achieving complicated functionality pankov19 ; buddhiraju20 ; linArxiv20 ; however, proper implementation has proven challenging. In this report, we investigate the time-multiplexed architecture using temporal information regensburger11 ; regensburger12 ; wimmer13 ; marandi14 ; wimmer17 ; chenPRL18 , which has been demonstrated as a highly promising route for optical computations such as coherent Ising machines marandi14 , photonic reservoir computing 23newlarger17 , and ONNs with synthetic nonlinear lattices arxiv9-8Aus .

In this work, we introduce and validate through computational experiments a new paradigm for achieving an optical neural network in a single resonator network, with the temporal synthetic dimension constructed by connecting different temporal positions of pulses via pairs of delay lines. Different from pioneering works in Refs. pankov19 ; arxiv9-8Aus , which propose ONNs with synthetic lattices in coupled rings, the approach proposed here offers an alternative solution to the ONN problem in a single ring. The optical resonator network with reconfigurable couplings between different arrival times (i.e., temporal positions) of optical pulses supports a time-multiplexed lattice marandi14 and creates the temporal synthetic dimension. With controllable splitters and phase modulators used to build the desired connections between pulses, we show how to construct multiple layers of an ONN in a single resonator [see Fig. 1(a)]. A nonlinear operation is used to perform complex modulations, which are controlled by external signals with the aid of a computer. To validate the deep-learning functionality, we train the proposed ONN platform with the MNIST handwritten digit database with appropriate noise included zzzmnist . The striking feature of our ONN is that it needs only one resonator yet supports an arbitrary layer size, placing no limit on the total number of layers (roundtrips) and offering high reconfigurability. Moreover, this single resonator network is capable of conducting arbitrary optical tasks after proper training; for example, we conduct a pulse train classification problem, which recognizes different distributions of pulse trains. Our work hence points out a concept for realizing ONNs with synthetic dimensions, which is highly scalable and therefore gives extra freedom for further simplification of the setup with possible reconfiguration.

II Model

We start by considering a resonator composed of the main cavity loop of the waveguide [see Fig. 1(a)]. Neglecting the group velocity dispersion of the waveguide, we assume there are $N$ optical pulses simultaneously propagating inside the loop, with every two nearby pulses temporally separated by a fixed time $\Delta t$. Each pulse is labelled by its temporal position $t_n$ (or arrival time, with $t_{n+1}-t_n=\Delta t$) marandi14 , and we use $n=1,\ldots,N$ to denote each pulse at its temporal position.

Figure 1: (a) The schematic of the single resonator network with two delay lines in purple and green, respectively. CO: combiner, SP: splitter, PM: phase modulator, NC: nonlinear component. $A$ denotes the field amplitude, while $B$ denotes the output amplitude defined in Eq. (1). (b) The connectivity of the synthetic photonic lattice along the temporal dimension ($n$-axis) implemented in (a) for pulses evolving over roundtrips ($m$). $N$ pulses in each roundtrip (shown as circles) are considered, and the pulses evolve for $M$ roundtrips in total, which therefore constructs the ONN with $M$ layers and $N$ neuron sites in each layer. Green, black, and purple arrows correspond to different optical branches of the delay lines in (a).

To construct the temporal synthetic dimension, we add a pair of delay lines, which are connected with the main loop through splitters and couplers. Each splitter is controlled by a parameter $\phi_{1(2)}$, which determines that a portion of the pulse with amplitude $\cos\phi_{1(2)}$ remains in the main loop while the rest of the pulse with amplitude $i\sin\phi_{1(2)}$ enters the delay line regensburger11 ; regensburger12 . The lengths of the delay lines are carefully designed. A pulse at temporal position $n$ propagating through the shorter delay line recombines into the main loop at a time $\Delta t$ ahead of its original arrival time $t_n$ and contributes to the pulse at time $t_{n-1}=t_n-\Delta t$, i.e., $\Delta n=-1$. On the other hand, a pulse propagating through the longer delay line recombines into the main loop at a time $\Delta t$ behind $t_n$ and contributes to the pulse at $t_{n+1}=t_n+\Delta t$, i.e., $\Delta n=+1$. Such a design constructs the temporal synthetic dimension [see Fig. 1(b)], where the $n$-th pulse during the $m$-th roundtrip, with amplitude $A(n,m)$ (in units of a reference amplitude $A_0$), is connected to its nearest-neighbor sites in the temporal synthetic lattice after each roundtrip. The boundary of this lattice can be created by further introducing an intracavity intensity modulator to suppress unwanted pulses in the main loop leefmans .

We place phase modulators inside the main loop as well as in the two delay lines. Each phase modulator is controlled by an external voltage and adds a modulation phase $\theta_i$ ($i=1,2,3$) to the pulse propagating through it marandi14 ; leefmans . Moreover, we use a complex modulator as the nonlinear component, which converts the input pulse to an output pulse following a complex nonlinear function. In such an ONN, the parameters $\phi_i$ and $\theta_i$ can be precisely controlled at any time, meaning that one can manipulate $\phi_i$ and $\theta_i$ for each pulse $n$ at each roundtrip number $m$.

In this temporal synthetic lattice, the propagation of pulses in each single roundtrip composes a linear transformation, described by regensburger11 ; regensburger12

B(n,m)=A(n,m)\cos\phi_{1}(n,m)\cos\phi_{2}(n,m)e^{i\theta_{2}(n,m)}
-iA(n+1,m)\sin\phi_{2}(n+1,m)e^{i\theta_{3}(n+1,m)}
-iA(n-1,m)\cos\phi_{2}(n-1,m)\sin\phi_{1}(n-1,m)e^{i\theta_{1}(n-1,m)}, (1)

where $B(n,m)$ denotes the output amplitude for the set of pulses after the linear transformation. A very small portion of each pulse is dropped out and collected by detectors, and this information is stored in the computer for further analysis. The pulses then pass through the nonlinear component, where we use a formula similar to that of a saturable absorber baoNR11 ; chengIEEE14 but applied to complex amplitudes, so that a complex nonlinear operation is performed:

2B(n,m)(1-T_{n,m})/A_{0}=\ln(T_{n,m}), (2)
A(n,m+1)=B(n,m)T_{n,m}. (3)

For a given input pulse $B(n,m)$, the nonlinear coefficient $T_{n,m}$ can be calculated in the computer from Eq. (2), and then an appropriate external signal is applied to the complex modulator nonlinearnew3 so that the output pulse after the nonlinear component follows Eq. (3), which becomes the input pulse $A(n,m+1)$ for the next layer (the next roundtrip). We find that this particular choice of the complex nonlinear function works extremely well compared to regular real nonlinear activation functions such as the sigmoid or hyperbolic tangent function.
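Equation (2) is implicit in $T_{n,m}$, but it can be rearranged into the fixed-point form $T_{n,m}=\exp[2B(n,m)(1-T_{n,m})/A_0]$ and iterated numerically. The following sketch is our own illustrative numerics, not a prescribed part of the proposal; note that $T=1$ always satisfies Eq. (2), and the iteration converges to whichever root is attracting for the given $B(n,m)$ (for weak pulses this is the near-unity root, while stronger pulses admit a nontrivial, saturable-absorber-like root):

```python
import numpy as np

def nonlinear_step(B, A0=1.0, T0=0.5, iters=200):
    """Solve Eq. (2), 2*B*(1-T)/A0 = ln(T), by fixed-point iteration
    T <- exp(2*B*(1-T)/A0), then apply Eq. (3): A(n, m+1) = B(n, m)*T.
    B may be a complex array; T0 is the initial guess for every pulse."""
    B = np.asarray(B, dtype=complex)
    T = np.full(B.shape, T0, dtype=complex)
    for _ in range(iters):
        T = np.exp(2.0 * B * (1.0 - T) / A0)
    return B * T, T
```

For example, a real input amplitude $B=-2$ converges to a nontrivial root $T\approx 0.02$, i.e., a strongly attenuating response, whereas small $|B|$ yields $T\to 1$ and a nearly linear pass-through.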

Figure 2: (a) Schematic of the architecture of an optical neural network. $A_1$ is the vector of pulses input to the first layer when training starts. $A_m$: vector of output pulses after the $(m-1)$-th roundtrip (layer), which is also the input vector for the $m$-th roundtrip (layer); $W_m$: matrix for the linear transformation during the $m$-th roundtrip (layer); $B_m$: vector of pulses after the linear transformation during the $m$-th roundtrip (layer); $f_m$: nonlinear activation operation; $f_m^{\prime}$: derivative of $f_m$ during back propagation. $C$ is the cost function for the output signal. (b) Illustration of the signal flow through roundtrips (layers) in the resonator in Fig. 1(a).

Fig. 2 summarizes the forward transmission with linear transformations and nonlinear operations on pulses. In principle, the total number of layers $M$, as well as the total pulse number $N$, can be arbitrary. In Fig. 2, we use $W_m$ to denote the linear transformation in Eq. (1) and $f_m$ to denote the nonlinear operation in Eqs. (2) and (3) for the $m$-th roundtrip. Hence the forward transmission at each layer $m$ follows $B_m=W_m A_m$ and $A_{m+1}=f_m B_m$, where $A_m$ and $B_m$ are the vectors of $A(n,m)$ and $B(n,m)$, respectively. Pulse information $A(n,m+1)$ [$B(n,m)$] after [before] the nonlinear operation at the $n$-th temporal position during the $m$-th roundtrip is collected by dropping a small portion of the pulses out of the resonator network into detectors. This information on $A_m$ and $B_m$ is stored in the computer for backward propagation in training the ONN.
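Since Eq. (1) couples each pulse only to its nearest temporal neighbors, $W_m$ is a tridiagonal matrix. A minimal sketch (our own illustration, assuming open boundaries at $n=1$ and $n=N$) that assembles $W_m$ from the control parameters:

```python
import numpy as np

def build_W(phi1, phi2, th1, th2, th3):
    """Assemble the tridiagonal matrix W_m of Eq. (1), so that B_m = W_m @ A_m.
    All arguments are length-N real arrays indexed by temporal position n."""
    N = len(phi1)
    W = np.zeros((N, N), dtype=complex)
    for n in range(N):
        # main-loop path: A(n,m) -> B(n,m)
        W[n, n] = np.cos(phi1[n]) * np.cos(phi2[n]) * np.exp(1j * th2[n])
        # shorter delay line: A(n+1,m) -> B(n,m)
        if n + 1 < N:
            W[n, n + 1] = -1j * np.sin(phi2[n + 1]) * np.exp(1j * th3[n + 1])
        # longer delay line: A(n-1,m) -> B(n,m)
        if n > 0:
            W[n, n - 1] = -1j * np.cos(phi2[n - 1]) * np.sin(phi1[n - 1]) * np.exp(1j * th1[n - 1])
    return W
```

The forward pass of layer $m$ is then `B = build_W(...) @ A`, followed by the nonlinear operation of Eqs. (2)-(3).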

Once the forward propagation through $M$ roundtrips in the optical resonator network is finished, backward propagation can be performed in the computer following the standard procedure to correct the control parameters hughes18 ; bengio09 , briefly summarized here. The backward propagation equations read hughes18 ; bengio09 :

\tilde{B}_{m}=B_{m}+f^{\prime}_{m}(A_{m+1}-\tilde{A}_{m+1}), (4)
\tilde{A}_{m}=W_{m}^{T}\tilde{B}_{m}, (5)

where $\tilde{A}_m$ and $\tilde{B}_m$ are vectors at the $m$-th layer, calculated through back propagation from the stored information of $A_{m+1}$ and $B_m$. Here $f^{\prime}_m$ is the derivative of the nonlinear operation at the $m$-th layer in Eq. (4), $W_m^T$ is the transpose of $W_m$, and $\tilde{A}_{M+1}$ is the target vector $A_{\mathrm{target}}$, i.e., the expected output vector of the training set. The cost function after the $m$-th layer can therefore be calculated as:

C_{m}=\frac{1}{2N}\sum_{i=1}^{N}|A(i,m+1)-\tilde{A}(i,m+1)|^{2}. (6)

Throughout the backward propagation, the optical control parameters $\phi_1(n,m)$, $\phi_2(n,m)$, $\theta_1(n,m)$, $\theta_2(n,m)$, and $\theta_3(n,m)$ can be trained by calculating the derivatives of $C_m$ with respect to these parameters, i.e.,

\frac{\partial C_{m}}{\partial\phi_{1,2}(n,m)}=[(A_{m+1}-\tilde{A}_{m+1})]^{T}\bigodot f^{\prime}_{m}\cdot\frac{\partial W^{T}}{\partial\phi_{1,2}(n,m)}\cdot A_{m}, (7)
\frac{\partial C_{m}}{\partial\theta_{1,2,3}(n,m)}=[(A_{m+1}-\tilde{A}_{m+1})]^{T}\bigodot f^{\prime}_{m}\cdot\frac{\partial W^{T}}{\partial\theta_{1,2,3}(n,m)}\cdot A_{m}, (8)

where $\bigodot$ denotes element-wise vector multiplication, with $\textbf{c}=\textbf{a}\bigodot\textbf{b}$ defined as $c_n=a_nb_n$. We then obtain the corrections to the parameters as bengio09 :

\Delta\phi_{1,2}(n,m)=-a\frac{\partial C_{m}}{\partial\phi_{1,2}(n,m)}, (9)
\Delta\theta_{1,2,3}(n,m)=-a\frac{\partial C_{m}}{\partial\theta_{1,2,3}(n,m)}, (10)

where $a$ is the learning rate for the training. Then $\phi_{1,2}(n,m)$ becomes $\phi_{1,2}(n,m)+\Delta\phi_{1,2}(n,m)$ and $\theta_{1,2,3}(n,m)$ becomes $\theta_{1,2,3}(n,m)+\Delta\theta_{1,2,3}(n,m)$. Following the backward propagation procedure summarized above, the parameters controlling the forward propagation of each pulse at the $n$-th temporal position for the $m$-th roundtrip are updated backwardly from the $M$-th layer to the 1st layer.
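The update of Eqs. (9)-(10) is plain gradient descent on the control parameters. As a self-contained illustration (our own sketch: it replaces the analytic gradients of Eqs. (7)-(8) with finite differences, uses a single layer, and assumes small pulse amplitudes so that the fixed-point solve of Eq. (2) converges), one can verify that the cost of Eq. (6) decreases under the update:

```python
import numpy as np

def forward(A, phi1, phi2, th1, th2, th3):
    """One roundtrip: the linear map of Eq. (1) followed by Eqs. (2)-(3)."""
    N = len(A)
    B = np.zeros(N, dtype=complex)
    for n in range(N):
        B[n] = A[n] * np.cos(phi1[n]) * np.cos(phi2[n]) * np.exp(1j * th2[n])
        if n + 1 < N:
            B[n] -= 1j * A[n + 1] * np.sin(phi2[n + 1]) * np.exp(1j * th3[n + 1])
        if n > 0:
            B[n] -= 1j * A[n - 1] * np.cos(phi2[n - 1]) * np.sin(phi1[n - 1]) * np.exp(1j * th1[n - 1])
    T = np.full(N, 0.5, dtype=complex)
    for _ in range(100):                       # fixed-point solve of Eq. (2), A0 = 1
        T = np.exp(2.0 * B * (1.0 - T))
    return B * T

def cost(A_out, A_target):                     # Eq. (6)
    return np.sum(np.abs(A_out - A_target) ** 2) / (2 * len(A_out))

def train_step(A_in, A_target, params, a=0.05, eps=1e-6):
    """Eqs. (9)-(10): p -> p - a * dC/dp, with dC/dp from forward differences."""
    c0 = cost(forward(A_in, *params.values()), A_target)
    grads = {}
    for key, p in params.items():
        g = np.zeros_like(p)
        for n in range(len(p)):
            p[n] += eps
            g[n] = (cost(forward(A_in, *params.values()), A_target) - c0) / eps
            p[n] -= eps
        grads[key] = g
    for key in params:
        params[key] -= a * grads[key]
    return c0
```

Repeating `train_step` drives the output pulses toward the target vector, mirroring how the trained phases would be written back to the modulators.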

With the entire procedure in hand, one can train the ONN with a training data set so that it is ready to perform the designed all-optical computation with optical pulses in this single resonator network.

III Results

III.1 Handwritten digit recognition

To show the validity and reliability of our proposed ONN, we consider the MNIST handwritten digit recognition problem commonly used for ONNs zzzmnist , with noise included. The MNIST data set is a classic data set in the field of machine learning. It consists of 60000 training samples and 10000 test samples. Each sample is a $28\times 28$ pixel grayscale image of a handwritten digit from 0 to 9. Typical examples are shown in Fig. 3.

Figure 3: Typical examples from the MNIST dataset.
Figure 4: Relative cost functions defined in Eq. (6) versus the training iteration number during the training process for (a) the training set and (b) the test set, respectively, for the handwritten digit recognition problem.

In simulations, we use 49 pulses ($N=49$) and 45 roundtrips ($M=45$), with learning rate $a=0.001$. For simplicity, we pre-process the original MNIST handwritten digit database zzzmnist , in which each input sample is an array of 784 elements, by applying max pooling twice pooling , so that the input data can be mapped onto the 49 input pulses in our ONN architecture. Moreover, after the final roundtrip in the single resonator network, we add another fully connected layer between the collected signals from the 49 pulses and 10 additional output sites, which is implemented on the computer. In this fully connected layer, $a=0.02$ and we use the sigmoid function as the activation function. In Fig. 4, we plot the normalized cost function, defined in Eq. (6), for the training set and the test set versus the computation iteration number, respectively. Such a cost function based on the mean square error has been used in the literature for classification problems newRaudys ; newLehtokangas ; newSaleem ; newSebastiani . Both cost functions decrease as the iteration number increases; the ONN in the temporal synthetic dimension therefore works well for the handwritten digit recognition problem. We emphasize that the pre-processing and the additional fully connected layer make this model less competitive than previous ONNs previous onn2 ; linS18 ; feldmann19 ; this simulation only demonstrates the validity of our ONN and its stability under certain noise. To this end, in the test set we add random noise to the 49 input pulses, with their amplitudes multiplied by $1+R\cdot\delta/2$, where $R\in(-0.5,0.5)$ is a random number and $\delta$ denotes the noise amplitude, chosen as $\delta=0,2\%,4\%,6\%,8\%,10\%$, respectively. 60000 sets of training data and 10000 sets of test data are used in the simulations.
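The pre-processing step can be reproduced with standard max pooling. A minimal sketch (assuming $2\times 2$ windows with stride 2, which maps $28\times 28\to 14\times 14\to 7\times 7=49$; the window size is our assumption, as the text only specifies "max pooling twice"):

```python
import numpy as np

def max_pool2(img):
    """2x2 max pooling with stride 2 (halves each spatial dimension)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# stand-in for one 784-element MNIST sample
img = np.arange(28 * 28, dtype=float).reshape(28, 28)
pulses = max_pool2(max_pool2(img)).flatten()   # 49 input pulse amplitudes
```

The 49 resulting values are then loaded as the input pulse amplitudes $A(n,1)$.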
After training, noise with amplitude $\delta$ is added to the input of the ONN for testing. We list the errors of prediction in Table 1. One can see that the error of prediction in our ONN architecture is $21.1\%$ if there is no noise in the input pulses from the test set. However, when we add noise into the system, the error increases up to $29.7\%$ for $\delta=10\%$. Small noise can be tolerated in this proposed ONN architecture, but large noise could affect the performance of the system, which might need further improvement in the future. Although the effects of noise in our proposed ONN architecture are difficult to compare with those in other ONN systems, due to the very different design associated with synthetic dimensions, typical experiments with time-multiplexed architectures can be performed with small noise marandi14 .

Table 1: Errors of prediction for handwritten digit recognition with different noises.
$\delta$ in test set (%)      0     2     4     6     8     10
error of prediction (%)    21.1  24.6  25.8  27.1  28.0  29.7

III.2 Optical pulse train distribution classification

We have demonstrated the validity of the proposed ONN. One key feature of this proposal is that the trained optical network can intelligently perform a desired photonic functionality. As a simple proof-of-principle verification, we study a custom optical pulse train distribution classification problem.

Our goal is to train the optical neural network to recognize five different profiles of optical pulse trains composed of 101 pulses, where the five profiles are chosen as sinusoidal functions $\sin(k\pi t_i/T)$ for the pulse at temporal position $t_i$, with $T=100\Delta t$ and $k=1,2,3,4,5$ labelling the five profiles, respectively. For both training and test procedures, each pulse is perturbed by noise. 30000 training sets and 5000 test sets are constructed in the simulations. Similar noise is used, so each pulse amplitude is modified by a factor $1+R_{1(2)}\delta_{1(2)}$, where $\delta_{1(2)}$ is the noise amplitude in the training (test) sets and $R_{1(2)}\in(-0.5,0.5)$ is a random number. The choices of $\delta_{1(2)}$ are listed in Table 2.
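The training and test sets for this task can be generated directly from the stated profiles. A minimal sketch (our illustrative generator, taking $t_i = i\,\Delta t$ with $\Delta t = 1$, so that $T = 100$):

```python
import numpy as np

def pulse_train(k, delta, rng, N=101, T=100.0):
    """Noisy profile sin(k*pi*t_i/T): each pulse amplitude is multiplied
    by 1 + R*delta with R uniform in (-0.5, 0.5)."""
    t = np.arange(N)                       # t_i in units of Delta t
    clean = np.sin(k * np.pi * t / T)
    R = rng.uniform(-0.5, 0.5, N)
    return clean * (1.0 + R * delta)

rng = np.random.default_rng(0)
# e.g. a small training batch with delta_1 = 2% noise: (profile, label) pairs
train = [(pulse_train(k, 0.02, rng), k - 1) for k in range(1, 6) for _ in range(10)]
```

Each generated profile is loaded onto the 101 input pulses, and the label ($k-1$) is the class the ONN must predict.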

In the simulations, 101 pulses ($N=101$) and 31 roundtrips ($M=31$) are chosen for the ONN, and after the final roundtrip, another fully connected layer between the 101 pulses and 5 output sites is used for predictions. For the training procedure, the learning rate $a$ is $0.001$ for $\delta_1=0$, $0.0017$ for $\delta_1=2\%$, $0.021$ for $\delta_1=4\%$, and $0.011$ for $\delta_1=6\%$. The choice $\delta_1=0$ invalidates the training due to the lack of data diversity (all training sets with the same label are identical), while the noise amplitude $\delta_1=6\%$ introduces complexity through the high volatility and instability of the data. Therefore, for these two training procedures, the test results have relatively high errors of prediction. Nevertheless, one can see from Table 2 that, for noise amplitudes $\delta_1=2\%$ and $4\%$ in the training procedures, the errors of prediction in the test procedures are relatively good (error of prediction $\lesssim 30\%$), even for the high noise amplitude $\delta_2=14\%$ in the test procedures.

Table 2: Errors of prediction for optical pulse train distribution classification problems.
$\delta_2$ (%) \ $\delta_1$ (%)      0      2      4      6
 0                                  1.1   16.6   24.4   29.8
 2                                 43.7   18.7   26.6   30.8
 4                                 68.9   19.9   26.7   30.9
14                                 75.4   30.7   29.4   32.4

The training process with zero pulse noise in the training set is invalidated by the lack of diversity in the data set. However, with low noise amplitudes in the training set, our ONN system shows relatively stable predictions for optical pulse train distribution classification. Furthermore, as another important feature, one can see that a larger noise amplitude $\delta_1$ in the training set (for example, comparing $\delta_1=2\%$ and $4\%$), although it gives larger errors for smaller noise amplitudes $\delta_2$ in the test set, yields a smaller error (such as $30.7\%$ versus $29.4\%$) for the relatively large noise $\delta_2=14\%$. This example therefore shows the capability of our proposed ONN architecture in performing direct optical processing.

IV Discussion and Summary

The proposed platform is experimentally feasible with state-of-the-art photonic technology. A fiber ring resonator with kilometer-long roundtrip length can be constructed with hundreds of temporally separated pulses circulating inside the resonator marandi14 ; leefmans . In particular, Ref. leefmans demonstrates the construction of 64 time-multiplexed optical resonant sites with pulses produced by an input 1550 nm mode-locked laser, separated by 4 ns, pointing to an excellent experimental platform for realizing our theoretical proposal. Moreover, this proposal for achieving the temporal synthetic dimension can also be realized in a resonator with free-space optics chenPRL18 . In both setups, delay lines (channels) are used to create the nearest-neighbor couplings along the temporal synthetic dimension. Appropriate delay lines (channels) can also connect pulses at time separations of double, triple, and/or higher multiples of $\Delta t$, i.e., providing long-range couplings. This offers the possibility of generating more than three connectivities between sites in two layers in Fig. 1(b), which might further increase the accuracy of the ONN. These delay lines may induce small errors, but as seen in Tables 1 and 2, the synthetic ONN can tolerate small noise. The nonlinear function in the current proposal is performed in the computer; however, it is possible to consider a nonlinear component operated by amplitude and phase modulations nonlinearnew3 ; nonlinearnew1 ; nonlinearnew2 or other nonlinear components arxiv9-8Aus ; IEEE-William , which can perform alternative complex nonlinear functions in optics. One notices that, in the proposed approach, the back propagation in the training process is conducted with a computer, and the obtained optimal parameters are then transferred to the physical system.
Such ex-situ training might bring extra errors, but it is currently a reasonable strategy utilized in recent experiments demonstrating ONN functionality shenNP17 ; linS18 ; nahmias20 ; feldmann19 . Ref. hughes18 suggests a possible way to realize in-situ backward propagation in optical systems, which may greatly improve the speed of ONNs. The inclusion of such in-situ backward propagation in our proposed ONN could be of interest for future study.

In summary, we propose a novel paradigm for achieving an ONN in a single resonator network. The proposed approach is based on the physical concept of the temporal synthetic dimension. As a proof of principle, we study the MNIST handwritten digit recognition problem to validate the deep-learning functionality of our proposed ONN. Furthermore, we demonstrate the possibility of photonic intelligent features by showing the performance on a custom optical pulse train distribution classification problem. Our proposed ONN in the temporal synthetic dimension trades time complexity for space complexity, and therefore does not have advantages in energy and speed. However, the key achievement here is an alternative model with relatively high flexibility, which is reconfigurable and scalable in the number of sites (pulses) in each layer as well as the number of layers (roundtrips) for each computation. Distinguished from other relevant works pankov19 ; arxiv9-8Aus , our proposal focuses on one resonator supporting a temporal synthetic dimension and shows the opportunity for constructing a flexible ONN that is capable of various optical tasks once trained. The construction in Fig. 1(b) can be easily linked to architectures of conventional neural networks with long-range connectivities added via additional delay lines, which can be further generalized to a recurrent neural network I. Goodfellow 17 ; K. Yao11 ; G. Dorffner13 ; M. Husken12 ; J. T. Connor14 . Furthermore, one can also prepare the set of pulses in single-photon states instead chenPRL18 , which might make our proposal with the temporal synthetic dimension suitable for constructing a quantum neural network in future studies.
Our work therefore demonstrates the opportunity for constructing a flexible ONN in a single resonator, which points to a broad range of potential applications from all-optical computation to intelligent optical information processing chensegev and biomedical imaging newSierra ; newShirshin .

Acknowledgements.
The research was supported by National Natural Science Foundation of China (12122407, 11974245, and 12192252), Shanghai Municipal Science and Technology Major Project (2019SHZDZX01-ZX06). V. V. Y. acknowledges partial funding from NSF (DBI-1455671, ECCS-1509268, CMMI-1826078), AFOSR (FA9550-15-1-0517, FA9550-18-1-0141, FA9550-20-1-0366, FA9550-20-1-0367), DOD Army Medical Research (W81XWH2010777), NIH (1R01GM127696-01, 1R21GM142107-01), and the Cancer Prevention and Research Institute of Texas (RP180588). L.Y. thanks the sponsorship from Yangyang Development Fund and the support from the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning.

References

  • (1) D. Rosenbluth, K. Kravtsov, M. P. Fok, and P. R. Prucnal, “A high performance photonic pulse processing device,” Optics Express 17, 22767–22772 (2009).
  • (2) A. N. Tait, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Broadcast and weight: an integrated network for scalable photonic spike processing,” Journal of Lightwave Technology 32, 4029–4041 (2014).
  • (3) Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nature Photonics 11, 441–446 (2017).
  • (4) A. N. Tait, T. F. de Lima, E. Zhou, A. X. Wu, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Neuromorphic photonic networks using silicon photonic weight banks,” Scientific Reports 7, 7430 (2017).
  • (5) X. Lin, Y. Rivenson, N. T. Yardimci, M. Veli, Y. Luo, M. Jarrahi, and A. Ozcan, “All-optical machine learning using diffractive deep neural networks,” Science 361, 1004–1008 (2018).
  • (6) Z. Ying, Z. Wang, Z. Zhao, S. Dhar, D. Z. Pan, R. Soref, and R. T. Chen, “Silicon microdisk-based full adders for optical computing,” Optics Letters 43, 983–986 (2018).
  • (7) J. Feldmann, N. Youngblood, C. D. Wright, H. Bhaskaran, and W. H. P. Pernice, “All-optical spiking neurosynaptic networks with self-learning capabilities,” Nature 569, 208–214 (2019).
  • (8) Y. Zuo, B. Li, Y. Zhao, Y. Jiang, Y.-C. Chen, P. Chen, G.-B. Jo, J. Liu, and S. Du, “All-optical neural network with nonlinear activation functions,” Optica 6, 1132–1137 (2019).
  • (9) R. Hamerly, L. Bernstein, A. Sludds, M. Soljačić, and D. Englund, “Large-scale optical neural networks based on photoelectric multiplication,” Physical Review X 9, 021032 (2019).
  • (10) E. Khoram, A. Chen, D. Liu, L. Ying, Q. Wang, M. Yuan, and Z. Yu, “Nanophotonic media for artificial neural inference,” Photonics Research 7, 823–827 (2019).
  • (11) T. Zhang, J. Wang, Y. Dan, Y. Lanqiu, J. Dai, X. Han, X. Sun, and K. Xu, “Efficient training and design of photonic neural network through neuroevolution,” Optics Express 27, 37150–37163 (2019).
  • (12) H. Zhang, J. Thompson, M. Gu, D. Jiang, H. Cai, P. Y. Liu, Y. Shi, Y. Zhang, M. F. Karim, G. Q. Lo, X. Luo, B. Dong, L. C. Kwek, and A. Q. Liu, “Efficient On-Chip Training of Optical Neural Networks Using Genetic Algorithm,” ACS Photonis 8, 1662–1672 (2021).
  • (13) G. Wetzstein, A. Ozcan, S. Gigan, S. Fan, D. Englund, M. Soljačić, C. Denz, D. A. B. Miller, and D. Psaltis, “Inference in artificial intelligence with deep optics and photonics,” Nature 588, 39–47 (2020).
  • (14) M. A. Nahmias, T. F. de Lima, A. N. Tait, H.-T. Peng, B. J. Shastri, and P. R. Prucnal, “Photonic multiply-accumulate operations for neural networks,” IEEE Journal of Selected Topics in Quantum Electronics 26, 1–18 (2020).
  • (15) W. Bogaerts, D. Pérez, J. Capmany, D. A. B. Miller, J. Poon, D. Englund, F. Morichetti, and A. Melloni, “Programmable photonic circuits,” Nature 586, 207–216 (2020).
  • (16) X. Xu, M. Tan, B. Corcoran, J. Wu, A. Boes, T. G. Nguyen, S. T. Chu, B. E. Little, D. G. Hicks, R. Morandotti, A. Mitchell, and D. J. Moss, “11 TOPS photonic convolutional accelerator for optical neural networks,” Nature 589, 44–51 (2021).
  • (17) J. Feldmann, N. Youngblood, M. Karpov, H. Gehring, X. Li, M. Stappers, M. Le Gallo, X. Fu, A. Lukashchuk, A. S. Raja, J. Liu, C. D. Wright, A. Sebastian, T. J. Kippenberg, W. H. P. Pernice, and H. Bhaskaran, “Parallel convolutional processing using an integrated photonic tensor core,” Nature 589, 52–58 (2021).
  • (18) J. Jiang, M. Chen, and J. A. Fan, “Deep neural networks for the evaluation and design of photonic devices,” Nature Reviews Materials 6, 679–700 (2021).
  • (19) T. W. Hughes, M. Minkov, Y. Shi, and S. Fan, “Training of photonic neural networks through in situ back-propagation and gradient measurement,” Optica 5, 864–871 (2018).
  • (20) T. Zhou, L. Fang, T. Yan, J. Wu, Y. Li, J. Fan, H. Wu, X. Lin, and Q. Dai, “In situ optical backpropagation training of diffractive optical neural networks,” Photonics Research 8, 940–953 (2020).
  • (21) G. R. Steinbrecher, J. P. Olson, D. Englund, and J. Carolan, “Quantum optical neural networks,” npj Quantum Information 5, 60 (2019).
  • (22) J. T. Connor, R. D. Martin, and L. E. Atlas, “Recurrent neural networks and robust time series prediction,” IEEE Transactions on Neural Networks 5, 240–254 (1994).
  • (23) G. Dorffner, “Neural networks for time series processing,” Neural Networks World 6, 447–468 (1996).
  • (24) M. Hüsken, and P. Stagge, “Recurrent neural networks for time series classification,” Neurocomputing 50, 223–235 (2003).
  • (25) K. Yao, G. Zweig, M.-Y. Hwang, Y. Shi, and D. Yu, “Recurrent neural networks for language understanding,” In Proceedings of Interspeech, 2524–2528 (2013).
  • (26) I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (The MIT Press, 2016), pp. 326–366.
  • (27) T. W. Hughes, I. A. D. Williamson, M. Minkov, and S. Fan, “Wave physics as an analog recurrent neural network,” Science Advances 5, eaay6946 (2019).
  • (28) L. Yuan, Q. Lin, M. Xiao, and S. Fan, “Synthetic dimension in photonics,” Optica 5, 1396–1405 (2018).
  • (29) T. Ozawa, and H. M. Price, “Topological quantum matter in synthetic dimensions,” Nature Reviews Physics 1, 349–357 (2019).
  • (30) E. Lustig, and M. Segev, “Topological photonics in synthetic dimensions,” Advances in Optics and Photonics 13, 426–461 (2021).
  • (31) H. Liu, Z. Yan, M. Xiao, and S. Zhu, “Recent Progress in Synthetic Dimension in Topological Photonics,” Acta Optica Sinica 41, 0123002 (2021).
  • (32) L. Yuan, A. Dutt, and S. Fan, “Synthetic frequency dimensions in dynamically modulated ring resonators,” APL Photonics 6, 071102 (2021).
  • (33) A. V. Pankov, O. S. Sidelnikov, I. D. Vatnik, A. A. Sukhorukov, and D. V. Churkin, “Deep learning with synthetic photonic lattices for equalization in optical transmission systems,” Proc. SPIE 11192, Real-time Photonics Measurements, Data Management, and Processing IV, 111920N (2019).
  • (34) S. Buddhiraju, A. Dutt, M. Minkov, I. A. D. Williamson, and S. Fan, “Arbitrary linear transformations for photons in the frequency synthetic dimension,” Nature Communications 12, 2401 (2021).
  • (35) Z. Lin, S. Sun, J. Azana, W. Li, N. Zhu, and M. Li, “Temporal optical neurons for serial deep learning,” arXiv:2009.03213 (2020).
  • (36) A. Regensburger, C. Bersch, B. Hinrichs, G. Onishchukov, A. Schreiber, C. Silberhorn, and U. Peschel, “Photon propagation in a discrete fiber network: an interplay of coherence and losses,” Physical Review Letters 107, 233902 (2011).
  • (37) A. Regensburger, C. Bersch, M.-A. Miri, G. Onishchukov, D. N. Christodoulides, and U. Peschel, “Parity-time synthetic photonic lattices,” Nature 488, 167–171 (2012).
  • (38) M. Wimmer, A. Regensburger, C. Bersch, M.-A. Miri, S. Batz, G. Onishchukov, D. N. Christodoulides, and U. Peschel, “Optical diametric drive acceleration through action-reaction symmetry breaking,” Nature Physics 9, 780–784 (2013).
  • (39) A. Marandi, Z. Wang, K. Takata, R. L. Byer, and Y. Yamamoto, “Network of time-multiplexed optical parametric oscillators as a coherent Ising machine,” Nature Photonics 8, 937–942 (2014).
  • (40) M. Wimmer, H. M. Price, I. Carusotto, and U. Peschel, “Experimental measurement of the Berry curvature from anomalous transport,” Nature Physics 13, 545–550 (2017).
  • (41) C. Chen, X. Ding, J. Qin, Y. He, Y.-H. Luo, M.-C. Chen, C. Liu, X.-L. Wang, W.-J. Zhang, H. Li, L.-X. You, Z. Wang, D.-W. Wang, B. C. Sanders, C.-Y. Lu, and J.-W. Pan, “Observation of topologically protected edge states in a photonic two-dimensional quantum walk,” Physical Review Letters 121, 100502 (2018).
  • (42) L. Larger, A. Baylón-Fuentes, R. Martinenghi, V. S. Udaltsov, Y. K. Chembo, and M. Jacquot, “High-speed photonic reservoir computing using a time-delay-based architecture: million words per second classification,” Physical Review X 7, 011015 (2017).
  • (43) A. V. Pankov, I. D. Vatnik, A. A. Sukhorukov, “Optical neural network based on synthetic nonlinear photonic lattices,” Physical Review Applied 17, 024011 (2022).
  • (44) Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE 86, 2278–2324 (1998).
  • (45) C. Leefmans, A. Dutt, J. Williams, L. Yuan, M. Parto, F. Nori, S. Fan, and A. Marandi, “Topological dissipation in a time-multiplexed photonic resonator network,” Nature Physics 18, 442 (2022).
  • (46) Q. Bao, H. Zhang, Z. Ni, Y. Wang, L. Polavarapu, Z. Shen, Q.-H Xu, D. Tang, and K. P. Loh, “Monolayer graphene as a saturable absorber in a mode-locked laser,” Nano Research 4, 297–307 (2011).
  • (47) Z. Cheng, H. K. Tsang, X. Wang, K. Xu, and J.-B. Xu, “In-plane optical absorption and free carrier absorption in graphene-on-silicon waveguides,” IEEE Journal of Selected Topics in Quantum Electronics 20, 43–48 (2014).
  • (48) Q. Xie, H. Zhang, and C. Shu, “Programmable Schemes on Temporal Waveform Processing of Optical Pulse Trains,” Journal of Lightwave Technology 38, 339–345 (2020).
  • (49) Y. Bengio, “Learning deep architectures for AI,” Foundations and Trends in Machine Learning 2, 1–127 (2009).
  • (50) D. Scherer, A. Müller, and S. Behnke, “Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition,” Proceedings of 20th International Conference on Artificial Neural Networks 6354 LNCS (PART 3), 92–101 (2010).
  • (51) S. Raudys, “Evolution and generalization of a single neurone: I. Single-layer perceptron as seven statistical classifiers,” Neural Networks 11, 283–296 (1998).
  • (52) M. Lehtokangas and J. Saarinen, “Weight initialization with reference patterns,” Neurocomputing 20, 265–278 (1998).
  • (53) F. Sebastiani, “Machine Learning in Automated Text Categorization,” ACM Computing Surveys 34, 1–47 (2002).
  • (54) N. Saleem and M. I. Khattak, “Deep neural networks based binary classification for single channel speaker independent multi-talker speech separation,” Applied Acoustics 167, 107385 (2020).
  • (55) D. Psaltis, D. Brady, and K. Wagner, “Adaptive optical networks using photorefractive crystals,” Applied Optics 27, 1752–1759 (1988).
  • (56) S. Tainta, M. J. Erro, W. Amaya, M. J. Garde, S. Sales, and M. A. Muriel, “Periodic time-domain modulation for the electrically tunable control of optical pulse train envelope and repetition rate multiplication,” IEEE Journal of Selected Topics in Quantum Electronics 18, 377–383 (2012).
  • (57) A. Malacarne and J. Azaña, “Discretely tunable comb spacing of a frequency comb by multilevel phase modulation of a periodic pulse train,” Optics Express 21, 4139–4144 (2013).
  • (58) I. A. D. Williamson, T. W. Hughes, M. Minkov, B. Bartlett, S. Pai, and S. Fan, “Reprogrammable electro-optic nonlinear activation functions for optical neural networks,” IEEE Journal of Selected Topics in Quantum Electronics 26, 7700412 (2020).
  • (59) Z. Chen and M. Segev, “Highlighting photonics: looking into the next decade,” eLight 1, 2–12 (2021).
  • (60) E. Duran-Sierra, S. Cheng, R. Cuenca, B. Ahmed, J. Ji, V. V. Yakovlev, M. Martinez, M. Al-Khalil, H. Al-Enazi, Y. S. L. Cheng, J. Wright, C. Busso, and J. A. Jo, “Machine-learning assisted discrimination of precancerous and cancerous from healthy oral tissue based on multispectral autofluorescence lifetime imaging endoscopy,” Cancers 13, 4751 (2021).
  • (61) E. A. Shirshin, A. V. Gayer, E. E. Nikonova, M. M. Lukina, B. P. Yakimov, G. S. Budylin, V. V. Dudenkova, N. I. Ignatova, D. V. Komarov, E. V. Zagaynova, V. V. Yakovlev, W. Becker, V. I. Shcheslavskiy, M. Shirmanova, and M. O. Scully, “Label-free sensing of cells with fluorescence lifetime imaging: the quest for metabolic heterogeneity,” Proceedings of the National Academy of Sciences USA 119, e2118241119 (2022).