\NameZenan Ling \Email[email protected]
\NameZhenyu Liao \Email[email protected]
\NameRobert C. Qiu \Email[email protected]
\addr Huazhong University of Science and Technology, Wuhan, China
On the Equivalence between Implicit and Explicit Neural Networks:
A High-dimensional Viewpoint
Abstract
Implicit neural networks have demonstrated remarkable success in various tasks. However, there is a lack of theoretical analysis of the connections and differences between implicit and explicit networks. In this paper, we study high-dimensional implicit neural networks and provide high-dimensional equivalents for their conjugate kernels and neural tangent kernels. Building on these equivalents, we establish the equivalence between implicit and explicit networks in high dimensions.
1 Introduction
Implicit neural networks (NNs) [Bai et al.(2019)Bai, Kolter, and Koltun] have recently emerged as a new paradigm in neural network design. An implicit NN is equivalent to an infinite-depth, weight-shared explicit NN with input injection. Unlike explicit NNs, implicit NNs generate features by directly solving for the fixed point, rather than through layer-by-layer forward propagation. Moreover, implicit NNs have the remarkable advantage that gradients can be computed analytically via implicit differentiation through the fixed point alone. Training implicit NNs therefore requires only constant memory.
Despite the empirical success achieved by implicit NNs [Bai et al.(2020)Bai, Koltun, and Kolter, Xie et al.(2022)Xie, Wang, Ling, Li, Liu, and Lin], our theoretical understanding of these models is still limited. In particular, there is a lack of theoretical analysis of the training dynamics and generalization performance of implicit NNs and, perhaps more importantly, of whether these properties can be connected to those of explicit NNs. [Bai et al.(2019)Bai, Kolter, and Koltun] demonstrate that any deep NN can be reformulated as a special implicit NN; however, it remains unknown whether general implicit NNs have advantages over explicit NNs. [Feng and Kolter(2020)] extend previous neural tangent kernel (NTK) studies to implicit NNs and give the exact expression of the NTK of ReLU implicit NNs, but the differences between implicit and explicit NTKs are not analyzed. Moreover, previous works [Ling et al.(2023)Ling, Xie, Wang, Zhang, and Lin, Truong(2023)] have proved the global convergence of gradient descent for training implicit NNs; however, it remains unclear what distinguishes the training dynamics of implicit NNs from those of explicit NNs.
In this paper, we investigate implicit NNs from a high-dimensional viewpoint. Specifically, we perform a fine-grained asymptotic analysis of the eigenspectra of the conjugate kernels (CKs) and NTKs of implicit NNs, which play a fundamental role in the convergence and generalization of high-dimensional NNs [Jacot et al.(2018)Jacot, Gabriel, and Hongler]. Considering input data uniformly drawn from the unit sphere, we derive, using recent advances in random matrix theory, high-dimensional (spectral) equivalents for the CKs and NTKs of implicit NNs, and establish the equivalence between implicit and explicit NNs by matching the coefficients of the corresponding asymptotic spectral equivalents. Surprisingly, our results reveal that a single-layer explicit NN with carefully designed activations has the same CK or NTK eigenspectra as a ReLU implicit NN, whose depth is essentially infinite.
2 Preliminaries
2.1 Implicit and Explicit NNs
Implicit NNs.
In this paper, we study a typical implicit neural network, the deep equilibrium model (DEQ) [Bai et al.(2019)Bai, Kolter, and Koltun]. Let $x \in \mathbb{R}^d$ denote the input data. We define a vanilla DEQ with the transform at the $\ell$-th layer as
\[ z^{\ell+1}(x) = \sigma\big(f^{\ell+1}(x)\big), \qquad f^{\ell+1}(x) = \frac{\sigma_w}{\sqrt{m}}\, W z^{\ell}(x) + \sigma_u\, U x, \tag{1} \]
where $W \in \mathbb{R}^{m \times m}$ and $U \in \mathbb{R}^{m \times d}$ are weight matrices, $\sigma_w, \sigma_u > 0$ are constants, $\sigma(\cdot)$ is an element-wise activation, $f^{\ell}(x)$ is the pre-activation and $z^{\ell}(x)$ is the output feature of the $\ell$-th hidden layer corresponding to the input data $x$. The output of the last hidden layer is defined by $z^{*}(x) = \lim_{\ell \to \infty} z^{\ell}(x)$ and we denote the corresponding pre-activation by $f^{*}(x)$. Note that $z^{*}(x)$ can be calculated by directly solving for the equilibrium point of the following equation
\[ z^{*}(x) = \sigma\Big(\frac{\sigma_w}{\sqrt{m}}\, W z^{*}(x) + \sigma_u\, U x\Big). \tag{2} \]
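In practice, the equilibrium in Eq. (2) can be approximated by plain fixed-point (Picard) iteration once the weights are drawn. The following is a minimal sketch under the scaling and notation reconstructed above; the function names, the stopping rule, and the parameter values are ours and purely illustrative (practical DEQ implementations typically rely on more sophisticated root-finding solvers such as quasi-Newton methods).

```python
import numpy as np

def normalized_relu(t):
    # normalized ReLU sqrt(2) * max(t, 0), so that E[sigma(g)^2] = 1 for g ~ N(0, 1)
    return np.sqrt(2.0) * np.maximum(t, 0.0)

def deq_equilibrium(W, U, x, sigma_w, sigma_u, tol=1e-9, max_iter=1000):
    """Solve z = sigma(sigma_w / sqrt(m) * W z + sigma_u * U x) by Picard iteration."""
    m = W.shape[0]
    inject = sigma_u * (U @ x)          # input injection, fixed across iterations
    z = np.zeros(m)
    for _ in range(max_iter):
        z_new = normalized_relu(sigma_w / np.sqrt(m) * (W @ z) + inject)
        if np.linalg.norm(z_new - z) <= tol * (1.0 + np.linalg.norm(z)):
            return z_new
        z = z_new
    return z

# toy usage on a single unit-norm input
rng = np.random.default_rng(0)
m, d = 2000, 200
W, U = rng.standard_normal((m, m)), rng.standard_normal((m, d))
x = rng.standard_normal(d)
x /= np.linalg.norm(x)
sigma_w = 0.3                            # small enough for the iteration to contract
sigma_u = np.sqrt(1.0 - sigma_w**2)      # illustrative choice with sigma_w^2 + sigma_u^2 = 1
z_star = deq_equilibrium(W, U, x, sigma_w, sigma_u)
```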
We are interested in the conjugate kernel and the neural tangent kernel (Implicit-CK and Implicit-NTK, for short) of the implicit neural network defined in Eq. (2). Following [Feng and Kolter(2020)], we denote the corresponding Implicit-CK by $\Sigma^{*} = \lim_{\ell \to \infty} \Sigma^{\ell} \in \mathbb{R}^{n \times n}$, where the $(i,j)$-th entry of $\Sigma^{\ell}$ is defined recursively as
\[ \Sigma^{\ell}_{ij} = \mathbb{E}_{(u,v) \sim \mathcal{N}(0,\, \Lambda^{\ell-1}_{ij})}\big[\sigma(u)\,\sigma(v)\big], \qquad \Lambda^{\ell-1}_{ij} = \begin{bmatrix} \sigma_w^2 \Sigma^{\ell-1}_{ii} + \sigma_u^2\, x_i^\top x_i & \sigma_w^2 \Sigma^{\ell-1}_{ij} + \sigma_u^2\, x_i^\top x_j \\ \sigma_w^2 \Sigma^{\ell-1}_{ij} + \sigma_u^2\, x_i^\top x_j & \sigma_w^2 \Sigma^{\ell-1}_{jj} + \sigma_u^2\, x_j^\top x_j \end{bmatrix}, \tag{3} \]
with $\Sigma^{0} = 0$. The Implicit-NTK is defined as $\Theta^{*} = \lim_{\ell \to \infty} \Theta^{\ell} \in \mathbb{R}^{n \times n}$, whose $(i,j)$-th entry is defined as
\[ \Theta^{\ell}_{ij} = \sigma_w^2\, \Sigma^{\ell-1}_{ij} + \sigma_u^2\, x_i^\top x_j + \sigma_w^2\, \dot{\Sigma}^{\ell}_{ij}\, \Theta^{\ell-1}_{ij}, \qquad \dot{\Sigma}^{\ell}_{ij} = \mathbb{E}_{(u,v) \sim \mathcal{N}(0,\, \Lambda^{\ell-1}_{ij})}\big[\sigma'(u)\,\sigma'(v)\big], \tag{4} \]
with $\Theta^{0} = 0$.
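At finite but large width, the Gram matrix of the equilibrium features gives an empirical counterpart of the Implicit-CK in Eq. (3). Below is a minimal sketch that forms this empirical Gram for a batch of unit-norm inputs, again under the scaling assumed in Eq. (1); all names and parameter choices are illustrative rather than taken from the paper.

```python
import numpy as np

def normalized_relu(t):
    return np.sqrt(2.0) * np.maximum(t, 0.0)

def empirical_implicit_ck(X, m, sigma_w, sigma_u, n_iter=200, seed=0):
    """Gram matrix of the equilibrium features: an empirical proxy for the Implicit-CK.

    X is (d, n) with unit-norm columns; the equilibria of Eq. (2) are computed
    jointly for all n inputs by fixed-point iteration."""
    d, n = X.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((m, m))
    U = rng.standard_normal((m, d))
    inject = sigma_u * (U @ X)                 # (m, n) input injections
    Z = np.zeros((m, n))
    for _ in range(n_iter):
        Z = normalized_relu(sigma_w / np.sqrt(m) * (W @ Z) + inject)
    return Z.T @ Z / m                         # (n, n) empirical conjugate kernel

rng = np.random.default_rng(1)
d, n, m = 200, 100, 2000
X = rng.standard_normal((d, n))
X /= np.linalg.norm(X, axis=0, keepdims=True)  # data on the unit sphere
Sigma_emp = empirical_implicit_ck(X, m, sigma_w=0.3, sigma_u=np.sqrt(1.0 - 0.3**2))
```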
Explicit Neural Networks.
We consider a single-layer fully-connected NN model defined as $h(x) = \bar{\sigma}(\bar{W} x) \in \mathbb{R}^{m}$, where $\bar{W} \in \mathbb{R}^{m \times d}$ is the weight matrix and $\bar{\sigma}(\cdot)$ is an element-wise activation function. Let $X = [x_1, \ldots, x_n] \in \mathbb{R}^{d \times n}$; the corresponding Explicit-CK matrix $\Sigma \in \mathbb{R}^{n \times n}$ and Explicit-NTK matrix $\Theta \in \mathbb{R}^{n \times n}$ are defined as follows:
\[ \Sigma_{ij} = \mathbb{E}_{w \sim \mathcal{N}(0, I_d)}\big[\bar{\sigma}(w^\top x_i)\,\bar{\sigma}(w^\top x_j)\big], \qquad \Theta_{ij} = \Sigma_{ij} + x_i^\top x_j\, \mathbb{E}_{w \sim \mathcal{N}(0, I_d)}\big[\bar{\sigma}'(w^\top x_i)\,\bar{\sigma}'(w^\top x_j)\big]. \tag{5} \]
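The Gaussian expectations in Eq. (5) can be estimated by straightforward Monte Carlo sampling over the weight vector. The sketch below is illustrative only: the helper names are ours, and the quadratic coefficients in the example are arbitrary placeholders, not the matched values of Section 3.2.

```python
import numpy as np

def explicit_ck_ntk(X, act, act_prime, n_mc=100_000, seed=0):
    """Monte Carlo estimate of the Explicit-CK and Explicit-NTK in Eq. (5).

    X: (d, n) data with unit-norm columns; act / act_prime: the activation of the
    explicit NN and its derivative; each entry averages over w ~ N(0, I_d)."""
    d, n = X.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_mc, d))     # each row plays the role of w ~ N(0, I_d)
    P = W @ X                              # (n_mc, n) Gaussian projections w^T x_j
    Z, Zp = act(P), act_prime(P)
    Sigma = Z.T @ Z / n_mc                 # estimates E[act(w.x_i) act(w.x_j)]
    Sigma_dot = Zp.T @ Zp / n_mc           # estimates E[act'(w.x_i) act'(w.x_j)]
    Theta = Sigma + (X.T @ X) * Sigma_dot  # NTK = CK + (x_i.x_j) * derivative kernel
    return Sigma, Theta

# example with a generic quadratic activation a*t^2 + b*t + c (placeholder coefficients)
a, b, c = 0.1, 1.0, 0.2
act = lambda t: a * t**2 + b * t + c
act_prime = lambda t: 2.0 * a * t + b
rng = np.random.default_rng(1)
d, n = 100, 50
X = rng.standard_normal((d, n))
X /= np.linalg.norm(X, axis=0, keepdims=True)
Sigma, Theta = explicit_ck_ntk(X, act, act_prime)
```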
2.2 CKs and NTKs of ReLU Implicit NNs
We make the following assumptions on the random initialization, the input data, and activations.
Assumption 1
(i) As $n, d \to \infty$, we have $n/d \to \gamma \in (0, \infty)$. All data points $x_i$, $1 \le i \le n$, are independent and uniformly sampled from the unit sphere $\mathbb{S}^{d-1}$. (ii) The weight matrices $W$, $U$ of the implicit NN and $\bar{W}$ of the explicit NN are independent and have i.i.d. entries of zero mean, unit variance, and finite fourth moment. Moreover, we require $\sigma_w^2 + \sigma_u^2 = 1$. (iii) The activation of the implicit NN is the normalized ReLU, i.e., $\sigma(t) = \sqrt{2}\max(t, 0)$. The activation $\bar{\sigma}$ of the explicit NN is a smooth function.
Remark 1
(i) Although our results are derived here for data uniformly distributed on the unit sphere, we conjecture that they extend to more general distributions using the techniques developed in [Fan and Wang(2020), Gu et al.(2022)Gu, Du, Yuan, Xie, Pu, Qiu, and Liao]. (ii) The additional requirement on the variance is to ensure the existence and uniqueness of the fixed point of the NTK and to keep the diagonal entries of the CK matrix at $1$; see examples in [Feng and Kolter(2020)]. (iii) It is possible to extend our results to implicit NNs with general activations by using the technique proposed in [Truong(2023)]. We defer the extension to more general data distributions and activation functions to future work.
Under Assumption 1, the limits of the Implicit-CK and Implicit-NTK exist, and one has precise expressions for $\Sigma^{*}$ and $\Theta^{*}$ as follows [Feng and Kolter(2020), Ling et al.(2023)Ling, Xie, Wang, Zhang, and Lin].
Lemma 1
Let $c = x_i^\top x_j \in [-1, 1]$ denote the inner product of a pair of data points. Under Assumption 1, the fixed point $\Sigma^{*}_{ij} = \rho^{*}(c)$ of the Implicit-CK is the root of
\[ \rho = \varphi\big(\sigma_w^2 \rho + \sigma_u^2 c\big), \qquad \varphi(t) := \frac{\sqrt{1 - t^2} + t\,(\pi - \arccos t)}{\pi}, \tag{6} \]
where $\varphi$ is the dual (arc-cosine) kernel of the normalized ReLU.
The limit of the Implicit-NTK is
\[ \Theta^{*}_{ij} = \frac{\sigma_w^2\, \rho^{*}(c) + \sigma_u^2\, c}{1 - \sigma_w^2\, \varphi'\big(\sigma_w^2 \rho^{*}(c) + \sigma_u^2 c\big)}\bigg|_{c = x_i^\top x_j}, \qquad \varphi'(t) = \frac{\pi - \arccos t}{\pi}. \tag{7} \]
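Numerically, the scalar fixed point in Eq. (6) can be found by iterating the map, which contracts whenever $\sigma_w < 1$; Eq. (7) then follows by plugging in $\rho^{*}(c)$. The sketch below uses the arc-cosine expressions for the normalized ReLU reconstructed above, with illustrative parameter values.

```python
import numpy as np

def phi(t):
    # dual (arc-cosine) kernel of the normalized ReLU: E[sigma(u) sigma(v)] for
    # unit-variance jointly Gaussian (u, v) with correlation t
    t = np.clip(t, -1.0, 1.0)
    return (np.sqrt(1.0 - t**2) + t * (np.pi - np.arccos(t))) / np.pi

def phi_prime(t):
    # E[sigma'(u) sigma'(v)] for the normalized ReLU, which equals phi'(t)
    t = np.clip(t, -1.0, 1.0)
    return (np.pi - np.arccos(t)) / np.pi

def implicit_ck_ntk_scalar(c, sigma_w, sigma_u, n_iter=200):
    """Fixed point rho*(c) of Eq. (6) and the NTK value of Eq. (7), by iteration."""
    rho = 0.0
    for _ in range(n_iter):
        rho = phi(sigma_w**2 * rho + sigma_u**2 * c)
    pre = sigma_w**2 * rho + sigma_u**2 * c
    theta = pre / (1.0 - sigma_w**2 * phi_prime(pre))
    return rho, theta

sigma_w = 0.5
sigma_u = np.sqrt(1.0 - sigma_w**2)      # keeps the diagonal of the CK at 1
rho_star, theta_star = implicit_ck_ntk_scalar(c=0.1, sigma_w=sigma_w, sigma_u=sigma_u)
```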
3 Main Results
In this section, we derive high-dimensional equivalents for the CKs and NTKs of implicit and explicit NNs. Then, by matching the coefficients of these asymptotic spectral equivalents, we establish the equivalence between implicit and explicit NNs in high dimensions.
3.1 Asymptotic Approximations
CKs.
We begin by defining several quantities that are crucial to our results. Note that the unique fixed point of Eq. (6) exists as long as $\sigma_w < 1$, which holds under Assumption 1. We define the implicit map induced by Eq. (6) as $c \mapsto \rho^{*}(c)$, and let $\rho_0 := \rho^{*}(0)$ be the solution of Eq. (6) when $c = 0$. Using implicit differentiation, one can obtain the derivative of $\rho^{*}$ in closed form, as sketched below.
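For completeness, the implicit-differentiation step can be written out explicitly under the reconstructed form of Eq. (6), $\rho = \varphi(\sigma_w^2 \rho + \sigma_u^2 c)$; this is a sketch in the notation of this section, not the authors' original computation.

```latex
% Differentiating the fixed-point relation rho*(c) = phi( sigma_w^2 rho*(c) + sigma_u^2 c )
% with respect to c and solving for d rho*/d c:
\[
\frac{\mathrm{d}\rho^{*}}{\mathrm{d}c}
  = \varphi'\!\big(\sigma_w^2 \rho^{*} + \sigma_u^2 c\big)
    \left(\sigma_w^2 \frac{\mathrm{d}\rho^{*}}{\mathrm{d}c} + \sigma_u^2\right)
\;\Longrightarrow\;
\frac{\mathrm{d}\rho^{*}}{\mathrm{d}c}
  = \frac{\sigma_u^2\, \varphi'\big(\sigma_w^2 \rho^{*} + \sigma_u^2 c\big)}
         {1 - \sigma_w^2\, \varphi'\big(\sigma_w^2 \rho^{*} + \sigma_u^2 c\big)},
\]
% which is well defined whenever sigma_w < 1, since 0 <= phi' <= 1.
```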
Now we are ready to present the asymptotic equivalent of the Implicit-CK matrix.
Theorem 1 (Asymptotic approximation for Implicit-CKs)
The Explicit-CK matrix defined in Eq. (5) admits an analogous asymptotic equivalent.
Theorem 2 (Asymptotic approximation for Explicit-CKs)
NTKs.
For the Implicit-NTK, we define the analogous map $c \mapsto \theta^{*}(c)$ induced by Eq. (7), i.e., the map sending the inner product $c = x_i^\top x_j$ to the corresponding limiting NTK entry, for $c \in [-1, 1]$. The values $\theta^{*}(1)$ and $\theta^{*}(0)$ are easy to check directly from Eq. (7) and the fixed point $\rho^{*}$. Using implicit differentiation again, we obtain the derivative of $\theta^{*}$ in closed form.
Now we are ready to present the asymptotic equivalent of the Implicit-NTK matrix.
Theorem 3 (Asymptotic approximation for Implicit-NTKs)
Theorem 4 (Asymptotic approximation for Explicit-NTKs)
Remark 2
(i) Due to the homogeneity of the ReLU function, the Implicit-CK and the Implicit-NTK are essentially inner-product kernel random matrices. Consequently, Theorems 1 and 3 can be built upon the results in [El Karoui(2010)]. We postpone the study of general activations to future work. (ii) The results in Theorems 2 and 4 generalize those of [Gu et al.(2022)Gu, Du, Yuan, Xie, Pu, Qiu, and Liao, Ali et al.(2022)Ali, Liao, and Couillet] to the case of “non-centred” activations, i.e., we do not require $\mathbb{E}_{\xi \sim \mathcal{N}(0,1)}[\bar{\sigma}(\xi)] = 0$.
3.2 The Equivalence between Implicit and Explicit NNs
In the following corollary, we show a concrete case of a single-layer explicit NN, with a quadratic activation, that matches the CK or NTK eigenspectra of a ReLU implicit NN. The idea is to utilize the results of Theorems 1-4 and match the coefficients of the corresponding asymptotic equivalents, so that the equivalent of the Explicit-CK (resp. Explicit-NTK) coincides with that of the Implicit-CK (resp. Implicit-NTK). We conduct numerical simulations to verify our theory; the results are reported in Figure 1.
Corollary 1
We consider a quadratic polynomial activation $\bar{\sigma}(t) = a t^2 + b t + c$ for the explicit NN. Let Assumption 1 hold. As $n, d \to \infty$, the Implicit-CK matrix defined in Eq. (6) can be consistently approximated, in operator norm, by the Explicit-CK matrix defined in Eq. (5), i.e., $\|\Sigma - \Sigma^{*}\| \to 0$, as long as the coefficients $(a, b, c)$ satisfy the matching conditions obtained from Theorems 1 and 2;
and the Implicit-NTK matrix defined in Eq. (7) can be consistently approximated, in operator norm, by the Explicit-NTK matrix defined in Eq. (5), i.e., $\|\Theta - \Theta^{*}\| \to 0$, as long as $(a, b, c)$ satisfy the matching conditions obtained from Theorems 3 and 4.
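As a sanity check of the spectral matching in Corollary 1, one can compare the eigenvalues of the two kernel matrices directly. The sketch below computes the Implicit-CK entrywise from Eq. (6) and the Explicit-CK of a quadratic activation in closed form; the quadratic coefficients shown are arbitrary placeholders rather than the matched values of the corollary, so the two spectra are only expected to coincide once those matching conditions are enforced.

```python
import numpy as np

def phi(t):
    # arc-cosine dual kernel of the normalized ReLU
    t = np.clip(t, -1.0, 1.0)
    return (np.sqrt(1.0 - t**2) + t * (np.pi - np.arccos(t))) / np.pi

def implicit_ck(X, sigma_w, sigma_u, n_iter=200):
    """Entrywise fixed point of Eq. (6) applied to the Gram matrix of unit-norm data."""
    C = X.T @ X
    Rho = np.zeros_like(C)
    for _ in range(n_iter):
        Rho = phi(sigma_w**2 * Rho + sigma_u**2 * C)
    return Rho

def explicit_ck_quadratic(X, a, b, c):
    """Explicit-CK of Eq. (5) for sigma_bar(t) = a t^2 + b t + c, in closed form.
    For unit-norm data, E[sigma_bar(u) sigma_bar(v)] with corr(u, v) = t equals
    2 a^2 t^2 + b^2 t + (a + c)^2."""
    C = X.T @ X
    return 2.0 * a**2 * C**2 + b**2 * C + (a + c) ** 2

rng = np.random.default_rng(0)
d, n = 400, 200
X = rng.standard_normal((d, n))
X /= np.linalg.norm(X, axis=0, keepdims=True)
sigma_w = 0.5
sigma_u = np.sqrt(1.0 - sigma_w**2)
a, b, c = 0.1, 0.9, -0.1    # placeholder coefficients, NOT the matched values of Corollary 1
eig_implicit = np.linalg.eigvalsh(implicit_ck(X, sigma_w, sigma_u))
eig_explicit = np.linalg.eigvalsh(explicit_ck_quadratic(X, a, b, c))
# compare, e.g., by plotting histograms of eig_implicit and eig_explicit
```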
4 Conclusion
In this paper, we study the CKs and NTKs of high-dimensional ReLU implicit NNs. We derive asymptotic spectral equivalents for Implicit-CKs and Implicit-NTKs. Moreover, we establish the equivalence between implicit and explicit NNs by matching the coefficients of these asymptotic spectral equivalents. In particular, we show that a single-layer explicit NN with carefully designed activations has the same CK or NTK eigenspectra as a ReLU implicit NN. For future work, it would be interesting to extend our analysis to more general data distributions and activation functions.
Acknowledgements
Z. Liao would like to acknowledge the National Natural Science Foundation of China (via fund NSFC-62206101) and the Fundamental Research Funds for the Central Universities of China (2021XXJS110) for providing partial support. R. C. Qiu and Z. Liao would like to acknowledge the National Natural Science Foundation of China (via fund NSFC-12141107), the Key Research and Development Program of Hubei (2021BAA037) and of Guangxi (GuiKe-AB21196034).
References
- [Ali et al.(2022)Ali, Liao, and Couillet] Hafiz Tiomoko Ali, Zhenyu Liao, and Romain Couillet. Random matrices in service of ML footprint: Ternary random features with no performance loss. In International Conference on Learning Representations (ICLR), 2022.
- [Bai et al.(2019)Bai, Kolter, and Koltun] Shaojie Bai, J. Zico Kolter, and Vladlen Koltun. Deep equilibrium models. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.
- [Bai et al.(2020)Bai, Koltun, and Kolter] Shaojie Bai, Vladlen Koltun, and J Zico Kolter. Multiscale deep equilibrium models. Advances in Neural Information Processing Systems, 2020.
- [El Karoui(2010)] Noureddine El Karoui. The spectrum of kernel random matrices. The Annals of Statistics, 2010.
- [Fan and Wang(2020)] Zhou Fan and Zhichao Wang. Spectra of the conjugate kernel and neural tangent kernel for linear-width neural networks. Advances in neural information processing systems, 33:7710–7721, 2020.
- [Feng and Kolter(2020)] Zhili Feng and J Zico Kolter. On the neural tangent kernel of equilibrium models. arXiv preprint, 2020.
- [Gu et al.(2022)Gu, Du, Yuan, Xie, Pu, Qiu, and Liao] Lingyu Gu, Yongqi Du, Zhang Yuan, Di Xie, Shiliang Pu, Robert Qiu, and Zhenyu Liao. "Lossless" compression of deep neural networks: A high-dimensional neural tangent kernel approach. Advances in Neural Information Processing Systems, 35:3774–3787, 2022.
- [Jacot et al.(2018)Jacot, Gabriel, and Hongler] Arthur Jacot, Franck Gabriel, and Clément Hongler. Neural tangent kernel: Convergence and generalization in neural networks. Advances in neural information processing systems, 31, 2018.
- [Ling et al.(2023)Ling, Xie, Wang, Zhang, and Lin] Zenan Ling, Xingyu Xie, Qiuhao Wang, Zongpeng Zhang, and Zhouchen Lin. Global convergence of over-parameterized deep equilibrium models. In International Conference on Artificial Intelligence and Statistics, pages 767–787. PMLR, 2023.
- [Truong(2023)] Lan V Truong. Global convergence rate of deep equilibrium models with general activations. arXiv preprint arXiv:2302.05797, 2023.
- [Xie et al.(2022)Xie, Wang, Ling, Li, Liu, and Lin] Xingyu Xie, Qiuhao Wang, Zenan Ling, Xia Li, Guangcan Liu, and Zhouchen Lin. Optimization induced equilibrium networks: An explicit optimization perspective for understanding equilibrium models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.