Kei Uchizawa and Haruki Abe
Graduate School of Science and Engineering, Yamagata University, Jonan 4-3-16, Yonezawa-shi, Yamagata 992-8510, Japan. [email protected]
Exponential Lower Bounds for Threshold Circuits of Sub-Linear Depth and Energy
Abstract
In this paper, we investigate the computational power of threshold circuits and other theoretical models of neural networks in terms of the following four complexity measures: size (the number of gates), depth, weight and energy. Here the energy complexity of a circuit measures the sparsity of its computation, and is defined as the maximum number of gates outputting non-zero values, taken over all the input assignments.
As our main result, we prove that any threshold circuit $C$ of size $s$, depth $d$, energy $e$ and weight $w$ satisfies $\log(\mathrm{rk}(M_C)) \le ed(\log s + \log w + \log n)$, where $\mathrm{rk}(M_C)$ is the rank of the communication matrix $M_C$ of the $2n$-variable Boolean function that $C$ computes. Thus, such a threshold circuit is able to compute only a Boolean function whose communication matrix has rank bounded by a product of logarithmic factors of $s$ and $w$ and linear factors of $d$ and $e$. This implies an exponential lower bound on the size of even sub-linear-depth threshold circuits if the energy and weight are sufficiently small. For example, we can obtain an exponential lower bound even for threshold circuits of depth $n^{1/3}$, energy $n^{1/3}$ and weight $2^{o(n^{1/3})}$. We also show that the inequality is tight up to a constant factor when the depth $d$ and energy $e$ satisfy $ed = o(n/\log n)$.
For other models of neural networks, such as discretized ReLU circuits and discretized sigmoid circuits, we prove that a similar inequality also holds for a discretized circuit $C$: $\log(\mathrm{rk}(M_C)) = O(ed \cdot \mathrm{polylog}(s, w, n))$. We obtain this inequality by showing that any discretized circuit can be simulated by a threshold circuit with a moderate increase of size, depth, energy and weight. Thus, if we consider the number of non-zero output values as a measure of the sparse activity of a neural network, our results suggest that larger depth linearly helps neural networks to acquire sparse activity.
keywords:
Circuit complexity, disjointness function, equality function, neural networks, threshold circuits, ReLU circuits, sigmoid circuits, sparse activity

1 Introduction
Background. DiCarlo and Cox argued that constructing good internal representations is crucial for neural networks in the brain to perform visual information processing, such as object recognition [6]. Here, an internal representation is described by a vector in a very high-dimensional space, where each axis is one neuron's activity and the dimensionality equals the number (e.g., one million) of neurons in a feedforward neural network. They call representations good if a given pair of images is hard to distinguish in the input space, but the resulting representations are easy to separate by a simple classifier such as a linear classifier. While such internal representations are likely to play a fundamental role in information processing in the brain, it is also known that a neuron needs relatively high energy to be active [19, 30], and hence neural networks are forced to acquire representations supported by only a small number of active neurons [8]. These observations pose a question: for what information processing can neural networks construct good internal representations?
In the paper [44], Uchizawa et al. address this question from the viewpoint of circuit complexity. More formally, they employed threshold circuits as a model of neural networks [26, 27, 31, 34, 36, 37, 38], and introduced a complexity measure, called energy complexity, for the sparsity of their internal representations. A threshold circuit is a feedforward logic circuit whose basic computational element computes a linear threshold function, and the energy of a circuit is defined as the maximum number of internal gates outputting ones, taken over all the input assignments. (See also [7, 17, 35, 39, 47] for studies on the energy complexity of other types of logic circuits.) Uchizawa et al. then show that the energy complexity is closely related to the rank of linear decision trees. In particular, they prove that any linear decision tree with $\ell$ leaves can be simulated by a threshold circuit of size $O(\ell)$ and energy $O(\log \ell)$. Thus, even logarithmic-energy threshold circuits have certain computational power: any linear decision tree with a polynomial number of leaves can be simulated by a polynomial-size and logarithmic-energy threshold circuit.
Following the paper [44], a sequence of papers show relations among other major complexity measures such as size (the number of gates), depth, weight and fan-in [24, 40, 41, 45, 43, 42, 46]. In particular, Uchizawa and Takimoto [45] showed that any threshold circuit of constant depth and sub-linear energy requires exponential size if it computes a Boolean function of high bounded-error communication complexity, such as the Inner-Product function. Even for Boolean functions of low communication complexity, an exponential lower bound on the size is known for constant-depth threshold circuits: any threshold circuit of constant depth and sub-logarithmic energy requires exponential size if it computes the parity function [43]. These results provide exponential lower bounds if the depth is constant and the energy is sub-linear [45] or sub-logarithmic [43], while both the Inner-Product function and the parity function are computable by threshold circuits of linear size, constant depth, and linear energy. Thus these results imply that the energy complexity is strongly related to the representational power of threshold circuits. However, these lower bounds break down when we consider threshold circuits of larger depth and energy, say, non-constant depth and sub-linear energy.
Our Results for Threshold Circuits. In this paper, we prove that simple Boolean functions are hard even for threshold circuits of sub-linear depth and sub-linear energy. Let $C$ be a threshold circuit with Boolean input variables $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$. The communication matrix $M_C$ of $C$ is a $2^n \times 2^n$ matrix where each row (resp., each column) is indexed by an assignment $a$ to $x$ (resp., $b$ to $y$), and the value $M_C[a, b]$ is defined to be the output of $C$ given $a$ and $b$. We denote by $\mathrm{rk}(M_C)$ the rank of $M_C$ over GF(2). Our main result is the following relation among size, depth, energy and weight.
Theorem 1.1.
Let $n, s, d, e$ and $w$ be integers satisfying $2 \le s$, $1 \le d$, $1 \le e$ and $1 \le w$. If a threshold circuit $C$ computes a Boolean function of $2n$ variables, and has size $s$, depth $d$, energy $e$ and weight $w$, then it holds that
$\log(\mathrm{rk}(M_C)) \le ed(\log s + \log w + \log n)$.  (1)
The theorem implies exponential lower bounds for threshold circuits of sub-linear depth and sub-linear energy. As an example, let us consider a Boolean function $\mathrm{DISJ}^c_n$ defined as follows: for input variables $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$,
$\mathrm{DISJ}^c_n(x, y) = 1$ if there exists an index $i \in [n]$ such that $x_i = y_i = 1$, and $\mathrm{DISJ}^c_n(x, y) = 0$ otherwise.
We note that $\mathrm{DISJ}^c_n$ is a biologically motivated Boolean function: Maass [22] defined $\mathrm{DISJ}^c_n$ to model coincidence detection or pattern matching, and Lynch and Musco [20] introduced a related problem, called the Filter problem, for studying theoretical aspects of spiking neural networks. Since $\mathrm{DISJ}^c_n$ is the complement of the well-studied disjointness function, the rank of $M_{\mathrm{DISJ}^c_n}$ is at least $2^n - 1$ [15]. Thus, the theorem implies that
$n - 1 \le ed(\log s + \log w + \log n)$  (2)
holds if a threshold circuit of size $s$, depth $d$, energy $e$ and weight $w$ computes $\mathrm{DISJ}^c_n$. Arranging Eq. (2), we can obtain a lower bound on $s$ which is exponential in $n/(ed)$ if both $d$ and $e$ are sub-linear and $w$ is sub-exponential. For example, we can obtain an exponential lower bound even for threshold circuits of depth $n^{1/3}$, energy $n^{1/3}$ and weight $2^{o(n^{1/3})}$. We can obtain similar lower bounds for the Inner-Product function and the equality function, since their communication matrices also have rank exponential in $n$.
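To make the rearranged bound concrete, the following is a minimal Python sketch of how Eq. (2) translates into a size lower bound; the parameter choices here are our own illustration, not values taken from the paper.

```python
import math

def log2_size_lower_bound(n: int, d: int, e: int, log2_w: float) -> float:
    """Rearranging n - 1 <= e*d*(log s + log w + log n) gives
    log s >= (n - 1)/(e*d) - log w - log n (all logs base two)."""
    return (n - 1) / (e * d) - log2_w - math.log2(n)

n = 10 ** 6
d = e = round(n ** (1 / 3))   # sub-linear depth and energy
log2_w = n ** (1 / 3) / 10    # weight 2^{o(n^{1/3})}

# Prints roughly 70, i.e., s >= 2^70: exponential in n^{1/3}.
print(log2_size_lower_bound(n, d, e, log2_w))
```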
Comparing the lower bound given in [45] to ours, our lower bound is meaningful only for sub-exponential weight, but improves on it in two ways: the lower bound is exponential even if the depth is sub-linear, and it provides a nontrivial lower bound under a much weaker condition on the target function: threshold circuits need exponential size even for Boolean functions whose communication matrices merely have large rank.
Threshold circuits have received considerable attention in circuit complexity, and a number of lower bound arguments have been developed for threshold circuits under restrictions on computational resources including size, depth, energy and weight [1, 2, 4, 10, 11, 14, 16, 24, 29, 33, 43, 45, 46]. However, these arguments are designed for constant-depth threshold circuits, and hence cannot provide meaningful bounds when the depth is not constant. In particular, $\mathrm{DISJ}^c_n$ is computable by a threshold circuit of depth two and linear size. Thus, directly applying known techniques is unlikely to yield an exponential lower bound for $\mathrm{DISJ}^c_n$.
To complement Theorem 1.1, we also show that the lower bound is tight up to a constant factor if the product of $d$ and $e$ is small:
Theorem 1.2.
For any integers $d$ and $e$ such that $1 \le d$ and $1 \le e \le n$, $\mathrm{DISJ}^c_n$ is computable by a threshold circuit of size
$2^{O(n/(ed))}$,
depth $O(d)$, energy $O(e)$ and weight $2^{O(n/(ed))}$.
Substituting the size and weight of the threshold circuit given in Theorem 1.2 into the right-hand side of Eq. (2), we have
$ed(\log s + \log w + \log n) = O(n + ed \log n)$,
which almost matches the left-hand side of Eq. (2) if $ed = o(n/\log n)$. Thus, Theorem 1.1 neatly captures the computational aspect of threshold circuits computing $\mathrm{DISJ}^c_n$. Recall that any linear decision tree with a polynomial number of leaves can be simulated by a polynomial-size and logarithmic-energy threshold circuit [44]. Also, it is known that any Boolean function is computable by a threshold circuit of energy one if exponential size is allowed [24]. Thus, we believe that the condition $ed = o(n/\log n)$ is not too restrictive. We also show that the lower bound is tight for the equality function.
Our Result for Discretized Circuits. Besides threshold circuits, we consider another well-studied model of neural networks, where the activation function and weights of a computational element are discretized (such as discretized sigmoid or ReLU circuits). The size, depth, energy and weight are important parameters also for artificial neural networks. The size and depth are major topics in the study of deep learning. The energy is related to important techniques for deep learning such as regularization, sparse coding, and sparse autoencoders [12, 18, 28]. The weight resolution is closely related to chip resources in neuromorphic hardware systems [32], and quantization schemes have received attention [5, 13].
We define similar notions of energy and weight for a discretized circuit, and show that any discretized circuit can be simulated by a threshold circuit with a moderate increase in size, depth, energy, and weight. Consequently, combining with Theorem 1.1, we can show that, for a discretized circuit, the rank is bounded by a product of polylogarithmic factors of $s$ and $w$ and linear factors of $d$ and $e$. For example, we can obtain the following proposition for sigmoid circuits:
Theorem 1.3.
If a sigmoid circuit $C$ of size $s$, depth $d$, energy $e$, and weight $w$ computes a Boolean function $f$, then it holds that
$\log(\mathrm{rk}(M_f)) = O(ed \cdot \mathrm{polylog}(s, w, n))$.
Maass, Schnitger and Sontag [21] showed that a sigmoid circuit can be simulated by a threshold circuit, but their simulation was optimized to be depth-efficient and did not consider energy. Thus, their result does not fit our purpose.
Theorems 1.1 and 1.3 imply that a threshold circuit or a discretized circuit is able to compute only a Boolean function of bounded rank. Thus, we can consider these theorems as bounds on the corresponding concept classes. According to the bound, increasing the depth by a multiplicative factor is comparable to raising the size to the corresponding power. Thus, large depth could enormously help neural networks to increase their expressive power. Also, the bound suggests that increasing depth could help a neural network to acquire sparse activity when we have hardware constraints on both the number of neurons and the weight resolution. These observations may shed some light on the reason for the success of deep learning.
Organization. The rest of the paper is organized as follows. In Section 2, we define the terms needed for our analysis. In Section 3, we present our main lower bound result. In Section 4, we show the tightness of the bound by constructing a threshold circuit computing $\mathrm{DISJ}^c_n$. In Section 5, we show that a discretized circuit can be simulated by a threshold circuit. In Section 6, we conclude with some remarks.
2 Preliminaries
For an integer $n$, we denote by $[n]$ the set $\{1, 2, \dots, n\}$. The base of the logarithm is two unless stated otherwise. In Section 2.1, we define terms on threshold circuits and discretized circuits. In Section 2.2, we define the communication matrix, and present some known facts.
2.1 Circuit Model
In Sections 2.1.1 and 2.1.2, we give definitions of threshold circuits and discretized circuits, respectively.
2.1.1 Threshold Circuits
Let $n$ be a positive integer. A threshold gate $g$ with $n$ input variables $x_1, \dots, x_n$ has weights $w_1, \dots, w_n$ and a threshold $t$. We define the output of $g$ as
$g(x) = 1$ if $\sum_{i=1}^{n} w_i x_i \ge t$, and $g(x) = 0$ otherwise.
To evaluate the weight resolution, we assume each synaptic weight to be discrete; that is, $w_1, \dots, w_n$ and $t$ are integers. The weight of $g$ is defined as the maximum of the absolute values of $w_1, \dots, w_n$ and $t$. In other words, we assume that the weights are $O(\log w)$-bit coded discrete values, where $w$ is the weight of $g$. Throughout the paper, we allow a gate to have both positive and negative weights, although biological neurons are either excitatory (all the weights are positive) or inhibitory (all the weights are negative). As mentioned in [22], this relaxation has basically no impact on circuit complexity investigations, unless one cares about a constant blowup in computational resources. This is because a single gate with positive and negative weights can be simulated by a pair of excitatory and inhibitory gates.
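As a concrete illustration of the definitions above, here is a minimal Python sketch (our own, not taken from the paper) of a threshold gate with integer weights and of the weight measure.

```python
def threshold_gate(weights, threshold, x):
    """Output of a threshold gate: 1 iff the potential reaches the threshold."""
    potential = sum(w * xi for w, xi in zip(weights, x))
    return 1 if potential >= threshold else 0

def gate_weight(weights, threshold):
    """Weight of a gate: the maximum absolute value of its integer parameters."""
    return max(abs(v) for v in list(weights) + [threshold])

# Two-input AND as a threshold gate: x1 + x2 >= 2. Its weight is 2.
assert threshold_gate([1, 1], 2, [1, 1]) == 1
assert threshold_gate([1, 1], 2, [1, 0]) == 0
print(gate_weight([1, 1], 2))  # 2
```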
A threshold circuit $C$ is a combinatorial circuit consisting of threshold gates, and is expressed by a directed acyclic graph. The nodes of in-degree 0 correspond to input variables, and the other nodes correspond to gates. Let $G$ be the set of the gates in $C$. For each gate $g \in G$, the level of $g$, denoted by $\mathrm{lev}(g)$, is defined as the length of a longest path from an input variable to $g$ on the underlying graph of $C$. For each $\ell$, we define $G_\ell$ as the set of gates in the $\ell$th level:
$G_\ell = \{g \in G \mid \mathrm{lev}(g) = \ell\}$.
Given an input assignment to the input variables, the outputs of the gates in $G$ are inductively determined from the bottom level.
In this paper, we consider a threshold circuit $C$ for a Boolean function $f$ of $2n$ variables. Thus, $C$ has Boolean input variables $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$, and a unique output gate, denoted by $g_{\mathrm{top}}$, which is a linear classifier separating the internal representations given by the gates in the lower levels (possibly together with the input variables). Consider a gate $g$ in $C$. Let $w^x_1, \dots, w^x_n$ (resp., $w^y_1, \dots, w^y_n$) be the weights of $g$ for $x$ (resp., $y$), and $t_g$ be the threshold of $g$. For each gate $h$ directed to $g$, let $w_{h,g}$ be the weight of $g$ for the output of $h$. Then the output of $g$ is defined as
$g(x, y) = 1$ if $p_g(x, y) \ge t_g$, and $g(x, y) = 0$ otherwise,
where $p_g(x, y)$ denotes the potential of $g$ invoked by the input variables and gates:
$p_g(x, y) = \sum_{i=1}^{n} w^x_i x_i + \sum_{i=1}^{n} w^y_i y_i + \sum_{h \to g} w_{h,g}\, h(x, y)$.
We sometimes write $p^x_g(x)$ (resp., $p^y_g(y)$) for the potential invoked by $x$ (resp., $y$): $p^x_g(x) = \sum_{i=1}^{n} w^x_i x_i$ and $p^y_g(y) = \sum_{i=1}^{n} w^y_i y_i$.
Although the inputs to $g$ are not only $x$ and $y$ but also the outputs of gates in the lower levels, we write $g(x, y)$ for the output of $g$, because $x$ and $y$ inductively determine the output of $g$. We say that $C$ computes a Boolean function $f$ if $C(x, y) = f(x, y)$ for every $(x, y) \in \{0,1\}^n \times \{0,1\}^n$.
Let $C$ be a threshold circuit. We define the size $s$ of $C$ as the number of the gates in $C$, and the depth $d$ of $C$ as the level of $g_{\mathrm{top}}$. We define the energy $e$ of $C$ as
$e = \max_{(x,y)} \sum_{g \in G} g(x, y)$,
the maximum number of gates outputting ones, taken over all the input assignments. We define the weight $w$ of $C$ as the maximum of the weights of the gates in $C$.
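The energy measure can be checked by brute force on small circuits. Below is a minimal sketch, assuming a simple layered circuit representation of our own; the example circuit computes the XOR of two bits and has energy one.

```python
from itertools import product

def gate(weights, threshold, inputs):
    return 1 if sum(w * v for w, v in zip(weights, inputs)) >= threshold else 0

def energy(layers, n):
    """Maximum number of gates outputting one over all n-bit inputs.
    Each layer is a list of (weights, threshold) pairs; a gate at level l
    reads the n inputs followed by the outputs of all earlier gates."""
    best = 0
    for x in product([0, 1], repeat=n):
        values, fired = list(x), 0
        for layer in layers:
            outs = [gate(ws, t, values) for ws, t in layer]
            fired += sum(outs)
            values += outs
        best = max(best, fired)
    return best

# XOR(x1, x2): level 1 computes AND, level 2 tests x1 + x2 - 2*AND >= 1.
xor = [[([1, 1], 2)], [([1, 1, -2], 1)]]
print(energy(xor, 2))  # 1: on every input, at most one gate fires
```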
2.1.2 Discretized Circuits
Let $\phi$ be an activation function. Let $\delta$ be a discretizer that maps a real number to a number representable with a bitwidth $\kappa$. We define a discretized activation function $\phi_\delta$ as the composition of $\phi$ and $\delta$, that is, $\phi_\delta(z) = \delta(\phi(z))$ for any number $z$. We say that $\phi_\delta$ has silent range for an interval $I$ if $\phi_\delta(z) = 0$ if $z \in I$, and $\phi_\delta(z) \ne 0$, otherwise. For example, if we use the ReLU function as the activation function $\phi$, then $\phi_\delta$ has silent range for $(-\infty, 0]$ for any discretizer $\delta$. If we use the sigmoid function as the activation function $\phi$ and a linear partition as the discretizer $\delta$, then $\phi_\delta$ has silent range for $(-\infty, -\ln(2^{\kappa+1} - 1))$, where $\ln$ is the natural logarithm.
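For intuition, the following sketch uses a toy discretizer of our own (not the paper's) to exhibit a discretized ReLU whose silent range is exactly $(-\infty, 0]$; positive outputs are rounded up so that no positive potential collapses to the zero code.

```python
import math

def discretize_up(z, bits=4, hi=8.0):
    """Ceiling discretizer onto 2**bits - 1 positive levels of (0, hi]:
    zero stays zero and every positive value keeps a non-zero code."""
    step = hi / (2 ** bits - 1)
    return step * min(math.ceil(z / step), 2 ** bits - 1)

def relu(z):
    return max(0.0, z)

def discretized_relu(z):
    return discretize_up(relu(z))

# Output is 0 exactly on the silent range (-inf, 0].
print([discretized_relu(z) for z in (-3.0, 0.0, 0.01, 2.5)])
# -> [0.0, 0.0, 0.533..., 2.666...]
```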
Let $\phi_\delta$ be a discretized activation function with silent range. A $\phi_\delta$-gate $g$ with $n$ input variables $x_1, \dots, x_n$ has weights $w_1, \dots, w_n$ and a threshold $t$, where each of the weights and the threshold is discretized by $\delta$. The output of $g$ is then defined as
$g(x) = \phi_\delta\big(\sum_{i=1}^{n} w_i x_i - t\big)$.
A $\phi_\delta$-circuit is a combinatorial circuit consisting of $\phi_\delta$-gates, except that the top gate is a threshold gate, that is, a linear classifier. We define the size and depth of a $\phi_\delta$-circuit in the same way as for a threshold circuit. We define the energy of a $\phi_\delta$-circuit $C$ as the maximum number of gates outputting non-zero values in the circuit:
$e = \max_{(x,y)} \sum_{g \in G} [g(x, y) \ne 0]$,
where $[P]$ for a statement $P$ denotes the function which outputs one if $P$ is true, and zero otherwise. We define the weight of $C$ as $2^{\mu}$, where $\mu$ is the bitwidth possibly needed to represent a potential value invoked by a single input of a gate in $C$.
2.2 Communication Matrix and its Rank
Let $n$ be a positive integer. For a Boolean function $f: \{0,1\}^n \times \{0,1\}^n \to \{0,1\}$, we define the communication matrix $M_f$ over $\{0,1\}$ as a $2^n \times 2^n$ matrix where each row and column are indexed by $a \in \{0,1\}^n$ and $b \in \{0,1\}^n$, respectively, and each entry is defined as $M_f[a, b] = f(a, b)$. We denote by $\mathrm{rk}(M_f)$ the rank of $M_f$ over GF(2). If a circuit $C$ computes $f$, we may write $M_C$ instead of $M_f$. If a Boolean function does not have an obvious separation of the input variables into $x$ and $y$, we may assume a separation so that the rank is maximized.
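For small $n$, the communication matrix and its GF(2) rank can be computed directly. The following is a minimal sketch of ours that builds $M_f$ and performs Gaussian elimination over GF(2); applied to the coincidence function $\mathrm{DISJ}^c_n$ from the introduction, it reproduces the rank $2^n - 1$.

```python
from itertools import product

def comm_matrix(f, n):
    """2^n x 2^n communication matrix with entry M[x][y] = f(x, y)."""
    pts = list(product([0, 1], repeat=n))
    return [[f(x, y) for y in pts] for x in pts]

def rank_gf2(rows):
    """Rank over GF(2) via Gaussian elimination on rows packed as bitmasks."""
    width = len(rows[0])
    masks = [int("".join(map(str, r)), 2) for r in rows]
    rank = 0
    for bit in reversed(range(width)):
        pivot = next((m for m in masks if (m >> bit) & 1), None)
        if pivot is None:
            continue
        masks = [m ^ pivot if (m >> bit) & 1 else m for m in masks if m != pivot]
        rank += 1
    return rank

disj_c = lambda x, y: int(any(a and b for a, b in zip(x, y)))
print(rank_gf2(comm_matrix(disj_c, 3)))  # 7 = 2^3 - 1
```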
Let $n$ and $r$ be natural numbers such that $r \le n$. Let
$X_{n,r} = \{x \in \{0,1\}^n \mid x_1 + x_2 + \cdots + x_n \le r\}$.
The $r$-disjointness function $\mathrm{DISJ}_{n,r}$ over $X_{n,r} \times X_{n,r}$ is defined as follows:
$\mathrm{DISJ}_{n,r}(x, y) = 1$ if $\sum_{i=1}^{n} x_i y_i = 0$, and $\mathrm{DISJ}_{n,r}(x, y) = 0$ otherwise,
where the input assignments are chosen from $X_{n,r}$. The book [15] contains a simple proof showing that $M_{\mathrm{DISJ}_{n,r}}$ has full rank.
Theorem 2.1 (Theorem 13.10 in [15]).
$\mathrm{rk}(M_{\mathrm{DISJ}_{n,r}}) = \sum_{i=0}^{r} \binom{n}{i}$. In particular, $\mathrm{rk}(M_{\mathrm{DISJ}_n}) = 2^n$.
$\mathrm{DISJ}^c_{n,r}$ is the complement of $\mathrm{DISJ}_{n,r}$. Since complementing a 0-1 matrix changes its rank by at most one, we can obtain almost the same bound for $\mathrm{DISJ}^c_{n,r}$, as follows:
Corollary 2.2.
$\mathrm{rk}(M_{\mathrm{DISJ}^c_{n,r}}) \ge \sum_{i=0}^{r} \binom{n}{i} - 1$. In particular, $\mathrm{rk}(M_{\mathrm{DISJ}^c_n}) \ge 2^n - 1$.
We also use the following well-known facts on the rank. Let $A$ and $B$ be two matrices of the same dimensions. We denote by $A + B$ the sum of $A$ and $B$, and by $A \circ B$ the Hadamard product of $A$ and $B$.
Fact 15.
For two matrices $A$ and $B$ of the same dimensions, we have
(i) $\mathrm{rk}(A + B) \le \mathrm{rk}(A) + \mathrm{rk}(B)$;
(ii) $\mathrm{rk}(A \circ B) \le \mathrm{rk}(A) \cdot \mathrm{rk}(B)$.
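Both facts hold over any field; below is a quick numerical sanity check of ours over the rationals, using NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 2, (16, 16))
B = rng.integers(0, 2, (16, 16))

rk = np.linalg.matrix_rank
assert rk(A + B) <= rk(A) + rk(B)   # Fact 15(i)
assert rk(A * B) <= rk(A) * rk(B)   # Fact 15(ii): A * B is the Hadamard product
print(rk(A), rk(B), rk(A + B), rk(A * B))
```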
3 Lower Bound for Threshold Circuits
In this section, we give the inequality relating the rank of the communication matrix to the size, depth, energy and weight of a threshold circuit.
Theorem 3.1 (Theorem 1.1 restated).
Let $n, s, d, e$ and $w$ be integers satisfying $2 \le s$, $1 \le d$, $1 \le e$ and $1 \le w$. Suppose a threshold circuit $C$ computes a Boolean function $f$ of $2n$ variables, and has size $s$, depth $d$, energy $e$, and weight $w$. Then it holds that
$\log(\mathrm{rk}(M_C)) \le ed(\log s + \log w + \log n)$.
We prove the theorem by showing that $M_C$ is a sum of matrices, each of which corresponds to an internal representation that arises in $C$. Since $C$ has bounded energy, the number of internal representations is also bounded. We then show by the inclusion-exclusion principle that each matrix corresponding to an internal representation has bounded rank. Thus, Fact 15 implies the theorem.
Proof 3.2.
Let $C$ be a threshold circuit that computes a Boolean function $f$ of $2n$ variables, and has size $s$, depth $d$, energy $e$ and weight $w$. Let $G$ be the set of the gates in $C$. For $\ell \in [d]$, let $G_\ell$ be the set of the gates in the $\ell$th level of $C$. Without loss of generality, we assume that $g_{\mathrm{top}} \in G_d$. We evaluate the rank of $M_C$, and prove that
$\mathrm{rk}(M_C) \le (swn)^{ed}$.  (4)
Taking the logarithm of Eq. (4), we obtain the theorem.
Below we verify that Eq. (4) holds. Let $r = (r_1, \dots, r_d)$, where $r_\ell$ is defined as a subset of $G_\ell$ for each $\ell \in [d]$. Given an input $(a, b)$, we say that an internal representation $r$ arises for $(a, b)$ if, for every $\ell \in [d]$, $g(a, b) = 1$ for every $g \in r_\ell$, and $g(a, b) = 0$ for every $g \in G_\ell \setminus r_\ell$. We denote by $r(a, b)$ the internal representation that arises for $(a, b)$. We then define $R$ as the set of internal representations that arise for inputs $(a, b)$ such that $C(a, b) = 1$:
$R = \{r(a, b) \mid C(a, b) = 1\}$.
Note that, for any $(a, b)$, the energy bound implies that
$\sum_{\ell=1}^{d} |r_\ell(a, b)| \le e$,
and hence every representation in $R$ contains at most $e$ gates in total. Thus we have
$|R| \le \sum_{i=0}^{e} \binom{s}{i} \le (s + 1)^e$.  (5)
For each $r \in R$, let $M_r$ be a matrix such that, for every $(a, b)$,
$M_r[a, b] = 1$ if $r(a, b) = r$, and $M_r[a, b] = 0$ otherwise.
By the definitions of $R$ and $M_r$, we have
$M_C = \sum_{r \in R} M_r$,
and hence Fact 15(i) implies that
$\mathrm{rk}(M_C) \le \sum_{r \in R} \mathrm{rk}(M_r)$.
Thus Eq. (5) implies that
$\mathrm{rk}(M_C) \le (s + 1)^e \cdot \max_{r \in R} \mathrm{rk}(M_r)$.
We complete the proof by showing that, for any $r \in R$, the rank of $M_r$ is small enough that Eq. (4) follows.
In the following argument, we consider an arbitrary fixed internal representation $r$ in $R$. We call a gate a threshold function if the inputs of the gate consist of only $x$ and $y$. For each $g \in G$, we denote by $t^r_g$ the threshold function defined as
$t^r_g(a, b) = 1$ if $p^x_g(a) + p^y_g(b) \ge t_g[r]$, and $t^r_g(a, b) = 0$ otherwise,
where $t_g[r]$ is the threshold of $g$, adjusted under the assumption that the internal representation $r$ arises:
$t_g[r] = t_g - \sum_{h \to g,\ h \in r_1 \cup \cdots \cup r_d} w_{h,g}$.
For each $\ell \in [d]$, we define a set of threshold functions as
$T_\ell = \{t^r_g \mid g \in r_\ell\}$.
Since every gate in $G_1$ is a threshold function, $t^r_g$ is identical to $g$ for every $g \in G_1$.
For any set $T$ of threshold functions, we denote by $M_T$ the matrix such that, for every $(a, b)$,
$M_T[a, b] = \prod_{t \in T} t(a, b)$.
It is well known that the rank of $M_T$ is bounded [9, 10], as follows.
Claim 16.
$\mathrm{rk}(M_T) \le (2wn + 1)^{|T|}$.
We give a proof of the claim in the Appendix for completeness.
For each $\ell \in [d]$, based on $r_\ell$, we define a set of threshold functions as
$U_\ell = \{t^r_g \mid g \in G_\ell \setminus r_\ell\}$,
and a family of sets of threshold functions as
$\mathcal{F}_\ell = \{S \mid S \subseteq U_\ell\}$.
Following the inclusion-exclusion principle, we define a matrix
$D_\ell = \sum_{S \in \mathcal{F}_\ell} (-1)^{|S|} M_{T_\ell \cup S}$.
We can show that $M_r$ is expressed as the Hadamard product of $D_1, \dots, D_d$:
Claim 17.
$M_r = D_1 \circ D_2 \circ \cdots \circ D_d$.
The proof of the claim is given in the Appendix.
Corollary 3.3.
Let $n, s, d, e$ and $w$ be integers satisfying $2 \le s$, $1 \le d$, $1 \le e$ and $1 \le w$. Suppose a threshold circuit of size $s$, depth $d$, energy $e$, and weight $w$ computes $\mathrm{DISJ}^c_n$. Then it holds that
$n - 1 \le ed(\log s + \log w + \log n)$.
Equivalently, we have $s \ge 2^{(n-1)/(ed) - \log w - \log n}$.
Theorem 3.1 implies lower bounds for other Boolean functions with large rank. For example, consider another Boolean function $\mathrm{EQ}_n$ asking if $x = y$:
$\mathrm{EQ}_n(x, y) = 1$ if $x = y$, and $\mathrm{EQ}_n(x, y) = 0$ otherwise.
Since $M_{\mathrm{EQ}_n}$ is the identity matrix with full rank, we have the same lower bound.
Corollary 3.4.
Let $n, s, d, e$ and $w$ be integers satisfying $2 \le s$, $1 \le d$, $1 \le e$ and $1 \le w$. Suppose a threshold circuit of size $s$, depth $d$, energy $e$, and weight $w$ computes $\mathrm{EQ}_n$. Then it holds that
$n \le ed(\log s + \log w + \log n)$.
Equivalently, we have $s \ge 2^{n/(ed) - \log w - \log n}$.
4 Tightness of the Lower Bound
In this section, we show that the lower bound given in Theorem 3.1 is almost tight if the depth and energy are small.
4.1 Definitions
Let $k$ be a positive integer, and $f$ be a Boolean function of $2n$ variables. We say that $f$ is $k$-piecewise with $f_1, \dots, f_k$ if the following conditions are satisfied: let
$P_j = f_j^{-1}(1)$ for each $j \in [k]$, and $P_0 = f^{-1}(0)$;
then
(i) $P_0, P_1, \dots, P_k$ compose a partition of $\{0,1\}^n \times \{0,1\}^n$;
(ii) $f(a, b) = f_j(a, b) = 1$ for every $(a, b) \in P_j$ and every $j \in [k]$;
(iii) $f(a, b) = 0$ for every $(a, b) \in P_0$.
We say that a set of threshold gates sharing input variables is a neural set, and a neural set is selective if at most one of the gates in the set outputs one for any input assignment. A selective neural set $N$ computes a Boolean function $f$ if for every assignment in $f^{-1}(0)$, no gate in $N$ outputs one, while for every assignment in $f^{-1}(1)$, exactly one gate in $N$ outputs one. We define the size and weight of $N$ as the number of gates in $N$ and the maximum weight of the gates in $N$, respectively.
By a DNF-like construction, we can obtain a selective neural set of exponential size that computes $f$ for any Boolean function $f$.
Theorem 4.1.
For any Boolean function $f$ of $n$ variables, there exists a selective neural set of size at most $2^n$ and weight one that computes $f$.
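The DNF-like construction is simple enough to state in code: one weight-one gate per satisfying assignment $a$, firing exactly when the input equals $a$, so at most one gate ever fires. A minimal sketch of ours follows.

```python
from itertools import product

def selective_neural_set(f, n):
    """One weight-one threshold gate per satisfying assignment a: weights are
    +1 on the support of a and -1 elsewhere, with threshold |a|, so the gate
    fires iff the input equals a. Hence at most one gate fires."""
    gates = []
    for a in product([0, 1], repeat=n):
        if f(a):
            gates.append(([1 if ai else -1 for ai in a], sum(a)))
    return gates

def outputs(gates, x):
    return [int(sum(w * v for w, v in zip(ws, x)) >= t) for ws, t in gates]

parity = lambda a: sum(a) % 2
gates = selective_neural_set(parity, 3)
assert all(sum(outputs(gates, x)) == parity(x) for x in product([0, 1], repeat=3))
```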
4.2 Upper Bounds
The following lemma shows that we can construct threshold circuits of small energy for piecewise functions.
Lemma 4.2.
Let $d$, $k$ and $m$ be integers satisfying $1 \le d$, $1 \le k$ and $1 \le m$. Suppose $f$ is a $k$-piecewise function with $f_1, \dots, f_k$. If $f_j$ is computable by a selective neural set of size at most $m$ and weight $w_N$ for every $j \in [k]$, then $f$ is computable by a threshold circuit of size
$O(kd \cdot m^{1/d})$,
depth $d + 1$, energy $k + 1$ and weight $O(kd \cdot m^{1/d} \cdot w_N)$.
Clearly, $\mathrm{DISJ}^c_n$ is $e$-piecewise, and so the lemma gives our upper bound for $\mathrm{DISJ}^c_n$.
Theorem 4.3 (Theorem 1.2 restated).
For any integers $d$ and $e$ such that $1 \le d$ and $1 \le e \le n$, $\mathrm{DISJ}^c_n$ is computable by a threshold circuit of size
$2^{O(n/(ed))}$,
depth $O(d)$, energy $O(e)$ and weight $2^{O(n/(ed))}$.
We can also obtain a similar proposition for $\mathrm{EQ}_n$.
Theorem 4.4.
For any integers $d$ and $e$ such that $1 \le d$ and $1 \le e \le n$, $\mathrm{EQ}_n$ is computable by a threshold circuit of size
$2^{O(n/(ed))}$,
depth $O(d)$, energy $O(e)$ and weight $2^{O(n/(ed))}$.
5 Simulating Discretized Circuits
In this section, we show that any discretized circuit can be simulated by a threshold circuit with a moderate increase in size, depth, energy, and weight. Thus, a similar inequality holds for discretized circuits. Recall that the energy of a discretized circuit is defined differently from that of a threshold circuit.
Theorem 5.1 (Theorem 1.3 restated).
Let $\delta$ be a discretizer and $\phi$ be an activation function such that $\phi_\delta$ has a silent range. If a $\phi_\delta$-circuit $C$ of size $s$, depth $d$, energy $e$, and weight $w$ computes a Boolean function $f$, then it holds that
$\log(\mathrm{rk}(M_f)) = O(ed \cdot \mathrm{polylog}(s, w, n))$.
Our simulation is based on a binary search over the potential values of a discretized gate; we employ a conversion technique from a linear decision tree to a threshold circuit presented in [44].
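Below is a minimal sketch of ours of the binary-search idea (the conversion of [44] itself is not reproduced): given the sorted set of potential values on which the gate is non-silent, threshold tests of the form "potential $\ge v$" identify the exact value with $O(\log |V|)$ tests, which is what the gate-by-gate replacement implements with threshold gates. The candidate list here is made up for illustration.

```python
def identify_potential(potential, values):
    """Locate `potential` in the sorted candidate list `values` using only
    threshold tests 'potential >= v', as in a linear decision tree.
    Assumes `potential` is one of the candidates; returns the identified
    value and the number of tests used."""
    lo, hi, tests = 0, len(values) - 1, 0
    while lo < hi:
        mid = (lo + hi + 1) // 2
        tests += 1
        if potential >= values[mid]:  # one threshold gate per test
            lo = mid
        else:
            hi = mid - 1
    return values[lo], tests

print(identify_potential(13, [3, 5, 8, 13, 21]))  # (13, 3)
```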
6 Conclusions
In this paper, we considered threshold circuits and other theoretical models of neural networks in terms of four complexity measures: size, depth, energy and weight. We proved that if the communication matrix of a Boolean function $f$ has rank exponential in $n$, then any threshold circuit of sub-linear depth, sub-linear energy and sub-exponential weight needs exponential size to compute $f$.
We believe that circuit complexity arguments provide insights into neural computation. Besides Marr's famous three-level approach towards understanding brain computation (the computational level, the algorithmic level, and the implementation level) [25], Valiant [48] added the requirement that such an understanding has to incorporate the quantitative constraints faced by cortex. We expect that circuit complexity arguments could shed light on such quantitative constraints through complexity measures. Maass et al. [23] pointed out the difficulty of uncovering a neural algorithm employed by the brain: its hardware could be extremely adapted to the task, and consequently the algorithm vanishes; even if we know its precise structure, connectivity, and vast array of numerical parameters to the last detail (an artificial neural network obtained by deep learning is a case in point), it is still hard to extract the algorithm implemented in the network. A lower bound does not provide a description of an explicit neural algorithm, but it could give a hint about the computational nature behind neural computation, because a lower bound argument necessarily deals with every algorithm that a theoretical model of a neural network is able to implement. We expect that developing lower bound proof techniques for theoretical models of neural networks can push this line of research forward.
References
- [1] Kazuyuki Amano. On the size of depth-two threshold circuits for the Inner Product mod 2 function. In Alberto Leporati, Carlos Martín-Vide, Dana Shapira, and Claudio Zandron, editors, Language and Automata Theory and Applications, pages 235–247, Cham, 2020. Springer International Publishing.
- [2] Kazuyuki Amano and Akira Maruoka. On the complexity of depth-2 circuits with threshold gates. In Proc. of the 30th international conference on Mathematical Foundations of Computer Science, pages 107–118, 2005.
- [3] Boris Barbour, Nicolas Brunel, Vincent Hakim, and Jean-Pierre Nadal. What can we learn from synaptic weight distributions? Trends in Neurosciences, 30(12):622–629, 2007. URL: https://www.sciencedirect.com/science/article/pii/S0166223607002615, doi:10.1016/j.tins.2007.09.005.
- [4] Ruiwen Chen, Rahul Santhanam, and Srikanth Srinivasan. Average-case lower bounds and satisfiability algorithms for small threshold circuits. Theory of Computing, 14(9):1–55, 2018.
- [5] Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. Binaryconnect: Training deep neural networks with binary weights during propagations. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, NIPS’15, pages 3123–3131, Cambridge, MA, USA, 2015. MIT Press.
- [6] James J. DiCarlo and David D. Cox. Untangling invariant object recognition. Trends in Cognitive Sciences, 11(8):333–341, 2007. doi:10.1016/j.tics.2007.06.010.
- [7] Krishnamoorthy Dinesh, Samir Otiv, and Jayalal Sarma. New bounds for energy complexity of boolean functions. Theoretical Computer Science, 845:59–75, 2020.
- [8] Peter Földiák. Sparse coding in the primate cortex. In M. A. Arbib, editor, The Handbook of Brain Theory and Neural Networks, pages 1064–1068. MIT Press, 2003.
- [9] Jürgen Forster, Matthias Krause, Satyanarayana V. Lokam, Rustam Mubarakzjanov, Niels Schmitt, and Hans Ulrich Simon. Relations between communication complexity, linear arrangements, and computational complexity. In Proc. of the 21st International Conference on Foundations of Software Technology and Theoretical Computer Science, pages 171–182, 2001.
- [10] András Hajnal, Wolfgang Maass, Pavel Pudlák, Márió Szegedy, and György Turán. Threshold circuits of bounded depth. Journal of Computer and System Sciences, 46:129–154, 1993.
- [11] Johan Håstad and Mikael Goldmann. On the power of small-depth threshold circuits. Computational Complexity, 1(2):113–129, 1991.
- [12] Yunlong He, Koray Kavukcuoglu, Yun Wang, Arthur Szlam, and Yanjun Qi. Unsupervised feature learning by deep sparse coding. In Proc. of SIAM International Conference on Data Mining, pages 902–910, 2014.
- [13] Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Quantized neural networks: Training neural networks with low precision weights and activations. Journal of Machine Learning Research, 18(187):1–30, 2018. URL: http://jmlr.org/papers/v18/16-456.html.
- [14] Russell Impagliazzo, Ramamohan Paturi, and Michael E. Saks. Size-depth tradeoffs for threshold circuits. SIAM Journal on Computing, 26(3):693–707, 1997.
- [15] Stasys Jukna. Extremal Combinatorics with Applications in Computer Science. Springer-Verlag Berlin Heidelberg, 2011.
- [16] D. M. Kane and R. Williams. Super-linear gate and super-quadratic wire lower bounds for depth-two and depth-three threshold circuits. In Proc. of the 48th Annual ACM Symposium on Theory of Computing, pages 633–643, 2016.
- [17] O. M. Kasim-zade. On a measure of active circuits of functional elements (Russian). Mathematical problems in cybernetics “Nauka”, (4):218–228, 1992.
- [18] Honglak Lee, Alexis Battle, Rajat Raina, and Andrew Y. Ng. Efficient sparse coding algorithms. In Proc. of the 19th Advances in Neural Information Processing Systems, pages 801–808, 2006.
- [19] Peter Lennie. The cost of cortical computation. Current Biology, 13:493–497, 2003.
- [20] Nancy Lynch and Cameron Musco. A Basic Compositional Model for Spiking Neural Networks, pages 403–449. Springer Nature Switzerland, 2022. doi:10.1007/978-3-031-15629-8_22.
- [21] Wolfgang Maass, Georg Schnitger, and Eduardo D. Sontag. On the computational power of sigmoid versus boolean threshold circuits. In Proc. of the 32nd Annual Symposium on Foundations of Computer Science, pages 767–776, 1991. doi:10.1109/SFCS.1991.185447.
- [22] Wolfgang Maass. Networks of spiking neurons: The third generation of neural network models. Neural Networks, 10(9):1659–1671, 1997.
- [23] Wolfgang Maass, Christos H. Papadimitriou, Santosh Vempala, and Robert Legenstein. Brain Computation: A Computer Science Perspective, pages 184–199. Springer International Publishing, 2019. doi:10.1007/978-3-319-91908-9_11.
- [24] Hiroki Maniwa, Takayuki Oki, Akira Suzuki, Kei Uchizawa, and Xiao Zhou. Computational power of threshold circuits of energy at most two. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E101.A(9):1431–1439, 2018.
- [25] David Marr. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. W.H.Freeman & Co Ltd, 1982.
- [26] Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics, 5:115–133, 1943.
- [27] Marvin Minsky and Seymour Papert. Perceptrons: An Introduction to Computational Geometry. MIT Press, 1988.
- [28] Andrew Ng. Sparse autoencoder. CS294A Lecture notes, 2011.
- [29] Noam Nisan. The communication complexity of threshold gates. In Proceedings of “Combinatorics, Paul Erdős is Eighty”, pages 301–315, 1993.
- [30] Bruno A Olshausen and David J Field. Sparse coding of sensory inputs. Current Opinion in Neurobiology, 14(4):481–487, 2004.
- [31] Ian Parberry. Circuit Complexity and Neural Networks. MIT Press, 1994.
- [32] Thomas Pfeil, Tobias Potjans, Sven Schrader, Wiebke Potjans, Johannes Schemmel, Markus Diesmann, and Karlheinz Meier. Is a 4-bit synaptic weight resolution enough? - constraints on enabling spike-timing dependent plasticity in neuromorphic hardware. Frontiers in Neuroscience, 6:90, 2012. URL: https://www.frontiersin.org/article/10.3389/fnins.2012.00090, doi:10.3389/fnins.2012.00090.
- [33] Alexander A. Razborov and Alexander A. Sherstov. The sign-rank of AC0. SIAM Journal on Computing, 39(5):1833–1855, 2010.
- [34] Frank Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386–408, 1958.
- [35] Janio Carlos Nascimento Silva and Uéverton S. Souza. Computing the best-case energy complexity of satisfying assignments in monotone circuits. Theoretical Computer Science, 932:41–55, 2022. URL: https://www.sciencedirect.com/science/article/pii/S0304397522004777, doi:https://doi.org/10.1016/j.tcs.2022.08.005.
- [36] K. Y. Siu and J. Bruck. On the power of threshold circuits with small weights. SIAM Journal on Discrete Mathematics, 4(3):423–435, 1991.
- [37] Kai-Yeung Siu, Vwani Roychowdhury, and Thomas Kailath. Discrete Neural Computation: A Theoretical Foundation. Prentice Hall, 1995.
- [38] Kai-Yeung Siu and Vwani P. Roychowdhury. On optimal depth threshold circuits for multiplication and related problems. SIAM Journal on Discrete Mathematics, 7(2):284–292, 1994.
- [39] Xiaoming Sun, Yuan Sun, Kewen Wu, and Zhiyu Xia. On the relationship between energy complexity and other boolean function measures. In Proc. of the 25th International Computing and Combinatorics Conference, pages 516–528, 2019.
- [40] Akira Suzuki, Kei Uchizawa, and Xiao Zhou. Energy-efficient threshold circuits computing MOD functions. In Proc. of the 17th Computing: the Australasian Theory Symposium, pages 105–110, 2011.
- [41] Akira Suzuki, Kei Uchizawa, and Xiao Zhou. Energy-efficient threshold circuits computing MOD functions. International Journal of Foundations of Computer Science, 24(1):15–29, 2013.
- [42] Kei Uchizawa. Lower bounds for threshold circuits of bounded energy. Interdisciplinary Information Sciences, 20(1):27–50, 2014.
- [43] Kei Uchizawa. Size, Depth and Energy of Threshold Circuits Computing Parity Function. In Yixin Cao, Siu-Wing Cheng, and Minming Li, editors, 31st International Symposium on Algorithms and Computation (ISAAC 2020), volume 181 of Leibniz International Proceedings in Informatics (LIPIcs), pages 54:1–54:13, Dagstuhl, Germany, 2020. Schloss Dagstuhl–Leibniz-Zentrum für Informatik. URL: https://drops.dagstuhl.de/opus/volltexte/2020/13398, doi:10.4230/LIPIcs.ISAAC.2020.54.
- [44] Kei Uchizawa, Rodney Douglas, and Wolfgang Maass. On the computational power of threshold circuits with sparse activity. Neural Computation, 18(12):2994–3008, 2006.
- [45] Kei Uchizawa and Eiji Takimoto. Exponential lower bounds on the size of constant-depth threshold circuits with small energy complexity. Theoretical Computer Science, 407(1–3):474–487, 2008.
- [46] Kei Uchizawa, Eiji Takimoto, and Takao Nishizeki. Size-energy tradeoffs of unate circuits computing symmetric Boolean functions. Theoretical Computer Science, 412:773–782, 2011.
- [47] M. N. Vaintsvaig. On the power of networks of functional elements (Russian). Doklady Akademii Nauk, 139(2):320–323, 1961.
- [48] Leslie G Valiant. What must a global theory of cortex explain? Current Opinion in Neurobiology, 25:15–19, 2014. doi:https://doi.org/10.1016/j.conb.2013.10.006.
7 Appendix
7.1 Proof of Claim 16
Let $m = |T|$, and let $t_1, \dots, t_m$ be an arbitrary order of the threshold functions in $T$. For each $i \in [m]$, we define
$\Theta_i = \{p^x_{t_i}(a) \mid a \in \{0,1\}^n\}$.
Since a threshold function receives a value of absolute value at most $w$ from a single input, we have $|\Theta_i| \le 2wn + 1$. For $\theta = (\theta_1, \dots, \theta_m) \in \Theta_1 \times \cdots \times \Theta_m$, we define $R_\theta = A_\theta \times B_\theta$ as a combinatorial rectangle where
$A_\theta = \{a \mid p^x_{t_i}(a) = \theta_i \text{ for every } i \in [m]\}$
and
$B_\theta = \{b \mid \theta_i + p^y_{t_i}(b) \ge \tau_i \text{ for every } i \in [m]\}$,
where $\tau_i$ denotes the threshold of $t_i$.
Clearly, all the rectangles are disjoint, and hence $M_T$ can be expressed as a sum of the rank-1 matrices given by the $R_\theta$'s, taken over all the $\theta$'s. Thus Fact 15(i) implies that its rank is at most $(2wn + 1)^m$.
7.2 Proof of Claim 17
Consider an arbitrary fixed assignment $(a, b)$. We show that
$(D_1 \circ D_2 \circ \cdots \circ D_d)[a, b] = 1$
if $r(a, b) = r$, and
$(D_1 \circ D_2 \circ \cdots \circ D_d)[a, b] = 0$
if $r(a, b) \ne r$. We write $r'$ to denote $r(a, b)$ for simpler notation.
Suppose $r' \ne r$. In this case, we have $M_r[a, b] = 0$, and hence there exists a level $\ell$ such that $r'_\ell \ne r_\ell$ while $r'_j = r_j$ for every $j < \ell$. For such $\ell$, it holds that
$t^r_g(a, b) = g(a, b)$  (9)
for every $g \in G_\ell$. We show that $D_\ell[a, b] = 0$ by considering two cases: $r_\ell \setminus r'_\ell \ne \emptyset$ and $r_\ell \subseteq r'_\ell$.
Consider the case where $r_\ell \setminus r'_\ell \ne \emptyset$; then there exists a gate $g \in r_\ell \setminus r'_\ell$. Since $g \notin r'_\ell$, we have $g(a, b) = 0$. Thus, Eq. (9) implies that $t^r_g(a, b) = 0$, and hence $M_{T_\ell \cup S}[a, b] = 0$ for every $S$ such that $S \in \mathcal{F}_\ell$. Therefore, every term in the defining sum vanishes, and hence
$D_\ell[a, b] = 0$.
Consider the other case where $r_\ell \subseteq r'_\ell$. Let $K = \{t^r_g \mid g \in r'_\ell \setminus r_\ell\}$ and $k = |K|$; since $r'_\ell \ne r_\ell$, we have $k \ge 1$. Equation (9) implies that $M_{T_\ell \cup S}[a, b] = 1$ if $S$ satisfies $S \subseteq K$, and $M_{T_\ell \cup S}[a, b] = 0$, otherwise. Thus,
$D_\ell[a, b] = \sum_{S \subseteq K} (-1)^{|S|} = \sum_{j=0}^{k} (-1)^j \binom{k}{j}$.
Therefore, by the binomial theorem,
$D_\ell[a, b] = (1 - 1)^k = 0$.
Suppose $r' = r$. In this case, we have $M_r[a, b] = 1$. Thus, for every $\ell \in [d]$, Eq. (9) implies that $M_{T_\ell \cup S}[a, b] = 1$ if $S = \emptyset$, and $M_{T_\ell \cup S}[a, b] = 0$, otherwise. Therefore,
$D_\ell[a, b] = 1$ for every $\ell \in [d]$.  (10)
Consequently,
$(D_1 \circ D_2 \circ \cdots \circ D_d)[a, b] = 1 = M_r[a, b]$,
as desired.
7.3 Proof of Lemma 4.2
Let $k$ and $d$ be as in the statement of the lemma. For simplicity, we assume that $n$ is divisible by $kd$. Let $f$ be $k$-piecewise with $f_1, \dots, f_k$. We first prove the lemma for one of two cases of $f$; the complementary case is treated at the end of the proof.
Let us relabel the input variables by blocks: we denote by $I_\ell$, for $\ell \in [d]$, the set of indices handled at the $\ell$th level, each of which contains $n/d$ integers. Thus, we have
$[n] = I_1 \cup I_2 \cup \cdots \cup I_d$.
By the assumption, the restriction of $f_j$ to the block $I_\ell$ is computable by a selective neural set $N_{j,\ell}$ for every pair of $j \in [k]$ and $\ell \in [d]$.
We construct the desired threshold circuit $C$ by arranging and connecting the selective neural sets, where $C$ has a simple layered structure consisting of the selective neural sets. After we complete the construction of $C$, we show that $C$ computes $f$, and evaluate its size, depth, energy, and weight.
We start from the bottom level. First, we put in the first level the gates in $N_{j,1}$ for every $j \in [k]$. Then, for each $\ell$, $2 \le \ell \le d$, we add at the $\ell$th level the gates in $N_{j,\ell}$ for every $j \in [k]$, and connect the outputs of all the gates at the lower levels to every gate at the $\ell$th level with a large negative weight. For $j \in [k]$ and $\ell \in [d]$, we denote by $z_{j,\ell}$ the output of the gate selected in $N_{j,\ell}$. Finally, we add $g_{\mathrm{top}}$, which computes a conjunction over all the gates in the layers $1, \dots, d$.
We here show that $C$ computes $f$. By construction, the following claim is easy to verify:
Claim 22.
For any input assignment and any $\ell \in [d]$, if every gate at the levels $1, \dots, \ell - 1$ outputs zero, then the outputs of the gates at the $\ell$th level are identical to those of the corresponding selective neural sets; otherwise, no gate at the $\ell$th level outputs one.
Proof 7.1.
If every gate at the levels $1, \dots, \ell - 1$ outputs zero, the output of each gate at the $\ell$th level is identical to its counterpart in the selective neural set, and hence the first part holds. Otherwise, there is a gate outputting one at the lower levels. Since each gate at the $\ell$th level receives the output of that gate with a large negative weight, that value is subtracted from its potential. Since the gate receives at most a bounded number of positive weights, each bounded by the weight of its selective neural set, its potential is now below its threshold, and the gate outputs zero. (Otherwise, the gate would output one for any input assignment, which would imply that, since the neural set is selective, it computes a constant function. This contradicts the fact that $f$ is $k$-piecewise.)
Suppose $f(a, b) = 0$. In this case, for every $\ell \in [d]$ and $j \in [k]$, the selective neural set $N_{j,\ell}$ has no gate outputting one. Therefore, Claim 22 implies that no gate in $C$ outputs one.
Suppose $f(a, b) = 1$. In this case, there exists $j \in [k]$ such that $(a, b) \in P_j$, while the selective neural sets for the other pieces output all zeros. Since the sets $N_{j,\ell}$ compute $f_j$, Claim 22 implies that there exists a gate outputting one, which implies that $C(a, b) = f(a, b)$.
Finally, we evaluate the size, depth, energy, and weight of $C$. Since $N_{j,\ell}$ contains at most $m^{1/d}$ gates for each pair of $j \in [k]$ and $\ell \in [d]$, we have in total $O(kd \cdot m^{1/d})$ gates; the additional one corresponds to the output gate. Because the gates of $N_{j,\ell}$ are placed at the $\ell$th level for $\ell \in [d]$, the level of $g_{\mathrm{top}}$ is clearly $d + 1$, and hence $C$ has depth $d + 1$. Claim 22 implies that if there is a gate outputting one at level $\ell$, then no gate in higher levels outputs one. In addition, since each $N_{j,\ell}$ is selective, at most one of its gates outputs one. Therefore, at most $k$ gates at the $\ell$th level output one, followed by the output gate. Thus, $C$ has energy at most $k + 1$. Any connection in $C$ has weight at most the suppressing weight or the weight of a gate in a selective neural set. Thus, the weight of $C$ is $O(kd \cdot m^{1/d} \cdot w_N)$.
Consider the other case of $f$. We can obtain the desired circuit by the same construction as above, except that $g_{\mathrm{top}}$ computes the complement of a conjunction over all the gates in the layers $1, \dots, d$.
7.4 Proof of Theorem 4.3
For simplicity, we consider the case where $n$ is a multiple of $ed$. It suffices to show that $\mathrm{DISJ}^c_n$ is $e$-piecewise, and that each piece is computable by a selective neural set of size $2^{O(n/e)}$ and weight $O(1)$.
We can verify that $\mathrm{DISJ}^c_n$ is $e$-piecewise, since
$\mathrm{DISJ}^c_n(x, y) = \bigvee_{j=1}^{e} f_j(x, y)$,
where $f_j(x, y) = 1$ if and only if some index $i$ in the $j$th block $B_j$ satisfies $x_i = y_i = 1$ while no index in the earlier blocks does, and the blocks $B_1, \dots, B_e$ partition $[n]$ into sets of $n/e$ indices.
Consider an arbitrary fixed $j \in [e]$. Below we show that $f_j$ is computable by a selective neural set of size
$2^{O(n/e)}$
and weight $O(1)$.
Recall that we denote by $[P]$, for a statement $P$, a function that outputs one if $P$ is true, and zero otherwise. Then $f_j$ can be expressed as
$f_j(x, y) = \big[\sum_{i \in B_j} x_i y_i \ge 1\big] \cdot \big[\sum_{i \in B_1 \cup \cdots \cup B_{j-1}} x_i y_i = 0\big]$.  (13)
For any assignment $(a, b)$, we define $g_{(a, b)}$ as the gate of the neural set indexed by the restriction of $(a, b)$ to the blocks $B_1, \dots, B_j$. Then, for every $(a, b)$,
$g_{(a, b)}$ outputs one if and only if $f_j(a, b) = 1$.  (16)
Each such function is computable by a single threshold gate.
Claim 26.
For any restriction of the inputs to the blocks $B_1, \dots, B_j$, the corresponding function can be computed by a threshold gate with constant weights and a suitable threshold, defined as follows: for every coordinate, agreement with the restriction contributes positively to the potential, disagreement contributes negatively, and the threshold is attained exactly when the input agrees with the restriction and a coincidence occurs in $B_j$.
Proof 7.2.
Suppose $f_j(a, b) = 1$; that is, the input agrees with the restriction and there exists $i \in B_j$ such that $a_i = b_i = 1$. Then the potential of the gate attains the threshold; thus, the gate outputs one.
Suppose $f_j(a, b) = 0$. There are two cases: the input disagrees with the restriction on some coordinate, or the input agrees with the restriction but $a_i b_i = 0$ for every $i \in B_j$.
Consider the first case. Each disagreeing coordinate contributes negatively to the potential, so the potential falls below the threshold, and consequently the gate outputs zero.
In the second case, no coincidence occurs in $B_j$, so the potential again stays below the threshold, and hence the gate outputs zero.
7.5 Proof of Theorem 4.4
Let $f = \mathrm{EQ}_n$. $\mathrm{EQ}_n$ is $e$-piecewise, since it decomposes over a partition of $[n]$ into $e$ blocks $B_1, \dots, B_e$ in the same way as $\mathrm{DISJ}^c_n$, where the $j$th piece checks the blocks of $x$ and $y$ coordinate-wise. Then Theorem 4.1 implies that each piece is computable by a selective neural set of size $2^{O(n/e)}$ and weight one.
7.6 Proof of Theorem 5.1
Let $\delta$ be a discretizer and $\phi$ be an activation function such that $\phi_\delta$ has a silent range for $I$. In the proof, we only consider an open interval $I = (\alpha, \beta)$, because the proofs for the other cases are similar. Let $C$ be a $\phi_\delta$-circuit of size $s$, depth $d$, energy $e$, and weight $w$ that computes a Boolean function $f$. We obtain the desired threshold circuit by showing a procedure by which any $\phi_\delta$-gate in $C$ can be safely replaced by a set of threshold gates.
Let $g$ be an arbitrary $\phi_\delta$-gate in $C$ that computes $g(x, y) = \phi_\delta(p_g(x, y))$. We first consider the potential values above the silent range. Let $V$ be the set of such potential values for which $g$ outputs a non-zero value:
$V = \{v \mid \beta \le v \text{ and } \phi_\delta(v) \ne 0\}$.
Since the activation function and the weights are discretized, $V$ is a finite set.
We operate a binary search over $V$, and construct threshold gates that output one for every input such that the potential of $g$ takes a particular value in $V$. For any finite set $U$ of values, we define $\mathrm{med}(U)$ as the median of the values in $U$, and $U^{+}$ (resp., $U^{-}$) as the upper (resp., lower) half of $U$:
$U^{+} = \{u \in U \mid u \ge \mathrm{med}(U)\}$ and $U^{-} = \{u \in U \mid u < \mathrm{med}(U)\}$.
If $U$ contains an even number of values, we take the greater of the two middle values as $\mathrm{med}(U)$.
Let $q$ be a binary string. We inductively construct a threshold gate $g_q$ on the length of $q$. For two strings $q$ and $q'$, we write $q' \prec q$ if $q'$ is a proper prefix of $q$. We denote the string $q$ followed by 0 (resp., by 1) by $q0$ (resp., $q1$).
As the base of our construction, we consider the empty string $\varepsilon$. Let $V_\varepsilon = V$. We construct a threshold gate $g_\varepsilon$ that computes
$g_\varepsilon(x, y) = 1$ if $p_g(x, y) \ge \mathrm{med}(V_\varepsilon)$, and $g_\varepsilon(x, y) = 0$ otherwise.
Suppose we have constructed the gates $g_{q'}$ for every $q'$ satisfying $|q'| \le \lambda$. Consider a string $q$ of length $\lambda + 1$. By the induction hypothesis, we have the gates $g_{q'}$ for every $q' \prec q$. Let $q^{-}$ be the string obtained by dropping the last symbol of $q$. Then, we define
$V_q = V_{q^{-}}^{+}$ if $q = q^{-}1$, and $V_q = V_{q^{-}}^{-}$ if $q = q^{-}0$.
Let $v_q = \mathrm{med}(V_q)$. We construct a threshold gate $g_q$ as follows:
$g_q(x, y) = 1$ if $p_g(x, y) + \sum_{q' \prec q} w_{q'}\, g_{q'}(x, y) \ge v_q + c_q$, and $g_q(x, y) = 0$ otherwise,
where $w_{q'}$ is the weight for the output of $g_{q'}$, chosen with a sufficiently large absolute value and with a sign that rewards agreement between $q$ and the outputs of the lower gates, and $c_q$ is an offset determined by $o(q)$, where $o(q)$ is the number of ones in $q$.
We repeatedly apply the above procedure until $|V_q| = 1$. Since we apply binary search over $V$, we have $O(|V|)$ gates, and the length of $q$ is $O(\log |V|)$ for any gate $g_q$.
Consider the strings for which we have constructed the gates $g_q$. Let
$Q = \{q \mid |V_q| = 1\}$.
For each $v \in V$, we denote by $q(v)$ the unique string $q \in Q$ that satisfies $V_q = \{v\}$. The following claims show that the gates $g_q$ are useful for simulating $g$.
Claim 27.
Let $(a, b)$ be an input assignment that satisfies $p_g(a, b) \in V$, and let $v = p_g(a, b)$.
(i) For every constructed gate $g_q$ with $|V_q| > 1$, $g_q$ outputs one if and only if $q1$ is a prefix of $q(v)$.
(ii) For $q \in Q$, $g_q$ outputs one if and only if $V_q = \{v\}$.
Proof 7.3.
Consider an arbitrary input assignment $(a, b)$ that satisfies $p_g(a, b) \in V$. For notational simplicity, we write $v$ for $p_g(a, b)$.
Proof of (i). We verify the claim by induction on the length of $q$. For the base case, we consider the empty string $\varepsilon$. It suffices to show that $g_\varepsilon(a, b) = 1$ if the first symbol of $q(v)$ is 1, and $g_\varepsilon(a, b) = 0$ otherwise. If the first symbol is 1, we have $v \in V_\varepsilon^{+}$, which implies that $v \ge \mathrm{med}(V_\varepsilon)$. Thus, $g_\varepsilon(a, b) = 1$. Similarly, if the first symbol is zero, we have $v \in V_\varepsilon^{-}$, implying that $v < \mathrm{med}(V_\varepsilon)$. Thus, $g_\varepsilon(a, b) = 0$.
As the induction hypothesis, we assume that $g_{q'}$ outputs one if and only if $q'1$ is a prefix of $q(v)$, for every $q'$ of length at most $\lambda$, for a non-negative integer $\lambda$. Next, we consider a string $q$ of length $\lambda + 1$.
We first verify that $g_q$ outputs zero if $q$ itself is not a prefix of $q(v)$. If $q$ is not a prefix of $q(v)$, there exists a prefix $q'$ of $q$ such that $q'0$ (resp., $q'1$) is a prefix of $q$, whereas $q'0$ (resp., $q'1$) is not a prefix of $q(v)$.
Consider the case where $q'1$ is a prefix of $q$ (i.e., $q'0$ is a prefix of $q(v)$). In this case, the induction hypothesis implies that $g_{q'}(a, b) = 0$. In addition, since $q'0$ is a prefix of $q(v)$, we have $v < \mathrm{med}(V_{q'})$. Thus, the potential of $g_q$ for $(a, b)$ misses the reward of the weight $w_{q'}$, and falls below the threshold of $g_q$, because the weights $w_{q'}$ are chosen sufficiently large.
Consider the case in which $q'0$ is a prefix of $q$ (i.e., $q'1$ is a prefix of $q(v)$). In this case, the induction hypothesis implies that $g_{q'}(a, b) = 1$. In addition, since $q'1$ is a prefix of $q(v)$, we have $v \ge \mathrm{med}(V_{q'})$. Thus, the potential of $g_q$ for $(a, b)$ again falls below the threshold, because the weight $w_{q'}$ penalizes the disagreement between $q$ and the outputs of the lower gates.
Suppose $q$ is a prefix of $q(v)$. The induction hypothesis implies that the potential of $g_q$ is equal to $p_g(a, b)$ shifted by the offset $c_q$. Thus, similarly to the base case, we can show that $g_q(a, b) = 1$ if $q1$ is a prefix of $q(v)$, and $g_q(a, b) = 0$ otherwise. More formally, if $q1$ is a prefix of $q(v)$, we have $v \in V_q^{+}$, which implies that $v \ge \mathrm{med}(V_q)$. Thus, $g_q(a, b) = 1$. If $q0$ is a prefix of $q(v)$, we have $v \in V_q^{-}$, which implies that $v < \mathrm{med}(V_q)$. Thus, $g_q(a, b) = 0$.
Proof of (ii). Consider $q \in Q$. Similarly to claim (i), we can verify that $g_q(a, b) = 0$ if $q$ is not a prefix of $q(v)$. If $q = q(v)$, claim (i) also implies that the potential of $g_q$ is equal to $p_g(a, b)$ shifted by the offset $c_q$. Since $V_q = \{v\}$, the threshold of $g_q$ corresponds to $v$. Thus, we have
$p_g(a, b) \ge v$,
which implies that $g_q(a, b) = 1$.
Claim 28.
For any input assignment $(a, b)$ that satisfies $p_g(a, b) \notin V$, every gate $g_q$ with $q \in Q$ outputs zero.
Proof 7.4.
Since $p_g(a, b) \notin V$, the potential of $g$ lies in the silent range. An argument similar to that of Claim 27 shows that all the gates $g_q$ such that $q$ contains a symbol 1 output zero. If $q$ consists of only 0s, we have $p_g(a, b) < \mathrm{med}(V_{q'})$ for any prefix $q'$ of $q$, which implies that $g_q(a, b) = 0$.
Claims 27 and 28 imply that we can safely replace $g$ with the gates $g_q$, $q \in Q$: we connect the output of each gate $g_q$, $q \in Q$, with weight $\phi_\delta(v_q)$, where $v_q$ is the unique value in $V_q$, to every gate that $g$ is originally connected to in $C$. Since the number of constructed gates is $O(|V|)$ and the length of each $q$ is $O(\log |V|)$, the size and depth increase by these factors, respectively.
We can construct another set of threshold gates for the potential values below the silent range in a similar manner to the above procedure. That is, it suffices to consider a gate obtained by multiplying the weights and threshold of $g$ by $-1$, together with the corresponding interval of potential values.
Using the above procedure, we replace every gate in $C$ with a set of threshold gates, and complete the construction of the threshold circuit $C'$. Clearly, the size of $C'$ is larger than $s$ by a factor of at most $O(|V|)$. Since a replaced gate feeds its successors through the gates $g_q$, and the length of each $q$ is $O(\log |V|)$, the depth of $C'$ is $O(d \log |V|)$. Claims 27 and 28 imply that at most $O(\log |V|)$ of the gates $g_q$ output one for each gate of $C$ outputting a non-zero value, so the energy of $C'$ is $O(e \log |V|)$. Clearly, the weight of $C'$ is polynomial in $w$. Since $\log |V|$ is bounded by the bitwidth of the discretization, and hence by $O(\log w)$, Theorem 3.1 implies that $\log(\mathrm{rk}(M_f))$ is bounded by
$O(ed \cdot \mathrm{polylog}(s, w, n))$,
as desired.