Evolution and Exploration for Existing Neural Network Models
Abstract
Convolutional neural networks (CNNs) have been among the most attractive deep learning (DL) structures of the past decades and have made breakthrough progress. However, designing a neural network model efficiently still requires a wealth of professional experience and knowledge, which has become a bottleneck for DL's development. Recently, neural architecture search (NAS) has attracted wide interest and has been used to automatically design and generate high-performance models. NAS is an advanced method that has driven much of the recent progress in deep learning, but it demands abundant computation resources and time. Both conventional network design and NAS face this problem, and it is hard to tackle. Inspired by the perspective of software evolution and re-exploration, we apply the same idea to existing neural network models to address the above issues. Software evolution is the process of developing software initially and then updating it over time for various reasons. Likewise, we can update an existing neural network model, improve its performance, and transplant the model to solve more complex problems. In this paper, we borrow the ideas of software maintenance and evolution, measure the computing cost of NAS, and obtain a better model based on existing models instead of searching for an architecture from scratch. Our experiments demonstrate that the proposed method is promising and useful for sufficiently exploring the potential of a model, helping researchers and engineers easily attain a high-performance network from an existing structure, and the final performance of our method can be more compelling than that of normal NAS.
Index Terms:
Model Evolution, Deep Learning, Neural Architecture Search
I Introduction
Software maintenance and evolution are very broad activities that involve complex concepts and take considerable effort. They optimize software performance by reducing errors, controlling time costs, and eliminating meaningless parts. Building a software system can take a long time, while its maintenance and modification are ongoing activities that take even longer. The cost of system maintenance represents a large proportion of the budget of most organizations that use software systems. The same problem exists in DL. Researchers and engineers often invest a lot of time in designing neural network models and continuously optimize a model until it achieves the best performance. We believe that the experience learned from software maintenance and evolution can help us better manage and optimize existing neural network models.
For software maintenance, a reuse-oriented perspective has been taken to reform and improve software quality, where the application domain is defined as a family of systems that share common features [1]. There is also model-oriented design of re-configurable software architectures [2]. The principles of these works are similar to those of neural network design. For example, as an advanced way of designing neural network models, NAS can be executed over a cell-based search space [3]-[8], which resembles exploring common features when designing a software architecture, and the model-oriented concept can be compared with the classical idea of NAS [9]-[10]. Later work supported software evolution effectively and provided efficient and modular structures, refactoring, and collaboration-level extensions [11], which strengthened the reuse of existing software systems and plays the same role as Network Morphism [12] in NAS.
NAS requires professional knowledge, abundant experience, time, and computation resources to design robust neural networks, which makes deep learning difficult to deploy and apply. With the growing interest in automated machine learning, Zoph and Le suggested that models can be discovered by reinforcement learning, which marked the beginning of this line of work. The NAS process is normally divided into three parts: definition of the search space, the search strategy, and the performance estimation strategy. It has been proven that an appropriate search space is critical to achieving competitive, and even state-of-the-art, model performance. As NAS developed, a whole variety of techniques were applied to reduce the costs of search time and computation resources. Evolutionary algorithms were also proven capable for NAS, as were reinforcement learning algorithms. However, due to the restrictions of time costs and computation resources, the efficiency of NAS still limits the development of automatic model design to varying degrees. More and more researchers do not want to spend a lot of time designing the network structure; they want to quickly obtain the desired model in an efficient way. This provides an application scenario for the ideas of software maintenance and evolution.
Recently, the maintenance and evolution of NAS systems have drawn wide attention, and plentiful work has been done to optimize the normal NAS method and reduce its computation burden. One-shot methods such as SMASH [13] derive a single model from the whole search process. ENAS [14] transformed the search space into a set of graphs and explored the key components of a model, such as cells and nodes. A differentiable search space was proposed to encode the network for search and effectively reduce computation time [15]. Another compelling research direction is network pruning, which demonstrated a novel way to control the costs of NAS [16]; AutoGrow [17] reformed neural network models by growing instead of pruning. The evolution of image classifiers within NAS [18] has also been published and achieved compelling performance. In parallel to these efforts, outstanding NAS tools and platforms were released to accelerate the development of search techniques and the application of NAS. AutoKeras [19] was published as an open-source NAS platform that integrates NAS methods and applies advanced techniques such as network morphism, helping users discover network architectures through comparably simple interfaces. AutoKeras employs a three-layer convolutional neural network as the initial seed for the search process, and ResNet and DenseNet structures can be generated simultaneously. Each of the above methods has achieved good performance, but their training processes all start from scratch, which we do not think is necessary; introducing existing models can greatly reduce the time cost. Like software under maintenance and evolution, our neural network models can also evolve from an existing foundation rather than starting from scratch. Our experimental results show that this approach is feasible and efficient.
II Motivation
To better illustrate our motivation and approach, we discuss the intentions of this paper through two questions:
Q1: Why do we maintain and evolve neural network models?
Q2: How can we efficiently maintain and evolve existing neural network models while avoiding side effects on model design?
For Q1, the idea of applying software maintenance and evolution can obviously help us achieve efficient model reuse. Normal NAS requires abundant computation resources and time, which is a challenge for deploying NAS and attaining robust models quickly. Starting from an existing model, however, the process of designing the model from scratch can be largely omitted, because for a specific dataset and model there are always some structures, such as layers and nodes, that are essential. Therefore, from the perspective of software evolution and exploration, we are inspired that applying an appropriate exploration method to an existing neural network model can progressively improve its performance and portability. An existing model already has some superior characteristics, such as the depth and basic structure of the network, which do not require much additional design. Its own shortcomings can be uncovered during the NAS process itself, and better performance can be continuously achieved under the guidance of reinforcement learning. Previous works primarily focus on generating a model from a random or specific basic component or seed. To give full play to the advantages of existing models, we should evolve these models in the right way. One crucial factor for the correctness and robustness of normal NAS is that the search space should always include the most promising models, so the initial seed must be simple enough to avoid side effects from the very beginning; a more specific network model usually also narrows the search space. If we adopted a model that has already achieved the best possible performance, our evolutionary measures would become meaningless. Therefore, our method mainly chooses existing models that still have room for improvement.


For Q2, we utilize the method of AutoKeras and embed functions to initialize the seed of the search, and we propose a better scheme to optimize the whole NAS process. Our observations show that employing an existing network model reduces the time consumed by fitting, which iterates the search operations until there is no further performance increment in a single search round. The essential issue is how to optimize the existing models. We considered many different ways to improve model performance from the initial status. Adjusting the key nodes of the network can expand the search space but enlarges the time cost at the same time. Searching for small topological structures requires a complex search algorithm design, and previous work has addressed this issue through network pruning and cell-based search. However, with limited computing resources, the above methods are hardly applicable to the evolution of an existing architecture. Starting from fully exploiting the superiority of the model itself, we try to find a measurement standard that quantifies how much each model architecture can still be mined. This metric must heuristically guide the direction of mining, that is, dynamically optimize the evaluation process of each architecture: spend as much time as possible on promising structures while minimizing meaningless time consumption. We introduce Bayesian learning as the backbone of our method to dynamically optimize the search strategy. At the same time, we regard the performance and the structure of a model as obeying some latent probability distribution and guide our search strategy by calculating and optimizing a conditional probability. As a widely used probabilistic model, the multivariate normal distribution has a powerful ability to fit unknown distributions. We find that regarding the performance of a model as normally distributed actually leads to more convincing results in our experiments. The experiments show that our hypothesis and method improve the performance of the final model and reduce the time cost to a certain extent.
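As a rough, self-contained sketch of this idea (our own illustration, not the implementation used in the paper), the snippet below fits a one-dimensional normal distribution to the accuracies observed for a candidate structure and keeps exploring that structure only while an improvement over the current best accuracy remains sufficiently probable. The function names and the threshold are hypothetical.

```python
# Illustrative sketch: treat the validation accuracies observed for a candidate
# structure as samples from a normal distribution and keep exploring it only
# while beating the current best accuracy remains sufficiently probable.
import math
import statistics


def improvement_probability(observed_accs, best_acc):
    """P(next evaluation exceeds best_acc) under a fitted normal distribution."""
    if len(observed_accs) < 2:
        return 1.0  # not enough evidence yet: keep exploring
    mu = statistics.mean(observed_accs)
    sigma = statistics.stdev(observed_accs) or 1e-6
    z = (best_acc - mu) / sigma
    # 1 - CDF of the standard normal evaluated at z
    return 0.5 * math.erfc(z / math.sqrt(2))


def should_keep_exploring(observed_accs, best_acc, threshold=0.1):
    return improvement_probability(observed_accs, best_acc) > threshold


# Example: accuracies hovering around 0.90 are unlikely to beat a best of 0.95.
print(should_keep_exploring([0.89, 0.90, 0.91], best_acc=0.95))  # likely False
```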
The contributions of the paper are summarized as follows:
- We apply the ideas of software maintenance and evolution to neural network models.
- We propose a dynamic and efficient evolution strategy for neural network architectures.
To the best of our knowledge, this is the first work in which the concepts of software evolution and maintenance are applied to neural architecture search.
III Method
In order to measure the discrepancy between two layers and to compute the improvement between two search rounds, we use a chain of matrix multiplications to represent a single network model:

(1)  $Y = W_k W_{k-1} \cdots W_1 X$

where $W_1$, $W_k$, and $Y$ represent the first layer, the $k$-th layer, and the output of the model, respectively. We use the formation $\mathcal{F} = (W_1, W_2, \ldots, W_k)$ to precisely denote a model. As elaborated above, a CNN-based network can be characterized by $\mathcal{F}$, so we can express a reshaped network by reconstructing $\mathcal{F}$ as follows:
(2)  $Y' = W_k \cdots W_{j+1} W^{*} W_j \cdots W_1 X$

where $W^{*}$ is the inserted layer, e.g., a convolutional layer, a pooling layer, an activation function, etc. Eq. (2) provides us with an idea for measuring the difference between the old layer and the new layer. Computing $W^{*}$ amounts to finding a formulation that satisfies the following relationship:

(3)  $W^{*} W_j \cdots W_1 X = W_j \cdots W_1 X$

where the input and output dimensions of $W^{*}$ are both equal to the output dimension of $W_j$, so both sides of Eq. (3) keep the same shape. For a regular fully connected layer, $W^{*}$ can easily be initialized as an identity matrix. The NAS process then tunes the parameters of the whole network, and $W^{*}$ is changed into $\hat{W}^{*}$ while maintaining the same dimensions.
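The following minimal numpy sketch illustrates the identity initialization behind Eqs. (2) and (3) for fully connected layers: inserting an identity-initialized layer leaves the network output unchanged, so the search can later tune the inserted weights. The layer sizes are chosen only for illustration.

```python
# Inserting W* = I between two fully connected layers preserves the network
# function, so NAS can deepen the model first and tune W* afterwards.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 8))   # first layer: 8 -> 16
W2 = rng.standard_normal((4, 16))   # second layer: 16 -> 4
x = rng.standard_normal(8)

y_old = W2 @ (W1 @ x)

W_star = np.eye(16)                 # inserted layer, initialized as identity
y_new = W2 @ (W_star @ (W1 @ x))

assert np.allclose(y_old, y_new)    # the morphed network computes the same output
```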
Definition 1
For two network models $\mathcal{N}_a$ and $\mathcal{N}_b$ with formations $\mathcal{F}_a$ and $\mathcal{F}_b$, we define the distance between $\mathcal{N}_a$ and $\mathcal{N}_b$ as:

(4)  $d(\mathcal{N}_a, \mathcal{N}_b) = \lVert \phi(\mathcal{F}_a) - \phi(\mathcal{F}_b) \rVert_2$

(5)  $\phi(\mathcal{F}) = \big(\lVert \tilde{W}_1 \rVert, \lVert \tilde{W}_2 \rVert, \ldots, \lVert \tilde{W}_k \rVert\big)$

where $\tilde{W}_i$ represents the normalization of matrix $W_i$, the norm $\lVert \tilde{W}_i \rVert$ describes the $i$-th layer, and $d$ is the Euclidean distance between $\phi(\mathcal{F}_a)$ and $\phi(\mathcal{F}_b)$. Consequently, we obtain a method to indicate the difference between two networks. In particular, in order to ensure that $\phi(\mathcal{F}_a)$ and $\phi(\mathcal{F}_b)$ have equal dimensions, we pad the shorter vector, which guarantees the computability of $d$.
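A small sketch of Definition 1 is given below. The exact normalization of each weight matrix is not fully specified above, so we use a size-normalized Frobenius norm and zero padding as one plausible choice; both are assumptions of this illustration.

```python
# Each layer is summarized by one number (size-normalized Frobenius norm of its
# weights), the shorter vector is padded, and d is the Euclidean distance
# between the two layer-norm vectors.
import numpy as np


def formation_vector(layer_weights):
    """One value per layer: size-normalized Frobenius norm of the weights."""
    return np.array([np.linalg.norm(W) / np.sqrt(W.size) for W in layer_weights])


def model_distance(weights_a, weights_b):
    va, vb = formation_vector(weights_a), formation_vector(weights_b)
    k = max(len(va), len(vb))
    va = np.pad(va, (0, k - len(va)))   # pad the shorter formation vector
    vb = np.pad(vb, (0, k - len(vb)))
    return np.linalg.norm(va - vb)


# Example: a 2-layer model vs. a 3-layer model obtained by inserting a layer.
rng = np.random.default_rng(0)
m_a = [rng.standard_normal((16, 8)), rng.standard_normal((4, 16))]
m_b = [m_a[0], np.eye(16), m_a[1]]
print(model_distance(m_a, m_b))
```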
Algorithm 1 (sketch): dynamic evaluation of the search depth. Input: model vector, search round, accuracy. Output: evaluation of the depth.
After computing the discrepancy between models, we intend to explore the relationship between a model's formation and its performance. We propose the hypothesis that there is a probability distribution that fits the relationship between performance and formation. The multivariate normal distribution (MVN) is a powerful model for fitting complex data distributions; it generalizes the one-dimensional normal distribution to higher dimensions. We utilize the MVN to describe how the performance of a model can be estimated through the Bayesian formula.
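As an illustration of this assumption (not the paper's exact estimator), one can fit a joint Gaussian over pairs of formation features and observed accuracies and use the standard conditional-Gaussian formula to predict the accuracy of a new formation; the toy history and the single scalar feature below are ours.

```python
# Fit a joint Gaussian over (formation feature, accuracy) pairs collected from
# already-evaluated models, then predict E[accuracy | feature] for a candidate.
import numpy as np

# toy history: rows are (layer-norm feature, observed accuracy)
history = np.array([
    [1.2, 0.86],
    [1.5, 0.89],
    [1.9, 0.91],
    [2.3, 0.92],
])

mu = history.mean(axis=0)
cov = np.cov(history, rowvar=False)


def predict_accuracy(feature):
    """Conditional mean of accuracy given the formation feature."""
    return mu[1] + cov[1, 0] / cov[0, 0] * (feature - mu[0])


print(predict_accuracy(2.0))  # expected accuracy of a candidate with feature 2.0
```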
To quantify the relationship between performance and a model's structure, we define the probability that model $\mathcal{N}$ can reach a given accuracy increment as follows:

(6)  $P(\Delta a \mid \mathcal{N}) = \dfrac{n_{\Delta a}}{N}$

where $\Delta a$ represents the increment of accuracy of model $\mathcal{N}$, $N$ represents the number of generated models, and $n_{\Delta a}$ denotes the number of models among the $N$ that achieved such an increment of accuracy. Therefore we can easily evaluate how much accuracy we can gain from a generated network and strategically optimize our search. Furthermore, we transform the NAS problem into an optimization problem, since we need to establish a function that predicts the appropriate depth for every single search round of NAS based on the probability obtained from Eq. (6). However, $P$ can be very noisy at the very beginning of the whole process, so we use a parameter to constrain it and avoid abnormal behavior of the optimizer, which would otherwise hinder the search progress.
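A minimal sketch of the empirical estimate in Eq. (6) follows; whether an increment "counts" is interpreted here as reaching at least the target increment, which is our reading of the definition.

```python
# Empirical estimate behind Eq. (6): the probability of reaching a given
# accuracy increment is the fraction of generated models that achieved it.
def increment_probability(observed_increments, target_increment):
    """n / N, where n counts generated models whose accuracy gain reached the target."""
    if not observed_increments:
        return 0.0
    n = sum(1 for inc in observed_increments if inc >= target_increment)
    return n / len(observed_increments)


# Example: 2 of 5 generated models improved accuracy by at least 0.5%.
print(increment_probability([0.2, 0.6, 0.1, 0.8, 0.3], target_increment=0.5))  # 0.4
```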
We define the relationship between the probability $P$ and the depth $d$ as follows:

(7)  $d = f(P, r)$

where $r$ represents the progress rate of the NAS process. Since $d$ is proportional to the time cost, we have to keep its scale within a reasonable interval. We treat $f$ as a classifier that outputs the possible value of $d$.
Algorithm 1 shows the details of our work process and principles. For a specific dataset, we load an existing model as the initial seed for the NAS process. For each search round, we regard the value of $P$ as the recognition of, and feedback on, the current structure from NAS. Therefore we compute the depth based on the conditional probability provided by the tuning process at every round. By dynamically determining and optimizing the exploration of a temporary network, we can flexibly control the degree of mining of each temporary structure. We initialize the depth to 1 at the beginning and update it once the round closes. After updating the search depth, our method records it and uses it as the foundation for the next round. Our method is embedded in NAS; since we read the dynamically saved search record before each optimization, the computation time it consumes is negligible compared to NAS itself.
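The sketch below captures the spirit of Algorithm 1 under our reading of Eqs. (6) and (7): each round explores the current structure to the current depth, records the accuracy increment, estimates the probability of further improvement, and maps that probability and the search progress to the depth of the next round. All helper names (`evaluate_candidate`, `MAX_DEPTH`) and the concrete form of $f$ are hypothetical.

```python
# Illustrative per-round depth update in the spirit of Algorithm 1.
MAX_DEPTH = 6  # keeps the depth (and thus the time cost) in a reasonable interval


def next_depth(improvement_prob, progress_rate):
    """f in Eq. (7): map probability of improvement and search progress to a depth."""
    scaled = improvement_prob * (1.0 - progress_rate)      # explore less as NAS progresses
    return max(1, min(MAX_DEPTH, round(scaled * MAX_DEPTH)))


def evolve(seed_model, rounds, evaluate_candidate):
    """evaluate_candidate(model, depth) explores the structure to the given depth
    and returns the best accuracy found; its implementation is NAS-specific."""
    depth, record = 1, []                                   # depth starts at 1
    best_acc = evaluate_candidate(seed_model, depth)
    for r in range(rounds):
        acc = evaluate_candidate(seed_model, depth)         # explore current structure
        record.append(max(0.0, acc - best_acc))             # accuracy increment this round
        best_acc = max(best_acc, acc)
        prob = sum(1 for inc in record if inc > 0) / len(record)   # Eq. (6)-style estimate
        depth = next_depth(prob, progress_rate=(r + 1) / rounds)   # Eq. (7)
    return best_acc, depth
```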
IV Evaluation
We used CIFAR10 and CIFAR100 as datasets. The CIFAR10 dataset is a collection of 60,000 32x32 color images in 10 different classes. CIFAR100 is just like CIFAR10, except it has 100 classes containing 600 images each; there are 500 training images and 100 testing images per class. The 100 classes in CIFAR100 are grouped into 20 superclasses.
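For reference, the two benchmarks can be loaded with the built-in tf.keras dataset helpers (a minimal sketch; the paper's pipeline is built on AutoKeras, which consumes the same data format):

```python
# Load both CIFAR benchmarks with the standard Keras dataset loaders.
from tensorflow.keras.datasets import cifar10, cifar100

(x10_train, y10_train), (x10_test, y10_test) = cifar10.load_data()
(x100_train, y100_train), (x100_test, y100_test) = cifar100.load_data(label_mode="fine")

print(x10_train.shape)   # (50000, 32, 32, 3)
print(x100_train.shape)  # (50000, 32, 32, 3)
```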
For each dataset, we arranged two sets of comparative experiments. First, we compare the accuracy achieved by our method and by conventional NAS within a specified time to measure the efficiency of the two methods; we call this conditional verification. Then we conduct a parallel verification to compare the accuracy that the two methods can ultimately achieve and the time each takes. This comparison better explains the advantages and disadvantages of each method. All experiments are conducted on two Nvidia GTX 1080 GPUs. Table 1 shows the specific results of the two sets of experiments. For convenience, we write N-NAS for normal NAS and E-NAS for model evolution and exploration based on existing models. In addition, we use init-acc to denote the best accuracy of each method within the first few hours (2 hours for CIFAR10 and 4 hours for CIFAR100) and final-acc to denote the final accuracy. We also use init-time to indicate how long each method takes to first reach a given accuracy (85% for CIFAR10 and 55% for CIFAR100). Finally, final-time denotes the time required for each method to converge.
Table 1: Results of conditional and parallel verification (accuracy in %, time in hours).

| Dataset | Strategy | init-acc | init-time | final-acc | final-time |
|---|---|---|---|---|---|
| CIFAR10 | N-NAS | 80.60 | 2.5 | 92.10 | 11.7 |
| CIFAR10 | E-NAS | 85.81 | 1.7 | 92.12 | 7.9 |
| CIFAR100 | N-NAS | 45.74 | 6.7 | 65.55 | 14.6 |
| CIFAR100 | E-NAS | 47.08 | 4.9 | 65.62 | 10.1 |
IV-A Conditional Verification
The key to the conditional verification experiment is to extract the performance of the two methods within the first few hours. The purpose is to prevent NAS from gradually deleting the structure of the existing model and training a new set of hyperparameters from scratch. After embedding the existing model, we first compare the performance of the two methods within a fixed time range. On CIFAR10, the accuracy of N-NAS is stable at 82.60, while E-NAS reaches 85.81 within 2 hours. Analyzing how the performance of the two methods changes over time, we find that the accuracy of E-NAS stabilizes faster and its iteration over network layers is quicker. On CIFAR100, N-NAS and E-NAS reached 45.74 and 47.08, respectively, within 4 hours. Considering that CIFAR100 is a more complex dataset, this performance shows that E-NAS does not bring obvious side effects and that the search space is still reasonable. After many repetitions we can still reproduce such results, which shows that the reuse and evolution of existing models is a correct idea, but we also need to observe the final results to confirm whether our method is completely effective.
IV-B Parallel Verification
Based on the conclusions of the conditional experiments, we continued to observe the performance of the two groups of methods several more times and obtained the following data: the best results that N-NAS and E-NAS can achieve are 92.10 and 92.12 on CIFAR10, and 65.55 and 65.62 on CIFAR100, respectively. In terms of time, on CIFAR10, N-NAS and E-NAS converged after 11.7 and 8.9 hours, respectively, and on CIFAR100 they took 14.6 and 10.1 hours, respectively. This is obviously a competitive result. Although the final accuracy did not exceed the state of the art, we believe that, without the use of tricks, such results are sufficient to demonstrate two facts: E-NAS is more efficient than N-NAS, and after embedding an appropriate existing model, E-NAS can obtain results as good as N-NAS.
In addition, from the behavior of the two methods over the entire architecture search process, we can see that evolving and developing existing models effectively saves part of the fitting time, which matches the expected experimental results. Therefore, looking back at the two questions originally raised in this paper, we believe that the ideas of software maintenance and evolution are applicable to the re-exploration of models, and that our proposed evolution and exploration method is efficacious.
V Conclusion
Neural architecture search takes a model as its output and has become a popular technology for automated model design, but its time costs and computing resource requirements have been restricting its widespread application. We combined the concepts of software maintenance and evolution with NAS and proposed evolution and exploration methods that can be applied to existing models to address the above problems. According to the experimental results, the evolution of existing models is of practical significance; we can use this method to improve the portability of a model and thereby solve more complex problems. The maintenance of a neural network architecture requires the same long-term investment as software maintenance, and our optimization strategy can reduce the maintenance and evolution time to a certain extent.
VI Future Work
Although we have proposed the idea of combining software maintenance and evolution with NAS, the optimization of NAS clearly remains a challenging research direction. In future work, we have two ideas to expand on: transplanting more software engineering concepts to continuously evolve neural network models and algorithms, and proposing more competitive optimization measures for NAS to enhance the portability of the model.
References
- [1] H. Gomaa, G.A. Farrukh, ”Automated configuration of distributed applications from reusable software architectures”, IEEE International Conference on Automated Software Engineering, 1997.
- [2] H Gomaa, M Hussein, ”Model-based software design and adaptation”, Software Engineering for Self-Adaptive Systems, 2007.
- [3] Huang G, Liu Z, Weinberger K Q, et al, ”Densely connected convolutional networks”, Proceedings of the IEEE conference on computer vision and pattern recognition, 2017.
- [4] Vladimir Nekrasov, Hao Chen, Chunhua Shen, and Ian D. Reid, ”Fast neural architecture search of compact semantic segmentation models via auxiliary cells”, arXiv preprint, 2018.
- [5] Zhao Zhong, Zichen Yang, Boyang Deng, Junjie Yan, Wei Wu, Jing Shao, and Cheng-Lin Liu, ”Blockqnn: Efficient block-wise neural network architecture generation”, arXiv preprint, 2018b.
- [6] Han Cai, Jiacheng Yang, Weinan Zhang, Song Han, and Yong Yu, ”Path-Level Network Transformation for Efficient Architecture Search”, in International Conference on Machine Learning, June 2018b.
- [7] Gao Huang, Zhuang Liu, and Kilian Q. Weinberger, ”Densely Connected Convolutional Networks”, in Conference on Computer Vision and Pattern Recognition, 2017.
- [8] Barret Zoph and Quoc V. Le, ”Neural architecture search with reinforcement learning”, in International Conference on Learning Representations, 2017.
- [9] Barret Zoph, Quoc V. Le, ”Neural Architecture Search with Reinforcement Learning”, arXiv:1611.01578, 2016.
- [10] Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc V. Le, ”Learning transferable architectures for scalable image recognition”, in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018.
- [11] Gyu Kim, Hwan Bae, Eui Hong, ”A component composition model providing dynamic, flexible, and hierarchical composition of components for supporting software evolution”, in Journal of Systems and Software, 2007.
- [12] Tao Wei, Changhu Wang, Yong Rui, and Chang Wen Chen, ”Network morphism”, in International Conference on Machine Learning, pp. 564–572, 2016.
- [13] Andrew Brock, Theodore Lim, J.M. Ritchie, Nick Weston, ”SMASH: One-Shot Model Architecture Search through HyperNetworks”, arXiv preprint arXiv:1708.05344, 2017.
- [14] Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, Jeff Dean, ”Efficient Neural Architecture Search via Parameter Sharing”, arXiv:1802.03268, 2018.
- [15] Hanxiao Liu, Karen Simonyan, Yiming Yang, ”DARTS: Differentiable architecture search”, in International Conference on Learning Representations, 2019b.
- [16] Yijun Bian, Qingquan Song, Mengnan Du, Jun Yao, Huanhuan Chen, Xia Hu, ”Sub-Architecture Ensemble Pruning in Neural Architecture Search”, arXiv:1910.00370, 2019.
- [17] Wei Wen, Feng Yan, Hai Li, ”AutoGrow: Automatic Layer Growing in Deep Convolutional Networks”, arXiv:1906.02909, 2019.
- [18] Esteban Real, Alok Aggarwal, Yanping Huang, Quoc V Le, ”Regularized Evolution for Image Classifier Architecture Search”, in Association for the Advancement of Artificial Intelligence, 2019.
- [19] Haifeng Jin, Qingquan Song, Xia Hu, ”Auto-Keras: An Efficient Neural Architecture Search System”, arXiv:1806.10282, 2018.