Layer-Wise Data-Free CNN Compression
Abstract
We present a computationally efficient method for compressing a trained neural network without using any real data. We break the problem of data-free network compression into a set of independent layer-wise compressions. We show how to efficiently generate layer-wise training data and how to precondition the network to maintain accuracy during layer-wise compression. We present results for layer-wise compression using quantization and pruning. When quantizing, our method outperforms related work while using orders of magnitude less compute. When pruning, we outperform baselines with a similar compute envelope. We also show how to combine our method with high-compute generative methods to improve upon the state of the art in data-free pruning.
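To make the layer-wise idea concrete, below is a minimal PyTorch sketch of compressing a single convolution layer without real data: random inputs stand in for generated layer-wise training data, the original layer's outputs serve as regression targets, and a quantized copy of the layer is fitted to reproduce them. The Gaussian input distribution, the uniform symmetric quantizer with a straight-through estimator, and all hyperparameters here are illustrative assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(w: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Uniform symmetric fake-quantization with a straight-through gradient.
    (Assumed quantizer for this sketch; not necessarily the paper's.)"""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    return w + (q - w).detach()  # forward: quantized; backward: identity

def compress_layer(layer: nn.Conv2d, in_shape, num_bits=4, steps=200, batch=32):
    """Fit a quantized copy of `layer` to match its outputs on generated data.
    Each layer is compressed independently, so no real images are needed."""
    weight = layer.weight.detach().clone().requires_grad_(True)
    bias = (None if layer.bias is None
            else layer.bias.detach().clone().requires_grad_(True))
    params = [weight] if bias is None else [weight, bias]
    opt = torch.optim.Adam(params, lr=1e-3)
    for _ in range(steps):
        x = torch.randn(batch, *in_shape)   # stand-in layer-wise training data
        with torch.no_grad():
            target = layer(x)               # original layer's response
        out = F.conv2d(x, fake_quantize(weight, num_bits), bias,
                       stride=layer.stride, padding=layer.padding)
        loss = F.mse_loss(out, target)      # layer-wise reconstruction error
        opt.zero_grad()
        loss.backward()
        opt.step()
    return fake_quantize(weight, num_bits).detach(), bias

# Example: compress one conv layer; a full model would repeat this per layer.
layer = nn.Conv2d(3, 16, kernel_size=3, padding=1)
q_weight, q_bias = compress_layer(layer, in_shape=(3, 32, 32))
```

Because each layer is fitted against the frozen original layer's outputs, the per-layer problems are independent and can run in parallel; the straight-through estimator lets the full-precision copy receive gradients through the quantization step.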