### 数学学院、所系列学术报告(843场)：杨周旺教授

We study L_{1,\infty}-weight normalization for deep neural networks to achieve the sparse architecture. Empirical evidence suggests that inducing sparsity can relieve overfitting, and weight normalization can accelerate the algorithm convergence. In this paper, we theoretically establish the generalization error bounds for both regression and classification under the L_{1,\infty}-weight normalization. It is shown that the upper bounds are independent of the network width and sqrt(k)-dependence on the network depth k, which are the best available bounds for networks with bias neurons. These results provide theoretical justifications on the usage of such weight normalization. We also develop an easily implemented gradient projection descent algorithm to practically obtain a sparse neural network. We perform various experiments to validate our theory and demonstrate the effectiveness of the resulting approach.