Asymptotic Analysis of Deep Residual Networks
Key Findings
We investigate the asymptotic properties of deep residual networks (ResNets) as the number of layers increases. We first show the existence of scaling regimes for trained weights that are markedly different from those implicitly assumed in the neural ODE literature. We then study the convergence of the hidden state dynamics in these scaling regimes, showing that one may obtain an ODE, a stochastic differential equation (SDE), or neither of these. In particular, our findings point to the existence of a diffusive regime in which the deep network limit is described by a class of SDEs. Finally, we derive the corresponding scaling limits for the backpropagation dynamics.
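As an illustrative sketch of the scaling regimes in question (the notation below, including the hidden state h_k^(L), the residual map f, and the exponent β, is ours and is not fixed by the statement above), the hidden-state recursion of an L-layer ResNet can be written as follows, with the size of the layer-wise increments governing which limit appears as L → ∞:

```latex
% Residual update for layer k of an L-layer network (illustrative notation)
h^{(L)}_{k+1} \;=\; h^{(L)}_k \;+\; L^{-\beta}\, f\!\big(h^{(L)}_k,\ \theta^{(L)}_k\big),
\qquad k = 0,\dots,L-1.

% If the scaled increments act as a deterministic drift of order 1/L
% (e.g. \beta = 1 with weights converging to a smooth function of k/L),
% the deep-network limit is an ODE:
\mathrm{d}h_t \;=\; f(h_t, \theta_t)\,\mathrm{d}t.

% If instead the weights behave like independent, centred noise and the
% increments are of order 1/\sqrt{L} (e.g. \beta = 1/2), the limit is an SDE
% (the diffusive regime), with drift b and diffusion coefficient \Sigma^{1/2}:
\mathrm{d}h_t \;=\; b(h_t)\,\mathrm{d}t \;+\; \Sigma(h_t)^{1/2}\,\mathrm{d}W_t.
```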
Abstract
Residual networks, or ResNets, are multilayer neural network architectures in which a skip connection is introduced at every layer. This allows very deep networks to be trained by circumventing the vanishing and exploding gradient problems discussed in [3]. The increased depth in ResNets has led to commensurate performance gains in applications ranging from speech recognition to computer vision.
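To make the skip connection concrete, here is a minimal sketch of a stack of residual layers (in NumPy; the width, depth, tanh activation, and 1/√d weight scaling are illustrative choices, not taken from the text):

```python
import numpy as np

def residual_block(h, W, b):
    """One residual layer: the input h is carried forward unchanged via the
    skip connection, and a learned update F(h) = tanh(W h + b) is added."""
    return h + np.tanh(W @ h + b)

# Forward pass through a deep stack of residual layers (illustrative values).
rng = np.random.default_rng(0)
d, L = 16, 100                        # hidden width and number of layers
h = rng.standard_normal(d)            # input / initial hidden state
for _ in range(L):
    W = rng.standard_normal((d, d)) / np.sqrt(d)
    b = np.zeros(d)
    h = residual_block(h, W, b)       # skip connection at every layer
```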