Residual networks (ResNets) have achieved impressive results in pattern recognition and recently motivated the neural ordinary differential equation (neural ODE) architecture, which has garnered significant interest in the machine learning community. A widely established claim is that the neural ODE is the limit of ResNets as the number of layers increases. A collaborative effort between InstaDeep and the University of Oxford challenges this claim through detailed numerical experiments, investigating the properties of weights trained by stochastic gradient descent and how they scale with network depth.
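
To make the limit claim concrete, here is the standard correspondence in common notation (a sketch; the symbols $h_k$, $f$, $\theta_k$ and the $1/L$ scaling are illustrative conventions, not taken from the original text). An $L$-layer ResNet with scaled residual updates,

$$
h_{k+1} = h_k + \frac{1}{L}\, f(h_k, \theta_k), \qquad k = 0, \dots, L-1,
$$

reads as a forward Euler discretization, with step size $1/L$, of the neural ODE

$$
\frac{dh}{dt}(t) = f\big(h(t), \theta(t)\big), \qquad t \in [0, 1],
$$

provided the weights $\theta_k$ approach a smooth limiting function $\theta(t)$ as $L \to \infty$. Whether trained weights actually exhibit this smooth scaling with depth is precisely what the experiments examine.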
