Lottery Ticket Hypothesis

On the other hand, there is no clear notion of a feature in a neural network, and the separation of roles between weights and information representations is less distinct. The holy grail of winning tickets is to recognize them as early as possible in the training process. Dettmers and Zettlemoyer [11] propose a method to identify winning tickets without the need for costly retraining. They use the momentum of the gradients to determine how quickly weights change during training and prune those that do not change substantially.
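The idea of scoring weights by gradient momentum can be illustrated with a minimal NumPy sketch. It only follows the description in the paragraph above and is not the published algorithm: the function names, the smoothing factor `beta`, and the `keep_fraction` parameter are all assumptions made for illustration.

```python
import numpy as np

def update_momentum(momentum, grad, beta=0.9):
    """Exponentially smoothed gradient (hypothetical helper, beta is an assumed default)."""
    return beta * momentum + (1.0 - beta) * grad

def momentum_prune_mask(momentum, keep_fraction):
    """Keep the weights whose smoothed gradient is largest in magnitude.

    Weights with a small momentum magnitude barely change during training,
    so this sketch treats them as pruning candidates.
    Returns a 0/1 mask with the same shape as `momentum`.
    """
    n_keep = max(1, int(round(keep_fraction * momentum.size)))
    scores = np.abs(momentum).ravel()
    keep = np.argsort(scores)[-n_keep:]  # indices of the largest scores
    mask = np.zeros(momentum.size)
    mask[keep] = 1.0
    return mask.reshape(momentum.shape)
```

In a training loop, `update_momentum` would be called once per step with the current gradient, and the resulting mask would be multiplied into the weights after each update.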

This phenomenon offers a novel interpretation of overparametrization: it behaves like having many more draws from the lottery. To benefit from the existence of winning tickets, one needs to find ways to identify them early, ideally without training the full model at all. Some approaches already tackle this, while others focus on training methods that make neural networks more amenable to later pruning. If we could recognize winning tickets early or transfer them to other domains, we would save substantial amounts of training effort. Winning tickets sometimes even outperform the original networks, which might have implications for our understanding of architectures and their initializations, and for how we design them. We can confirm that iterative magnitude pruning succeeds in finding winning tickets that correspond to human intuition for a very simple toy task.
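Iterative magnitude pruning alternates between training, removing the smallest-magnitude surviving weights, and resetting the remaining weights to their initial values. The sketch below shows that loop on a linear model trained by gradient descent. The model, the helper names, and the hyperparameters (`lr`, `steps`, `rounds`, `fraction`) are assumptions chosen to keep the example small; it is not the toy task referred to above.

```python
import numpy as np

def prune_smallest(weights, mask, fraction):
    """Remove the smallest-magnitude `fraction` of the weights still alive in `mask`."""
    alive = np.flatnonzero(mask)
    n_prune = int(fraction * alive.size)
    if n_prune == 0:
        return mask.copy()
    order = np.argsort(np.abs(weights.ravel()[alive]))
    new_mask = mask.copy().ravel()
    new_mask[alive[order[:n_prune]]] = 0
    return new_mask.reshape(mask.shape)

def train(w_init, mask, X, y, lr=0.1, steps=200):
    """Gradient descent on mean squared error; pruned weights stay at zero."""
    w = w_init * mask
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w = (w - lr * grad) * mask
    return w

def iterative_magnitude_pruning(w_init, X, y, rounds=2, fraction=0.5):
    """Train, prune, rewind to `w_init`, and repeat for `rounds` rounds."""
    mask = np.ones_like(w_init)
    for _ in range(rounds):
        w_trained = train(w_init, mask, X, y)
        mask = prune_smallest(w_trained, mask, fraction)
    # Final training run of the sparse subnetwork from the original initialization.
    return mask, train(w_init, mask, X, y)
```

If the targets depend on only a few input features, one would expect the surviving mask to concentrate on exactly those features, which is the kind of outcome that matches human intuition on a toy problem.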

