Eric Jang on Twitter: "why transformers need certain optimization tricks that aren't needed by other architectures"