[1907.05242] Large Memory Layers with Product Keys (2019)(About) > a structured memory which can be easily integrated into a neural network. The memory is very large by design and therefore significantly increases the capacity of the architecture, by up to a billion parameters with a negligible computational overhead. Its design and access pattern is based on product keys, which enable fast and exact nearest neighbor search. The ability to increase the number of parameters while keeping the same computational budget lets the overall system strike a better trade-off between prediction accuracy and computation efficiency both at training and test time.
> a key-value memory layer that can increase model capacity for a negligible computational cost. A 12-layer transformer with a memory outperforms a 24-layer transformer, and is 2x faster!