About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Yichi Zhang
- sl:arxiv_num : 2302.04907
- sl:arxiv_published : 2023-02-09T19:27:34Z
- sl:arxiv_summary : The rapid scaling of language models is motivating research using
low-bitwidth quantization. In this work, we propose a novel binarization
technique for Transformers applied to machine translation (BMT), the first of
its kind. We identify and address the problem of inflated dot-product variance
when using one-bit weights and activations. Specifically, BMT leverages
additional LayerNorms and residual connections to improve binarization quality.
Experiments on the WMT dataset show that a one-bit weight-only Transformer can
achieve the same quality as a float one, while being 16x smaller in size.
One-bit activations incur varying degrees of quality drop, but this is mitigated by the
proposed architectural changes. We further conduct a scaling law study using
production-scale translation datasets, which shows that one-bit weight
Transformers scale and generalize well in both in-domain and out-of-domain
settings. Implementation in JAX/Flax will be open sourced.@en
- sl:arxiv_title : Binarized Neural Machine Translation@en
- sl:arxiv_updated : 2023-02-09T19:27:34Z
- sl:bookmarkOf : https://arxiv.org/abs/2302.04907
- sl:creationDate : 2023-02-13
- sl:creationTime : 2023-02-13T14:51:45Z
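
The abstract's core idea is that one-bit weights inflate the variance of dot products, and that extra LayerNorms after the binarized matmuls restore well-behaved activations. Below is a minimal, hypothetical Flax sketch of that idea, not the authors' released code: it binarizes a dense layer's weights to {-1, +1} with a straight-through estimator (an assumed training trick, not confirmed by the abstract) and normalizes the output with an additional LayerNorm. The module name `BinaryDense` and all hyperparameters are invented for illustration.

```python
# Hypothetical sketch of a binarized dense layer with an extra LayerNorm,
# loosely following the abstract of arXiv:2302.04907. Not the BMT implementation.
import jax
import jax.numpy as jnp
import flax.linen as nn


def binarize_ste(w):
    """Sign-binarize w to {-1, +1}; the straight-through estimator
    passes gradients through unchanged (assumed training scheme)."""
    return w + jax.lax.stop_gradient(jnp.sign(w) - w)


class BinaryDense(nn.Module):
    features: int

    @nn.compact
    def __call__(self, x):
        w = self.param("kernel", nn.initializers.lecun_normal(),
                       (x.shape[-1], self.features))
        y = x @ binarize_ste(w)      # one-bit weights in the dot product
        return nn.LayerNorm()(y)     # extra LayerNorm to counter inflated variance


if __name__ == "__main__":
    x = jnp.ones((2, 8))
    model = BinaryDense(features=16)
    params = model.init(jax.random.PRNGKey(0), x)
    print(model.apply(params, x).shape)  # (2, 16)
```

In this sketch the LayerNorm placement is only one plausible reading of "additional LayerNorms and residual connections"; the paper's exact placement within the Transformer blocks is not specified in the abstract.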