Attention Is All You Need
Title: Attention Is All You Need
Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin
Published: Jun 12, 2017
Link: https://arxiv.org/abs/1706.03762
Summary (Generated by Microsoft Copilot):
Introduction:
- The paper introduces the Transformer, a novel neural network architecture based solely on attention mechanisms, eliminating the need for recurrence and convolutions.
Challenges:
- Traditional sequence transduction models rely on recurrent or convolutional neural networks; their inherently sequential computation limits parallelization within training examples, making training slow.
Methods:
- The Transformer uses multi-head self-attention and point-wise, fully connected layers for both the encoder and decoder.
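The core operation inside these attention layers is the paper's scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ / √d_k)V. Below is a minimal, single-head NumPy sketch of that equation; the array shapes and the choice of NumPy are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of scaled dot-product attention:
#   Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
# Single-head, unbatched, no masking -- shapes are illustrative.
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q: (seq_len_q, d_k), K: (seq_len_k, d_k), V: (seq_len_k, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of each query to each key
    weights = softmax(scores, axis=-1)   # attention distribution over keys
    return weights @ V                   # weighted sum of value vectors

# Self-attention: queries, keys, and values all come from the same sequence.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 64))             # 5 positions, width 64 (arbitrary)
out = scaled_dot_product_attention(X, X, X)
print(out.shape)                         # (5, 64)
```

In the full model this operation runs h = 8 times in parallel over learned linear projections of the inputs (multi-head attention), and its output feeds the position-wise feed-forward layer.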
Novelties:
- By dispensing with recurrence and convolutions entirely, the model is significantly more parallelizable, which reduces training time while achieving superior performance on machine translation tasks.
Results:
- Achieved 28.4 BLEU on the WMT 2014 English-to-German task and 41.8 BLEU on the English-to-French task.
Performances:
- Outperforms previous state-of-the-art models, including ensembles, with a fraction of the training cost.
Limitations:
- The paper does not discuss specific limitations, focusing instead on the advantages of the Transformer model.
Discussion:
- The Transformer generalizes well to other tasks, such as English constituency parsing, and shows promise for future applications in various domains.