# Paper

{% content-ref url="untitled/sequence-to-sequence" %}
[sequence-to-sequence](https://lswkim322.gitbook.io/til/til-ml/untitled/sequence-to-sequence)
{% endcontent-ref %}

> Papers to read....
>
> * Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network <https://arxiv.org/pdf/1808.03314.pdf>
> * Attention Is All You Need (Transformer) <https://arxiv.org/abs/1706.03762>
> * BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (BERT) <https://arxiv.org/abs/1810.04805>
> * Improving Language Understanding by Generative Pre-Training (GPT) <https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf>
> * Language Models are Unsupervised Multitask Learners (GPT-2)
> * ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators <https://arxiv.org/abs/2003.10555>
> * RoBERTa: A Robustly Optimized BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>
> * ALBERT: A Lite BERT for Self-supervised Learning of Language Representations <https://arxiv.org/abs/1909.11942>
> * BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (BART)
> * XLNet: Generalized Autoregressive Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>
> * Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer <https://arxiv.org/abs/1910.10683>
