Reference notes

1. Abstract

General-purpose seq2seq models are getting really powerful

  • Capture world knowledge in parameters

  • Strong results on loads of tasks

  • Applicable to almost everything

But

  • Hallucinate

  • Struggle to access and apply knowledge

  • Difficult to update

Retrieval is great

Externally-retrieved knowledge/text is useful for a huge variety of NLP tasks

  • Precise and accurate knowledge access mechanism

  • Trivial to update at test time

  • Dense retrieval is starting to outperform traditional IR (see the sketch below)
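
For concreteness, a minimal sketch of what dense retrieval looks like with DPR-style bi-encoders, using the DPR checkpoints available in Hugging Face transformers; the toy passages and question are purely illustrative assumptions, not part of the original setup.

```python
# Dense retrieval sketch: encode the query and passages separately,
# then rank passages by inner product of their embeddings.
# Assumes `torch` and `transformers` are installed.
import torch
from transformers import (
    DPRContextEncoder, DPRContextEncoderTokenizer,
    DPRQuestionEncoder, DPRQuestionEncoderTokenizer,
)

q_tok = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_enc = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
p_tok = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
p_enc = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")

# Toy knowledge base and query (illustrative only).
passages = [
    "The Eiffel Tower is located in Paris, France.",
    "BART is a denoising sequence-to-sequence pretraining method.",
    "Mount Everest is the highest mountain above sea level.",
]
question = "Where is the Eiffel Tower?"

with torch.no_grad():
    q_emb = q_enc(**q_tok(question, return_tensors="pt")).pooler_output                 # (1, 768)
    p_emb = p_enc(**p_tok(passages, return_tensors="pt", padding=True)).pooler_output   # (3, 768)

scores = (q_emb @ p_emb.T).squeeze(0)        # inner-product relevance scores
print(passages[scores.argmax().item()])      # best-matching passage
```

The same inner-product (MIPS) score is what a FAISS index computes at scale over millions of pre-encoded passages.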

But retrieval often has limited applicability, because you usually:

  • Need retrieval supervision

  • Or fall back to "heuristics"-based retrieval (e.g. TF-IDF)

  • Need some (usually task-specific) way to integrate retrieval into downstream models

How can we combine the strengths of explicit knowledge retrieval and seq2seq models?

  • 🌟 REALM (Guu et al. 2020)

  • 🌟 LAMA (Petroni et al. 2019, Petroni et al. 2020), "Closed-Book QA" (Roberts et al. 2020)

  • Memory networks (Weston et al. 2015, Sukhbaatar et al. 2015 +++)

  • Knowledge-grounded Dialogue models (Dinan et al. 2019, Weston et al. 2018 +++)

Retrieval-augmented Generation (RAG)

  • Jointly learn to retrieve and generate end2end

  • Latent retrieval: no labels needed for retrieved docs (see the marginalization formula after this list)

  • General recipe for any seq2seq task
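
Concretely, "latent retrieval" treats the retrieved document $z$ as a latent variable: the retriever proposes the top-$k$ documents and the output distribution marginalizes over them. Sketched here following the RAG-Sequence formulation of Lewis et al. 2020, with the retriever and generator notation from the list below:

$$
P(y \mid x) \;\approx\; \sum_{z \,\in\, \text{top-}k\,P(\cdot \mid x)} P(z \mid x)\, P(y \mid x, z),
\qquad
P(y \mid x, z) \;=\; \prod_{i} P(y_i \mid x, z, y_{1:i-1})
$$

Because the document probabilities enter the loss only through this marginal, gradients reach the retriever without any gold-document labels.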

Needs 3 things (tied together in the code sketch after this list):

  • A (pretrained) generator model $P(y \mid \ldots)$, e.g. BART, GPT2, T5

  • A (pretrained) retriever model $P(z \mid x)$, e.g. DPR, ICT

  • An indexed KB of text documents $Z$, e.g. Wikipedia, CommonCrawl, tweets, ++
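
A hedged usage sketch tying the three pieces together with the RAG classes in Hugging Face transformers; the checkpoint name, the dummy index, and the example question are assumptions for illustration, and exact API details can vary across transformers versions.

```python
# A pretrained RAG model = DPR-style retriever + BART generator + an indexed KB
# of Wikipedia passages. Assumes `transformers`, `datasets`, `faiss-cpu`, and
# `torch` are installed.
from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
# use_dummy_dataset=True loads a tiny toy index instead of the full Wikipedia dump
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True
)
model = RagSequenceForGeneration.from_pretrained("facebook/rag-sequence-nq", retriever=retriever)

inputs = tokenizer("who wrote the novel 1984", return_tensors="pt")
# generate() retrieves documents for the query, then generates while
# marginalizing over the retrieved docs
generated = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```

Swapping RagSequenceForGeneration for RagTokenForGeneration gives the RAG-Token variant, which marginalizes over documents per generated token rather than per sequence.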

RAG models combine parametric and non-parametric memory and work well for knowledge-intensive tasks

1.2. Architecture
