Reference notes
1. Abstract
General-purpose seq2seq models are getting really powerful
Capture world knowledge in parameters
Strong results on loads of tasks
Applicable to almost everything
But
Hallucinate
Struggle to access and apply knowledge
Difficult to update
Retrieval is great
Externally-retrieved knowledge/text is useful for a huge variety of NLP tasks
Precise and accurate knowledge access mechanism
Trivial to update at test time
Dense retrieval is starting to outperform traditional IR (see the scoring sketch after this list)
But often limited in applicability, because it usually
Needs retrieval supervision
Or relies on heuristics-based retrieval (e.g. TF-IDF)
Needs some (usually task-specific) way to integrate retrieval into downstream models
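A minimal sketch of the dense-retrieval idea above (toy NumPy code, not DPR's actual implementation; the encoder below is a made-up stand-in): queries and passages are embedded into a shared vector space and scored by inner product, so retrieval becomes maximum inner product search (MIPS) over a precomputed index rather than sparse term matching as in TF-IDF.

```python
# Toy bi-encoder retrieval sketch (illustrative only; DPR uses trained BERT
# query/passage encoders and a FAISS index instead of the stand-ins below).
import hashlib
import numpy as np

def toy_encoder(texts, dim=16):
    """Stand-in for a trained encoder: deterministically maps text -> vector."""
    vecs = []
    for t in texts:
        seed = int(hashlib.md5(t.encode()).hexdigest(), 16) % (2**32)
        vecs.append(np.random.default_rng(seed).normal(size=dim))
    return np.stack(vecs)

passages = [
    "Paris is the capital of France.",
    "BART is a pretrained seq2seq model.",
    "DPR retrieves passages using dense embeddings.",
]
passage_index = toy_encoder(passages)       # the precomputed "indexed KB"

query_vec = toy_encoder(["which system uses dense embeddings?"])[0]
scores = passage_index @ query_vec          # inner-product (MIPS) scores
top_k = np.argsort(-scores)[:2]             # top-k passages passed downstream
for i in top_k:
    # With random toy embeddings the ranking is arbitrary; the point is the
    # embed -> index -> inner-product -> top-k mechanics.
    print(f"{scores[i]:.3f}  {passages[i]}")
```

Unlike TF-IDF/BM25 term matching, a trained dense scorer can match paraphrases with little or no word overlap, which is where the gains over traditional IR come from.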
How can we combine the strengths of explicit knowledge retrieval and seq2seq models?
Related work
🌟 REALM (Guu et al., 2020)
🌟 LAMA (Petroni et al., 2019; Petroni et al., 2020)
"Closed-Book QA" (Roberts et al., 2020)
Memory networks (Weston et al. 2015, Sukhbaatar et al. 2015 +++)
Knowledge-grounded Dialogue models (Dinan et al. 2019, Weston et al. 2018 +++)
Retrieval-augmented Generation (RAG)
Jointly learns to retrieve and generate, end-to-end
Latent retrieval: no labels needed for the retrieved documents
General recipe for any seq2seq task
Needs 3 things:
A (pretrained) generator model, e.g. BART, GPT-2, T5
A (pretrained) retriever model, e.g. DPR, ICT
An indexed KB of text documents Z, e.g. Wikipedia, CommonCrawl, tweets, ++
RAG models combine parametric and non-parametric memory and work well on knowledge-intensive tasks (see the usage sketch below)
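In the paper, the retrieved passage z is treated as a latent variable and marginalized over the top-k retrieved documents, roughly p(y|x) ≈ Σ_z p_η(z|x) · p_θ(y|x, z). As a usage sketch only (not from the paper): the Hugging Face transformers library wires the three ingredients together behind RagSequenceForGeneration with the facebook/rag-sequence-nq checkpoint; exact class and argument names may differ across library versions.

```python
# Sketch: running a pretrained RAG model end-to-end with Hugging Face
# transformers (generator + DPR retriever + indexed Wikipedia passages).
from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
# use_dummy_dataset=True avoids downloading the full Wikipedia index for a quick test
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True
)
model = RagSequenceForGeneration.from_pretrained(
    "facebook/rag-sequence-nq", retriever=retriever
)

inputs = tokenizer("who wrote the origin of species?", return_tensors="pt")
generated = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```

Because the document index is external to the model, swapping in a different or newer corpus updates the non-parametric knowledge without retraining the parametric part.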
1.2. Architecture