
Learning to summarize from human feedback

OpenAI's first attempt at applying LFHF to NLP.

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

The Anthropic team's attempt to use LFHF to address the harmless-assistant problem.

InstructGPT: Training language models to follow instructions with human feedback

OpenAI's further attempt at applying LFHF to NLP, and the predecessor of ChatGPT.

Constitutional AI: Harmlessness from AI Feedback

Proposes AI feedback as an alternative, to address the low efficiency of human feedback.

Scaling Laws for Reward Model Overoptimization

A paper analyzing the details of the reward model.