Constitutional AI
March 13, 2024 · 1 minute read · NLP
Key Ideas
- Sufficiently large models can be used to critique and revise unaligned responses to problematic queries; the revised responses can then be used to finetune the original model
Notes
Critiques and Revisions
- Steps
  - Ask a “problematic” query
  - Ask the model to critique its own or another model’s response
    - e.g. “Critique Request: Identify specific ways the assistant’s last response is harmful, … \n Critique:”
  - Ask the model to revise the response based on the critique it provided
    - e.g. “Revision Request: Please rewrite the assistant response to remove any and all harmful, … content \n Revision:”
  - Finetune the unaligned model with the original problematic query and the revised response
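
The steps above can be sketched as a small loop. This is a minimal sketch, not the paper's implementation: `generate` is a hypothetical callable standing in for whatever model API is used (prompt string in, completion string out), the `Human:`/`Assistant:` framing is an assumption, and the two request strings are shortened stand-ins for the elided prompts quoted above.

```python
from typing import Callable, Dict

# Shortened stand-ins for the critique/revision requests (assumption: the
# full wording is elided in the notes above, so these are illustrative only).
CRITIQUE_REQUEST = "Critique Request: Identify specific ways the assistant's last response is harmful."
REVISION_REQUEST = "Revision Request: Please rewrite the assistant response to remove harmful content."


def critique_and_revise(
    query: str,
    generate: Callable[[str], str],  # hypothetical model call: prompt -> completion
    critique_request: str = CRITIQUE_REQUEST,
    revision_request: str = REVISION_REQUEST,
) -> Dict[str, str]:
    """Run one critique-and-revision pass for a single problematic query."""
    # Step 1: get the initial (possibly harmful) response.
    response = generate(f"Human: {query}\n\nAssistant:")
    context = f"Human: {query}\n\nAssistant: {response}"

    # Step 2: ask the model to critique that response.
    critique_prompt = f"{context}\n\n{critique_request}\nCritique:"
    critique = generate(critique_prompt)

    # Step 3: ask the model to revise the response, conditioned on its critique.
    revision_prompt = f"{critique_prompt} {critique}\n\n{revision_request}\nRevision:"
    revision = generate(revision_prompt)

    # Step 4 (finetuning) would train on the (query, revision) pair.
    return {
        "prompt": query,
        "response": response,
        "critique": critique,
        "revision": revision,
    }
```

Only the `prompt` and `revision` fields feed the finetuning step; the intermediate response and critique are kept here for inspection.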
Misc
- There are specific datasets (example) which contain “problematic” prompts that AI researchers and LLM developers can use to apply the same process as above
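
Applying the process to a whole prompt dataset could look roughly like the sketch below. Everything named here is an assumption for illustration: `revise` is a hypothetical callable that maps a prompt to its final revised response, and the `prompt`/`completion` field names are one common JSONL layout for finetuning data, not a format taken from the source.

```python
import json
from typing import Callable, Dict, Iterable, List


def build_finetuning_records(
    prompts: Iterable[str],
    revise: Callable[[str], str],  # hypothetical: prompt -> revised response
) -> List[Dict[str, str]]:
    """Pair each non-empty prompt with its revised response."""
    records = []
    for prompt in prompts:
        prompt = prompt.strip()
        if not prompt:
            continue  # skip blank lines in the prompt dataset
        records.append({"prompt": prompt, "completion": revise(prompt)})
    return records


def to_jsonl(records: List[Dict[str, str]]) -> str:
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```

The resulting string can be written to a file and handed to whichever finetuning tooling is in use.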
Resources