Constitutional AI

March 13, 2024 · 1 minute read · NLP

Key Ideas

Sufficiently large models can be used to critique and revise unaligned responses to problematic queries, and the revised responses can then be used to finetune the original model.

Notes

Critiques and Revisions

Steps
1. Pose a “problematic” query to the model
2. Ask the model to critique its own (or another model’s) response
  - e.g. “Critique Request: Identify specific ways the assistant’s last response is harmful, … \n Critique:”
3. Ask the model to revise the response based on the critique it provided
  - e.g. “Revision Request: Please rewrite the assistant response to remove any and all harmful, … content \n Revision:”
4. Finetune the original (unaligned) model on the problematic query paired with the revised response
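The steps above can be sketched in Python as a single critique-and-revision pass. This is a minimal illustration under stated assumptions, not the paper's implementation: `generate` is a hypothetical function that sends a prompt to some language model and returns its completion, the `Human:`/`Assistant:` transcript format is assumed, and the same model plays every role (a separate critic model could be passed in instead). The "..." in the prompt templates stands for the wording elided in the notes and is deliberately left unfilled.

```python
from typing import Callable

# Prompt templates taken from the examples above; "..." marks wording
# that the notes elide and that is intentionally not filled in here.
CRITIQUE_REQUEST = (
    "Critique Request: Identify specific ways the assistant's last response "
    "is harmful, ...\nCritique:"
)
REVISION_REQUEST = (
    "Revision Request: Please rewrite the assistant response to remove any "
    "and all harmful, ... content\nRevision:"
)


def critique_and_revise(query: str, generate: Callable[[str], str]) -> tuple[str, str]:
    """Run one critique/revision pass and return a (query, revised response) pair.

    `generate` is a hypothetical stand-in for a text-completion call: it
    receives the whole conversation so far and returns the model's continuation.
    """
    # Step 1: pose the problematic query and collect the initial response.
    transcript = f"Human: {query}\n\nAssistant:"
    response = generate(transcript)
    transcript += f" {response}\n\n"

    # Step 2: ask the model to critique the response it just gave.
    transcript += CRITIQUE_REQUEST
    critique = generate(transcript)
    transcript += f" {critique}\n\n"

    # Step 3: ask the model to revise the response using its own critique,
    # which is still in the transcript.
    transcript += REVISION_REQUEST
    revision = generate(transcript)

    # Step 4 happens elsewhere: this pair becomes one finetuning example.
    return query, revision.strip()
```

Keeping the whole exchange in one growing transcript is what lets the revision request see both the original response and the critique.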

Misc

There are dedicated datasets (example) of “problematic” prompts that AI researchers and LLM developers can use to apply the process above.
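Applying the process to such a dataset mostly amounts to pairing each prompt with its revised response and saving the pairs in a form a finetuning job can read. The sketch below assumes a hypothetical `revise` callable that performs the critique and revision steps for one prompt, and writes JSON Lines using assumed `prompt`/`completion` field names; the real schema depends on the finetuning tooling in use.

```python
import json
from typing import Callable, Iterable


def build_finetuning_file(
    prompts: Iterable[str],
    revise: Callable[[str], str],
    path: str,
) -> int:
    """Write one JSON object per line pairing each prompt with its revised response.

    `revise` is a hypothetical callable that runs the critique/revision steps
    for a single prompt and returns the final revised response. The
    "prompt"/"completion" keys are an assumed format, not a fixed standard.
    Returns the number of records written.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for prompt in prompts:
            prompt = prompt.strip()
            if not prompt:  # skip blank entries in the source dataset
                continue
            record = {"prompt": prompt, "completion": revise(prompt)}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Passing `revise` in as a callable keeps the file-building step independent of which model (or how many critique rounds) produces the revisions.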

Resources

Constitutional AI
