Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness?
Kevin Liu, Stephen Casper, Dylan Hadfield-Menell, Jacob Andreas
Main: Interpretability, Interactivity, and Analysis of Models for NLP Main-poster Paper
Poster_Demo_Industry Hybrid 5: Interpretability, Interactivity, and Analysis of Models for NLP (Poster)
Conference Room: East Foyer (Virtual)
Conference Time: December 09, 11:00-12:30 (+08) (Asia/Singapore)
Global Time: December 09, Poster_Demo_Industry Hybrid 5 (03:00-04:30 UTC)
Abstract: