Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness?

Poster_Demo_Industry Hybrid 5: Interpretability, Interactivity, and Analysis of Models for NLP (Poster)

Conference Room: East Foyer (Virtual)

Conference Time: December 09, 11:00-12:30 (+08) (Asia/Singapore)

Global Time: December 09, Poster_Demo_Industry Hybrid 5 (03:00-04:30 UTC)

Abstract: