NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination for each Benchmark

Oscar Sainz, Jon Ander Campos, Iker García-Ferrero, Julen Etxaniz, Oier Lopez de Lacalle, Eneko Agirre

Findings: Theme Track: Large Language Models and the Future of NLP Findings Paper

Poster_Demo_Industry_Findings Virtual 4: Theme Track: Large Language Models and the Future of NLP (Poster)
Conference Room: Virtual-Gathertown
Conference Time: December 09, 09:00-10:30 (+08) (Asia/Singapore)
Global Time: December 09, 01:00-02:30 UTC (Poster_Demo_Industry_Findings Virtual 4)
Abstract: