CALCS

Organizers: Genta Indra Winata, Sudipta Kar, Marina Zhukova, Thamar Solorio, Mona Diab, Sunayana Sitaram, Monojit Choudhury, Kalika Bali

Bilingual and multilingual speakers often mix languages when they communicate with other multilingual speakers, a phenomenon usually known as code-switching (CS). CS can occur at various linguistic levels, including the inter-sentential, intra-sentential, and even morphological levels. In practice, it presents long-standing challenges for language technologies such as machine translation, ASR, language generation, information retrieval and extraction, and semantic processing. Models trained for one language can quickly break down when input from another language is mixed in. Even multilingual pre-trained language models (LMs), a recent breakthrough, can yield subpar performance on CS data. Given how ubiquitous CS is in informal text such as newsgroups, tweet threads, and other forms of social media communication, and given the number of multilingual speakers worldwide who use these platforms, addressing the challenge of processing CS data continues to be of great practical value. This workshop aims to bring together researchers interested in technology for mixed-language data, in either spoken or written form, and to increase community awareness of the different efforts developed to date in this space.

Workshop Papers

EMNLP 2023


© 2023 Association for Computational Linguistics