This project deals with the analysis and integration of MultiWord Expressions (MWEs) in speech and translation.

Keywords: Natural Language Processing, Multiword Expressions (MWEs), Machine Translation (MT), Automatic Speech Recognition (ASR), Language Technology, Non-Compositionality, Cognitive Lexical Models.

This project aims to investigate techniques, resources and protocols for evaluating models of multiword expression (MWE) processing and integrating them into machine translation and automatic speech recognition technology. MWEs such as nominal compounds (machine learning, weapons of mass destruction) and verb-particle constructions (break down, clear up) are a challenge for current language technology. Because their semantics are often opaque and idiomatic, they frequently require additional knowledge for correct computational interpretation. For instance, failing to recognize that an MWE like kick the bucket needs to be interpreted as a unit (to die) may lead to incorrect translations. The AIM-WEST project addresses the automatic treatment of MWEs, focusing on Portuguese, English and French, and on Portuguese↔English, French↔English and Portuguese↔French translation. The main contribution of the project will be the development of multilingual human-machine interfaces that can take into account complex phenomena such as MWEs.

