AILC, the Italian Association for Computational Linguistics, is launching a collaborative effort to develop a dynamic and growing benchmark for evaluating LLMs’ capabilities in Italian.
In the long term, we aim to establish a suite of tasks in the form of a benchmark which can be accessed through a shared platform and a live leaderboard. This would allow for ongoing evaluation of any existing and newly developed Italian or multilingual LLMs.
In the short term, we are looking to start building this benchmark through a series of challenges collaboratively designed by the research community. Concretely, this happens through the present call for challenge contributions. In a style similar to standard Natural Language Processing shared tasks, participants are asked to contribute a task and the corresponding dataset with which a set of LLMs should be challenged.
Please fill in all the fields and provide as much information as possible, so that the organisers can assess the nature of the challenge you propose.