Registration Form: Bahrain Corpus

In the Bahrain Corpus, we aimed to create a specialized corpus of the Bahraini Arabic dialect that includes written texts as well as transcripts of audio files, drawn from different genres (folktales, comedy shows, plays, cooking shows, etc.).

At the time of publication, the carefully curated corpus comprises 620K words. We also enrich the Bahrain Corpus text with automatic morphological annotations using state-of-the-art morphosyntactic disambiguation for Gulf Arabic, and we validate the quality of the annotations on a 7.6K-word sample. We make the full corpus and the annotated sample publicly available to support researchers interested in Arabic NLP.

More details on this project can be found in Abdulrahim et al. (2022):

Abdulrahim, Dana, Go Inoue, Latifa Shamsan, Salam Khalifa and Nizar Habash. 2022. The Bahrain Corpus: A Multi-genre Corpus of Bahraini Arabic. In Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022), Marseille, France. European Language Resources Association (ELRA).

Email *

First Name *

Last Name *

Affiliation *

Website (optional)

What do you plan to use this resource for? *

License - please read the following license:

//////////////////////////////////////////////////////////////////////////////
// License for The Bahrain Corpus
//////////////////////////////////////////////////////////////////////////////

This work is licensed under a Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International License.
(https://creativecommons.org/licenses/by-nc-sa/4.0/)

Created by Dana Abdulrahim, Go Inoue, Latifa Shamsan, Salam Khalifa,
and Nizar Habash, at University of Bahrain and the Computational Approaches to
Modeling Language (CAMeL) Lab in New York University Abu Dhabi.

Please cite Abdulrahim et al. (2022) if you use The Bahrain Corpus in your
research:

Dana Abdulrahim, Go Inoue, Latifa Shamsan, Salam Khalifa, and Nizar Habash.
2022. The Bahrain Corpus: A Multi-genre Corpus of Bahraini Arabic.
In Proceedings of the Thirteenth International Conference on Language Resources
and Evaluation (LREC 2022), pages 2345-2352, Marseille, France. European
Language Resources Association (ELRA).

//////////////////////////////////////////////////////////////////////////////

By clicking "Yes" you agree to the terms of this license. *

Yes

Citing Guide

If you use this resource, cite:

Abdulrahim, Dana, Go Inoue, Latifa Shamsan, Salam Khalifa and Nizar Habash. 2022. The Bahrain Corpus: A Multi-genre Corpus of Bahraini Arabic. In Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022), Marseille, France. European Language Resources Association (ELRA).

By clicking "Yes" you agree to use this citing guide. *

Yes

Publications

Abdulrahim, Dana, Go Inoue, Latifa Shamsan, Salam Khalifa and Nizar Habash. 2022. The Bahrain Corpus: A Multi-genre Corpus of Bahraini Arabic. In Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022), Marseille, France. European Language Resources Association (ELRA).
PDF: http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.251.pdf
