Registration Form: Bahrain Corpus

In the Bahrain Corpus, we aimed to create a specialized corpus of the Bahraini Arabic dialect that includes written texts as well as transcripts of audio files, belonging to different genres (folktales, comedy shows, plays, cooking shows, etc.).

At the time of this publication, the corpus comprises 620K words, carefully curated. We also enrich the Bahrain Corpus text with automatic morphological annotations using state-of-the-art morphosyntactic disambiguation for Gulf Arabic. We validate the quality of the annotations on a 7.6K word sample. We make the full corpus as well as the annotated sample publicly available to support researchers interested in Arabic NLP.

More details on this project can be found in Abdulrahim et al. (2022):

Abdulrahim, Dana, Go Inoue, Latifa Shamsan, Salam Khalifa and Nizar Habash. 2022. The Bahrain Corpus: A Multi-genre Corpus of Bahraini Arabic. In Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022), Marseille, France. European Language Resources Association (ELRA).
Email *
First Name *
Last Name *
Affiliation *
Website (optional)
What do you plan to use this resource for? *
License - please read the following license:
//////////////////////////////////////////////////////////////////////////////
// License for The Bahrain Corpus
//////////////////////////////////////////////////////////////////////////////

This work is licensed under a Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International License.
(https://creativecommons.org/licenses/by-nc-sa/4.0/)

Created by Dana Abdulrahim, Go Inoue, Latifa Shamsan, Salam Khalifa,
and Nizar Habash, at University of Bahrain and the Computational Approaches to
Modeling Language (CAMeL) Lab in New York University Abu Dhabi.

Please cite Abdulrahim et al. (2022) if you use The Bahrain Corpus in your
research:

Dana Abdulrahim, Go Inoue, Latifa Shamsan, Salam Khalifa, and Nizar Habash.
2022. The Bahrain Corpus: A Multi-genre Corpus of Bahraini Arabic.
In Proceedings of the Thirteenth International Conference on Language Resources
and Evaluation (LREC 2022), pages 2345-2352, Marseille, France. European
Language Resources Association (ELRA).

//////////////////////////////////////////////////////////////////////////////
By clicking "Yes" you agree to the terms of this license. *
Citing Guide
If you use this resource, cite:

Abdulrahim, Dana, Go Inoue, Latifa Shamsan, Salam Khalifa and Nizar Habash. 2022. The Bahrain Corpus: A Multi-genre Corpus of Bahraini Arabic. In Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022), Marseille, France. European Language Resources Association (ELRA).
By clicking "Yes" you agree to use this citing guide. *
Publications
Abdulrahim, Dana, Go Inoue, Latifa Shamsan, Salam Khalifa and Nizar Habash. 2022. The Bahrain Corpus: A Multi-genre Corpus of Bahraini Arabic. In Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022), Marseille, France. European Language Resources Association (ELRA).
PDF: http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.251.pdf
This form was created inside of New York University.