Document Type
Article
Publication Date
7-2022
Abstract
Image captioning using deep neural networks has recently gained increasing attention, mostly for English, with only a few studies in other languages. A good image captioning model is required to automatically generate sensible, syntactically and semantically correct captions, which in turn requires strong models for both computer vision and natural language processing. The task is more challenging in cases of data scarcity and for languages with complex morphological structures, such as Arabic. This is why only a limited number of studies have been published on Arabic image captioning, compared to those for English. In this paper, an efficient deep learning model for Arabic image captioning is proposed. In addition, the effect of different text pre-processing methods on the obtained BLEU-N scores and the quality of the generated captions, as well as the behavior of the attention mechanism, is investigated. Furthermore, the “THUMB” framework for assessing the quality of generated captions is used, for the first time, to evaluate Arabic captions. As the results show, a BLEU-4 score of 27.12 was achieved, the highest reported so far for Arabic image captioning. In addition, the best THUMB scores were obtained, compared to previously published results on common images.
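The BLEU-N metric reported in the abstract combines modified (clipped) n-gram precisions with a brevity penalty. A minimal single-sentence sketch of that computation is shown below; the tokenized example sentences and the add-one smoothing are illustrative assumptions, not the paper's exact evaluation setup (published results typically use corpus-level BLEU over multiple references).

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(reference, candidate, max_n=4):
    """Sentence-level BLEU-N: geometric mean of clipped n-gram
    precisions (with add-one smoothing) times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clip each candidate n-gram count by its count in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append((overlap + 1) / (total + 1))  # smoothed precision
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: punish candidates shorter than the reference
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * geo_mean

# Hypothetical example pair (tokens would be Arabic in the paper's setting)
reference = "a man riding a horse on the beach".split()
candidate = "a man rides a horse".split()
print(f"BLEU-4: {bleu(reference, candidate):.4f}")
```

A perfect match scores 1.0 (i.e., 100 on the 0–100 scale the abstract uses), and any missing or reordered n-grams lower the score, which is why pre-processing choices such as normalization and tokenization directly affect the reported BLEU-N values.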
Recommended Citation
Barakat, Nahla and Taha, Moaz, "Arabic Image Captioning: The Effect of Text Pre-processing on the Attention Weights and the BLEU-N Scores" (2022). Artificial Intelligence. 16.
https://buescholar.bue.edu.eg/artificial_intelligence/16