Methods for Handling Spontaneous Health Arabic Queries using unsupervised machine learning

Document Type

Article

Publication Date

Winter 12-20-2023

Abstract

The goal of this work is to demonstrate that combining sublanguage analysis with linguistic processing techniques is both essential and feasible for building robust natural language (NL) based systems. Merging accurate language processing with analysis of the sublanguage will improve both the correctness and the resilience of the processing. As a proof of concept, we built an experimental system (HASE) to test this hypothesis. HASE is a search system for Arabic documents in the health and medical domain. To study the sublanguage, we employed machine learning techniques. The initial corpus consists of 40 thousand unedited queries. HASE is built on top of SOLR and integrates an Arabic linguistic processing component. Responses are generated using an information retrieval (IR) approach. Altibby is actively deploying HASE in Jordan (the largest health content). The IR component achieves an F-measure of 90% when tested on real, noisy free text.
