This paper investigates how well instruction-tuned LLMs perform morphosyntactic tagging and labeled dependency parsing for Standard Arabic. The authors compare zero-shot prompting with retrieval-based in-context learning (ICL), which draws its demonstrations from Arabic treebanks, to assess the impact of prompt design and demonstration selection. Proprietary models approach supervised baselines on feature-level tagging and are competitive with specialized dependency parsers, particularly with retrieval-based ICL. The analysis also identifies remaining challenges in tokenization and in specific aspects of Arabic morphosyntax.
Proprietary instruction-tuned LLMs can approach supervised baselines on Arabic morphosyntactic tagging and compete with specialized dependency parsers, but results depend heavily on prompt design and retrieval-based in-context learning.
Large language models (LLMs) perform strongly on many NLP tasks, but their ability to produce explicit linguistic structure remains unclear. We evaluate instruction-tuned LLMs on two structured prediction tasks for Standard Arabic: morphosyntactic tagging and labeled dependency parsing. Arabic provides a challenging testbed due to its rich morphology and orthographic ambiguity, which create strong morphology-syntax interactions. We compare zero-shot prompting with retrieval-based in-context learning (ICL) using examples from Arabic treebanks. Results show that prompt design and demonstration selection strongly affect performance: proprietary models approach supervised baselines for feature-level tagging and become competitive with specialized dependency parsers. In raw-text settings, tokenization remains challenging, though retrieval-based ICL improves both parsing and tokenization. Our analysis highlights which aspects of Arabic morphosyntax and syntax LLMs capture reliably and which remain difficult.
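To make the contrast between zero-shot prompting and retrieval-based ICL concrete, the following is a minimal sketch of demonstration selection. The abstract does not say which retriever, similarity measure, or prompt template the authors use, so everything here is an assumption for illustration: similarity is plain token-set Jaccard overlap, and the names `jaccard`, `retrieve`, `build_prompt`, and the `Sentence:` / `Analysis:` template are hypothetical. The tokens and annotations are placeholders, not treebank data.

```python
from typing import List, Tuple

# A demonstration pairs a tokenized sentence with its gold annotation string.
# (Hypothetical representation; the paper's actual format is not given.)
Demo = Tuple[List[str], str]


def jaccard(a: List[str], b: List[str]) -> float:
    """Token-set overlap between two sentences (0.0 when both are empty)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def retrieve(query: List[str], pool: List[Demo], k: int) -> List[Demo]:
    """Return the k pool entries most similar to the query, best first.

    sorted() is stable, so entries with equal scores keep their pool order.
    """
    ranked = sorted(pool, key=lambda d: jaccard(query, d[0]), reverse=True)
    return ranked[:k]


def build_prompt(query: List[str], demos: List[Demo], instruction: str) -> str:
    """Assemble the instruction, the demonstrations, and the open query."""
    parts = [instruction]
    for tokens, annotation in demos:
        parts.append(f"Sentence: {' '.join(tokens)}\nAnalysis: {annotation}")
    parts.append(f"Sentence: {' '.join(query)}\nAnalysis:")
    return "\n\n".join(parts)


# Placeholder pool standing in for annotated treebank sentences.
pool: List[Demo] = [
    (["a", "b", "c"], "A1"),
    (["x", "y"], "A2"),
    (["a", "b", "d"], "A3"),
]
query = ["a", "b", "c", "e"]

zero_shot_prompt = build_prompt(query, [], "Tag each token.")
icl_prompt = build_prompt(query, retrieve(query, pool, k=2), "Tag each token.")
```

With an empty demonstration list the same function yields the zero-shot prompt, so the two settings differ only in what `retrieve` supplies. A real system would likely replace the Jaccard score with a stronger retriever, but the selection logic has the same shape.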