The authors introduce Structure of Thought (SoT), a prompting technique that guides LLMs to construct intermediate text structures, and T2S-Bench, a benchmark for evaluating text-to-structure capabilities across 6 scientific domains and 32 structural types. SoT consistently boosts performance across eight tasks and three model families, while T2S-Bench reveals substantial room for improvement in multi-hop reasoning and end-to-end extraction. Fine-tuning Qwen2.5-7B-Instruct on T2S-Bench, combined with SoT, yields an average +8.6% improvement across diverse text-processing tasks.
LLMs struggle to extract and reason over complex text structures in scientific domains, but explicitly prompting them to build these structures first yields significant performance gains.
Think about how humans handle complex reading tasks: they mark key points, infer how those points relate, and structure the information to guide their understanding and responses. Can a large language model likewise benefit from text structure to improve its text-processing performance? To explore this question, we first introduce Structure of Thought (SoT), a prompting technique that explicitly guides models to construct intermediate text structures and consistently boosts performance across eight tasks and three model families. Building on this insight, we present T2S-Bench, the first benchmark designed to evaluate and improve the text-to-structure capabilities of models. T2S-Bench includes 1.8K samples across 6 scientific domains and 32 structural types, rigorously constructed to ensure accuracy, fairness, and quality. Evaluation of 45 mainstream models reveals substantial room for improvement: the average accuracy on the multi-hop reasoning task is only 52.1%, and even the most advanced model reaches only 58.1% node accuracy in end-to-end extraction. Furthermore, on Qwen2.5-7B-Instruct, SoT alone yields an average +5.7% improvement across eight diverse text-processing tasks, and fine-tuning on T2S-Bench raises this gain to +8.6%. These results highlight the value of explicit text structuring and the complementary contributions of SoT and T2S-Bench. The dataset and evaluation code are available at https://t2s-bench.github.io/T2S-Bench-Page/.
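The core idea of SoT is to have the model first write out an intermediate structure and only then answer. The sketch below shows one way this idea could be expressed in code. It is a hypothetical illustration and not the authors' prompt or released code. The instruction wording, the function names `build_sot_prompt`, `build_baseline_prompt`, and `parse_edges`, and the `source -> target` edge format are all assumptions made for this example.

```python
# Hypothetical SoT-style prompt builder. The wording and edge format are
# illustrative assumptions, not the prompt used in the paper.
SOT_INSTRUCTION = (
    "Before answering, first extract the key entities or steps from the text as nodes, "
    "then list the relations between them as edges in the form 'source -> target', "
    "and finally answer the question using that structure."
)


def build_baseline_prompt(text: str, question: str) -> str:
    """Direct prompting: ask for the answer with no intermediate structure."""
    return f"Text:\n{text}\n\nQuestion: {question}\n\nAnswer:"


def build_sot_prompt(text: str, question: str) -> str:
    """SoT-style prompting: ask for a node/edge structure before the answer."""
    return (
        f"{SOT_INSTRUCTION}\n\n"
        f"Text:\n{text}\n\n"
        f"Question: {question}\n\n"
        "Structure (nodes and edges):\n"
    )


def parse_edges(structure: str) -> list[tuple[str, str]]:
    """Collect (source, target) pairs from lines of the form 'source -> target'.

    Lines without an arrow are ignored, so free-form notes the model writes
    alongside the structure do not break parsing.
    """
    edges = []
    for line in structure.splitlines():
        if "->" in line:
            src, dst = line.split("->", 1)
            edges.append((src.strip(), dst.strip()))
    return edges


if __name__ == "__main__":
    prompt = build_sot_prompt(
        "Heating water to 100 C at sea level makes it boil.",
        "What causes the water to boil?",
    )
    print(prompt)
    print(parse_edges("heating -> boiling\nnote: sea level"))
```

Under this design, the prompt ends with the structure header, so the model's first tokens go to building nodes and edges rather than to the answer. A parser such as `parse_edges` makes that intermediate structure inspectable, for example when comparing it against a reference structure.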