ScheMatiQ automates the creation of structured databases from unstructured text corpora given a natural language research question, using LLMs to generate an extraction schema and populate a grounded database. This avoids the manual schema design and annotation process, which is slow and error-prone. Experiments with domain experts in law and computational biology demonstrate that ScheMatiQ produces outputs suitable for real-world analysis.
Skip the annotation bottleneck: ScheMatiQ lets you turn research questions and text corpora into structured databases with LLMs, guided by a simple web interface.
Many disciplines pose natural-language research questions over large document collections, and answering them typically requires structured evidence. Traditionally, that evidence is obtained by manually designing an annotation schema and exhaustively labeling the corpus, a slow and error-prone process. We introduce ScheMatiQ, which uses calls to a backbone LLM to turn a question and a corpus into a schema and a grounded database, with a web interface that lets users steer and revise the extraction. In collaboration with domain experts, we show that ScheMatiQ yields outputs that support real-world analysis in law and computational biology. We release ScheMatiQ as open source with a public web interface, and invite experts across disciplines to use it with their own data. All resources, including the website, source code, and demonstration video, are available at: www.ScheMatiQ-ai.com
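
The question-to-database flow described above can be illustrated with a minimal sketch. This is not ScheMatiQ's implementation or API: the names `Cell`, `build_database`, `propose_schema`, and `extract_cell` are hypothetical, the two callables stand in for backbone-LLM calls and are replaced here by keyword stubs, and "grounded" is assumed to mean that each extracted value carries a verbatim evidence span from its source document.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Cell:
    value: str
    evidence: str  # span that should appear verbatim in the source document


def build_database(
    question: str,
    corpus: list[str],
    propose_schema: Callable[[str, list[str]], list[str]],
    extract_cell: Callable[[str, str], Optional[Cell]],
):
    """Hypothetical two-stage flow: propose columns, then fill one row per document."""
    schema = propose_schema(question, corpus)
    rows = []
    for doc in corpus:
        row = {}
        for column in schema:
            cell = extract_cell(doc, column)
            # Assumed grounding rule: keep a cell only if its evidence is a
            # non-empty substring of the document it was extracted from.
            grounded = cell is not None and bool(cell.evidence) and cell.evidence in doc
            row[column] = cell if grounded else None
        rows.append(row)
    return schema, rows


# Stubs standing in for LLM calls (assumptions for illustration only).
def stub_schema(question: str, corpus: list[str]) -> list[str]:
    return ["court", "outcome"]


def stub_extract(doc: str, column: str) -> Optional[Cell]:
    keywords = {
        "court": ["district court", "appeals court"],
        "outcome": ["granted", "denied"],
    }
    for keyword in keywords.get(column, []):
        if keyword in doc:
            return Cell(value=keyword, evidence=keyword)
    return None


corpus = [
    "The district court granted the motion.",
    "The appeals court denied the motion.",
]
schema, rows = build_database(
    "How do courts rule on these motions?", corpus, stub_schema, stub_extract
)
for row in rows:
    print({col: (cell.value if cell else None) for col, cell in row.items()})
```

In this sketch, a cell whose evidence cannot be found in its document is stored as `None` rather than kept, which is one simple way to make every populated value traceable to the text.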