Gyan is a novel language model that departs from the transformer architecture to address limitations in compositional understanding, interpretability, and computational cost. By decoupling language modeling from knowledge acquisition and representation, it achieves state-of-the-art performance on three public datasets and outperforms existing models on two proprietary datasets. The architecture draws on rhetorical structure theory, semantic role theory, and knowledge-based computational linguistics to capture compositional context and to mimic human world modeling.
Forget opaque transformers: Gyan offers SOTA language modeling with full interpretability, lower compute, and human-like compositional understanding.
Transformer-based pre-trained large language models have become ubiquitous. There is increasing evidence that, even with large-scale pre-training, these models do not capture the complete compositional context, and certainly not the full human-analogous context. Moreover, by the very nature of the architecture, these models hallucinate, are difficult to maintain, are not easily interpretable, and require enormous compute resources for training and inference. Here, we describe Gyan, an explainable language model based on a novel non-transformer architecture without any of these limitations. Gyan achieves SOTA performance on three widely cited datasets and superior performance on two proprietary datasets. The novel architecture decouples the language model from knowledge acquisition and representation. The model draws on rhetorical structure theory, semantic role theory, and knowledge-based computational linguistics. Gyan's meaning representation structure captures the complete compositional context and attempts to mimic humans by expanding the context to a 'world model'. AI model adoption depends critically on trust and transparency, especially in mission-critical use cases. Collectively, our results demonstrate that it is possible to create models that are trustworthy and reliable for mission-critical tasks. We believe our work has tremendous potential for guiding the development of transparent and trusted architectures for language models.