The paper introduces KIRINLog, a novel encoding technique for English sentence pair classification, designed to reduce input vector size while maintaining performance. KIRINLog combines preprocessing, shared-word removal, padding/truncation, flag-value addition, Word2Vec embeddings, and PCA for dimensionality reduction to create feature vectors. Applied to the task of recognizing textual entailment using an attention-based bidirectional LSTM, the method achieves 92.4% accuracy and an 88.6% F1-score on the Sentences Involving Compositional Knowledge (SICK) dataset.
KIRINLog slashes input vector size for sentence pair classification without sacrificing accuracy, reaching 92.4% accuracy on textual entailment.
Natural language processing is evolving rapidly in an era of fast technological growth, and conventional rule-based methods are insufficient for today's complex challenges. One significant task in natural language processing is recognizing textual entailment, which enables systems to understand the relationship between texts and is crucial for applications such as text summarization, sentiment analysis, information verification, question answering, text classification, and machine translation. One challenge is reducing the size of the input vectors while maintaining good prediction quality, measured by accuracy and F1 score. To address this challenge, this research presents a novel encoding technique that reduces the encoding size for the Sentences Involving Compositional Knowledge (SICK) dataset, which consists of English sentence pairs labeled with one of three entailment categories: entailment, neutral, and contradiction. The proposed KIRINLog technique consists of six main steps: preprocessing, removing words shared by both sentences, max-length padding and truncation for sentence alignment, adding a flag value, word embedding with Word2Vec, and dimensionality reduction with principal component analysis to shrink the input vector. Together, these six steps produce the feature vectors used for recognizing textual entailment on the SICK dataset. This study uses an attention-based bidirectional LSTM as the classification model; it combines a bidirectional LSTM, which analyzes sequences in both the forward and backward directions, with an attention mechanism that helps the model focus on the important information in a sentence. The proposed technique achieves an accuracy of 92.4%, precision of 88.8%, recall of 88.6%, and an F1-score of 88.6% on the 2014 SICK dataset.
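The six-step encoding pipeline can be sketched in miniature. This is an illustrative approximation, not the authors' implementation: the toy hash-based embedding stands in for a trained Word2Vec model, the flag vector and dimensions are arbitrary choices, and PCA is done directly via SVD.

```python
import numpy as np

def preprocess(sentence):
    # Step 1: minimal preprocessing (lowercase, strip punctuation).
    return [w.strip(".,!?").lower() for w in sentence.split()]

def remove_shared_words(a, b):
    # Step 2: drop words that appear in both sentences of the pair,
    # keeping only the tokens that distinguish them.
    shared = set(a) & set(b)
    return [w for w in a if w not in shared], [w for w in b if w not in shared]

def pad_or_truncate(tokens, max_len, pad="<pad>"):
    # Step 3: align every sentence to a fixed length.
    return (tokens + [pad] * max_len)[:max_len]

def toy_embed(word, dim=8):
    # Step 5 stand-in: deterministic pseudo-embedding per word
    # (a real pipeline would look the word up in a Word2Vec model).
    rng = np.random.default_rng(abs(hash(word)) % (2**32))
    return rng.standard_normal(dim)

def pca_reduce(X, k):
    # Step 6: PCA via SVD of the mean-centered matrix,
    # projecting each row onto the top-k principal components.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def encode_pair(s1, s2, max_len=6, k=2, flag=1.0):
    a, b = remove_shared_words(preprocess(s1), preprocess(s2))
    a, b = pad_or_truncate(a, max_len), pad_or_truncate(b, max_len)
    # Step 4: a constant flag vector marks the boundary between
    # the two sentences in the concatenated sequence.
    vecs = [toy_embed(w) for w in a] + [np.full(8, flag)] + [toy_embed(w) for w in b]
    X = np.stack(vecs)           # shape: (2 * max_len + 1, embedding_dim)
    return pca_reduce(X, k)      # shape: (2 * max_len + 1, k)
```

With the defaults above, each sentence pair becomes a (13, 2) matrix instead of a (13, 8) one, which is the spirit of the size reduction the paper targets.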
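The classifier side, a bidirectional recurrent pass followed by attention pooling, can also be sketched. This is a simplified stand-in, not the paper's model: a plain tanh RNN cell replaces the LSTM cell, and all weight matrices are assumed to be supplied by the caller.

```python
import numpy as np

def rnn_pass(X, Wx, Wh, reverse=False):
    # One direction of a recurrent pass over the sequence X (T, d_in).
    # A plain tanh RNN cell is used here as a simplified stand-in
    # for an LSTM cell.
    T = X.shape[0]
    h = np.zeros(Wh.shape[0])
    out = []
    steps = range(T - 1, -1, -1) if reverse else range(T)
    for t in steps:
        h = np.tanh(X[t] @ Wx + h @ Wh)
        out.append(h)
    if reverse:
        out.reverse()                       # realign to forward time order
    return np.stack(out)                    # (T, hidden)

def bi_attention_encode(X, Wx_f, Wh_f, Wx_b, Wh_b, w_att):
    # Bidirectional pass: concatenate forward and backward hidden states,
    # so each time step sees both past and future context.
    Hf = rnn_pass(X, Wx_f, Wh_f)
    Hb = rnn_pass(X, Wx_b, Wh_b, reverse=True)
    H = np.concatenate([Hf, Hb], axis=1)    # (T, 2 * hidden)
    # Attention: score each time step, softmax into weights, and pool,
    # letting the model emphasize the most informative positions.
    scores = H @ w_att
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()
    return alpha @ H                        # context vector, (2 * hidden,)
```

The resulting context vector would then feed a small dense layer with a three-way softmax over entailment, neutral, and contradiction.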