This paper introduces T-SimCSE, a novel unsupervised approach for requirements traceability link recovery based on the SimCSE pre-trained language model. T-SimCSE enhances link recovery by calculating similarity between requirements and target artifacts using SimCSE embeddings, then re-ranks the artifacts using a newly proposed "specificity" metric. Experiments on ten public datasets demonstrate that T-SimCSE outperforms existing methods in recall and Mean Average Precision (MAP) without requiring labeled data.
Unsupervised link recovery gets a boost: T-SimCSE leverages SimCSE and a novel specificity metric to significantly outperform existing methods in requirements traceability, even without labeled data.
Requirements traceability plays an important role in ensuring software quality and in responding to changes in requirements. Requirements trace links (such as the links between requirements and other software artifacts) underpin the modeling and implementation of requirements traceability. With the rapid development of artificial intelligence, pre-trained language model (PLM) techniques are increasingly applied to the automatic recovery of requirements trace links. However, the links recovered by these approaches are not accurate enough, and many approaches require a large labeled dataset for training, while very few labeled datasets are currently available. To address these limitations, this paper proposes a novel requirements traceability link recovery approach called T-SimCSE, which is based on the PLM SimCSE. SimCSE does not require labeled data, is broadly applicable, and performs well. T-SimCSE first uses the SimCSE model to calculate the similarity between requirements and target artifacts, then employs a new metric (specificity) to reorder those target artifacts. Finally, trace links are created between each requirement and its top-K target artifacts. We evaluated T-SimCSE on ten public datasets by comparing it with other approaches. The results show that T-SimCSE achieves superior performance in terms of recall and Mean Average Precision (MAP).
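The three steps described above (score by embedding similarity, reorder, keep the top-K) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the specificity metric, so the re-ranking weight below is a hypothetical stand-in that simply penalizes artifacts resembling every requirement. The function names `cosine` and `recover_links` are also illustrative, and the embeddings are assumed to be precomputed vectors (for example from a SimCSE encoder) passed in as plain lists of floats.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recover_links(req_vecs, art_vecs, k):
    """Return {requirement index: [top-k artifact indices]}.

    req_vecs / art_vecs are assumed to be precomputed sentence embeddings.
    """
    # Step 1: similarity between every requirement and every artifact.
    sim = [[cosine(r, a) for a in art_vecs] for r in req_vecs]

    # Step 2: re-ranking weight. ASSUMPTION: this is NOT the paper's
    # specificity metric, only a placeholder. It is the artifact's mean
    # similarity to all requirements, so "generic" artifacts are penalized.
    n_req = len(req_vecs)
    generic = [
        sum(sim[i][j] for i in range(n_req)) / n_req
        for j in range(len(art_vecs))
    ]

    # Step 3: reorder artifacts by the adjusted score and keep the top k.
    links = {}
    for i, row in enumerate(sim):
        adjusted = [s - g for s, g in zip(row, generic)]
        ranked = sorted(range(len(row)), key=lambda j: adjusted[j], reverse=True)
        links[i] = ranked[:k]
    return links


if __name__ == "__main__":
    requirements = [[1.0, 0.0], [0.0, 1.0]]
    artifacts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(recover_links(requirements, artifacts, k=2))
```

Keeping the re-ranking weight as a separate step means the placeholder can be replaced by the paper's actual specificity definition without touching the similarity or top-K logic.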