This paper benchmarks the performance of Llama-3 and ChatGPT on a novel discrete optimization dataset with varying problem types and parameter magnitudes, including augmented data. The study compares strong vs. weak models and Chain-of-Thought (CoT) vs. No-CoT prompting strategies. Results indicate that while stronger models generally perform better, CoT is not always effective, and disordered datasets can sometimes improve performance on simpler problems, though with high variance that points to model instability.
Chain-of-Thought prompting doesn't always improve LLMs' ability to solve discrete optimization problems, and surprisingly, "disordered" datasets can sometimes boost performance on simpler tasks.
This work investigated how well different models, including the Llama-3 series and ChatGPT, solve discrete optimization problems posed in different forms of expression, using natural-language datasets for testing. In contrast to formal datasets with a limited range of parameters, our dataset included a variety of discrete optimization problem types and a wide range of parameter magnitudes, including instances with large parameter sets, and was integrated with augmented data. The work aimed to (1) provide an overview of LLMs' ability on large-scale problems, (2) offer suggestions to those who want to solve discrete optimization problems automatically, and (3) serve as a benchmark for future research. The data comprised original, expanded, and augmented datasets: the original and augmented datasets were intended for evaluation, while the expanded dataset may help fine-tune a new model. The experiments compared strong and weak models, and CoT and No-CoT methods, on the various datasets. As expected, stronger models performed better. Contrary to general agreement, however, the CoT technique was not always effective with respect to the capability of the models, and disordered datasets improved model performance on easy-to-understand problems, although sometimes with high variance, a sign of instability. Those who seek to improve the automatic solving of discrete optimization problems are therefore advised to consult the results, including the line charts presented in the Appendix, together with the conclusions drawn in this study.