2024
Self-training Language Models for Arithmetic Reasoning
KADLČÍK, Marek and Michal ŠTEFÁNIK
Basic information
Original name
Self-training Language Models for Arithmetic Reasoning
Authors
KADLČÍK, Marek (203 Czech Republic, belonging to the institution) and Michal ŠTEFÁNIK (703 Slovakia, guarantor, belonging to the institution)
Edition
Hybrid, Miami, 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024, pp. 12378-12386 (9 pp.), 2024
Publisher
Association for Computational Linguistics
Other information
Language
English
Type of outcome
Proceedings paper
Field of Study
10201 Computer sciences, information science, bioinformatics
Country of publisher
United States of America
Confidentiality degree
is not subject to a state or trade secret
Publication form
electronic version available online
RIV identification code
RIV/00216224:14330/24:00137409
Organization unit
Faculty of Informatics
ISBN
979-8-89176-168-1
EID Scopus
2-s2.0-85217622765
Keywords in English
language models; arithmetic reasoning; self-training; implicit feedback; preference optimization
Tags
International impact, Reviewed
Changed: 4/4/2025 01:13, RNDr. Pavel Šmerk, Ph.D.
Abstract
In the original language
Recent language models achieve impressive results in tasks involving complex multistep reasoning, but scaling these capabilities further traditionally requires expensive collection of more annotated data. In this work, we explore the potential of improving models' reasoning capabilities without new data, merely using automated feedback on the validity of their predictions in arithmetic reasoning (self-training). In systematic experimentation across six different arithmetic reasoning datasets, we find that models can substantially improve in both single-round (offline) and online self-training, reaching a correct result in +13.9% and +25.9% more cases, respectively, underlining the importance of the recency of self-training feedback. We further find that in single-round, offline self-training, traditional supervised training can deliver gains comparable to preference optimization, but in online self-training, preference optimization methods largely outperform supervised training thanks to their superior stability and robustness on unseen types of problems.
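The self-training recipe described in the abstract can be illustrated with a minimal toy sketch (all names here are hypothetical, not the paper's implementation): sample candidate answers, score them only by whether the final arithmetic result is correct, and turn correct/incorrect candidates into preference pairs for preference optimization.

```python
# Toy sketch of self-training with automated validity feedback.
# Assumption: in the real setting, candidates would be chains of thought
# sampled from a language model; here a deterministic stub stands in.

def sample_answers(a, b):
    """Stand-in for model sampling: some candidates correct, some off by one."""
    gold = a + b
    return [gold - 1, gold, gold + 1]

def build_preference_pairs(problems):
    """Pair a correct candidate (chosen) with an incorrect one (rejected),
    using only automated checking of the final result as feedback."""
    pairs = []
    for a, b in problems:
        gold = a + b  # automated feedback: validity of the result
        candidates = sample_answers(a, b)
        correct = [c for c in candidates if c == gold]
        wrong = [c for c in candidates if c != gold]
        if correct and wrong:
            pairs.append({"prompt": f"{a}+{b}=",
                          "chosen": correct[0],
                          "rejected": wrong[0]})
    return pairs

pairs = build_preference_pairs([(2, 3), (10, 7)])
```

In online self-training, this pair-collection step would run repeatedly against the current model, so the feedback stays up to date as the model improves.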
Links
MUNI/A/1590/2023, internal MU code
MUNI/A/1608/2023, internal MU code