Other formats:
BibTeX

@proceedings{2403039,
  author    = {Kadlčík, Marek and Štefánik, Michal and Sotolář, Ondřej and Martinek, Vlastimil},
  booktitle = {ICLR 2024 Workshop on Large Language Model (LLM) Agents},
  keywords  = {language models, arithmetical reasoning, self-training, preference optimisation},
  language  = {eng},
  title     = {Self-Training Language Models in Arithmetic Reasoning},
  url       = {https://openreview.net/forum?id=zBh79GuLNO},
  year      = {2024}
}
RIS

TY  - CONF
ID  - 2403039
AU  - Kadlčík, Marek
AU  - Štefánik, Michal
AU  - Sotolář, Ondřej
AU  - Martinek, Vlastimil
PY  - 2024
TI  - Self-Training Language Models in Arithmetic Reasoning
KW  - language models
KW  - arithmetical reasoning
KW  - self-training
KW  - preference optimisation
UR  - https://openreview.net/forum?id=zBh79GuLNO
N2  - Recent works show the impressive effectiveness of an agent framework in solving problems with language models. In this work, we apply two key features from the framework, interaction with tools and goal-oriented training, to improve models' arithmetical reasoning. First, we curate and transform existing datasets to create Calc-X, a standardized collection with over 300,000 problems with step-by-step solutions. We use Calc-X to train models we call Calcformers that interact with a calculator during inference. Calcformers achieve twice the accuracy of standard baselines. Finally, we optimize Calcformers via self-training using preference optimization and supervised loss by checking the model's predicted results. We find that self-training can achieve substantial improvements on out-of-domain problems and that traditional supervised loss is a strong baseline for preference optimization. Our results show that preference optimization converges faster and isn't prone to forgetting pre-trained abilities.
ER  - 
LaTeX

KADLČÍK, Marek, Michal ŠTEFÁNIK, Ondřej SOTOLÁŘ and Vlastimil MARTINEK. Self-Training Language Models in Arithmetic Reasoning. In \textit{ICLR 2024 Workshop on Large Language Model (LLM) Agents}. 2024.