ŠTEFÁNIK, Michal. Methods for Estimating and Improving Robustness of Language Models. Online. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop. Seattle, Washington + Online: Association for Computational Linguistics, 2022, p. 44-51. ISBN 978-1-7138-5621-4. Available from: https://dx.doi.org/10.18653/v1/2022.naacl-srw.6.
Basic information
Original name Methods for Estimating and Improving Robustness of Language Models.
Authors ŠTEFÁNIK, Michal (703 Slovakia, guarantor, belonging to the institution).
Edition Seattle, Washington + Online, Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop, p. 44-51, 8 pp. 2022.
Publisher Association for Computational Linguistics
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher United States of America
Confidentiality degree is not subject to a state or trade secret
Publication form electronic version available online
RIV identification code RIV/00216224:14330/22:00126309
Organization unit Faculty of Informatics
ISBN 978-1-7138-5621-4
Doi http://dx.doi.org/10.18653/v1/2022.naacl-srw.6
UT WoS 000860760300006
Keywords in English natural language processing; transformers; robustness; generalization
Tags International impact, Reviewed
Changed by RNDr. Pavel Šmerk, Ph.D., university ID (UČO) 3880. Changed: 6/4/2023 12:36.
Abstract
Despite their outstanding performance, large language models (LLMs) suffer from notorious flaws related to their preference for shallow textual relations over the full semantic complexity of the problem. This proposal investigates a common denominator of these flaws: the models' weak ability to generalise outside of the training domain. We survey diverse research directions that provide estimates of a model's generalisation ability and find that incorporating some of these measures into the training objectives leads to enhanced distributional robustness of neural models. Based on these findings, we present future research directions for enhancing the robustness of LLMs.
Links
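MUNI/A/1195/2021, internal MU code. Name: Applied research in the areas of search, analysis and visualization of large-scale data, natural language processing and applied artificial intelligence (Aplikovaný výzkum v oblastech vyhledávání, analýz a vizualizací rozsáhlých dat, zpracování přirozeného jazyka a aplikované umělé inteligence)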
Investor: Masaryk University