Bioinformatics, 38(4), 2022, 1173-1175
https://doi.org/10.1093/bioinformatics/btab750
Advance Access Publication Date: 30 October 2021
Applications Note

Data and text mining

AOP-helpFinder webserver: a tool for comprehensive analysis of the literature to support adverse outcome pathways development

Florence Jornod1, Thomas Jaylet1, Luděk Blaha2, Denis Sarigiannis3, Luc Tamisier4 and Karine Audouze1,*

1 Université de Paris, T3S, Inserm UMR-S1124, Paris F-75006, France, 2 RECETOX, Faculty of Science, Masaryk University, Brno CZ62500, Czech Republic, 3 HERACLES Research Center on the Exposome and Health, Aristotle University of Thessaloniki, Center for Interdisciplinary Research and Innovation, Thessaloniki 57001, Greece and 4 Université de Paris, SPPIN CNRS UMR 8003, Paris F-75006, France

*To whom correspondence should be addressed.
Associate Editor: Jonathan Wren
Received on July 21, 2021; revised on September 30, 2021; editorial decision on October 21, 2021; accepted on October 27, 2021

Abstract

Motivation: Adverse outcome pathways (AOPs) are a conceptual framework developed to support the use of alternative toxicology approaches in risk assessment. AOPs are structured linear organizations of existing knowledge illustrating causal pathways from the initial molecular perturbation triggered by various stressors, through key events (KEs) at different levels of biology, to the ultimate health or ecotoxicological adverse outcome.

Results: Artificial intelligence can be used to systematically explore available toxicological data that can be parsed in the scientific literature. Recently, a tool called AOP-helpFinder was developed to identify associations between stressors and KEs, thus supporting the documentation of AOPs.
To facilitate the use of this advanced bioinformatics tool by the scientific and regulatory communities, a webserver was created. The proposed AOP-helpFinder webserver uses a better-performing version of the tool, which reduces the need for manual curation of the obtained results. As an example, the server was successfully applied to explore relationships of a set of endocrine disruptors with metabolic-related events. The AOP-helpFinder webserver assists in a rapid evaluation of existing knowledge stored in the PubMed database, a global resource of scientific information, to build AOPs and Adverse Outcome Networks supporting chemical risk assessment.

Availability and implementation: AOP-helpFinder is available at http://aop-helpfinder.u-paris-sciences.fr/index.php
Contact: karine.audouze@u-paris.fr
Supplementary information: Supplementary data are available at Bioinformatics online.

1 Introduction

Structured organization of toxicological and ecotoxicological data is now feasible using the adverse outcome pathways (AOP) framework (Ankley et al., 2010). An AOP is defined by a linear sequence of biological events, starting from a molecular initiating event (MIE) triggered by stressors (pollutants, ionizing radiation, nanomaterials or climate stressors) and connected through a series of key events (KEs) occurring at various levels of biological organization, to an adverse outcome (AO). Biological events (MIE, KE and AO) are not linked to a unique AOP, but can be shared, allowing the establishment of Adverse Outcome Networks (AONs) that better reflect the true complexity of the biology.
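To make these definitions concrete, an AOP can be represented as an ordered list of events and an AON as the directed graph obtained by merging several AOPs on the events they share. The following is a minimal sketch under that assumption; the event names and the helpers `build_aon` and `shared_events` are hypothetical and are not taken from AOP-helpFinder or AOP-wiki.

```python
from collections import defaultdict

# Hypothetical AOPs: each one is a linear chain MIE -> KE(s) -> AO.
# Event names are illustrative only, not curated AOP-wiki entries.
aops = {
    "AOP_A": ["receptor activation", "lipid accumulation", "obesity"],
    "AOP_B": ["oxidative stress", "lipid accumulation", "liver steatosis"],
}


def build_aon(aops):
    """Merge linear AOPs into one directed graph (an AON).

    Returns a dict mapping each event to the set of events it leads to.
    """
    graph = {}
    for chain in aops.values():
        for upstream, downstream in zip(chain, chain[1:]):
            graph.setdefault(upstream, set()).add(downstream)
        # Make sure the adverse outcome is present as a node without successors.
        graph.setdefault(chain[-1], set())
    return graph


def shared_events(aops):
    """Return the events that occur in more than one AOP."""
    counts = defaultdict(int)
    for chain in aops.values():
        for event in set(chain):
            counts[event] += 1
    return {event for event, n in counts.items() if n > 1}


aon = build_aon(aops)
print(sorted(aon["lipid accumulation"]))  # downstream events of the shared KE
print(shared_events(aops))
```

In this toy network, the shared key event is the node where the two linear pathways meet and branch, which is what distinguishes an AON from a single AOP.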
Combined with new approach methodologies (Parish et al., 2020), AOPs and AONs are extremely useful in establishing integrated approaches to testing and assessment (IATA) for environmental and risk assessment, and they aid the development of novel nonanimal toxicity testing strategies (Delrue et al., 2016). With advances in technologies, huge amounts of data have become available, compiled in well-structured toxicological databases (e.g. CTD, CompTox), in AOP-oriented webservers (AOP-wiki, sAOP, AOP4EUpest) and scientific publications (Williams et al., 2017). Innovative data mining tools are needed to identify sparse but complementary data. Examples are Abstract Sifter, which gives a view of the toxicological information landscape for a set of entities such as chemicals (Baker et al., 2017), and ComptoxAI (https://comptox.ai/index.html). Artificial intelligence (AI) technology that uses natural language processing (NLP) is an interesting way to facilitate the identification of links between relevant information that can be used to build novel AOPs (Song et al., 2020), and to identify knowledge gaps and research needs (Zgheib et al., 2021). Several tools use text mining (TM), an AI method to transform unstructured text into structured text. For example, LimTox provides a biomedical search for adverse hepatobiliary reactions (Canada et al., 2017).

© The Author(s) 2021. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
Recently, the AOP-helpFinder tool, based on TM and graph theory, was proposed to identify stressor-KE relationships by examining large collections of scientific abstracts, and was applied to bisphenol A substitutes and pesticides (Carvaillo et al., 2019; Jornod et al., 2020; Rugard et al., 2020). Here, we present the AOP-helpFinder webserver, which uses an updated version of the tool, to provide an easy but effective resource for identifying and compiling existing knowledge from the scientific literature. The main optimized features are (i) the option to search either the full abstracts or the abstracts without their introductory parts, (ii) the possibility to perform a refined search using machine learning and (iii) an automatic update of the PubMed database before each search. A case study with endocrine disruptors (ED) and metabolism-related events is provided, illustrating the capacity of the tool to quickly collate an overview of the existing information.

2 Materials and methods

2.1 The AOP-helpFinder webserver

The proposed webserver is easy to use, and requires only the user's email address to access the upload page and to receive a notification when the results are available for download. This simple procedure is in line with digital sobriety, which aims to reduce environmental impact by limiting computing use. Two input files are needed: one with the stressors of interest and a second with the biological events (i.e. MIE, KE and/or AO). Before running the tool to identify whether knowledge connecting stressors and biological events exists, the user can choose between two options, 'reduced search' and 'refinement filter' (see the following section and Supplementary Material), as well as the output format (date, title, PMID, etc.).
2.2 The AOP-helpFinder tool

To increase the performance of the previously developed version, several methods were tested using a set of ED and biological events related to metabolism, and two were kept (see Supplementary Material, https://github.com/jornod/aophelpFinder): (i) 'reduced search': searches are performed in the full abstracts or without considering the introductory part, which usually appears to be covered by the first 20% of the abstracts. This option avoids too many false positives, as the introduction often reflects a working hypothesis instead of the conclusions of the publication and (ii) 'refinement filter': after the preprocessing that uses a stemming process (Carvaillo et al., 2019), the tool can refine the searches by combining a deletion of sentences containing context words with a lemmatization process. Lemmatization is a machine learning method for text normalization used in NLP that considers the context and converts the word to its meaningful base form. This option is very useful when terms have common stems (e.g. tests, testis → test), leading to incorrect meanings and spelling errors.

Further, an automatic daily update of the PubMed database was newly implemented using the NCBI API to screen the full existing knowledge. The current version of the AI tool mines the PubMed database, which is a global source for scientific literature. Nevertheless, the developed method screens text-based knowledge, and therefore the AOP-helpFinder server could be extended to mine multiple sources (databases, literature), including studies reporting negative findings, to accelerate information gathering when data are limited and present in diverse sources (Carvaillo et al., 2019). The advantage of the proposed method is its capacity to be adapted for literature searches in general, independent of AOP development, in order to identify interconnections between the query keywords, as was successfully done to decipher nonvalidated test methods for ED (Zgheib et al., 2021).

Fig. 1. Example of ED comentioned in PubMed scientific abstracts with biological events related to metabolism, identified by the AOP-helpFinder webserver. The numbers correspond to the % of retrieved abstracts mentioning both the stressor (column) and the event (line) among all identified abstracts (the colors are according to the percentage for better visualization). For example, among all identified abstracts that comentioned bisphenol S (BPS) and at least one event from the list, 13% of the abstracts comentioned BPS (fourth column) and obesity (the second line from the bottom).

2.3 Case study on endocrine disruptors and metabolism

The AOP-helpFinder webserver was used for a case study aiming at automatically identifying existing relationships and knowledge gaps between 10 ED (Supplementary Table S1) and 294 biological events related to metabolism (Supplementary Table S2). The webserver was launched using 'reduced search' (omitting searches in the first 20% of the abstracts) and 'refinement filter'. Among the 83 970 abstracts retrieved in the PubMed database as of May 10, 2021 related to at least one ED (Supplementary Table S1), a total of 4622 were retained (comentioning ED and event). Among the 294 events, 108 were identified as comentioned with at least one ED (Supplementary Table S2). Figure 1 illustrates the large disparity of knowledge for the 10 selected ED in the area of metabolism (see Supplementary Fig. S1 for all results). For example, cadmium, bisphenol A and di(2-ethylhexyl) phthalate (DEHP) are well-studied chemicals, as the webserver retrieved scientific articles for almost all biological events of interest.
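The workflow of this case study can be illustrated with a small sketch: drop the first 20% of each abstract ('reduced search'), look for stressor and event terms in the remainder, and report, for each stressor, the percentage of its retained abstracts that comention each event, as in Fig. 1. This is a simplified illustration and not the AOP-helpFinder implementation: the toy abstracts, the function names, the word-based 20% cut-off and the plain case-insensitive substring matching are all assumptions, whereas the actual tool relies on text mining with stemming and lemmatization and on graph theory.

```python
from collections import defaultdict


def reduced_text(abstract, skip_fraction=0.2):
    """Drop the leading fraction of an abstract (assumption: counted in words)."""
    words = abstract.split()
    start = int(len(words) * skip_fraction)
    return " ".join(words[start:]).lower()


def comentions(abstracts, stressors, events, skip_fraction=0.2):
    """Map each stressor to {pmid: set of events found in the same abstract}."""
    found = defaultdict(dict)
    for pmid, abstract in abstracts.items():
        text = reduced_text(abstract, skip_fraction)
        # Assumption: naive substring matching stands in for the real text mining.
        hits = {event for event in events if event.lower() in text}
        if not hits:
            continue  # abstract mentions no event, so it is not retained
        for stressor in stressors:
            if stressor.lower() in text:
                found[stressor][pmid] = hits
    return found


def percentages(found):
    """Per stressor: % of its retained abstracts that mention each event."""
    table = {}
    for stressor, per_abstract in found.items():
        total = len(per_abstract)
        counts = defaultdict(int)
        for hits in per_abstract.values():
            for event in hits:
                counts[event] += 1
        table[stressor] = {e: round(100 * n / total) for e, n in counts.items()}
    return table


# Toy data, invented for illustration; these are not real PubMed records.
abstracts = {
    1: "Obesity is rising worldwide. In this study, exposure of mice to "
       "bisphenol S increased oxidative stress in the liver.",
    2: "Metabolic disorders are common. Bisphenol S exposure was associated "
       "with obesity and oxidative stress in adults.",
    3: "Obesity is a major health concern today. Here, cadmium exposure "
       "induced oxidative stress in cultured liver cells.",
}
stressors = ["bisphenol S", "cadmium"]
events = ["obesity", "oxidative stress"]

table = percentages(comentions(abstracts, stressors, events))
for stressor in sorted(table):
    print(stressor, sorted(table[stressor].items()))
```

In this toy data, 'obesity' appears only in the introductory sentence of abstracts 1 and 3, so the reduced search does not count it for those abstracts, whereas a full-abstract search (`skip_fraction=0.0`) would. This mirrors the motivation given for the option: introductions often state a hypothesis rather than a finding.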
Other chemicals (bisphenol F, bisphenol S, butyl-paraben) appear to be less studied (Supplementary Table S1), and the information was essentially identified for extensively studied biological events such as oxidative stress or obesity.

3 Conclusion

The AOP-helpFinder webserver uses automatic AI screening to rapidly retrieve existing knowledge on links between stressors and biological events to build AOPs and AONs. This webserver allows highly effective searches in PubMed, as it considerably reduces the time needed to find relevant scientific articles. The comprehensive AI-based analyses of existing literature support various needs of risk assessment, such as establishment of causality between chemicals and AOs through AOPs and AONs, identification of gaps, or prioritization and design of future experimental and epidemiological studies.

Acknowledgements

The authors would like to acknowledge Inserm and the Université de Paris for supporting the work.

Funding

This work was supported by the European Union's Horizon 2020 Research and Innovation Programme OBERON [https://oberon-4eu.com, Grant 825712] and HBM4EU [https://www.hbm4eu.eu/, Grant 733032].

Conflict of Interest: none declared.

References

Ankley,G.T. et al. (2010) Adverse outcome pathways: a conceptual framework to support ecotoxicology research and risk assessment. Environ. Toxicol. Chem., 29, 730-741.
Baker,N. et al. (2017) Abstract Sifter: a comprehensive front-end system to PubMed. F1000Research, 6, 2164.
Canada,A. et al. (2017) LimTox: a web tool for applied text mining of adverse event and toxicity associations of compounds, drugs and genes. Nucleic Acids Res., 45, W484-W489.
Carvaillo,J.-C. et al. (2019) Linking bisphenol S to adverse outcome pathways using a combined text mining and systems biology approach. Environ. Health Perspect., 127, 47005.
Delrue,N. et al. (2016) The adverse outcome pathway concept: a basis for developing regulatory decision-making tools. Altern. Lab. Anim., 44, 417-429.
Jornod,F. et al. (2020) AOP4EUpest: mapping of pesticides in Adverse Outcome Pathways using a text mining tool. Bioinformatics, 36, 4379-4381.
Parish,S.T. et al. (2020) An evaluation framework for new approach methodologies (NAMs) for human health safety assessment. Regul. Toxicol. Pharmacol., 112, 104592.
Rugard,M. et al. (2020) Deciphering adverse outcome pathway network linked to bisphenol F using text mining and systems toxicology approaches. Toxicol. Sci., 173, 32-40.
Song,J. et al. (2020) Upregulation of angiotensin converting enzyme 2 by shear stress reduced inflammation and proliferation in vascular endothelial cells. Biochem. Biophys. Res. Commun., 525, 812-818.
Williams,A.J. et al. (2017) The CompTox chemistry dashboard: a community data resource for environmental chemistry. J. Cheminform., 9, 61.
Zgheib,E. et al. (2021) Identification of non-validated endocrine disrupting chemical characterization methods by screening of the literature using artificial intelligence and by database exploration. Environ. Int., 154, 106574.