Microbiological Seminar, Brno, 2021
Artificial Intelligence in Microbiology
by Stanislav Mazurenko, PhD
mazurenko@mail.muni.cz

Outline
• Motivation
• Introduction to AI and ML
• Recent applications in Microbiology

Motivation

Motivation: sequences and chemicals
• Large volumes of digital data
• Affordable computing power and storage
• Complex study objects
[Figure: The total number of sequences. Series: GenBank, WGS. Vertical axis: millions, ticks from 0.1 to 10000; horizontal axis: 1993 to 2020.]

Motivation: big experimental data
Source: Scheler et al. "Recent developments of microfluidics as a tool for biotechnology and microbiology." Current Opinion in Biotechnology 2019; Khater et al. "Picoliter agar droplet breakup in microfluidics meets microbiology application: numerical and experimental approaches." Lab on a Chip 2020.

Motivation: cell imaging
Source: Scher et al. "In-situ fiducial markers for 3D correlative cryo-fluorescence and FIB-SEM imaging." iScience (2021): 102714.

Introduction to AI and ML
• Recommendation engines
• Image & speech recognition
• Anomaly detection
• Natural language processing
• Data mining…
[Figure] Source: towardsdatascience.com

Introduction to ML
• Historically, people tried to define the rules themselves, e.g. by detecting particular shapes or color contrasts;
• Often such manual rules are too simplistic to give good results;
• Machine Learning gives the means to generate those rules automatically!
[Figure: example classes: faces vs. not faces; vaccine candidates vs. non-vaccine candidates]
A predictor F assigns a label to each input: F(input) = +1 or F(input) = -1.

Basics of ML: data representation
Sequence: MKKLGRAATNKAAKEVLDYCGEAKG…
Feature vector: (5, 1, 1, -5.67, 0.69, …)
One-hot encoding:
     K  K  L  G  R  A  A  T  …
A    0  0  0  0  0  1  1  0  …
K    1  1  0  0  0  0  0  0  …
L    0  0  1  0  0  0  0  0  …
G    0  0  0  1  0  0  0  0  …
R    0  0  0  0  1  0  0  0  …
T    0  0  0  0  0  0  0  1  …
…
Source: Goodswen et al. "Machine learning and applications in microbiology." FEMS Microbiology Reviews (2021).
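The one-hot encoding above can be sketched in a few lines of Python. This is a minimal illustration, not code from the seminar: the function names `one_hot` and `aa_frequency` and the ordering of the 20-letter alphabet are assumptions made here, and the input is the eight-residue fragment that heads the table columns.

```python
# Illustrative sketch; function names and alphabet order are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one-letter codes


def one_hot(sequence, alphabet=AMINO_ACIDS):
    """Return one row per alphabet letter and one column per sequence position.

    Entry [i][j] is 1 if position j of the sequence holds letter i, else 0.
    """
    return [[1 if residue == letter else 0 for residue in sequence]
            for letter in alphabet]


def aa_frequency(sequence, alphabet=AMINO_ACIDS):
    """Return the fraction of each alphabet letter in the sequence."""
    if not sequence:
        return [0.0] * len(alphabet)
    return [sequence.count(letter) / len(sequence) for letter in alphabet]


fragment = "KKLGRAAT"  # the eight residues shown as table columns
matrix = one_hot(fragment)

# Print only the rows for letters that occur in the fragment.
for letter, row in zip(AMINO_ACIDS, matrix):
    if any(row):
        print(letter, *row)

# A fixed-length alternative representation: amino-acid (AA) frequencies.
frequencies = aa_frequency(fragment)
```

The row printed for A reads 0 0 0 0 0 1 1 0, matching the table. Note the trade-off between the two representations: the one-hot matrix grows with sequence length, while the frequency vector always has 20 entries but discards residue order.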
Examples of features:
• AA frequency
• AA sequence
• Conservation scores
• Structural elements
• …

Basics of ML: training
[Figure: a predictor F with multiple parameters, with example outputs F( ) = -1 and F( ) = +1. Values shown on the slides: 0.3, 1.4, 1, -1 and 1.2, 0.5, 10, +1.]

Basics of ML: validation
• The goal of ML is to identify generalizable patterns in your training data.
• These patterns must be valid for future data!
• Therefore, the core of an ML protocol is to evaluate the predictor on the test data, hidden from the predictor:
[Diagram: the data are split into training data, used to train an ML predictor, and test data, used to evaluate the predictor.]

Artificial Neural Networks
Link: https://vimeo.com/154085950

Recent applications

Overview
Source: Peiffer-Smadja et al. "Machine learning in the clinical microbiology laboratory: has the time come for routine practice?" Clinical Microbiology and Infection 2020.

Bacterial colony morphology
[Figure panels: interclass variations; intraclass variations (Streptococcus agalactiae)]
A convolutional neural network was able to discriminate between 18 classes of bacterial colonies.
Source: Huang, Lei, and Tong Wu. "Novel neural network application for bacterial colony classification." Theoretical Biology and Medical Modelling 15.1 (2018): 1-16.

Identification of pathogens
Source: Ho, Chi-Sing, et al. "Rapid identification of pathogenic bacteria using Raman spectroscopy and deep learning." Nature Communications 10.1 (2019): 1-8.

Antimicrobial resistance
Source: Khaledi, Ariane, et al. "Predicting antimicrobial resistance in Pseudomonas aeruginosa with machine learning-enabled molecular diagnostics." EMBO Molecular Medicine 12.3 (2020): e10264.

Source: C4X Discovery; Fernández-Torras et al. "Connecting chemistry and biology through molecular descriptors." Current Opinion in Chemical Biology 66 (2022): 102090.
Clinical outcomes

Summary
• Machine Learning is a powerful data-driven alternative to traditional modelling;
• One turns data into numbers (features) and trains a generic algorithm to discriminate between labels in the feature space;
• It is essential to have a separate test set for evaluating the resulting predictor;
• In Microbiology, Machine Learning is already applied to a wide range of tasks.

Bi9680En: AI in Biology, Chemistry, and Bioengineering
• Term: autumn
• Scope: lecture, 2 hours/week
• Lecturer: Dr. Stanislav Mazurenko
• Syllabus:
    • modern bio-challenges: drug design, DNA interpretation, protein engineering
    • types of AI algorithms and workflow for designing predictors
    • clustering algorithms, random forests, artificial neural networks
    • features, databases, and predictors used in applications
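
The workflow from the summary (turn data into features, train a predictor, evaluate it on a separate test set) can be sketched in plain Python. The sketch below is a toy under stated assumptions, not material from the seminar: the data are synthetic 2-D feature vectors, a perceptron stands in for the "generic algorithm", the split is 80/20, and all function names are hypothetical.

```python
import random

# Illustrative sketch on synthetic data; all names and settings are assumptions.


def make_data(n, rng):
    """Sample n labelled 2-D points: label +1 above the line x0 + x1 = 1, else -1.

    Points with |x0 + x1 - 1| < 0.2 are rejected, so the two classes are
    linearly separable with a margin.
    """
    data = []
    while len(data) < n:
        x = (rng.random(), rng.random())
        score = x[0] + x[1] - 1.0
        if abs(score) >= 0.2:
            data.append((x, 1 if score > 0 else -1))
    return data


def predict(params, x):
    """F(x) = +1 if w0*x0 + w1*x1 + b > 0, else -1."""
    w0, w1, b = params
    return 1 if w0 * x[0] + w1 * x[1] + b > 0 else -1


def train_perceptron(train, max_epochs=1000):
    """Adjust the three parameters until no training point is misclassified."""
    w0 = w1 = b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in train:
            if predict((w0, w1, b), x) != y:
                # Perceptron update: move the boundary towards the correct side.
                w0 += y * x[0]
                w1 += y * x[1]
                b += y
                mistakes += 1
        if mistakes == 0:
            break
    return (w0, w1, b)


def accuracy(params, data):
    """Fraction of points whose predicted label equals the true label."""
    return sum(predict(params, x) == y for x, y in data) / len(data)


rng = random.Random(0)          # fixed seed for reproducibility
data = make_data(100, rng)
rng.shuffle(data)
train, test = data[:80], data[80:]   # the test set stays hidden during training

params = train_perceptron(train)
print("training accuracy:", accuracy(params, train))
print("test accuracy:    ", accuracy(params, test))
```

Only the test accuracy says anything about how the predictor might behave on future data; the training accuracy is reached by construction here, because the perceptron keeps updating until it fits the separable training set.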