PLOS COMPUTATIONAL BIOLOGY

RESEARCH ARTICLE

Accurate prediction of kinase-substrate networks using knowledge graphs

Vít Nováček 1,7,†,*, Gavin McGauran 3, David Matallanas 3, Adrián Vallejo Blanco 3,4, Piero Conca 2, Emir Muñoz 1,2, Luca Costabello 2, Kamalesh Kanakaraj 1, Zeeshan Nawaz 1, Brian Walsh 1, Sameh K. Mohamed 1, Pierre-Yves Vandenbussche 2, Colm J. Ryan 3, Walter Kolch 3,5,6, Dirk Fey 3,6,*

1 Data Science Institute, National University of Ireland Galway, Ireland, 2 Fujitsu Ireland Ltd., Co. Dublin, Ireland, 3 Systems Biology Ireland, University College Dublin, Belfield, Dublin 4, Ireland, 4 Department of Oncology, Universidad de Navarra, Pamplona, Spain, 5 Conway Institute of Biomolecular & Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland, 6 School of Medicine, University College Dublin, Belfield, Dublin 4, Ireland, 7 Faculty of Informatics, Masaryk University, Brno, Czech Republic

† Lead author.
* novacek@fi.muni.cz (VN); dirk.fey@ucd.ie (DF)

OPEN ACCESS

Citation: Nováček V, McGauran G, Matallanas D, Vallejo Blanco A, Conca P, Muñoz E, et al. (2020) Accurate prediction of kinase-substrate networks using knowledge graphs. PLoS Comput Biol 16(12): e1007578. https://doi.org/10.1371/journal.pcbi.1007578

Editor: Anand R. Asthagiri, Northeastern University, UNITED STATES

Received: November 29, 2019
Accepted: August 10, 2020
Published: December 3, 2020

Peer Review History: PLOS recognizes the benefits of transparency in the peer review process; therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. The editorial history of this article is available here: https://doi.org/10.1371/journal.pcbi.1007578

Copyright: © 2020 Nováček et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: The training/testing splits for reproducing the computational experiments are available at https://doi.org/10.6084/m9.figshare.12179925.v1. In case of queries related to reagent and resource sharing, the point of contact is Systems Biology Ireland, University College Dublin (sbiadmin@ucd.ie).

Funding: This work was supported by the CLARIFY project funded by European Commission under the grant number 875160, the TOMOE project funded by Fujitsu Laboratories Ltd., Japan and Insight Centre for Data Analytics at National University of Ireland Galway (supported by the Science Foundation Ireland grant 12/RC/2289) and Science Foundation Ireland grants 14/IA/2395 and 15/CDA 3495 to Walter Kolch and David Matallanas, respectively. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

PLOS Computational Biology | https://doi.org/10.1371/journal.pcbi.1007578 | December 3, 2020

Abstract

Phosphorylation of specific substrates by protein kinases is a key control mechanism for vital cell-fate decisions and other cellular processes. However, discovering specific kinase-substrate relationships is time-consuming and often rather serendipitous. Computational predictions alleviate these challenges, but the current approaches suffer from limitations like restricted kinome coverage and inaccuracy. They also typically utilise only local features without reflecting broader interaction context. To address these limitations, we have developed an alternative predictive model. It uses statistical relational learning on top of phosphorylation networks interpreted as knowledge graphs, a simple yet robust model for representing networked knowledge. Compared to a representative selection of six existing systems, our model has the highest kinome coverage and produces biologically valid high-confidence predictions not possible with the other tools. Specifically, we have experimentally validated predictions of previously unknown phosphorylations by the LATS1, AKT1, PKA and MST2 kinases in human. Thus, our tool is useful for focusing phosphoproteomic experiments, and facilitates the discovery of new phosphorylation reactions. Our model can be accessed publicly via an easy-to-use web interface (LinkPhinder).

Author summary

LinkPhinder is a new approach to prediction of protein signalling networks based on kinase-substrate relationships that outperforms existing approaches. Phosphorylation networks govern virtually all fundamental biochemical processes in cells, and thus have moved into the centre of interest in biology, medicine and drug development. Fundamentally different from current approaches, LinkPhinder is inherently network-based and makes use of the most recent AI developments. We represent existing phosphorylation data as knowledge graphs, a format for large-scale and robust knowledge representation. Training a link prediction model on such a structure leads to novel, biologically valid phosphorylation network predictions that cannot be made with competing tools. Thus our new conceptual approach can lead to establishing a new niche of AI applications in computational biology.

Introduction

Nearly all aspects of cell behaviour are controlled by phosphorylation events and intricate networks of kinase-substrate relationships mediating these phosphorylations [1].
Depending on the phosphorylation site, the attachment of a phosphate group can alter the activity of a substrate, its interaction with other proteins or its subcellular localization. This diversity of phosphorylation-mediated processes controls important cellular functions such as signal transduction, differentiation, migration, cell division and apoptosis. Dysregulation of these kinase-substrate relationships can have devastating consequences and is regularly observed in prevalent diseases, such as cancers or immune diseases. Therefore, kinases have emerged as attractive drug targets and have become the mainstay of targeted therapies, with nearly forty kinase inhibitors approved for clinical use as of 2018 [2] and over 150 in clinical trials since 2012 [3,4]. In order to improve the design of kinase inhibitors and to understand their mode of action and potential side effects, a better understanding of kinase-substrate relationships and the networks they form is necessary.

With the advent of modern high-throughput mass-spectrometry based phosphoproteomics, many thousands of phosphorylation sites in substrate proteins can be identified [5]. However, large-scale and reliable prediction of which kinase can phosphorylate which substrates at which sites remains challenging. High-throughput experiments are not informative in this case, because they cannot establish these detailed functional relationships, and addressing this issue in a one-by-one fashion is prohibitively expensive and time-consuming due to the large number of candidate interactions to be tested [6]. Reliable automated prediction of phosphorylation candidates is therefore much desired, because it can substantially reduce the number of possibilities that have to be tested experimentally.

During the last decade, several tools for predicting phosphorylations have become available.
The most widely used and recently described include: Scansite [7], GPS [8], NetPhos [9], NetPhorest [10], NetworKin [6,10] and PhosphoPredict [11]. Each of these tools, however, covers only a limited fraction of over 500 known human kinases [12], with 33, 217, 17, 178 and 6 kinases covered, respectively. In addition to the limited coverage, existing approaches also suffer from an important conceptual limitation. Only intrinsic features of proteins (such as sequence, structure or functional annotations) are primarily used in training the predictive models. Phosphorylations, however, are inherent parts of complex interaction networks, and this type of information is largely neglected by current models.

Here, we show that predicting kinase-substrate relationships can be formulated as finding missing links in a knowledge graph (i.e. a relational, machine-readable knowledge base constructed from known phosphorylation networks). Knowledge graphs are a powerful way to organise descriptions of properties of objects and their connections [13]. However, they have not yet been widely used to analyse biological relationships. We show that using such a relational representation enables models that have superior generalisation power and precision when compared to existing approaches, lead to increased phosphoproteome coverage and produce biologically valid predictions. This can be explained by the fact that our approach fully utilises latent patterns in phosphorylation networks that are neglected by existing approaches (e.g. long-range relational dependencies and implicit hierarchical structure).
Moreover, the relational representation is not critically dependent on local features, which means our approach can make predictions even for under-researched proteins where existing approaches fail to provide results.

To test this concept, we have built a predictive model based on the known phosphorylation network in PhosphoSitePlus [14] interpreted as a knowledge graph. This model uses statistical relational learning to address the kinase-substrate prediction problem. We show that our model has superior predictive power based on a comparative validation trial following standard machine learning evaluation protocols. The model also outperforms existing tools in the total number of human kinases covered (327, nearly twice as many as the next best tool), which substantially increases the number of potential discoveries that can be made using our tool. The biological relevance of our approach is evidenced by the discovery and experimental validation of previously unknown kinase-substrate relationships for the AKT1, LATS1, PKA and MST2 kinases.

Results

The concept of our approach in comparison with related existing techniques is illustrated in Fig 1 and details are given in the Materials and Methods section. Where existing tools use

[Figure residue. Panels: a) Sequence-based approach; b) Link prediction approach. Example input with columns Kinase, Substrate, P-site, Sequence (LATS1, YAP1, S127, RSRSAPP) and columns Kinase, Phosphorylation motif (LATS1, Motif_01). Labels: "Knowledge Graph with phosphorylation relations at given site"; "Knowledge Graph with motif as relation".]

[Figure residue. Panel titles: LATS1-S464, LATS1-S613, MAP4-S5, ZMYM2-T1253.]

Fig 7. Mass-spectrometry validation of a subset of LinkPhinder predicted phosphorylations. A) Overview of the experimental design. B) Mass-spectrometry result: specific LATS1 interactors and their phosphorylations. Bold rows indicate phosphorylations that were predicted by LinkPhinder.
(* There is a risk that ZMYM2 binding might be unspecific. Some samples show high intensities in the GFP1 control, see panel D.) C) LinkPhinder predictions for the results in panel B. D) Mass-spec raw intensity values (dots) of the detected phosphorylation sites in GFP-LATS1 associated proteins under the indicated conditions (n = 6 replicates), and corresponding box plots indicating median (red line), upper and lower quartile (grey box), whiskers (most extreme values not defined as outliers), and outliers (plus marks) defined as values outside 1.5 times the interquartile range.
https://doi.org/10.1371/journal.pcbi.1007578.g007

Table 4. Sensitivity (S) of LinkPhinder substrate predictions for each kinase assay.

Kinase   Predicted substrate gene names                            S
PKA      PKA, TGM2, PSMC5, PA2G4                                   0.57
MST2     MST2, MOB1A, NUP153, SNAPIN                               0.13
LATS1    LATS1, RHOA, VCP, SNAP25, CCT2, HNRNPK, RPS6, HSP90AA1    0.17

https://doi.org/10.1371/journal.pcbi.1007578.t004

of phosphorylated proteins using streptavidin and the subsequent identification of these proteins as substrates using mass-spectrometry. To test the method, we replicated the Pflum study, which used PKA as the kinase in HeLa cells [30], in a different cell line (HEK-293). From a total of 834 identified proteins, 34 proteins were identified as putative substrates of PKA by comparing with the PKA-deficient control samples (Table 4, and supplementary experimental information). Five of these proteins were previously identified in the Pflum study, and 11 of them were isoforms or proteins of the same protein family. We also identified 18 new putative substrates. These 18 additional proteins, which did not occur in the Pflum study using HeLa cells, may be cell-specific substrates in the HEK-293 cells used here.
The overlap in the results clearly indicated that the global kinase assay is an additional tool that could be used to validate our predictions.

We then extended our validation experiments using this global kinase assay to LATS1 and MST2. First, we used LATS1 as the kinase. We identified 240 putative LATS1 substrates from a total of 1397 identified proteins by comparing to the LATS1-deficient controls (Table 4, and detailed description in Section on Experimental Model and Subject Details). Secondly, we used MST2 as the kinase. MST2 is another core kinase of the MST2/Hippo pathway with poorly characterised substrates. Our results identified 211 proteins as putative MST2 substrates. Strengthening our confidence in the validity of these results, five of the identified putative substrates have been described as MST2 interactors previously.

The experimentally validated PKA, MST2 and LATS1 substrate predictions made by LinkPhinder are listed in Table 4. The table also provides the sensitivity (S) of these predictions in the context of each specific kinase assay. The sensitivity was computed as

$$S = \frac{SUBS_{predicted}}{SUBS_{total}},$$

where $SUBS_{predicted}$ is the number of substrates for which LinkPhinder provided at least one site-specific phosphorylation prediction with a score above the high-confidence threshold, and $SUBS_{total}$ is the number of substrates that were identified in the kinase assay and that are also present in the PhosphoSitePlus knowledge graph. Identified substrate proteins that were not in the knowledge graph were excluded from this analysis, because no predictions can be generated for those proteins. The sensitivity of the PKA predictions was 0.57, which we consider a good result given that they were validated in an unbiased approach that has inherent technical limitations. For the poorly characterised MST2 and LATS1 kinases the sensitivities were lower, 0.13 and 0.17 respectively.
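The sensitivity defined above can be illustrated with a minimal Python sketch. The function name, argument names and the toy protein identifiers are hypothetical and serve only to show how the two counts relate; they are not taken from the LinkPhinder code base or from the assay data.

```python
def sensitivity(assay_substrates, graph_proteins, high_conf_predicted):
    """Return S = SUBS_predicted / SUBS_total for one kinase assay.

    All three arguments are collections of protein identifiers
    (hypothetical inputs used for illustration only).
    """
    # SUBS_total: assay hits that are also present in the knowledge graph.
    eligible = set(assay_substrates) & set(graph_proteins)
    if not eligible:
        raise ValueError("no assay substrate is present in the knowledge graph")
    # SUBS_predicted: eligible substrates with at least one
    # high-confidence site-specific prediction.
    predicted = eligible & set(high_conf_predicted)
    return len(predicted) / len(eligible)


# Toy identifiers, not real assay results.
assay = {"SUB_A", "SUB_B", "SUB_C", "SUB_D", "SUB_E"}
graph = {"SUB_A", "SUB_B", "SUB_C", "SUB_D", "SUB_X"}
predicted = {"SUB_A", "SUB_C", "SUB_X"}

toy_sensitivity = sensitivity(assay, graph, predicted)
print(toy_sensitivity)
```

In this toy case SUB_E is excluded from the denominator because it is absent from the graph, mirroring the exclusion rule stated in the text.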
It must be noted that generating predictions for MST2 and LATS1 is challenging because only a few substrates have been described experimentally, and most of the existing prediction tools could not generate predictions for MST2 and LATS1. Together these results indicate that LinkPhinder can be used to predict kinase-substrate interactions for poorly characterised kinases.

Finally, we wanted to benchmark LinkPhinder performance against the existing tools. However, we found this was not an easy task. Comparing LinkPhinder with existing tools using the results of these experiments is not as straightforward as in the cases reported before. The main reason is that each tool uses a conceptually different method for determining its decision threshold. This does not allow for direct comparisons in terms of sensitivity as defined above. However, one high-level observation can be made: only GPS matches the coverage of LinkPhinder, as it can produce predictions for all three kinases we assayed. NetworKin and NetPhorest cannot compute any predictions for LATS1, NetPhos and PhosphoPredict only cover PKA, and Scansite covers none of the assayed kinases.

LinkPhinder web interface

In order to facilitate usage of LinkPhinder by the community we have developed an online interface available at https://LinkPhinder.insight-centre.org/. A typical interaction with LinkPhinder is depicted in Fig 8. The corresponding instruction video is available in the About tab of the tool's web page. Briefly, the protein of interest can be entered into a search box with auto-completion (box A). Gene names and UniProt accession numbers are supported. The search is performed for high-stringency statements by default. However, all predicted statements can be searched as well (cf.
the radio buttons in A). The query protein is evaluated by the system in two different ways, as a kinase and as a substrate, and each type of prediction can be browsed independently (box B). The results can be filtered, and the predicted kinase-substrate pairs can be expanded to see the list of corresponding phosphorylation sites and prediction scores. Export of the predictions into a CSV file is also possible. Further, users can easily access contextual information from a comprehensive protein database (UniProt) by clicking on the proteins in the results (box C).

Discussion

In this work, we have overcome several limitations of the current phosphorylation prediction tools by representing phosphorylation networks as knowledge graphs. Knowledge graphs are a relatively new approach to representing relational knowledge in the Machine Learning, Artificial Intelligence and Semantic Web communities. They have quickly gained popularity for two main reasons. First, they can represent diverse types of knowledge in a simple format. Secondly, they are amenable to robust techniques of statistical relational machine learning, which can for example be used to discover new facts. The discovery naturally makes use of the entire structure of the knowledge graph (i.e. latent features and long-range, implicit relationships instead of just local, explicit features). This makes the representation very useful in domains where complex network dependencies are critical. Kinase-substrate relationships are a good example of such a domain. Our results show that knowledge graphs enabled phosphorylation predictions that were not possible with existing tools that are primarily based on local features.
In particular, we have shown that phosphorylation networks can be meaningfully captured by knowledge graphs with kinase and substrate entities linked by relationships based on phosphorylation site motifs. With this representation, modern link prediction methods can be used to predict novel phosphorylation reactions and estimate their probability based on the entire network context. The resulting predictive model allows for making predictions about any protein present in the input data. This is a substantial advantage when compared with the existing tools. These tools typically focus on substrates as initial queries and include only a limited number of kinases.

LinkPhinder not only covers a much broader range of possible kinase-substrate relationships than existing tools, but also shows very high generalization power and desirable ranking properties not exhibited by other, currently gold standard approaches. This aspect has been validated in experiments showing that our tool can generate numerous biologically valid predictions. Crucially, these predictions were not possible with a representative range of state-of-the-art tools (Scansite [7], GPS [8], NetPhos [9], NetPhorest [10], NetworKin [6,10], PhosphoPredict [11]), demonstrating the utility of our tool. More specifically, none of the LATS1 and AKT1 discoveries validated in targeted experiments were predicted with four out of six related tools starting with LATS1 or AKT1 as kinase queries. Only GPS and PhosphoPredict support such queries, but for less than 66.4% and 1.8% of the kinases covered by LinkPhinder, respectively. Furthermore, querying for the substrates directly did not predict any of the validated discoveries using any of the existing tools at their high-stringency settings (if applicable; if controlling the stringency was not offered by a particular tool, we used all predictions made by the given tool). On medium stringency, the GPS tool could identify one prediction: the CREB1 phosphorylation by LATS1.
On low stringency, the NetPhosK tool could also identify one prediction: the MST2 phosphorylation by AKT1. No existing tool could identify both predictions.

The LATS1 predictions validated by the mass spectrometry experiments could not be predicted by any of the existing tools but one. Specifically, the GPS tool could predict one out of the seven predictions we made (LATS1 auto-phosphorylation at S464) on high stringency (and no further ones on lower stringencies). The other five tools could not identify any of our validated predictions. When cross-referencing the list of LATS1 predictions from other tools with our predictions, no additional predictions were made, demonstrating that our tool has the best coverage. Together, these results clearly illustrate the advantages of our tool.

Experimental validation using the global PKA, MST2 and LATS1 kinase assays showed promising results in terms of LinkPhinder's sensitivity for identifying new substrates. Direct comparison with the existing tools was not possible due to the disparate methods employed by each tool in determining their decision/high-stringency threshold. However, the results reconfirmed one significant benefit of LinkPhinder. We were able to produce substrate predictions for all three kinases studied, which was not possible with five of the six existing tools (GPS being the exception), again demonstrating that LinkPhinder's increased kinase coverage is an important contribution.

To build on the work presented here, we intend to incorporate more contextual data (e.g., relevant protein interactions from STRING or pathway data from Reactome) to see whether they can bring new and/or more accurate predictions pertinent to clinically relevant pathways.
We also want to develop predictive models that would utilize the biology of phosphorylation directly in the training process and not only in the knowledge graph conversion and negative example generation. As demonstrated, incorporating more network context and biological knowledge into the prediction process has great potential to further increase the coverage, predictive power, and usefulness of the resulting tools.

Another research direction to explore in future is the applicability of our predictive model to improving the accuracy and scope of methods for predicting downstream effects of kinase signalling or the kinase activity profiles. An example of such a method that could benefit from our results is described in [31]. We believe follow-up experiments combining focused phosphoproteomics studies like this with our model will further demonstrate the practical relevance of the work presented here.

Materials and methods

Computational model and validation details

Datasets and tools used. To compile the phosphorylation network that is the primary input for building the LinkPhinder model, we used the PhosphoSitePlus dataset in a version available on 26th of June 2017 (cf. https://www.phosphosite.org/staticDownloads.action). There were 10,173 phosphorylation statements on 362, 7,302 and 2,377 distinct kinases, substrate-site combinations and substrates in the compiled phosphorylation network, respectively. Note that in the construction of all datasets, we have focused only on the Homo sapiens species, unless specified otherwise.

In order to convert the phosphorylation statements extracted from PhosphoSitePlus into a knowledge graph, we had to compute motifs characteristic of the context sequences of phosphorylation sites. For that task, we used the MEME tool, version 4.11.2 (cf. http://meme-suite.org/doc/download.html?man_type=web).
We used three state-of-the-art knowledge graph embedding and link prediction methods to train a model that can discover new links in the phosphorylation knowledge graph. The methods are TransE [32], DistMult [33] and ComplEx [15].

The PhosphoSitePlus dataset, together with UniProt (cf. http://www.uniprot.org/), was also used for generating a mapping between substrates and their possible phosphorylation sites. This mapping was used in the conversion of the internal, motif-based knowledge graph statements to phosphorylation statements when computing scores of possible phosphorylations that have not been known before. We focused only on substrates present in our knowledge graph, which resulted in 74,142 distinct substrate-site pairs that can be used for generating candidate phosphorylations (i.e. potential discoveries).

To assess LinkPhinder in comparison with related state-of-the-art systems, we downloaded and/or generated full sets of phosphorylation predictions that can be made with the following tools: Scansite 3 (cf. http://scansite3.mit.edu), KinomeExplorer (predictions produced by two tools, NetworKIN and NetPhosK, cf. http://kinomexplorer.info/), Netphos (cf. http://www.cbs.dtu.dk/services/NetPhos/), GPS (cf. http://gps.biocuckoo.org/index.php) and PhosphoPredict (cf. http://phosphopredict.erc.monash.edu/). The numbers of predictions that can be made with the corresponding tools are as follows: 6,130,542 (GPS), 5,192,235 (KinomeExplorer), 3,614,271 (Netphos), 2,006,185 (PhosphoPredict), 311,196 (Scansite 3). The numbers of high-stringency predictions are not straightforward to determine using the set of all predictions available, since some tools allow for stringency settings just at the level of manual, single-protein queries.
Thus we were only able to establish the number of high-stringency predictions for Scansite, NetPhos and PhosphoPredict: 12,346, 212,107 and 132, respectively.

Construction of the phosphorylation network and knowledge graph for training the model. The construction of the phosphorylation network requires data sources that relate a kinase, a substrate and the phosphorylated amino acid site in the substrate. In our experiment, we used the PhosphoSitePlus kinase-substrate dataset, a collection of experimentally determined substrates, sequences, cognate kinases, and metadata curated from the literature [14]. Only relations involving a kinase and a substrate protein for the human species were considered (KIN_ORGANISM == SUB_ORGANISM == 'human'). Although the dataset includes an amino acid context sequence of size 7 for each phosphorylation site, we did not use that information as we wanted to experiment with different and potentially larger context sequence sizes. Instead, we extracted the context sequence from UniProt (Universal Protein Resource), more specifically from the reviewed (Swiss-Prot) main protein sequence (uniprot_sprot.fasta) and from isoform sequences (uniprot_sprot_varsplic.fasta). We discarded any relation in the kinase-substrate dataset for which the phosphorylation site did not match the UniProt sequence. Table 5 presents some statistics about the phosphorylation network.

The knowledge graph conversion makes use of kinase family consensus motifs to transform phosphorylation network statements into knowledge graph relations. The kinase family classification was extracted from UniProt's "Human and mouse protein kinases: classification and index". Only information about human kinases that are part of the phosphorylation network was kept. The conversion of phosphorylation network data into a knowledge graph made use of the MEME tool in a pipeline graphically described in Fig 9.
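The filtering and context extraction steps described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: only the KIN_ORGANISM and SUB_ORGANISM column names come from the text, while the other dictionary keys, the "-" padding at sequence ends, the toy sequence and the accession P00001 are hypothetical.

```python
def extract_context(sequence, site, flank, pad="-"):
    """Return the window of `flank` residues on each side of a site such as
    'S6', or None if the residue at that position does not match."""
    residue, position = site[0], int(site[1:])
    index = position - 1  # site labels are 1-based
    if index < 0 or index >= len(sequence) or sequence[index] != residue:
        return None  # site does not match the reference sequence
    left = sequence[max(0, index - flank):index]
    right = sequence[index + 1:index + 1 + flank]
    # Padding near the termini is an assumption made for this sketch.
    return left.rjust(flank, pad) + residue + right.ljust(flank, pad)


def build_network(rows, sequences, flank):
    """Keep human-only relations whose site matches the reference sequence."""
    network = []
    for row in rows:
        if row["KIN_ORGANISM"] != "human" or row["SUB_ORGANISM"] != "human":
            continue
        sequence = sequences.get(row["substrate_acc"])
        context = extract_context(sequence, row["site"], flank) if sequence else None
        if context is None:
            continue  # discard relations that cannot be verified
        network.append((row["kinase"], row["site"], row["substrate_acc"], context))
    return network


# Toy data; the sequence and identifiers are invented for illustration.
sequences = {"P00001": "MKTAYSPRRSLEQ"}
rows = [
    {"kinase": "KIN1", "substrate_acc": "P00001", "site": "S6",
     "KIN_ORGANISM": "human", "SUB_ORGANISM": "human"},
    {"kinase": "KIN1", "substrate_acc": "P00001", "site": "T2",
     "KIN_ORGANISM": "human", "SUB_ORGANISM": "human"},
    {"kinase": "KIN2", "substrate_acc": "P00001", "site": "S10",
     "KIN_ORGANISM": "mouse", "SUB_ORGANISM": "human"},
]
network = build_network(rows, sequences, flank=3)
print(network)
```

Making the flank size a parameter reflects the stated wish to experiment with context sizes other than the 7 residues shipped with the dataset.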
To realise step 3 of the above pipeline we used specifically the meme command line utility for sequence motif discovery, version 4.11.2. MEME was applied in parallel on batches of site context sequences drawn from substrates targeted by kinases of the same family. The size of the batches was a configurable hyper-parameter of the conversion and model training process. We used values ranging over the set {50, 100}. The static parameters used for every invocation of the MEME tool were: -text, -protein, -mod zoops, -x_branch, -minw 2.

Table 5. Phosphorylation network components statistics.

Element                                 No. of elements in the phosphorylation network
Phosphorylation relations               9,802
Kinases                                 327
Substrates                              2,350
Phosphorylation sites                   7,083
Avg. No. of substrates/kinase           7.19
Avg. No. of substrate sites/kinase      21.66

https://doi.org/10.1371/journal.pcbi.1007578.t005

[Figure residue. Workflow steps: (1) Extracting active site context sequence with a fixed window; (2) Categorizing site context according to the kinase family; (3) Process with MEME; (4) Generate consensus motif groups for each kinase family target substrates' motifs.]

Fig 9. High-level workflow of generating predicate labels for the phosphorylation knowledge graph based on motifs extracted from the context sequences of phosphorylation sites by means of the MEME tool.
https://doi.org/10.1371/journal.pcbi.1007578.g009

The MEME parameters that were dependent on the specific properties of the sequence batch and/or hyper-parameters of the whole model were: -maxw MW, -maxsize MS, -nmotifs NM, -bfile BF, where MW was the maximum width of a sequence in the batch, MS was the maximum width multiplied by the number of sequences in the batch, NM was the maximum number of motifs to be generated (set conservatively to 10 in the reported experiments as no batch generated more motifs than that number under any tested settings) and BF was a background Markov model of order 5 generated from the sequence batch. Table 6 presents some statistics about the generated knowledge graph.

Training of the LinkPhinder model.

Generating a phosphorylation knowledge graph. Before we could train a statistical relational learning model, we had to construct a knowledge graph representing the known phosphorylation information. As the primary input into the knowledge graph, we chose a phosphorylation network compiled from the PhosphoSitePlus [14] data set (focusing on the Homo sapiens species only). In principle, any phosphorylation data can be used, but PhosphoSitePlus is well curated and comprehensive, making it an ideal starting point. There were 10,173 site-specific phosphorylation statements on 363 and 2,377 distinct kinases and substrates, respectively, in the compiled phosphorylation network.

The network consists of statements (K, L, S) where K, L, S are kinase, phosphorylation site and substrate, respectively. The biological meaning of such statements is that the kinase K phosphorylates the substrate S by binding to it and attaching a phosphoryl group to the site L.

To convert the phosphorylation network into a knowledge graph, we utilised motifs of phosphorylation sites preferred by specific kinase families.
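The MEME settings described in the previous section can be assembled into a single command line, sketched here in Python. The function name and file paths are hypothetical, the flag spelling -x_branch is a reading of the garbled option in the extracted text, and the background model file is assumed to have been prepared separately. The sketch only builds the argument list; it does not run MEME.

```python
def meme_command(fasta_path, background_path, sequences, max_motifs=10):
    """Build the argument list for one MEME run on a batch of site contexts.

    `sequences` is the batch of context sequences stored in `fasta_path`.
    """
    max_width = max(len(s) for s in sequences)  # MW in the text
    max_size = max_width * len(sequences)       # MS in the text
    return [
        "meme", fasta_path,
        # Static options used for every invocation.
        "-text", "-protein", "-mod", "zoops", "-x_branch", "-minw", "2",
        # Batch-dependent options.
        "-maxw", str(max_width),
        "-maxsize", str(max_size),
        "-nmotifs", str(max_motifs),            # NM, set to 10 in the text
        "-bfile", background_path,              # BF, order-5 Markov model
    ]


# Toy batch with hypothetical file names.
cmd = meme_command("batch.fasta", "batch.bg", ["RSRSAPP", "TAYSPRR"])
print(" ".join(cmd))
```

Returning a list rather than a shell string keeps the command safe to pass to a process launcher without quoting issues.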
For each kinase family as defined in [34], we computed a set of consensus sequence motifs using the MEME tool run with the parameters described in the previous section. The inputs to the tool were sets of sequences representing the local context of 2k + 1 amino acids surrounding all phosphorylation sites in substrates targeted by the kinases in each family. The value of k was a configurable hyperparameter of the conversion algorithm representing the context size, i.e. the number of amino acids on the left and right side of the phosphorylation site. See the section on Finding the Optimal Hyperparameters of the Model for details on the other hyperparameters. The outputs of the conversion process were motifs that characterise the local context of the kinase-substrate interaction using a position-specific scoring matrix, which quantifies the relative contribution of each amino acid in the substrate sequence. The scoring matrices were extracted from the text output of the MEME tool executed as described above.

Table 6. Knowledge graph components statistics.

| Element of the knowledge graph | Value |
|---|---|
| Motif-based relations | 9,956 |
| Kinase families | 12 |
| Kinase family motifs (relation types) | 24 |
| Avg. No. of motif/kinase family | 2.00 |

https://doi.org/10.1371/journal.pcbi.1007578.t006

The motifs were subsequently used for converting the (K, L, S) statements coming from the input phosphorylation network to labeled knowledge graph edges (K, M, S), where M is a link label (also called a typed relation) that corresponds to a motif compatible with the family of K and the site L in the substrate S. Here, compatibility means a positive score of the site's context sequence with respect to the position-specific scoring matrix of the motif M.
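The conversion from (K, L, S) to (K, M, S) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the scoring matrix is represented as a list of per-position dictionaries, residues missing from a dictionary (including the hypothetical padding character "-") contribute zero, and the motif is slid over the context with the best placement taken as the score. The text does not specify any of these details.

```python
from typing import Dict, List, Tuple

# Assumed representation: one {amino_acid: score} dict per motif position.
PSSM = List[Dict[str, float]]


def site_context(sequence: str, site_index: int, k: int, pad: str = "-") -> str:
    """Return the 2k + 1 residues centred on the 0-based site index."""
    padded = pad * k + sequence + pad * k
    return padded[site_index:site_index + 2 * k + 1]


def best_pssm_score(context: str, pssm: PSSM) -> float:
    """Best score over all ungapped placements of the motif in the context."""
    width = len(pssm)
    scores = [
        sum(pssm[j].get(context[start + j], 0.0) for j in range(width))
        for start in range(len(context) - width + 1)
    ]
    return max(scores) if scores else float("-inf")


def to_graph_edges(kinase: str, family_motifs: Dict[str, PSSM],
                   substrate: str, context: str) -> List[Tuple[str, str, str]]:
    """Emit one (K, M, S) edge per family motif that scores the site positively."""
    return [
        (kinase, motif_id, substrate)
        for motif_id, pssm in family_motifs.items()
        if best_pssm_score(context, pssm) > 0
    ]
```

A site whose context scores positively against several motifs of the kinase's family would yield several edges under this sketch.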
The end result of the conversion is a knowledge graph consisting of true positive statements (K, M, S). Here, a protein may act as a kinase in several statements and as a substrate in several others. Together, these statements describe the entire known phosphorylation network from PhosphoSitePlus.

Generating negative statements based on the phosphorylation biology. The knowledge graph generated from the phosphorylation network can be used for discovering new kinase-substrate relationships by means of link prediction [35], a technique for estimating the likelihood that a typed relationship exists between two entities based on other observed relationships in the data. The typical intention is to discover new relations that are not explicitly present in a knowledge graph. Training a link prediction model is a supervised machine learning process, and therefore requires negative examples in addition to the positive statements in the phosphorylation knowledge graph. Such negative examples are typically created by corrupting the positive statements, i.e. by replacing entities in them with random ones [35]. In our case, this technique could lead to correct kinase-substrate relationships being treated as negatives because kinases are promiscuous (i.e. one kinase can phosphorylate many substrates and one phosphorylation site can be targeted by many kinases). Hence, random corruptions of true statements may generate many false negative statements. Such false negatives would adversely affect the discriminative power of the model. Therefore, we need to impose specific restrictions when generating negative statements. We based these constraints on biological knowledge as follows. Firstly, most kinases belong to families that usually share substrates, while different families tend to phosphorylate different substrates [34].
Secondly, substrates are unlikely to be phosphorylated by a kinase if they have highly incompatible phosphorylation sites with respect to the kinase consensus motif. This incompatibility directly motivates two types of corruptions. For a statement (K, M, S), valid corruptions are: i) statements (K', M, S) such that K' is from a different family than K; ii) statements (K, M, S') such that all phosphorylation sites in S' score negatively with respect to the scoring matrix of the motif M.

Training the model on the full input dataset to maximise its generalisation power. The model with the best-performing hyper-parameters was retrained on the entire knowledge graph derived from PhosphoSitePlus. This is appropriate due to the excellent numerical stability reported in Table 1. The main reason for training the model on the entire dataset is that such a strategy is preferable for making new discoveries because it uses all available information. The model can be used for computing probabilistic ranking scores (with values between 0 and 1) of predictions ranging across all possible combinations of kinases, sites and substrates present in PhosphoSitePlus, and can thus contribute to the discovery of previously unknown phosphorylations. As described in Fig 2 and the prior parts of this section, the core link prediction model works on the converted knowledge graph, which means that it can only deal with relationships that abstract the site information using motifs. Putative phosphorylations for which the model is supposed to compute scores, therefore, have to be converted to the same form. After the converted phosphorylation statements are scored, they have to be transformed back to the form that contains the specific phosphorylation site.
This conversion is dual to the knowledge graph conversion: each statement (K, M, S) corresponds to statements (K, L, S) such that L is a known phosphorylation site in S (as per the PhosphoSitePlus [36] and UniProt data sets) that scores positively with respect to the position-specific scoring matrix of the motif M. Given a single protein as a query, the model can produce a ranked set of candidate phosphorylation sites that involve the protein either as a substrate or as a kinase. The ranked list can optionally be filtered using high- or medium-stringency thresholds. The high-stringency threshold is derived from the manually curated phosphorylation network we use as an input: it is a value such that 99.5% of the known phosphorylations score above it (the value is 0.672 in the reported model). The medium-stringency threshold is 0.5 (i.e. a score that indicates higher-than-random plausibility of the given statement). The ranking of the results reflects the global network context of all known phosphorylation sites and kinase-substrate relationships represented in the input knowledge graph, which is a type of information that is not incorporated by any other existing tool. Moreover, the predictions can be generated for any protein, be it a kinase or a substrate. This coverage and flexibility make our model more powerful than most existing phosphorylation prediction tools, which can only be queried for substrate proteins (in the GPS and PhosphoPredict tools, one can generate predictions associated with a kinase, but the two systems combined still cover only about half of the kinases covered by LinkPhinder). In total, LinkPhinder can produce 11,581,940 predictions when applied to all putative phosphorylations that can be generated from the proteins and phosphorylation sites present in the input data (PhosphoSitePlus). Out of these, 2,009,171 and 7,232,636 are of high and medium stringency, respectively.
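The high-stringency threshold described above is, in effect, a low percentile of the scores assigned to known phosphorylations. A minimal sketch, assuming the threshold is taken as an order statistic and that a prediction passes when its score is at least the threshold (the text says "above"; how ties are handled is not specified, and the function names are hypothetical):

```python
import math
from typing import Dict, Sequence, Tuple

Triple = Tuple[str, str, str]  # (kinase, site, substrate)


def high_stringency_threshold(known_scores: Sequence[float],
                              fraction_above: float = 0.995) -> float:
    """Score such that roughly fraction_above of known positives reach it."""
    ordered = sorted(known_scores)
    # Number of known positives allowed to fall below the threshold;
    # the rounding guards against floating-point noise in the product.
    n_below = math.floor(round(len(ordered) * (1.0 - fraction_above), 9))
    return ordered[n_below]


def filter_predictions(predictions: Dict[Triple, float],
                       threshold: float) -> Dict[Triple, float]:
    """Keep only predictions whose score reaches the given threshold."""
    return {t: s for t, s in predictions.items() if s >= threshold}
```

The medium-stringency filter corresponds to calling `filter_predictions` with the fixed value 0.5.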
We can make predictions for 327 human kinases, nearly twice as many predictions as the next best of the six related methods we have tested (GPS [8], with 217). This shows a substantial improvement in the kinome coverage and also in the general proteome coverage. Further details and information about the coverage of LinkPhinder compared to other systems can be found in Table 7.

Table 7. Statistics of the coverage of the different predictive systems and their overlap with the [19] gold standard. The letters S and K in the column headers denote substrates and kinases respectively.

| Model | Triplets | Kinases | Substrates | K-S pairs | S-S pairs | S per K | Sites per S |
|---|---|---|---|---|---|---|---|
| Cutilass20 | 19066 (100.0%) | 103 (100.0%) | 2556 (100.0%) | 15178 (100.0%) | 6090 (100.0%) | 147.4 | 2.4 |
| GPS | 6130543 (5.3%) | 218 (62.1%) | 2531 (35.9%) | 516158 (23.7%) | 293070 (42.8%) | 2367.7 | 115.8 |
| Netphos | 3614272 (2.8%) | 18 (5.8%) | 2531 (35.9%) | 42957 (2.7%) | 293354 (43.2%) | 2386.5 | 115.9 |
| Networkin | 5192236 (0.0%) | 206 (55.3%) | 6676 (70.0%) | 986494 (35.1%) | 40737 (0.0%) | 4788.8 | 6.1 |
| Phosphopredict | 2006186 (0.0%) | 13 (1.0%) | 40624 (99.7%) | 252509 (0.1%) | 1332427 (25.1%) | 19423.8 | 32.8 |
| Scansite | 311197 (0.7%) | 34 (16.5%) | 2530 (35.9%) | 61268 (5.3%) | 157214 (36.5%) | 1802.0 | 62.1 |
| netphorest | 5192236 (0.0%) | 206 (55.3%) | 6676 (70.0%) | 986494 (35.1%) | 40737 (0.0%) | 4788.8 | 6.1 |
| LinkPhinder | 11581940 (26.2%) | 327 (84.5%) | 2350 (33.2%) | 738518 (35.7%) | 63509 (39.7%) | 2258.5 | 27.0 |

https://doi.org/10.1371/journal.pcbi.1007578.t007

Finding the optimal hyperparameters of the model. Prediction of phosphorylation reactions is based on models trained on the knowledge graph data consisting of positive and negative statements. Negative statements are computed via perturbation of positive statements by means of ad-hoc operators. In our experiments, two negative statements are generated from each positive statement. Data is split into training+validation and testing. In particular, eighty percent of the available data is used for training and validating the models and the remaining part is used for testing. This data is used to evaluate multiple link prediction techniques with the aim of optimising prediction performance. For each of these, a grid search within the space of available hyperparameter values of the models is performed. For each configuration, 10-fold cross validation is run. The combination of prediction technique and parameters that delivers the best performance is selected, and this information is used to train a model on all the available data in order to exploit the entire knowledge about the phosphorylation reactions that have been experimentally validated.

Three link prediction techniques were used: TransE, DistMult and ComplEx [15, 32, 33]. TransE is one of the earliest techniques to have been proposed and its simplicity makes it a valid reference for learning about embeddings. In our case, the embeddings represent entities and relation types by means of vectors of the same length. A true statement is expected to satisfy the vector expression $\text{subject} + \text{relation type} \approx \text{object}$.
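The TransE expectation above translates into a distance that is small for plausible statements. The following is a minimal sketch with plain Python lists standing in for learned embeddings; the function name and the toy vectors are illustrative only, and the choice between the Manhattan and Euclidean norm is exposed as a parameter.

```python
import math
from typing import Sequence


def transe_distance(subject: Sequence[float], relation: Sequence[float],
                    obj: Sequence[float], norm: int = 2) -> float:
    """Distance between subject + relation and object (lower is more plausible)."""
    diffs = [s + r - o for s, r, o in zip(subject, relation, obj)]
    if norm == 1:
        return sum(abs(d) for d in diffs)  # Manhattan (L1) norm
    return math.sqrt(sum(d * d for d in diffs))  # Euclidean (L2) norm
```

For example, `transe_distance([1, 0], [0, 1], [1, 1])` is zero because the relation vector translates the subject exactly onto the object.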
DistMult adopts a different approach: the score is the sum of the element-wise products between the subject vector, a diagonal matrix representing the relation type and the object vector:

$$\text{score} = \sum_i \text{subject}_i \cdot \text{relation}_i \cdot \text{object}_i$$

This means that the score does not take into account interactions between different latent features. ComplEx follows the same approach as DistMult, with the difference that complex numbers are used in place of real values. The score is the real part of the score formula used in DistMult (in the standard ComplEx formulation, the object vector is complex-conjugated). The hyperparameters that control model generation are, in this order: number of negatives generated for each positive statement; number of training epochs through which the model parameters are optimised; number of batches into which data for model training is divided; batch size of amino acid sequences for motif generation (it affects the number of relation types); number of dimensions of vectors; margin of the hinge loss; distance function for computing similarity (only for TransE); learning rate of the model and, ultimately, context size, namely, the number of amino acids to consider on the left and on the right of the binding site. While for some hyperparameters values are selected from a set, for others the values are fixed as they were determined by means of independent experiments. Their respective values are listed in Table 8. The link prediction technique that delivers the best performance is ComplEx with vectors of size 50 and context size equal to 15. This configuration was used to train a model on the entire network of phosphorylations and their associated negatives. The trained model is used to predict the likelihood of unobserved phosphorylation reactions actually existing in nature.

Table 8. Hyperparameters space used by grid search to identify the best model (L1, L2 stand for Manhattan and Euclidean distance norms, respectively).

| Hyperparameter | Values |
|---|---|
| number of negatives | 2 |
| number of epochs | 100 |
| number of batches for model training | 10 |
| batch size for motif generation | 50 |
| embedding size | {50, 100, 150, 250, 500} |
| margin | 1 |
| similarity (only TransE) | {L1, L2} |
| learning rate | 0.1 |
| context size | {7, 15} |

https://doi.org/10.1371/journal.pcbi.1007578.t008

Construction of the state of the art prediction data sets. The following paragraphs describe the construction of sets of predictions computed by existing tools that are used in the comparative validation of the LinkPhinder model.

Scansite 3. Scansite searches for motifs within protein substrates that are likely to be phosphorylated by a specific protein kinase. It takes as input a protein substrate ID and sequence and gives as output a confidence score for given substrate amino acid sites to be phosphorylated by one of 70 kinases handled by the system. We queried the system with all substrates contained in our phosphorylation network and separately accepted results with low and high stringency levels.

NetworKIN and NetPhorest. The KinomeXplorer framework contains the results of both the NetworKIN and NetPhorest systems, with only the score changing. The KinomeXplorer dataset uses gene identifiers to refer to protein phosphorylation. In order to compare the results with the validation set, we first had to use a UniProt gene query to recover the protein identifier. After downloading the dataset, we queried UniProt using both EmbID and gene name to resolve a protein ID. In case a query did not yield any result or multiple proteins were returned, the original statement was omitted. Finally, we kept only the protein identifier-based statements returned by the system that pertained to the proteins contained in our phosphorylation network.

NetPhos 3.1.
The NetPhos 3.1 system predicts serine, threonine or tyrosine phosphorylation sites in eukaryotic proteins using ensembles of neural networks. The system can provide predictions for 17 kinases only. Using the stand-alone software package, we queried the system with all substrates and associated sequences contained in our phosphorylation network. The results obtained at low and high stringency levels were used separately.

GPS 3.0. Group-based Prediction System (GPS) predicts phosphorylation sites with their cognate protein kinases using a four-level kinase hierarchical structure in multiple species. We used the batch predictor of the desktop application to pull out results for all substrates and associated sequences contained in our phosphorylation network.

PhosphoPredict. The PhosphoPredict system reportedly predicts kinase-specific substrates and the corresponding phosphorylation sites for 12 human kinases, including CSNK1A1, CSNK2A1, PRKACA, ATM, AKT1 (aka. PKB), SRC, GRK, PKC, GSK, CaMK, CDKs and MAPKs. However, only six of these actually correspond to single kinases, whereas the other six are often rather diverse families of different proteins (CDKs, MAPKs, PKC, GRK, GSK, CaMK), and thus we focused on them in our comparison. PhosphoPredict employs a feature selection method based on minimum Redundancy Maximum Relevance (mRMR) to select the most informative feature subsets that contribute to the prediction success for each kinase family. We kept only those system statements which referred to the proteins present in our phosphorylation network.

Comparative computational validation. A comparative evaluation was performed with the purpose of assessing the performance of LinkPhinder in the context of existing phosphorylation prediction methods (i.e. GPS, NetworKin, NetPhorest, NetPhosK, Scansite and PhosphoPredict). Since the process of training LinkPhinder is stochastic, the performance changes slightly every time a new model is trained.
To minimise the variability of the results, and allow for comparison and repeatability of the experiment, the results we reported in the main part of this work were averaged over 100 runs of the experiment. The dataset generated for each run consists of positive triples, extracted from PhosphoSitePlus, and negative triples, generated by randomly combining kinases with (site, substrate) pairs that appear in PhosphoSitePlus. The training split accounts for 90% of the data, and the remaining 10% is used for testing. Both the training and the test set contain equal numbers of positive and negative instances.

To evaluate LinkPhinder, training data are used to learn a model in each run and its performance is evaluated on the test set. Triples in the test set are assigned the prediction score if this is available; otherwise a zero score is assigned. One point regarding the assignment of prediction scores needs to be taken into account. As stated in the main text, a very accurate model that generates predictions only on a small subset of the triples may be of limited use in phosphorylation prediction. Hence, we also assessed the rate of predictions a model is able to generate by measuring the percentage of triples in the test set for which the model is able to generate a prediction (i.e. a non-zero score). We referred to this value as coverage in the Results section.

Concerning the existing methods to which we compare ourselves, scores are extracted from the predictions provided by each method (created as described in the previous section). This does not exclude the possibility that part of the testing triples may have been used to train the comparative models. If this is the case, it would represent a disadvantage in terms of performance for our model.
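The score assignment and the coverage measure described above can be sketched as follows. This is an illustrative outline rather than the evaluation code used in the study: the function names are hypothetical, predictions are assumed to be available as a dictionary keyed by (kinase, site, substrate), and precision at k is included as one example of a ranking metric computed on the zero-filled scores.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (kinase, site, substrate)


def score_test_set(test_triples: Sequence[Triple],
                   predictions: Dict[Triple, float]) -> List[float]:
    """Look up each test triple; triples without a prediction get a zero score."""
    return [predictions.get(t, 0.0) for t in test_triples]


def coverage(scores: Sequence[float]) -> float:
    """Fraction of test triples that received a non-zero score."""
    return sum(1 for s in scores if s != 0.0) / len(scores)


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Share of true positives (label 1) among the k highest-scored triples."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k
```

Because unscored triples are ranked with a zero score, a method with low coverage is penalised in the ranking metrics as well as in the coverage figure itself.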
Similarly to the LinkPhinder case, coverage is therefore computed over the test data and zero scores are assigned to triples for which a prediction is not available.

Verifying the stability of LinkPhinder under different conditions of the computational experiments. To make sure that the various decisions made in the preparation of the benchmarking data do not influence the presented results in terms of comparing the performance of LinkPhinder and related existing tools, we first experimented with a different positive-negative ratio (ten negatives per positive, see Table 9), and then with various train-test split ratios (Table 10). An increase in the number of negatives per positive typically hampers the performance of ranking-based models, and Table 9 clearly shows that our experiments are no exception. However, one can also immediately notice that LinkPhinder remains by far the best tool, and is significantly less affected by the change. This demonstrates the superior stability of our tool in the context of changing experimental conditions. The results in Table 10 clearly show that while the performance of LinkPhinder decreases with an increasing proportion of testing over training data, it is still superior to the corresponding results of the related works given in Table 9. This further corroborates our claim of LinkPhinder's stability with respect to different experimental conditions.

Table 9. LinkPhinder performance compared to other systems on our benchmark with 1:10 positive to negative ratio in the testing split where the training/testing splits are 90% and 10% respectively.

| Model | AUPR | AUROC | P@10 | P@50 |
|---|---|---|---|---|
| GPS | 0.259 ± 0.007 | 0.731 ± 0.006 | 0.337 ± 0.145 | 0.416 ± 0.063 |
| NetworKin | 0.281 ± 0.009 | 0.618 ± 0.007 | 0.798 ± 0.122 | 0.756 ± 0.055 |
| NetPhorest | 0.199 ± 0.007 | 0.597 ± 0.007 | 0.542 ± 0.137 | 0.520 ± 0.071 |
| Scansite | 0.149 ± 0.004 | 0.571 ± 0.006 | 0.132 ± 0.099 | 0.210 ± 0.048 |
| Phosphopredict | 0.091 ± 0.002 | 0.500 ± 0.006 | 0.029 ± 0.050 | 0.050 ± 0.029 |
| Netphos | 0.166 ± 0.006 | 0.563 ± 0.007 | 0.426 ± 0.149 | 0.390 ± 0.064 |
| LinkPhinder | 0.875 ± 0.010 | 0.982 ± 0.002 | 0.993 ± 0.025 | 0.981 ± 0.024 |

https://doi.org/10.1371/journal.pcbi.1007578.t009

Table 10. Relative LinkPhinder performance across different training-testing splits where the positive to negative ratio of the testing set is 1:10 (the relative performance results were substantially less variable for the 1:1 ratio, therefore we do not report them here).

| Training/testing split | AUPR | AUROC | P@10 | P@50 |
|---|---|---|---|---|
| Train 60%, Test 40% | 0.768 ± 0.006 | 0.969 ± 0.001 | 0.987 ± 0.034 | 0.981 ± 0.017 |
| Train 70%, Test 30% | 0.797 ± 0.006 | 0.974 ± 0.001 | 0.960 ± 0.049 | 0.968 ± 0.018 |
| Train 80%, Test 20% | 0.835 ± 0.005 | 0.978 ± 0.001 | 0.990 ± 0.030 | 0.984 ± 0.012 |
| Train 90%, Test 10% | 0.875 ± 0.010 | 0.982 ± 0.002 | 0.993 ± 0.025 | 0.981 ± 0.024 |

https://doi.org/10.1371/journal.pcbi.1007578.t010

Generating phosphorylation data for the web interface of LinkPhinder. In order to prepare data of phosphorylation reactions for prediction, a list of known kinases and a list of known substrates with their corresponding phosphorylation sites are extracted from the phosphorylation network. The elements of the lists are combined using their Cartesian product to generate every possible combination of kinase and phosphorylation site of each substrate. These are converted into knowledge graph phosphorylation statements and are then scored using the previously trained best-performing prediction model (i.e. the result of the grid search described previously). Finally, knowledge graph statements with their associated scores are converted back to phosphorylation site-specific statements.
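The enumeration of candidate statements via the Cartesian product can be illustrated with a short sketch. The function name and the example identifiers are only for illustration, and whether any combinations (for instance a kinase paired with its own sites) were excluded is not stated in the text, so none are filtered here.

```python
from itertools import product
from typing import List, Sequence, Tuple


def candidate_statements(kinases: Sequence[str],
                         substrate_sites: Sequence[Tuple[str, str]]
                         ) -> List[Tuple[str, str, str]]:
    """Pair every kinase with every (substrate, site) to form (K, L, S) candidates."""
    return [
        (kinase, site, substrate)
        for kinase, (substrate, site) in product(kinases, substrate_sites)
    ]
```

The number of candidates is the number of kinases multiplied by the number of (substrate, site) pairs, which is why the full prediction set is several orders of magnitude larger than the input network.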
If there are duplicate statements after the conversion process that only differ in the scores assigned to them by the conversion and the model, we only keep the one with the highest score determined by the model. This is motivated by the fact that the model utilises more information on the actual phosphorylations than the conversion process and therefore its scores override the scores assigned after conversion.

Experimental model and subject details

Cell culture experiments for targeted validation. HEK293 cells were routinely grown in Dulbecco's modified medium supplemented with 10% foetal serum. Subconfluent cells were transfected with Lipofectamine (Invitrogen) following the manufacturer's instructions. pSG5-gag-AKT was previously described [37]. LATS1 siRNA and AKT siRNA were from Dharmacon and the sequences have been described before [29]. Twenty-four hours after transfection, HEK293 cells were serum deprived for 16 hours. Subsequently, cells were lysed in 20 mM HEPES (pH 7.5), 150 mM NaCl, 1% NP-40, phosphatase inhibitors (2 mM NaF, 10 mM β-glycerolphosphate, 2 mM Na4P2O7) and protease inhibitors (5 μg/ml leupeptin and 2.2 μg/ml aprotinin). Cell lysates were separated by SDS-PAGE and analysed by western blotting. Phosphorylated proteins were immunoprecipitated with a pAKT-substrate specific antibody. Briefly, the lysates were incubated with 1 μl of antibody and 5 μl of protein-G sepharose beads for 1 hour at 4°C in an orbital wheel. The immunoprecipitates were washed 3 times with lysis buffer. Two bed volumes of denaturing Laemmli buffer were added to the dry pelleted beads and the immunocomplexes were eluted by boiling the samples at 100°C for 5 minutes. Anti-creb, anti-LATS1, anti-P53, anti-tub and p-YAP-S127 were obtained from commercial sources.

Mass-spectrometry experiments for extended validation. HeLa cells were transiently transfected with a GFP-tagged LATS1 construct or a GFP construct as control.
After 2 days they were serum starved overnight and left untreated (control) or were treated with FasL (50 nM) or etoposide (50 μM) for 16 hours. Then, cells were lysed with lysis buffer (20 mM 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES) pH 7.5, 150 mM NaCl, 1% NP-40, phosphatase inhibitors (10 mM β-glycerolphosphate, 1 mM Na3VO4, 2 mM Na4P2O7, 2 mM NaF) and protease inhibitors (5 μg/ml leupeptin and 2.2 μg/ml aprotinin)), and proteins were immunoprecipitated using GFP-trap_A (Chromotek) according to the manufacturer's instructions. The beads were washed 3 times with lysis buffer followed by two washes with the same buffer not containing NP-40. The proteins immunoprecipitated onto GFP-beads were prepared for mass-spectrometry analysis as previously described [38]. Briefly, the immunoprecipitates were digested in two steps. In the first step, 60 μl of elution buffer-1 (2 M urea, 50 mM Tris-HCl pH 7.5, 5 μg/ml trypsin) was added to each sample, followed by incubation at 27°C on a shaker. After 30 minutes of initial digestion the samples were centrifuged at 13,000 rpm in a table top centrifuge for 30 seconds and the supernatant was collected into a new Eppendorf tube. In the second step, 25 μl of elution buffer-2 (2 M urea, 50 mM Tris-HCl pH 7.5, 1 mM dithiothreitol) was added per sample followed by centrifugation as above. The supernatant was collected into a new Eppendorf tube. The elution step was repeated, and both supernatants were combined and incubated overnight at room temperature to allow trypsin digestion to go to completion. Then, samples were alkylated by adding 20 μl of iodoacetamide (5 mg/ml) and incubating for 30 min in the dark at room temperature. The reaction was stopped by adding 1 μl of 100% trifluoroacetic acid (TFA) to each sample.
100 μl of each sample was immediately loaded into equilibrated handmade C18 StageTips containing Octadecyl C18 disks (Supelco) for desalting. Tips were previously activated by washing with 50 μl of 50% AcN and 0.1% TFA. After a quick centrifugation the tips were washed with 50 μl of 0.1% TFA. 100 μl of sample was loaded onto the tip, washed twice with 50 μl of 0.1% TFA and eluted twice with 25 μl of 50% AcN and 0.1% TFA solution. The eluates were combined and concentrated until the volume was reduced to 5 μl using a CentriVap Concentrator (Labconco). Samples were diluted to obtain a final volume of 15 μl by adding 0.1% TFA and centrifuged for 10 minutes at 13,000 rpm. 12 μl of the samples were analysed by MS. The samples were analysed by liquid chromatography-tandem mass spectrometry (Nanoflow Ultimate 3000 LC and Q-Exactive mass spectrometer [Thermo]). A 10 cm long, 75 μm inner diameter, HPLC C18 reversed-phase column was used. Samples were loaded at 600 nl/min and peptides were eluted at a constant flow rate of 250 nl/min for 40 min. A multisegment linear gradient of 2-135% buffer (98% acetonitrile and 0.1% formic acid) in positive ion mode was used. Data were acquired with the mass spectrometer operating in automatic data dependent switching mode selecting the 12 most intense ions prior to MS/MS analysis. Mass spectra were analysed by MaxQuant. Label-free quantitation was performed using MaxQuant.

PKA Kinase assay. Serum-starved HEK293T cells were lysed in a Nonidet P-40 buffer (50 mM Tris-HCl, pH 7.8, 150 mM NaCl, 1% (vol/vol) Nonidet P-40, protease inhibitors and phosphatase inhibitors). Lysates were treated at 1 mg/ml with 10 mM 5'-4-fluorosulphonylbenzoyladenosine (FSBA) solubilised in DMSO and then incubated at 31°C for 2 hours. Samples were spun down at 200 x g to remove any precipitate.
Samples were diluted down with 2 ml of PKA kinase buffer (50 mM Tris pH 7.5, 10 mM MgCl2, 0.1 mM EGTA and 2 mM DTT) and desalted using Millipore Amicon ultrafiltration columns with a 3 kDa molecular weight cutoff. Following concentration, the samples were incubated with PKA kinase buffer (50 mM Tris pH 7.5, 10 mM MgCl2, 0.1 mM EGTA and 2 mM DTT), 500 μM ATP-biotin and 1250 units of recombinant PKA (New England Biolabs) in a total volume of 60 μl. Control samples without recombinant PKA and ATP-biotin were also made up. The controls and kinase-added samples were incubated at 31°C for 2 hours. 300 μl of phosphate buffer was added to the samples. Streptavidin resin (100 μl of a 50% slurry) was incubated with the samples overnight at 4°C. Samples were spun down at 2000 x g for 1 minute and the supernatant was removed. Samples were washed 5 times with 1 ml of phosphate buffer. Samples were analysed by mass spectrometry. The full results of the assay are given in the kinase assays supplement (S1 Table).

MST2 Kinase assay. Serum-starved HEK293T cells were treated with 3 μM of the MST2 kinase specific inhibitor XMU-MP-1 or with DMSO for 3 hours. Cells were lysed in a Nonidet P-40 buffer (50 mM Tris-HCl, pH 7.8, 150 mM NaCl, 1% (vol/vol) Nonidet P-40, protease inhibitors and phosphatase inhibitors). Lysates were treated at 1 mg/ml with 10 mM 5'-4-fluorosulphonylbenzoyladenosine (FSBA) solubilised in DMSO and then incubated at 31°C for 2 hours. Samples were spun down at 200 x g to remove any precipitate. Samples were diluted down with 2 ml of MST2 kinase buffer (40 mM HEPES pH 8.0, 10 mM MgCl2, 0.5 mM EGTA) and desalted using Millipore Amicon ultrafiltration columns with a 3 kDa molecular weight cutoff. Following concentration, the samples were incubated with MST2 kinase assay buffer (40 mM HEPES pH 8.0, 10 mM MgCl2, 0.5 mM EGTA), 500 μM ATP-biotin and 32 ng of recombinant MST2 (made in house) in a total volume of 60 μl. Control samples without recombinant MST2 and ATP-biotin were also made up. The controls and kinase-added samples were incubated at room temperature for 3 hours. 300 μl of phosphate buffer was added to the samples. Streptavidin resin (100 μl of a 50% slurry) was incubated with the samples for 1 hour at room temperature. Samples were spun down at 2000 x g for 1 minute and the supernatant was removed. Samples were washed 5 times with 1 ml of phosphate buffer. Samples were analysed by mass spectrometry. The full results of the assay are given in the kinase assays supplement (S2 Table).

LATS1 Kinase assay. Serum-starved HEK293T cells were lysed in a Nonidet P-40 buffer (50 mM Tris-HCl, pH 7.8, 150 mM NaCl, 1% (vol/vol) Nonidet P-40, protease inhibitors and phosphatase inhibitors). Lysates were treated at 1 mg/ml with 10 mM 5'-4-fluorosulphonylbenzoyladenosine (FSBA) solubilised in DMSO and then incubated at 31°C for 2 hours. Samples were spun down at 200 x g to remove any precipitate. Samples were diluted down with 2 ml of LATS1 kinase buffer (25 mM HEPES pH 7.4, 50 mM NaCl, 5 mM MgCl2 and 5 mM MnCl2, 5 mM β-glycerophosphate and 1 mM dithiothreitol) and desalted using Millipore Amicon ultrafiltration columns with a 3 kDa molecular weight cutoff. Following concentration, the samples were incubated with LATS1 kinase assay buffer (25 mM HEPES pH 7.4, 50 mM NaCl, 5 mM MgCl2 and 5 mM MnCl2, 5 mM β-glycerophosphate and 1 mM dithiothreitol), 500 μM ATP-biotin and 100 ng of recombinant LATS1 (Abcam) in a total volume of 60 μl. Control samples without recombinant LATS1 and ATP-biotin were also made up. The controls and kinase-added samples were incubated at 30°C for 30 minutes.
Phosphate buffer (300 µl) was added to the samples. Streptavidin resin (100 µl of a 50% slurry) was incubated with the samples for 1 hour at room temperature. Samples were spun down at 2000 × g for 1 minute and the supernatant was removed. Samples were washed 5 times with 1 ml of phosphate buffer and analysed by mass spectrometry. The full results of the assay are given in the kinase assays supplement (S3 Table).
Mass spectrometry sample preparation. The streptavidin resin containing the bound proteins was incubated with 400 µl of elution buffer I (50 mM Tris-HCl pH 7.5, 2 M urea, 181 ng/µl trypsin) at 37 °C for 30 minutes. The samples were spun at 2000 × g and the supernatant was retained. Elution buffer II (330 µl; 50 mM Tris-HCl pH 7.5, 2 M urea, 1 mM DTT) was then added to the streptavidin resin, which was incubated at 37 °C for 1 hour. The samples were spun at 2000 × g and the supernatant was retained. The supernatants from elution buffers I and II were combined and incubated overnight at 37 °C. After the incubation, 130 µl of 5 mg/ml iodoacetamide was added to each sample and the samples were incubated for 30 minutes at room temperature in the dark. Previously prepared C18 stage tips were mounted in 1.5 ml Eppendorf tubes and activated by adding 50 µl of 50% acetonitrile (AcN) and 0.1% trifluoroacetic acid (TFA). The samples were spun at 5000 rpm for 1 minute. Next, 50 µl of 1% TFA was added to the C18 stage tips and the samples were spun at 5000 rpm. After the iodoacetamide incubation, the reaction was stopped by adding 1 µl of 100% TFA. The samples were loaded onto the C18 stage tips and spun at 5000 rpm. The C18 stage tips were then washed twice by adding 50 µl of 1% TFA and spinning at 5000 rpm. Before elution of the samples, the C18 stage tips were mounted in fresh 1.5 ml Eppendorf tubes. The peptides were eluted off the C18 stage tips by adding 25 µl of 50% AcN and 0.1% TFA and spinning the samples at 5000 rpm; this was repeated twice.
Samples were evaporated for 10-15 minutes in a CentriVap concentrator until 5 µl was left. Each sample was then resuspended in 20 µl of TFA. The samples were then analysed by mass spectrometry.
Mass spectrometry. Mass spectrometry was performed using an Ultimate 3000 RSLC system coupled to an Orbitrap Fusion Tribrid mass spectrometer (Thermo Fisher Scientific). Following tryptic digest, the peptides were loaded onto a nano-trap column (300 µm i.d. × 5 mm precolumn packed with Acclaim PepMap100 C18, 5 µm, 100 Å; Thermo Scientific) running at a flow rate of 30 µl/min in 0.1% trifluoroacetic acid made up in HPLC water. The peptides were eluted and separated on the analytical column (75 µm i.d. × 25 cm, Acclaim PepMap RSLC C18, 2 µm, 100 Å; Thermo Scientific) after 3 minutes by a linear gradient from 2% to 30% of buffer B (80% acetonitrile and 0.08% formic acid in HPLC water) in buffer A (2% acetonitrile and 0.1% formic acid in HPLC water) using a flow rate of 300 nl/min over 150 minutes. The remaining peptides were eluted using a short gradient from 30% to 95% of buffer B over 10 minutes. The mass spectrometry parameters were as follows: for full mass spectrometry spectra, the scan range was 335-1500 with a resolution of 120,000 at m/z = 200. MS/MS acquisition was performed using top speed mode with a 3-second cycle time. Maximum injection time was 50 ms. The AGC target was set to 400,000, and the isolation window was 1 m/z. Positive ions with charge states 2-7 were sequentially fragmented by higher-energy collisional dissociation. The dynamic exclusion duration was set at 60 seconds, and the lock mass option was activated and set to a background signal with a mass of 445.12002.
Analysis of mass spectrometry data.
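The precursor tolerances quoted in this subsection (20 ppm for the first search, 5 or 6 ppm for the main search) are relative quantities, so the absolute m/z window they imply depends on the ion being matched. The following Python sketch converts a ppm tolerance into an absolute window. The helper names `ppm_window` and `ppm_error` are hypothetical illustrations and are not part of MaxQuant; the lock mass 445.12002 from the instrument settings is used only as a convenient example value.

```python
from typing import Tuple


def ppm_window(mz: float, tol_ppm: float) -> Tuple[float, float]:
    """Return the (low, high) m/z bounds of a symmetric ppm tolerance around mz."""
    delta = mz * tol_ppm / 1e6  # 1 ppm is one millionth of the m/z value
    return mz - delta, mz + delta


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Return the signed deviation of an observed m/z from its theoretical value, in ppm."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


# Example value only: the lock mass quoted in the instrument settings.
LOCK_MASS_MZ = 445.12002

for tol in (20.0, 5.0):  # first-search and main-search tolerances from the text
    low, high = ppm_window(LOCK_MASS_MZ, tol)
    print(f"{tol:g} ppm at m/z {LOCK_MASS_MZ}: {low:.5f} to {high:.5f}")
```

At m/z 445.12, for instance, a 5 ppm tolerance corresponds to roughly ±0.002 m/z and a 20 ppm tolerance to roughly ±0.009 m/z.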
Analysis was performed using MaxQuant (version 1.5.3.30). Trypsin was set as the digesting enzyme with a maximum of 2 missed cleavages. Cysteine carbamidomethylation was set as a fixed modification, and oxidation of methionine and N-terminal acetylation were specified as variable modifications. The data were then analysed with a minimum ratio count of 2. The first search peptide tolerance was set to 20 ppm, the main search peptide tolerance to 5 ppm, and the "re-quantify" option was selected. For protein and peptide identification, the human subset of the SwissProt database (Release 2015_12) was used, and contaminants were detected using the MaxQuant contaminant search. A minimum peptide number of 1 and a minimum peptide length of 6 amino acids were required. Unique and razor peptides were used for quantification. The "match between runs" option was enabled with a match time window of 0.7 minutes and an alignment window of 20 minutes.
Quantification and statistical analysis
Peptide identification. MaxQuant (version 1.3.0.5) was used to analyse raw mass spectrometric data files from LC-MS/MS for protein quantification. Default settings were used unless stated otherwise, including the following parameters: Trypsin/P digest allowing for 2 missed cleavages; variable modifications included oxidation and acetylation; the fixed modification was carbamidomethylation (at cysteine); to detect phosphopeptides we included phospho (STY) as a modification; first search at 20 ppm; main search at 6 ppm mass accuracy (MS) and 20 ppm mass deviation for the fragment ions. The MS data were searched against a human database (UniProt HUMAN) with a minimum peptide length of 6, unfiltered for labelled amino acids, at a false discovery rate (FDR) of 0.01 for peptides and proteins. The results were refined through the re-quantify option; "match between runs" was also selected with a 1 min time window, and label-free quantification was selected with the minimum ratio count set at 1.
Supporting information
S1 Table.
PKA Kinase Assay Results (a PDF file; cf. https://doi.org/10.6084/m9.figshare.13118441). (PDF)
S2 Table. MST2 Kinase Assay Results (a PDF file; cf. https://doi.org/10.6084/m9.figshare.13118477). (PDF)
S3 Table. LATS1 Kinase Assay Results (a PDF file; cf. https://doi.org/10.6084/m9.figshare.13118483). (PDF)
S4 Table. Mass-spec results for the LATS1 IP (an xlsx file; cf. https://doi.org/10.6084/m9.figshare.12173163). (XLSX)
S5 Table. Mass-spec data for the PKA kinase assay (an xlsx file; cf. https://figshare.com/articles/Mass_spec_data_PKA/12200681). (XLSX)
S6 Table. Mass-spec data for the MST2 kinase assay (an xlsx file; cf. https://doi.org/10.6084/m9.figshare.12200675.v1). (XLSX)
S7 Table. Mass-spec data for the LATS1 kinase assay (an xlsx file; cf. https://figshare.com/articles/Mass_spec_data_LATS_kinase_assay/12200597). (XLSX)
S1 Data. Full set of LinkPhinder predictions (a single bzip2-archived CSV file; cf. https://doi.org/10.6084/m9.figshare.12173100). (BZ2)
S2 Data. Full set of predictions computed by the related works (a bzip2 archive of 6 CSV files, one for each of the related tools; cf. https://doi.org/10.6084/m9.figshare.12173109). (TBZ)
S1 Fig. Supporting details on the experimental validation of the LATS1/YAP1 phosphorylation: (A-B) HEK293 cells were transfected with the indicated siRNAs. 48 hours after transfection the cells were lysed and blotted with the indicated antibodies. (C) HEK293 cells were transfected with empty vector (EV) or GAG-AKT, or treated with AKTi IV (10 µM) for 1 hour. Phosphorylated proteins were immunoprecipitated using an anti-AKT antibody and the immunoprecipitates were blotted with the indicated antibodies (a PDF figure; cf. https://doi.org/10.6084/m9.figshare.13118561).
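Because S1 Data is distributed as a bzip2-compressed CSV file, it can be inspected without first decompressing it to disk. The sketch below uses only the Python standard library. The function name `preview_rows` and the example file name are hypothetical, UTF-8 encoding is an assumption, and nothing is assumed about the column layout, which is not described here.

```python
import bz2
import csv
from typing import List


def preview_rows(path: str, limit: int = 5) -> List[List[str]]:
    """Return the first `limit` rows of a bzip2-compressed CSV file as lists of strings."""
    rows: List[List[str]] = []
    # Text mode decompresses on the fly; newline="" leaves line-ending handling to csv.
    # The UTF-8 encoding is an assumption about the file, not a documented property.
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle):
            if len(rows) >= limit:
                break
            rows.append(row)
    return rows


# Usage (hypothetical file name; substitute the path of the downloaded file):
# for row in preview_rows("linkphinder_predictions.csv.bz2"):
#     print(row)
```

Reading row by row keeps memory use small even for a large prediction file. S2 Data is described as an archive of several CSV files, so it would presumably need to be unpacked (for example with the standard `tarfile` module) before a reader like this could be applied to each member.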
(PDF)
Author Contributions
Conceptualization: Vít Nováček, Pierre-Yves Vandenbussche, Walter Kolch, Dirk Fey.
Data curation: Piero Conca, Emir Muňoz, Luca Costabello, Kamalesh Kanakaraj, Zeeshan Nawaz, Pierre-Yves Vandenbussche.
Funding acquisition: Vít Nováček, David Matallanas, Pierre-Yves Vandenbussche, Walter Kolch, Dirk Fey.
Methodology: Vít Nováček, David Matallanas, Pierre-Yves Vandenbussche, Walter Kolch, Dirk Fey.
Project administration: Vít Nováček, Pierre-Yves Vandenbussche, Walter Kolch.
Resources: Gavin McGauran, David Matallanas, Adrián Vallejo Blanco, Walter Kolch, Dirk Fey.
Software: Vít Nováček, Piero Conca, Emir Muňoz, Luca Costabello, Kamalesh Kanakaraj, Zeeshan Nawaz, Sameh K. Mohamed.
Supervision: Vít Nováček, Pierre-Yves Vandenbussche, Walter Kolch.
Validation: Vít Nováček, Gavin McGauran, David Matallanas, Adrián Vallejo Blanco, Piero Conca, Emir Muňoz, Luca Costabello, Kamalesh Kanakaraj, Zeeshan Nawaz, Brian Walsh, Sameh K. Mohamed, Pierre-Yves Vandenbussche, Dirk Fey.
Visualization: Kamalesh Kanakaraj, Zeeshan Nawaz.
Writing - original draft: Vít Nováček, Pierre-Yves Vandenbussche, Dirk Fey.
Writing - review & editing: Colm J. Ryan, Walter Kolch.