[17:56 6/7/2011 Bioinformatics-btr340.tex] Page: 2062 2062–2067
BIOINFORMATICS ORIGINAL PAPER
Vol. 27 no. 15 2011, pages 2062–2067
doi:10.1093/bioinformatics/btr340
Sequence analysis Advance Access publication June 8, 2011
Prediction of transporter targets using efﬁcient RBF networks
with PSSM proﬁles and biochemical properties
Shu-An Chen1, Yu-Yen Ou1,∗, Tzong-Yi Lee1 and M. Michael Gromiha2,∗
1Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan, 2Department of
Biotechnology, Indian Institute of Technology (IIT) Madras, Chennai 600 036, India
Associate Editor: Martin Bishop
ABSTRACT
Summary: Transporters are proteins that are involved in the
movement of ions or molecules across biological membranes.
Currently, our knowledge about the functions of transporters
is limited due to the paucity of their 3D structures. Hence,
computational techniques are necessary to annotate the functions
of transporters. In this work, we focused on an important functional
aspect of transporters, namely annotation of targets for transport
proteins. We have systematically analyzed four major classes
of transporters with different transporter targets: (i) electron, (ii)
protein/mRNA, (iii) ion and (iv) others, using amino acid properties.
We have developed a radial basis function network-based method
for predicting transport targets with amino acid properties and
position speciﬁc scoring matrix proﬁles. Our method showed a 10fold
cross-validation accuracy of 90.1, 80.1, 70.3 and 82.3% for
electron transporters, protein/mRNA transporters, ion transporters
and others, respectively, in a dataset of 543 transporters. We have
also evaluated the performance of the method with an independent
dataset of 108 proteins and we obtained similar accuracy. We
suggest that our method could be an effective tool for functional
annotation of transport proteins.
Availability: http://rbf.bioinfo.tw/~sachen/ttrbf.html
Contact: yien@csie.org; gromiha@iitm.ac.in
Supplementary information: Supplementary data are available at
Bioinformatics online.
Received on March 7, 2011; revised on May 23, 2011; accepted on
May 25, 2011
1 INTRODUCTION
Membrane proteins play a vital role in living organisms, such
as transport of ions and molecules across membranes, binding to
small molecules within extracellular space, recognition processes
in immune system, energy transduction, and so on. Transporters
are one of the major classes of membrane proteins, spanning
cell membranes and forming an intricate system of pumps and
channels. They are different in their membrane topologies, energy
coupling mechanisms and the speciﬁcs of their substrates (Ren
et al., 2007; Saier, 2000). Transporters are generally classiﬁed
into channels/pores, electrochemical and active transporters, group
∗To whom correspondence should be addressed.
translocators, electron carriers, as well as factors involved in
transport systems (Saier et al., 2006). The transport primarily
involves ions, proteins, mRNA and electrons. The classiﬁcation
of transporters based on different families as well as their targets
remains an important problem for the advancement of structural
and functional genomics.
Recently, few methods have been proposed for discriminating
transport proteins based on their functions. Gromiha and Yabuki
(2008) analyzed the amino acid composition of transporters and
developed a neural network-based method for classifying them into
channels/pores, electrochemical and active transporters. Li et al.
(2008, 2009) utilized nearest neighbor and hidden Markov model
methods, which integrate sequence similarity search and topological
analysis into a machine-learning framework for categorizing
transporters. In our previous work (Ou et al., 2010), we have
systematically analyzed the amino acid composition, residue pair
preference and amino acid properties in six different families of
transporters. Utilizing the information, we have developed a radial
basis function (RBF) network method based on proﬁles obtained
with position-speciﬁc scoring matrices (PSSMs) for discriminating
transporters belonging to three different classes and six families.
In this work, we focused on another important aspect of functional
annotation of transporters, i.e. the prediction of transporter
targets. First, we constructed a transporter database with target
information from the latest version of UniProt database (The UniProt
Consortium, 2010). The database was then divided into four major
classes of transporters based on their transport targets. The four
classes are electron transporters, protein/mRNA transporters, ion
transporters and other transporters. We have systematically analyzed
the characteristic features of amino acid residues and developed
a radial basis network for discriminating transporter targets using
amino acid properties and PSSM proﬁles. Our method showed a
10-fold cross-validation accuracy of 90.1, 80.1, 70.3 and 82.3% for
electron transporters, protein/mRNA transporters, ion transporters
and others. We evaluated the performance of this method with an
independent dataset of 108 proteins and obtained an accuracy of
92.6, 77.8, 69.4 and 80.6% for electron transporters, protein/mRNA
transporters, ion transporters and others, respectively. A web
server has been developed for discriminating transporters based on
different targets and it is available online for the users. Finally,
we developed a protocol to analyze newly discovered genomic
sequences to annotate putative transporters and their transport
targets. We propose that our method could be an effective tool in the
annotation of transporters in genomic sequences.
2062 © The Author 2011. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
atMasarykUniversityonSeptember16,2011bioinformatics.oxfordjournals.orgDownloadedfrom
[17:56 6/7/2011 Bioinformatics-btr340.tex] Page: 2063 2062–2067
Prediction of transporter targets
2 METHODS
2.1 Dataset
We constructed a dataset of 2452 transporters, which are known at
protein level and clear target annotation in the UniProt database (The
UniProt Consortium, 2010). Using BLAST (Altschul et al., 1997), we
removed the sequences that have more than 20% identity. The ﬁnal dataset
contains 651 transporters with 98 electron transporters, 266 protein/mRNA
transporters, 200 ion (ammonia, calcium, hydrogen ion, chloride, potassium,
sodium, phosphate, sulfate, cobalt, nickel, copper, iron and neurotransmitter)
transporters and 87 other transporters.
From the dataset, we have randomly selected 17 electron, 44
protein/mRNA, 33 ion and 14 other transporters for independent tests. The
remaining data were used for 10-fold cross-validation. The detailed statistics
of the dataset is listed in Supplementary Table S1. The overlap with the data
available in Transport Classiﬁcation Database is given in Supplementary
Table S2.
2.2 Design of the RBF networks
We have employed the QuickRBF package (Ou, 2005) to construct RBFN
classiﬁers in this work. The architecture of RBF network is shown in Figure 1.
As presented in Figure 1, a general RBFN consists of three layers, namely the
input layer, the hidden layer and the output layer. The input layer broadcasts
the coordinates of the input vector to each of the nodes in the hidden layer.
Each node in the hidden layer then produces an activation based on the
associated radial basis kernel function. Finally, each node in the output layer
computes a linear combination of the activations of the hidden nodes. The
general mathematical form of the output nodes in RBFN is as follows:
cj(x)=
k
i=1
wjiφ( x−µi ;σi); (1)
where cj(x) denotes the function corresponding to the j-th output node and
it is a linear combination of k RBFs φ() with center µi and bandwidth σi.
Also, Wji denotes the weight associated with the correlation between the j-th
output node and the i-th hidden node.
A ﬁxed bandwidth of 5 for each kernel function is employed in the
network. We have carried out the computation with different bandwidths
1, 5, 10 and 20, and we obtained similar results. In addition, we used all
training data as centers. Then, the RBFN classiﬁer identiﬁes four types of
transporters based on the output function value. More details about network
structure and design have been explained in our earlier article (Ou et al.,
2005).
Fig. 1. The architecture of RBF network. The input, hidden and output layers
are shown with associated weights.
Classiﬁcation based on RBF networks has several applications in
bioinformatics. It has been widely used to predict the cleavage sites in
proteins (Yang and Thomson, 2005), interresidue contacts (Zhang and
Huang, 2004), protein disorder (Su et al., 2006) and the discrimination of
β-barrel membrane proteins (Ou et al., 2008).
2.3 Compositions of amino acids and amino acid pairs
We used n vectors {xi,i=1,...,n}, to represent all n proteins in the
training data. Each vector was labeled to show the protein groups (e.g.
electrochemical transporters or active transporters). The vector xi has 20
elements for the composition of amino acids, and 400 elements for the
composition of amino acid pairs. The 20 elements show the number of
occurrences of 20 amino acids normalized with total number of residues
in a protein, and the 400 elements show the number of occurrences of those
400 amino acid pairs normalized with the total number of residues in a
protein. Further, we have used the combinations of amino acid and residue
pair compositions with 420 elements in each vector.
2.4 PSSM proﬁles
From the structural point of view, several amino acid residues could be
mutated without altering the structure of the protein, making it possible
that two proteins could share similar structures with different amino acid
compositions. Hence, we have adopted the PSSM proﬁles, which have
been widely used in protein secondary structure prediction, subcellular
localization and other bioinformatics problems with notable improvement
(Jones, 1999; Ou et al., 2008; Xie et al., 2005). The PSSM proﬁles were
obtained with PSI-BLAST and the non-redundant (NR) protein database.
In the classiﬁcation of transport proteins, we used PSSM proﬁles to
generate 400 dimension (20×20 residue pairs) input vectors as input features
by summing up each row of same amino acid in the PSSM proﬁles and
the variable is denoted as ‘x’. Supplementary Figure S1 shows the details
of generating the 400D (dimension) PSSM features from original PSSM
proﬁles. Every element of 400D input vector was divided by the length of
the sequence and then be scaled by 1
1+e−x for normalizing the values between
0 and 1.
2.5 Biochemical properties
To enhance prediction performance, we included the properties of amino
acid residues as new features. In this study, we analyzed the composition
of 20 amino acids, the composition of 400 amino acid pairs and 544 the
biochemical properties in the AAindex database (Kawashima et al., 2008).
We divided the amino acid sequences of each protein into four equal parts,
and computed average value of biochemical properties for each part. Then,
we calculated the F-score for four parts individually using Equation (1) and
obtained the average. The properties were ranked with the average of four
F-score values.
We added these topmost ranking biochemical properties one by one to the
PSSM feature sets according to their F-score value, and kept the property in
the feature set if it improved the performance via 10-fold cross-validation.
The F-score of the i-th feature is deﬁned as:
F-score(i)=
x
(+)
i −xi
2
+ x
(−)
i −xi
2
1
n+−1
n+
k=1
x
(+)
k,i −x
(+)
i
2
+ 1
n−−1
n−
k=1
x
(−)
k,i −x
(−)
i
2
(2)
where xi, x
(+)
i and x
(−)
i are the average of the i-th feature of the whole,
positive and negative datasets, respectively; n+ is the number of positive
dataset and n− is the number of negative dataset; x
(+)
k,i is the i-th feature of the
k-th positive instance, and x
(−)
k,i is the i-th feature of the k-th negative instance
(Chen and Lin, 2006). For each classiﬁcation, speciﬁc target (e.g. electron
transport) is treated as positive dataset and rest of them (protein/mRNA, ion
and others) constitute negative dataset. It may be noted that the properties
2063
atMasarykUniversityonSeptember16,2011bioinformatics.oxfordjournals.orgDownloadedfrom
[17:56 6/7/2011 Bioinformatics-btr340.tex] Page: 2064 2062–2067
S.-A.Chen et al.
Fig. 2. (a) Amino acid composition in four classes of transporters; (b) variance of 20 amino acid residues among four classes of transporters, electron,
protein/mRNA, ion and others.
are selected with ‘best ﬁt’ for 10-fold cross-validation. However, we have
also evaluated the method using an independent dataset of 108 transporters,
which veriﬁes the reliability of results.
2.6 Assessment of predictive ability
The prediction performance was examined by 10-fold cross-validation test,
in which the four types of proteins were randomly divided into 10 subsets
of approximately equal size. We trained the data with nine subsets and the
remaining set was used to test the performance of the method. This process
was repeated 10 times so that every subset had been used as the test data
once.
We used sensitivity, speciﬁcity, accuracy and Matthew’s correlation
coefﬁcient (MCC) to measure the prediction performance. TP, FP, TN, FN are
true positives, false positives, true negatives and false negatives, respectively.
Sensitivity=
TP
TP+FN
(3)
Speciﬁcity=
TN
TN+FP
(4)
Accuracy=
TP+TN
TP+FP+TN+FN
(5)
MCC=
TP×TN−FP×FN
(TP+FP)(TP+FN)(TN+FP)(TN+FN)
(6)
3 RESULTS
3.1 Amino acid composition in various classes of
transporters
Figure 2 shows the computed composition of amino acids and their
variance in four types of transporters. Figure 2a showed that the
composition of Gln, Ser and Trp in electron transporters was notably
different from that of other classes. The variance of 20 amino
acid residues between four classes can be seen in Figure 2b. The
residues Glu, Gly, Lys, Gln, Ser and Ala had a variance higher than
0.5. Interestingly, the variance of these residues was different from
other classes and due mainly to the composition of the residues in
protein/mRNA transporters.
3.2 Dipeptide composition preference in various classes
of transporters
We computed the residue pair preference (dipeptide composition) for
all 400 possible residue pairs in electron transporters, protein/mRNA
Table 1. Features used for classifying transporters
Properties
Electron transport Q, W, S, N, D, Y, V
Protein/mRNA transport Q, E, K
Ion transport B-factors for amino acid residues
Others AG, EE, GA
transporters, ion transporters and others. Based on the dipeptide
composition, we computed the variance, and the residue pairs that
have the topmost variances are listed in Supplementary Table S3.
The residue pairs, GL, VF, LG and GA showed a variance greater
than 0.03; and the residue pair with Gly showed the highest variance
among the four classes of transporters.
3.3 Amino acid properties with high F-score for four
classes of transporters
We computed the F-score for the composition of 20 amino
acids, 400 amino acid pair compositions and 544 biochemical
properties in the AAindex database (Kawashima et al., 2008).
The properties with topmost scores are given in Supplementary
Table S4. These properties were added to PSSM proﬁles and their
inﬂuence was examined with the ability of improving the accuracy
of discrimination. The features selected for the classiﬁcation of each
transporter (electron, protein/mRNA, ion and others) are presented
in Table 1. Interestingly, the inﬂuence of amino acid properties is
vital for all four classes of transporters (Table 2).
3.4 Importance of selected features for the structure
and function of proteins
We have used amino acid composition, residue pair preference,
biochemical properties and evolutionary information in the form
of PSSM proﬁles as main features in the present study. These
features play an important role to classify proteins based on
their structure and function. It has been shown that the positive
charged residues are overrepresented in the binding regions of
DNA and RNA binding proteins to interact with the DNA/RNA
(Ahmad et al., 2004; Bhardwaj and Lu, 2007; Jeong et al., 2003;
Jones et al., 2001; Kumar et al., 2008; Terribilini et al., 2006;
2064
atMasarykUniversityonSeptember16,2011bioinformatics.oxfordjournals.orgDownloadedfrom
[17:56 6/7/2011 Bioinformatics-btr340.tex] Page: 2065 2062–2067
Prediction of transporter targets
Table 2. Discrimination of four classes of transporters with different features
Method Sensitivity (%) Speciﬁcity (%) Accuracy (%) MCC
T1 T2 T3 T4 T1 T2 T3 T4 T1 T2 T3 T4 T1 T2 T3 T4
Cross-validation dataset (543 proteins)
PSI-BLAST 42.0 66.7 65.3 45.2 95.5 78.8 74.7 92.6 87.5 73.8 71.8 86.2 0.44 0.46 0.38 0.39
AAC 56.8 72.1 59.9 57.5 91.1 74.1 66.2 81.3 86.0 73.3 64.3 78.1 0.47 0.46 0.24 0.31
DPC 63.0 70.3 67.1 56.2 85.9 69.5 61.2 80.2 82.5 69.8 63.0 77.0 0.42 0.39 0.26 0.29
AAC + DPC 64.2 71.2 65.3 58.9 85.3 72.0 62.0 80.6 82.1 71.6 63.0 77.7 0.42 0.43 0.25 0.31
PSSM 70.4 78.4 68.3 69.9 92.4 79.8 69.9 83.8 89.1 79.2 69.4 82.0 0.60 0.58 0.36 0.43
PSSM + properties 71.6 80.2 70.7 72.6 93.3 80.1 70.2 83.8 90.1 80.1 70.3 82.3 0.62 0.60 0.38 0.45
Independent dataset (108 proteins)
PSI-BLAST 41.2 63.6 54.5 21.4 93.4 79.7 65.3 92.6 85.2 73.1 62.0 83.3 0.39 0.44 0.19 0.16
AAC 58.8 70.5 51.5 57.1 92.3 67.2 62.7 83.0 87.0 68.5 59.3 79.6 0.51 0.37 0.13 0.32
DPC 47.1 77.3 60.6 57.1 91.2 71.9 61.3 74.5 84.3 74.1 61.1 72.2 0.39 0.48 0.20 0.23
AAC + DPC 47.1 79.5 57.6 57.1 91.2 76.6 65.3 75.5 84.3 77.8 63.0 73.1 0.39 0.55 0.21 0.24
PSSM 76.5 72.7 63.6 50.0 94.5 79.7 69.3 84.0 91.7 76.9 67.6 79.6 0.69 0.52 0.31 0.28
PSSM + properties 76.5 75.0 63.6 64.3 95.6 79.7 72.0 83.0 92.6 77.8 69.4 80.6 0.72 0.54 0.34 0.38
T1, electron transporters; T2, protein/mRNA transporters; T3, ion transporters; T4, other transporters; BLAST : simple sequence similarity search; AAC, amino acid composition;
DPC, dipeptide composition (residue pair preference).
Wu et al., 2009). The hydrophobic residues are accumulated
in the membrane spanning regions of transmembrane helical
proteins. Hence, the concept of conformational parameters and
physicochemical properties has been widely used for predicting
the membrane spanning segments of α-helical membrane proteins
(Gromiha, 1999; Hirokawa et al., 1998; Tusnady and Simon, 1998).
The membrane spanning regions of β-barrel membrane proteins
showed a periodicity of polar–non-polar residues and the residue
pair preference has been used to discriminate such class of proteins
(Gromiha et al., 2005). Further, these features have been used in
several classiﬁcations such as mesophilic and thermophilic proteins,
binding regions in protein complexes, etc. (Berezovsky et al., 2007;
Gromiha and Suresh, 2008; Zhang and Fang, 2006). The PSSM
proﬁles provide the evolutionary information, which characterize the
proteins with similar sequence and function (Jones, 1999; Ou et al.,
2008). These analyses showed the importance of the features used
in the present study for understanding the structures and functions
of proteins.
3.5 Discrimination of transporters based on four
different classes of transporters
We developed a variety of methods for annotating electron
transporters, protein/mRNA transporters, ion transporters and
others. The results obtained from the composition of amino acids,
residue pair preference, combinations of them, PSSM and the
combination of PSSM with the properties of amino acid residues
are presented in Table 2.
The results showed that PSSM with amino acid properties
was successful in discriminating transporters with an average 10fold
cross-validation accuracy of 90.1, 80.1, 70.3 and 82.3% for
electron transporters, protein/mRNA transporters, ion transporters
and others, respectively.
We have carried out receiver operator characteristic (ROC)
analysis and the results for four transporters are shown in Figure 3.
Our results showed the area under the curve (AUC) of 0.90,
(a) (b)
(c) (d)
Fig. 3. Comparison of ROC curve for different features. The results
obtained with amino acid composition (AAC), dipeptide composition (DPC),
combination of AAC and DPC (AAC + DPC), PSSM and combination
of PSSM and properties (PSSM + properties) are shown. (A) Electron
transporter; (B) protein/mRNAtransporter; (C) ion transporter; and (D) other
transporter.
0.86, 0.77 and 0.86, respectively, for electron, protein/mRNA, ion
and other transporters using PSSM and biochemical properties
(Supplementary Table S5).
We evaluated the performance of this method with an independent
dataset of 108 proteins and obtained an accuracy of 92.6, 77.8, 69.4
and 80.6% for electron transporters, protein/mRNAtransporters, ion
transporters and others. Our analysis showed that PSSM proﬁles
and the properties of amino acid residues improved the accuracy of
discrimination, compared with the composition of amino acids and
2065
atMasarykUniversityonSeptember16,2011bioinformatics.oxfordjournals.orgDownloadedfrom
[17:56 6/7/2011 Bioinformatics-btr340.tex] Page: 2066 2062–2067
S.-A.Chen et al.
Table 3. The comparison between different classiﬁers
Method: PSSM + properties Sensitivity (%) Speciﬁcity (%) Accuracy (%) MCC
T1 T2 T3 T4 T1 T2 T3 T4 T1 T2 T3 T4 T1 T2 T3 T4
Cross-validation dataset (543 proteins)
Decision Trees (J48) 53.1 59.3 38.1 26.9 89.6 66.7 74.4 88.2 84.2 63.5 62.7 80.3 0.41 0.24 0.10 0.15
Naïve Bayes 66.7 73.9 69.0 70.3 84.3 64.5 53.5 76.4 83.0 68.3 60.0 75.8 0.40 0.38 0.20 0.34
KNN 63.0 76.1 65.3 45.2 90.5 70.4 66.0 89.6 86.4 72.7 65.7 83.6 0.50 0.46 0.29 0.33
Random Forest 69.1 73.9 65.9 57.5 90.0 76.0 72.6 77.9 86.9 73.3 70.5 75.1 0.54 0.46 0.36 0.27
Lib-SVM 70.4 77/9 70.1 71.2 91.8 76.6 70.2 83.6 88.6 77.2 70.2 82.0 0.58 0.54 0.38 0.44
QuickRBF 71.6 80.2 70.7 72.6 93.5 80.1 70.2 83.8 90.1 80.1 70.3 82.3 0.62 0.60 0.38 0.45
Independent dataset (108 proteins)
Decision Trees (J48) 30.1 54.9 36.4 28.7 94.5 74.2 73.1 89.5 86.1 67.2 63.4 82.0 0.31 0.30 0.09 0.19
Naïve Bayes 59.4 80.1 73.8 84.3 88.1 60.6 48.0 74.6 87.1 69.2 56.2 76.9 0.44 0.41 0.21 0.43
KNN 70.6 72.7 69.7 57.1 93.4 73.4 50.7 87.2 89.8 73.1 56.5 83.3 0.63 0.46 0.19 0.38
Random Forest 69.1 81.8 57.6 71.4 90.0 78.1 62.7 79.8 86.9 79.6 61.1 78.7 0.54 0.59 0.19 0.39
Lib-SVM 76.5 75.0 66.7 57.1 94.5 79.7 66.7 83.0 91.7 77.8 66.7 79.6 0.69 0.54 0.31 0.32
QuickRBF 76.5 75.0 63.6 64.3 95.6 79.7 72.0 83.0 92.6 77.8 69.4 80.6 0.72 0.54 0.34 0.38
T1, electron transporters; T2, protein/mRNA transporters; T3, ion transporters; T4, other transporters.
dipeptides. We achieved a correlation of 0.72, 0.54, 0.34 and 0.38
in the test set for the four classes, which was 10–49% improvement
over the results obtained with other features. The usage of PSSM
proﬁles and biochemical properties might be the reason for this
improvement.
3.6 Comparison with other methods
We have analyzed the capability of PSI-BLAST to discriminate
each of the four types of transporter targets, based on sequence
similarity searches. We have examined all transporters in the 543
training and 108 test sets of data and computed the sensitivity,
speciﬁcity, accuracy and MCC. This method showed accuracy in
the range of 64–88% for discriminating the four classes (electron,
protein/mRNA, ion and others) of transporters. Our proposed
method showed the accuracy of 69–92%, which is superior to
PSI-BLAST searches for discrimination. In addition, the simple
sequence similarity search method showed a lower sensitivity of
approximately 10–40% than our proposed method in the test set of
108 proteins. The detailed results of sequence similarity search are
also included in Table 2.
In addition, we have compared the performance of the present
method with other algorithms such as decision trees, k-nearest
neighbors, support vector machines, random forest, etc. and the
results are presented in Table 3. The ROC analysis has been done
with all classiﬁers and the results are presented in Supplementary
Figure S2 and Table S6. We noticed that our method performs better
than other methods in terms of sensitivity, speciﬁcity, accuracy,
MCC and AUC for discriminating all types of transporter targets.
3.7 Importance of the work in biological context and
application to new sequences
The annotation of proteins based on their structure and function
are important in structural and functional genomics. In our earlier
study, we have developed methods to discriminate transporters
from other proteins and classify them into three classes and six
families (Gromiha and Yabuki, 2008; Ou et al., 2010). It is
necessary and important to classify them based on transporting
targets as the efﬁciency, activity, transport and other functions
depends on targets. This can be evidenced with the database for
functionally important residues in membrane proteins and drug–
target interactions (Gromiha et al., 2009). Hence, the classiﬁcation
based on targets used in the present study is biologically relevant to
understand the functions.
We developed a protocol for predicting the target of a transporter.
In this procedure, we initially examined the query sequence whether
it is a transporter or not (Ou et al., 2010). For transporters, we
applied PSSM features and other properties to classify into four
types of transporter targets. This procedure yields four results (for
each transport type) and the transporter target is assigned according
to the greatest preference (Fig. 4). This is a sequence-based method,
which could be used to annotate genomic sequences.As an example,
we utilized this method to annotate the transporters and different
classes of transporter targets in Escherichia coli genome, with 4237
sequences. Our method detected 67, 225, 262 and 155 proteins as
electron transporters, protein/mRNA transporters, ion transporters
and others, respectively. Further investigations on these proteins are
on progress.
3.8 Discrimination on the web
We have developed a web server for discriminating membrane
transport proteins based on their targets, (i) electron, (ii)
protein/mRNA, (iii) ion and (iv) others. It takes the amino acid
sequence in FASTAformat as input and predicts the type of the target
for membrane transport protein. The server can be freely accessible
at http://rbf.bioinfo.tw/~sachen/ttrbf.html.
4 CONCLUSIONS
This study focused on methods for the prediction of targets in
transport proteins. We analyzed four major classes of transporters
with different transporter targets, such as electron, protein/mRNA,
ion and others, and revealed the important amino acid residues,
2066
atMasarykUniversityonSeptember16,2011bioinformatics.oxfordjournals.orgDownloadedfrom
[17:56 6/7/2011 Bioinformatics-btr340.tex] Page: 2067 2062–2067
Prediction of transporter targets
Fig. 4. The architecture for annotating transporters targets with three steps:
(i) PSSM proﬁles for speciﬁc features; (ii) RBF networks for each target;
and (iii) ﬁnal classiﬁcation.
residue pairs and amino acid properties. We developed a radial
basis network for transporter target annotation using amino acid
properties and PSSM proﬁles. Our method showed a 10-fold
cross-validation accuracy of 90.1, 80.1, 70.3 and 82.3% for
electron transporters, protein/mRNA transporters, ion transporters
and others, respectively. We evaluated the performance of the
method with an independent dataset of 108 proteins and we obtained
similar results. Based on the results, we have developed a protocol
for identifying transporters and predicting their transporting targets.
We suggest that our method would serve as an effective tool for the
functional annotation of membrane proteins.
ACKNOWLEDGEMENTS
We thank the reviewers for constructive comments.
Funding: Indian Institute of Technology Madras research
grant (BIO/10-11/540/NFSC/MICH to M.M.G.). National Science
Council (NSC) of Taiwan, NSC99-2221-E155-073 (to Y-Y. O.); and
NSC99-2320-B155-001 (to T-Y. L.).
Conﬂict of Interest: none declared.
REFERENCES
Ahmad,S. et al. (2004) Analysis and prediction of DNA-binding proteins and their
binding residues based on composition, sequence and structural information.
Bioinformatics, 20, 477–486.
Altschul,S. et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein
database search programs. Nucleic Acids Res., 25, 3389–3402.
Berezovsky,I.N. et al. (2007) Positive and negative design in stability and thermal
adaptation of natural proteins. PLoS Comput. Biol., 3, 498–507.
Bhardwaj,N. and Lu,H. (2007) Residue-level prediction of DNA-binding sites and its
application on DNA-binding protein predictions. FEBS Lett., 581, 1058–1066.
Chen,Y.W. and Lin,C.J. (2006) Combining SVMs with various feature selection
strategies. In Guyon,I. et al. (eds) Feature Extraction: Foundations and
Applications, Springer, Heidelberg, pp. 315–324.
Gromiha,M.M. (1999) A simple method for predicting transmembrane alpha helices
with better accuracy. Protein Eng., 12, 557–561.
Gromiha,M.M. and Suresh,M.X. (2008) Discrimination of mesophilic and thermophilic
proteins using machine learning algorithms. Proteins, 70, 1274–1279.
Gromiha,M.M. and Yabuki,Y. (2008) Functional discrimination of membrane proteins
using machine learning techniques. BMC Bioinformatics, 9, 135.
Gromiha,M.M. et al. (2005) Application of residue distribution along the sequence for
discriminating outer membrane proteins. Comput. Biol. Chem., 29, 135–142.
Gromiha,M.M. et al. (2009) TMFunction: database for functional residues in membrane
proteins. Nucleic Acids Res., 37, D201–D204.
Hirokawa,T. et al. (1998) SOSUI: classiﬁcation and secondary structure prediction
system for membrane proteins. Bioinformatics, 14, 378–379.
Jeong,E. et al. (2003) Discovering the interaction propensities of amino acids and
nucleotides from protein-RNA complexes. Mol. Cells, 16, 161–167.
Jones,D.T. (1999) Protein secondary structure prediction based on position-speciﬁc
scoring matrices. J. Mol. Biol., 292, 195–202.
Jones,S. et al. (2001) Protein-RNA interactions: a structural analysis. Nucleic Acids
Res., 29, 943–954.
Kawashima,S. et al. (2008) AAindex: amino acid index database, progress report 2008.
Nucleic Acids Res., 36, D202.
Kumar,M. et al. (2008) Prediction of RNA binding sites in a protein using SVM and
PSSM proﬁle. Proteins, 71, 189–194.
Li,H. et al. (2008) A nearest neighbor approach for automated transporter prediction
and categorization from protein sequences. Bioinformatics, 24, 1129.
Li,H. et al. (2009) TransportTP: A two-phase classiﬁcation approach for membrane
transporter prediction and characterization. BMC Bioinformatics, 10, 418.
Ou,Y.-Y. (2005) QuickRBF: a package for efﬁcient radial basis function networks.
Software available at http://csie.org/~yien/quickrbf/.
Ou,Y.-Y. et al. (2005) A novel radial basis function network classiﬁer with centers set
by hierarchical clustering. Proc. IJCNN’05 , 3, 1383–1388.
Ou,Y.-Y. et al. (2008) TMBETADISC-RBF: discrimination of $\beta$-barrel membrane
proteins using RBF networks and PSSM proﬁles. Comput. Biol. Chem., 32,
227–231.
Ou,Y.-Y. et al. (2010) Classiﬁcation of transporters using efﬁcient radial basis function
networks with position-speciﬁc scoring matrices and biochemical properties.
Proteins, 78, 1789–1797.
Ren,Q. et al. (2007) TransportDB: a comprehensive database resource for cytoplasmic
membrane transport systems and outer membrane channels. Nucleic Acids Res., 35,
D274–D279.
Saier,M.H. Jr (2000)Afunctional-phylogenetic classiﬁcation system for transmembrane
solute transporters. Microbiol. Mol. Biol. Rev., 64, 354–411.
Saier,M.H. Jr et al. (2006) TCDB: the Transporter Classiﬁcation Database for membrane
transport protein analyses and information. Nucleic Acids Res., 34, D181–D186.
Su,C.-T. et al. (2006) Protein disorder prediction by condensed PSSM considering
propensity for order or disorder. BMC Bioinformatics, 7, 319.
Terribilini,M. et al. (2006) Prediction of RNA binding sites in proteins from amino acid
sequence. RNA, 12, 1450–1462.
The UniProt Consortium (2010) The universal protein resource (UniProt) in 2010.
Nucleic Acids Res., 38, D142–D148.
Tusnady,G.E. and Simon,I. (1998) Principles governing amino acid composition of
integral membrane proteins: application to topology prediction. J. Mol. Biol., 283,
489–506.
Wu, J.S. et al. (2009) Prediction of DNA-binding residues in proteins from amino acid
sequences using a random forest model with a hybrid feature. Bioinformatics, 25,
30–35.
Xie,D. et al. (2005) LOCSVMPSI: a web server for subcellular localization of
eukaryotic proteins using SVM and proﬁle of PSI-BLAST. Nucleic Acids Res.,
33, W105–W110.
Yang,Z.R. and Thomson,R. (2005) Bio-basis function neural network for prediction of
protease cleavage sites in proteins. IEEE Trans. Neural Netw., 16, 263–274.
Zhang,G.Y. and Fang,B.S. (2006) Application of amino acid distribution along
the sequence for discriminating mesophilic and thermophilic proteins. Process
Biochem., 41, 1792–1798.
Zhang,G.Z. and Huang,D.S. (2004) Prediction of inter-residue contacts map based on
genetic algorithm optimized radial basis function neural network and binary input
encoding scheme. J. Comput. Aided Mol. Des., 18, 797–810.
2067
atMasarykUniversityonSeptember16,2011bioinformatics.oxfordjournals.orgDownloadedfrom