https://doi.org/10.1140/epjb/e2005-00333-x
Size dependent complexity of sequences in protein families
1 National Laboratory of Solid State Microstructure, Institute of Biophysics, and Department of Physics, Nanjing University, 210093, P.R. China
2 Interdisciplinary Center of Theoretical Studies, Chinese Academy of Sciences, Beijing 100080, P.R. China
Corresponding author: wangwei@nju.edu.cn
Received: 11 January 2005
Revised: 6 June 2005
Published online: 28 October 2005
The size-dependent complexity of protein sequences in various
families in the FSSP database is characterized by sequence
entropy, sequence similarity and sequence identity. As the average
length Lf of the sequences in a family increases, the sequence
entropy tends to increase, while the sequence similarity and
sequence identity tend to decrease. Once Lf exceeds 250, all three
quantities saturate. This saturation of complexity is attributed
to the saturation of the probability Pg of global (long-range)
interactions in protein structures when Lf > 250. It is also found
that the alphabet size of residue types needed to describe the
sequence diversity depends on Lf and saturates at 12.
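The abstract does not state the formulas behind these measures. As a hedged illustration only, the sketch below assumes common textbook definitions: sequence entropy as the Shannon entropy (natural log) of residue frequencies in each aligned column, averaged over all columns, and sequence identity as the fraction of identical residues over positions where neither sequence has a gap, averaged over all sequence pairs. The function names `sequence_entropy` and `mean_identity`, the toy alignment, and the gap-handling rule are hypothetical, not taken from the paper; sequence similarity, which requires a substitution matrix, is not shown.

```python
import math
from collections import Counter
from itertools import combinations


def sequence_entropy(alignment):
    """Average per-column Shannon entropy (natural log) of an aligned family.

    Assumption (not from the paper): gap characters '-' are skipped when
    counting residues, and every column counts toward the average.
    """
    length = len(alignment[0])
    total = 0.0
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] != '-')
        n = sum(counts.values())
        if n == 0:
            continue  # all-gap column contributes zero entropy
        total -= sum((c / n) * math.log(c / n) for c in counts.values())
    return total / length


def mean_identity(alignment):
    """Mean pairwise identity over all sequence pairs.

    Assumption (not from the paper): identity is the fraction of identical
    residues among positions where neither sequence has a gap.
    """
    scores = []
    for a, b in combinations(alignment, 2):
        pairs = [(x, y) for x, y in zip(a, b) if x != '-' and y != '-']
        if pairs:
            scores.append(sum(x == y for x, y in pairs) / len(pairs))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy family of three aligned sequences.
family = ["ACDE", "ACDF", "ACGF"]
print(round(sequence_entropy(family), 4))  # 0.3183
print(round(mean_identity(family), 4))     # 0.6667
```

In this toy family only the last two columns vary, so the entropy comes entirely from them; a more diverse family would show higher entropy and lower identity, which is the direction of the trend the abstract reports for longer sequences.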
PACS: 87.10.+e – General theory and mathematical aspects / 87.15.Cc – Folding and sequence analysis
© EDP Sciences, Società Italiana di Fisica, Springer-Verlag, 2005