https://doi.org/10.1140/epjb/s10051-024-00781-6
Regular Article - Statistical and Nonlinear Physics
Statistical analysis of proteins families: a network and random matrix approach
1 Department of Physics and Astrophysics, University of Delhi, 110007, New Delhi, Delhi, India
2 Centre for Theoretical Physics and Natural Philosophy, Mahidol University, Nakhonsawan Campus, 60130, Phayuha Khiri, Thailand
b pradeep.bha@mahidol.ac.th
c ndeo007@gmail.com
Received: 20 June 2024
Accepted: 29 August 2024
Published online: 7 October 2024
We present a novel method for analyzing the structural organization of protein families by integrating random matrix theory (RMT) and network theory with the physicochemical properties of amino acids and multiple sequence alignment. RMT distinguishes significant interactions between amino acids from background noise, pinpointing coevolving positions likely to be crucial for protein structure and function. Unlike previous methods that treat amino acids as mere characters, this property-based approach captures both short- and long-range correlations. The components of the eigenvectors whose eigenvalues lie outside the RMT bound deviate from typical RMT predictions and carry critical information about the system. We quantify the information content of each eigenvector using an entropic estimate, showing that the eigenvectors associated with the smallest eigenvalues are highly localized and informative. These eigenvectors form clusters of biologically and structurally significant positions, validated by experiments. By constructing networks of amino acid interactions for each property, we uncover key motifs and interactions. This method enhances our understanding of protein evolution and interactions and helps identify potential targets for modulating enzymatic action. We study two protein families, Cadherin-4 and Betalactamase, which display two extreme characteristics: one is nearly random and the other highly structured.
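The spectral part of the pipeline summarized above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the alignment has already been encoded as a numeric matrix `X` (sequences by positions) using a single amino-acid property, every column has non-zero variance, the RMT bound is taken to be the Marchenko-Pastur range for a correlation matrix, and the "entropic estimate" is taken to be the Shannon entropy of the squared eigenvector components. The function names are hypothetical.

```python
import numpy as np

def marchenko_pastur_bounds(n_positions, n_sequences):
    """Marchenko-Pastur eigenvalue range for a random correlation matrix.

    Assumes unit-variance variables and n_positions <= n_sequences.
    """
    q = n_positions / n_sequences
    return (1.0 - np.sqrt(q)) ** 2, (1.0 + np.sqrt(q)) ** 2

def eigenvector_entropy(vec):
    """Shannon entropy of the squared, normalized eigenvector components.

    Low entropy means the eigenvector is localized on a few positions.
    (Assumed form of the entropic estimate mentioned in the abstract.)
    """
    p = vec ** 2
    p = p / p.sum()
    p = p[p > 0]  # drop zeros so that 0 * log(0) is treated as 0
    return float(-(p * np.log(p)).sum())

def rmt_analysis(X):
    """Eigen-decompose the position-position correlation matrix of X.

    X: array of shape (n_sequences, n_positions) holding one numeric
    property value per residue (hypothetical encoding of the alignment).
    Returns eigenvalues (ascending), eigenvectors (as columns), a boolean
    mask of eigenvalues outside the Marchenko-Pastur range, and the
    entropy of each eigenvector.
    """
    n_seq, n_pos = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each position
    C = (Z.T @ Z) / n_seq                     # Pearson correlation matrix
    evals, evecs = np.linalg.eigh(C)
    lo, hi = marchenko_pastur_bounds(n_pos, n_seq)
    outside = (evals < lo) | (evals > hi)
    entropies = np.array([eigenvector_entropy(evecs[:, k]) for k in range(n_pos)])
    return evals, evecs, outside, entropies
```

In this sketch, positions with large components in a low-entropy eigenvector whose eigenvalue lies outside the bound would be the candidates for coevolving clusters. Building the interaction networks from the correlation matrix is a separate step that is not shown here.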
© The Author(s), under exclusive licence to EDP Sciences, SIF and Springer-Verlag GmbH Germany, part of Springer Nature 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.