https://doi.org/10.1140/epjb/e2008-00225-7
Generation of hierarchically correlated multivariate symbolic sequences
With an application to the assessment of bootstrap confidence in phylogenetic analysis
1 Dipartimento di Fisica e Tecnologie Relative, Università di Palermo, Viale delle Scienze, 90128 Palermo, Italy
2 Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
Corresponding author: m.tumminello@unipa.it
Received: 11 February 2008
Revised: 28 April 2008
Published online: 18 June 2008
We introduce a method to generate multivariate series of symbols from a finite alphabet with a given hierarchical structure of similarities based on the Hamming distance. The target hierarchical structure is arbitrary; it can be, for instance, the one obtained by applying a hierarchical clustering method to an empirical similarity matrix. The generating mechanism does not rely on a mutation rate, a quantity widely used in phylogenetic analysis. We use the proposed simulation method to investigate the relationship between the bootstrap value associated with a node of a phylogeny and the probability of finding that node in the true phylogeny. The results are compared with those obtained in the literature under an evolutionary model with a constant per-symbol mutation rate. We observe that the relationship between the bootstrap value of a node and the probability that the corresponding clade is correct is sensitive both to the length of the data series and to the length of the branch connecting the node to its closest ancestor in the phylogenetic tree, whereas it is only slightly affected by the topology of the true phylogeny and by the absolute value of the similarity.
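To make the two quantities named in the abstract concrete, the following Python sketch computes a Hamming-based similarity between symbolic sequences and a toy bootstrap value. It is illustrative only and is not the generating method of the paper: the function names, the example sequences, and the "support" criterion (a pair counts as recovered when its two members are strictly more similar to each other than to any other sequence) are assumptions introduced here as a stand-in for a node of a hierarchical tree.

```python
import random


def hamming_similarity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def bootstrap_replicate(seqs, rng):
    """Resample positions with replacement, using the same positions for every sequence."""
    length = len(seqs[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(s[c] for c in cols) for s in seqs]


def pair_support(seqs, i, j, n_boot=200, seed=0):
    """Toy bootstrap value (assumed criterion): share of replicates in which
    sequences i and j are strictly closer to each other than to any other sequence."""
    rng = random.Random(seed)
    others = [k for k in range(len(seqs)) if k not in (i, j)]
    hits = 0
    for _ in range(n_boot):
        rep = bootstrap_replicate(seqs, rng)
        s_ij = hamming_similarity(rep[i], rep[j])
        if all(s_ij > hamming_similarity(rep[i], rep[k])
               and s_ij > hamming_similarity(rep[j], rep[k])
               for k in others):
            hits += 1
    return hits / n_boot


# Hypothetical example data: two nearly identical sequences and one outlier.
seqs = ["AAAAAAAA", "AAAAAAAB", "BBBBBBBB"]
print(hamming_similarity(seqs[0], seqs[1]))  # 0.875
print(pair_support(seqs, 0, 1))
```

Resampling positions rather than sequences keeps every replicate the same size as the original data, so the dependence of the bootstrap value on series length can be explored by changing the length of the input sequences.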
PACS: 89.75.-k – Complex systems / 02.50.Sk – Multivariate analysis / 02.10.Ox – Combinatorics; graph theory
© EDP Sciences, Società Italiana di Fisica, Springer-Verlag, 2008