**7**, 293-308 (1999)

https://doi.org/10.1007/s100510050616

## Extensive simulations for longest common subsequences

### Finite size scaling, a cavity solution, and configuration space properties

Forschungszentrum BiBos, Fakultät für Physik,
Universität Bielefeld, 33615 Bielefeld, Germany

Received:
23
April
1998

Revised:
30
July
1998

Accepted:
14
August
1998

Published online: 15 January 1999

Given two strings *X* and *Y* of *N* and *M* characters respectively,
the Longest Common Subsequence (LCS) Problem asks for the longest sequence
of (non-contiguous) matches between *X* and *Y*.
Using extensive Monte-Carlo simulations for this problem,
we find a finite size scaling law of the form
for the average LCS length
of two random strings of size *N* over *S* letters. We provide precise
estimates of for .
We consider also a related Bernoulli Matching model where the different entries
of an array are occupied with a match *independently* with
probability *1/S*. On the basis of a cavity-like analysis we find that the
length of a longest sequence of matches in that case behaves as
where *r=M/N* and
.
This formula agrees very well with our numerical computations. It provides
a very good approximation for the Random String model, the approximation
getting more accurate as *S* increases. The question of the "universality class" of the LCS problem is also considered. Our results for the Bernoulli
Matching model show very good agreement with the scaling predictions of [CITE] for Needleman-Wunsch sequence alignment.
We find however that the variance of the LCS length has a scaling different
from Var in the Random String model, suggesting
that long-ranged correlations among the matches are relevant in
this model.
We finally study the "ground state" properties of this problem. We find
that the number of solutions typically grows
exponentially with *N*. In other words, this system does not
satisfy "Nernst's principle" . This is also reflected at the level of
the overlap between two LCSs chosen at random, which is found to be
self averaging and to approach a definite value as .

PACS: 75.10.Nr – Spin-glass and other random models / 02.60.Pn – Numerical optimization

*© EDP Sciences, Società Italiana di Fisica, Springer-Verlag, 1999*