https://doi.org/10.1140/epjb/e2014-40805-2
Regular Article
Rank-frequency relation for Chinese characters
1 Laboratoire de Physique Statistique et Systèmes Complexes, ISMANS, LUNAM Université, 44 av. Bartholdi, 72000 Le Mans, France
2 Complexity Science Center and Institute of Particle Physics, Hua-Zhong Normal University, Wuhan 430079, P.R. China
3 IMMM, UMR CNRS 6283, Université du Maine, 72085 Le Mans, France
4 Yerevan Physics Institute, Alikhanian Brothers Street 2, 375036 Yerevan, Armenia
5 Department of Chinese Literature, University of Heilongjiang, Harbin 150080, P.R. China
a e-mail: wdeng@ismans.fr
b e-mail: armen.allahverdyan@gmail.com
Received: 1 September 2013
Received in final form: 5 January 2014
Published online: 26 February 2014
We show that Zipf’s law for Chinese characters holds perfectly for sufficiently short texts (a few thousand different characters). The scenario of its validity is similar to that of Zipf’s law for words in short English texts. For long Chinese texts (or for mixtures of short Chinese texts), rank-frequency relations for Chinese characters display a two-layer, hierarchic structure that combines a Zipfian power-law regime for frequent characters (first layer) with an exponential-like regime for less frequent characters (second layer). For these two layers we provide different (though related) theoretical descriptions that include the range of low-frequency characters (hapax legomena). We suggest that this hierarchic structure of the rank-frequency relation is connected to semantic features of Chinese characters (number of different meanings and homographies). The comparative analysis of rank-frequency relations for Chinese characters versus English words illustrates the extent to which characters play the same role for Chinese writers as words do for those writing within alphabetical systems.
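As an illustration of what a rank-frequency relation for characters is, the following minimal Python sketch counts characters in a text, orders them by decreasing frequency, and estimates a Zipfian exponent from the slope of log-frequency against log-rank. It is not the fitting procedure used in the article. The function names, the restriction to the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), and the plain least-squares fit are all assumptions made for this example.

```python
import math
from collections import Counter


def rank_frequency(text):
    """Return character frequencies sorted from rank 1 (most frequent) down.

    Assumption: only code points in the basic CJK Unified Ideographs block
    (U+4E00..U+9FFF) are counted; punctuation, Latin letters and whitespace
    are ignored.
    """
    counts = Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")
    total = sum(counts.values())
    return [count / total for _, count in counts.most_common()]


def zipf_exponent(freqs, max_rank=None):
    """Estimate gamma in f_r ~ r**(-gamma) by least squares on log-log axes.

    max_rank (hypothetical parameter) restricts the fit to the most frequent
    characters, e.g. to probe only the power-law layer of a long text.
    """
    if max_rank is not None:
        freqs = freqs[:max_rank]
    if len(freqs) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # slope is negative for a decaying law; gamma is its magnitude
```

Calling `zipf_exponent(rank_frequency(text))` on a short text gives a single exponent. For a long text, comparing fits with different `max_rank` values is one rough way to see whether the low-frequency tail departs from the power law.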
Key words: Statistical and Nonlinear Physics
© EDP Sciences, Società Italiana di Fisica and Springer-Verlag, 2014