## LEXSTATS: A program for the statistical analysis of word frequency distributions

**Harald Baayen**

University of Nijmegen

Max Planck Institute for Psycholinguistics

Wundtlaan 1

PB 310, 6500 AH NIJMEGEN

The Netherlands

baayen@mpi.nl

**Fiona J. Tweedie**

Department of Statistics

Mathematics Building

University Gardens

GLASGOW, Scotland G12 8QW

fiona@stats.gla.ac.uk

Various computationally intensive statistical models are available for the
analysis of word frequency distributions (e.g., Carroll, 1967; Sichel, 1975;
Chitashvili and Baayen, 1993). These models provide linguists and
lexicographers with elegant means for obtaining characteristic textual
measures that do not vary with sample size, for extrapolating the development
of the vocabulary to sample sizes larger than the observed text size, and for
estimating the population vocabulary size.
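
As a rough illustration of the data these models operate on, the Python sketch below computes a frequency spectrum (the number of word types occurring exactly m times) and uses the standard binomial interpolation approximation to estimate the expected vocabulary size of a smaller random subsample. It is not LEXSTATS code: the function names and the toy text are hypothetical, and interpolation only covers sample sizes up to the observed text size. Extrapolating beyond it requires one of the parametric models cited above.

```python
from collections import Counter


def frequency_spectrum(tokens):
    """Return a mapping {m: V(m)}: the number of word types seen exactly m times."""
    type_counts = Counter(tokens)          # word type -> token frequency
    return Counter(type_counts.values())   # frequency m -> number of types


def expected_vocabulary(spectrum, sample_size, n):
    """Binomial interpolation (an approximation): expected number of types
    in a random subsample of n tokens drawn from sample_size tokens."""
    if not 0 <= n <= sample_size:
        raise ValueError("n must lie between 0 and the observed sample size")
    observed_types = sum(spectrum.values())
    miss = 1.0 - n / sample_size           # chance that a given token is left out
    # A type with frequency m is absent from the subsample with probability miss**m.
    expected_missing = sum(v_m * miss ** m for m, v_m in spectrum.items())
    return observed_types - expected_missing


# Toy text, purely illustrative.
tokens = "the cat sat on the mat and the dog sat on the log".split()
spectrum = frequency_spectrum(tokens)
half_sample_types = expected_vocabulary(spectrum, len(tokens), len(tokens) // 2)
print(dict(spectrum), half_sample_types)
```

At n equal to the full sample size the estimate reduces to the observed number of types, and at n = 0 it is zero, which gives a quick sanity check on the formula.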

Thus far, these models have not been widely used, which is not surprising given
the absence of software implementing them. At the conference, we will
present the beta version of LEXSTATS, a user-friendly graphical interface to a
series of C programs that implement a wide range of word frequency analyses.
LEXSTATS and the underlying C code will become available as freeware under the
GNU software license.

We will illustrate LEXSTATS by applying it to word frequency distributions of
various kinds of texts as well as to word frequency distributions of a range of
morphological categories.

### References

- Carroll, J. B.: 1967, "On Sampling from a Lognormal Model of Word Frequency
Distribution," In: H. Kucera and W. N. Francis (Eds.),
*Computational Analysis of Present-Day American English*,
Brown University Press, Providence, pp. 406-424.

- Chitashvili, R. J. and Baayen, R. H.: 1993, "Word Frequency Distributions,"
In: G. Altmann and L. Hrebicek (Eds.),
*Quantitative Text Analysis*,
Wissenschaftlicher Verlag Trier, Trier, pp. 54-135.

- Sichel, H. S.: 1975, "On a Distribution Law for Word Frequencies,"
*Journal of the American Statistical Association*, 70, 542-547.