Some words, such as "Phrenology" or "Stylometry", insinuate their own assumptions. In fact, nobody has ever proved that minds can be measured by bumps or style by numbers.
Sams [1994] p. 469
In our view the protagonists of stylistic analysis in forensic applications have not only failed to demonstrate such a link [between style and authorship] but have not even attempted to do so.
Totty et al. [1987] p. 18
The hypothesis behind non-traditional authorship attribution studies -- those using the computer, statistics, and stylistics -- is that every author has a verifiably unique style. This paper points out and discusses the fact that this hypothesis has never been empirically tested, let alone proven. The lack of a proven theory after more than thirty years and well over 600 studies is one of the main reasons that non-traditional authorship studies are not accepted, in the main, by either the literary or the scientific community.
This paper then goes on to discuss some other assumptions behind the main one and finishes by outlining an empirical study to help move the hypothesis to proof. The movement of this hypothesis through theory to proof is needed to give validity to all authorship attribution studies.
...try to balance in your own mind the question whether the latter [text] does not deal in longer words than the former [text]. It has always run in my head that a little expenditure of money would settle questions of authorship this way.... Some of these days spurious writings will be detected by this test. Mind, I told you so.
de Morgan [1851] pp. 215-216
May there not be "fingerprints" in writing, of which the author, and most of his critics, are quite unconscious, but which could be discovered by some new approach, to the benefit of the search for truth?
Williams [1970] p. 2
This section outlines the history of the hypothesis that every author has a verifiably unique style. Some of the reasons why the hypothesis was never tested are listed with a short discussion (e.g.):
Wordprinting is still in its infancy and cannot yet boast an explanatory theory or even an agreed-upon name. Nor do its practitioners agree on an optimal statistical model. This degree of openness...has not prevented the convincing success of a number of important studies, which in turn gives added intuitive plausibility to its basic assumptions.
Reynolds [1995] p. 157
This section lists and discusses some of the sub-assumptions of the main hypothesis:
That style is quantifiable is now a given -- a fact already established. This quantifiability is what sets the working definition of style for not only this paper, but for most non-traditional authorship attribution studies. A short explanation with examples of empirical studies that prove this point is provided.
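As a minimal sketch of what "quantifiable" means in practice, the following Python fragment turns a text into a few numeric style markers: mean word length (the measure de Morgan suggested) and the rate per 1,000 words of a handful of function words. The marker set, the tokenizer, and the names `FUNCTION_WORDS`, `tokenize`, and `style_markers` are illustrative assumptions, not measures prescribed by this paper or by any particular study.

```python
import re
from collections import Counter

# Hypothetical marker set, chosen only for illustration.
FUNCTION_WORDS = ("the", "of", "and", "to", "in")


def tokenize(text):
    """Lowercase the text and return runs of ASCII letters as word tokens.

    This is a deliberately crude tokenizer (an assumption): it splits
    contractions and ignores digits and accented characters.
    """
    return re.findall(r"[a-z]+", text.lower())


def style_markers(text):
    """Return a dict of simple numeric style markers for one text."""
    words = tokenize(text)
    if not words:
        raise ValueError("text contains no words")
    counts = Counter(words)
    markers = {"mean_word_length": sum(len(w) for w in words) / len(words)}
    for fw in FUNCTION_WORDS:
        # Rate per 1,000 words, so texts of different lengths are comparable.
        markers["rate_" + fw] = 1000 * counts[fw] / len(words)
    return markers
```

Calling `style_markers` on two texts yields two comparable vectors of numbers. Whether such vectors actually distinguish authors is the open question this paper raises.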
The problems with this assumption are listed and discussed. Key studies on style change over time are explicated.
The problems with this assumption are listed and discussed. Key studies on style change over genre are explicated.
These assumptions differ as to the attainable degree of certainty in any findings. This section goes on to discuss what has been reported in the literature about the degree of certainty and what can and should be expected.
The general problems of non-traditional authorship attribution as reported by Rudman (1998) are discussed only insofar as they bear directly on each sub-assumption (e.g.):
Is the number of style markers infinite? Is style an open-ended system? (This is a follow-up on a discussion at the Kingstown conference.)
Does each of these statistical tests need its own theoretical underpinnings? Michael Farringdon's discussion of the criticism that "QSUM has no theoretical basis" is explicated.
There are two strategies to making progress toward finding the correct underlying theory, (1) the so-called "top-down" approach where one postulates a complete theory of everything... (2) the empirically based "bottom up" approach where one uses experimental data to make smaller, incremental steps.
Rothstein [1998] p. 4
This section discusses the "top-down" and "bottom-up" experimental strategies for moving the hypothesis to a correct theory and thence to studies that can prove or disprove the theory. I have not found a "top-down" approach in the literature -- understandably so, if for no other reason than logistics.
One experimental approach to test the hypothesis, a hybrid of the "top-down" and "bottom-up" strategies, is given here and discussed:
These constraints eliminate the need to show that a writer's style changes over time, over genre, or over language.
The question "How can we be sure that (n2) is truly representative?" is discussed.
The question "How do we know (n3) is large enough?" is discussed.
The statement that "This should be done using as many style markers as possible" is explicated. A short discussion of the statistics behind the adjudication of each style marker is presented.
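One way to picture the adjudication of individual style markers is sketched below in Python. For each marker, it compares the values measured on several text samples from two authors and ranks the markers by how strongly they separate the two. The choice of Welch's t statistic, the input layout, and the names `welch_t` and `rank_markers` are assumptions made for illustration; the paper does not commit to a particular test.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples of one marker.

    Each sample needs at least two values so a variance can be computed.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        raise ValueError("both samples have zero variance")
    return (mean(a) - mean(b)) / se


def rank_markers(samples_a, samples_b):
    """Rank markers by the absolute size of their t statistic.

    samples_a, samples_b: dicts mapping a marker name to a list of values,
    one value per text sample (a hypothetical layout). Only markers that
    appear for both authors are scored.
    """
    scores = {
        name: welch_t(samples_a[name], samples_b[name])
        for name in samples_a
        if name in samples_b
    }
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

A large absolute statistic only suggests that a marker separates these two authors in these samples. It says nothing about uniqueness across all authors, which is why the sample sizes and the number of markers matter.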
The determination of each variable "n*" is discussed.
This type of study should be carried out as part of the control in every non-traditional authorship attribution study. It is important to realize that if this type of control is carried out for every authorship study, and if it is consistently shown that every author has a unique style, then the hypothesis is proven, q.e.d.
A survey and critique of some important "bottom-up" studies is presented. The importance of pursuing both strategies simultaneously is discussed.