Encoding Renditional Information in Primary Source Texts

Julia Flanders
Paul Caton
Brown University
Women Writers Project
Box 1841
Brown University
Providence, Rhode Island 02912
Julia_Flanders@brown.edu
paul@swansong.stg.brown.edu

Introduction

Renditional information--by which we mean broadly any facts about the appearance, ornamentation, or layout of text on a page--occupies an uncertain position in current theories of humanities text encoding. At the heart of the problem lie longstanding philosophical questions about the nature of 'text'. Does a particular physical instantiation of a work combine an essential component with non-essential features that can vary without altering the essence? On the surface it would seem that markup schemes where the defined tag set is weighted towards capturing structure rather than appearance (such as the TEI and most other SGML-based schemes) lend support to a practicable distinction between the 'essential text' and the renditional 'packaging' of any one instance (see Renear, Renear et al., DeRose et al.). On the other hand it might be argued that much of what we commonly consider renditional information--line-spacing, indentation, font family, use of italics, small caps, etc.--serves to impose structure and draw attention to particular types of content. It would follow that capturing the structure of a document, together with identifiable content objects like quotes, foreign words, technical terms, and so on, makes it unnecessary to capture the renditional details per se.

Many scholarly encoding projects, however, capture primary source data for quasi-archival purposes, and their transcriptions need to supply information to people with a variety of critical interests and a corresponding variety of opinions as to what constitutes significant textual information. These projects have to confront the problem of dealing with visual information on its own terms (as opposed to treating it as a cue for structure and content). A clear methodological framework is essential even for a project with ambitions to capture all possible renditional information, let alone one with more modest and realistic goals; without such a framework it is impossible to determine what to record and how to record it. This paper focuses on the problem of defining a rationale for the capture of renditional information: on what grounds do we decide what kinds of rendition to record? And if we use meaning as the test of whether rendition is important enough to record, what is the horizon of meaningfulness?

Methodological frameworks

The possible criteria by which renditional features will be deemed worthy or unworthy of capture emerge from a variety of different sources: some of them pragmatic, some deriving from aesthetic theory or literary criticism. Some of the most significant are listed below, and will be discussed in more detail in the finished paper. These do not represent mutually exclusive categories, but rather overlapping conceptual axes which may interact in various ways: for instance, the criterion of meaningfulness requires one to specify a user population for whom meaning is being defined (linguistic meaning? literary meaning? cultural meaning?).

The criteria we will consider are as follows:

A phenomenon like wrong-font letters might fare very differently depending on the criteria chosen: on the grounds of meaningfulness or substantiveness we might not record it, but on grounds of measurability and serving a certain user population (say, analytical bibliographers) we might well include it, and on the issue of intentionality the inclusion or omission would carry a strong theoretical message.

Creating a taxonomy of rendition

Each project must develop its own taxonomy of renditional characteristics, however simple, according to where it stands among the approaches described above. If renditional information is to be used for any kind of processing, retrieval, or comparison, it must be described systematically using terms which identify the significant boundaries between phenomena (for instance, alignment and justification). It is also important (as with any kind of data capture) to decompose complex phenomena into their basic significant parts so that each may be described distinctly.
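The principle of decomposition can be illustrated with a minimal sketch in Python. Everything in it is an assumption made for illustration: the feature names (alignment, justification, slant, case), their permitted values, and the keyword syntax are hypothetical and do not represent the vocabulary of the TEI or of any particular project. The point is only that once a compound description such as "centered italic small caps" is broken into independent features, each feature can be recorded, omitted, or queried separately.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


# Hypothetical controlled vocabularies; a real project would define its own.
class Alignment(Enum):
    LEFT = "left"
    RIGHT = "right"
    CENTER = "center"


class Slant(Enum):
    UPRIGHT = "upright"
    ITALIC = "italic"


class Case(Enum):
    MIXED = "mixed"
    ALL_CAPS = "all-caps"
    SMALL_CAPS = "small-caps"


@dataclass(frozen=True)
class Rendition:
    """One renditional description, decomposed into independent features.

    A value of None means the feature was not recorded, which is distinct
    from recording a default value such as 'upright'.
    """
    alignment: Optional[Alignment] = None
    justified: Optional[bool] = None  # kept separate from alignment
    slant: Optional[Slant] = None
    case: Optional[Case] = None

    def to_keywords(self) -> str:
        """Serialize the recorded features as a systematic keyword string."""
        parts = []
        if self.alignment is not None:
            parts.append(f"align({self.alignment.value})")
        if self.justified is not None:
            parts.append(f"justified({'yes' if self.justified else 'no'})")
        if self.slant is not None:
            parts.append(f"slant({self.slant.value})")
        if self.case is not None:
            parts.append(f"case({self.case.value})")
        return " ".join(parts)

    def matches(self, query: "Rendition") -> bool:
        """True if every feature specified in the query has the same value here."""
        for name in ("alignment", "justified", "slant", "case"):
            wanted = getattr(query, name)
            if wanted is not None and getattr(self, name) != wanted:
                return False
        return True


# A centered, unjustified heading set in italic small caps.
heading = Rendition(
    alignment=Alignment.CENTER,
    justified=False,
    slant=Slant.ITALIC,
    case=Case.SMALL_CAPS,
)
print(heading.to_keywords())
print(heading.matches(Rendition(slant=Slant.ITALIC)))
```

Because alignment and justification are held as separate features, a search for italic text succeeds regardless of how the passage is aligned, and a project that chooses not to record justification can leave that feature unset without affecting the others.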

References

DeRose et al., "What is Text, Really?" in Journal of Computing in Higher Education 1:2, Winter 1990.

Renear, Allen. "Out of Praxis: Three (Meta)Theories of Textuality", in Electronic Text: Investigations in Method and Theory, ed. Kathryn Sutherland, Oxford, 1997.

Renear et al., "Refining Our Notion of What Text Really Is: The Problem of Overlapping Hierarchies" in Research in Humanities Computing, ed. N. Ide, Oxford, 1995.