University of Virginia Presentation
When we say that something is trustworthy we generally mean that it deserves, or is entitled to, our trust or confidence. When we say that a document is trustworthy, we mean that it is both an accurate expression of the facts to which it attests and a genuine manifestation of those facts. (I use the word fact in a broad sense to mean a thing done; an action performed or an incident transpiring; the thing done may be physical, mental, or creative).
Documentary trustworthiness thus has two qualitative dimensions: reliability and authenticity. Reliability means that the document is capable of standing for the facts to which it attests, while authenticity means that the document is what it claims to be, and has not been corrupted or tampered with in essential respects.
The archival notion of document trustworthiness borrows from a number of traditions, the most influential of which are the rationalist tradition of legal evidence scholarship, specifically the rules governing the admissibility of documents; the modernist tradition of historical criticism, specifically the procedures governing the treatment of historical sources; and the diplomatic tradition of documentary criticism. In all these traditions, the concepts of reliability and authenticity are posited on a direct connection between the word and the world and are rooted, both literally and metaphorically, in observational principles.
Reliability refers to the truth-value of a document as a statement of facts and it is assessed in relation to the proximity of the observer and recorder to the facts recorded (we sometimes speak of this as the ground-zero theory of evidence). Determinations of a document's degree of reliability are based on a range of criteria and will depend on the type of document being considered. Documents that are the product of personal, literary, or other artistic activity, for example, are considered a form of testimony and so we ask questions about the author's intentionality and motivation. Documents that are the product of bureaucratic activity, on the other hand, are considered a form of social bookkeeping, so it is not so much the author's intentionality as the nature and extent of bureaucratic controls exercised over that author that determines the document's degree of reliability.
Authenticity refers to the truth-value of a document as a physical manifestation of the facts it records and is assessed in relation to the document's original instantiation. Whereas reliability is connected to the process of documentary creation, authenticity is connected to its transmission and involves verifying or establishing the document's authorship, its place and date of origin, its status as an original or copy, and the history of its transmission, maintenance, and custody over time and space. Determining the authenticity of a document is intimately connected to the question of what constitutes the essence of a document and the status of copies relative to an original.
For the remainder of my talk I want to focus on one dimension of trustworthiness, namely, authenticity, and on a particular species of document, namely, electronic records.
I define a document (in accordance with the ISO standard) as "recorded information or [an] object which can be treated as a physical or logical unit". I define record as a document made or received by a physical or juridical person in the course of activity and set aside either for action or reference. An electronic record is a record made or received and set aside in electronic form. Since my research up till now has focused on bureaucratic records (the products of social bookkeeping rather than of testimony) that is what I will be discussing today. While the characteristics of electronic records are different in many respects from the characteristics of electronic texts, in other respects they are analogous and the methods of analysing records and texts are built on the same tradition of philological criticism; therefore I think there are useful points of convergence between them.
Assessing a record's authenticity is predicated on its endurance and stability over time. Assessments of authenticity in the digital world are complicated by the fact that, in such a world, there are no stable and enduring physical objects. While a paper record is a self-contained bounded object, its digital counterpart has what David Levy calls "a divided existence" because it comprises both a digital representation (the 0s and 1s) and the perceptible forms produced from it. This is why, "strictly speaking, it is not possible to preserve an electronic record. It is only possible to preserve the ability to reproduce an electronic record. It is always necessary to retrieve from storage the binary digits that make up the record (the digital representation) and process them through some software for delivery or presentation (the perceptible form produced from the digital representation)."
The authenticity of electronic records is threatened whenever they are transmitted across space (that is, when sent to an addressee or between systems or applications) or time (that is, either when they are in storage, or when the hardware or software used to store, process, or communicate them is updated or replaced). Given that the acts of setting aside an electronic record and retrieving it inevitably entail moving it across significant technological boundaries (from display to storage subsystems and vice versa), virtually all electronic records are considered reproductions as opposed to originals: they have undergone some change and therefore cannot be said to exist as first created.
Since exact replication of electronic records (or any digital object for that matter) is unfeasible, and loss and change are unavoidable, on what grounds do we base our trust in their authenticity over the long term? The need to establish such grounds has been the driving force behind a number of recent research initiatives, including the InterPARES I project, which wrapped up earlier this year. The goal of InterPARES I was to formulate principles and methods for ensuring the long-term preservation of authentic electronic records.
The Authenticity Task Force of InterPARES was given the specific charge of identifying conceptual requirements for assessing and maintaining the authenticity of electronic records. The theoretical perspective that shaped the Task Force's approach to the problem of identifying those requirements was diplomatics. Diplomatics was born in the seventeenth century as an analytical technique for determining the authenticity of records issued by sovereign authorities in previous centuries. Its tenets and methods were laid out in Jean Mabillon's six part treatise, De Re Diplomatica. In Part I, Mabillon defined the different kinds of documents and examined the main materials generally used in making them as well as the ink and kinds of writing. In Part II, he examined the language of the documents, their characteristic parts, the seals, and the systems of chronology used in dating them. On the basis of this examination, "Mabillon stated what, for a particular time and place, was the correct form for a genuine document, and presented ... the general principles of diplomatics." The remaining four parts of the treatise were devoted to proofs and illustrations of these principles and the manner in which they were to be applied. By comparing documents created in different periods and issued by different chanceries, and discovering the attributes they shared, as well as those they did not share, Mabillon was able to articulate the necessary and sufficient elements of documents and identify the purpose each fulfilled in the document as a whole.
Since the early 1990s a number of archival academics have been exploring the possibility of extending diplomatics beyond its traditional focus on medieval documents to contemporary records. The contemporary diplomatic analysis of the components of a record is a process of abstraction and systematisation, the aim of which is to identify the essential or "ideal" attributes of a record and make them transportable to different contexts. By decontextualising and generalising the attributes of an "ideal" record, the original diplomatists were able to recognise and evaluate records created over several centuries and across different, and sometimes bewildering, juridical systems. In the same way, diplomatic analysis might also provide a means of recognising and identifying electronic records generated within many different and equally bewildering hardware and software environments. The exploration that has taken place over the last 10 years or so has resulted in the articulation of a hybrid set of principles and methods called contemporary archival diplomatics, which is an adaptation of traditional diplomatic concepts and methods to contemporary record-keeping environments, and an integration of those concepts and methods with those of archival science.
Viewed from the perspective of contemporary archival diplomatics, a record is a complex of elements and their relationships.
It possesses a number of identifiable characteristics, among them a fixed documentary form, a stable content, an intrinsic bond with other records, and an identifiable context. It participates in or supports an action, and at least three persons (author, writer, and addressee) are involved in its creation (although these three conceptual persons may in fact be only one physical or juridical person).
The working hypothesis of the Authenticity Task Force was that, while they may manifest themselves differently, the elements typically found in traditional records will continue to be found in electronic records. To test that hypothesis, the Task Force developed a Template for Analysis, which is, essentially, a model of an ideal electronic record based on all its possible known elements. The template decomposes an electronic record into its constituent elements, defines each element, explains its purpose, and indicates whether, and to what extent, that element is instrumental in assessing the record's authenticity. The template provided the basis for diplomatic analyses of a wide range of live electronic systems, which were carried out through four rounds of case studies.
To assess a record's authenticity we need to establish its identity and demonstrate its integrity. The identity of a record refers to the attributes that uniquely characterise it and distinguish it from other records (the name of the author, its date, place of origin); while the integrity of a record refers to its wholeness and soundness: a record has integrity when it is complete and uncorrupted in all its essential respects. So in analysing the records in live systems we were looking specifically for the presence or absence of elements associated with identity and integrity and asking questions about the way in which these elements manifested themselves: are they embedded in the record, or are they linked to it? If they are linked to the record, how determined and enforced is that link? If the elements are not explicitly found in the record, are they implicit in any of the records' contexts? What specific elements does the records creator consider essential for establishing and verifying the record's authenticity and what kinds of procedural controls support a presumption of authenticity? Our expectation was that this analysis would enable us to identify general requirements for authenticity. It would also, we hoped, provide the foundation for the development of a typology of electronic records based on authenticity requirements for specific types of records.
The elements of an electronic record included in the Template fall into 3 main categories: documentary form, annotations, and context.
Documentary form is defined as the rules of representation according to which the content of a record, its immediate administrative and documentary context, and its authority are communicated. It possesses both extrinsic and intrinsic elements. Intrinsic elements refer to a record's internal composition or articulation. These are discursive elements within the record that communicate the action in which it participates and its immediate context [the linguistic code]. Intrinsic elements fall into three groups: those that convey aspects of the record's juridical and administrative context, those that communicate the action itself, and those that convey the record's documentary context and its means of validation.
Extrinsic elements refer to specific features of the record's external appearance that are instrumental in communicating and achieving the purpose for which the record is created [the bibliographic code]. For traditional diplomatists examining medieval acts, extrinsic elements, which could only be examined on the original document, constituted the first and most obvious proof of authenticity. Such elements included the layout, paragraphing, colour of ink, type and size of letters and so on, as well as the seals moulded into or appended to the record.
For electronic records, presentation features, electronic signatures, electronic seals, digital time stamps, and other special signs are treated as extrinsic elements.
The intrinsic elements of form that convey aspects of the record's juridical and administrative context include the name of the author, the name of the originator, the chronological date, the name of place of origin of the record, the name(s) of the addressee(s) and other recipients of the record.
The elements that communicate the action itself include the indication and description of the action or matter; in other words, the content of the record. Since a stable content is considered one of the identifying characteristics of a record, we were interested in knowing at what point in time the content is considered complete, stable, and unchangeable. If there is no such point in time, the question then becomes in what specific ways can the content be changed: by addition of new content, by deletion or substitution of existing content? If the content can be changed, who has the authority to make that change and how and to what extent are such changes tracked by the system?
The visible means by which the content of an electronic record is communicated is governed by presentation features, which are included among the extrinsic elements of form. These are the sets of perceivable features, generated by means of encoding and program instructions, which are capable of presenting a message to our senses. Such features include the overall configuration or representation of the content, e.g., text, graphic, image, moving images, sound, or some combination thereof. They also include particular aspects of the record's formal presentation that are necessary for it to achieve the purpose for which it was created, e.g., standardised spacing and fonts, deliberately employed colours, special layouts (e.g., spreadsheets), hyperlinks, sample rates of sound files, resolution of image files, scales of maps.
The intrinsic elements that convey the record's documentary context and its means of validation include the name of the writer, the attestation, and corroboration. The attestation is the commonest means of validation and it usually takes the form of a signature of one or more of the persons involved in issuing the record.
The extrinsic elements of form that are closely associated with the attestation function in an electronic record-keeping environment are electronic signatures and electronic seals. An electronic signature is a digital mark having the function of a signature in, attached to, or logically associated with a record, and that is used by a signatory to indicate her approval of the content of that record.
Digital signatures (those based on encryption technologies and a Public Key Infrastructure) are considered an example of electronic seals. This is because digital signatures are functionally analogous to medieval seals in general and the sovereign's seal in particular. Medieval seals guaranteed the integrity of texts, provided proof of ownership, and affirmed that the text represented the sealer's will. The digital signature – which allows the recipient to verify the origin of the record and check that it has not been altered during its transmission – performs similar functions of identification, attestation, and non-repudiation.
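The two checks mentioned here, verifying origin and detecting alteration, can be illustrated with a deliberately toy public-key signature. This is only a sketch under stated assumptions, not anything the Task Force specified: it uses textbook RSA without padding, a key pair built from two well-known Mersenne primes, and hypothetical function names (`sign`, `verify`). It is not secure and shows the principle only.

```python
import hashlib

# Toy key pair built from two known Mersenne primes (illustration only, NOT secure).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def _digest(record: bytes) -> int:
    """Reduce the SHA-256 digest of the record to an integer below N."""
    return int.from_bytes(hashlib.sha256(record).digest(), "big") % N


def sign(record: bytes) -> int:
    """The signer applies the private exponent to the record's digest."""
    return pow(_digest(record), D, N)


def verify(record: bytes, signature: int) -> bool:
    """The recipient applies the public exponent and compares with a fresh digest."""
    return pow(signature, E, N) == _digest(record)


record = b"Hypothetical deed of transfer"
seal = sign(record)
unaltered_ok = verify(record, seal)                 # record as it was signed
altered_ok = verify(record + b" (amended)", seal)   # record changed in transit
print(unaltered_ok, altered_ok)
```

Anyone holding the public values `N` and `E` can run `verify`, but only the holder of `D` can produce a seal that passes, which is the sense in which the signature identifies the signer and supports non-repudiation.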
Other extrinsic elements of form associated with attestation and identification are digital time stamps issued by a trusted third party and special signs. Special signs are symbols that identify one or more of the persons involved in the compilation, execution, or receipt of the record and which are distinct from a signature or seal. In medieval documents, such signs typically included the chrismon, the signum manus, or the monogram. Special signs that may be found in or on electronic records include agency crests and personal logos. Digital watermarks used to protect intellectual property are another type of special sign related to identification and attestation.
Annotations, i.e., additions made to a record after it has been created, constitute the next category of elements. In medieval documents, annotations typically took the form of chancery or notarial notes, which were added on the bottom of the document or on its verso. In contemporary environments, the annotations that either appear on the face of a record, or are linked inextricably to it, assume a wide variety of forms.
Annotations fall into three basic groups. The first group includes additions made to the record after its creation as part of the execution phase of an administrative procedure. Traditionally, this sort of annotation has been used for the authentication and registration of records whose form is required by law. Examples include the registration number added to a land deed by the land registry office, or the statement of the authenticity of the signatures in a will. For specific types of electronic records, namely, electronic mail records, the date, time, and place of transmission, and the indication of attachments also belong to this group. Digital signatures, which function as attestations, are considered to belong to this group of annotations as well.
The second group consists of additions made to the record in the course of handling the matter in which the record participates. Examples of this type of annotation include, but are not limited to, the identification of the name of the office handling the matter, comments noted on the face of the record, or embedded in it, and dates of transmission to other offices.
The third group consists of additions made to the record in the course of handling it for records management purposes. Such additions typically include the classification code or file number assigned to the record, its draft and/or version number, cross-references to other records, and so on.
The last category of elements is context. The identified elements of context correspond to a hierarchy of frameworks ranging from the general to the specific.
They include the record's juridical-administrative context, its provenancial context, its procedural context, its documentary context, and its technological context, which includes hardware, software, data, system models, and system administration.
While the record itself may contain indications of one or more of these contexts, the greater part of our understanding derives from an examination of sources outside the record. Indicators of these contexts include laws and regulations that control how the creator conducts its business and manages its records; organisational charts, annual reports, and so on that identify the creator's structure, mandate, and functions; workflow rules, codes of administrative procedure, classification schemes, and so on that explain the business procedure in the course of which the record is created, maintained, and used; and record inventories, indexes, registers, and so on that situate the record within the broader documentary aggregation to which it belongs. Specific indicators of the record's technological context include workflow models, data models, and so on that explain the technological environment surrounding the record.
An examination of these contexts is important to understand why and how records were created. That understanding in turn provides a foundation on which to identify more precisely the kinds of documentation and information that are essential to support the attestation of a record's authenticity over time and which, therefore, must be preserved and transferred along with the records when they become inactive and are transferred to the record preserver.
We had hypothesised at the outset of the research that intrinsic and extrinsic elements of documentary form and the annotations would play key roles in establishing the identity and demonstrating the integrity of electronic records. This hypothesis was not, however, supported by the case studies. In the case studies analysed, it was often difficult to determine the relevance of specific elements of documentary form or annotations to a consideration of a record's authenticity. This is because the determination of documentary forms in general and the establishment of required elements of form and annotations in particular are deeply embedded within specific institutional and procedural contexts and are resistant to any easy generalisations. This made it impossible for us to develop a typology that would provide a meaningful differentiation and specification of requirements for authenticity according to types of records. (We did succeed in developing general requirements for assessing and maintaining the authenticity of electronic records and I will talk about these briefly at the end).
We found that assessments of the integrity of a record especially cannot be made in any absolute sense but only in relation to the purpose the record served in the environment in which it was originally created, maintained, and used. When we refer to an electronic record, we consider it essentially complete and uncorrupted if the message that it is meant to communicate in order to achieve its purpose is unaltered. This implies that its physical integrity, such as the proper number of bit strings, may be compromised, provided that the articulation of the content and any required elements of form and annotations (meaning required by the creator) remain the same. For example, for an electronic mail message, an authentic copy of a complete message may include only the text. Provided it clearly indicated the author, addressee, receivers, and date as well as the content, it would not need to appear in the same way in which it was originally seen by the author or addressee. In contrast, an authentic copy of a map would have to retain its original presentation features, including colour and geographic feature presentation. Provided these requirements were met, an authentic copy could be produced in GIF, JPEG, or GML format.
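The idea that integrity is judged against the elements the creator requires, rather than against bit-for-bit identity, can be sketched in a few lines. The function name, the dictionary layout, and the two sets of required elements below are assumptions made for illustration; they are not drawn from the Template for Analysis.

```python
def is_authentic_copy(original: dict, copy: dict, required: set) -> bool:
    """Pass the copy if every required element is present and unchanged.

    Elements outside `required` may differ or be missing altogether.
    """
    return all(
        name in original and name in copy and original[name] == copy[name]
        for name in required
    )


# Assumed requirements: an e-mail needs its persons, date, and content only.
EMAIL_REQUIRED = {"author", "addressee", "receivers", "date", "content"}
# Assumed requirements: a map must also keep its presentation features.
MAP_REQUIRED = {"author", "date", "content", "colour_scheme", "scale"}

email = {"author": "A", "addressee": "B", "receivers": "C", "date": "2002-01-15",
         "content": "Meeting moved to Friday.", "font": "Courier"}
text_only = {k: v for k, v in email.items() if k != "font"}  # presentation dropped

map_original = {"author": "Survey office", "date": "2001-06-01", "content": "parcel 12",
                "colour_scheme": "full colour", "scale": "1:5000"}
grey_copy = dict(map_original, colour_scheme="greyscale")    # presentation changed

print(is_authentic_copy(email, text_only, EMAIL_REQUIRED))
print(is_authentic_copy(map_original, grey_copy, MAP_REQUIRED))
```

The same comparison accepts the text-only e-mail and rejects the greyscale map, because what counts as essential is supplied by the creator's requirements rather than fixed in the check itself.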
In other words, the criterion for preservation in any given case is the necessity of a particular element to enable a record to achieve its intended consequences. This implies a criterion of adequacy rather than completeness, to borrow a phrase used by John Lavagnino in the context of text encoding: one that emphasises preserving the authority of the record as evidence rather than its purity as a text. Further research into the nature of a record in an electronic environment is needed before we can develop a more nuanced set of criteria.
Our failure to develop a typology of electronic records suggests some of the limits of contemporary diplomatics as an analytical tool. Although we attempted to adapt it to contemporary record keeping realities, diplomatics remains rooted in a very traditional conception of what a record is and is therefore limited in its capacity to extend the range of understanding about the nature of different kinds of electronic systems and the variety of entities contained within them. While it is quite effective in analysing electronic systems containing digital objects that behave like traditional records, i.e., in systems in which the digital objects are fixed and circumscribable, it is considerably less helpful in analysing electronic systems containing digital objects that behave differently, i.e., systems in which the digital entities are fluid and less easy to circumscribe.
So what we bumped up against are the limits of the known as an aid to understanding the unknown. For that reason, future archival research will do well to focus less attention on establishing whether the record is complete, stable, and unchangeable, and more attention on determining whether and to what extent the system is capable of tracking changes; and how that tracking function might be managed over time. Inevitably, we will be forced to make difficult decisions about the nature and extent of the changes that will and will not be captured and preserved over time. This means, in turn, that we will have to define and defend criteria for distinguishing between substantive and incidental changes to the record. While it is neither feasible nor desirable to capture and preserve every change, it is essential that we provide logical and defensible reasons for our inclusions and exclusions and be straightforward about the assumptions on which those reasons are based.
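To make the shift from fixity to change tracking concrete, here is a minimal sketch of a record whose changes are logged and then filtered by a declared policy. Everything in it is hypothetical: the class names, and above all the `SUBSTANTIVE_ELEMENTS` set, which stands in for exactly the kind of criterion that would have to be defined and defended.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Assumed policy: changes to these elements count as substantive.
SUBSTANTIVE_ELEMENTS = {"content", "author", "addressee", "date"}


@dataclass
class Change:
    element: str
    old: str
    new: str
    changed_by: str
    at: datetime

    @property
    def substantive(self) -> bool:
        return self.element in SUBSTANTIVE_ELEMENTS


@dataclass
class TrackedRecord:
    elements: dict
    history: list = field(default_factory=list)

    def update(self, element: str, new: str, changed_by: str) -> None:
        """Apply a change and log who made it, when, and what it replaced."""
        old = self.elements.get(element, "")
        self.elements[element] = new
        self.history.append(
            Change(element, old, new, changed_by, datetime.now(timezone.utc))
        )

    def changes_to_preserve(self) -> list:
        """Keep only the changes the stated policy treats as substantive."""
        return [c for c in self.history if c.substantive]


record = TrackedRecord({"content": "Draft text", "font": "Courier"})
record.update("font", "Times", changed_by="clerk")           # incidental
record.update("content", "Final text", changed_by="author")  # substantive
kept = record.changes_to_preserve()
print([(c.element, c.old, c.new) for c in kept])
```

Keeping the policy in one named constant makes the inclusions and exclusions explicit, so the assumptions behind them can be inspected and challenged.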
An increased attention to the characteristics and behaviours of fluid systems does not, of course, imply an abandonment of fixity as a desirable characteristic of electronic systems. One of the things the diplomatic analysis highlighted was the extent to which electronic systems are still being designed to manage data rather than records. This appears to be the case even when the purpose for which the system is designed would appear to require the creation and maintenance of fixed records rather than fluid data. What is needed is a deeper analysis of the nature and purpose of different kinds of electronic systems that would enable us to specify the degree of fixity and stability necessary to protect the integrity of certain types of records; and to stipulate, in the absence of fixity, alternative means for protecting their integrity.
The limitations of the diplomatic model of an ideal electronic record are attributable mainly to the fact that the model was built on the premises of general diplomatics. General diplomatics seeks to decontextualise records, to eliminate their particularities, variations and anomalies in the interest of identifying the common, shared elements of records that cut across juridical, provenancial, and technological boundaries. The case studies of electronic systems suggest that we are living in an era that is analogous to the era of medieval manuscripts where documentary variation was the norm rather than the exception. Given the variety and complexity of current electronic systems, it probably makes more sense to adopt the approach of special diplomatics, which, traditionally, has focussed on analysing individual chanceries and specific juridical systems. For electronic records, this means beginning with an analysis of the various features of individual electronic systems and record-keeping environments in their own terms, with all their particularities, variations, and anomalies, and, on the basis of that analysis, beginning to build a more general framework. In this way we can strike a more equitable balance between ideal and local features of electronic records. What John Lavagnino has observed, again in relation to the digital encoding of texts, also holds true for electronic records; his observation is that, while it is possible to "identify objective features of our texts; there is no closed and determined set of such features." This implies that the systems we establish for assessing and maintaining the authenticity of electronic records should be viewed as open-ended rather than closed.
Of course here again, recognising the need to accommodate localism does not invalidate idealism. The ideal elements of an electronic record as they are currently defined may be inadequate but they are far from irrelevant to the consideration of a record's authenticity. In this regard again there are observations from the world of scholarly editing that have resonance for the archival world. I am thinking specifically of Philip Doss's discussion of creating hyperlinked scholarly editions, where he speaks of the desirability of electronic editors "present[ing] variation within a context of authority." This balancing of variation and authority is especially important for archivists because we are not just seeking to understand the reality; we are, to a certain extent, actively trying to shape it. The degree to which the context of authority shapes and constrains the range of acceptable variation will depend of course on the nature of the digital object. When dealing with electronic records that document individual rights and obligations, the range of acceptable variation and fluidity built into the design of the system will certainly be narrower than it would be for electronic editions.
It is also clear that we need to strike a more equitable balance between the product and the process of documentary creation and transmission. Despite our efforts to capture some of that process, the Template for Analysis still tended to privilege the product, i.e., the individual record, and was insufficiently attentive to the context that shaped it. One of the things we discovered is that the various categories of context (juridical-administrative, provenancial, procedural, technological, documentary) were the most relevant to an understanding of the record-keeping environment, and provided the main grounds on which creators based their presumption of the records' authenticity.
These contexts were, however, the least well developed part of our model. For example, in several case studies, audit trails were identified by the creator as a significant means of ensuring the authenticity of electronic records. Audit trails are part of system administration and therefore were considered an element within the record's technological context. The element "system administration" was not decomposed sufficiently, however, to enable Task Force researchers to identify the various kinds of audit trails and the specific purposes they serve in a given environment. In the absence of that identification, it was difficult to assess the extent to which an audit trail supported the creator's presumption of authenticity in particular cases.
Although we failed to come up with a comprehensive typology of electronic records, we did manage to come up with two sets of general requirements: the first set directed toward records creators and the second toward records preservers. Both sets of requirements are based on the notion of trust in record keeping and record preservation from the moment of a record's creation, and more specifically on the notion of a trusted record keeping system and the role of the preserver as a trusted custodian. The standard of trust to which both sets of requirements aspire is measured in terms of circumstantial probability rather than certainty.
The first set focuses on active records and enumerates the salient characteristics of a trusted record keeping system. The requirements identify the core information (the metadata if you like) about an electronic record that must be persistently linked to the record over time and across hardware and software platforms in order to establish and perpetuate its identity; as well as the kinds of procedural controls that provide a circumstantial probability of its integrity. These include, among other things, access controls, audit trails, procedures for preventing, discovering, and correcting loss or corruption of records; procedures for protecting the identity and integrity of records against media deterioration and across technological change, and so on.
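Two of the controls named here, identity metadata that stays linked to the record and an audit trail backed by access controls, can be sketched as follows. The class and field names are assumptions chosen for illustration and do not reproduce the wording of the InterPARES requirements; a real system would also need the procedures for loss, corruption, and technological change that this sketch leaves out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IdentityMetadata:
    """Core identity information kept with the record (field names are assumed)."""
    author: str
    addressee: str
    date: str
    place_of_origin: str


@dataclass
class RecordKeepingSystem:
    authorised_users: set
    _store: dict = field(default_factory=dict)       # record_id -> (metadata, content)
    audit_trail: list = field(default_factory=list)  # (user, action, record_id, allowed)

    def file_record(self, user, record_id, metadata, content) -> bool:
        """Set a record aside; only authorised users may file, and every attempt is logged."""
        allowed = user in self.authorised_users and record_id not in self._store
        self.audit_trail.append((user, "file", record_id, allowed))
        if allowed:
            self._store[record_id] = (metadata, content)
        return allowed

    def retrieve(self, user, record_id):
        """Return metadata and content together, so identity travels with the record."""
        found = record_id in self._store
        self.audit_trail.append((user, "retrieve", record_id, found))
        return self._store.get(record_id)


system = RecordKeepingSystem(authorised_users={"registry_clerk"})
meta = IdentityMetadata("Office A", "Office B", "2002-01-15", "Head office")
filed = system.file_record("registry_clerk", "R-001", meta, "Text of the record")
refused = system.file_record("visitor", "R-002", meta, "Unauthorised text")
print(filed, refused, len(system.audit_trail))
```

Because refused attempts are logged alongside successful ones, the trail supports the kind of circumstantial presumption of integrity described above rather than proof of it.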
The requirements in the second set focus on inactive records and enumerate the procedures necessary to enable records preservers to attest to the authenticity of their electronic holdings. They include establishing controls over the maintenance and reproduction of electronic records; documenting the reproduction process and its specific effects on the records' documentary form; and incorporating the history of the records' reproduction into the archival description of those records.
Documentation in general and archival description in particular are essential means of accounting for the integrity of the reproduction process and are necessary, therefore, to the proper fulfilment of the preserver's role as a trusted custodian of the records. Although archivists have not spent very much time examining its role as part of our apparatus of authenticity, it has been an implicit function of archival description to provide a kind of collective attestation of the authenticity of a body of records by explaining the context in which they were created, maintained, and used over time, their relationship to various creators, their relationship to the actions in which they participated, and so on. Explaining the history of a body of electronic records and their various reproductions is really just an extension of a traditional function that requires more explicit articulation.
There are three basic lessons that may be drawn from our experience over the past three years: