Identifying, preserving, and using high quality digital resources

Michael Fraser
Michael Popham
Humanities Computing Unit, Computing Services
Oxford University
13 Banbury Road
Oxford, OX2 6NN

Elizabeth Solopova
Department of English
University of Kentucky
Lexington, KY 40506

Selecting Resources for a Subject Gateway: Who Decides?

Michael Fraser

The Higher Education Funding bodies in the UK recently called for bids to develop subject-based faculty 'hubs' or gateways which locate, catalogue, and give access to digital resources suitable for use in Higher Education teaching and research as part of the new Resource Discovery Network. The new faculty hubs will further develop the design and purpose of the existing centrally-funded gateways, amongst which ADAM (Art, Design and Media), IHR-INFO (History), and SOSIG (Social Sciences) currently have some remit for humanities disciplines.

The Humanities Computing Unit has been invited to submit a bid to develop the proposed Humanities Hub of the new Resource Discovery Network. The proposal draws upon existing work relating to subject gateways within Oxford, in particular the HumBul Gateway for the Humanities and, on a smaller scale, the Computer-Assisted Theology gateway, as well as other gateways within the UK and beyond.

This paper will focus on a particular issue which lies at the core of any subject-based gateway: the criteria by which resources are selected for inclusion within the gateway. Subject gateways explicitly state or at the very least imply a concern that the resources catalogued are quality-assured, an assurance based on human intervention. But what does quality mean in this context? Against what criteria and with what authority can an individual resource be deemed fit for inclusion and therefore deemed fit for purpose?

Gateways tend to fall into two basic types. To a large extent the HumBul Gateway and the Computer-Assisted Theology gateway between them demonstrate both types. The Theology gateway was developed by an individual enthusiast with a keen interest in the possibilities offered by the Internet for teaching and research and with specific subject expertise. Gateways of this type are numerous on the Internet and indeed many of the existing gateways to humanities subjects fall into this category. For the purposes of this discussion such gateways may be termed amateur gateways since their development is often dependent on one or two individuals, often without formal institutional support and frequently presented with little information about selection criteria, intended audience, available metadata, consistent classification, advanced searching and so on. What these gateways can offer, however, is a subject practitioner's view of the Internet with evaluative as well as descriptive annotation for each linked resource; they derive their authority from the recognised expertise of the subject specialist. The second type of gateway, one which to a large extent the new HumBul strives to be, and which may be termed the professional gateway, is less common (certainly for the humanities). The professional gateway is identified by institutional and sometimes national support, developed by a specific project team, and constructed along the lines of an advanced library catalogue (often drawing its cataloguers from amongst subject librarians). The professional gateway offers durability and structured, easily retrievable data. On the other hand there is a tendency to hide from the end-user the evaluative judgements made about individual resources held within the virtual collection, despite publication of the criteria by which resources are selected for inclusion.
Both types of gateway contend with the inherent tension between trying to be a digital library (cataloguing and dissemination) and something like an academic reviews journal (discovery and evaluation).

The EU-funded DESIRE Project, "Selection Criteria for Quality Controlled Information Gateways", developed and published a list of quality selection criteria designed as a reference point for subject gateways. The criteria presented were arranged under five headings which may be summarized as relating to: audience, content, design, maintenance or durability, and comparison with related resources. Under each heading a number of sub-categories contain a series of questions to be considered by resource contributors to subject gateways. The categories are comprehensive and the questions detailed. The application of these criteria is intended to highlight quality and limit quantity. The ADAM and SOSIG Gateways, for example, either explicitly draw attention to this particular set of criteria or have developed a similar set for their own resource contributors.

Leaving aside the issue of whether such a comprehensive approach to selection criteria actually serves the purpose for which it was designed, it is significant to note that the wide range of questions which are required to be answered satisfactorily before a resource can be admitted into the catalogue bears little or no relation to the metadata available to the end-user. For the most part the user of gateways such as the two mentioned above is presented with fairly sparse metadata consisting of title, subject, description and so on. Whilst the cataloguer is forced to make evaluative judgements about resources, the user has little notion of what these judgements might have been. The mere fact that a resource has been included within the gateway is, it seems, assumed to be enough. Descriptions are short and objective, and rarely is an indication given even of the contributor's identity or authority for making such judgements.

A fundamental question which underlies this paper is whether there is a need for detailed quality assurance at all when the effort might be better expended on more comprehensive, factual metadata to assist the searching and delivery of a gateway's holdings. The combination of professional cataloguing and amateur evaluation appears to be successfully provided by services like the Internet Movie Database and to some extent by commercial ventures like Amazon.Com. Both these databases, however, concern themselves with offline media available to their users only with some additional effort. The Internet subject gateway, of course, catalogues resources sharing the same digital medium as itself. It is intrinsic to an Internet gateway not only to point away from itself but actually to take the user to those resources using the same mode of delivery. One might argue that providing reviews of Internet resources is a superfluous activity given that the function of a gateway is to take the user to the objects which they might inspect for themselves, a task which neither the Internet Movie Database nor Amazon.Com can undertake.

On the other hand, as this paper will argue, given that academic subject gateways have an additional role of providing access to digital resources suitable for teaching and research, the Internet offers something which the offline media cannot: a full integration of the resource catalogue, resource evaluation, and the resources themselves. It is only the combination of all four fundamental elements (discovery, evaluation, cataloguing and dissemination) which moves us towards a gateway which is subject-based, academic, and Internet integrated, a combination which lies at the core of the proposed Humanities Hub.

Accept/Reject? Quality decisions facing the Oxford Text Archive

Michael Popham

The Oxford Text Archive is one of the world's best known electronic text centres, and has been in existence for almost a quarter of a century. At the time of its establishment in 1976, there were relatively few humanities scholars interested in the creation and use of electronic textual resources, which meant that it was all the more important to ensure that their efforts were preserved and made available to future generations. However, despite the small size of the humanities computing community, the resource implications of undertaking any work involving electronic text meant that such endeavours were rarely entered into lightly, or without significant scholarly and technical input.

In the summer of 1996, the OTA was appointed as the electronic text Service Provider for the UK's national Arts and Humanities Data Service. In many respects this appointment was extremely timely, as by now the international community of humanities computing scholars had grown significantly, and many individuals who were less computer-literate than their predecessors were starting to take advantage of the facilities offered by cheap scanning technologies and the emergence of the world wide web as a ubiquitous technology. Individual academics saw less of a need to rely upon the archival and distribution services offered by bodies such as the OTA, as they now believed that they could undertake these tasks for themselves. Yet this rapid growth in self-publishing on the web has raised a number of concerns -- not simply about the quality of the materials being created, but also about the methods and standards that have been used.

Within the UK, the Arts and Humanities Research Board (AHRB) has recently been established following agreement by the British Academy, the Department of Education for Northern Ireland (DENI), and the Higher Education Funding Council for England (HEFCE). They have agreed to set up the Board pending a decision by the Government on whether to establish an Arts and Humanities Research Council. Funding for the AHRB will total over £36 million in the financial year 1998-99, and £44 million in 1999-2000, with contributions from all three parties to the agreement. Under section 10 of the application form for research grants in excess of £5000, applicants are now told that in the case of "projects whose primary purpose, or significant product, is the creation of an electronic resource, it will be a condition of award that data created as a result of the research, together with documentation, should be offered for deposit at the Arts and Humanities Data Service, within a reasonable time after the completion of the project. Applicants involved [in] research leading to the creation of such a resource are strongly advised to obtain advice from the AHDS concerning appropriate standards and methods". In practice this means that the AHDS Service Providers, such as the OTA, have been receiving a glut of enquiries from academics who will be affected by this new condition of award. Many of those who have contacted the OTA have been somewhat surprised to learn that we are less than enthusiastic about endorsing their plans to create their materials solely in HTML, and distribute these via a local website -- and very few have shown any awareness of the relevant standards for resource creation, preservation, and metadata.

We now find ourselves in something of a dilemma. The OTA is obviously keen to ensure the long-term preservation and availability of the scholarly outputs of AHRB-funded research. Yet at the same time, many of the electronic resources that seem likely to be produced by AHRB funding are not going to be created in accordance with crucial standards and best practices. So, whilst the scholarly content of these resources will almost certainly be of the highest order, they may turn out to be poor quality resources from the point of view of long-term preservation and viability. In order to address this problem, the OTA (and the four other AHDS Service Providers) will be producing a series of Guides to Good Practice, which will provide the necessary guidance to the creators of electronic scholarly resources. However, at the time of writing, it seems unlikely that the AHRB will compel resource creators to follow the advice of the AHDS Service Providers, which will surely result in the creation of many technologically weak and poor quality resources, not to mention the squandering of available funding. Moreover, if these resources are to be preserved and remain viable in the long-term, they are likely to prove difficult and costly for the OTA to maintain, and present future end-users with additional problems (and therefore costs) to overcome.

Elsewhere within the academic community, we have seen the emergence of other recommendations, such as the MLA's Guidelines for Electronic Scholarly Editions. Although this document relates to the production of one very specific kind of electronic textual resource, it is gratifying to note that it draws heavily upon the recommendations set out in the Text Encoding Initiative's Guidelines for Electronic Text Encoding and Interchange (TEI-P3), and is therefore in keeping with the recommendations made by the OTA to resource creators. Even so, despite the fact that the MLA, TEI, and OTA are in accord with regard to what constitutes good practice when creating electronic resources, it seems likely that it will be some time yet before the majority of academics (and especially those with minimal computing expertise) adopt such practices as a matter of course.

This paper will briefly set out the OTA's perception of electronic resource creation within the UK, and examine the reasons why many academics seem unwilling or unable to adopt the recommendations and good practices that originate from several of the key players in the scholarly electronic text community. It will then look particularly at the challenges confronting the OTA when identifying and accepting electronic textual resources for accessioning into the OTA's holdings. Having discussed the difficulties of weighing scholarly merit against the long-term preservation costs, viability, and usability of resources, the paper will conclude with an explication of the OTA's policy concerning this contentious area, and set out our criteria for resource selection.

Fit for Purpose: Issues Surrounding the Use of Digital Resources in Research and Teaching

Elizabeth Solopova

Humanities disciplines have a large and varied body of digital resources on which to draw for teaching and research purposes, including scholarly editions, on-line dictionaries and journals, collections of digital texts and images, large Internet gateways and numerous individual and course Web pages. In spite of the recent rapid growth of digital resources, there are no guidelines published, as far as I am aware, which assist the academic in assessing the quality of a digital resource for actual use in teaching or research. This is not surprising. First, the answer to whether a resource is fit for a purpose will almost always be, 'it depends'. Not only does the answer depend on the general purpose envisaged, whether for teaching or research, it will also inevitably depend on the precise needs of the individual asking the question, for there is a whole spectrum of approaches to the subject even within a single discipline. Secondly, it is not surprising that no set of criteria exists for determining the quality of a digital resource when so few published criteria exist for assessing the quality of academic research in general. It is a contentious issue, as members of the United Kingdom academic community who have been subject to the Research Assessment Exercise will confirm. But the issue also comes to the fore within the evaluation process for tenure, the acceptance of publications by publishers and editorial boards, and the success or otherwise of research funding. Underpinning all of these is some notion of peer-review and the refereeing process which remains crucial in the assessment of research publications.

How appropriate is the application of peer-review methods to the assessment of digital resources? In summary one might argue that digital resources should not be treated any differently from other resources. The peer-review process is as appropriate for determining their 'usefulness' as it is for effecting their development, publication and, hopefully, the academic rewards structure. The assessment of digital resources, however, whilst always requiring expert knowledge of the subject area, also requires an understanding of the underlying technology. Acknowledged experts in manuscript studies, for example, simply may not appreciate the potential scholarly contribution of an electronic facsimile if the digital medium itself is significantly more alien to them than the publication of another printed facsimile. The recognition that subject experts must understand the potential of the technology employed, in order to assess the quality of a resource, was apparent in the recent study undertaken by the Arts & Humanities Data Service into the requirements of academics for the scholarly use of digital resources. The Oxford Text Archive reported that its academic users perceived as obstacles to the use of digital resources not only the technical ability required to use certain resources and the corresponding lack of training available, but also the current proliferation of resources which by-pass the benefits of academic review.

Academic practitioners who do have a familiarity with current and emerging technologies, however, have come to expect more from a digital resource than can be delivered on paper. The electronic edition of a medieval text is no longer a novelty. For both research and teaching purposes there is almost an accepted expectation that a critical edition in digital form will comprise not only the full texts, but also high quality digital facsimiles of all the surviving witnesses. Moreover the witnesses are expected to be encoded for advanced searching and linked to supplementary materials such as glossaries and textual notes. In such cases it is less a single resource than an entire scholarly environment that projects are expected to provide. The editions are expected to be easy to use and transparent even for a student inexperienced in both their academic and technical aspects. As is well known, digital resources which strive to meet such expectations are often expensive, very time-consuming undertakings, requiring unremitting devotion and extremely hard work from the teams which create them. These difficulties are acknowledged by some members of the academic community in that they make a positive evaluation of digital resources as suitable for academic use, in spite of, for example, the lack of high quality digital photography (accepting that it is expensive and that permissions are difficult to obtain), or (accepting the need to work in the situation of ever changing and developing technology) in spite of occasional technological failures, the lack of compatibility with all the existing platforms, or their slowness even on the most current computers. Other members of the academic community are however less forgiving of these conditions 'outside the editor's control', and lose confidence in digital resources.

Another difficulty well known to any 'insider' is that the huge quantities of data which underlie electronic resources often exist in a form which makes them especially difficult to proofread. In these situations the proofreading and checking done by the project team under the increasing pressure of deadlines never seems sufficient and may have to stop before complete satisfaction is achieved. Again the arguments for making the results of work available in spite of certain imperfections, and the dangers which may result from this, are not easy to balance. One particular danger is that, because a digital resource combines a large body of complex data with users who may lack technical expertise, a resource might be used or recommended for scholarly use for some time before its faults become apparent. Any 'forgiving' attitude on the part of the academic community, which itself may benefit from the early publication of a cutting-edge resource still imperfect in some aspects, should require an honest assessment of the resource by its creators and an open statement of its weaknesses. The promise of easy updating which comes with digital technology justly encourages a 'forgiving' attitude, but should not be used as a justification for the publication of poor quality work.

The input of scholars with different backgrounds is required for the evaluation of digital resources within collections and gateways. Ultimately, the inclusion of a resource within a peer-reviewed gateway or a 'published' digital collection should have the same effect upon its use and the reward of its creators as is associated with current publishing activities.