Supporting Digital Scholarship:

Annual Report, 2000

 

John Unsworth

 

 

 

Overview:

 

In January of 2000, the University of Virginia’s Institute for Advanced Technology in the Humanities (IATH) and the University of Virginia Library’s Digital Library Research and Development group (DLR&D) began a three-year project called “Supporting Digital Scholarship” (SDS), funded by the Andrew W. Mellon Foundation and co-directed by John Unsworth (Director, IATH) and Thornton Staples (Director, DLR&D).  The purpose of this project is to address second-generation digital library problems, in which the focus shifts to scholarly analysis, reprocessing, and creation of digital resources.  The specific digital library problems we are examining are:

 

1)      scholarly use of digital primary resources;

2)      library adoption of “born-digital” scholarly research; and

3)      co-creation of digital resources by scholars, publishers, and libraries. 

 

 

Staffing

 

Approaching these problems requires us to develop both technical methods and institutional policies for collecting originally digital scholarly publications.  Accordingly, we have organized two working committees, one on technical issues and one on policy issues.  The committee members are:

 

Technical Committee:

 

Rob Cordaro, Library: DLR&D

Kirk Hastings, IATH

Chris Jessee, IATH

Worthy Martin, IATH/Computer Science

Daniel Pitti, IATH

Steve Ramsay, IATH

Perry Roland, Library: DLR&D

Thorny Staples, Library: DLR&D (Co-Chair)

John Unsworth, IATH/English Dept. (Co-Chair)

Ross Wayland, Library: DLR&D

 

(names italicized = staff wholly or partly supported on SDS funding)


Policy Committee:

 

George Crafts, Library: Humanities Services

John Dobbins, Art Dept.

Edward Gaynor, Special Collections.

Sandy Kerbel, Science and Engineering Library

Phil McEldowney, Library: Social Sciences Services

Daniel Pitti, IATH (Chair)

Joan Ruelle, Science and Engineering Library

Thorny Staples, Library: DLR&D

 

The Technical Committee is the group actually responsible for producing and implementing the software, standards, and systems that this project requires.  The Policy Committee is charged with considering and proposing policies for long-term preservation of, and access to, digital materials in the library, and policies that govern the integration, dissemination, and re-use of those materials.  The two committees meet regularly: the Technical Committee once a week, and the Policy Committee at least once a month. 

 

There is some intentional overlap between these two groups, in that Thornton Staples and Daniel Pitti serve on both, and though the Policy Committee consists largely of Library staff (from the Library’s Collections Committee), it is chaired by IATH’s expert on standards, Daniel Pitti, and it includes a faculty member who is himself engaged in producing one of the projects that is a test-case for SDS (John Dobbins/The Pompeii Forum Project). 

 

The ultimate goal of the Technical Committee is to build the systems that show what can and can’t be done, at a technical level, to support digital scholarship.  The ultimate goal of the Policy Committee is to produce a set of guidelines for collecting digital scholarship that outline what libraries can and can’t promise to do with these materials depending on what form they take, what standards they do or don’t adhere to, what functionality they have, and how they achieve it. 

 

In addition to these two groups, we have also assembled an advisory committee of faculty and administrators who are institutional leaders and/or prominently engaged in the use of information technology in their own research and teaching.  The members of the Advisory Committee are:

 

Ed Ayers, Virginia Center for Digital History/History

Brian Balogh, History 

Johanna Drucker, Media Studies/English

David Germano, Religious Studies

Cheryl Mason, Curriculum, Instruction & Special Education

Kirk Martini, Architecture/Civil Engineering

Alan Howard, American Studies/English

Ben Ray, Religious Studies

Kathy Reed, Provost’s Office

Glen Robinson, Law School

Kathryn Rohe, Drama

Tim Sigmon, ITC

Kendon Stubbs, Library

Karin Wittenborg, Library


The charge to the Advisory Committee is to meet once a semester to review and respond to the work of the Policy and Technical Committees in the context of broader University perspectives, to consider long-term issues in supporting digital scholarship, and to disseminate information about this project to their colleagues. 

 

All three groups are also invited to the public presentations given by outside experts who come to UVa to consult on the SDS project.  So far, local events for these groups have been:

 

April 27, 2000: Coffee for SDS Technical and Policy committees

May 26, 2000: Reception and Progress Report for SDS Policy, Technical, and Advisory committees

November 3, 2000: Public presentation on digital preservation by Margaret Hedstrom (University of Michigan)

December 14, 2000: Reception and Progress Report for SDS Policy, Technical, and Advisory committees

 

Each group has a closed, unmoderated, archived Majordomo list for discussion, and the project as a whole has a publicly accessible web site at http://www.iath.virginia.edu/sds/ which provides the original proposal, related readings (here presented as Appendix B, Bibliographies), names and email addresses of participants, and links to faculty projects and SDS work in progress. 

 

The original proposal set out these goals for the first year:

 

Primary objectives in the first half of this year will be hiring, training, and information gathering (which would include external consultation as well as a thorough analysis of our own data and systems).  In the second half of the year, we will finalize a first version of the General Descriptive Modeling Scheme, while working with individual projects to establish and document standard procedures for producing descriptive, structural, and administrative metadata.

 

As predicted, this first year has been a ramping-up period, involving a good deal of organizational work, content selection, hiring, and training.  During this period, two full-time staff members were hired: Kirk Hastings, who began work in April, and Rob Cordaro, who (partly because his position was newly created for this project) was not hired until August.  A third full-time staff member, Chris Jessee, already held an IATH position on other soft money and simply changed his funding source, and a fourth, Steve Ramsay, was shifted to partial funding under the Mellon grant (after discussion with Don Waters) when it seemed his contribution to the project could be more important than some of the contract work we originally budgeted.  Because most of the funding in this project is for personnel, and the personnel have only gradually come on budget over the year, while interest has been accruing to the project at a rate of nearly $15,000 per quarter, we come to the end of the first full year of this three-year project having spent only about $100,000 (a good deal less than a third of the budget) and with a balance of over $900,000.  Nevertheless, we have indeed finalized (and put into practice) a first version of the General Descriptive Modeling Scheme (an XML DTD for abstract and highly generalizable description of hierarchical information structures), and we have begun working with three projects (see below).

 

 

Content Selection:

 

During the early part of the year, the Technical Committee spent time elaborating our original work-plan and reviewing existing IATH projects as potential test-cases.  In choosing our test-cases, we were looking for some simple problems to start with, and some complex ones for later on.  We also wanted a mix of some highly structured material and some relatively unstructured material (to reflect the range of what one finds in real-world born-digital scholarship), some heavily image-based projects, some more heavily textual (but still multimedia) projects, some typical and some idiosyncratic materials, and some projects that involved spatial representation, mapping, and modeling.  After a good deal of discussion, we chose Marion Roberts’ Salisbury Project, Jerome McGann’s Rossetti Archive, and John Dobbins’ Pompeii Forum Project.  We expect to add other projects in calendar year 2001.

 

The Salisbury Project is the newest of these three test-cases, being only about three years old.  It consists of a collection of several hundred photographs with basic descriptive metadata, documenting Salisbury Cathedral: the photographs were taken by the project’s author, and so present no problem in the area of rights and permissions (i.e., no complexities of administrative metadata).  The project has some HTML-based textual material (essays, bibliographies, syllabi) but not a great deal of it, and it mostly consists of image files and SGML records in the EAD (Encoded Archival Description) DTD.  The photographs are keyed, by metadata, to a plan of the building, introducing a simple two-dimensional mapping component (but also raising the possibility of more complex three-dimensional components).  Finally, the Salisbury Project, in its original form, is delivered through Dynaweb, a commercial web server that uses stylesheets to transform markup in arbitrary DTDs (in this case, EAD) into HTML for standard web browsers.  Before the advent of XML and XSL, Dynaweb was one of two or three strategies, all involving proprietary software, that would allow one to separate information capture from rapidly shifting techniques of presentation, and therefore many IATH projects use it.  Dynaweb-based projects also provide a clear example of the problem of functionality in scholarly publications that is partially embedded in, or dependent on, proprietary software that, for legal and technical reasons, it is not practical to collect in library systems.  As complex as all this may sound, Salisbury was the simple example we elected to start with, and our progress with it is described below, under “Technical Results.” 

 

The Rossetti Archive is a very large and (technically) idiosyncratic but highly regularized collection of SGML and image files documenting the graphic and poetic creativity of Dante Gabriel Rossetti, Pre-Raphaelite painter and poet.  An IATH project that began in 1992, the Rossetti Archive is both complex and extensive, comprising about 10,000 objects with elaborate and highly detailed interconnections.  It has its own (tripartite) SGML DTD, developed at IATH for this publication, and it has very little HTML.  Like Salisbury, it is currently delivered using Dynaweb, but it depends far more than Salisbury does on searching structured text, a functionality that still requires expensive proprietary software.  It is also undergoing more rapid change than Salisbury, being actively edited and added to on a daily basis by half a dozen contributing editorial staff supervised by Professor McGann.  For these reasons, we saw in the Rossetti Archive an opportunity to examine the search-engine problem and the problems that arise when libraries collect digital scholarship that is ongoing, as many of these publications will be.  Our aim, then, is not only to collect the Rossetti Archive and provide it, with all its original functionality, from FEDORA, but also to explore the technical and policy issues that arise when we move the Rossetti Archive into IATH’s document management system (Astoria) and set up mechanisms to allow the Rossetti Archive staff to use Astoria as their day-to-day production environment, and as a means of publishing “editions” of the Archive into library systems.  This work is next on our calendar, and will begin (in tandem with work on the Pompeii Forum Project) in the early part of 2001.

 

The Pompeii Forum Project has fewer pieces than the Rossetti Archive, but more pieces than Salisbury, and a greater variety of data types than either.  It also has less structure than either, and what structure it has is largely implicit, being represented only as links between HTML files and their sub-components.  On the other hand, the Pompeii project was the original inspiration for Thornton Staples’ work on GDMS, and offers an interesting opportunity to experiment with partial collection strategies (in which, for example, primary data might be collected and arranged in a wholly new structure in the library system, rather than trying to collect an entire publication as it exists in the HTML world).  It also offers an opportunity to explore the possibilities for inferring data structures as part of an automated collection strategy, the problems involved in dealing with datatypes for which no good non-proprietary forms exist (e.g., CAD drawings, photorealistic 3D imaging and modeling, etc.), and the possibilities for using maps and architectural information to structure other kinds of data. 
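As an illustration of what inferring structure might involve, the sketch below (in Python, chosen only for brevity) takes the simplest possible heuristic: treat the publication’s HTML links as a graph, and let each page’s parent be the page from which a breadth-first walk from the home page first reaches it.  This heuristic is our own hypothetical example, not a method the project has settled on, and the page names are invented; a real collection strategy for the Pompeii material would likely also need to weigh link text, directory layout, and the images’ own metadata.

```python
from collections import deque
from typing import Dict, List


def infer_hierarchy(links: Dict[str, List[str]], home: str) -> Dict[str, List[str]]:
    """Derive a tree from a link graph (hypothetical heuristic).

    Each page's parent is the page from which breadth-first search first
    reached it, i.e. the end of a shortest click path from `home`.
    Pages unreachable from `home` are left out.
    """
    children: Dict[str, List[str]] = {home: []}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in children:      # first discovery wins
                children[target] = []
                children[page].append(target)
                queue.append(target)
    return children


# Invented page names, standing in for a site's HTML files.
links = {
    "index.html": ["forum.html", "plans.html"],
    "forum.html": ["temple.html", "index.html"],
    "plans.html": ["temple.html"],
    "temple.html": [],
}
tree = infer_hierarchy(links, "index.html")
```

Because back-links such as `forum.html` returning to `index.html` are ignored once a page has been placed, the result is always a tree even when the underlying link graph has cycles.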

 

 

Technical Results:

 

The basis of our work in this project is a digital library architecture called FEDORA (Federated Extensible Digital Object Repository Architecture).  FEDORA originated at Cornell in research done by Carl Lagoze and others, but the Virginia implementation is its largest testbed to date and its first real-world installation.  Virginia’s FEDORA implementation is outlined in an article by Thornton Staples and Ross Wayland entitled “Virginia Dons FEDORA: A Prototype for a Digital Object Repository” (D-Lib Magazine, Volume 6 Number 7/8, July/August 2000), which can be found on the Web at http://www.dlib.org/dlib/july00/staples/07staples.html.  In our FEDORA implementation, objects within a digital library (or repository) consist of a basis (for example, a JPEG image in a simple object, or a machine-readable text with page-images in a complex object), plus three metadata packages (administrative, technical, and descriptive).  In addition, objects can be associated with one or more “disseminators,” data structures that pair a particular set of behaviors (called “signatures”) with methods to produce that behavior (in our implementation, servlets).  This graphic from the Staples/Wayland D-Lib article shows two different objects in relation to signatures and servlets:

 

Figure 1: Different Objects, Same Signatures -- FEDORA Architecture as implemented at Virginia
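The object model just described can be sketched in a few lines.  The Python below is a hypothetical illustration, not the Virginia code (which uses Java servlets and a MySQL database): an object carries a basis, three metadata packages, and a set of disseminators, and each disseminator maps a signature name to the methods that implement it.  The names `web_default` and `get_as_page` are borrowed from the servlet URLs quoted later in this report; the class names, identifiers, and sample values are invented.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# A "method" is any callable that renders an object; in the Virginia
# implementation that role is played by a Java servlet.
Method = Callable[["DigitalObject"], str]


@dataclass
class Disseminator:
    """Pairs a named signature (a set of behaviors) with the methods that produce them."""
    signature: str
    methods: Dict[str, Method]


@dataclass
class DigitalObject:
    pid: str        # persistent identifier; the format used below is illustrative
    basis: bytes    # the content itself, e.g. the bytes of a JPEG image
    administrative: Dict[str, str] = field(default_factory=dict)
    technical: Dict[str, str] = field(default_factory=dict)
    descriptive: Dict[str, str] = field(default_factory=dict)
    disseminators: Dict[str, Disseminator] = field(default_factory=dict)

    def add_disseminator(self, d: Disseminator) -> None:
        self.disseminators[d.signature] = d

    def disseminate(self, signature: str, method: str) -> str:
        """Find the signature, then run the named method against this object."""
        try:
            impl = self.disseminators[signature].methods[method]
        except KeyError:
            raise LookupError(f"{self.pid} has no {signature}/{method}") from None
        return impl(self)


# Two kinds of object answer the same signature with different methods.
def image_as_page(obj: DigitalObject) -> str:
    title = obj.descriptive.get("title", obj.pid)
    return f"<html><body><img alt='{title}' src='{obj.pid}'/></body></html>"


def text_as_page(obj: DigitalObject) -> str:
    return "<html><body><pre>" + obj.basis.decode("utf-8") + "</pre></body></html>"


photo = DigitalObject("demo:photo-1", b"\xff\xd8", descriptive={"title": "West front"})
photo.add_disseminator(Disseminator("web_default", {"get_as_page": image_as_page}))
essay = DigitalObject("demo:essay-1", b"Salisbury essay")
essay.add_disseminator(Disseminator("web_default", {"get_as_page": text_as_page}))
```

The caller asks both objects for the same signature and method; only the object knows which implementation answers, which is the point of Figure 1.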

 

Starting with the Salisbury Project, then, our first objective was to import and catalogue the constituent elements of that publication, making FEDORA objects for each component: every image file, every thumbnail or enlargement, the plans of the building, and so on.  We took two different approaches to doing this, one relatively expensive and handmade, but maximally faithful to the original, the other automated, inexpensive, and only partly faithful to the original.  The point of this exercise was to try to establish parameters for the high and low end of collection strategies: what could you do if the publication were very important and you had complete access to the underlying source data, on the one hand, and, on the other hand, what could you do if you had limited or no access to source materials and/or the publication didn’t justify a substantial expenditure of time and effort in collection.  In what follows, those two strategies, and the results of executing each, are described.

 

The Salisbury Project consists of an image archive, maps of Salisbury and Wiltshire, a teacher’s guide, credits, a site map, and a comment form.  Figure 2 shows the original home page for the project, the point of entry in the HTML superstructure for what is, largely, a Dynaweb-based, EAD-encoded image archive. 

 

Figure 2: IATH Home Page for the Salisbury Project

 

The “image archive” is the bulk of this publication, as it contains all of the several hundred photographs plus their descriptive metadata.  In the original IATH publication, this material was presented through Dynaweb, in a fairly typical frame-based display, with a table of contents in the upper left-hand cell, two cells on the right for image content (the upper one is initially used to display help documentation, until images are directed there from the lower right-hand cell, for comparison), and a small cell on the lower left for the building plan, which changes on request to show what part of the plan is represented in a particular photograph.  The original publication employs a fairly generic and re-usable design for collections of photographs or slides that document architectural objects, and we were pleased to have been able to provide both physical and intellectual views of the structure of the Cathedral, and to provide basic comparative functionality, using Dynaweb. 

 

Figure 3: Dynaweb version of the Salisbury Project

 

The “expensive” collection strategy for the Salisbury Project involved manually importing images into FEDORA and converting the EAD files to GDMS, the generalized structural markup developed by the DLR&D for administering information structures in the digital library.  This process is described in detail in Kirk Hastings’s “Replicating the Salisbury Image Archive,” available from the SDS home page at IATH.  At the end of this process, each component of the Salisbury Project was separately identified and catalogued in FEDORA, and the relations among these components were expressed in a way that would allow the repository to reassemble the logical structure of the building.
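GDMS itself is not reproduced here, but the first step of such a conversion, recovering the component hierarchy from EAD, can be sketched.  The Python below assumes un-namespaced EAD with nested `<c01>` through `<c12>` (or unnumbered `<c>`) components under `<dsc>`, labels each component from its `<did>/<unittitle>`, and returns a generic tree.  The `Node` type and the sample record are hypothetical stand-ins, not GDMS elements or actual Salisbury data.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

# EAD nests archival components as <c01>...<c12>, or as unnumbered <c>.
COMPONENT_TAGS = {"c"} | {f"c{n:02d}" for n in range(1, 13)}


@dataclass
class Node:
    """Hypothetical generic tree node (not a GDMS element)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def component_label(elem: ET.Element) -> str:
    # Prefer the component's <did>/<unittitle>; fall back to its id attribute.
    title = elem.find("did/unittitle")
    if title is not None and title.text:
        return title.text.strip()
    return elem.get("id", "(untitled)")


def to_tree(elem: ET.Element) -> Node:
    node = Node(component_label(elem))
    for child in elem:
        if child.tag in COMPONENT_TAGS:
            node.children.append(to_tree(child))
    return node


def ead_to_tree(xml_text: str) -> Node:
    """Return the nested components under <dsc> as a generic tree."""
    root = ET.fromstring(xml_text)
    top = Node("collection")
    dsc = root.find(".//dsc")
    if dsc is not None:
        top.children = [to_tree(c) for c in dsc if c.tag in COMPONENT_TAGS]
    return top


# Invented sample record, for illustration only.
sample = """<ead><archdesc level="collection"><dsc>
  <c01 id="nave"><did><unittitle>Nave</unittitle></did>
    <c02 id="n1"><did><unittitle>North aisle, looking east</unittitle></did></c02>
  </c01>
  <c01 id="cloister"/>
</dsc></archdesc></ead>"""
tree = ead_to_tree(sample)
```

Keeping the hierarchy separate from any one DTD is what lets a repository reassemble the logical structure later, whatever markup eventually records it.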

 

The next challenge on the “expensive” route was to provide the original, Dynaweb-based functionality in the more abstract, generalized, maintainable FEDORA version of the project, using nothing but XML and XSL, Java servlets, and the underlying MySQL database that keeps track of FEDORA objects.  Kirk Hastings produced the XSL stylesheets to do this, and the stylesheets themselves were imported into FEDORA and linked, through signatures, to the XML files as a disseminator.  The results are impressive: the FEDORA-based Salisbury Project looks and functions just like the Dynaweb version (see Figure 4, below), but it does so with no proprietary software and uses only open standards for all its components, which means that the publication is maximally maintainable in a library context. 

 

Figure 4: The FEDORA version of the Salisbury Project

Figure 5: Alternate Methods of Access to the FEDORA Salisbury Project

 

It is important to point out, though, that the result of this experiment is not only the duplication of Dynaweb functionality in a software-independent, standards-compliant environment: the importation and cataloguing of the Salisbury Project’s components into FEDORA also make those components available to other disseminators, and to presentation and use in other contexts (see Figure 5, above).  In other words, whereas before one would only have approached the Salisbury Project as a self-contained publication, one might now find components of that publication in other contexts, and one might see the whole of the project not only in the original interface, but optionally in other more generic interfaces used across the digital library—for example, a simple table of contents.  Figure 6, below, shows the result of the “get_toc” method for disseminating the Salisbury Project through FEDORA:

 

Figure 6: The "get_toc" dissemination of the Salisbury Project
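A generic table-of-contents method of this kind needs only the repository’s record of the structure.  The sketch below assumes that structure is available as nested (label, object id, children) entries and renders it as nested HTML lists whose links call the dissemination servlet.  The servlet parameters follow the URL pattern given in the web-whacking steps later in this report; the entry format, function names, and object ids are hypothetical.

```python
from html import escape
from typing import List, Tuple

SERVLET = "http://dl.lib.virginia.edu/servlets/ObjectServlet3"

# Each entry: (label, object id, list of child entries). Hypothetical format.
Entry = Tuple[str, str, list]


def dissemination_url(oid: str) -> str:
    # Parameter names follow the servlet URL pattern quoted later in this report.
    return f"{SERVLET}?action=dissem&doid={oid}&sigName=web_default&methName=get_as_page"


def get_toc(entries: List[Entry]) -> str:
    """Render a hierarchy as nested HTML lists, one link per component."""
    if not entries:
        return ""
    items = []
    for label, oid, children in entries:
        href = escape(dissemination_url(oid), quote=True)   # & becomes &amp; in HTML
        items.append(f'<li><a href="{href}">{escape(label)}</a>{get_toc(children)}</li>')
    return "<ul>" + "".join(items) + "</ul>"


structure = [
    ("Nave", "demo:nave", [("North aisle", "demo:nave-n", [])]),
    ("Cloister", "demo:cloister", []),
]
toc = get_toc(structure)
```

Because the method reads only the structure and never the original stylesheets, the same table of contents could be offered for any collected publication whose components have been catalogued.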

 

At this point, we consider the FEDORA collection and representation of the Salisbury Project essentially complete and successful, but we recognize that the expensive methods used to get Salisbury to this point are probably not justifiable as a general strategy for collecting born-digital scholarship—nor do they deal with the layers of plain HTML in the Salisbury Project (bibliographies, syllabi, essays, introductory matter).  In IATH’s experience, and in what we observe elsewhere, it is not uncommon for a scholarly publication on the Web to have a combination of native HTML components and some components delivered (via Dynaweb, Perl scripts, or some other mechanism) from material marked up in some more specifically adapted DTD.  For that reason, we also used Salisbury as the content for a low-cost, automated collection method.  Rob Cordaro (DLR&D) is developing a Java “web-whacking” utility that imports material into FEDORA. 

 

Figure 7: Java Utility for Importing Web Publications into FEDORA

Rob’s “URL Parser” utility can be pointed at a URL and, within given parameters, made to traverse the links that constitute a web publication, transferring text and image components to the library system, automatically creating permanent ids for those components, creating FEDORA objects for all components, and rewriting the markup in the text components so that links and image calls refer to FEDORA objects and use the new permanent ids.  Specifically, in the Salisbury example, the web-whacking collection method proceeds as follows:

 

1)      Start search at http://www.iath.virginia.edu/salisbury/

2)      Limit search to pages within the directory tree http://www.iath.virginia.edu/salisbury/

3)      Retrieve web page contents and parse HTML markup tags.  Pass all text and all tags other than "<a href" and "<img" through untouched.

4)      Parse each "<a href" and "<img" tag and check the URL to see if it is within the search limits.  Leave those outside the search limits alone and pass them through untouched.  (May want to add an informative message in case the URL becomes inaccessible.)

5)      For each "<a href" and "<img" URL within search limits, replace the URL with a dissemination URL, containing a URN, of the form:

            http://dl.lib.virginia.edu/servlets/ObjectServlet3?action=dissem&doid=<URN>&sigName=<SIG>&methName=<METH>

        where

            <URN> - 1007.lib.dl.test/salisbury_project/<remaining path>

            <SIG> - "web_default"

            <METH> - "get_as_page"

6)      Write out the modified HTML page (now transformed into an HTML object with URLs changed into URNs) to:

            icarus.lib.virginia.edu/repo/data/project/salisbury_project/

7)      Copy all image files into the same directory, creating sub-directories as needed in the salisbury_project directory.

8)      Write out "old URL - new URN" translations to a file for a record.

9)      Create objects in the Repository Database for each URN.  (Use Ross's Java library calls.)

10)     Repeat the process starting at step (3) for each HTML page in the website.
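Steps 1 through 6, 8, and 10 can be sketched compactly.  The Python below is an illustration under stated assumptions, not Rob Cordaro’s Java utility: its regular expressions recognize only double-quoted `href`/`src` attributes, it resolves relative links against the page URL and drops `#fragments`, it treats a caller-supplied `fetch` function as a stand-in for HTTP retrieval of HTML pages, and it returns the rewritten pages and the old-URL/new-URN record rather than writing files.  Steps 7 and 9 (copying images and creating repository objects) are omitted.

```python
import re
from collections import deque
from urllib.parse import urldefrag, urljoin

SITE_ROOT = "http://www.iath.virginia.edu/salisbury/"          # steps 1-2
URN_PREFIX = "1007.lib.dl.test/salisbury_project/"
SERVLET = "http://dl.lib.virginia.edu/servlets/ObjectServlet3"

# href="..." in <a> tags and src="..." in <img> tags; double quotes only.
TAG_ATTR = re.compile(r'(<(?:a|img)\b[^>]*?\b(?:href|src)\s*=\s*")([^"]*)(")', re.I)
LINK = re.compile(r'<a\b[^>]*?\bhref\s*=\s*"([^"]*)"', re.I)


def resolve(base: str, ref: str) -> str:
    # Absolute URL with any #fragment removed.
    return urldefrag(urljoin(base, ref))[0]


def to_urn(url: str) -> str:
    return URN_PREFIX + url[len(SITE_ROOT):]


def rewrite_page(html: str, page_url: str, translations: dict) -> str:
    """Steps 3-5 and 8: rewrite in-scope links and image calls, leave the rest."""
    def replace(m):
        target = resolve(page_url, m.group(2))
        if not target.startswith(SITE_ROOT):
            return m.group(0)                      # step 4: outside the limits
        urn = to_urn(target)
        translations[target] = urn                 # step 8: old URL -> new URN
        new = (f"{SERVLET}?action=dissem&doid={urn}"
               f"&sigName=web_default&methName=get_as_page")
        return m.group(1) + new + m.group(3)
    return TAG_ATTR.sub(replace, html)


def crawl(fetch, start: str = SITE_ROOT):
    """Step 10: breadth-first over in-scope <a href> links.

    `fetch(url) -> str` is a hypothetical stand-in for retrieving an HTML
    page; non-HTML link targets are not handled in this sketch.
    """
    pages, translations = {}, {}
    queue, seen = deque([start]), {start}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        for ref in LINK.findall(html):
            target = resolve(url, ref)
            if target.startswith(SITE_ROOT) and target not in seen:
                seen.add(target)
                queue.append(target)
        pages[to_urn(url)] = rewrite_page(html, url, translations)  # step 6
    return pages, translations
```

A regular-expression pass like this is cheap and leaves everything it does not recognize untouched, which matches the spirit of steps 3 and 4; a full HTML parser would be more robust against unquoted or single-quoted attributes.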

 

At this point, the Java utility can do all of the above, but it has acknowledged limitations, some of which are necessary consequences of keeping the cost of collection low by automating the process.  Still, we believe we can improve this tool in some ways.  Future extensions to the utility will include the ability to handle applets, import standards-based stylesheets, and parse Dynaweb-style URLs, and to deal with CGI and other components (e.g., JavaScript) that are either inaccessible or unsustainable within the digital library by referring the library user to a page that explains why the collected publication cannot include them.
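One way the planned referral might work is sketched below.  The rules are hypothetical: the report names applets, CGI, and JavaScript as problem cases but does not say how the utility will detect them, so this Python sketch simply checks for a `javascript:` scheme, a `/cgi-bin/` path, and a few file suffixes, and points anything it cannot collect at an invented explanation page, passing along the original URL and the reason.

```python
from urllib.parse import quote, urlparse

# Hypothetical detection rules; not taken from the actual utility.
UNSUSTAINABLE_SUFFIXES = {
    ".class": "Java applet",
    ".jar": "Java applet",
    ".cgi": "CGI program",
    ".js": "JavaScript",
}
EXPLANATION_PAGE = "uncollected.html"   # hypothetical page in the collected copy


def classify(url: str):
    """Return (collectable, reason) for a component URL."""
    parsed = urlparse(url)
    if parsed.scheme == "javascript":
        return False, "JavaScript"
    path = parsed.path.lower()
    if "/cgi-bin/" in path:
        return False, "CGI program"
    for suffix, reason in UNSUSTAINABLE_SUFFIXES.items():
        if path.endswith(suffix):
            return False, reason
    return True, ""


def link_target(url: str) -> str:
    """Keep collectable URLs; send everything else to the explanation page."""
    ok, reason = classify(url)
    if ok:
        return url
    return f"{EXPLANATION_PAGE}?component={quote(url, safe='')}&reason={quote(reason)}"
```

Recording the original URL in the referral preserves at least a trace of what the publication once did, even when the component itself cannot be kept.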

                       

Finally, with respect to our Salisbury experimentation, Chris Jessee has been developing three-dimensional computer models of the building, with a particular interest in demonstrating the construction techniques used in the building.  These FormZ models (see http://www.iath.virginia.edu/~cj8n/salisbury/) are being used in Marion Roberts’ graduate art-history seminar, in conjunction with Roberts’ photographs; they will also constitute the first 3D FEDORA objects:

 

Figure 8: Cutaway computer model of Salisbury Cathedral

Figure 9: Model and photograph comparison

As noted above, the Salisbury Project is not particularly dependent on the search-engine functionality that Dynaweb provides, but this functionality will be critical for other publications we expect to collect, for example, the Rossetti Archive.  Although the library has the OpenText search engine (now produced and licensed by the University of Michigan), even OpenText is a proprietary technology, with its own query syntax, compiled binaries and indices, and no particular facility for abstraction.  Some of these characteristics are unavoidable for now: since the XML Query Language standard is still in development (see http://www.w3.org/TR/xmlquery-req), it is not yet possible to build a search engine that uses a standards-based query syntax.  Nonetheless, we expect this standard to be settled in the not-too-distant future, and it will then be possible, and indeed desirable, to abstract the formation of queries and the formatting of search results from any particular search engine.  That kind of abstraction would also be required if one wanted to treat searching as a method within FEDORA.  For this to happen, we need a less monolithic, more modular architecture that separates searching, the formation of queries, and the styling of results from one another and from the actual mechanisms that carry out these functions.  Steve Ramsay (IATH) has been working on an architecture that meets these requirements, called Granby.  Currently instantiated as a set of Java servlets using SGrep for the core XML search functionality (see Figure 10), Granby will be an important part of the SDS project, and as it matures, we intend to make it available as free software for educational use.  Full documentation of Granby is available from the main SDS web site.

 

 

Figure 10: Granby
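The separation of concerns described above can be illustrated without reproducing Granby.  In the hypothetical Python sketch below, query formation (`build_query`), searching (`SearchEngine`), and result styling (`format_results`) are independent pieces joined only by neutral `Query` and `Hit` types, so a different engine could be swapped in without touching the other two.  The toy back end does case-insensitive substring matching over in-memory XML and stands in for SGrep; none of these names come from Granby itself.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from dataclasses import dataclass
from html import escape
from typing import Dict, List


@dataclass
class Query:
    """Engine-neutral query: find `term` in elements named `element`."""
    element: str
    term: str


@dataclass
class Hit:
    doc_id: str
    text: str


class SearchEngine(ABC):
    """The only layer that knows how searching is actually carried out."""
    @abstractmethod
    def search(self, q: Query) -> List[Hit]:
        ...


class InMemoryXmlEngine(SearchEngine):
    """Toy back end for illustration; a production back end might wrap sgrep."""
    def __init__(self, docs: Dict[str, str]):
        self.trees = {doc_id: ET.fromstring(xml) for doc_id, xml in docs.items()}

    def search(self, q: Query) -> List[Hit]:
        hits = []
        for doc_id, root in self.trees.items():
            for elem in root.iter(q.element):
                text = "".join(elem.itertext())
                if q.term.lower() in text.lower():
                    hits.append(Hit(doc_id, text))
        return hits


def build_query(form: Dict[str, str]) -> Query:
    """Query formation: translate a search form into the neutral Query."""
    return Query(element=form.get("in", "l"), term=form["q"])


def format_results(hits: List[Hit]) -> str:
    """Result styling: knows nothing about how the hits were found."""
    items = "".join(f"<li>{escape(h.doc_id)}: {escape(h.text)}</li>" for h in hits)
    return f"<ul>{items}</ul>"


def run_search(form: Dict[str, str], engine: SearchEngine) -> str:
    return format_results(engine.search(build_query(form)))


engine = InMemoryXmlEngine({
    "damozel": "<poem><title>The Blessed Damozel</title>"
               "<l>The blessed damozel leaned out</l>"
               "<l>From the gold bar of Heaven</l></poem>",
})
```

Once a standard query syntax is settled, only `build_query` and the engine adapter would need to change; the styling layer, and any FEDORA method built on it, would be untouched.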

 

Policy Results:

 

As noted above, the goal of the Policy Committee is to produce a set of library policy guidelines for collecting digital scholarship.  The SDS project has focused initially on technical challenges, because we expect technical possibilities and limitations to set some parameters for policy-making; conversely, though, we recognize that library policies will ultimately govern the selection and deployment of technical solutions.  

 

The Policy Committee’s initial work has focused on the Library’s traditional mission, policies, and practices, in order to establish a clear understanding of the familiar before attempting to analyze, understand, and accommodate the unfamiliar.  In the view of the Committee, traditional “collecting” involves five activities: selection, acquisition, preservation, description and access, and deselection.  These activities all apply to the collecting of born-digital scholarship, but the specific methods and criteria used in each may differ considerably between traditional and digital contexts.  By the same token, some issues that play a small part in traditional collecting may assume much greater importance in digital collections.  For example, in the traditional library, the publishing unit largely dictates the unit of collection and description: book, serial or serial issue, multi-volume monograph, and so on.  The boundaries of the unit of collection for digital publications are frequently problematic, especially when the publishing unit comprises a complex mixture of objects that may or may not be intellectually independent, that is, objects that may or may not be understood independently of the collection of which they are members.  And, of course, other anticipated issues are challenges presented exclusively, or almost exclusively, by the digital medium.  For example, copyright law for traditional materials has achieved a working balance between the rights of creators, owners, publishers, libraries, and readers and users, but no comparable balance has yet been reached for digital materials.  Managing, preserving, describing, and providing access to both dependent and independent information objects can be made even more problematic when different rights are associated with them. 

 

With considerations such as these in mind, the Policy Committee has articulated three working assumptions:

 

·        First, while digital collecting policies must be based on traditional library policies and mission, those policies and that mission will need to expand in order to address the challenges and opportunities presented by digital scholarship and publication.  Accordingly, the majority of the members of the Policy Committee are librarians with extensive experience in selecting materials for the Library’s collection.  Their selection experience covers both materials in traditional media (books, serials, special collections, and manuscript materials) and some digital media; although the Library has built a large collection of reformatted or purchased digital material, it has done so without a long-term strategy for preservation and access. 

 

·        Second, new policies developed to address born-digital publications will impose practical constraints and requirements on scholars, and if these policies are to succeed, they must be based on a consensus that includes scholars.  Therefore, John Dobbins, a scholar with extensive experience in developing a digital publication, serves on the committee to represent the scholarly perspective.

 

·        Third, technical and policy developments will need to be coordinated, because policies will (to a significant extent) provide the specifications for SDS technical work, and because policy will need to grapple with new forms of research and publication.  To facilitate this coordination, the two committees share two members:  Daniel Pitti and Thornton Staples.

 

At this stage of its deliberations, the Committee is turning from its analysis of traditional collecting activities to a taxonomy of the policy areas and issues for digital collections development.  Ultimately, we understand that, even if the issues are fundamentally philosophical, pragmatic policies are required.  We believe that policies developed in the presence of real-world examples and real-world implementation will achieve that pragmatism, and will find an audience and an application in many contexts beyond the SDS project or the University of Virginia Libraries: for that reason, we have had preliminary discussions with Deanna Marcum of CLIR on the possibility of CLIR publishing the findings of the Policy Committee, and she has expressed an interest in this possibility. 

 

 

Related Grants:

 

IATH and DLR&D jointly applied, in January of 2000, for a Sun Microsystems Academic Equipment Grant in support of the SDS project, requesting three Ultra 80 workstations with a value of approximately $75,000.  This request was granted, and the workstations were delivered in September 2000: one machine went to DLR&D, where it serves as the main development server for the SDS and FEDORA projects; the other two went to IATH, where they are used as desktop development machines in support of SDS and other projects. 

 

Sun has shown a good deal of interest in digital library issues—far more than any other computer hardware company.  Sun has an executive whose duties are entirely focused on digital library problems, and they have already designated Cornell’s library as a Center of Excellence in Digital Libraries.  Recently, IATH and the Library have been encouraged by Sun to propose Virginia as a second Center of Excellence in Digital Libraries, with a particular emphasis on the collection and preservation of born-digital scholarship.  Being designated a Center of Excellence would bring significant publicity to the SDS project and it would also encourage the development of commercial software for creating and managing digital libraries with the capacity to support digital scholarship.  We expect to submit our proposal to the Sun CoE program early next year, and we believe it will be successful. 

 

In November of 2000, John Unsworth submitted to Mellon a proposal for an electronic imprint at the University Press of Virginia, and the University of Virginia committed $650,000 over the next two and a half years for this project: this proposal was approved in December of 2000.  While not related in any direct administrative way to the SDS project, the electronic imprint certainly promises to provide a unique opportunity to explore the ways in which a university press publishing originally digital scholarly work might make it easier for libraries to collect that work, and perhaps also how library-based digital scholarship might be published, in whole or in part, by a digitally capable University Press.   Finally, on a purely institutional level, the coincidence of these two projects at the University of Virginia promises to strengthen relationships between the library and the press, encouraging both to develop the facilities, policies, and products that born-digital scholarship requires.

 

 

Expenditures:

 

Appendix C details the interest income and expenditures for the SDS project in calendar year 2000.  What follows is a brief explanation of some of the items in that accounting.

 

Training:

 

·        Kirk Hastings (IATH): Sun Training (March, 2000 – Dublin, CA)

·        Chris Jessee (IATH): Flash Conference and training (March, 2000 – San Francisco, CA)

·        Thornton Staples (DLR&D): Extreme Markup Languages 2000 (August, 2000 – Montreal, Canada)

·        Rob Cordaro (DLR&D): Java Conference and training (September, 2000 – San Diego, CA)

·        Thornton Staples (DLR&D): Fourth European Conference on Digital Libraries (September 18–20, 2000 – Lisbon, Portugal)

·        Kirk Hastings (IATH) and Perry Roland (DLR&D): XML Conference and training (December, 2000 – Washington, DC)

·        Chris Jessee (IATH): Flash Conference and training (December, 2000 – Chicago, IL)

 

Services:

 

Direct project funding (as described in the original proposal) was provided to the Rossetti, Salisbury, and Pompeii projects. None of these projects has spent to the projected level, and the Pompeii project has spent little, if any, of its available funding; this reflects, in part, the fact that these projects were not selected until part of the year had passed.

 

Consultants:

 

David Smith (Perseus), re: London project, distributed digital library collections (September 20, 2000)

 

Margaret Hedstrom (University of Michigan), re: Digital Preservation (November 3, 2000)

 


Equipment:

 

One PowerBook laptop for an independent scholar, Katherine Rinne, who is working on an SGML catalogue of the objects mapped in an IATH project on the waters of the city of Rome. That project was discussed in the original proposal and is likely to be included in SDS work if the catalogue can be completed.

 

 

Conclusions:

 

Overall, we have made more progress on the technical front than on the policy front, but this may simply indicate that formulating policy requires concrete technical implementations to consider, so that policy naturally follows somewhat behind the building of prototypes. The technical work done so far is nonetheless extremely encouraging, because it shows that it is possible to collect, catalogue, and deliver complex scholarly publications in a software- and hardware-neutral system that abstracts and generalizes the naming of component parts, the specification of their relations to one another, and the functionality of the publication as a whole, even when that publication is originally digital and involves structured data and complex multimedia elements. Many challenges remain, but we believe our high-end experiment in replicating the Salisbury Project publication within FEDORA has produced remarkable results, and we are also pleased with the progress to date on the low-end, automated tool that Rob Cordaro is developing and on Steve Ramsay’s Granby Suite.

 

We are also keen to pursue the partnership with Sun Microsystems. Sun’s wholehearted endorsement of open standards (XML and Java, for example) makes it an ideal corporate partner in an effort such as SDS, and the Sun Center of Excellence program, with its emphasis on three-way partnerships among academic research centers, Sun, and Sun software partners, makes it likely that production-quality commercial software will be developed to support and maintain digital library collections based on SDS principles and precepts. Participation in the Sun CoE program would not restrict our ability to distribute importation utilities, search engines, or other software that we develop on our own in the course of the SDS project.

 

In the coming year, we expect to see considerably more activity on the policy front (including potential publication of policy guidelines with the Council on Library and Information Resources), more frequent and more intensive interaction with outside experts and consultants, and, on the technical front, work on new problems of scale, data diversity, and integration with production environments.

 


Appendix A:   Papers delivered or proposed on matters relevant to SDS

 

Daniel Pitti (IATH): “Thematic Research Collections: A New Humanities Genre?” University College Dublin, October 1999.

 

Thornton Staples (DLR&D): “Supporting Digital Scholarship in the Digital Library,” Pacific Neighborhood Consortium Conference (January, 2000 – Berkeley, CA)

 

Daniel Pitti (IATH): “Thematic Research Collections: A New Humanities Genre?” University of Western Australia, March 2000.

 

Thornton Staples (DLR&D), John Unsworth and Ken Price (IATH): “Supporting Digital Scholarship,” panel discussion at the 2000 annual joint conference of the Association for Literary and Linguistic Computing and the Association for Computers and the Humanities, Glasgow, Scotland, July 25, 2000. Chris Jessee (IATH) also participated in the ALLC/ACH conference.

 

Steve Ramsay and Kirk Hastings (IATH): “The Granby Suite: Building XML Search and Delivery Architectures,” delivered at Extreme Markup Languages 2000 (August, 2000 – Montreal, Canada)

 

John Unsworth (IATH): “Second-Generation Digital Resources in the Humanities,” opening plenary address at Digital Resources in the Humanities 2000, Sheffield, England, September 10, 2000.

 

John Unsworth (IATH): “Supporting Digital Scholarship,” delivered as part of “New Models of Electronic Publication/Dissemination” at the Building Blocks Workshop of the National Initiative for Networked Cultural Heritage, Washington, D.C., September 20, 2000.

 

Daniel Pitti (IATH): “Electronic Publishing and Archiving,” Universitaet Trier, December 2000.

 

John Unsworth (IATH): “Thematic Research Collections: An Emerging Genre of Scholarly Publication,” delivered as part of “The Fate of the Scholarly Monograph,” Annual Convention of the Modern Language Association (December 28, 2000 – Washington, DC)

 

Sandy Kerbel and Joan Ruelle (Policy Committee): “Developing Collection Development Guidelines for Digital Content,” ACRL Conference (March, 2001 – Denver, CO)

 

Rob Cordaro, Kirk Hastings, Steve Ramsay, and Thornton Staples: “Progress of the Supporting Digital Scholarship Project,” panel proposed to the 2001 joint annual conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing, New York, NY, June 2001.

 


Appendix B:   Bibliographies

 

Readings on FEDORA, compiled by Ross Wayland, DLR&D

 

Fedora Home Page

This is the home page for the FEDORA project. It gives a brief overview of the FEDORA architecture and links to other papers by Carl Lagoze and Sandy Payette related to FEDORA and their research.

http://www.cs.cornell.edu/cdlrg/fedora.html

 

Fedora DLIB paper

This paper, published in D-Lib Magazine in May 1999, discusses the interoperability experiments between Cornell and CNRI using FEDORA and also provides an overview of the FEDORA architecture. It is less technical than some of the other research papers and is easy to read.

http://www.dlib.org/dlib/may99/payette/05payette.html

 

Flexible and Extensible Digital Object and Repository Architecture (FEDORA) paper

This is the primary research paper by Lagoze and Payette; it describes the FEDORA architecture in detail.

http://www2.cs.cornell.edu/payette/papers/ECDL98/FEDORA.html

 

Infrastructure for Open-Architecture Digital Libraries (Dienst implementation manual)

Dienst is a protocol and server that provide distributed document libraries over the World Wide Web. Dienst is based on a document model that incorporates unique document names, multiple document formats, and multiple document decompositions. Interoperability among Dienst servers presents the user with a single logical document collection, even though the actual collection is distributed across multiple servers. The Dienst protocol uses HTTP (the protocol of the World Wide Web) as a transport layer, making Dienst servers accessible from any WWW client. Dienst is currently used as the infrastructure for a distributed computer science technical report library by a number of U.S. universities. Although this paper is not about FEDORA, Dienst and FEDORA are based on similar concepts and have similar goals.

http://ncstrl.cs.cornell.edu:80/Dienst/UI/1.0/Display/ncstrl.cornell/TR98-1690

 

Kahn/Wilensky paper

If you want to start at the beginning, the Kahn/Wilensky paper is the root of most later work involving Dienst and FEDORA. The paper is very technical and abstract and not an easy read.

http://www.cnri.reston.va.us/cstr/arch/k-w.html

 

“Virginia Dons FEDORA: A Prototype for a Digital Object Repository.”

This article in the July 2000 issue of D-Lib Magazine describes the University of Virginia’s implementation of the FEDORA architecture.

http://www.dlib.org/dlib/july00/staples/07staples.html


Readings on Archive and Library Collecting, compiled by Daniel Pitti, IATH

 

General:

 

Collection Level Description: A review of existing practice. Edited by Andy Powell. Bath: UKOLN, 1999.

This study reviews existing practice for collection-level description in the library, archival, museum, and Internet communities. It begins by discussing what the term 'collection' means, first from the perspective of libraries, museums, and archives and then in its more recent sense on the World Wide Web. Finally, it examines in detail some of the existing schemes used for collection and service description.

 

Digital Library SunSITE Collection and Preservation Policy. Berkeley: University of California, Berkeley, Library, 1996.

 

Enduring Paradigm, New Opportunities: The Value of the Archival Perspective in the Digital Environment. Anne J. Gilliland-Swetland. Washington D.C.: CLIR, 2000.

This report examines how the archival perspective can be useful in addressing problems faced by those who design, manage, disseminate, and preserve digital information.

 

 

Preservation:

 

Digital preservation: a time bomb for Digital Libraries. Margaret Hedstrom.

 

Preserving Digital Information: Final Report and Recommendations. Mountain View, California: RLG and CLIR, 1996.

 

Authenticity in a Digital Environment. Charles T. Cullen, Peter B. Hirtle, David Levy, Clifford A. Lynch, and Jeff Rothenberg. Washington D.C.: CLIR, 2000.

 

Digital Preservation Needs and Requirements in RLG Member Institutions. Margaret Hedstrom and Sheon Montgomery. Mountain View, California: RLG, 1998.

 

Digital Preservation: Matching Problems, Requirements and Solutions. Margaret Hedstrom. Washington, DC: NSF Workshop on Data Archival and Information Preservation, 1999.

 

Reality and Chimeras in the Preservation of Electronic Records. David Bearman. Washington, DC: D-Lib Magazine, April 1999.

 


Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation. Jeff Rothenberg. Washington D.C.: CLIR, 1998.

The report follows up Dr. Rothenberg's 1995 article in Scientific American, "Ensuring the Longevity of Digital Documents," by elaborating the author's proposal to emulate obsolete software and hardware systems on future, unknown systems as a means of preserving digital information far into the future. The report, and the research agenda it proposes, will be of interest to managers of digital information resources in libraries and archives, to computer scientists, and to all those concerned about the preservation of intellectual resources and records in all formats, including government records, medical records, corporate data, and environmental and scientific data.

 

Best Practices for Digital Archiving: An Information Life Cycle Approach. Gail M. Hodge. Washington, DC: D-Lib Magazine, January 2000.

 

Long-term Preservation of Electronic Publications: The NEDLIB project. Titia van der Werf-Davelaar. Washington, DC: D-Lib Magazine, September 1999.

 

Canonicalization: A Fundamental Tool to Facilitate Preservation and Management of Digital Information. Clifford Lynch. Washington, DC: D-Lib Magazine, September 1999.

 

Preservation Management of Digital Materials. Neil Beagrie and Maggie Jones. UK: JISC, 2000.

 

Description and Access:

 

Systems of Knowledge Organization for Digital Libraries: Beyond Traditional Authority Files. Gail Hodge. Washington D.C.: CLIR, 2000.

This report examines the use of knowledge organization systems—schemes for organizing information and facilitating knowledge management in a digital environment.

 

Architecture:

 

Flexible and Extensible Digital Object and Repository Architecture (FEDORA). Sandra Payette and Carl Lagoze. Ithaca: Cornell University, 1998.

 

Collection Based Persistent Archives. Reagan Moore. Washington, DC: NSF Workshop on Data Archival and Information Preservation, 1999.

 

Reference Model for an Open Archival Information System (OAIS). Washington, DC: CCSDS Secretariat, Program Integration Division (Code MG), National Aeronautics and Space Administration, 1999.

 

 


Metadata:

 

The Making of America II Testbed Project: A Digital Library Service Model. Bernard J. Hurley, John Price-Wilkin, Merrilee Proffitt, Howard Besser. Washington D.C.: CLIR, 1999.

The work of the Making of America II Testbed Project reported in this paper represents a singular effort in digital library development to find ways to provide access to and navigate a variety of materials. In this endeavor, a digital library service model has been defined that encapsulates the interaction of digital objects (including their metadata), tools, and services based on principles of object-oriented design. In developing the digital library service model, project participants did extensive work to identify and define the structural and administrative (often referred to as technical) metadata elements that are crucial in the development of the digital library services and tools.

 

Metadata for digital preservation: an update. Michael Day. Bath: UKOLN, 1999.

 

Issues and approaches to preservation metadata. Michael Day. Bath: UKOLN, 1999.

 

Extending metadata for digital preservation. Michael Day. Bath: UKOLN, 1997.


 

Appendix C:   Income and Expenditures

 

SDS Budget, 1/00 – 12/02 (US dollars)

Item                               Revenue    Salaries  Consulting      Travel    Training    Research     Balance
Grant Budget                                631,000.00   30,000.00   75,000.00   39,000.00  225,000.00
C. Jessee (Feb.)                             -3,786.92
FB                                           -1,268.62
Thomas Tobin (Rossetti)                                                                        -300.00
C. Jessee (March)                            -3,786.92
FB                                           -1,268.62
Chris Jessee (Rinne)                                                                           -191.64
C. Jessee (April)                            -3,786.92
K. Hastings (April)                          -3,124.99
FB                                           -2,315.49
Revenue 4/25/00                  13,290.23
C. Jessee (May)                              -3,786.92
K. Hastings (May)                            -4,166.67
FB                                           -2,664.45
Cavalier Computers (Rinne)                                                                   -3,378.00
ALLC-ACH conference fees                                             -1,510.59
Covin Travel (K. Price)                                              -1,274.88
Catering                                                                            -51.25
Catering                                                                           -493.90
C. Jessee (June)                             -3,786.92
K. Hastings (June)                           -4,166.67
C. Jessee (July)                             -3,786.92
K. Hastings (July)                           -4,166.67
FB                                           -5,010.76
Revenue 7/27/00                  13,832.33
ALLC-ACH airfare                                                       -964.90
Rotunda for meeting                                       -30.00
Subtotal:                        27,122.56  580,125.54   29,970.00   71,249.63   38,454.85  221,130.36  968,052.94

R. Cordaro (Aug.)                            -2,404.12
K. Hastings (Aug.)                           -4,166.67
C. Jessee (Aug.)                             -3,786.92
FB                                           -3,262.68
Subtotal:                        27,122.56  566,505.15   29,970.00   71,249.63   38,454.85  221,130.36  954,432.55

R. Cordaro (Sept.)                           -4,808.25
K. Hastings (Sept.)                          -4,166.67
C. Jessee (Sept.)                            -3,786.92
W. Hughes (Rossetti)                                                                            -42.50
FB                                           -4,019.98
Market Street Wine                                       -142.68
Travel agent fee                                                        -15.00
Subtotal:                        27,122.56  549,723.33   29,827.32   71,234.63   38,454.85  221,087.86  937,450.55

R. Cordaro (Oct.)                            -4,808.25
K. Hastings (Oct.)                           -4,166.67
C. Jessee (Oct.)                             -3,786.92
Student wage, Rossetti (Oct.)                                                                  -838.75
FB                                           -4,019.98
Thornton Staples                                                     -3,090.75
R. Cordaro
John Unsworth (D. Smith)                                  -35.44
Red Roof Inn (D. Smith)                                  -131.38
Revenue 10/00                    14,431.95
Subtotal:                        41,554.51  532,941.51   29,660.50   68,143.88   38,454.85  220,249.11  931,004.36

R. Cordaro                                                                         -599.00
M. Hedstrom                                            -1,000.00
D. Pitti                                                  -40.49
D. Pitti                                                 -209.63
Subtotal:                        41,554.51  532,941.51   28,410.38   68,143.88   37,855.85  220,249.11  929,155.24

Sun (Cordaro/Java)                                                               -1,995.00
Perry Roland (XML)                                                                 -759.76
C. Jessee (Flash)                                                                  -592.82
Totals:                          41,554.51  532,941.51   28,410.38   68,143.88   34,508.27  220,249.11  925,807.66
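At every Subtotal row in Appendix C, the Balance column equals the interest revenue received to date plus the unspent amount left in each of the five budget lines. The short Python sketch below is offered only as an illustrative cross-check of that arithmetic, not as part of the project's software or accounting procedure. It re-derives the year-end Totals row from the Grant Budget row and the individual 2000 entries as transcribed from the ledger. The names `GRANT_BUDGET`, `REVENUE`, `EXPENDITURES`, and `year_end_totals` are hypothetical and were introduced for this illustration. The "FB" entries are counted under Salaries, as the ledger does, and the October "R. Cordaro" row, which carries no amount, is omitted.

```python
from decimal import Decimal


def amounts(*values):
    """Convert dollar strings to exact Decimal values (avoids float rounding)."""
    return [Decimal(v) for v in values]


# Grant Budget row of Appendix C: original allocation per budget line.
GRANT_BUDGET = {
    "Salaries": Decimal("631000.00"),
    "Consulting": Decimal("30000.00"),
    "Travel": Decimal("75000.00"),
    "Training": Decimal("39000.00"),
    "Research": Decimal("225000.00"),
}

# Interest income posted in 2000 (Revenue column: 4/25/00, 7/27/00, 10/00).
REVENUE = amounts("13290.23", "13832.33", "14431.95")

# Expenditures posted in 2000, as positive amounts, grouped by budget line
# in the order they appear in the ledger. Salaries include the "FB" rows.
EXPENDITURES = {
    "Salaries": amounts(
        "3786.92", "1268.62",                        # Feb.
        "3786.92", "1268.62",                        # March
        "3786.92", "3124.99", "2315.49",             # April
        "3786.92", "4166.67", "2664.45",             # May
        "3786.92", "4166.67",                        # June
        "3786.92", "4166.67", "5010.76",             # July
        "2404.12", "4166.67", "3786.92", "3262.68",  # Aug.
        "4808.25", "4166.67", "3786.92", "4019.98",  # Sept.
        "4808.25", "4166.67", "3786.92", "4019.98",  # Oct.
    ),
    "Consulting": amounts("30.00", "142.68", "35.44", "131.38",
                          "1000.00", "40.49", "209.63"),
    "Travel": amounts("1510.59", "1274.88", "964.90", "15.00", "3090.75"),
    "Training": amounts("51.25", "493.90", "599.00", "1995.00",
                        "759.76", "592.82"),
    "Research": amounts("300.00", "191.64", "3378.00", "42.50", "838.75"),
}


def year_end_totals(budget, revenue, spent):
    """Return (total revenue, unspent amount per budget line, balance)."""
    zero = Decimal("0")
    remaining = {line: budget[line] - sum(spent.get(line, []), zero)
                 for line in budget}
    total_revenue = sum(revenue, zero)
    # Balance = revenue received + everything not yet spent in each line.
    balance = total_revenue + sum(remaining.values(), zero)
    return total_revenue, remaining, balance


if __name__ == "__main__":
    revenue, remaining, balance = year_end_totals(
        GRANT_BUDGET, REVENUE, EXPENDITURES)
    print(f"Revenue:    {revenue:,.2f}")
    for line, left in remaining.items():
        print(f"{line + ':':<11} {left:,.2f}")
    print(f"Balance:    {balance:,.2f}")
```

Every figure the script prints matches the corresponding value in the Totals row of the ledger, with a balance of 925,807.66. `Decimal` is used rather than floating-point numbers so that the cent amounts add exactly.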