Interfacing the Rossetti Hypermedia Archive



Bethany Nowviskie,
Design Editor, Rossetti Archive

[This is the unedited text of a talk
given at the 2000 conference of the
Humanities and Technology Association.]

The English Department grad students who tagged the Rossetti Archive had a long-standing joke. The Archive was neither hardware nor software, but something much better suited to us as humanists -- to our seven-types-of-ambiguity, how-does-this-poem-make-you-feel mentalities. The Rossetti Archive was vaporware.

And the Archive did fit the classic profile for vaporware. Like Windows 2000 and Diablo II, it was much-hyped and long in coming. For many years you could read twenty words of promise, description, praise, and criticism for every one word of Rossetti's verse the project had made available to the public. And even after the first four or five years, when we had devised an architecture for the whole and created a substantial body of SGML-tagged text ready to ship, publishing problems delayed release of the Archive again and again.

But we meant something slightly different -- and much more visceral -- by the term "vaporware." It seemed to capture perfectly the sensation of tagging the Archive. We were diligently weaving the void. There was nothing you could touch in our day-to-day work, and very little to see. The Rossetti Archive was created almost entirely in the absence of an interface.

In retrospect, I find it surprising that we protested this situation so little. The invariable first question that new taggers asked on seeing our SGML was, "And how does this look in the real Archive -- in a browser window -- when rendered to the screen?" You'd think that proximity to our sister ship, the William Blake Archive, would have made us more conscious of what Innocence has to teach Experience -- but we cultivated a patient, vaporware culture. If we thought about the eventual look-and-feel of the Archive at all, it was in a wistful way. "Hope I live to see it." Interface would be the icing on the cake; it had nothing to do with the real work of building scholarly electronic editions.

If you all don't mind, I'm going to embrace my role as the old fogey of the Rossetti Archive this morning and reminisce a little for you. I'll sketch a few of the problems caused by the lack of an organic, evolving interface -- an interface that grew along with our data and could give access to it more perfectly -- and tell you what made us realize that we lacked this interface so sorely. Then I'll attempt a historical or institutional explanation of how the Invisible Archive came to dominate our thinking and finally suggest a perhaps not-so-obvious lesson in all this for computing humanists.

I'd keep you here all day if I tried to tell the whole tale of interface and the Rossetti Archive. The nature of our work at IATH is the kind of bristly, corrosive trial-and-error that is always etching inward from one supposed ideal form to the purer metal of its core. It's nonstop. So I'm just going to bring a few of our acid-process moments to your attention -- points where we looked at what we had always had in front of us and realized the hot imperative to make it new.

You see, we're still finding points at which our DTDs are inadequate in terms of interface. Hardly a month goes by now -- and, if you'd asked me during the time last winter when I was writing the stylesheets that finally fleshed out the Archive, I'd have said hardly a day goes by -- that we don't run into some longstanding tagging practice that makes our data hard to interface.

Let me explain what I mean. And I think to understand how this kind of thing is even possible you need to understand that from the outset, our attitude toward programming and interface was shaped in part by our relationship with a separate University Press -- hundreds of miles away from our shop here in Charlottesville -- that was supposed to publish the Archive. I'll talk more about that in a minute, but I'm about to tell you something embarrassing and I want you to at least be able to comprehend how we could make such a truly dense mistake.

I'm talking about one little entity attribute that we neglected to add to our DTDs, and therefore to each and every one of the thousands of files in the Archive. This attribute eventually came to be placed on the base element of all our SGML files and, to differentiate it from the "type" attribute we'd already included, we called it the "metatype." Once I explain what a "type" is, you'll probably immediately see what we, with our noses pressed so close to our UNIX editors, could not. A type is a kind of object, sometimes a genre. For SGML files representing poems, we had types like "sonnet" and "lyric." For prose files, we had "essay," "short story," -- things like that. We had types for images as well: "drawing," "painting," and so forth. Now when we sent our data to our publisher, we did so with a blithe request that the basic interface to the Archive be a gateway page through which the user could access broad groupings of Rossetti's work -- indeed of his work and all the contextual material that we'd assembled. Basically, we were saying, "Put all the poems together, all the prose. Group all visual art. Make a link for items NOT by Rossetti. Oh, and because of Rossetti's fascinating tendency to produce a painting to go along with his poems, or compose a poem for a picture, please group these special objects together as well." It's almost incomprehensible that we didn't realize we hadn't provided a programmer enough information to do all that. Especially if you could see the great complexity of our markup and the obvious critical power brought to bear in the design of our DTDs. A programmer could get a little way toward our most basic request with the "type" attributes, if we specified for him that types X and Y really fit under unspecified metatype Z. 
But he couldn't go all the way, because we hadn't included specialized types for the non-Rossetti material, for example -- and because, in the case of the doubleworks, those poem/picture combinations, we had given each component -- the SGML file for the text and the one for the image -- an individual but not a collective "type." So metatypes were born -- late -- and we were left wondering how on earth we had neglected the most fundamental piece of architectural information in the Archive.
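
To make the gap concrete, here is a minimal sketch in Python of what a programmer building that gateway page would have needed. It is not anything the Archive actually ran. The type names ("sonnet," "lyric," and so on) come from the examples above; the metatype labels, the TYPE_TO_METATYPE table, the file IDs, and the group_by_metatype function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from the "type" values mentioned in the talk to broad
# groupings. The metatype labels are illustrative, not the Archive's own.
TYPE_TO_METATYPE = {
    "sonnet": "poem",
    "lyric": "poem",
    "essay": "prose",
    "short story": "prose",
    "drawing": "picture",
    "painting": "picture",
}


def group_by_metatype(files):
    """Group (file_id, type, explicit_metatype) records for a gateway page.

    An explicit metatype wins; otherwise fall back to the type table.
    Anything that cannot be resolved either way lands in "unclassified".
    """
    groups = defaultdict(list)
    for file_id, type_, metatype in files:
        resolved = metatype or TYPE_TO_METATYPE.get(type_, "unclassified")
        groups[resolved].append(file_id)
    return dict(groups)


# Made-up records: (file id, type, explicit metatype or None).
files = [
    ("f1", "sonnet", None),
    ("f2", "painting", None),
    ("f3", "essay", None),            # if not by Rossetti, "type" cannot say so
    ("f4", "sonnet", "doublework"),   # poem half of a poem/picture pair
    ("f5", "painting", "doublework"), # picture half of the same pair
]

groups = group_by_metatype(files)
```

The type table alone gets f1, f2, and f3 into broad groups, but nothing in a "type" value could ever pull f4 and f5 together as a pair, or flag f3 as someone else's writing. That information has to be stated in the data, which is the job the metatype attribute ended up doing.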

My temptation is to dig right in and start making excuses and explanations, but I'm going to hold off until I've presented a couple of different cases to you. And maybe do my apologizing along the way. The discovery of our metatype problem, as you can imagine, led to several feverish days of making the kind of judgements about documents and concepts that only human beings well versed in our specialized subject matter could make, and of inserting most of these attributes by hand. But I think of that problem as an early moment of revelation in which interface collided with ignorance. We had some instances -- also in that very fruitful summer a few years ago -- in which we did begin to cast our imaginations toward the eventual form all our parsed and validated data would take. Those, I think of as points at which interface collided with intellect, and that's what I'm going to talk about next.

One of these collisions happened at the corner of Content and Rendering. It took several forms, but one I want to talk about concerns quotations. The Rossetti Archive contains as marked-up document (or RAD) files several volumes of criticism and supplementary material. Things not written by Rossetti, but rather about him -- and naturally these documents quote DGR frequently. Our early tagging practice was simple. Any quoted passage within another text would be surrounded by <quote> tags. <quote>Now I'm quoting.</quote> And it took us about a year to see any problem with that method. (It had taken us three years to even see the point in quote tags period.)

You see, if you're not thinking about interface -- about how the document you're tagging will eventually appear to a user -- then it's very hard to see a problem here at all. The issue is this. We had conflated the content-message of those <quote> tags with the rendering-message of quotation marks. When we saw a quotation on the page of the book we were tagging from, we generally recognized it as such in two ways at once. Its message -- the content -- what that passage of text said -- identified it as a quotation. Suddenly another voice speaks. Clear enough. But there was usually also a visual cue -- a typographical one. The quoted text would be surrounded by quotation marks. When we tagged, because typography (or signifier, or rendering) and text (or signified, or content) seemed so naturally inextricable, we did not realize that one type of tag for quotations was inadequate. And it wouldn't be inadequate at all, if human beings weren't the producers of text. That is, if EVERY quotation we would encounter in the tagging of the Rossetti Archive could be guaranteed to be surrounded by quotation marks, we'd have no problem. But the minute our taggers could identify a single case in which a string of text was conceptually a quotation but not surrounded by the punctuation that marked it as one, we'd have presented our as-yet invisible interface with a real conflict.

Because interface is unforgiving. It doesn't say "Let me help you express the ambiguity you feel in your heart about Rossetti's work" -- or about textuality in general -- the real ambiguity that our corrosive work helps us see more and more. No. Interface says "Give me clean data. Clear rules. I will take your data and render content for your users."

So thinking about interface -- about how the material we were tagging, given the rules we had laid out for ourselves in the DTD and the rules we'd eventually have to state in a stylesheet, would appear to a user of the Archive -- thinking about this led us to institute a new quotation-tagging system. <Quote> tags would be used to indicate passages of text that were conceptually quotations. Character entities for left-single-quotes or right-double-quotes would be used to indicate precisely how the quotation was rendered typographically in the source document. It's a good system, but we're still picking out the bugs in texts tagged before our moment of clarity.
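
The two-layer system just described can be sketched in a few lines of Python. This is an illustration under assumptions, not the Archive's stylesheet: the entity names (&ldquo; and friends) are assumed, since the actual entity set isn't specified here, and the render and quotations functions are hypothetical helpers.

```python
import re

# Assumed entity names for the typographic marks; the Archive's actual
# entity set is not given in the talk.
ENTITIES = {
    "&lsquo;": "\u2018",
    "&rsquo;": "\u2019",
    "&ldquo;": "\u201c",
    "&rdquo;": "\u201d",
}

QUOTE_TAG = re.compile(r"</?quote>", re.IGNORECASE)
QUOTE_SPAN = re.compile(r"<quote>(.*?)</quote>", re.IGNORECASE | re.DOTALL)


def expand_entities(text):
    """Replace each known entity reference with its character."""
    for name, char in ENTITIES.items():
        text = text.replace(name, char)
    return text


def render(tagged):
    """Display text: drop <quote> tags and add no punctuation of our own."""
    return expand_entities(QUOTE_TAG.sub("", tagged))


def quotations(tagged):
    """Conceptual quotations, whether or not the source printed marks."""
    return [expand_entities(span) for span in QUOTE_SPAN.findall(tagged)]


# Same conceptual quotation, with and without typographic marks in the source.
marked = "He wrote, &ldquo;<quote>Now I'm quoting.</quote>&rdquo;"
unmarked = "Then another voice: <quote>Now I'm quoting.</quote>"
```

Rendering never invents quotation marks: marked comes out with its curly quotes because the entities say so, and unmarked comes out bare. Both strings still report the same conceptual quotation, because that information lives in the tags alone.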

Now that's just quotations. Imagine the kind of discussion caused by textual features like line-breaks in prose (sometimes even coming in the middle of a word), poetry indentations, running titles (which to date we've completely neglected to address) -- things like that. Our work continues to fascinate me because of the way practical interface-related questions lead us to re-examine and sometimes dissolve calcified concepts. And to realize that understanding interface itself is fundamental to understanding documents. Recently one of our new employees took a long look at some books that have been "finished" -- all tagged up -- for years and said to me, "Why aren't all the titles of poems marked as being centered on the page?" My kind of complacent answer was, "I noticed we neglected that and I centered them in the stylesheet when I was building the interface -- instead of here. It's easier to fix the stylesheet than all the data, and they're always centered." His answer? "Yeah, all except THIS ONE."

So that's one of those moments when we groan and steel ourselves to a tragically boring period of SGML fixes. But we all know that that "what about THIS ONE" moment is what we're waiting for. It's our chance to step back and watch the acid work away at the very concept of titles.

I'm going to tell you about one more case of interface colliding with intellect before I start really preaching at you.

As you've heard from Maura, we've got this file type called a RAW, a Rossetti Archive Work, that is meant to function as a gateway to any visual and textual documents related to its primary concept -- related to a particular artistic or literary work. It also serves as a storage bin for general information of all kinds about that work. So the RAW for the text of the Blessed Damozel, our favorite example, houses the composition and publication history of the poem, our annotations to each line (keyed to a complex scheme that accounts for discrepancies in the line numbering of multiple drafts and versions), a general introduction, dates and so forth. In theory, everything we'd want to give a researcher before he heads off in the wilderness of manuscripts, proofs, and editions. The RAW is the basic intellectual unit of the Rossetti Archive, and is therefore (perhaps surprisingly, given Jerry McGann's own editorial stance) the basic unit of the theory of editing that we're embodying as we build the Archive. It's a familiar concept to editors -- the idea that there is an identifiable WORK, distinct from the transient, changeable documents that give us access to it at any one time. And the concept of the RAW proved to be a fairly sturdy one. If any thought had been given to interface at the outset of work on the Archive, it was in the context of the RAW. The RAW would be the point of interface with all the pictures and texts, RAPs and RADs.

But early one summer we found some tagging problems in the SGML file that represented Rossetti's Collected Works, a very thick book published in 1911. Fixing up the 1911 became my project, and nobody suspected that my investigation into it was going to cause a massive restructuring of the Archive and propel me into the role of Chief Interface Harpy. But the 1911 was the book that had been used to lay out the Invisible Archive. Jerry or somebody from before my time had used the table of contents of the 1911 to create a list of all of Rossetti's known works. A list of RAWs. As I went through the document itself, our list, and the RAW files referenced by the 1911 -- that is, as I began to move from RAD to RAW just as a user of the Archive might, I began to notice redundancy that wasn't evident when you looked at our RAWs one at a time, rather than in an intuitive, research-like sequence.

We had created a RAW for every item in the 1911 that had its own listing in the table of contents. Which seems perfectly reasonable. If it's an object that can be named, then isn't it a work? Yes, intellectually. But if interface is unforgiving, it's also often unintellectual. Following our own logical system, we had given separate RAWs to each component in Rossetti's many multi-poem works. My favorite example is the Willowwood sonnet sequence. Four sonnets. Four RAWs. Logical, as you can clearly speak about Willowwood III as a separate entity from Willowwood I. But because we had structured our DTD so as to make RAWs serve as holding pens for general information, we were requiring ourselves to fill in much of the SAME general information for these sonnets FOUR TIMES. Okay, inefficient, but already over and done with. So what? Well, once you step through the screen and put yourself on the side of the user, you realize how frustrating and difficult it would be to sift through 90% of the same information four times to get to the 10% you want. And the problem was worse than that -- because we'd also made our DTD require RAWs to serve as gateways to other things. But for multi-component works like Willowwood, well, what was going to be the gateway to the components? What was going to hold Willowwood together? Our logical system of file IDs -- a number code -- which would call a manuscript or pencil sketch of the Blessed Damozel out of the void and marry it to its proper RAW was going to crash and burn here. Why? Because each component of Willowwood had been given an unrelated number. 
To make a long story at least a little shorter, all this led to some of the most exciting discussions I've ever had about the nature of artistic creations and how humans and machines could understand and represent them, to a lot of people scribbling on whiteboards, and a month or so of drudgery in the name of interface, as we worked to collapse these multiple RAW sets and make the pure data of the Archive more fit for human consumption.
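
For the curious, the shape of that collapse can be sketched in Python. Everything here is hypothetical: the record fields, the ID strings, and the placeholder text stand in for real RAW content, and the collapse function is only one plausible way to do what we did largely by hand.

```python
def collapse(parent_id, title, component_raws):
    """Fold per-component RAW records into one parent record.

    Fields holding the same value in every component are stored once on
    the parent; whatever differs stays with the component.
    """
    first = component_raws[0]
    shared_keys = [
        key
        for key in first
        if key != "id"
        and all(key in raw and raw[key] == first[key] for raw in component_raws)
    ]
    return {
        "id": parent_id,
        "title": title,
        "shared": {key: first[key] for key in shared_keys},
        "components": [
            {k: v for k, v in raw.items() if k not in shared_keys}
            for raw in component_raws
        ],
    }


# Four component records with unrelated (made-up) IDs and placeholder text.
willowwood = [
    {"id": "raw-0412", "title": "Willowwood I",
     "history": "placeholder: shared history", "notes": "placeholder: notes on I"},
    {"id": "raw-0977", "title": "Willowwood II",
     "history": "placeholder: shared history", "notes": "placeholder: notes on II"},
    {"id": "raw-0103", "title": "Willowwood III",
     "history": "placeholder: shared history", "notes": "placeholder: notes on III"},
    {"id": "raw-0658", "title": "Willowwood IV",
     "history": "placeholder: shared history", "notes": "placeholder: notes on IV"},
]

merged = collapse("raw-willowwood", "Willowwood", willowwood)
```

The parent record now does both jobs a RAW is supposed to do: it holds the general information once, and it is the gateway that ties four otherwise unrelated IDs together. What the sketch leaves out is the hard part, deciding which works are really components of something larger, and that took people and whiteboards.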

So now I've gone perhaps overlong telling you about how interface collided for us with both ignorance and intellect. I left out inconsistency, but take my word for it that interface thumps us there regularly, too. That's the all-important "Who do you think you're tagging for?" question. Are we marking the text for a user? If so, what kind of user? What does he want? Are we marking it for a machine to manipulate? If so, are we up to the job? Or should we at the highly self-involved Rossetti Archive finally just up and admit that we're really tagging for ourselves -- for the process -- to watch things corrode?

Those are critical questions, not only for us, but for anyone working in the realm of scholarly editing. And that includes many -- perhaps most -- computing humanists.

Why did we neglect interface until it was almost too late? Partly because we needed somebody to harp on the subject, and once it hit me I was glad to oblige. Which simply means that the lifeblood of a project like this is new people, new ideas. We have a shop full of new people right now, and I think we can count on beginning to expand in ways we hadn't dreamed before. But part of the reason we neglected interface was because of an unsavory combination of condescension toward and blind faith in programmers. Our own clarity on the subject has increased, I think, precisely in step with our education in technical issues, programming and stylesheet languages -- all the things that academics are too likely to leave to others.

Last year I had the privilege of helping John Unsworth run a year-long faculty seminar on humanities computing in preparation for a proposed MA program in Digital Media here at Virginia. We had humanists and technicians from all over the world come here to talk to us about what the next generation of scholars in this field needs. The answer was clear: we need to be tool-builders and not just tool-users. And that means that we need to look closely at the very concept of interface -- how does this hammer fit your hand? how does it strike the nail? -- as we construct the world's archives and research collections.

The Rossetti interface was not allowed to grow thoughtfully and organically along with the Archive's content model and data. We thought somebody else would make it, sometime later -- after we were done with all the brainy work. I'm afraid that our interface will therefore always have something of a tacked-on quality. And I can see other projects tripping down that road. It's a very tempting route for scholars to take. You see, despite the fact that we were trained in bibliography -- the most practical, do-it-yourself humanistic discipline next to Archaeology -- and despite some of our very eloquent professions to the contrary, I don't believe we really thought we were a practical project. We were interested in theory, so much so that we almost lost sight of something we've since kept in close view: practical problems ARE intellectual problems. And the point at which theory and practice meet is the place where all things meet: the interface.

bethany@virginia.edu