The point of departure is the electronic edition of one of Joyce's notebooks for Finnegans Wake. The 'Guiltless notebook' (47471b, so called because the first word is 'Guiltless') is about 90 pages long and contains very early draft versions of chapters 2, 3, 4, 5, 7 and 8 of Book I. Over a number of years the James Joyce Center at the University of Antwerp (UIA) has made a full transcription of this notebook using TEI-conformant XML/SGML. In a first phase this highly complex collection of textual units was approached from a teleological standpoint: the transcriptions were made on the basis of the narrative structure of the novel as it was first published years later, and encoded in order to leave open the possibility of other approaches (chronological or document-oriented) in a second phase.[1]
A first point of focus in this paper will be the transcription itself. Seamus Deane called Finnegans Wake "in an important sense, unreadable" (Deane 1992, vii) and the same applies to Joyce's notebooks. Joyce intended to write only on the recto pages, but almost immediately started using the verso pages for additions too lengthy to write between the lines or in the margins, and soon thereafter scribbled whole additional paragraphs on the versos, all linked to each other by means of a confusing and unsystematic complex of lines and sigla. Because of the complexity of this manuscript, we've had to stretch the tags defined in the full XML-ized TEI Document Type Definition up to and beyond their limits, and to decide somewhat arbitrarily when to store information in elements, in attributes or as data content.[2] Specifically for encoding additions and deletions on a manuscript page, TEI offers too few distinctive attributes for the editor to record the information about the type of addition or deletion that he should be able to provide. These limitations imposed by the TEI DTDs endanger the third "overarching goal" C. M. Sperberg-McQueen proposes for 'serious electronic editions': "accessibility, longevity, and intellectual integrity". Sperberg-McQueen correctly argues that accessibility and longevity are secured by the TEI encoding scheme, but has to admit that "The intellectual integrity of materials encoded with the TEI encoding scheme is harder to guarantee." (Sperberg-McQueen 1994).
The following sample code demonstrates how hard it is to encode a clear distinction between additions Joyce made immediately (and wrote inline) and those he added later while rereading (and most of the time wrote above or below the line):
He had left the country <del type="O" rend="overstrike" resp="DVH">by </del>
<add place="inline" hand="JJ" resp="DVH"><hi rend="italic">via</hi></add>
a subterranean tunnel<add place="facingleaf, 47471b-27v" rend="$I" hand="JJ" resp="DVH">
<del type="S" rend="overstrike" resp="DVH"> lined </del>
<xref doc="shored"><add place="supralinear, 47471b-27v" hand="JJ" resp="DVH">shored </add>
with bedboards</xref>.
<add place="marginleft, 47471b-27v" hand="JJ" resp="DVH">An infamous private ailment
(vario<add place="supralinear, 47471b-27v" hand="JJ" resp="DVH">lo</add>venereal)
had claimed him.</add></add>
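To make the problem concrete, the sample above can be inspected with a few lines of Python. This is only a sketch and not part of the edition's own toolchain: the `<seg>` wrapper and the function name `list_additions` are assumptions introduced here so that the fragment parses as one document. It lists every addition with the two pieces of information packed into its place attribute, and shows that an immediate addition and a later one differ only in that string value.

```python
import xml.etree.ElementTree as ET

# The sample fragment from above, wrapped in a hypothetical <seg> root
# so that it parses as a single well-formed document.
SAMPLE = """<seg>He had left the country <del type="O" rend="overstrike" resp="DVH">by </del>
<add place="inline" hand="JJ" resp="DVH"><hi rend="italic">via</hi></add>
a subterranean tunnel<add place="facingleaf, 47471b-27v" rend="$I" hand="JJ" resp="DVH">
<del type="S" rend="overstrike" resp="DVH"> lined </del>
<xref doc="shored"><add place="supralinear, 47471b-27v" hand="JJ" resp="DVH">shored </add>
with bedboards</xref>.
<add place="marginleft, 47471b-27v" hand="JJ" resp="DVH">An infamous private ailment
(vario<add place="supralinear, 47471b-27v" hand="JJ" resp="DVH">lo</add>venereal)
had claimed him.</add></add></seg>"""


def list_additions(xml_text):
    """Return (position, page, text) for every <add>, in document order."""
    root = ET.fromstring(xml_text)
    rows = []
    for add in root.iter("add"):
        # The place attribute holds a position and, optionally, a page
        # reference, separated by a comma.
        position, _, page = add.get("place", "").partition(",")
        # Collect all text inside the addition and normalise whitespace.
        text = " ".join("".join(add.itertext()).split())
        rows.append((position.strip(), page.strip(), text))
    return rows


additions = list_additions(SAMPLE)
for position, page, text in additions:
    print(f"{position:12} {page or '-':12} {text}")
```

Nothing in the markup itself says whether an addition belongs to the first writing campaign or to a later rereading; a program has to infer it from the position string, which is exactly the kind of information the editor would prefer to state explicitly.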
A second point of focus will be the usability of this transcription: can this XML archive be used to generate automatically all the different visualisations of the manuscript material that textual critics have come to expect in an electronic edition? Or is our encoding so dependent on the way we structured the material that it can no longer be broken down and restructured automatically? With reasonable success, I've been using XSL Transformations to this end. I've transformed the teleologically structured files (section per section) into manuscript-orientated files (page per page) through three consecutively run XSLTs. I've generated unique IDs in certain tags, generated new elements from attribute values and automatically linked them to corresponding anchors in the edition.
XSLT is a very powerful tool. It has, however, been developed for and by people who use it to extract information from databases. In the humanities we deal with texts. In a database, information is stored mainly in elements; there is no advantage in 'locking away' information as an attribute value. Encoded literary texts, on the other hand, are linear and have to remain legible at all times, so any information the editor wishes to add ends up in attributes. This results in very 'heavy' tags, which limit the possibilities of XSLT a great deal. But if you take some specific factors into account while encoding, your archive can still benefit from the unmistakable power of XSLT:
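Two of the steps just mentioned, generating unique IDs and deriving new elements from attribute values, can be illustrated outside XSLT. The following Python sketch is an assumption-laden stand-in for the stylesheets, not a description of them: the function name `number_additions`, the `add` prefix, and the `<index>` and `<ref>` elements are all hypothetical.

```python
import xml.etree.ElementTree as ET


def number_additions(root, prefix="add"):
    """Give every <add> a unique id and build a separate index of references.

    The <index>/<ref> vocabulary is hypothetical, chosen only for this sketch.
    """
    index = ET.Element("index")
    # Materialise the list first so the tree is not walked while it changes.
    for number, add in enumerate(list(root.iter("add")), start=1):
        identifier = f"{prefix}{number}"
        add.set("id", identifier)
        # Turn the packed attribute value into separate, queryable fields.
        position, _, page = add.get("place", "").partition(",")
        ref = ET.SubElement(index, "ref", target=identifier)
        ref.set("position", position.strip())
        if page.strip():
            ref.set("page", page.strip())
    return index


doc = ET.fromstring(
    '<p>tunnel<add place="facingleaf, 47471b-27v">lined '
    '<add place="supralinear, 47471b-27v">shored</add></add>'
    '<add place="inline">via</add></p>'
)
index = number_additions(doc)
print(ET.tostring(index, encoding="unicode"))
```

Keeping the generated references in a separate index leaves the running text untouched, so the transcription stays legible while the links remain machine-readable.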
- Predefine all attribute values or make an inventory of all values used, because XSLT can only match exact strings, not regular expressions.[3]
- Make tags context-independent so they don't lose all meaning and usability when you extract them from their context or restructure the data using XSLT.
E.g., an addition encoded simply as <add place="facingleaf"> when it is on the facing leaf of page 4 depends, for its absolute location in the manuscript, on the tag <pb n="4"/> which precedes it. If you extract all 'facingleaf' additions from their context using XSLT, the resulting data is unusable.
- Try not to make the divisions in your document dependent on empty tags like <pb />. XSLT only copies from start tag to end tag, and the latter is of course missing in empty tags. For instance, XSLT cannot extract everything from <pb n="12" /> up to <pb n="24" /> in a 100-page document.
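The last two recommendations can be illustrated with a short Python sketch, offered as an assumption rather than as part of the edition. It walks a document in reading order, remembers the most recent <pb/>, stamps each addition with that page number so the tag no longer depends on its context, and collects the text of each page even where a page runs across a division boundary. The function name `pages_from_milestones`, the `page` attribute and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def pages_from_milestones(root):
    """Group text by the preceding <pb/> and stamp each <add> with its page."""
    pages = {}
    current = None  # value of n on the most recent <pb/>

    def visit(element):
        nonlocal current
        if element.tag == "pb":
            # An empty milestone: everything after it belongs to a new page.
            current = element.get("n")
            pages.setdefault(current, [])
            return
        if element.tag == "add" and current is not None:
            # Hypothetical attribute making the addition context-independent.
            element.set("page", current)
        if element.text and current is not None:
            pages[current].append(element.text)
        for child in element:
            visit(child)
            # Text following a child belongs to whichever page is now current.
            if child.tail and current is not None:
                pages[current].append(child.tail)

    visit(root)
    return {n: " ".join("".join(parts).split()) for n, parts in pages.items()}


doc = ET.fromstring(
    '<text><div><pb n="3"/>First page <add place="facingleaf">verso note</add>'
    ' text.<pb n="4"/>Second page</div><div> continues '
    '<add place="facingleaf">another</add>.<pb n="5"/>Third.</div></text>'
)
pages = pages_from_milestones(doc)
for n, text in pages.items():
    print(n, text)
```

In this sample page 4 begins in one division and ends in the next, which is why a page cannot be copied as a single element and has to be reassembled by tracking the milestone in document order.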