B33080 Humanities Computing: Electronic Text

University of Antwerp, Campus Drie Eiken

Second Term 2004

Edward Vanhoutte

edward.vanhoutte@kantl.be


Week 7: XML theory and practice: DALF.

Monday 29 March



I. Monday 29 March Overview

Monday 29 March: Overview

  1. Revision of week 6
  2. DALF
    • Introduction
    • Structure
    • Hands-on
    • Group Project

Goals of this lecture

After this lecture, you should be able to
  • use and understand DALF
  • create DALF documents
  • parse DALF documents for validation
  • start the group project

II. Monday 29 March Revision of week 6.

Revision of week 6

  1. Revision of week 5
  2. TeixLite
    • <teiHeader>
    • <text>

TEILite

  • one of many possible views of the TEI DTD
  • small and simple
  • 20% of the tags, 80% of the projects
  • 121 elements
  • was devised as a didactic stepping stone to the full TEI, but took on a life of its own
  • realistic for existing texts and for document production
TeixLite: the XML-compatible version of TEILite

TeixLite: start

A TeixLite document is an XML document which refers to a DTD, so:

<?xml version="1.0"?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI//DTD TEI Lite XML ver. 1//EN"
      "teixlite.dtd"
    [
    ]>
→ root element = <TEI.2>, so:

<?xml version="1.0"?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI//DTD TEI Lite XML ver. 1//EN"
      "teixlite.dtd"
    [
    ]>


<TEI.2>
...
</TEI.2>

A TEI-conformant document

comprises a header followed by a text
<TEI.2>
 <teiHeader>...</teiHeader>
 <text>...</text>
</TEI.2>

<teiHeader>

The full form of a TEI Header is thus:
   <teiHeader>
      <fileDesc> ... </fileDesc>
      <encodingDesc> ... </encodingDesc>
      <profileDesc> ... </profileDesc>
      <revisionDesc> ... </revisionDesc>
   </teiHeader>
 
While a minimal header takes the form:
   <teiHeader>
      <fileDesc> ... </fileDesc>
   </teiHeader>
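
As a sketch of how the pieces revised above fit together, the following is a small but complete TeixLite document with a minimal header. The three children of <fileDesc> shown here (<titleStmt>, <publicationStmt>, <sourceDesc>) are the ones TEI requires; all text content is placeholder material invented for this illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI//DTD TEI Lite XML ver. 1//EN"
      "teixlite.dtd">
<TEI.2>
   <teiHeader>
      <!-- minimal header: only <fileDesc> is present -->
      <fileDesc>
         <titleStmt>
            <title>Placeholder title</title>
         </titleStmt>
         <publicationStmt>
            <p>Unpublished classroom exercise (placeholder).</p>
         </publicationStmt>
         <sourceDesc>
            <p>No source: example written for this exercise.</p>
         </sourceDesc>
      </fileDesc>
   </teiHeader>
   <text>
      <body>
         <p>Placeholder paragraph.</p>
      </body>
   </text>
</TEI.2>
```

Parsing this file for validation assumes that teixlite.dtd sits in the same directory, as the system identifier in the DOCTYPE declaration implies.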

Structure of a TEI Document


III. Monday 29 March DALF - Introduction

DALF - Aims and Purposes

DALF: Digital Archive of Letters in Flanders
  • Construction of a growing textbase consisting of digitized and annotated correspondence materials of 19th- and 20th-century Flemish authors and composers
  • Long-term project at CTB, functioning as umbrella for other correspondence editions
  • Development of a general methodology for electronic correspondence editions
  • Stimulation of new electronic edition projects, as well as the international debate on electronic editions of correspondence material

DALF - International orientation

  • Joining in with open standards for text representation
    • XML (eXtensible Markup Language) and related standards for presentation
    • TEI (Text Encoding Initiative)
  • Internationally, DALF is the first project developing a firm (and well-documented) theoretical framework for electronic correspondence editions
  • Presentations at various international symposia have shown broad international interest (Netherlands, Italy, USA, United Kingdom, South Africa)
  • English reference documentation

DALF - Textbase

INPUT
  • Electronic correspondence editions at the CTB
  • Theses and dissertations
  • ... all correspondence editions
→ XML
OUTPUT
  • Traditional paper editions
  • Electronic:
    • Correspondence editions on CD-Rom (off-line)
    • DALF textbase on internet (on-line)

DALF - Interface


DALF - Application areas

  • DALF is not an electronic edition itself, but a textbase that permits the generation of user-defined custom editions according to criteria of interest (e.g. thematic, chronological, per author, ...)
  • Very divergent research applications:
    • Literary criticism
    • Linguistic research (diachronic, synchronic)
    • Historical research
    • Sociolinguistic research
    • ...

DALF - State of affairs

  • Development of a DTD (Document Type Definition), defining the elements that can be encoded in DALF letters (e.g. postscripts, envelopes, ...)
  • Development of reference documentation of the DALF DTD for encoders
  • Development of DALF website (http://www.kantl.be/ctb/project/dalf/)
  • Development of software tools (NoteTab library)
  • Application of DALF
    • Correspondence Stijn Streuvels with his Dutch-speaking publishers (1,748 letters)
    • Correspondence Stijn Streuvels with his German-speaking publishers
    • Correspondence Karel van de Woestijne with Emmanuel de Bom
    • Correspondence Julius J.B. Schrey (1893-1894)
    • Correspondence Lynne Bryer and Daphne Rooke
    • Puccini Letters

IV. Monday 29 March DALF - Guidelines

DALF - Guidelines and Website

DALF Website:
→ http://www.kantl.be/ctb/project/dalf/index.htm
DALF guidelines for the description and encoding of modern correspondence material, version 1.0:
→ http://www.kantl.be/ctb/project/dalf/dalfdoc/

DALF - TEI

TEI
  • 'mixed base' tag set, with prose and drama bases
  • elements from additional tag sets Linking, figures, Analysis, transcr, textcrit and names.dates
  • entity sets ISOlat1, ISOlat2, ISOnum and ISOpub
  • 8 modified TEI elements; 213 TEI elements
DALF
  • 60 unique elements
→ DALF is a TEI customization

DALF Header

DALF needs a rich header for integration in the textbase
  • bibliographic description
  • documentation of repository
  • physical description
DALF Header: general design
  • Letter-specific header elements start with "let" (e.g. <letDesc>, <letHeading>).
  • Strict design: several mandatory elements and strict order, to ensure consistency and facilitate integration in searchable electronic archive.
  • Flexibility: optional <note> elements are allowed after mandatory contents.

<letDesc>

The distinguishing feature of a header for DALF letters is the mandatory presence of a <letDesc> element in <sourceDesc>
  • <letIdentifier>: Contains information concerning the identification of the letter within its holding institution. (mandatory)
  • <letHeading>: Contains a structured description of bibliographical information of a letter. (mandatory)
  • <physDesc>: Contains a description of the physical appearance of the letter. (mandatory)
  • <envOcc />: Contains an indication of the presence or absence of an envelope. (mandatory)
  • <letContents>: Contains a description of the intellectual contents of the letter. (optional)
  • <history>: Contains a description of the history of the letter. (optional)
  • <additional>: Groups additional information about the letter. (optional)
  • <letPart>: Contains metadata about distinct parts of a letter. (optional)
  • <note>: Contains additional information about the letter that is not covered by any of the previous elements. (optional)
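
To show how the four mandatory children combine, here is a sketch of a <letDesc> inside <sourceDesc>, assembled from the example fragments given on the slides for the individual elements. It assumes that those fragments describe one and the same letter (the slides do not say so) and that the children appear in the order of the list above; the optional elements are left out.

```xml
<sourceDesc>
   <letDesc>
      <!-- mandatory: location path from macro- to micro-level -->
      <letIdentifier>
         <country>Belgium</country>
         <settlement>Antwerp</settlement>
         <repository>AMVC</repository>
         <idno>S 935 / 62295</idno>
      </letIdentifier>
      <!-- mandatory: author, receiver, place and date of writing -->
      <letHeading>
         <author attested="yes">Stijn Streuvels</author>
         <addressee attested="yes">Maurice De Meyer</addressee>
         <placeLet attested="no">Ingooigem</placeLet>
         <dateLet attested="yes">1945-01-13</dateLet>
      </letHeading>
      <!-- mandatory: physical description -->
      <physDesc>
         <type>letter</type>
         <support>single page with pre-printed letterhead, with writing (black ink) on one side only</support>
         <extent>
            <dimensions>
               <height units="mm">214</height>
               <width units="mm">276</width>
            </dimensions>
         </extent>
      </physDesc>
      <!-- mandatory: presence or absence of an envelope -->
      <envOcc occ="no" />
   </letDesc>
</sourceDesc>
```

Whether this fragment is valid can only be confirmed by parsing it against the DALF DTD as part of a complete document.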

<letIdentifier>

containing a mandatory hierarchic location path, from macro- to micro-level (country, region, settlement, institution, collection, identification within collection,...)

<letIdentifier>
   <country>Belgium</country>
   <settlement>Antwerp</settlement>
   <repository>AMVC</repository>
   <idno>S 935 / 62295</idno>
</letIdentifier>

<letHeading>

containing mandatory identifications of author, receiver, place and date of writing (and opportunity to mark the status of these data)

<letHeading>
   <author attested="yes">Stijn Streuvels</author>
   <addressee attested="yes">Maurice De Meyer</addressee>
   <placeLet attested="no">Ingooigem</placeLet>
   <dateLet attested="yes">1945-01-13</dateLet>
</letHeading>


<physDesc>

containing a mandatory characterisation of the document, description of size and materials; possibility to describe layout, condition of the document and possible illustrations, paraphernalia and musical notation

<physDesc>
   <type>letter</type>
   <support>single page with pre-printed letterhead, with writing (black ink) on one side only</support>
   <extent>
      <dimensions>
         <height units="mm">214</height>
         <width units="mm">276</width>
      </dimensions>
   </extent>
</physDesc>


<envOcc />

mandating a choice of attribute value "yes" or "no"

<envOcc occ="no" />

<letContents>

optional description of the contents

<letContents>
   <class>[businesslike letter]</class>
   <p>Streuvels makes an agreement with De Meyer on an
	order of a book</p>
</letContents>


<text>

New text elements:
  • <envelope>
  • <ps>
  • <calc>
  • <print>

<envelope>

  • structural: functionally separate from body of the letter; itself containing typical structures like address data, postmark, random text,...
  • semantic: containing data for communicative contextualisation; may contain further contents related to that of the letter / autonomous contents

<envelope>
   <envPart type="front">
      <div>
         <deco/>
      </div>
      <address type="addressee">
         <addrLine>De Heer <name>Styn Streuvels</name></addrLine>
         <addrLine>"Lijsternest"</addrLine>
         <addrLine><hi rend="underlined">INGOYGHEM</hi></addrLine>
      </address>
      <postmark>
         <date value="1924-01-04">4.I.1924</date>
         <placeName><place>ANTWERPEN</place></placeName>
      </postmark>
   </envPart>
</envelope>

<ps>

  • structural: occurring after the closing formulae and salutation
  • semantic: forms a last addition to the contents of the letter; the author often explicitly signals this additional status with the abbreviation 'P.S.'


<closer>
   <salute>Met vriendelijken groet</salute>
   <signed>(Styn Streuvels)</signed>
   <ps>
      <p id="xr2">
         <add id="add1"><abbr expan="postscriptum">P.S.</abbr>
         Ze jubileeren bij de firma Veen (60 jaar bestaan)
         <ref target="n8">8</ref> en er wordt me daarom gevraagd,
         door het comit&eacute;: hoeveel geld ik daarvoor als
         feestgave wensch te geven! Zonderlinge zeden?
         Als ik nu eens vroeg: hoeveel ze voor mij beschikken
         als 75-jarige jubilaris!</add>
      </p>
   </ps>
</closer>

<calc>

  • structural: calculations are often set apart formally from running text; marking them with explicit encoding features provides researchers with greater control over the textual features they want to study.
  • semantic: different structural / semantic units can be distinguished: arguments, operators and results.


<calc>
  <arg>969 <abbr expan="exemplaren">ex.</abbr> (zie afrekening van 30.8.4I)</arg>
  <oper>-</oper>
  <arg>I38<abbr expan="exemplaren">ex</abbr>
  (<arg>I33 <abbr expan="exemplaren">ex.</abbr> verkocht</arg>
  <oper>+</oper> <arg>5 <abbr expan="persexemplaren">persex.</abbr></arg>)
  </arg>
  <result><hi rend="double underlined">83I</hi><abbr expan="exemplaren"> ex.</abbr>
  </result>
</calc>
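
The nesting of <arg> elements in the example above mirrors the arithmetic of the calculation. Assuming that the capital "I" in the transcription is the typewriter's rendering of the digit 1 (so that "I38" reads 138 and "83I" reads 831), the encoded sum works out as follows:

```latex
\[
969 - (133 + 5) = 969 - 138 = 831
\]
```

The second <arg> thus carries both its own value (138) and, nested inside it, the two arguments and the operator from which that value is derived.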


<print>

  • structural: sometimes pre-/post-printed fragments in letters (not part of main writing act)
  • semantic: may need to be distinguished from more "authorial" parts of the letter, as they mostly have an impersonal character

<print type="letterhead">FRANK&middot;LATEUR</print>
