B33080 Humanities Computing: Electronic Text
University of Antwerp, Campus Drie Eiken
Second Term 2004
Edward Vanhoutte
edward.vanhoutte@kantl.be

Week 7: XML theory and practice: DALF.
Monday 29 March
II. Monday 29 March Revision of week 6.
- Revision of week 5
- TeixLite
- one of many possible views of the TEI DTD
- small and simple
- 20% of the tags, 80% of the projects
- 121 elements
- was devised as a didactic stepping stone to the full TEI, but took on a life of its own
- realistic for existing texts and for document production
TeixLite: the XML-compatible version of TEI Lite
A TeixLite document is an XML document which refers to a DTD, so:
<?xml version="1.0"?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI//DTD TEI Lite XML ver. 1//EN"
"teixlite.dtd"
[
]>
→ root element = <TEI.2>, so:
<?xml version="1.0"?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI//DTD TEI Lite XML ver. 1//EN"
"teixlite.dtd"
[
]>
<TEI.2>
...
</TEI.2>
A TEI-conformant document comprises a header followed by a text:
<TEI.2>
<teiHeader>...</teiHeader>
<text>...</text>
</TEI.2>
The full form of a TEI Header is thus:
<teiHeader>
<fileDesc> ... </fileDesc>
<encodingDesc> ... </encodingDesc>
<profileDesc> ... </profileDesc>
<revisionDesc> ... </revisionDesc>
</teiHeader>
While a minimal header takes the form:
<teiHeader>
<fileDesc> ... </fileDesc>
</teiHeader>
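Putting the pieces above together, a complete minimal TeixLite document might look like the sketch below. It assumes that the DTD file is available locally as "teixlite.dtd" (as in the declaration above) and that <fileDesc> holds a title statement, a publication statement and a source description; the title and the prose inside these elements are invented placeholders, not taken from any real document.

```xml
<?xml version="1.0"?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI//DTD TEI Lite XML ver. 1//EN"
  "teixlite.dtd" [
]>
<TEI.2>
  <!-- minimal header: only the file description -->
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <!-- placeholder title, for illustration only -->
        <title>Sample letter: an electronic transcription</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished classroom example.</p>
      </publicationStmt>
      <sourceDesc>
        <p>No source: created in electronic form.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <!-- the text follows the header -->
  <text>
    <body>
      <p>Text of the document.</p>
    </body>
  </text>
</TEI.2>
```

The other three header components (<encodingDesc>, <profileDesc>, <revisionDesc>) would follow </fileDesc> in that order when the full form is used.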
Structure of a TEI Document
IV. Monday 29 March DALF - Guidelines
DALF - Guidelines and Website
DALF Website:
→ http://www.kantl.be/ctb/project/dalf/index.htm
DALF guidelines for the description and encoding of modern correspondence material, version 1.0:
→ http://www.kantl.be/ctb/project/dalf/dalfdoc/
TEI
- 'mixed base' tag set, with prose and drama bases
- elements from the additional tag sets linking, figures, analysis, transcr, textcrit and names.dates
- entity sets ISOlat1, ISOlat2, ISOnum and ISOpub
- 8 modified TEI elements; 213 TEI elements
DALF
→ DALF is a TEI customization
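In TEI P4, base and additional tag sets are switched on by setting parameter entities to 'INCLUDE' in the DTD subset. As a rough sketch, the selection listed under "TEI" above could be declared as follows. This assumes the generic P4 driver file "tei2.dtd"; it is an illustration only and not necessarily the declaration that DALF itself uses. The entity sets and the modified and added elements are not shown.

```xml
<?xml version="1.0"?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI P4//DTD Main Document Type//EN" "tei2.dtd" [
<!-- use the XML version of the TEI DTD -->
<!ENTITY % TEI.XML         'INCLUDE'>
<!-- 'mixed base': combine the prose and drama bases -->
<!ENTITY % TEI.mixed       'INCLUDE'>
<!ENTITY % TEI.prose       'INCLUDE'>
<!ENTITY % TEI.drama       'INCLUDE'>
<!-- additional tag sets -->
<!ENTITY % TEI.linking     'INCLUDE'>
<!ENTITY % TEI.figures     'INCLUDE'>
<!ENTITY % TEI.analysis    'INCLUDE'>
<!ENTITY % TEI.transcr     'INCLUDE'>
<!ENTITY % TEI.textcrit    'INCLUDE'>
<!ENTITY % TEI.names.dates 'INCLUDE'>
]>
<TEI.2>
  <teiHeader> ... </teiHeader>
  <text> ... </text>
</TEI.2>
```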
DALF needs a rich header for integration into a textbase
- bibliographic description
- documentation of repository
- physical description
DALF Header: general design
- Letter-specific header elements start with "let-".
- Strict design: several mandatory elements in a strict order, to ensure consistency and to facilitate integration into a searchable electronic archive.
- Flexibility: optional <note> elements are allowed after mandatory contents.
The distinguishing feature of a header for DALF letters is the mandatory presence of a <letDesc> element in <sourceDesc>
- <letIdentifier>:
Contains information concerning the identification of the letter within its holding institution. (mandatory)
- <letHeading>:
Contains a structured description of bibliographical information of a letter. (mandatory)
- <physDesc>:
Contains a description of the physical appearance of the letter. (mandatory)
- <envOcc />:
Contains an indication of the presence or absence of an envelope. (mandatory)
- <letContents>:
Contains a description of the intellectual contents of the letter. (optional)
- <history>:
Contains a description of the history of the letter. (optional)
- <additional>:
Groups additional information about the letter. (optional)
- <letPart>:
Contains metadata about distinct parts of a letter. (optional)
- <note>:
Contains additional information about the letter that is not covered by any other of the previous elements. (optional)
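The position of <letDesc> in the header can be sketched as follows. The skeleton assumes that the child elements occur in the order in which they are listed above, and that <sourceDesc> sits inside <fileDesc> as in any TEI header; the "..." stand for contents that are left out here.

```xml
<teiHeader>
  <fileDesc>
    <titleStmt> ... </titleStmt>
    <publicationStmt> ... </publicationStmt>
    <sourceDesc>
      <letDesc>
        <!-- mandatory elements; order assumed to follow the list above -->
        <letIdentifier> ... </letIdentifier>
        <letHeading> ... </letHeading>
        <physDesc> ... </physDesc>
        <envOcc occ="no" />
        <!-- optional elements -->
        <letContents> ... </letContents>
        <history> ... </history>
        <additional> ... </additional>
        <letPart> ... </letPart>
        <note> ... </note>
      </letDesc>
    </sourceDesc>
  </fileDesc>
</teiHeader>
```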
<letIdentifier>: contains a mandatory hierarchical location path, from macro- to micro-level (country, region, settlement, institution, collection, identification within collection, ...)
<letIdentifier>
<country>Belgium</country>
<settlement>Antwerp</settlement>
<repository>AMVC</repository>
<idno>S 935 / 62295</idno>
</letIdentifier>
<letHeading>: contains mandatory identifications of the author, the addressee, and the place and date of writing, with the option of marking the status of these data
<letHeading>
<author attested="yes">Stijn Streuvels</author>
<addressee attested="yes">Maurice De Meyer</addressee>
<placeLet attested="no">Ingooigem</placeLet>
<dateLet attested="yes">1945-01-13</dateLet>
</letHeading>
<physDesc>: contains a mandatory characterisation of the document and a description of its size and materials; the layout, the condition of the document, and any illustrations, paraphernalia and music notation can also be described
<physDesc>
<type>letter</type>
<support>single page with pre-printed letterhead, with writing (black ink) on one side only
</support>
<extent>
<dimensions>
<height units="mm">214</height>
<width units="mm">276</width>
</dimensions>
</extent>
</physDesc>
<envOcc />: the occ attribute must take the value "yes" or "no"
<envOcc occ="no" />
<letContents>: an optional description of the contents
<letContents>
<class>[businesslike letter]</class>
<p>Streuvels makes an agreement with De Meyer on an
order of a book</p>
</letContents>
New text elements:
- <envelope>
- <ps>
- <calc>
- <print>
<envelope>:
- structural: functionally separate from the body of the letter; itself contains typical structures such as address data, postmark, random text, ...
- semantic: contains data for communicative contextualisation; may contain further contents related to those of the letter, or autonomous contents
<envelope>
<envPart type="front">
<div>
<deco/>
</div>
<address type="addressee">
<addrLine>De Heer <name>Styn Streuvels</name></addrLine>
<addrLine>"Lijsternest"</addrLine>
<addrLine><hi rend="underlined">INGOYGHEM</hi></addrLine>
</address>
<postmark>
<date value="1924-01-04">4.I.1924</date>
<placeName><place>ANTWERPEN</place></placeName>
</postmark>
</envPart>
</envelope>
<ps>:
- structural: occurs after the closing formulae and salutation
- semantic: forms a final addition to the contents of the letter; the author often signals this additional status explicitly with the abbreviation 'P.S.'
<closer>
<salute>Met vriendelijken groet</salute>
<signed>(Styn Streuvels)</signed>
<ps>
<p id="xr2">
<add id="add1"><abbr expan="postscriptum">P.S.</abbr>
Ze jubileeren bij de firma Veen (60 jaar bestaan)
<ref target="n8">8</ref> en er wordt me daarom gevraagd,
door het comité: hoeveel geld ik daarvoor als
feestgave wensch te geven! Zonderlinge zeden?
Als ik nu eens vroeg: hoeveel ze voor mij beschikken
als 75-jarige jubilaris!</add>
</p>
</ps>
</closer>
<calc>:
- structural: calculations are often set apart formally from the running text; marking them with explicit encoding gives researchers greater control over the textual features they want to study.
- semantic: different structural / semantic units can be distinguished: arguments, operators and results.
<calc>
<arg>969 <abbr expan="exemplaren">ex.</abbr> (zie afrekening van 30.8.4I)</arg>
<oper>-</oper>
<arg>I38<abbr expan="exemplaren">ex</abbr>
(<arg>I33 <abbr expan="exemplaren">ex.</abbr> verkocht</arg>
<oper>+</oper> <arg>5 <abbr expan="persexemplaren">persex.</abbr></arg>)
</arg>
<result><hi rend="double underlined">83I</hi><abbr expan="exemplaren"> ex.</abbr>
</result>
</calc>
<print>:
- structural: letters sometimes contain pre- or post-printed fragments (not part of the main writing act)
- semantic: these may need to be distinguished from the more "authorial" parts of the letter, as they mostly have an impersonal character
<print type="letterhead">FRANK·LATEUR</print>