B33080 Humanities Computing: Electronic Text

University of Antwerp, Campus Drie Eiken

Second Term 2004

Edward Vanhoutte

edward.vanhoutte@kantl.be

Week 6: XML theory and practice: TeixLite.

Monday 22 March


I. Monday 22 March Overview

Monday 22 March: Overview

  1. Revision of week 5
  2. TeixLite
    • <teiHeader>
    • <text>

Goals of this lecture

After this lecture, you should be able to
  • use and understand TeixLite
  • create TeixLite documents
  • parse TeixLite documents for validation

II. Monday 22 March Revision of week 5.

Revision of week 5

  1. Revision of week 4
  2. XML: theory & practice
    • Valid XML
    • Validating XML

Valid XML

A valid XML document contains a Document Type Declaration:
<!DOCTYPE TEI.2 PUBLIC "-//TEI//DTD TEI Lite 1.0//EN"
    "../dtd/xmllite.dtd"
    [
    ]>
This declaration contains the reference to a Document Type Definition (DTD).
A DTD specifies the vocabulary and the syntax of a markup language
It defines:
  • names for all your elements
  • names and default values for their attributes
  • rules about how elements can nest
  • names for re-usable pieces of data (entities)
  • and a few other things
A DTD does not specify anything about what elements "mean"

DTD

Defining an element in a DTD
<!ELEMENT  name  contentModel>
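Attributes and entities are declared in a similar way. The following is only a minimal sketch: the element name note, its place attribute and the entity kantl are made up for illustration and do not belong to any DTD used in this course.

```xml
<!-- an element with text content -->
<!ELEMENT  note   (#PCDATA)>
<!-- an attribute for that element: two allowed values, "foot" as the default -->
<!ATTLIST  note   place  (foot | end)  "foot">
<!-- a re-usable piece of data, referenced in a document as &kantl; -->
<!ENTITY   kantl  "Koninklijke Academie voor Nederlandse Taal- en Letterkunde">
```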

email DTD

<!ELEMENT    email     (header, body)>
<!ELEMENT    header    (subj, date, from, to)>
<!ELEMENT    subj      (#PCDATA)>
<!ELEMENT    date      (#PCDATA)>
<!ELEMENT    from      (#PCDATA)>
<!ELEMENT    to        (#PCDATA)>
<!ELEMENT    body      (open | p | ps | close | sign)*>
<!ELEMENT    open      (#PCDATA)>
<!ELEMENT    p         (#PCDATA)>
<!ELEMENT    close     (#PCDATA)>
<!ELEMENT    ps        (#PCDATA)>
<!ELEMENT    sign      (name | address)*>
<!ELEMENT    name      (#PCDATA)>
<!ELEMENT    address   (addrline)+>
<!ELEMENT    addrline  (#PCDATA)>
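A document that is valid against this DTD could look as follows. This is an illustrative sketch only: the file name email.dtd and all names, addresses and text content are invented.

```xml
<?xml version="1.0"?>
<!DOCTYPE email SYSTEM "email.dtd">
<email>
  <!-- header: subj, date, from, to, in exactly this order -->
  <header>
    <subj>Week 6 assignment</subj>
    <date>22 March 2004</date>
    <from>student@example.org</from>
    <to>teacher@example.org</to>
  </header>
  <!-- body: any mix of open, p, ps, close and sign -->
  <body>
    <open>Dear Edward,</open>
    <p>Please find my encoded text attached.</p>
    <close>Kind regards,</close>
    <sign>
      <name>A. Student</name>
      <!-- address needs at least one addrline -->
      <address>
        <addrline>Campus Drie Eiken</addrline>
        <addrline>Antwerp</addrline>
      </address>
    </sign>
  </body>
</email>
```

Moving <date> before <subj>, or leaving <address> empty, would make the document invalid while it remained well-formed.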

Validating XML

XML can be validated when we have:
  • an XML document
  • a DTD
  • a validating parser
Make sure:
  • the Doctype Declaration inside the XML document refers to the appropriate DTD and its path
  • the parser can find an XML declaration

Assignment: solution

On parsing error.xml, the parser gave the following 54 errors:
ERROR.XML:17:8:E: element "Funder" undefined
ERROR.XML:23:46:E: end tag for element "addrline" which is not open
ERROR.XML:24:12:E: document type does not allow element "addrLine" here; assuming missing "address" start-tag
ERROR.XML:27:9:E: end tag for element "funder" which is not open
ERROR.XML:29:11:E: end tag for "addrLine" omitted, but OMITTAG NO was specified
ERROR.XML:23:3: start tag was here
ERROR.XML:29:11:E: end tag for "address" omitted, but OMITTAG NO was specified
ERROR.XML:19:2: start tag was here
ERROR.XML:29:11:E: end tag for "Funder" omitted, but OMITTAG NO was specified
ERROR.XML:17:1: start tag was here
ERROR.XML:87:2:E: document type does not allow element "p" here; missing one of "add", "corr", "sic", "note", "figure", "q", "stage" start-tag
ERROR.XML:91:2:E: document type does not allow element "p" here; missing one of "add", "corr", "sic", "note", "figure", "q", "stage" start-tag
ERROR.XML:93:2:E: document type does not allow element "p" here; missing one of "add", "corr", "sic", "note", "figure", "q", "stage" start-tag
ERROR.XML:95:2:E: document type does not allow element "p" here; missing one of "add", "corr", "sic", "note", "figure", "q", "stage" start-tag
ERROR.XML:97:2:E: document type does not allow element "p" here; missing one of "add", "corr", "sic", "note", "figure", "q", "stage" start-tag
ERROR.XML:99:2:E: document type does not allow element "p" here; missing one of "add", "corr", "sic", "note", "figure", "q", "stage" start-tag
ERROR.XML:101:2:E: document type does not allow element "p" here; missing one of "add", "corr", "sic", "note", "figure", "q", "stage" start-tag
ERROR.XML:103:2:E: document type does not allow element "p" here; missing one of "add", "corr", "sic", "note", "figure", "q", "stage" start-tag
ERROR.XML:105:2:E: document type does not allow element "p" here; missing one of "add", "corr", "sic", "note", "figure", "q", "stage" start-tag
ERROR.XML:106:5:E: end tag for "head" omitted, but OMITTAG NO was specified
ERROR.XML:85:0: start tag was here
ERROR.XML:106:5:E: end tag for "div" which is not finished
ERROR.XML:140:2:E: document type does not allow element "p" here; missing one of "add", "corr", "sic", "note", "figure", "q", "stage" start-tag
ERROR.XML:142:2:E: document type does not allow element "p" here; missing one of "add", "corr", "sic", "note", "figure", "q", "stage" start-tag
ERROR.XML:144:2:E: document type does not allow element "p" here; missing one of "add", "corr", "sic", "note", "figure", "q", "stage" start-tag
ERROR.XML:146:2:E: document type does not allow element "p" here; missing one of "add", "corr", "sic", "note", "figure", "q", "stage" start-tag
ERROR.XML:147:5:E: end tag for "p" omitted, but OMITTAG NO was specified
ERROR.XML:138:0: start tag was here
ERROR.XML:554:13:E: document type does not allow element "head" here; missing one of "listBibl", "figure", "list", "table" start-tag
ERROR.XML:556:2:E: document type does not allow element "p" here; missing one of "add", "corr", "sic", "note", "figure", "q", "stage" start-tag
ERROR.XML:558:2:E: document type does not allow element "p" here; missing one of "add", "corr", "sic", "note", "figure", "q", "stage" start-tag
ERROR.XML:560:2:E: document type does not allow element "p" here; missing one of "add", "corr", "sic", "note", "figure", "q", "stage" start-tag
ERROR.XML:562:2:E: document type does not allow element "p" here; missing one of "add", "corr", "sic", "note", "figure", "q", "stage" start-tag
ERROR.XML:564:2:E: document type does not allow element "p" here; missing one of "add", "corr", "sic", "note", "figure", "q", "stage" start-tag
ERROR.XML:566:2:E: document type does not allow element "p" here; missing one of "add", "corr", "sic", "note", "figure", "q", "stage" start-tag
ERROR.XML:568:2:E: document type does not allow element "p" here; missing one of "add", "corr", "sic", "note", "figure", "q", "stage" start-tag
ERROR.XML:570:2:E: document type does not allow element "p" here; missing one of "add", "corr", "sic", "note", "figure", "q", "stage" start-tag
ERROR.XML:572:2:E: document type does not allow element "p" here; missing one of "add", "corr", "sic", "note", "figure", "q", "stage" start-tag
ERROR.XML:574:2:E: document type does not allow element "p" here; missing one of "add", "corr", "sic", "note", "figure", "q", "stage" start-tag
ERROR.XML:575:5:E: end tag for "head" omitted, but OMITTAG NO was specified
ERROR.XML:554:8: start tag was here
ERROR.XML:575:5:E: end tag for "head" omitted, but OMITTAG NO was specified
ERROR.XML:554:0: start tag was here
ERROR.XML:575:5:E: end tag for "div" which is not finished
ERROR.XML:646:2:E: document type does not allow element "p" here; missing one of "add", "corr", "sic", "note", "figure", "q", "stage" start-tag
ERROR.XML:646:38:E: end tag for "p" omitted, but OMITTAG NO was specified
ERROR.XML:646:0: start tag was here
ERROR.XML:646:42:E: end tag for element "p" which is not open
ERROR.XML:809:10:E: an attribute value specification must be an attribute value literal unless SHORTTAG YES is specified
ERROR.XML:925:6:E: end tag for element "div1" which is not open
ERROR.XML:926:6:E: end tag for "div" omitted, but OMITTAG NO was specified
ERROR.XML:875:0: start tag was here
ERROR.XML:926:6:E: end tag for "div" omitted, but OMITTAG NO was specified
ERROR.XML:503:0: start tag was here

Which can be solved in 9 steps

  1. Change <Funder> to <funder>: 50 errors left
  2. Change <addrline> to <addrLine>: 44 errors left
  3. Add </head> to <head>1: 32 errors left
  4. Close <p>: 26 errors left
  5. Change <head>21<head> to <head>21</head>: 10 errors left
  6. Change <head>25<head><p>'Ziezo,' zei Val, 'ik begin!'</head></p> to <head>25</head><p>'Ziezo,' zei Val, 'ik begin!'</p>: 6 errors left
  7. Change <div type=chapter n=""> to <div type="chapter" n="">: 5 errors left
  8. Change </div1> to </div>: 2 errors left
  9. Add </div>: all correct, no errors reported

III. 3. TeixLite

3: TEI Lite: TeixLite

→ "TEI U5: Encoding for Interchange: an introduction to the TEI."
→ http://www.tei-c.org/Lite/

Is XML too eXtensible?

XML allows you to make up your own tags, and doesn't require a DTD... isn't that rather dangerous?
  • XML allows you to name elements freely
  • one person's <p> is another's <para> (or is it?)
  • documents are not interchangeable that way
  • no one ontology of the text
→ namespaces
→ DTD

Namespaces

An XML namespace is a collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names.
→ http://www.w3.org/TR/REC-xml-names/
e.g.: <table> in doc1 is not necessarily the same as <table> in doc2
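A minimal sketch of how namespaces keep two kinds of <table> apart within one document. The prefixes h and f, the wrapper element <doc> and the furniture URI are assumptions made for illustration; only the XHTML namespace URI is an existing one.

```xml
<doc xmlns:h="http://www.w3.org/1999/xhtml"
     xmlns:f="http://example.org/furniture">
  <!-- a table in the sense of rows and cells -->
  <h:table>
    <h:tr><h:td>1</h:td><h:td>5</h:td></h:tr>
  </h:table>
  <!-- a table in the sense of a piece of furniture -->
  <f:table>
    <f:material>oak</f:material>
  </f:table>
</doc>
```

Both elements have the local name table, but software can tell them apart because each prefix is bound to a different URI.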

Why a DTD?

  • A DTD is very useful at data preparation time: validating editors only allow you to input correct markup
  • Useful for consistent encoding in projects
  • Guarantees longevity and interchangeability of semantics and structure in encoded texts
  • We need it for validation
  • Useful for software development and operability

Text Encoding Initiative: a pizza model

The TEI has produced a number of DTD subsets which can be combined according to the needs of a particular project in the humanities.
"All pizzas have some ingredients in common (cheese and tomato sauce); in Chicago, at least, they may have entirely different forms of pastry base, with which (universally) the consumer is expected to make his or her own selection of toppings."
  • Core tag sets (cheese and tomato): define mandatory elements for all document types, e.g. a TEI-compliant document always has a header and a text
  • Base tag sets (the pastry): define the structural components of a document. Only one choice is allowed amongst: Prose, Poetry, Drama, Speech, Lexicography, and Terminology
  • Additional tag sets (toppings): can occur in all document type classes, but define specialised tag sets which can be combined according to taste: Links, Figures, tables, formulae, Structural analysis, Transcription, Text criticism, Names & dates, Corpus linguistics

TEILite

  • one of many possible views of the TEI DTD
  • small and simple
  • 20% of the tags, 80% of the projects
  • 121 elements
  • was devised as a didactic stepping stone to the full-flavour TEI, but took on a life of its own
  • realistic for existing texts and for document production
TeixLite: the XML compatible version of TEILite

TeixLite: start

A TeixLite document is an XML document which refers to a DTD, so:

<?xml version="1.0"?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI//DTD TEI Lite XML ver. 1//EN"
      "teixlite.dtd"
    [
    ]>
→ root element = <TEI.2>, so:

<?xml version="1.0"?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI//DTD TEI Lite XML ver. 1//EN"
      "teixlite.dtd"
    [
    ]>


<TEI.2>
...
</TEI.2>

A TEI-conformant document

comprises a header followed by a text
<TEI.2>
 <teiHeader>...</teiHeader>
 <text>...</text>
</TEI.2>

<teiHeader>

The header is essential for:
  • bibliographic control and identification
  • resource documentation and
  • processing (see later)
The TEI Header is introduced by the element <teiHeader> and has 4 major parts, only the first of which is mandatory:
  1. file description <fileDesc>: contains a full bibliographic description of an electronic file, including information about the sources from which the electronic text was derived. Essential for bibliographic referencing and cataloguing.
  2. encoding description <encodingDesc>: documents the relationship between an electronic text and the source or sources from which it was derived. It allows for documenting detailed information about transcription/transliteration principles such as normalization, the treatment of quotations and hyphenation and the levels of interpretation i.e. analytic tagging and encoding applied to the document.
  3. profile description <profileDesc>: provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their settings.
  4. revision description <revisionDesc>: summarizes the revision history for a file, which is important for version control and for resolving questions about the history of a file, especially when a team of scholars is working on the same document.

<teiHeader>

The full form of a TEI Header is thus:
   <teiHeader>
      <fileDesc> ... </fileDesc>
      <encodingDesc> ... </encodingDesc>
      <profileDesc> ... </profileDesc>
      <revisionDesc> ... </revisionDesc>
   </teiHeader>
 
While a minimal header takes the form:
   <teiHeader>
      <fileDesc> ... </fileDesc>
   </teiHeader>
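Putting the pieces together, a complete minimal TeixLite document might be sketched as follows. The title and the wording of the paragraphs are invented; the assumption here is that <fileDesc> needs at least a <titleStmt>, a <publicationStmt> and a <sourceDesc>, which should be checked against the DTD with a validating parser.

```xml
<?xml version="1.0"?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI//DTD TEI Lite XML ver. 1//EN"
      "teixlite.dtd">
<TEI.2>
  <!-- minimal header: only the file description -->
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>A sample text: an electronic transcription</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished classroom exercise.</p>
      </publicationStmt>
      <sourceDesc>
        <p>No source: created in electronic form.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <!-- minimal text: a body with a single paragraph -->
  <text>
    <body>
      <p>This is the only paragraph of the text.</p>
    </body>
  </text>
</TEI.2>
```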

<text>: A text may be unitary or composite

A unitary text contains
  • <front>: front matter
  • <back>: back matter
  • <body>: a body
   <text>
      <front>...</front>
      <body>...</body>
      <back>...</back>
   </text>
In a composite text, the body is a
  • <group>: group of texts (or nested groups)
   <group>
      <text>...</text>
      <text>...</text>
      <text>...</text>
   </group>
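In context, the <group> takes the place of the <body> inside an outer <text>, and each member text has its own body. A sketch with invented content (optional <front> and <back> elements around the group are left out):

```xml
<text>
   <!-- the group replaces the body of the outer text -->
   <group>
      <text>
         <body><p>First text of the collection.</p></body>
      </text>
      <text>
         <body><p>Second text of the collection.</p></body>
      </text>
   </group>
</text>
```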

<front>

Preliminary material such as title pages, prefatory epistles, etc.:
<front>
<titlePage>
   <docTitle>
      <titlePart type="main">
         ...
      </titlePart>
   </docTitle>
   <docAuthor>...</docAuthor>
   <docDate>...</docDate>
   <docEdition>...</docEdition>
   <docImprint>...</docImprint>
   <epigraph>...</epigraph>
</titlePage>
</front>

<titlePage>

<titlePage>
   <docTitle>
      <titlePart>
         <title level="m" type="main">DE TELEURGANG VAN DEN WATERHOEK</title>
      </titlePart>
   </docTitle>
   <titlePart>DOOR</titlePart>
   <docAuthor>STIJN STREUVELS</docAuthor>
   <docImprint>UITGAVE "EXCELSIOR" — BRUGGE</docImprint>
   <docImprint>AMSTERDAM, L. J. VEEN, UITGEVER.</docImprint>
</titlePage>

Structure of a TEI Document

[Figure not reproduced]

A text usually has divisions

  • generic, hierarchic subdivisions
  • vanilla or numbered
  • type attribute
  • associated <head> and <trailer>
  • <div>, <div0>, <div1>, <div2>, <div3>, <div4>, <div5>, <div6>, <div7>
<text>
   <front> <!-- titlepage etc here --> </front>
   <body>
      <div1 type="book" n="1" id="b0100">
         <head>Book 1</head>
         <div2 type="chapter" n="1" id="b0101">
            <head>Chapter 1</head>
            <!-- rest of the chapter -->
         </div2>
         <div2 type="chapter" n="2" id="b0102">
            <head>Chapter 2</head>
            <!-- rest of the chapter -->
         </div2>
      </div1>
   </body>
</text>
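The same structure can be sketched with unnumbered ("vanilla") divisions, where nesting alone expresses the level. The attribute values are simply carried over from the numbered example. In TEI practice the two styles are not mixed within the same <front>, <body> or <back>.

```xml
<text>
   <front> <!-- titlepage etc here --> </front>
   <body>
      <!-- outer div: the book -->
      <div type="book" n="1" id="b0100">
         <head>Book 1</head>
         <!-- nested divs: the chapters -->
         <div type="chapter" n="1" id="b0101">
            <head>Chapter 1</head>
            <!-- rest of the chapter -->
         </div>
         <div type="chapter" n="2" id="b0102">
            <head>Chapter 2</head>
            <!-- rest of the chapter -->
         </div>
      </div>
   </body>
</text>
```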

Use of global attributes

Applicable to all elements
  • id for unique identification
  • n for (non-unique) name or number
  • rend for rendition (appearance)
  • lang for language and hence writing-system
→ Extensible, like other classes
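A sketch of all four global attributes on a single paragraph. The values p0001 and indent are invented for illustration, and the lang value assumes that a language with that identifier is declared in the header.

```xml
<!-- assumes <language id="nld">Dutch</language> is declared in <langUsage> -->
<p id="p0001" n="1" rend="indent" lang="nld">'Ziezo,' zei Val, 'ik begin!'</p>
```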

Text components in TEI Lite

What are divisions made of?
  • Prose is mostly paragraphs (<p>)
  • Verse is mostly lines (<l>), sometimes in hierarchic groups (<lg>)
  • Drama is mostly speeches (<sp>) containing <p> or <l> and interspersed with stage directions (<stage>)
These may be mixed, and may also appear directly within undivided texts

Prose: an example

<p>Initially launched in 1987, the TEI is an international and interdisciplinary
standard that helps libraries, museums, publishers, and individual scholars represent
all kinds of literary and linguistic texts for online research and teaching, using an
encoding scheme that is maximally expressive and minimally obsolescent.</p>
<p>For current membership of the TEI Consortium, please check the members list.</p>

Verse: an example


<lg type="poem">
<head>Poor Edward</head>
<lg type="stanza">
<l>Did you hear the news about Edward?</l>
<l>On the back of his head he had another face</l>
<l>Was it a woman's face or a young girl?</l>
<l>They said to remove it would kill him</l>
<l>So poor Edward was doomed</l>
</lg>
<lg type="stanza">
<l>The face could laugh and cry</l>
<l>It was his devil twin</l>
<l>And at night she spoke to him</l>
<l>Things heard only in hell</l>
<l>But they were impossible to separate</l>
<l>Chained together for life</l>
</lg>
<lg type="stanza">
<l>Finally the bell tolled his doom</l>
<l>He took a suite of rooms</l>
<l>And hung himself and her from the balcony irons</l>
<l>Some still believe he was freed from her</l>
<l>But I knew her too well</l>
<l>I say she drove him to suicide</l>
<l>And took poor Edward to hell</l>
</lg>
</lg>

Drama: an example

<stage>Enter Barnardo and Francisco, two Sentinels, at several doors</stage>
<sp who="Barnardo"><l>Who's there?</l></sp>
<sp who="Francisco"><l>Nay, answer me. Stand and unfold yourself.</l></sp>
<sp who="Barnardo"><l>Long live the king!</l></sp>
<sp who="Francisco"><l>Barnardo?</l></sp>
<sp who="Barnardo"><l>He.</l></sp>
Which could be displayed as:
Enter Barnardo and Francisco, two Sentinels, at several doors
Barnardo: Who's there?
Francisco: Nay, answer me. Stand and unfold yourself.
Barnardo: Long live the king!
Francisco: Barnardo?
Barnardo: He.

Page and line numbers

  • <pb />: page break
  • <lb />: line break

Whan that Aprill with his shoures soote<lb />
The droghte of March hath perced to the roote,<lb />
And bathed every veyne in swich licour<lb />
Of which vertu engendred is the flour;<lb />
<pb ed="riverside" n="23" />
Whan Zephirus eek with his sweete breeth<lb />


Texts are not just words...

  • but probably only people know that
  • an encoding may claim to capture
    • just visual salience
    • just its assumed causes
    • both
  • encoding makes explicit one (or more) sets of interpretations

Highlighted phrases

<hi>: used to indicate typographic highlighting:
<hi rend="gothic">And this Indenture further witnesseth</hi>
that the said <hi rend="italic">Walter Shandy</hi>, merchant,...
A descriptive tag can document the reason for highlighting:
<seg type="formula">And this Indenture further witnesseth</seg>
that the said <name rend="italic">Walter Shandy</name>, merchant,...

Interpretive highlighting

<emph>: used to indicate linguistic (not typographic) emphasis
You did <emph>what?</emph>
Varieties of linguistic distancing: <gloss>, <term>, <soCalled>, and <mentioned>
A <term>DTD</term> specifies <gloss>the vocabulary and the syntax
of a markup language</gloss>
They put us out of work and call it <soCalled>downsizing</soCalled>
<mentioned>Downsizing</mentioned> is a very nasty neologism

<q>: Direct speech

  • Use the type attribute to indicate whether it is spoken or thought
  • Use the who attribute to show speakers
  • Speeches can be nested in other speeches

<p>
 <q who="Flotte, Pierre" type="spoken">"Ho dit verstaen wy!"</q>
 viel <name reg="Flotte, Pierre" type="person">Pierre Flotte</name>
 <corr sic="in">in, </corr>
 <q who="Flotte, Pierre" type="spoken">"Maer Myne heeren geeft uwen draveren de spoor
 en haest u voort - want ginds zie ik Mynheer
 <name reg="De Valois, Charles" type="person"> De Valois </name>
 tusschen de boomen verdwynen."
 </q>
</p>

"Foreign" language phrases

  • The lang attribute may be attached to any element
  • Use <foreign> if nothing else is available
  • Define each language in <langUsage> in the <teiHeader>

<profileDesc>
   <langUsage>
      <language id="deu">German</language>
      <language id="fra">French</language>
   </langUsage>
</profileDesc>


<p>Have you read <title lang="deu">Die Dreigroschenoper</title>?
<mentioned lang="fra">Savoir-faire</mentioned> is French for know-how.
John has real <foreign lang="fra">savoir-faire</foreign>.</p>

Phrase level elements

  • are often by convention typographically distinct
  • "data-like" (names, numbers, dates, times, addresses)
    • <name>
    • <num>
    • <date>
    • <time>
    • <address> & <addrLine>
  • editorial intervention (corrections, regularizations, additions, omissions ...)
    • <corr>
    • <sic>
    • <reg>
    • <orig>
    • <add>
    • <del>
  • cross references and links
    • <xptr />
    • <xref>

Dates, times, numbers

  • attributes can be used to quantify <date> expressions
  • similarly, times <time>, and numbers <num>

Today is <date>Monday 15th</date>

Today is <date value="2004-03-15">Monday 15th</date>

One day in <date certainty="approx" value="2004-03">late March</date>


It's now <time value="12:00">noon</time>.


<num value="15">fifteen</num>

<num value="3.14159">pi</num>

Correction and regularisation

  • <corr> and <sic> for correction (or non-correction)
  • <reg> and <orig> for regularization (or the reverse)

<p>"Ho dit verstaen wy!" viel
  <name reg="Flotte, Pierre" type="person">Pierre Flotte </name>
  <corr sic="in">in, </corr>
  "Maer Myne heeren geeft uwen draveren de spoor en haest u voort -
  want ginds zie ik Mynheer
  <name reg="De Valois, Charles" type="person">De Valois</name>
  tusschen de boomen verdwynen."
</p>
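Each pair works in both directions: the element content holds one reading and the attribute holds the other. A sketch with invented sentences (the misspelling and the spelling variants are illustrative assumptions):

```xml
<!-- keep the error in the text, record the correction in an attribute -->
<p>He <sic corr="received">recieved</sic> the letter.</p>
<!-- put the correction in the text, record the error in an attribute -->
<p>He <corr sic="recieved">received</corr> the letter.</p>
<!-- regularized form in the text, original spelling in an attribute -->
<p>The <reg orig="shoures">showers</reg> of April.</p>
<!-- original spelling in the text, regularized form in an attribute -->
<p>The <orig reg="showers">shoures</orig> of April.</p>
```

Which member of a pair goes into the text is an editorial decision; it is usually documented in the <encodingDesc> of the header.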

Omissions, Deletions, Additions

  • <gap>: omission by transcriber
  • <del>: cancellation in source or by editor
  • <add> or <supplied>: insertion in source or by editor
  • <unclear>: material uncertain because illegible
  • <damage>: physical damage to text carrier

<gap reason="illegible" desc="bloodstain" />

He was <del resp="EV" hand="author" type="strike">not</del> very nice.

He was <add resp="EV" hand="author" place="supralinear">not</add> very nice.

He was <unclear resp="EV" reason="ink stain">not</unclear> very nice.



Abbreviations and expansions

  • <abbr>: contains an abbreviation. Its expansion is given in an "expan" attribute.

We learn how to write <abbr expan="eXtensible Markup Language">XML</abbr>

Addresses

  • <address>: contains a postal or other address.
  • <addrLine>: contains a line inside an <address>.

<address>
	<addrLine>Koninklijke Academie voor Nederlandse Taal- en Letterkunde</addrLine>
	<addrLine>Koningstraat 18</addrLine>
	<addrLine>b-9000 Gent</addrLine>
	<addrLine>tel: +32 (0)9 265.93.50</addrLine>
	<addrLine>fax: +32 (0)9 265.93.49</addrLine>
	<addrLine>email: ctb@kantl.be</addrLine>
</address>

Lists

<list>: contains any sequence of items organized as a list.
<item>: contains one component of a list.
<label>: contains the label associated with an item in a list; in glossaries, marks the term being defined.

<list type="ordered">
<item>Week 1</item>
<item>Week 2</item>
<item>Week 3</item>
<item>Week 4</item>
<item>Week 5</item>
</list>


<list type="gloss">
<head>Vocabulary</head>
<label lang="enm">Whan that</label>    <item>When</item>
<label lang="enm">Aprill</label>       <item>April</item>
<label lang="enm">with</label>         <item>with</item>
<label lang="enm">his</label>          <item>its</item>
<label lang="enm">shoures</label>      <item>showers</item>
<label lang="enm">soote</label>        <item>sweet</item>
</list>

Notes

<note>: contains a note or annotation.

<p>Indeed, hypertext is just the visualization of linking which
DeRose &amp; van Dam define as "the ability to express relationships
between places in a universe of information"<note place="foot" n="1">
"A place should be any piece of information, or at least any that
exists in a stable or recoverable form." (DeRose &amp; van Dam 9).</note></p>


<table>

  • a <table> element contains <row>s of <cell>s
  • spanning is indicated by rows and cols attributes
  • role attribute indicates whether <row> or <cell> holds data or a label
  • embedded tables are permitted


<table rows="2" cols="2">
   <row role="label">
      <cell>Male Students</cell>
      <cell>Female Students</cell>
   </row>
   <row role="data">
      <cell>1</cell>
      <cell>5</cell>
   </row>
</table>
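Spanning can be sketched by extending the table above with a heading cell that runs across both columns. The extra label row is an invention for illustration; the cols attribute on <cell> gives the number of columns the cell occupies.

```xml
<table rows="3" cols="2">
   <row role="label">
      <!-- one cell spanning both columns -->
      <cell cols="2">Students</cell>
   </row>
   <row role="label">
      <cell>Male</cell>
      <cell>Female</cell>
   </row>
   <row role="data">
      <cell>1</cell>
      <cell>5</cell>
   </row>
</table>
```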


Bibliography

Use simple <bibl> with optional subcomponents:
  • <respStmt> (for any kind of responsibility) or <author>, <editor>, etc.
  • <title> with optional level attribute
  • <imprint> groups publication details
  • <biblScope> adds page references
  • Use <listBibl> for list of references

<bibl>
  <author>Walsh, Marcus</author>
  <date>(1993)</date>
  <title level="a">The Fluid Text and the Orientations of Editing.</title>
  <editor>Chernaik, Warren, Caroline Davis, and Marilyn Deegan</editor>
  <title level="m">The Politics of the Electronic Text</title>
  <imprint>
     <pubPlace>Oxford</pubPlace>
     <publisher>Office for Humanities Communication</publisher>
  </imprint>
  <biblScope>31-39</biblScope>
</bibl>

Referring strings

The <rs> (referring string) element is used for any kind of name or reference

<q>My dear <rs type="person" key="BENM1">Mr. Bennet</rs>,</q>
said <rs type="person" key="BENM2">his lady</rs> to him one day,
<q>have you heard that <rs type="place" key="NETP1">Netherfield Park</rs>
is let at last?</q>

<figure>

  • The presence of a graphic is indicated by the <figure> element
  • The title of the graphic is tagged as a <head>
  • A description of the graphic may be supplied (as a <figDesc>) for use by software unable to render the graphic
  • The graphic itself is specified as an external entity


<!ENTITY logoctb SYSTEM "logoctb.gif" NDATA GIF>





<figure entity="logoctb">
<head>The logo of the CTB</head>
<figDesc>The letters c, t, and b in red with black border on a white field</figDesc>
</figure>
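The entity declaration belongs in the internal DTD subset, that is, between the square brackets of the Document Type Declaration. A sketch, assuming the TeixLite DTD already declares a notation named GIF (if it does not, a matching NOTATION declaration would be needed as well):

```xml
<?xml version="1.0"?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI//DTD TEI Lite XML ver. 1//EN"
      "teixlite.dtd"
    [
    <!-- external, unparsed entity pointing at the image file -->
    <!ENTITY logoctb SYSTEM "logoctb.gif" NDATA GIF>
    ]>
```

The <figure> element then refers to the image by the entity name, not by the file name.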


The multiple hierarchy problem

  • XML allows only one hierarchy at a time
  • Is a document
    • chapter-paragraph-phrase
    • gathering-leaf-page
    • or both?
  • discontinuous segments
  • links and milestones
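A sketch of the milestone approach: the logical hierarchy (division, paragraph) is encoded with ordinary elements, while the physical hierarchy is reduced to empty <pb /> markers that may fall anywhere, even inside a paragraph. The page numbers and the text are invented.

```xml
<div type="chapter" n="1">
   <head>Chapter 1</head>
   <!-- the paragraph crosses a page boundary without being split -->
   <p>This paragraph begins on page 22
   <pb n="23" />
   and continues on page 23.</p>
</div>
```

Encoding pages as container elements would not work here, because a page element would have to close in the middle of the paragraph.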

Not covered here

  • TEI XPointer syntax
  • Interpretation and analysis