B33080 Humanities Computing: Electronic Text

University of Antwerp, Campus Drie Eiken

Second Term 2005

Edward Vanhoutte

edward.vanhoutte@kantl.be

Week 4: XML theory and practice: valid XML - parsing/validating - TeixLite.

Monday 28 February


I. Monday 28 February Overview

Monday 28 February: Overview

  1. Revision of week 3
  2. XML: theory & practice
    • DTD
    • Valid XML
    • Validating XML
    • TeixLite

Goals of this lecture

After this lecture, you should be able to
  • create a valid XML document
  • parse an XML document for validation
  • read and interpret an XML document
  • create and read small DTDs
  • use and understand TeixLite
  • create TeixLite documents
  • parse TeixLite documents for validation

II. Monday 28 February Revision of week 3.

1. Revision of week 3

  1. Revision of week 2
  2. Standard Generalized Markup Language (SGML)
  3. Text Encoding Initiative (TEI)
  4. eXtensible Markup Language (XML)
    • What it is
    • SGML - XML
    • Input & output
    • Production Process
  5. XML: theory & practice
    • How do we recognize an XML document?
    • Minimal XML document
    • Jargon
    • Well formed XML
    • Valid XML

XML jargon

An XML document contains
  • The XML declaration and Processing Instructions
    • <?xml version="1.0" ?> (the XML declaration; it uses the same syntax as a processing instruction)
    • <?xml-stylesheet href="../dtd/xsltslides.xsl" type="text/xsl" ?>
  • Elements
    • <title> or </title>
    • <empty />
    • → XML is case sensitive in the naming of the elements. <title> is not <TITLE> is not <Title> is not <tItle> etc.
  • Attributes (optional)
    • <title type="journal">
    • <name who="Edward" reg="VanhoutteE" >
  • Entity References
    • to represent characters which cannot reliably be typed in (ISO 8859-1 (IsoLat1), ISO 10646 - Unicode): &eacute; = &#233; = é
    • as a shortcut for boilerplate text: &mayonnaise; = reference to an external recipe (text) for mayonnaise
    • containers for external (non-XML) data such as graphics: <figure entity="ascii-full" />
  • CDATA:
    • allows you to include application code (JavaScript, Perl, BASIC, etc.) in an XML document without having to worry about escaping characters. The XML processor passes the content of a CDATA section through exactly as it appears, without interpreting any markup inside it.
    • <![CDATA[This text escapes the XML processor]]>
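The constructs listed above can be combined in one small document. This is only a sketch: the element names (note, title, p, empty, script), the entity name sig, its replacement text and the stylesheet file note.xsl are all invented for illustration. The document is well formed but not valid, because the internal subset declares an entity and no elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="note.xsl" type="text/xsl"?>
<!DOCTYPE note [
  <!-- hypothetical internal entity: a shortcut for boilerplate text -->
  <!ENTITY sig "Centre for Example Studies">
]>
<note type="reminder">
  <!-- &#233; is a numeric character reference for é -->
  <title>Caf&#233; meeting</title>
  <p>Sent on behalf of the &sig;.</p>
  <empty/>
  <!-- inside CDATA, < and && need no escaping -->
  <script><![CDATA[if (a < b && b < c) { alert("ordered"); }]]></script>
</note>
```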

Well Formed XML

  1. XML is case sensitive
  2. At least one element
  3. There is always a root element
  4. All logical and physical structures nest properly
  5. Correspondence of element names in start- and end-tags
  6. Attribute names only appear once in a start-tag
  7. Attribute values are quoted
  8. Attribute values don't refer to external entities
  9. Entities are declared
  10. No entity reference contains the name of a non-parsed entity
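A few of these rules can be illustrated with a small example. The element and attribute names are invented; the trailing comment lists variants that a parser would be expected to reject as not well formed.

```xml
<?xml version="1.0"?>
<!-- Rule 3: one root element encloses everything else -->
<poem type="sonnet" n="1">
  <!-- Rules 6 and 7: each attribute appears once and its value is quoted -->
  <title>Example</title>
  <!-- Rule 5: <title> is closed by </title>, not by </Title> -->
  <stanza>
    <!-- Rule 4: <hi> is closed before the <line> that contains it -->
    <line>First <hi>line</hi></line>
  </stanza>
</poem>
<!-- Not well formed, for comparison:
     <line>First <hi>line</line></hi>     overlapping elements
     <poem type=sonnet>                   unquoted attribute value
     <poem n="1" n="2">                   attribute repeated in one start-tag
     <title>Example</Title>               case mismatch between the tags
-->
```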

Valid XML

A valid XML document is well formed, contains a Document Type Declaration, and conforms to the DTD that the declaration refers to
<!DOCTYPE TEI.2 PUBLIC "-//TEI//DTD TEI Lite 1.0//EN"
    "../dtd/xmllite.dtd"
    [
    ]>

Document Type Declaration

Contains the reference to a Document Type Definition (DTD)
A DTD specifies the vocabulary and the syntax of a markup language
It defines:
  • names for all your elements
  • names and default values for their attributes
  • rules about how elements can nest
  • names for re-usable pieces of data (entities)
  • and a few other things
A DTD does not specify anything about what elements "mean"

Defining an element in a DTD

<!ELEMENT  name  contentModel>
  • name is the name of the element (its generic identifier, GI)
  • contentModel defines valid content for the element:
    • #PCDATA
    • EMPTY
    • other elements
    • mixed content combining PCDATA and other elements
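The four kinds of content model can be seen side by side in a sketch like the following. The element names (entry, term, pb, def) are made up for this illustration, and the DTD is placed in the internal subset so that the example is self-contained.

```xml
<?xml version="1.0"?>
<!DOCTYPE entry [
  <!-- other elements: a sequence with an optional and a repeatable member -->
  <!ELEMENT entry   (term, pb?, def+)>
  <!-- character data only -->
  <!ELEMENT term    (#PCDATA)>
  <!-- no content at all -->
  <!ELEMENT pb      EMPTY>
  <!-- mixed content: character data and term elements in any order -->
  <!ELEMENT def     (#PCDATA | term)*>
]>
<entry>
  <term>markup</term>
  <pb/>
  <def>Text added to a document; see also <term>tag</term>.</def>
</entry>
```

Note that in mixed content #PCDATA has to come first, the alternatives are separated by |, and the group ends in *.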

Document Type Declaration

Contains the reference to a Document Type Definition (DTD)
→ The formal specification for the structure of an XML document

DTD toolkit


Real Example: Anthology DTD

<!ELEMENT   anthology    (poem,poem+)>
<!ELEMENT   poem         (title?, stanza+)>
<!ELEMENT   title        (#PCDATA)>
<!ELEMENT   stanza       (line)+>
<!ELEMENT   line         (#PCDATA)>
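As an illustration, here is a document that should validate against the anthology DTD, with the declarations copied into the internal subset. The poem texts are placeholders. Because of (poem,poem+), an anthology needs at least two poems; the second poem shows that the title is optional.

```xml
<?xml version="1.0"?>
<!DOCTYPE anthology [
  <!ELEMENT   anthology    (poem,poem+)>
  <!ELEMENT   poem         (title?, stanza+)>
  <!ELEMENT   title        (#PCDATA)>
  <!ELEMENT   stanza       (line)+>
  <!ELEMENT   line         (#PCDATA)>
]>
<anthology>
  <poem>
    <title>First placeholder poem</title>
    <stanza>
      <line>First line of the first stanza</line>
      <line>Second line of the first stanza</line>
    </stanza>
    <stanza>
      <line>Only line of the second stanza</line>
    </stanza>
  </poem>
  <poem>
    <!-- no title: title? makes it optional -->
    <stanza>
      <line>A poem of one line</line>
    </stanza>
  </poem>
</anthology>
```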

III. Week 4

Exercise

→ write a DTD for the following document and save it as c:\lab\email.dtd
<?xml version="1.0" ?>
<email>
 <header>
   <subj>my dog</subj>
   <date>Mon, 7 Feb 2004 13:57:24 +0100</date>
   <from>"thedoglady@wanteddogs.org" &lt;thedoglady&commat;wanteddogs.org&gt;</from>
   <to>"edward.vanhoutte&commat;kantl.be" &lt;edward.vanhoutte&commat;kantl.be&gt;</to>
 </header>
 <body>
   <open>Sir,</open>
   <p>Last week I lost my dog and don't know where she is now.</p>
   <p>Normally I keep her in a bag under my arm, but now she's gone.</p>
   <p>Could you please check all your bags and look whether you can find my dog?</p>
   <close>Very many thanks in advance</close>
   <sign>
     <name>Lady D. Og</name>
     <address>
       <addrLine>Department of lost dogs</addrLine>
       <addrLine>Ministry of dogs and bags, Dogtown</addrLine>
       <addrLine>thedoglady&commat;wanteddogs.org</addrLine>
     </address>
   </sign>
 </body>
</email>

email DTD

<!ELEMENT    email     (header, body)>
<!ELEMENT    header    (subj, date, from, to)>
<!ELEMENT    subj      (#PCDATA)>
<!ELEMENT    date      (#PCDATA)>
<!ELEMENT    from      (#PCDATA)>
<!ELEMENT    to        (#PCDATA)>
<!ELEMENT    body      (open | p | ps | close | sign)*>
<!ELEMENT    open      (#PCDATA)>
<!ELEMENT    p         (#PCDATA)>
<!ELEMENT    close     (#PCDATA)>
<!ELEMENT    ps        (#PCDATA)>
<!ELEMENT    sign      (name | address)*>
<!ELEMENT    name      (#PCDATA)>
<!ELEMENT    address   (addrLine)+>
<!ELEMENT    addrLine  (#PCDATA)>

Validating XML

XML can be validated when we have:
  • an XML document
  • a DTD
  • a validating parser
Make sure:
  • the Document Type Declaration inside the XML document refers to the appropriate DTD and its path
  • the parser can find the declaration it needs for XML (for NSGMLS, the file xml.dcl)
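A shortened email.xml prepared for validation might look like the sketch below. It assumes that email.dtd sits in the same folder as the document. The local declaration of commat is also an assumption, added so that the reference resolves even without the ISO entity files; if the DTD already declares the entity, the declaration in the internal subset takes precedence.

```xml
<?xml version="1.0"?>
<!DOCTYPE email SYSTEM "email.dtd" [
  <!-- assumption: declared locally so that &commat; can be resolved -->
  <!ENTITY commat "&#64;">
]>
<email>
  <header>
    <subj>my dog</subj>
    <date>Mon, 7 Feb 2004 13:57:24 +0100</date>
    <from>thedoglady&commat;wanteddogs.org</from>
    <to>edward.vanhoutte&commat;kantl.be</to>
  </header>
  <body>
    <open>Sir,</open>
    <p>Last week I lost my dog and don't know where she is now.</p>
    <close>Very many thanks in advance</close>
  </body>
</email>
```

The name after <!DOCTYPE has to match the root element, here email.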

Validating with NSGMLS (SP)

check whether the following files are in the bin folder of the program, and download from

Validating with NSGMLS (SP)

  • Specify where nsgmls can find the catalog file under Options in the toolbar of runsp2.
  • Specify where nsgmls can find xml.dcl under Options in the toolbar of runsp2.

Exercise 1

  • http://www.kantl.be/ctb/vanhoutte/teach/hc2005.htm (week 4)
    • Copy email.xml to C:\lab\
    • Copy email.dtd to C:\lab\
    • Copy iso-lat1.ent, iso-lat2.ent, iso-num.ent & iso-pub.ent to C:\lab\
    • Validate with NSGMLS (SP)
  • CD-ROM
    • Copy exx/email.xml to C:\lab\
    • Copy exx/email.dtd to C:\lab\
    • Copy exx/iso-lat1.ent, exx/iso-lat2.ent, exx/iso-num.ent & exx/iso-pub.ent to C:\lab\
    • Validate with NSGMLS (SP)

Exercise 2

  • Validate with Open XML Editor

Exercise 3

  • Hijack your neighbour's computer
  • Sneak some mistakes in his/her email.xml document
  • Let him/her validate and look for the mistakes

IV. TeixLite

Theoretical problem with the DTD

  • DTDs are written in a formal language different from that of the XML documents they describe.
→ XML Schema
  • A self-declarative way of documenting the formal specification for the structure of an XML document
  • Expressed in XML itself
Schemas can deal with namespaces; DTDs cannot.
An XML namespace is a collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names.
e.g.: <table> in doc1 is not necessarily the same as <table> in doc2
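A small sketch of how namespaces keep two kinds of table apart within one document. The first URI is the XHTML namespace; the second URI, the prefixes h and f, and the elements catalogue and legs are invented for illustration.

```xml
<?xml version="1.0"?>
<catalogue xmlns:h="http://www.w3.org/1999/xhtml"
           xmlns:f="http://example.org/ns/furniture">
  <!-- a table in the sense of rows and cells -->
  <h:table>
    <h:tr><h:td>a cell of tabular data</h:td></h:tr>
  </h:table>
  <!-- a table in the sense of a piece of furniture -->
  <f:table>
    <f:legs>4</f:legs>
  </f:table>
</catalogue>
```

The prefix itself carries no meaning; two elements belong to the same namespace only when their prefixes are bound to the same URI.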

Relax NG Schema

  • uses XML syntax to represent schemas
  • supports datatyping
  • integrates attributes into content models
  • supports XML namespaces
  • supports unordered content
  • supports context-sensitive content models
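Some of these features can be shown in a short schema. This is a sketch only: the element letter and the attribute sent are hypothetical, and the datatypes are borrowed from the W3C XML Schema datatype library.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<element name="letter" xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- attributes are part of the content model; datatyping: the value must be a date -->
  <attribute name="sent">
    <data type="date"/>
  </attribute>
  <!-- unordered content: from and to may appear in either order -->
  <interleave>
    <element name="from"><text/></element>
    <element name="to"><text/></element>
  </interleave>
  <zeroOrMore>
    <element name="p"><text/></element>
  </zeroOrMore>
</element>
```

A document such as <letter sent="2005-02-28"><to>...</to><from>...</from></letter> would be accepted by this schema, whereas sent="yesterday" would not.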

Relax NG Schema for email.xml


<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
>

   <!-- email contains exactly one header followed by one body -->
   <define name="email">
     <element name="email">
       <ref name="header"></ref>
       <ref name="body"></ref>
     </element>
   </define>

   <define name="header">
     <element name="header">
       <element name="subj">
         <text></text>
       </element>
       <element name="date">
         <text></text>
       </element>
       <element name="from">
         <text></text>
       </element>
       <element name="to">
         <text></text>
       </element>
     </element>
   </define>

   <!-- body: any number of open, p, ps, close and sign, in any order -->
   <define name="body">
     <element name="body">
       <zeroOrMore>
         <choice>
           <element name="open">
             <text></text>
           </element>
           <element name="p">
             <text></text>
           </element>
           <element name="ps">
             <text></text>
           </element>
           <element name="close">
             <text></text>
           </element>
           <ref name="sign"></ref>
         </choice>
       </zeroOrMore>
     </element>
   </define>

   <define name="sign">
     <element name="sign">
       <zeroOrMore>
         <choice>
           <element name="name">
             <text></text>
           </element>
           <ref name="address"></ref>
         </choice>
       </zeroOrMore>
     </element>
   </define>

   <define name="address">
     <element name="address">
       <oneOrMore>
         <element name="addrLine">
           <text></text>
         </element>
       </oneOrMore>
     </element>
   </define>

   <!-- the root element of a valid document -->
   <start>
     <choice>
       <ref name="email" />
     </choice>
   </start>
</grammar>

Writing DTDs or Schema

→ Bad news
  • Difficult
  • Time Consuming
  • Expensive
→ Good news
  • The work has been done for you

V. TEI

Text Encoding Initiative (TEI)

provides DTD subsets and elements for the encoding of:
  • texts in any natural language
  • of any date
  • in any literary genre or text type
  • without restrictions on form or content
These DTD subsets cover both continuous materials ('running text') and discontinuous materials such as dictionaries and linguistic corpora.

Text Encoding Initiative: a pizza model

The TEI has produced a number of DTD subsets which can be combined according to the needs for a particular project in the humanities.
"All pizzas have some ingredients in common (cheese and tomato sauce); in Chicago, at least, they may have entirely different forms of pastry base, with which (universally) the consumer is expected to make his or her own selection of toppings."
  • Core tag sets (cheese and tomato): define mandatory elements for all document types.
  • Base tag sets (the pastry): define the structural components of a document.
  • Additional tag sets (toppings): can occur in all document type classes, but define specialised tag sets which can be combined according to taste.

Core Tag Sets

→ Are always required and contain the teiHeader DTD and elements available in all TEI documents.

Base Tag Sets

→ Define the basic building blocks of different text types. The following selections are available:
  • Prose: this tagset is suitable for most documents most of the time;
  • Verse: this tagset adds specialist tagging for metrical analysis, rhyme-scheme etc. to the basic verse markup already included in the core;
  • Drama: this tagset adds specialist tagging for cast lists, records of first performance, etc. to the basic drama markup already included in the core;
  • Speech: this tagset replaces the basic structure by one suitable for linguistic analysis of speech acts, etc.;
  • Dictionaries: this tagset replaces the basic structure with one containing detailed lexicographic features;
  • Terminology: this tagset replaces the basic structure with one specific to terminological databases;
  • General base: this tagset allows you to combine tags from different base tagsets, with the proviso that any single text division can contain tags from only one of the base tagsets you choose from the following list: prose, verse, drama, spoken texts, dictionaries, terminology;
  • Mixed base: this tagset allows you to combine tags from different base tagsets, with no restriction at all as to where tags from different base tagsets can appear. The different tagsets to combine are: prose, verse, drama, spoken texts, dictionaries, terminology.

Additional Tag Sets

→ May be selected and are optional:
  • linking: adds elements for hypertext linking, segmentation, and alignment;
  • figures: adds elements for encoding tables, pictures, and formulae;
  • analysis: adds elements for interpretation and simple linguistic analyses;
  • fs: adds elements for feature structure analysis;
  • certainty: adds elements for recording uncertainty and responsibility;
  • transcr: adds elements for the transcription of primary sources (e.g. manuscripts);
  • textcrit: adds elements for text-critical apparatus;
  • names.dates: adds elements for the detailed tagging of names and dates;
  • nets: adds elements for recording the abstract structure of mathematical graphs, networks, and trees;
  • corpora: adds specialized elements to the TEI-header for use with language corpora.

TEILite

  • one of many possible views of the TEI DTD
  • small and simple
  • 20% of the tags, 80% of the projects
  • 121 elements
  • was devised as a didactic stepping stone to the full-flavour TEI, but took on a life of its own
  • realistic for existing texts and for document production
TeixLite: the XML compatible version of TEILite
→ "TEI U5: Encoding for Interchange: an introduction to the TEI."

TeixLite: start

A TeixLite document is an XML document which refers to a DTD, so:

<?xml version="1.0"?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI//DTD TEI Lite XML ver. 1//EN"
      "teixlite.dtd"
    [
    ]>
→ root element = <TEI.2>, so:

<?xml version="1.0"?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI//DTD TEI Lite XML ver. 1//EN"
      "teixlite.dtd"
    [
    ]>


<TEI.2>
...
</TEI.2>

teixlite.dtd

Copy teixlite.dtd to C:\lab\

A TEI-conformant document

comprises a header followed by a text
<TEI.2>
 <teiHeader>...</teiHeader>
 <text>...</text>
</TEI.2>

teixlite.clb

Copy teixlite.clb to C:\program files\NoteTab Light\Libraries\

<teiHeader>

The header is essential for:
  • bibliographic control and identification
  • resource documentation and
  • processing (see later)

<teiHeader>

The TEI Header is introduced by the element <teiHeader> and has 4 major parts, only the first of which is mandatory:
  1. file description <fileDesc>: contains a full bibliographic description of an electronic file, including information about the sources from which the electronic text was derived. Essential for bibliographic referencing and cataloguing.
  2. encoding description <encodingDesc>: documents the relationship between an electronic text and the source or sources from which it was derived. It allows for documenting detailed information about transcription/transliteration principles such as normalization, the treatment of quotations and hyphenation and the levels of interpretation i.e. analytic tagging and encoding applied to the document.
  3. profile description <profileDesc>: provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their settings.
  4. revision description <revisionDesc>: summarizes the revision history for a file, which is important for version control and for resolving questions about the history of a file, especially when a team of scholars is working on the same document.

<teiHeader>

The full form of a TEI Header is thus:
   <teiHeader>
      <fileDesc> ... </fileDesc>
      <encodingDesc> ... </encodingDesc>
      <profileDesc> ... </profileDesc>
      <revisionDesc> ... </revisionDesc>
   </teiHeader>
 
While a minimal header takes the form:
   <teiHeader>
      <fileDesc> ... </fileDesc>
   </teiHeader>
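Written out, a minimal header could look like the sketch below. It assumes the usual TEI requirement that <fileDesc> contains a title statement, a publication statement and a source description; the wording inside the elements is placeholder text.

```xml
<teiHeader>
  <fileDesc>
    <!-- title of the electronic file -->
    <titleStmt>
      <title>Sample text: an electronic transcription</title>
    </titleStmt>
    <!-- how the electronic file is published or distributed -->
    <publicationStmt>
      <p>Unpublished classroom exercise.</p>
    </publicationStmt>
    <!-- the source from which the electronic text was derived -->
    <sourceDesc>
      <p>Transcribed from a printed copy.</p>
    </sourceDesc>
  </fileDesc>
</teiHeader>
```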

<text>: A text may be unitary or composite

A unitary text contains
  • <front>: front matter
  • <back>: back matter
  • <body>: a body
   <text>
      <front>...</front>
      <body>...</body>
      <back>...</back>
   </text>
In a composite text, the body is a
  • <group>: group of texts (or nested groups)
   <group>
      <text>...</text>
      <text>...</text>
      <text>...</text>
   </group>

<front>

Preliminary material such as title pages, prefatory epistles, etc.
<front>
<titlePage>
   <docTitle>
      <titlePart type="main">
         ...
      </titlePart>
   </docTitle>
   <docAuthor>...</docAuthor>
   <docDate>...</docDate>
   <docEdition>...</docEdition>
   <docImprint>...</docImprint>
   <epigraph>...</epigraph>
</titlePage>
</front>

<titlePage>

<titlePage>
   <docTitle>
      <titlePart>
         <title level="m" type="main">DE TELEURGANG VAN DEN WATERHOEK</title>
      </titlePart>
   </docTitle>
   <titlePart>DOOR</titlePart>
   <docAuthor>STIJN STREUVELS</docAuthor>
   <docImprint>UITGAVE "EXCELSIOR" — BRUGGE</docImprint>
   <docImprint>AMSTERDAM, L. J. VEEN, UITGEVER.</docImprint>
</titlePage>

Structure of a TEI Document


A text usually has divisions

  • generic, hierarchic subdivisions
  • vanilla or numbered
  • type attribute
  • associated <head> and <trailer>
  • <div>, <div0>, <div1>, <div2>, <div3>, <div4>, <div5>, <div6>, <div7>
<text>
   <front> <!-- titlepage etc here --> </front>
   <body>
      <div1 type="book" n="1" id="b0100">
         <head>Book 1</head>
         <div2 type="chapter" n="1" id="b0101">
            <head>Chapter 1</head>
            <!-- rest of the chapter -->
         </div2>
         <div2 type="chapter" n="2" id="b0102">
            <head>Chapter 2</head>
            <!-- rest of the chapter -->
         </div2>
      </div1>
   </body>
</text>

Use of global attributes

Applicable to all elements
  • id for unique identification
  • n for (non-unique) name or number
  • rend for rendition (appearance)
  • lang for language and hence writing-system
→ Extensible, like other classes

Text components in TEI Lite

What are divisions made of?
  • Prose is mostly paragraphs (<p>)
  • Verse is mostly lines (<l>), sometimes in hierarchic groups (<lg>)
  • Drama is mostly speeches (<sp>) containing <p> or <l> and interspersed with stage directions (<stage>)
These may be mixed, and may appear also directly within undivided texts

Prose: an example

<p>Initially launched in 1987, the TEI is an international and interdisciplinary
standard that helps libraries, museums, publishers, and individual scholars represent
all kinds of literary and linguistic texts for online research and teaching, using an
encoding scheme that is maximally expressive and minimally obsolescent.</p>
<p>For current membership of the TEI Consortium, please check the members list.</p>

Verse: an example


<lg type="poem">
<head>Poor Edward</head>
<lg type="stanza">
<l>Did you hear the news about Edward?</l>
<l>On the back of his head he had another face</l>
<l>Was it a woman's face or a young girl?</l>
<l>They said to remove it would kill him</l>
<l>So poor Edward was doomed</l>
</lg>
<lg type="stanza">
<l>The face could laugh and cry</l>
<l>It was his devil twin</l>
<l>And at night she spoke to him</l>
<l>Things heard only in hell</l>
<l>But they were impossible to separate</l>
<l>Chained together for life</l>
</lg>
<lg type="stanza">
<l>Finally the bell tolled his doom</l>
<l>He took a suite of rooms</l>
<l>And hung himself and her from the balcony irons</l>
<l>Some still believe he was freed from her</l>
<l>But I knew her too well</l>
<l>I say she drove him to suicide</l>
<l>And took poor Edward to hell</l>
</lg>
</lg>

Drama: an example

<stage>Enter Beatrice</stage>
<sp who="Beatrice"><l>Against my will I am sent to bid you come in to dinner.</l></sp>
<sp who="Benedick"><l>Fair Beatrice, I thank you for your pains.</l></sp>
<sp who="Beatrice"><l>I took no more pains for these thanks,
than you took pains to thank me; if it had
been painful I would not have come.</l></sp>
<sp who="Benedick"><l>You take pleasure, then, in the message?</l></sp>
<sp who="Beatrice"><l>Yea, just so much as you may take
upon a knife's point, and choke a daw withal
- You have no stomach, signior; fare you well.</l></sp>
<stage>Exit.</stage>
Enter Beatrice
Beatrice: Against my will I am sent to bid you come in to dinner.
Benedick: Fair Beatrice, I thank you for your pains.
Beatrice: I took no more pains for these thanks, than you took pains to thank me; if it had been painful I would not have come.
Benedick: You take pleasure, then, in the message?
Beatrice: Yea, just so much as you may take upon a knife's point, and choke a daw withal - You have no stomach, signior; fare you well.
Exit.
Much Ado About Nothing, 2.3

Page and line numbers

  • <pb />: pagebreak
  • <lb />: linebreak
Whan that Aprill with his shoures soote<lb />
The droghte of March hath perced to the roote,<lb />
And bathed every veyne in swich licour<lb />
Of which vertu engendred is the flour;<lb />
<pb ed="riverside" n="23" /> Whan Zephirus eek with his sweete breeth<lb />