Encoding texts for humanities research:
History, method, and tools
University College London
4 December 2008
Edward Vanhoutte
edward.vanhoutte@kantl.be

Royal Academy of Dutch Language and Literature
III. XML, DTD, validation
- Part 1: Theory
- XML: recapitulation
- DTD and Schema
- TEI
- TEI Consortium
- TEI Lite
- Part 2: Demonstration
- Electronic scholarly editions
- Digital Archive of Letters in Flanders
- James Joyce Timeline: Finnegans Wake Notebooks
- TEI by Example
- Can you describe it?
- What make?
- What do you use it for?
Application: Where can a bike be used?
- Geographically
- Topographically
Application: When can a bike be used?
- Gender
- Race
- Religion
- Ability
The concept 'bicycle' is generally understood as naming a general means of transportation that can be used
- by everyone
- everywhere
- for every purpose
Even this is recognized as a bike
- Can you describe it?
- What make?
- What do you use it for?
Application: Where can a text be used?
- Geographically
- Topographically
Application: When can a text be used?
- Gender
- Race
- Religion
- Ability
The concept 'text' is generally understood as naming a general means of transportation of language that can be used
- by everyone
- everywhere
- for every purpose
Even this is recognized as a text
Concept: What constitutes a bike?
- Proprietary
- Incompatibility
- Paying for licences
- Procedural markup
If we want to be able to ride on the right and the left side of the road
- Non-proprietary
- Compatibility
- Open Source & free
- Descriptive markup
→ XML
Why would you want to learn about XML?
→ XML is not the end of the world, and won't solve all your problems,
but:
- It's a good approximation
- It works (fairly well)
- It's widely supported
- It's a W3C recommendation
Why would you want to learn about XML?
→ XML will not live forever, but:
- Migration will be supported
- Migration will create more jobs
Why would you want to learn about XML?
→ It's fun.
What does an XML document consist of?
Five essential components
What does an XML document consist of?
- Processing Instructions
- <?xml version="1.0" ?>
- <?xml-stylesheet href="../dtd/xsltslides.xsl" type="text/xsl" ?>
What does an XML document consist of?
- Elements
- <title> or </title>
- <empty />
- → XML is case sensitive in the naming of the elements.
<title> is not <TITLE> is not <Title> is not <tItle> etc.
What does an XML document consist of?
- Attributes (optional)
- <title type="journal">
- <name who="Edward" reg="VanhoutteE">
What does an XML document consist of?
- Entity References
- to represent characters which cannot reliably be typed in (ISO 8859-1:
IsoLat1, ISO 10646 - Unicode): &eacute; = &#233; = é
- as a shortcut for boiler plate text: &tomatorelish; = reference to an
external recipe (text) for tomato relish
- containers for external (non-XML) data such as graphics: <figure
entity="ascii-full" />
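How a parser resolves such references can be tried out in a few lines of Python. The following is a minimal sketch using the standard library's xml.etree.ElementTree; the element name w and the sample strings are assumptions made for illustration only. Numeric character references and the predefined entities resolve without any declaration, whereas a named entity such as &tomatorelish; (or &eacute;) has to be declared in a DTD before it can be used.

```python
import xml.etree.ElementTree as ET

def parse_text(xml_string):
    """Return the text content of the root element, with references resolved."""
    return ET.fromstring(xml_string).text

# Decimal and hexadecimal character references both resolve to é (U+00E9).
decimal = parse_text("<w>caf&#233;</w>")
hexadecimal = parse_text("<w>caf&#xE9;</w>")

# The predefined entities &amp; and &lt; need no declaration.
predefined = parse_text("<w>R&amp;D &lt; 5</w>")

# A named entity that was never declared is rejected by the parser.
try:
    parse_text("<w>&tomatorelish;</w>")
    undeclared_accepted = True
except ET.ParseError:
    undeclared_accepted = False
```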
What does an XML document consist of?
- CDATA:
- allows you to include application code (JavaScript, Perl, BASIC, etc.)
in an XML document without having to worry about escaping characters. The
XML processor passes the content of a CDATA section through exactly as it appears
- <![CDATA[This text escapes the XML
processor]]>
- Well-formed XML
- Valid XML
- XML is case sensitive
- At least one element
- There is always a root element
- All logical and physical structures nest properly
- Correspondence of element names in start- and end-tags
- Attribute names only appear once in a start-tag
- Attribute values are quoted
- Attribute values do not refer to external entities
- Entities are declared
- No entity reference contains the name of an unparsed entity
- Well-formed XML
- Conforms to a Document Type Definition (DTD)
- Or to a valid Schema
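Several of the well-formedness rules listed above can be observed with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree, which checks well-formedness only and does not validate against a DTD or Schema; the helper name is_well_formed and the sample strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_string):
    """Return True if the parser accepts the string as well-formed XML."""
    try:
        ET.fromstring(xml_string)
        return True
    except ET.ParseError:
        return False

# One sample per rule; only the first one obeys all of them.
SAMPLES = {
    "matching start- and end-tag": "<title>Ulysses</title>",
    "case mismatch in end-tag": "<title>Ulysses</TITLE>",
    "improper nesting": "<a><b></a></b>",
    "unquoted attribute value": "<title type=journal>Ulysses</title>",
    "attribute name repeated": '<title type="a" type="b">Ulysses</title>',
    "more than one root element": "<a/><b/>",
}

for label, sample in SAMPLES.items():
    print(f"{label}: {is_well_formed(sample)}")
```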
→ The formal specification for the structure of an XML document
<?xml version="1.0" ?>
<greeting>
<salutation>hello</salutation>
<target>world</target>
</greeting>
DTD:
<!ELEMENT greeting (salutation, target)>
<!ELEMENT salutation (#PCDATA)>
<!ELEMENT target (#PCDATA)>
A DTD specifies the vocabulary and the syntax of a markup language
It defines:
- names for all your elements
- names and default values for their attributes
- rules about how elements can nest
- names for re-usable pieces of data (entities)
- and a few other things
A DTD does not specify anything about what elements "mean"
Defining an element in a DTD
<!ELEMENT name contentModel>
- name is the name of the element (its generic identifier, GI)
- contentModel defines valid content for the element:
- #PCDATA
- EMPTY
- other elements
- mixed content combining PCDATA and other elements
<!ELEMENT email (header, body)>
<!ELEMENT header (subj, date, from, to)>
<!ELEMENT subj (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT body (open | p | ps | close | sign)*>
<!ELEMENT open (#PCDATA)>
<!ELEMENT p (#PCDATA)>
<!ELEMENT close (#PCDATA)>
<!ELEMENT ps (#PCDATA)>
<!ELEMENT sign (name | address)*>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (addrline)+>
<!ELEMENT addrline (#PCDATA)>
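What a validating parser does with these declarations can be imitated on a small scale. Python's standard library contains no DTD validator, so the sketch below translates the content models of the email DTD by hand into regular expressions over the sequence of child element names. The table CONTENT_MODELS, the function validate, and the sample document are assumptions made for illustration; a real validating parser reads the DTD itself.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models from the email DTD. Each child name is
# written as "name," so that adjacent names cannot run together.
CONTENT_MODELS = {
    "email": r"header,body,",
    "header": r"subj,date,from,to,",
    "body": r"((open|p|ps|close|sign),)*",
    "sign": r"((name|address),)*",
    "address": r"(addrline,)+",
}
# Elements declared as (#PCDATA): text only, no child elements.
PCDATA_ONLY = {"subj", "date", "from", "to", "open", "p",
               "close", "ps", "name", "addrline"}

def validate(element):
    """Return a list of error messages for element and its descendants.

    Only element structure is checked; stray text inside element-only
    content is ignored in this sketch.
    """
    errors = []
    children = "".join(child.tag + "," for child in element)
    if element.tag in PCDATA_ONLY:
        if children:
            errors.append(f"<{element.tag}> may only contain text")
    elif element.tag in CONTENT_MODELS:
        if not re.fullmatch(CONTENT_MODELS[element.tag], children):
            errors.append(f"<{element.tag}> has invalid children: {children or '(none)'}")
    else:
        errors.append(f"<{element.tag}> is not declared")
    for child in element:
        errors.extend(validate(child))
    return errors

GOOD = ("<email><header><subj>Hi</subj><date>2008-12-04</date>"
        "<from>a</from><to>b</to></header>"
        "<body><open>Dear b</open><p>Text</p><sign><name>a</name></sign></body></email>")

print(validate(ET.fromstring(GOOD)))
```

Swapping <from> and <to> in the header, or leaving an <address> without any <addrline>, makes validate report an error for the element concerned.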
Theoretical problem with the DTD
- DTDs are written in a formal language different from the XML
documents.
→ XML Schema
- A self-declarative way of documenting the formal specification for the
structure of an XML document
- Expressed in XML itself
Schema can deal with namespaces and DTDs cannot
An XML namespace is a collection of names, identified by a URI reference, which
are used in XML documents as element types and attribute names.
e.g.: <table> in doc1 is not necessarily the same as <table> in
doc2
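The point about <table> can be made concrete with a namespace-aware parser. In the sketch below, Python's xml.etree.ElementTree expands each prefixed name to the form {URI}localname; the XHTML namespace URI is the real one, while http://example.org/furniture and the element names inside the second table are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<report xmlns:h="http://www.w3.org/1999/xhtml"
        xmlns:f="http://example.org/furniture">
  <h:table><h:tr><h:td>cell</h:td></h:tr></h:table>
  <f:table><f:legs>4</f:legs></f:table>
</report>"""

root = ET.fromstring(DOC)

# Both children have the local name "table", but their expanded names
# differ because they belong to different namespaces.
table_names = [child.tag for child in root]
for name in table_names:
    print(name)
```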
→ There are several schema languages around:
- W3C XML Schema: http://www.w3.org/XML/Schema
- Relax NG: http://www.relaxng.org
- XML-Data (XDR)
- Document Content Description (DCD)
- Schema for Object-oriented XML (SOX)
- Document Definition Markup Language (DDML)
- Schematron
- Datatypes for DTDs (DT4DTD)
- Document Structure Description (DSD)
- Regular Language Description for XML (RELAX)
- TREX (Tree Regular Expressions for XML)
- Examplotron
- Hook
- Document Schema Definition Language (DSDL)
- STEP/EXPRESS and XML
<!ELEMENT book (chapter+)>
<!ELEMENT chapter (section+)>
<!ELEMENT section (p+)>
<!ELEMENT p (#PCDATA)>
Book Schema (Relax NG Simple notation)
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
>
<start>
<ref name="book" />
</start>
<define name="book">
<element>
<name>book</name>
<oneOrMore>
<ref name="chapter" />
</oneOrMore>
</element>
</define>
<define name="chapter">
<element>
<name>chapter</name>
<oneOrMore>
<ref name="section" />
</oneOrMore>
</element>
</define>
<define name="section">
<element>
<name>section</name>
<oneOrMore>
<ref name="p" />
</oneOrMore>
</element>
</define>
<define name="p">
<element>
<name>p</name>
<text />
</element>
</define>
</grammar>
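Because the schema is itself an XML document, it can be processed with ordinary XML tools. As a minimal sketch, assuming a condensed copy of the book grammar above, the following Python code reads the grammar with xml.etree.ElementTree and lists, for each defined element, the patterns it refers to. The function name summarize is hypothetical, and the code does not perform Relax NG validation.

```python
import xml.etree.ElementTree as ET

# Namespace of the Relax NG elements, in ElementTree's {URI} notation.
RNG = "{http://relaxng.org/ns/structure/1.0}"

GRAMMAR = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start><ref name="book"/></start>
  <define name="book">
    <element><name>book</name><oneOrMore><ref name="chapter"/></oneOrMore></element>
  </define>
  <define name="chapter">
    <element><name>chapter</name><oneOrMore><ref name="section"/></oneOrMore></element>
  </define>
  <define name="section">
    <element><name>section</name><oneOrMore><ref name="p"/></oneOrMore></element>
  </define>
  <define name="p">
    <element><name>p</name><text/></element>
  </define>
</grammar>"""

def summarize(grammar_xml):
    """Map each defined element name to the names of the patterns it refers to."""
    root = ET.fromstring(grammar_xml)
    summary = {}
    for define in root.findall(f"{RNG}define"):
        element = define.find(f"{RNG}element")
        name = element.find(f"{RNG}name").text
        summary[name] = [ref.get("name") for ref in element.iter(f"{RNG}ref")]
    return summary

print(summarize(GRAMMAR))
```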
<!ELEMENT book (chapter+)>
<define name="book">
<element>
<name>book</name>
<oneOrMore>
<ref name="chapter" />
</oneOrMore>
</element>
</define>
<!ELEMENT chapter (section+)>
<define name="chapter">
<element>
<name>chapter</name>
<oneOrMore>
<ref name="section" />
</oneOrMore>
</element>
</define>
<!ELEMENT section (p+)>
<define name="section">
<element>
<name>section</name>
<oneOrMore>
<ref name="p" />
</oneOrMore>
</element>
</define>
<!ELEMENT p (#PCDATA)>
<define name="p">
<element>
<name>p</name>
<text />
</element>
</define>
- A DTD/Schema is very useful at data preparation time: validating editors
only allow you to input correct markup
- Useful for consistent encoding in projects
- Guarantees longevity and interchangeability of semantics and structure in
encoded texts
- We need it for validation
- Useful for software development and operability
A valid XML document contains a Document Type Declaration.
The Document Type Declaration contains the reference to a Document Type Definition (DTD)
or a Schema
<!DOCTYPE TEI.2 PUBLIC "-//TEI//DTD TEI Lite 1.0//EN"
"../dtd/teixlite.dtd" [ ]>
XML can be validated when we have:
- an XML document
- a DTD or Schema
- a validating parser
Make sure:
- the Doctype Declaration inside the XML document refers to the appropriate
DTD or Schema and its path
- the parser can find an XML declaration
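Before a file is handed to a validating parser, it can help to check which DTD it actually declares. Python's standard library does not validate against DTDs, but xml.dom.minidom exposes the Document Type Declaration. The sketch below is only such a pre-check; the function name declared_dtd and the minimal sample document are assumptions.

```python
from xml.dom import minidom

def declared_dtd(xml_string):
    """Return (root name, public identifier, system identifier), or None
    if the document has no Document Type Declaration."""
    doctype = minidom.parseString(xml_string).doctype
    if doctype is None:
        return None
    return (doctype.name, doctype.publicId, doctype.systemId)

SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI//DTD TEI Lite 1.0//EN"
  "../dtd/teixlite.dtd" [ ]>
<TEI.2><text/></TEI.2>"""

print(declared_dtd(SAMPLE))
```

The system identifier is the path the validating parser will try to resolve, so a wrong relative path shows up here before validation is attempted.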
→ Bad news
- Difficult
- Time Consuming
- Expensive
→ Good news
- The work has been done for you
IV. TEI
Text Encoding Initiative (TEI)
provides DTD subsets, Schema modules, and elements for the encoding of:
- texts in any natural language
- of any date
- in any literary genre or text type
- without restrictions on form or content
They treat both continuous materials ('running text') and discontinuous materials
such as dictionaries and linguistic corpora.
The use of computers in the humanities
- 1949-ca. 1970:
- ad hoc programming in projects
- higher programming languages
- ca. 1970-1985:
- method oriented program packages
- SPSS (Statistical Package for the Social Sciences)
- OCP (Oxford Concordance Program)
- 1985-1997
- PC revolution
- standard software
- dBase, Access
- SGML ISO 8879:1986
- 1997-now
- web / XML orientation
- computer as presentational medium
- web tool
The Text Encoding Initiative (TEI)
11-12 November 1987: Vassar College, Poughkeepsie (NY)
→ 32 humanities scholars
Principles:
- Platform-independent
- Software-independent
- Endurability
- Re-usability
- Accessibility
- Language-independent
- For all of the Humanities disciplines
→ SGML ISO 8879:1986
→ Sperberg-McQueen, C.M. and Burnard, L. (eds.) (2002).
TEI P4: Guidelines for Electronic Text Encoding and Interchange. XML
Version. Oxford, Providence, Charlottesville, Bergen: Text Encoding
Initiative Consortium.
- 1987: Vassar College, Poughkeepsie
- 1990: P1 : SGML
- 1992: P2 : SGML
- 1994: P3 : SGML : 600 + elements
- 1995: TeiLite : SGML : 131 elements
- 1999: P3rev : SGML
- 2000: TEI Consortium
- 2001: P4 : XML comp
- 2001: TeixLite : XML
- 2005: P5 : XML
- 2007: P5 version 1
From the Poughkeepsie Principles the TEI concluded that the TEI Guidelines should:
- Provide a standard format for data interchange;
- Provide guidance for encoding of texts in this format;
- Support the encoding of all kinds of features of all kinds of texts studied
by researchers;
- Be application independent.
What does the TEI offer you?
The TEI has produced a number of DTD subsets/Schema fragments which can be
combined according to the needs for a particular project in the humanities.
- Vocabulary (elements)
- Tuning tools (attributes)
- Syntax (content models and nesting rules)
- Modification and extension guidelines
- Edit texts (e.g. word processors, syntax-directed editors)
- Edit, display, and link texts in hypertext systems
- Format and print texts using desktop publishing systems, or batch-oriented
formatting programs
- Load texts into free-text retrieval databases or conventional databases
- Unload texts from databases as search results or for export to other
software
- Search texts for words or phrases
- Perform content analysis on texts
- Collate texts for critical editions
- Scan texts for automatic indexing or similar purposes
- Parse texts linguistically
- Analyze texts stylistically
- Scan verse texts metrically
- Link text and images
→ The aim has been to make the TEI Guidelines useful for encoding the
same texts for different purposes.
How does the TEI offer this to you?
- Website http://www.tei-c.org
- On-line reference documentation: Guidelines
- Print reference documentation: Guidelines
- On-line schema/DTD generator: ROMA
- Free software
- Free stylesheets
- Community of practitioners: TEI-L mailing list
- Experience of projects
- Opportunity to participate through SIGs
- Conferences
Tuning tools (attributes)
e.g. Global attributes applicable to all elements
- xml:id for unique identification
- n for (non-unique) name or number
- rend (rendition) indicates how the element in question
was rendered or presented in the source text.
- rendition points to a description of the rendering or
presentation used for this element in the source text.
- xml:lang for language and hence writing-system
- xml:base provides a base URI reference with which
applications can resolve relative URI references into absolute URI
references.
→ Extensible, like other classes
essential common elements
→ abbr add addrLine address analytic author bibl biblScope biblStruct binaryObject cb choice cit corr date del desc distinct divGen editor email emph expan foreign gap gloss graphic head headItem headLabel hi imprint index item l label lb lg list listBibl measure measureGrp meeting mentioned milestone monogr name note num orig p pb postBox postCode ptr pubPlace publisher q quote ref reg relatedItem resp respStmt rs said series sic soCalled sp speaker stage street teiCorpus term time title unclear
- analysis: adds elements for simple analytic mechanisms
- certainty: adds elements for recording uncertainty and responsibility
- corpus: adds specialized elements to the TEI-header for use with language corpora
- dictionaries: replaces the basic structure with one containing detailed lexicographic features
- drama: adds specialist tagging for cast lists, records of first performance, etc. to the basic drama markup already included in the core
- figures: adds elements for encoding tables, pictures, and formulae
- gaiji: adds elements for representation of Non-standard Characters and Glyphs
- header: adds elements for recording common metadata
- iso-fs: adds elements for feature structure analysis
- linking: adds elements for hypertext linking, segmentation, and alignment
- msdescription: adds elements for description of manuscripts
- namesdates: adds elements for the detailed tagging of names and dates
- nets: adds elements for recording the abstract structure of mathematical graphs, networks, and trees
- spoken: replaces the basic structure by one suitable for linguistic analysis of speech acts, etc
- tagdocs: adds elements for the documentation of the XML elements and element classes which make up any markup scheme
- tei: TEI infrastructure
- textcrit: adds elements for text-critical apparatus
- textstructure: default Text Structure
- transcr: adds elements for the transcription of primary sources (e.g. manuscripts)
- verse: adds specialist tagging for metrical analysis, rhyme-scheme etc. to the basic verse markup already included in the core
Structure of a TEI Document
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title><!--Title--></title>
</titleStmt>
<publicationStmt>
<p><!--Publication Information--></p>
</publicationStmt>
<sourceDesc>
<p><!--Information about the source--></p>
</sourceDesc>
</fileDesc>
</teiHeader>
<text xml:id="text1">
<body>
<p>This is the first paragraph</p>
</body>
</text>
</TEI>
A TEI-conformant document
comprises a header followed by a text
<TEI>
<teiHeader>...</teiHeader>
<text>...</text>
</TEI>
The header is essential for:
- bibliographic control and identification
- resource documentation and
- processing (see later)
The TEI Header is introduced by the element <teiHeader>, which contains descriptive meta-information. It should minimally document the following aspects of the electronic file itself (<fileDesc>):
- the title statement (<titleStmt>), providing information about the title, author and others responsible for the electronic text
- the publication statement (<publicationStmt>), providing publication details about the electronic text
- a description of the source (<sourceDesc>), documenting bibliographic details about the electronic text's material source (if any)
- file description <fileDesc>: contains a full bibliographic
description of an electronic file amongst which information about the sources
from which the electronic text was derived. Essential for bibliographic
referencing and cataloguing.
- encoding description <encodingDesc>: documents the
relationship between an electronic text and the source or sources from which it
was derived. It allows for documenting detailed information about
transcription/transliteration principles such as normalization, the treatment
of quotations and hyphenation and the levels of interpretation i.e. analytic
tagging and encoding applied to the document.
- profile description <profileDesc>: provides a detailed
description of non-bibliographic aspects of a text, specifically the languages
and sublanguages used, the situation in which it was produced, the participants
and their settings.
- revision description <revisionDesc>: summarizes the revision
history for a file, which is important for version control and for resolving
questions about the history of a file, especially when a team of scholars is
working on the same document.
The full form of a TEI Header is thus:
<teiHeader>
<fileDesc> ... </fileDesc>
<encodingDesc> ... </encodingDesc>
<profileDesc> ... </profileDesc>
<revisionDesc> ... </revisionDesc>
</teiHeader>
While a minimal header takes the form:
<teiHeader>
<fileDesc> ... </fileDesc>
</teiHeader>
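Whether a header meets this minimum can be checked mechanically. The following Python sketch looks for the three mandatory children of <fileDesc> named above. It assumes elements without a namespace, as written on these slides (TEI P5 documents in practice carry the TEI namespace); the function name missing_header_parts and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# The three statements a minimal <fileDesc> should contain.
REQUIRED_IN_FILEDESC = ["titleStmt", "publicationStmt", "sourceDesc"]

def missing_header_parts(tei_xml):
    """Return the names of required header components that are absent."""
    root = ET.fromstring(tei_xml)
    file_desc = root.find("teiHeader/fileDesc")
    if file_desc is None:
        return ["teiHeader/fileDesc"]
    return [name for name in REQUIRED_IN_FILEDESC
            if file_desc.find(name) is None]

SAMPLE = """<TEI>
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A title</title></titleStmt>
      <publicationStmt><p>Publication information</p></publicationStmt>
    </fileDesc>
  </teiHeader>
  <text><body><p>This is the first paragraph</p></body></text>
</TEI>"""

print(missing_header_parts(SAMPLE))
```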
- one of many possible views of the TEI DTD
- small and simple
- designed to meet 90% of the needs of 90% of the TEI user community
- 128 elements
- was devised as a didactic stepping stone to the full TEI, but took on a
life of its own
- realistic for existing texts and for document production
→ TEI Lite: Encoding for Interchange: an introduction to the TEI Revised for TEI P5 release.
A text usually has divisions <div>
- generic, hierarchic subdivisions
- vanilla or numbered
- type attribute
- associated <head> and <trailer>
- <div>
- (The full TEI supports numbered divisions: <div0>, <div1>,
<div2>, <div3>, <div4>, <div5>, <div6>,
<div7>)
<text>
<front>
<!-- titlepage etc here -->
</front>
<body>
<div type="book" n="1" xml:id="b0100">
<head>Book 1</head>
<div type="chapter" n="1" xml:id="b0101">
<head>Chapter 1</head>
<!-- rest of the chapter -->
</div>
<div type="chapter" n="2" xml:id="b0102">
<head>Chapter 2</head>
<!-- rest of the chapter -->
</div>
</div>
</body>
</text>
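Nested divisions lend themselves to simple processing, for instance to derive a table of contents. The Python sketch below walks the <div> hierarchy of a condensed version of the example and records depth, type, xml:id and heading. The function name outline is hypothetical, and the elements are assumed to carry no namespace, as on the slide.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<text>
  <body>
    <div type="book" n="1" xml:id="b0100">
      <head>Book 1</head>
      <div type="chapter" n="1" xml:id="b0101"><head>Chapter 1</head></div>
      <div type="chapter" n="2" xml:id="b0102"><head>Chapter 2</head></div>
    </div>
  </body>
</text>"""

# ElementTree stores xml:id under the expanded name of the xml namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def outline(parent, depth=0):
    """Yield (depth, type, xml:id, heading) for each nested <div>."""
    for div in parent.findall("div"):
        head = div.find("head")
        heading = head.text if head is not None else None
        yield (depth, div.get("type"), div.get(XML_ID), heading)
        yield from outline(div, depth + 1)

body = ET.fromstring(SAMPLE).find("body")
entries = list(outline(body))
for depth, div_type, identifier, heading in entries:
    print("  " * depth + f"{heading} [{div_type}, {identifier}]")
```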
Text components in TEI Lite
What are divisions made of?
- Prose is mostly paragraphs (<p>)
- Verse is mostly lines (<l>), sometimes in hierarchic groups (<lg>)
- Drama is mostly speeches (<sp>) containing <p> or <l> and interspersed with stage directions (<stage>)
These may be mixed, and may appear also directly within undivided texts
<p>Cras interdum sollicitudin dui. Vivamus mattis pretium turpis.
Pellentesque dolor lectus, lobortis non, euismod eleifend, feugiat
sit amet, diam. Suspendisse potenti. Proin id massa non ligula
sodales fermentum. In sodales justo eget leo.</p>
<p>Praesent nec felis. Vestibulum ante ipsum primis in faucibus
orci luctus et ultrices posuere cubilia Curae; Praesent ipsum nisi,
sodales id, eleifend nec, lobortis eget, nulla.</p>
<lg type="poem">
<lg type="stanza">
<l>Poppadom</l>
<l>Oatmeal</l>
<l>Bubble gum</l>
<l>Cut of veal</l>
</lg>
<lg type="stanza">
<l>Mince for pie</l>
<l>Frozen peas</l>
<l>Video for Guy</l>
<l>Selection of teas</l>
</lg>
<lg type="stanza">
<l>Paper towels/garbage bags</l>
<l>Pasta sauce and Parmesan</l>
<l>Pumpkin seed and olive oil</l>
</lg>
<lg type="stanza">
<l>Cheesy crisps and favourite mags</l>
<l>Kidney beans (1 large can)</l>
<l>Cling film and kitchen foil</l>
</lg>
</lg>
<stage>A customer enters a pet shop.</stage>
<sp who="Customer"><l>'Ello, I wish to register a complaint.</l></sp>
<stage>The owner does not respond.</stage>
<sp who="Customer"><l>'Ello, Miss?</l></sp>
<sp who="Owner"><l>What do you mean "miss"?</l></sp>
<sp who="Customer"><l>I'm sorry, I have a cold. I wish to make a
complaint!</l></sp>
<sp who="Owner"><l>We're closin' for lunch.</l></sp>
<sp who="Customer"><l>Never mind that, my lad. I wish to complain about
this parrot what I purchased not half an hour ago from
this very boutique.</l></sp>
A customer enters a pet shop.
Customer: 'Ello, I wish to register a complaint.
The owner does not respond.
Customer: 'Ello, Miss?
Owner: What do you mean "miss"?
Customer: I'm sorry, I have a cold. I wish to make a complaint!
Owner: We're closin' for lunch.
Customer: Never mind that, my lad. I wish to complain about this parrot what I purchased not half an hour ago from this very boutique.
→ Monty Python, The Dead Parrot Sketch
- <pb />: pagebreak
- <lb />: linebreak
Whan that Aprill with his shoures soote<lb />
The droghte of March hath perced to the roote,<lb />
And bathed every veyne in swich licour<lb />
Of which vertu engendred is the flour;<lb />
<pb ed="riverside" n="23" />
Whan Zephirus eek with his sweete breeth<lb />
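Empty milestone elements such as <lb /> and <pb /> mark points in the text rather than enclosing it, so software has to reassemble the text that lies between them. As a minimal sketch, the Python code below splits the passage at each <lb />. The wrapping <p> element and the function name split_lines are assumptions; the passage is abridged.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<p>Whan that Aprill with his shoures soote<lb />
The droghte of March hath perced to the roote,<lb />
<pb ed="riverside" n="23" />
Whan Zephirus eek with his sweete breeth<lb /></p>"""

def split_lines(element):
    """Return the text segments that end at each <lb /> milestone."""
    lines = []
    current = element.text or ""
    for child in element:
        if child.tag == "lb":
            # A line break closes the segment collected so far.
            lines.append(current.strip())
            current = ""
        # Text following any milestone is stored in its tail.
        current += child.tail or ""
    if current.strip():
        lines.append(current.strip())
    return lines

verse_lines = split_lines(ET.fromstring(SAMPLE))
for line in verse_lines:
    print(line)
```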
Texts are not just words...
- but probably only people know that
- an encoding may claim to capture
- just visual salience
- just its assumed causes
- both
- encoding makes explicit one (or more) sets of interpretations
<hi>: used to indicate typographic highlighting:
<hi rend="gothic">And this Indenture further witnesseth</hi>
that the said <hi rend="italic">Walter Shandy</hi>, merchant,...
A descriptive tag can document the reason for highlighting:
<seg type="formula">And this Indenture further witnesseth</seg>
that the said <name rend="italic">Walter Shandy</name>, merchant,...
Interpretive highlighting
<emph>: used to indicate linguistic (not typographic) emphasis
You did <emph>what?</emph>
Varieties of linguistic distancing: <gloss>, <term>,
<soCalled>, and <mentioned>
A <term>DTD</term> specifies <gloss>the vocabulary
and the syntax of a markup language</gloss>
They put us out of work and call it <soCalled>downsizing</soCalled>
<mentioned>Downsizing</mentioned> is a very nasty neologism
<q>: Direct - Indirect speech & thought
- Use the type attribute to indicate whether it is spoken or thought
- Use the who attribute to show speakers
- Speeches can be nested in other speeches
<p>Praesent a orci. Donec cursus augue in leo. Nam tristique. Morbi
consequat diam in neque. Nullam ac ipsum laoreet mi porta fringilla
Barbara: <said who="Barbara" direct="true" aloud="true" rend="inline">
“Vestibulum ante ipsum primis in faucibus
orci luctus et ultrices posuere cubilia Curae”</said>;
Etiam placerat hendrerit lacus. Nulla sodales.</p>
<q>: Direct - Indirect speech & thought
Whether or not the quotation marks (or any other mark) are explicitly transcribed and encoded is up to the encoder. Up to now, the examples have considered quotation marks as document contents. Alternatively, the rendering of the quotation marks can be documented in a rend attribute using some appropriate set of conventions. A possible alternative for one of the examples above could be:
<p>Praesent a orci. Donec cursus augue in leo. Nam tristique. Morbi
consequat diam in neque. Nullam ac ipsum laoreet mi porta fringilla
Barbara: <said who="Barbara" direct="true" aloud="true" rend="PRE ldquo POST rdquo">
Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae</said>;
Etiam placerat hendrerit lacus. Nulla sodales.</p>
"Foreign" language phrases
- The xml:lang attribute may be attached to any element
- Use <foreign> if nothing else is available
- Define each language in <langUsage> in the <teiHeader>
<profileDesc>
<langUsage>
<language xml:id="deu">German</language>
<language xml:id="fra">French</language>
</langUsage>
</profileDesc>
<p>Have you read <title xml:lang="deu">Die Dreigroschenoper</title>?
<mentioned xml:lang="fra">Savoir-faire</mentioned> is French for know-how.
John has real <foreign xml:lang="fra">savoir-faire</foreign>.</p>
- are often by convention typographically distinct
- "data-like" (names, numbers, dates, times, addresses)
- <name>
- <num>
- <date>
- <time>
- <address> & <addrLine>
- editorial intervention (corrections, regularizations, additions, omissions
...)
- <corr>
- <sic>
- <reg>
- <orig>
- <add>
- <del>
- cross references and links
- attributes can be used to quantify <date> expressions
- similarly, times <time>, and numbers <num>
Today is <date>Friday 4th</date>
Today is <date value="2008-12-04">Friday 4th</date>
One day in <date certainty="approx" value="2008-12">early December</date>
It's now <time value="12:00">noon</time>.
<num value="4">four</num>
<num value="3.14159">pi</num>
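The benefit of such normalized attribute values is that software can compute with them even when the transcribed wording is informal. The Python sketch below sorts dates and sums numbers by their value attributes. The sample sentence is invented for illustration, and the attribute name value follows the slides above.

```python
import datetime
import xml.etree.ElementTree as ET

SAMPLE = """<p>Today is <date value="2008-12-04">Friday 4th</date>.
The previous session was on <date value="2008-11-20">20 November</date>.
There were <num value="4">four</num> of us.</p>"""

root = ET.fromstring(SAMPLE)

# Full ISO dates (YYYY-MM-DD) can be parsed and compared directly.
dates = sorted(datetime.date.fromisoformat(d.get("value"))
               for d in root.iter("date"))

# Numbers written out in words become usable through their value attribute.
total = sum(float(n.get("value")) for n in root.iter("num"))

print(dates, total)
```

Reduced-precision values such as 2008-12 would need extra handling, since this sketch only parses complete dates.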
Omissions, Deletions, Additions
- <gap>: omission by transcriber
- <del>: cancellation in source or by editor
- <add> or <supplied>: insertion in source or by editor
- <unclear>: material uncertain because illegible
- <damage>: physical damage to text carrier
<gap reason="illegible" desc="bloodstain" />
He was <del resp="EV" hand="author" type="strike">not</del> very nice.
He was <add resp="EV" hand="author" place="supralinear">not</add> very nice.
He was <unclear resp="EV" reason="ink stain">not</unclear> very nice.
Abbreviations and expansions
- <abbr>: contains an abbreviation. Its expansion is given in
an "expan" attribute.
We learn how to write <abbr expan="eXtensible Markup Language">XML</abbr>
- <address>: contains a postal or other address.
- <addrLine>: contains a line inside an <address>.
<address>
<addrLine>Koninklijke Academie voor Nederlandse Taal- en
Letterkunde</addrLine>
<addrLine>Koningstraat 18</addrLine>
<addrLine>b-9000 Gent</addrLine>
<addrLine>tel: +32 (0)9 265.93.50</addrLine>
<addrLine>fax: +32 (0)9 265.93.49</addrLine>
<addrLine>email: ctb@kantl.be</addrLine>
</address>
<list>: contains any sequence of items organized as a list.
<item>: contains one component of a list.
<label>: contains the label associated with an item in a list; in
glossaries, marks the term being defined.
<list type="ordered">
<item>Week 1</item>
<item>Week 2</item>
<item>Week 3</item>
<item>Week 4</item>
<item>Week 5</item>
</list>
<list type="gloss">
<head>Vocabulary</head>
<label xml:lang="enm">Whan that</label> <item>When</item>
<label xml:lang="enm">Aprill</label> <item>April</item>
<label xml:lang="enm">with</label> <item>with</item>
<label xml:lang="enm">his</label> <item>its</item>
<label xml:lang="enm">shoures</label> <item>showers</item>
<label xml:lang="enm">soote</label> <item>sweet</item>
</list>
<note>: contains a note or annotation.
<p>Indeed, hypertext is just the visualization of linking which
DeRose &amp; van Dam define as "the ability to express relationships
between places in a universe of information"
<note place="foot" n="1">"A place should be any piece of
information, or at least any that exists in a stable or recoverable
form." (DeRose &amp; van Dam 9).</note></p>
- a <table> element contains <row>s of <cell>s
- spanning is indicated by rows and cols attributes
- role attribute indicates whether <row> or <cell> holds data or
a label
- embedded tables are permitted
<table rows="2" cols="2">
<row role="label">
<cell>Male Students</cell>
<cell>Female Students</cell>
</row>
<row role="data">
<cell>3</cell>
<cell>6</cell>
</row>
</table>
Use simple <bibl> with optional subcomponents:
- <respStmt> (for any kind of responsibility) or <author>,
<editor>, etc.
- <title> with optional level attribute
- <imprint> groups publication details
- <biblScope> adds page references
- Use <listBibl> for list of references
<bibl>
<author>Walsh, Marcus</author>
<date>(1993)</date>
<title level="a">The Fluid Text and the Orientations of Editing.</title>
<editor>Chernaik, Warren, Caroline Davis, and Marilyn Deegan</editor>
<title level="m">The Politics of the Electronic Text</title>
<imprint>
<pubPlace>Oxford</pubPlace>
<publisher>Office for Humanities Communication</publisher>
</imprint>
<biblScope>31-39</biblScope>
</bibl>
The <rs> (referring string) element is used for any kind of name or reference
<q>My dear <rs type="person" key="BENM1">Mr. Bennet</rs>,</q>
said <rs type="person" key="BENM2">his lady</rs> to him one day,
<q>have you heard that
<rs type="place" key="NETP1">Netherfield Park</rs> is let
at last?</q>
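The key attribute is what allows differently worded references to be tied to the same person or place. As a minimal sketch, the Python code below builds an index of referring strings grouped by type and key. The wrapping <p> element and the function name index_references are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SAMPLE = """<p><q>My dear <rs type="person" key="BENM1">Mr. Bennet</rs>,</q>
said <rs type="person" key="BENM2">his lady</rs> to him one day,
<q>have you heard that
<rs type="place" key="NETP1">Netherfield Park</rs> is let
at last?</q></p>"""

def index_references(xml_string):
    """Group the text of every <rs> element by its (type, key) pair."""
    index = defaultdict(list)
    for rs in ET.fromstring(xml_string).iter("rs"):
        index[(rs.get("type"), rs.get("key"))].append(rs.text)
    return dict(index)

references = index_references(SAMPLE)
for (ref_type, key), strings in references.items():
    print(ref_type, key, strings)
```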
- The presence of a graphic is indicated by the <figure> element
- The title of the graphic is tagged as a <head>
- A description of the graphic may be supplied (as a <figDesc>) for use
by software unable to render the graphic
- The graphic itself is specified as an external entity
<!ENTITY logoctb SYSTEM "logoctb.gif" NDATA GIF>
<figure entity="logoctb">
<head>The logo of the CTB</head>
<figDesc>The letters c, t, and b in red with black border on
a white field</figDesc>
</figure>
problem
- XML allows only one hierarchy at a time
- Is a document
- chapter-paragraph-phrase
- gathering-page-leaf
- or both?
- discontinuous segments
- links and milestones
But what are we doing with this?