B33080 Humanities Computing: Electronic Text

University of Antwerp, Campus Drie Eiken

Second Term 2005

Edward Vanhoutte

edward.vanhoutte@kantl.be

Week 2: History of the Internet - Hypertext - XML theory and practice: Text & Computers - Text Encoding & Markup - Document Analysis - DTD.

Monday 14 February

I. Monday 14 February Overview

Monday 14 February: Overview

    • Revision of week 1
    • The Internet
    • The World Wide Web
    • Hypertext
    • Text and Computers
    • Text encoding and markup
      • Document analysis
      • Structure and content
      • Document Type Definition (DTD)

Goals of this lecture

After this lecture, you should
  • have no remaining questions about week 1 and the readings
  • know the history of the Internet and the WWW
  • understand what hypertext is
  • be able to define hypertext
  • understand the problems with text and computers
  • understand the problems with proprietary software
  • be able to analyse a document
  • understand what text encoding and markup are
  • have a basic idea of what a DTD is

II. Monday 21 February Revision of week 1 & week 2.

1. Monday 7 February: Introduction to this course - Humanities Computing.

  1. Introduction to this course
    1. Objectives of this course
    2. (Non-)Assumptions
    3. Me & You
    4. Housekeeping Rules
    5. Overview of the Course
    6. Test Elementary Computer Skills
  2. Introduction to Humanities Computing
    1. Humanities Computing: definitions
    2. Humanities Computing: a field, a discipline
      1. Associations involved
      2. Journals and mailing lists
      3. Publications
      4. Institutions
    3. Humanities Computing: short history
  3. Computing
    1. Hardware
    2. Graphical Interface

III. Monday 14 February The Internet & the WWW

3. The Internet & the WWW

  • Definition
  • Services
  • Short history

Official definition

"The Federal Networking Council (FNC) agrees that the following language reflects our definition of the term "Internet". "Internet" refers to the global information system that --
  1. is logically linked together by a globally unique address space based on the Internet Protocol (IP) or its subsequent extensions/follow-ons;
  2. is able to support communications using the Transmission Control Protocol/Internet Protocol (TCP/IP) suite or its subsequent extensions/follow-ons, and/or other IP-compatible protocols; and
  3. provides, uses or makes accessible, either publicly or privately, high level services layered on the communications and related infrastructure described herein."
On October 24, 1995, the FNC unanimously passed a resolution defining the term Internet. This definition was developed in consultation with the leadership of the Internet and Intellectual Property Rights (IPR) Communities.

What does this mean?

  1. The Internet is a worldwide collection of computer networks connecting academic, governmental, commercial, and organizational sites.
  2. It provides access to communication services and information resources to millions of users around the globe.
  3. Internet services include:
    • direct communication (e-mail, IRC-chat)
    • online conferencing (Usenet News, e-mail discussion lists)
    • distributed information resources (World Wide Web, Gopher)
    • remote login and file transfer (telnet, ftp)
    • and many other valuable tools and resources (e.g. Internet telephony)
→ The Internet and the WWW are not synonyms

A worldwide collection of computer networks: Short history

  • 1964: Paul Baran - Packet switched network
  • 1969: ARPANET - 4 nodes
  • 1972: 24 nodes
  • 1973: TCP/IP - Vinton Cerf - INTERNET
  • 1976: TCP/IP on ARPANET
  • 1977: demonstration of the Internet
  • 1981: BITNET and CSNET
  • 1983: domain server
  • 1984: NSF backbone
  • 1991: WWW - Tim Berners-Lee - CERN

The early days

  • 1964: Paul Baran - Packet-switched network ↔ circuit-switched network
  • 1969: ARPANET - 4 nodes
    First test with two computers: one in Los Angeles, one in Stanford
    On their first attempt to log into Stanford's computer by typing "log win", UCLA researchers crashed their computer when they typed the letter "g"
    • UCLA (Los Angeles)
    • Stanford
    • UC Santa Barbara
    • University of Utah
  • 1972: 24 nodes

1973: TCP/IP - Vinton Cerf - INTERNET

Transmission Control Protocol/Internet Protocol
  • TCP: information is split into packets, which need not arrive in the order in which they were sent, nor travel via the same route.
  • IP: each packet carries a stamp with its destination address.
→ 1976: TCP/IP on ARPANET
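
The idea of splitting, addressing, and reordering can be sketched in a few lines of Python. This is a toy model, not the real protocols: the packet size, the dictionary layout, and the address 192.0.2.7 (taken from a range reserved for documentation) are assumptions made for illustration.

```python
import random

def split_into_packets(message: str, destination: str, size: int = 8) -> list[dict]:
    """Split a message into numbered packets, each stamped with the destination."""
    return [
        {"dest": destination, "seq": seq, "data": message[start:start + size]}
        for seq, start in enumerate(range(0, len(message), size))
    ]

def reassemble(packets: list[dict]) -> str:
    """Restore the sending order by sorting on the sequence numbers."""
    ordered = sorted(packets, key=lambda packet: packet["seq"])
    return "".join(packet["data"] for packet in ordered)

message = "Did you hear the news about Edward?"
packets = split_into_packets(message, destination="192.0.2.7")
random.shuffle(packets)        # simulate packets arriving out of order
print(reassemble(packets))     # the receiver still recovers the message
```

The sequence number stands in for the role TCP plays (ordering), the destination stamp for the role IP plays (addressing).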

1977: demonstration of the Internet


The Internet

  • 1981: BITNET and CSNET
  • 1983: domain server
  • 1984: NSF backbone

e-mail

  • 1971: Ray Tomlinson
  • 1972: Larry Roberts: read, reply, forward, save...
  • 1976: Queen Elizabeth II sends an e-mail
  • 1983: Domain name server (University of Wisconsin)

Domain Server

uia.ua.ac.be
  • be
  • ac
  • ua
  • uia
→ The exact IP number of the addressee is no longer needed.
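
A small Python sketch of how such a name reads as a hierarchy, from the country domain down to the individual host. The function name is hypothetical, and the actual translation from name to IP number is done by name servers, which this sketch does not contact.

```python
def domain_hierarchy(hostname: str) -> list[str]:
    """Return the labels of a host name, most general level first."""
    return hostname.split(".")[::-1]

# Print the levels as an indented tree.
for depth, label in enumerate(domain_hierarchy("uia.ua.ac.be")):
    print("  " * depth + label)
```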

The growth of the Internet

  • 08/1981: 213
  • 10/1985: 1,961
  • 10/1990: 313,000
  • 07/1995: 8,200,000
  • 07/2000: 93,047,785
  • 01/2001: 109,574,429
  • 07/2001: 125,888,197
  • 01/2002: 147,344,723
  • 07/2002: 162,128,493
  • 01/2003: 171,638,297
  • 01/2004: 233,101,481
→ Number of Internet hosts. Source: Internet Systems Consortium http://www.isc.org/

IV. Monday 21 February : World Wide Web

4. World Wide Web

Technical definition:
  • all the resources and users on the Internet that are using the Hypertext Transfer Protocol (HTTP).
Tim Berners-Lee:
  • The World Wide Web is the universe of network-accessible information, an embodiment of human knowledge.

1991: WWW - Tim Berners-Lee - CERN

  • Information management
  • Original name: Mesh
  • Non-linear text system: hypertext

V. Monday 21 February Hypertext

5. Hypertext

Information management
  • Paul Otlet (1934)
  • Vannevar Bush (1945)
  • Ted Nelson (1965)
  • NLS (oNLine System): Doug Engelbart
  • HES (Hypertext Editing System): Andries van Dam and Ted Nelson
  • FRESS (File Retrieval and Editing System): Brown University
  • Commercial hypertext systems
  • HTML: Tim Berners-Lee & Robert Cailliau
  • Browser war

Paul Otlet (1868-1944)

→ Traité de Documentation (1934)
  • Le Document: each carrier of information
    • facts
    • interpretation of facts
    • statistics
    • source material
  • Standard 3"x5" card - data sheet:
    • Monographic Principle
    • Universal Decimal Classification (UDC)
→ The Mundaneum: an international documentary network

Mundaneum


The work desk

  • fitted with machines and auxiliary instruments of intellectual work
  • machines to transform speech into writing and vice versa
  • an application of television, to allow texts to be made available for remote reading
  • reading machines scanning the physical items (search and retrieval)
  • add to existing texts held remotely in such a way that the original texts were not disturbed
On the work desk there might be no books or other documents at all, but only a screen and a telephone. The work station would be connected to a centre of knowledge by telephone, wireless telegraphy, television and telex ("téléaugraphie", "téléphotographie").
  • screens
  • loudspeaker
  • selection machines

Virtual Machines

a machinery unaffected by distance which would combine at the same time radio, x-rays, cinema and microscopic photography. All the things of the universe and all those of man would be registered from afar as they were created. Thus the moving image of the world would be established -- its memory, its true duplicate. From afar anyone would be able to read any passage, expanded or limited to the desired subject, that would be projected onto his individual screen. Thus in his armchair, anyone would be able to contemplate the whole of creation or particular parts of it (1935, p. 390-1).

Vannevar Bush

→ As We May Think (1945)
Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and to coin one at random, "memex" will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.
Memex

Ted Nelson

→ A File Structure for the Complex, the Changing, and the Indeterminate (1965)
  • Xanadu
  • docuverse
Hypertext was an audacious choice: hyper- has a bad odour in some fields and can suggest agitation and pathology, as it does in medicine and psychology. But in other sciences hyper- connotes extension and generality, as in the mathematical hyperspace, and this was the connotation I wanted to give the idea.

NLS, HES, FRESS

  • NLS (oNLine System): Doug Engelbart
  • HES (Hypertext Editing System): Andries van Dam and Ted Nelson
  • FRESS (File Retrieval and Editing System): Brown University
4 central concepts
  1. internal structure of nodes or documents
  2. alternate views
  3. bidirectional linking
  4. link classification

Commercial systems

  • Storyspace
  • Guide
  • NoteCards
  • HyperCard

Hypertext Markup Language (HTML): Tim Berners-Lee & Robert Cailliau

  • text
  • formulae
  • drawings
  • graphics
  • simple
  • http (hypertext transfer protocol)

Graphical browsers

→ MOSAIC: Marc Andreessen

Hypertext: a definition

  • An electronic multilinear visualisation of
  • a set of information units (nodes) which
  • may contain text, sound, and/or images, and which
  • are linked to each other by way of hyperlinks.
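
Read as a data structure, this definition describes a graph of nodes and links. The Python sketch below is one possible way to model it; the node names, their content, and the links between them are invented for illustration. It also derives the incoming links of each node, which is what bidirectional linking requires.

```python
from collections import defaultdict

# Hypothetical nodes: each holds some content and a list of outgoing hyperlinks.
nodes = {
    "overview":  {"content": "Overview of the lecture", "links": ["internet", "hypertext"]},
    "internet":  {"content": "Short history",           "links": ["www"]},
    "www":       {"content": "World Wide Web",          "links": ["hypertext"]},
    "hypertext": {"content": "A definition",            "links": ["overview"]},
}

def backlinks(nodes: dict) -> dict:
    """Invert the outgoing links so every node also knows which nodes point at it."""
    incoming = defaultdict(list)
    for name, node in nodes.items():
        for target in node["links"]:
            incoming[target].append(name)
    return dict(incoming)

def reachable(nodes: dict, start: str) -> set:
    """Collect every node a reader can reach from `start` by following links."""
    seen, todo = set(), [start]
    while todo:
        name = todo.pop()
        if name not in seen:
            seen.add(name)
            todo.extend(nodes[name]["links"])
    return seen

print(backlinks(nodes)["hypertext"])
print(sorted(reachable(nodes, "internet")))
```

There is no single reading order in such a structure: each reader's path depends on which links are followed, which is what "multilinear" refers to.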

VI. Monday 14 February: Text & Computer

Helpdesk for the book


Workflow: mantra in 9 lines

  1. Project definition
  2. Document analysis
  3. Encoding design
  4. Encoding
  5. Validating
  6. Functionality: storyboard
  7. Transform, generate, implement
  8. Document
  9. Publish

6. Why electronic texts?

2 kinds of electronic texts:
  1. digitally created (born digital)
  2. digitized
→ 2 different answers to the questions:
  1. technological progress
    • create access to material
    • support preservation policy
    • collection acquisition/completion
    • flexible use
    • institutional and strategic advantage (prestige)
    • research
    • education

2 different strategies

  1. Short term thinking
    • document creation/production
    • print
    • mail
    • document lay-out
    • ease of the word processor
    • WYSIWYG interface
    • → .rtf, .doc, .wpd, .pdf, .xls, .mdb
    • → Word, WordPerfect, Adobe Acrobat, Excel, Access
  2. Long term thinking
    • justify investments
    • retain access to material
    • exchange of data
    • document structure
    • → ISO standards and W3C norms

Text & Computer

Texts cannot be put into computers. Neither can numbers. Computers can contain and operate on patterns of electronic charges, but they cannot contain numbers, which are abstract mathematical objects not electronic charges, nor texts, which are complex, abstract cultural and linguistic objects.
Michael Sperberg-McQueen, 'Text in the Electronic Age: Textual Study and Text Encoding with examples from Medieval Texts.' Literary and Linguistic Computing, 6/1 (1991): 34-46. (34)

Computers work with a representation of text


Output

Die Leiden des jungen Werther is an exceptionally good example of a book full of Weltschmerz.

Proprietary Code (RTF)

{\i Die Leiden des jungen Werther} is an {\i exceptionally} good example of a book full of {\i Weltschmerz}.
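
A short Python sketch of what a program can recover from this RTF line. The regular expression is an assumption that only covers the simple `{\i ...}` groups shown here, not RTF in general.

```python
import re

rtf = (r"{\i Die Leiden des jungen Werther} is an {\i exceptionally} "
       r"good example of a book full of {\i Weltschmerz}.")

# Find every group that switches on italics.
italic_spans = re.findall(r"\{\\i ([^}]*)\}", rtf)
print(italic_spans)
```

All three spans come back as "italic" and nothing more: the code does not say that the first is a title, the second an emphasis, and the third a German word.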

Binary Code

000101110000101000000011111010001111111011111 011001000000111111010100100010111101000100100 001111100100010011111110001010101010101010111 110001011100001010110001001000010000000010101 011110101001100010111000010101100010010000111
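
The step from characters to bits can be reproduced in Python. The sketch assumes UTF-8 as the character encoding; the bit string on the slide is illustrative and is not expected to match this output.

```python
text = "Weltschmerz"

data = text.encode("utf-8")                        # characters -> bytes
bits = " ".join(f"{byte:08b}" for byte in data)    # bytes -> groups of 8 bits
print(bits)

# Decoding with the same encoding gives the text back.
print(data.decode("utf-8") == text)
```

The bits only become text again when the reader knows which encoding was used, which is one reason why undocumented proprietary formats have a short life cycle.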

2 Main Problems

  • Interchange between systems and platforms causes loss of information
    → Short data life cycle
  • Loss of semantic information by translation to visual information

Example: Synoptic Edition


WP 5.1 → MS Word 2000


RTF → WP 9


Solution

An internationally accepted standard which:
  • is software and platform independent
  • can describe the logical, structural, and semantic elements of a text
→ Markup
<title>Die Leiden des jungen Werther</title> is an
<emph>exceptionally</emph> good example of a book full of
<foreign lang="German">Weltschmerz</foreign>.
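
Once the distinctions are explicit, a program can ask for them by name. The Python sketch below uses the standard library parser; the surrounding `<p>` element and the element name `<foreign>` for the German word are assumptions added so that the fragment is well-formed.

```python
import xml.etree.ElementTree as ET

# The sentence from the slide, wrapped in a hypothetical <p> root element.
fragment = (
    "<p><title>Die Leiden des jungen Werther</title> is an "
    "<emph>exceptionally</emph> good example of a book full of "
    '<foreign lang="German">Weltschmerz</foreign>.</p>'
)
root = ET.fromstring(fragment)

titles = [element.text for element in root.iter("title")]
foreign = [(element.get("lang"), element.text) for element in root.iter("foreign")]
plain = "".join(root.itertext())     # the content with all markup removed

print(titles)
print(foreign)
print(plain)
```

The same file answers "which titles are mentioned?" and "which words are German?", and still yields the plain sentence for display.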

VII. Monday 14 February: Text encoding and markup

7. Text encoding and markup

  • Texts are more than simply sequences of glyphs
    → They have structure and content and they also have multiple readings

Document analysis: Exercise


Text encoding and markup

  • Texts are more than simply sequences of glyphs
    → They have structure and content and they also have multiple readings
  • Text encoding or markup provides a means of making such structure, content, and readings explicit
→ Only what is explicitly articulated can be digitally processed

Texts have structure and content 1

POOREDWARDDIDYOUHEARTHENEWSABOUTEDWARD?ONTHEBACKOFHISHE
ADHEHADANOTHERFACEWASITAWOMAN'SFACEORAYOUNGGIRLTHEYSAID
TOREMOVEITWOULDKILLHIMSOPOOREDWARDWASDOOMEDTHEFACECOULD
LAUGHANDCRYITWASHISDEVILTWINANDATNIGHTSHESPOKETOHIMTHIN
GSHEARDONLYINHELLBUTTHEYWEREIMPOSSIBLETOSEPARATECHAINED
TOGETHERFORLIFEFINALLYTHEBELLTOLLEDHISDOOMHETOOKASUITEO
FROOMSANDHUNGHIMSELFANDHERFROMTHEBALCONYIRONSSOMESTILLB
ELIEVEHEWASFREEDFROMHERBUTIKNEWHERTOOWELLISAYSHEDROVEHI
MTOSUICIDEANDTOOKPOOREDWARDTOHELL

Texts have structure and content 2

Poor Edward. Did you hear the news about Edward? On the back of
his head he had another face. Was it a woman's face or a
young girl? They said to remove it would kill him, so poor
Edward was doomed. The face could laugh and cry. It was his devil
twin. And at night she spoke to him things heard only in hell. But
they were impossible to separate, chained together for life. Finally
the bell tolled his doom. He took a suite of rooms and hung himself
and her from the balcony irons. Some still believe he was freed
from her, but I knew her too well. I say she drove him to suicide,
and took poor Edward to hell.

Texts have structure and content 3

Poor Edward

Did you hear the news about Edward?
On the back of his head he had another face
Was it a woman's face or a young girl?
They said to remove it would kill him
So poor Edward was doomed

The face could laugh and cry
It was his devil twin
And at night she spoke to him
Things heard only in hell
But they were impossible to separate
Chained together for life

Finally the bell tolled his doom
He took a suite of rooms
And hung himself and her from the balcony irons
Some still believe he was freed from her
But I knew her too well
I say she drove him to suicide
And took poor Edward to hell

Texts have structure and content 4

Poor Edward
<line>Did you hear the news about Edward?</line>
<line>On the back of his head he had another face</line>
<line>Was it a woman's face or a young girl?</line>
<line>They said to remove it would kill him</line>
<line>So poor Edward was doomed</line>
<line>The face could laugh and cry</line>
<line>It was his devil twin</line>
<line>And at night she spoke to him</line>
<line>Things heard only in hell</line>
<line>But they were impossible to separate</line>
<line>Chained together for life</line>
<line>Finally the bell tolled his doom</line>
<line>He took a suite of rooms</line>
<line>And hung himself and her from the balcony irons</line>
<line>Some still believe he was freed from her</line>
<line>But I knew her too well</line>
<line>I say she drove him to suicide</line>
<line>And took poor Edward to hell</line>

Texts have structure and content 5

Poor Edward
<stanza>
<line>Did you hear the news about Edward?</line>
<line>On the back of his head he had another face</line>
<line>Was it a woman's face or a young girl?</line>
<line>They said to remove it would kill him</line>
<line>So poor Edward was doomed</line>
</stanza>
<stanza>
<line>The face could laugh and cry</line>
<line>It was his devil twin</line>
<line>And at night she spoke to him</line>
<line>Things heard only in hell</line>
<line>But they were impossible to separate</line>
<line>Chained together for life</line>
</stanza>
<stanza>
<line>Finally the bell tolled his doom</line>
<line>He took a suite of rooms</line>
<line>And hung himself and her from the balcony irons</line>
<line>Some still believe he was freed from her</line>
<line>But I knew her too well</line>
<line>I say she drove him to suicide</line>
<line>And took poor Edward to hell</line>
</stanza>

Texts have structure and content 6

<poem>
<title>Poor Edward</title>
<stanza>
<line>Did you hear the news about Edward?</line>
<line>On the back of his head he had another face</line>
<line>Was it a woman's face or a young girl?</line>
<line>They said to remove it would kill him</line>
<line>So poor Edward was doomed</line>
</stanza>
<stanza>
<line>The face could laugh and cry</line>
<line>It was his devil twin</line>
<line>And at night she spoke to him</line>
<line>Things heard only in hell</line>
<line>But they were impossible to separate</line>
<line>Chained together for life</line>
</stanza>
<stanza>
<line>Finally the bell tolled his doom</line>
<line>He took a suite of rooms</line>
<line>And hung himself and her from the balcony irons</line>
<line>Some still believe he was freed from her</line>
<line>But I knew her too well</line>
<line>I say she drove him to suicide</line>
<line>And took poor Edward to hell</line>
</stanza>
</poem>
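
With title, stanzas, and lines made explicit, the structure of the poem can be processed. A minimal Python sketch, using an abbreviated copy of the encoded poem (two stanzas, shortened) and the standard library parser:

```python
import xml.etree.ElementTree as ET

# Abbreviated version of the encoded poem, for illustration only.
poem_xml = """
<poem>
  <title>Poor Edward</title>
  <stanza>
    <line>Did you hear the news about Edward?</line>
    <line>On the back of his head he had another face</line>
  </stanza>
  <stanza>
    <line>The face could laugh and cry</line>
    <line>It was his devil twin</line>
    <line>And at night she spoke to him</line>
  </stanza>
</poem>
"""
poem = ET.fromstring(poem_xml)

title = poem.findtext("title")
lines_per_stanza = [len(stanza.findall("line")) for stanza in poem.findall("stanza")]
first_lines = [stanza.find("line").text for stanza in poem.findall("stanza")]

print(title)
print(lines_per_stanza)
print(first_lines)
```

None of these questions can be answered reliably from the unbroken stream of capitals in the first version of the text.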

Schematic


2 Operable Conditions

  • Markup should be separated from content
  • Markup should be processable → logical & predictable

Markup should be separated from content

Use of tags with open and close delimiters
<tag>content</tag>

Markup should be processable

  • Logical
  • Predictable
→ OHCO thesis

Russian Doll or OHCO Thesis

A document is an Ordered Hierarchy of Content Objects

Markup Model for a Book


<book>
  <chapter n="1">
    <section n="1">
      <p>...</p>
      <p>...</p>
    </section>
    <section n="2">
      <p>...</p>
      <p>...</p>
    </section>
  </chapter>
  <chapter n="2">
    <!-- more sections and paragraphs -->
  </chapter>
</book>
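
Because every element sits completely inside its parent, the model can be walked as a tree. The Python sketch below prints an indented outline of a condensed copy of the book model and shows that overlapping tags are rejected by the parser; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the book model; chapter 2 is left empty here.
book_xml = """<book>
  <chapter n="1">
    <section n="1"><p>...</p><p>...</p></section>
    <section n="2"><p>...</p><p>...</p></section>
  </chapter>
  <chapter n="2"/>
</book>"""

def outline(element, depth=0):
    """Return one indented row per element, parents before their children."""
    label = element.tag + (f' n="{element.get("n")}"' if element.get("n") else "")
    rows = ["  " * depth + label]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows

def is_well_formed(xml_text):
    """True if the tags nest properly, False if the parser rejects them."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print("\n".join(outline(ET.fromstring(book_xml))))
print(is_well_formed("<a><b></b></a>"))   # nested like Russian dolls
print(is_well_formed("<a><b></a></b>"))   # overlapping, not a hierarchy
```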

Markup should be processable

  • Logical
  • Predictable
→ Document Type Definition (DTD)

Document Type Definition (DTD)

Different document types → different organization/order & different content objects
  • Poetry
  • Prose
  • Drama
  • Letters
  • Bibliographies
  • Dictionaries
  • Lists
  • ...
→ The rules of the game

Document Type Definition (DTD)

A DTD specifies the vocabulary and the syntax of a markup language
It defines:
  • names for all your elements
  • names and default values for their attributes
  • rules about how elements can nest
  • names for re-usable pieces of data (entities)
  • and a few other things
A DTD does not specify anything about what elements "mean"
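
The "rules about how elements can nest" can be imitated in Python to show the principle. This is not a DTD and not a DTD validator: the table of allowed children is an assumed content model for the poem example, and unlike a real declaration such as `<!ELEMENT stanza (line+)>` it checks neither order nor number of elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: which child elements each element may contain.
ALLOWED_CHILDREN = {
    "poem": {"title", "stanza"},
    "title": set(),
    "stanza": {"line"},
    "line": set(),
}

def nesting_errors(element):
    """List every place where the document breaks the nesting rules."""
    if element.tag not in ALLOWED_CHILDREN:
        return [f"unknown element <{element.tag}>"]
    errors = []
    for child in element:
        known = child.tag in ALLOWED_CHILDREN
        if known and child.tag not in ALLOWED_CHILDREN[element.tag]:
            errors.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        errors.extend(nesting_errors(child))
    return errors

good = "<poem><title>Poor Edward</title><stanza><line>x</line></stanza></poem>"
bad = "<poem><line>x</line><stanza><title>y</title></stanza></poem>"

print(nesting_errors(ET.fromstring(good)))
print(nesting_errors(ET.fromstring(bad)))
```

The rules say where a `<line>` may occur, not what a line of verse is: the meaning of the elements stays outside the definition.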

Summary

  • Computers work with representations of text
  • Proprietary software merges structure, content, meaning, and layout in one code
    → 2 problems
    • Short data life cycle
    • Semantic information is reduced to visual information
  • Solution: standard for text encoding
    → Markup explicitly articulates structure, content, and readings
  • Markup should be separated from content
    → Tags
  • Markup should be processable
    → OHCO thesis
  • A markup language is defined by its Document Type Definition
