B33080 Humanities Computing: Electronic Text
University of Antwerp, Campus Drie Eiken
Second Term 2005
Edward Vanhoutte
edward.vanhoutte@kantl.be

Week 2: History of the Internet - Hypertext
XML theory and practice: Text & Computers - Text Encoding & Markup - Document Analysis - DTD.
Monday 14 February
III. Monday 14 February: The Internet & the WWW
3. The Internet & the WWW
- Definition
- Services
- Short history
"The Federal Networking Council (FNC) agrees that the following language reflects our definition of the term "Internet".
"Internet" refers to the global information system that --
- is logically linked together by a globally unique address space based on the Internet Protocol (IP) or its subsequent extensions/follow-ons;
- is able to support communications using the Transmission Control Protocol/Internet Protocol (TCP/IP) suite or its subsequent extensions/follow-ons, and/or other IP-compatible protocols; and
- provides, uses or makes accessible, either publicly or privately, high level services layered on the communications and related infrastructure described herein."
On October 24, 1995, the FNC unanimously passed a resolution defining the term Internet. This definition was developed in consultation with the leadership of the Internet and Intellectual Property Rights (IPR) Communities.
- The Internet is a worldwide collection of computer networks connecting academic, governmental, commercial, and organizational sites.
- It provides access to communication services and information resources to millions of users around the globe.
- Internet services include:
- direct communication (e-mail, IRC-chat)
- online conferencing (Usenet News, e-mail discussion lists)
- distributed information resources (World Wide Web, Gopher)
- remote login and file transfer (telnet, ftp)
- and many other valuable tools and resources (internet telephony)
→ The Internet and the WWW are not synonyms
A worldwide collection of computer networks: Short history
- 1964: Paul Baran - Packet-switched network
- 1969: ARPANET - 4 nodes
- 1972: 24 nodes
- 1973: TCP/IP - Vinton Cerf - INTERNET
- 1976: TCP/IP on ARPANET
- 1977: demonstration of the Internet
- 1981: BITNET and CSNET
- 1983: domain server
- 1984: NSF backbone
- 1991: WWW - Tim Berners-Lee - Cern
- 1964: Paul Baran - Packet-switched network ↔ circuit-switched network
- 1969: ARPANET - 4 nodes
First test with two computers: one in Los Angeles, one at Stanford
On their first attempt to log into Stanford's computer by typing "login", the UCLA researchers saw the system crash when they typed the letter "g".
- UCLA (Los Angeles)
- Stanford
- UC Santa Barbara
- University of Utah
- 1972: 24 nodes
1973: TCP/IP - Vinton Cerf - INTERNET
Transmission Control Protocol/Internet Protocol
- TCP: information is split into packets, which need not arrive in the order in which they were sent, nor travel by the same route.
- IP: each packet carries the address of its destination.
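The division of labour between TCP and IP can be sketched in a few lines. This is a simplified illustration, not the real protocols: the `Packet` class, the chunk size of 8 characters and the address `192.0.2.7` (a range reserved for documentation) are all assumptions made for the example, and real TCP additionally handles acknowledgement and retransmission.

```python
import random
from dataclasses import dataclass


@dataclass
class Packet:
    destination: str  # IP's job: every packet carries the destination address
    sequence: int     # TCP's job: position of this chunk in the original message
    payload: str


def split_into_packets(message: str, destination: str, size: int = 8) -> list[Packet]:
    """Cut the message into fixed-size chunks and number each one."""
    return [
        Packet(destination, seq, message[start:start + size])
        for seq, start in enumerate(range(0, len(message), size))
    ]


def reassemble(packets: list[Packet]) -> str:
    """Restore the original order from the sequence numbers."""
    return "".join(p.payload for p in sorted(packets, key=lambda p: p.sequence))


message = "Packets may arrive in any order and by any route."
packets = split_into_packets(message, destination="192.0.2.7")  # hypothetical address
random.shuffle(packets)  # simulate out-of-order arrival
restored = reassemble(packets)
print(restored)
```

However the packets are shuffled, the sequence numbers are enough to rebuild the message at the receiving end.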
→ 1976: TCP/IP on ARPANET
1977: demonstration of the Internet
- 1981: BITNET and CSNET
- 1983: domain server
- 1984: NSF backbone
- 1971: Ray Tomlinson sends the first network e-mail
- 1972: Larry Roberts: read, reply, forward, save...
- 1976: Queen Elizabeth II sends an e-mail
- 1983: Domain name server (University of Wisconsin)
uia.ua.ac.be
→ The exact IP number of the addressee is no longer needed.
The growth of the Internet (number of hosts)
- 08/1981: 213
- 10/1985: 1,961
- 10/1990: 313,000
- 07/1995: 8,200,000
- 07/2000: 93,047,785
- 01/2001: 109,574,429
- 07/2001: 125,888,197
- 01/2002: 147,344,723
- 07/2002: 162,128,493
- 01/2003: 171,638,297
- 01/2004: 233,101,481
V. Monday 21 February: Hypertext
Information management
- Paul Otlet (1932)
- Vannevar Bush (1945)
- Ted Nelson (1965)
- NLS (oNLine System): Doug Engelbart
- HES (Hypertext Editing System): Andries van Dam and Ted Nelson
- FRESS (File Retrieval and Editing System): Brown University
- Commercial hypertext systems
- HTML: Tim Berners-Lee & Robert Cailliau
- Browser war
→ Traité de Documentation (1934)
- Le Document: each carrier of information
- facts
- interpretation of facts
- statistics
- source material
- Standard 3"x5" card - data sheet:
- Monographic Principle
- Universal Decimal Classification (UDC)
→ The Mundaneum: an international documentary network
- fitted with machines and auxiliary instruments of intellectual work
- machines to transform speech into writing and vice versa
- an application of television, to allow texts to be made available for remote reading
- reading machines scanning the physical items (search and retrieval)
- add to existing texts held remotely in such a way that the original texts were not disturbed
On the work desk there might be no books or other documents at all, but only a screen and a telephone. The work station would be connected to a centre of knowledge by telephone, wireless telegraphy, television and telex ("téléaugraphie", "téléphotographie")
- screens
- loudspeaker
- selection machines
a machinery unaffected by distance which would combine at the same time radio, x-rays, cinema and microscopic photography. All the things of the universe and all those of man would be registered from afar as they were created. Thus the moving image of the world would be established -- its memory, its true duplicate. From afar anyone would be able to read any passage, expanded or limited to the desired subject, that would be projected onto his individual screen. Thus in his armchair, anyone would be able to contemplate the whole of creation or particular parts of it (1935, p. 390-1).
→ As We May Think (1945)
Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and to coin one at random, "memex" will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.
Memex
→ A File Structure for the Complex, the Changing, and the Indeterminate (1965)
Hypertext was an audacious choice: hyper- has a bad odour in some fields and can suggest agitation and pathology, as it does in medicine and psychology. But in other sciences hyper- connotes extension and generality, as in the mathematical hyperspace, and this was the connotation I wanted to give the idea.
- NLS (oNLine System): Doug Engelbart
- HES (Hypertext Editing System): Andries van Dam and Ted Nelson
- FRESS (File Retrieval and Editing System): Brown University
4 central concepts
- internal structure of nodes or documents
- alternate views
- bidirectional linking
- link classification
- Storyspace
- Guide
- NoteCards
- HyperCard
Hypertext Markup Language (HTML): Tim Berners-Lee & Robert Cailliau
- text
- formulae
- drawings
- graphics
- simple
- http (hypertext transfer protocol)
→ MOSAIC: Marc Andreessen
- An electronic multilinear visualisation of
- a set of information units (nodes) which
- may contain text, sound, and/or images, and which
- are linked to each other by way of hyperlinks.
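The definition above, together with the bidirectional linking listed among the four central concepts, can be pictured as a small data structure. The sketch below is only an illustration: the class names `Node` and `Hypertext` and the three sample nodes are hypothetical, and no historical system is being reproduced.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    content: str  # text here; a node could equally point to sound or images
    links_out: list[str] = field(default_factory=list)
    links_in: list[str] = field(default_factory=list)


class Hypertext:
    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}

    def add_node(self, name: str, content: str) -> None:
        self.nodes[name] = Node(name, content)

    def link(self, source: str, target: str) -> None:
        """Record the link at both ends so it can be followed in either direction."""
        self.nodes[source].links_out.append(target)
        self.nodes[target].links_in.append(source)


web = Hypertext()
web.add_node("memex", "A mechanized private file and library")
web.add_node("hypertext", "Term introduced by Ted Nelson")
web.add_node("html", "Hypertext Markup Language")
web.link("hypertext", "memex")
web.link("html", "hypertext")

print(web.nodes["hypertext"].links_out)  # where this node points to
print(web.nodes["hypertext"].links_in)   # which nodes point here
```

Because each link is stored at both ends, a reader can ask not only where a node leads but also which nodes refer to it.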
VI. Monday 14 February: Text & Computer
Workflow: mantra in 9 lines
- Project definition
- Document analysis
- Encoding design
- Encoding
- Validating
- Functionality: storyboard
- Transform, generate, implement
- Document
- Publish
2 kinds of electronic texts:
- digitally created (born digital)
- digitized
→ 2 different answers to the questions:
- technological progress
- create access to material
- support preservation policy
- collection acquisition/completion
- flexible use
- institutional and strategic advantage (prestige)
- research
- education
- Short-term thinking
- document creation/production
- print
- mail
- document layout
- ease of the word processor
- WYSIWYG interface
- → .rtf, .doc, .wpd, .pdf, .xls, .mdb
- → Word, WordPerfect, Adobe Acrobat, Excel, Access
- Long-term thinking
- justify investments
- retain access to material
- exchange of data
- document structure
- → ISO standards and W3C norms
Texts cannot be put into computers. Neither can numbers. Computers can contain and operate on patterns of electronic charges, but they cannot contain numbers, which are abstract mathematical objects not electronic charges, nor texts, which are complex, abstract cultural and linguistic objects.
Michael Sperberg-McQueen, 'Text in the Electronic Age: Textual Study and Text Encoding with examples from Medieval Texts.' Literary and Linguistic Computing, 6/1 (1991): 34-46. (34)
Computers work with a representation of text
Die Leiden des jungen Werther is an exceptionally good example of a book full of Weltschmerz.
{\i Die Leiden des jungen Werther} is an {\i exceptionally} good example of a book full of {\i Weltschmerz}.
000101110000101000000011111010001111111011111
011001000000111111010100100010111101000100100
001111100100010011111110001010101010101010111
110001011100001010110001001000010000000010101
011110101001100010111000010101100010010000111
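How a word becomes a pattern of bits can be tried out directly. The sketch below assumes the UTF-8 encoding, which is a choice made for this example; the rows of digits on the slide are illustrative and are not decoded here. The function names `to_bits` and `from_bits` are hypothetical.

```python
def to_bits(text: str, encoding: str = "utf-8") -> str:
    """Return the bit pattern stored for the given characters, one byte per group."""
    return " ".join(f"{byte:08b}" for byte in text.encode(encoding))


def from_bits(bits: str, encoding: str = "utf-8") -> str:
    """Interpret a bit pattern as text again, using the same encoding."""
    return bytes(int(chunk, 2) for chunk in bits.split()).decode(encoding)


word = "Weltschmerz"
pattern = to_bits(word)
print(pattern)
print(from_bits(pattern))
```

The bits only become text again when the reader applies the same convention as the writer, which is why the representation and the text are not the same thing.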
- Interchange between systems and platforms causes loss of information
→ Short data life cycle
- Loss of semantic information by translation to visual information
Example: Synoptic Edition
An internationally accepted standard which:
- is software and platform independent
- can describe the logical, structural, and semantic elements of a text
→ Markup
<title>Die Leiden des jungen Werther</title> is an
<emph>exceptionally</emph> good example of a book full of
<foreign lang="German">Weltschmerz</foreign>.
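Once the distinctions are marked up by name rather than by italics, software can tell a title from an emphasised word from a foreign term. The sketch below uses Python's standard `xml.etree.ElementTree` module. The wrapping `<p>` element and the element name `<foreign>` with a `lang` attribute are assumptions made so that the fragment is well-formed XML; they are not prescribed by the slide.

```python
import xml.etree.ElementTree as ET

# The <p> wrapper and the <foreign> element are assumptions for this example.
fragment = (
    "<p><title>Die Leiden des jungen Werther</title> is an "
    "<emph>exceptionally</emph> good example of a book full of "
    '<foreign lang="German">Weltschmerz</foreign>.</p>'
)

root = ET.fromstring(fragment)

titles = [el.text for el in root.iter("title")]
emphasised = [el.text for el in root.iter("emph")]
foreign_words = [(el.get("lang"), el.text) for el in root.iter("foreign")]
plain_text = "".join(root.itertext())  # the sentence with all markup removed

print(titles)
print(emphasised)
print(foreign_words)
print(plain_text)
```

The three kinds of italics in the printed sentence now yield three separate lists, while the plain sentence can still be recovered unchanged.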
VII. Monday 14 February: Text encoding and markup
Workflow: mantra in 9 lines
- Project definition
- Document analysis
- Encoding design
- Encoding
- Validating
- Functionality: storyboard
- Transform, generate, implement
- Document
- Publish
7. Text encoding and markup
- Texts are more than simply sequences of glyphs
→ They have structure and content and they also have multiple readings
Document analysis: Exercise
Workflow: mantra in 9 lines
- Project definition
- Document analysis
- Encoding design
- Encoding
- Validating
- Functionality: storyboard
- Transform, generate, implement
- Document
- Publish
- Text encoding or markup provides a means of making such structure, content, and readings explicit
→ Only what is explicitly articulated can be digitally processed
Texts have structure and content 1
POOREDWARDDIDYOUHEARTHENEWSABOUTEDWARD?ONTHEBACKOFHISHE
ADHEHADANOTHERFACEWASITAWOMAN'SFACEORAYOUNGGIRLTHEYSAID
TOREMOVEITWOULDKILLHIMSOPOOREDWARDWASDOOMEDTHEFACECOULD
LAUGHANDCRYITWASHISDEVILTWINANDATNIGHTSHESPOKETOHIMTHIN
GSHEARDONLYINHELLBUTTHEYWEREIMPOSSIBLETOSEPARATECHAINED
TOGETHERFORLIFEFINALLYTHEBELLTOLLEDHISDOOMHETOOKASUITEO
FROOMSANDHUNGHIMSELFANDHERFROMTHEBALCONYIRONSSOMESTILLB
ELIEVEHEWASFREEDFROMHERBUTIKNEWHERTOOWELLISAYSHEDROVEHI
MTOSUICIDEANDTOOKPOOREDWARDTOHELL
Texts have structure and content 2
Poor Edward. Did you hear the news about Edward? On the back of
his head he had another face. Was it a woman's face or a
young girl? They said to remove it would kill him. So poor
Edward was doomed. The face could laugh and cry. It was his devil
twin. And at night she spoke to him things heard only in hell. But
they were impossible to separate, chained together for life. Finally
the bell tolled his doom. He took a suite of rooms and hung himself
and her from the balcony irons. Some still believe he was freed
from her, but I knew her too well. I say she drove him to suicide,
and took poor Edward to hell.
Texts have structure and content 3
Poor Edward
Did you hear the news about Edward?
On the back of his head he had another face
Was it a woman's face or a young girl?
They said to remove it would kill him
So poor Edward was doomed
The face could laugh and cry
It was his devil twin
And at night she spoke to him
Things heard only in hell
But they were impossible to separate
Chained together for life
Finally the bell tolled his doom
He took a suite of rooms
And hung himself and her from the balcony irons
Some still believe he was freed from her
But I knew her too well
I say she drove him to suicide
And took poor Edward to hell
Texts have structure and content 4
Poor Edward
<line>Did you hear the news about Edward?</line>
<line>On the back of his head he had another face</line>
<line>Was it a woman's face or a young girl?</line>
<line>They said to remove it would kill him</line>
<line>So poor Edward was doomed</line>
<line>The face could laugh and cry</line>
<line>It was his devil twin</line>
<line>And at night she spoke to him</line>
<line>Things heard only in hell</line>
<line>But they were impossible to separate</line>
<line>Chained together for life</line>
<line>Finally the bell tolled his doom</line>
<line>He took a suite of rooms</line>
<line>And hung himself and her from the balcony irons</line>
<line>Some still believe he was freed from her</line>
<line>But I knew her too well</line>
<line>I say she drove him to suicide</line>
<line>And took poor Edward to hell</line>
Texts have structure and content 5
Poor Edward
<stanza>
<line>Did you hear the news about Edward?</line>
<line>On the back of his head he had another face</line>
<line>Was it a woman's face or a young girl?</line>
<line>They said to remove it would kill him</line>
<line>So poor Edward was doomed</line>
</stanza>
<stanza>
<line>The face could laugh and cry</line>
<line>It was his devil twin</line>
<line>And at night she spoke to him</line>
<line>Things heard only in hell</line>
<line>But they were impossible to separate</line>
<line>Chained together for life</line>
</stanza>
<stanza>
<line>Finally the bell tolled his doom</line>
<line>He took a suite of rooms</line>
<line>And hung himself and her from the balcony irons</line>
<line>Some still believe he was freed from her</line>
<line>But I knew her too well</line>
<line>I say she drove him to suicide</line>
<line>And took poor Edward to hell</line>
</stanza>
Texts have structure and content 6
<poem>
<title>Poor Edward</title>
<stanza>
<line>Did you hear the news about Edward?</line>
<line>On the back of his head he had another face</line>
<line>Was it a woman's face or a young girl?</line>
<line>They said to remove it would kill him</line>
<line>So poor Edward was doomed</line>
</stanza>
<stanza>
<line>The face could laugh and cry</line>
<line>It was his devil twin</line>
<line>And at night she spoke to him</line>
<line>Things heard only in hell</line>
<line>But they were impossible to separate</line>
<line>Chained together for life</line>
</stanza>
<stanza>
<line>Finally the bell tolled his doom</line>
<line>He took a suite of rooms</line>
<line>And hung himself and her from the balcony irons</line>
<line>Some still believe he was freed from her</line>
<line>But I knew her too well</line>
<line>I say she drove him to suicide</line>
<line>And took poor Edward to hell</line>
</stanza>
</poem>
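With the structure made explicit, questions such as "how many stanzas?" or "what is the first line of each stanza?" become simple queries. The sketch below uses Python's standard `xml.etree.ElementTree` module on an abbreviated copy of the poem (one line per stanza, an abridgement made for this example only).

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the encoded poem: only the first line of each stanza.
poem_xml = """
<poem>
  <title>Poor Edward</title>
  <stanza>
    <line>Did you hear the news about Edward?</line>
  </stanza>
  <stanza>
    <line>The face could laugh and cry</line>
  </stanza>
  <stanza>
    <line>Finally the bell tolled his doom</line>
  </stanza>
</poem>
"""

poem = ET.fromstring(poem_xml)
title = poem.findtext("title")
stanzas = poem.findall("stanza")
first_lines = [stanza.findtext("line") for stanza in stanzas]

print(title)
print(len(stanzas))
for line in first_lines:
    print(line)
```

None of these queries could be answered reliably from the unbroken string of capitals in the first version of the text.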
- Markup should be separated from content
- Markup should be processable → logical & predictable
Markup should be separated from content
Use of tags with open and close delimiters
<tag>content</tag>
Markup should be processable
→ OHCO thesis
Russian Doll or OHCO Thesis
A document is an Ordered Hierarchy of Content Objects
<book>
<chapter n="1">
<section n="1">
<p>...</p>
<p>...</p>
</section>
<section n="2">
<p>...</p>
<p>...</p>
</section>
</chapter>
<chapter n="2">
<!-- more sections and paragraphs -->
</chapter>
</book>
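The "Russian doll" image can be made visible by walking the hierarchy and indenting each element by its depth. The sketch below uses Python's standard `xml.etree.ElementTree` module; the function name `outline` is hypothetical, and the second chapter is left empty here because the slide only indicates its contents with a placeholder comment.

```python
import xml.etree.ElementTree as ET

book_xml = """
<book>
  <chapter n="1">
    <section n="1"><p>...</p><p>...</p></section>
    <section n="2"><p>...</p><p>...</p></section>
  </chapter>
  <chapter n="2"/>
</book>
"""


def outline(element: ET.Element, depth: int = 0) -> list[str]:
    """List every element in document order, indented by nesting depth."""
    label = element.tag + (f' n="{element.get("n")}"' if element.get("n") else "")
    rows = ["  " * depth + label]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


rows = outline(ET.fromstring(book_xml))
print("\n".join(rows))
```

The order of the rows reflects the "ordered" part of the thesis, and the indentation reflects the "hierarchy": every object sits inside exactly one parent.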
Markup should be processable
→ Document Type Definition (DTD)
Document Type Definition (DTD)
Different document types → different organization/order & different content objects
- Poetry
- Prose
- Drama
- Letters
- Bibliographies
- Dictionaries
- Lists
- ...
→ The rules of the game
Document Type Definition (DTD)
A DTD specifies the vocabulary and the syntax of a markup language
It defines:
- names for all your elements
- names and default values for their attributes
- rules about how elements can nest
- names for re-usable pieces of data (entities)
- and a few other things
A DTD does not specify anything about what elements "mean"
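The "rules about how elements can nest" can be imitated in a few lines to show what a validator does with them. This is a deliberately reduced sketch and not a DTD validator: the element declarations in the comment are an assumed DTD for the poem example, and the dictionary `ALLOWED_CHILDREN` and the function `nesting_errors` are hypothetical names that keep only the "which children are allowed" part, ignoring order and repetition.

```python
import xml.etree.ElementTree as ET

# A DTD for the poem might declare (assumed, for illustration):
#   <!ELEMENT poem   (title, stanza+)>
#   <!ELEMENT stanza (line+)>
#   <!ELEMENT title  (#PCDATA)>
#   <!ELEMENT line   (#PCDATA)>
# The dictionary keeps only the allowed child names; a real DTD also fixes
# their order and how often they may occur.
ALLOWED_CHILDREN = {
    "poem": {"title", "stanza"},
    "stanza": {"line"},
    "title": set(),
    "line": set(),
}


def nesting_errors(element: ET.Element) -> list[str]:
    """Report undeclared elements and children that may not nest in their parent."""
    if element.tag not in ALLOWED_CHILDREN:
        return [f"<{element.tag}> is not declared"]
    errors = []
    for child in element:
        if child.tag not in ALLOWED_CHILDREN[element.tag]:
            errors.append(f"<{child.tag}> may not appear inside <{element.tag}>")
        errors.extend(nesting_errors(child))
    return errors


good = ET.fromstring("<poem><title>Poor Edward</title><stanza><line>...</line></stanza></poem>")
bad = ET.fromstring("<poem><line>Poor Edward</line></poem>")

print(nesting_errors(good))
print(nesting_errors(bad))
```

The check says nothing about what a `<line>` or a `<stanza>` means; it only reports whether the elements appear where the rules allow them.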
- Computers work with representations of text
- Proprietary software merges structure, content, meaning, and layout in one code
→ 2 problems
- Short data life cycle
- Visual information replaces semantic information
- Solution: standard for text encoding
→ Markup explicitly articulates structure, content, and readings
- Markup should be separated from content
→ Tags
- Markup should be processable
→ OHCO thesis
- A markup language is defined by its Document Type Definition
Workflow: mantra in 9 lines
- Project definition
- Document analysis
- Encoding design
- Encoding
- Validating
- Functionality: storyboard
- Transform, generate, implement
- Document
- Publish