B33080 Humanities Computing: Electronic Text
University of Antwerp, Campus Drie Eiken
Second Term 2004
Edward Vanhoutte
edward.vanhoutte@kantl.be

Week 1: Introduction to this course - Humanities Computing.
Monday 16 February
1.a. Objectives of this Course
- Digitization
- XML
- XSL
- Project management
1.a. Digitization of Texts and Images
After this course, you should be aware of:
- what a digital image is
- what digitization is
- why you would digitize an object
- ways of obtaining texts
- archival
- keyboarding
- optical character recognition
- how digitization works
- tools, techniques, and standards
- files and formats
1.a. XML: eXtensible Markup Language
XML is a metalanguage with which one can create separate markup languages for separate purposes. After this course, you should know how to:
- analyse a document
- create a well formed XML document
- create a valid XML document
- parse an XML document for validation
- read and interpret an XML document
- create and read small DTDs
- use and understand TEI-Lite
- use and understand DALF
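As a rough illustration of what "well formed" means in the list above, here is a minimal sketch in Python, assuming only the standard library. The sample letter and the helper name is_well_formed are invented for this example and are not part of the course materials. Note that the standard-library parser used here does not validate against a DTD; checking validity requires a validating parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample text, invented for illustration only.
WELL_FORMED = """<?xml version="1.0"?>
<letter>
  <head>Letter 1</head>
  <p>Dear <name>Anna</name>, thank you for your note.</p>
</letter>"""

# Same idea, but <name> is never closed, so the tags do not nest properly.
NOT_WELL_FORMED = """<letter>
  <p>Dear <name>Anna, thank you for your note.</p>
</letter>"""


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(WELL_FORMED))
print(is_well_formed(NOT_WELL_FORMED))
```

The first document has one root element and every start tag has a matching end tag in the right order; the second breaks that rule, so the parser rejects it.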
1.a. XSL: eXtensible Stylesheet Language
XSLT is a tool for processing XML documents. After this course, you should know how to:
- transform XML from one document type to another
- style XML for display in off-the-shelf Web browsers
- use XPath to denote specific parts of a document
- use XSLT as a search engine
- segment your data
- retag your data
- express recursive algorithms in XSLT
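To give a feel for how path expressions denote specific parts of a document, here is a small sketch in Python, assuming only the standard library. It is not XSLT: Python's xml.etree.ElementTree supports only a limited subset of XPath, and the element names (letters, letter, sender) are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for illustration only.
DOC = """<letters>
  <letter n="1"><sender>Anna</sender><p>First letter.</p></letter>
  <letter n="2"><sender>Ben</sender><p>Second letter.</p></letter>
  <letter n="3"><sender>Anna</sender><p>Third letter.</p></letter>
</letters>"""

root = ET.fromstring(DOC)

# Path expression: every <sender> that is a child of a <letter>.
senders = [s.text for s in root.findall("./letter/sender")]

# Predicate on an attribute: the <letter> whose n attribute equals "2".
second = root.find("./letter[@n='2']")

# Predicate on child text: letters whose <sender> is "Anna" (a simple search).
from_anna = [l.get("n") for l in root.findall("./letter[sender='Anna']")]

print(senders)
print(second.find("p").text)
print(from_anna)
```

The same path expressions would be written almost identically inside an XSLT stylesheet, where they select the nodes a template applies to.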
After this course, you should be aware of the issues involved in:
- planning and instigating a project
- costing a project
- running a project
- maintaining a project
- evaluating a project
1.a. Not covered in the course
- (X)HTML, CSS
- ECDL
- Web design, page design, typography
- XSL formatting objects
- Tricks of the trade (production use)
- Digital audio and video
- Three dimensional digitization
I assume that:
- You have elementary computer skills:
- know how to work with multiple windows
- work with the mouse
- create folders and files
- download files from the internet
- You are interested in literary and historical texts, databases, and images.
- You expect to work with XML data, and need to:
- markup texts
- display information
- extract information
- reformat information
- You expect to work with digitized texts and images
- You understand English
- (and a little bit of Scottish and Dutch)
I do not assume that:
- You know something about programming
- You know something about markup, transformations, and digitization
- You know XML and can edit XML documents
- You have used XSLT
- You are a programmer
1.c. The Lecturer: Edward Vanhoutte

- Co-ordinator, Centre for Scholarly Editing and Document Studies (Centrum voor Teksteditie en Bronnenstudie - CTB) of the Royal Academy of Dutch Language and Literature (KANTL)
- Associate University Teacher in Humanities Computing, University of Antwerp
- Degrees in Germanic Languages and Literature, Mediaeval Studies, Education
- email: edward.vanhoutte@kantl.be
- http://www.kantl.be/ctb/vanhoutte/
- http://www.kantl.be/ctb/staff/edward.htm

I want to know:
- Who you are (name)
- What you do
- What you study
- Why you're here
in 30 seconds
- Exam:
- Permanent Evaluation: presence and participation in the weekly lectures & exercises
- Group Assignment: electronic edition of the correspondence between Lynne Bryer & Daphne Rooke
- Viva Report
- Lectures:
- Academic 15 min for checking email
- Always bring a disk
- Preparation & exercises
- Required readings
- Suggested readings
- Course website:
- http://www.kantl.be/ctb/vanhoutte/teach/hc2004.htm
- Check it regularly
- No "I-didn't-know-it"s
- Updated every Wednesday noon
- A copy of your student card
- A mail with your email address to edward.vanhoutte@kantl.be
1.e. Overview of the course
- Monday 16 February: 1. Introduction to this course - Humanities Computing
- Monday 23 February: 2. Digitization of Images and Textual Resources: Dr. Melissa Terras
- Monday 1 March: 3. XML theory and practice: History of the Internet - Hypertext - Text Encoding & Markup - Document Analysis
- Monday 8 March: 4. XML theory and practice: SGML/XML - TEI - DTD - well formed XML
- Monday 15 March: 5. XML theory and practice: valid XML - validating
- Monday 22 March: 6. TEI: TeiXlite
- Monday 29 March: 7. TEI: DALF
- Monday 19 April: 8. XSL theory and practice: basics, XPath, function
- Monday 26 April: 9. XSL theory and practice: Real XSLT
- Monday 3 May: 10. Group Project - Documentary "Into the Future"
1.f. Test Elementary Computer Skills
- Make three folders on your C drive, name them "Man", "Woman", "Baby".
- Download the following files from http://www.kantl.be/ctb/vanhoutte/teach/hc2004.htm
- Week 1 > As We May Think (bushf.htm): put bushf.htm in folder "Man"
- Week 4 > teixlite.dtd: put teixlite.dtd in folder "Woman"
- Bottom > XHTML 1.0 logo (valid-xhtml10.png): put valid-xhtml10.png in folder "Baby"
- Put the contents of "Baby" in "Woman" and delete "Baby"
- Copy the contents of "Woman" to "Man"
- Save the remaining folders to a disk
- Pass the disk on to your right-hand neighbour
- Explore the contents of the disk and show me
2. Introduction to Humanities Computing
- a. Humanities Computing: definitions
- b. Humanities Computing: a field, a discipline
- a. Associations
- b. Journals
- c. Mailing lists
- d. Publications
- e. Institutions
- c. Humanities Computing: short history
2.a. Humanities Computing: definitions
Humanities Computing is an academic field concerned with the application of computing tools to arts and humanities data or their use in the creation of these data.
Willard McCarty

To apply Computing solutions to valid, and generally complex, Humanities based problems to provide access and answers that otherwise would have been impossible.
Melissa Terras
Required reading:
- Willard McCarty (2002). Humanities Computing (Preliminary draft entry for The Encyclopedia of Library and Information Science, New York: Dekker, 2003.)
→ http://www.kcl.ac.uk/humanities/cch/wlm/essays/encyc/
2.b. Humanities Computing: a field, a discipline
Required reading:
- Marilyn Deegan (2000). "Introduction." Frances Condron, Michael Fraser & Stuart Sutherland (eds.), Guide to Digital Resources for the Humanities 2000. Oxford: CTI.
2.b.a. Associations: A Selection
- Association for Literary and Linguistic Computing (ALLC)
<http://www.allc.org>
- Association for Computers and the Humanities (ACH)
<http://www.ach.org>
- Text Encoding Initiative Consortium (TEI)
<http://www.tei-c.org>
2.b.b. Journals (Print): A Selection
- Literary & Linguistic Computing

- Computers and the Humanities

- Human IT
- Markup Languages: Theory and Practice (discontinued)
- Computers and Texts
- Revue. Informatique et Statistiques dans les Sciences Humaines.
- Text Technology. The Journal of Computer Text Processing.
- Cultural & Heritage Informatics Quarterly
→ Institutional models for humanities computing
<http://www.kcl.ac.uk/humanities/cch/allc/imhc/>
2.b.b. Journals (on-line): A Selection
- DigiNews (RLG)
<http://www.thames.rlg.org/preserv/diginews/>
- D-Lib Magazine
<http://www.dlib.org/dlib/>
- Journal of Digital Information
<http://jodi.ecs.soton.ac.uk/>
- Journal of Electronic Publishing
<http://www.press.umich.edu/jep/>
- Ariadne
<http://www.ariadne.ac.uk/>
→ Institutional models for humanities computing
<http://www.kcl.ac.uk/humanities/cch/allc/imhc/>
2.b.c. Mailing Lists: A Selection
- Humanist
an international electronic seminar on the application of computers to the humanities whose primary aim is to provide a forum for discussion of intellectual, scholarly, pedagogical, and social issues and for exchange of information among members.
<http://www.kcl.ac.uk/humanities/cch/humanist/>
- TEI-L
a discussion list for new and advanced users of the TEI.
- E-DOCS
a discussion list and website for professionals involved in the production, distribution, and organization of historical documents on the Internet. Offers an index of "Best Practices and Exemplary Sites."
2.b.d. Publications: A Selection
- Susan Hockey (2000). Electronic Texts in the Humanities. Oxford: Oxford University Press.
- Office for Humanities Communication Publications (King's College London, currently 16 titles in print)
- Jahrbuch für Computerphilologie - print (München, Germany) & online
<http://computerphilologie.uni-muenchen.de/ejournal.html>
→ Institutional models for humanities computing
<http://www.kcl.ac.uk/humanities/cch/allc/imhc/>
2.c. Humanities Computing: short history
- The Beginnings of Humanities Computing
- 1949-1980: Roberto Busa, Index Thomisticus: 8 million words, 60 volumes.
- 1950s: New Testament concordancing and stylistic analysis.
- Developments in Concordancing
- Quantitative basis for stylistic analysis, authorship attribution, vocabulary studies of collocations, scholarly editing, ...
- Text Archives
- Developments in Databases and Statistics
- 1970s: use of statistical techniques in historical analysis
- → differences between historical source material
- → links between important names in a variety of different documents and document types such as ephemera, baptism records and images
- Associations and consortia: ALLC, ACH, TEI
Development of
- a. Hardware
- b. PC & Graphical Interface
Required reading:
- Michael Fraser (1996). A Hypertextual History of Humanities Computing
→ http://info.ox.ac.uk/ctitext/history/intro.html
First generation: tubes
- 1942: ABC (Atanasoff-Berry Computer):
- more than 300 vacuum tubes
- weight: 250 kg
- more than 1.5 km wire
- speed: 1 calculation every 15 seconds.
- 1946: ENIAC (Electronic Numerical Integrator and Computer):
- 18,000 vacuum tubes
- 70,000 resistors, 10,000 capacitors, 6,000 manual switches
- height: 2.5 m
- length: 24 m (floor area: 167 sq m)
- weight: 30 tons
- power consumption: 160 kilowatts.
→ Dr. John von Neumann (1903-1957): computer memory & BIT (1 or 0)
EDSAC

Titan

Second generation: transistors
John Bardeen, Walter Brattain & William Shockley
- 1960: IBM 7090
- 650 EDPM computer: 60% discount for universities
Third generation: integrated circuits
→ transistors on semiconductors
- Jack Kilby: germanium
- Robert Noyce: silicon → CHIP
- 1970: Intel 1103 chip: DRAM (Dynamic Random Access Memory)
- → HP 9800
- 1970: Fairchild corporation: 256k SRAM chip (Static Random Access Memory)
Fourth generation: microprocessors
- 1971: Intel 4004: 2,300 transistors on a 4 x 3 mm chip
Storage media
- Keypunch Machine

- 1971: IBM 8" Memory Disk (Floppy)
- 1978: Wang Laboratories 5 1/4" Floppy
- 1981: Sony 3 1/2" Disk
3.b. PC & Graphical Interface
1. Personal Computer
Scelbi (1974)

Altair (1975): DIY kit with switches for binary programming

Basic: Bill Gates & Paul Allen wrote BASIC in 6 weeks

Apple: Steve Wozniak & Steve Jobs
- Apple I (1976)

- Apple II (1977)

Commodore PET (1977): Personal Electronic Transactor

Radio Shack (Tandy) TRS-80 (1977)

2. User Software
- Disk drive → distribution of software
- VisiCalc spreadsheet (1979) → Lotus 1-2-3 (1983)

- WordStar (1979)

→ American Software Patent Law in 1981
3. IBM PC (1981)
- First PC based on open architecture (off the shelf parts)
- 4.77 MHz Intel 8088 microprocessor
- one or two disk drives
- 16 KB memory, expandable to 256 KB
- 16-bit Microsoft OS: MS-DOS 1.0
- price: from 1,565 USD onwards (ca. 4,300 EUR now)
4. Graphical User Interface
- First User interfaces (MS DOS): command line

- 1970: Xerox Corporation, Palo Alto Research Center (PARC), 'Alto': the architecture of information
- 1983: Apple Lisa (Local Integrated Software Architecture)
- drop-down menu bars
- windows
- multitasking
- hierarchical file system
- copy & paste
- icons
- folders
- mouse (Doug Engelbart)


More Graphical User Interfaces
- Microsoft Windows 1.0 (1985)
- IBM: TopView
- Visi On (1983)
- GEM (Graphics Environment Manager)
5. Cost factor
Sinclair ZX80 (1980)
- 22x17 cm
- 99.95 GBP or 79 GBP in kit form
- Screen: TV
- Storage medium: tape recorder
- 20,000 computers sold in 9 months
Sinclair ZX81 (1981)
- 69.95 GBP or 49.95 GBP in kit form
- Sold by mail order and in supermarkets
- Magazines, hobby clubs
- 300,000 computers sold in 10 months
Sinclair ZX Spectrum (1982)
- Low price → 'Home Computer'
- 15,000 units sold per week