B33080 Humanities Computing: Electronic Text

Ron Van den Branden

ron.vandenbrandenkantl.be



I. Overview

Overview

  • 9.00 - 10.30: XPath
    1. What is XPath?
    2. Expressions as location ladders
    3. Axes
    4. Node tests
    5. XPath data model
    6. Predicates
    7. Long syntax, short syntax
    8. Functions and operators
    9. Generating document reports using XPath in an XSLT stylesheet
  • 11.00 - 12.30: XSLT

    Goals of this lecture

    After this lecture, you should be able to:
    • understand XPath syntax
    • use XPath functions and operators in expressions
    • assemble all nuts and bolts into (complex / fine grained) XPath expressions
    • understand how XPath functions in XSLT stylesheets (e.g. for extracting info from an XML document)

    II. XPath

    A bird's eye view on X(HT)ML

    By now, you probably love your XML file! Who else would (or could)?
    Internet Explorer, Mozilla Firefox, Mozilla SeaMonkey! (in a way)

    A bird's eye view on X(HT)ML

    Let's have a look at another XML file: sampleXHTML.htm. Open it in your web browser.
    Browsers love this one even more!

    A bird's eye view on X(HT)ML

    • XHTML is a reformulation of HTML 4.01 as an XML application, defined as a W3C specification (W3C Recommendation 26 January 2000, revised 1 August 2002)
      http://www.w3.org/TR/xhtml1
    • HTML was designed as a universally understood language, the ‘publishing mother tongue’ that all computers (on the World Wide Web) may potentially understand.
    • Although XHTML is 'just' another XML application / vocabulary, this one is natively understood by web browsers. They are programmed to display certain elements in certain ways.
    Yet, XHTML is a fairly shallow representation language, which makes it ill-suited for expressing an encoder's detailed interpretation of a text (or: “why don't we encode everything in XHTML, then?”).
    • [+] very good presentation language
    • [-] shallow representation language
    With XHTML, HTML has become an XML application. This makes it a good candidate for a presentational reformulation of an XML representation format (e.g. TEI).
    Means for translating / transforming XML to XHTML will be addressed in the next session.
    First: means for addressing parts of XML structures.

    An electronic eye's view on X(HT)ML

    XPath Explorer (XPE): tool for visualizing XPath expressions in documents
    • start an MS-DOS command window
    • start XPE: type java -jar [path to xpe.jar]\xpe.jar, e.g. java -jar \xpe\xpe.jar
    • open sample XHTML file sampleXHTML.htm in XPE:
      • click ‘File’ > ‘Open file...’
      • browse to the appropriate folder, e.g. ‘c:\META’
    Exchanger Lite: XML editor with XPath visualizing functionality
    • start Exchanger Lite (from the Start menu program files)
    • open sample XHTML file sampleXHTML.htm in Exchanger Lite:
      • click ‘File’ > ‘Open’
      • browse to the appropriate folder, e.g. ‘c:\META’
    • select ‘Navigator’ in the ‘Controller’ (= leftmost) window part

    Getting into XML

    Key concept for working with XML = addressing its structure.
    The W3C is the main developer of a whole suite of related standards around XML.
    XPath = XML Path, W3C standard for addressing parts of an XML document
    Features:
    • XPath defines a (non-XML) syntax for addressing parts of an XML document
    • XPath defines common functions for manipulating strings, numbers, booleans
    • XPath operates on the abstract, logical structure of an XML document
    XPath = XML Path, a URL-like path notation for navigating through the hierarchical structure of an XML document.

    Anatomy of an XML file: nodes

    7 node types:
    1. root node
    2. element nodes
    3. attribute nodes
    4. text nodes
    5. comment nodes
    6. processing instruction nodes
    7. namespace nodes

    Anatomy of an XML file: nodes

    html
    • bare name matches element node with that name

    Anatomy of an XML file: nodes

    html/head
    html//head
    • you can express location paths as a series of steps, separated by ‘/’ or ‘//’

    Anatomy of an XML file: nodes

    html/head/*
    • the wildcard ‘*’ will match any element node

    Anatomy of an XML file: nodes

    //body
    • location paths can be absolute (starting from the root node with ‘/(/)’), or relative (starting from a context node)

    Anatomy of an XML file: nodes

    //style/preceding-sibling::*
    • we can look in other directions, along other axes, as well

    Anatomy of an XML file: nodes

    //div/@id
    • attribute nodes can be addressed directly

    Anatomy of an XML file: nodes

    div[1]//p[3]
    • node sets can be refined by further conditions (expressed in square brackets ‘[ ]’)

    Anatomy of an XML file: nodes

    //div[@*] (red)
    //div[not(@*)] (blue)

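      Note: div/@* ≠ div[@*] (the first selects the attributes of <div> elements, the second selects the <div> elements that have at least one attribute)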

    Anatomy of an XML file: nodes

    8 + 5
    //*[contains(@*, 'fun')]
    //div[count(p) > 2]
    • there are lots of functions, most useful in refining conditions
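
    As a rough illustration, some of the expressions above can be tried out with Python's standard xml.etree.ElementTree module. This is only a sketch: ElementTree supports a small subset of XPath (no functions such as count() or contains(), and no selection of attribute nodes), and the sample document below is made up for the occasion. ElementTree evaluates paths relative to the <html> element, so the paths start with ‘.’.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for sampleXHTML.htm, kept deliberately tiny.
SAMPLE = """<html>
  <head><title>Sample</title><style>p {color: red}</style></head>
  <body>
    <div id="intro"><p>one</p><p>two</p><p>three</p></div>
    <div><p>four</p></div>
  </body>
</html>"""

root = ET.fromstring(SAMPLE)  # root is the <html> element

# html/head/* : all element children of <head>
head_children = [e.tag for e in root.findall("./head/*")]

# //p : all <p> elements at any depth
all_paragraphs = [p.text for p in root.findall(".//p")]

# //div[@id] : only <div> elements that carry an id attribute
divs_with_id = [d.get("id") for d in root.findall(".//div[@id]")]

# div[1]/p[3] : third <p> child of the first <div> child of <body>
third_p_of_first_div = root.find("./body/div[1]/p[3]").text
```

    For full XPath 1.0 support (axes such as preceding-sibling::, functions, attribute selection) a dedicated XPath processor is needed.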

    Anatomy of an anthology XML file

    1 important note: namespaces for TEI P5 documents!
    • /tei:TEI
    • /tei:TEI/*
    • //tei:div[3]
    • //tei:div[count(tei:lg) > 3]
    • //tei:div[.//tei:l[string-length(.) > 20]]
    • //tei:bibl/following-sibling::tei:lg[1]
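
    Why the tei: prefix matters can be sketched with Python's standard xml.etree.ElementTree module (which only supports a subset of XPath). The prefix is bound to the TEI namespace URI http://www.tei-c.org/ns/1.0; the document fragment is a made-up minimal example, not a valid TEI file.

```python
import xml.etree.ElementTree as ET

# The prefix "tei" is an arbitrary local choice; only the URI has to match.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

doc = ET.fromstring(
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    "<text><body><div><lg><l>line</l></lg></div><div/></body></text>"
    "</TEI>"
)

# Without a prefix the name test looks for <div> in no namespace: nothing matches.
without_prefix = doc.findall(".//div")

# With the prefix bound to the TEI namespace, both <div> elements are found.
with_prefix = doc.findall(".//tei:div", TEI_NS)
```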

    Conclusion: XPath, so what?

    Ok, we've created a TEI document, we know how to 'navigate' through its structure, what now?
    XPath is not used in isolation, but is embedded in ‘foster’ languages, defined as XML-related W3C standards:
    • Extensible Stylesheet Language Transformations (XSLT): transform XML documents into other structures
    • XQuery: an XML query language: query XML documents, extract relevant fragments, make query reports

    XPath location paths

    A location path is an expression for selecting a set of nodes by following a path (a sequence of one or more steps) from a given starting point (the context node, root node, or an arbitrary node-set evaluated from a function call or variable).
    A location path is a sequence of steps, separated by path operators / or //:
    /(/)step/step//step/step ... : absolute location path, starting from root node
    step/step//step/step ... : relative location path, starting from context node
    Each step
    • starts from some current node(s),
    • moves to some result node(s).
    Result = node set.

    XPath location steps

    A location path is a sequence of steps, separated by path operators / or //: /(/)step/step//step/step
    A step selects the set of nodes in the document which are related in a particular way to a supplied baseline.
    A step is:
    • minimally: axis::node test
      e.g. descendant::div
    • optionally: axis::node test[predicate][predicate]...
      e.g. descendant::div[@id="paragraphs"]
    where
    • axis says what direction to move in from the context node
    • node test says which nodes go into the result (restrictions on node type and node name)
    • predicate adds further constraints (Boolean conditions, positional filters, ...)

    XPath step (1): Axis specifiers

    Specifying what direction to look in.
    An axis specifies a path through the document tree, starting at a particular node and following a particular relationship between nodes.
    We've already encountered some ways of specifying relations between nodes:
    • //: selects the current node itself and all of its descendants, at any depth
    • @: selects attributes of the current node
      (note: @ = axis specifier; not nodetest)
    But wait... where's the axis in our paths?
    So far, we've never been explicit about what direction to look in: our expressions looked like body instead of axis::node test
    default: when no axis specifier is explicitly used, the child axis is selected by default
    body = child::body
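
    The anatomy of a step, including the default child axis, can be made tangible with a toy splitter in Python. This is only a sketch for illustration: split_step is a hypothetical helper, it handles neither nested predicates nor the abbreviations @, . and .., and it is in no way a real XPath parser.

```python
import re

def split_step(step):
    """Split one location step into (axis, node test, list of predicates)."""
    # Predicates are the bracketed parts; nested brackets are not supported here.
    predicates = re.findall(r"\[([^\[\]]*)\]", step)
    head = step.split("[", 1)[0]
    axis, separator, node_test = head.partition("::")
    if not separator:
        # No explicit axis specifier: the child axis is the default.
        axis, node_test = "child", head
    return axis, node_test, predicates
```

    For example, split_step('descendant::div[@id="paragraphs"]') separates the axis descendant, the node test div and one predicate, while split_step("body") reports the implicit child axis.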

    Axis specifiers

    An axis is a path through the document tree, starting at a particular node and following a particular relationship between nodes.
    forward axes (8)       backward axes (5)
    child                  parent
    following              preceding
    following-sibling      preceding-sibling
    descendant             ancestor
    descendant-or-self     ancestor-or-self
    self
    attribute
    namespace
    Note: when no axis is explicitly named in the step, the child axis is selected by default.

    Axes: ancestor and descendant

    • all ancestors: //a/ancestor::div
    • all descendants: //div/descendant::a

    Axes: ancestor-or-self and descendant-or-self

    • current node + all ancestors: //div/ancestor-or-self::div
    • current node + all descendants: html/body/div/descendant-or-self::div
    //div//div

    Axes: parent and child

    • parent of current node:
      //sup/parent::p
      //sup/..
      note: this always is a singleton node set
    • all children of current node:
      //p/child::span
      //p/span

    Axes: preceding and following

    • all nodes preceding current node's start tag:
      //img/preceding::p
      note: NO open ancestor nodes, only fully closed preceding nodes
    • all nodes following current node's end tag:
      //div/following::div

    Axes: preceding-sibling and following-sibling

    • all siblings preceding current node: //img/preceding-sibling::span
    • all siblings following current node: //title/following-sibling::style

    Axes: self

    only the current node itself
    //a/self::*
    //a/.

    XPath Axes: long and short syntax

    Some axis names can have a long and a short (abbreviated) syntax:
    Axis name            Long syntax                    Short syntax
    child                child::                        (EMPTY)
    attribute            attribute::                    @
    self                 self::node()                   .
    parent               parent::node()                 ..
    descendant-or-self   /descendant-or-self::node()/   //

    XPath Axes: summary

    • default axis: child axis, not explicitly formulated
    • 12 other axes, expressed by axis name followed by ‘::’ OR short expression syntax:
      • child:: vs. (EMPTY)
      • attribute:: vs. @
      • descendant-or-self:: vs. //
      • self:: vs. .
      • parent:: vs. ..

    XPath step (2): Node tests

    Specifying what nodes to look for.
    A node test tests whether a node satisfies specified constraints on the type of node or the name of the node.
    1. Element and attribute nodes: name tests:
      • name: matches any element / attribute with the specified name
      • *: matches any element / attribute
      ...possibly preceded by prefix: specifying the namespace declared for the element / attribute.
    2. Other node types: node type tests:
      • node(): matches any node type
      • text(): matches text nodes
      • comment(): matches comment nodes
      • processing-instruction(): matches processing instruction nodes
      • processing-instruction(target): matches processing instruction nodes with the specified target

    XPath data model

    XPath defines an XML document in terms of 7 node types:
    • root node: singular node that contains all nodes of a document (≠ the document element, but an abstract representation of the whole document!)
    • element nodes: part of document bounded by start and end tags, or represented by a single empty-element tag
    • attribute nodes: name and value of an attribute in a start tag / empty-element tag
    • text nodes: normalized sequence of consecutive characters in the PCDATA part of an element
    • comment nodes: comments written between <!-- comment delimiters --> in the XML source
    • processing instruction nodes: processing instructions between <? processing instruction delimiters ?> in the XML source
    • namespace nodes: namespaces bound to the prefixes written before the : namespace delimiter in start / empty-element tags

    Addressing XPath data model

    XPath defines an XML document in terms of 7 node types, each addressed by an axis and a node test:
    node type                 axis           node test                                other
    root node                 (a)            node()                                   /
    elements                  (a)            node(), *, name
    text                      (a)            node(), text()
    comments                  (a)            node(), comment()
    processing instructions   (a)            node(), processing-instruction(target)
    attributes                attribute::    node(), *, name
    namespaces                namespace::    node()
    (a) = self::, child::, parent::, descendant(-or-self)::, ancestor(-or-self)::, following(-sibling)::, preceding(-sibling)::

    Specifying nodes using axes and node tests

    Find expressions that select
    • the title of the page
    • all text children of <p> elements
    • all text node descendants of <table>
    • all alternative texts for all images
    • all grandchildren of <div> elements inside <body>
    • all attributes of all <img> elements
    • all elements preceding the <sub> element

    Specifying nodes using tests

    Find expressions that select
    • the title of the page
      /html/*/title
    • all text children of <p> elements
      //p/text()
    • all text node descendants of <table>
      //table//text()
    • all alternative texts for all images
      //@alt
    • all grandchildren of <div> elements inside <body>
      //body//div/*/*
    • all attributes of all <img> elements
      //img/@*
    • all elements preceding the <sub> element
      //sub/preceding::*

    XPath step (3): Predicates

    Adding further constraints on selected nodes
    A predicate is a qualifying expression used to select a subset of the nodes in a step.
    A predicate may contain any XPath expression. Its result is interpreted as a
    • boolean: the expression evaluates to true if satisfied by the selected context node, and false otherwise
    • number: the expression evaluates to true if it equals the position of the selected context node, and false otherwise
    eg:
    • //div[@id] (any <div> element that has an id attribute)
    • table[.//th] (any <table> element that contains a <th> element)
    • //img[ancestor::p] (any <img> element that is a descendant of a <p> element)
    • //body//span[1] (each <span> descendant of a body element that is the first <span> child of its parent)
    • (//body//span)[1] (only the first <span> descendant of the body element)
      (parentheses) can be used for grouping
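
    The difference between the last two expressions can be sketched with Python's standard xml.etree.ElementTree module. ElementTree only supports a subset of XPath and has no parenthesised grouping, so the grouped form is imitated here by indexing the complete result list; the sample fragment is invented for this illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<body>"
    "<p><span>a</span><span>b</span></p>"
    "<p><span>c</span></p>"
    "</body>"
)

# //span[1]: the position is counted per parent, so every <p> contributes its first <span>.
first_in_each_parent = [s.text for s in doc.findall(".//span[1]")]

# (//span)[1]: the position is counted in the complete result, so only one node remains.
first_overall = doc.findall(".//span")[0].text
```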

    Evaluation of location steps

    Example: descendant::span[contains(@class, "ref")][position()=2], with as context node the second <p> element of the body.
    1. All nodes on the selected axis are found, starting at the context node.
      (all but only descendants of the <p> element)
    2. Those that satisfy the node test (ie, those of the required node type and name) are selected.
      (all but only <span> elements that are descendants of <p>)
    3. The remaining nodes are numbered from 1 to n in document order if the axis is a forward axis, or in reverse document order if it is a reverse axis.
      (all selected <span> elements are numbered in document order, since descendant is a forward axis)
    4. The first (leftmost) predicate is applied to each node in turn, with that node as context node.
      (only if the selected <span> element has a class attribute whose string value contains “ref”, it is retained in the result node-set)
    5. Stages 3 and 4 are repeated for any further predicates.
      (the retained <span> elements are renumbered, and only the 2nd <span> element whose class attribute contains the string “ref” is kept)
    predicate order matters!
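
    Why predicate order matters can be imitated in plain Python, with a list of dictionaries standing in for the <span> descendants of the context <p>. The data and the helper names by_class and by_position are assumptions made for this sketch; no XPath processor is involved.

```python
# Stand-ins for three <span> descendants, in document order.
spans = [
    {"text": "s1", "class": "note"},
    {"text": "s2", "class": "ref"},
    {"text": "s3", "class": "ref bibl"},
]

def by_class(nodes):
    """Imitates the predicate [contains(@class, "ref")]."""
    return [node for node in nodes if "ref" in node["class"]]

def by_position(nodes, position):
    """Imitates [position()=N]: nodes are renumbered from 1 before each predicate."""
    return nodes[position - 1:position]

# span[contains(@class, "ref")][position()=2]: filter first, then take the 2nd survivor.
class_then_position = by_position(by_class(spans), 2)

# span[position()=2][contains(@class, "ref")]: take the 2nd span, then test its class.
position_then_class = by_class(by_position(spans, 2))
```

    With this data the first order keeps s3, the second keeps s2.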

    XPath: long and short syntax (again)

    Some axis names can have a long and a short (abbreviated) syntax:
    Axis name            Long syntax                    Short syntax
    child                child::                        (EMPTY)
    attribute            attribute::                    @
    self                 self::node()                   .
    parent               parent::node()                 ..
    descendant-or-self   /descendant-or-self::node()/   //
    Predicates can have one specific abbreviation:
    Expression           Long syntax                    Short syntax
    position             [position()=NUMBER]            [NUMBER]

    XPath: long and short syntax

    ... this combines to some specific idioms:
    .: short form for ...
    ../@name: short form for ...
    //*: short form for ...
    node(): short form for ...
    *: short form for ...
    @*: short form for ...
    //p[2]/a[@href = '#top']: short form for ...

    XPath: long and short syntax

    ... this combines to some specific idioms:
    .: short form for self::node()
    ../@name: short form for parent::node()/attribute::name
    //*: short form for /descendant-or-self::node()/child::*
    node(): short form for child::node() (note: this will NOT select attribute nodes!)
    *: short form for child::*
    @*: short form for attribute::*
    //p[2]/a[@href = '#top']: short form for /descendant-or-self::node()/child::p[position()=2]/child::a[attribute::href='#top']

    Restraining the context node set using predicates

    the first paragraph of divisions with a level 2 heading
    the second list item of an ordered list (start from the list item)
    any 4th son-of-a-paragraph, if it has a href attribute
    any element with an id attribute, if it is a <div> element (testing this in a predicate)

    Restraining the context node set using predicates

    the first paragraph of divisions with a level 2 heading
    //div[h2]/p[1]
    /descendant-or-self::node()/child::div[child::h2]/child::p[position() = 1]
    the second list item of an ordered list (start from the list item)
    //li[ancestor::ol][2]
    /descendant-or-self::node()/child::li[ancestor::ol][position() = 2]
    any 4th son-of-a-paragraph, if it has a href attribute
    //p/*[4][@href]
    /descendant-or-self::node()/child::p/child::*[position() = 4][attribute::href]
    any element with an id attribute, if it is a <div> element (testing this in a predicate)
    //*[@id][self::div]
    /descendant-or-self::node()/child::*[attribute::id][self::div]

    III. Functions and operators

    Functions and operators

    • Data types
    • Functions
      • for node-sets
      • for strings
      • for booleans
      • for numbers
    • Operators
      • for node-sets
      • equality operators
      • for booleans
      • for numbers

    Data types

    Functions and operators operate on and return the four fundamental XPath data types:
    string
    A sequence of zero or more Unicode characters.
    number
    An IEEE-754 double. This is the same as Java's double primitive data type for all intents and purposes.
    boolean
    The value ‘true’ or ‘false’. Semantically the same as Java's boolean type. However, XPath converts numbers to booleans where needed: 0 and NaN count as false, all other numbers as true.
    node-set
    An unordered collection of nodes from an XML document without any duplicates. Since a node-set is a mathematical set, there is no fundamental ordering defined on the set. However, most node-sets have a natural document order that's derived from the order of the nodes in the set in the input document. In practice, most APIs use lists rather than sets to represent node-sets, and these lists are sorted in either document order or reverse document order, depending on how they were created.

    Functions and operators

    functions can occur within expressions
    • string-length('blabla')
    • ceiling(/numbers/set[2]/nr[1]/@value)
    operators can operate on expressions
    • string-length('blabla') + /numbers/set[2]/nr[1]/@value

    XPath node-set functions

    Functions from several sources: XPath functions for node sets
    position() number
    returns the value of the position of the current node in the context node set
    last() number
    returns a number equal to the context size (position number of last element in context)
    current() node-set [XSLT]
    returns a node-set that has the current node as its only member (important for determining current node when it's different from the context node: inside predicates)
    count(node-set) number
    returns the number of nodes in the argument node-set
    name(node-set?) string
    returns the full, prefixed name of the first node in the argument node-set, or the name of the context node if the argument is omitted. It returns the empty string if the relevant node does not have a name (e.g. it's a comment or text node.)

    XPath node-set functions (2)

    id(object) node-set
    returns a node-set containing the single element node with the specified id as determined by an ID-type attribute. If no node has the specified ID, then this function returns an empty node-set.
    generate-id(node-set?) string [XSLT]
    generates a string, in the form of an XML name, that uniquely identifies a node in the document
    document(object) node-set [XSLT]
    finds an external XML document by resolving a URI reference, parses the XML into a tree structure, and returns its root node. When the argument is the empty string '', the root node of the stylesheet itself is returned.

    XPath node-set functions: examples

    eg:
    • //p[last()]
      (= any <p> element that is the last <p> child of its parent, short for /descendant-or-self::node()/child::p[last()])
      vs: (//p)[last()]
      (= the last <p> element of the document, short for (/descendant-or-self::node()/child::p)[last()])
    • //name[position() < 3]
    • preceding::abbr[@expan = current()/@expan] [XSLT]
    • //p[count(name) > 3]
    • //*[name() = name(preceding-sibling::*)]
    • id('LB')
      (note: id() returns an empty node-set when the source XML has no DTD that declares an ID-type attribute)
      (otherwise comparable to //*[@id='LB'])
    • generate-id(../../title) [XSLT]
    • document(xref/@doc)//poem[contains(author, 'Blake')] [XSLT]

    XPath string functions

    string(object?) string
    returns the string-value of the argument. Some caveats:
    • string value of node = concatenation of all descendant text nodes
    • string value of node set = string value of 1st node of the node set
    • string value of empty node set = empty string
    • default: when the argument is omitted, the string-value of the context node is returned
    concat(string, string, string*) string
    converts each argument to a string and joins them into result string
    contains(string, string) boolean
    returns true when the first string contains the second; otherwise false
    string-length(string?) number
    returns the length (number of characters) of the specified string
    normalize-space(string?) string
    returns the specified string with leading and trailing whitespaces removed and internal whitespace reduced to single space characters
    translate(string, string, string) string
    replaces all characters in the first string that are found in the second string with the corresponding character from the third string; characters without a counterpart in the third string are removed.
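
    The behaviour of translate() and normalize-space() can be approximated in a few lines of Python. The function names xpath_translate and xpath_normalize_space are assumptions for this sketch; it imitates the XPath 1.0 definitions for plain strings and is not taken from any XPath library.

```python
import re

def xpath_translate(value, source, target):
    """Imitates translate(value, source, target)."""
    mapping = {}
    for index, char in enumerate(source):
        if char not in mapping:  # the first occurrence in source decides
            # Characters without a counterpart in target are removed.
            mapping[char] = target[index] if index < len(target) else ""
    return "".join(mapping.get(char, char) for char in value)

def xpath_normalize_space(value):
    """Imitates normalize-space(value): XML whitespace is space, tab, CR and LF."""
    return re.sub(r"[ \t\r\n]+", " ", value).strip(" ")
```

    For instance, xpath_translate("bar", "abc", "ABC") gives "BAr", and xpath_translate("--aaa--", "abc-", "ABC") gives "AAA" because ‘-’ has no counterpart in the third string.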

    XPath string functions (2)

    starts-with(string, string) boolean
    returns true when the first string starts with the second; otherwise false
    substring(string, number, number?) string
    returns the substring of the first argument beginning at the index position given by the second argument (positions are counted from 1) and continuing for the number of characters specified by the third argument (or until the end of the string if the third argument is omitted).
    substring-before(string, string) string
    returns that part of the first string that precedes the second string. It returns the empty string if the second string is not a substring of the first string. If the second string appears multiple times in the first string, then this returns the portion of the first string before the first appearance of the second string.
    substring-after(string, string) string
    returns that part of the first string that follows the second string. It returns the empty string if the second string is not a substring of the first string. If the second string appears multiple times in the first string, then this returns the portion of the first string after the initial appearance of the second string.

    XPath string functions: examples

    eg:
    • string(//body//name)
    • string-length(/TEI.2/text/envelope)
    • contains((//p[2]//name)[last()]/@reg, 'Ros')
    • concat('the answer is ', starts-with(//body//p[1], 'you'))
    • normalize-space(concat(substring-after(//name[3]/@reg, ','), substring-before(//name[3]/@reg, ',')))
    • translate(//title[starts-with(., 'EST')], 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')
    Note: there is no ends-with() function. To test whether a string ends with another string, the simplest test is performed by a combination of string functions:
    substring(string1, string-length(string1) - string-length(string2) + 1) = string2
    eg:
    • substring(//salute[1], (string-length(//salute[1]) - string-length('Daphne')) + 1) = 'Daphne'
      =
      substring(//salute[1], (11 - 6 + 1)) = 'Daphne'
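
    The ends-with idiom above can be traced in Python. This is a sketch under assumptions: xpath_substring imitates substring() for integer arguments only (no rounding, NaN or infinity handling), and the value "Dear Daphne" is merely a guess at an 11-character <salute> that matches the arithmetic above.

```python
def xpath_substring(value, start, length=None):
    """Imitates substring(): positions are counted from 1, not from 0."""
    begin = max(start, 1) - 1
    if length is None:
        return value[begin:]
    end = max(start + length, 1) - 1
    return value[begin:end]

def ends_with(value, suffix):
    """substring(s1, string-length(s1) - string-length(s2) + 1) = s2"""
    return xpath_substring(value, len(value) - len(suffix) + 1) == suffix

salute = "Dear Daphne"  # hypothetical string value of //salute[1]
tail = xpath_substring(salute, 11 - 6 + 1)
```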

    XPath Boolean functions

    not(boolean) boolean
    turns true into false and false into true
    boolean(object) boolean
    converts the argument to a boolean in a mostly sensible way. NaN and 0 are false. All other numbers are true. Empty strings are false. All other strings are true. Empty node-sets are false. All other node-sets are true.
    true() boolean
    always returns true. It's necessary because XPath does not have any boolean literals
    false() boolean
    always returns false. It's necessary because XPath does not have any boolean literals
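
    The conversion rules of boolean() can be written down as a small Python function. This is an imitation for illustration only: xpath_boolean is a hypothetical name, and a node-set is modelled as a plain Python list.

```python
import math

def xpath_boolean(value):
    """Imitates boolean(): booleans, numbers, strings and node-sets (as lists)."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # NaN and 0 are false, all other numbers are true.
        return not (value == 0 or math.isnan(value))
    # Strings and node-sets are true unless they are empty.
    return len(value) > 0
```

    One consequence worth noting: the string 'false' is not empty, so boolean('false') is true. This is why the functions true() and false() are needed.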

    XPath numeric functions

    number(object?) number
    converts its argument to a number. If argument is omitted, the string-value of the context node is used. Boolean false becomes 0; true becomes 1. Non-numeric strings become NaN (Not a Number). Node-sets are first converted to a string using the string() function; then the conversion is similar as with strings.
    sum(node-set) number
    returns the total of a set of numeric values contained in a node-set (by converting the string-value of each node to a number, and totaling these numeric values). It is an error if the argument is not a node-set.
    floor(number) number
    returns the greatest integer less than or equal to the number
    ceiling(number) number
    returns the smallest integer greater than or equal to the number
    round(number) number
    rounds the number to the nearest integer (halves are rounded up, towards positive infinity)

    XPath numeric functions: examples

    eg:
    • round('4.345')
    • sum(//num/@value)
    • floor(1.45)
    • ceiling(1.45)
    • round(1.45)
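
    The three rounding functions can be mirrored in Python to check the examples. The helper xpath_round is an assumption of this sketch and covers ordinary finite numbers only (not NaN or infinities); note that Python's built-in round() behaves differently for halves.

```python
import math

def xpath_round(number):
    """Imitates round(): nearest integer, halves go towards positive infinity."""
    return math.floor(number + 0.5)

floor_result = math.floor(1.45)    # floor(1.45)
ceiling_result = math.ceil(1.45)   # ceiling(1.45)
round_result = xpath_round(1.45)   # round(1.45)
```

    The difference with Python shows at the halfway points: xpath_round(2.5) gives 3, whereas Python's round(2.5) gives 2.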

    XPath node-set operators

    We've encountered some of them before:
    / node-set
    separates successive steps in a location path
    // node-set
    selects the context node and all of its descendants (short for /descendant-or-self::node()/)
    | node-set
    (union operator) returns a node-set containing all the nodes that are in any of the input node-sets
    eg:
    • //abbr[contains(.,'lover')] | //abbr[contains(@expan,'lover')]
    • (//abbr | //abbr/@expan)[contains(.,'lover')]

    XPath equality operators

    These operators test for the equality of their operands. Since the operands may be of all different data types, some basic rules apply:
    • if one operand is a boolean, the other is converted to a boolean and they are compared as such
    • otherwise, if one operand is a number, the other is converted to a number and they are compared as such
    • otherwise, they are compared as strings
    = boolean
    returns true if the (converted) values of its operands are the same; false otherwise
    != boolean
    returns true if the (converted) values of its operands are not the same; false otherwise
    Note the difference with the boolean function not() when testing for an element that lacks an attribute:
    • //name: all names
    • //name[@key != '']: only names with a key attribute that is not empty
    • //name[not(@key = '')]: all names except those with an empty key attribute (so names without a key attribute are included)
    • //name[not(@key)]: only names without a key attribute
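
    The three predicates can be imitated in Python to see which names each one keeps. This is a sketch: the fragment is invented, and because Python's xml.etree.ElementTree has no not() function, the predicates are rewritten as ordinary Python conditions on the key attribute (None meaning the attribute is absent).

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<p><name key="k1">A</name>'  # non-empty key
    '<name key="">B</name>'       # empty key
    "<name>C</name></p>"          # no key attribute at all
)
names = doc.findall("name")

# //name[@key != '']: a key attribute must exist and differ from ''.
key_not_empty = [n.text for n in names if n.get("key") not in (None, "")]

# //name[not(@key = '')]: only names with an empty key are rejected.
not_key_empty = [n.text for n in names if n.get("key") != ""]

# //name[not(@key)]: only names without a key attribute.
no_key = [n.text for n in names if n.get("key") is None]
```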

    XPath boolean operators

    and boolean
    returns true only if both of the operands are true; false otherwise. The operands are first converted to booleans if necessary by an implicit call of the boolean() function.
    Note the difference with the node-set union operator ‘|’.
    or boolean
    returns true if either or both of the operands are true; false otherwise. The operands are first converted to booleans if necessary by an implicit call of the boolean() function.
    Note the difference with the node-set union operator ‘|’.
    eg:
    • //abbr[position()=1 or position()=last()-1]
    • //abbr[position()=1 and position()=last()-1]
      Note: this is not the same as
      //abbr[1 and last()-1] (why does this one select (nearly) every <abbr> element?)

    XPath numeric operators

    These operators can handle different operand types, by converting them to numbers before the operation by calling the function number(). In case of operands that are node-sets, the number() function is called on its string value.
    Arithmetic operators:
    + number
    returns the sum of its operands
    - number
    returns the result of the subtraction of its operands
    * number
    returns the result of the multiplication of its operands
    div number
    returns the result of the division of the left-hand operand by the right-hand operand
    Note: the division operator is deliberately not defined as the usual ‘/’ to avoid interference with the path operator of the same form.
    mod number
    returns the remainder of integer division of the left-hand operand by the right-hand operand
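
    The div and mod operators differ from their Python counterparts in two respects, which the following sketch tries to capture. The names xpath_mod and xpath_div are assumptions; the sketch ignores negative zero, and xpath_mod does not cover a zero right-hand operand (where XPath yields NaN).

```python
import math

def xpath_mod(left, right):
    """Imitates mod: remainder of a truncating division, sign follows the left operand."""
    return math.fmod(left, right)

def xpath_div(left, right):
    """Imitates div: IEEE 754 division, so dividing by zero is not an error."""
    if right == 0:
        if left == 0 or math.isnan(left):
            return math.nan
        return math.copysign(math.inf, left)
    return left / right
```

    So -5 mod 2 is -1 in XPath, where Python's -5 % 2 is 1, and 1 div 0 is Infinity rather than an error.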

    XPath numeric operators (2)

    These operators can handle different operand types, by converting them to numbers before the operation by calling the function number(). In case of operands that are node-sets, the number() function is called on its string value.
    Comparison operators:
    < boolean
    returns true if the left-hand operand is less than the right-hand operand
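    Note: inside an XML document (such as an XSLT stylesheet), escape as &lt;
    <= boolean
    returns true if the left-hand operand is less than or equal to the right-hand operand
    Note: inside an XML document (such as an XSLT stylesheet), escape as &lt;=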
    > boolean
    returns true if the left-hand operand is greater than the right-hand operand
    >= boolean
    returns true if the left-hand operand is greater than or equal to the right-hand operand
    eg:
    • //*[count(*) > 10]

    IV. Document reports using XPath in XSLT

    Resources

    Specific XPath resources are scarce (they are mostly integrated in XSLT resources). Useful resource: