3. A simple XML differencing algorithm

The XML differencing algorithm implemented in the Compare tool in XMLmind XML Editor - Online Help may be described as follows:

  1. Begin by comparing the root element of the original document to the root element of the revised document.

  2. If the element in the original document (let's call it the original element) and the element in the revised document (let's call it the revised element) have the same serial number, then compare their contents. Otherwise consider that these elements are completely different.

  3. Trivially compare the attributes of the original element to the attributes of the revised element.

  4. The child nodes of an element are converted to a sequence of comparable items prior to be compared:

    • A text item is added to the sequence for each word contained in the element[7].

    • A serial number item is added for each child element contained in the element.

    • A comment item is added for each XML comment contained in the element.

    • A processing-instruction item is added for each processing-instruction contained in the element.

    • An inclusion directive item is added for each range of included nodes contained in the element.

    Example:

    <p>The <i>quick <b>brown</b></i> fox jumps over the <b>lazy</b> dog.</p>

    gives:

    "The ", element_7223, " fox", " jumps", " over", " the ", element_10087, " dog."
  5. The sequence of items of the original element is compared to the sequence of items of the revised element using the well-known Unix diff algorithm[8] here applied to comparable items rather than to text lines:

    • Two text items are equal if they contain exactly the same text.

    • Two serial number items are equal if they have the same serial number.

    • Two comment items are equal if they contain exactly the same text.

    • Two processing-instructions items are equal if they have the same target and contain exactly the same text.

    • Two inclusion directive items are equal they have exactly the same XML contents. For example, <xi:include href="vars.xml" xpointer="copyright"/> and <xi:include xpointer="copyright" href="vars.xml"/> are equal, while <xi:include href="vars.xml" xpointer="copyright"/> and <xi:include href="vars.xml" xpointer="notice"/> differ.

  6. Compare each child element of the original element to the child element element having the same serial number in the revised element. Proceed as explained starting from the Compare attributes step.

Notes:



[7] If the element has or inherits xml:space="preserve", a text item is added for each text line contained in the element.

[8] "A file comparison program" by Webb Miller and Eugene W. Myers, 1985.

[9] By default, inclusion directives are always transcluded by XXE. Hence such directives do not really exist in the document being edited. Instead, they are recreated by XXE each time the document is saved to disk.