XMLmind Word To XML

Change history

1.12 (September 17, 2024)

Enhancements:

  • Upgraded XMLResolver to version 5.2.5.
  • Upgraded XMLmind Web Help Compiler (whc for short) to version 3.5.2.
  • "FlatLaf Look and Feel" add-on: updated FlatLaf to version 3.5.1. (On Linux, the FlatLaf light theme is used as the default Look & Feel of desktop application w2x-app.)
  • XMLmind Word To XML, which passed all non-regression tests, is now officially supported on the Java™ 22 platform.

Bug fixes:

  • Desktop application w2x-app: choosing another input file automatically updated not only the output file name (which is fine), but also the output directory (which is generally not what you want).
  • The font stacks found in various stock CSS stylesheets were somewhat outdated.
  • It was not possible to use the FlatLaf Look & Feel on Windows, even though this facility was also available on this platform.

1.11 (February 16, 2024)

Enhancements:

  • When generating any kind of output containing “semantic” XHTML 1.0 Strict, XHTML 1.0 Transitional or XHTML 1.1, the xml:lang attribute is automatically added to elements having a lang attribute. Note that this does not happen when generating “semantic” XHTML 5.0 output. However, this feature may be controlled using the new boolean parameter transform.add-xml-lang.
    Excerpts from XHTML™ 1.0 The Extensible HyperText Markup Language (Second Edition), C.7. The lang and xml:lang Attributes

    Use both the lang and xml:lang attributes when specifying the language of an element. The value of the xml:lang attribute takes precedence.

    Excerpts from XHTML™ 1.1 - Module-based XHTML - Second Edition, 3. The XHTML 1.1 Document Type

    This specification also adds the lang attribute to the I18N attribute collection as defined in [XHTMLMOD]. The lang attribute is defined in [HTML4]. When this attribute and the xml:lang attribute are specified on the same element, the xml:lang attribute takes precedence. When both lang and xml:lang are specified on the same element, they SHOULD have the same value.

    Excerpts from HTML Living Standard, The lang and xml:lang attributes

    The lang attribute in the XML namespace may be used on HTML elements in XML documents, as well as elements in other namespaces if the relevant specifications allow it (in particular, MathML and SVG allow lang attributes in the XML namespace to be specified on their elements). If both the lang attribute in no namespace and the lang attribute in the XML namespace are specified on the same element, they must have exactly the same value when compared in an ASCII case-insensitive manner.

    Authors must not use the lang attribute in the XML namespace on HTML elements in HTML documents. To ease migration to and from XML, authors may specify an attribute in no namespace with no prefix and with the literal localname "xml:lang" on HTML elements in HTML documents, but such attributes must only be specified if a lang attribute in no namespace is also specified, and both attributes must have the same value when compared in an ASCII case-insensitive manner.

  • Upgraded XMLmind Web Help Compiler (whc for short) to version 3.5.1.
  • Upgraded XMLResolver to version 5.2.3.
  • "FlatLaf Look and Feel" add-on: updated FlatLaf to version 3.3. (On Linux, the FlatLaf light theme is used as the default Look & Feel of desktop application w2x-app.)
  • XMLmind Word To XML is now officially supported
    • on Java™ 21 platforms,
    • on macOS Sonoma (version 14) running on Intel® or Apple® Silicon processors.
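
The xml:lang enhancement listed first above can be pictured with a short Python sketch. This is not the w2x implementation, only an illustration under stated assumptions: the function add_xml_lang and its enabled flag (standing in for parameter transform.add-xml-lang) are hypothetical names invented for this example.

```python
import xml.etree.ElementTree as ET

# Qualified name of the xml:lang attribute (the "xml" prefix is predefined).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def add_xml_lang(root, enabled=True):
    """Copy each lang attribute to xml:lang (hypothetical helper)."""
    if not enabled:  # plays the role of transform.add-xml-lang set to false
        return root
    for element in root.iter():
        lang = element.get("lang")
        # Do not overwrite an xml:lang attribute which is already present.
        if lang is not None and element.get(XML_LANG) is None:
            element.set(XML_LANG, lang)
    return root

doc = ET.fromstring('<div lang="en"><p lang="fr">Bonjour</p><p>Hello</p></div>')
add_xml_lang(doc)
result = ET.tostring(doc, encoding="unicode")
print(result)
```

Both attributes end up with the same value, which is what the specifications quoted above recommend. Elements without a lang attribute are left untouched.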

Bug fixes:

  • Parameter transform.docbook-version was not honored when generating a DocBook 5.1+ assembly: the assembly element had no version attribute and all the generated topic elements had a version attribute set to fixed value "5.1".
  • When generating a DocBook 5.1+ assembly, the assembly/structure element now contains a merge child element rather than an info child element. This change was made after reading the clarifications found in DocBook 5.2: The Definitive Guide.
  • In some cases, text boxes were not processed by w2x.

1.10 (June 20, 2023)

Enhancements:

  • Upgraded XMLmind Web Help Compiler (whc for short) to version 3.5.0, which supports the new corporate layout in addition to the classic and simple layouts.
  • Upgraded XMLResolver to version 5.2.0.
  • "FlatLaf Look and Feel" add-on: updated FlatLaf to version 3.1.1. (On Linux, the FlatLaf light theme is used as the default Look & Feel of desktop application w2x-app.)
  • XMLmind Word To XML, which passed all non-regression tests, is now officially supported on the Java™ 20 platform.

Bug fixes:

  • XMLmind Word To XML did not support pictures cropped using the MS-Word Crop tool. This bug is now fixed, but with some limitations: linked images (i.e. not embedded in the DOCX file) cannot be cropped by w2x; SVG images or images automatically converted to SVG (e.g. WMF images) cannot be cropped by w2x.

    “Outcropped pictures”, that is, pictures given a margin using the MS-Word Crop tool, are supported too.

  • Standard or custom headings having an outline level > 6 were not correctly processed. For example, when these headings were numbered, they were converted (in semantic XHTML) to a numbered list containing a single item.
  • Standard or custom headings having an outline level equal to 9, which means no outline level, were not correctly processed. For example, a TOCHeading containing "Table of Contents", followed by the table of contents automatically generated by MS-Word, followed by a Heading1 containing "Introduction" was converted (in semantic XHTML) to "<section><h1>Table of Contents</h1><p>Introduction</p>...". This quite common sequence of paragraphs is now converted to "<p>Table of Contents</p><section><h1>Introduction</h1>...".
  • A DOCX file ending with a bibliography and containing footnotes and/or endnotes caused w2x to generate invalid DocBook and DITA files.
  • The case where the paragraph containing a picture is given a Caption style, a not so uncommon MS-Word user mistake, was not gracefully handled.

1.9.1 (March 15, 2023)

Enhancements:

  • Replaced the Apache Commons Resolver (lib/resolver.jar) by the XMLResolver (lib/xmlresolver.jar; version 5.1.1).
  • Upgraded XMLmind Web Help Compiler (whc for short) to version 3.4.0.
  • "FlatLaf Look and Feel" add-on: updated FlatLaf to version 3.0. (On Linux, the FlatLaf light theme is used as the default Look & Feel of desktop application w2x-app.)
  • XMLmind Word To XML is now officially supported on macOS Ventura (version 13.x), Intel® or Apple® silicon processor.

Bug fixes:

  • The bin/w2x and bin/w2x-app shell scripts did not use the bundled private Java™ runtime on Macs having an Apple® silicon processor.

1.9 (October 3, 2022)

Enhancements:

  • Desktop application w2x-app:
    • Redesigned the dialog box letting the user add or modify an entry of the MS-Word style to XML element map in order to make this dialog box simpler and less error-prone to use. More information.

      Moreover, the mapping type of a paragraph style ("Paragraph", "Paragraph N to 1", "Paragraph N to pre" or "Paragraph 1 to pre") is now automatically suggested by this dialog box. However, because the heuristics used to implement the suggestions are rather conservative, the suggestion will generally be "Paragraph".

    • Now accepts a DOCX file as its last command-line argument. This implies that, for example on Windows, you can now use "Open With" to “open” a DOCX file using w2x-app.
  • Upgraded XMLmind Web Help Compiler (whc for short) to version 3.3.3.
  • "FlatLaf Look and Feel" add-on: updated FlatLaf to version 2.5. (On Linux, the FlatLaf light theme is used as the default Look & Feel of desktop application w2x-app.)
  • XMLmind Word To XML, which passed all non-regression tests, is now officially supported on the Java™ 18 and 19 platforms.

Bug fixes:

  • Desktop application w2x-app: when converting DOCX to a DITA map or DocBook assembly, by default, the directory containing automatically generated topics was set to a file path (-p transform2.topic-path %{~nO}_files) and not to a URI (-p transform2.topic-path %{~no}_files). This caused w2x-app to create invalid maps and assemblies when, for example, the name of the DOCX file to be converted contained whitespace.
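
To see why a file path and a URI differ as soon as the DOCX file name contains whitespace, here is a small Python sketch. It assumes that the URI form is simply the percent-encoded form of the file path; the two helper functions are hypothetical and are not part of w2x.

```python
from urllib.parse import quote

def topic_dir_as_file_path(docx_file_name: str) -> str:
    """Directory name as a plain file path: base name + "_files"."""
    base_name = docx_file_name.rsplit(".", 1)[0]
    return base_name + "_files"

def topic_dir_as_uri(docx_file_name: str) -> str:
    """Same directory name, percent-encoded so it is a valid relative URI."""
    return quote(topic_dir_as_file_path(docx_file_name))

path_form = topic_dir_as_file_path("My Document.docx")
uri_form = topic_dir_as_uri("My Document.docx")
print(path_form)  # contains a raw space, invalid inside an href
print(uri_form)   # the space is encoded as %20
```

A map or assembly references its topics by URI, so only the second form is safe to write into the generated file.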

1.8.6 (March 10, 2022)

Enhancements:

  • "FlatLaf Look and Feel" add-on: updated FlatLaf to version 1.6.5. (On Linux, the FlatLaf light theme is used as the default Look & Feel.)
  • XMLmind Word To XML, which passed all non-regression tests, is now officially supported
    • on macOS Monterey (version 12.x), including on Macs having an Apple® M1 (ARM®-based) processor;
    • on Windows 11.

Bug fixes:

  • Fixed a layout bug in the dialog box letting the user add or modify an entry of the MS-Word style to XML element map.

1.8.5 (October 4, 2021)

Enhancements:

  • Now supports SVG pictures embedded in the DOCX file or linked from it. Note that only recent versions of MS-Word (MS-Word 2016+?) are capable of dealing with SVG files.
  • Upgraded wmf2svg to version 0.9.11.
  • Upgraded XMLmind Web Help Compiler (whc for short) to version 3.3.0.
  • Some internal changes were needed to make XMLmind Word To XML compatible with XMLmind XML Editor v10+.
  • XMLmind Word To XML, which passed all non-regression tests, is now officially supported on Java™ 17 platforms.

    XMLmind Word To XML is not yet supported on Macs having an Apple M1 (ARM-based) processor despite the fact that a “native” OpenJDK and a “native” OpenJFX are now available for this platform. We plan to provide official support and a .dmg distribution for this platform before the end of this year.

Bug fixes:

  • Relative paths of external resources were automatically converted to absolute URIs by resolving them against the location of the DOCX file. For example, field
    INCLUDEPICTURE \d "..\\Pictures\\My Picture.png" \* MERGEFORMATINET

    contained in "C:\Users\John\Documents\My Document.docx", was processed as if it were

    INCLUDEPICTURE \d "C:\\Users\\John\\Pictures\\My Picture.png" \* MERGEFORMATINET

    Now such relative paths are simply converted to relative URIs (put simply, kept as is) by XMLmind Word To XML.
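
The difference between the old and the new behavior can be sketched in Python using the example above. This is only an illustration: the function names are hypothetical, and the percent-encoding of the space character in the relative URI is an assumption, not something stated in these release notes.

```python
import ntpath
from pathlib import PureWindowsPath
from urllib.parse import quote

def old_behavior(docx_path: str, relative_path: str) -> str:
    """Resolve the relative path against the directory of the DOCX file."""
    docx_dir = ntpath.dirname(docx_path)
    return ntpath.normpath(ntpath.join(docx_dir, relative_path))

def new_behavior(relative_path: str) -> str:
    """Keep the path relative; only turn it into a relative URI."""
    # Assumption: backslashes become slashes and spaces are percent-encoded.
    return quote(PureWindowsPath(relative_path).as_posix())

docx = r"C:\Users\John\Documents\My Document.docx"
picture = r"..\Pictures\My Picture.png"

old_result = old_behavior(docx, picture)
new_result = new_behavior(picture)
print(old_result)
print(new_result)
```

Keeping the path relative means the converted document can be moved together with its pictures without breaking the links.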


1.8.4 (May 20, 2021)

If you plan to run the "Word To XML" servlet on Apache Tomcat version 10, please first read this important note, as there is a major breaking change between the latest versions of Tomcat (>= 10) and older versions (<= 9).

Enhancements:

  • Desktop application w2x-app:
    • The location and size of the application is now persistent across conversion sessions.
    • Unless the useNativeFileChooser preference has been enabled (which is not the case by default), the size of the file chooser dialog box is now persistent across conversion sessions.
  • Upgraded XMLmind Web Help Compiler (whc for short) to version 3.2.0.
  • XMLmind Word To XML, which passed all non-regression tests, is now officially supported on Java™ 16 platforms.

Bug fixes:

  • In some cases, XMLmind Word To XML failed to extract some useful description text out of a VML (a legacy graphical objects format) picture.

1.8.3 (December 2, 2020)

Enhancements:

  • Upgraded XMLmind Web Help Compiler (whc for short) to version 3.1.1.
  • Upgraded wmf2svg to version 0.9.8.
  • XMLmind Word To XML is now officially supported on Java™ 15 platforms and on macOS Big Sur (version 11.0).

Bug fixes:

  • In some cases, XMLmind Word To XML failed to extract useful image data found in DOCX pictures.

Possible incompatibilities:


1.8.2 (August 4, 2020)

Enhancements:

  • New DocBook 5.1 assembly output format. For example, running
    w2x -o assembly mydoc.docx out/mydoc.xml 

    generates in the out/ directory DocBook 5.1 assembly file mydoc.xml and also all the topic files referenced by this assembly.

  • w2x-app:
    • On Linux, FlatLaf and its light theme (called "FlatLight") is now used as the default Look & Feel. This is needed because on Linux, the “system” Look & Feel (called "Metal") looks rather outdated.

      If, for any reason, you prefer to use the “system” Look & Feel, please start w2x-app by running

      w2x-app -putpref lookAndFeelClassName fallback

      This setting only needs to be done once. If, after doing that, you prefer to revert to FlatLaf, simply run

      w2x-app -putpref lookAndFeelClassName
    • The setup assistant (AKA the “wizard” style dialog box) now supports the DocBook 5.1 topic and DocBook 5.1 assembly output formats.
  • Upgraded XMLmind Web Help Compiler (whc for short) to version 3.1.
  • XMLmind Word To XML, which passed all non-regression tests, is now officially supported on Java™ 14 platforms.

1.8.1 (March 9, 2020)

Enhancements:

  • Upgraded XMLmind Web Help Compiler (whc for short) to version 3.0.1. The Web Help generated by whc v3 gets a fresh new look. Moreover:
    • It is now “responsive” by default, that is, it adapts its layout to the size of the screen (e.g. it can adapt to the screen of a smartphone in portrait mode). This feature is controlled by new parameter wh-responsive-ui.
    • It does not leverage jQuery UI anymore (only jQuery now). However some new parameters may be used to override most fonts and colors used in the generated Web Help.
    • New parameter wh-ui-language may be used to specify the language used by the messages of the generated Web Help (tab labels, button tool tips, etc). The default is to use the language of the Web browser.
  • All programs which are part of XMLmind Word To XML are now officially supported on macOS Catalina (version 10.15).
  • w2x-app: added a user preference called useNativeFileChooser which instructs the w2x-app desktop application to display the native file chooser in preference to the multi-platform file chooser. This option is turned off by default because file extension filters are not supported when the native file chooser is invoked by Java™.

Bug fixes:

  • Processing-instructions created by w2x were very often inserted at wrong locations. For example, a <?break-page?> which should be found between two list items was inserted after their list parent.
  • When generating single page or multi-page styled HTML, the background color of the body element was not specified. This is a problem with some Web browsers for which this color is by default a light gray.

    Now the page color found in the DOCX file is used to set the background color of the body element. When this page color is not specified, the background color of the body element is set to white.

Incompatibilities:

  • Web Help output format: the following parameters, all related to jQuery UI, are not supported anymore: webhelp.wh-jquery-css, webhelp.wh-jquery-custom-theme, webhelp.wh-jquery-theme, webhelp.wh-jquery-ui.

1.8 (September 30, 2019)

Enhancements:

  • XMLmind Word To XML is better at choosing user-specified bookmarks (expected to have long and descriptive names like "Edit_a_citation") over bookmarks automatically generated by MS-Word (e.g. "BM3").

    This may have important benefits when converting a DOCX file to multiple semantic XML files (e.g. a DITA map and its associated topics) because in such case, the names of the generated files are generally inferred from user-specified bookmarks.

    In order to implement this enhancement, we had to replace parameter edit.ids.automatic-ids by new parameter convert.automatic-ids. This has been done to move the detection of automatic bookmarks at an earlier stage of the conversion process.

  • Upgraded XMLmind Web Help Compiler (whc for short) to version 2.3.2.
  • XMLmind Word To XML, which passed all non-regression tests, is now officially supported on Java™ 13 platforms.

Bug fixes:

  • Some “complex” fields containing a reference to a bookmark (e.g. field REF (Foo) \h, where the bookmark name "(Foo)" contains characters U+FF08 and U+FF09) were converted to broken links. This bug may have had important and somewhat surprising consequences, for example, on generated DITA maps, so please upgrade to version 1.8.
  • The first index entry marking a given page range (e.g. field XE "XML" \r "OpenXMLPageRange") in the DOCX document was correctly converted to a DITA or DocBook index term range (e.g. <indexterm start="OpenXMLPageRange">XML</indexterm> and <indexterm end="OpenXMLPageRange"/>), but subsequent index entries marking the same page range (e.g. field XE "Extensible Markup Language" \r "OpenXMLPageRange") were converted to single point index terms (e.g. <indexterm>Extensible Markup Language</indexterm>).

    Now subsequent index entries marking the same page range are converted to index terms containing a redirection to the first index term referencing this page range (e.g. <indexterm>Extensible Markup Language<index-see>XML</index-see></indexterm>).

Incompatibilities:

  • XMLmind Word To XML now requires a Java 8+ runtime in order to compile and run.
  • Parameter edit.ids.automatic-ids is now ignored after reporting a warning. This parameter has been replaced by new parameter convert.automatic-ids. See above enhancement.
  • On Windows, EMF graphics are now converted to PNG using a resolution of at least 300DPI. Therefore the default rule used to perform this conversion is (resolution -300 means: use 300DPI if intrinsic resolution is less than 300DPI):
    .emf.png.wmf.png java:com.xmlmind.w2x_ext.emf2png.EMF2PNG resolution -300

    Previously this rule was (resolution 0 means: use intrinsic resolution)

    .emf.png.wmf.png java:com.xmlmind.w2x_ext.emf2png.EMF2PNG resolution 0
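
The meaning of the resolution values described above can be summarized by the following Python sketch. The function effective_dpi is a hypothetical helper, not part of w2x; the handling of a positive value (use exactly that resolution) is an assumption, as these notes only document values 0 and -300.

```python
def effective_dpi(resolution: int, intrinsic_dpi: int) -> int:
    """Return the resolution used to render an EMF/WMF graphic as PNG."""
    if resolution == 0:
        # 0 means: use the intrinsic resolution of the graphic.
        return intrinsic_dpi
    if resolution < 0:
        # -N means: use N DPI if the intrinsic resolution is less than N.
        return max(intrinsic_dpi, -resolution)
    # Assumption (not documented here): N > 0 forces exactly N DPI.
    return resolution

print(effective_dpi(-300, 96))   # low-resolution graphic is raised to 300
print(effective_dpi(-300, 600))  # high-resolution graphic is left alone
print(effective_dpi(0, 96))      # previous rule: intrinsic resolution
```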

1.7 (April 02, 2019)

Enhancements:

  • Command-line utility w2x and desktop application w2x-app now support plugins.

    A plugin is simply a text file having a ".w2x_plugin" suffix, containing a number of w2x command-line arguments and starting with comment lines containing information about the plugin (for example, its name). Example, rss.w2x_plugin:

    ### plugin.name: rss
    ### plugin.outputDescription: RSS 2.0
    ### plugin.outputExtension: xml
    ### plugin.multiFileOutput: no
    
    -c
    -e w2x:xed/main.xed
    -t rss.xslt
    
    # Image files not useful here.
    -step:com.xmlmind.w2x.processor.DeleteFilesStep:cleanUp
    -p cleanUp.files "%{~pO}/%{~nO}_files"

    This plugin converts DOCX to RSS. This process is partly implemented by XSLT 1.0 stylesheet rss.xslt which is part of this plugin. Stylesheet rss.xslt transforms its input, the semantic XHTML 1.0 Transitional file created by the Edit step (-e w2x:xed/main.xed), to RSS.

    Aside from XSLT 1.0 stylesheets, a plugin may also include XED scripts as well as ".jar" files containing custom conversion steps implemented in Java™.

    A plugin is registered with w2x and w2x-app by copying all its files anywhere inside directory w2x_install_dir/plugin/. However, it is strongly recommended to group all the files comprising a plugin in a subdirectory of its own having the same name as the plugin (e.g. w2x_install_dir/plugin/rss/). Alternatively, this plugin may be installed anywhere you want provided that the directory containing the ".w2x_plugin" file is referenced in the W2X_PLUGIN_PATH environment variable. Example: set W2X_PLUGIN_PATH=C:\Users\John\w2x\rss;C:\temp\w2x_plugins.

    Once registered with w2x, a plugin may be invoked as if it were a stock conversion, for example:

    w2x -o rss my.docx my.xml

    In w2x-app, you'll find the registered plugins in the "Convert to" combobox and in the "Output format" screen of the setup assistant.

  • When a DOCX file contains revision info (i.e. "Track Changes"), w2x implements its own automatic, very crude interpretation of "Accept All Changes". That is why a warning is now issued informing the user that it is better to use MS-Word to manually accept or reject the tracked changes before submitting the DOCX file to w2x.
  • Upgraded XMLmind Web Help Compiler (whc for short) to version 2.1.3_04.
  • XMLmind Word To XML, which passed all non-regression tests, is now officially supported on Java™ 12 platforms.
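
The layout of a ".w2x_plugin" file, as described in the plugin enhancement above, is simple enough to be read with a few lines of Python. The sketch below only illustrates the file format shown in rss.w2x_plugin; it says nothing about how w2x itself parses plugins, and the function parse_plugin is a hypothetical name.

```python
def parse_plugin(text: str):
    """Split a .w2x_plugin file into its metadata and its w2x arguments."""
    metadata, arguments = {}, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("### plugin."):
            # Header comment such as "### plugin.name: rss".
            key, _, value = line[len("### plugin."):].partition(":")
            metadata[key.strip()] = value.strip()
        elif line and not line.startswith("#"):
            # Anything else, except blank lines and comments, is an argument line.
            arguments.append(line)
    return metadata, arguments

RSS_PLUGIN = """\
### plugin.name: rss
### plugin.outputDescription: RSS 2.0
### plugin.outputExtension: xml
### plugin.multiFileOutput: no

-c
-e w2x:xed/main.xed
-t rss.xslt

# Image files not useful here.
-step:com.xmlmind.w2x.processor.DeleteFilesStep:cleanUp
-p cleanUp.files "%{~pO}/%{~nO}_files"
"""

metadata, arguments = parse_plugin(RSS_PLUGIN)
print(metadata)
print(arguments)
```

The metadata gives the name passed to option -o and the description of the output format, while the remaining lines are ordinary w2x command-line arguments.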

Bug fixes:

  • w2x-app: a user converting a DOCX file to a multi-file output format (e.g. Web Help), choosing the default output directory (which is the directory containing the input DOCX file) and choosing by mistake to make the output directory empty before proceeding with the conversion ended up deleting the input DOCX file.
  • The online help browser of w2x-app displayed a blank window when the computer running w2x-app was not connected to the Internet.

1.6 (December 21, 2018)

Enhancements:

  • Index entries marking a page range (e.g. field XE "XML" \r "OpenXMLPageRange") in the DOCX document are now supported when generating DITA and DocBook documents.
  • Upgraded XMLmind Web Help Compiler (whc for short) to version 2.1.3_02.
  • XMLmind Word To XML, which passed all non-regression tests, is now officially supported on Java™ 11 platforms.
  • All programs which are part of XMLmind Word To XML are now officially supported on macOS Mojave (version 10.14).

1.5.1 (August 31, 2018)

Enhancements:

  • A new —very low-level— parameter -p edit.ids.automatic-ids regex_pattern lets the user specify which bookmarks automatically generated by MS-Word ("_GoBack", "_Toc123", etc) are to be preserved by the conversion process, and this, even when such bookmarks are not referenced anywhere in the generated document.
  • Slightly improved the way DOCX metadata (e.g. author, publisher, etc) are translated to DITA.
  • A warning is now reported when processing a DOCX index entry marking a page range (e.g. field XE "XML" \r "OpenXMLPageRange"). This kind of index entry is currently not supported by XMLmind Word To XML. Only simple index entries marking a single location in the DOCX document are currently supported.
  • Upgraded XMLmind Web Help Compiler (whc for short) to version 2.1.3_01.

Bug fixes:

  • Setting the paragraph indentation of stock MS-Word style "List Paragraph" to a number of Character Units (ch) caused XMLmind Word To XML to generate incorrect lists.

    Note that we are not 100% sure that this bug is really fixed now. Unfortunately, the behavior of MS-Word when it comes to processing w:ind/@w:xxxChars attributes is not completely documented in the "ECMA-376, Office Open XML File Formats" specification.

  • Defining a custom style named "Normal +" caused XMLmind Word To XML to enter an endless loop.
  • XMLmind Word To XML always created DITA fn (footnote) elements having an id attribute. The bug is that these fn elements were never referenced by an <xref type="fn"> (footnote call). The consequence was that these footnotes were automatically discarded when converting to HTML, PDF, DOCX, etc, the DITA files created by w2x.

1.5 (April 25, 2018)

Enhancements:

  • When generating semantic XHTML, the following inline element names are no longer “hard-wired”: sub, sup, small, big, s, u, tt, b, i. Alternate element names may be specified using the following parameters: inlines.sub-element, inlines.sup-element, inlines.small-element, inlines.big-element, inlines.s-element, inlines.u-element, inlines.tt-element, inlines.b-element, inlines.i-element. Example 1: generate code rather than tt elements: -p edit.inlines.tt-element "code". Example 2: do not generate small elements: -p edit.inlines.small-element "span style='font-size:x-small'" (notice how one or more attributes may be specified too).

    This facility is useful only when generating semantic XHTML and all formats based on semantic XHTML. Using it when generating DITA or DocBook may give poor results.

  • When generating semantic XML of any kind, all the XHTML meta elements but author, description, dcterms.* are automatically suppressed from the semantic XHTML 1.0 Transitional document generated by the Edit step and used as an input by the Transform step.

    If you want to keep some or all of the meta elements in this intermediate semantic XHTML 1.0 Transitional document, you may now specify -p edit.metas.keep regexp_matching_meta_name. Examples: -p edit.metas.keep '.*' keeps all metas; -p edit.metas.keep '^dc.' keeps all metas having a name starting with "dc." (e.g. <meta name="dc.subject" content="..." />).

  • XMLmind Word To XML can now generate DITA indexterm elements having index-sort-as children and DocBook indexterm/primary, secondary, tertiary elements having sortas attributes. For this to happen, the input DOCX file must contain XE (index entry) fields having \y "yomi" (first phonetic character for sorting indexes) field arguments.

    Unlike MS-Word which considers \y "yomi" only for East Asian languages, w2x uses this XE field argument to sort the index entries whatever the language of the document. English examples: {XE "<span>" \y "span"}, {XE "Operation:+" \y ":Addition"}.

  • XMLmind Word To XML, which passed all non-regression tests, is now officially supported on Java™ 10 platforms.
  • Upgraded XMLmind Web Help Compiler (whc for short) to version 2.1.3.

Bug fixes:

  • The language of DOCX files written in an East Asian language is not correctly detected.

    Unfortunately, this will always be the case because w2x never examines the characters actually contained in a text span having <w:lang w:eastAsia="ja-JP" w:val="en-US"/> to determine whether this text span is written in ja-JP, in en-US or in a mix of both languages.

    However, a partial workaround for this limitation is to specify for example -p convert.set-lang ja-JP or -p convert.default-lang ja-JP. When parameter convert.set-lang or parameter convert.default-lang is set to a language code starting with ja, zh or ko, then it is attribute w:lang/@w:eastAsia which is used to determine the language of a text span and not attribute w:lang/@w:val.

    Note that -p convert.default-lang ja-JP is just used as a hint to favor attribute w:lang/@w:eastAsia over attribute w:lang/@w:val. Given the way MS-Word sets these two attributes, using parameter -p convert.default-lang ja-JP will not cause a vastly incorrect detection of the language when converting a German DOCX file, for example.

  • The Convert step sometimes generated XHTML elements having attribute lang="x-NONE". Value "x-NONE" is invalid.
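
The East Asian language workaround described above boils down to a choice between two w:lang attributes. The Python sketch below restates that rule under simplifying assumptions: span_language is a hypothetical function, the hint argument stands for the value of convert.set-lang or convert.default-lang, and no other effect of these parameters is modeled.

```python
EAST_ASIAN_PREFIXES = ("ja", "zh", "ko")

def span_language(w_val, w_east_asia, hint=None):
    """Pick the language of a text span from its w:lang attributes."""
    if hint and hint.lower().startswith(EAST_ASIAN_PREFIXES) and w_east_asia:
        # The hint favors w:lang/@w:eastAsia over w:lang/@w:val.
        return w_east_asia
    return w_val

# <w:lang w:eastAsia="ja-JP" w:val="en-US"/> without and with the hint.
print(span_language("en-US", "ja-JP"))
print(span_language("en-US", "ja-JP", hint="ja-JP"))
# A span having no w:eastAsia attribute is not affected by the hint.
print(span_language("de-DE", None, hint="ja-JP"))
```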

1.4.0_01 (February 24, 2018)

  • Minor internal changes needed to make XMLmind Word To XML code compatible with XMLmind XML Editor v8.

1.4 (December 18, 2017)

  • All semantic XHTML formats and all formats based on semantic XHTML (EPUB, Web Help, frameset): paragraphs having p-Title and p-Subtitle styles (to make it simple; see parameters edit.title.title-style-names and edit.title.subtitle-style-names) are now converted to equivalent semantic XHTML elements.

    In the previous versions of XMLmind Word To XML, such titles were converted only to head/title and to head/meta name="description" which made them invisible to the user (though usable by programs such as the XSLT stylesheets generating DITA or DocBook).

    This feature can be controlled by specifying the following new parameters:

    • edit.title.keep-title. Default value when generating semantic XHTML: "yes". Default value when generating DITA and DocBook: "no".
    • edit.title.title-container. Default value: "h1 class='role-document-title'". Ignored when edit.title.keep-title is "no".
    • edit.title.subtitle-container. Default value: "p class='role-document-subtitle'". Ignored when edit.title.keep-title is "no".
  • EPUB formats: added parameter epub.omit-toc-root (default value: "no"). Web Help formats: added parameter webhelp.omit-toc-root (default value: "no").

    By default, the Table of Contents (TOC) generated for an EPUB or Web Help document has a single “root”. This single root always points to the page containing the title, subtitle, author, etc, of the document. Setting this parameter to "yes" prevents the generated TOC from having such single root.

  • XMLmind Word To XML, which passed all non-regression tests, is now officially supported on Java™ 9 platforms.
  • Upgraded XMLmind Web Help Compiler (whc for short) to version 2.1. The new compiler uses window.sessionStorage rather than cookies to store the internal state of the Web Helps it generates.
  • Changed the technology used to implement the context-sensitive online help from obsolete JavaHelp to a dedicated, embedded Web browser displaying Web Help.

    If the Java™ runtime used to run w2x-app is older than version 1.8.0_40, the system Web browser rather than the dedicated, embedded Web browser is used to display the Web Help, which is much less convenient for the user.


1.3 (November 08, 2017)

Please do not use new Java 9 to run the programs which are part of XMLmind Word To XML. XMLmind Word To XML has not yet been tested against this version of Java.

Enhancements:

  • Upgraded XMLmind Web Help Compiler (whc for short) to version 2.0, which supports 2 layouts for the generated Web Help: classic, the default layout and simple, a new layout. When generating Web Help, pass w2x option -p webhelp.wh-layout simple to give it a try.
  • Setup assistant of w2x-app:
    • Added a "Layout of the generated Web Help" combobox to the "Output format options" screen when the chosen output format is Web Help. This combobox makes it easy to choose between the classic and simple layouts.
    • The dialog box letting the user add or modify an entry of the MS-Word style to XML element map now displays the localized name of a style (e.g. "Definition Char") next to the w2x name of this style (e.g. "c-DefinitionChar"). This is really needed when you give, for example, Japanese names to your custom MS-Word styles.
  • Parameter edit.remove-styles.preserved-classes now accepts class patterns as well as class names. For example, specify -p edit.remove-styles.preserved-classes "^(t|(tr)|(tc)|(tp)|p|(pn)|n|c)-.+$" if you want to preserve in the semantic XHTML the class names corresponding to all the CSS styles generated during the Convert step.
  • Hidden text runs (<w:vanish/>) are now converted to <span style="display:none">. When generating semantic XML, these invisible span elements are then discarded.
  • “Word To XML” servlet: added an optional params servlet parameter which lets you augment or override some of the options of the conversion specified by the conv servlet parameter. Example:
    curl -s -S -o manual.epub \
      -F "docx=@manual.docx;type=application/vnd.openxmlformats-officedocument.wordprocessingml.document" \
      -F "conv=epub" \
      -F "params=-p epub.identifier urn:x-mlmind:w2x:manual -p epub.split-before-level 8" \
      http://localhost:8080/w2x/convert
  • XMLmind Word To XML is now available as a macOS X native .dmg distribution including a private Java™ 1.8.0_152 runtime.
  • All programs which are part of XMLmind Word To XML are now officially supported on macOS High Sierra (version 10.13).
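
The class pattern given above for parameter edit.remove-styles.preserved-classes can be tried out with Python's re module. This is only a quick check of the pattern itself, assuming that it behaves the same way in Python as in the regular expression engine used by w2x. The sample class names are the ones appearing in these release notes.

```python
import re

# Pattern copied from the edit.remove-styles.preserved-classes example.
PRESERVED = re.compile(r"^(t|(tr)|(tc)|(tp)|p|(pn)|n|c)-.+$")

def is_preserved(class_name: str) -> bool:
    """True if the class name matches the pattern."""
    return PRESERVED.match(class_name) is not None

for name in ("c-DefinitionChar", "p-Title", "role-bridgeheadI"):
    print(name, is_preserved(name))
```

Only class names starting with one of the listed prefixes followed by a hyphen match the pattern, so a class such as role-bridgeheadI is not preserved by it.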

Bug fixes:

  • When a table was inserted inside a sequence of paragraphs having the same border, the conversion to styled XHTML (and to all output formats based on styled XHTML, like EPUB) failed with the following error message: error in action "group": missing attribute "g:container" for element .../html:p[NN].
  • When generating semantic XHTML, for some rare cases, class name role-bridgeheadI was added to li elements.
  • Field codes like "XE" (index entry) were not normalized to upper-case. For example, this bug could cause some index entries to be missing in the generated semantic XML.
  • It was not possible to use built-in image converter factory com.xmlmind.w2x_ext.emf2png.EMF2PNG to convert WMF to PNG despite the fact that this factory supports the WMF format in addition to the EMF format.
  • Marking as deleted all the text contained in a DOCX table caused w2x to generate an invalid XHTML table having no cells at all.
  • w2x generated invalid DITA when a table or figure caption contained index terms.

1.2.3 (June 20, 2017)

Enhancements:

  • Converting a DOCX file to Web Help now automatically creates an index.html file (if an index.html file does not already exist). This feature is controlled using parameter webhelp.add-index. The default value of this parameter is yes.
  • Added option -liststeps to the w2x command-line utility. When this option is specified, w2x lists all the conversion steps to be executed and then exits. This option is useful to determine how to customize the conversion steps. Example:
    $ w2x -o bookmap -liststeps
    -step:com.xmlmind.w2x.processor.ConvertStep:convert
    -p convert.create-mathml-object no
    -p convert.set-column-number yes
    -step:com.xmlmind.w2x.processor.EditStep:edit
    -p edit.xed-url-or-file file:/opt/w2x/xed/main.xed
    -step:com.xmlmind.w2x.processor.TransformStep:transform
    -p transform.out-file %{~pnO}.dita
    -p transform.single-topic no
    -p transform.xslt-url-or-file file:/opt/w2x/xslt/topic.xslt
    -step:com.xmlmind.w2x.processor.TransformStep:transform2
    -p transform2.xslt-url-or-file file:/opt/w2x/xslt/bookmap.xslt
    -p transform2.topic-type %{transform.topic-type}
    -p transform2.output-path %{~po}
    -step:com.xmlmind.w2x.processor.DeleteFilesStep:cleanUp
    -p cleanUp.files %{~pnO}.dita
  • Added parameter convert.resource-prefix which is useful when used in conjunction with convert.resource-directory and when several files generated by w2x share the same resource directory.
  • Upgraded XMLmind Web Help Compiler (whc for short) to version 1.4.4, which contains an important bug fix.
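
The listing printed by -liststeps is regular enough to be post-processed by a script. Here is a minimal Python sketch which groups the -p lines under the -step line preceding them. The function name parse_liststeps is hypothetical, and the line format is inferred only from the example shown above.

```python
def parse_liststeps(text):
    """Group "-p name value" lines under the preceding "-step:class:name" line.

    Assumption: the output always looks like the example in this change
    history; this is not an official description of the format.
    """
    steps = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("-step:"):
            # "-step:<Java class>:<step name>"
            _, class_name, step_name = line.split(":", 2)
            steps.append({"name": step_name, "class": class_name, "params": {}})
        elif line.startswith("-p ") and steps:
            parts = line.split(None, 2)
            if len(parts) < 2:
                continue  # "-p" without a parameter name: ignore it
            value = parts[2] if len(parts) > 2 else ""
            steps[-1]["params"][parts[1]] = value
    return steps
```

Feeding the function the text printed by w2x -o bookmap -liststeps would give one dictionary per step (convert, edit, transform, transform2, cleanUp), each with its own parameters.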

Bug fixes:

  • Specifying parameter convert.resource-directory as "." in order to create all image files in the same directory as the other automatically generated files did not work.
  • The resource directory was always emptied if it already existed. This behavior was dangerous because it could delete important user files (harmful example: w2x doc.docx /home/john/doc.html). Now the resource directory is emptied if and only if it is the “automatic” output_file_basename_files/ folder, which is both safe and convenient.
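
The rule about when an existing resource directory gets emptied can be expressed as a small predicate. This Python sketch is an illustration, not the code used by w2x. It assumes that "output_file_basename" means the output file name without its extension, and the function name is hypothetical.

```python
from pathlib import Path

def is_automatic_resource_dir(output_file, resource_dir):
    """True if resource_dir is the "automatic" <basename>_files folder
    located next to output_file.

    Assumption: the basename is the output file name minus its extension.
    """
    out = Path(output_file)
    automatic = out.parent / (out.stem + "_files")
    return Path(resource_dir) == automatic

# Only the automatic folder qualifies for being emptied.
print(is_automatic_resource_dir("/home/john/doc.html", "/home/john/doc_files"))
print(is_automatic_resource_dir("/home/john/doc.html", "/home/john"))
```

With such a check, a user-chosen directory like /home/john is never a candidate for being emptied.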

Other changes:

  • Changed "Licensor" from "Pixware SARL" to "XMLmind Software" in all licenses.

1.2.2 (April 14, 2017)

Enhancements:

  • Added parameter edit.ids.generate-section-ids. Setting this parameter to yes (default value is no) ensures that all the sections found in the semantic XHTML resulting from the conversion of a DOCX file have a unique ID.

    When this ID is missing, it is computed using the content of the h1, h2, ..., h6 heading which is the first child of the section. Example:

    <div class="role-section2" id="Title_of_this_section">
      <h2>Title of this section</h2>
      ...

    The maximum length of the automatically computed ID may be specified using parameter edit.ids.section-id-max-length. The default value of this parameter is 32.

    Setting edit.ids.generate-section-ids to yes is especially useful when converting a DOCX file to a DITA map or bookmap. With this parameter, the filenames of the topics referenced by the generated map are guaranteed to have meaningful values (e.g. "Introduction.dita" rather than "d0e35.dita").

  • Added XSLT parameter shortdesc-class-name to W2X_install_dir/xslt/topic.xslt, the XSLT stylesheet which is used to convert intermediate semantic XHTML document to a DITA topic.

    This parameter is used to specify the class name of the XHTML <p> which acts as a short description of the section. Examples: -p transform.shortdesc-class-name p-Shortdesc, -p transform.shortdesc-class-name p-Abstract.

    When this parameter is not specified (or is specified as the empty string which is its default value), the following style mapping, created by the w2x-app wizard:

    -p edit.blocks.convert "p-Shortdesc p class='p-Shortdesc'"
    ...
    <xsl:template match="h:p[@class='p-Shortdesc']">
      <shortdesc>
        <xsl:call-template name="processCommonAttributes"/>
        <xsl:apply-templates/>
      </shortdesc>
    </xsl:template>

    causes DITA <shortdesc> elements to be generated inside topic bodies, which is invalid.

    After specifying -p transform.shortdesc-class-name p-Shortdesc, this issue is fixed and DITA <shortdesc> elements are generated before topic bodies.

  • Added an "Other parameters" screen to the w2x-app wizard. This new screen lets the user specify parameters which are not supported by the "Output format options" and "MS-Word style to XML element map" screens. For example, when generating a DITA document, the other screens do not let the user specify -p transform.pre-element-name codeblock (default value being pre).
  • Upgraded XMLmind Web Help Compiler (whc for short) to version 1.4.2_03.
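
To give an idea of how a section ID can be derived from a heading (see edit.ids.generate-section-ids and edit.ids.section-id-max-length above), here is a minimal Python sketch. The exact rules used by w2x are not given in this change history. Replacing each run of non-word characters by an underscore is an assumption chosen to reproduce the "Title_of_this_section" example, and ensuring uniqueness among sections is not covered.

```python
import re

def section_id(heading_text, max_length=32):
    """Compute a candidate ID from the text of a heading.

    Assumptions (not documented behavior): runs of non-word characters
    become a single "_", then the result is cut to max_length characters.
    """
    candidate = re.sub(r"\W+", "_", heading_text.strip()).strip("_")
    return candidate[:max_length]

print(section_id("Title of this section"))  # prints Title_of_this_section
```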

Bug fixes:

  • For some DOCX paragraphs, significant whitespace was removed by XMLmind Word To XML. This gave incorrect results when these DOCX paragraphs were converted to DocBook programlisting, DITA pre, XHTML pre, etc.
  • In the source DOCX file, fields having an empty code (that is, somewhat abnormal fields) caused XMLmind Word To XML to raise a StringIndexOutOfBoundsException.
  • When generating semantic XHTML of any kind with parameter edit.convert-tabs.to-table set to no (the default value), attribute class="role-tabs-XXX" and elements <span class="role-tab"> were not discarded.

    Not only was this markup useless, but it also prevented some style mappings created by the w2x-app wizard from working. For example, the following style mapping of MS-Word paragraph style Note to a DITA <note> element:

    -p edit.blocks.convert "p-Note p class='p-Note'"
    ...
    <xsl:template match="h:p[@class='p-Note']">
      <note>
        <xsl:call-template name="processCommonAttributes"/>
        <xsl:apply-templates/>
      </note>
    </xsl:template>

    failed for the following paragraph (intermediate semantic XHTML preceding the transformation to DITA):

    <p class="role-tabs-35.45-0-117 p-Note">Note:
    <span class="role-tab"> </span>Body of the note here.</p>
  • In rare cases, foot/end notes were numbered starting from 2 and not starting from 1 as expected.
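
The p-Note mapping described above failed because the XPath predicate @class='p-Note' compares the whole attribute value, so it cannot match "role-tabs-35.45-0-117 p-Note". The Python sketch below illustrates the effect of discarding the tab-related class names. It is not the XED script used by w2x, and the function name is hypothetical.

```python
def strip_tab_classes(class_attr):
    """Remove "role-tabs-XXX" class names from a class attribute value."""
    kept = [name for name in class_attr.split()
            if not name.startswith("role-tabs-")]
    return " ".join(kept)

before = "role-tabs-35.45-0-117 p-Note"
after = strip_tab_classes(before)
print(before == "p-Note")  # prints False: the exact comparison fails
print(after == "p-Note")   # prints True: the mapping can now match
```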

Incompatibilities:

  • w2x_all.jar, the self-contained JAR file, is no longer used by the following scripts: bin/w2x, w2x.bat, w2x-app, w2x-app-c.bat. Using this JAR prevented advanced users from easily modifying the scripts found in subdirectories xed/ and xslt/. This self-contained JAR file is still available but its use should be reserved for embedding w2x in a third-party application.

1.2.1 (November 24, 2016)

Enhancements:

  • Conversion of images found in the DOCX file (TIFF, WMF, EMF, etc) to standard formats (SVG, PNG, JPEG) may now be controlled using environment variable (or Java™ property) W2X_IMAGE_CONVERSIONS. The default value of this variable is (all specifications on a single line):
    .wmf.svg java:com.xmlmind.w2x_ext.wmf_converter.WMFConverterFactory;
    .tiff.png java:com.xmlmind.w2x.docx.image.ImageConverterFactoryImpl

    On Windows, the default value of W2X_IMAGE_CONVERSIONS is (all specifications on a single line):

    .wmf.svg java:com.xmlmind.w2x_ext.wmf_converter.WMFConverterFactory;
    .emf.png java:com.xmlmind.w2x_ext.emf2png.EMF2PNG resolution 0;
    .tiff.png java:com.xmlmind.w2x.docx.image.ImageConverterFactoryImpl
  • Added two new image converters:
    External image converter

    This image converter executes an external program to perform the conversion.

    Examples of W2X_IMAGE_CONVERSIONS specifications (see above): convert EMF to SVG using OpenOffice/LibreOffice:

    .emf.svg soffice --headless --convert-to svg --outdir %~po %i

    Convert EMF/WMF to PNG using ImageMagick:

    .emf.png.wmf.png magick convert -density 288 "%I" -scale 25% "%O"

    com.xmlmind.w2x_ext.emf2png.EMF2PNG

    This image converter is available only on Windows. It leverages Windows' own GDI+ to convert EMF (in fact, Windows metafiles of any kind, including WMF) to PNG.

    This is not that great because, unlike com.xmlmind.w2x_ext.wmf_converter.WMFConverterFactory which converts WMF (Windows vector graphics format) to SVG (standard vector graphics format), EMF2PNG converts a vector graphics format to a raster image format. However, having EMF2PNG is better than nothing at all.

  • Upgraded XMLmind Web Help Compiler (whc for short) to version 1.4.2, which leverages jQuery v3.1.1 and jQuery UI v1.12.1. This implies that the Web Help generated by w2x no longer supports Internet Explorer 8 and older versions.
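
The W2X_IMAGE_CONVERSIONS examples above suggest a simple layout: specifications separated by ";", each one starting with one or more ".input.output" extension pairs followed by the converter (a java: factory or an external command line). The Python sketch below splits a value along these lines. The layout is inferred from the examples only, the function name is hypothetical, and a command containing ";" would not be handled.

```python
def parse_image_conversions(value):
    """Split a W2X_IMAGE_CONVERSIONS-like value into
    (list of (input, output) extension pairs, converter) tuples.

    Assumption: the syntax is exactly the one seen in the examples above.
    """
    conversions = []
    for spec in value.split(";"):
        parts = spec.split(None, 1)  # extensions, then the rest of the spec
        if not parts:
            continue  # empty specification, e.g. after a trailing ";"
        names = parts[0].lstrip(".").split(".")
        pairs = list(zip(names[0::2], names[1::2]))
        converter = parts[1].strip() if len(parts) > 1 else ""
        conversions.append((pairs, converter))
    return conversions

value = ('.wmf.svg java:com.xmlmind.w2x_ext.wmf_converter.WMFConverterFactory; '
         '.emf.png.wmf.png magick convert -density 288 "%I" -scale 25% "%O"')
for pairs, converter in parse_image_conversions(value):
    print(pairs, "->", converter)
```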

Bug fixes:

  • Images which were used to statically render objects embedded in the DOCX file (e.g. a PowerPoint slide) were ignored.

1.2 (August 01, 2016)

Enhancements:

  • Desktop application w2x-app now has a setup assistant (AKA “wizard” style dialog box) which makes it quick and easy to create w2x option files. This new setup assistant has a screen which may be used to map MS-Word character and paragraph styles (e.g. p-CodeSample) to XML elements possibly having attributes (e.g. DITA pre outputclass="code-sample").
  • New “semantic” output formats:
    • Multi-page semantic XHTML 1.0 Strict (-o frameset_strict), XHTML 1.0 Transitional (-o frameset_loose), XHTML 1.1 (-o frameset1_1), XHTML 5 (-o frameset5).
    • Web Help containing semantic XHTML 1.0 Strict (-o webhelp_strict), XHTML 1.0 Transitional (-o webhelp_loose), XHTML 1.1 (-o webhelp1_1), XHTML 5 (-o webhelp5).
    • EPUB 2 containing semantic XHTML 1.1 (-o epub1_1).
  • MS-Word math (that is, OpenXML math) is now automatically converted to MathML. However not all output formats may embed MathML. By default, MathML elements are added only to documents having the following formats: XHTML 5, EPUB (through the use of <ops:switch>), DITA and DocBook 5. When targeting any other format, XMLmind Word To XML generates external files containing MathML then adds elements pointing to these external ".mml" files. XHTML 1 example: <object data="doc_files/math-010.mml" type="application/mathml+xml"/>.

    The parameters related to MathML support are: convert.create-mathml-object, edit.finish-styles.mathjax (MathJax support).

  • Added a useful variant of parameter edit.blocks.convert called edit.blocks.convert-to-pre. This new parameter is best explained by comparing it to edit.blocks.convert.

    When using MS-Word, there are two ways to represent code samples:

    1. Use a sequence of paragraphs having the same style. Each paragraph contains one line of the code sample. Let's call the style of these paragraphs Code1.
    2. Use a single paragraph containing the whole code sample, which means that this single paragraph contains significant whitespace and line breaks. Let's call the style of this paragraph Code2.

    A sequence of Code1 paragraphs may be converted to an XHTML pre using:

    -p edit.blocks.convert "p-Code1 span g:id='pre' g:container='pre'"

    A Code2 paragraph may be converted to an XHTML pre using:

    -p edit.blocks.convert-to-pre "p-Code2 pre"
  • New parameter transform.pre-element-name may be used to specify the DocBook or DITA element to which an HTML pre element is converted. The default value of transform.pre-element-name is pre when generating DITA and literallayout when generating DocBook.
  • When converting a DOCX file to semantic XHTML, new parameter remove-styles.preserved-classes may be used to preserve some of the classes (e.g. c-Code, p-Note, etc) used to style the elements found in the intermediate, automatically generated, styled XHTML document.

    Moreover specifying both parameters prune.preserve and remove-styles.preserved-classes is currently the only way to keep in the generated semantic XHTML empty paragraphs having a given MS-Word style. For example, specifying -p prune.preserve p-PlaceHolder and -p remove-styles.preserved-classes p-PlaceHolder may be used to keep in the semantic XHTML output all empty paragraphs having the p-PlaceHolder style.

  • The conversion to DITA may now generate some DITA 1.3 elements and attributes, for example: equation-block, equation-inline, mathml, line-through, entry/@rotate.
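
The difference between the Code1 and Code2 cases described above is that consecutive Code1 paragraphs must first be grouped into a single pre, one line per paragraph. The Python sketch below illustrates this grouping on a simplified document model, a list of (style, text) pairs, which is purely hypothetical. It is not how edit.blocks.convert is implemented.

```python
def merge_code_paragraphs(paragraphs, code_style="p-Code1"):
    """Turn each run of consecutive code_style paragraphs into one "pre".

    paragraphs: list of (style, text) pairs (hypothetical document model).
    Returns a list of (element name, text) pairs.
    """
    result = []
    previous_was_code = False
    for style, text in paragraphs:
        if style == code_style:
            if previous_was_code:
                # Same run: append this paragraph as a new line of the pre.
                element, content = result[-1]
                result[-1] = (element, content + "\n" + text)
            else:
                result.append(("pre", text))
            previous_was_code = True
        else:
            result.append(("p", text))
            previous_was_code = False
    return result

doc = [("p-Normal", "Example:"),
       ("p-Code1", "int i = 0;"),
       ("p-Code1", "i++;"),
       ("p-Normal", "Done.")]
print(merge_code_paragraphs(doc))
```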

Bug fixes:

  • DOCX to styled HTML: fixed a couple of bugs related to numbering.
  • In some cases, option transform.generate-xref-text=yes (the default value) generated "???" (e.g. "See example ???.") rather than useful hyperlink text like "above" or "below" (e.g. "See example below.").
  • Specifying parameters split.use-id-as-filename=true and webhelp.use-id-as-filename=true caused w2x to generate files having incorrect names when the input DOCX had duplicate bookmarks or when it had bookmarks containing the '.' character.
  • In some cases, changing the style of the footnote number automatically created by MS-Word caused w2x to raise a NullPointerException.

1.1 (March 15, 2016)

It's now possible to convert a DOCX document to the following styled HTML formats (that is, XHTML+CSS): multi-page styled HTML, Web Help and EPUB.

Files generated this way look like the source DOCX document. Previously the only way to generate Web Help or EPUB was to first convert the source DOCX document to DITA or DocBook (semantic XML) and then to convert the intermediate DITA or DocBook files to Web Help or EPUB using external tools such as DITA Open Toolkit, XMLmind DITA Converter, DocBook XSL stylesheets. However, in such a case, the generated Web Help or EPUB does not look like the source DOCX document.

Note that a frameset is automatically generated along with the multi-page styled HTML pages. While an obsolete HTML feature, a frameset makes it easy to browse these HTML pages. Moreover the table of contents used as the left frame is a convenient way to programmatically list all the generated HTML pages. Example: excerpts from w2x_install_dir/doc/manual/manual-TOC.html:

...
<body>
<p class="toc-entry-0"><a href="manual-0.html" target="contentFrame">XMLmind Word To XML Manual</a></p>
<p class="toc-entry-1"><a href="manual-1.html" target="contentFrame">Contents</a></p>
<p class="toc-entry-1"><a href="intro.html" target="contentFrame">1 Introduction</a></p>
<p class="toc-entry-1"><a href="install.html" target="contentFrame">2 Installing w2x</a></p>
<p class="toc-entry-2"><a href="distribution.html" target="contentFrame">2.1 Contents of
the installation directory</a></p>
...
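
As an illustration of listing the generated pages programmatically, the following Python sketch collects the href of every link targeting contentFrame in a table of contents such as the excerpt above. It relies only on the markup visible in that excerpt. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

class TocLinks(HTMLParser):
    """Collect the href of each <a target="contentFrame"> element."""

    def __init__(self):
        super().__init__()
        self.pages = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attributes = dict(attrs)
            href = attributes.get("href")
            if attributes.get("target") == "contentFrame" and href:
                self.pages.append(href)

def list_pages(toc_html):
    """Return the page file names referenced by a TOC, in document order."""
    parser = TocLinks()
    parser.feed(toc_html)
    parser.close()
    return parser.pages

toc = ('<p class="toc-entry-1"><a href="intro.html" '
       'target="contentFrame">1 Introduction</a></p>')
print(list_pages(toc))  # prints ['intro.html']
```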

How does this work?

In order to generate these 3 new formats, we need to automatically split the source DOCX document into parts. A new part is created each time a paragraph having an outline level less than or equal to the specified split-before-level parameter is found in the source. An outline level is an integer between 0 (e.g. style Heading 1) and 8 (e.g. style Heading 9). The default value of parameter split-before-level is 0, which means: for each Heading 1, create a new page starting with this Heading 1.

Example: for each Heading 1 and Heading 2, create a new page (out/manual-1.html, out/manual-2.html, ..., out/manual-N.html) starting with this Heading 1 or Heading 2:

w2x -p split.split-before-level 1 -o frameset manual.docx out/manual.html
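
The splitting rule described above can be sketched in a few lines of Python. The document is modeled as a list of (outline level, text) pairs, where the level is None for paragraphs without an outline level. This model is hypothetical, as is the choice of putting any content preceding the first heading into a part of its own.

```python
def split_into_parts(paragraphs, split_before_level=0):
    """Start a new part before each paragraph whose outline level is
    less than or equal to split_before_level.

    paragraphs: list of (outline level or None, text) pairs.
    """
    parts = []
    for level, text in paragraphs:
        starts_part = level is not None and level <= split_before_level
        if starts_part or not parts:
            parts.append([])
        parts[-1].append(text)
    return parts

doc = [(None, "Title"), (0, "1 Introduction"), (None, "Some text."),
       (1, "1.1 Scope"), (0, "2 Installing")]
print(len(split_into_parts(doc, 0)))  # prints 3
print(len(split_into_parts(doc, 1)))  # prints 4
```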

Important tip

Generating any of these 3 new formats should work great if, for the DOCX document to be converted, you can use MS-Word's "References > Table of Contents" button to automatically create a table of contents. Note that the source DOCX document is not required to have a table of contents, but MS-Word should be able to automatically create a good one. In other words, automatically creating a table of contents using MS-Word is the best way to check that your outline levels are OK.

Other enhancements:

  • When a DOCX document is converted to styled HTML of any kind (as opposed to semantic XML), the generated processing instructions are now automatically removed and all the footnotes and endnotes are now automatically given a number. If you don't want this to happen, pass parameters -p edit.do.remove-pis "" and -p edit.do.number-footnotes "" to w2x.
  • New parameter -p edit.finish-styles.custom-styles-url-or-file CSS_URL_OR_FILE makes it easy to customize the CSS styles used by the generated styled HTML pages. The custom CSS styles found in file CSS_URL_OR_FILE are simply appended to the automatically generated CSS styles.
  • New parameter -p convert.lower-case-resource-names yes (default value: no) is needed to keep epubcheck quiet on platforms where filenames are case-sensitive (e.g. Linux). Not for general use.

Bug fixes:

  • w2x-app: added a workaround for an Apple Java bug which caused any scrolled window to become garbled when scrolling quickly. This bug seems to be specific to Apple Java and to non-Retina Macs running El Capitan.

1.0.0_01 (December 4, 2015)

Bug fix: a span class=role-tabs having a negative X coordinate caused expand-tabs.js to loop forever.


1.0.0 (November 17, 2015)

First version of the commercial product.

Enhancements:

  • Text runs aligned on tab stops are now processed as follows:
    • When generating XHTML+CSS, some JavaScript™ code is added to the output file. This code computes and gives a width to all <span class="role-tab">. This makes it possible to emulate tab stops reasonably well in any modern Web browser.

      If you don't want this code to be added to the output file, pass option -p edit.do.expand-tabs "" to w2x.

    • When generating semantic XHTML and all the other semantic XML formats (DocBook, DITA, etc), it's now possible to convert consecutive paragraphs containing text runs aligned on tab stops to a borderless table.

      However because, in the general case, it's not possible to emulate tab stops using tables, this XED script is disabled by default. If you really want to emulate tab stops using tables, pass option -p edit.convert-tabs.to-table yes to w2x.

    Note that the alignment of a tab stop (right, center, etc) is ignored. That is, the text run is always considered to be left aligned.

  • DOCX files using the "Strict Open XML Document" format are now supported. DOCX files using this format conform to the Strict profile of the Open XML standard (ISO/IEC 29500). This profile of Open XML doesn't allow a set of features that are designed specifically for backward-compatibility with existing binary documents, as specified in Part 4 of ISO/IEC 29500.
  • Tested XMLmind Word To XML against the DOCX files created using MS-Word 2016.
  • Desktop application w2x-app now works fine on computers having very high resolution (HiDPI) screens. For example, it now works fine on a Mac having a Retina® screen and a Windows computer having a UHD (“4K”) screen. On Windows, all DPI scale factors (100%, 125%, 150%, 200%, etc.) are supported.

    On a Linux computer having a HiDPI screen, HiDPI is not automatically detected. You'll have to specify the display scaling factor you prefer using the -putpref command-line option. Example: w2x-app -putpref displayScaling 200.


1.0.0-beta04 (September 8, 2015)

Enhancements:

  • The “Word To XML” servlet now provides the user with minimal work-in-progress feedback during the execution of a lengthy conversion.

Bug fixes:

  • Added more DOCX files coming from different origins to the test suite of XMLmind Word To XML. Had to slightly modify the software to cope with some specificities of these DOCX files.
  • XMLmind Word To XML add-on for XMLmind XML Editor: a user preferring to use the native file chooser on Windows or on the Mac forced the add-on to also use the native file chooser. Using the native file chooser in the context of the add-on is not convenient as this prevents the file filters specified by the add-on (DOCX, TXT, XML, DITA, etc) from working.

1.0.0-beta03 (July 13, 2015)

New “Word To XML” servlet is a Java™ Servlet (server-side standard component) which has the same functions as the w2x-app desktop application.

The “Word To XML” servlet comes in a software distribution of its own: w2x_servet-1_0_0_beta03.zip. This distribution contains a ready-to-deploy binary w2x.war, as well as the full Java™ source code of the servlet.

More information.


1.0.0-beta02 (May 6, 2015)

  • New graphical application w2x-app should be easier to use than the w2x command-line utility.
  • New application w2x-app is also available as an add-on for XMLmind XML Editor. This add-on adds an "Import DOCX" item to the File menu. The "Import DOCX" menu item displays a non modal dialog box almost identical to w2x-app. XML output files created using the "Import DOCX" dialog box are automatically opened in XMLmind XML Editor.

    This add-on is compatible with XMLmind XML Editor v6.3+. In order to install it, please follow the instructions found in XMLmind Word To XML Manual, Installing the "Word To XML" add-on.

  • Added parameter edit.headings.convert which makes it easy to convert paragraphs not having an outline level property to h1, h2, ..., h6 headings.

1.0.0-beta01 (March 30, 2015)

First public release.


© 2017-2024 XMLmind Software. Updated on 2024/9/16.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Acrobat and PostScript are trademarks of Adobe Systems Incorporated.