XMLmind Word To XML
Enhancements:
w2x-app.)
Bug fixes:
Enhancements:
xml:lang attribute is automatically added to elements having a lang attribute. Note that this does not happen when generating “semantic” XHTML 5.0 output. However, this feature may be controlled using the new boolean transform.add-xml-lang parameter.
Excerpts from XHTML™ 1.0 The Extensible HyperText Markup Language (Second Edition), C.7. The lang and xml:lang Attributes:
Use both the lang and xml:lang attributes when specifying the language of an element. The value of the xml:lang attribute takes precedence.
Excerpts from XHTML™ 1.1 - Module-based XHTML - Second Edition, 3. The XHTML 1.1 Document Type:
This specification also adds the lang attribute to the I18N attribute collection as defined in [XHTMLMOD]. The lang attribute is defined in [HTML4]. When this attribute and the xml:lang attribute are specified on the same element, the xml:lang attribute takes precedence. When both lang and xml:lang are specified on the same element, they SHOULD have the same value.
Excerpts from HTML Living Standard, The lang and xml:lang attributes:
The lang attribute in the XML namespace may be used on HTML elements in XML documents, as well as elements in other namespaces if the relevant specifications allow it (in particular, MathML and SVG allow lang attributes in the XML namespace to be specified on their elements). If both the lang attribute in no namespace and the lang attribute in the XML namespace are specified on the same element, they must have exactly the same value when compared in an ASCII case-insensitive manner.
Authors must not use the lang attribute in the XML namespace on HTML elements in HTML documents. To ease migration to and from XML, authors may specify an attribute in no namespace with no prefix and with the literal localname "xml:lang" on HTML elements in HTML documents, but such attributes must only be specified if a lang attribute in no namespace is also specified, and both attributes must have the same value when compared in an ASCII case-insensitive manner.
w2x-app.)
Bug fixes:
transform.docbook-version was not honored when generating a DocBook 5.1+ assembly: the assembly element had no version attribute and all the generated topic elements had a version attribute set to the fixed value "5.1".
assembly, the assembly/structure element now contains a merge child element rather than an info child element. This change was made after reading the clarifications found in DocBook 5.2: The Definitive Guide.
w2x.
Enhancements:
corporate layout in addition to the classic and simple layouts.
w2x-app.)
Bug fixes:
w2x; SVG images or images automatically converted to SVG (e.g. WMF images) cannot be cropped by w2x. “Outcropped pictures”, that is, adding a margin around a picture using the MS-Word Crop tool, are supported too.
TOCHeading containing "Table of Contents", followed by the table of contents automatically generated by MS-Word, followed by a Heading1 containing "Introduction" were converted to (semantic XHTML) "<section><h1>Table of Contents</h1><p>Introduction</p>...". This quite common sequence of paragraphs is now converted to "<p>Table of Contents</p><section><h1>Introduction</h1>...".
w2x to generate invalid DocBook and DITA files.
Enhancements:
lib/resolver.jar) by the XMLResolver (lib/xmlresolver.jar; version 5.1.1).
w2x-app.)
Bug fixes:
bin/w2x and bin/w2x-app shell scripts did not use the bundled private Java™ runtime on Macs having an Apple® silicon processor.
Enhancements:
w2x-app: Moreover, the mapping type of a paragraph style ("Paragraph", "Paragraph N to 1", "Paragraph N to pre" or "Paragraph 1 to pre") is now automatically suggested by this dialog box. However, because the heuristics used to implement the suggestions are rather conservative, the suggestion will generally be "Paragraph".
w2x-app.
w2x-app.)
Bug fixes:
w2x-app: when converting DOCX to a DITA map or DocBook assembly, by default, the directory containing automatically generated topics was set to a file path (-p transform2.topic-path %{~nO}_files) and not to a URI (-p transform2.topic-path %{~no}_files). This caused w2x-app to create invalid maps and assemblies when, for example, the name of the DOCX file to be converted contained whitespace.
Enhancements:
Bug fixes:
Enhancements:
XMLmind Word To XML is not yet supported on Macs having an Apple M1 (ARM-based) processor despite the fact that a “native” OpenJDK and a “native” OpenJFX are now available for this platform. We plan to provide official support and a .dmg distribution for this platform before the end of this year.
Bug fixes:
INCLUDEPICTURE \d "..\\Pictures\\My Picture.png" \* MERGEFORMATINET
contained in "C:\Users\John\Documents\My Document.docx", was processed as if it were
INCLUDEPICTURE \d "C:\\Users\\John\\Pictures\\My Picture.png" \* MERGEFORMATINET
Now such relative paths are simply converted to relative URIs (to keep it simple, they are kept as is) by XMLmind Word To XML.
If you plan to run the "Word To XML" servlet on Apache Tomcat version 10, please first read this important note as there is a major breaking change between the latest versions of Tomcat (>= 10) and older versions (<= 9).
Enhancements:
w2x-app: useNativeFileChooser preference has been enabled (which is not the case by default), the size of the file chooser dialog box is now persistent across conversion sessions.
Bug fixes:
Enhancements:
Bug fixes:
Possible incompatibilities:
edit.finish-styles.mathjax-url from (obsolete) https://cdn.mathjax.org/mathjax/latest/MathJax.js to https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/MathJax.js.
Enhancements:
w2x -o assembly mydoc.docx out/mydoc.xml
generates in the out/ directory DocBook 5.1 assembly file mydoc.xml and also all the topic files referenced by this assembly.
w2x-app: If, for any reason, you prefer to use the “system” Look & Feel, please start w2x-app by running
w2x-app -putpref lookAndFeelClassName fallback
This setting is done once and for all. If, after doing that, you finally prefer to revert to FlatLaf, simply run
w2x-app -putpref lookAndFeelClassName
Enhancements:
wh-responsive-ui.
wh-ui-language may be used to specify the language used by the messages of the generated Web Help (tab labels, button tool tips, etc). The default is to use the language of the Web browser.
w2x-app: added a user preference called useNativeFileChooser which instructs the w2x-app desktop application to display the native file chooser in preference to the multi-platform file chooser. This option is turned off by default because file extension filters are not supported when the native file chooser is invoked by Java™.
Bug fixes:
<?break-page?> which should be found between two list items was inserted after their list parent.
body element was not specified. This is a problem with some Web browsers for which this color is by default a light gray.
Now the page color found in the DOCX file is used to set the background color of the body element. When this page color is not specified, the background color of the body element is set to white.
Incompatibilities:
webhelp.wh-jquery-css, webhelp.wh-jquery-custom-theme, webhelp.wh-jquery-theme, webhelp.wh-jquery-ui.
Enhancements:
This may have important benefits when converting a DOCX file to multiple semantic XML files (e.g. a DITA map and its associated topics) because in such case, the names of the generated files are generally inferred from user-specified bookmarks.
In order to implement this enhancement, we had to replace parameter edit.ids.automatic-ids by new parameter convert.automatic-ids. This has been done to move the detection of automatic bookmarks to an earlier stage of the conversion process.
Bug fixes:
REF (Foo) \h, where the bookmark name "(Foo)" contains characters U+FF08 and U+FF09) were converted to broken links. This bug may have had important and somewhat surprising consequences, for example, on generated DITA maps, so please upgrade to version 1.8.
XE "XML" \r "OpenXMLPageRange") in the DOCX document was correctly converted to a DITA or DocBook index term range (e.g. <indexterm start="OpenXMLPageRange">XML</indexterm> and <indexterm end="OpenXMLPageRange"/>), but subsequent index entries marking the same page range (e.g. field XE "Extensible Markup Language" \r "OpenXMLPageRange") were converted to single point index terms (e.g. <indexterm>Extensible Markup Language</indexterm>). Now subsequent index entries marking the same page range are converted to index terms containing a redirection to the first index term referencing this page range (e.g. <indexterm>Extensible Markup Language<index-see>XML</index-see></indexterm>).
Incompatibilities:
edit.ids.automatic-ids is now ignored after reporting a warning. This parameter has been replaced by new parameter convert.automatic-ids. See above enhancement.
resolution -300 means: use 300DPI if intrinsic resolution is less than 300DPI):
.emf.png.wmf.png java:com.xmlmind.w2x_ext.emf2png.EMF2PNG resolution -300
Previously this rule was (resolution 0 means: use intrinsic resolution):
.emf.png.wmf.png java:com.xmlmind.w2x_ext.emf2png.EMF2PNG resolution 0
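The two resolution settings quoted above can be read as a small decision rule. The Python sketch below is only an illustration of that rule, not the converter's actual code: the function name is hypothetical, and the handling of positive values is an assumption, since these notes only describe 0 and negative values.

```python
def effective_resolution(setting: int, intrinsic_dpi: int) -> int:
    """Return the DPI that would be used to rasterize a metafile.

    setting == 0 -> use the intrinsic resolution (stated in the notes)
    setting  < 0 -> use abs(setting) when the intrinsic resolution is lower
                    (stated in the notes for -300)
    setting  > 0 -> ASSUMPTION, not stated in the notes: force this DPI
    """
    if setting == 0:
        return intrinsic_dpi
    if setting < 0:
        return max(intrinsic_dpi, -setting)
    return setting


# A 96 DPI metafile is raised to 300 DPI, a 600 DPI one is left alone.
low = effective_resolution(-300, 96)
high = effective_resolution(-300, 600)
```

With the previous rule (resolution 0), both images would simply keep their intrinsic resolution.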
Enhancements:
w2x and desktop application w2x-app now support plugins.
A plugin is simply a text file having a ".w2x_plugin" suffix, containing a number of w2x command-line arguments and starting with comment lines containing information about the plugin (for example, its name). Example, rss.w2x_plugin:
### plugin.name: rss
### plugin.outputDescription: RSS 2.0
### plugin.outputExtension: xml
### plugin.multiFileOutput: no
-c
-e w2x:xed/main.xed
-t rss.xslt
# Image files not useful here.
-step:com.xmlmind.w2x.processor.DeleteFilesStep:cleanUp
-p cleanUp.files "%{~pO}/%{~nO}_files"
This plugin converts DOCX to RSS. This process is partly implemented by XSLT 1.0 stylesheet rss.xslt which is part of this plugin. Stylesheet rss.xslt transforms its input, the semantic XHTML 1.0 Transitional file created by the Edit step (-e w2x:xed/main.xed), to RSS.
Aside from XSLT 1.0 stylesheets, a plugin may also include XED scripts as well as ".jar" files containing custom conversion steps implemented in Java™.
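To make the plugin file layout more concrete, here is a minimal Python sketch that reads the leading "###" comment lines of a ".w2x_plugin" file into a dictionary. It is not how w2x itself loads plugins; the function name and the rule "the header ends at the first line not starting with ###" are assumptions made for illustration.

```python
def read_plugin_info(text: str) -> dict:
    """Collect 'plugin.*' properties from the leading '###' comment lines."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("###"):
            break  # assumed: the header stops at the first other line
        key, sep, value = line[3:].partition(":")
        if sep:  # ignore '###' lines that carry no "key: value" pair
            info[key.strip()] = value.strip()
    return info


# Header taken from the rss.w2x_plugin example, one property per line.
SAMPLE = """\
### plugin.name: rss
### plugin.outputDescription: RSS 2.0
### plugin.outputExtension: xml
### plugin.multiFileOutput: no
-c
-e w2x:xed/main.xed
-t rss.xslt
"""

info = read_plugin_info(SAMPLE)
```

Here info["plugin.name"] is "rss"; the remaining lines of the file are ordinary w2x command-line arguments and are left untouched by this sketch.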
A plugin is registered with w2x and w2x-app by copying all its files anywhere inside directory w2x_install_dir/plugin/. However it's strongly recommended to group all the files comprising a plugin in a subdirectory of its own having the same name as the plugin (e.g. w2x_install_dir/plugin/rss/). Alternatively, this plugin may be installed anywhere you want provided that the directory containing the ".w2x_plugin" file is referenced in the W2X_PLUGIN_PATH environment variable. Example:
set W2X_PLUGIN_PATH=C:\Users\John\w2x\rss;C:\temp\w2x_plugins
Once registered with w2x, a plugin may be invoked as if it were a stock conversion, for example:
w2x -o rss my.docx my.xml
In w2x-app, you'll find the registered plugins in the "Convert to" combobox and in the "Output format" screen of the setup assistant.
Bug fixes:
w2x-app: a user converting a DOCX file to a multi-file output format (e.g. Web Help), choosing the default output directory (which is the directory containing the input DOCX file) and choosing by mistake to make the output directory empty before proceeding with the conversion ended up deleting the input DOCX file.
w2x-app displayed a blank window when the computer running w2x-app was not connected to the Internet.
Enhancements:
XE "XML" \r "OpenXMLPageRange") in the DOCX document are now supported when generating DITA and DocBook documents.
Enhancements:
-p edit.ids.automatic-ids regex_pattern lets the user specify which bookmarks automatically generated by MS-Word ("_GoBack", "_Toc123", etc) are to be preserved by the conversion process, and this, even when such bookmarks are not referenced anywhere in the generated document.
XE "XML" \r "OpenXMLPageRange"). This kind of index entry is currently not supported by XMLmind Word To XML. Only simple index entries marking a single location in the DOCX document are currently supported.
Bug fixes:
ch) caused XMLmind Word To XML to generate incorrect lists.
Note that we are not 100% sure that this bug is really fixed now. Unfortunately, the behavior of MS-Word when it comes to processing w:ind/@w:xxxChars attributes is not completely documented in the "ECMA-376, Office Open XML File Formats" specification.
Normal +" caused XMLmind Word To XML to enter an endless loop.
fn (footnote) elements having an id attribute. The bug is that these fn elements were never referenced by an <xref type="fn"> (footnote call). The consequence was that these footnotes were automatically discarded when converting to HTML, PDF, DOCX, etc, the DITA files created by w2x.
Enhancements:
sub, sup, small, big, s, u, tt, b, i. Alternate element names may be specified using the following parameters: inlines.sub-element, inlines.sup-element, inlines.small-element, inlines.big-element, inlines.s-element, inlines.u-element, inlines.tt-element, inlines.b-element, inlines.i-element. Example 1: generate code rather than tt elements:
-p edit.inlines.tt-element "code"
Example 2: do not generate small elements:
-p edit.inlines.small-element "span style='font-size:x-small'"
(notice how one or more attributes may be specified too).
This facility is useful only when generating semantic XHTML and all formats based on semantic XHTML. Using it when generating DITA or DocBook may give poor results.
meta elements but author, description, dcterms.* are automatically suppressed from the semantic XHTML 1.0 Transitional document generated by the Edit step and used as an input by the Transform step.
If you want to keep some or all the meta elements in this intermediate semantic XHTML 1.0 Transitional document, you may now specify -p edit.metas.keep regexp_matching_meta_name. Examples: -p edit.metas.keep '.*' keeps all metas; -p edit.metas.keep '^dc.' keeps all metas having a name starting with "dc." (e.g. <meta name="dc.subject" content="..."/>).
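The effect of the two example patterns can be checked with an ordinary regular expression engine. The Python sketch below assumes that the pattern is searched for inside the meta name (which the explicit "^" anchor suggests); this matching mode, the function name and the sample meta names are assumptions, not documented behavior of w2x.

```python
import re


def kept_metas(meta_names, keep_pattern):
    """Return the meta names selected by a keep pattern (assumed: regex search)."""
    regex = re.compile(keep_pattern)
    return [name for name in meta_names if regex.search(name)]


# Hypothetical meta names, for illustration only.
names = ["dc.subject", "dc.title", "generator", "keywords"]

dc_only = kept_metas(names, "^dc.")
everything = kept_metas(names, ".*")
```

Note that in '^dc.' the unescaped dot matches any character, so a name such as "dcx" would also be selected; '^dc\.' is the stricter way to express "starts with dc.".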
indexterm elements having index-sort-as children and DocBook indexterm/primary, secondary, tertiary elements having sortas attributes.
For this to happen, the input DOCX file must contain XE (index entry) fields having \y "yomi" (first phonetic character for sorting indexes) field arguments. Unlike MS-Word which considers \y "yomi" only for East Asian languages, w2x uses this XE field argument to sort the index entries whatever the language of the document. English examples: {XE "<span>" \y "span"}, {XE "Operation:+" \y ":Addition"}.
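As a rough illustration of what a sort-as value changes, the Python sketch below sorts a few index entries using the \y value when one is present and the entry text otherwise. It is a simplification made for this note: the entry list is hypothetical, the comparison is a plain case-insensitive string comparison, and multi-level values such as ":Addition" (where the colon separates entry levels) are deliberately not handled.

```python
def sort_key(entry):
    """Sort by the 'yomi' (sort-as) value when present, else by the term itself."""
    term, yomi = entry
    return (yomi if yomi is not None else term).lower()


# (term, yomi) pairs; only the first one comes from the examples above.
entries = [("<span>", "span"), ("table", None), ("row", None)]

sorted_terms = [term for term, _ in sorted(entries, key=sort_key)]
```

Without the \y "span" argument, "<span>" would be sorted on its "<" character and land before all alphabetic entries; with it, the entry is filed under "s", between "row" and "table".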
Bug fixes:
Unfortunately, this will always be the case because w2x never examines the characters actually contained in a text span having <w:lang w:eastAsia="ja-JP" w:val="en-US"/> to determine whether this text span is written in ja-JP or is written in en-US or is written in a mix of both languages.
However, a partial workaround for this limitation is to specify for example -p convert.set-lang ja-JP or -p convert.default-lang ja-JP. When parameter convert.set-lang or parameter convert.default-lang is set to a language code starting with ja, zh or ko, then it is attribute w:lang/@w:eastAsia which is used to determine the language of a text span and not attribute w:lang/@w:val.
Note that -p convert.default-lang ja-JP is just used as a hint to favor attribute w:lang/@w:eastAsia over attribute w:lang/@w:val. Given the way MS-Word sets these two attributes, using parameter -p convert.default-lang ja-JP will not cause a vastly incorrect detection of the language when converting a German DOCX file for example.
lang="x-NONE". Value "x-NONE" is invalid.
p-Title and p-Subtitle styles (to make it simple; see parameters edit.title.title-style-names and edit.title.subtitle-style-names) are now converted to equivalent semantic XHTML elements.
In the previous versions of XMLmind Word To XML, such titles were converted only to head/title and to head/meta name="description" which made them invisible to the user (though usable by programs such as the XSLT stylesheets generating DITA or DocBook).
This feature can be controlled by specifying the following new parameters:
edit.title.keep-title. Default value when generating semantic XHTML: "yes". Default value when generating DITA and DocBook: "no".
edit.title.title-container. Default value: "h1 class='role-document-title'". Ignored when edit.title.keep-title is "no".
edit.title.subtitle-container. Default value: "p class='role-document-subtitle'". Ignored when edit.title.keep-title is "no".
epub.omit-toc-root (default value: "no"). Web Help formats: added parameter webhelp.omit-toc-root (default value: "no").
By default, the Table of Contents (TOC) generated for an EPUB or Web Help document has a single “root”. This single root always points to the page containing the title, subtitle, author, etc, of the document. Setting this parameter to "yes" prevents the generated TOC from having such a single root.
window.sessionStorage rather than cookies to store the internal state of the Web Helps it generates.
If the Java™ runtime used to run w2x-app is older than version 1.8.0_40, the system Web browser rather than the dedicated, embedded Web browser is used to display the Web Help, which is much less convenient for the user.
Please do not use new Java 9 to run the programs which are part of XMLmind Word To XML. XMLmind Word To XML has not yet been tested against this version of Java.
Enhancements:
w2x option -p webhelp.wh-layout simple to give it a try.
w2x-app:
c-DefinitionChar"). This is really needed when you give for example Japanese names to your custom MS-Word styles.
edit.remove-styles.preserved-classes now accepts class patterns as well as class names. For example, specify
-p edit.remove-styles.preserved-classes "^(t|(tr)|(tc)|(tp)|p|(pn)|n|c)-.+$"
if you want to preserve in the semantic XHTML the class names corresponding to all the CSS styles generated during the Convert step.
<w:vanish/>) are now converted to <span style="display:none">. When generating semantic XML, these invisible span elements are then discarded.
params servlet parameter, which makes it possible to augment or to override some of the options of the conversion specified by the conv servlet parameter. Example:
curl -s -S -o manual.epub \
  -F "docx=@manual.docx;type=application/vnd.openxmlformats-officedocument.wordprocessingml.document" \
  -F "conv=epub" \
  -F "params=-p epub.identifier urn:x-mlmind:w2x:manual -p epub.split-before-level 8" \
  http://localhost:8080/w2x/convert
.dmg distribution including a private Java™ 1.8.0_152 runtime.
Bug fixes:
error in action "group": missing attribute "g:container" for element .../html:p[NN].
role-bridgeheadI was added to li elements.
XE" (index entry) were not normalized to upper-case. For example, this bug could cause some index entries to be missing in the generated semantic XML.
com.xmlmind.w2x_ext.emf2png.EMF2PNG to convert WMF to PNG despite the fact that this factory supports the WMF format in addition to the EMF format.
w2x to generate an invalid XHTML table having no cells at all.
Enhancements:
index.html file (if an index.html file does not already exist). This feature is controlled using parameter webhelp.add-index. The default value of this parameter is yes.
-liststeps to the w2x command-line utility. When this option is specified, w2x lists all the conversion steps to be executed and then exits. This option is useful to determine how to customize the conversion steps. Example:
$ w2x -o bookmap -liststeps
-step:com.xmlmind.w2x.processor.ConvertStep:convert
-p convert.create-mathml-object no
-p convert.set-column-number yes
-step:com.xmlmind.w2x.processor.EditStep:edit
-p edit.xed-url-or-file file:/opt/w2x/xed/main.xed
-step:com.xmlmind.w2x.processor.TransformStep:transform
-p transform.out-file %{~pnO}.dita
-p transform.single-topic no
-p transform.xslt-url-or-file file:/opt/w2x/xslt/topic.xslt
-step:com.xmlmind.w2x.processor.TransformStep:transform2
-p transform2.xslt-url-or-file file:/opt/w2x/xslt/bookmap.xslt
-p transform2.topic-type %{transform.topic-type}
-p transform2.output-path %{~po}
-step:com.xmlmind.w2x.processor.DeleteFilesStep:cleanUp
-p cleanUp.files %{~pnO}.dita
convert.resource-prefix which is useful when used in conjunction with convert.resource-directory and when several files generated by w2x share the same resource directory.
Bug fixes:
convert.resource-directory as "." in order to create all image files in the same directory as the other automatically generated files did not work.
w2x doc.docx /home/john/doc.html). Now the resource directory is made empty if and only if it's the “automatic” output_file_basename_files/ folder, which is at the same time safe and convenient.
Other changes:
Enhancements:
edit.ids.generate-section-ids. Setting this parameter to yes (default value is no) ensures that all the sections found in the semantic XHTML resulting from the conversion of a DOCX file have a unique ID.
When this ID is missing, it is computed using the content of the h1, h2, ..., h6 heading which is the first child of the section. Example:
<div class="role-section2" id="Title_of_this_section">
  <h2>Title of this section</h2>
  ...
The maximum length of the automatically computed ID may be specified using parameter edit.ids.section-id-max-length. The default value of this parameter is 32.
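The following Python sketch shows one plausible way to derive such an ID from a heading. It is only meant to illustrate the idea: the exact normalization rules used by w2x are not given in these notes, so the replacement of non-word characters by "_", the truncation step and the numeric suffix used to keep IDs unique are all assumptions, and both function names are hypothetical.

```python
import re


def make_section_id(heading_text: str, max_length: int = 32) -> str:
    """Turn heading text into an ID candidate (assumed normalization)."""
    # Assumed rule: each run of non-word characters becomes one underscore.
    candidate = re.sub(r"\W+", "_", heading_text.strip()).strip("_")
    return candidate[:max_length]


def unique_id(candidate: str, used: set) -> str:
    """Make the candidate unique by appending _2, _3, ... (assumed scheme)."""
    result, n = candidate, 1
    while result in used:
        n += 1
        result = f"{candidate}_{n}"
    used.add(result)
    return result


used_ids = set()
first = unique_id(make_section_id("Title of this section"), used_ids)
second = unique_id(make_section_id("Title of this section"), used_ids)
```

With these assumptions, the heading from the example above yields "Title_of_this_section", and a second section with the same title gets a distinct ID.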
Setting edit.ids.generate-section-ids to yes is especially useful when converting a DOCX file to a DITA map or bookmap. With this parameter, the filenames of the topics referenced by the generated map are guaranteed to have meaningful values (e.g. "Introduction.dita" rather than "d0e35.dita").
shortdesc-class-name to W2X_install_dir/xslt/topic.xslt, the XSLT stylesheet which is used to convert the intermediate semantic XHTML document to a DITA topic.
This parameter is used to specify the class name of the XHTML <p> which acts as a short description of the section. Examples: -p transform.shortdesc-class-name p-Shortdesc, -p transform.shortdesc-class-name p-Abstract.
When this parameter is not specified (or is specified as the empty string which is its default value), the following style mapping, created by the w2x-app wizard:
-p edit.blocks.convert "p-Shortdesc p class='p-Shortdesc'"
...
<xsl:template match="h:p[@class='p-Shortdesc']">
  <shortdesc>
    <xsl:call-template name="processCommonAttributes"/>
    <xsl:apply-templates/>
  </shortdesc>
</xsl:template>
causes DITA <shortdesc> elements to be generated inside topic bodies, which is invalid.
After specifying -p transform.shortdesc-class-name p-Shortdesc, this issue is fixed and DITA <shortdesc> elements are generated before topic bodies.
-p transform.pre-element-name codeblock (default value being pre).
Bug fixes:
programlisting, DITA pre, XHTML pre, etc.
StringIndexOutOfBoundsException.
edit.convert-tabs.to-table set to no (the default value), attribute class="role-tabs-XXX" and elements <span class="role-tab"> were not discarded.
Not only is this markup not useful, but it also prevented some style mappings created by the w2x-app wizard from working. Example: the following style mapping of MS-Word paragraph style Note to a DITA element <note>:
-p edit.blocks.convert "p-Note p class='p-Note'"
...
<xsl:template match="h:p[@class='p-Note']">
  <note>
    <xsl:call-template name="processCommonAttributes"/>
    <xsl:apply-templates/>
  </note>
</xsl:template>
failed for the following paragraph (intermediate semantic XHTML preceding the transformation to DITA):
<p class="role-tabs-35.45-0-117 p-Note">Note: <span class="role-tab"> </span>Body of the note here.</p>
Incompatibilities:
w2x_all.jar, the self-contained JAR file, is no longer used by the following scripts: bin/w2x, w2x.bat, w2x-app, w2x-app-c.bat. Using this JAR file prevented advanced users from easily modifying the scripts found in subdirectories xed/ and xslt/. This self-contained JAR file is still available but its use should be reserved to embedding w2x in a third-party application.
Enhancements:
W2X_IMAGE_CONVERSIONS. The default value of this variable is (all specifications on a single line):
.wmf.svg java:com.xmlmind.w2x_ext.wmf_converter.WMFConverterFactory; .tiff.png java:com.xmlmind.w2x.docx.image.ImageConverterFactoryImpl
On Windows, the default value of W2X_IMAGE_CONVERSIONS is (all specifications on a single line):
.wmf.svg java:com.xmlmind.w2x_ext.wmf_converter.WMFConverterFactory; .emf.png java:com.xmlmind.w2x_ext.emf2png.EMF2PNG resolution 0; .tiff.png java:com.xmlmind.w2x.docx.image.ImageConverterFactoryImpl
This image converter executes an external program to perform the conversion.
Examples of W2X_IMAGE_CONVERSIONS specifications (see above): convert EMF to SVG using OpenOffice/LibreOffice:
.emf.svg soffice --headless --convert-to svg --outdir %~po %i
Convert EMF/WMF to PNG using ImageMagick:
.emf.png.wmf.png magick convert -density 288 "%I" -scale 25% "%O"
com.xmlmind.w2x_ext.emf2png.EMF2PNG
This image converter is available only on Windows. It leverages Windows own GDI+ to convert EMF (in fact, Windows metafiles of any kind, including WMF) to PNG.
This is not that great because, unlike com.xmlmind.w2x_ext.wmf_converter.WMFConverterFactory which converts WMF (Windows vector graphics format) to SVG (standard vector graphics format), EMF2PNG converts a vector graphics format to a raster image format. However, having EMF2PNG is better than nothing at all.
w2x no longer supports Internet Explorer 8 and older versions.
Bug fixes:
Enhancements:
w2x-app now has a setup assistant (AKA “wizard” style dialog box) which makes it quick and easy to create w2x option files. This new setup assistant has a screen which may be used to map MS-Word character and paragraph styles (e.g. p-CodeSample) to XML elements possibly having attributes (e.g. DITA pre outputclass="code-sample").
-o frameset_strict), XHTML 1.0 Transitional (-o frameset_loose), XHTML 1.1 (-o frameset1_1), XHTML 5 (-o frameset5).
-o webhelp_strict), XHTML 1.0 Transitional (-o webhelp_loose), XHTML 1.1 (-o webhelp1_1), XHTML 5 (-o webhelp5).
-o epub1_1).
<ops:switch>), DITA and DocBook 5. When targeting any other format, XMLmind Word To XML generates external files containing MathML then adds elements pointing to these external ".mml" files. XHTML 1 example: <object data="doc_files/math-010.mml" type="application/mathml+xml"/>.
The parameters related to MathML support are: convert.create-mathml-object, edit.finish-styles.mathjax (MathJax support).
edit.blocks.convert called edit.blocks.convert-to-pre. This new parameter is best explained by comparing it to edit.blocks.convert.
When using MS-Word, there are two ways to represent code samples:
A sequence of Code1 paragraphs may be converted to an XHTML pre using:
-p edit.blocks.convert "p-Code1 span g:id='pre' g:container='pre'"
A Code2 paragraph may be converted to an XHTML pre using:
-p edit.blocks.convert-to-pre "p-Code2 pre"
transform.pre-element-name may be used to specify to which DocBook or DITA element an HTML pre element is to be converted. The default value of transform.pre-element-name is pre when generating DITA and literallayout when generating DocBook.
remove-styles.preserved-classes may be used to preserve some of the classes (e.g. c-Code, p-Note, etc) used to style the elements found in the intermediate, automatically generated, styled XHTML document.
Moreover specifying both parameters prune.preserve and remove-styles.preserved-classes is currently the only way to keep in the generated semantic XHTML empty paragraphs having a given MS-Word style. For example, specifying -p prune.preserve p-PlaceHolder and -p remove-styles.preserved-classes p-PlaceHolder may be used to keep in the semantic XHTML output all empty paragraphs having the p-PlaceHolder style.
equation-block, equation-inline, mathml, line-through, entry/@rotate.
Bug fixes:
transform.generate-xref-text=yes (the default value) generated "???" (e.g. "See example ???.") rather than useful hyperlink text such as "above" or "below" (e.g. "See example below.").
split.use-id-as-filename=true and webhelp.use-id-as-filename=true caused w2x to generate files having incorrect names when the input DOCX had duplicate bookmarks or when it had bookmarks containing the '.' character.
NullPointerException.
It's now possible to convert a DOCX document to the following styled HTML formats (that is, XHTML+CSS):
Files generated this way look like the source DOCX document. Previously the only way to generate Web Help or EPUB was to first convert the source DOCX document to DITA or DocBook (semantic XML) and then to convert the intermediate DITA or DocBook files to Web Help or EPUB using external tools such as DITA Open Toolkit, XMLmind DITA Converter, DocBook XSL stylesheets. However in such case, the generated Web Help or EPUB does not look like the source DOCX document.
Note that a frameset is automatically generated along with the multi-page styled HTML pages. While an obsolete HTML feature, a frameset makes it easy to browse these HTML pages. Moreover the table of contents used as the left frame is a convenient way to programmatically list all the generated HTML pages. Example: excerpts from w2x_install_dir/doc/manual/manual-TOC.html:
...
<body>
<p class="toc-entry-0"><a href="manual-0.html" target="contentFrame">XMLmind Word To XML Manual</a></p>
<p class="toc-entry-1"><a href="manual-1.html" target="contentFrame">Contents</a></p>
<p class="toc-entry-1"><a href="intro.html" target="contentFrame">1 Introduction</a></p>
<p class="toc-entry-1"><a href="install.html" target="contentFrame">2 Installing w2x</a></p>
<p class="toc-entry-2"><a href="distribution.html" target="contentFrame">2.1 Contents of the installation directory</a></p>
...
How does this work?
In order to generate these 3 new formats, we need to automatically split the source DOCX document into parts. A new part is created each time a paragraph having an outline level less than or equal to the specified split-before-level parameter is found in the source. An outline level is an integer between 0 (e.g. style Heading 1) and 8 (e.g. style Heading 9). The default value of parameter split-before-level is 0, which means: for each Heading 1, create a new page starting with this Heading 1.
Example: for each Heading 1 and Heading 2, create a new page (out/manual-1.html, out/manual-2.html, ..., out/manual-N.html) starting with this Heading 1 or Heading 2:
w2x -p split.split-before-level 1 -o frameset manual.docx out/manual.html
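The splitting rule described above can be sketched in a few lines of Python. This is an illustration of the rule only, not the code used by w2x: the function name, the (outline level, text) representation of paragraphs and the sample document are all hypothetical.

```python
def split_into_parts(paragraphs, split_before_level=0):
    """Split (outline_level, text) pairs into parts.

    A new part starts at each paragraph whose outline level is less than
    or equal to split_before_level. Body paragraphs have level None.
    """
    parts = []
    for level, text in paragraphs:
        starts_part = level is not None and level <= split_before_level
        if starts_part or not parts:
            parts.append([])
        parts[-1].append(text)
    return parts


# Outline level 0 stands for Heading 1, level 1 for Heading 2.
doc = [
    (0, "1 Introduction"),
    (None, "Some body text."),
    (1, "1.1 Background"),
    (None, "More body text."),
    (0, "2 Installing"),
]

pages_default = split_into_parts(doc)          # split before Heading 1 only
pages_level_1 = split_into_parts(doc, 1)       # split before Heading 1 and 2
```

With the default value 0, this sample gives two parts; with split-before-level set to 1, the "1.1 Background" heading starts a third part of its own.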
Important tip
Generating any of these 3 new formats should work great if, for the DOCX document to be converted, you can use MS-Word's "References > Table of Contents" button to automatically create a table of contents. Note that the source DOCX document is not required to have a table of contents, but MS-Word should allow to automatically create a good one. In other words, automatically creating a table of contents using MS-Word is the best way to check that your outline levels are OK.
Other enhancements:
-p edit.do.remove-pis "" and -p edit.do.number-footnotes "" to w2x.
-p edit.finish-styles.custom-styles-url-or-file CSS_URL_OR_FILE makes it easy to customize the CSS styles used by the generated styled HTML pages. The custom CSS styles found in file CSS_URL_OR_FILE are simply appended to the automatically generated CSS styles.
-p convert.lower-case-resource-names yes (default value: no) is needed to keep epubcheck quiet on platforms where filenames are case-sensitive (e.g. Linux). Not for general use.
Bug fixes:
Bug fix: a span class=role-tabs having a negative X coordinate caused expand-tabs.js to loop forever.
First version of the commercial product.
Enhancements:
<span class="role-tab">. This makes it possible to decently emulate tab stops in any modern Web browser.
If you don't want this code to be added to the output file, pass option -p edit.do.expand-tabs "" to w2x.
However because, in the general case, it's not possible to emulate tab stops using tables, this XED script is disabled by default. If you really want to emulate tab stops using tables, pass option -p edit.convert-tabs.to-table yes to w2x.
Note that the alignment of a tab stop (right, center, etc) is ignored. That is, the text run is always considered to be left aligned.
w2x-app now works fine on computers having very high resolution (HiDPI) screens. For example, it now works fine on a Mac having a Retina® screen and a Windows computer having a UHD (“4K”) screen. On Windows, all DPI scale factors (100%, 125%, 150%, 200%, etc) are supported.
On a Linux computer having a HiDPI screen, HiDPI is not automatically detected. You'll have to specify the display scaling factor you prefer using the -putpref command-line option. Example: w2x-app -putpref displayScaling 200.
Enhancements:
Bug fixes:
New “Word To XML” servlet is a Java™ Servlet (server-side standard component) which has the same functions as the w2x-app desktop application.
The “Word To XML” servlet comes in a software distribution of its own: w2x_servet-1_0_0_beta03.zip. This distribution contains a ready-to-deploy binary w2x.war, as well as the full Java™ source code of the servlet.
w2x-app should be easier to use than the w2x command-line utility.
w2x-app is also available as an add-on for XMLmind XML Editor. This add-on adds an "Import DOCX" item to the File menu. The "Import DOCX" menu item displays a non modal dialog box almost identical to w2x-app. XML output files created using the "Import DOCX" dialog box are automatically opened in XMLmind XML Editor.
This add-on is compatible with XMLmind XML Editor v6.3+. In order to install it, please follow the instructions found in XMLmind Word To XML Manual, Installing the "Word To XML" add-on.
edit.headings.convert which makes it possible to easily convert to h1, h2, ..., h6 headings paragraphs not having an outline level property.
First public release.