The Convert step does not support the following MS-Word features.
By “does not support”, we mean that w2x will not generate something useful corresponding to such features. We don’t mean that using such features in a DOCX file would cause w2x to fail or to generate invalid XML documents.
lang attribute).
When a DOCX file contains revision information (i.e. "Track Changes"), w2x applies its own automatic, very crude interpretation of "Accept All Changes". For this reason, a warning is issued advising the user to accept or reject the tracked changes manually in MS-Word before submitting the DOCX file to w2x.
The Convert step generates XHTML+CSS documents having the following specificities:
<span class="role-tab"> </span>. See About tab stops.
meta equivalent are given names starting with “ms-”. Example:
<meta content="Hussein Shafie" name="ms-cp-lastModifiedBy" />
“-ms-” prefix. Example:
.p-Heading3 { -ms-outlineLvl: 2; color: #4F81BD; font-family: Cambria; ...
<?break-page?>. Column breaks are translated to <?break-column?>. The end of each section is signaled by <?end-of-section?>.
Conversion from OpenXML math to MathML is implemented by an XSLT 1.0 stylesheet called omml2mml.xsl, which comes from the open source project XSL stylesheets for TEI XML. If you have access to a better XSLT stylesheet than the open source omml2mml.xsl, you may use it by setting environment variable (or Java™ system property) W2X_MATH_CONVERTER_XSLT. Example:
set W2X_MATH_CONVERTER_XSLT=C:\Users\john\My better omml2mml.xsl
<?field code?> having a <span class="role-field"> parent. Example:
<span class="role-field">
  <?field DATE \@ "MMMM d, yyyy" \* MERGEFORMAT ?>
  August 27, 2014
</span>
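A generated document can be post-processed to recover the field codes kept in these processing instructions. The following Python sketch is not part of w2x. It assumes the output is well-formed XML, and the names iter_fields and tokenize_field_code are hypothetical helpers introduced here for illustration.

```python
import re
from xml.dom import minidom

# Sample modeled on the example above (assumption: well-formed XHTML fragment).
SAMPLE = (
    '<span class="role-field">'
    '<?field DATE \\@ "MMMM d, yyyy" \\* MERGEFORMAT ?>'
    ' August 27, 2014 '
    '</span>'
)


def tokenize_field_code(code):
    """Split a field code into tokens, keeping quoted arguments whole."""
    return re.findall(r'"[^"]*"|\S+', code)


def iter_fields(doc):
    """Yield (tokens, displayed_text) for each span having class role-field."""
    for span in doc.getElementsByTagName("span"):
        if span.getAttribute("class") != "role-field":
            continue
        code = None
        text = []
        for node in span.childNodes:
            if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                    and node.target == "field"):
                code = node.data
            elif node.nodeType == node.TEXT_NODE:
                text.append(node.data)
        if code is not None:
            yield tokenize_field_code(code), "".join(text).strip()


if __name__ == "__main__":
    for tokens, shown in iter_fields(minidom.parseString(SAMPLE)):
        print(tokens[0], "->", shown)
```

The first token is the field type (DATE here) and the remaining tokens are its switches and arguments. Text nested in child elements of the span is ignored by this simplified version.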
<?begin-smartTag tag?> and <?end-smartTag tag?>. Example:
<?begin-smartTag {urn:schemas-microsoft-com:office:smarttags}PersonName#0?>
<?begin-smartTag {urn:schemas:contacts}GivenName#1?>
Bill
<?end-smartTag {urn:schemas:contacts}GivenName#1?>
<?begin-smartTag {urn:schemas:contacts}Sn#2?>
Gates
<?end-smartTag {urn:schemas:contacts}Sn#2?>
<?end-smartTag {urn:schemas-microsoft-com:office:smarttags}PersonName#0?>
<?begin-sdt control_id?> and <?end-sdt control_id?>. Example:
<?begin-sdt comboBox#6?>
<td class="tc-TableGrid--bb tc-TableGrid" style="padding-bottom: 7.2pt; padding-left: 7.2pt; padding-right: 7.2pt; padding-top: 7.2pt;">
  <p class="tp-TableGrid p-Normal" lang="fr-FR">
    <span class="c-PlaceholderText">Choose an item.</span>
  </p>
</td>
<?end-sdt comboBox#6?>
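Because the begin-smartTag/end-smartTag and begin-sdt/end-sdt processing instructions come in pairs, a consumer of the generated XHTML can recover the text each pair encloses. The following Python sketch is an illustration, not w2x code. It assumes a well-formed document and that the data of an end instruction repeats the data of its begin instruction exactly, as in the examples above. The names walk and marked_ranges are hypothetical.

```python
from xml.dom import minidom

# Sample modeled on the smart tag example (wrapped in a <p> to be well-formed).
SAMPLE = (
    "<p>"
    "<?begin-smartTag {urn:schemas-microsoft-com:office:smarttags}PersonName#0?>"
    "<?begin-smartTag {urn:schemas:contacts}GivenName#1?>Bill"
    "<?end-smartTag {urn:schemas:contacts}GivenName#1?> "
    "<?begin-smartTag {urn:schemas:contacts}Sn#2?>Gates"
    "<?end-smartTag {urn:schemas:contacts}Sn#2?>"
    "<?end-smartTag {urn:schemas-microsoft-com:office:smarttags}PersonName#0?>"
    "</p>"
)


def walk(node):
    """Yield all nodes in document order."""
    yield node
    for child in node.childNodes:
        yield from walk(child)


def marked_ranges(doc):
    """Return (kind, id, text) for each begin-*/end-* pair, in closing order."""
    open_ranges = {}  # (kind, id) -> text chunks seen since the begin marker
    results = []
    for node in walk(doc):
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
            prefix, _, kind = node.target.partition("-")
            key = (kind, node.data.strip())
            if prefix == "begin":
                open_ranges[key] = []
            elif prefix == "end" and key in open_ranges:
                results.append((kind, key[1], "".join(open_ranges.pop(key))))
        elif node.nodeType == node.TEXT_NODE:
            # Text belongs to every range that is currently open.
            for chunks in open_ranges.values():
                chunks.append(node.data)
    return results


if __name__ == "__main__":
    for kind, ident, text in marked_ranges(minidom.parseString(SAMPLE)):
        print(kind, ident, repr(text))
```

Instructions without a matching begin marker, such as <?break-page?> or <?end-of-section?>, are ignored by this sketch.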
Unfortunately, this will always be the case because w2x never examines the characters actually contained in a text span having <w:lang w:eastAsia="ja-JP" w:val="en-US"/> to determine whether this text span is written in ja-JP, in en-US or in a mix of both languages.
However, a partial workaround for this limitation is to specify, for example, -p convert.set-lang ja-JP or -p convert.default-lang ja-JP. When parameter convert.set-lang or parameter convert.default-lang is set to a language code starting with ja, zh or ko, it is attribute w:lang/@w:eastAsia, and not attribute w:lang/@w:val, which is used to determine the language of a text span.
Note that -p convert.default-lang ja-JP is just used as a hint to favor attribute w:lang/@w:eastAsia over attribute w:lang/@w:val. Given the way MS-Word sets these two attributes, using parameter -p convert.default-lang ja-JP will not cause a vastly incorrect detection of the language when converting a German DOCX file, for example.
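The selection rule described above can be summarized by the following Python sketch. It only models the rule as stated in this section and is not the actual w2x implementation. The function name effective_lang is hypothetical, and falling back to w:val when w:eastAsia is absent is an assumption made for this illustration.

```python
# Language code prefixes which make w2x favor w:lang/@w:eastAsia.
EAST_ASIAN_PREFIXES = ("ja", "zh", "ko")


def effective_lang(val, east_asia, hint=None):
    """Pick the language of a text span.

    val       -- value of w:lang/@w:val (may be None)
    east_asia -- value of w:lang/@w:eastAsia (may be None)
    hint      -- value of convert.set-lang or convert.default-lang, if any
    """
    favors_east_asia = (
        hint is not None and hint.lower().startswith(EAST_ASIAN_PREFIXES)
    )
    if favors_east_asia and east_asia:
        return east_asia
    # Assumption: without a usable w:eastAsia value, w:val is kept.
    return val


if __name__ == "__main__":
    # <w:lang w:eastAsia="ja-JP" w:val="en-US"/> with and without the hint.
    print(effective_lang("en-US", "ja-JP"))
    print(effective_lang("en-US", "ja-JP", hint="ja-JP"))
```

With no hint, or with a hint such as de-DE, the sketch returns the value of w:val.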
indexterm elements having index-sort-as children and DocBook indexterm/primary, secondary, tertiary elements having sortas attributes. For this to happen, the input DOCX file must contain XE (index entry) fields having \y "yomi" (first phonetic character for sorting indexes) field arguments.
Unlike MS-Word, which considers \y "yomi" only for East Asian languages, w2x uses this XE field argument to sort the index entries whatever the language of the document. English examples: {XE "<span>" \y "span"}, {XE "Operation:+" \y ":Addition"}.
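To make the two English examples concrete, the following Python sketch pairs each level of an XE index entry with its sort key. It is an illustration only, not how w2x parses fields. It assumes, as the second example suggests, that the colon-separated parts of the \y argument line up with the colon-separated levels of the entry and that an empty part means "no sort key for this level". The name parse_xe is hypothetical.

```python
import re

# XE "entry" ... \y "yomi"   (the \y argument is optional)
XE_RE = re.compile(
    r'^\s*XE\s+"(?P<entry>[^"]*)"(?:.*?\\y\s+"(?P<yomi>[^"]*)")?'
)


def parse_xe(field):
    """Return a list of (level_text, sort_key_or_None), or None if not an XE field."""
    match = XE_RE.match(field)
    if match is None:
        return None
    levels = match.group("entry").split(":")
    yomi = match.group("yomi")
    keys = yomi.split(":") if yomi else []
    result = []
    for i, level in enumerate(levels):
        # Assumption: the i-th part of yomi sorts the i-th level of the entry.
        key = keys[i] if i < len(keys) and keys[i] else None
        result.append((level, key))
    return result


if __name__ == "__main__":
    for code in (r'XE "<span>" \y "span"', r'XE "Operation:+" \y ":Addition"'):
        print(parse_xe(code))
```

Under these assumptions, the entry "<span>" sorts as "span", and in "Operation:+" only the second level "+" receives a sort key, "Addition".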