The “Word To XML” servlet is a Java™ Servlet (a standard server-side component) which has the same functions as the w2x-app desktop application.
Because it’s a server-side component and not a desktop application, please do not attempt to deploy the “Word To XML” servlet if you are an end-user of “Word To XML”. Please ask your IT personnel to do that for you.
The “Word To XML” servlet comes in a software distribution of its own: w2x_servlet-1_12_0.zip. This distribution contains a ready-to-deploy binary, w2x.war, as well as the full Java™ source code of the servlet.
The distribution contains:
w2x.war: the ready-to-deploy binary.
src/: the Java™ source code of the servlet.
src/build.xml: used from within src/ to rebuild w2x.war.
w2x/: related to w2x.war. Needed to rebuild w2x.war.
lib/: related to w2x.war.
File w2x.war may be easily installed in any servlet container implementing at least the Servlet 2.3 standard. Examples of such servlet containers: Apache Tomcat, Jetty, Caucho Resin.
About Apache Tomcat version 10 and above
Beware that there is a major breaking change between the latest versions of Apache Tomcat (>= 10) and older versions (<= 9). This is documented in this migration article.
To make a long story short, if you need to deploy the “Word To XML” servlet on Tomcat version 10+, you must first create a webapps-javaee/ folder next to TOMCAT_INSTALL_DIR/webapps/, then copy w2x.war to this TOMCAT_INSTALL_DIR/webapps-javaee/ folder.
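The Tomcat 10+ procedure described above can be sketched as a small shell function. This is only a minimal sketch, not an official installer: the function name deploy_w2x_tomcat10 and its two arguments are hypothetical, and it assumes a Unix-like shell with write access to the Tomcat installation directory.

```shell
# Hypothetical helper: deploy w2x.war on Apache Tomcat 10+.
# $1 = path to w2x.war, $2 = Tomcat installation directory (TOMCAT_INSTALL_DIR).
deploy_w2x_tomcat10() {
    war=$1
    tomcat_dir=$2

    if [ ! -f "$war" ]; then
        echo "error: $war not found" >&2
        return 1
    fi
    if [ ! -d "$tomcat_dir/webapps" ]; then
        echo "error: $tomcat_dir/webapps/ not found" >&2
        return 1
    fi

    # Create webapps-javaee/ next to webapps/, then copy the .war there.
    mkdir -p "$tomcat_dir/webapps-javaee" &&
        cp "$war" "$tomcat_dir/webapps-javaee/"
}
```

For example, deploy_w2x_tomcat10 w2x.war /opt/tomcat (where /opt/tomcat is a placeholder path) copies the file into /opt/tomcat/webapps-javaee/. Restarting Tomcat afterwards is probably still needed, as with any other deployment.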
Though copying file w2x.war to the webapps/ folder of the servlet container and then restarting the servlet container is generally sufficient to deploy the “Word To XML” servlet, please refer to the documentation of your servlet container to learn about the best deployment procedure.
On Windows, the .dll files contained in w2x_servlet_deployment_dir\WEB-INF\lib\ must be copied to a directory referenced by the PATH environment variable of the computer running the servlet.
The “Word To XML” servlet is configured by specifying a number of init-param parameters. These parameters are found in WEB-INF/web.xml, where folder WEB-INF/ is contained in w2x.war.
All these init-param parameters are documented in web.xml. For example, parameter workDir:
<!-- workDir =============================================================
     Uploaded files and files generated during the conversion process
     are stored in temporary subdirectories of this directory.

     If specified directory does not exist, it will be created.

     Value: this directory and its contents must be readable and
     writable by the operating system account used to run
     the Word To XML servlet.

     Default: dynamic; supplied by the Servlet Container.
     ====================================================================== -->
<init-param>
  <param-name>workDir</param-name><param-value></param-value>
</init-param>
Let’s suppose your servlet container runs on host localhost and uses 8080 as its port. In order to use the “Word To XML” servlet, please point your Web browser to http://localhost:8080/w2x/. This will cause the browser to display a page containing a simple DOCX conversion form.
In order to convert a DOCX file to another format, submit it using this form. The response is a .zip (or .epub) archive containing the result of the conversion. Generating this .zip (or .epub) file may take several seconds to several minutes depending on the size of the DOCX input file.
If the name of the DOCX input file contains non-ASCII characters (e.g. accented characters), please make sure to use Zip extractor software supporting .zip files having UTF-8 encoded filenames.
Note that most Zip extractor software does not support .zip files having UTF-8 encoded filenames[1]. Such extractors will succeed in unpacking the .zip file, but will generate files having incorrect names.
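As footnote [1] points out, the jar tool shipped with a JDK handles such archives correctly. The following is a minimal sketch of an extraction helper built on it. The function name extract_with_jar and its arguments are hypothetical, and the sketch assumes that a JDK is installed and that its jar command is on the PATH.

```shell
# Hypothetical helper: unpack a conversion result using the JDK's jar tool,
# which copes with UTF-8 encoded filenames.
# $1 = archive file, $2 = destination directory (created if missing).
extract_with_jar() {
    archive=$1
    dest=$2

    if ! command -v jar >/dev/null 2>&1; then
        echo "error: jar not found; a JDK must be installed and on PATH" >&2
        return 1
    fi

    # By default jar extracts into the current directory, so make the
    # archive path absolute first, then run jar from inside the destination.
    case $archive in
        /*) ;;
        *) archive=$PWD/$archive ;;
    esac
    mkdir -p "$dest" && (cd "$dest" && jar xf "$archive")
}
```

For example, extract_with_jar converted.zip converted_out unpacks the archive into a converted_out/ directory.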
It’s also possible to use the conversion services of the “Word To XML” servlet by sending an HTTP POST request having a multipart/form-data encoding to URL /w2x/convert.
Example using curl[2]:
curl -s -S -o manual_docbook5.zip \
  -F "docx=@manual.docx;type=application/vnd.openxmlformats-officedocument.wordprocessingml.document" \
  -F "conv=docbook5" \
  http://localhost:8080/w2x/convert
Other example:
curl -s -S -o manual.epub \
  -F "docx=@manual.docx;type=application/vnd.openxmlformats-officedocument.wordprocessingml.document" \
  -F "conv=epub" \
  -F "params=-p epub.identifier urn:x-mlmind:w2x:manual -p epub.split-before-level 8" \
  http://localhost:8080/w2x/convert
The conversion request has three emulated form fields:
docx
    <input type="file"> field. Required. Contains the DOCX input file.
conv
    <input type="text"> field. Required. Contains the name of a conversion, as given by one of the conversionN.name init-param parameters defined in WEB-INF/web.xml.
    WEB-INF/web.xml defines the following conversions to styled HTML: xhtml_css (single page styled HTML), frameset (multi-page styled HTML, split on Heading 1), frameset2 (multi-page styled HTML, split on Heading 1, 2), frameset3 (multi-page styled HTML, split on Heading 1, 2, 3), webhelp (split on Heading 1), webhelp2 (split on Heading 1, 2), webhelp3 (split on Heading 1, 2, 3), epub (split on Heading 1), epub2 (split on Heading 1, 2), epub3 (split on Heading 1, 2, 3).
    It also defines the following conversions: docbook, docbook5, topic, map, bookmap, xhtml_strict, xhtml_loose, xhtml1_1, xhtml5.
params
    <input type="text"> field. Optional. Contains some w2x command-line options, generally -p parameters. These options are appended to the options of the conversion specified in the conv emulated form field.
The response to a successful conversion request is a .zip (or .epub) archive containing the result of the conversion.
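The three form fields described above can be wrapped in a small shell function around curl. This is only a sketch under stated assumptions. The names w2x_output_ext, w2x_convert and W2X_URL are hypothetical. The servlet is assumed to listen at http://localhost:8080/w2x/convert, as in the examples. The rule that epub conversions produce an .epub file and all other conversions a .zip file is inferred from the examples, not stated by the servlet documentation.

```shell
# Hypothetical helper: pick the output file extension for a conversion name.
# Assumption: epub, epub2 and epub3 produce an .epub file; all others a .zip.
w2x_output_ext() {
    case $1 in
        epub*) echo epub ;;
        *)     echo zip ;;
    esac
}

# Hypothetical helper: convert a DOCX file using the servlet.
# $1 = DOCX file, $2 = conversion name ("conv"), $3 = optional w2x options ("params").
# W2X_URL may be set to override the assumed servlet location.
w2x_convert() {
    docx=$1
    conv=$2
    params=${3:-}
    url=${W2X_URL:-http://localhost:8080/w2x/convert}
    out="${docx%.docx}_$conv.$(w2x_output_ext "$conv")"
    mime=application/vnd.openxmlformats-officedocument.wordprocessingml.document

    # --fail makes curl return a non-zero status on an HTTP error response
    # (assuming the servlet reports failures that way) instead of saving
    # the error page as if it were an archive.
    if [ -n "$params" ]; then
        curl -s -S --fail -o "$out" \
            -F "docx=@$docx;type=$mime" -F "conv=$conv" -F "params=$params" "$url"
    else
        curl -s -S --fail -o "$out" \
            -F "docx=@$docx;type=$mime" -F "conv=$conv" "$url"
    fi && echo "$out"
}
```

For example, w2x_convert manual.docx docbook5 would save the response as manual_docbook5.zip and print that file name. The params field is sent only when a third argument is given, since it is optional.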
[1] However, “jar xvf converted.zip” works fine. jar is a command-line utility which comes with all Java Development Kits (JDK).
[2] curl is an open source command line tool and library for transferring data with URL syntax.