Chapter 13. Low-level method: embedding com.xmlmind.ditac.preprocess.PreProcessor

Advanced embedding method: first invoke a preprocessor which will generate intermediate .ditac files, then invoke the XSLT 2.0 engine in order to transform all these .ditac files.

This method consists of first invoking the PreProcessor to pre-process the DITA source files into a ditac_lists.ditac_lists file and one or more .ditac files, then invoking the Saxon XSLT 2.0 engine to transform all the .ditac files.
For some output formats (PDF, RTF, etc.), a third and final step consists of invoking an XSL-FO processor such as Apache FOP to convert the XSL-FO generated by the XSLT stylesheets to the desired output format.
The full source code of the Embed2 sample is found in Embed2.java.
  1. Invoke the ditac PreProcessor to pre-process the DITA source files into a ditac_lists.ditac_lists file and one or more .ditac files.
    1. Create and configure the PreProcessor.
      Console console = new Console() {
          public void showMessage(String message, MessageType messageType) {
              System.err.println(message);
          }
      };
      
      PreProcessor preProc = new PreProcessor(console);
      preProc.setChunking(Chunking.SINGLE);
      preProc.setMedia(Media.SCREEN);
      
      ResourceCopier resourceCopier = new ResourceCopier();
      resourceCopier.parseParameters("img");
      preProc.setResourceHandler(resourceCopier);
      • Console is a very simple interface. Implementing this interface lets you do whatever you want with the messages reported by a PreProcessor.
      • Specifying preProc.setChunking(Chunking.SINGLE) makes it possible to generate a single HTML page from a DITA map designed to generate multiple HTML pages.
      • A PreProcessor is not concerned with the exact output format. However, it behaves differently depending on the target Media.
      • A PreProcessor hands over to a ResourceHandler all the resource files, typically image files, referenced in the DITA source using relative URLs. A ResourceHandler is registered with a PreProcessor using method setResourceHandler.
        In the case of the Embed2 sample, we use the simplest possible ResourceHandler, which is ResourceCopier.
    2. Pre-process the DITA source files.
      URL inFileURL = null;
      try {
          inFileURL = inFile.toURI().toURL();
      } catch (MalformedURLException cannotHappen) {}
      
      File[] preProcFiles = null;
      try {
          preProcFiles = preProc.process(new URL[] { inFileURL }, outFile);
      } catch (IOException e) {
          console.showMessage(e.toString(), Console.MessageType.ERROR);
      }
      if (preProcFiles == null) {
          return false;
      }
      The process method of a PreProcessor returns null if an error other than an IOException has caused the pre-processing to fail. When this is the case, error messages are displayed on the Console.
      Note that a PreProcessor is not thread-safe. Each thread must own its PreProcessor. However, the process method of a PreProcessor may be invoked several times.
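Because each thread must own its PreProcessor, an application that converts documents from several threads needs some way to hand out one instance per thread. One common option is a ThreadLocal. The following is a minimal sketch under that assumption; the PerThread class is a hypothetical helper, not part of ditac, and the PreProcessor usage is shown only in a comment.

```java
import java.util.function.Supplier;

// Hypothetical helper (not part of ditac): lazily creates one instance
// of a non-thread-safe object per calling thread.
final class PerThread<T> {
    private final ThreadLocal<T> local;

    PerThread(Supplier<? extends T> factory) {
        // The factory runs at most once per thread, on that thread's first get().
        this.local = ThreadLocal.withInitial(factory);
    }

    // Returns the instance owned by the calling thread.
    T get() {
        return local.get();
    }
}

// Possible usage with ditac (assumes a Console that is itself safe to share):
//   PerThread<PreProcessor> preProcs =
//       new PerThread<>(() -> new PreProcessor(console));
//   File[] files = preProcs.get().process(new URL[] { inFileURL }, outFile);
```

Since the process method may be invoked several times on the same PreProcessor, reusing the per-thread instance across conversions is consistent with the note above. Any per-conversion settings (chunking, media, resource handler) would still have to be set before each call.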
  2. Invoke the Saxon XSLT 2.0 engine to transform all the .ditac files. Note that this is done using the standard JAXP API.
    1. Pass the required system parameters to the XSLT stylesheets, in addition to the normal user parameters.
      String ditacListsURI = "";
      
      int count = preProcFiles.length;
      for (int i = 0; i < count; ++i) {
          File ditacFile = preProcFiles[i];
      
          if (ditacFile.getPath().endsWith(".ditac_lists")) {
              ditacListsURI = ditacFile.toURI().toASCIIString();
              break;
          }
      }
      
      String[] params = {
          "ditacListsURI", ditacListsURI,
          "xsl-resources-directory", "res",
          "use-note-icon", "yes",
          "default-table-width", "100%"
      };
      These required system parameters are:
    2. Use the Saxon XSLT 2.0 engine to create a TransformerFactory, then configure this TransformerFactory.
      private static
      TransformerFactory createTransformerFactory(URIResolver uriResolver, 
                                                  ErrorListener errorListener) 
          throws Exception {
          Class<?> cls = Class.forName("net.sf.saxon.TransformerFactoryImpl");
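          // Class.newInstance() is deprecated since Java 9; go through
          // the no-argument constructor instead.
          TransformerFactory transformerFactory = 
              (TransformerFactory) cls.getDeclaredConstructor().newInstance();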
      
          ExtensionFunctions.registerAll(transformerFactory);
      
          transformerFactory.setURIResolver(uriResolver);
          transformerFactory.setErrorListener(errorListener);
      
          return transformerFactory;
      }
      • Creating an instance of Saxon 11 is absolutely needed. XMLmind DITA Converter is not designed to work with any other XSLT engine (e.g. the Xalan XSLT 1.0 engine, which is part of the Java™ runtime).
      • The ditac XSLT 2.0 stylesheets make use of a few XSLT extension functions written in Java™. These extension functions must be registered with Saxon. This is done using ExtensionFunctions.registerAll.
    3. Create and configure a Transformer.
      private static Transformer createTransformer(String[] params, 
                                                   Console console) 
          throws Exception {
          URIResolver uriResolver = Resolve.getURIResolver();
          ErrorListener errorListener = new ConsoleErrorListener(console);
      
          TransformerFactory factory = createTransformerFactory(uriResolver,
                                                                errorListener);
      
          File xslFile = AppUtil.getXSLResourceFile("xhtml/html.xsl");
          Transformer transformer = 
              factory.newTransformer(new StreamSource(xslFile));
      
          transformer.setURIResolver(uriResolver);
          transformer.setErrorListener(errorListener);
      
          for (int i = 0; i < params.length; i += 2) {
              transformer.setParameter(params[i], params[i+1]);
          }
      
          return transformer;
      }
      • Resolve is a helper class making it easy to use the services of XML Catalog resolvers.
        By default, Resolve automatically loads all the XML catalogs specified using the xml.catalog.files Java™ system property. Excerpt from the ant build.xml file:
        <target name="embed2" depends="compile,clean_embed2">
          <java classpathref="cp" fork="yes" classname="Embed2">
            <sysproperty key="xml.catalog.files" 
                         value="${ditac.dir}/schema/catalog.xml" />
            <arg value="${ditac.dir}/docsrc/manual/manual.ditamap" />
            <arg value="manual.html" />
          </java>
        </target>
        However, the static method setXMLResolver makes it possible to configure this thread-safe utility class (used by ditac in many places) differently.
      • ConsoleErrorListener is an implementation of ErrorListener which displays its messages on a Console.
      • AppUtil.getXSLResourceFile is a utility function used to locate files found in the XSL directory (normally ditac_install_dir/xsl/).
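The name/value convention used for the params array can be exercised without ditac or Saxon on the class path. The following is a minimal sketch, not code from Embed2: the ParamPairs class and its methods are hypothetical names, and it uses the identity Transformer returned by the default JAXP factory rather than the ditac stylesheets. It adds a guard against an odd-length array, which the loop in createTransformer would otherwise turn into an ArrayIndexOutOfBoundsException.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;

// Hypothetical helper illustrating the { name, value, name, value, ... }
// convention used for stylesheet parameters.
final class ParamPairs {
    private ParamPairs() {}

    // Sets each (name, value) pair as a stylesheet parameter.
    static void apply(Transformer transformer, String... params) {
        if (params.length % 2 != 0) {
            throw new IllegalArgumentException(
                "expected name/value pairs, got " + params.length + " items");
        }
        for (int i = 0; i < params.length; i += 2) {
            transformer.setParameter(params[i], params[i + 1]);
        }
    }

    // Creates an identity Transformer (no stylesheet) using the default
    // JAXP factory, then applies the parameters. For demonstration only.
    static Transformer identityWith(String... params) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            apply(t, params);
            return t;
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

An identity Transformer ignores these parameters when transforming; the sketch only shows how the pairs are handed to JAXP. With the real ditac stylesheets, the Transformer must come from a Saxon TransformerFactory, as explained above.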
    4. Invoke the Transformer to transform each .ditac file.
      for (int i = 0; i < count; ++i) {
          File ditacFile = preProcFiles[i];
      
          String ditacFilePath = ditacFile.getPath();
          if (ditacFilePath.endsWith(".ditac")) {
              File transformedFile = new File(
                  ditacFilePath.substring(0, ditacFilePath.length()-5) + 
                  "html");
      
              try {
                  transformer.transform(new StreamSource(ditacFile), 
                                        new StreamResult(transformedFile));
              } catch (Exception e) {
                  console.showMessage(e.toString(), 
                                      Console.MessageType.ERROR);
                  cleanUp(preProcFiles);
                  return false;
              }
          }
      }
      In the case of Embed2, the above loop is not strictly needed. We specified preProc.setChunking(Chunking.SINGLE) and therefore the PreProcessor generates a single .ditac file.
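The loop derives each output file name by cutting the last five characters ("ditac") off the path and appending "html". The same idea can be written without the hard-coded length. This is only a sketch; the OutputNames class and its withExtension method are hypothetical names that do not appear in Embed2.java.

```java
import java.io.File;

// Hypothetical helper: swaps a file name extension, or reports that the
// file does not carry the expected extension.
final class OutputNames {
    private OutputNames() {}

    // Returns the file obtained by replacing oldExt with newExt (both given
    // without the leading dot), or null if the path does not end with
    // "." + oldExt.
    static File withExtension(File file, String oldExt, String newExt) {
        String path = file.getPath();
        if (!path.endsWith("." + oldExt)) {
            return null;
        }
        // Keep everything up to and including the dot.
        String stem = path.substring(0, path.length() - oldExt.length());
        return new File(stem + newExt);
    }
}
```

With such a helper, a null result plays the role of the endsWith(".ditac") test in the loop: the ditac_lists.ditac_lists file yields null and is skipped, while manual.ditac maps to manual.html.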
  3. Copy the resources of the XSLT stylesheets (CSS stylesheets, icons, etc.) to output subdirectory res/. Note that the images referenced in the DITA source, if any, have already been copied to output subdirectory img/ by the ResourceCopier.
    File dstDir = new File("res");
    if (!dstDir.exists()) {
        File srcDir = AppUtil.getXSLResourceFile("xhtml/resources");
        try {
            FileUtil.copyDir(srcDir, dstDir, false);
        } catch (IOException e) {
            console.showMessage(e.toString(), Console.MessageType.ERROR);
            cleanUp(preProcFiles);
            return false;
        }
    }
  4. Delete the ditac_lists.ditac_lists and .ditac files.
    cleanUp(preProcFiles);
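The cleanUp helper called in several of the listings above is not shown in this chapter. A minimal sketch of what such a helper might look like, assuming it only needs to delete the files returned by process. The class name CleanUpSketch and the returned count are illustrative additions, not taken from Embed2.java.

```java
import java.io.File;

// Hypothetical stand-in for the cleanUp helper used by the sample.
final class CleanUpSketch {
    private CleanUpSketch() {}

    // Deletes the given intermediate files and returns how many were
    // actually deleted. Null arrays, null entries and files that no
    // longer exist are skipped.
    static int cleanUp(File[] files) {
        if (files == null) {
            return 0;
        }
        int deleted = 0;
        for (File file : files) {
            if (file != null && file.isFile() && file.delete()) {
                ++deleted;
            }
        }
        return deleted;
    }
}
```

Tolerating a null array keeps the helper safe to call on error paths where pre-processing has failed and process has returned null.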

§ Environment required for running this kind of embedding

Aside from .jar files such as ditac.jar, xmlresolver.jar, saxon12.jar, etc., which are all listed in ditac_install_dir/doc/manual/embed/build.xml (see below), this kind of embedding also needs access to:
  • The DITA DTD, schemas and XML catalogs normally found in ditac_install_dir/schema/.
  • The XSL stylesheets normally found in ditac_install_dir/xsl/.
Therefore the requirements for running this kind of embedding are:
  1. Use system property xml.catalog.files to point to ditac_install_dir/schema/catalog.xml or to an equivalent of this XML catalog.
  2. Stock ditac_install_dir/schema/catalog.xml contains the following entry:
    <rewriteURI uriStartString="ditac-xsl:" rewritePrefix="../xsl/" />
    This <rewriteURI> entry is needed to find the location of the directory containing the XSL stylesheets. Make sure that this entry exists in your XML catalogs and that it points to the actual location of the directory containing the XSL stylesheets.
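To check that a rewriteURI entry points where you expect, it can help to work through the rewriting by hand. The sketch below mimics, in a simplified way, what an XML catalog resolver does with such an entry: a relative rewritePrefix is resolved against the location of the catalog file, then substituted for the matching start of the URI. The RewriteUriSketch class, the /opt/ditac install location and the ditac-xsl:xhtml/html.xsl example URI are all assumptions made for illustration. Real resolution is performed by the catalog resolver, not by code like this.

```java
import java.net.URI;

// Simplified, hypothetical model of an XML catalog <rewriteURI> entry.
final class RewriteUriSketch {
    private RewriteUriSketch() {}

    // Returns the rewritten URI, or null if uri does not start with
    // uriStartString. A relative rewritePrefix is resolved against the
    // URI of the catalog file that contains the entry.
    static URI rewrite(URI catalog, String uriStartString,
                       String rewritePrefix, String uri) {
        if (!uri.startsWith(uriStartString)) {
            return null;
        }
        URI absolutePrefix = catalog.resolve(rewritePrefix);
        String remainder = uri.substring(uriStartString.length());
        return URI.create(absolutePrefix.toString() + remainder);
    }
}
```

For a catalog at file:/opt/ditac/schema/catalog.xml, the stock entry maps ditac-xsl:xhtml/html.xsl to file:/opt/ditac/xsl/xhtml/html.xsl. If the catalog is copied elsewhere without the xsl/ directory next to it, the relative ../xsl/ prefix no longer resolves to the stylesheets and must be adjusted.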

§ Compiling and executing the Embed2 sample

Compile the Embed2 sample by running ant in ditac_install_dir/doc/manual/embed/.
Execute the Embed2 sample by running ant embed2 in ditac_install_dir/doc/manual/embed/. This will convert ditac_install_dir/docsrc/manual/manual.ditamap to a single HTML 4.01 page, ditac_install_dir/doc/manual/embed/manual.html.