Making zippy HTML (or other zippy output)

Have you ever wished that DITA-OT could return your files as a zip, instead of thousands of little files? Are you willing to experiment with a small DITA-OT plugin, or tweak one you've already got? If no – that's great! Have a good day! Otherwise – read on to learn about a new feature in DITA-OT 2.5!

A self-serving post about a self-serving feature?

This post is really a sort of tutorial on how to make use of a new feature in DITA-OT 2.5. Specifically, an under-the-covers feature that allows a plugin to easily tweak where any output files are generated. It's a function I needed over and over, got tired of working around, and submitted as a new feature to DITA-OT 2.5.

Before version 2.5 (going all the way back to DITA-OT 1.0), the toolkit always generated result files directly in the specified output directory. Basically, you set output.dir to indicate where files should go, and we generate them there rather than generating everything inside the temp/ directory and then copying. Generally this is good. Creating stuff inside temp/ just so that we can copy it someplace else is a waste of time.

But.

It also makes some things difficult, like post-processing.

I don't like to run DITA-OT processes inside the output directory. I worry about touching files already in output.dir. Unlike a temp/ directory, I don't know what else is already in an output directory, so I'm in danger of modifying (or corrupting!) something unrelated to my build. If my process fails, the output directory is left full of files in an unknown state. These things are all bad.

DITA-OT already sets up a tidy little place in temp/ to do a series of things. We do most of our processing there, generating and cleaning up a lot of files. If I want to do that, plus one extra thing at the end, I should be able to use that same directory – just like I'd use that same directory to do anything in the middle of the process.

Using temp/ in that way used to be hard. Now it's easy.

Self-serving? Sure. This makes my life easier. But I hope it also makes your lives easier.

A new parameter is born

DITA-OT 2.5 defines a new internal parameter for use by plugin developers. The new parameter is called temp.output.dir.name. Here's the general idea:

  1. You set temp.output.dir.name as part of the initialization for your custom transform type. It should be a directory name (a relative directory - probably just one word). If you're not familiar with adding an initialization step to the start of a transform type, I'd recommend reading about how and why in the Happy HTML tutorial.
  2. Next, the everybody-must-use-this build-init target will set up a property (dita.output.dir) that combines your value with the usual temporary directory name. For example, setting the new parameter to zippy will give dita.output.dir a value like /path/to/temp12345/zippy/. DITA-OT 2.5 places all output files into the directory specified by dita.output.dir.
  3. What if you didn't initialize the temp.output.dir.name parameter? No need to worry: in that case dita.output.dir is set to your specified output directory. So, you're still good, and nothing has changed.
Note: This means you don't set dita.output.dir directly. Trying to override it will Mess Things Up. Just keep specifying the output directory like you always have.
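
To make the fallback in that list concrete, here is a minimal Ant sketch of how such a property could be resolved. This is not DITA-OT's actual build-init code. The project name, the target name, and the sample values are made up for illustration, and it assumes the temporary directory is available in a property named dita.temp.dir.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: NOT the real build-init implementation. -->
<project name="output-dir-sketch" default="show-output-dir">

  <!-- Sample values; in a real build these come from DITA-OT and the user. -->
  <property name="dita.temp.dir" location="temp12345"/>
  <property name="output.dir" location="out"/>

  <target name="show-output-dir">
    <!-- If temp.output.dir.name is set, output goes to a subdirectory of temp;
         otherwise it goes to the usual output directory. -->
    <condition property="dita.output.dir"
               value="${dita.temp.dir}${file.separator}${temp.output.dir.name}"
               else="${output.dir}">
      <isset property="temp.output.dir.name"/>
    </condition>
    <echo>Output files would be written to: ${dita.output.dir}</echo>
  </target>

</project>
```

Saved as a standalone file and run with plain Ant using -Dtemp.output.dir.name=zippy, this should report a path ending in temp12345/zippy. Without that property, it should report the out directory instead.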

For example, I set up my new "zipme" transform type so that it sets temp.output.dir.name to zippy, and then runs the normal dita2html5 target. By doing that, everything that would normally go in the output directory is now in temp12345/zippy/. Initializing that one parameter has redirected all of my output to a clean spot for post-processing – without the need for an alternate temporary directory or the need to mess about in the output directory.

Could you do this another way? Absolutely. But I found it was surprisingly common to do [normal processing] followed by [one little post process step like zipping]. I asked other DITA-OT contributors and found out I was not alone.

Given how often I and others need to do this, it should be easy. And I generally feel that to make things clean, elegant, and most importantly not silly, we should be able to do those operations in the same temporary directory where we do everything else. That's … sort of what the directory is there for, isn't it?

Sample: zipping HTML5

With this parameter, if all I want is to use the default HTML5 build, my plugin needs just 3 files.

  1. plugin.xml: the file every plugin needs to declare itself.
  2. A simple XML file that says where to find any Ant code added by the plugin (I've named it conductor.xml).
  3. The build file with my Ant code. To create a new transform type that returns zipped HTML, I need three targets.
    1. dita2html5zip, which runs an initialization target, followed by the normal HTML5 target, followed by one target to zip the result.
    2. html5zip.init, my initialization target that just sets the new temp.output.dir.name property.
    3. ziphtml5, a target that zips up the result files from dita.output.dir and writes the zip directly to the output directory.
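
For reference, the first two files can be very small. What follows is a sketch based on common DITA-OT 2.x plugin conventions, not the exact contents of the downloadable plugin. The plugin id is borrowed from that download's file name, while the transtype description and the build file name build_html5zip.xml are assumptions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- plugin.xml (sketch) -->
<plugin id="org.metadita.html5zip">
  <!-- The zip build relies on the default HTML5 plugin. -->
  <require plugin="org.dita.html5"/>
  <!-- Declares the transform type; it is served by the dita2html5zip target. -->
  <transtype name="html5zip" extends="html5" desc="HTML5 (zipped)"/>
  <!-- Points at the file that pulls in the plugin's Ant code. -->
  <feature extension="dita.conductor.target.relative" file="conductor.xml"/>
</plugin>
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- conductor.xml (sketch): the root element is only a wrapper for the import. -->
<dummy>
  <!-- Hypothetical file name for the build file described in item 3. -->
  <import file="build_html5zip.xml"/>
</dummy>
```

With a setup like this, the build would be requested by transform type name, for example by passing html5zip as the format to the dita command.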

Please do it for me now

Ok, but first I want to show you how it's done.

The following build file does the bare minimum, meaning it builds HTML5 – exactly the same HTML5 you'd get today, using whatever other properties you've set – and returns a zip file. Because it's the bare minimum, this doesn't let you control the zip file name. It always returns ThisIsFun.zip.

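<?xml version="1.0" encoding="UTF-8"?>
<project xmlns:dita="http://dita-ot.sourceforge.net" name="dita2html5zip">

  <!-- Entry point for the transform type: initialize, build HTML5, then zip. -->
  <target name="dita2html5zip" depends="html5zip.init,dita2html5,ziphtml5"/>

  <!-- Redirect all generated output into a subdirectory of the temp directory. -->
  <target name="html5zip.init">
    <property name="temp.output.dir.name" value="internal_zip_dir"/>
  </target>

  <!-- Zip everything in dita.output.dir and write the zip to the real output directory. -->
  <target name="ziphtml5">
    <zip destfile="${output.dir}${file.separator}ThisIsFun.zip" basedir="${dita.output.dir}"/>
  </target>

</project>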

Uhhh that's not a very good zip name

Well yeah. But I wanted to illustrate how simple this could be.

If I want to use the map name by default – or maybe allow somebody to customize the zip name with a build parameter – I could make the zip target a bit more complicated.

<target name="ziphtml5">
  <condition property="html5.zipname" value="${dita.map.filename.root}.zip">
    <and>
      <isset property="dita.map.filename.root"/>
      <not><isset property="html5.zipname"/></not>
    </and>
  </condition>
  <zip destfile="${output.dir}${file.separator}${html5.zipname}" basedir="${dita.output.dir}"/>
</target>
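
One caveat with the target above: if neither html5.zipname nor dita.map.filename.root is set, Ant leaves the reference unexpanded and the file is literally named ${html5.zipname}. A possible variation, sketched here and not copied from the downloadable plugin, adds a last-resort default. It relies on Ant properties being immutable, so the final property element only takes effect when html5.zipname is still unset. The fallback name is the one the full plugin is described as using later in this post.

```xml
<target name="ziphtml5">
  <!-- Default to the map name, unless the user already supplied html5.zipname. -->
  <condition property="html5.zipname" value="${dita.map.filename.root}.zip">
    <and>
      <isset property="dita.map.filename.root"/>
      <not><isset property="html5.zipname"/></not>
    </and>
  </condition>
  <!-- Last resort: ignored by Ant if html5.zipname has been set by now. -->
  <property name="html5.zipname" value="please.set.html5.zipname.property.zip"/>
  <zip destfile="${output.dir}${file.separator}${html5.zipname}" basedir="${dita.output.dir}"/>
</target>
```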

Just gimme the plugin already

Here's a zip: http://metadita.org/toolkit/org.metadita.html5zip.zip.

Full disclosure: it's a bit more complicated than what I showed above. Specifically:

  1. The zip name defaults to the map name, as in the previous section. It also falls back to the topic name if the input is a topic.
  2. The zip task sets up a new parameter _map.dir.within.temp.zipdir. As you might expect, this is the directory of the map within the new temporary output directory. That is – if there are topics referenced above the map directory, those don't go in my zip, because I want the zip to start at the map level. It's like setting generate.copy.outer=1 for your build.
  3. This plugin defines its own template and Ant extension point depend.html5.postprocess.before.zip. The extension lets you use the html5zip transform type together with your own post-processing. It works just like preprocessing extension points in the rest of DITA-OT, by adding a dependency that will always run as part of the "html5zip" transform type. Anything you add here will run after the normal HTML5 build but before the results are zipped; this means you can add to or post-process everything in dita.output.dir before the zip is created.
  4. If it can't figure out an input map or topic name, the zip name defaults to please.set.html5.zipname.property.zip rather than ThisIsFun.zip.
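
As a sketch of how the extension point in item 3 might be used, a second plugin could register its own target to run before the zip is created. Everything here except the extension point name is hypothetical: the plugin id com.example.html5zip.extras, the target name, the file names, and the marker file it writes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- plugin.xml of a hypothetical add-on plugin -->
<plugin id="com.example.html5zip.extras">
  <require plugin="org.metadita.html5zip"/>
  <!-- conductor.xml would import the build file shown below. -->
  <feature extension="dita.conductor.target.relative" file="conductor.xml"/>
  <!-- Ask for example.add-build-info to run after HTML5 is built, before zipping. -->
  <feature extension="depend.html5.postprocess.before.zip" value="example.add-build-info"/>
</plugin>
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Build file of the hypothetical add-on plugin -->
<project name="com.example.html5zip.extras">
  <target name="example.add-build-info">
    <!-- Anything written to dita.output.dir at this point ends up inside the zip. -->
    <echo file="${dita.output.dir}${file.separator}build-info.txt">Generated by the html5zip transform type.</echo>
  </target>
</project>
```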

Adapting the plugin to other default output formats

The most obvious candidate for this sort of extension is eclipsehelp, which generates a whole mess of files from a complicated string of targets. But, that's not needed thanks to the new parameter that lets DITA-OT generate an Eclipse JAR file. (Why, yes, the implementation of that Eclipse JAR feature does use exactly the process I'm writing about here. And yes, this is a plug for yet another new DITA-OT 2.5 feature.)

For any other output format, you could just take the plugin above and tweak it to return a zip of your xhtml, troff, or even PDF. For example, the following Ant code sets up an xhtmlzip transform type that creates and zips XHTML instead of HTML5.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns:dita="http://dita-ot.sourceforge.net" name="dita2xhtmlzip">

  <target name="dita2xhtmlzip" depends="xhtmlzip.init,dita2xhtml,zipxhtml"/>

  <target name="xhtmlzip.init">
    <property name="temp.output.dir.name" value="internal_zip_dir"/>
  </target>
  
  <target name="zipxhtml">
    <zip destfile="${output.dir}${file.separator}ThisIsFun.zip" basedir="${dita.output.dir}"/>
  </target>

</project>

Why yes, I really did just do a search/replace and change html5 to xhtml.

Why yes, it really was that simple.

Adapting the plugin to custom output formats

This could be a bit trickier.

If your custom transform type doesn't actually create any output files of its own, then you can use the same process as above: just initialize the temp.output.dir.name parameter at the start and add a new zip target at the end.

If you do generate any output files, it may take a few days of work to update your plugin. Here's the process:

  1. For every Ant target that uses output.dir to generate or copy a file, change output.dir to dita.output.dir. A nice search/replace tool works wonders here.
  2. Wait a few days, just to draw things out. I suggest reading a book.
  3. As above, initialize the temp.output.dir.name parameter at the start of your transform type and add a new zip target at the end.
Note: If your eyes glazed over at the "few days" part, or you went into panic mode and didn't read that list closely, please go back and read step 2 again. Picking out the proper book is critical here.
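
Here is what step 1 might look like in practice, using a made-up target from a hypothetical custom transform type. The target name, the mytrans.resources.dir property, and the file pattern are all invented for illustration.

```xml
<!-- Hypothetical target that copies extra CSS files into the build result. -->
<target name="mytrans.copy-extras">
  <!-- Was: todir="${output.dir}". Now the files land wherever DITA-OT 2.5
       is collecting output, whether that is the temp subdirectory or the real output directory. -->
  <copy todir="${dita.output.dir}">
    <fileset dir="${mytrans.resources.dir}" includes="**/*.css"/>
  </copy>
</target>
```

Because dita.output.dir falls back to the specified output directory when temp.output.dir.name is not set, a change like this should leave ordinary, unzipped builds on DITA-OT 2.5 behaving as before.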

For my own purposes, I had a number of custom transform types that did little more than initialize several parameters, run a normal XHTML or HTML5 build, and maybe generate a couple extra output files. Updating those to use this new feature took just a few minutes, and I no longer have to worry about any new output files that DITA-OT generates in the future.

Summary

  • There's a new feature in DITA-OT 2.5 that simplifies post-processing of content.
  • That processing can be (and often is) as simple as "turn this big set of files into one zip".
  • The plugin linked above is meant as a tutorial or sample - if you need to zip up output, take it and tweak the transform type as needed.
  • If you just want the current HTML5 output, but need it zipped … use the plugin! It works [with DITA-OT 2.5 or later]! Yay!
  • If you have your own transform types, and need to add post-processing or zipping, the new temp.output.dir.name parameter is here to make your life easier – and it really shouldn't take you long to work it in. (Just remember you'll need to be on DITA-OT 2.5 for it to work.)

Good luck, and happy plugin-ing!