Why would I want a new transform type?

And what could a new transform type possibly have to do with having to type in an annoyingly long set of options every time I want to publish?

When I first entered this industry (gulp) late in the last millennium, I was supporting a huge number of teams across IBM, working on different types of content for use in different situations. (That's still true, by the way.) There were only a few constants:

  • Everybody wanted a book or PDF
  • (Almost) Everybody wanted HTML
  • IBM set out a set of style rules
  • People wanted or needed to break those rules

As a result, we ended up with a very small set of transform types, with a growing set of "do this different" options. For example: "I want HTML, but I want labels on images, and I want to add this CSS file, and change the extension from html to htm, and I need a TOC file for Eclipse, and bundle the output as a file named doc.zip, and..."

This sort of process is probably familiar to anybody still reading. Anecdotally, for many in this situation, the list of "but I want..." options doesn't change very often.

Resetting the context: DITA Open Toolkit

In DITA-OT, the transform type you want is just another option. (Literally – the Ant property has been called transtype going back to DITA-OT 1.0 – though with DITA-OT 2.x the -f parameter is suggested for command line use.) That said, because each default transform type corresponds to a single output format or platform, it can be hard to think of this as "just another option". I've worked with DITA-OT since version 1.0, and my earlier experience formed such a rut that for years I had a hard time imagining a build as anything other than "(Here is how you pick HTML or PDF) + (Here is where you specify your options)."

Of course typing all of those options in every time I want to build can be a pain - especially with an Ant command line, which was the only real way to run a build in the early days:

ant -f build.xml -Dtranstype=xhtml -Dargs.input=myMapFile.ditamap -Doutput.dir=OUT -Dargs.draft=yes -Dargs.outext=htm -Ddita.input.valfile=myFilters.ditaval -Dargs.css=myBrand.css -Dargs.gen.task.lbl=YES -Dargs.ftr=standardFooter.xml

So how do we make setting all of those options easier? Especially if, as commonly happens, the only one that ever changes is the input file?

Quick and dirty: the batch file, property file, or Ant script

10 or 5 or maybe even 3 years ago, I'd have recommended these go in a batch file. If you're clever, you could make the input file a parameter … or you can just copy your batch file for each new input map, and change the args.input parameter accordingly. Running the build becomes a simple double-click or a one-word command line action.

Similarly – you can set up a properties file to store common options; the properties file can consolidate most (but not quite all) options into a single parameter. The DITA-OT documentation has a good tutorial on setting up properties files for use specifically with HTML5 output.

Setting up an Ant build is another way to go. It's not uncommon to see a build like this:

  
<!-- Wrapper target: runs DITA-OT's own build.xml with a fixed set of options. -->
<target name="xhtml" depends="integrate" description="Build XHTML">
  <ant antfile="${dita.dir}${file.separator}build.xml" target="init">
    <!-- Input map and filter file -->
    <property name="args.input" value="myInputDoc.ditamap"/>
    <property name="args.filter" value="usualFilterFile.ditaval"/>
    <!-- Output format and its options -->
    <property name="transtype" value="xhtml"/>
    <property name="args.css" value="myBrand.css"/>
    <property name="args.copycss" value="yes"/>
    <property name="args.draft" value="yes"/>
  </ant>
</target>

The Ant build approach is slightly harder to run than a batch file, but it can be a very effective way to set up multiple builds. Add in a couple of variables, and it becomes easy to manage a series of related builds. This is what we used to build the DITA Specification, where 4 packages times 4 formats resulted in 16 slightly different conversions.
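
As a rough sketch of what "adding a couple of variables" can look like, the wrapper below uses an Ant macrodef so that each package/format pair becomes a one-line call. The macro name dita-build, the package names, and the output layout are all hypothetical, and it assumes dita.dir is supplied the same way as in the target above. It is not the actual build used for the DITA Specification.

```xml
<project name="related-builds" default="all" basedir=".">

  <!-- Hypothetical macro: one DITA-OT run per package/format pair. -->
  <macrodef name="dita-build">
    <attribute name="package"/>
    <attribute name="format"/>
    <sequential>
      <ant antfile="${dita.dir}${file.separator}build.xml" target="init">
        <!-- Assumes each package has a map named <package>.ditamap -->
        <property name="args.input" value="${basedir}/@{package}.ditamap"/>
        <property name="transtype" value="@{format}"/>
        <!-- Separate output directory per combination, so runs never collide -->
        <property name="output.dir" value="${basedir}/out/@{package}/@{format}"/>
        <property name="args.filter" value="${basedir}/usualFilterFile.ditaval"/>
      </ant>
    </sequential>
  </macrodef>

  <target name="all" description="Build every package in every format">
    <dita-build package="packageA" format="xhtml"/>
    <dita-build package="packageA" format="pdf"/>
    <dita-build package="packageB" format="xhtml"/>
    <dita-build package="packageB" format="pdf"/>
  </target>
</project>
```

Adding a package or a format then means adding one line to the all target rather than copying a whole target.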

But … could we do better?

Of course we can do better

You're right, it was a silly question. Obviously, I wouldn't have asked it if the answer was "no". But let's keep going.

By setting up a new transform type – that is, a new value for the transtype parameter – we can set up a repeatable transform that already defaults to all of our favorite options, usable from anywhere, by anyone with the plugin.

I don't want a new transform, I just want HTML with a few changes!

Here's the mind-shift that comes with making the transform type "just another option". A new transtype isn't necessarily a whole from-scratch transform; it can instead be a minor variation on something that already exists.

Over the last couple years, we (my team in IBM) have shifted from a toolset that hides DITA-OT under the covers to one that directly extends DITA-OT. As this started, I initially set up new transform types that mimic existing ones – "ibmhtml5" is a new transform type that tweaks HTML5 for a specific platform. "ibmpdf" extends pdf2 with our common styles.

There was some early resistance to this – expressed as "we don't need a new transform, it's just PDF!" Alternatively: "But we don't want a new html transform type for every different platform! Do we?" To which my answer is … maybe? Probably? Why not?

Put simply: a new transtype just sends the toolkit down a new path through the processing chain. It can be wildly different from any existing path. For example: "Produce a strangely formatted text file that is consumed by my 20-year-old mainframe program". Or … 99.13% of the build can overlap an existing path. For example: "I want to run a normal HTML5 build, but before I do, set a default for the args.draft parameter".

Did you catch that? A whole new transform type that just sets a default for one single option on an existing transform type. A default that you could still (as always) override with the command line.
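
To make that concrete, here is a minimal sketch of such a plugin. It assumes a DITA-OT release whose plugin.xml supports the transtype element and the ant.import extension point, with the stock org.dita.html5 plugin installed; older releases register transform types through different extension points. The plugin ID com.example.drafthtml5, the transtype name drafthtml5, and the file name build_drafthtml5.xml are invented for illustration. The descriptor registers the new transtype value and pulls in a small Ant file:

```xml
<!-- plugin.xml (hypothetical plugin) -->
<plugin id="com.example.drafthtml5">
  <require plugin="org.dita.html5"/>
  <!-- Register the new transtype value as a variation on html5. -->
  <transtype name="drafthtml5" extends="html5"
             desc="HTML5 with draft content on by default"/>
  <!-- Import the Ant file that defines the dita2drafthtml5 target. -->
  <feature extension="ant.import" file="build_drafthtml5.xml"/>
</plugin>
```

The Ant file then only has to set the default and hand off to the existing HTML5 target:

```xml
<!-- build_drafthtml5.xml (hypothetical) -->
<project name="com.example.drafthtml5">

  <!-- By convention, DITA-OT runs the target named dita2<transtype>. -->
  <target name="dita2drafthtml5" depends="drafthtml5.init, dita2html5"/>

  <target name="drafthtml5.init">
    <!-- Ant properties are immutable: a value already set on the
         command line wins, so this acts only as a default. -->
    <property name="args.draft" value="yes"/>
  </target>
</project>
```

Once the plugin is integrated, a build would pick it with -Dtranstype=drafthtml5, and adding -Dargs.draft=no on the same command line would still take precedence over the default.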

Whoa. And … why?

OK, it's unlikely you'll do this just to set a default for one option. But what about for 6 options? 10? And what if some of those include long path values that are easy to mess up?

Real world versions of the new transform type

Here's a summary of what our "ibmpdf" transform type does:

  1. Set up default values for about 10 options – including several path values, a config file for the formatter, and the args.xsl.pdf option that says what XSL to run.
  2. Run the PDF transform.

The XSL customizations that we have are (relatively) complex, but they don't have to be. Basically, our new transform type is a shorthand that lets us set up a whole bunch of default options without having to remember them every time.
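
The two steps above might be sketched as follows. This is not the real "ibmpdf" plugin: the plugin ID com.example.brandpdf, the file names, and the particular options chosen are placeholders, and it assumes the transtype brandpdf has been registered in that plugin's plugin.xml. It relies on the dita.plugin.<plugin-id>.dir property that DITA-OT defines for each installed plugin, which is what keeps the long path values out of the command line.

```xml
<!-- build_brandpdf.xml (hypothetical) -->
<project name="com.example.brandpdf">

  <!-- Step 1: set defaults. Step 2: run the stock PDF transform. -->
  <target name="dita2brandpdf" depends="brandpdf.init, dita2pdf2"/>

  <target name="brandpdf.init">
    <!-- XSL override, resolved relative to this plugin's install directory -->
    <property name="args.xsl.pdf"
              location="${dita.plugin.com.example.brandpdf.dir}/xsl/brandpdf.xsl"/>
    <!-- Default filter file shipped with the plugin (placeholder name) -->
    <property name="args.filter"
              location="${dita.plugin.com.example.brandpdf.dir}/filters/default.ditaval"/>
    <!-- Plain option defaults; any of these can still be overridden per build -->
    <property name="args.draft" value="no"/>
    <property name="args.gen.task.lbl" value="YES"/>
  </target>
</project>
```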

The "ibmxhtml" type is also simple - it sets only three options (one of them the XSL override), builds XHTML, and adds a step to copy our own CSS.
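
A comparable sketch for the XHTML case adds one target after the stock build instead of only before it. Again, the plugin ID com.example.brandxhtml, the directory layout, and the three options shown are assumptions made for illustration rather than the contents of "ibmxhtml", and it assumes output.dir has been resolved by DITA-OT by the time the copy step runs.

```xml
<!-- build_brandxhtml.xml (hypothetical) -->
<project name="com.example.brandxhtml">

  <!-- Defaults first, then the stock XHTML build, then the extra copy step. -->
  <target name="dita2brandxhtml"
          depends="brandxhtml.init, dita2xhtml, brandxhtml.copy-css"/>

  <target name="brandxhtml.init">
    <!-- Three defaults, one of them the XSL override -->
    <property name="args.xsl"
              location="${dita.plugin.com.example.brandxhtml.dir}/xsl/brandxhtml.xsl"/>
    <property name="args.outext" value="htm"/>
    <property name="args.draft" value="no"/>
  </target>

  <target name="brandxhtml.copy-css">
    <!-- Copy the plugin's own stylesheets into the finished output -->
    <copy todir="${output.dir}">
      <fileset dir="${dita.plugin.com.example.brandxhtml.dir}/css" includes="*.css"/>
    </copy>
  </target>
</project>
```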

"ibmhtml5" is the most complex - because rather than building the usual HTML5 documents, it places all of the rendered files in a ZIP, so more Ant processing is needed. Still - it's basically HTML5 with a twist.

Summary

Creating one might sound like a big deal, but it usually isn't, so don't underestimate the value of a new "transform type".

Within IBM, we've now got a whole series of new transform types that build on existing values – "ibmpdf", "ibmhtml5", "ibmtroff", and so on. In most cases, these just set up a few default parameters that apply to most builds. We've also got a few more specific transform types, such as one for Quick Start Guides, that collects a slightly different set of common options for a slightly different PDF.

The DITA Thoughts on this web site are built the same way - with a transform type of "ditathoughts" that sets up CSS values, and uses XSL to wrap content in my default headers.

In the end, a new transform type can effectively become just a shorthand for "all the option values I usually want". Yes, it's possible to go overboard. You could create 100 new "transform types" that all represent slightly different combinations of defaults. That would probably be silly – even sillier than having a "happyhtml" transform type. But having 2, 5, or even 10 transform types – that all correspond to a single, memorable, targeted platform or format – is really a good practice that can save you headaches when it's time to publish.