Why would I want a new transform type?
And what could a new transform type possibly have to do with having to type in an annoyingly long set of options every time I want to publish?
When I first entered this industry (gulp) late in the last millennium, I was supporting a huge number of teams across IBM, working on different types of content for use in different situations. (That's still true, by the way.) There were only a few constants:
- Everybody wanted a book or PDF
- (Almost) Everybody wanted HTML
- IBM set out a set of style rules
- People wanted or needed to break those rules
As a result, we ended up with a very small set of transform types, with a growing set of "do this different" options. For example: "I want HTML, but I want labels on images, and I want to add this CSS file, and change the extension from html to htm, and I need a TOC file for Eclipse, and bundle the output as a file named doc.zip, and..."
This sort of process is probably familiar to anybody still reading. Anecdotally, for many in this situation, the list of "but I want..." options doesn't change very often.
Resetting the context: DITA Open Toolkit
In DITA-OT, the transform type you want is just another option. (Literally – the Ant property has been called transtype going back to DITA-OT 1.0 – though with DITA-OT 2.x the -format parameter is suggested for command line use.)
That said, because each default transform type
corresponds with a single output format or platform, it can be hard to think of this as "just
another option". I've worked with DITA-OT since version 1.0, and my earlier experience formed such a
rut that for years I had a hard time imagining a build as anything other than "(Here is how you
pick HTML or PDF) + (Here is where you specify your options)."
Of course typing all of those options in every time I want to build can be a pain - especially with an Ant command line, which was the only real way to run a build in the early days:
ant -f build.xml -Dtranstype=xhtml -Dargs.input=myMapFile.ditamap -Doutput.dir=OUT -Dargs.draft=yes -Dargs.outext=htm -Ddita.input.valfile=myFilters.ditaval -Dargs.css=myBrand.css -Dargs.gen.task.lbl=YES -Dargs.ftr=standardFooter.xml
So how do we make setting all of those options easier? Especially if, as commonly happens, the only one that ever changes is the input file?
Quick and dirty: the batch file, property file, or Ant script
10 or 5 or maybe even 3 years ago, I'd have recommended these go in a batch file. If you're
clever, you could make the input file a parameter … or you can just copy your batch file for each
new input map, and change the args.input
parameter accordingly. Running the build
becomes a simple double-click or a one-word command line action.
Similarly – you can set up a properties file to store common options; the properties file can consolidate most (but not quite all) options into a single parameter. The DITA-OT documentation has a good tutorial on setting up properties files for use specifically with HTML5 output.
Setting up an Ant build is another way to go. It's not uncommon to see a build like this:
<target name="xhtml" depends="integrate" description="Build XHTML">
<ant antfile="${dita.dir}${file.separator}build.xml" target="init">
<property name="args.input" value="myInputDoc.ditamap"/>
<property name="args.filter"
value="usualFilterFile.ditaval"/>
<property name="transtype" value="xhtml"/>
<property name="args.css" value="myBrand.css"/>
<property name="args.copycss" value="yes"/>
<property name="args.draft" value="yes"/>
</ant>
</target>
The Ant build approach is slightly harder to run than a batch file - but can be a very effective way to set up multiple builds. Add in a couple variables, and it becomes easy to manage a series of related builds. This is what we used to build the DITA Specification, where 4 packages times 4 formats resulted in 16 slightly different conversions.
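As a rough sketch of the "couple of variables" idea, an Ant macro can take the map and the format as parameters and call the toolkit once per combination. The macro name, map names, output layout, and the DITA_DIR environment variable are all assumptions made for illustration; this is not the build that was used for the DITA Specification.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project name="related-builds" default="all" basedir=".">

  <!-- Assumption: DITA_DIR points at the DITA-OT install; override with -Ddita.dir=... -->
  <property environment="env"/>
  <property name="dita.dir" location="${env.DITA_DIR}"/>

  <!-- Hypothetical macro: one toolkit run per map/format pair -->
  <macrodef name="build-doc">
    <attribute name="map"/>
    <attribute name="format"/>
    <sequential>
      <ant antfile="${dita.dir}${file.separator}build.xml" target="init">
        <property name="args.input" value="@{map}"/>
        <property name="transtype" value="@{format}"/>
        <!-- Keep each format's output in its own folder -->
        <property name="output.dir" value="out${file.separator}@{format}"/>
        <property name="args.filter" value="usualFilterFile.ditaval"/>
      </ant>
    </sequential>
  </macrodef>

  <!-- Two hypothetical maps times two formats gives four builds -->
  <target name="all">
    <build-doc map="packageA.ditamap" format="xhtml"/>
    <build-doc map="packageA.ditamap" format="pdf"/>
    <build-doc map="packageB.ditamap" format="xhtml"/>
    <build-doc map="packageB.ditamap" format="pdf"/>
  </target>
</project>
```

Shared options live in the macro, so adding another package or format is one more line in the "all" target.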
But … could we do better?
Of course we can do better
You're right, it was a silly question. Obviously, I wouldn't have asked it if the answer was "no". But let's keep going.
By setting up a new transform type – that is, a new value for the transtype parameter – we can set up a repeatable transform that already defaults to all of our favorite options, usable from anywhere, by anyone with the plugin.
I don't want a new transform, I just want HTML with a few changes!
Here's the mind-shift that comes with making the transform type "just another option". A new transtype isn't necessarily a whole from-scratch transform; it can instead be a minor variation on something that already exists.
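To make that concrete, here is a minimal sketch of a plug-in descriptor that declares such a variation, assuming the DITA-OT 2.x plug-in syntax. The plug-in ID, the transtype name "drafthtml5", and the build file name are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- plugin.xml: hypothetical plug-in ID and transtype name, for illustration only -->
<plugin id="com.example.drafthtml5">
  <!-- Make sure the HTML5 transform we build on is installed -->
  <require plugin="org.dita.html5"/>
  <!-- Declare the new value for the transtype option, based on html5 -->
  <transtype name="drafthtml5" extends="html5" desc="HTML5 with draft defaults"/>
  <!-- Pull this plug-in's Ant targets into the toolkit build -->
  <feature extension="ant.import" file="build_drafthtml5.xml"/>
</plugin>
```

The descriptor only registers the name; the Ant file it imports is where the defaults get set.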
Over the last couple years, we (my team in IBM) have shifted from a toolset that hides DITA-OT
under the covers to one that directly extends DITA-OT. As this started, I initially set up new
transform types that mimic existing ones – "ibmhtml5" is a new transform type that tweaks HTML5 for
a specific platform. "ibmpdf" extends pdf2
with our common styles.
There was some early resistance to this – expressed as "we don't need a new transform, it's just PDF!" Alternatively: "But we don't want a new html transform type for every different platform! Do we?" To which my answer is … maybe? Probably? Why not?
Put simply: a new transtype just sends the toolkit down a new path through
the processing chain. It can be wildly different from any existing path. For example: "Produce a
strangely formatted text file that is consumed by my 20-year-old mainframe program". Or … 99.13% of
the build can overlap an existing path. For example: "I want to run a normal HTML5 build, but before
I do, set a default for the args.draft
parameter".
Did you catch that? A whole new transform type that just sets a default for one single option on an existing transform type. A default that you could still (as always) override with the command line.
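A sketch of what that one-default transform type might look like as an Ant file inside a plug-in. It assumes a hypothetical transtype named "drafthtml5" that has been registered in the plug-in's plugin.xml, and relies on the toolkit convention that a transtype named X is run through a target named dita2X.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical build file for an illustrative "drafthtml5" transtype -->
<project name="com.example.drafthtml5">

  <!-- Entry point: set our defaults first, then run the normal HTML5 build -->
  <target name="dita2drafthtml5" depends="drafthtml5.init, dita2html5"/>

  <target name="drafthtml5.init">
    <!-- Ant properties are immutable once set, so a value passed on the
         command line (for example -Dargs.draft=no) still wins over this default -->
    <property name="args.draft" value="yes"/>
  </target>
</project>
```

Everything after the init target is the stock HTML5 processing; the only new behavior is the default.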
Whoa. And … why?
OK, it's unlikely you'll do this just to set a default for one option. But what about for 6 options? 10? And what if some of those include long path values that are easy to mess up?
Real world versions of the new transform type
Here's a summary of what our "ibmpdf" transform type does:
- Set up default values for about 10 options – including several path values, a config file for the formatter, and the args.xsl.pdf option that says what XSL to run.
- Run the PDF transform.
The XSL customizations that we have are (relatively) complex, but they don't have to be. Basically, our new transform type is a shorthand that lets us set up a whole bunch of default options without having to remember them every time.
The "ibmxhtml" type is also simple - it sets only three options (one of them the XSL override), builds XHTML, and adds a step to copy our own CSS.
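The same pattern stretches to "a few defaults plus one extra step". The sketch below is not the actual "ibmxhtml" build: the transtype name, file names, and the particular three options are assumptions chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical build file for an illustrative "brandxhtml" transtype -->
<project name="com.example.brandxhtml">

  <!-- Defaults first, then the stock XHTML build, then our extra copy step -->
  <target name="dita2brandxhtml"
          depends="brandxhtml.init, dita2xhtml, brandxhtml.copy-css"/>

  <target name="brandxhtml.init">
    <!-- XSL override; the path is resolved from the plug-in's own install folder -->
    <property name="args.xsl"
              location="${dita.plugin.com.example.brandxhtml.dir}/xsl/brand-xhtml.xsl"/>
    <property name="args.outext" value="htm"/>
    <property name="args.draft" value="yes"/>
  </target>

  <target name="brandxhtml.copy-css">
    <!-- Copy the plug-in's CSS files next to the generated output -->
    <copy todir="${output.dir}">
      <fileset dir="${dita.plugin.com.example.brandxhtml.dir}/css" includes="*.css"/>
    </copy>
  </target>
</project>
```

Because the long path values are built from the plug-in folder, nobody has to type them (or mistype them) at publish time.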
"ibmhtml5" is the most complex - because rather than building the usual HTML5 documents, it places all of the rendered files in a ZIP, so more Ant processing is needed. Still - it's basically HTML5 with a twist.
Summary
While creating one might sound like a big deal, don't underestimate the value of a new "transform type".
Within IBM, we've now got a whole series of new transform types that build on existing values – "ibmpdf", "ibmhtml5", "ibmtroff", and so on. In most cases, these just set up a few default parameters that apply to most builds. We've also got a few more specific transform types, such as one for Quick Start Guides, that collects a slightly different set of common options for a slightly different PDF.
The DITA Thoughts on this web site are built the same way - with a transform type of "ditathoughts" that sets up CSS values, and uses XSL to wrap content in my default headers.
In the end, a new transform type can effectively become just a shorthand for "all the option values I usually want". Yes, it's possible to go overboard. You could create 100 new "transform types" that all represent slightly different combinations of defaults. That would probably be silly – even sillier than having a "happyhtml" transform type. But having 2, 5, or even 10 transform types – that all correspond to a single, memorable, targeted platform or format – is really a good practice that can save you headaches when it's time to publish.