But what if I want a whole new transform dot dot dot?

In my many years talking about DITA-OT transformation types, perhaps the most requested format has been "morse code". I'm pretty sure everybody was joking dot dot dot but why should I let that stop me?

If it looks like a new format, and it quacks like a new format...

Not long ago I wrote a couple of tutorials about creating new transformation types that don't really feel like a new output format. This time, I'm going to describe how to set up a new process that really is a new thing: how to convert DITA content to Morse code.

Oh, c'mon, that's useless. Why would I want to read any further?

I assume you mean, "That's useless unless you give me something that converts it to audio". Well, perhaps. But: if you need audio, a quick online search shows there are a lot of tools to help, so why go there?

My goal – and the reason you might want to read further – is to illustrate how to take some existing DITA-OT pieces, add in a couple of simple new steps, and get something entirely new. Once you can do that, you can swap the Morse steps out for your own code that does … well, pretty much anything you want with DITA content.

If you were a tree, what kind of tree would...

Obviously, the DITA Specification defines DITA markup. It also defines a lot of features along with that markup - features like content referencing and key resolution. With those features (and more) already supported in existing transformations, there is no reason to re-implement them for every new type of output.

The toolkit groups processing for those features in a thing called preprocess. It's not only evaluating the standard features - it can also copy images around, insert debug information, and potentially do all sorts of other things that are useful across most formats. But once it's done, the basic stuff that really needs resolution has really been resolved.

Now, we'd all agree DITA-OT processing shouldn't be compared to a tree, but I'm going to do it anyway.

In every normal transformation type, the preprocess is the trunk of a fully grown tree. It gets you off the ground. It does all the hard work and holds everything else up. Only once you get well up off the ground is it time to branch off.

Branches can go off in any direction. Some of them only need to be big enough to hold up a tiny finch. (I would have suggested a puffin, but … I've never seen a puffin in a tree, and, well … it's DITA.) Other branches (PDF2 anyone?) might be big enough to support the entire Swiss Family Robinson, with additional extended branches of their own (PDF2 for FOP, PDF2 for Antenna House, PDF2 for RenderX).

Thankfully, for this tutorial, we're sticking to finch-sized branches.

Enough with the trees, let's explore the forest

My new finch-sized plugin will have just three files:

org.metadita.morse\plugin.xml
org.metadita.morse\morsebuild.xml
org.metadita.morse\xsl\morse.xsl

We'll start with plugin.xml. As with any DITA-OT plugin, it needs to declare an ID; because this creates a new transformation type, it also needs to declare that transformation type and the Ant build file.

<?xml version="1.0" encoding="UTF-8"?>
<plugin id="org.metadita.morse">
  <transtype name="morse" desc="-- --- .-. ... . / -.-. --- -.. ."/>
  <feature extension="dita.conductor.target.relative" file="morsebuild.xml"/>
</plugin>
Tip: I like to give my Ant build files a name that reflects their purpose. This is not required. My main reason is: if something goes wrong as I'm developing, it's a lot easier to debug a message like "error on line X of morsebuild.xml" than one like "error on line X of build.xml". With the first message, I know exactly where to look, while the second message could indicate a problem in DITA-OT code (maybe I called a common step wrong?), or a problem in my new build target (typo in my Morse transformation?), or maybe I'm just not testing often enough and forgot that my unrelated plugin was already broken. Many other DITA-OT developers I've talked to just prefer the consistency of naming all Ant build files build.xml. Their loss, I suppose... ;-)

The build file: morsebuild.xml

For any new format named "X", DITA-OT expects your build file to declare a target named "dita2X". This is where you define how to create that format. My "morse" build file thus needs to include a target named dita2morse, which does only four things:

  1. Initialize some properties (described in the next section)
  2. Run build-init to initialize DITA-OT processing + any properties not already set in my own initialization step
  3. Run preprocess to get all the DITA goodies without having to do any extra work
  4. Convert the result of preprocessing into Morse code
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns:dita="http://dita-ot.sourceforge.net" name="dita2morse">

  <!-- Define "morse": first set params in an initialization step, 
       then run preprocess, then call morse.render to make the resulting Morse documents -->
  <target name="dita2morse" depends="morse.init, build-init, preprocess, morse.render"/>

  <target name="morse.init">
    <!-- Hold on, we'll fill this in soon -->
  </target>

  <target name="morse.render">
    <!-- This one too -->
  </target>

</project>

At this point, you can actually install the plugin and run a build. It won't produce Morse output yet, because the morse.render target is empty - but it will run your content through the entire preprocess and report a successful build.

Initializing early: how to tweak preprocess without touching it

One incredibly useful aspect of new DITA-OT transformation types is the ability to initialize your own properties before the actual build kicks off. I did this in the happy html tutorial to set up default CSS and a couple other things. I'll use it here to override a single core DITA-OT property, which is pretty trivial, but also pretty important. (I'll also use it to set a default output extension for my result files.)

If you didn't run the build created in the previous section, go ahead and do so, with the old garage map (docsrc/samples/hierarchy.ditamap) as your input file. As promised, it won't yet produce any Morse output. However - you will end up with the image file image/carwash.jpg in your output directory. This is because DITA-OT assumes new formats will need any referenced images.

I want to turn off the image copy for Morse output. (If I'm limited to dots and dashes, images seem pretty useless.) I can turn off the image copy with a single property in the morse.init target:

  <target name="morse.init">
    <property name="preprocess.copy-image.skip" value="true"/>
    <property name="out.ext" value="txt"/> <!-- While here, why not set a default extension? -->
  </target>
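
This works because Ant properties are immutable: the first value assigned wins, and later attempts to set the same property are quietly ignored. That is why morse.init comes before build-init in the depends list. Here is a minimal standalone sketch of that behavior, with no DITA-OT involved; the project, target, and property names are made up for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project name="property-demo" default="show">

  <!-- Runs first, like morse.init: this value sticks -->
  <target name="early.init">
    <property name="demo.ext" value="txt"/>
  </target>

  <!-- Runs second, like build-init: Ant skips this because demo.ext is already set -->
  <target name="late.defaults">
    <property name="demo.ext" value="html"/>
  </target>

  <!-- Echoes the value that survived -->
  <target name="show" depends="early.init, late.defaults">
    <echo message="demo.ext is ${demo.ext}"/>
  </target>

</project>
```

Save it as something like property-demo.xml and run it with ant -f property-demo.xml; the echo reports txt, not html. Swap the order in depends and the later default wins instead.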

Converting to Morse

Now, I want to generate something that qualifies as "morse". Realistically, there are many ways I could go about this. Most obviously, I could convert my entire map to an audio file. I'm not going to do this. To answer the expected "why not":

  1. It would be complex, and I want a finch-sized branch off of preprocess
  2. It wouldn't be a useful template for anybody looking to create their own new output
  3. I want to get you through this tutorial in a reasonable amount of time, and I've learned that when dealing with audiophiles, nothing is ever perfect

Instead, I'm going to do a simple document in → document out mapping, similar to the default HTML transformations. This scenario seems fairly common, and it still allows for easy single-document output by setting chunk="to-content" on your root map. With that in mind, I'll use XSLT to convert each individual topic document into a single output file full of (meaningful) dots and dashes.

Restriction: No, I'm not going to worry about generated text like "Note" or "Restriction". Because that would make the code sample longer.
Restriction: I'm also not going to worry about characters that do not have Morse representation, or even some punctuation characters that do. After all, this is just a slightly silly tutorial that will hopefully help you with your own work; it's not a real, production-level Morse rendering plugin. Blah blah blah <legalNotice>use at your own risk not responsible for erroneous telegraphy</legalNotice> and so on.

My XSLT processes everything with the "text-only" mode that ships with the toolkit. This (among other things) ensures that you pick up alternate text for any images, while dropping elements like index terms that are not typically rendered inline. After getting a text-only version of each DITA document, it uses a character map to convert to Morse.

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:import href="plugin:org.dita.base:xsl/common/dita-utilities.xsl"/>
  <xsl:import href="plugin:org.dita.base:xsl/common/output-message.xsl"/>
  
  <xsl:variable name="msgprefix" select="'DOTX'"/>
  
  <xsl:character-map name="morse">
    <xsl:output-character character=" " string=" / "/><xsl:output-character character="a" string=".- "/>
    <xsl:output-character character="b" string="-... "/><xsl:output-character character="c" string="-.-. "/>
    <xsl:output-character character="d" string="-.. "/><xsl:output-character character="e" string=". "/>
    <xsl:output-character character="f" string="..-. "/><xsl:output-character character="g" string="--. "/>
    <xsl:output-character character="h" string=".... "/><xsl:output-character character="i" string=".. "/>
    <xsl:output-character character="j" string=".--- "/><xsl:output-character character="k" string="-.- "/>
    <!-- Yes, this code style is ugly, but it condenses lines for the sake of this blog post -->
    <xsl:output-character character="l" string=".-.. "/><xsl:output-character character="m" string="-- "/>
    <xsl:output-character character="n" string="-. "/><xsl:output-character character="o" string="--- "/>
    <xsl:output-character character="p" string=".--. "/><xsl:output-character character="q" string="--.- "/>
    <xsl:output-character character="r" string=".-. "/><xsl:output-character character="s" string="... "/>
    <!-- You're not really checking these for accuracy, are you? -->
    <xsl:output-character character="t" string="- "/><xsl:output-character character="u" string="..- "/>
    <xsl:output-character character="v" string="...- "/><xsl:output-character character="w" string=".-- "/>
    <xsl:output-character character="x" string="-..- "/><xsl:output-character character="y" string="-.-- "/>
    <xsl:output-character character="z" string="--.. "/><xsl:output-character character="1" string=".---- "/>
    <xsl:output-character character="2" string="..--- "/><xsl:output-character character="3" string="...-- "/>
    <xsl:output-character character="4" string="....- "/><xsl:output-character character="5" string="..... "/>
    <xsl:output-character character="6" string="-.... "/><xsl:output-character character="7" string="--... "/>
    <xsl:output-character character="8" string="---.. "/><xsl:output-character character="9" string="----. "/>
    <xsl:output-character character="0" string="----- "/><xsl:output-character character="." string=".-.-.- "/>
    <xsl:output-character character="," string="--..-- "/><xsl:output-character character=":" string="---... "/>
    <xsl:output-character character="?" string="..--.. "/><xsl:output-character character="'" string=".----. "/>
    <xsl:output-character character="-" string="-....- "/><xsl:output-character character="/" string="-..-. "/>
    <xsl:output-character character="@" string=".--.-. "/>
  </xsl:character-map>
  
  <xsl:output use-character-maps="morse" method="text"/>

  <xsl:template match="/">
    <xsl:variable name="file-as-string">
      <xsl:apply-templates mode="text-only"/>
    </xsl:variable>
    <xsl:value-of select="normalize-space(lower-case($file-as-string))"/>
  </xsl:template>

</xsl:stylesheet>
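
If you want to see the character map do its thing before wiring it into the toolkit, here is a trimmed-down sketch with no DITA-OT imports, so any XSLT 2.0 processor (Saxon, for example) can run it on its own. The map name morse-mini and the three-entry map are my own shorthand for illustration, not part of the plugin:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <!-- Just enough of the alphabet to call for help -->
  <xsl:character-map name="morse-mini">
    <xsl:output-character character=" " string=" / "/>
    <xsl:output-character character="s" string="... "/>
    <xsl:output-character character="o" string="--- "/>
  </xsl:character-map>

  <!-- The map is applied at serialization time, after the template has run -->
  <xsl:output use-character-maps="morse-mini" method="text"/>

  <xsl:template match="/">
    <!-- Flatten the document to one lowercase string with single spaces -->
    <xsl:value-of select="normalize-space(lower-case(string(.)))"/>
  </xsl:template>

</xsl:stylesheet>
```

Feed it a tiny input such as <p>SOS  sos</p> and the two words come out as dots and dashes separated by a slash. Each letter carries its own trailing space, so you get two spaces before the slash; that is the same quirk you will see in the full stylesheet.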

Actually running that conversion step

In theory (or in reality, if you've already copy/pasted the whole sample), you now have XSLT that can convert a DITA document to Morse. So, how do you run it?

In cases like this, I always find it easiest to copy and tweak. You'll find plenty of DITA-OT steps that run a single XSLT over every topic, sending the result to the output directory. By copying one of those and deleting unnecessary parameters, you'll end up with something like this in your morse.render target:

  <target name="morse.render">
    <pipeline message="Convert DITA topic to Morse text files" taskname="xslt">
      <xslt basedir="${dita.temp.dir}"
        destdir="${output.dir}"
        reloadstylesheet="${dita.morse.reloadstylesheet}"
        classpathref="dost.class.path"
        extension="${out.ext}"
        style="${dita.plugin.org.metadita.morse.dir}/xsl/morse.xsl">
        <ditaFileset format="dita" processingRole="normal"/>
        <mapper classname="org.dita.dost.util.JobMapper" to="${out.ext}"/>
        <xmlcatalog refid="dita.catalog"/>
      </xslt>
    </pipeline>
  </target>

The three child elements inside of <xslt> (<ditaFileset>, <mapper>, and <xmlcatalog>) are important to understand.

  1. The <ditaFileset> element is a custom DITA-OT thing that tells Ant which files to send into your XSLT. In this case, it sends in all of your DITA topics with a normal processing role (that is, skipping any non-DITA files + DITA files marked as resource-only).
  2. The <mapper> element causes generated files to use your desired output extension.
  3. The <xmlcatalog> element tells the processor to use DITA-OT catalog processing. Without this, the XSLT above will fail, because Saxon won't be able to process import lines that start with the DITA-OT plugin: syntax. If your build fails with a message like "XTSE0165: java.net.MalformedURLException: unknown protocol: plugin", then you probably forgot to add this <xmlcatalog> element.

It should work now

At this point, you should be able to install the plugin and run a successful build to "morse":

dita-ot-2.4.4> bin/dita --install
dita-ot-2.4.4> bin/dita --input docsrc/samples/hierarchy.ditamap --format morse --output out

The result will be a series of text files in the out directory. The text files will be an irritating mix of dots, dashes, and the occasional slash to break up words.

Making this your own

So, what should you do with this?

One option is to take this plugin as described, and run the result files through some other process that converts them to audio. This would be a huge waste of time, but you're welcome to try.

A more reasonable next step is to tweak this to create an entirely different sort of output. You might also want to add in a step to do something with the map.

Maybe you want to create a text format that doesn't exist? Tweak the initialization step (or remove it entirely), and replace the XSLT file with a more useful one. Maybe you want to convert every DITA topic into some sort of colorful SVG file? I won't judge. After all, you were kind enough to read to the end of this post, and that really took some . ..-. ..-. --- .-. - .

Tip: If you followed the instructions here but something went wrong, originals created during the writing of this blog are stored here in zip form (or find the source at GitHub) for comparison.

Summary, for those who want to make use of this nonsense

  1. You'll probably always want to use the preprocess as-is. I've had a few occasions to modify it with a new transformation type, but that's rare.
  2. After the preprocess, your transformation can be simple like this one, or it can be extremely complicated. If you need to, you can bundle your own Java libraries (like PDF + FOP). You can even run something totally unexpected, like an EXE written in Basic – the only limitations are your imagination and what you can figure out how to run from an Ant target.
  3. Was this whole article just an excuse for me to use the audio file / audiophile pun? … probably not
  4. When you're done, you'll get good Open Source Karma Points if you share your plugin for others to play with. I like to post my experimental plugins at GitHub.
  5. Was this whole blog post just an excuse for me to use dot dot dot in the title? dot dot dot maybe
  6. A lot of the smaller features / enhancements I've put into DITA-OT recently came about when I was writing plugins and found myself thinking "Well this would be easier if..." Which means, if you find yourself thinking along those lines, join the club! Submit a pull request, listen in on our contributor calls, whatever works for you! It's how most of us got started...
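
For point 2, here is a rough sketch of what "run something totally unexpected" might look like, using Ant's built-in exec task in place of the XSLT step. The target name custom.render, the executable my-converter, and its --in / --out flags are all hypothetical placeholders; only the two DITA-OT directory properties come from the build file above:

```xml
<target name="custom.render">
  <!-- my-converter is a made-up executable; swap in a real tool on your PATH -->
  <exec executable="my-converter" failonerror="true">
    <!-- Hypothetical flags: read the preprocessed files, write to the output directory -->
    <arg value="--in"/>
    <arg file="${dita.temp.dir}"/>
    <arg value="--out"/>
    <arg file="${output.dir}"/>
  </exec>
</target>
```

Setting failonerror="true" makes the DITA-OT build fail if the external program returns a non-zero exit code, which is usually what you want while debugging.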