DITA's ugliest feature: volume 1

Recently, the world rose up and demanded to know: WHAT IS DITA'S UGLIEST FEATURE?

OK, not quite the whole world. 13 of the 24 people who voted in a Twitter poll, so ... a clear majority.

It's chunking, right?

Um... what are you talking about?

Chunking. It's DITA's worst feature, right? I hear you complain about it all the time.

...no.

...what? Why not?

I complain about chunking a lot. But it's not the worst feature. Also: you didn't read the poll question very well, did you? I clearly said ugliest feature, not worst.

Bad, ugly, what's the difference?

It would be hard to pick the worst feature in any markup specification – what are the criteria?

Hardest to use? Most complex markup? You can have complex or hard-to-use markup that is extremely useful, or easily automated, and everyone loves it.

Least useful feature? If an attribute or feature is only used by 1% of authors, but the other 99% never see it, it's not causing any harm.

Markup that's easy to abuse? That's a good candidate, but some people are clever enough to figure out how to abuse even the best features – that doesn't make the feature bad.

Sigh. I thought this was going to be an easy, short post: chunking and done.

I would never spoil a ditasplainer with a quick and easy answer.

OK. What makes a feature ugly?

For me: a complex or nonsensical design that's harder than it should be to use (regardless of the overall utility of the feature!).

And the ugliest feature is...

Conref push. Specifically, pushing before or after another element. (Pushing to replace an element isn't too bad.)

But I like that feature!

See above: regardless of the overall utility of the feature. I think it's a neat feature.

I know one person who's used it to do some pretty clever stuff. In his case, product docs were frozen and even translated early, but confidential feature names or product names were subject to change until just before release. Pushing in to replace an old name or code name solved the problem.
Important: This only works in a carefully controlled environment – you can run into lots of grammatical issues when replacing little phrases if you're not careful. Especially, but not exclusively, if you translate.

The DITA-OT documentation also uses conref push in a way I find pretty cool. Our build generates a topic (pulling content from non-DITA sources) and adds stable IDs alongside that pulled content. This in turn lets us push additional content into the topic, annotating that generated information in a safe way where our annotations never get lost. Specifically - we generate the topic with DITA-OT messages, putting them all in a table with an empty fourth column. That fourth column has an @id value based on the message ID, allowing us to push additional information into that column when useful. Result: our message topic can be regenerated every release (or every month, or every day), while annotations authored in DITA are maintained separately.
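
As a rough sketch of that setup (not the actual DITA-OT source): the file names, the EXAMPLE001 message ID, the cell contents, and the choice of conaction="pushreplace" to fill the empty cell are all assumptions I made up for illustration.

```xml
<!-- messages.dita: generated topic (hypothetical file name, IDs, and cell contents) -->
<row>
  <entry>EXAMPLE001</entry>
  <entry>Error</entry>
  <entry>Text of the message</entry>
  <!-- Empty fourth column; its @id is derived from the message ID -->
  <entry id="EXAMPLE001-extra"/>
</row>

<!-- message-notes.dita: hand-authored annotations (also hypothetical) -->
<row>
  <!-- Pushed into the empty cell above whenever the generated topic is rebuilt -->
  <entry conaction="pushreplace"
         conref="messages.dita#messages/EXAMPLE001-extra">Extra advice about this message.</entry>
</row>
```

The point of the split is that the generator only ever touches the first file, so regenerating it cannot overwrite anything in the second.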

That's … it? Conref push? Kind of anti-climactic.

You finally asked an easy, direct question. The harder question is "Why do you consider this to be the ugliest feature?" For that, we have to get into ugly markup details. I assume you want to know about those,

Not rea…

so I'll explain. It goes back to the DITA specification itself wanting to make sure you as an author don't end up with something that's invalid. Which – don't get me wrong – is a great goal, and is one of the reasons that DITA works across implementations and specializations. But (as I've come to realize) the need to make sure nothing breaks can sometimes lead us down a dark, dark road.

That sounds ominous.

It's not that ominous. And again - it's often a bonus. For example: the rules around one-element <=> one-element @conref references are pretty complicated. But they're mostly invisible – authors never really encounter the complexity. The simple "X can only refer to X" rule is all they need to know. But pushing is different … at least with the before/after case.

Pushing to replace is (relatively) easy - the main thing to know is still that X can only refer to X. There are some odd rules and exceptions, but generally, pushing one <thingie> in topic A to replace one <thingie> in topic B isn't going to break a document. Again: the complex rules are hidden.
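
For comparison with what comes next, here is a minimal sketch of pushing to replace. The target topic example.dita and its IDs are hypothetical; the takeaway is that a single element carries both @conaction and @conref along with the new content.

```xml
<!-- example.dita: hypothetical target topic -->
<topic id="example">
  <title>Example</title>
  <body>
    <ol>
      <li id="a">A</li>
      <li id="b">B</li>
      <li id="c">C</li>
    </ol>
  </body>
</topic>

<!-- Somewhere in the pushing topic: X (an li) refers to X (an li) -->
<ol>
  <li conaction="pushreplace" conref="example.dita#example/b">THIS REPLACES B</li>
</ol>
```

The target list had three items before the push and has three items after it, which is why nothing can become invalid here.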

Pushing before or after is harder.

If you want to be sure you're safe pushing one <thingie> in topic A before or after some other <thingie> in topic B, then you need some way to verify more than one <thingie> is even allowed over in topic B.

Uhhh…

Remember: for better or worse, when DITA gives you a feature to do CoolThings, it wants to make sure you're not broken after you evaluate CoolThings.

If you're only allowed to have one <thingie> in topic B, tools are written with that in mind. If a feature suddenly lets you shove in a second <thingie> where it wasn't allowed, those tools might break, throw out exceptions or pointer errors, or otherwise cause headaches in your workday.

I still don't understand why this is ugly.
CAUTION: If you're not particularly technical, you're about to lose interest and close this browser tab. You have been cautioned.
The original design logic is roughly:
  1. OK, we need to let you push an element somewhere. We'd better make sure that you don't break anything, like giving your topic a second title! How do we do that?
  2. Well … obviously if you're going to end up with 2 of something in topic B, then we should verify that 2 of something is already allowed here in topic A.
  3. ...which means we need to have 2 <thingie> elements in topic A, in order for you to push just one of them to topic B. Erg... that's gonna be kind of confusing. But not sure what else to do.
    Note: You can have 2 <thingie> inside some elements but that doesn't mean you can have 2 of them inside every element. See the specification for even more rules to handle that.
  4. So, we have two <thingie> elements, but we only want to push one. So … how do we do that?
  5. Obviously one element is just there as some sort of marker … so let's label that with conaction="mark"! But … that @conaction (conref action) attribute is also where we would declare what we want to happen with the pushed element (conaction="pushbefore" or conaction="pushafter") … so … hmm. This is gonna be ugly, right? [Editor's note: THIS IS WHERE I START TO SAY: I TOLD YOU SO!]
  6. So, one of your two elements is the marker, and the other is the … pusher? OK. Clearly one of those should use @conref to refer to the target. I guess … in this scenario … the marker represents the target element. So … the marker element should be where you have the @conref, and the pusher element holds the pushed content?
  7. And the content … well … if you're pushing after, then the pusher element obviously has to come after the marker element? And if you're pushing before, the pusher element should come before the marker element? And that won't be confusing?
  8. Cool! We have a design! Spread across two elements, with several attributes required, and the order of your "conref push" content varies depending on whether you're pushing before or after! That's not confusing at all! [Editor's note: I don't use this feature very often … meaning every time I use it I have to look up the syntax. Maybe I'll remember it better after this giant rant.]

I remember near the end of DITA 1.2 development – where this was added along with many, many other features – there was a telling comment from one member of the OASIS DITA Technical Committee. I won't name him, but he'd had a lot of experience with content standards, and was not a part of the design meetings for this. When he reviewed it near the end, his comment was something like … "Wait … is this really how it's going to work? Really?"

That … um … I don't know what to say at this point.

Then I'll move on to the markup, thank you.

If you didn't follow the abstract design logic above, you can see examples of both pushing before and pushing after in the specification.

As a short example … this would work to push list items both before and after a list item in another topic. Note that there's a marker and a pusher for each condition:
<ol>
  <li conaction="pushbefore">THIS GOES BEFORE B</li>
  <li conaction="mark" conref="example.dita#example/b"/>
</ol>
<ol>
  <li conaction="mark" conref="example.dita#example/b"/>
  <li conaction="pushafter">THIS GOES AFTER B</li>
</ol>
But if you didn't have that example right there in front of you, and you wrote your markup like this:
<ol>  <!-- DON'T COPY THIS ONE. IT WON'T WORK. -->
  <li conaction="pushafter">THIS GOES AFTER B</li>
  <li conaction="mark" conref="example.dita#example/b"/>
</ol>
Or this:
<ol>  <!-- DON'T COPY THIS ONE. IT ALSO WON'T WORK. -->
  <li conaction="mark" conref="example.dita#example/b">THIS GOES AFTER B</li>
  <li conaction="pushafter"/>
</ol>
Or this:
<ol>  <!-- DON'T COPY THIS ONE. THAT'S RIGHT, IT WON'T WORK EITHER. -->
  <li conaction="mark"/>
  <li conaction="pushafter" conref="example.dita#example/b">THIS GOES AFTER B</li>
</ol>

Would you have any idea why it failed to resolve? No. No, you wouldn't. Because this is complex markup and it's not intuitive. You can't honestly say all 3 of those examples make less sense than the working one.

On the rare occasion I need to push before or after, I usually get the syntax wrong at least once before looking it up … which in turn discourages me from using this feature.

Now that I think about it, that's a pretty good summary of what makes a feature ugly.

Tip: In case the above is mistaken for a tutorial (rather than just an illustration of the markup), I want to point out that Kris Eberlein wrote an entire tutorial on this feature soon after it came out. The markup remains the same, so if you were hoping I'd explain this thing rather than whine about it, please take a look at her DITA 1.2 overview of conref push.
So, should we get rid of it?

No! Even though it's ugly, it's also extremely useful or even critical for some situations. That's why we have it!

So, should we redesign it?

Ideally, but I don't know a design that's both intuitive and continues to ensure your result is valid. I think all 4 examples above are equally plausible so switching to a different one won't help anybody.

The only thing I can think of to make the markup easier would be to
Warning: DITA heresy ahead
… give up on ensuring validity. Just allow you to use one element to push after, and assume authors will know what they're doing. For example, just leave it at this, without the "marker" element:
<li conaction="pushafter" conref="example.dita#example/b">THIS GOES AFTER B</li>
That sounds … good? Can we just do that?

Maybe, but probably not?

Lately I've been almost all-in on loosening up the language to make things like this simpler. But then yesterday I was in a meeting where some managers expressed concern about how much time authors have to spend debugging bad builds or bad output instead of … you know, writing. The current design explicitly and successfully reduces that debugging time, at the expense of dealing with bad syntax. (At least, when you get the before/after marker/pusher syntax right.)

So how high is the debugging cost if we just let authors create invalid scenarios? Let's look at an example.

It's completely plausible that an author would decide to push a second <title> into a <concept>. I don't know why, maybe they think "I want to add a bit to the title, and if I have 2 shouldn't they just show up in order?" (I've definitely seen less sound reasoning, so it's not outrageous to imagine this.) You can't do this today: if you're pushing <title> as a child of <concept>, the current rules ensure you've already got 2 titles inside the <concept> you're pushing from. That's not valid, so you can't do it.
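
To make that concrete, this is roughly what the author would have to write in the pushing topic. It is a sketch of markup that cannot validate, shown only to illustrate the point; the file name target.dita and the IDs are made up.

```xml
<!-- NOT VALID. A concept allows exactly one title, so this never gets past the parser. -->
<concept id="pusher">
  <!-- Marker: stands in for the title of the (hypothetical) target topic -->
  <title conaction="mark" conref="target.dita#target/target-title"/>
  <!-- Pusher: the content that would land after the target's title -->
  <title conaction="pushafter">A SECOND TITLE</title>
  <conbody/>
</concept>
```

The marker-plus-pusher pair has to sit somewhere, and for a title the only place is a spot where two titles are not allowed. The ugly syntax is doing its job.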

Now let's imagine if you could do it. I'll give an example based on DITA-OT, which definitely assumes at many points that any topic will have one (and only one) title.
  • Source documents with 2 titles simply will not parse - so you already get warned about them in your editor or in the first step of a build. We don't waste time double-checking this again with every build step.
    Note: DITA-OT generally doesn't try to check if you follow the basic XML rules – that's the parser's job. Asking "Do I have 2 of <X> where I shouldn't?" after every step would be horribly inefficient and (today at least) a waste of time.
  • DITA's core "one topic title" rule lets us safely grab titles and store them in variables. This happens a lot today. If you end up with two titles inside some variables, you get XSLT failures and a build crash.
    Note: When you get this error in a build, your first thought is probably "Ugh, a bug in the code". In this case though, the code is relying on explicit rules that cannot be broken – those rules let it declare "Treat this variable as one element node". When you find a way to break core rules, any assumption the code makes becomes dangerous.
  • In HTML5, you'll probably end up with two <h1> tags, which looks bad and would definitely break some CSS & accessibility rules. Your TOC might concatenate the titles, or might drop one entirely. In PDF, which goes through the XSL-FO format, you'll probably end up with invalid XSL-FO markup that breaks FOP or some other processor. It's hard to say for sure because you simply can't get to this point today.
  • This could be addressed by making every step of every program validate that there's only one title. But that's a horrible (and time consuming) programming requirement. This is also just one example; you'd have to do the same for <shortdesc>, <steps>, <body>, …
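
The "store the title in a variable" pattern from the second bullet looks something like the sketch below. This is not code taken from DITA-OT; the template and the variable name are hypothetical, and it assumes an XSLT 2.0 processor.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="*[contains(@class, ' topic/topic ')]">
    <!-- as="element()" declares that the variable holds exactly one element node.
         That is safe only because the grammar guarantees one title per topic. -->
    <xsl:variable name="topic-title" as="element()"
                  select="*[contains(@class, ' topic/title ')]"/>
    <h1>
      <xsl:value-of select="$topic-title"/>
    </h1>
  </xsl:template>

</xsl:stylesheet>
```

If a push ever produced a topic with two titles, the select would return two nodes, the declared type would no longer match, and the processor would stop with a type error instead of producing output.
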
I'm tired, and a bit bored at this point. Will DITA 2.0 fix this?

I'm tired too, and probably not.

Fixing it would require, first and foremost, a good design to replace what's there. Because the feature itself is useful and I can't see us removing it.

As much as this feature bugs me, I don't have a good suggestion to replace it, and I don't have enough time to design + implement one. Adding features to the standard is hard, and for good reason. (It's a lot harder now than when this design was created.)

I would be thrilled if somebody else came up with a design that's easier to use, and that also addresses the issue of invalid result content.

So if you can't fix it, what was the point of all this?

Twitter voted and said that I had to write about DITA's ugliest feature. As I understand the rules of social media, that meant I had no choice.

</fin>