Who cares about modular DITA grammar files, anyway?

What follows is a summary / paraphrasing / imaginary retelling of a conversation I've had with several people over the last couple of years. In most of these conversations I'm asked a lot of questions, like "Does the specification really say this?", or "Surely I'm reading this wrong?", or "Why could the specification possibly care?" Some of you may remember asking those questions; thank you for making me think about this, and also, curse you for making me think about this.

I hear there's something bothering you.

Yes. Several things. But one stands out at the moment.

DITA grammar files?

Yes. In particular, modular DTDs, XSDs, and RNGs.

What's wrong with modular grammar files?

Nothing. Absolutely nothing. I'm all in favor of modular grammar files. I consider them a best practice.

Wait … I'm confused.

Happens a lot when I'm talking to people. I don't know why 😢.

So they're a best practice?

Yes.

Explain?

Modular grammar files encourage reuse of definitions. Reuse is a key part of DITA. It all fits nicely.

To explain a bit more – DITA specialization generally works with groups of related items. A group of elements in a domain. A group of elements in a new type of map. A global attribute or two. Defining these in modular grammar files is a best practice because:
  • I've found that the act of creating specializations in modules really does help you define them properly - with the proper class attributes that every DITA processor requires.
  • When extensions are defined in a reusable package, it encourages you to think about them as a unit. What belongs, what doesn't? Should these elements really be in different modules? A module with a good collection of related elements (and nothing more) is likely to be usable exactly as-is by other documents.
  • "Usable by other documents" - if I have a set of "music" elements, and somebody is working on topics about music, they can integrate my module with their own modules. If I don't use a nice module, they can't share. If my module also includes "dance" elements, they won't want to share. Sharing reduces overall work. Sharing is good.

That list is not exhaustive. There are other good things about modular grammar files (the DITA 1.3 specification has a better explanation). But I'm here to complain about them, so I'll stop there.

Um … right. They sound like a good thing. Why do they bother you?

Modular grammar files themselves do not bother me.

What bothers me is that the DITA specification requires them (or appears to require them). It has almost 40 pages of rules on how to create conforming grammar files, and warns about the use of non-conforming grammar files. There is an admission that non-conforming grammar files might be useful, but it's buried in a non-normative appendix. Even there, it only mentions optimization rather than the issues that bug me.

I think this focus on modular grammar files causes problems. Among them: it scares people; it makes DITA look more complicated than it needs to be; it costs you money. This is demonstrably true when you work with different tool sets that require different formats (for example, one requires DTD and another XSD).

OK, that got my attention. How does it cost me money?

Say you have a tool that works with DTDs. You know and love DTDs, so you maintain your specializations as DTD files. Your tools support DTDs.

Now you suddenly have another tool. You've been told it's a DITA tool. You were not told that it only works with XSDs. Your manager already bought 20 licenses and paid for 5 years of maintenance on the tool. What do you do?

Um … I guess I make an XSD. I don't know how to do that.

Yes. And most people don't. So where do you start?

If you read the DITA 1.3 specification about XSD coding practices, you'll find many pages with complex rules that probably only make sense if you already know XSD. So, go learn XSD. Don't worry, the XSD specifications [Part 1: Structures and Part 2: Datatypes] aren't quite as big as the DITA specification.

You'll also discover that for every file you maintain in your DTD – and making modular grammar files requires an often ridiculous number of files – you'll probably need to maintain another one for XSD. You've doubled your maintenance burden, and half of your burden is for a format you don't really know or want.

Ugh. I hope there's another option?

Sure. You could pay someone to do it.

I thought we were trying to get away from that "cost me money" part?

Right. In that case, ignore the specification when you make your XSD. There is a tool that can –

Stop. Didn't you help write the spec? How can you say to ignore it?

Years ago I went to a workshop presented by the editors of a small XML specification. They went through every element in the spec, and for each element, they described why it was there and how to use it. Then they got to one element and said roughly: "Writing a specification is an exercise in compromise. This element is here because we compromised. You will never need it, and should ignore it."

That's it?

I think that's enough.

When I interrupted, you were about to tell me another option for my XSD?

Yep.

There are tools available that let you convert between grammar file formats (Trang is a popular one). Most such tools know nothing about DITA, and don't care about DITA modularity. Use one of those, and you'll generate a single XSD file. Anticipating your next question, no, this does not fit the specification's definition of a "conforming grammar file". But it will still result in conforming DITA documents, and should work with any DITA tool that supports or needs XSD. On the plus side - it's a generated artifact rather than a new set of source files. If you ever need to change your specialization, change your DTD, re-generate the XSD, and you're good.
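
If you want that re-generate step to be a one-liner, here is a rough sketch of how it might look, assuming you have Java and a copy of the Trang jar. The file names (music-shell.dtd, music-shell.xsd) and the jar location are hypothetical; the -I and -O options are Trang's own input and output format switches. Whether Trang digests your particular shell cleanly is something you'd have to check for yourself.

```python
import subprocess
from pathlib import Path

# Formats Trang accepts for its -I (input) and -O (output) options.
INPUT_FORMATS = {"rng", "rnc", "dtd", "xml"}
OUTPUT_FORMATS = {"rng", "rnc", "dtd", "xsd"}


def trang_command(source, target, trang_jar="trang.jar"):
    """Build the Trang command line, inferring formats from file extensions.

    The jar path is an assumption; point it at wherever your copy lives.
    """
    src_fmt = Path(source).suffix.lstrip(".").lower()
    out_fmt = Path(target).suffix.lstrip(".").lower()
    if src_fmt not in INPUT_FORMATS:
        raise ValueError(f"Trang cannot read '{src_fmt}' files")
    if out_fmt not in OUTPUT_FORMATS:
        raise ValueError(f"Trang cannot write '{out_fmt}' files")
    return [
        "java", "-jar", str(trang_jar),
        "-I", src_fmt,
        "-O", out_fmt,
        str(source), str(target),
    ]


def regenerate(source, target, trang_jar="trang.jar"):
    """Run Trang; raises CalledProcessError if the conversion fails."""
    subprocess.run(trang_command(source, target, trang_jar), check=True)


if __name__ == "__main__":
    # Hypothetical file names: the DTD shell is the source you maintain,
    # the XSD is the throwaway artifact you regenerate after each change.
    print(" ".join(trang_command("music-shell.dtd", "music-shell.xsd")))
```

Calling regenerate("music-shell.dtd", "music-shell.xsd") after each change to the DTD keeps the XSD a build output rather than something you edit by hand.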

I've heard RelaxNG will save me. Does it help here, or make things worse?

I've heard that too. But the issue is the same.

Modular RNG grammar files make creating specializations easier. But a lot of tools still require DTD or XSD, so you may need to create (or generate) that format.

In that case, you're in the exact same situation as above. And again, I'd suggest - generate the DTD or XSD as a single file, and ignore the specification rules about modularity for whatever isn't your single-source format.

Doesn't OASIS generate modular DTD and XSD files from RelaxNG?

Yes, but that's different. As explained, modular grammar files are a best practice. Anything OASIS delivers should follow OASIS-approved specification-described best practices.

Also, at a more practical level, a lot of people still just use one format. In order to upgrade any specialization modules or document type shells created with DITA 1.2, OASIS needs to continue providing the core pieces as modules.

OASIS did use a tool to generate modular DTD and XSD grammar files. As of this writing, that tool doesn't quite work with non-OASIS grammar files, and doesn't quite work with a few edge cases. In the future, that could be an option for generating modular DTDs or XSDs from your own specializations.

So, you're saying it's OK for me to do this, but not for OASIS, because OASIS says don't do it?

More or less. The DITA TC would look pretty silly if it filled the specification with rules for modular grammar files, but didn't follow them.

But if you want to do it, eh … your tools will work. They might even work a bit faster (remember that non-normative note about optimizing document types). And any DITA content written using a generated grammar file will still be valid DITA.

If the tools will work, and my DITA is still valid / compliant, then why is the specification so insistent about this?

DITA 1.0 and DITA 1.1 each had two paragraphs about modularization of design, and one topic each about how to code DTDs and XSDs. As DITA features got more complex, many more rules were added in DITA 1.2. Once rules are added, it's hard to take them out.

In DITA 1.3 we tried to simplify the rules, removing MUST rules where possible, but the rules are still based on DITA 1.2. For anybody trying to follow the modularization best practice, they're absolutely good rules and good content. It's just … I feel they're a best practice. Having them take up almost 40 pages of the specification is misleading and gives them too much weight.

Are you saying they shouldn't be in the specification at all?

I have to be careful here, because it is a topic likely to raise strong feelings. [See all of the above.]

I think these rules are a best practice - but violating them should not affect interoperability of content. I don't think the specification should create rules like these when interoperability is not an issue. I do think the current rules provide a necessary guide to creating modular shells, and that the content should be provided by / owned by the OASIS DITA Technical Committee. I just think it would be better off as a separate document, only required reading for those few who have a need to work with DITA grammar files.

Will that happen in DITA 2.0?

My crystal ball broke in 2008 when I used it to predict DITA 1.2 release dates.

Are there any downsides to generating a non-modular format?

Yes, but I think they're unlikely. If you only work in RelaxNG and generate a non-modular DTD, then you can't share your modules with somebody who only works in DTD and generates non-modular RelaxNG. It's possible, but I doubt that comes up much. I'll admit there could be other edge cases I haven't considered, but again, they're probably edge cases.

Anything else you'd like to DITAsplain today?

Don't make up words.

But really, that's what you're doing, right?

I think we're done here.

Did you really just use a <dl> for questions and answers? Is that semantically appropriate?

I said we're done.