What is “DITA for the Web?”

“Does Don Day know about DITA?” That’s a skills endorsement question that is likely to pop up if you visit my profile page at the business-oriented social service, LinkedIn. Beyond that quixotic thought, consider the more important question: “What does Don Day think about DITA?” In fact, what I can tell you about DITA may help you understand this XML markup standard in a much different way than you may have expected.

Mainstream DITA

The DITA XML standard (DITA stands for Darwin Information Typing Architecture) has become a popular markup standard for any company that needs to manage content as a corporate asset. Such companies see information as having business value, just like programming code, customer lists, or other software assets. Content as business collateral requires reviews, translations, sign-offs, and scheduled integration with product release and fulfillment operations. It must enable producing new revisions or deliverable formats at any time. Often, the content and the process must track to quality and operational certifications. These requirements describe a fairly well understood framework consisting of XML-based end-to-end tools with a workflow orchestrated by a back-end Content Management System (or some equivalent process, whether manual or automated) and one or more production/fulfillment systems.

As pervasive as that particular use case may be, it is by no means the only manner in which the DITA standard can be used. The DITA standard appeals to some companies because of its ubiquity (it has large communities of users, trainers, and providers of tools and services) and because of its extensible architecture (its base design can be forked for new topic types or extended for greater structural and semantic specificity, all with guaranteed fall-back behavior in editors and processing tools^{Footnote 1}).

Non-mainstream DITA

In that vein, I’m aware of DITA having been adopted in these non-mainstream ways:

As a transport format in help authoring tools
As a report generation format for database query results
As a formal and extensible way to represent business rules (for example, driving review workflows)
To gather any kind of written contribution in a format from which more intentional use can be made
Using the architecture itself to convert existing DITA topics to more semantically-enriched document types (the principles of generalization and respecialization)

But for me the most interesting non-mainstream use of DITA is its potential as a standard form of XML behind many kinds of writing or text-based applications on the Web, replacing or extending HTML as the de facto source format. I call this usage DITA for the Web.

Defining DITA for the Web

DITA for the Web is the solution to a big problem that HTML content strategists are having lately. In some ways, it is much broader than what DITA has been used for before. To understand the What of DITA for the Web, we should really begin with the Why of the idea.

DITA for the Web follows the “X for the Y” naming pattern, which we can parse in this way:

Y presents a need or challenge that can be solved by X,
But solving Y with X will take some creative thinking, even if the eventual solution will seem obvious to all.

The “X for the Y” pattern gave birth to an air conditioner for your car, a stereo for your pocket, and hundreds of other inventions that we encounter every day. So how does this relate to why we need DITA for the Web?

As hinted at, the Web faces some serious challenges right now as developers and content strategists try to figure out how to meet consumers’ demands for immediate access to relevant content on any device they choose. Various projects have pioneered approaches to the problem with some success (for example, the Adaptive Content and Responsive Design strategies for Web content), but frankly, none have succeeded well enough to suggest a sustainable best practice or repeatable model.

Coming back to DITA’s popularity with the technical publishing community: the standard has become practically synonymous with technical publishing, yet it can be much more than that. To truly understand how DITA applies to problems like Web content delivery, we need to think a little bit outside the box.

Teasing out the meaning

The beauty of the long tail.

I have been working on different aspects of “DITA for the Web” since I first began developing an IBM-internal project called the IBM DITA Wiki starting in 2008. These are some of the long-tail use cases that I have explored and presented on in my journey to understand where the DITA standard fits on the Web and how we can improve our use of it there:

SME knowledge capture (presented in 2008 and twice in 2009)
As an architecture for designing an Emergency Response Playbook scenario, Congility 2011
As a tool in Knowledge Management scenarios, Intelligent Content 2011
As a capture format from Word and Twitter sources, From Intelligent Wiki’s to Intelligent Tweets: “Yes You Can Create Intelligent Content”, Intelligent Content 2012
As a crucial component in any dynamic publishing strategy with personalization as a goal, Lavacon 2012
DITA makes HTML5 smarter (DITA North America 2013)
DITA for Presentations (dual role content in topics)
It Was With You All Along: Adaptive Content in DITA, SVDIG Users Group, Spring 2013
Involvement with the current OASIS activity to define a Lightweight DITA to encourage adoption in new markets
The expeDITA experience, which I’ll cover in more depth.

To round out this perspective with how others have also been promoting “DITA on the Web” using responsive application designs, I recently began hosting the mobiledita.com “responsive DITA” showcase.

I plan to reprise the lessons from these explorations in this “DITA for the Web” category on the DITA per Day blog. These topics should be an authoritative resource on what DITA is capable of doing in a direct-to-the-Web context, and help to discern some best practices for both application designers and for those who create DITA content for direct-to-the-Web use (generally mapping one managed “page” of DITA content to one Web-based address or role in a Web application). In effect, the idea embraces the blog and wiki concept of a single database managing the viewed content directly via dynamic rendering, rather than the conventional source CMS behind a firewall with static copies of content served from distribution databases.

By keeping that distinction in mind, we can have separate discussions about “DITA for Tech Pubs” where the appropriate platforms for that use case are already customary and necessary. As I started off explaining, these are the cases of using desktop- or Web-based tools for managing DITA materials in a controlled workflow for publication to particular conditions and audiences, usually as PDF or HTML deliverables that are often installed with code or served from infocenters or help systems, and less often as standalone Web resources.

Note that the web site for my consulting company, Contelligence Group LLC, is in fact based on the expeDITA toolkit as the live-rendering platform for DITA content. We can look forward to exploring the use of expeDITA in other web sites, wikis, blogs, forums, web applications, and CMS content browsers.

________

^{Footnote 1}: “Fall-back behavior” refers to DITA’s extension mechanism being based on subclassing of existing elements and their content models, which means that the styling or processing of any newly specialized markup will always fall back dependably to that of its ancestor class in the absence of more specific styling/processing. In effect, the DITA architecture enables reuse of both existing design patterns and existing processing for any new specializations.
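
As a rough illustration of that fall-back, here is a minimal sketch that assumes a processor chooses its output rule from the element’s DITA class attribute, which lists the element’s ancestry from most general to most specific (for example "- topic/ol task/steps "). The RENDERERS table and the mytask/checklist and acme/notepara names are hypothetical; this shows the principle only and is not the logic of any particular toolkit.

```python
# Hypothetical rendering rules, keyed by "module/element" tokens from DITA class values.
RENDERERS = {
    "topic/p": lambda text: f"<p>{text}</p>",
    "topic/ol": lambda text: f"<ol>{text}</ol>",
    "task/steps": lambda text: f'<ol class="steps">{text}</ol>',
}


def class_tokens(class_attr):
    """Split a class value such as '- topic/ol task/steps ' into tokens, most general first."""
    return class_attr.split()[1:]  # the first item is the '-' or '+' prefix


def pick_renderer(class_attr, renderers=RENDERERS):
    """Return (token, rule) for the most specific ancestor that has a rule defined."""
    for token in reversed(class_tokens(class_attr)):
        if token in renderers:
            return token, renderers[token]
    raise LookupError(f"no rule found for {class_attr!r}")


# A hypothetical new specialization with no rule of its own falls back to task/steps.
token, rule = pick_renderer("- topic/ol task/steps mytask/checklist ")
```

Here token is "task/steps", so the new checklist element is processed as steps until someone writes a more specific rule. An element with the class value "- topic/p acme/notepara " would fall all the way back to the base paragraph rule.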

7 thoughts on “What is “DITA for the Web?””

Scott Abel says:

May 23, 2013 at 8:35 pm

Don:

Great idea. I’d love to help you get this message out to the masses!

Scott

David Farbey says:

June 10, 2013 at 2:45 pm

Thanks Don. Recently Tom Johnson prompted a lively discussion on his blog by daring to suggest that structured authoring and writing for the web might not be compatible (though, to give him credit, he has subsequently updated his post in response to the debate). (see http://idratherbewriting.com/2013/05/14/structured-authoring-versus-the-web/ followed by http://idratherbewriting.com/2013/06/08/structured-authoring-by-for-and-or-nor-with-in-the-web/).

I’m not at all surprised to read that your view is that far from being incompatible, structured authoring with DITA for the web is something to strive for.

Rahel Bailie says:

June 30, 2013 at 3:21 pm

You’ve hit the nail on the head with this one, Don. I think we need to consider the most common scenarios within businesses and decide where it makes the most sense to make inroads:

Scenario 1. Assume that there is a robust Web CMS in place already, and it’s a given that you need to push transformed DITA out into the existing Web CMS. When I consider the processes, it seems like a no-brainer, but most of the technical people I’ve spoken with recently make it sound like rocket surgery (sic). Large companies that have a multi-million dollar investment in a Web CMS are going to see DITA as a feeder system to their main system, which is likely to deliver a host of other types of content in complex ways. But if the Web CMS code meets the Content Management Interoperability Services (CMIS) standard, there should be no problem feeding DITA output into the Web CMS to be managed in a way that lets the Web CMS do what it does best and lets DITA do what it does best.

Scenario 2. There is no front-end delivery system in place, and this becomes a pure-play Web use case, where content is created in DITA, and transformed on-the-fly for Web display in some home-grown system. (Every time I hear that a large corporation is building their own CMS, I cringe – it’s like building your own automobile plant to produce a one-off automobile – but it happens more often than we think.) In this case, if they’re building custom code anyhow, there is no reason that they can’t build code for seamless DITA-to-Web delivery. (Except that this never seems to be the case, because of reasons like the core code wasn’t written that way, there is no project budget for this, and so on. But that’s practicality, not theory.)

Scenario 3: This is where the front end is one of the basic systems – WordPress comes to mind, although at the risk of alienating my developer friends, I’ve seen Drupal implementations that would fit this category – where getting DITA content into the display layer would likely be painful.

Scenario 4: Another scenario is when DITA content has to be delivered into multiple places on the Web – such as into a Web app as embedded assistance, plus into a Web-based knowledge base, plus get sent to partner companies for integration into their Web product descriptions. It’s your combination of creating adaptive content to meet personalization needs, the need to integrate unstructured content, filterable content, and so on.

Can’t wait to work with you on a proof-of-concept project!

- vineeth M says:
  
  October 30, 2013 at 6:50 am
  
  Thanks Don for this great article.
  As Rahel Bailie said, I am in Scenario 1. So can you give me an idea of how to handle this situation?
  Thanks, Vineeth
  
  - Don Day says:
    
    October 30, 2013 at 2:08 pm
    
    I think Rahel’s Scenario 1 best fits the simple case of making DITA resources appear to be HTML resources from the end user’s viewpoint, but requires more sophistication for the admin side of managing the DITA content. Conventional wisdom for most databases would be to drop the entire DITA document into a string field, but CMIS has a non-production-ready feature called FileShare Repository that looks very interesting: https://cwiki.apache.org/confluence/display/CMIS/OpenCMIS+FileShare+Repository .
    
    If this is what Rahel had in mind, you would associate your DITA file storage locations with the WebCMS application so as to allow those files to be indexed and searched in the same way as other HTML resources in the CMS. When a known DITA resource is requested, the server would use the mime type to shift into live DITA-to-HTML transcoding (ideally using a cached HTML version for performance). Some work would need to be done in mapping the DITA navigational syntax (xrefs, conrefs, links, etc.) into the application’s normal linking system (how the user interacts with resources through the abstraction layer) in order to make interaction with DITA content appear seamless.
    
    On the other hand, if you want to actively manage the DITA content with more awareness of the “DITAness” of the resources, you would need to specifically support interactions with topicrefs and conref targets, with DITA processing features (conditionality, indexes, relationship tables, etc.), and perhaps with more awareness of inner structure than CMIS normally exposes (but is found in the eXist database, for example). The various DITA middleware products like Componize and DITA Too specifically add this level of administrative interaction with the content, but this is normally required more for “DITA as technical publications converted into HTML” rather than “DITA as direct-to-web content.”
    
    expeDITA seems to fit into Rahel’s Scenario 2, by the way. I’ve tested the #1 approach but found that the limiting factor is trying to use the same HTML editor in the WebCMS for DITA content editing as well–it can work only for the simplest of documents. If you want full and validating DITA editing capability, you have to add a true DITA editor, and that adds to the complexity of the scenario.
    
Jeff Conn says:

November 26, 2013 at 9:48 pm

I think we are still in the early stages of re-usable content that can be easily repurposed for use by sales, marketing, tech docs, customer service. Today, an enterprise can throw thousands of recurring dollars for, say, a SaaS content management system or database solution (e.g., SDL platform) and then more thousands of recurring dollars for integration with a CRM like Salesforce.com Service Cloud. An SMB does not have the dollars, but still has to solve the problem. DITA, per se, does not solve the problem of multiple content stove pipes in an organization. If I have to choose between DITA and HTML/CSS, I fall on the HTML side because the Web is the common denominator and DITA appears to be harder to integrate unless the org has thousands of recurring dollars (licensed seats) a year to throw at it.

- Don Day says:
  
  November 27, 2013 at 7:14 am
  
  Your concern is reasonable, Jeff. Do the only choices truly come down to just DITA or HTML? Depending on the type of content, an SMB’s site that hosts primarily how-to type content, for example, may end up using a structured intermediate format to help facilitate a multiple-output publishing service, much like the popular iFixit.com site. I’ve been concerned that every such site has no common tools with which to build their application and therefore ends up developing yet another bespoke format/workflow/CMS. So I’m thinking strongly of how to amortize those development costs for all who can use a common platform based on a standard, interoperable content format. I hope that by raising awareness about alternative solutions, we might soon have something in addition to DITA and HTML5 as a workable choice for the SMB.
  