The participants included:
- Erin (Telecom) is in unstructured FrameMaker and wants to know how to plan the move to DITA ("it's exciting and scary"). She's also interested in contextualized content.
- Eric (Services) says he's listening in so he can get a better sense of things
- Cecile (Healthcare, from France) is moving to DITA from XML, and she wants to talk about reuse strategies, given that she expects her XML modules will become multiple DITA topics
- Abbie (Services) is here to learn
- Liz M (Services, Abbie's coworker) is also here to learn
- Bob (Medical Device) is looking for questions
- Ed (High Tech) is also moving from XML to DITA but he wants to be sure they're doing it for the right reasons. He's also got a legacy data concern and seconds the request to talk about reuse strategies.
- Renee (High Tech) is tasked with fixing a DITA environment. She wants to talk QA, processes, and staffing roles
- Moderator: Tracy Baker from F5, DITA early adopter
- Moderator: Me, Liz Fraley from Single-Sourcing Solutions and the moderator of the TC Dojo Mastermind sessions.
Tracy Baker is a regular at the DITA NA conference. She's from F5 and started her road to DITA in 2009. She's an Information Architect, a writer, a tools person. She wrote the Information Model that F5 still uses today. In fact, only two elements have been added since she wrote it originally. She says it's been a long road and it's only this year, in 2015, that all the documents published for all the products have been 100% DITA. She's worn a lot of hats but warns that "you can't wear all the hats. You can try. You will die."
Together, Tracy and I moderated the discussion and what follows are notes from the conversation. Tidbits, tips, suggestions, and advice for the folks at the table. One-on-one time for them to get their burning questions answered, their comfort levels...well, leveled.
The first step in any DITA project should always be the Content Audit or Content Inventory. One quick way to accomplish this is to do a big extraction of all the Head1s and Head2s in your content set. Sort them alphabetically. What do you find? How many titles are so similar that they hint at being the same topic? Now you've got somewhere to start to see what really is similar and where the reuse is, where the copy and paste happened over the years. Your gut may tell you that you've got a lot of reuse, but an audit will give you the evidence of it.
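To make the "sort and compare" step concrete, here is a minimal sketch in Python. It assumes you have already exported your headings into a plain list of strings. The function names, the sample titles, and the 0.85 similarity threshold are all illustrative assumptions, not something discussed at the table, and the threshold would need tuning against your own content.

```python
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase and collapse whitespace so trivial differences don't hide matches."""
    return " ".join(title.lower().split())


def similar_title_groups(titles, threshold=0.85):
    """Sort titles alphabetically, then group neighbouring titles that look alike.

    Each title is compared only with the one just before it in sorted order,
    so this is a quick first pass, not an exhaustive duplicate search.
    The threshold is an assumed starting point.
    """
    ordered = sorted(titles, key=normalize)
    groups, current = [], []
    for title in ordered:
        if current and SequenceMatcher(
            None, normalize(current[-1]), normalize(title)
        ).ratio() >= threshold:
            current.append(title)
        else:
            if len(current) > 1:
                groups.append(current)
            current = [title]
    if len(current) > 1:
        groups.append(current)
    return groups


# Hypothetical headings pulled from a content set.
headings = [
    "Installing the Power Supply",
    "Safety Warnings",
    "Installing the power supply",
    "Removing the Fan Tray",
    "Installing the Power Supplies",
]

for group in similar_title_groups(headings):
    print(group)
```

Each printed group is a set of headings worth opening side by side. The script only points at candidates; deciding whether they really are the same topic is still a human call.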
Where does duplicated content come from? Well, one way it often happens is when you just shove your content over whenever you change tools. You always think you can fix it later. But we all know that "later" never comes. In fact, one of the attendees mentioned that in her environment, they have a wiki. And that wiki is pretty much the wild west. They found out when they pulled all the content out to put it in their published documentation: they had 700 versions of the same topic because it had been repeatedly added (with maybe slight differences from the other versions) by different people over the years. The content had gone wild with no governance provisions built in.
While ungoverned contributors can be a contributing factor, duplication is guaranteed if you simply pull all your content into DITA. Whenever you just shove content from one tool to another you are guaranteed to get what Ann Rockley has called the "same mess, different tools."
Sometimes you've got to take a step back and look hard at the content. And sometimes that means you're going to have to say some tough things to your content contributors. You're going to have to help them get over "Book Brain". You need to help them understand that it's OK to say "goodbye" to that content they created, including the one they missed a cruise for and worked all weekend to get written.
Book Brain is hard to get rid of. But if you help your team understand that you're doing this for their future audience -- those millennials who have, this year, overtaken boomers in the workforce -- they will start to understand. Remember, those millennials don't read front to back. They search into your content for the very specific thing they need to do. Your content contributors need to author for that audience, not their own comfort. Repeated, spread-out content makes findability an issue. It tracks directly back to a case of Book Brain.
DCL will tell you that there are three ways to move content into DITA - shove it over (same mess, new tools), reauthor everything (time consuming but gives the best results), or prioritize (high touch, high maintenance docs first, and some maybe never). Kyocera's printer division is at 89% reuse because they chose to reauthor everything when they moved from unstructured content to DITA. Not only did they get the tagging right, but they could reauthor content with the DITA model in mind. Content that was originally written in a more free-form unstructured way became uniform, reusable units that pushed their reuse numbers as high as they could possibly go. Reauthoring your content will not only improve your reuse numbers, but it will reduce book bloat and save a ton of time for maintenance tasks later on. ("Did it get fixed everywhere?")
But how do you get your day job done all while planning the move to DITA? First, don't migrate your content. We've already talked about the dangers of shoving content over and the benefits of reauthoring for your new content model. But, in addition to that, you need somewhere to get your feet wet and get experience with DITA, topic types, and the information patterns that appear naturally in your content base.
In Tracy's case, she started with hardware books that were already very, very similar. She took a 200-page book and reduced it to 60 pages. Most of that can be attributed to reorganizing and gathering content that had been spread out all over those 200 pages. It gave her a good base of content. Once that was done, she could cherry pick the content (what she calls "content shopping") and build the next book. When it came time to do more difficult books, she started with a book that was going to have all brand new content. In both cases, she started with a good base of content to fill the content store and every book that came after was curated in: reauthored, discarded, or replaced with content from the existing content store.
You can absolutely cherry pick the content to bring similar docs together and to guide your path in choosing books to move into your DITA store. The trick is to make sure you create a storehouse of good DITA content that you can now use in other places. You might as well do it up front, because you can't get out of doing the thinking part of the work. Doing it up front results in higher-quality, more reusable content in your content store.
Now, if you're going to get your day job done while you do all this, you can't do it alone. Hand pick some teammates who will do it right, from the start. You're looking for teammates who are the innovators, who are well-respected by others, who are excited for the move because they also happen to be choking on their workload. These are the people who will help you get started with solid, well-constructed content. They'll help you figure out the information patterns in your content and get you good content to fill your content store. You'll want these people later on to help spread the message.
Once you're ready to begin, and you've got a couple of helpers and a couple of books picked out, where do you start? Do the Tasks first. They're the easiest. Do the Concepts last, and take a cold hard look at them. Concepts are where authors typically have the most self-investment. Once the Task is done, look at the Concepts as being in support of the task - what is it? how does it work? - and think about the audience. References? Well, they're relatively easy and can be done as you go. What do they have? They get the data that your audience is going to scan for specific details. It's pretty easy to identify and separate out from conceptual and task information.
Why save concepts for last? Well, it's the conceptual stuff that has the highest variance. It's here that you will work hardest to normalize the content and aim for consistent organization. For example, when Tracy took that book from 200 pages to 60, she found that there was a lot of repeated information all spread out. She also found that most of that was the writer not looking at the scope of the document, thus repeating information in several places. That not only increased book bloat, but made maintenance more difficult as the content grew over time.
How do you untangle content? Well, remember that each topic should answer one question well. "What it is" and "How it works" are two different questions. Figuring out the question is the hard part. Keep your eye on the goal. Find the pattern. Do the upfront planning and think it through. And don't forget to go content shopping before you start writing. (If you find yourself with a big topic and don't think you can separate your content into the basic DITA information types, and unless you're doing translation, then maybe DITA isn't for you. This might be a time to reevaluate why you're going to DITA.)
One thing to keep in mind is that the work is never done. With the constant customer feedback and the constant development, you will always be going through the process of shopping for content, curating, and normalizing your content store. You'll always be cleaning up, changing or deleting old content. After all, your content will be going through constant iterations. So, make sure you never tell management that one day you'll be done. You won't. You can have milestones along the way, but you won't ever finish. So be careful how you frame the milestone language.
OK, now that you have some topics, what next? It's time to build some maps. Think of the bookmap as the "on the shelf item". Then look to using mini-maps or sub-maps to serve other purposes.
Maps can be extremely useful for writers as they're developing content. Maps can be used to organize content along functional areas instead of features (features change) and writers can track what they're working on without worrying about the final deliverable. In Tracy's case, they use a mini-map for every chapter. They think of the map like a Swiss Army knife: Which blade do you use for what? Maps are used for gathering information about something specific.
Early on in her project, Tracy was the Information Architect. She made the bookmaps for the writers. She did a lot of training and brought the rest of the team up to speed a little at a time. First, she had them focus on writing the content: it's their wheelhouse, it's their first home. Then, for their first few maps, she would get together with the writer and the writer's manager and they would all build the map together. Six years later, authors only come to her if they've got a tricky situation. She still keeps an eye on the content and reviews publication and topic structures at production time. She fixes things when they're broken and always circles back to the writer to provide feedback to help them for the next time.
What else? Take advantage of the tools available to you. The QA checker in the DITA Open Toolkit (OT) can look for specific business issues in the content. Adding Schematron can also do that, freeing up editors and peer reviewers from looking at tagging mistakes so they can go back to editing content. Schematron is available in almost every tool. It's the default standard way of implementing business rule checking in XML content.
Even with all this, you will still need to watch out for those team members who say "Yeah, yeah, yeah" and then go back to doing their own thing and try to game the system. Decades ago before the OT or Schematron, Sun's SunBook implementation had extensive QA tests that happened at production time. For example, if the publishing routine saw two titles in a row with no content in between, it would flag the book as being in error and send it back to the writer to fix. Schematron can help automate this kind of testing for business rule compliance.
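As a rough illustration of the "two titles in a row" rule, here is a minimal Python sketch using only the standard library. It is not the SunBook check or a Schematron rule. The function name and the sample topic are hypothetical, and it makes the simplifying assumption that "content" means any non-whitespace text outside a `title` element (so an image-only body would still be flagged).

```python
import xml.etree.ElementTree as ET


def consecutive_titles(xml_text):
    """Return (first_title, second_title) pairs with no text between them."""
    root = ET.fromstring(xml_text)
    problems = []
    pending = None  # most recent title that has had no content after it yet

    def visit(elem):
        nonlocal pending
        if elem.tag == "title":
            text = "".join(elem.itertext()).strip()
            if pending is not None:
                problems.append((pending, text))
            pending = text
            return
        if elem.text and elem.text.strip():
            pending = None  # real content clears the pending title
        for child in elem:
            visit(child)
            if child.tail and child.tail.strip():
                pending = None

    visit(root)
    return problems


# Hypothetical nested topic: the outer topic has a title but an empty body.
sample = """
<topic id="a">
  <title>Installing the Unit</title>
  <body/>
  <topic id="b">
    <title>Before You Begin</title>
    <body><p>Unpack the unit.</p></body>
  </topic>
</topic>
"""

for first, second in consecutive_titles(sample):
    print(f"No content between '{first}' and '{second}'")
```

A check like this could run at production time and send the flagged pairs back to the writer. In a real DITA environment the same rule would more likely be written as a Schematron pattern, so that it runs inside the authoring tool.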
For review, most people are still using PDF. Some use Acrobat Shared Review, which gathers review comments in one central location. The problem with PDF review is that it keeps writers and reviewers married to the PDF even when your customers aren't using it as their primary information source. PDF is comfortable and safe, and you'll be surprised who will fear stepping away from PDF. (And you'll find these people in all parts of the organization at all levels in the management chain.)
Companies with extensive online help are thinking about hosting web help internally and using the review and comment feature to capture comments. You want your reviewers to be consuming content in the same way your customers are. And if your customers aren't consuming your PDFs, then likely your reviewers shouldn't be either.
What about the Editing staff? How do they do their jobs? Most often, editing staff work directly in the XML authoring environment. They put comments directly into the XML. With automated business rule checking, editing staff can focus on the content and not worry about tagging or layout, which improves the quality of your documentation in a significant way.
Resources mentioned during the discussion
- Mailing List: email@example.com (and historical archives)
- Book: "Practical DITA" by Julio Vazquez (easy to read)
- Book: "DITA Best Practices" by Laura Bellamy et al. (very accessible)
- Book: "DITA Style Guide" by Tony Self (not particularly readable, but a good reference)
Want to participate in conversations like this every month?
Ever feel like you are spinning your wheels trying to solve an issue? How valuable would it be for you to have a sounding board? The TC Dojo Mastermind sessions are unique learning opportunities where members talk about real time issues they are solving and share ideas for possible solutions. Chances are if you are facing something, someone has gone before you or they are in the same place. Masterminds meet once a month for an hour. These are Member Only sessions.