Using Warehouse Files to Improve Findability

Can a comprehensive index be created in a modern DITA XML publishing environment made up of hundreds of tiny files? And how can an editor and indexer be incorporated into the DITA environment for better consistency and findability of content?

For your content to be found, you need to ensure it is indexed and consistent across all deliverables and channels. This means taking advantage of structured metadata and keyword research, and applying terms consistently to boost findability.

This presentation shows how one company used warehouse files to bring consistency to index terms and to their application across a body of content: a series of published reference books. Learn how a good editing and indexing team that adopts the same strategies and techniques authors use in modern DITA XML dynamic publishing environments can bring new and improved value, consistency, and efficiency to content findability.

About Liz Fraley

Liz Fraley, CEO of Single-Sourcing Solutions, is well known for her advocacy of defining requirements. She has founded two companies, sits on the boards of three non-profits, and is constantly coming up with new ways to share knowledge in the technical communications and content industries. She has worked in high-tech and government sectors, at companies of all different sizes (from startups to huge enterprises). She advocates approaches that directly improve organizational efficiency, productivity, and interoperability. If you ask her, she’ll say she’s happiest when those around her are successful. Her first book, “Arbortext 101: Best Practices for Configuring, Authoring, Styling, and Publishing with Arbortext,” is available on Amazon. She has several more planned.

Presented at:


Transcript

[0:00:00.00] OPENING MUSIC

[0:00:12.05] Liz Fraley

Hello! I'm Liz Fraley. Welcome to my virtual session on using warehouse files to improve content findability. Reuse is the #1 topic requested in our TC Dojo webinar series. This is one reuse method we don't always think about, and I'm thrilled to share it with you today. Normally, I like interacting with an audience and hearing about their experiences. So, if you're so inspired, send me your comments, your stories, or whatever comes to mind, and in the future this will be a better session for all of us.

[0:00:47.21] Liz Fraley

Again, I’m Liz Fraley. The main thing I think you should know about me is that I love creating connections with people and sharing knowledge.

I have a BS in Computer Science, and an MA and BA in English. Every year, I read close to 200 books and thousands of articles, blogs, news stories, and mailing-list posts. I sit on the boards of, and volunteer for, 3 non-profits, one of which I started. I'm always searching for new ways to increase the knowledge of the professionals in the communities I serve.

For me, it's all about empowerment. I may not know the answer, but I always know where to get it, or at least how to find it. Think of me like the librarian behind the research desk at the library. Presentations like this are my way of bringing people together to share knowledge, ask questions, and enable each other. It's how I learn too. You know you understand something when you can explain it to someone else.

The link in the lower left-hand corner, https://bit.ly/stc21-fraley, is the public event page for this presentation. These slides are already on that page, as are many other related technical resources. We know that you want to learn, and so do we. So be sure to visit that page. While this video won't change, that page will always be current.

[0:02:04.28] Liz Fraley

For those of you who don’t know me, I'm with Single-Sourcing Solutions. We've been working with single-sourcing, multichannel publishing, content architecture, and reuse strategies since 1999. In fact, I gave a presentation about my first single-sourcing project at the same SIGDOC conference where Michael Priestley was in the early stages of introducing DITA to the world at large.

We've always been a little different from other consultants. Our dedication to growing the expertise and influence of our writing professionals shows in the growing list of public works projects we've developed and supported over the years. The TC Dojo is a webinar series that's been running for more than 8 years, where the topics are chosen by the community voicing its needs. Room 42, our latest project, launched last year and connects academic research to the daily lives of writing professionals. We do these public works projects because we love learning and it's part of our nature to empower others.

[0:03:12.11] Liz Fraley

I am a published author. All of these books were written to help others gain knowledge and confidence. I'll be talking about two of these three books today because they're an example I'm allowed to share openly.

[0:03:24.06] Liz Fraley

There's a lot of content that gets created every day for products, brands, and companies. Content publications teams across the enterprise work in concert to explain complex technical content to customers (both current and potential) in a way that is easy for them to conceptualize, understand, and adopt successfully.

Despite the pandemic, content producers are working harder than ever. The demand for content increases daily. And the work to produce a consistent voice, consistent branding, and accurate information, in a dozen formats and a dozen languages, while still meeting every new content delivery requirement out there, is only increasing.

And it's not happening in isolation: 

  • Increasing product complexity
  • Increasing number of languages
  • Increasing volume of content
  • Increasing number of content outlets
  • Increasing product overlap
  • Shorter product life cycles 
  • Shrinking resource allocations

It's only getting harder. As we become ever more digital with ever more places where audiences for our products gather, we find other content creation groups are struggling the same way we did. Marketing content writers are facing the same challenges that technical communicators faced dozens of years ago, in slightly different ways.

[0:05:03.19] Liz Fraley

For technical publications, this is what we faced. Small changes would trigger massive amounts of rework. You can see the yellow box and how it appears in multiple deliverables. When engineering changes something, the user guide, the quick start poster, the sales brochures, and other product content all need to change too. Authors copy and paste or edit, over and over, pushing the same changes everywhere that's affected. That almost seems manageable until you realize... this is just the tip of the iceberg.

[0:05:41.06] Liz Fraley

Because all of that collateral gets delivered in multiple formats. Now, I know that says CD, but these pictures are from 2002. At the time, Hamilton Sundstrand had some very specific high-security and disconnected requirements for their final deliverables. Still, you can use your imagination to change CD to something more relevant to you.

After all, you're delivering to many different formats, channels, and media today. Video? Knowledge base? Mobile app? Substitute any of those for CD, and then extend that to all the different content channels you deliver to. All at once, this picture seems almost typical. And yet...

[0:05:35.14] Liz Fraley

This is what they were really delivering. That simple picture was just one product. Think of your content in any single dimension you're managing: Language? Product? Product family? Device? OEM? Whatever. The picture starts to get really complicated, and the need to reduce the effort required to change the content in that single yellow block is significant. Techpubs teams no longer work in content silos. That yellow block has been extracted, and every deliverable you see in this picture is built, like LEGO bricks, and delivered automatically. Change any single reusable bit of content in the primary source, simply reissue the replicas, and that change is populated across all of them. It's a huge reduction in work and a huge savings in time spent.

[0:07:34.05] Liz Fraley

Techcomm departments do not get the big money in an organization. They've had to do a lot more with a lot less. They've been forced to figure out how to make many small efficiencies add up to greater gains. They adopted processes, tools, and content strategies that let them leverage content so they didn't have to manage things by hand. There just isn't enough time. One by one, as more techcomm departments started looking at their content strategically, they found more and more reasons to employ what we call single-sourcing systems. Because, inevitably, with techcomm last on the funding list, they had to make do, meet their responsibilities, and make their deadlines, faster and faster, with fewer and fewer resources. It's not uncommon to find 2-3 person techcomm teams managing hundreds of thousands of "pages" of product-related content. Over the years, they've shed the deliverables they could, like the indexes I'm going to talk about, but the skills and expertise are still there for any content practitioner in the enterprise to leverage.

[0:08:49.27] Liz Fraley

Everyone says no one needs indexes because no one uses paper. Forget paper. We search! And everyone knows indexes are a wholly paper thing. An index is that thing at the back of a printed book! The truth is that indexes are far more than that. Yes, we've all been trained to search, but search needs keywords, metadata, and access points in order to return results. And I'm willing to bet that none of you live on search alone.

[0:09:28.20] Liz Fraley

Ever try to take away someone's ability to create folder structures in their file system and have them only retrieve things by search? Content management systems are already taking steps in this direction, but users are still very uncomfortable without at least a smart-folder view of their content repository. Do you really think you could live with only search? Before you say yes... Do you have folders on your computer? Do you insist on "filing things"? Do you find yourself "browsing" for that file you want? I like to think I'm organized, but what does organized mean in a search-only world? Which fridge would you rather use? Can you tell anything about who filled the fridge and how they operate based on these pictures? Which one searches? Which one browses?

[0:10:31.21] Liz Fraley

All kidding aside, we all know search is important. Information that can't be located might as well not exist. You want your content to forward the organization's goals, to serve customer needs, to increase their engagement so they stay customers. Making people feel like they have to consult a psychic to find your content -- that's not what anyone wants.

[0:10:59.08] Liz Fraley

Yes, people search. Mobile and web information consumers have been trained to search over the last couple of decades. But look at how SEO guides describe the way search engines list your pages. Google calls what it does "indexing." It's making organized lists of relationships between keywords and destinations, so it can deliver results to someone looking for those keywords. Back when Yahoo started, it was one of the first search engines, and it hired a bunch of librarians to index the web. The librarians cataloged, grouped, and came up with keywords. And Yahoo surged because it had these categories people could browse. They thought that browsing through this big index was how it was going to work. After all, what are indexes?

[0:12:00.21] Liz Fraley

Indexes are pointers. They're lists of keywords. They're maps of content. They identify key themes and ideas contained within the content. Indexes are generally an alphabetized list (as opposed to a hierarchical outline). One of the things they do well is act as benchmarks to level-set audience contexts. We have pointers because sometimes search can fail: Ever wonder what you called that file? Ever wonder what folder you stuck it in? Or, if we're talking about web bookmarks, ever wonder how you tagged it? Sometimes you're forced to go looking.
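To make "indexes are pointers" concrete, here is a minimal Python sketch. The topic file names and terms are hypothetical; the code inverts a topic-to-terms mapping into an alphabetized list of term-to-topics pointers, which is roughly the shape shared by a back-of-book index and a search engine's keyword index.

```python
from collections import defaultdict


def build_index(topic_terms: dict[str, list[str]]) -> list[tuple[str, list[str]]]:
    """Invert topic -> terms into an alphabetized list of (term, [topics])."""
    pointers: dict[str, set[str]] = defaultdict(set)
    for topic, terms in topic_terms.items():
        for term in terms:
            pointers[term].add(topic)
    # Alphabetize case-insensitively, the way a reader scans an index.
    return [(term, sorted(pointers[term])) for term in sorted(pointers, key=str.casefold)]


# Hypothetical topics and index terms, for illustration only.
topics = {
    "install.dita": ["Arbortext Editor", "installation"],
    "styling.dita": ["Arbortext Styler", "stylesheets"],
    "overview.dita": ["Arbortext Editor", "Arbortext Styler"],
}

for term, where in build_index(topics):
    print(f"{term}: {', '.join(where)}")
```

Each entry points from one keyword to every place it applies, so a reader (or a search engine) never has to remember which file or folder the content lives in.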

[0:12:47.23] Liz Fraley

The bookmarks we've saved act as our personal pointers: pointers to things that are significant in our context. I love this quote by Barbara Quint: "If you don't index it, it doesn't exist. It's out there but you can't find it, so it might as well not be there." Exactly. It isn't there if it's not findable. Indexes are not just a problem for printed books. The skills that go into making traditional indexes are the same ones that keyword researchers use. And managing those keywords is key to providing a consistent view into your content for your customers.

[0:13:37.12] Liz Fraley

Have you ever gone searching for a site you were sure you bookmarked, checked everywhere you think you might have stored it, and still not found it? Then you search for it on Google, find it, and go to bookmark it, only to discover that you HAD IN FACT bookmarked it before but have no idea why you tagged it the way you did? No wonder you didn't find it! This is what happens when we don't strategically design our information architecture and fail to include keyword terms in that architecture. Ad-hoc tagging results in just as big a mess as writing without a plan.

[0:14:23.01] Liz Fraley

Now I'm going to switch contexts on you and change our vocabulary. I'm going to talk about index terms (keywords) and indexes (organized lists of keywords, ordered according to some overarching organizing principle, that reflect the content we'll be tagging and categorizing).

[0:14:50.08] Liz Fraley

Deriving your index is going to take a full content audit, plus a thorough examination and deep understanding of your company brand. After all, the terms you use represent the way you address, reference, and catalog your content. It's the corporate primer that helps outsiders learn what to do and how to do it. And it's going to take work to sort through everything, to truly understand how you want to organize and present yourself to the world. This is where a professional can help: an indexer, an editor, or someone trained in library science, who can act as the liaison between the author and the reader. They will sort through the vocabulary of the author and the vocabulary of the reader and develop the list that acts as access points into your content.

[0:15:49.01] Liz Fraley

They'll take that pile of content and sort it into neat little boxes. They will place your content into a context for your content consumers. Ultimately, your index will also provide guidance, a framework and a structure, for your content writers. You certainly don't want them to make up unvetted terms or use existing terms inconsistently. Is it a bolt or a screw? Do we group things as 1-inch or 2-inch, or do we just care about "under 2 inches"? Or are we grouping by tool: Phillips, flat head, or wrench? What is the keyword that matters in your context? Whatever it is, you want your team of writers to tag things the same way every time, no matter what day it is, who's writing, or what kind of day they're having, so as to preserve the integrity of your corporate voice and corporate brand. A proper index identifies key themes and ideas and groups similar concepts. It removes undifferentiated subterms and carefully controls vocabulary.

[0:17:13.25] Liz Fraley

Usually, this is what writers do: they insert the index terms right into the topic, either in the metadata and keywords or spread throughout the topic. Either way, the terms are typically authored directly into the text. This method introduces the potential for copy-and-paste errors and replicated effort (remember those yellow boxes?). But more concerning is that writers may not apply or include terms consistently. Is the term plural or singular? Is it the primary term or the secondary term? Do I even remember there's a term that needs to go here? Working in a small piece means you don't see the whole picture, and there's churn: edit, test, fix, test again, repeat. Repeat, repeat, repeat. That's both frustrating for the authors and expensive to do. Not only do you have to proof the generated index over and over, but it can be difficult to work out exactly which tiny content unit holds the one term that went wrong.

[0:18:27.08] Liz Fraley

It's far better to include your terms in your overall architecture strategy and to create one or more warehouses for managing your index terms. We found that applying grouping principles to our index terms made them easier to manage. This is an example of our Product Branding Index Term warehouse. All index terms that include branded or trademarked terms go in this warehouse. They're organized into logical units that match the work habits of the content creation team. For example, this set includes all index terms used anywhere in our content that contain the word "Arbortext" without any other product modifier. Arbortext alone is the product family, but the phrase "Arbortext Editor" is a specific product name, and those entries are in a different grouping.
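The slide itself isn't reproduced here, so the warehouse below is only a sketch, assuming one DITA topic per warehouse, one section per grouping, and an `@id` on every reusable `indexterm` so it can be referenced; the IDs, titles, and grouping are hypothetical. The Python uses the standard library's ElementTree to list the addressable terms in each group, with nested `indexterm` elements rendered as secondary terms.

```python
import xml.etree.ElementTree as ET

# Hypothetical warehouse topic: IDs, titles, and grouping are illustrative assumptions.
WAREHOUSE = """\
<topic id="branding-indexterms">
  <title>Product branding index terms</title>
  <body>
    <section id="arbortext-family">
      <title>Arbortext (product family)</title>
      <p>
        <indexterm id="it-arbortext">Arbortext</indexterm>
        <indexterm id="it-arbortext-install">Arbortext<indexterm>installing</indexterm></indexterm>
      </p>
    </section>
    <section id="arbortext-editor">
      <title>Arbortext Editor</title>
      <p>
        <indexterm id="it-arbortext-editor">Arbortext Editor</indexterm>
      </p>
    </section>
  </body>
</topic>
"""


def term_label(indexterm: ET.Element) -> str:
    """Render a (possibly nested) indexterm as 'primary, secondary'."""
    parts = [(indexterm.text or "").strip()]
    for child in indexterm.findall("indexterm"):
        parts.append(term_label(child))
    return ", ".join(p for p in parts if p)


def terms_by_group(xml_text: str) -> dict[str, dict[str, str]]:
    """Map each section title to {indexterm id: label} for the id-bearing terms in it."""
    root = ET.fromstring(xml_text)
    groups: dict[str, dict[str, str]] = {}
    for section in root.iter("section"):
        title = section.findtext("title", default=section.get("id", ""))
        groups[title] = {
            term.get("id"): term_label(term)
            for term in section.iter("indexterm")
            if term.get("id")  # only terms with an id can be reused by reference
        }
    return groups


for group, terms in terms_by_group(WAREHOUSE).items():
    print(group)
    for term_id, label in terms.items():
        print(f"  {term_id}: {label}")
```

Grouping by section mirrors the "logical units" idea: "Arbortext" on its own lives in one group, and "Arbortext Editor" lives in another.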

Now, I haven't talked about how to do keyword research or architect the categorization strategies for your content. That's a whole different talk. I've talked a lot about both topics over the years because it's something we care a lot about and work hard to do well. In fact, we work with a reference librarian whenever we do that kind of customer training. If you want to learn more, I've got the link at the bottom (bit.ly/convex20-libraryscience). But for the purposes of this session, I'm going to assume you know your reuse strategy and have already divined your key organizing principles. And that's all I have to say about that. Let's get back to where we left off...

[0:20:23.01] Liz Fraley

Once you have your warehouse, it's a simple matter to insert the terms via the conkeyref mechanism. This mechanism pulls the terms in by reference, directly from the warehouse file, rather than having authors type them in place. You get all the benefits of reuse here. You avoid copy-and-paste errors. And if you need to change a term? You correct it in the warehouse and it's fixed everywhere it's used. In addition, you remove one of the frustrations for the authoring team. They won't have to track down the one place where a term wasn't like the others, because every instance is the same term: if it's wrong, it's wrong the same way everywhere. Fix it once, fix it everywhere! Great! That's one frustration point down. What about the other one? That's still here. For someone to edit the index, to verify that the terms have been applied everywhere they should be, they still have to open each topic individually and insert the relevant terms. This is also a very time-consuming process. Find file. Open file. Locate term. Insert term. Regenerate, verify, repeat. Repeat. Repeat. Repeat. Luckily, there's something we can do to mitigate that too.
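A conkeyref value takes the form `key/element-id`: the map binds the key to the warehouse file, and the ID names the element inside it. The Python below is a deliberately simplified resolver, not how Arbortext or the DITA Open Toolkit actually process content references. It assumes a hypothetical key named `branding-terms`, ignores conaction and the other DITA conref rules, and exists only to show why correcting a term once in the warehouse corrects every topic that references it.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical warehouse. In a real map, <keydef keys="branding-terms" href="..."/>
# binds the key to the warehouse file; here the key maps straight to parsed XML.
WAREHOUSE = ET.fromstring("""\
<topic id="branding-indexterms"><title>Branding terms</title><body><p>
  <indexterm id="it-arbortext-editor">Arbortext Editer</indexterm>
</p></body></topic>""")
KEYS = {"branding-terms": WAREHOUSE}

# A hypothetical authored topic that references the term instead of typing it.
TOPIC = """\
<topic id="install"><title>Installing</title>
  <prolog><metadata><keywords>
    <indexterm conkeyref="branding-terms/it-arbortext-editor"/>
  </keywords></metadata></prolog>
  <body><p>Run the installer.</p></body>
</topic>"""


def resolve_conkeyrefs(topic_xml: str, keys: dict[str, ET.Element]) -> ET.Element:
    """Copy the referenced element's text and children into each conkeyref'd element.

    Simplified: handles only 'key/id' values and skips the other rules a
    real DITA processor applies.
    """
    root = ET.fromstring(topic_xml)
    for elem in list(root.iter()):  # snapshot, since we modify the tree
        ref = elem.get("conkeyref")
        if not ref:
            continue
        key, _, target_id = ref.partition("/")
        source = keys[key].find(f".//*[@id='{target_id}']")
        if source is None:
            raise KeyError(f"unresolved conkeyref: {ref}")
        del elem.attrib["conkeyref"]
        elem.text = source.text
        for child in list(elem):
            elem.remove(child)
        elem.extend([copy.deepcopy(child) for child in source])
    return root


first = resolve_conkeyrefs(TOPIC, KEYS)
print(first.find(".//indexterm").text)  # Arbortext Editer (typo inherited from the warehouse)

# Fix it once, in the warehouse, and every referencing topic picks it up on the next build.
WAREHOUSE.find(".//*[@id='it-arbortext-editor']").text = "Arbortext Editor"
second = resolve_conkeyrefs(TOPIC, KEYS)
print(second.find(".//indexterm").text)  # Arbortext Editor
```

Because the topic stores only the reference, there is never a second copy of the term to drift out of sync.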

[0:21:48.05] Liz Fraley

Editors have long struggled with working in XML environments because it's difficult to get a holistic view of content deliverables that are made up of many tiny constituent pieces. Part of an editor's role is to look at the whole picture, to judge consistency at a higher level: the global rather than the local level. And when working in pieces, it can be just as hard for editors to do what they need to do as it is for authors to work index terms into all those pieces.

[0:22:23.11] Liz Fraley

We make use of the Resolved Document for Edit, or RDE, to get a holistic view of all the content in context for any specific deliverable configuration. You can work on all the little units at the same time. All the content is fully resolved and visible. It's all editable. And the boundaries between the units are clearly delineated. (See the dotted lines?) It makes the editing job a whole lot easier. My own editor nearly quit when faced with editing a far larger book using only XML tools that didn't have an RDE feature. The RDE made it incredibly easy to insert tags, review the entire deliverable, and normalize any odd or outstanding issues. In fact...

[0:23:22.12] Liz Fraley

The Resource Manager gives me a way to reach directly into the content management system and look into the relevant warehouse file and apply search to the warehouse contents. In the search box I type in (.dcf). The search results winnow down the options in real time. And I can pluck out the exact index term I want and insert it right into the source where it belongs. The same mechanisms that work generally for conref and conkeyref work brilliantly for index terms and, when combined with the RDE, you can drastically reduce the frustration level of your team -- and the time spent doing everything they have to do.

[0:24:16.17] Liz Fraley

One of the other things we did to ease the editor's job was to create a Comprehensive Index. In this case, the deliverable construct was made up simply of index-term warehouse files. There was no other content in the deliverable. Because all the warehouse files are included, all the index terms we use throughout all our content are combined into a single, holistic, and comprehensive view. This makes it a lot easier to review - especially as terms get added over time - and easier to edit as well. You don't have to track anything down, since you can work in the RDE source and you can see everything you want to see. It was an unexpected benefit that paid serious dividends when reconciling the overall architecture with the needs of specific content deliverables.
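A comprehensive index built only from warehouse files also lends itself to simple automated checks alongside the editor's review. Here's a minimal sketch, assuming hypothetical warehouse files held as strings, that gathers every id-bearing `indexterm` into one alphabetized list and flags terms that differ only by case or a trailing "s", the plural-or-singular slip described earlier. The matching rule is intentionally crude and stands in for, rather than replaces, an editor's judgment.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Two hypothetical warehouse files; in practice these would come from the repository.
WAREHOUSES = {
    "branding.dita": """<topic id="b"><title>Branding</title><body><p>
        <indexterm id="it-arbortext">Arbortext</indexterm>
        <indexterm id="it-arbortext-editor">Arbortext Editor</indexterm>
      </p></body></topic>""",
    "tasks.dita": """<topic id="t"><title>Tasks</title><body><p>
        <indexterm id="it-stylesheet">stylesheet</indexterm>
        <indexterm id="it-stylesheets">Stylesheets</indexterm>
      </p></body></topic>""",
}


def comprehensive_index(warehouses: dict[str, str]) -> list[tuple[str, str, str]]:
    """Collect (term, file, id) for every id-bearing indexterm, alphabetized."""
    entries = []
    for filename, xml_text in warehouses.items():
        for term in ET.fromstring(xml_text).iter("indexterm"):
            if term.get("id"):
                entries.append(((term.text or "").strip(), filename, term.get("id")))
    return sorted(entries, key=lambda e: e[0].casefold())


def suspected_variants(entries: list[tuple[str, str, str]]) -> list[list[str]]:
    """Group terms that differ only by case or a trailing 's' (a deliberately crude rule)."""
    groups: dict[str, set[str]] = defaultdict(set)
    for term, _, _ in entries:
        groups[term.casefold().removesuffix("s")].add(term)
    return [sorted(g) for g in groups.values() if len(g) > 1]


index = comprehensive_index(WAREHOUSES)
for term, filename, term_id in index:
    print(f"{term}  ({filename}#{term_id})")
print("Check:", suspected_variants(index))
```

Each flagged pair points the editor at one warehouse entry to fix, rather than at every topic that uses the term.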

[0:25:40.22] Liz Fraley

One last story before we go. Sears was originally a mail order catalog company and, as late as 1990, they were the largest retailer in the US. They were particularly known for their extensive catalog. You could even order eggs from Sears that you could incubate into full grown chickens. With a huge catalog, the index becomes very, very important.

In their consumer guide, they had a note: "If you don't find it in the index, look very carefully through the entire catalog." The quote stands out to me for two reasons. (1) It's great that they acknowledge they're not perfect; that kind of humanizing position was ahead of its time. (2) But, really, this advice is the last thing you want. Your customers don't care enough about you to read every single piece of content carefully to find the one thing they want. They'll go elsewhere; after all, that's what we all do. When we can't find something, we go looking for it a different way. It's not a treasure hunt when you're under the gun and there isn't a million dollars waiting for you under the X.

So, include your index terms -- these are your key terms -- in your strategy. You'll thank me for it later.

[0:27:03.20] Liz Fraley

Alright, thanks for attending. I'm Liz Fraley. Be sure to connect with me on LinkedIn! You can find these slides at the link at the top of this slide: https://bit.ly/stc21-fraley

Send me your stories, your experiences or anything else that comes to mind. Learning is best when we connect with it and share with each other. And I'm looking forward to learning from you.

Thank you!

[0:27:40.16] CLOSING MUSIC

Where to find the video

Members of the Single-Sourcing Solutions Mastermind Groups can access the video, sample code, notes, collateral, and any other event material free on the members website.

ConVEx attendees can watch the video via the conference site through September 2021.

STC Summit attendees can watch the video via the conference site through September 2021.


Key concepts

dita, productivity, reuse

Filed under

CIDM, Presentations, STC
