one man writes
one man designs
one man blogs

Archive of Single Source posts

 
 

Everything is connected

This post has been bubbling for the past year or so, ever since I started this blog. It’s a bit of a ramble but if I don’t publish it now I’ll just keep adding to it and it’s long enough as it is!

I question everything. It’s part of the way my mind works, and it’s something I’ve embraced; I believe it makes me better at my job as a technical communicator. That attitude has also helped me realise that there is a common thread that can be found across several different areas of our industry, which I (and others) are slowly pulling together. Convergence is the word that springs to mind, and as businesses clamber onto the social networking bandwagon, now is an excellent time to grab the reins and take control.

Let’s step back a little.

Late last year, on two separate mailing lists, I followed discussions about what the myriad of people who share my profession have as job titles. I prompted one discussion on the ISTC mailing list, and chipped in some thoughts on the TechWR mailing list before dropping out later on when the noise ratio, as ever, got too high.

I wonder how much useful information I miss when I do that? Ahhh something else to ponder. But not today.

Anyway, discussions around how we as a profession should be referring to ourselves inevitably lead to discussions and thoughts about what we do, where our skills lie, and the benefits we can bring to an organisation. It’s something I’ve toyed with before, but which is wrapped up in many layers of ifs, buts and other such caveats.

Following on from that, I read an article by Virginia Lynch in the CIDM newsletter (and if you aren’t subscribed to their newsletter, you should be) entitled Information Developers - The New Role of Technical Writers in a Flat World which encapsulates a lot of my current thinking on how to take my current team forward, making sure we are matching company strategy whilst allowing the team members to retain a focus on maintaining and developing their core skills. The article title rather neatly alludes to Thomas Friedman’s book The World Is Flat: The Globalized World in the Twenty-first Century which is certainly worth a read.

Virginia mentions that JoAnn Hackos recently referred to these core skills as “Basic Hygiene”, citing the fact that, regardless of how the collation and production, distribution and usage of information may change, as we explore the burgeoning arena of new tools available to us under the banner of “social web applications” our core skills remain. Typically they tend to drop off as we are pushed to create more, faster, with a rise in quantity favoured over a maintenance of quality.

style, grammar, punctuation, spelling, and even clarity seem to have been sacrificed for quantity —JoAnn points out that knowledge of basic writing skills is still critical to our success as writers. Basic Hygiene also comprises an understanding and appreciation of editing, the information development life cycle, fundamental web and computer skills, and of course attention to detail.

However it is important to note the nod towards quantity being a business leader, and those of us tasked with managing a team need to consider how we achieve that business aim, without impacting our integrity as Technical Writ… umm… Information Developers?

So, how do we produce more whilst maintaining quality?

Wait! What’s that coming over the hill? Ahhh yes, the shining white knight of single source, armour gleaming, his trusty DITA (or DocBook) in hand, ready to do battle against the ills of productivity measurements and over-zealous QA departments. What else were you expecting? Ohh more resource? No, not these days when everyone is a “content creator”, not these days when we should be embracing and encouraging our audience to help plug the gaps in our information dykes (I really must stop mixing my metaphors).

Topic-based writing certainly seems to tick the required boxes and every business case and ROI I’ve read (and I’ve written a couple myself) points us towards the promises found over the horizon and the “he’ll be here real soon, honest” arrival of the aforementioned white knight. The trouble is that, whilst it is easy to agree with the theory, I’m not all that sure the white knight is all he seems. Certainly as we climb the hill towards him, auditing our content, deciding on chunking levels, agreeing metadata requirements, we begin to see that that armour seems a little thin and dented in areas, and I’m not entirely sure the knight is filling that armour as much as he should. Aren’t they supposed to be big strapping warriors? He looks a little weedy to me…

Topic-driven content written with a minimalist slant, deferring here to the instructions of Strunk and White* rather than John Carroll, is where we seem to be (need to be?) heading and that’s fine and good from where I’m sitting.

* A sentence should contain no unnecessary words, a paragraph no unnecessary sentences, for the same reason that a drawing should have no unnecessary lines and a machine no unnecessary parts. This requires not that the writer make all his sentences short, or that he avoid detail and treat his subject only in outline, but that every word tell.

On the flip side, there is a definite growth in awareness around the use of Web 2.0 technologies and systems, building online communities, integrating Wikis, blogs, RSS feeds into the information flow either as part of end user deliverables or as methods for encouraging information creation by everyone involved with the product, internal or external.

A large part of our job concerns the collation and filtering of information so as far as I’m concerned anything we can do to make the creation of source information easier has to be welcomed. Extending these mechanisms beyond internal usage means it should be easier to provide information to the people who really need it, with the added bonus of a greater level of trust in that information. Don’t believe me? Which type of information do you put most weight on: the information passed to you by a trusted colleague who you know uses the product heavily, or the product documentation? (And bear in mind that we technical writers are predisposed to favour the work of our peers.) That in itself is another issue which may be alleviated by embracing social content creation, pulling on the goodwill generated by openly inviting contribution and collaboration, whilst giving technical writers a chance to show their worth in full public view.

So where is all this heading? I’m not sure anyone is too sure, but there do seem to be some trends appearing: the use of Wikis to host documentation, the creation of community websites with few restrictions, and more. There are plenty of tools, and with a little work you can get them talking to each other. Technology is not the limiting factor anymore; attitudes are now the only things stopping us trying these wondrous new things. It’s a big step for some companies, and some people, to free their information, to pass their hard-earned knowledge about willy-nilly without a clue as to how it will be used.

Once you’ve gotten past the limitations, the real effort, once you have your community or collaboration up and running, is the surrounding processes. Do you want to pump content into the website regularly? (yes). Do you want to allow anyone and everyone to contribute to that same store of information? (yes). Do you want to allow others to quietly correct your mistakes? (yes). Do you want to give the people who need it access to information about your product, regardless of where it originates, trusting them to use their judgement? (yes).

The final pieces of the jigsaw are the finer details of implementation. Presuming we want to reuse information as often as possible where do you store information and how do you allow access to it? Who should be involved in verifying new information? Where/how is the level of trust established?

Pulling together the threads of this emerging role is tricky. With so much overlap into multiple areas and so much to consider, there is a danger of not seeing the wood for the trees. This post is an attempt to step back and make a little more sense of what I can see, what I know, and the changes starting to drag our profession in interesting new directions. I fear I may have muddied the waters, but hopefully they’ll settle and things will start to make sense.

Regardless of whether I’m right or wrong, one thing is for sure, these are exciting times and we have a great opportunity to finally leverage technical communications into the spotlight. The value of information is finally being properly realised, and we are ideally placed to help any organisation make the most of what information they have and help them understand and create the information they really need.

Back to DITA?

I’ve mentioned DITA a few times on this blog, and my DITA is not the answer post is still attracting attention. As I’ve said, I think the DITA standard is an excellent one for software documentation and the DITA movement is slowly catching up to the hype. I’ve never given up on DITA and had always planned to use it as the basis for the next stage of our content development, and as it happens the switch to a full DITA/CMS based solution may be closer than I had anticipated.

We have been considering how best to publish up to date information in keeping with patches and minor releases, and if we can tidy up and publish useful information from our internal Wikis and support system. The nature of the product we work with means there are a lot of different usage patterns, not all of which we would document as they fall outwith typical (common) usage.

So, how do we publish formal product documentation, in line with three versions of the product, in PDF for ‘printed’ manuals, JavaHelp to be added to our product, and HTML to be published to a live website alongside other technical content (ideally maintained in the same system as the product documentation)? Storing the content as XML chunks also allows us to further re-use the content programmatically (which can be tied into our product in a smarter, dynamic, fashion).
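To make the "XML chunks" idea a little more concrete, here is a minimal sketch in Python, assuming a deliberately tiny DITA-like concept topic. The topic content and the function names (`load_topic`, `to_html`, `to_text`) are invented for illustration, and a real DITA toolchain does far more than this. It only shows one stored chunk feeding two different outputs.

```python
import xml.etree.ElementTree as ET
from html import escape

# A deliberately tiny, DITA-like topic (hypothetical content);
# real DITA topics carry far more structure and metadata.
TOPIC_XML = """
<concept id="overview">
  <title>Overview</title>
  <conbody>
    <p>The product collects data from several sources.</p>
    <p>Reports are generated overnight.</p>
  </conbody>
</concept>
"""


def load_topic(xml_text):
    """Parse one topic into a (title, paragraphs) pair."""
    root = ET.fromstring(xml_text.strip())
    title = root.findtext("title")
    paragraphs = [p.text for p in root.iter("p")]
    return title, paragraphs


def to_html(title, paragraphs):
    """Render the topic as an HTML fragment, e.g. for a website."""
    body = "".join(f"<p>{escape(text)}</p>" for text in paragraphs)
    return f"<h1>{escape(title)}</h1>{body}"


def to_text(title, paragraphs):
    """Render the same topic as plain text, e.g. as input to a print pipeline."""
    underline = "=" * len(title)
    return "\n".join([title, underline, *paragraphs])


title, paragraphs = load_topic(TOPIC_XML)
html_output = to_html(title, paragraphs)
text_output = to_text(title, paragraphs)
```

The same parsed chunk feeds both renderers, so a correction to the source XML shows up in every output the next time it is published.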

The obvious answer is single source using DITA to structure the content, storing the content as XML to give us the greatest potential avenues for re-use. Nothing particularly startling there I know, but it’s a switch from the direction we had been considering. So I’ve been catching up on what’s new in DITA-land and have to admit I’m a little disappointed.

We already have FrameMaker and Webworks in-house, although both are a couple of versions old, and thinking we might keep using those applications I’ve been hunting about to see if I can find a solution that offers a coherent, end-to-end, story. There are several CMS solutions which require an editor, editing solutions which require a CMS, and a few products that straddle both CMS and editing but then require publishing engines.

I understand that it would take a collaboration between vendors to be able to offer a simple, seamless solution.

In addition to that there does seem to be a tendency for any DITA-focused solution to remain the remit of the overly technical. Don’t get me wrong, I’m quite happy delving into XML code, hacking elements, or running command line scripts to get things done. But surely I shouldn’t have to resort to such things? Now, I’m sure there are many vendors who will tell me that I don’t need to worry, but I’ve seen several demos and all of them miss a part of the FULL story.

Come on then vendors, stick your necks out. If you are a CMS provider, then recommend an editor. If you sell editing software then talk nice to a CMS vendor and start promoting each other (yeah Adobe, I’m looking at you!).

And yes, I’ll happily admit that maybe I’m just not looking closely enough. If only there was some sort of technical community website that I could join, perhaps with a group or two on DITA? That’d be great.

Ohhh wait. There is! (not the most subtle plug in the world, was it? I think the new Content Wrangler communities could be a big hit, do check them out).

Have I got the wrong end of the stick? Are there really gaps in the market in this area at present, or is it just my imagination? I guess I’ll be running a fair few evaluations over the coming few weeks and, of course, I’ll post my thoughts and findings here.

The tool is not important

The tool is not important. The tool is not important. The tool is not important.

I have been repeating this mantra in my head for the past week or so, over and over, like a broken record. I’m in the middle of pulling together the requirements and scope for a new technical community website for our users, which will become the key focus of our technical information. The more traditional product documentation set will be maintained as we move forward, so there is some thought to be given towards how we manage the information as well as how it is published, or rather where.

I must stop considering the how. The tool is not important.

At present I have a list of requirements, all of which I’m thinking through from the point of view of how the process will work as far as creating and maintaining the information. Who will access the source, who will be viewing the published information, who can edit what, how will the information be used by the audience? All the while there is a part of my brain dragging me towards HOW this will work. What tool will be able to handle our requirements?

The tool is not important.

I enjoy a challenge, and this is most certainly a new venture for me, but the basic foundations of this idea are rooted in things I know well, single sourcing content, developing online communities (I run a website for Scottish Bloggers (currently dead after our hosting service disappeared)). As such I’m confident I can get this off the ground, but even so I’m being careful to properly gather requirements, and fully understand the impact of changing our publishing model. Note I said “model”.

The tool is not important.

So, with a list of requirements and a full understanding of the processes that will be involved, both to maintain the main documentation set and to develop other supporting information (culled from internal Wikis, mailing lists and anywhere else we stumble across something useful), one change is the way in which we plan, design and write product documentation.

As I’ve said, this is all about the processes that support the way we work. I’m being quite deliberate in how I pull together the requirements, focussing discussions on the audience, the expectations, the information and processes, with no mention of the technology which will need to support the new website.

The tool is not important.

Last year’s X-Pubs conference drilled this message home, and it’s good to be able to draw on the information and knowledge gained there. Get your requirements sorted out and agreed, understand the impact of changing the way people access information, and the impact of changing how people work, figure out how best to handle the reaction to change and agree the expectations and limitations of your system. Decide which models you will follow, how the processes will hang together and outline the various roles that will be required, and make sure they understand what is required of them.

Then and only then should you consider what tools you require and make sure they are serving you.

Why AuthorIT?

As I mentioned before, we are planning to migrate content from FrameMaker to AuthorIT, staging the migration across two different product sets (and no small amount of time!). I’m in the process of evaluating AuthorIT for, despite having used it before, it has recently been overhauled with a spiffy new UI and some new features.

AuthorIT is a single source system, with content stored in a central database, which can publish to most (all?) of the formats that anyone would ever need. It includes an editor, supports multiple users, and has some additional add-ons for localisation and so on. Their website is very good if you want more information on their product.

After downloading and installing the trial version, which limits your import and publishing but otherwise has all the features available for use, I fired it up and was greeted with the new interface. Based on the ribbons used in the latest version of Microsoft Office, it is quite a shift away from the previous version and it took me a while to get to grips with. However it is a huge improvement over the old version and once you are used to it, like anything, it’s very nice to use. Yes I know there are still issues being dealt with, but I didn’t run across that many during my testing, so I’m happy.

During my evaluation I spoke to their Business Development Manager who was very helpful in delving into some of the issues I had around versioning and set my mind at rest. I’ll outline how we are going to handle maintaining multiple versions of documents in another post, once I’ve given it a dry run or two.

One issue that cropped up was the location and format of the supporting database. You can run AuthorIT on a Jet database either locally or on a network drive, although that isn’t particularly performant, or run it on a SQL Server. As we are a small team I did consider the Jet database but our situation suggests a server database would be better. Which introduced another problem: price. SQL Server isn’t the cheapest and we don’t have an installation in-house. Thankfully one of our IT guys suggested SQL Express (a limited free version of SQL Server) as a possibility, and after a quick check on the AuthorIT Yahoo Group, I’ve found that it will run quite happily on that database.

There is a limit of 4GB on the database size but as long as we keep our images elsewhere there is little chance we’ll hit that limit. Our total content at present, including images, tops out under 500MB for one version of the documentation. So we’ll actually be saving space on a server as we won’t be maintaining multiple versions of entire documents. Must remember to point that out to our IT guys!

Aside from versioning, the only feature I was unfamiliar with was the batch runner, which allows you to run a batch file (.bat) as a scheduled task. Our current system runs at night, using Webworks to create a JavaHelp file which is then included in the software build, and AuthorIT will give us similar functionality.

Why AuthorIT? Well, quite simply it gives us what we need.

I spent some time at the X-Pubs conference last year, and throughout the presentations the underlying message was “get your requirements sorted before hunting for a system”. The premise is obvious enough: if you decide on a system first, you end up shoe-horning your processes around how it works rather than getting a system that works the way YOU work.

I also spent some time considering DITA but ultimately switching to an XML-based system is still too cost-prohibitive. AuthorIT is a compromise, allowing us to work how we want to work, whilst giving us single source benefits. We will use DITA as a framework for how we plan and write the content, but the simple fact is that AuthorIT is a much better value proposition than a bespoke system, both in monetary and resource terms. This makes the business case much easier to sell.

If you are considering single sourcing your content, then I’d strongly suggest you investigate AuthorIT as a possibility. It has limitations, including the oft-cited reliance on Word as a publishing engine, but for me the advantages outweigh those.

And no, I am not being paid to endorse AuthorIT.

Content Analysis for re-use

The basic premise of “single source” can be summed up in one word.

Re-use.

Sounds simple enough but there is a wealth of analysis and work that is required before that, somewhat elegant, aim can be met.

Analysing your content for potential re-use opportunities is, by and large, an onerous task. Whether you do it all by hand, printing out reams of documentation and annotating them, or electronically, compiling spreadsheets using colour coding or obscure (“they made sense to me at the time”) codes, it takes time to do it properly and there are no shortcuts. Sorry to break it to you so bluntly.

However it does mean that you are forced to spend some time re-reading your content, content which you might not have visited for some time or in some cases, may not have written yourself. You’ll likely find inconsistencies in the content itself, styling errors and quite probably a completely different writing style. Whilst it may seem obvious I urge you, should it arise, to fight the urge to start editing as you go along.

My basic understanding of single source, and the re-use of information, is that there are times when you’ll need to rewrite content so it can be more easily used in multiple locations. A change of tense perhaps, a rephrasing or reconstruction of a sentence may be all that is required, and hell, if you have the document open in front of you, why not just go ahead and make that change? Suffice to say that editing content that you are analysing has only one potential outcome. Chaos. Regardless of how well organised, how well planned your analysis is, if you start making changes to your content on the fly, you will soon find yourself with a blurred view of the very thing you are trying to analyse.

Yeah, I know. It sounds obvious, and it is when viewed from a distance.

However what I really wanted to discuss, for I’m certainly not 100% certain on this, is at what level does content granularity become too granular? If I want to re-use a paragraph then, obviously, breaking up content to the paragraph level makes sense but that immediately seems like overkill in many cases. So I’ve been steering away from that kind of structural thinking, away from paragraphs and sentences into semantically discrete blocks. So a short product description, containing a heading and a paragraph, is one block and a long product description, containing a heading and several paragraphs, is another. I’m pretty sure this is the correct approach but it does mean that, once you’ve made that decision, you are stuck with fairly large chunks of information.

I’m hoping that this is a good balance though, for if we are to break our content into smaller granules, the overhead of maintaining and manipulating them surely increases. Remember, in a single source system we are concerned with more than content; we also have to contend with the metadata associated with that content, and the more pieces of information we have to maintain, the greater the risk that the metadata becomes so complex as to be useless.

I think. Maybe.. I’m really not that sure.

Have you conducted any content analysis? If so how did you approach the granularity issue? I get the sense that, for a lot of people, the level of granularity is reached once the content analysis is complete, that it basically decides itself.

As we slowly progress towards a single source solution, I’m intrigued as to what to expect next, so any thoughts or comments are much appreciated. After all, all the articles, conferences and books in the world can’t replace real life experience.

Notes:
This post was, in part, inspired when pondering if semantic analysis might be a way to tackle this but, for now, I wonder if it is perhaps a step too far for most?

CSS for layout

… and why you should use it.

Separating content from structure and style is a common theory, widely accepted by those of us either using or investigating single source solutions for our documentation. The same theory has been applied to web development and offers similar benefits.

CSS-based web design developed in parallel with the growing movement towards (and promotion of) the use of standards on the web. The web standards movement was a direct response to the increasing problems faced by web designers as they struggled to keep pace with the bespoke features introduced by the browser software of the day. Advocating support for the W3C-maintained standards around, initially, HTML, it soon found a band of supporters who were challenging themselves, and everyone else, to stop using tables as a mechanism for controlling page layout, and instead switch to using Cascading Style Sheets (CSS).

The origin of table-based layout was, essentially, a clever hack. Early versions of HTML, and the internet browsers that people used to view web pages, didn’t have any way to control the layout of a page so tables were used. Nesting tables within tables to provide discrete areas for navigation, content and so on, became the norm and some very complex examples still exist. However, as the web gained popularity and large sites started to emerge, it became apparent that table-based layouts were no longer workable. They were far too hard and too time consuming to maintain, and many web developers recognised this and started searching for a solution.

Separating the content from the layout elements was an obvious step and is easily achieved using CSS. Whilst it was primarily created to allow more flexible and powerful styling, it was soon evident that, as each page element can have positioning assigned, it could also be used as the positioning mechanism.

The basic theory of CSS-based layout is pretty simple. If you draw out the sections of your web page you’ll probably end up with several different blocks. One for the banner, one for the navigation, another for secondary navigation, one for the content, and so on. Each of those blocks can be positioned separately, or in relation to one another, and as each block is a uniquely identified division, all you need to do is apply layout rules to every division to position it where you want. OK, maybe it’s a little flippant to say “all you need to do” as there is a wealth of issues to be aware of when using CSS for layout, but don’t panic: there are plenty of templates to get you started, and I’ve linked to some at the end of this post.

Mind you, this doesn’t really sound much different from using tables though. Right?

Wrong. The real power of using CSS for layout comes when you need to change the position or other layout characteristics of one of those divisions. For example, let’s say you have a set of navigation links in a column down the left of the page. In a table-based layout you’d have a separate table cell holding those links (which may in turn be held in a nested table to help you align them). Simple enough.

Now, you need to switch that list of links to the right of the page. In table-based layout you’d need to cut-n-paste that table cell and move it on EVERY PAGE in your website or across your help system. Do you fancy doing that for every page in a 500-page help system? Because I don’t.

Using CSS for layout, you’d make a change to the stylesheet (the .CSS file) and all the pages in your website would be updated. For a large website, or for anything more than 20 or so pages, the time savings soon become evident. I’d advocate that you take this approach even for smaller static websites because, whilst table-based layout is still possible, any minor layout change still has to be repeated across every page.
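As a rough illustration of that point, here is a small Python sketch that builds a 500-page site where every page links to one shared stylesheet. The page template, the `site.css` file name, the division ids and the widths are all assumptions made up for this example, not a recommended layout.

```python
# Hypothetical page template: every page links to the same stylesheet
# and holds no layout rules of its own.
PAGE_TEMPLATE = """<html>
<head><link rel="stylesheet" href="site.css"></head>
<body>
<div id="nav">{nav}</div>
<div id="content">{content}</div>
</body>
</html>"""


def build_pages(contents, nav="Home | Help"):
    """Return one HTML page per content string."""
    return [PAGE_TEMPLATE.format(nav=nav, content=c) for c in contents]


def build_stylesheet(nav_side):
    """Return the single stylesheet; nav_side is 'left' or 'right'."""
    if nav_side not in ("left", "right"):
        raise ValueError("nav_side must be 'left' or 'right'")
    # The navigation column is pulled to one side, and the content
    # leaves a matching margin on that side to make room for it.
    return (
        f"#nav {{ float: {nav_side}; width: 20%; }}\n"
        f"#content {{ margin-{nav_side}: 22%; }}\n"
    )


pages = build_pages([f"Topic {n}" for n in range(1, 501)])
stylesheet_before = build_stylesheet("left")
stylesheet_after = build_stylesheet("right")
```

Moving the navigation from left to right changes two lines in one stylesheet, while the 500 generated pages stay exactly as they were.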

Ultimately, using CSS for layout isn’t really about web standards, nor is it just a trend. It’s a justified and valid use of technology to allow you to work smarter, to concentrate on the content you are delivering, and not spend a disproportionate amount of time editing multiple pages of a web-based help system or website. When your boss asks you what you did last week, what would YOU rather say?

Learning CSS-based layout is not without problems: there are still browser compatibility issues to overcome, although most are now well documented and easy to grasp. But I truly believe that it is worthwhile learning the basics. Of course, the internet being what it is, there are a myriad of templates available to get you started; in fact some may even provide all you need.

Related reading:
Layoutomatic - offers three simple CSS-based layouts. A good way to learn the basics.
Free CSS layouts and templates - compiled by the wonderful Smashing Magazine.
Web Standards Project - keep up to date with the latest news in web standards.
CSS Zen Garden - one structured page of content, hundreds of different CSS layouts and styles. THE example of the power of CSS-based layout.
A List Apart - an excellent online magazine for web design, chock full of good stuff.

Content Audits

The basic premise behind auditing your content is to better understand both the structure and the content itself. Conceptually the idea seems simple enough, but in reality performing a content audit can be fairly boring. However, whether you are conducting the audit as part of a single source conversion project, or if you have recently inherited a large documentation set, I’d suggest that it is an excellent way to gain an understanding of what already exists and, with little guesswork on your part, start to understand what may be missing.

Content Audits are usually one of the early tasks undertaken by a team moving towards a single source publishing model but they can also provide a clear indicator about whether you need to single source or not. For many teams the primary driver of a move towards single source comes when an additional product platform or customer is introduced, or perhaps through a requirement to translate and localise. However, a thorough audit of your content will show whether what you believe to be true is valid and may indicate that you don’t need to start single sourcing your documentation at all (you might just need to change your working practices).

As I mentioned, the act of auditing, in any form, can be repetitive, onerous and very much a chore, so my first piece of advice is to break it up into short manageable chunks and most certainly don’t try and do it all at once. Perhaps aim to do a couple of chapters a week, thus leaving you time to fulfil other duties, keeping the documentation up-to-date for example.

For me, the aim of a Content Audit is two-fold: on the one hand you will end up with a very detailed breakdown of the structure of your documentation, and on the other you should also be able to extrapolate the types of information that your documentation holds (e.g. procedures, concepts, and so on). A key benefit, which almost comes as a bonus, is that having spent time looking at your content, you will also have a good plan of which parts of the documentation can be reused and which parts may need to be rewritten before reuse is possible.

If you’ve done any research into this area, you probably have a good idea of what is involved and what the aims are. But what is a Content Audit, what does it look like?

Well it’s fairly simple and the easiest way to get started is to use your existing Table of Contents. Pull that out into a spreadsheet and you have an excellent starting point, particularly if your documentation has been written in short sections. Then you need to get into the content itself, and analyse the structure in a bit more detail. Again there are obvious chunks of information that can very easily be pulled out, or broken down, into discrete chunks. Procedures, illustrations, tables of data, anything that is of a similar type and is repeated throughout your documentation is easily identifiable as a distinct unit (you probably have unique paragraph formats for these too, another quick way to check!).
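If your Table of Contents is available electronically, the first pass of that spreadsheet can be generated rather than typed. The following is a minimal Python sketch, assuming the TOC is already a list of heading levels and titles. The sample headings and the column names (`info_type`, `reuse_candidate`, `notes`) are made up for illustration, so substitute whatever your audit needs to record.

```python
import csv
import io

# Hypothetical table of contents as (heading level, title) pairs.
TOC = [
    (1, "Getting Started"),
    (2, "Overview"),
    (2, "Installing the Product"),
    (1, "Reference"),
    (2, "Configuration Settings"),
]

# Illustrative audit columns; the last three are filled in by hand later.
COLUMNS = ["level", "title", "info_type", "reuse_candidate", "notes"]


def toc_to_audit_rows(toc):
    """Turn each TOC entry into an audit row with blank columns to complete."""
    return [
        {"level": level, "title": title,
         "info_type": "", "reuse_candidate": "", "notes": ""}
        for level, title in toc
    ]


def write_audit_csv(rows, handle):
    """Write the audit rows as CSV, which a spreadsheet application can open."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_audit_csv(toc_to_audit_rows(TOC), buffer)
audit_csv = buffer.getvalue()
```

Writing to a real file instead of the in-memory buffer gives you something to open in a spreadsheet and annotate as you read through each section.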

A simple example for you.

All of our product guides and online help have “Overview” sections. They are, typically, very very similar. The product guide Overview is longer than that in the online help.

With a small amount of re-writing, we can create chunks for “Overview” and an “Overview Extension”, with the former being used in the online help, and the latter appended when used in a product guide.
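The Overview example above can be sketched in a few lines of Python. The chunk names, the placeholder wording and the `assemble` function are hypothetical, and a real single source system would store and publish chunks very differently. The sketch only shows the shape of the reuse.

```python
# Hypothetical chunk store: chunk name -> text (placeholder wording).
CHUNKS = {
    "overview": "The product gathers data and produces reports.",
    "overview_extension": "It can also be configured to archive older reports.",
}

# Each deliverable lists the chunks it is assembled from, in order.
DELIVERABLES = {
    "online_help": ["overview"],
    "product_guide": ["overview", "overview_extension"],
}


def assemble(deliverable):
    """Join the chunks for one deliverable into a single Overview section."""
    return " ".join(CHUNKS[name] for name in DELIVERABLES[deliverable])


help_overview = assemble("online_help")
guide_overview = assemble("product_guide")
```

Because both deliverables pull in the same "overview" chunk, a correction made there appears in the online help and the product guide alike.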

Ultimately a content audit will involve a lot of time reading, cross-checking, double-checking, and I’d advise you grab a nice big desk (in the boardroom perhaps?) so you can lay out printed copies of your documentation. I’d also advocate that you don’t try and do the entire process, across all of your documentation, in one fell swoop. Pausing between batches, and discussing the findings with your co-workers, will stop you missing potential re-use opportunities AND stop you trying to re-use (re-write) chunks of information that need to be kept discrete.

Once you understand your own content, then you can start the process of seeing how it stacks up against the content created in the other areas of your company. More on that another time.