Thursday, 31 December 2015

An Alternative to a Hierarchy

When building document management systems (DMS) there is a temptation to put all of the content into a hierarchy of one kind or another.  This isn't always a mistake, but it shouldn't be the default structure.  In fact, I'd go so far as to suggest that it should only be used when there are compelling reasons to do so (and "I'm the most important person and I want a hierarchy" is not compelling).

A note for those who might not have a DMS in place yet, or those who want to learn more about them: A DMS is more properly called an EDMS, or Electronic Document Management System, but the "E" is often left off on the assumption that people are talking about managing documents on computer systems.  A non-electronic DMS would nowadays probably be akin to archival work.  Document management is not carried out using an ordinary file manager like Windows Explorer (at least, not by professionals), but rather with specialised software tools that offer functionality such as version control, check in/check out and permissions.  Examples would be Documentum and SharePoint, but there are many others.

So, if not a hierarchy then the obvious question becomes: What structure should be used instead?

The obvious answer is: A tagged heap.

By a tagged heap, I mean a heap, a pile, a bucket, a single folder/list/library that contains all of your documents, with each document tagged, labelled, indexed, categorised or having otherwise had metadata applied.  Retrieval becomes a matter of searching, filtering and ordering rather than navigating, drilling down and visually scanning.  In most cases it's quicker for users and easier for administrators to maintain in the long term.
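To make the idea concrete, here is a minimal sketch in Python.  It isn't tied to any particular DMS: the field names (title, author, doc_type, department, created), the sample documents and the find helper are all made up for illustration.  It simply shows retrieval as "filter on metadata, then order", with no folders involved.

```python
from datetime import date

# A "heap": one flat list of documents, each carrying its own metadata.
# All field names and values here are illustrative assumptions.
heap = [
    {"title": "Q3 budget", "author": "Priya", "doc_type": "spreadsheet",
     "department": "Finance", "created": date(2015, 10, 2)},
    {"title": "Hiring policy", "author": "Tom", "doc_type": "policy",
     "department": "HR", "created": date(2015, 3, 14)},
    {"title": "Expenses policy", "author": "Priya", "doc_type": "policy",
     "department": "Finance", "created": date(2015, 6, 30)},
]


def find(documents, order_by="created", **criteria):
    """Return documents whose metadata matches every criterion, sorted by one field."""
    matches = [
        doc for doc in documents
        if all(doc.get(field) == value for field, value in criteria.items())
    ]
    return sorted(matches, key=lambda doc: doc[order_by])


# Two different "views" of the same heap; no document had to be moved.
policies = find(heap, doc_type="policy")
finance_policies = find(heap, doc_type="policy", department="Finance")
print([doc["title"] for doc in policies])
```

A real DMS would do this with indexed columns and saved views rather than a Python list, but the principle is the same: the question you ask decides what you see, not the place the document was filed.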

This is a big claim, especially if you've never been exposed to the power of searching a tagged heap. Also, hierarchies are popular, and the control they offer is a hard thing to give up, so I might have some convincing to do.  Let's start by looking at the pros and cons of hierarchies:

Pros of folder hierarchy:

  • Already understood and used by computer users - little or no learning curve for functionality;
  • Easy to permission a folder (and by extension all of the files inside it);
  • Low cost of setup;
  • High level of control for administrators.

Cons of folder hierarchy:

  • Content can only be stored in one folder (and by extension one place in the hierarchy);
  • High learning curve for large or complex hierarchies;
  • Only someone who knows the whole hierarchy can move through it efficiently or quickly;
  • High cost of moving content in the event of a change of hierarchy (e.g. due to organisational change);
  • "Logical" for the hierarchy creator will not be the same as "logical" for a lot of the users;
  • Very inflexible;
  • No standardised metadata between different file types (or versions, such as .doc and .docx);
  • Searching is difficult if you don't already know the name and location, leading to user frustration;
  • Slow to click through folders, read the next list of folders, click the next folder, and so on.

What can we take from this? Well, folder hierarchies might be familiar and easy to use, but they're inefficient, inflexible and illogical for a lot of users.  Familiarity and ease of use are all well and good, but they come with a significant life-long cost which is hard to justify. The two costs that stick out immediately are the high cost of administration AND the high cost of using.  Isn't that actually a worst-case scenario?  That's only one step above not having a DMS at all.

Still, familiarity breeds both comfort and contempt, and maybe I've gone too far towards the latter.  After all, there's a reason folder hierarchies are so popular and ubiquitous, and it can't be because people actively like things that are rubbish (no X Factor jokes, thank you very much).  So people must value the control a hierarchy gives them and the low cost of setting up and permissioning folders, because those are the big benefits, right? Or, is it because there is no obvious alternative? Or that people can't see the benefits of an alternative?

A tagged heap is a strong alternative, and has a lot of benefits.  Let's look at the pros and cons.

Pros of tagged heaps:

  • Content can be viewed based on any number of searches, orders and filters, so location doesn't matter;
  • No learning curve as search is as ubiquitous as folders;
  • A system novice can find something as quickly as a system expert;
  • No cost of hierarchy change;
  • No logic gap between creator and user;
  • Infinitely flexible, as views can be created in any way the system allows;
  • Standardised metadata across all file types (using things like mandatory columns, labels, etc);
  • Searching is easy because of the metadata (such as date of creation, author, file type, description, etc);
  • Extremely fast file retrieval (modern search algorithms can search and return results from tens of thousands of records in less time than it takes to blink your eye).
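
The "location doesn't matter" and "infinitely flexible" points are easier to see with a small sketch.  The Python below is illustrative only: the documents, the "tags" field and the two view functions are assumptions of mine, not features of any specific product.  It builds folder-like groupings on demand from whatever metadata you choose, and a document with two tags simply shows up in both groups.

```python
from collections import defaultdict

# Hypothetical documents; "tags" is an assumed free-form metadata field.
documents = [
    {"title": "Expenses policy", "department": "Finance", "tags": {"policy", "travel"}},
    {"title": "Travel booking guide", "department": "Operations", "tags": {"guide", "travel"}},
    {"title": "Hiring policy", "department": "HR", "tags": {"policy"}},
]


def view_by_field(docs, field):
    """Group titles by a single-valued metadata field, like folders built on demand."""
    view = defaultdict(list)
    for doc in docs:
        view[doc[field]].append(doc["title"])
    return dict(view)


def view_by_tag(docs):
    """Group titles by tag; a document with several tags appears under each one."""
    view = defaultdict(list)
    for doc in docs:
        for tag in sorted(doc["tags"]):
            view[tag].append(doc["title"])
    return dict(view)


by_department = view_by_field(documents, "department")
by_tag = view_by_tag(documents)
print(by_tag["travel"])
```

If the organisation is restructured, by_department changes as soon as the department value on each document is updated; nothing has to be moved from one place to another.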

Cons of tagged heaps:

  • Higher learning curve for users who are adding documents, especially for adding metadata;
  • Permissions can require more design;
  • Higher initial setup cost.

Unlike a folder hierarchy, the more a user puts in the more they get out. This applies specifically to metadata.  As every administrator of every system ever built knows, not every user is as diligent as they should be, so any modern DMS can be set up to make metadata entry mandatory.  The more metadata is entered, the more powerful the views and searches will become.  This does require an intelligent, thoughtful approach to metadata design though, hence the higher initial cost of a tagged heap system.
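
As a rough illustration of what "mandatory metadata" means in practice, here is a small Python sketch.  The list of required fields and the add_document function are hypothetical; a real DMS would enforce this through its own configuration (required columns or similar) rather than code like this.

```python
# Assumed set of mandatory fields; a real system would define its own.
REQUIRED_FIELDS = ("title", "author", "doc_type", "description")


def missing_metadata(metadata):
    """Return the required fields that are absent or blank."""
    return [
        field for field in REQUIRED_FIELDS
        if not str(metadata.get(field) or "").strip()
    ]


def add_document(heap, metadata):
    """Add a document to the heap, refusing it if required metadata is missing."""
    missing = missing_metadata(metadata)
    if missing:
        raise ValueError("Missing required metadata: " + ", ".join(missing))
    heap.append(dict(metadata))


heap = []
add_document(heap, {"title": "Hiring policy", "author": "Tom",
                    "doc_type": "policy", "description": "How we recruit"})

try:
    # Blank author, no doc_type or description: this upload is rejected.
    add_document(heap, {"title": "Untitled draft", "author": ""})
except ValueError as error:
    print(error)
```

The design work is in choosing the required fields: too few and searches are weak, too many and users will resent every upload.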

My argument would be that ease of setup and ongoing administration is not as important as ease of use for the people who'll be using the system on a daily basis.  A balance does need to be struck: a system that's incredible for the 50 people who use it, but which needs 50 administrators to run it, isn't a sensible purchase.  In general, though, the users of the system are the ones who'll determine whether it succeeds.  We've all worked in places where a decision has been made to implement a system that the directors and/or administrators love, but which is execrable for the users.  And what happens? People use it as little as they can get away with, thus removing a lot of the intended benefit.  There's no point implementing a system that has such a poor ROI.

Although using a tagged heap does have a higher initial cost, the benefits for the users are legion and compelling.  On that basis, I'd use a tagged heap and consign complicated hierarchies to the recycle bin by default.

If you've got experiences that confirm or contradict the benefits and ROI of a tagged heap, drop a comment below and let everyone know which you prefer.