Creating taxonomies and organizing metadata are among the most important and challenging aspects of knowledge management. Both deal with one of the central problems of information science: classification.

Classification is the logical arrangement of information so that it can be found quickly when it is needed. The structure of this arrangement is usually represented as a taxonomy, and the information associated with each item is its metadata. Creating taxonomies and developing optimal metadata is difficult because language offers many ways of expressing the same or similar ideas, and each individual organizes their thoughts in ways unique to their own understanding and vocabulary.

As per ISO 15489-1, classification is the “systematic identification and arrangement of business activities and/or records into categories according to logically structured conventions, methods, and procedural rules represented in a classification system”.

Understanding the importance of Metadata is vital for anyone seeking to design an effective Content Management system. To support more effective Content Management, a variety of models and standards have been established. In this article, we will outline some of the common Metadata models designed to improve resource discovery for users.

What is Metadata?

Metadata is the data about the data that makes everything work smoothly.

It refers to structured data describing anything that can be named. This can range from publishing dates for web pages and author names for books to details about images, songs, and more. Metadata tags are used to aid resource discovery, improve resource organization, and support the exchange of data and resources.

Metadata provides contextual and descriptive details about resources that are not necessarily self-describing. Without a usable Metadata Model, users of a system cannot search for or access information efficiently. Indeed, poor Content Management resulting from poor Metadata practices is little better than pushing your data into a filing cabinet. A strong content metadata model leads to enhanced Content Management.

From an organizational perspective, metadata is the cornerstone of better compliance, business intelligence, and improved workflow automation.

Metadata can be used to track things like the dates in a document’s record schedule. It can also be used to flag a security setting, validating access and edit rights and thus controlling distribution. Finally, metadata can capture users’ ratings of content, for example by indicating that content is “valuable”, “useless” or even “dated”.

Metadata is an important part of the content capture, creation and organization phases of the information lifecycle. If metadata is not captured at the same time as the information it describes, you will quickly create a collection of information that is difficult to manage, find and retrieve. Metadata is extremely valuable as a mechanism for enhancing search and retrieval.

Metadata can also provide value in the way content interfaces with business processes. For example, suppose loan applications have to be reviewed in the order in which they were submitted. If the date of receipt is not captured as a metadata value, there is no way to ensure that the application documents are addressed in the right order. With the right metadata, application reviewers can also query, aggregate or group applications from the same client for better, more accurate processing.
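The loan application scenario can be sketched in a few lines of Python. This is only an illustration: the field names (`client_id`, `received`) and the sample records are assumptions, not taken from any particular system.

```python
from collections import defaultdict
from datetime import date

# Hypothetical application records; each carries metadata captured at intake.
applications = [
    {"doc": "loan-003.pdf", "client_id": "C-17", "received": date(2023, 3, 9)},
    {"doc": "loan-001.pdf", "client_id": "C-42", "received": date(2023, 3, 1)},
    {"doc": "loan-002.pdf", "client_id": "C-17", "received": date(2023, 3, 4)},
]

# Review order: earliest date of receipt first.
review_queue = sorted(applications, key=lambda app: app["received"])

# Group applications from the same client so they can be processed together.
by_client = defaultdict(list)
for app in review_queue:
    by_client[app["client_id"]].append(app["doc"])

print([app["doc"] for app in review_queue])
print(dict(by_client))
```

Without the `received` value there would be nothing to sort on, which is the point the example above makes.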

Metadata also serves as a point of integration. Different content in different applications, even across different information management systems, becomes “linkable” through common metadata properties and values. A customer or project ID can be consistent across document repositories, as well as ERP or financial applications.
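As a rough sketch of this kind of linking, the snippet below joins documents from a repository to accounts from an ERP export through a shared customer ID. The two data sets, the `customer_id` field and the `link_documents` helper are all hypothetical and exist only for illustration.

```python
# Hypothetical exports from two separate systems that share a customer ID.
documents = [
    {"file": "contract.pdf", "customer_id": "CUST-100"},
    {"file": "invoice-7.pdf", "customer_id": "CUST-200"},
]
erp_accounts = {
    "CUST-100": {"name": "Acme Ltd", "open_orders": 3},
    "CUST-200": {"name": "Globex", "open_orders": 0},
}

def link_documents(docs, accounts):
    """Attach the matching ERP account to each document via customer_id."""
    linked = []
    for doc in docs:
        account = accounts.get(doc["customer_id"])  # None if no match exists
        linked.append({**doc, "account": account})
    return linked

linked = link_documents(documents, erp_accounts)
for item in linked:
    print(item["file"], "->", item["account"])
```

The link only works if both systems record the ID with the same name and format, which is why consistent metadata properties matter.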

Because metadata is data about the data, it can also provide a variety of different views or slices through the content, providing a layer of insight and context about the knowledge contained in the information.

How to Develop a Classification Scheme for Metadata?

A basic approach for developing a classification scheme can be summarized as follows:

Identify the Stakeholders

For a classification scheme, this will include:

  • The project sponsor, in whose area of responsibility the classification scheme will be deployed and who is sufficiently senior in the organization to be able to champion the initiative.
  • Business unit managers from the areas that will use the classification scheme. Their users will work with the scheme and, in the pilot phase, test it, so it is in the managers’ interest to ensure it meets their needs.
  • Records management – So many of these classification schemes relate to records management that their participation is a must; moreover, in many organizations that is where the expertise and experience in classification scheme development resides.
  • IT – For any information management program there will ultimately be a requirement to load, encode, or otherwise implement the classification scheme in one or more applications. There will also be business logic for assigning classifications, and in most organizations one or both of these tasks fall under the purview of IT.

Define the Purpose

This will affect the structure of the scheme, the approaches used to categorize the information in the scheme, the metadata, etc.

For example, a classification scheme for managing records, the records retention schedule, may be structured according to retention categories and rules, while a file plan may be structured differently in order to enhance findability for end users.

Determine the Approach

The next step is to determine the approach to creating the scheme. This will depend on the purpose and scope of the scheme. The approach could be a more traditional, hierarchically‐based scheme or a taxonomy/thesaurus‐based scheme.

For the former, the approach should also identify the number of categories at the top of the scheme and the number of levels deep it will go. For the latter, the approach will draw on common classification schemes, taxonomies, and vocabularies used in records management or enterprise search solutions.
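To make the two design parameters of a hierarchical scheme concrete, here is a small sketch that models a scheme as nested dictionaries and reports the number of top-level categories and the number of levels. The category names and the `depth` and `paths` helpers are illustrative assumptions, not a recommended scheme.

```python
# Hypothetical classification scheme as nested dicts: category -> subcategories.
scheme = {
    "Finance": {"Invoices": {}, "Budgets": {"Annual": {}, "Quarterly": {}}},
    "Human Resources": {"Recruitment": {}, "Payroll": {}},
    "Legal": {"Contracts": {}},
}

def depth(node):
    """Return how many levels of categories exist at or below this node."""
    if not node:
        return 0
    return 1 + max(depth(children) for children in node.values())

def paths(node, prefix=()):
    """Yield every category as a full path from the top of the scheme."""
    for name, children in node.items():
        current = prefix + (name,)
        yield " / ".join(current)
        yield from paths(children, current)

top_level_count = len(scheme)
levels_deep = depth(scheme)

print(top_level_count, levels_deep)
for path in paths(scheme):
    print(path)
```

Checking these two numbers early helps keep a scheme from growing too broad at the top or too deep to navigate.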

Classification essentially provides context for records. This is important because it lets us segregate records of value from records of little or no value. While the value of certain information can be subjective and depends on the goals of the person seeking it, records classification helps narrow down the places where reliable information may reside.

Classification also aids searchability. When records on a particular topic must be found but little information beyond a few keywords is provided, users begin the research by identifying classifications that may apply to that topic. When records are filed properly and/or audited systematically by records managers, the success rate of this method is often high.

The Importance of Metadata in Content Management

Content Management refers to the organization of content, such as text, graphics, and multimedia, along with an effective tagging system (the Metadata). A good Content Management system will store the data it structures as efficiently as possible in a single repository. It will also allow content to be edited, reused by different publications on the system, and repurposed as and when required.

For Metadata to be effective in any Content Management system, it needs to support at least the following four functions:

  • Searching

Users need to be able to search for key data. As such, content must be categorized and described well. This will allow users to search for files and content in a range of ways, be it by author, date of publication, title, or specific keywords.

  • Distribution

Different applications can require values linked to content in order to regulate where and when that content is distributed or shared. Distribution Metadata also records the journey the content has made. An example of this is a log of the actions taken against an object.

  • Accessibility

Metadata models need to consider the security of managed objects. By filtering at the distribution stage on matching Metadata values, the system can ensure that only appropriate content is accessed when required. Targeted content can then be securely delivered across domains based on business rules.

  • Retention

Metadata is usually used by Records Management applications to implement the retention rules of a given organization. Insufficient Metadata can cause problems with regard to retention rules, how records are preserved, and in what format.
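The four functions above can be sketched against a small set of records. Everything here is a hypothetical illustration: the field names (`audience`, `retention_years`, `history`), the sample values and the helper functions are assumptions, not the behavior of any specific Content Management product.

```python
from datetime import date

# Hypothetical records; all field names and values are illustrative.
records = [
    {"title": "Q1 Budget", "author": "Lee", "keywords": {"finance", "budget"},
     "audience": "internal", "created": date(2015, 1, 10), "retention_years": 7},
    {"title": "Press Kit", "author": "Kim", "keywords": {"marketing"},
     "audience": "public", "created": date(2022, 6, 1), "retention_years": 3},
]

def search(items, author=None, keyword=None):
    """Searching: return items matching every criterion that was supplied."""
    return [
        r for r in items
        if (author is None or r["author"] == author)
        and (keyword is None or keyword in r["keywords"])
    ]

def log_action(record, action, when):
    """Distribution: append an entry to the record's action log."""
    record.setdefault("history", []).append((when, action))

def visible_to(items, allowed_audiences):
    """Accessibility: keep only items whose audience value the user may see."""
    return [r for r in items if r["audience"] in allowed_audiences]

def retention_expired(record, today):
    """Retention: True once the retention period since creation has elapsed."""
    created = record["created"]
    target_year = created.year + record["retention_years"]
    try:
        expiry = created.replace(year=target_year)
    except ValueError:
        # Created on 29 February and the target year is not a leap year.
        expiry = created.replace(year=target_year, day=28)
    return today >= expiry

print([r["title"] for r in search(records, keyword="budget")])
print([r["title"] for r in visible_to(records, {"public"})])
print([r["title"] for r in records if retention_expired(r, date(2023, 1, 1))])
```

Each function depends on a metadata value having been captured. If `retention_years` were missing, for instance, the retention check would have nothing to work with.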

Example of a Metadata Model for a Managed PDF Repository

When designing Metadata models, it is important to keep the number of fields for a specific type of content to a minimum so that all important fields are always updated. Where possible, free-form fields should be avoided and pre-defined lists used instead, as they help to reduce input errors and loss of control. Nevertheless, cross-organizational as well as department-specific fields need to be allowed for.

An example of a Metadata model built on the Dublin Core methods is outlined below:

Functional Specifications

  • Users should be able to search for PDFs.
  • Searching by standard attributes of PDF files, as well as by status classification, should be possible.
  • Relevant Metadata values should be provided as files are archived so that the appropriate files can be found later.

Domain Model

Managed Object: PDFs

  • Content-Type – PDF
  • Responsible Party – Author
  • Classifications:
    • Type
    • Status

Metadata Terminology

  • Content-Type – PDF (Auto-selected)
  • Creation Date – (Date Field)
  • Title – (free-form field)
  • Topic – (free-form field)
  • Classification Type – (predefined list)
    • Value 1 – Internal Document
    • Value 2 – External Document
    • Value 3 – Legal Documentation
  • Classification Status – (predefined list)
    • Value 1 – Draft
    • Value 2 – Pre-Approval
    • Value 3 – Approved
    • Value 4 – Archived
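One way to express the model above in code is sketched below, with the predefined lists as enumerations so that only listed values can be stored. The class and field names are assumptions made for this illustration, and the `author` field is taken from the Responsible Party entry in the domain model.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum

class ClassificationType(Enum):
    INTERNAL = "Internal Document"
    EXTERNAL = "External Document"
    LEGAL = "Legal Documentation"

class ClassificationStatus(Enum):
    DRAFT = "Draft"
    PRE_APPROVAL = "Pre-Approval"
    APPROVED = "Approved"
    ARCHIVED = "Archived"

@dataclass
class PdfMetadata:
    creation_date: date
    title: str   # free-form field
    topic: str   # free-form field
    author: str  # responsible party from the domain model
    classification_type: ClassificationType
    classification_status: ClassificationStatus
    content_type: str = "PDF"  # auto-selected, not entered by the user

    def __post_init__(self):
        # Reject plain strings so only values from the predefined lists are stored.
        if not isinstance(self.classification_type, ClassificationType):
            raise TypeError("classification_type must be a ClassificationType")
        if not isinstance(self.classification_status, ClassificationStatus):
            raise TypeError("classification_status must be a ClassificationStatus")

# Hypothetical record used only to show the model in use.
record = PdfMetadata(
    creation_date=date(2024, 1, 15),
    title="Supplier Agreement",
    topic="Procurement",
    author="J. Doe",
    classification_type=ClassificationType.LEGAL,
    classification_status=ClassificationStatus.DRAFT,
)
print(record.content_type, record.classification_status.value)
```

Using enumerations for the two classification fields mirrors the advice to prefer pre-defined lists over free-form input.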

Are You Planning to Implement an Enterprise Content Management System?

Understanding Metadata is vitally important when designing effective Enterprise Content Management systems. Abiding by established models and standards and mapping out the requirements of domains makes keeping information organized easier. It also ensures that the use of data is maximized without wasting resources.

Paper storage may require significant physical space. A content management platform can integrate disparate documents for greater control, access and process efficiency. It offers advantages in terms of information retrieval, security, governance and lower cost of operations. What’s more, proper records management is becoming a legal imperative.

In case you have any further questions related to this topic, feel free to reach out to us.

Author

Sreekumar Rajan
Consultant – Enterprise Software Solutions
A Qualified and Experienced Consultant with over 15 years of proven expertise in implementing Enterprise Content & Business Process Management Solutions