Building a Metadata-Based Website
by Brett Lider and Anca Mosoiu on 2003/04/21
The online world has been flooded in recent years with talk of metadata, structured authoring, and cascading style sheets (CSS). At their core, these ideas aim to standardize document creation by separating content from display. Meanwhile, the idea of a semantic web, built on ontologies and controlled vocabularies, is gaining momentum; it is about representing knowledge so that machine agents can understand it. At the confluence of these two broad categories of activity, new models of websites are emerging that are as easy for humans to navigate as they are for rigorous processes to maintain.
The goal of this article is to help readers develop an understanding of core and supporting metadata and the benefits of using it to build a website.
Criteria for use: Why do this? Who should do this?
Basing your website’s navigation on centralized metadata is not the right design for every website, and many websites may never be candidates for this treatment at all. It is most helpful for sites and organizations of sufficient size and complexity: enterprise-size companies with 5,000 or more employees, and websites with complex product lines of 1,000 or more product configurations. Another qualifier is an organization where divergent audiences, from customers to partners to different groups within the publishing organization, need a shared understanding of the core business concepts the site is about. Because metadata technology, and the expertise for dealing with it, are relatively immature, the cost of implementing metadata-based navigation may be prohibitively high even for good candidates. For now, at least, the benefits may be reserved for companies whose size and scale give the investment a sufficiently high return. That said, several companies, among them Ontopia, Ontoprise, and Brandsoft, are trying to create a market for enterprise metadata services as they relate to the Web.
Differences between metadata-based websites and a traditional CMS
One of the first questions a savvy reader will ask is, “What is the difference between this approach and a traditional content management system (CMS)?” Traditional CMSs organize their “metadata” so that it is entirely presentation-specific: the “product taxonomy” is a mixture of business concepts and content, and the “links” are page-specific rather than derived from associations between topics. (Business concepts are ideas central to the organization, such as product names, solution names, etc.)
“What is so bad about this?” another inquisitor will ask. The intent of using a centralized metadata repository as the basis of a website’s navigation is to separate business concepts from the content or functionality about those concepts. A product, as a business concept, is the subject of its photo and introduction copy; it is not the same thing as the image and text. This is what we mean by separating concepts from content: making sure the subjects and objects are delineated. Likewise, a product comparison tool is about the products; it is not a product itself. Links between pages should not be hard-coded between files in a directory, but should be derived from semantic associations between the core business concepts.
Advantages and disadvantages of a traditional CMS
Advantages
- Allows authors to use WYSIWYG tools to create documents and lay them out.
- Provides version and source control capabilities.
- Simple workflows for assigning and authoring content.
- Provides automated tools for getting content to the website.
- Provides a centralized store of content.
Disadvantages
- Content is “trapped” in a proprietary system.
- CMS as a publishing tool restricts what you can do with the website and how you can reuse your content.
- Users have to understand complicated file organization structures in order to be able to place their content in the right place.
- Difficult to change layout and templates once a design is chosen.
- Doesn’t always keep up with the pace of business.
Navigation System: A set of user interface modules and interaction designs that work in concert to facilitate the user experience of a website.
Metadata: Most simply, data about data. For the sake of this discussion, we will make the distinction between Core Metadata, which is defined as the central ideas for a given business or entity, and Supporting Metadata, which is also centralized, but only serves to support the Core Metadata, such as concepts for Content Types, Page Templates, and the content itself.
Metatag: A text string, with a type and a value, added to a document to aid in indexing or filtering the document by its contents. In a traditional CMS, the metatag types and values are often controlled through the CMS itself. In a more holistic approach, metatags are added to documents based on the metadata; the metatags become simply one implementation of the metadata, for the purpose of document indexing. In a centralized system, the types and values of the metatags are controlled centrally, and the CMS or some other tagging tool reads them from there.
Concept: Business relevant ideas or topics (again, we will make the distinction between Core and Supporting concepts).
Relationship Type: A semantic structure for information. A surprisingly tricky idea, a relationship type defines the nature of the relationship between concepts. Sure, there is a relationship between Bruce Lider and Brett Lider, but what type of relationship is it? One answer might be the “is_parent_of” relationship type. Relationship types can have more than two participants, and a given concept may take part in one or more instances of a given relationship type (Bruce Lider has a couple more instances of is_parent_of, to the concepts Jessica Lider and Zachary Lider).
Ontology: A network of concepts and the relationships between them. An example could be an ontology of wines, with relationships from each vintage to its type, producer, taste qualities, the food it is best served with, etc. Because all the concepts are discrete and important, and because all the relationships between them are semantically structured, we can infer a lot of information from an Ontology that one just can’t from crawling the links on a series of web pages. (The question of how to take the spaghetti diagram of an ontology and turn it into a comprehensible set of linked web pages is addressed later.) What makes a formal ontology more robust than a thesaurus or faceted classification is its rich semantics: semantic restrictions on relations, range, domain, cardinality, logical sets, inverse relationships, etc.
Presentation Layer: Generally considered to be the application layer that takes raw content and/or functional elements and formats them to desired specifications, using technologies such as XSL, CSS, and Java’s Swing. For the sake of this discussion, the presentation layer is also part of the metadata, as we will have supporting concepts that are used only to help core concepts get onto web pages.
Content: Text, images, and moving images. Content always needs to be associated with a core concept and with a supporting concept that identifies what type of content it is (is it an Introduction or a Product Small Photo?).
Functionality: Both the functionality derived from the interaction design of a site and specific web applications embedded in a larger website.
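To make the glossary terms concrete, here is a minimal sketch in Python of how concepts, relationship types, relationship instances, and centrally derived metatags might be modeled. All class and function names (Concept, RelationshipType, Relationship, metatags) are hypothetical and chosen for illustration; a real metadata repository would persist this data rather than hold it in memory.

```python
from dataclasses import dataclass
from html import escape

# Hypothetical model of the glossary terms; names are illustrative only.

@dataclass(frozen=True)
class Concept:
    name: str
    kind: str  # "core" (a business idea) or "supporting" (e.g. a content type)

@dataclass(frozen=True)
class RelationshipType:
    name: str
    roles: tuple  # named participants, e.g. ("parent", "child")

@dataclass(frozen=True)
class Relationship:
    rel_type: RelationshipType
    members: tuple  # (role, Concept) pairs, in the order the type declares

    def __post_init__(self):
        # Reject instances whose roles don't match the relationship type.
        if tuple(role for role, _ in self.members) != self.rel_type.roles:
            raise ValueError(f"{self.rel_type.name} expects roles {self.rel_type.roles}")

def metatags(core, content_type):
    """Derive a document's metatags from central metadata instead of typing them per page."""
    return "\n".join(
        f'<meta name="{escape(name)}" content="{escape(value.name)}">'
        for name, value in (("concept", core), ("content-type", content_type))
    )

# One relationship type, several instances of it for the same parent.
bruce, brett, jessica = (Concept(n, "core") for n in ("Bruce Lider", "Brett Lider", "Jessica Lider"))
is_parent_of = RelationshipType("is_parent_of", ("parent", "child"))
family = [Relationship(is_parent_of, (("parent", bruce), ("child", c))) for c in (brett, jessica)]
```

Keeping the relationship type separate from its instances is what lets a single type such as is_parent_of have many instances, each validated against the same declared participants.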
Case study
Here is where we leave the abstract and generic and dive into the concrete and specific. The problem space for this case study has been genericized as appropriate, but it is not universal. We would like to think that metadata systems, and navigation built from them, would work well in many other circumstances.
The business problem:
Stove-piped data and user experiences: separate business groups provide overlapping data and functionality because they have no way to share (Fig. 1). The database for the product recommendation tool is not the same one used for the feature finder, which causes confusion and uncertainty, hurting sales and increasing support costs.
Site mirrors organizational structure because there is no incentive to do otherwise (Fig. 2). The site breaks down into marketing, instructional, and support sub-sites, each with a section for a product that might be known by slightly different names in each sub-site.
In addition to inconsistent terminology, the site suffers from inconsistent navigation and design (Fig. 3). Users have to learn multiple navigation systems, which decreases user satisfaction, and the company must maintain each distinct user interface, which increases its costs.
A metadata solution:
Step 1: Centralize the core concepts in a taxonomy.
Since the company sells discrete items, such as products, it should be possible to gain consensus on what each of these items is and what its name is. The first step to building an Ontology is to identify the core concepts one wants in the Ontology, and if possible, a primary organization scheme between them, such as a product hierarchy or taxonomy (Fig. 4, Fig. 5). A great technique for performing these steps is facilitated collaboration, based on the Delphi Process.
Figure 4. Taking an inventory, performing consolidation, creating a taxonomy.
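The inventory, consolidation, and taxonomy steps of Fig. 4 can be sketched as follows. This assumes a hypothetical RouterCo product line (RC-100, RC-200) whose names vary by sub-site; the mapping from variant names to agreed concepts is the output of the facilitated sessions, not something the code decides.

```python
# Hypothetical inventory: product names as they appear in each sub-site.
inventory = {
    "marketing": ["RC-100 Router", "RC-200 Router"],
    "support": ["RC100", "RC-200"],
    "instructional": ["Router RC-100"],
}

# Consolidation: every variant maps to one agreed canonical concept.
canonical = {
    "RC-100 Router": "RC-100", "RC100": "RC-100", "Router RC-100": "RC-100",
    "RC-200 Router": "RC-200", "RC-200": "RC-200",
}

def consolidate(inventory, canonical):
    """Return the set of canonical concepts, failing loudly on unmapped names."""
    concepts = set()
    for site, names in inventory.items():
        for name in names:
            if name not in canonical:
                raise KeyError(f"{site}: no agreed concept for {name!r}")
            concepts.add(canonical[name])
    return concepts

# Taxonomy as child -> parent links: the primary organization scheme.
taxonomy = {"Routers": "Products", "RC-100": "Routers", "RC-200": "Routers"}

def path(concept, taxonomy):
    """Walk parent links up to the root and return the path from the root down."""
    chain = [concept]
    while chain[-1] in taxonomy:
        chain.append(taxonomy[chain[-1]])
    return list(reversed(chain))
```

Raising on an unmapped name is a deliberate choice: a variant nobody has agreed on should go back to the consensus process rather than silently become a new concept.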
Step 2: Develop core relationship types between the core concepts.
We add relationship types between core concepts because these relationships, like the concepts themselves, are central to the business. Which Product goes with which Service and implements which Technologies in a certain Solution is one of the key answers that users of RouterCo (the fictitious company in this case study) seek when using the website or its tools. These relationships will be used to connect web pages to one another and to form the basis for Product Recommendation and Product Configuration web applications.
Figure 5. The core taxonomies.
Once you have developed a draft of the core concepts, and a business process for adding to and revising the list and its hierarchical arrangement, you start a process by which core relationship types connect the core concepts.
Examples: prod_has_tech, to connect a product with a technology it implements; prod_has_service, to connect a product with a service program that supports it; etc. Relationship Types, as defined above, can have any number (N) of named participants. These participants can be constrained: for example, a concept may be allowed in only one instance of a relationship, or only concepts from a certain taxonomy or branch of a taxonomy may be used in a certain participant.
In the example given, prod_has_tech has two participants, product and technology. Concepts in the product participant must be from the Products taxonomy and a concept can be in N relationship instances. Likewise with the technology participant.
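A minimal sketch of a relationship type whose participants are constrained to particular taxonomies, as with prod_has_tech above. The concept names, the TAXONOMY_OF lookup, and the optional single_use_roles constraint are illustrative assumptions, not a specific product's API.

```python
from dataclasses import dataclass, field

# Hypothetical lookup: concept -> taxonomy it belongs to.
TAXONOMY_OF = {
    "RC-100": "Products", "RC-200": "Products",
    "VoIP": "Technologies", "QoS": "Technologies",
}

@dataclass
class RelationshipType:
    name: str
    participants: dict  # role name -> taxonomy whose concepts may fill that role
    single_use_roles: frozenset = frozenset()  # roles limited to one instance per concept
    instances: list = field(default_factory=list)

    def add(self, **roles):
        """Validate and store one relationship instance."""
        if set(roles) != set(self.participants):
            raise ValueError(f"{self.name} needs roles {sorted(self.participants)}")
        for role, concept in roles.items():
            allowed = self.participants[role]
            if TAXONOMY_OF.get(concept) != allowed:
                raise ValueError(f"{concept!r} is not in the {allowed} taxonomy")
            if role in self.single_use_roles and any(i[role] == concept for i in self.instances):
                raise ValueError(f"{concept!r} is already used as {role}")
        self.instances.append(roles)

prod_has_tech = RelationshipType(
    "prod_has_tech", {"product": "Products", "technology": "Technologies"}
)
prod_has_tech.add(product="RC-100", technology="VoIP")
prod_has_tech.add(product="RC-100", technology="QoS")  # a product may be in N instances
prod_has_tech.add(product="RC-200", technology="QoS")
```

Putting the constraints on the type means bad data, such as a technology placed in the product participant, is rejected when the instance is entered rather than discovered later as a broken link on the site.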
Figure 6. Illustration of sample Relationship Types and Relationship Instances in an Ontology.
Once you have the relationship types, you need to populate them with actual relationship instances and develop a business process for maintaining this data (Fig. 6). This is where we go from a set of taxonomies to an ontology: adding all these Relationship Types and specific instances is the rich overlay of data required to bring a staid concept taxonomy to life and make it really useful for the enterprise, and for you as an Information Architect.
Step 3: Associate content with the concepts.
Figure 7. Supporting taxonomies.
With the core concepts in place, develop a set of supporting concepts for all the content types that need to be published. Then place each content object under its appropriate Content Type in the taxonomy and tag it to the core concept it references (Fig. 7, Fig. 8). This can be done by integrating an existing CMS with the metadata system or by building a new CMS with this functionality in mind. By associating content types with content, we can ensure that content is always labeled the same way on the site: a Data Sheet is always a Data Sheet for all products, and never sometimes a Product Overview document.
Figure 8. Content chunks assigned to core concepts and content types.
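Tagging content to core concepts and content types might look like the sketch below. Because the label lives on the content type rather than on each document, every Data Sheet is labeled identically, and a page is rendered only for concepts that actually have tagged content. The content type keys, product names, and helper functions are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical supporting taxonomy of content types; each label is defined once.
CONTENT_TYPES = {
    "data_sheet": "Data Sheet",
    "intro": "Introduction",
    "small_photo": "Product Small Photo",
}

@dataclass(frozen=True)
class ContentObject:
    obj_id: str
    core_concept: str  # e.g. a product from the Products taxonomy
    content_type: str  # key into CONTENT_TYPES

    def __post_init__(self):
        # Content cannot be published under a type the taxonomy doesn't define.
        if self.content_type not in CONTENT_TYPES:
            raise ValueError(f"unknown content type {self.content_type!r}")

def label(obj):
    """The displayed label comes from the content type, never from the document."""
    return CONTENT_TYPES[obj.content_type]

def has_page(concept, content):
    """Render a page for a concept only if some content is tagged to it."""
    return any(obj.core_concept == concept for obj in content)

content = [
    ContentObject("c1", "RC-100", "data_sheet"),
    ContentObject("c2", "RC-100", "intro"),
    ContentObject("c3", "RC-200", "data_sheet"),
]
```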
The Relationship Type we use to tag content objects to a core concept tells the web serving application whether a valid page exists for a given core concept, and therefore whether to render one for end users to view.
Step 4: Make the website use the Ontology’s concepts and relationships for navigation.
Now that we have a core list of concepts, their hierarchy, and relationship types and instances between them, why replicate this data in a CMS? Instead, have the presentation logic of the website use the hierarchical relationships between core concepts to build the site hierarchically from top-level pages to deeper concepts (Fig. 9).
Figure 9. An ontology and the website built from it, side-by-side.
The website presentation logic can also use the supporting relationships that tie Content Objects to concepts and Content Types to display links to content.
One of the affordances this offers is the ability to create consistent navigation for all core concepts of the same type. Now that we have all the Products in a hierarchy, we can specify that all products at level 4 in the hierarchy should have a specific set of content types. This makes it easier for users to find information about products across the full line and enormously decreases the maintenance work for each product’s sub-site.
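Building the site hierarchy from the concept hierarchy (Fig. 9) can be sketched as a recursive walk that yields one page URL per concept. The hierarchy, the slug rule, and the URL scheme are assumptions made for illustration.

```python
# Hypothetical concept hierarchy: parent -> ordered children.
HIERARCHY = {
    "Products": ["Routers", "Switches"],
    "Routers": ["RC-100", "RC-200"],
    "Switches": ["SW-10"],
}

def slug(name):
    """Turn a concept name into a URL path segment."""
    return name.lower().replace(" ", "-")

def site_map(root, hierarchy, prefix=""):
    """Yield (url, concept) pairs, one page per concept, parents before children."""
    url = f"{prefix}/{slug(root)}"
    yield url, root
    for child in hierarchy.get(root, []):
        yield from site_map(child, hierarchy, url)

pages = list(site_map("Products", HIERARCHY))
```

Because the URLs are computed from the hierarchy, moving a product to a different branch of the taxonomy moves its page and its links together, with no hand-edited navigation.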
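Consistent navigation for concepts of the same type can be expressed as a template keyed by hierarchy level. This sketch assumes a hypothetical four-level product hierarchy and a hypothetical set of content types required at level 4, and reports which required content each product is still missing.

```python
# Hypothetical product hierarchy, child -> parent; the root is level 1.
PARENT = {
    "Routers": "Products",
    "Branch Routers": "Routers",
    "RC-100": "Branch Routers",
    "RC-200": "Branch Routers",
}

# Hypothetical rule: every concept at level 4 shares one set of content types.
TEMPLATE_BY_LEVEL = {4: ["intro", "data_sheet", "small_photo"]}

def level(concept):
    """Depth of the concept in the hierarchy, counting the root as 1."""
    n = 1
    while concept in PARENT:
        concept = PARENT[concept]
        n += 1
    return n

def template_for(concept):
    return TEMPLATE_BY_LEVEL.get(level(concept), [])

def missing_content(concept, tagged):
    """Content types the concept's template expects but nobody has tagged yet."""
    present = {ctype for c, ctype in tagged if c == concept}
    return [t for t in template_for(concept) if t not in present]

# (core concept, content type) tags, as produced in Step 3.
tagged = [("RC-100", "intro"), ("RC-100", "data_sheet"), ("RC-200", "intro")]
```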
Detailed examples of how the data in the Ontology can be used to create the website
Figure 10.
Example: Cross-links between concepts can create a list of Related links for a given core concept (Fig. 10). This is one of the ways in which building an Ontology with Core and Supporting Metadata is superior to a traditional system. The same enterprise resource for tracking which Technologies apply to which Products is used to create navigation on the website.
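Related links can be derived directly from relationship instances instead of being hand-coded. In this sketch the prod_has_tech instances are hypothetical sample data; each instance is indexed from both ends, so the same data yields links in both directions.

```python
from collections import defaultdict

# Hypothetical prod_has_tech instances: (product, technology).
PROD_HAS_TECH = [("RC-100", "VoIP"), ("RC-100", "QoS"), ("RC-200", "QoS")]

def related_links(instances):
    """Index each instance from both ends: a product page lists its technologies,
    and a technology page lists the products that implement it."""
    related = defaultdict(set)
    for product, tech in instances:
        related[product].add(tech)
        related[tech].add(product)
    return related

links = related_links(PROD_HAS_TECH)
```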
Figure 11.
Example: Cross-links between concepts can create alternative navigation flows to pages (Fig. 11). Users who don’t know which Services they want, but know they want Services for their Voice-Video solution, can navigate to them in a solution-centric manner, as illustrated here.
All of the above examples could be merged for use by a Product Recommendation tool, where a user indicates what Technology they need to implement on their network, and the tool tells them which Products fit that need and which Service programs they could get for those products (and which Partners they could purchase them from, etc.). This is an example of using the same metadata in more than one place on the site: relationship instances can be more than just links; they can also drive a complicated web application. Instead of the data living in two places (a hand-coded link on a web page and a row in a database for the web application), it lives in one location, the central metadata system, and is referenced by both the cross-linking functionality and the Product Recommendation web application.
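The Product Recommendation tool described above can reuse the same relationship instances as the cross-links. This sketch walks from a technology to products, then to services and partners. The prod_has_tech and prod_has_service names come from the article; service_sold_by and all concept names are hypothetical. The same targets and sources lookups could drive the solution-centric navigation flow.

```python
# Hypothetical relationship instances, stored once and shared by the
# cross-linking navigation and the recommendation tool.
RELATIONS = {
    "prod_has_tech": [("RC-100", "VoIP"), ("RC-100", "QoS"), ("RC-200", "QoS")],
    "prod_has_service": [("RC-100", "Gold Support"), ("RC-200", "Basic Support")],
    "service_sold_by": [("Gold Support", "PartnerCo")],
}

def targets(rel, source):
    """Follow a relationship forward: everything `source` points to."""
    return [b for a, b in RELATIONS[rel] if a == source]

def sources(rel, target):
    """Follow a relationship backward: everything pointing to `target`."""
    return [a for a, b in RELATIONS[rel] if b == target]

def recommend(technology):
    """Products implementing the technology, each with its services and their partners."""
    return {
        product: {
            service: targets("service_sold_by", service)
            for service in targets("prod_has_service", product)
        }
        for product in sources("prod_has_tech", technology)
    }
```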
Figure 12.
Example: Cross-links between concepts can serve internal data needs (Fig. 12). If the core metadata system were integrated with the Enterprise Resource Planning (ERP) system, the data shown in the figure would make it possible to know which Products are owned by which Organizations within the company, and how much money each product has made for each business unit.
Why is this so important? What are the benefits?
Zero degrees of separation. RouterCo has a single source of data for internal and customer-facing uses. Links on the website correspond to relationships between core business concepts, creating zero degrees of separation between the business and its representation on the web. This means that when the company, its concepts, and their relationships change, the website changes in tandem, without any conscious effort or hand-coding.
To make this concrete: the business process for conducting a “product launch” includes adding a concept for the new product in the metadata system with a launch date, so when the day comes for the product to launch, the metadata system creates a page for the new product.
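The launch-date behavior can be sketched as a filter over concept records. The launch_date attribute, the product names, and the dates are assumptions for illustration; a real system would evaluate this at publish or request time.

```python
from datetime import date

# Hypothetical concept records; the product-launch process fills in launch_date.
CONCEPTS = {
    "RC-100": {"launch_date": date(2003, 1, 15)},
    "RC-300": {"launch_date": date(2003, 6, 1)},
}

def live_pages(today):
    """Concepts whose launch date has arrived get a page; the rest stay hidden."""
    return sorted(name for name, c in CONCEPTS.items() if c["launch_date"] <= today)
```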
Consistency of message. The single source of data ensures that all business units talk about the same core business concepts. The website cannot become fractured, because the only way to publish content is to tag it against the core concepts. No more rogue sub-sites. If people want to do something different, they need to convince everyone who has a stake in the core concepts and the content types.
Consistency of user experience. Centralizing the core concepts and giving core concepts of the same type the same navigation template keeps the navigation options for users consistent across the entire site.
Reporting and metrics. Because all content must be tagged in order to appear on the website, reporting how many content objects appear on the website and how many content objects a given concept has, etc., becomes very easy. Integrating the metadata system with a traffic analysis program (such as Clickstream) adds more functionality, so that one can look at traffic for a given content object, or all the pages for a given concept or group of concepts.
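Because every published content object is tagged, reporting reduces to counting over the tags. This sketch assumes the traffic tool can export page views per content object; the CONTENT and VIEWS data are invented samples, not output from Clickstream or any other real product.

```python
from collections import Counter

# Hypothetical tagged content: (content_id, core_concept, content_type).
CONTENT = [
    ("c1", "RC-100", "Data Sheet"),
    ("c2", "RC-100", "Introduction"),
    ("c3", "RC-200", "Data Sheet"),
]
# Hypothetical page views per content object, as a traffic tool might export them.
VIEWS = {"c1": 120, "c2": 40, "c3": 75}

total = len(CONTENT)                                        # objects on the site
per_concept = Counter(concept for _, concept, _ in CONTENT)  # objects per concept

def traffic_by_concept(content, views):
    """Sum page views over all content objects tagged to each concept."""
    totals = Counter()
    for cid, concept, _ in content:
        totals[concept] += views.get(cid, 0)
    return totals
```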
Conclusion
The goal of this article was to help readers develop an understanding of core and supporting metadata and the benefits of using them to build a website. We hope that by walking through an example of a website for a fictitious company that has chosen to build its website this way, we have shown the power of this technique.
The progression towards core metadata-based websites is a progression of separating data from display and core data from supporting data, and of linking the organization producing the website as tightly as possible to the website itself. It is a movement away from fractured technical and user architectures, and away from groups within an organization just doing their own thing. Producing a website like the one we have described is not just a huge information architecture challenge, but a huge organizational, technical, and even fundamental computer science challenge. As an approach, we recommend getting executive support for a small-scale technical and user proof of concept, building momentum and buy-in through the pilot, and iterating on the pilot until the project has enough momentum to take over the company’s external web presence.
The value of a project like this is similar to that of any user-centered design project: orienting the experience design of the website around core user types. It is also a huge change management project, which involves business stakeholders, information architects, and Information Technology groups across the organization.
We didn’t mention this much in the article, but it is assumed that the taxonomy design, the naming of content types, and other user-impacting design decisions are made in a user-centric manner. In this type of project, where the organization that produces the website has to change dramatically to support a new paradigm, the business value of the project must be, and is, higher. In addition to making things easier for end users, moving to a metadata-based business management tool (another name for this type of project) provides value for the company internally, in areas like content publishing and reducing the organizational costs associated with different websites and their differing technical underpinnings. A website based on core and supporting metadata aligns the entire organization and its ecosystem of users around the same paradigm, reaping the benefits that occur when everyone speaks the same language.
“Metadata in a nutshell,” Michael Day. UKOLN: the UK Office for Library and Information Networking, University of Bath, UK, 2001.
“Structured Content: What’s in it for Writers?,” Mark Baker, Senior Consultant, Content, OmniMark Technologies Corporation. CMSWatch, November 17, 2002.
“A CSS Redesign in Five Easy Pages.” A List Apart, February 16, 2001.
SemanticWeb.org: The Semantic Web Community Portal
“The Semantic Web,” Tim Berners-Lee, James Hendler and Ora Lassila. Scientific American, May 2001.
“Ontology Development 101: A Guide to Creating Your First Ontology,” Natalya F. Noy and Deborah L. McGuinness, Stanford University.
“What Is A Controlled Vocabulary?,” Karl Fast, Fred Leise and Mike Steckel. Boxes and Arrows, December 2002.
Ontopia: The Topic Map company
Ontoprise: Semantics for the Web
“Ontologies Come of Age,” Deborah L. McGuinness, Associate Director and Senior Research Scientist, Knowledge Systems Laboratory, Stanford University. In Dieter Fensel, Jim Hendler, Henry Lieberman, and Wolfgang Wahlster, editors. Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential. MIT Press, 2002.




Readers' Comments (11)
Posted 2003/04/23 @ 08:55AM
I would like to point out the existence of several model-based methods for the design of web applications. Some of the most popular include OOHDM (Object Oriented Hypermedia Design Method), and WebML. While such methods have their origin in academia, they have also been successfully applied to industrial strength websites.
These methods, while not cast as using metadata directly (yet), do show that it is desirable to have an additional abstraction layer between your ontology and the actual navigation structure…
The ontology describes how the information is defined – its characteristics and relations, and the navigation structure provides a “view” over these items to support a given set of target user types and tasks.
Posted 2003/04/23 @ 11:32AM
Sorry if it is not clear in the article, but Figure 9 (http://www.boxesandarrows.com/archives/images/042103_lide…) shows the use of a Relationship Type in the Ontology used for “navigation structure”; we called this Rel Type website_hierarchy. Agreed that the structure of the taxonomy and what is displayed to users should be separated in most cases. I am not familiar with OOHDM and WebML (I will definitely check them out), but at least on a conceptual level, what they do seems like providing yet another view or facet through the core concepts.
Posted 2003/04/24 @ 23:05PM
Great article, except that I wouldn’t dismiss CMSes so quickly. Done right (metadata is such a political issue; it’d be easy to get a metadata project mired down forever), metadata can make a CMS so much more useful. I’m not sure I’d agree with some of your CMS “cons.” We use a CMS; we have no problem republishing our content in multiple formats, and our users don’t have to know anything about file locations (they do have to be able to make some basic assumptions about the content, though, for example, whether an article is “news” or “opinion”).
Posted 2003/04/25 @ 09:04AM
Most situations seem to be combinations of what you refer to as traditional CMS’s and metadata-based sites. Rarely can a metadata-based site derive all contextual information from the core content of a page, but allowing the “exceptions” to be input along with the core content can provide much of the flexibility of a traditional CMS.
Posted 2003/04/28 @ 12:51PM
One thing that we didn’t talk about in this article is the modeling methodology that we used. We didn’t use a “traditional” or “academic” method for the simple reason that it would have meant teaching a variety of people a number of different visual languages (UML, WebML, etc). The system that we developed was used with business owners, developers, and even users to create and describe the site.
But that’s a topic for another set of articles.
Posted 2003/05/14 @ 04:14AM
I had one of those moments reading this article, a sort of “I’m not alone in the world” one. The metadata-based principles outlined resonate so well with what I have been pursuing, in belief and in practice, for some years now.
I build metadata web applications using WebML with a tool called WebRatio (www.webratio.com). This is a web development environment that you can use to design a model-driven solution, and it automatically generates all the implementation code. Amazing as the tool is, the key is that I have applied a metadata-based design approach to the way I build solutions.
I produce an information model (UML class diagram) comprising business-level objects like “Products, Solutions, Services, Organisations, ..” and, most importantly, define the relationships/associations between them. In some cases I even have relationship classes between classes to better define the specific nature of inter-relationships.
This underlying information model is then exploited at the business logic level by using the metadata as a dynamic navigation schema. There is only ever one instance of anything. Page content assembly and navigation are a function of the relationships that exist between the information chunks.
If you take a look at my site www.coherencedesign.co.uk you can see a complete working implementation of this approach. All the navigation elements and page content are completely data driven. New elements and pages can be added using an associated content management application (also designed with WebRatio).
I would be happy to submit an article which explains in more detail how I am applying the concepts of Building a Metadata based Website.
Posted 2003/05/14 @ 21:29PM
congratulations!
your article is great- in fact, i´ve been waiting for stuff of this kind for quite a while now- but should be taken with a grain of salt. explaining the need for a metadata based approach to managers, even engineers, is the hard part. maybe the following idea helps: relating products to technologies, solutions and services is an excellent basis for controlling purposes such as “technology controlling” or product lifecycle management, since a product is only the embodiment of several concepts (technology, distribution etc) that in fact are the real targets for the effects of time.
yours, jens
Posted 2003/08/15 @ 08:20AM
This is indeed a very informative article. It’s one of the best articles I have read regarding website creation using metadata and taxonomies.
Just one request, Brett and Anca: is there any book, or are there other articles published, with more elaborate and detailed examples?
Posted 2004/11/22 @ 18:38PM
How do I redesign a website using OOHDM?
Posted 2005/08/25 @ 19:50PM
Thanks for the wonderful article, it gives readers like me a great starting point.
I do have the following questions however:
a) Of the various CMS products (Interwoven, Vignette, etc.), which ones, if any, could in your opinion best incorporate a metadata-driven approach?
b) We (a leading IT services & solutions company) are currently embarking on a site architecture revamp and would like to move to a metadata-driven site architecture. Could you recommend some further reading material, or organizations that specialize in deploying metadata-driven websites?
Cheers!
Yeu Wen
Posted 2006/04/20 @ 22:48PM
I am unable to see the graphics or figures on this page. Am I missing anything?