<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Boxes and Arrows: Stories by Karl Fast</title>
    <link>http://www.boxesandarrows.com/person/46</link>
    <pubDate>Mon, 19 Aug 2002 12:00:01 GMT</pubDate>
    <description>Stories by Karl Fast</description>
    <item>
      <title>Recording Screen Activity During Usability Testing</title>
      <link>http://www.boxesandarrows.com/view/recording_screen_activity_during_usability_testing</link>
      <guid>http://www.boxesandarrows.com/view/recording_screen_activity_during_usability_testing</guid>
      <description>Recording what users do is a crucial aspect of usability testing. One of the most useful recordings you can make is a video of screen activity, recording everything on the screen, much like a VCR: the mouse moving, pages scrolling, clicking links, typing in the search terms, and so on.

&lt;pullquote&gt;Recording screen activity doesn&amp;#8217;t necessarily cost much. Three Windows-based software programs&amp;#8212;Lotus ScreenCam, TechSmith Camtasia and Hyperionics HyperCam&amp;#8212;range between $30 and $150.&lt;/pullquote&gt;A visual record of these mouse movements, keystrokes, and other activities is most useful for usability testing. While there is no substitute for good observational skills, it can be difficult to remember everything that happened during the test. Having a visual record not only reminds you of what happened, it allows for more detailed analysis after the test and comparisons between individuals.&lt;p&gt;&lt;img src="/files/banda/art_end.gif" alt="" title="" width="8" height="8" /&gt;&lt;/p&gt;Recording screen activity doesn&amp;#8217;t necessarily cost much. Three Windows-based software programs&amp;#8212;Lotus ScreenCam, TechSmith Camtasia and Hyperionics HyperCam&amp;#8212;range between $30 and $150 and all have free trial versions available for download so you can try before you buy. All three offer good performance, but unfortunately, I can only recommend two, since the third is no longer being actively developed by its maker.

&lt;span class="subhead"&gt;How to record screen activity&lt;/span&gt;
Before we get to the review, let&amp;#8217;s take a brief look at the three ways of recording screen activity: a camcorder, a VCR, or software. All the tools described in this article use the software approach, but to understand the benefits and drawbacks it&amp;#8217;s useful to compare all three methods.&lt;ol&gt;&lt;li&gt;&lt;b&gt;Camcorder&lt;/b&gt;&amp;#8212;This is the simplest method. Put your camcorder on a tripod, point it at the screen and record. The method is simple, but the resulting video will be a bit fuzzy and hard to read. It&amp;#8217;s useful for getting an idea of what the user did, but it can be difficult (sometimes impossible) to read small text.&lt;/li&gt;&lt;li&gt;&lt;b&gt;VCR&lt;/b&gt;&amp;#8212;If your video card has a TV-out option (a feature that&amp;#8217;s fairly common on modern video cards) you can probably connect it to a VCR and record directly to tape. The result should be an improvement on the camcorder method, but because a television has lower resolution and sharpness than a computer screen, the result will still be fuzzy and downgraded from the original image. To get something readable you&amp;#8217;ll need to limit your screen resolution to 800x600 at most, preferably 640x480.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Software&lt;/b&gt;&amp;#8212;In the software solution a program runs in the background, silently capturing everything that appears on the screen and saving it to a video file. The result is a perfect recording with no loss of detail. Each frame of the resulting video could serve as a screenshot. Indeed, that&amp;#8217;s one way to think of how the software works: taking a series of screenshots and stringing them together into a techno-flipbook (of course the technical details are more involved).&lt;/li&gt;&lt;/ol&gt;The software approach is the most appealing, but traditionally it&amp;#8217;s had one huge drawback: performance. 
The software has to capture and compress an immense amount of data in real time, without slowing down the machine. When I tested these programs on older hardware they would sometimes bog down so much it took ten seconds for a pull-down menu to appear.
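
To put a rough number on "an immense amount of data," here is a minimal back-of-the-envelope sketch. The function name is made up for illustration, and the settings (800x600, 16-bit color, 10 frames per second) are simply one plausible configuration, not a measurement from any of the programs.

```python
def raw_capture_rate(width, height, bits_per_pixel, frames_per_second):
    """Return the uncompressed capture rate in bytes per second."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * frames_per_second


# Illustrative settings: 800x600, 16-bit color, 10 frames per second.
rate = raw_capture_rate(800, 600, 16, 10)
print(f"{rate / 1_000_000:.1f} MB per second before compression")
```

Under those assumptions the recorder has to read and compress about 9.6 MB of pixel data every second while the user keeps working.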

In my tests the performance problem vanished on a 1 GHz machine with a good video card. As I write this, 1 GHz machines are near the bottom end of the scale for desktop PCs. Hardware requirements are no longer the hurdle they used to be.

There is one obvious limitation to the software approach&amp;#8212;it will only record what happens on the screen. It won&amp;#8217;t record users themselves. If you want to learn something from the body language and physical movements of the user then you&amp;#8217;ll still need a camcorder.

&lt;span class="subhead"&gt;Features and requirements&lt;/span&gt;
This article arose out of a research project I was doing on how people search. For this project I developed the following set of software requirements. They should satisfy most usability testing situations:&lt;ul&gt;&lt;li&gt;&lt;b&gt;Record at 10 frames-per-second and 800x600 in 16-bit color with no noticeable impact on system performance.&lt;/b&gt; Obviously lower frame rates, resolution, and color depth would improve performance, but this was my bare minimum. Much of the web doesn&amp;#8217;t look good in 8-bit color and even participants in a research project shouldn&amp;#8217;t be forced to suffer the indignities of 640x480. While I was willing to settle for 5 frames-per-second, I was hoping for 10 or more. Given a fast enough machine all three programs were able to meet this requirement.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Unobtrusive recording.&lt;/b&gt; I wanted the capture software to be invisible during recording. I didn&amp;#8217;t want users to be distracted or made anxious by constant reminders of the recording. Most of the tools didn&amp;#8217;t completely disappear when recording, but they all reduced to a small and unobtrusive icon in the toolbar.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Low cost.&lt;/b&gt; I couldn&amp;#8217;t spend more than a few hundred bucks.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Pause, Fast Forward, and Rewind.&lt;/b&gt; Some of the tools use a special video format and thus a special program for playing the video. The playback tool needed to have a pause feature and preferably fast-forward and rewind.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Immediate playback.&lt;/b&gt; My project used a technique known as retrospective verbal reports, more commonly called a &amp;#8220;think after.&amp;#8221; In this technique the user is recorded while doing the assigned task. When the task is complete, they are shown the video and asked to conduct a think aloud. For think afters it&amp;#8217;s best to watch the video immediately after the test to minimize forgetting. 
The only program that had problems here was ScreenCam, which required a minute or two to write out the video file after recording. Even for the think after protocol this wasn&amp;#8217;t a showstopper.&lt;/li&gt;&lt;/ul&gt;Those were my required features. There were a few other features I was also interested in but they weren&amp;#8217;t critical.&lt;ul&gt;&lt;li&gt;&lt;b&gt;Record Sound.&lt;/b&gt; All three products can record an audio track along with the video. Of course this requires even more computing power. Since I needed to record video for only part of the session, but audio for the entire thing (participants were interviewed after the think after session), I went analog and used a tape recorder for the audio recording. I didn&amp;#8217;t need this feature, but you might.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Hotkeys.&lt;/b&gt; To minimize futzing with the program during test sessions I wanted hotkeys for important commands like Record, Pause, Play, and Stop. All of the programs had hotkeys. I found this to be a useful feature.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Record &amp;#8220;raw&amp;#8221; data.&lt;/b&gt; My dream program would have recorded a separate data stream of every keystroke, every mouse click, every URL visited, and so on. It would have time stamped each event, and automatically correlated it with the video. None of the programs did anything close to this, so I had to record this data by hand by reviewing the video. One possible solution here is using a &amp;#8220;spyware&amp;#8221; program to record this raw data stream and then manually correlating it with the video. I never seriously investigated this option.&lt;/li&gt;&lt;/ul&gt;Curiously, none of the tools I investigated were designed for usability testing. They&amp;#8217;re mainly used for creating tutorial videos and software demos. This means they have a lot of other features that look nifty, but for someone engaged in usability testing they are thoroughly useless and so I&amp;#8217;ve ignored them here.
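
The "raw data" wish above comes down to mapping each time-stamped event onto a position in the video. The following is only a sketch of that correlation step: none of the reviewed programs provide it, and the event log, the function name, and the 10 frames-per-second rate are all hypothetical.

```python
from datetime import datetime


def event_to_frame(recording_start, event_time, frames_per_second):
    """Map an event timestamp to the index of the video frame showing it."""
    elapsed = (event_time - recording_start).total_seconds()
    return int(elapsed * frames_per_second)


# Hypothetical log: when recording began and two logged user events.
start = datetime(2002, 8, 19, 12, 0, 0)
events = [
    ("click", datetime(2002, 8, 19, 12, 0, 2)),
    ("keypress", datetime(2002, 8, 19, 12, 0, 7, 500000)),
]
for name, when in events:
    print(name, event_to_frame(start, when, 10))
```

With a log like this, jumping to the frame for any click or keystroke becomes a lookup instead of a manual review of the whole recording. The sketch assumes the logger and the recorder share the same clock.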

&lt;span class="subhead"&gt;Testing the Software&lt;/span&gt;
I wound up testing three software packages: Lotus ScreenCam, TechSmith Camtasia and Hyperionics HyperCam.

I tested the products on three different machines with differing capabilities. (Note: I was only able to test ScreenCam on Machine A because ScreenCam only runs on Windows 95, 98, and NT.)

&lt;table width="80%" border="0" cellspacing="1" cellpadding="2" class="black"&gt;&lt;tr&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;td class="bodytext"&gt;Machine A&lt;/td&gt;&lt;td&gt;&lt;span class="bodytext"&gt;Machine B&lt;/span&gt;&lt;/td&gt;&lt;td&gt;Machine C&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Processor&lt;/td&gt;&lt;td&gt;200 MHz (Pentium Pro)&lt;/td&gt;&lt;td&gt;333 MHz (AMD K6-2)&lt;/td&gt;&lt;td&gt;1 GHz (AMD Duron)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;RAM&lt;/td&gt;&lt;td&gt;64 MB&lt;/td&gt;&lt;td&gt;320 MB&lt;/td&gt;&lt;td&gt;256 MB&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Video Card&lt;/td&gt;&lt;td&gt;Matrox Millennium&lt;/td&gt;&lt;td&gt;Matrox Millennium II&lt;/td&gt;&lt;td&gt;ATI Radeon 8500&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Video Card RAM&lt;/td&gt;&lt;td&gt;8 MB&lt;/td&gt;&lt;td&gt;16 MB&lt;/td&gt;&lt;td&gt;64 MB&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Operating&lt;br /&gt;System&lt;/td&gt;&lt;td&gt;Windows NT 4.0 with Service Pack 6a&lt;/td&gt;&lt;td&gt;Windows 2000 with Service Pack 2&lt;/td&gt;&lt;td&gt;Windows 2000 with Service Pack 2&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;

My test procedure was as follows:
&lt;ul&gt;&lt;li&gt;Set the display to 800x600 and 16-bit color.&lt;/li&gt;&lt;li&gt;Set the frame capture rate to 15 frames per second.&lt;/li&gt;&lt;li&gt;Start recording.&lt;/li&gt;&lt;li&gt;Start Internet Explorer, maximize the browser to fill the entire screen, and begin browsing the Web.&lt;/li&gt;&lt;li&gt;If the performance is not acceptable:&lt;ul&gt;&lt;li&gt;Reduce the frame rate until either performance is acceptable or the frame rate is 5 frames per second. Never go lower than 5 frames per second.&lt;/li&gt;&lt;li&gt;If performance still suffers and the frame rate has been reduced to 5 frames per second, reduce the color depth to 8 bits (i.e., 256 colors). Keep the resolution at 800x600.&lt;/li&gt;&lt;li&gt;If it still doesn&amp;#8217;t work, reduce the resolution to 640x480. Keep the color depth at 8 bits and the frame rate at 5 frames per second.&lt;/li&gt;&lt;li&gt;If it still doesn&amp;#8217;t work, give up on the program and have a nice cup of tea. I have found second flush Darjeeling from the Margaret&amp;#8217;s Hope Tea Estate to be particularly relaxing.&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;If performance is acceptable:&lt;ul&gt;&lt;li&gt;Continue browsing for about five minutes. Visit sites with long pages (so I can scroll), complex layouts, forms, and other features. My standard routine was a few searches on Google, Yahoo, Amazon, Salon, and CNN.&lt;/li&gt;&lt;li&gt;Repeat the test at 1024x768. If that works, move up to 1280x1024.&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;
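
The fallback steps in the procedure above can be sketched as an ordered list of settings to try. This is only an illustration: the function names are invented, the one-frame-at-a-time reduction is an assumption (the procedure does not say how big each step was), and "acceptable" was a human judgment in the real tests, represented here by a caller-supplied function.

```python
def fallback_settings():
    """Yield (width, height, color_bits, fps) in the order the procedure tries them."""
    # Step 1: 800x600 at 16-bit color, lowering the frame rate from 15 to 5.
    # Assumption: the rate drops one frame per second at a time.
    for fps in range(15, 4, -1):
        yield (800, 600, 16, fps)
    # Step 2: drop to 8-bit color, keeping 800x600 and 5 frames per second.
    yield (800, 600, 8, 5)
    # Step 3: drop to 640x480, keeping 8-bit color and 5 frames per second.
    yield (640, 480, 8, 5)


def first_acceptable(is_acceptable):
    """Return the first setting judged acceptable, or None if none is."""
    for setting in fallback_settings():
        if is_acceptable(*setting):
            return setting
    return None  # nothing worked: time for a cup of tea


# Hypothetical judgment: pretend only 8-bit color feels smooth on this machine.
print(first_acceptable(lambda width, height, bits, fps: bits == 8))
```

The order matters: frame rate is sacrificed first, then color depth, and resolution only as a last resort.
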
The following aspects of the test environment should also be noted:
&lt;ul&gt;&lt;li&gt;The browser cache was cleared before each test.&lt;/li&gt;&lt;li&gt;No proxy servers were used.&lt;/li&gt;&lt;li&gt;The Internet connection was a 384 KBps ADSL line.&lt;/li&gt;&lt;li&gt;Only video was recorded. All of the tools can optionally record an audio track. Camtasia and HyperCam can also add sounds and visual effects to certain events like mouse clicks. None of these features were used.&lt;/li&gt;&lt;/ul&gt;

&lt;pb /&gt;

&lt;span class="subhead"&gt;Lotus ScreenCam&lt;/span&gt;
Website: &lt;a href="http://www.lotus.com/home.nsf/welcome/screencam/"&gt;http://www.lotus.com/home.nsf/welcome/screencam/&lt;/a&gt;
Version tested: Lotus ScreenCam for NT
Price: $86

ScreenCam is a story of good news and bad news.

The good news is that it offers excellent performance. When compared with Camtasia and HyperCam on the same machine it had the highest frame capture rate while having the least impact on overall system performance.

The bad news is that according to the web site &amp;#8220;there are no plans to create a version of ScreenCam to work on Windows 2000 or Windows XP.&amp;#8221; In other words, ScreenCam is a dead product, though you can still buy it. ScreenCam is only available for Windows 95 and Windows NT. It will also run on Windows 98 and variants like Win98 SE and ME, but won&amp;#8217;t work with certain video cards (see the website for details).

&lt;span class="subhead"&gt;Results from Machine A (200 MHz)&lt;/span&gt;
ScreenCam was the clear winner on Machine A, the oldest and slowest system I tested on. It had the least impact on system performance and captured the most data. The resulting video was smooth and sharp. However, to get good performance at 800x600 I had to reduce the color depth to 8 bits. It worked at 16-bit color but it was noticeably slower. Pages were slower to display and scrolling felt chunky. It worked, but not very well.
If you&amp;#8217;re stuck with an old 200 MHz machine and it&amp;#8217;s running an older version of Windows, then ScreenCam is definitely your best bet. Even so, you may be forced to go with 8-bit color depending on the system speed.

&lt;span class="subhead"&gt;Results from Machine B (333 MHz)&lt;/span&gt;
ScreenCam was not tested on Machine B because it doesn&amp;#8217;t run on Windows 2000.

&lt;span class="subhead"&gt;Results from Machine C (1 GHz)&lt;/span&gt;
ScreenCam was not tested on Machine C because it doesn&amp;#8217;t run on Windows 2000.

&lt;span class="subhead"&gt;Details about ScreenCam&lt;/span&gt;
For ScreenCam to work you need to install special ScreenCam video drivers. I found this surprisingly painless, but it&amp;#8217;s unique to ScreenCam. Neither Camtasia nor HyperCam requires special drivers. These video drivers are the reason for ScreenCam&amp;#8217;s superior performance, enabling ScreenCam to access the video display through low level operating system calls. The downside to this approach is that ScreenCam must be rewritten to support each version of Windows. That&amp;#8217;s why it works on Windows 95, NT, and most versions of 98 (depending on the video card), but not at all on Windows 2000 or XP.

ScreenCam records data to a special file format that can only be played back using the ScreenCam player. The player can be downloaded for free and runs on any version of Windows. The good news here is that while you can only record on Windows 95/98/NT, you can play ScreenCam recordings on any version of Windows, including Windows XP.

When recording is finished, ScreenCam needs to spend time processing and creating the final video. The amount of time this takes depends on the length of the recording. For a nine-minute test at 800x600 and 8-bit color, ScreenCam spent approximately 70 seconds &amp;#8220;processing data.&amp;#8221; This processing creates a temporary file that can then be played back. But this file still needs to be saved if you want to keep it. In my test this took an additional 30 seconds.

It&amp;#8217;s possible to convert ScreenCam videos to standard AVI movie files, but I don&amp;#8217;t recommend it. My nine-minute test produced a 58 MB ScreenCam file. When I converted this to an AVI file at 10 frames per second the resulting file was 2.5GB.
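
The 2.5GB figure is close to what storing every frame with no compression at all would take. The sketch below checks that arithmetic, assuming the converted AVI kept the recording's 800x600 resolution and 8-bit color; the article does not say which codec the conversion used, so treat this as a plausibility check only. The function name is made up for illustration.

```python
def uncompressed_video_bytes(width, height, bits_per_pixel, fps, seconds):
    """Size of a video stored as raw frames with no compression."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * fps * seconds


# Nine minutes at 800x600, 8-bit color, 10 frames per second.
size = uncompressed_video_bytes(800, 600, 8, 10, 9 * 60)
print(f"{size / 1_000_000_000:.2f} GB")
```

That works out to about 2.59 GB, roughly 45 times the size of the 58 MB ScreenCam file.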

&lt;span class="subhead"&gt;Pros:&lt;/span&gt;
&lt;ul&gt;&lt;li&gt;Better performance than either Camtasia or HyperCam.&lt;/li&gt;&lt;li&gt;Can be used even on older hardware.&lt;/li&gt;&lt;li&gt;ScreenCam player is free and runs on all versions of Windows.&lt;/li&gt;&lt;/ul&gt;

&lt;span class="subhead"&gt;Cons:&lt;/span&gt;
&lt;ul&gt;&lt;li&gt;Only supports Windows 95, Windows NT, and most versions of Windows 98.&lt;/li&gt;&lt;li&gt;No longer being developed. No plans to support Windows 2000 or XP.&lt;/li&gt;&lt;li&gt;Requires special video driver (easy to install).&lt;/li&gt;&lt;li&gt;Uses a proprietary video format and converting to standard formats like AVI creates huge files.&lt;/li&gt;&lt;/ul&gt;

&lt;span class="subhead"&gt;The bottom line: &lt;/span&gt;
ScreenCam had the best performance of any program tested, but the lack of support for Windows 2000 and XP makes it hard to recommend.  It&amp;#8217;s probably the best choice if you&amp;#8217;re stuck with older hardware running Windows 95, 98, or NT.

&lt;pb /&gt;

&lt;span class="subhead"&gt;TechSmith Camtasia&lt;/span&gt;
Website: &lt;a href="http://www.techsmith.com/products/camtasia/camtasia.asp"&gt;http://www.techsmith.com/products/camtasia/camtasia.asp&lt;/a&gt;
Version tested: 3.02
Price: $150 

&lt;img src="/files/banda/recording_screen_activity_during_usability_testing/fast_img1.gif" alt="" align="right"&gt;Camtasia offers excellent performance, the richest feature set, and it runs on all versions of Windows. On Machine C, the fastest machine in my test group at 1 GHz, Camtasia had no trouble recording 15 frames per second at resolutions up to 1280x1024 in 16-bit color. Even at 1600x1200 it was able to record 15 frames per second with only a hint of sluggishness. Camtasia also performed well on the 333 MHz Machine B. It had no trouble at 800x600 and was only slightly sluggish at 1024x768.

There are only two downsides to Camtasia. It has a lot of features that you probably don&amp;#8217;t need for usability testing, and it&amp;#8217;s by far the most expensive tool in this review. At $150 it&amp;#8217;s almost double the price of ScreenCam and five times the cost of HyperCam.

&lt;span class="subhead"&gt;Results from Machine A (200 MHz)&lt;/span&gt;
Camtasia didn&amp;#8217;t run particularly well on this machine, but it did run. In 16-bit color at 800x600 I was able to capture 5 frames per second, but just barely. The cursor would flash constantly as the machine tried to keep up, pages loaded slowly, and scrolling felt sluggish. It worked, but it was far too slow for usability testing.

Dropping to 8-bit color made a noticeable improvement. Although performance was much improved, I couldn&amp;#8217;t increase the frame rate significantly. I was barely able to capture five frames a second at 1024x768 in 8-bit color.
ScreenCam was definitely better on this system (which is admittedly ancient). Camtasia was almost good enough to be usable at 8-bit color on this machine, but not quite.

&lt;span class="subhead"&gt;Results from Machine B (333 MHz)&lt;/span&gt;
Camtasia had no troubles capturing the required 15 frames per second at 800x600 in 16-bit color. Bumping the resolution up a notch to 1024x768 was acceptable, though there was a noticeable pause when loading pages. Performance wasn&amp;#8217;t quite smooth, but it was usable. For someone used to browsing the web over a 56k modem the pauses would probably seem normal. At higher resolutions Camtasia began to bog down.

Still, this was a significant improvement. Machine B is roughly 60 percent faster overall than Machine A, but where Camtasia was just barely able to capture 5 frames per second at 800x600 on Machine A, it grabbed 15 frames a second on Machine B with no performance impact and even worked well at 1024x768.

&lt;span class="subhead"&gt;Results from Machine C (1 GHz)&lt;/span&gt;
Camtasia performed flawlessly on this machine. It recorded 15 frames per second at resolutions up to 1600x1200. There was a slight sluggishness at the highest resolution, but nothing significant. The machine was still perfectly usable. At lower resolutions there were no performance degradations.

&lt;span class="subhead"&gt;Details about Camtasia&lt;/span&gt;
When you buy Camtasia you actually get three pieces of software. There is Camtasia Recorder for recording the video, Camtasia Player for playing the videos, and Camtasia Producer, which is a basic video editing tool.
Camtasia also requires that you install a special Camtasia video codec called TSCC (it&amp;#8217;s free). Using TSCC dramatically reduces the size of captured video files without any loss in image quality. One of my Camtasia tests ran 19.5 minutes in 800x600 at 16-bit color. The resulting video file was 36.8MB. Installing the codec is easy and quick (and doesn&amp;#8217;t require rebooting your system).

An important trick to using Camtasia is the &amp;#8220;hardware acceleration&amp;#8221; setting. It&amp;#8217;s counter-intuitive, but turning hardware acceleration off results in a dramatic performance improvement. With hardware acceleration on, Machine B was chunky and sluggish at 800x600. When I turned it off, this sluggishness vanished.

The hardware acceleration option is actually a Windows setting and has to do with your video card. Camtasia has an option to automatically disable acceleration when you start recording and enable it when recording ends.
Camtasia will automatically attempt to determine the best video and audio capture rates. For my tests I elected to set these values manually, but I also ran tests to see how the auto-detect feature worked. No complaints here. 
Unlike ScreenCam, the Camtasia video was available immediately after recording. No post-processing was required for straight video. If sound is being recorded, Camtasia records it in a separate file. When recording is stopped, Camtasia merges the two files. During a five-minute test, merging the audio and video streams took about 15 seconds on the 333 MHz Machine B.

Camtasia has a wealth of other features. I won&amp;#8217;t go into all of them, but here are the highlights:&lt;ul&gt;&lt;li&gt;You can choose to capture the entire screen, a single window, or a specific region of the screen.&lt;/li&gt;&lt;li&gt;Although Camtasia includes a free Camtasia player, you don&amp;#8217;t need to use it. Any video player will work so long as you have the TSCC codec installed.&lt;/li&gt;&lt;li&gt;Camtasia Producer is a video editing tool for combining, editing, and otherwise munging your videos. None of the other tools included something like this.&lt;/li&gt;&lt;li&gt;TechSmith also sells a Software Developer Kit (SDK) &amp;#8220;to allow you to easily add screen recording functionality into your Windows application.&amp;#8221; The SDK is available as a separate product. None of the other tools offer a similar package.&lt;/li&gt;&lt;/ul&gt;

&lt;span class="subhead"&gt;Pros:&lt;/span&gt;
&lt;ul&gt;&lt;li&gt;Excellent performance.&lt;/li&gt;&lt;li&gt;Excellent features.&lt;/li&gt;&lt;li&gt;Easiest to use of the programs tested.&lt;/li&gt;&lt;li&gt;Supports all versions of Windows, except for Windows 95 (but does support Windows 95 OSR2).&lt;/li&gt;&lt;/ul&gt;

&lt;span class="subhead"&gt;Cons:&lt;/span&gt;
&lt;ul&gt;&lt;li&gt;The most expensive tool reviewed. At $150 it&amp;#8217;s almost twice the cost of ScreenCam and five times more than HyperCam.&lt;/li&gt;&lt;li&gt;Includes features you probably don&amp;#8217;t need for usability testing (like Camtasia Producer).&lt;/li&gt;&lt;/ul&gt;

&lt;span class="subhead"&gt;The bottom line: &lt;/span&gt;
Camtasia offers the best blend of performance, features, and ease of use among the programs tested. It runs on every version of Windows (except the original Windows 95) and installation is a snap. The only drawback is price, but at $150 it&amp;#8217;s still within the range of almost every budget. Highly recommended.
&lt;pb /&gt;

&lt;span class="subhead"&gt;Hyperionics HyperCam&lt;/span&gt;
Website: &lt;a href="http://www.hyperionics.com/hc/"&gt;http://www.hyperionics.com/hc/&lt;/a&gt;   
Version tested: 1.70.03
Price: $30

&lt;img src="/files/banda/recording_screen_activity_during_usability_testing/fast_img2.gif" alt="" align="right"&gt;HyperCam is by far the cheapest of the products tested, yet it probably has all the features you need for usability testing. It offers slightly less performance than Camtasia, but at one-fifth the price. Almost any machine you buy today will have enough spare computing power to make up the difference. The biggest drawback to HyperCam is that it&amp;#8217;s a little harder to configure properly. Most of the configuration hurdles are minor and, considering the price, you may be willing to live with them.

&lt;span class="subhead"&gt;Results from Machine A (200 MHz)&lt;/span&gt;
HyperCam performed almost as well as Camtasia on this machine. It was barely able to capture 5 frames a second at 800x600 in 16-bit color. It performed much better at 8-bit color. As with Camtasia it wasn&amp;#8217;t great, but it did work at the reduced color depth and at modest frame rates, though not well enough to use for usability testing.

&lt;span class="subhead"&gt;Results from Machine B (333 MHz)&lt;/span&gt;
HyperCam required a bit of coaxing to get it working properly on this machine. Once I got the settings right, which took some fiddling (more on this below), it captured 15 frames per second at 800x600 in 16-bit color. At 1024x768 I could do no better than 11 frames per second, but performance was smooth. Overall Camtasia performed better on this machine, but HyperCam&amp;#8217;s performance was certainly acceptable.

&lt;span class="subhead"&gt;Results from Machine C (1 GHz)&lt;/span&gt;
On this, the fastest test machine, the difference between Camtasia and HyperCam was almost negligible. HyperCam had no trouble with the base requirement of 15 frames per second at 800x600 and 16-bit color. Even at 1024x768, 1280x1024, and 1600x1200 HyperCam was able to capture a full 15 frames per second with little or no performance problems.

&lt;span class="subhead"&gt;Details about HyperCam&lt;/span&gt;
HyperCam has most of the same features and options as Camtasia, but I found it a little harder to use. For example, HyperCam lets you capture either a window or any rectangular region of the screen. Camtasia does this too, but it also has a one button feature for capturing the entire screen. To capture the entire screen in HyperCam you have to first define a region which covers the entire screen and then press record.

Admittedly this is a little thing. But there are three other &amp;#8220;little things&amp;#8221; related to performance that I found frustrating. Once I figured them out HyperCam worked like a champ, but until I figured them out HyperCam left me unimpressed.

The first of the &amp;#8220;little things&amp;#8221; is hardware acceleration. Like Camtasia, HyperCam works best if the video hardware acceleration is turned off. Unlike Camtasia you have to muck around with the Windows display properties to turn this off, then run HyperCam, and when you&amp;#8217;re finished recording you have to turn it back on. Camtasia has a &amp;#8220;Disable display acceleration during capture&amp;#8221; checkbox that automatically disables acceleration when you start recording and enables it when recording is finished. A small but helpful touch.

The second little thing is the frame capture rate. Camtasia will automatically try to determine the best frame capture rate for your system. You can also set it manually, and if you set it too high Camtasia will automatically drop frames and keep recording (though the system will probably slow down).

HyperCam takes a different approach to frame capture rates. First, there is no auto-configuration option&amp;#8212;you must set the frame rate manually. This isn&amp;#8217;t a big deal, but if you set the frame rate too high HyperCam will start recording, then stop suddenly and display an error message saying the frame rate is too high. In my tests I started at fifteen frames per second and lowered the frame rate step-by-step until HyperCam stopped complaining.

The third little thing is the video codec. HyperCam lets you select which video codec to use for the recording. Since most users (including me) know nothing about video codecs, HyperCam has an autoselect feature which is &amp;#8220;Strongly Recommended.&amp;#8221; Unfortunately, HyperCam was much slower than Camtasia when I chose autoselect. 

Wanting to give HyperCam a fair shake, I decided to try other codecs. Scanning the list I saw an entry for the &amp;#8220;Techsmith Screen Capture Codec.&amp;#8221; This is the codec that Camtasia installed (TSCC). When I tried recording with TSCC, HyperCam&amp;#8217;s performance shot up to the point where it ran almost as fast as Camtasia.

In other words, HyperCam by itself has some performance problems, but you can overcome these problems by using the TSCC codec from Camtasia. I have been unable to find any reason why this would not be allowed. The TSCC codec from Camtasia is available as a free download and I had no technical difficulties using it with HyperCam.

&lt;span class="subhead"&gt;Pros:&lt;/span&gt;
&lt;ul&gt;&lt;li&gt;Inexpensive. At only $30 USD, it&amp;#8217;s one-fifth the cost of Camtasia.&lt;/li&gt;&lt;li&gt;Supports all versions of Windows.&lt;/li&gt;&lt;li&gt;Performs almost as well as Camtasia as long as you&amp;#8217;re using Camtasia&amp;#8217;s TSCC codec.&lt;/li&gt;&lt;/ul&gt;

&lt;span class="subhead"&gt;Cons:&lt;/span&gt;
&lt;ul&gt;&lt;li&gt;Not as many goodies and features as Camtasia, but probably enough for the usability professional.&lt;/li&gt;&lt;li&gt;Harder to use and configure for decent performance.&lt;/li&gt;&lt;/ul&gt;

&lt;span class="subhead"&gt;The bottom line: &lt;/span&gt;
My first impression of HyperCam was that for $30 I was getting what I paid for. But once I fiddled with it and found the &amp;#8220;secret&amp;#8221; of using Camtasia&amp;#8217;s TSCC codec, I was entirely satisfied. Unless you need the extra features of Camtasia, HyperCam will probably do the job (but download the trial version and test it to make sure).

&lt;pb /&gt;

&lt;span class="subhead"&gt;Summary and recommendations&lt;/span&gt;
Before you make a decision I strongly recommend that you download these programs and try them yourself. It&amp;#8217;s the only way to be sure you&amp;#8217;ll get acceptable performance on your hardware. The trial versions are free, installation is a snap, and running some basic tests will take just a few minutes.

Camtasia is clearly the best of the bunch, but it&amp;#8217;s also the most expensive. With a bit of fiddling, HyperCam will perform almost as well as Camtasia for a fraction of the cost. Camtasia has a lot more features, especially since it includes a basic editing and production tool, but for usability testing the programs are roughly equivalent when it comes to features.

I can&amp;#8217;t recommend ScreenCam. While it used to be the gold standard in this area, it&amp;#8217;s now a dead product with no future.

Choosing between Camtasia and HyperCam is difficult. I preferred Camtasia for its ease of use. It&amp;#8217;s slightly faster and more polished than HyperCam. Still, HyperCam is a bargain at $30 and it&amp;#8217;s probably worth the fiddling required to make it perform well.

&lt;table border="0" width="80%" cellspacing="1" cellpadding="2" class="black"&gt;&lt;tr&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;td&gt;&lt;b&gt;ScreenCam&lt;/b&gt;&lt;/td&gt;&lt;td&gt;&lt;b&gt;Camtasia&lt;/b&gt;&lt;/td&gt;&lt;td&gt;&lt;b&gt;HyperCam&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan="4"&gt;&lt;b&gt;Purchase Options&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Cost (USD), single copy&lt;/td&gt;&lt;td&gt;$86.00&lt;/td&gt;&lt;td&gt;$149.95&lt;/td&gt;&lt;td&gt;$30.00&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign="top"&gt;Free trial for download?&lt;/td&gt;&lt;td valign="top"&gt;Yes (15 days)&lt;/td&gt;&lt;td valign="top"&gt;Yes (30 days)&lt;/td&gt;&lt;td valign="top"&gt;Yes (no time limit, but all videos are stamped with message saying it&amp;#8217;s unregistered)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Buy online?&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Site license available&lt;/td&gt;&lt;td&gt;Unsure&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Educational discount&lt;/td&gt;&lt;td&gt;Unsure&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan="4"&gt;&lt;b&gt;Platform Support&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign="top"&gt;Windows 95&lt;/td&gt;&lt;td valign="top"&gt;Yes&lt;/td&gt;&lt;td valign="top"&gt;Yes (only on Windows 95 OSR2)&lt;/td&gt;&lt;td valign="top"&gt;Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Windows 98 (including Win 98, 98 SE, and ME)&lt;/td&gt;&lt;td valign="top"&gt;Yes (may not work with all video cards)&lt;/td&gt;&lt;td valign="top"&gt;Yes&lt;/td&gt;&lt;td valign="top"&gt;Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Windows NT&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Windows 
2000&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Windows XP&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan="4"&gt;&lt;b&gt;Recording Features&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Hot-keys to start, stop, and pause recording&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Record sound&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Record full screen&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Record any region&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Set frame capture rate&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Choose codec used for recording&lt;/td&gt;&lt;td&gt;N/A&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Hide when recording&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan="4"&gt;&lt;b&gt;Playback Features&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Pause&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Fast Forward&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Reverse&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign="top"&gt;Special player required?&lt;/td&gt;&lt;td valign="top"&gt;Yes.&lt;br /&gt;The player is a free download. It runs on all versions of Windows, including 2000 &amp; XP. 
You can only record on 95/98/NT, but you can play back on anything.&lt;/td&gt;&lt;td valign="top"&gt;No.&lt;br /&gt;A special player is available as a free download, but any video player will do as long as you&amp;#8217;ve got the TSCC codec installed. The codec is a free download.&lt;/td&gt;&lt;td valign="top"&gt;No.&lt;br /&gt;Any video player will work. If playing back on a different machine, you must have installed the codec that was used for recording.&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;end&gt;

&lt;biobox&gt; &lt;a href="http://www.boxesandarrows.com/people/archives/karl_fast.php"&gt;Karl Fast&lt;/a&gt;  was an information architect at Argus Associates. He is currently pursuing a Ph.D. in information visualization at the University of Western Ontario.&lt;/biobox&gt;</description>
      <pubDate>Mon, 19 Aug 2002 12:00:01 GMT</pubDate>
      <author>Karl Fast</author>
      <category>Methods</category>
    </item>
    <item>
      <title>All About Facets &amp; Controlled Vocabularies</title>
      <link>http://www.boxesandarrows.com/view/all_about_facets_controlled_vocabularies</link>
      <guid>http://www.boxesandarrows.com/view/all_about_facets_controlled_vocabularies</guid>
      <description>&lt;pullquote&gt;&amp;#8220;Our aim is to make this complex and important subject accessible to practicing information architects.&amp;#8221;&lt;/pullquote&gt;Information architects are fascinated with faceted classification and its application to information architecture problems. However, facets remain difficult to understand and there are few options for learning about them. 

This is the first in a series of articles that aims to correct this situation. We intend to explain both facets and the more general concept of controlled vocabularies. We want to make the subject accessible to those who don't have advanced degrees in library and information science. Furthermore, we want to show how these concepts can be applied to solve information architecture problems for the Web and other digital information environments.

The concept of faceted classification is decades old, and controlled vocabularies go back even further. Consequently a great deal has already been written about the subject. But these writings are not always helpful to the practicing IA. Some are too simple, others too academic. Most are hard to find, and many were written decades before this Web thing happened.

Throughout this series we will strive to be:
&lt;ul&gt;&lt;li&gt;&lt;b&gt;Practical.&lt;/b&gt; We will give you a practical guide to controlled vocabularies and faceted classification. We will not only explain the concepts, we will show you how to apply them in solving real information architecture problems.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Readable.&lt;/b&gt; Too much of the existing literature is hard to understand. It may be comprehensible to someone with a master's in library and information science, but this excludes a large number of practicing IAs (and we know some librarians who don't understand this stuff). We will use plain talk to explain this stuff, without dumbing it down.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Relevant.&lt;/b&gt; We will make this relevant to the Web and other digital information environments. A great deal was written about this topic in the 1950s and 60s. It's excellent material, but back then transistors were still a pretty neat idea. We believe that faceted classification has even more applications today than it did back then.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Accessible.&lt;/b&gt; Everything will be published here on Boxes &amp;amp; Arrows: on the web, easy to access, and free. While a lot has been written on this topic, it's often hard to obtain. For example, B.C. Vickery's excellent book, &amp;#8220;Faceted Classification: A Guide to the Construction and use of Special Schemes&amp;#8221; was written in 1960 and is rather difficult to obtain today (at least one of the authors has resorted to finding a copy in a library and, in desperation, photocopying the whole thing).&lt;/li&gt;&lt;/ul&gt;

&lt;span class="subhead"&gt;The plan &lt;/span&gt;
Our main goal is to explain faceted classification. However, a faceted classification scheme is actually a special case of what are called controlled vocabularies. To properly explain facets we will begin with this more general topic and work our way up to facets.

Our travels through this strange land will include the following:
&lt;ul&gt;&lt;li&gt;&lt;b&gt;Controlled Vocabularies.&lt;/b&gt; In the first full article in the series we'll describe controlled vocabularies in general. We'll talk about what they are and how they work. &lt;/li&gt;&lt;li&gt;&lt;b&gt;Synonym Rings &amp;amp; Authority Files.&lt;/b&gt; Before moving on to facets, we'll describe these simpler forms of controlled vocabularies. There are many situations where they are more useful solutions because they're easier to create, implement, and maintain. Sometimes they're not enough and it's time to step up to facets.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Facets &amp;amp; Facet Analysis.&lt;/b&gt; With the fundamentals in place we will move on to the heart of our subject. This will take a while, but it'll be worth it. We'll also take time to describe facet analysis, the process used to develop facets.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Interface Issues.&lt;/b&gt; A long-standing weak point of controlled vocabularies is how to use them effectively in an interface. This is particularly true of facets. We'll explore these issues and give you the best advice we can.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Decision Factors.&lt;/b&gt; Not every project calls for a full-blown faceted solution. Sometimes a synonym ring is better. How do you know? We'll cover some guidelines for making those decisions.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Future Directions.&lt;/b&gt; There are some interesting new applications related to facets and controlled vocabularies such as &lt;a href="http://www.xfml.org/"&gt;XFML&lt;/a&gt; (http://www.xfml.org/) and &lt;a href="http://www.topicmaps.org/"&gt;Topic Maps&lt;/a&gt; (http://www.topicmaps.org/). We hope to cover these as well.&lt;/li&gt;&lt;/ul&gt;
&lt;span class="subhead"&gt;Some final thoughts&lt;/span&gt;
That's a lot. And yes, we're ambitious. But no, we aren't writing the definitive treatise on the subject. Our aim is to make this complex and important subject accessible to practicing information architects.
 
We view this as a collaborative effort. We anticipate many questions. We'll answer these through the discussion features of Boxes &amp;amp; Arrows. We also plan to address the bigger questions you have in subsequent columns. Let us know what you want to know, and we'll do our best to provide you with answers.

&lt;end&gt;&lt;/end&gt;


&lt;biobox&gt;&lt;a href="http://www.boxesandarrows.com/people/archives/karl_fast.php"&gt;Karl Fast&lt;/a&gt; is a PhD student in library and information science at the University of Western Ontario. He also has a master's in LIS. His graduate work has included courses on organization of information, subject analysis, thesaurus construction, and facet analysis.

&lt;a href="http://www.boxesandarrows.com/people/archives/fred_leise.php"&gt;Fred Leise,&lt;/a&gt; president of &lt;a href="http://www.contextualanalysis.com"&gt;ContextualAnalysis, LLC,&lt;/a&gt; is an information architecture consultant providing services in the areas of content analysis and organization, user experience design, taxonomy and thesaurus creation, and website and back-of-book indexing.

&lt;a href="http://www.boxesandarrows.com/people/archives/mike_steckel.php"&gt;Mike Steckel&lt;/a&gt; is an Information Architect/Technical Librarian for International SEMATECH in Austin, TX. &lt;/biobox&gt;
&lt;p&gt;&lt;img src="/files/banda/art_end.gif" alt="" title="" width="8" height="8" /&gt;&lt;/p&gt;</description>
      <pubDate>Mon, 09 Dec 2002 22:44:53 GMT</pubDate>
      <author>Fred Leise, Karl Fast, Mike Steckel</author>
      <category>Findability</category>
    </item>
    <item>
      <title>What Is A Controlled Vocabulary?</title>
      <link>http://www.boxesandarrows.com/view/what_is_a_controlled_vocabulary_</link>
      <guid>http://www.boxesandarrows.com/view/what_is_a_controlled_vocabulary_</guid>
      <description>&lt;pullquote&gt;&amp;#8220;A controlled vocabulary is a way to insert an interpretive layer of semantics between the term entered by the user and the underlying database to better represent the original intention of the terms of the user.&amp;#8221;&lt;/pullquote&gt;The most effective communication occurs when all parties involved agree on the meaning of the terms being used. Consequently, finding the right words to communicate the message of your website can be one of the most difficult parts of developing it. 

When we converse, we speak in &amp;#8220;natural language.&amp;#8221; This is language in all its raw, rich, gooey glory. When we organize and label our information, however, there is so much richness, variance, and confusion in terminology that we often need to impose some order to facilitate agreement between the concepts within the site and the vocabulary of the person using it. 

This order can come through a controlled vocabulary. Amy Warner &lt;a href="http://www.lexonomy.com/publications/aTaxonomyPrimer.html"&gt;defines&lt;/a&gt; a controlled vocabulary (CV) as &amp;#8220;organized lists of words and phrases, or notation systems, that are used to initially tag content, and then to find it through navigation or search.&amp;#8221; This means that a CV is a type of metadata that functions as a &amp;#8220;subset of natural language&amp;#8221; (Wellisch); it is not how we normally speak. Using a CV is also a way to overtly display relationships among the various concepts that your site covers in order to increase findability. The most basic, and often overlooked, form of controlled vocabulary is a consistent labeling system. If you are careful to call the same thing, or the same concept, by the same name everywhere on your site, you are using a very simple controlled vocabulary. You are also helping your users develop a mental model of the information they can find. 

A controlled vocabulary is a way to insert an interpretive layer of semantics between the term entered by the user and the underlying database to better represent the original intention of the terms of the user. Consider what happens when you do not use a controlled vocabulary. An uncontrolled vocabulary simply uses the natural language of the documents and matches that with the natural language of the user. This is extremely specific, and it gives the user exactly what they ask for. Sounds great, right? Consider, however, a site about chemistry, where many of the documents use the chemical name of the element (&amp;#8220;iron&amp;#8221;), and many use the chemical symbol of the element (&amp;#8220;Fe&amp;#8221;). With an uncontrolled vocabulary, the results will include only documents that contain the exact term the user entered. If the user enters &amp;#8220;Fe&amp;#8221; in the search box, he will not get any of the documents that use only the term &amp;#8220;iron.&amp;#8221; There is a good chance the user is missing some documents he would like to have. Very few users will enter both terms, and many will review their results believing they are seeing every relevant document.


&lt;span class="subhead"&gt;The equivalence relationship&lt;/span&gt;
You probably are aware of certain categories or items on your site that might go by multiple names. You realize that if you said &amp;#8220;automobiles&amp;#8221; on your homepage and &amp;#8220;cars&amp;#8221; on the next page, users might get confused. Users will start to wonder if there is a difference between the two terms. Instead you choose &amp;#8220;automobiles&amp;#8221; and don't use &amp;#8220;cars&amp;#8221; at all. In this case &amp;#8220;automobiles&amp;#8221; is the term you prefer to use throughout your site. We call this the &amp;#8220;preferred term.&amp;#8221; &amp;#8220;Cars&amp;#8221; is a variant term, a different word representing the same concept.  Or, consider this example:

 &lt;img src="/files/banda/cv_1.jpg" width=343 height=43 alt="Example of a preferred term"&gt;    

Here, each term refers to the same concept, Elizabeth Taylor (your preferred term). We could tell our system, when people ask for &amp;#8220;Elizabeth Burton&amp;#8221; use &amp;#8220;Elizabeth Taylor.&amp;#8221; This is more traditionally expressed using standard CV notation as:

Elizabeth Fortensky USE Elizabeth Taylor
Elizabeth Taylor UF Elizabeth Fortensky  (UF = Used For)

Or even this:

Liz Taylor USE Elizabeth Taylor 
Elizabeth Taylor UF Liz Taylor
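
For readers who think in code, here is a minimal Python sketch of the USE and UF notation shown above. The table contents and the function names (preferred_term, used_for) are our own illustrative assumptions, not part of any standard or product; a real system would load its variants from wherever the vocabulary is maintained.

```python
# Hypothetical USE table: each variant term points to its preferred term.
USE = {
    "Elizabeth Fortensky": "Elizabeth Taylor",
    "Elizabeth Burton": "Elizabeth Taylor",
    "Liz Taylor": "Elizabeth Taylor",
}


def preferred_term(term):
    """Follow a USE reference; a term with no entry is returned unchanged."""
    return USE.get(term, term)


def used_for(preferred):
    """UF is the inverse of USE: list every variant that points here."""
    return sorted(variant for variant, target in USE.items() if target == preferred)


if __name__ == "__main__":
    print(preferred_term("Liz Taylor"))
    print(used_for("Elizabeth Taylor"))
```

Storing only the USE direction and deriving UF from it keeps the two halves of the equivalence relationship from drifting apart, which is one reasonable design choice among several.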

Think about Gap's web page (&lt;a href="http://www.gap.com"&gt;http://www.gap.com&lt;/a&gt;). We already know what they sell (they have excellent branding), and most of their content is generally referred to by the same terms as used in our general culture. In other words, people consistently say &amp;#8220;jeans,&amp;#8221; &amp;#8220;pants,&amp;#8221; and &amp;#8220;shirts.&amp;#8221; Even though you might get the occasional person using the word &amp;#8220;dungarees&amp;#8221; or &amp;#8220;slacks,&amp;#8221; nearly everyone would see &amp;#8220;jeans&amp;#8221; and know what the category referred to (the visuals help support this too). Furthermore, Gap does not carry hundreds of pairs of jeans that must somehow be distinguished from one another. If you examine the natural language people use when talking about Gap's products, there's an unusually small amount of term variance. Content like this works great in the very simple organization system used on the Gap site. It works so well that they do not even need to offer search; this is very unusual for an ecommerce site. What they have is a system in which all of the concepts are consistently labeled using language familiar to their users. They're lucky. Few sites have the option to work in this way.

Let's say, however, that gap.com decided to offer search. Then they would somehow need to translate the natural language of search into the controlled language of the website. People search in the same language they speak, natural language, so a more advanced controlled vocabulary needs to take the concepts of your users (natural language) and match them to the concepts expressed in the language of your website (controlled vocabulary). That means if the developers of the site began to see that people were searching for &amp;#8220;dungarees&amp;#8221; and getting zero hits, they would need to create a way to tell the system, &amp;#8220;when someone searches for 'dungarees,' give them the results for 'jeans.'&amp;#8221; In the language of a controlled vocabulary, &amp;#8220;jeans&amp;#8221; becomes the preferred term and &amp;#8220;dungarees&amp;#8221; is a variant term, and they have an equivalence relationship. This can be a powerful tool for increasing findability. 

There are many examples of the situations that alternate terms cover. Here are a few:&lt;ul&gt;&lt;li&gt;synonyms (two words with the same meaning, like &amp;#8220;jeans&amp;#8221; and &amp;#8220;dungarees&amp;#8221;)&lt;/li&gt;&lt;li&gt;homonyms (words that sound the same, but have different meanings, like &amp;#8220;bank&amp;#8221; the financial institution and &amp;#8220;bank&amp;#8221; the side of a stream or river) &lt;/li&gt;&lt;li&gt;common misspellings &lt;/li&gt;&lt;li&gt;changes in content (e.g., countries that change their name or have multiple spellings)&lt;/li&gt;&lt;li&gt;identifying &amp;#8220;Best Bets&amp;#8221; or the most popular pages associated with a certain term (&lt;a href="http://www.BBC.com"&gt;http://www.BBC.com&lt;/a&gt; is great at this)&lt;/li&gt;&lt;li&gt;connecting a woman's married name to her maiden name&lt;/li&gt;&lt;li&gt;connecting abbreviations to the full word (e.g., NY and New York, the chemical symbol Si with the element Silicon)&lt;/li&gt;&lt;/ul&gt;

There are two types of synonym equivalence lists: synonym rings and authority files. Synonym rings are generally used for searching behind the scenes as a way to connect the various terms for a concept. A synonym ring can be used to say, &amp;#8220;when someone searches for 'Si,' give them all documents containing either 'Si' or 'Silicon.'&amp;#8221; However, what happens when you want to display one of these terms in your navigation? Then you will need to pick one to be your preferred term. Now, you have an authority file. In each of the above examples, different terms may be used, but each one represents the same concept. They are tied together and given meaning by making their equivalence relationship explicit.
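
A synonym ring used behind the scenes for search might be sketched roughly as follows. This is an illustration under stated assumptions, not a description of any particular search engine: the rings, the three toy documents, and the function names (expand, search) are all hypothetical, and matching is done on whole lowercase words with no handling of punctuation.

```python
# Hypothetical synonym rings: all terms inside one ring are treated as equal.
RINGS = [{"si", "silicon"}, {"fe", "iron"}]

# Hypothetical toy collection, keyed by document id.
DOCS = {
    1: "Properties of silicon wafers",
    2: "Si doping techniques",
    3: "Iron oxide pigments",
}


def expand(query):
    """Return the query term together with every other term in its ring."""
    term = query.lower()
    for ring in RINGS:
        if term in ring:
            return ring
    return {term}


def search(query):
    """Return ids of documents that contain any term equivalent to the query."""
    terms = expand(query)
    return sorted(
        doc_id
        for doc_id, text in DOCS.items()
        if terms.intersection(text.lower().split())
    )


if __name__ == "__main__":
    print(search("Si"))
```

With this data, a search for "Si" returns documents 1 and 2, even though document 1 never uses the symbol. Note that the ring has no preferred term; choosing one for display is what would turn it into an authority file.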


&lt;span class="subhead"&gt;Hierarchical relationships: broader and narrower terms&lt;/span&gt;
If your content is more complex, for instance if you sold only pants and you had hundreds of types, you might require more from your controlled vocabulary.  &lt;fig image="http://www.boxesandarrows.com/archives/images/121602_CV/cv_2.jpg" width=161 height=271 alt="Jumble of terms" align="left" hspace="5" caption="Figure 2: Terms related to &amp;#8220;pants.&amp;#8221;" /&gt;The natural language we use to describe the concept of &amp;#8220;pants&amp;#8221; quickly expands as &amp;#8220;pants&amp;#8221; becomes more specific. In other words, &amp;#8220;slacks,&amp;#8221; &amp;#8220;khakis,&amp;#8221; &amp;#8220;jeans,&amp;#8221; &amp;#8220;trousers,&amp;#8221; &amp;#8220;corduroys,&amp;#8221; and other kinds of pants will all need to be differentiated so users don't have to rummage through pages and pages of search results for the word &amp;#8220;pants,&amp;#8221; when pants are your whole inventory. What will help is a systematic way to map out the different terms so people quickly find the specific kinds of pants they are interested in. What you need is a hierarchy showing the broader terms (BTs), the narrower terms (NTs), and the variant terms (most often displayed as &amp;#8220;USE&amp;#8221; and &amp;#8220;UF&amp;#8221; for Used For). These will show which terms are subsets of larger, broader concepts. You are starting off with a jumble of words that are all related to &amp;#8220;pants&amp;#8221; in some way. It might look something like Figure 2.

We have a bucket we can call &amp;#8220;Pants&amp;#8221; and inside are a lot of terms with a relationship to the concept of pants. In this example, &amp;#8220;pants&amp;#8221; is the broader term, and the kinds of pants refer to subsets of the whole universe of pants. In a controlled vocabulary, we might reconfigure the chart above to look like this:

 &lt;img src="/files/banda/what_is_a_controlled_vocabulary_/cv_3.gif" width=592 height=360 alt="Taxonomy of the concept of Pants"&gt;        
        
This is what people are increasingly calling a taxonomy. This term makes traditional librarians a little uncomfortable, but we are learning to live with it. Originally it was a term for biological classifications (genus, species, etc.), but it has quickly become a standard word for describing hierarchies. 

The standard CV notations used to express hierarchical relationships are NT (narrower term) and BT (broader term). Using this notation, the term &amp;#8220;Women's Pants&amp;#8221; would be expressed like this:

Women's Pants
&amp;nbsp;&amp;nbsp;BT Pants
&amp;nbsp;&amp;nbsp;NT Casual Pants
&amp;nbsp;&amp;nbsp;NT Dress Pants
&amp;nbsp;&amp;nbsp;NT Sports Pants
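
The Women's Pants entry above can be sketched in Python by storing only the broader-term links and deriving everything else from them. The table below and the function names (narrower_terms, breadcrumb) are our own assumptions for illustration, and the sketch assumes each term has exactly one broader term.

```python
# Hypothetical BT table: each term points to its broader term (None at the top).
BT = {
    "Pants": None,
    "Women's Pants": "Pants",
    "Casual Pants": "Women's Pants",
    "Dress Pants": "Women's Pants",
    "Sports Pants": "Women's Pants",
}


def narrower_terms(term):
    """NT is the inverse of BT: every term whose broader term is this one."""
    return sorted(t for t, broader in BT.items() if broader == term)


def breadcrumb(term):
    """Follow BT links to the top and return the path from the top down."""
    path = []
    while term is not None:
        path.append(term)
        term = BT[term]  # raises KeyError for a term missing from the table
    return list(reversed(path))


if __name__ == "__main__":
    print(narrower_terms("Women's Pants"))
    print(" / ".join(breadcrumb("Dress Pants")))
```

The breadcrumb function shows one way a hierarchy can feed navigation: for "Dress Pants" it yields the path Pants, Women's Pants, Dress Pants.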

There is a lot you can do with this hierarchical arrangement. It can help you formulate your homepage navigation. It could improve your searching and browsing. It can help users broaden and narrow their search results quickly by showing them where each set of results fits into the site's hierarchy (see Keith Instone's &amp;#8220;&lt;a href="http://keith.instone.org/breadcrumbs/"&gt;attribute breadcrumbs&lt;/a&gt;&amp;#8221; for more examples). Generally, few sites need to go beyond the level of a taxonomy, but it might be useful to see the next level of complexity in controlled vocabularies.


&lt;span class="subhead"&gt;Associative relationships: related terms&lt;/span&gt;
How far can I extend the pants example? Oh, quite far. Let's say that you are a research institute that studies pants (ridiculous, I know, but stay with me). You study not only pants themselves, but also the materials they are made from, their history, how they are manufactured, and more. Your institute might do well to take the time to develop what Peter Morville has called the &amp;#8220;&lt;a href="http://www.asis.org/Conferences/Summit2001/preconference.html"&gt;Rolls Royce of controlled vocabularies&lt;/a&gt;&amp;#8221;: a thesaurus. A thesaurus shows all of the relationships described so far (BT, NT, and UF), but will also include related terms (RT). This is an associative relationship. It shows how one term is associated with another.

If a user looked to your institute for research on jeans, you would be able to give them that term embedded in a rich series of relationships. An example of the range of relationships would be expressed like this using the standard format for thesauri:  

Jeans
&amp;nbsp;&amp;nbsp;BT Pants
&amp;nbsp;&amp;nbsp;NT Levis
&amp;nbsp;&amp;nbsp;NT Wranglers
&amp;nbsp;&amp;nbsp;UF Dungarees
&amp;nbsp;&amp;nbsp;UF Waist Overalls
&amp;nbsp;&amp;nbsp;RT Denim
&amp;nbsp;&amp;nbsp;RT Overalls

Denim is related to Jeans, but not hierarchically. It is not a type of jeans, nor is one a subset of the other. Yet someone interested in one term might be interested in the other because they are related concepts. In the interface, you might identify &amp;#8220;Denim&amp;#8221; as a &amp;#8220;see also&amp;#8221; option for &amp;#8220;Jeans.&amp;#8221; If users looked for the term &amp;#8220;Denim&amp;#8221; in the thesaurus they might see something like this:

Denim
&amp;nbsp;&amp;nbsp;BT Fabrics
&amp;nbsp;&amp;nbsp;NT Ring Spun
&amp;nbsp;&amp;nbsp;NT Dark Indigo
&amp;nbsp;&amp;nbsp;NT Stonewash
&amp;nbsp;&amp;nbsp;RT Jeans
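
The Jeans and Denim entries above can be transcribed into a small data structure to show how the related-term links might drive a "see also" feature. This is only a sketch: the dictionary layout and the function names (see_also, missing_reciprocals) are hypothetical, and the reciprocity check reflects the common thesaurus convention that RT links run in both directions.

```python
# Hypothetical thesaurus records, transcribed from the Jeans and Denim entries.
THESAURUS = {
    "Jeans": {
        "BT": ["Pants"],
        "NT": ["Levis", "Wranglers"],
        "UF": ["Dungarees", "Waist Overalls"],
        "RT": ["Denim", "Overalls"],
    },
    "Denim": {
        "BT": ["Fabrics"],
        "NT": ["Ring Spun", "Dark Indigo", "Stonewash"],
        "UF": [],
        "RT": ["Jeans"],
    },
}


def see_also(term):
    """Return the related terms (RT) to offer as 'see also' links."""
    return THESAURUS.get(term, {}).get("RT", [])


def missing_reciprocals():
    """Find RT links whose target has a record but does not link back."""
    gaps = []
    for term, record in THESAURUS.items():
        for related in record["RT"]:
            other = THESAURUS.get(related)
            if other is not None and term not in other["RT"]:
                gaps.append((term, related))
    return gaps


if __name__ == "__main__":
    print(see_also("Jeans"))
    print(missing_reciprocals())
```

Here the check finds no gaps, because Jeans and Denim point at each other; Overalls is skipped only because it has no record of its own in this toy data.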

The Denim example alone could be filled with many additional terms, and it is easy to see how well this would accommodate user browsing (and &amp;#8220;&lt;a href="http://www.gseis.ucla.edu/faculty/bates/berrypicking.html"&gt;berrypicking&lt;/a&gt;&amp;#8221;). This points to one of the dangers of creating associative relationships: not knowing when to stop. The associative relationship is also the most difficult and subjective of all the relationships in a CV, because you are identifying a connection between two concepts that may not be obvious. When an Amazon product page shows items that others have purchased along with the one being displayed, Amazon is identifying a potentially useful associative relationship. 

To push the concept a little further, if a user is interested in a paper from your pants institute on the &amp;#8220;Hemingway wore khakis&amp;#8221; advertisement from Gap, they might also be interested in a paper you have on how it was really Rock Hudson's subtle use of khakis that made &amp;#8220;A Farewell to Arms&amp;#8221; such a great movie. The connection between the two documents, the intersection of the concepts of &amp;#8220;Hemingway&amp;#8221; and &amp;#8220;khakis,&amp;#8221; is less direct than the Denim example above. This is expanding the concept of &amp;#8220;related terms&amp;#8221; farther than many would be prepared to go, but it is an option. 


&lt;span class="subhead"&gt;Internal uses of controlled vocabularies&lt;/span&gt;
So far, we have focused on how controlled vocabularies help the user, but there are also benefits to the organization using the CV. Here are a few:&lt;ul&gt;&lt;li&gt;CVs can help with category analysis or keeping your categories distinct.&lt;/li&gt;&lt;li&gt;CVs can help establish a site's navigation.&lt;/li&gt;&lt;li&gt;CVs can be the basis for personalization features.&lt;/li&gt;&lt;li&gt;CVs can help with preparation for CMS or knowledge management projects, since many of these require this sort of structure to your content to do their magic.&lt;/li&gt;&lt;li&gt;CVs get the organization using the same language as the users (which should result in better communication with them).&lt;/li&gt;&lt;li&gt;CVs can help the organization (and the user) understand what concepts your site covers. Your controlled vocabulary is in reality a &amp;#8220;concept map&amp;#8221; of what is on your site.&lt;/li&gt;&lt;/ul&gt;

While controlled vocabularies can be powerful, by themselves they are not the magic pill that will cure what ails your site. CVs are a lot of work, they are often difficult and time consuming to maintain, and they can be very political. Some skepticism toward all metadata is a healthy thing (everyone still reading this should see &lt;a href="http://www.well.com/~doctorow/metacrap.htm"&gt;Metacrap&lt;/a&gt;). As with anything important, there are a lot of people who are doing it loudly and badly.  


&lt;span class="subhead"&gt;Conclusion&lt;/span&gt;
Human beings are natural makers of patterns. That is how we understand what our senses are taking in. When people visit your site, they will immediately begin trying to understand what they see. A well-designed and regularly updated controlled vocabulary can help connect the concepts your users have in their heads to the concepts you present on your site. That is when real communication will occur. 


Next in the series: &lt;a href="http://www.boxesandarrows.com/archives/creating_a_controlled_vocabulary.php"&gt;How to create a controlled vocabulary&lt;/a&gt;.

&lt;end&gt;&lt;/end&gt;

&lt;morebox&gt;&lt;ul&gt;&lt;li&gt;Wellisch, Hans. &lt;a href="http://www.amazon.com/exec/obidos/tg/detail/-/082420882X/ref=nosim/boxesandarrows-20"&gt;Indexing from A to Z&lt;/a&gt;. New York: H.W. Wilson, 1995. p.214&lt;/li&gt;&lt;li&gt;Amy J. Warner &lt;a href="http://www.lexonomy.com/publications/aTaxonomyPrimer.html"&gt;Taxonomy Primer&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.BBC.com"&gt;http://www.BBC.com&lt;/a&gt;&lt;/li&gt;&lt;li&gt;Keith Instone's &lt;a href="http://keith.instone.org/breadcrumbs/"&gt;attribute breadcrumbs&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.asis.org/Conferences/Summit2001/preconference.html"&gt;ASIS Summit 2001&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.gseis.ucla.edu/faculty/bates/berrypicking.html"&gt;Berrypicking&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.well.com/~doctorow/metacrap.htm"&gt;Metacrap&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="/files/banda/Bibliography.htm"&gt;An Annotated Bibliography&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/morebox&gt;&lt;biobox&gt;&lt;a href="http://www.boxesandarrows.com/people/archives/karl_fast.php"&gt;Karl Fast&lt;/a&gt; is a PhD student in library and information science at the University of Western Ontario. He also has a master's in LIS. His graduate work has included courses on organization of information, subject analysis, thesaurus construction, and facet analysis.

&lt;a href="http://www.boxesandarrows.com/people/archives/fred_leise.php"&gt;Fred Leise,&lt;/a&gt; president of &lt;a href="http://www.contextualanalysis.com"&gt;ContextualAnalysis, LLC,&lt;/a&gt; is an information architecture consultant providing services in the areas of content analysis and organization, user experience design, taxonomy and thesaurus creation, and website and back-of-book indexing.

&lt;a href="http://www.boxesandarrows.com/people/archives/mike_steckel.php"&gt;Mike Steckel&lt;/a&gt; is an Information Architect/Technical Librarian for International SEMATECH in Austin, TX. &lt;/biobox&gt;</description>
      <pubDate>Mon, 16 Dec 2002 23:29:17 GMT</pubDate>
      <author>Fred Leise, Karl Fast, Mike Steckel</author>
      <category>Methods</category>
    </item>
    <item>
      <title>Creating a Controlled Vocabulary</title>
      <link>http://www.boxesandarrows.com/view/creating_a_controlled_vocabulary</link>
      <guid>http://www.boxesandarrows.com/view/creating_a_controlled_vocabulary</guid>
      <description>&lt;pullquote&gt;&amp;#8220;Creating a clear plan early on can save you a lot of trouble down the road and minimize unwelcome surprises. The broad strokes of CV design are like any other type of design: planning and preparation are essential, fundamental steps in producing a good design.&amp;#8221;&lt;/pullquote&gt;You have probably heard IAs discussing the benefits of their latest taxonomy project and how you should be implementing one. But &lt;i&gt;how&lt;/i&gt;, you might wonder, can you get started? 

This article describes a process for building your own controlled vocabulary (CV). A &lt;a href="http://www.boxesandarrows.com/archives/what_is_a_controlled_vocabulary.php"&gt;previous article&lt;/a&gt; discussed the concept of a CV&amp;#8212;the &amp;#8220;what.&amp;#8221; This article focuses on the &amp;#8220;how.&amp;#8221;

In this article we are looking at a process for creating any kind of controlled vocabulary. While our ultimate goal in this series is to &lt;a href="http://www.boxesandarrows.com/archives/all_about_facets_controlled_vocabularies.php"&gt;explain facets&lt;/a&gt;, the details of facet analysis will be described in a future article. At this point, we are still exploring fundamental concepts and techniques.

There are many ways to create a controlled vocabulary. What follows is just one methodology. Also, keep in mind that many of the steps described here are not discrete units. When you actually create a CV, some steps may overlap.

Now, let's get started. Imagine we are a company that sells camping gear, and we want to create a controlled vocabulary for our ecommerce site. 
 

&lt;span class="subhead"&gt;1. Develop a strategy. What do you want your controlled vocabulary to do?&lt;/span&gt;
The natural inclination when developing a CV is to start by gathering potential terms. But first, you need to consider a wide range of questions. Creating a clear plan early on can save you a lot of trouble down the road and minimize unwelcome surprises. The broad strokes of CV design are like any other type of design: planning and preparation are essential, fundamental steps in producing a good design.

First, what kind of CV do you need? The answer depends on a variety of issues. Start by thinking about some general questions such as these:
&lt;ul&gt;&lt;li&gt;What do you want your CV to accomplish?&lt;/li&gt;&lt;li&gt;Do you want the CV to integrate with your navigation system?&lt;/li&gt;&lt;li&gt;Are you planning on using the CV to improve searching? To improve browsing? Both?&lt;/li&gt;&lt;li&gt;Are you planning to show term relationships in your search results?&lt;/li&gt;&lt;li&gt;How much vocabulary control do you want to provide? Synonym ring? Facets? What level of vocabulary control is appropriate?&lt;/li&gt;&lt;/ul&gt;
Second, think about your dependencies:
&lt;ul&gt;&lt;li&gt;&lt;b&gt;Content&lt;/b&gt; - Consider this in two parts: specificity and stability. 

Specificity: If you are selling camping gear, are you selling 7-10 styles of tent or 100 styles? If you are selling 100 styles, you will need terms that are more specific and more exhaustive. This is because you will need to further differentiate among tents that are similar. The more items that are similar, the more specific you need to be.

Stability: Do the concepts and names for them change often? Do people generally call the same concept (or item or product) by the same name? In our example, we would ask if there are a lot of variant terms for the kinds of items we're selling. What will be your method for keeping up with changing terminology?&lt;/li&gt;&lt;li&gt;&lt;b&gt;Technology&lt;/b&gt; - There are two pieces to this one: tools and integration. Each will help you think about implementation early on.

Tools: Think about where the CV will ultimately sit. Do you have a CMS that will be involved? Will you be uploading your CV into a search engine? What software will you use to hold your terms: a thesaurus maintenance program like &lt;a href="http://www.multites.com/"&gt;Multites&lt;/a&gt;, &lt;a href="http://www.termtree.com.au/"&gt;Term Tree&lt;/a&gt;, or &lt;a href="http://www.lexico.com/"&gt;Lexico&lt;/a&gt;? Or will you be creating it in Excel? Also consider tools you might use while gathering your terms. Many people collect their terms in a large Excel spreadsheet and others use Post-it Notes; sometimes even a wiki works nicely.

Integration: How will your CV be integrated with the other pieces of your system? If the CV is going to be used in multiple applications, you need to consider the requirements of each. Be sure you talk to someone in IT and outline what your goals are.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Users&lt;/b&gt;  - CV design is a user-centered process. You must &lt;i&gt;understand&lt;/i&gt; the target audience before setting down your terms. Who is the target audience for the site? The general public? Experienced campers? Are they web-savvy? How do they shop? Do they tend to buy one item at a time or several items at once?  Do they need to do a lot of research before they buy? In other words, good, standard user-centered design methods, such as interviews and observation, are appropriate. &lt;/li&gt;&lt;li&gt;&lt;b&gt;Maintenance&lt;/b&gt; - Who from the organization will maintain your controlled vocabulary? What amount of time can they spend on this task? What is their training? If you decide to create a highly complex controlled vocabulary that your high school intern is going to maintain, you will have to provide additional training for that person.  This is also a user-centered design issue, but along a different axis. Above we talked about a process that is extroverted: it looks towards the external users of the system. Here our axis is introverted: it looks towards the internal users of the system, the creators and maintainers of the vocabulary.&lt;/li&gt;&lt;/ul&gt;At this point, any normal person will say to himself, &amp;#8220;Geez! Enough with the questions! Let me get on with creating my controlled vocabulary!&amp;#8221; Resist this urge and stick with the discovery process; developing a strategy is important. You will probably change some of your answers as the project develops, but considering these questions up front will prevent you from wasting time later on. 


&lt;span class="subhead"&gt;2. Start gathering terms. What are the terms used to describe your content?&lt;/span&gt;
 Now you are ready to start gathering your terms. Your goal here, considering the constraints and strategies that came out of Step 1, is to identify the terms that will bring the most success to your user population, enabling them to find exactly the information they need.

This is where the process becomes a little bit like &amp;#8220;The Newlywed Game.&amp;#8221; In this TV game show, the contestants are newly married couples. While one half of the couple is in a soundproof room, the host asks the remaining partner some intimate questions (often about &amp;#8220;making whoopee&amp;#8221;). Later, they reunite the couple and ask the other partner the same questions to see how well their answers match up. The couple with the most matched answers wins the big prize. The underlying questions for this game include &amp;#8220;How well do the two sides of this relationship know each other?&amp;#8221; and &amp;#8220;How well can one half of the couple guess the answer the other half will give?&amp;#8221;

To win the big prize of increased content &lt;a href="http://www.boxesandarrows.com//archives/the_age_of_findability.php"&gt;findability&lt;/a&gt;, your site must describe your content in the terms that best match those terms the users are &lt;i&gt;likely&lt;/i&gt; to use. When your partner (the user) comes out of that soundproof booth, you want to feel confident that you have provided the terms he will use on your site. 

There are lots of great ways to get started with this process.

&lt;b&gt;A. Look inward.&lt;/b&gt; What are the terms you already use to describe items on your site? If you are selling something, what are you selling? Look at each item and start generating terms to describe the object. What are the concepts the terms cover? List them. If we were doing a thesaurus for camping gear, we might start with something like: backpacks, tents, bug spray, etc. Then consider alternative terms you might use for each item.

Consider the level of granularity you want to use to reach your target audience or need to use based on the number of similar items you sell. If the target audience for your camping gear CV is beginning campers, you might distinguish thick sleeping bags from your thinner options by making a distinction by season (as in "winter" and "summer" bags). However, if you are targeting expert campers, you may need to describe your bags as "2-season" or "3-season" bags, in terms of insulating material (goose down, Polarguard, PrimaLoft), or by the temperature ratings. You don't need to describe the entire field of camping gear; you need only describe your content in terms that will resonate with your target audience.

There is a danger here, however. Don't look inward and exclude the additional options for gathering terms described below. It is important to get outside of your own understanding of terms and their concepts. Be sure to follow the next steps as well.
 
&lt;b&gt;B. Look outward.&lt;/b&gt; Where are people using terms related to your content?  You might review competitors' sites, journals or magazines on your subject matter, or discussions by subject experts on the web. For example, if you are looking for terms about camping gear, you might look here:
 
&lt;a href="http://directory.google.com/Top/Shopping/Recreation/Outdoors/?il=1"&gt;http://directory.google.com/Top/Shopping/Recreation/Outdoors/?il=1&lt;/a&gt;

Look at the sites on the list and note how they describe items that you also sell. Are there relevant variant terms you didn't include from the looking inward step?

Consider the differences between &lt;a href="http://www.rei.com"&gt;REI&lt;/a&gt; and &lt;a href="http://www.mec.ca"&gt;MEC&lt;/a&gt; (Mountain Equipment Co-op, a Canadian outdoor equipment store). Note the differences and similarities between the terms they use. In this example, we have shown only the terms for their top-level categories; you should dig deeper and find out what terms they use for sub-categories and individual items.

&lt;table cellpadding="5" align="left"&gt;&lt;tr&gt;&lt;td valign=top&gt;&lt;img alt="mec-categories.jpg" src="/files/banda/creating_a_controlled_vocabulary/mec-categories.jpg" width="164" height="351" border="0" hspace="5"/&gt;&lt;/td&gt;&lt;td valign=top&gt;&lt;img alt="rei-categories.jpg" src="/files/banda/creating_a_controlled_vocabulary/rei-categories.jpg" width="145" height="462" border="0" hspace="5" /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;

Sometimes, someone may already have developed a similar controlled vocabulary that you can use or modify. When this happens, we recommend that you perform an exuberant dance of joy. This won't work for our camping gear example, but if you are building a large controlled vocabulary on another topic, you might want to see if you could borrow from one of the controlled vocabularies here:

&#8212; &lt;a href="http://www.asindexing.org/site/refbooks.shtml"&gt;American Society of Indexers&lt;/a&gt;
&#8212; &lt;a href="http://www.willpower.demon.co.uk/thesbibl.htm"&gt;Publications on thesaurus construction and use&lt;/a&gt;

More than likely, you will need to simplify anything you use from one of these lists, but they might be worth reviewing. Often, just the exercise of reviewing other CVs can be helpful in discovering ways to improve your own.

But be careful. Borrowing terms from other sites can muddy your own particular site's strategy. Don't borrow so much that your message gets confused or loses distinction.
 
&lt;b&gt;C. Log files.&lt;/b&gt; If you already offer search, an easy option is to review your log files. Log files are goldmines of valuable customer information. They will give you an idea of what people think they might find on your site, as well as the words they use to describe what they are looking for. If you can get the file to display search results (as in 8 hits, 0 hits, etc.), you can see how successful people are. Or, reproduce the searches yourself to determine if people are getting relevant hits. See how &lt;a href="http://www.fastcompany.com/online/43/kozuh.html"&gt;Nordstrom's benefited from this technique&lt;/a&gt;.
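
As a rough illustration of this kind of log review, the sketch below tallies search queries and flags the ones that returned no hits. The log format (a query, a tab, then a hit count on each line) and the sample entries are assumptions made for the example; real search logs will look different.

```python
from collections import Counter

# Hypothetical log: each line holds a query, a tab, then the hit count.
SAMPLE_LOG = """\
sleeping bag\t8
Sleeping Bag\t8
mummy bag\t0
bug spray\t3
mummy bag\t0
insect repellent\t0
"""


def summarize_searches(log_text):
    """Return (query counts, zero-hit query counts) from tab-separated log text."""
    all_queries = Counter()
    zero_hit_queries = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue
        query, hits = line.rsplit("\t", 1)
        query = query.strip().lower()  # count "Sleeping Bag" and "sleeping bag" together
        all_queries[query] += 1
        if int(hits) == 0:
            zero_hit_queries[query] += 1
    return all_queries, zero_hit_queries


all_queries, zero_hit_queries = summarize_searches(SAMPLE_LOG)
print(all_queries.most_common(3))      # the most frequent queries
print(zero_hit_queries.most_common())  # terms people use that find nothing
```

The zero-hit list is often the most interesting part: those are words your users already use that your site does not yet recognize.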

&lt;b&gt;D. Ask people.&lt;/b&gt; Is there a way to ask users what they look for on your site? How would they describe your site's contents? 

Throughout Step 2, you are building into your CV what librarians call &amp;#8220;user warrant.&amp;#8221; This means that a term &amp;#8220;is justified for inclusion in an index (or CV) only if it is of interest to the users of the information service.&amp;#8221; (Lancaster, 26). Your CV will have high user warrant if the terms you include are real terms that people use to describe your content. If you include a lot of terms you suspect people might use, but that did not actually show up during your research, you will lower the user warrant. You are taking a risk: You may be unnecessarily muddying your CV.

At the end of this process you should have a large number of terms describing your site's content. 


&lt;span class="subhead"&gt;3. Establish preferred terms, variants and hierarchies. How do the pieces fit together?&lt;/span&gt;
 After Step 2, we are left with what is essentially a big bucket of unrelated terms. Now we start to put like terms together and identify each one's relationships. For each term, ask what is the broader (more general) term? What are the narrower (more specific) terms? If you are using terms to establish a navigation system, is this a preferred term or a variant? Your controlled vocabulary will start to come together as context is added to each term.

Using our camping gear example, a traditional CV notation for the terms we have collected about sleeping bags might look like this:

Sleeping Bags
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;BT Camping Equipment
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;NT Down Sleeping Bags
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;NT Synthetic Sleeping Bags
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;NT Family Sleeping Bags
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;NT Cold Weather Sleeping Bags
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;NT 2-Season Sleeping Bags
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;NT 3-Season Sleeping Bags
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;NT Ultralight Sleeping Bags

(BT = broader term; NT = narrower term)
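
If you want to hold this notation in software rather than on paper, one possible representation is sketched below. The Term class and the helper functions are hypothetical names invented for this illustration, not part of any thesaurus tool.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One CV term with links to its broader and narrower terms."""
    name: str
    broader: list = field(default_factory=list)   # BT
    narrower: list = field(default_factory=list)  # NT


def add_narrower(vocabulary, parent_name, child_name):
    """Record the child as an NT of the parent and the parent as a BT of the child."""
    parent = vocabulary.setdefault(parent_name, Term(parent_name))
    child = vocabulary.setdefault(child_name, Term(child_name))
    parent.narrower.append(child_name)
    child.broader.append(parent_name)


def print_tree(vocabulary, name, depth=0):
    """Print a term and its narrower terms as an indented outline."""
    print("    " * depth + name)
    for child_name in vocabulary[name].narrower:
        print_tree(vocabulary, child_name, depth + 1)


vocabulary = {}
add_narrower(vocabulary, "Camping Equipment", "Sleeping Bags")
for name in ["Down Sleeping Bags", "Synthetic Sleeping Bags", "Family Sleeping Bags",
             "Cold Weather Sleeping Bags", "Ultralight Sleeping Bags"]:
    add_narrower(vocabulary, "Sleeping Bags", name)
add_narrower(vocabulary, "Cold Weather Sleeping Bags", "2-Season Sleeping Bags")
add_narrower(vocabulary, "Cold Weather Sleeping Bags", "3-Season Sleeping Bags")

print_tree(vocabulary, "Camping Equipment")
```

Keeping the broader terms in a list rather than a single field is a deliberate choice in this sketch: it leaves room for a term to sit under more than one parent if you decide your CV should allow that.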

Some in your group might say, &amp;#8220;Hey, sleeping bags should go under Backpacking Equipment, not Camping Equipment.&amp;#8221; A perfectly good assertion. Somehow, you will need to decide this issue. Can &amp;#8220;Sleeping Bags&amp;#8221; be in both places? Should the term live in one place in the CV with a cross-reference from the other location? Maybe there is a distinction among different kinds of sleeping bag that you had not previously considered.

It might be a good time to do some research. For instance, ask yourself, &amp;#8220;How do REI and MEC describe their sleeping bags?&amp;#8221;

&lt;table cellpadding=5&gt;&lt;tr&gt;&lt;td valign=top class="articlebody"&gt;MEC does it like this:
&lt;a href="http://www.boxesandarrows.com/archives/images/040703_CV/mec-sleeping-bags.php" onclick="window.open('http://www.boxesandarrows.com/archives/images/040703_CV/mec-sleeping-bags.php', 'popup', 'width=758,height=490,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"&gt;&lt;img src="/files/banda/creating_a_controlled_vocabulary/mec-sleeping-bags-thumb.jpg" width="200" height="129" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;span class="caption"&gt;Click to enlarge&lt;/span&gt;&lt;/td&gt;&lt;td valign=top class="articlebody"&gt;REI takes a completely different approach:
&lt;a href="http://www.boxesandarrows.com/archives/images/040703_CV/rei-sleeping-bags.php" onclick="window.open('http://www.boxesandarrows.com/archives/images/040703_CV/rei-sleeping-bags.php', 'popup', 'width=592,height=424,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"&gt;&lt;img src="/files/banda/creating_a_controlled_vocabulary/rei-sleeping-bags-thumb.gif" width="200" height="143" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;span class="caption"&gt;Click to enlarge&lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;


The differences are striking. The main ones include the following:
&lt;ul&gt;&lt;li&gt;&lt;b&gt;Depth:&lt;/b&gt; The most obvious distinction is how REI goes for increased depth, whereas MEC uses a shallower category set.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Term Choice:&lt;/b&gt; REI uses the general term &amp;#8220;Sleeping Gear,&amp;#8221; whereas MEC uses &amp;#8220;Sleeping Bags.&amp;#8221; What's interesting is that both sites classify terms for related materials&amp;#8212;pillows, stuff sacks, and so on&amp;#8212;as narrower terms, yet only REI uses the more generic term &amp;#8220;Sleeping Gear&amp;#8221; to describe this breadth.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Broader Terms:&lt;/b&gt; REI has &amp;#8220;Sleeping Gear&amp;#8221; as a narrower term under the top-level term &amp;#8220;Camp/Hike.&amp;#8221; MEC also has a similar top-level term&amp;#8212;&amp;#8220;Hiking/Camping Gear&amp;#8221;&amp;#8212;but instead of making &amp;#8220;Sleeping Bags&amp;#8221; a narrower term they put it at the same level.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Bags and Pads:&lt;/b&gt; MEC puts sleeping pads as a narrower term under sleeping bags. REI doesn't put them below sleeping bags, but at the same level in the hierarchy.&lt;/li&gt;&lt;/ul&gt;
Which is better? That's difficult to say. REI is more sophisticated in their categorization, probably because of their larger product line. While REI's scheme is more sophisticated, it's also more complicated, so perhaps the simplicity of the MEC approach is better. Most likely, these differences are the result of differing strategies.

Our intention here is not to suggest which is better, only to show how even a simple situation can give many alternative answers. Certainly one can find much to like about these schemes. But in each case, improvements can be made. They are muddled. Concepts are mixed and matched haphazardly. There are questions about scalability and future directions. Material, temperature, gender, and age are combined in surprising and inconsistent ways. For example, why does MEC put sleeping pads as a narrower term of sleeping bag when they are obviously related, yet distinct items? And we're still confused about the distinction between these two terms in the REI scheme: &amp;#8220;Kids' Camping Bags&amp;#8221; and &amp;#8220;Kids Backpacking Bags.&amp;#8221;

We will return to this example in a future article showing how facets can clarify this situation. But let's not get too far ahead of ourselves.

For now, the question is: How do you clarify these issues? How do you make these difficult decisions? Making these decisions can quickly get messy in a group environment. Perhaps you need to ask a smaller team to consider the question and report back to the larger group. Doing some analysis, as we did with MEC and REI, and looking at your own strategy should help clarify what it is you want to do. However you decide your questions, be sure to note why you made the decision you did (for more on this, see Step 5).

We have been arguing that a good CV design process is essentially a user-centered process. Getting feedback from users will give you a great deal of insight into the problems we have raised.

A simple and commonly used method of getting feedback is called card sorting. Find some people whom you consider to be your target users. Give them cards with examples of items for sale on your site and ask them to arrange them into groups of like objects, or objects that they believe should be together. Then ask them to label their groups of cards. Look for patterns among their responses, compare the results to your original content labels, and make any necessary adjustments. For some good additional materials on card sorting, &lt;a href="http://www.iawiki.net/CardSorting"&gt;see the IA Wiki&lt;/a&gt;. Yes, it really is that simple and effective.
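
One way to look for patterns in card sort results is to count how often participants put the same two cards in the same group. The sketch below does this for a small set of made-up results; the participants, group labels, and cards are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical card sort results: each participant's labeled groups of cards.
card_sorts = [
    {"Sleeping": ["down bag", "sleeping pad", "pillow"], "Packs": ["daypack", "stuff sack"]},
    {"Bags": ["down bag", "stuff sack", "daypack"], "Comfort": ["sleeping pad", "pillow"]},
    {"Sleep gear": ["down bag", "sleeping pad", "pillow", "stuff sack"], "Packs": ["daypack"]},
]


def count_pairs(sorts):
    """Count how many participants placed each pair of cards in the same group."""
    pair_counts = Counter()
    for participant in sorts:
        for cards in participant.values():
            # Sort the cards so each pair is always stored in the same order.
            for pair in combinations(sorted(cards), 2):
                pair_counts[pair] += 1
    return pair_counts


pair_counts = count_pairs(card_sorts)
for (first, second), count in pair_counts.most_common(5):
    print(f"{first} + {second}: {count} of {len(card_sorts)} participants")
```

Pairs that nearly everyone groups together are good candidates for sharing a category, and the labels participants wrote on those groups are candidate terms.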


&lt;span class="subhead"&gt;4. Identify the &amp;#8220;see also&amp;#8221; terms. What else might be interesting to your target audience?&lt;/span&gt;
 In most cases, related terms need to be identified only for large projects. If you are working on an ecommerce site, here is a way to connect related products that people might buy at the same time. In other words, you need to identify places where interest in one item might lead to interest in another. If your site users are buying camping boots, do they need socks? If they are buying backpacks, would they be interested in water bottles? Often, these are what the &lt;a href="http://www.amazon.com/exec/obidos/tg/detail/-/0596000359/ref%3Dnosim/boxesandarrows-20"&gt;Polar Bear book&lt;/a&gt; calls &amp;#8220;contextual navigation&amp;#8221; (116-118).

To get you started here, think about these possible relationships when considering related terms: 
&lt;ul&gt;&lt;li&gt;process/agent (camp fires/matches); &lt;/li&gt;&lt;li&gt;action/product of action (baking/cakes); &lt;/li&gt;&lt;li&gt;agent/counteragent (allergies/antihistamine); &lt;/li&gt;&lt;li&gt;raw material/product (wool/sweater).&lt;/li&gt;&lt;/ul&gt;
Putting this idea of cross-selling into traditional CV notation might look something like this:

Sleeping Bags
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;BT Camping Equipment
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;NT Down Sleeping Bags
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;NT Synthetic Sleeping Bags
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;NT Family Sleeping Bags
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;NT Cold Weather Sleeping Bags  
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;NT 2-Season Sleeping Bags
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;NT 3-Season Sleeping Bags
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;NT Backpacking Sleeping Bags
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;NT Expedition Class Sleeping Bags
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;NT Ultralight Sleeping Bags
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;RT Backpacks  
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;RT Ultralight Backpacking
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;RT Sleeping Bag Liners
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;RT Sleeping Pads
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;RT Stuff Sacks
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;RT Pillows

(RT = related term)

What constitutes a related term? That is something for you to decide. Try to strike the right balance between suggesting options and overwhelming a user with choices. You might want to run the card sorting exercise again, this time giving people a list of items on cards and asking, for each item, whether there are any objects from your inventory that they might look for when purchasing it. Adjust your CV accordingly.
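
A minimal sketch of how related terms might drive "see also" suggestions follows. It assumes RT links are reciprocal and caps the number of suggestions shown; the function names and the limit of three are illustrative choices, not recommendations from any standard.

```python
from collections import defaultdict

related = defaultdict(set)  # term name mapped to its set of related terms (RT)


def add_related(term, other):
    """Record an RT link in both directions (this sketch treats RT as reciprocal)."""
    related[term].add(other)
    related[other].add(term)


def see_also(term, limit=3):
    """Return up to `limit` related terms, sorted for a stable display order."""
    return sorted(related.get(term, set()))[:limit]


for other in ["Sleeping Bag Liners", "Sleeping Pads", "Stuff Sacks", "Pillows"]:
    add_related("Sleeping Bags", other)
add_related("Camping Boots", "Socks")

print(see_also("Sleeping Bags"))
print(see_also("Socks"))
```

The limit is there to keep suggestions from overwhelming the user. In practice you would probably rank related terms by something more meaningful than alphabetical order.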


&lt;span class="subhead"&gt;5. Establish a record of the rules you are using if you are creating a large thesaurus. &lt;/span&gt;
I suspect most CV creators do not take the time to do this, and that is unfortunate. Remember all those decisions about what term goes where? Review the decisions you made and record what the decision was and why you made it. This will enable you to maintain consistency as your CV changes and expands. This makes your system easier to learn, and consequently, training your staff is easier. This is especially important for keeping categories pure if multiple people will be adding terms to content. It also makes for better decision-making in the future. 

I am reminded of an &lt;a href="http://slate.msn.com/?id=102916"&gt;interview with cellist Yo-Yo Ma&lt;/a&gt; who told some students, &amp;#8220;If you make specific choices in the music, we hear them.&amp;#8221; He added later in the class, &amp;#8220;If you don't make specific choices, we don't hear them.&amp;#8221; This is as true for the actions of a controlled vocabulary as it is for a piece of music. Be aware of the assumptions you are making and make them conscious choices; users will &amp;#8220;hear&amp;#8221; them.

Some possible questions to consider here are: When do you include a new term? What constitutes a relationship or RT? When do you delete terms? What is the basis for choosing a preferred term?  When are terms singular or plural? Nouns or verbs? How will you deal with punctuation? 

A good source of further issues to consider is the &lt;a href="http://www.niso.org/standards/standard_detail.cfm?std_id=518"&gt;ANSI/NISO standard for thesaurus construction&lt;/a&gt;. Reviewing these guidelines and deciding what is relevant to your particular situation will help ensure the best possible outcome for your CV creation process. Now is also a great time to review the assumptions you made in Step 1.


&lt;span class="subhead"&gt;6. Implement.&lt;/span&gt;
This step is difficult to write about because implementation is extremely dependent on your specific context. The other steps are not easy, but in the real world implementation is often the most difficult. It is also something the literature on CVs rarely tackles in a meaningful way. For now, we will take the metaphorical 50,000-foot view.

If you are using your controlled vocabulary for developing a menu for navigation or categories for browsing, continue your user testing. At this stage you can present a more complete version for users to evaluate. If you have completed some testing earlier, this should involve only minor changes to your CV.

If you are using your controlled vocabulary for searching, get ready for more work: Tweaking the algorithms for a search engine is a difficult job involving lots of tradeoffs. It will also require a good relationship with your IT staff (good thing you started this already in Step 1!). A lot of difficult decisions will need to be made. Examples include how you use punctuation, Boolean operators (when to use AND and when to use OR connectors), and recall versus precision. Multiple word terms can sometimes be difficult (if your CV term is &amp;#8220;walking staff&amp;#8221; and the user enters &amp;#8220;Walking Staff Wood,&amp;#8221; does he get any variant terms for &amp;#8220;walking staff?&amp;#8221;). Your solution will depend on the search engine you are using, the audience, the content, and the tradeoffs you need to make to get your project up and running. 
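
To make the multiple-word problem concrete, here is a small sketch that finds CV terms inside a longer query by matching whole-word phrases, so that "Walking Staff Wood" still maps to "walking staff." The variant table and the matching rule are assumptions for illustration; a real search engine will have its own mechanism for this.

```python
# Hypothetical variant terms mapped to preferred CV terms.
VARIANTS = {
    "walking staff": "walking staff",
    "walking stick": "walking staff",
    "hiking pole": "walking staff",
    "bug spray": "insect repellent",
    "insect repellent": "insect repellent",
}


def preferred_terms_in(query):
    """Return preferred terms whose variants appear as whole-word phrases in the query."""
    words = query.lower().split()
    found = []
    for variant, preferred in VARIANTS.items():
        variant_words = variant.split()
        size = len(variant_words)
        # Slide a window of the variant's length across the query words.
        for start in range(len(words) - size + 1):
            if words[start:start + size] == variant_words and preferred not in found:
                found.append(preferred)
    return found


print(preferred_terms_in("Walking Staff Wood"))
print(preferred_terms_in("wood hiking pole and bug spray"))
```

Matching on whole words favors precision: "staff" alone will not trigger "walking staff." Loosening that rule would raise recall at the cost of more irrelevant hits, which is exactly the kind of tradeoff described above.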


&lt;span class="subhead"&gt;7. Test and evaluate.&lt;/span&gt;
You have done some testing during the CV creation process. Now it is time to make sure the assumptions you have made throughout the process still hold when you consider the implementation as a whole.

Start with yourself. Use the site to find various types of information based on assumptions you made earlier. Can you identify which content goes in which slot pretty easily? Can you search and get the results you expect? If using your CV to improve searching, enter a term and carefully look at the first page of returns. Are these the results you want your users to get for this search term?

After you feel like the CV is working as you believe it should, contact some outsiders and ask them to use your site. Do your terms reflect the concepts these people are searching for? Are they getting the results they expect? Are your terms too broad or too narrow? Remember, you are not always going to be successful. This is another time to keep the 80/20 rule in mind.


&lt;span class="subhead"&gt;8. Go back and refine.  What can be improved?&lt;/span&gt;
A controlled vocabulary is never finished. The goal of the initial creation of your CV is simply to create a system for controlling vocabulary that is agile, easy to update, consistent in both scope (what is covered) and granularity (how deeply it is covered), and helps users find what they are looking for. 

However, maintenance is required to keep your CV viable and usable. Constant monitoring, evaluation, and tweaking are critical. This may require daily reviews of search logs, regular testing with users, regular conversations with subject specialists, or other analysis. One of the arguments against using a controlled vocabulary is that it requires so much time to maintain that it doesn't keep up with the changing terminology of the given field. Therefore, constant analysis is key to success. The list of improvements you can imagine needing to make will always be long, but don't lose sight of the smaller, daily &amp;#8220;housecleaning&amp;#8221; tasks.

There is a lot of talk about how controlled vocabularies improve a site's information architecture. If you decide to create one, however, it is important to realize that an effective controlled vocabulary involves regular maintenance. Doing it right will keep you aware of the dynamic developments of your content and close to the language of your users and their information needs.
&lt;p&gt;&lt;img src="/files/banda/art_end.gif" alt="" title="" width="8" height="8" /&gt;&lt;/p&gt;&lt;end&gt;&lt;/end&gt;
&lt;morebox&gt;&lt;ul&gt;&lt;li&gt;Cooper, Alan (1999). &lt;a href="http://www.amazon.com/exec/obidos/tg/detail/-/0672316498/ref=nosim/boxesandarrows-20"&gt;The Inmates are Running the Asylum: Why High-Tech Products Drive Us Crazy and How to Restore the Sanity&lt;/a&gt;. SAMS publishing: Indianapolis, IN.&lt;/li&gt;&lt;li&gt;Lancaster, F.W. (1986). &lt;a href="http://www.amazon.com/exec/obidos/tg/detail/-/0878150064/ref=nosim/boxesandarrows-20"&gt;Vocabulary Control for Information Retrieval&lt;/a&gt; (2nd Edition). Information Resources Press: Arlington, VA.&lt;/li&gt;&lt;li&gt;Rosenfeld, Louis, &amp; Morville, Peter. (2002). &lt;a href="http://www.amazon.com/exec/obidos/tg/detail/-/0596000359/ref%3Dnosim/boxesandarrows-20"&gt;Information Architecture for the World Wide Web: Designing large scale web sites.&lt;/a&gt; (2nd Edition). O'Reilly &amp; Associates: Sebastopol, CA.&lt;/li&gt;
&lt;li&gt;&lt;a href="/files/banda/Bibliography.htm"&gt;An Annotated Bibliography&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/morebox&gt;&lt;biobox&gt;&lt;a href="http://www.boxesandarrows.com/people/archives/karl_fast.php"&gt;Karl Fast&lt;/a&gt; is a PhD student in library and information science at the University of Western Ontario. He also has a master's in LIS. His graduate work has included courses on organization of information, subject analysis, thesaurus construction, and facet analysis.

&lt;a href="http://www.boxesandarrows.com/people/archives/fred_leise.php"&gt;Fred Leise,&lt;/a&gt; president of &lt;a href="http://www.contextualanalysis.com"&gt;ContextualAnalysis, LLC,&lt;/a&gt; is an information architecture consultant providing services in the areas of content analysis and organization, user experience design, taxonomy and thesaurus creation, and website and back-of-book indexing.

&lt;a href="http://www.boxesandarrows.com/people/archives/mike_steckel.php"&gt;Mike Steckel&lt;/a&gt; is an Information Architect/Technical Librarian for International SEMATECH in Austin, TX. &lt;/biobox&gt;</description>
      <pubDate>Mon, 07 Apr 2003 22:43:18 GMT</pubDate>
      <author>Fred Leise, Karl Fast, Mike Steckel</author>
      <category>Methods</category>
    </item>
    <item>
      <title>Synonym Rings and Authority Files</title>
      <link>http://www.boxesandarrows.com/view/synonym_rings_and_authority_files</link>
      <guid>http://www.boxesandarrows.com/view/synonym_rings_and_authority_files</guid>
      <description>&lt;pullquote&gt;&lt;p&gt;&amp;#8220;Synonym rings and authority files are simple tools that can bridge the gap between natural language and complex controlled vocabularies (taxonomies and thesauri) quite nicely.&amp;#8221;&lt;/p&gt;&lt;/pullquote&gt;

&lt;em&gt;This is Part 3 in our continuing series on controlled vocabularies and faceted classification. Previous parts in the series include:&lt;/em&gt;

&lt;ul class="nobullets"&gt;
&lt;li&gt;&lt;a href="http://www.boxesandarrows.com/view/all_about_facets_amp_controlled_vocabularies"&gt;All About Facets and Controlled Vocabularies&lt;/a&gt; (series introduction)&lt;/li&gt;
&lt;li&gt;1. &lt;a href="http://www.boxesandarrows.com/view/what_is_a_controlled_vocabulary_"&gt;What is a Controlled Vocabulary?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;2. &lt;a href="http://www.boxesandarrows.com/view/creating_a_controlled_vocabulary"&gt;Creating a Controlled Vocabulary&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

As any connoisseur of duct tape knows, when you need to get a job done, the simplest tool is often your best friend. This is as true for controlled vocabularies (CVs) as it is for home repair. Remember that &lt;a href="http://www.boxesandarrows.com/archives/what_is_a_controlled_vocabulary.php"&gt;our goal for CVs&lt;/a&gt; is to &amp;#8220;impose some order to facilitate agreement between the concepts within the site and the vocabulary of the person [natural language] using it.&amp;#8221;

But that doesn't mean the CV has to be complicated. Resources do not always allow for a full-fledged thesaurus, and often such a large undertaking is not necessary. Synonym rings and authority files are simple tools that can bridge the gap between natural language and complex controlled vocabularies (taxonomies and thesauri) quite nicely. We can explain how synonym rings work by way of an example. 

International SEMATECH, the semiconductor research consortium, had a searching problem. Documents were uploaded to a private research website in a highly decentralized manner. Member company employees from all over the world had the ability to upload their own research documents and meeting presentations to the website. 

A look at the search logs, however, revealed that people's search terms were retrieving only a fraction of the documents they were trying to find. The problem was inconsistent terminology. A review of the metadata found that those uploading information were as likely to call silicon &amp;#8220;Si&amp;#8221; as they were to spell out the whole name, &amp;#8220;silicon.&amp;#8221; There were many similar examples. Besides chemical symbols, users were both searching for and uploading documents with acronyms (&amp;#8220;PSM&amp;#8221; vs. &amp;#8220;Phase Shift Mask&amp;#8221;) and simple spelling variants (&amp;#8220;low K dielectrics&amp;#8221; vs. &amp;#8220;low-K dielectrics&amp;#8221; vs. &amp;#8220;lowk dielectrics&amp;#8221;). 

Under the previous system, a user who searched for &amp;#8220;Si,&amp;#8221; &amp;#8220;PSM,&amp;#8221; or &amp;#8220;low K dielectrics&amp;#8221; would get only exact matches. In other words, they would miss documents that had &amp;#8220;Silicon,&amp;#8221; &amp;#8220;Phase Shift Mask,&amp;#8221; or &amp;#8220;low-K dielectrics&amp;#8221; in their metadata. Furthermore, they would get enough hits that they might not have realized some relevant documents were missing (if they had gotten zero hits, they might have suspected something was wrong and tried another term). 

It was our assumption that when users searched one term, they intended to find the entire set of documents related to that concept. But trying to get such an organization to adopt a style guide for metadata was not viable. The solution was to install a synonym ring into our search engine, &lt;a href="http://technet.oracle.com/products/text/content.html"&gt;Oracle Text.&lt;/a&gt;

&lt;h2&gt;What the synonym ring does&lt;/h2&gt;

A synonym ring connects a series of terms together and treats them all as equivalent for search purposes. When a user enters &amp;#8220;PSM,&amp;#8221; for instance, the search term will be sent through the synonym ring to see if there are any equivalent terms. For &amp;#8220;PSM&amp;#8221; we would find &amp;#8220;Phase Shift Mask&amp;#8221; as a synonym. The search engine would then retrieve all documents with either &amp;#8220;PSM&amp;#8221; or &amp;#8220;Phase Shift Mask&amp;#8221; in their metadata. The searcher would get the complete set of relevant documents as though they had searched both terms (something few people would think to do). 

If there is no match in the synonym list, the search is simply sent through the index as usual and any documents with &amp;#8220;PSM&amp;#8221; are returned. The synonym ring goes into effect only when there is a matching synonym for the term entered into the search box by the user.
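
The behavior described above can be sketched in a few lines of Python. This is a minimal illustration and not the Oracle Text configuration we used: the ring contents, the function names (expand_query, search), and the idea of storing each document's metadata as a list of keywords are all assumptions made for the example.

```python
# Hypothetical synonym rings: every term in a ring is treated as equivalent.
SYNONYM_RINGS = [
    {"psm", "phase shift mask"},
    {"si", "silicon"},
    {"low k dielectrics", "low-k dielectrics", "lowk dielectrics"},
]


def expand_query(term):
    """Return every term that should be searched for the term the user typed."""
    key = term.strip().lower()
    for ring in SYNONYM_RINGS:
        if key in ring:
            # A ring matched: search all of its equivalent terms.
            return sorted(ring)
    # No ring matched: search the term as entered.
    return [key]


def search(term, documents):
    """Return ids of documents whose metadata keywords match any expanded term.

    documents maps a document id to a list of metadata keywords (an assumption
    made for this sketch).
    """
    terms = set(expand_query(term))
    return [
        doc_id
        for doc_id, keywords in documents.items()
        if terms.intersection(k.lower() for k in keywords)
    ]


docs = {
    "doc-a": ["PSM"],
    "doc-b": ["Phase Shift Mask"],
    "doc-c": ["Lithography"],
}
print(search("PSM", docs))          # ['doc-a', 'doc-b']
print(search("Lithography", docs))  # ['doc-c']
```

Note that the ring is consulted only at query time. The documents keep whatever terms their authors chose, which is what made this approach workable without a metadata style guide.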

Although getting a synonym ring up and running sounds pretty simple, the difficulties often come from trying to answer a simple question: &amp;#8220;What is a synonym?&amp;#8221; The example above was a clear case of synonyms: an acronym and the full name of the object. It is not always this simple. Synonyms are generally words with the same or very similar meanings. That sounds straightforward, but how similar is similar enough? True synonyms are rare. 

&lt;h2&gt;What is a synonym?&lt;/h2&gt;

Some synonyms may appear to be pretty straightforward. These include:

&lt;ul&gt;&lt;li&gt;Acronyms:  BBC, British Broadcasting Corporation; MPG, miles per gallon&lt;/li&gt;&lt;li&gt;Variant spellings: cancelled, canceled; honor, honour&lt;/li&gt;&lt;li&gt;Scientific terms versus popular use terms: acetylsalicylic acid, aspirin; lilioceris, lily beetle&lt;/li&gt;&lt;/ul&gt;

But synonyms, in general, quickly become more difficult. Are &amp;#8220;medicine&amp;#8221; and &amp;#8220;drugs&amp;#8221; synonyms? Are &amp;#8220;fired&amp;#8221; and &amp;#8220;laid off&amp;#8221;? What about &amp;#8220;forest&amp;#8221; and &amp;#8220;woods&amp;#8221; or &amp;#8220;arid&amp;#8221; and &amp;#8220;dry&amp;#8221;? With these examples, it is more difficult to say for sure. To answer the question about whether two terms are synonyms, you often have to consider the overall content of your site, as well as the site's context and its users.

In our &lt;a href="http://www.boxesandarrows.com/archives/what_is_a_controlled_vocabulary.php"&gt;first article&lt;/a&gt;,  we gave the following example of a synonym (which demonstrates the equivalence relationship): 

&lt;img src="/files/banda/cv_1.jpg" width="343" height="43" alt="Example of a preferred term" /&gt;

But one could easily argue that these are not true synonyms. You may be looking for information about Elizabeth Taylor only during the time she was married to Larry Fortensky. In this case &amp;#8220;Elizabeth Fortensky&amp;#8221; might be the only part of the ring you would be interested in. Expanding the results by including results for both &amp;#8220;Elizabeth Warner&amp;#8221; and &amp;#8220;Elizabeth Fortensky&amp;#8221; would reduce the precision of the search results.

When creating a synonym ring, or any controlled vocabulary, you will spend a lot of time evaluating near synonyms. What guidelines should you use for making these decisions?

&lt;h2&gt;Recall and precision&lt;/h2&gt;

Information architecture in the real world is all about the tradeoffs, right? Librarians have long been aware of the tradeoffs one makes between a search system that is broad and one that is specific. A search system that is broad is one with high recall, while one that is very narrow is one with high precision. Let's look at these two terms a little more closely.

Recall is often represented as a ratio: 

&lt;p class="indent"&gt;number of retrieved relevant documents / all relevant documents in a collection&lt;/p&gt;

Recall measures how many of the relevant documents are returned to the user. When you are searching a system with high recall, you are able to get a comprehensive set of documents returned, but you increase the possibility that less relevant documents will also get returned. This is great when you want to look through a large number of documents to make sure you have seen everything on a certain topic. Techniques for increasing recall include a synonym ring, stemming (some search engines will automatically return &amp;#8220;jumping&amp;#8221; and &amp;#8220;jumps&amp;#8221; when someone searches &amp;#8220;jump&amp;#8221;), and wildcards.

Precision, like recall, is often represented as a ratio:

&lt;p class="indent"&gt;number of retrieved relevant documents / total number of documents retrieved&lt;/p&gt;

You want to return all relevant documents to each user. So why not return all documents in your system for every search? That way you can be sure that every single relevant document is returned to the user, right? Well, true, but you're also returning many irrelevant documents at the same time, making it harder for users to find what they want. 

Precision measures how many of the returned documents are relevant. When you are searching a system with high precision, your results are specific to your search. This is closer to a known-item search: you want only relevant search results and are less tolerant of getting some irrelevant results mixed in.

You can increase search precision by using specific indexing terms (&amp;#8220;Ferrari&amp;#8221; and not &amp;#8220;sports car&amp;#8221;), little or no stemming, word proximity operators (how closely words appear next to each other), and search zones.
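
The two ratios are easy to compute once you decide which documents count as relevant. The sketch below uses an invented ten-document collection with four relevant documents; the numbers and names are assumptions chosen only to show the arithmetic, including the case of returning every document in the system.

```python
def recall(retrieved, relevant):
    """Retrieved relevant documents / all relevant documents in the collection."""
    if not relevant:
        return 0.0
    return len(retrieved.intersection(relevant)) / len(relevant)


def precision(retrieved, relevant):
    """Retrieved relevant documents / total number of documents retrieved."""
    if not retrieved:
        return 0.0
    return len(retrieved.intersection(relevant)) / len(retrieved)


# Hypothetical collection of ten documents, four of them relevant to the query.
collection = {f"d{i}" for i in range(1, 11)}
relevant = {"d1", "d2", "d3", "d4"}

narrow = {"d1", "d2"}    # a narrow search that finds two relevant documents
everything = collection  # a search that returns every document in the system

print(recall(narrow, relevant), precision(narrow, relevant))          # 0.5 1.0
print(recall(everything, relevant), precision(everything, relevant))  # 1.0 0.4
```

Returning everything gives perfect recall but leaves the user to sift through six irrelevant documents, while the narrow search returns nothing irrelevant but misses half of what was wanted.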

Measuring the recall and precision of a particular search engine can be &lt;a href="http://www.tbray.org/ongoing/When/200x/2003/06/22/PandR"&gt;difficult&lt;/a&gt;. Measuring recall and precision using hard numbers is questionable. Relevance is difficult to quantify since it is inconstant (even during the course of a single search, relevance may change) and subjective. 

A better way to get a handle on precision and recall is to collect responses from your users. What do people complain about? Do they say, &amp;#8220;I get too many results?&amp;#8221; This really means, &amp;#8220;I get too many irrelevant results&amp;#8221; and is a sign your recall might be too high. Do people say &amp;#8220;I know it is in there, but I can't find it?&amp;#8221; or &amp;#8220;I get no hits for too many searches?&amp;#8221; If so, your precision might be too high. Just remember that recall and precision tend to be inversely related: as one goes up, the other usually goes down. You will need to strike a balance.

&lt;h2&gt;Authority files&lt;/h2&gt;

So now that we know what a synonym ring is, we can define an authority file. An authority file is similar to the synonym ring, with the addition of one type of term relationship. Instead of all of the terms being equal, one term is identified as the preferred term and the others are considered variant terms.

Authority files help with tagging content consistently. Catalogers for large library collections have long used authority files to find approved terms for describing an item. When they get a book about the Italian city of Firenze and another one about the Italian city of Florence, they use one of the names (based on prescribed rules) and describe all books in the collection about the city using a single, consistent term. 

Similarly, in most major academic libraries, all books about &amp;#8220;Native Americans&amp;#8221; and  &amp;#8220;American Indians&amp;#8221; are described with the term &amp;#8220;Indians of North America.&amp;#8221; When someone performs a subject search on &amp;#8220;Native Americans&amp;#8221; they get a note that says something like &amp;#8220;This term is indexed as INDIANS OF NORTH AMERICA.&amp;#8221; The authority file is the place you go to find which term is the heading (the main term) and which term is the cross reference (the variant term). 

A more typical example on a website might work like this: Let's say you have a website devoted to comic books. It would be great if when someone typed &amp;#8220;Caped Crusader&amp;#8221; or &amp;#8220;Dark Knight&amp;#8221; into the search box, they got results for &amp;#8220;Batman.&amp;#8221; In this example, &amp;#8220;Batman&amp;#8221; and &amp;#8220;Caped Crusader&amp;#8221; would not be considered equivalent terms; the authority file would explain their relationship. You would not want to identify each Batman comic book with all three terms, just the main term. But when a user entered &amp;#8220;Caped Crusader,&amp;#8221; you would want the system to convert their term to &amp;#8220;Batman&amp;#8221; and return the appropriate results.

The relationships among the terms could be expressed like this:

&lt;img src="/files/banda/synonym_rings_and_authority_files/steckel_082503_2.gif" width="292" height="213" alt="Example of an authority file relationship" /&gt;

Or in the language of a controlled vocabulary, like this:

&lt;div class="indent"&gt;
&lt;p&gt;Batman&lt;br /&gt;USE FOR: Dark Knight, Caped Crusader&lt;/p&gt;
&lt;p&gt;Caped Crusader&lt;br /&gt;USE Batman&lt;/p&gt;
&lt;p&gt;Dark Knight&lt;br /&gt;USE Batman&lt;/p&gt;
&lt;/div&gt;
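
As a rough sketch, the same USE and USE FOR entries can be held in a small lookup table that converts a variant term to its preferred term before the search runs. The names here (AUTHORITY_FILE, USE, resolve) and the wording of the note are assumptions for illustration, not part of any particular search product.

```python
# Hypothetical authority file: preferred term mapped to its variants (USE FOR).
AUTHORITY_FILE = {
    "Batman": ["Dark Knight", "Caped Crusader"],
}

# Invert the file so every variant points back to its preferred term (USE).
USE = {}
for preferred, variants in AUTHORITY_FILE.items():
    USE[preferred.lower()] = preferred
    for variant in variants:
        USE[variant.lower()] = preferred


def resolve(term):
    """Return (term_to_search, note); note is None when nothing was converted."""
    entered = term.strip()
    preferred = USE.get(entered.lower(), entered)
    if preferred.lower() == entered.lower():
        return preferred, None
    return preferred, f'Searched for "{preferred}" (you entered "{entered}")'


print(resolve("Caped Crusader"))
# ('Batman', 'Searched for "Batman" (you entered "Caped Crusader")')
print(resolve("Robin"))
# ('Robin', None)
```

Unlike a synonym ring, only the preferred term is searched, so content needs to be tagged with that one term alone.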

Another way that people use authority files is to reinforce a correct term and to discourage an incorrect term. The Polar Bear Book uses the example of how drugstore.com corrects the spelling of Tylenol using an authority file. If you enter &amp;#8220;tilenol&amp;#8221; into the site's search box, you get the results for &amp;#8220;Tylenol.&amp;#8221; Users will see the correct spelling prominently displayed, which will remind them how the word is really spelled. Maybe they will remember the correct spelling in the future.

&lt;h2&gt;Guidelines for implementation&lt;/h2&gt;

When putting a synonym ring or authority file in place, consider the following guidelines:

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Show users how their search term was changed or added to by the system and exactly what was searched. At International SEMATECH, a search for Silicon would look like this:&lt;/p&gt;&lt;p&gt;&lt;img src="/files/banda/synonym_rings_and_authority_files/steckel_082503_3.gif" width="594" height="132" alt="Example showing the user exactly how the search was submitted." /&gt;&lt;/p&gt;&lt;p&gt;The line under the search box tells users exactly how the search was submitted. When users understand how their term is expanded to include synonyms, they have a better understanding of how the site works. When done well, explanations can also increase confidence that users have in the system, since it shows them that the system understands what they are looking for.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;Keep the display simple. Include a search box at the top of the page so users can edit their terms if they see they have made a mistake. Try to follow the prescient words of the old poem:
&lt;blockquote&gt;&lt;p&gt;Give me a look, give me a[n inter] face,&lt;br /&gt;That makes simplicity a grace;&lt;/p&gt;&lt;p&gt;&amp;#8212; Ben Jonson (slightly modified) [&lt;a href="http://www.bartleby.com/100/146.9.html"&gt;http://www.bartleby.com/100/146.9.html&lt;/a&gt;]&lt;/p&gt;&lt;/blockquote&gt;&lt;/li&gt;
&lt;li&gt;Try to characterize your content and the way your users understand it. At International SEMATECH, the majority of the synonym ring we use is made up of acronyms, since the scientific community seems to love creating and communicating with them. The content is also very narrow and scientific. There is not a great deal of the mushy language that comes from the general culture; most of it is very well defined. A general rule: The broader the content your site covers, the more you will find yourself dealing with near synonyms. Try to make similar evaluations of the content you are searching.&lt;/li&gt;
&lt;li&gt;Review search logs every day to look for new terms and synonyms. Is someone looking for an acronym that is not on your list? Try to find out what it means and make sure the next person looking for it gets the correct results.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Conclusions&lt;/h2&gt;

Synonym rings and authority files are simple, common-sense ways to help users connect the various semantic concepts that are inherently intertwined with the term they choose. They are particularly good for large decentralized sites that are search dominant and have little centralized control over content. 

Most of us know by now that users tend to use a small number of words for each search. They should not be forced to consider all the synonyms their search terms might have. &lt;a href="http://www.tbray.org/ongoing/When/200x/2003/06/24/IntelligentSearch"&gt;Tim Bray&lt;/a&gt; said it well: &amp;#8220;If you need to know about cow farming, you're probably also searching for cattle ranching, beef (or dairy) production, and Kuhbauernhof, whether you know it or not.&amp;#8221; &lt;p&gt;&lt;img src="/files/banda/art_end.gif" alt="" title="" width="8" height="8" /&gt;&lt;/p&gt;&lt;morebox&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www.boxesandarrows.com/archives/all_about_facets_controlled_vocabularies.php"&gt;All About Facets and Controlled Vocabularies&lt;/a&gt; (series introduction)&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.boxesandarrows.com/archives/what_is_a_controlled_vocabulary.php"&gt;What is a Controlled Vocabulary?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.boxesandarrows.com/archives/creating_a_controlled_vocabulary.php"&gt;Creating a Controlled Vocabulary&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="/files/banda/Bibliography.htm"&gt;An Annotated Bibliography&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/morebox&gt;

&lt;biobox&gt;&lt;a href="http://www.boxesandarrows.com/people/archives/karl_fast.php"&gt;Karl Fast&lt;/a&gt; is a PhD student in library and information science at the University of Western Ontario. He also has a master's in LIS. His graduate work has included courses on organization of information, subject analysis, thesaurus construction, and facet analysis.&lt;/p&gt;&lt;p&gt;&lt;a href="http://www.boxesandarrows.com/people/archives/fred_leise.php"&gt;Fred Leise,&lt;/a&gt; president of &lt;a href="http://www.contextualanalysis.com"&gt;ContextualAnalysis, LLC,&lt;/a&gt; is an information architecture consultant providing services in the areas of content analysis and organization, user experience design, taxonomy and thesaurus creation, and website and back-of-book indexing.&lt;/p&gt;&lt;p&gt;&lt;a href="http://www.boxesandarrows.com/people/archives/mike_steckel.php"&gt;Mike Steckel&lt;/a&gt; is an Information Architect/Technical Librarian for International SEMATECH in Austin, TX.&lt;/biobox&gt;</description>
      <pubDate>Tue, 26 Aug 2003 21:08:53 GMT</pubDate>
      <author>Fred Leise, Karl Fast, Mike Steckel</author>
      <category>Findability</category>
      <category>Methods</category>
    </item>
    <item>
      <title>Controlled Vocabularies: A Glosso-Thesaurus</title>
      <link>http://www.boxesandarrows.com/view/controlled_vocabularies_a_glosso_thesaurus</link>
      <guid>http://www.boxesandarrows.com/view/controlled_vocabularies_a_glosso_thesaurus</guid>
      <description>&lt;pullquote&gt;&amp;#8220;There is a singular lack of vocabulary control in the field of controlled vocabularies.&amp;#8221;&lt;br /&gt;&lt;i&gt;&amp;#8212; Bella Hass Weinberg&lt;/i&gt;&lt;/pullquote&gt;&lt;br /&gt;&lt;i&gt;This is part 4 in our continuing series on controlled vocabularies and faceted classification. Previous parts in the series include:&lt;br /&gt;&lt;br /&gt;    &lt;p&gt;&lt;a href="http://www.boxesandarrows.com/view/all_about_facets_controlled_vocabularies"&gt;All About Facets and Controlled Vocabularies&lt;/a&gt; (series introduction)&lt;br /&gt;1. &lt;a href="http://www.boxesandarrows.com/view/what_is_a_controlled_vocabulary_"&gt;What is a Controlled Vocabulary?&lt;/a&gt;&lt;br /&gt;2. &lt;a href="http://www.boxesandarrows.com/archives/creating_a_controlled_vocabulary.php"&gt;Creating a Controlled Vocabulary&lt;/a&gt;&lt;br /&gt;3. &lt;a href="http://www.boxesandarrows.com/archives/synonym_rings_and_authority_files.php"&gt;Synonym Rings and Authority Files&lt;/a&gt;&lt;/i&gt;&lt;/p&gt;    &lt;p&gt;&lt;span class="subhead"&gt;Introduction&lt;/span&gt;&lt;/p&gt;    &lt;p&gt;&amp;#8220;There is a singular lack of vocabulary control in the field of controlled vocabularies,&amp;#8221; Bella Hass Weinberg, professor of library science at St. John&amp;#8217;s University in New York, is fond of saying.&lt;/p&gt;    &lt;p&gt;To help you cut through the maze of verbiage often found in this field, we have created a glossary of terms.&lt;/p&gt;    &lt;p&gt;The glossary reflects our usage of terms in the articles of this series. But this glossary is more than just a list of terms. We wanted it to serve as an illustration of what a controlled vocabulary looks like (we are fond of killing multiple birds with multiple stones).&lt;/p&gt;    &lt;p&gt;Accordingly, the glossary is itself a controlled vocabulary, more specifically a thesaurus. 
So you will find all of the standard features of any thesaurus: broader, narrower, and variant term indicators, as well as scope notes. In this case, however, the scope notes provide the definition of the particular glossary term being presented.&lt;/p&gt;    &lt;p&gt;&lt;span class="subhead"&gt;Glosso-Thesaurus&lt;/span&gt;&lt;/p&gt;    &lt;p&gt;The following standard abbreviations are used in the glosso-thesaurus.&lt;/p&gt;    &lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;b&gt;BT&lt;/b&gt; = Broader Term &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;b&gt;NT&lt;/b&gt; = Narrower Term &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;b&gt;RT&lt;/b&gt; = Related Term (&amp;#8220;See also&amp;#8221;) &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;b&gt;SN&lt;/b&gt; = Scope Note &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;b&gt;UF&lt;/b&gt; = Used For &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;b&gt;&lt;span class="caps"&gt;&lt;span class="caps"&gt;&lt;span class="caps"&gt;&lt;span class="caps"&gt;USE&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/b&gt; = &amp;#8220;See&amp;#8221; (Refers reader from variant term to vocabulary term.)&lt;/p&gt;&lt;hr&gt;&lt;br /&gt;&lt;br /&gt;    &lt;p&gt;&lt;a name="alternateterm"&gt;&lt;/a&gt;&lt;b&gt;Alternate Term&lt;/b&gt;&lt;/p&gt;&lt;ul&gt;&lt;span class="caps"&gt;&lt;span class="caps"&gt;&lt;span class="caps"&gt;&lt;span class="caps"&gt;USE&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; &lt;a href="#variantterm"&gt;Variant Term&lt;/a&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;&lt;hr&gt;&lt;br /&gt;&lt;br /&gt;    &lt;p&gt;&lt;a name="associativerelationship"&gt;&lt;/a&gt;&lt;b&gt;Associative Relationship&lt;/b&gt;&lt;/p&gt;&lt;ul&gt;SN The connection between related &lt;a href="#vocabularyterm"&gt;vocabulary terms&lt;/a&gt;. 
That is, related terms are connected through an associative relationship.&lt;br /&gt;&lt;br /&gt;    &lt;p&gt;BT &lt;a href="#termrelationship"&gt;Term Relationship&lt;/a&gt;&lt;br /&gt;RT &lt;a href="#equivalencerelationship"&gt;Equivalence Relationship&lt;/a&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;a href="#hierarchicalrelationship"&gt;Hierarchical Relationship&lt;/a&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;a href="#relatedterm"&gt;Related Term&lt;/a&gt;&lt;/ul&gt;&lt;/p&gt;&lt;hr&gt;&lt;br /&gt;&lt;br /&gt;    &lt;p&gt;&lt;a name="authorityfile"&gt;&lt;/a&gt;&lt;b&gt;Authority File&lt;/b&gt;&lt;/p&gt;&lt;ul&gt;&lt;span class="caps"&gt;&lt;span class="caps"&gt;&lt;span class="caps"&gt;&lt;span class="caps"&gt;SN A&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; flat (non-hierarchical) list containing &lt;a href="#preferredterm"&gt;preferred terms&lt;/a&gt;. May include &lt;a href="#variantterm"&gt;variant terms&lt;/a&gt;. Essentially, an authority file is a &lt;a href="#synonymring"&gt;synonym ring&lt;/a&gt; with the preferred term identified for each concept.&lt;br /&gt;&lt;br /&gt;    &lt;p&gt;BT &lt;a href="#controlledvocabulary"&gt;Controlled Vocabulary&lt;/a&gt;&lt;br /&gt;RT &lt;a href="#synonymequivalencelist"&gt;Synonym Equivalence List&lt;/a&gt;&lt;/ul&gt;&lt;/p&gt;&lt;hr&gt;&lt;br /&gt;&lt;br /&gt;    &lt;p&gt;&lt;a name="broaderterm"&gt;&lt;/a&gt;&lt;b&gt;Broader Term&lt;/b&gt;&lt;/p&gt;&lt;ul&gt;SN The superordinate word in an inclusion or &lt;a href="#hierarchicalrelationship"&gt;hierarchical relationship&lt;/a&gt;. A class or category term. Abbreviated in displays as &amp;#8220;BT.&amp;#8221; The inversion of broader term is &lt;a href="#narrowerterm"&gt;narrower term&lt;/a&gt;. For example, &amp;#8220;shoe&amp;#8221; is a broader term than &amp;#8220;running shoe.&amp;#8221; Broader terms are sometimes referred to as &amp;#8220;parent&amp;#8221; terms. 
&lt;br /&gt;&lt;br /&gt;    &lt;p&gt;UF Parent Term &lt;br /&gt;RT &lt;a href="#hierarchicalrelationship"&gt;Hierarchical Relationship&lt;/a&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;a href="#narrowerterm"&gt;Narrower Term&lt;/a&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;a href="#relatedterm"&gt;Related Term&lt;/a&gt;&lt;/ul&gt;&lt;/p&gt;&lt;hr&gt;&lt;br /&gt;&lt;br /&gt;    &lt;p&gt;&lt;a name="cardsorting"&gt;&lt;/a&gt;&lt;b&gt;Card Sorting&lt;/b&gt;&lt;/p&gt;&lt;ul&gt;SN An exercise that can be used to help create a &lt;a href="#controlledvocabulary"&gt;controlled vocabulary&lt;/a&gt;. In a card sort, users are asked to group cards into like categories or to name categories of like items. Card sorting can be used to compile lists of variant terms or to verify the relationships in a hierarchy. For additional information, see &lt;a href="http://www.boxesandarrows.com/archives/cardbased_classification_evaluation.php"&gt;Card-Based Classification Evaluation&lt;/a&gt; by Donna Maurer or the &lt;a href="http://www.iawiki.net/CardSorting"&gt;IAWiki page on card sorting&lt;/a&gt;. 
&lt;br /&gt;&lt;br /&gt;    &lt;p&gt;RT &lt;a href="#freelisting"&gt;Free Listing&lt;/a&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;a href="#hierarchy"&gt;Hierarchy&lt;/a&gt;&lt;/ul&gt;&lt;/p&gt;&lt;hr&gt;&lt;br /&gt;&lt;br /&gt;    &lt;p&gt;&lt;a name="childterm"&gt;&lt;/a&gt;&lt;b&gt;Child Term&lt;/b&gt;&lt;/p&gt;&lt;ul&gt;&lt;span class="caps"&gt;&lt;span class="caps"&gt;&lt;span class="caps"&gt;&lt;span class="caps"&gt;USE&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; &lt;a href="#narrowerterm"&gt;Narrower Term&lt;/a&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;&lt;hr&gt;&lt;br /&gt;&lt;br /&gt;    &lt;p&gt;&lt;a name="controlledvocabulary"&gt;&lt;/a&gt;&lt;b&gt;Controlled Vocabulary&lt;/b&gt;&lt;/p&gt;&lt;ul&gt;&lt;span class="caps"&gt;&lt;span class="caps"&gt;&lt;span class="caps"&gt;&lt;span class="caps"&gt;SN A&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; subset of &lt;a href="#naturallanguage"&gt;natural language&lt;/a&gt; that is used to tag documents and then to find content through navigation or search. Use of a controlled vocabulary increases consistency in tagging and can help match users&amp;#8217; natural language with &lt;a href="#preferredterm"&gt;preferred terms&lt;/a&gt;. 
Abbreviated as &amp;#8220;CV.&amp;#8221; &lt;br /&gt;&lt;br /&gt;    &lt;p&gt;Controlled vocabularies exhibit the following relationships:&lt;/p&gt;    &lt;p&gt;&lt;a href="#synonymring"&gt;Synonym ring&lt;/a&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;+ &lt;a href="#preferredterm"&gt;Preferred terms&lt;/a&gt; =&lt;br /&gt;&lt;a href="#authorityfile"&gt;Authority file&lt;/a&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;+ &lt;a href="#broaderterm"&gt;Broader&lt;/a&gt; and &lt;a href="#narrowerterm"&gt;narrower terms&lt;/a&gt; =&lt;br /&gt;&lt;a href ="#taxonomy"&gt;Taxonomy&lt;/a&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;+ &lt;a href="#relatedterm"&gt;Related terms&lt;/a&gt; =&lt;br /&gt;&lt;a href="#thesaurus"&gt;Thesaurus&lt;/a&gt;&lt;/p&gt;    &lt;p&gt;NT &lt;a href="#authorityfile"&gt;Authority File&lt;/a&gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;a href="#facetedclassification"&gt;Faceted Classification&lt;/a&gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;a href="#synonymequivalencelist"&gt;Synonym Equivalence List&lt;/a&gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;a href="#synonymring"&gt;Synonym Ring&lt;/a&gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;a href="#taxonomy"&gt;Taxonomy&lt;/a&gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;a href="#thesaurus"&gt;Thesaurus&lt;/a&gt;&lt;/p&gt;    &lt;p&gt;RT &lt;a href="#naturallanguage"&gt;Natural Language&lt;/a&gt;&lt;/ul&gt;&lt;/p&gt;&lt;hr&gt;&lt;br /&gt;&lt;br /&gt;    &lt;p&gt;&lt;a name="entryterm"&gt;&lt;/a&gt;&lt;b&gt;Entry Term&lt;/b&gt;&lt;/p&gt;&lt;ul&gt;&lt;span class="caps"&gt;&lt;span class="caps"&gt;&lt;span class="caps"&gt;&lt;span class="caps"&gt;USE&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; &lt;a href="#variantterm"&gt;Variant Term&lt;/a&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;&lt;hr&gt;&lt;br /&gt;&lt;br 
/&gt;    &lt;p&gt;&lt;a name="equivalencerelationship"&gt;&lt;/a&gt;&lt;b&gt;Equivalence Relationship&lt;/b&gt;&lt;/p&gt;&lt;ul&gt;SN The connection between terms in a &lt;a href="#synonymring"&gt;synonym ring&lt;/a&gt;, or between &lt;a href="#preferredterm"&gt;preferred terms&lt;/a&gt; and &lt;a href="#variantterm"&gt;variant terms&lt;/a&gt;. Terms that exhibit an equivalence relationship refer to the same concept. For example, &amp;#8220;cat&amp;#8221; and &amp;#8220;feline&amp;#8221; are often considered as being equivalent. &lt;br /&gt;&lt;br /&gt;    &lt;p&gt;BT &lt;a href="#termrelationship"&gt;Term Relationship&lt;/a&gt; &lt;br /&gt;RT &lt;a href="#associativerelationship"&gt;Associative Relationship&lt;/a&gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;a href="#hierarchicalrelationship"&gt;Hierarchical Relationship&lt;/a&gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;a href="#variantterm"&gt;Variant Term&lt;/a&gt;&lt;/ul&gt;&lt;/p&gt;&lt;hr&gt;&lt;br /&gt;&lt;br /&gt;    &lt;p&gt;&lt;a name="exhaustivity"&gt;&lt;/a&gt;&lt;b&gt;Exhaustivity&lt;/b&gt;&lt;/p&gt;&lt;ul&gt;SN The range of concept coverage of &lt;a href="#vocabularyterm"&gt;vocabulary terms&lt;/a&gt; in a &lt;a href="#controlledvocabulary"&gt;controlled vocabulary&lt;/a&gt;. If the vocabulary terms cover all of the concepts included in the content under consideration, then the controlled vocabulary is exhaustive. &lt;br /&gt;&lt;br /&gt;    &lt;p&gt;RT &lt;a href="#specificity"&gt;Specificity&lt;/a&gt;&lt;/ul&gt;&lt;/p&gt;&lt;hr&gt;&lt;br /&gt;&lt;br /&gt;    &lt;p&gt;&lt;a name="facet"&gt;&lt;/a&gt;&lt;b&gt;Facet&lt;/b&gt;&lt;/p&gt;&lt;ul&gt;&lt;span class="caps"&gt;&lt;span class="caps"&gt;&lt;span class="caps"&gt;&lt;span class="caps"&gt;SN A&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; fundamental category by which an object or concept may be described. 
For example, a child&amp;#8217;s ball may be described using the facets of size, weight, shape, color, texture, material and price. &lt;br /&gt;&lt;br /&gt;    &lt;p&gt;RT &lt;a href="#facetanalysis"&gt;Facet Analysis&lt;/a&gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;a href="#facetedclassification"&gt;Faceted Classification&lt;/a&gt;&lt;/ul&gt;&lt;/p&gt;&lt;hr&gt;&lt;br /&gt;&lt;br /&gt;    &lt;p&gt;&lt;a name="facetanalysis"&gt;&lt;/a&gt;&lt;b&gt;Facet Analysis&lt;/b&gt;&lt;/p&gt;&lt;ul&gt;SN The process of analyzing content to determine appropriate &lt;a href="#facet"&gt;facets&lt;/a&gt; and &lt;a href="#vocabularyterm"&gt;vocabulary term&lt;/a&gt; &lt;a href="#termrelationship"&gt;relationships&lt;/a&gt;, using &amp;#8220;one characteristic of division at a time, to produce homogeneous, mutually-exclusive groups.&amp;#8221; * &lt;br /&gt;&lt;br /&gt;    &lt;p&gt;RT &lt;a href="#facet"&gt;Facet&lt;/a&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;a href="#facetedclassification"&gt;Faceted Classification&lt;/a&gt;&lt;/p&gt;    &lt;p&gt;* Aitchison, Jean, Alan Gilchrist, and David Bawden (2002). Thesaurus Construction and Use: A Practical Manual. 4th ed. Chicago: Fitzroy-Dearborn, pg. 70.&lt;/ul&gt;&lt;/p&gt;&lt;hr&gt;&lt;br /&gt;&lt;br /&gt;    &lt;p&gt;&lt;a name="facetedclassification"&gt;&lt;/a&gt;&lt;b&gt;Faceted Classification&lt;/b&gt;&lt;/p&gt;&lt;ul&gt;&lt;span class="caps"&gt;&lt;span class="caps"&gt;&lt;span class="caps"&gt;&lt;span class="caps"&gt;SN A&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; &lt;a href="#controlledvocabulary"&gt;controlled vocabulary&lt;/a&gt; that divides &lt;a href="#vocabularyterm"&gt;vocabulary terms&lt;/a&gt; into &lt;a href="#facet"&gt;facets&lt;/a&gt;. 
&lt;br /&gt;&lt;br /&gt;    &lt;p&gt;BT &lt;a href="#controlledvocabulary"&gt;Controlled Vocabulary&lt;/a&gt;&lt;/ul&gt;&lt;/p&gt;&lt;hr&gt;&lt;br /&gt;&lt;br /&gt;    &lt;p&gt;&lt;a name="freelisting"&gt;&lt;/a&gt;&lt;b&gt;Free Listing&lt;/b&gt;&lt;/p&gt;&lt;ul&gt;SN A method of vocabulary development in which users are asked to &amp;#8220;name all the [x] you know.&amp;#8221; Free listing can identify core terms in a &lt;a href="#controlledvocabulary"&gt;controlled vocabulary&lt;/a&gt;, as well as &lt;a href="#variantterm"&gt;variant terms&lt;/a&gt;. For additional information, see &lt;a href="http://www.boxesandarrows.com/archives/beyond_cardsorting_freelisting_methods_to_explore_user_categorizations.php"&gt;Beyond cardsorting: Free-listing methods to explore user categorizations&lt;/a&gt; by Rashmi Sinha.&lt;br /&gt;&lt;br /&gt;    &lt;p&gt;RT &lt;a href="#cardsorting"&gt;Card Sorting&lt;/a&gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;a href="#userwarrant"&gt;User Warrant&lt;/a&gt;&lt;/ul&gt;&lt;/p&gt;&lt;hr&gt;&lt;br /&gt;&lt;br /&gt;    &lt;p&gt;&lt;a name="granularity"&gt;&lt;/a&gt;&lt;b&gt;Granularity&lt;/b&gt;&lt;/p&gt;&lt;ul&gt;SN The level of &lt;a href="#specificity"&gt;specificity&lt;/a&gt; with which content is described. The more granular, the more specific. &lt;br /&gt;&lt;br /&gt;    &lt;p&gt;RT &lt;a href="#specificity"&gt;Specificity&lt;/a&gt;&lt;/ul&gt;&lt;/p&gt;&lt;hr&gt;&lt;br /&gt;&lt;br /&gt;    &lt;p&gt;&lt;a name="hierarchy"&gt;&lt;/a&gt;&lt;b&gt;Hierarchy&lt