<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Comments on Measuring the Success Of a Classification System</title>
    <link>http://www.boxesandarrows.com/view/measuring-the</link>
    <pubDate>Wed, 23 May 2007 01:42:07 GMT</pubDate>
    <description>The design of complex information systems often calls for early validation of the proposed classification schemes. Iain Barker offers an evaluation method that may help.</description>
    <item>
      <description>&lt;p&gt;@Miles &amp;#8211; I played around with using conditional formatting, but with my limited Excel powers (!) I was unable to make this work if the correct answer could be in multiple locations, i.e. the correct location cell has multiple right answers separated by commas. I suppose there are some clunky workarounds for this (using multiple cells), but if you know of a more clever way please do share!&lt;/p&gt;

	&lt;p&gt;@Simon &amp;#8211; Thanks for your comments. I&amp;#8217;ll try to tidy up a version of the spreadsheet and make it available in due course.&lt;/p&gt;</description>
      <link>http://www.boxesandarrows.com/view/measuring-the#content_8072</link>
      <guid>http://www.boxesandarrows.com/view/measuring-the#content_8072</guid>
      <pubDate>Wed, 23 May 2007 01:42:07 GMT</pubDate>
      <author>Iain Barker</author>
    </item>
    <item>
      <description>&lt;p&gt;Another variation I have also tried is using a wireframe of the home page combined with index cards for lower levels of the hierarchy. The wireframe included navigation for users to &amp;#8216;jump&amp;#8217; a level deeper, so a rigid top down test would not reflect the way in which users navigated on the site. This approach allowed us to explore the combination of taxonomy and navigation in a single test (kind of a mix between paper prototyping and Donna&amp;#8217;s card-based classification evaluation).&lt;/p&gt;

	&lt;p&gt;- Miles.&lt;/p&gt;</description>
      <link>http://www.boxesandarrows.com/view/measuring-the#content_8032</link>
      <guid>http://www.boxesandarrows.com/view/measuring-the#content_8032</guid>
      <pubDate>Sun, 27 May 2007 10:26:26 GMT</pubDate>
      <author>Miles Rochford</author>
    </item>
    <item>
      <description>&lt;p&gt;One suggestion I would add is the use of &amp;#8216;conditional formatting&amp;#8217; in Excel to colour code the cells, both in terms of whether the responses were &amp;#8216;correct&amp;#8217; and in terms of the percentage of correct responses. This saves a &lt;span class="caps"&gt;LOT&lt;/span&gt; of time and reduces the risk of error.&lt;/p&gt;

	&lt;p&gt;I&amp;#8217;ve also used a variation on Donna&amp;#8217;s original technique which allowed me to highlight the areas where users experienced difficulty in classifying &amp;#8216;correctly&amp;#8217; and what their alternative classifications were. This assisted greatly in improving the results in subsequent evaluations.&lt;/p&gt;

	&lt;p&gt;- Miles.&lt;/p&gt;</description>
      <link>http://www.boxesandarrows.com/view/measuring-the#content_8031</link>
      <guid>http://www.boxesandarrows.com/view/measuring-the#content_8031</guid>
      <pubDate>Sun, 27 May 2007 10:26:12 GMT</pubDate>
      <author>Miles Rochford</author>
    </item>
    <item>
      <description>&lt;p&gt;Iain,&lt;/p&gt;

	&lt;p&gt;I must say that this is a wonderfully clear representation of data. I am sure Tufte would be proud of you.&lt;/p&gt;

	&lt;p&gt;To save us all the pain of reinventing the wheel, is there any chance you could provide this spreadsheet for download? Please excuse me if I missed the link if it already exists.&lt;/p&gt;</description>
      <link>http://www.boxesandarrows.com/view/measuring-the#content_7921</link>
      <guid>http://www.boxesandarrows.com/view/measuring-the#content_7921</guid>
      <pubDate>Fri, 18 May 2007 16:02:44 GMT</pubDate>
      <author>Simon Johnson</author>
    </item>
    <item>
      <description>&lt;p&gt;Iain,&lt;/p&gt;

	&lt;p&gt;From a statistical standpoint you&amp;#8217;d be looking to use binomial point estimators and confidence intervals to estimate the &amp;#8216;real&amp;#8217; (i.e. population) values for 1st-time completion, overall success etc. 15-30 people will give you a confidence interval that is relatively broad. I&amp;#8217;ve worked some of these out previously for reference:&lt;/p&gt;

	&lt;p&gt;Task with measured &amp;#8216;success&amp;#8217; rate of 2/3 (66.67%): 47.7% &amp;#8211; 81.9% with an expected success ratio of 64.8% (30 users)&lt;br /&gt;Task  with measured &amp;#8216;success&amp;#8217; rate of 4/5 (80%): 61.44% &amp;#8211; 91.75% with an expected success ratio of 76.6% (30 users)&lt;br /&gt;Task  with measured &amp;#8216;success&amp;#8217; rate of 3/4 (75%): 56.82% &amp;#8211; 87.82% with an expected success ratio of 72.32% (32 users)&lt;/p&gt;

	&lt;p&gt;What that means is that, if your tests gave you an 80% measure of &amp;#8216;success&amp;#8217; (whatever that was), then you would expect the user population as a whole to perform the same task with a success rate of 76.6%, with a 99% confidence interval (i.e. 1/100 chance that it really lies outside the range) of 61.44% at the low end, and 91.75% at the high end. To put that another way &amp;#8211; there&amp;#8217;s a half of a percentage point chance that users will actually fare worse than 61.44% if you measured 80% during the test.&lt;/p&gt;

	&lt;p&gt;I would also reinforce your point about not using the same users again for future iterations &amp;#8211; you&amp;#8217;d basically invalidate the results &amp;#8211; statistically speaking &amp;#8211; in that event.&lt;/p&gt;

	&lt;p&gt;Nicely written article, btw.&lt;/p&gt;</description>
      <link>http://www.boxesandarrows.com/view/measuring-the#content_7523</link>
      <guid>http://www.boxesandarrows.com/view/measuring-the#content_7523</guid>
      <pubDate>Wed, 16 May 2007 23:42:11 GMT</pubDate>
      <author>Steve Baty</author>
    </item>
    <item>
      <description>&lt;p&gt;Mike,&lt;br /&gt;Thanks for your comment. So far I&amp;#8217;ve only used the technique to provide data to support what would otherwise be an entirely subjective assessment of the success/failure of the classification system.&lt;/p&gt;

	&lt;p&gt;The limited budgets/timescales I typically work within have only enabled me to run one or two days of sessions within each iteration. This means I&amp;#8217;ve spoken to between 15 and 30 people, each attempting 10 to 15 tasks. I am not a statistician, but I am guessing that these numbers won&amp;#8217;t produce statistically valid data (I am always careful to point this out when using this technique to communicate to clients). Sadly I can&amp;#8217;t advise as to what kind of numbers you would need to involve to provide statistically valid data &amp;#8211; maybe someone else can help there?&lt;/p&gt;

	&lt;p&gt;As for repeatedly using the same people, I have always had the luxury of a large pool of potential users, and have always used new users for each iteration. Obviously this could cause some to question the comparison between iterations, but I&amp;#8217;ve never had this problem with the clients I&amp;#8217;ve worked with. Even if a client wants to include some repeat participants, I would always argue for some new participants with each iteration.&lt;/p&gt;

	&lt;p&gt;My preference for using new participants is primarily so they don&amp;#8217;t take ownership of any solutions/recommendations they may give during the sessions. Also I find that from a political perspective, it is often better to be able to say that 100 people participated in the creation of the classification system, rather than just 15.&lt;/p&gt;

	&lt;p&gt;I am sure that a statistician may well be wincing at my response &amp;#8211; if one is out there reading and wants to recommend how a more rigorous approach could be applied I&amp;#8217;d like to hear from them.&lt;/p&gt;</description>
      <link>http://www.boxesandarrows.com/view/measuring-the#content_7493</link>
      <guid>http://www.boxesandarrows.com/view/measuring-the#content_7493</guid>
      <pubDate>Mon, 07 May 2007 01:25:36 GMT</pubDate>
      <author>Iain Barker</author>
    </item>
    <item>
      <description>&lt;p&gt;Iain,  Very nice article.  I work in the Process Improvement arena, specifically with &lt;span class="caps"&gt;CMMI&lt;/span&gt;, and capturing measures and managing to them is integral to achieving high level certifications.  Measuring the improvement of websites, and proving it, can be tricky.  Your spreadsheets provide a simple and understandable way of showing this.&lt;/p&gt;

	&lt;p&gt;How many people do you interview (i.e. is there a magic number), and is it important that the same people be interviewed throughout the process?&lt;/p&gt;</description>
      <link>http://www.boxesandarrows.com/view/measuring-the#content_7398</link>
      <guid>http://www.boxesandarrows.com/view/measuring-the#content_7398</guid>
      <pubDate>Fri, 04 May 2007 13:20:23 GMT</pubDate>
      <author>Mike Murphy</author>
    </item>
    <item>
      <description>&lt;p&gt;Thanks Dawn &amp;#8211; hopefully that will be corrected very soon.&lt;/p&gt;</description>
      <link>http://www.boxesandarrows.com/view/measuring-the#content_7366</link>
      <guid>http://www.boxesandarrows.com/view/measuring-the#content_7366</guid>
      <pubDate>Thu, 03 May 2007 22:26:42 GMT</pubDate>
      <author>Iain Barker</author>
    </item>
    <item>
      <description>&lt;p&gt;Hey Iain &amp;#8211; did you accidentally repeat the first photo of the index card after this text:&lt;/p&gt;

	&lt;p&gt;2. On another set of index cards, write and number around 15 common information-seeking tasks. One task per index card.&lt;/p&gt;</description>
      <link>http://www.boxesandarrows.com/view/measuring-the#content_7304</link>
      <guid>http://www.boxesandarrows.com/view/measuring-the#content_7304</guid>
      <pubDate>Thu, 03 May 2007 22:25:53 GMT</pubDate>
      <author>Dawn Buie</author>
    </item>
    <item>
      <description>&lt;p&gt;Chad,&lt;br /&gt;So far I&amp;#8217;ve always tried to avoid making the user focus too much on their success/failure &amp;#8211; I&amp;#8217;ve just given them two attempts, but it sounds like what you are suggesting could work. Let me know how it goes.&lt;/p&gt;</description>
      <link>http://www.boxesandarrows.com/view/measuring-the#content_7297</link>
      <guid>http://www.boxesandarrows.com/view/measuring-the#content_7297</guid>
      <pubDate>Thu, 03 May 2007 01:04:00 GMT</pubDate>
      <author>Iain Barker</author>
    </item>
    <item>
      <description>&lt;p&gt;Nifty. One additional success metric to apply would be the levels of the hierarchy traversed to get to each item, i.e. average &amp;#8220;clicks&amp;#8221; to finish each scenario in each redesign. The success rate and &amp;#8220;click&amp;#8221; count would really complement each other. I&amp;#8217;ll have to give this a go in the future.&lt;/p&gt;</description>
      <link>http://www.boxesandarrows.com/view/measuring-the#content_7261</link>
      <guid>http://www.boxesandarrows.com/view/measuring-the#content_7261</guid>
      <pubDate>Fri, 04 May 2007 13:04:49 GMT</pubDate>
      <author>Chad Wingrave</author>
    </item>
    <item>
      <description>&lt;p&gt;at first glance this looks like a great way of improving your ia iteratively.  but do the results get better because you adapt to the test users or the other way around?&lt;/p&gt;</description>
      <link>http://www.boxesandarrows.com/view/measuring-the#content_7097</link>
      <guid>http://www.boxesandarrows.com/view/measuring-the#content_7097</guid>
      <pubDate>Fri, 27 Apr 2007 20:45:26 GMT</pubDate>
      <author>henrik persson</author>
    </item>
    <item>
      <description>&lt;p&gt;thank you so much for outlining this.  the final spreadsheet really helps show clients the value of the different iterations.   I will apply my own flavor to it and report the results!&lt;/p&gt;</description>
      <link>http://www.boxesandarrows.com/view/measuring-the#content_7076</link>
      <guid>http://www.boxesandarrows.com/view/measuring-the#content_7076</guid>
      <pubDate>Fri, 27 Apr 2007 16:15:49 GMT</pubDate>
      <author>Ryan Trembath</author>
    </item>
    <item>
      <description>&lt;p&gt;This is great&amp;#8230;I like that you also keep testing results visually represented with only three colors.  Any more than that would require too much thinking and analysis (think of the American homeland defense color scheme).  I hope practitioners try your method and report back in this comment string.&lt;/p&gt;</description>
      <link>http://www.boxesandarrows.com/view/measuring-the#content_7057</link>
      <guid>http://www.boxesandarrows.com/view/measuring-the#content_7057</guid>
      <pubDate>Fri, 27 Apr 2007 13:44:50 GMT</pubDate>
      <author>Michael Beavers</author>
    </item>
    <item>
      <description>&lt;p&gt;A good enhancement &amp;#38; neat visualisation of outcomes!&lt;/p&gt;</description>
      <link>http://www.boxesandarrows.com/view/measuring-the#content_7043</link>
      <guid>http://www.boxesandarrows.com/view/measuring-the#content_7043</guid>
      <pubDate>Fri, 04 May 2007 21:08:57 GMT</pubDate>
      <author>Donna Maurer</author>
    </item>
  </channel>
</rss>
