My search results return documents of several types, mostly PDFs and HTML pages (maybe some Word docs). I'd like to let users sort the results by document length, but I can't figure out a common measure that fairly represents the length of what they are about to read. Byte counts are inflated by graphics and even by style elements. I'm not even sure we can determine the number of pages in a PDF without crawling the actual PDF itself. Any ideas?
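To illustrate the byte-count problem for the HTML case, here is a rough sketch using Python's standard `html.parser` module. It compares the raw byte size of a page with the number of words a reader would actually see. The class name `VisibleTextExtractor`, the function `visible_word_count`, and the sample page are made up for this illustration, and it says nothing about PDFs or Word docs.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text that appears outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text the reader would see on the page.
        if self._skip_depth == 0:
            self.chunks.append(data)


def visible_word_count(html):
    """Return the number of whitespace-separated words in the visible text."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks).split())


# Hypothetical sample page: the markup and styling dominate the byte count.
sample = (
    "<html><head><style>body { font: 12px serif; }</style></head>"
    "<body><p>Short page with five words.</p></body></html>"
)

byte_count = len(sample.encode("utf-8"))
word_count = visible_word_count(sample)
print(f"bytes: {byte_count}, visible words: {word_count}")
```

Even on this tiny page most of the bytes are tags and CSS rather than readable text, which is the distortion I'm worried about when sorting by size.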



