Word 2004: Use the word count feature with caution

Posted by Pierre Igot in: Microsoft
November 24th, 2005 • 3:02 pm

Traditionally, when problems with Word’s word count feature are mentioned by a Mac journalist in his review of the product, they are dismissed by Mac BU developers as something that only a journalist would worry about. Here’s a quote from Rick Schaut:

The needs of most Word users aren’t the same as the needs of professional writers. A great example of this is the word count feature, over which reviewers like Adam Engst, who happen to be professional writers, have been knocking Word for quite some time. Most Word users don’t really care about word count. For the sake of those users, and not for the sake of getting rave reviews from professional writers, we didn’t spend a great deal of time trying to make word count faster. We spent time working on improvements that would be more relevant to our more common users. It’s a shame that Adam doesn’t understand this; hence his reaction to the snickers he heard when we mentioned that we’d sped up word count.

It’s yet another great example of the great disconnect between Mac BU developers and real Word users in the real world. What entitles Rick Schaut to say that “most Word users don’t really care about word count”? Did he do a comprehensive survey on this particular topic?

And if most Word users don’t care about word count, why is it that the “Live Word Count” feature in Word (the word count that appears in the status bar at the bottom of a Word document’s window and is updated “live” as you type) is on by default in Word, especially since Microsoft’s own MVPs openly acknowledge that this feature is “very power hungry” and contributes to Word’s very poor overall performance?

I am a professional translator. So I guess that makes me a “professional writer” and not one of Rick Schaut’s “more common users.” It’s funny, though: You’d think that a word processor, of all things, and especially an expensive one such as Microsoft Word, would precisely be geared towards professional writers, wouldn’t you?

Anyhow… My problem today is not with the speed of the word count feature, but with the fact that the numbers provided are not reliable. Or, more accurately, if you rely on Word’s word count feature, you need to be aware of a number of important facts.

First of all, the word count never includes text that appears inside text frames. Text frames are this horrible feature where Word pretends to be a page layout application and lets the user create boxes of text that are not part of the document’s normal text flow (although they can be “anchored” to a specific paragraph). In my experience, many people don’t know how to use Word’s paragraph border or table creation features, which can be used with regular paragraphs of text to create things that look like text boxes while remaining part of the normal text flow.

So they use this horrible text frame feature. (It’s horrible for many reasons. I personally avoid it like the plague.) They also use it when they want to create things like organizational charts in Word. It leads to all kinds of very unsightly results. But, more important, if you use Word’s word count feature, you need to always remember that the text inside these frames is never included in the document’s word count. Why? Only Microsoft knows.

The other thing to know is that Word’s word count feature is accessible through two different interfaces. You can either go to the “File” menu and select the “Properties…” command, which will bring up a window with multiple tabs, one of which is titled “Statistics” and contains various statistics about your Word document, including the word count:

[Screen shot: Word count in Properties]

Or you can go to the “Tools” menu and select the “Word Count…” command, which will bring up another dialog box, which only contains statistics about the current Word document:

[Screen shot: Word count in Tools]

But there is a crucial difference, as the screen shots above illustrate. The dialog box that the “Word Count…” command brings up contains a check box for the option to “Include footnotes and endnotes.” And for some unknown reason, the dialog box that the “Properties…” command brings up does not contain the same option! In other words, if you use the “Properties…” command to access the statistics about your document, there’s absolutely nothing that warns you that the word count does not include the text in the footnotes and endnotes!

Finally, there is the problem with what grammar calls contracted forms or contractions. As far as one can tell, Word’s word count feature uses a pretty dumb algorithm, which considers that a word is anything that is enclosed within two space characters. The problem is that this completely fails to account for the fact that contractions are actually made of two words. For example, in English “isn’t” is the contracted form of “is not.” It should therefore count as two words, not one. Similarly, in French, “l’amour” is the contracted form of “le amour” and should count as two words as well. In both cases, Word’s word count only counts one word.
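To make the difference concrete, here is a minimal Python sketch. It is only an illustration: `whitespace_count` mimics the behaviour I am guessing at (Word's real algorithm is not documented here), and `contraction_aware_count` is a hypothetical alternative that treats an apostrophe inside a token as a word boundary. Both function names are my own.

```python
import re

def whitespace_count(text: str) -> int:
    """Count tokens separated by whitespace (the assumed 'dumb' approach)."""
    return len(text.split())

def contraction_aware_count(text: str) -> int:
    """Count words, treating an apostrophe inside a token as a word boundary."""
    total = 0
    for token in text.split():
        # Split on straight and typographic apostrophes; ignore empty pieces.
        pieces = [p for p in re.split(r"['’]", token) if p]
        total += len(pieces)
    return total

for sample in ("isn’t", "l’amour"):
    print(sample, whitespace_count(sample), contraction_aware_count(sample))
```

Note that this naive rule would also count an English possessive such as “Word’s” as two words, so a real implementation would need language-specific rules on top of it.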

Worse still, Word’s word count algorithm completely ignores the fact that, in a language such as French, hyphens are very commonly used in interrogative forms, where they do not amount to the amalgamation of two words into a single one. For example:

Est-il arrivé?

is the interrogative form of

Il est arrivé.

Both should have the exact same word count, i.e. a word count of 3. But in Word, “Il est arrivé?” has a word count of 3 while “Est-il arrivé?” only has a word count of 2!

This is simply unacceptable. It means that a very common French phrase such as “n’est-ce pas” counts as 2 words when it should actually count as 4!
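For what it is worth, the counting rule I have in mind for French could be sketched along the following lines in Python. This is an assumption-laden illustration, not a complete solution: the `CLITICS` list and the function name `french_word_count` are hypothetical, and the rule only splits a hyphen when it is followed by one of those pronouns, so that genuine compounds such as “arc-en-ciel” still count as one word.

```python
import re

# Hypothetical, incomplete list of pronouns that follow a hyphen in inverted forms.
CLITICS = {"je", "tu", "il", "elle", "on", "nous", "vous", "ils", "elles", "ce"}

def french_word_count(text: str) -> int:
    """Count words, splitting contractions and hyphenated inverted forms."""
    total = 0
    for token in text.split():
        # Drop surrounding punctuation so "arrivé?" is treated as "arrivé".
        token = token.strip(".,;:!?«»\"()-–")
        if not token:
            continue
        # An apostrophe separates two words (n' + est-ce).
        for piece in re.split(r"['’]", token):
            if not piece:
                continue
            parts = piece.split("-")
            count = 1  # the first part is always a word
            # A hyphenated part adds a word only if it is a known pronoun.
            count += sum(1 for part in parts[1:] if part.lower() in CLITICS)
            total += count
    return total

for phrase in ("Il est arrivé.", "Est-il arrivé?", "n’est-ce pas"):
    print(phrase, len(phrase.split()), french_word_count(phrase))
```

The second number printed for each phrase is the plain whitespace count, and the third is the count produced by this sketch.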

As far as I can tell, these unacceptable flaws have not prevented people everywhere from embracing Word’s word count feature as a regular way to count words in electronic documents. For professional writers such as myself, this means a non-negligible loss of income, because the word counts on which my cost calculations are based are consistently lower than what they should really be. But of course there is no point in my trying to explain to each and every one of my customers, again and again, that Word’s word count feature is wrong.

In addition, I simply cannot afford to go manually through each and every document for which I need a word count in order to count all the extra words that Word has failed to count. It’s already bad enough that I have to go through documents manually to check and make sure that they don’t contain any text frames that Word has failed to count, and to make sure that the word count includes the footnotes and endnotes!

So I have no choice but to use Word’s word count feature, with all its flaws. This is a typical example of how Microsoft’s dominance has led to the adoption of flawed features and procedures designed by software engineers disconnected from the real world as the “standard” way of doing things.

(To be fair, Apple’s Pages doesn’t fare any better when it comes to contracted and hyphenated forms. But that’s no excuse.)

2 Responses to “Word 2004: Use the word count feature with caution”

  1. Warren Beck says:

    Pierre: The incorrect word count in French is interesting. I wonder if the MacBU tested the word count in French before shipping it.

    But here’s a question about why people need a word-count feature. My take is that the word count is a poor-man’s text length metric; a publisher would specify a certain number of words to require a writer to provide an article of known length in the layout. So, I think that the word count should return a count that is proportional to the number of characters given an assumption of an average number of characters per word. Clearly, the algorithm that Word uses, given your results above, is irrational either using a linguistic metric or my characters/word metric. But that should be expected from Word; you’ve reviewed a large number of irrational behaviors in Word in your writing over the last few years.

    Again, the “Kool-Aid” that they have at Microsoft and at the MacBU must be an interesting drink.

  2. Pierre Igot says:

    Support for languages other than English is almost always an afterthought in most modern software. That’s just the way it is. But the problem here is that the word count algorithm is not reliable even for English text. If Microsoft can’t even be bothered to come up with a proper word count scheme for English text, there’s just no hope for other languages.

    I agree that word count is an imperfect metric, even when it’s done right. But I guess it’s a close enough approximation that most people are content with it.

    Still, it’s pretty sad that, with so much computing power in our hands, we still have to make do with such primitive, antiquated algorithms. I suppose it’s fundamentally a supply and demand problem. If there is not enough of a market for a more advanced/scientific text length metric, then it’s just not going to happen.

    And even if it ever happens in a niche product especially geared towards professional writers (which MS Word clearly is not), it will be next to useless if it cannot be used in transactions with non-professionals. I just don’t see myself telling my clients: “Yeah, MS Word says the word count is 2,000 words, but my highly specialized writing software says it’s actually 2,500 words.”

    Today’s situation is yet another thing that we have to thank Bill Gates for. Mediocrity all around. Yay.
