Word Frequency Analysis as a means to improve writing quality
Back in the days of Windows 1.0, in the 1980s, there was a
tool called Word Frequency that came with the MS Word
distribution package. As someone who uses English as a second
language, I used it heavily, because it helped me to improve my
vocabulary and to correct misspellings beyond the capacity of
the available spelling checkers.
That MS Word add-on created a list of all the words in a
document, ordered by frequency. It made it easy to detect
overuse of a certain word or expression. The little-used words
were also of help, because sometimes I wrote Thomson instead of
Thompson, or car instead of cart, and the spelling checker does
not detect such errors.
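The counting that the add-on performed can be sketched in a few lines of modern Python. This is a hypothetical reconstruction, not the original tool's code; the tokenization rule is an assumption:

```python
from collections import Counter
import re

def word_frequencies(text):
    """Return (word, count) pairs for a text, most frequent first."""
    # Lowercase and keep only letter/apostrophe runs as "words" (assumption).
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common()

sample = "the cat sat on the mat and the cat slept"
print(word_frequencies(sample))
# The top entry is ('the', 3); rarely used words sink to the bottom,
# where one-off typos like "car" for "cart" stand out.
```

Scanning the tail of such a list, where every word appears only once, is exactly how those Thomson/Thompson slips become visible.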
Frequency analysis can also be used as a means to establish the
"signature" of a certain author, the cultural level of the
writer, their use of slang or technical jargon, and other writing
features. It is possible to extrapolate from the number of
distinct words used in a certain text to the total vocabulary of
a person. Frequency analysis can reveal that a writer has the
vocabulary of a 10-year-old, or the word-richness of a
Chinese-born second-year English student.
Frequency analysis combined with a synonym dictionary, as
provided in currently available "synonymizer" software, can help
writers to enrich their lexicon and avoid overuse of certain
expressions.
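As an illustration, a toy synonymizer could combine a frequency count with a small hand-made synonym table. The table and the repetition threshold below are invented for the example; real software would draw on a full thesaurus:

```python
from collections import Counter
import re

# Tiny hand-made synonym table (an assumption for this sketch).
SYNONYMS = {"big": ["large", "sizable"], "use": ["employ", "apply"]}

def suggest_synonyms(text, threshold=2):
    """Suggest alternatives for words repeated at least `threshold` times."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return {w: SYNONYMS[w]
            for w, n in counts.items()
            if n >= threshold and w in SYNONYMS}

print(suggest_synonyms("a big dog and a big cat use a big bed"))
# {'big': ['large', 'sizable']}  -- "use" appears once, so it is not flagged
```

The writer still chooses which suggestion fits; the frequency count only points out where variety is lacking.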
It is also a means of avoiding identical text for those
who need to make their text different from a source: for
instance, a web content writer who needs to fill many similar
but not identical pages, or students who want to avoid
plagiarism detection and accusation, rightly or wrongly.
Plagiarism detection also makes use of frequency analysis,
because comparing a given text with the whole contents of the
Web is a major task, and the detection system does not know
where to look or where to start. Analysing word frequency can
give some clue about the writing style and authorship of a
given text, without indexing the whole thing.
Search engines use word frequency to establish the subject of
web pages. They developed complex linguistic analysis in order
to classify pages by subject without human intervention. In
turn, webmasters do the same, to try to fool search engines into
assigning high keyword relevance to the pages they create. For
instance, using a word with a 3% frequency gives a text good
relevance on that word (or keyword, in a search engine context).
A 10% frequency is still OK, but it is close to "keyword
stuffing", a technique used by webmasters who try to force their
websites into the top places of the search engine results.
Keyword stuffing is penalized by the search engines, and needs
to be prevented by smart use of synonyms, either with
synonymizer software or with good writing skills.
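The keyword-density figures above are simple to compute. A minimal sketch, where the 3% and 10% thresholds come from the text and the word-splitting rule is an assumption:

```python
from collections import Counter
import re

def keyword_density(text, keyword):
    """Percentage of the text's words that match the keyword."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return 100.0 * words.count(keyword.lower()) / len(words)

text = "cheap flights and cheap hotels for your trip"
print(keyword_density(text, "cheap"))  # 25.0 -- 2 of 8 words
# Rough bands from the article: ~3% reads naturally and signals
# relevance; near 10% risks a keyword-stuffing penalty.
```

A webmaster (or a careful writer) can run this per keyword and rephrase with synonyms whenever a word drifts toward the stuffing zone.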
This article, for instance, has the following word frequencies:
word: 9, frequency: 7, used: 6, not: 6, search: 6, text: 6,
engines: 6, analysis: 5, can: 5, use: 5 ...
I could have edited the text after the analysis, to avoid
intensive use of "word" and "frequency" for linguistic purposes.
However, it is OK for Search Engine Optimization purposes
(attempting to make this article more findable by Google and
Yahoo).
Are there any serious writers who still avoid the use of a
wired computer? Probably not many can avoid using the Web and
the search engines to find the correct word or the most used
expression, or to perform spelling or grammar checking. Checking
word usage in Google is faster and more efficient than using a
dictionary, whether on paper, on disc, or on the Web. The search
engines list every word ever written, not only the correct words
that dictionaries do.
Be prepared to have your texts analysed for word frequency,
educational level, plagiarism, technicality, jargon usage and
other parameters, in addition to old-fashioned spelling.
Given these tendencies, the ultimate challenge for a job
candidate would be to write an essay with paper and pen. Most of
us are not prepared to pass such a test.