In-depth: Easy reader

An in-depth look at the Readability Report tool, by its creator Neil Newbold.

RECENT RESEARCH SUGGESTS that readability is dependent on two types of factors: reader factors, which consider certain abilities of the reader; and text factors, which consider the formulation of the text (Oakland and Lane, 2004). Reader factors include the person’s ability to read fluently, level of prior subject knowledge, familiarity with the language, and motivation and engagement. Text factors cover considerations of syntax, vocabulary and the structure of the text.

We have elaborated on Oakland and Lane's model by considering the relationship between text, reader, and author, and have built an extension for OpenOffice called Readability Report, which gives authors detailed help in producing readable documents.

The Readability Report software creates three reports: the Readability Report, the Brain Overload Report, and the Coherence Report.

The main report (called Readability Report) uses a new readability formula that determines the difficulty of a word using word familiarity instead of counting characters or syllables. As readers develop their language capability, they gain a familiarity with words through experience. Readers only analyse a word when it cannot be read from memory instantly. It is only with unfamiliar words that they pause to consider the meaning. This is particularly common in scientific or technical documents where anyone unfamiliar with the terminology would find the document hard to understand.

Current readability formulas struggle because a specialised document will contain certain words and phrases that are short, but unfamiliar to wider audiences. For example, consider the following two sentences:

1. “The muon is a particle similar to the electron with negative electric charge.”
2. “There was important information that led to the decision to close the business.”

Most people would consider the second sentence to be easier to understand than the first. However, the two sentences get the same score using the current readability formulas because they contain exactly the same number of words and of polysyllabic words (a polysyllabic word is one containing three or more syllables).

However, if we determine the difficulty of words by examining their frequency in everyday language, the polysyllabic words in the second sentence - such as ‘important’ and ‘information’ - are considered to be relatively easy. This is because these words are used regularly in everyday discourse, whereas the word ‘muon’, whilst short, is unknown to the majority of people. An expert in particle physics, by contrast, would be perfectly happy using the term, since familiarity with certain words depends on experience.

We measure word familiarity using the frequency of the word in general language. Currently, Readability Report provides the author with a numeric value for readability, based on word frequency, along with a rating to help them understand what the value means. We show the simplest and most difficult sentences in a document to demonstrate what is meant by “good” and “bad” readability. There is also an option to view the score for each sentence in the text within an OpenOffice spreadsheet. This can help identify the troublesome sections of a document.
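As a rough illustration of frequency-based scoring, the sketch below rates the two example sentences using the negative logarithm of each word's frequency. It is not the formula used by Readability Report: the frequency table, the floor for unseen words, and the function names are all invented for this example.

```python
import math
import re

# Hypothetical occurrences per million words, invented for this example.
FREQ_PER_MILLION = {
    "the": 50000, "to": 25000, "a": 20000, "is": 10000, "that": 10000,
    "was": 9000, "with": 7000, "there": 3000, "important": 400,
    "information": 400, "business": 400, "close": 200, "decision": 150,
    "similar": 120, "led": 100, "charge": 80, "negative": 60,
    "electric": 50, "particle": 10, "electron": 5,
}
UNSEEN_PER_MILLION = 0.01  # assumed floor for words missing from the table


def word_difficulty(word):
    """Rarer words score higher: negative log10 of relative frequency."""
    per_million = FREQ_PER_MILLION.get(word.lower(), UNSEEN_PER_MILLION)
    return -math.log10(per_million / 1_000_000)


def sentence_difficulty(sentence):
    """Mean word difficulty over the words of one sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    if not words:
        return 0.0
    return sum(word_difficulty(w) for w in words) / len(words)


def easiest_and_hardest(sentences):
    """Return the lowest-scoring and highest-scoring sentences."""
    ranked = sorted(sentences, key=sentence_difficulty)
    return ranked[0], ranked[-1]


physics = ("The muon is a particle similar to the electron "
           "with negative electric charge.")
business = ("There was important information that led to the "
            "decision to close the business.")

for sentence in (physics, business):
    print(f"{sentence_difficulty(sentence):.2f}  {sentence}")

easiest, hardest = easiest_and_hardest([physics, business])
print("Easiest:", easiest)
print("Hardest:", hardest)
```

With these made-up frequencies the business sentence scores as easier, even though both sentences are the same length, because ‘muon’ is missing from the table and falls back to the floor value.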

One common mistake made by authors is to convey a significant amount of information in a relatively small amount of text, believing this helps readers to understand more. The Brain Overload Report measures the density of information in the document.

This report provides a rating ranging from ‘general’ to ‘specialised’ to indicate how technical the document is. The report examines multi-word phrases such as ‘current account balance’, which can confuse readers through their ambiguity. Does the phrase refer to the balance of a current account, or an account balance which is current? The report shows the most frequent multi-word phrases in the text and, where possible, demonstrates how they can be rewritten.

Pustejovsky et al. (1994) showed how long phrases can be rewritten by examining the other phrases in the text. For example, if the phrase ‘current account’ is used often in your document, the report will recommend that you rewrite ‘current account balance’ as ‘balance of a current account’. Although the rewritten phrase uses more words, it increases the readability by removing ambiguity and reducing the density of information. Readability isn’t just about having short sentences; sometimes more of the right kinds of words are necessary to make your message clearer.
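The idea of choosing a rewrite from the other phrases in the text can be sketched as follows. This is a simplified illustration rather than the technique of Pustejovsky et al. or the code in Readability Report: it handles only three-word phrases, and the helper names and the crude choice between ‘a’ and ‘an’ are assumptions made for the example.

```python
import re


def count_phrase(phrase, text):
    """Count whole-word, case-insensitive occurrences of a phrase."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def indefinite_article(phrase):
    """Crude guess at 'a' or 'an', based on the first letter only."""
    return "an" if phrase[0].lower() in "aeiou" else "a"


def suggest_rewrite(phrase, text):
    """Suggest a less ambiguous form of a three-word phrase, or None."""
    first, second, third = phrase.split()
    whole = count_phrase(phrase, text)
    # Count each two-word sub-phrase where it appears on its own,
    # outside the full three-word phrase.
    left = count_phrase(f"{first} {second}", text) - whole
    right = count_phrase(f"{second} {third}", text) - whole
    if left > right:
        # (current account) balance -> balance of a current account
        head = f"{first} {second}"
        return f"{third} of {indefinite_article(head)} {head}"
    if right > left:
        # current (account balance) -> account balance which is current
        return f"{second} {third} which is {first}"
    return None  # no evidence either way, so make no suggestion


document = (
    "Open a current account today. A current account pays no interest. "
    "Check your current account balance online."
)
print(suggest_rewrite("current account balance", document))
```

Here ‘current account’ appears twice on its own and ‘account balance’ never does, so the sketch suggests ‘balance of a current account’.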

Problems with coherence occur when writers present new information to the reader without making clear its relationship to previous information. The writer assumes that they have provided enough information to allow readers to follow their arguments logically.

We can measure coherence because authors use the repetition of concepts and ideas to provide a structure for the reader to connect with. It is through this repetition that a series of links can be made between the sentences. Hoey’s method (1991) uses these sentence links to summarise a document automatically; we apply it to measure how easy the document is to follow. The Coherence Report rates a document for cohesion and shows the author the words that most frequently link their sentences together. If the author wants to increase their cohesion score, they can use these words more often in their text.

The author can also see the sentence which is most representative of their document, which should act as an appropriate summary of their work. If it does not, the author might need to consider how and why they’ve gone off topic in their text.
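A minimal sketch of link counting, loosely inspired by Hoey’s idea of repetition links, is shown below. It is an assumption-laden simplification and not the Coherence Report itself: a link is taken to be any content word shared by two sentences, there is no stemming (so ‘reader’ and ‘readers’ would not match), and the stopword list and function names are invented for the example.

```python
import re
from collections import Counter
from itertools import combinations

# Small illustrative stopword list; a real tool would need a fuller one.
STOPWORDS = {"the", "a", "an", "and", "on", "with", "was", "of", "to", "is"}


def content_words(sentence):
    """Lower-cased words of a sentence, minus stopwords, as a set."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return {w for w in words if w not in STOPWORDS}


def link_report(sentences):
    """Count shared-word links per sentence and per linking word."""
    words = [content_words(s) for s in sentences]
    link_counts = [0] * len(sentences)
    linking_words = Counter()
    for i, j in combinations(range(len(sentences)), 2):
        shared = words[i] & words[j]
        link_counts[i] += len(shared)
        link_counts[j] += len(shared)
        linking_words.update(shared)
    # The sentence with the most links is treated as most representative.
    best = max(range(len(sentences)), key=lambda k: link_counts[k])
    return link_counts, linking_words, sentences[best]


sentences = [
    "Readability depends on the reader and the text.",
    "A reader with subject knowledge finds a text easier.",
    "The weather was pleasant yesterday.",
    "Subject knowledge improves readability.",
]
link_counts, linking_words, representative = link_report(sentences)
print(link_counts)
print(linking_words.most_common(3))
print(representative)
```

The sentence about the weather shares no content words with the others and so receives no links, which is the kind of off-topic sentence a coherence measure is meant to expose.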

Plain English Alternatives
The software highlights difficult words and phrases in the text as identified by Plain English Campaign. These common writing mistakes can be replaced automatically with a suitable alternative. The author can select which replacement is the most suitable with a single mouse click. For instance, consider the following two sentences:

1. “According to our records, we must endeavour to maintain our objective.”

2. “Our records show, we must try to keep our goal.”

The first sentence is rated ‘Good’ by our word familiarity measure. However, the phrases ‘according to our records’, ‘endeavour’, ‘maintain’ and ‘objective’ can be replaced with ‘our records show’, ‘try’, ‘keep’ and ‘goal’ respectively. This results in the second sentence, which has an improved rating of ‘Easy’. By giving authors this kind of feedback, we can help them improve the readability of their writing.
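The replacement step in this example can be sketched with a small lookup table. The four entries below are the ones quoted above; the function names are hypothetical, and the sketch applies every replacement at once, whereas the software lets the author choose each one.

```python
import re

# The four alternatives mentioned in the example above.
ALTERNATIVES = {
    "according to our records": "our records show",
    "endeavour": "try",
    "maintain": "keep",
    "objective": "goal",
}


def find_suggestions(text, alternatives=ALTERNATIVES):
    """List (phrase, alternative) pairs for phrases found in the text."""
    found = []
    for phrase, plain in alternatives.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", text, re.IGNORECASE):
            found.append((phrase, plain))
    return found


def simplify(text, alternatives=ALTERNATIVES):
    """Apply every alternative, keeping a leading capital letter."""
    # Longest phrases first, so multi-word phrases win over single words.
    for phrase in sorted(alternatives, key=len, reverse=True):
        plain = alternatives[phrase]

        def swap(match, plain=plain):
            if match.group(0)[0].isupper():
                return plain[0].upper() + plain[1:]
            return plain

        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, swap, text, flags=re.IGNORECASE)
    return text


original = ("According to our records, we must endeavour "
            "to maintain our objective.")
print(find_suggestions(original))
print(simplify(original))
```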

What next?
In future, we hope to expand our readability methods into search engines and email applications by filtering out text that doesn't meet certain readability requirements. Texts outside a selected reading age can be removed from inboxes and the results of search queries. Readability can be tailored to each user by examining their history of previously read documents. As previously discussed, a difficult word for a novice is not always the same as a difficult word for an expert. By measuring a reader’s previous experience with each word, we can measure word difficulty for an individual. This means that technical documents will be scored as more readable to experts than to learners, and also that search engines will return different, more useful documents depending on the user’s expertise.
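One way such a per-reader measure might work is sketched below, purely as an illustration of the idea: word difficulty is estimated from how often the word appears in the documents a reader has already read. The smoothing scheme, the function names, and the sample reading histories are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def reading_history(documents):
    """Count every word in the documents a reader has already read."""
    counts = Counter()
    for document in documents:
        counts.update(re.findall(r"[a-z]+", document.lower()))
    return counts


def personal_difficulty(word, history):
    """Negative log10 of the word's smoothed share of the history."""
    total = sum(history.values())
    vocabulary = len(history)
    # Add-one smoothing, so unseen words get a small non-zero share.
    share = (history[word.lower()] + 1) / (total + vocabulary + 1)
    return -math.log10(share)


physicist = reading_history([
    "The muon decays into an electron.",
    "A muon beam hit the target.",
])
novice = reading_history([
    "The shop will close today.",
    "The business made a decision.",
])

print(f"physicist: {personal_difficulty('muon', physicist):.2f}")
print(f"novice:    {personal_difficulty('muon', novice):.2f}")
```

The same word scores as easier for the reader whose history already contains it, which is the behaviour described above for experts and learners.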

By using the same methods that help authors write what they want to write, we can prevent readers from reading what they don’t want to read.

References
T. Oakland and H. B. Lane. (2004). Language, reading, and readability formulas: Implications for developing and adapting tests. International Journal of Testing, 4(3):239-252.
J. Pustejovsky, S. Bergler, P. Anick. (1994). Lexical semantic techniques for corpus analysis. Computational Linguistics, 19(2):331-358.
M. Hoey. (1991). Patterns of Lexis in Text. Oxford, OUP.