Humanising Language Teaching
IDEAS FROM THE CORPORA

Corpus-based Evaluation of Test Items’ Linguistic Authenticity

Ilana Salem, Israel

Ilana Salem is an EFL teacher and teacher educator in Israel. She holds an MA in Applied Linguistics and TESOL from University of Leicester. Her research interests include classroom testing, EFL teachers’ language awareness, lexico-grammatical analysis of school English, and translation for pedagogic purposes.

Menu

Introduction - authenticity of test items
Corpus work
The human element
Conclusion
Acknowledgments
References

Introduction - authenticity of test items

At various stages of EFL instruction teachers wish to assess the learners’ ability to produce grammatically correct sentences. In mainstream schools this is often done by means of written classroom tests. For instance, teachers who wish to assess their students’ mastery of wh-question formation might use the ‘Ask questions about the underlined words’ format, such as 1ab.

cue: (1a) I will stay for an hour.
target question: (1b) How long will you stay?

The teacher writes a sentence and underlines one or more words (this is the ‘cue’). The student responds by writing a wh-question that can be answered by the underlined word/s (this is the ‘target question’). Obviously, this activity does not reflect normal speech-act question-asking situations (Cohen 1996) but, provided the teacher makes sure that the underlined words do indeed answer the target question, this task-type is an effective way of eliciting wh-questions in classroom tests.

This paper focuses on the linguistic authenticity of ‘Ask questions about the underlined words’ test items. The key feature of linguistic authenticity as used in this article is the item’s frequency in real-life language (Salem 2009). In other words, when judging the linguistic authenticity of the cue and the target form, one asks: How frequent is this sentence in the English language used outside the EFL classroom? It should be noted that the issue here is not whether learners should become competent in pragmatic conventions that are commonly used in socio-cultural contexts (Hwang 2008), or whether grammatically complete sentences or questions are needed for efficient communication (Kryszewska 2003, Mumford 2007). Rather, I will investigate instances of ‘school English’ (Mindt 1996, p. 293) in terms of their frequency in non-pedagogic English. De-contextualized sentences can serve as effective cues (see also Fulcher and Davidson 2007; unit A5), but the teacher should monitor their linguistic authenticity and make sure that the language tokens presented and elicited in tests (and materials) represent communicatively meaningful language.

Teachers can evaluate their test-items’ linguistic authenticity intuitively or by means of corpus frequency data. In fact, corpus data can be used to validate our own intuitions (Stewart 2009), as is done in this paper. I have collected 100 ‘Ask questions about the underlined words’ test items geared to eliciting who, what, how, when, where and why questions. To acquaint myself with this sample, I judged each cue and target question on a linguistic authenticity scale of 0-5 points. A sentence which was considered unusable in real-life English was assigned a 0. A communicatively useful and well-formed sentence received 5 points. According to this scoring the majority of what-questions were found highly authentic (4-5 points), while most how-questions appeared to be low in linguistic authenticity (1-2 points). Hence I will use examples from these two sets of questions to demonstrate how corpus data can be instrumental in assessing the linguistic authenticity of ‘Ask questions about the underlined words’ test items.

Corpus work

For convenience, I chose to work with the Corpus of Contemporary American English - COCA (Davies 2009), a user-friendly free on-line corpus. Ideally, I would have liked to type the test-item sentence in the search-string box and in return be informed of this sentence’s frequency in the corpus. It turned out, however, that the cue or its target form seldom appears in the corpus as a whole. Therefore I searched for parts of sentences, and these were of two kinds: lexical strings, which are labeled as ‘lextrings’, and semantically related collocates – the ‘sellocates’.

Two kinds of collocations

Lextrings are word strings which appear in the cue exactly in the same form as in the corpus. Examples of lextrings are: ‘in the garden’ in test item 2a, or ‘running quickly’ in test item 5a.

Sellocates consist of content words which collocate on semantic grounds. These words may appear in different morphological forms and may be separated by other words. For instance ‘flower’ and ‘grow’ are semantically related, they collocate in various forms (flower/s and grow/grew/grown), and they may immediately follow each other or may be separated by other words. This kind of word partnership is formulated as [flower] +/-4 [grow] = 395, which reads: in the corpus, ‘flower’ and ‘grow’ in any of their forms appear 395 times within four positions of each other, i.e. with up to three words between them. The tables below list lextrings and sellocates separately, along with their number of occurrences in COCA.
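The window-based counting behind a sellocate such as [flower] +/-4 [grow] can be sketched in Python. The word list and the resulting count below are a toy illustration only; the figures reported in this article come from COCA, which performs this kind of search with lemmatized forms.

```python
# Illustrative sketch of sellocate counting: how often any form of one
# word appears within `window` positions (either side) of any form of
# another. Toy data; COCA does this over lemmas at full corpus scale.

def count_sellocates(tokens, forms_a, forms_b, window=4):
    """Count co-occurrences of forms_a with forms_b at most `window`
    positions apart, on either side."""
    count = 0
    for i, tok in enumerate(tokens):
        if tok.lower() in forms_a:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j].lower() in forms_b:
                    count += 1
    return count

text = "Many flowers grew in the garden and the flower kept growing".split()
flower_forms = {"flower", "flowers"}
grow_forms = {"grow", "grows", "grew", "grown", "growing"}
print(count_sellocates(text, flower_forms, grow_forms))  # 2
```

Here ‘flowers … grew’ (adjacent) and ‘flower … growing’ (one intervening word) both fall inside the four-word window, so the toy count is 2.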

Counting collocation occurrences

The linguistic authenticity strength of each test item can be figured out by adding up all its collocation occurrences. The table below shows the collocation-frequency breakdown of item 2ab: the cue ‘Many flowers grew in the garden’, and the target question ‘What grew in the garden?’.

Cue (2a): Many flowers grew in the garden. 4*

  lextrings:   flowers grew = 21
               flowers grew in = 4
               grew in = 713
               grew in the garden = 4
               in the garden = 2265
               subtotal: 3007
  sellocates:  [flower] +/-4 [grow] = 395
               [flower] +/-4 [garden] = 591
               [grow] +/-4 [garden] = 433
               subtotal: 1419
  cue total: 4426

Target question (2b): What grew in the garden? 5*

  lextrings:   grew in = 713
               grew in the garden = 4
               in the garden = 2265
               subtotal: 2982
  sellocates:  [grow] +/-4 [garden] = 433
               subtotal: 433
  target-question total: 3415

Item total (2ab): 7841

*The numerals following the cue and the target question represent my initial linguistic-authenticity scoring of these sentences.

The table informs us that the sum total of corpus-based collocation occurrences in 2ab is 7841. This high number of occurrences confirms my initial judgment of this item’s linguistic authenticity and grants the test item 2ab a high authenticity status. Its significance, however, is relative, as we shall see when we compare it with 3ab, which is a shorter sentence.
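The arithmetic behind the table for item 2ab can be laid out as a short Python sketch. The frequency figures are the COCA counts reported above; only the bookkeeping is new.

```python
# Totaling collocation occurrences for test item 2ab, using the COCA
# counts reported in the table above.

item_2ab = {
    "cue_lextrings": {"flowers grew": 21, "flowers grew in": 4,
                      "grew in": 713, "grew in the garden": 4,
                      "in the garden": 2265},
    "cue_sellocates": {"[flower] +/-4 [grow]": 395,
                       "[flower] +/-4 [garden]": 591,
                       "[grow] +/-4 [garden]": 433},
    "target_lextrings": {"grew in": 713, "grew in the garden": 4,
                         "in the garden": 2265},
    "target_sellocates": {"[grow] +/-4 [garden]": 433},
}

cue_total = (sum(item_2ab["cue_lextrings"].values())
             + sum(item_2ab["cue_sellocates"].values()))
target_total = (sum(item_2ab["target_lextrings"].values())
                + sum(item_2ab["target_sellocates"].values()))
print(cue_total, target_total, cue_total + target_total)  # 4426 3415 7841
```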

Note: Some highly frequent combinations are excluded from the count: their frequency in the language is so high that counting them would not be indicative. These are: determiner - noun combinations such as ‘the garden’, ‘many flowers’; preposition - article such as ‘in the’; pronoun - verb such as ‘we need’ in (3a); auxiliary - pronoun - verb such as ‘do you need’ in (3b); and question-word - auxiliary combinations such as ‘How is’.

Sentence length and collocation length

Unlike 2ab, which accumulated 7841 occurrences, 3ab’s collocation occurrences amount only to 3110. Does this mean that 3ab lacks linguistic authenticity? Not at all.

Cue (3a): We need money. 5

  lextrings:   we need money = 43
               need money = 355
               subtotal: 398
  sellocates:  [need] +4 money = 2421
               subtotal: 2421
  cue total: 2819

Target question (3b): What do you need? 5

  lextrings:   what do you need = 291
               subtotal: 291
  sellocates:  (none)
  target-question total: 291

Item total (3ab): 3110

The cue and the target question 3ab are short sentences, consisting of one countable constituent each: ‘need money’ in the cue 3a and ‘What do you need’ in the target question 3b. The lextring ‘need money’, which constitutes the major part of 3a, appears in the corpus as many as 355 times. And the lextring ‘What do you need’, which makes up the whole of 3b, appears in the corpus 291 times in exactly the same form as in the test item. These highly usable word-combinations grant 3ab a high linguistic-authenticity status.

The comparison of 2ab and 3ab shows that the total frequency count is not an absolute indicator of linguistic authenticity. Rather, the following factors should be taken into account:

  • The length of the sentence - How many countable constituents (lextrings and sellocates) does it contain? Obviously, a larger number of constituents potentially lends itself to higher total collocation counts.
  • The relative length of the corpus-recurring lextrings - Does the lextring form a major part of the test-item sentence, or perhaps all of it? Lextrings which frequently reoccur in the corpus might grant the test-sentence a high linguistic authenticity status in spite of the relatively low total collocation count caused by the sentence’s limited number of countable constituents.
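To make the first factor concrete, one crude, hypothetical normalization (not proposed in the article, which leaves operational definitions to further research) is to average the corpus occurrences over an item's countable constituents, so that a short item like 3ab is not penalized for containing fewer lextrings and sellocates:

```python
# Hypothetical length normalization: mean corpus occurrences per
# countable constituent (lextring or sellocate). The counts are taken
# from the tables above: 12 constituents in item 2ab, 4 in item 3ab.

def per_constituent(counts):
    """Mean corpus frequency over an item's constituent counts."""
    return sum(counts) / len(counts)

item_2ab = [21, 4, 713, 4, 2265, 395, 591, 433, 713, 4, 2265, 433]
item_3ab = [43, 355, 2421, 291]

print(round(per_constituent(item_2ab), 1))  # 653.4
print(per_constituent(item_3ab))            # 777.5
```

On this rough measure the short item 3ab actually outscores 2ab, echoing the point that raw totals favour longer sentences.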

Further corpus research might arrive at operational definitions of these factors, and provide empirically grounded knowledge about the comparative significance of lextrings as opposed to sellocates. Establishing the relative strength of impact each of these two kinds of collocations has on linguistic authenticity could also have significant implications for language pedagogy and materials design.

The human element

I have suggested ways in which the corpus can serve teachers in evaluating the linguistic authenticity of their test items. But this process is not always as simple as presented above; in the course of corpus work we often find ourselves in situations where linguistic decisions need to be made (O’Dell 2005), since ‘it is up to the researcher to make sense of the patterns of language which are found within a corpus’ (Baker 2006, p. 18). Let me demonstrate this on items 4ab and 5ab.

Interpretation of frequency data

Frequency data must be interpreted with caution. Look at test item 4ab.

Cue (4a): Mother works hard. 5

  lextrings:   mother works hard = 2
               mother works = 65
               works hard = 326
               subtotal: 393
  sellocates:  [work] -1/+2 hard = 12689
               [mother] +2 [work] = 615
               subtotal: 13304
  cue total: 13697

Target question (4b): How does mother work? 1

  lextrings:   mother work = 11
               subtotal: 11
  sellocates:  [mother] +2 [work] = 615
               subtotal: 615
  target-question total: 626

Item total (4ab): 14323

The high total frequency count of 14323 implies high linguistic authenticity of this test item. A closer look shows that this is only partly true. It is the cue 4a ‘Mother works hard’ which is highly linguistically authentic, as its total frequency score testifies. The elicited question ‘How does mother work’, on the other hand, does not sound too useful.

Let us now focus on 4b ‘How does mother work’. As said above (and expressed by my initial score of 1), ‘How does mother work’ sounds rather odd. But there seems to be a contradiction between 4b’s low authenticity impression, and its relatively high total score of 626. This contradiction invites further investigation: In fact, the main contributor to 4b’s total score is the re-occurring sellocate [mother] +2 [work]. Additional corpus searches reveal that the corpus has no how-questions including the sellocate [mother] +2 [work]. The fact that there is no how-question composed of the word ‘mother’ followed (immediately or with one other word in-between) by any form of ‘work’ renders 4b very unauthentic, and this evaluation is in agreement with the original intuition-based judgment of this sentence.
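The follow-up check described above - looking for how-questions that contain the sellocate [mother] +2 [work] - can be sketched as a filter over concordance lines. The function, its name, and the two sample lines are illustrative only; in the study this was done with additional COCA searches, which returned none.

```python
# Sketch: among lines containing 'mother' followed (with at most one
# intervening word) by a form of 'work', keep only how-questions.
# Toy data; the article's check was run as extra searches in COCA.
import re

def how_questions_with(lines, first, second, max_gap=1):
    """Return lines that are how-questions containing first ... second
    with at most max_gap words between them."""
    pattern = re.compile(
        rf"\bhow\b.*\b{re.escape(first)}\b"
        rf"(?:\s+\w+){{0,{max_gap}}}\s+{re.escape(second)}",
        re.IGNORECASE)
    return [line for line in lines if pattern.search(line)]

sample = [
    "My mother works hard every day.",
    "Does your mother work on weekends?",
]
print(how_questions_with(sample, "mother", "work"))  # [] - no how-questions
```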

The above interpretation of the numerical data has shown that the presence of an otherwise frequent sellocate does not guarantee high linguistic authenticity of the wh-question in which it appears. So the total count of 4b is, to some extent, misleading. Likewise, the impressively high total count of 4ab conceals the fact that this test item is on the whole unauthentic because the cue 4a fails to elicit an authentic wh-question. These findings have become evident through the corpus user’s linguistically based intervention.

Consulting KWIC for meaning

At times we need to consult the KWIC – ‘key word in context’ feature in order to verify a function or meaning of a particular word in each of its corpus occurrences. A case in point is sentence 5a ‘The cat is running quickly’. It contains one (rather uncommon) lextring ‘running quickly’, which occurs seven times in the corpus. But when we consult the KWIC list, which presents the seven instances of ‘running quickly’ in their original co-texts, we discover that most of these seven occurrences are irrelevant to cue 5a, as only in one of the corpus-texts does ‘run’ bear the core meaning of ‘move fast on foot’ (Oxford 2005).

Cue (5a): The cat is running quickly. 2

  lextrings:   running quickly = 7
               subtotal: 7
  sellocates:  [run] -1/+2 quickly = 182
               [cat] +2 [run] = 50
               subtotal: 232
  cue total: 239

Target question (5b): How is the cat running? 0

  lextrings:   (none)
  sellocates:  [cat] +1 [run] = 26
               subtotal: 26
  target-question total: 26

Item total (5ab): 265

If we add this finding to the fact that the elicited question ‘How is the cat running’ contains no lextrings at all, we can conclude that the lack of reoccurring lextrings renders 5ab very unauthentic. We arrived at this conclusion by combining the corpus-data findings with our own language-awareness-based reasoning.
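The kind of KWIC display consulted above can be sketched as a small Python routine that centres each occurrence of a search phrase in its co-text. The two sample lines are invented for illustration; COCA provides this display built in.

```python
# Sketch of a KWIC (key word in context) display: each line containing
# the phrase is printed with the phrase centred. Toy concordance lines;
# COCA offers this view natively.

def kwic(lines, phrase, width=30):
    """Return KWIC rows for every line containing `phrase`."""
    rows = []
    for line in lines:
        i = line.lower().find(phrase.lower())
        if i >= 0:
            left = line[max(0, i - width):i].rjust(width)
            right = line[i + len(phrase):i + len(phrase) + width].ljust(width)
            rows.append(f"{left} | {phrase} | {right}")
    return rows

sample = [
    "kept the engine running quickly and quietly",   # 'run' = operate: irrelevant
    "the dog was running quickly across the lawn",   # 'run' = move fast on foot
]
for row in kwic(sample, "running quickly"):
    print(row)
```

Scanning the co-text on either side of the node phrase is what lets the researcher decide, occurrence by occurrence, whether ‘run’ carries the relevant sense.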

Other linguistic considerations

When researching sellocates, it is up to the corpus user to determine the distance between the two collocation partners under inspection – i.e. how many intervening words should be allowed to separate the collocates. Thus, the decision on a four-word double-sided distance in [flower] +/-4 [grow] as opposed to a two-word post-distance in [mother] +2 [work] is based on the corpus-user’s syntax-related choices.

Furthermore, 4ab and 5ab share a problem which, to the best of my knowledge, cannot at present be identified through corpus work. Not only do the cues trigger unauthentic questions, but the target questions would not normally be answered by the underlined words that triggered them. The question-answer pairs ‘How does mother work?’ – ‘Hard’, and ‘How is the cat running?’ – ‘Quickly’ are surely not common conversational exchanges. Inclusion of such unauthentic items in EFL classroom tests could be prevented if teachers asked themselves in the process of their test construction: Is the target question indeed answerable by the underlined word/s?

This reinforces the claim that enhanced linguistic awareness is a prerequisite for ensuring the linguistic quality of teacher-produced materials and tests (Andrews 2007; section 5.5). My experience has shown that language awareness is also a prerequisite for efficient corpus work. And, as noted by Allan (1999), the opposite holds as well: one’s language awareness gets enhanced by corpus work.

Conclusion

This investigation has documented the interplay between EFL teacher-researchers’ linguistic intuitions/language awareness and their corpus-search skills as vehicles for judgment of pedagogic language. I have shown that corpus frequency data are potentially informative of the major aspects of test items’ linguistic authenticity. As corpus research advances, corpora can be expected to offer new practical options for teachers to verify the linguistic authenticity of their teaching materials and tests conveniently, swiftly and efficiently.

Acknowledgments

I wish to thank Dr. Beverly Lewin for her most helpful comments on earlier drafts of this paper. Many thanks to Prof. Mark Davies for making COCA widely and conveniently accessible. I am grateful to Prof. Susan Conrad for her feedback and kind advice.

References

Allan, Q.G. (1999) Enhancing the language awareness of Hong Kong teachers through corpus data: The TeleNex experience, Journal of Technology and Teacher Education 7/1; pp. 57-74.

Andrews, S. (2007) Teacher language awareness. Cambridge: CUP.

Baker, P. (2006) Using corpora in discourse analysis. London/New York: Continuum.

Cohen, A.D. (1996) Speech acts, in McKay, S.L. and N.H.Hornberger (eds.) Sociolinguistics and language teaching. Cambridge: CUP; pp. 383-420.

Davies, M. (2009) Corpus of Contemporary American English. Brigham Young University www.americancorpus.org

Fulcher, G. and F. Davidson (2007) Language testing and assessment. London and New York: Routledge.

Hwang, C.C. (2008) Pragmatic conventions and intercultural competence, The Linguistics Journal 3/2; www.linguistics-journal.com/August_2008_ch.php

Kryszewska, H. (2003) Why I won’t say good bye to the Lexical Approach, Humanising Language Teaching 5/2; Major Articles.

Mindt, D. (1996) English corpus linguistics and the foreign-language teaching syllabus, in Sampson, G. and D. McCarthy (eds.) (2004) Corpus linguistics. London/New York: Continuum, pp. 293-303.

Mumford, S. (2007) Exercises from companion to Cambridge grammar of English, Humanising Language Teaching 9/4; Ideas from the corpora.

O’Dell, F. (2005) Using corpora to develop materials on collocation, Humanising Language Teaching 7/5; Ideas from the corpora.

Oxford advanced learner’s dictionary (2005). Oxford: OUP.

Salem, I. (2009) How real is the English in grammar tests?, Humanising Language Teaching 11/3; Short articles.

Stewart, D. (2009) ‘Safeguarding the lexicogrammatical environment’, in Beeby, A., et al. (eds.) Corpus use and translating. Amsterdam/Philadelphia: John Benjamins Publishing Company; pp. 29-46.

    © HLT Magazine and Pilgrims