 
Humanising Language Teaching
SHORT ARTICLES

Testing Speaking in Large Classes

Barry O'Sullivan, UK

Barry O'Sullivan is Professor of Applied Linguistics at Roehampton University, London, where he is also the Director of the Centre for Language Assessment Research (CLARe). Before getting involved in teacher education and later in language testing, he taught English at secondary schools in Peru and in Japan. E-mail: b.osullivan@roehampton.ac.uk

Menu

Introduction
Some Additional Concerns
The Basic Design of the Test
The Tasks
Scoring the Performances
Did it Work?

Introduction

Over ten years ago, when I was teaching at the Faculty of Education, Okayama University, in Japan, some of my former students asked me to help them find a solution to a problem. All were newly or recently qualified teachers of English in the high school system, and all had been asked by their schools to take on the teaching of language for communication, as required by the recently changed national curriculum. The more experienced teachers had evidently decided to continue with their grammar-translation approach, while the new staff were expected to deal with the teaching - and, as it turned out, the assessment - of communicative English.

Unfortunately for my ex-students, they were lumbered with classes that often passed the 40 mark and with a rigid system which called for regular weekly assessments. Typically, the test was to be administered in a 50-minute class period every Friday. The question was, HOW? (Let's ignore the more obvious WHY? question here!)

Some Additional Concerns

Before starting out on the path to finding a solution to the problem, we had to face up to quite a few additional issues, such as:

The pupils were not accustomed to using English for communication
They had never performed group activities in their language classes
They were unused to the concept of learner autonomy (or responsibility)
They had never been asked to think about their own or their classmates' language

The Basic Design of the Test

Since the purpose of the changes to the curriculum was to shift the focus of learning from language knowledge to language use, the basic premise was that the test should reflect this. In other words, the test should be a direct test of speaking ability, in which pupils are required to speak aloud in English. It should also explicitly focus on the types of communication outlined in the curriculum.

The basic idea for the test was very simple. In order to allow all pupils in a large class to be tested in a single period, the only possibility was to use a group design. It quickly became apparent that using small groups and having pupils rotate their roles every 10 minutes would mean (at least in theory) that we could get through a whole class in a little over 30 minutes - though for practical reasons this became more like 40 minutes.
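As a quick sanity check on that arithmetic, here is a minimal sketch in Python (the per-phase changeover overhead is my own assumed figure, not one from the original project). Because all groups run in parallel, the total time depends only on the number of phases, not on the class size:

# Total class time for one administration of the group speaking test.
# The changeover overhead per phase is an assumption for illustration.
def class_test_time(phase_minutes=10, phases=3, changeover_minutes=1):
    # All groups work simultaneously, so class size does not matter:
    # total time = phases * (phase time + changeover).
    return phases * (phase_minutes + changeover_minutes)

print(class_test_time())                                 # 33 minutes: the 'little over 30' case
print(class_test_time(phases=4, changeover_minutes=0))   # 40 minutes: with a fourth rotation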

The phases of the test were as follows:

Phase 1
Each pupil assumes a role. The Examiner decides on the task versions to use, and is responsible for asking all questions. The Candidate listens to the questions and responds appropriately. The Manager is in charge of timing - reminding the Examiner to wind up tasks and to end the test event within the allocated time. The Manager is also responsible for ensuring that the score sheets are completed.
Phase 2
The pupils swap roles in a predetermined order (e.g. Pupil A will now be the Candidate, having acted as the Examiner in Phase 1). In all other ways this phase is the same as Phase 1 (e.g. the Examiner will again get to choose the task versions).
Phase 3
The pupils again swap roles, so that by this time all three will have participated in the event in all three roles. The additional role of the Manager in the final Phase is to gather all the score sheets, staple them together and present them to the teacher.
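Since the rotation order is predetermined, it can be written down mechanically. The sketch below (the names and function are my own, assuming a simple cyclic shift) shows how each pupil passes through all three roles over the three phases:

# A cyclic rotation of the three roles across the three phases.
ROLES = ["Examiner", "Candidate", "Manager"]

def roles_for_phase(group, phase):
    # Shift each pupil one role forward per phase, so that after
    # three phases every pupil has held every role exactly once.
    offset = phase - 1
    return {pupil: ROLES[(i + offset) % 3] for i, pupil in enumerate(group)}

group = ["Pupil A", "Pupil B", "Pupil C"]
for phase in (1, 2, 3):
    print(phase, roles_for_phase(group, phase))
# Phase 1: A=Examiner, B=Candidate, C=Manager
# Phase 2: A=Candidate (having been Examiner), B=Manager, C=Examiner
# Phase 3: A=Manager, B=Examiner, C=Candidate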

The main reason things could take longer was that in most cases we were faced with classes where the total number of pupils could not be split into groups of 3 - often we would have one or two pupils 'left over'. To get around this, additional pupils were assigned as observers to existing groups - creating one or two groups of four pupils. This simply meant that an additional phase or rotation was needed in order to allow all pupils to take on all four roles. The additional time required for this final phase (all the three-member groups would now be finished) would be taken up by assigning the groups an additional non-assessed practice task.
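A hypothetical sketch of that grouping rule (the function name is mine): split the class into threes and attach any one or two leftover pupils to existing groups as observers, producing groups of four that simply run one extra rotation:

def make_groups(pupils):
    # Split into groups of 3; the 1 or 2 pupils left over join the
    # first groups as a fourth member (the Observer role), and those
    # groups need one additional phase.
    cutoff = len(pupils) - len(pupils) % 3
    groups = [pupils[i:i + 3] for i in range(0, cutoff, 3)]
    for i, extra in enumerate(pupils[cutoff:]):
        groups[i].append(extra)
    return groups

groups = make_groups(["P%d" % n for n in range(1, 42)])  # a class of 41
print([len(g) for g in groups])  # [4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]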

The other feature of the design was that the teacher would observe each pupil as he or she performed a single task and then move on to another group - this allowed the teacher to observe about 9 or 10 pupils each time the test was administered.

Of course, this is only the basic design. Before the test could be used we needed a set of tasks and a set of scoring criteria. In order to give the pupils more responsibility and autonomy, it was decided to involve them in the development of both the tasks and the scoring criteria.

The Tasks

Two decisions were needed here:

How many tasks would be included in each test event?
What might these tasks be?

The answer to the first question was established after talking to a small number of classes and their teachers. It was agreed that it might be difficult for these pupils to speak in English for more than two to three minutes on any one task. This suggested that we would need three tasks (remembering that each event was to last about 10 minutes - and that this time would include all rotation and task-selection issues). After these discussions, three task types were identified:

Personal information exchange (family, hobbies, films etc.)
Picture description (it became necessary to specify the type of pictures to be used - though these specifications came from the pupils)
Decision making (i.e. making a choice from a number of options)

The decision to ask pupils to become involved in the development of tasks was reinforced by these discussions. A further decision also emerged: pupils were asked to devise tasks while working in groups, both in class and outside of class. So the questions for the first task were devised and discussed by pupils before being placed in a folder; the pictures for description were similarly the responsibility of the pupils, as were the sets of objects for the decision-making task (for practical reasons we used pictures or drawings of these objects).

It also became clear that pupils would need to be trained in the delivery of the test so that they would be familiar with the expectations of the different roles before any administration.

Scoring the Performances

The final pieces of the jigsaw were the rating scale (or scoring criteria) and the rating procedure. For this we used our discussions with the teachers and pupils to identify the main criteria: Grammar (accuracy); Vocabulary (appropriacy); Presentation ('naturalness' - things like fluency of speech and the use of non-verbal strategies such as eye contact to aid communication); and Content (was it interesting?). We also agreed to award scores from 1 (not so good) to 4 (great).

The pupils expressed a keen interest in these criteria, and were very well aware of their importance to the test, and to their own preparation for it. After some trials in which the scale contained written descriptions of what each level of each criterion actually meant, it was decided (by the pupils) that things would be easier if there were no written descriptors, just numbers. The final scale looked something like this:

Criterion      Description                                                           Score (1 = not so good, 4 = great)
Grammar        The language used was accurate, with very few if any mistakes         1  2  3  4
Vocabulary     A lot of different and interesting words were well used               1  2  3  4
Presentation   Eye contact and gestures were used to get the message across;
               there were few if any times when the examinee had to stop speaking    1  2  3  4
Content        What the examinee said was very interesting and kept our attention    1  2  3  4

Before using the scale in an administration, we set up practice events for the pupils (and the teachers). This involved either looking at videotapes of pupils from other schools performing their versions of the tasks or, where the technology was not available, simply using groups of volunteer pupils. When it became clear that the rating scale was understood and was being used as expected, one final decision remained: how would it be used?

In fact, this question was easily answered. Pupils and teachers were all asked to award scores for each task they witnessed. This meant that for each event a pupil would receive two sets of peer-awarded scores and one set of self-awarded scores. This introduced the pupils to the concepts of peer and self assessment - the aspect of the whole project that was to prove most rewarding. The vast majority reported later (in follow-up interviews and questionnaires) that they both enjoyed the experience of awarding scores and learnt a lot about the way in which speaking tests worked - and, most importantly, they began to understand the importance of these criteria in relation to everyday communication in English. As for the teachers, they too were very happy with the way the scales worked and found them easy to use after relatively little practice.
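As a rough illustration of how those score sets might be collated (the field names and the simple averaging are my own, not a procedure described in the original project), each pupil ends up with three sets of 1-4 ratings per event - two from peers and one self-awarded:

CRITERIA = ("grammar", "vocabulary", "presentation", "content")

def summarise(score_sets):
    # Average the 1-4 ratings for each criterion across all raters.
    return {c: round(sum(s[c] for s in score_sets) / len(score_sets), 2)
            for c in CRITERIA}

peer1 = {"grammar": 3, "vocabulary": 2, "presentation": 4, "content": 3}
peer2 = {"grammar": 3, "vocabulary": 3, "presentation": 3, "content": 4}
self_ = {"grammar": 2, "vocabulary": 2, "presentation": 3, "content": 3}  # self scores tended to run lower

print(summarise([peer1, peer2, self_]))
# {'grammar': 2.67, 'vocabulary': 2.33, 'presentation': 3.33, 'content': 3.33}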

Did it Work?

While I'd love to say that the test certainly did work and is still used in Japan to this day, I'm afraid I can't. Though I did speak about the project at a Japan Association for Language Teaching (JALT) conference in 1996 and found that there was a great deal of interest in the idea, it quietly faded into obscurity. Maybe a version of the test is out there, maybe not!

In a study based on the test, I found that after a few weeks of using the procedure there was little difference between the scores awarded by the pupils to their peers and those awarded by the teachers for the same task performances. In addition, the pupils, with some practice, became as consistent as their teachers, though the self-awarded scores were always about 15% lower than the others. So you see, I'm convinced that the idea worked, and I feel that it is still appropriate for teachers who are faced with large classes and do not want to lose a lot of learning time to time-consuming traditional teacher-fronted tests. The fact that the pupils in the original project saw the whole process as a learning experience adds to my conviction.

--- 

Please check the Humanising Testing course at Pilgrims website.
Please check the Humanising Large Classes course at Pilgrims website.


 
    © HLT Magazine and Pilgrims