Humanising Language Teaching
SHORT ARTICLES

Working Towards a Less Mistake-oriented Learning Environment and Assessment Culture

Michel Fandel, Luxembourg

Michel Fandel studied at the University of Kent (Canterbury), graduating in 2005. He completed a postgraduate MA in 2006 and began teaching in September 2006. He currently teaches at the Lycée Technique Michel Lucius, Luxembourg. E-mail: michel.fandel@education.lu

Menu

A new approach to assessing extensive free writing in summative testing
Detailed explanation of the adapted assessment system
Practical application of the adapted assessment system
Correction and feedback
Conclusions
An approach revisited: reconsidering the use of the CEFR in summative assessment
Selected bibliography

A new approach to assessing extensive free writing in summative testing

From the early stages of the 2007-2008 school year, my assessment of free writing portions (i.e. comprehension and development questions) in summative tests had consisted of a rather ‘traditional’ approach of equally weighting the form of students’ answers (accounting for 50% of the mark) and their content (the other 50%). In itself, this balanced distribution of marks made sense, as the language learner should be examined both in terms of L2 proficiency (or ‘competence put to use’ (CEFR, p.187)) and the overall argumentative pertinence of his written product. Therefore, this equal weighting of form and content was also maintained in the summative test that will now be examined more closely; more specifically, I will focus on a development question with an overall weighting of 12 marks from a test that I conducted in my 4e moderne in the third term of that school year. But what exactly was this assessment based on?

The measuring of answer content will always remain at least partly subjective (particularly in relation to students’ answers to development questions) and the correspondingly awarded part of the mark is therefore liable to vary from teacher to teacher. Of course, factors such as complexity of argument, overall coherence and degree of thematic development may be taken into consideration as tangible and more objective evidence of the student’s general skill in constructing a persuasive and logical argument. In addition, a norm-referenced comparison of individual students’ writings (in terms of creativity, complexity…) also frequently provides a basis for the relative distribution of high and low marks, respectively. In contrast, however, the form (or language-related) component seems, on the surface, easier to rate ‘objectively’: indeed, one is tempted to simply consider the number of mistakes committed, potentially even distinguishing between different degrees of ‘severity’ of various wrong forms (thus, a grammar mistake such as a missing 3rd-person ‘-s’ ending in the present simple is commonly regarded as ‘worse’ than a glitch in spelling or the use of a wrong preposition), and to deduct marks according to their respective frequency in the student’s product.

However, there are several potential problems with such an approach, as it crucially fails to look beyond the error (or, indeed, mistake) itself. For instance, it does not acknowledge the various sources of errors (as outlined by Astolfi and Brown) that can affect the learner’s performance in the complex task of free writing. Indeed, single elements which appear as ‘basic’ errors (such as the notoriously missing 3rd-person ‘-s’ in the present simple) may in fact be simple ‘slips’ which have uncharacteristically happened to the learner in the peculiar circumstances of the isolated summative test. Hence, this may give a wrong impression of the student’s current language competence if one fails to properly contrast the inconsistency of such performance lapses with more persistent errors in his interlanguage.

On the other hand, revealing or identifying an individual’s actual level of competence in a summative test is a tricky matter. As it corresponds, by its very nature, to a type of ‘performance assessment’ that ‘requires the learner to provide a sample of language in speech or writing in a direct test’, the underlying competences that are put to use in it can only be surmised rather than directly observed:

Unfortunately one can never test competences directly… [A]ll tests assess only performance, though one may seek to draw inferences as to the underlying competences from this evidence. (CEFR, p.187)

A more differentiated assessment technique which distinguished various levels of the learner’s apparent target language proficiency (i.e. ‘competence put to use’) rather than simply separating ‘correct’ language uses from ‘incorrect’ ones would therefore allow for a more precise exploration of individual learners’ strengths and weaknesses.

In my summative assessment of students’ free writing, the aim thus moved away from an essentially behaviourist perspective, which regarded all errors as ‘phenomena to be avoided’ and, consequently, to be punished by the systematic deduction of marks (reflecting fundamental traits of the grammar-translation and audiolingual approaches). Instead, I began to favour a CLT-inspired analysis of the overall communicative value of the learners’ writings, which regarded the emergence of learner errors in free writing tasks as an ‘inevitable, transient product of the learner’s developing interlanguage’ (CEFR, p.155). If that was the starting point, then evidently my “intermediate”-level students could not be expected to display flawless mastery of L2 structures and elements and should not be compared to such a realistically unachievable level of proficiency in every aspect (this line of argument is evidently only valid for the complex task of free writing. In portions of summative tests that merely verify the students’ ability to use discrete items accurately, and thus demonstrate their understanding of actual L2 ‘systems’, the strict application of authentic L2 standards as the ‘measuring stick’ is certainly much more valid and desirable.). Instead, a form of assessment that was based on inferring the students’ actual competence level by judging the developmental stage of the interlanguage they had used in the performance test might give a more accurate picture of the student’s proficiency at that precise point in time.

Detailed explanation of the adapted assessment system

As a basis for summative assessment that would facilitate the inference of ‘underlying competences’ from a specific ‘performance’, a number of criteria-based scales from the CEFR constituted a useful starting point. By relating various aspects of my students’ performance to the relevant proficiency descriptors in a variety of form-focused categories (such as grammatical accuracy and vocabulary control), I attempted to draw a more accurate picture of their overall writing and communicative skills. For practical reasons, I slightly adapted and edited the phrasing of some of the relevant ‘can do’ descriptors for the task at hand so as to make the final assessment grid a little less congested and more easily usable (as an example of such minor editing, the A2 descriptor for ‘thematic development’ was reduced from ‘Can tell a story or describe something in a simple list of points’ to ‘Can describe something in a simple list of points’, since the relevant development question was of a descriptive nature in this test. Other excessively wide-reaching or global descriptors were made slightly more concise for the task at hand in similar ways; evidently, however, I did not change anything about the expected proficiency levels for any of the descriptors.). As the use of these criteria also needed to be custom-tailored to the realistic proficiency level of my 4e students, merely three of the original six competence levels (A2-B2) were used as a grading spectrum in the summative measurement of my students’ performance in relation to the chosen criteria. After all, the attainment of the highest proficiency levels (C1/C2) would have been highly unlikely for intermediate learners, as it constitutes a difficult goal to reach even for much more highly trained and proficient target language learners. In fact, I would argue that even reaching the level of C1 would have to be considered an enormous and genuinely exceptional achievement at this stage. The very basic A1 level, on the other hand, constituted a stage that 4e students could safely be expected to have mastered and left behind, which is why it was excluded from the assessment grid (therefore any performance below the level of A2 would automatically lead to a mark of 0/6 in relation to the given criterion). Realistically, and having seen my learners’ previous efforts over the course of the year, I had thus grown to expect a proficiency spectrum spanning only – or at least predominantly – the three levels of A2, B1 and B2. As these ‘medium’ levels served only to acknowledge and confirm the status of my students as ‘intermediate’ learners, this seemed like a much fairer basis for the assessment of the proficiency level displayed in their free writing than the utopian C levels (which would effectively have counted as a reference for assessment if I had insisted on every single mistake in my students’ free writing).

As the Luxembourg secondary school system is at present still based on the awarding of numerical grades to evaluate students’ progress, a highly important yet equally challenging task now consisted in translating these detected competence levels into tangible marks in a legitimate and sensible way. In the context of a summative test that counts towards the student’s final mark, both the validity and reliability of the grading system are, after all, of utmost significance. While such a summative use is not the intended one of the CEFR scales, I considered that a workable competence-based marking system could be devised in relation to them (for the time being) by associating carefully proportioned amounts of marks with the different competence levels. As the grids consist mainly of ‘can do’ descriptors, this additionally led to a ‘positive’ marking system that highlighted the students’ already acquired development rather than their remaining shortcomings.

However, as the highest marks should evidently highlight a very good performance, the maximum for the ‘form’ aspect of their free writing (corresponding to 6/6 in the test question described in this particular case) would only be awarded to students who had reached competence level B2+ consistently (although not represented in my final assessment grid, a potential C-level performance could of course only have received the highest possible marks, too). In contrast, medium marks (3/6 or 4/6) acknowledged the language proficiency of a student who had generally performed at B1 and thus shown a ‘satisfactory’ level. To arrive at a final, overall mark for the ‘form’ aspect of the student’s answer, each category was separately given a mark out of 6 in terms of the competence level reached; the overall mark was constituted by the average mark obtained in all ‘form’ aspects in that manner (rounded up to the nearest .5 or .0 mark). Each competence level was further nuanced into ‘low’ and ‘high’ categories to allow for more flexible and fine-tuned marking (e.g. ‘B1-’ or ‘low B1’ would give the student 3/6 marks on a given form aspect, while ‘B1+’ or ‘high B1’ would lead to 4/6; this also left open the possibility of a ‘middle B1’ corresponding to 3.5/6). As numerous students generally performed at similar intermediate competence levels, this approach gave me some leeway to acknowledge slight differences in the proficiency of different learners (in cases where the leap to a higher or lower competence level seemed exaggerated, yet a slender difference existed in the students’ fluency and/or accuracy).
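To make the arithmetic of this form-related marking concrete, here is a minimal sketch in Python (purely illustrative, not a tool that was actually used for the test described): it maps the nuanced CEFR sub-levels onto marks out of 6, averages the per-category results and rounds up to the nearest half mark. Only the anchor points are taken from the scheme above (below A2 = 0/6, low/middle/high B1 = 3/3.5/4, consistent B2+ = 6/6); the values chosen for the A2 and B2 sub-levels, and the category labels, are assumptions made for the example.

import math

# Illustrative mapping of CEFR sub-levels to marks out of 6.
# Only the anchor points come from the scheme described above; the values
# assigned to the A2 and B2 sub-levels are assumptions for the example.
LEVEL_TO_MARK = {
    "below A2": 0.0,
    "A2-": 1.0, "A2": 1.5, "A2+": 2.0,
    "B1-": 3.0, "B1": 3.5, "B1+": 4.0,
    "B2-": 5.0, "B2": 5.5, "B2+": 6.0,
}

def form_mark(levels_by_category):
    """Average the per-category marks and round up to the nearest 0.5."""
    marks = [LEVEL_TO_MARK[level] for level in levels_by_category.values()]
    return math.ceil(sum(marks) / len(marks) * 2) / 2

# Hypothetical learner, loosely modelled on the sample discussed below:
sample = {
    "vocabulary range and control": "B1",
    "grammatical accuracy": "B1+",
    "orthographic control": "B1-",
}
print(form_mark(sample))  # 3.5

Keeping the grid as a simple level-to-mark mapping means further form-related criteria can be added or dropped without touching the averaging logic.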

The second half of the overall mark for the students’ free writing portion remained reserved for the assessment of the actual content of the answer (thus evaluating what the learner argued in addition to how he did it). While two further CEFR scales were drawn upon to increase the objectivity of this part of the summative assessment (namely, ‘thematic development’ and ‘coherence and cohesion’), a remaining (more ‘subjective’) portion of the overall mark was attributed to the creativity of the student’s answer, as well as to the general task achievement (i.e. did he take into account and answer all aspects of the question?). In that respect, the students’ texts were compared to each other, and thus a partially norm-referenced marking component for the answer content complemented the more criteria-referenced form assessment.
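The description above leaves the exact arithmetic of the content half open (part criteria-referenced, part subjective and norm-referenced), so the short continuation below rests on an assumed breakdown, chosen purely for illustration: each content criterion receives a mark out of 6, the content half is their rounded average, and the question total is simply form plus content, reflecting the equal 50/50 weighting of the 12 marks.

import math

def half_mark(sub_marks):
    """Average a list of sub-marks out of 6, rounded up to the nearest 0.5."""
    return math.ceil(sum(sub_marks) / len(sub_marks) * 2) / 2

# Assumed breakdown of the content half; the individual values are invented
# for the sake of the example.
content = half_mark([
    4.0,  # thematic development (CEFR-informed)
    3.0,  # coherence and cohesion (CEFR-informed)
    3.5,  # creativity (norm-referenced, hence more subjective)
    4.5,  # task achievement
])

form = 3.5  # form half as computed in the previous sketch
print(content, form + content)  # 4.0 and 7.5 (out of 12)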

Practical application of the adapted assessment system

The detailed analysis of a student’s performance in the relevant portion of the summative test will serve to exemplify how this less mistake-oriented approach was implemented in practice. One sample shows clear L1 interference in terms of lexis: the erroneous use of ‘server’ (for ‘waiter’), ‘cooker’ (for ‘chef’) and ‘relax’ (for ‘relaxing’) all strongly indicate interlingual transfer, as the student clearly attempts to adapt L1 expressions to answer his communicative needs in L2. While the use of ‘relax’ does not impede the transmission of his intended message, the intelligibility arguably becomes more problematic for ‘servers’, and most definitely in the case of ‘the cooker is under stress’ (which is devoid of a valid communicative message). In his vocabulary range and control, this learner has thus certainly not moved beyond the level of ‘B1’ (as he ‘shows good control of elementary vocabulary’ but ‘major errors still occur’). His command of grammar and orthography is generally sound, even though occasional structures like ‘the other customers don’t make you angry’ betray a relatively direct translation from L1 as well (without, however, being inherently ‘wrong’; hence only a comment about the intended message, not about the form, was added in this instance). This corresponds to the ‘high B1’ CEFR descriptor for grammatical accuracy: the student has ‘generally good control though with noticeable mother tongue influence’. Overall, the student thus scores 3.5/6 on form, as he by and large seems to be performing at a ‘B1’ competence level. In terms of content, the student mostly respects the task requirements; in compliance with the instructions, his overall text does constitute a restaurant review with a number of clearly distinct points. The lack of a proper transition to the last paragraph (and, indeed, of a real conclusion) affects the mark for his overall coherence, while the originality and complexity of his answer (like his L2 proficiency) are inferior to other samples studied.

Of course, one may argue that a holistic appreciation of students’ overall efforts may have yielded very similar overall marks. However, the above-mentioned competence-based assessment system draws a much more nuanced picture of individual students’ overall strengths and weaknesses (as far as they are possible to infer from this one-time performance).

Correction and feedback

A look at my correction technique in this summative test reveals that I resolutely opted to provide differentiated feedback regarding mistakes and errors: while apparent ‘slips’ were simply underlined and coded, the student’s attempts and errors were occasionally overtly corrected or recast when it seemed clear that he would not have found the solution otherwise. After handing the test back, I asked the students to rewrite a correct version of their text in response. Although this approach may run counter to Truscott’s case for the total abandonment of grammar correction, it was maintained for various reasons in this case. On the one hand, the legal framework of the Luxembourg school system demands that a clear indication of students’ mistakes and a specification of their nature be given to them in the teacher’s corrective feedback; as such, the mere awarding of a grade without more explicit comments would have disregarded clear official guidelines. The same legislative article also explicitly states that extra marks (+/- 4) can be awarded for students’ corrections, which is an option to improve one’s personal grade that I did not want to take away from my students; in order to guide them in their task, I at least wanted to highlight the passages which needed to be revised. In contrast to the beginning of the school year, however, I no longer handed them all the correct versions, so as to make them more active and autonomous in the correction process.

From a pedagogical perspective, I would further argue that the fact of not acknowledging mistakes at all could actually be wrongly construed by the learner, in the sense that ‘in the absence of treatment, learners could perceive erroneous language as being positively reinforced’. In reference to the ‘Skinnerian operant conditioning model’, Brown suggests that this risks leading to the ‘persistence, and perhaps the eventual fossilization, of such errors’ (Brown, p.274 & 276). Thus, being conscious of the learners’ underlying interlanguage does imply being ready to accept that the learner will, inevitably, make mistakes in his evolution towards L2 proficiency; certainly, efforts should thus be made to acknowledge this through a less mistake-oriented assessment culture. However, this does not necessarily mean that one should simply ignore these mistakes and just hope for the best. Indeed, this would contradict the teacher’s fundamental duty of helping his learner to ‘reshape’ his interlanguage in progressive, supportive ways. In this respect, I would therefore align myself with Brown, who concisely sums up as follows:

Learners are processing language on the basis of knowledge of their own interlanguage, which, as a system lying between two languages, ought not to have the value judgments of either language placed upon it. The teacher’s task is to value learners, prize their attempts to communicate, and then provide optimal feedback for the system to evolve in successive stages until learners are communicating meaningfully and unambiguously in the target language. (Brown, pp.280-1)

Conclusions

In this piece of writing, I have tried to map key moments and realisations in my personal, changing approach to errors. I have also retraced the resultant steps taken to amend my correction, feedback and assessment systems correspondingly. After growing to understand the multiple factors and error sources that can provoke learner errors in free writing, I have gradually tried to adapt my correction and feedback techniques in such a way that errors are no longer treated as undesirable, but rather, in the vein of a constructivist view of learning, as a fundamentally normal and even desirable phenomenon in the students’ attempts to express themselves in the target language. After all, by obtaining an insight into the apparent principles and systems of his interlanguage, we can gain a detailed understanding of the learner’s current level of linguistic competence.

While the amendments to correction techniques can only be seen as a first move towards a less mistake-oriented assessment culture, I have certainly come to the firm conviction that the key to a fair evaluation of the student’s efforts must reside in a realistic acknowledgement of what he can (rather than what he cannot yet) do. Moreover, this must be put in relation to reasonable expectations (i.e. the teacher must be conscious of the fact that he cannot always compare an intermediate learner’s efforts to the actual L2 norms of a native speaker).

Of course, this does not mean that I have abandoned the aim of developing accuracy in my students’ language use in any way; far from it. For instance, the insistence on my students’ active self-correction of their mistakes (i.e. ‘slips’ in relation to cognitive content they have already mastered) proves that I have tried to make them more aware of elements they may very well be able to avoid if they only put the competences they have already acquired to more concentrated and meticulous use. Nonetheless, Astolfi’s considerations suggest that even such measures may meet with limited success, as the complexity of the free writing task will almost inevitably lead to mistakes in the actual performance, particularly under time pressure and without the possibility of redrafting before marks are awarded. At the same time, I have realised that the complexity of my students’ learning processes inhibits a straightforward, immediate ‘elimination’ of errors; one should therefore focus more attention on the overall communicative value of their writings, particularly in the complex context of free writing tasks in summative tests. Once accuracy is viewed as a long-term goal that the student will only be able to achieve progressively (through the gradual ‘reshaping’ of the language systems he constantly keeps developing), it indeed becomes obvious that, all too often, ‘time spent on grammar correction is time not spent on…more important matters.’ (Truscott, p.356)

In this respect, I may understandably not have managed to reach my initial target of permanently ‘eliminating’ certain mistakes from my students’ writings through the types of remedial grammar correction I implemented. Yet my continued theoretical research and active experimentation with approaches to student errors have definitely brought me in touch with ‘more important matters’ for my further outlook and development as a language teacher. Actively looking for assessment systems that transcend rather than reinforce the mere focus on errors can certainly be regarded as such a salient element, and it is definitely one that I intend to pursue, expand and perfect as far as possible in my further professional practice.

An approach revisited: reconsidering the use of the CEFR in summative assessment

Upon revisiting this article (initially written after my very first experimentations with competence-based assessment of learners’ writing performances during my time as a student teacher), it seems necessary to add a number of amendments that, after a further three years of practical experience and assessment-related research, I have come to consider as crucial with regard to the possible use of CEFR descriptor scales and proficiency levels in summative assessment.

First and foremost, the above-mentioned attempt to link specific descriptors from the original CEFR scales to numerical marks, for the purpose of attributing a summative value to a specific performance, ultimately constitutes a highly problematic issue. In fact, I would now argue that their inherent focus on language proficiency essentially impedes the direct usability of these CEFR descriptors to arrive at a numerical mark for a given student production, particularly as summative tests often focus on achievement rather than proficiency assessment (or indeed a mix of both). The possible inference of language competence as implied by the CEFR descriptor scales essentially requires a wide range of performances by the same student before language proficiency can be genuinely assessed (and certified). For that reason, I would argue that a more nuanced approach to the CEFR descriptor scales than the one described above is necessary in summative assessment. This does not mean that the guiding principles behind the CEFR descriptors cannot be integrated into summative assessment at all; far from it. It would certainly be wrong to discard the very promising and fruitful focus on what the learner can do and the resulting positive impact on the assessment culture as a whole. Nevertheless, the direct attribution of precise numerical values to individual CEFR descriptors and levels is neither their intended purpose nor a particularly desirable application.

Instead, I would argue that the most suitable use of CEFR descriptor scales consists in informing the targeted proficiency levels in more specific marking grids that are purposely designed for the summative assessment of an individual free writing (or speaking) performance. Such an approach has recently been pursued in the Luxembourg education system, where CEFR-inspired marking grids for speaking and writing have been published in the ELT syllabi for lower-level classes (defining ‘A2’ as the basic level the students need to reach before they can progress to a higher form, and ‘A2+’ as the ideally targeted level of performance). While these grids pay tribute to, and are generally aligned with, the CEFR proficiency levels in the sense that they allow for ‘imperfect’ performances to be acceptable (even for the highest achievable marks) at this early stage of language learning, they do not simply use the original descriptors from the CEFR scales. Instead, more precise and product-oriented ‘content’- and ‘language’-related descriptors are used as the core of a criteria-referenced assessment of a one-off writing performance. Due to the various criteria and descriptors that these grids include, they also allow for a more systematic move away from excessive norm-referencing (i.e. the comparison of various student performances with each other, potentially leading to the unfair attribution of an excessively low mark for a comparatively ‘weaker’ performance, is replaced by a stricter focus on what can realistically be expected of a student at this stage, which in turn implies ‘fairer’ marks overall).

In hindsight, I would also argue that pertinent marking grids should not span more than two different CEFR proficiency levels at once if they are to be used for the summative assessment of various student performances from the same class or form. In other words, the decision to use a form of assessment in 4e moderne that encompassed the three levels of A2, B1 and B2 might have been ill-advised to a certain extent. Indeed, one might consider that B2 was a very steep target for “intermediate”-level students in only their third year of school-based English learning; in the logic of allowing some imperfections to characterise even the highest-scoring performances, ‘B1’ (as basic standard) and ‘B1+’ (as target standard) might have been more sensible and realistic at that point.

Overall, then, I believe that it remains important to underline the numerous beneficial effects that a CEFR-inspired approach to assessment undoubtedly offers; at the same time, however, it is paramount to keep in mind that the practical application of this tool requires very careful reflection and planning to guarantee acceptable levels of validity and reliability of the proposed assessment schemes.

Selected bibliography

Astolfi, Jean-Pierre. L’erreur, un outil pour enseigner, ESF (Paris: 1997).

Brown, H. Douglas. Principles of Language Learning and Teaching (5th ed.), Pearson Longman (New York: 2007).

Truscott, John. ‘The Case Against Grammar Correction in L2 Writing Classes’ in Language Learning 46:2, June 1996.

Council of Europe. Common European Framework of Reference for Languages: Learning, Teaching, Assessment, Cambridge University Press (Cambridge: 2001).
