Grammatical Creativity in Learner Corpora
Wayne Rimmer, Russia
Wayne Rimmer is a teacher at BKC-International House, Moscow. His doctoral research explores the degree and range of subordination in the International Corpus of Learner English. E-mail: wrimmer@bkc.ru
While corpora have been around for hundreds of years, and the last fifty in electronic format, a relatively new phenomenon is the learner corpus, i.e. a collection of texts spoken/written by language learners rather than native-speakers. The main purpose of learner corpus linguistics has been research into second language acquisition and performance. To illustrate, Hinkel (2003) contrasted sentence structure in corpora of L1 and L2 academic essays. She found that non-native speakers relied on relatively simple syntax; for example, their writing featured far fewer cleft constructions (it’s her attitude I resent) than that of native-speakers. Learner corpora have also become important in materials design. Thus, the ‘Common mistakes…’ series (e.g. Moore, 2005), based on the Cambridge Learner Corpus of 60,000 exam scripts, highlights typical test-takers’ errors and supplies remedial exercises.
Initially, learner corpora were used as a point of comparison with native-speaker corpora. This led to the image of the learner corpus as a flawed model, what not to say or write. Inevitably, learner corpus linguistics was heavily identified with error analysis. The content of the corpus was seldom of interest, let alone the contributions of individual learners. However, two factors necessitated a crucial reassessment of the perceived inferiority of learner corpora. The first is the debunking of the myth of the infallible native-speaker. The definition of a native-speaker has been shown to be fragile (Davies, 2003) and the relevance of the native-speaker model to international communication has been questioned (Jenkins, 2000). The second factor is humanistic methodologies which promote the learners’ worth. The result is that learner corpora are now being seen as valuable in their own right. This article reports a study that recognizes learner corpora as important sources of data for language description. The methodology can be called humanistic in the sense that it addresses the linguistic motivation of individual learners when communicating meaningful and personalized content.
This study uses a corpus methodology to investigate the grammatical creativity of advanced second-language learners. At the highest levels of language proficiency, it is difficult to distinguish grammatical performance purely by syntax and morphology. Near-native language competence is partly characterized by the capability to play with grammar and formulate new patternings. A literary example is zeugma, the coordination of unequal elements for humorous effect, as in she gave the tramp no money and no thought. This ability to manipulate and innovate grammatical constructions for pragmatic and stylistic goals is here defined as grammatical creativity. The significance of grammatical creativity to theory is that it explores the complex interaction of factors which power language variation and change from the learners’ perspective. The humanistic side of grammatical creativity is that it treats learners as participants in shaping language change and as stake-holders in the future development of English.
The data are drawn from the International Corpus of Learner English (ICLE) (Granger, 2007), a 2.5 million word corpus of academic essays on a set number of topics, typically argumentative in genre. A sub-corpus of the ICLE was formed consisting of answers to the following essay title.
Some people say that in our modern world, dominated by science and technology and industrialization, there is no longer a place for dreaming and imagination. What is your opinion?
Affect was the main consideration for choosing this title. The prompt is the most popular of all 922 different essay titles in the ICLE with 452 writers opting for this topic (Granger et al, 2002: 18). It is fair to say then that writers found this subject matter attractive and stimulating, which provides some confidence in the quality and representativeness of the language. When learners are engaged in a task which demands a highly-personalized response it is more likely that the language sample will be a valid indicator of grammatical competence (Purpura, 2004: 233).
The characteristics of this sub-corpus are tabulated below (statistics derived from Granger et al, 2002):
Size | 193,951 words / 316 texts
First language of contributors | Bulgarian, Czech, Dutch, Finnish, French, German, Italian, Polish, Russian, Spanish, Swedish
Mean age of contributors | 21.5
Gender of contributors | 83% female
Status of contributors | undergraduates
Mode | writing
Genre | argumentative essay, single topic, mean length 705 words
The problem is identifying grammatical creativity reliably. It is not possible to arrive at a definition of any form of creativity, be it musical or linguistic, which excludes subjectivity on the part of the observer. In particular, the borderline between error and creativity is blurred. Consider the common use in the learner corpus of like as a conjunction.
It is like we are in a vicious circle (essay 49).
Confusion between the preposition like and conjunction as (if) is bemoaned by usage manuals going back to Fowler (2002/1926: 325). At first glance, like is the kind of sloppy usage associated with learner writing. However, on another reading, like may be a matter of conscious choice for stylistic reasons. The writer may be striving for a more informal effect. Examining the wider context, which a corpus allows us to do, can provide evidence for this. Text (49) is actually rather informal in tone as the opening sentence establishes.
Sometimes I really don’t know what to think.
The first person pronoun and contraction are less typical of formal writing. Informal does not mean lax or unskilled. The use of such literary vocabulary as equivocal, oxymoron and attain ideals shows that the writer is extremely capable. The informal register, I would suggest, is a conscious strategy to establish a rapport with the reader. With It is like we are in a vicious circle the quasi-technical term vicious circle is made less imposing by the casual use of like as a conjunction. The avoidance of an academic persona makes the writer’s voice easier to identify and empathize with. Examination of the sentence in context makes it more difficult to dismiss it as erroneous.
This particular analysis is open to debate but the general approach of contextualization is surely a more productive way to investigate grammatical creativity because it reaches out beyond individual sentences and the often frankly pedantic issues of prescriptive grammars. There are three types of context which are relevant to determining grammatical creativity. The first is the text in which the language appears, i.e. an individual essay, to indicate how the writer uses the language to develop the discourse; the second is the learner corpus in toto, to allow patterns to be related to other writing in the same genre; the third is native-speaker corpora, as a matter of quantity not quality - L1 corpora, being bigger, simply offer a fuller picture of the language. This paper argues that context is fundamental in setting the conditions for grammatical creativity. While syntax is synchronically relatively stable, context is by definition an unfixed variable, so logically the impetus for grammatical variation and innovation will come from contextual constraints. Larsen-Freeman (1997) claims that this process eventually powers diachronic change, i.e. rules are shaped by discourse, not vice-versa.
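The first type of context, the individual essay, is the easiest to operationalize. The following is a minimal Python sketch, not the procedure used in this study, of a concordance search that returns each hit together with its neighbouring sentences. The function name kwic, the window parameter and the dictionary layout are assumptions made for illustration, and the sample text simply places the two sentences quoted from essay 49 side by side; the full essay is not reproduced here.

```python
import re

def kwic(essays, phrase, window=1):
    """Find `phrase` in each essay and return every hit sentence
    together with up to `window` sentences before and after it."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    hits = []
    for essay_id, text in essays.items():
        # Naive sentence splitter: break after ., ! or ? followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        for i, sentence in enumerate(sentences):
            if pattern.search(sentence):
                before = sentences[max(0, i - window):i]
                after = sentences[i + 1:i + 1 + window]
                hits.append((essay_id, before, sentence, after))
    return hits

# Hypothetical stand-in for an essay: only the two sentences quoted in this article.
essays = {
    49: "Sometimes I really don't know what to think. "
        "It is like we are in a vicious circle."
}

for essay_id, before, hit, after in kwic(essays, "like we"):
    print(essay_id, before, hit, after)
```

A sentence window is used rather than a fixed number of characters because the argument here depends on how a construction relates to the surrounding discourse, not to its immediate collocates.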
The context-based nature of the study offers a natural opportunity for stylistics to be incorporated into the methodology. Stylistics is the study of linguistic options. The value of stylistics is that it presents grammar as a system of choices and the text as open to multiple interpretations. Even if the writer’s original purpose were retrievable, for example as in report-back research methodologies, where writers relate their composing processes, readers are still free to create their own sense and evaluation of the text. The meaning of a text is not fixed; it offers different things to different people and peoples. If this were not so, the study of literature would lose much of its excitement and personal resonance. Similarly, grammaticality can also be judgmental. Notions such as accuracy and appropriateness must be sensitive to context. Stylistics hence brings a fresher and more humanistic perspective to the preoccupation with grammar as right/wrong.
With learners, there is the temptation to regard all non-standard grammar as error. This would obscure those occasions when writing is intended to be innovative. Learners have other goals apart from accuracy and at an advanced level they have the grammatical knowledge to be creative. To illustrate, there is no phrasal verb to hermit away in the dictionary so the following clause from the corpus is striking.
… people get hermitted away through computers and electronic devices. (14).
The meaning is clear: technology is isolating people. The metaphor of a hermit gives a powerful image of miserable loneliness. Taken as a preposition of space, through conjures up the amusing scene of computer narks becoming physically absorbed into their monitors. The writer uses two grammatical devices to make to hermit away seem a legitimate coinage. First, the clause is marked by being in the passive voice. This markedness creates a certain tolerance for novelty. Second, the copula is get rather than be. Although, from data in the 100 million word British National Corpus (BNC) (Biber et al, 1999: 481), get is (1) much less common than be in the passive and (2) virtually restricted to speaking, get is chosen because it is associated with accidents and unfortunate events (Fleisher, 2006). This writer expertly manipulates the grammar to achieve a memorable metaphor.
There is a similar example of coining a phrasal verb below.
Wasting one’s time dreaming about was considered as a serious mistake. (200).
There is a twist in this sentence after about. The reader expects an object after about so when there isn’t one it is necessary to backtrack and reanalyze about as an adverb. Then there are two ways to parse dreaming about. It could either complement the noun phrase or be in apposition to it. This little mind game ensures that full attention is paid to the gravity of the prepositional complement as a serious mistake. The fact that wasting and dreaming share the same suffix creates a parallelism between them of form and meaning: dreaming used to be equated with time-wasting. As for the nominalized phrasal verb dreaming about, verb + about is a generative pattern indicating futile activity (hang about, mess about, piss about) so while dream about is unattested it fits semantically.
At this point it must be made clear that I am not endorsing a liberal stance which eschews the concept of error altogether. Error is a reality of usage; not every non-standard structure can be condoned. Less skilled writers cannot get away with attempts at innovation because their grammatical resource is not strong enough to support a new interpretation of the text. The learner corpus is not homogeneous in terms of level and there was evidence of language which has to be labeled as error rather than creativity.
We live at a time, when machines replace humans in almost all branches of life, a time when the technical progress is so rapid, that we are choking in our tendency for catching it. (38).
choking is an effective image as the earlier mention of machines suggests us gagging from their grime and smoke. However, the prepositional phrase in our tendency for catching it in the comparative clause sounds suspect. Despite the lack of a comma, the whole prepositional phrase must function as an adverbial since choke does not license the preposition in. (The punctuation in the sentence is faulty at several points.) catching is not the right word because to catch progress is not a collocation. Perhaps then, catching is deliberately used for its sense of contracting an illness, to catch a cold, implying that technology is an infectious disease. This would certainly reinforce the meaning of the text. However, the adverbial is ill-formed and this obscures the attempt at metaphor. The error is that tendency is wrongly complemented by an –ing nominal instead of a to infinitive, tendency to catch it. There are no occurrences in the BNC of tendency + for + -ing participle. Arguably, if the correct complementation pattern had been used, the metaphorical interpretation of the adverbial would be much more convincing. Unfortunately, lack of grammatical knowledge spoils the effect. The willingness to take risks with language perhaps increases with proficiency but grammatical control is integral to success or failure.
Categorical assertions about grammaticality are notoriously vulnerable. Native-speakers just do not agree on the felicity of contentious sentences such as ?I bought two mouses for my computer (Rimmer, 2006b). Undoubtedly, individual variation can be indicative of larger trends in language. In learner corpora, the motivation for variation may be due to stylistics rather than a gap in competence. Consider the comparative clause below.
Has our schoolsystem gone this mad that, already in nursery school, they have to prepare the children for their future jobs. (113).
We would expect so in place of this. The COBUILD corpus (2006) provides 2 concordance lines for the structure so mad + finite clause, 1 with the complementizer that.
Some of the other ideas he mentioned that night were so mad that I should have seen through them straight away.
There are no examples of this adjective + finite clause in the COBUILD corpus. this is basically either a determiner or a pronoun. Functions of this as an adverb are semantically very restricted and confined to speaking. For example, from the Longman corpus of 300 million words (Summers, 2005), this is used to indicate size when the speaker uses corresponding hand signals, as in the boastful fisherman’s line ‘It was this big!’ While the meaning of this adjective + finite clause is perfectly clear on analogy with the standard comparative construction, its usage here is idiosyncratic and not part of the shared language community.
Still, it is problematic to reject the construction on purely linguistic grounds. that is well-attested as a modifier, even appearing in the learner corpus.
this soil is not that fertile anymore. (38).
There is also a pattern that adjective + comparative clause, labeled in the Longman corpus as British, spoken, informal.
I was that embarrassed I didn’t know what to say.
Why should this not occupy the same functional ground as that? The deictic distinction could be maintained with so as the unmarked variant. The original sentence can be restructured to represent a system of linguistic choices.
Has our schoolsystem gone [so / that / this] mad that, already in nursery school, they have to prepare the children for their future jobs.
Grammatically, there is a gradient from standard to colloquial to marginal. Pragmatically, there may also be variation. so + adjective + that is unmarked. that + adjective + that emphasizes the undesirability of the situation. The Longman dictionary entry (ibid.) points out that this structure comments on a ‘bad’ state of affairs. More generally, that has the force of problematicising situations (Carter & McCarthy, 2006: 180). The repetition of that, once as a strong and then as a weak form, is also emphatic. this + adjective + that internalizes the situation more deeply. The focus is not so much on blame and anger as on personal hurt and disappointment. The linguistic context encourages this interpretation. The use of the plural personal determiner in our schoolsystem identifies the speaker with a whole community and culture. The pronoun they is intriguing for it has no obvious antecedent in this or previous sentences - the schoolsystem is singular and non-human. they is plausibly the teachers or all the mandarins in the educational machine. In any case, they is portrayed as an external and damaging entity.
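The three-way choice between so, that and this before an adjective can be tallied in any plain-text corpus. The sketch below is an assumption-laden illustration in Python, not the query used with the COBUILD or Longman data: the pattern, the function name count_degree_modifiers and the adjective list are all hypothetical, and the pattern only catches clauses introduced by an overt that, so it misses examples such as I was that embarrassed I didn’t know what to say.

```python
import re
from collections import Counter

# Degree modifier + one word + complementizer "that".
# The middle word is checked against a user-supplied adjective list.
PATTERN = re.compile(r"\b(so|that|this)\s+(\w+)\s+that\b", re.IGNORECASE)

def count_degree_modifiers(sentences, adjectives):
    """Count how often each of so/that/this modifies a listed adjective
    that is followed by a that-clause."""
    counts = Counter()
    for sentence in sentences:
        for match in PATTERN.finditer(sentence):
            modifier, word = match.group(1).lower(), match.group(2).lower()
            if word in adjectives:
                counts[modifier] += 1
    return counts

# Sentences quoted in this article, used here only as a toy sample.
sample = [
    "Some of the other ideas he mentioned that night were so mad that "
    "I should have seen through them straight away.",
    "Has our schoolsystem gone this mad that, already in nursery school, "
    "they have to prepare the children for their future jobs.",
    "this soil is not that fertile anymore.",
]

print(count_degree_modifiers(sample, {"mad", "fertile"}))
```

Restricting the middle slot to a known adjective list trades recall for precision: without it, sequences such as so much that or this is that would inflate the counts.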
The if clause in the concordance line below is another instance of disputed usage.
If to speak of Marc Chagel’s paintings, I would say that they are dream-like. (230).
A frequently-cited rule in usage manuals (e.g. Eastwood, 2005: 147; Swan, 2005: 610) is that if is not followed by a to-infinitive. Rimmer (2006a) used the BNC to show that proscription of if + to-infinitive is not justified. 741 tokens of if + to were attested, falling into four categories. The examples are from the BNC; the percentage represents the proportion of total occurrences of if + to.
Category 1: as if adverbial of manner. 88%.
Holding her at arm’s length, he looked into her face, as if to memorize every detail.
This is by far the most common category but as if represents a separate subordinating phrase (equivalent to as though) which standardly introduces non-finite clauses:
She leant forward as if to make the facts clearer.
‘I have to go’, said Lucy, as if scolding, eyes sparking with pleasure.
Biber et al (1999: 840) note that as if is frequent in newspapers, fiction and academic writing.
Category 2: verbless clause. 10%.
It is proposed in the course of this work to use the USA as a main comparison, though other jurisdictions are also considered, even if to a lesser extent.
The particle to is a preposition not an infinitive marker so this category can also be discounted.
Category 3: citation. 2%.
But there’s the damned word if to contend with
Here if has a lexical not a grammatical function, as in the idiom no ifs and buts.
Category 4: conditional. 2%.
If to be in crisis means that the whole system is on the brink of total collapse or explosion, then we probably do not have a crisis.
This is the only relevant category in the BNC because if is an independent lexeme with a subordinating function. if marks the hypotaxis of a remote conditional sentence. The construction is rare but attested, always in formal registers. Another example from the BNC is noteworthy in that there are two parallel if + to sequences:
If to do this, if to be a Whistler at best, in the art of poetry, is to reach the height of poetic expression, then Ezra and Eliot have approached it and tant pis for the rest of us.
(Interestingly, the learner corpus citation also has a famous painter as a complement within the if clause.) Rimmer concludes that because the if + to infinitive clause is so uncommon in even a corpus as large and balanced as the BNC, ‘… it is reasonable to infer that test-takers who produce subordinate clauses where if is followed by a to infinitive are processing complex grammar and operating at the highest end of the language proficiency spectrum’ (ibid: 514).
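The four-way categorization above lends itself to a rough first-pass sort before manual checking. The Python sketch below is only an illustration under stated assumptions: it is not the procedure reported in Rimmer (2006a), and the function name classify_if_to, the determiner list and the cue word for citations are hypothetical heuristics chosen to fit the four BNC examples quoted here. Any real concordance would need its output verified by hand.

```python
import re

# Hypothetical cue list: a determiner after "to" suggests that "to" is a preposition.
DETERMINERS = {"a", "an", "the", "some", "any"}

def classify_if_to(line):
    """Assign the first 'if to' sequence in a concordance line to one of
    the four categories, or return None if the sequence is absent."""
    tokens = re.findall(r"[a-z']+", line.lower())
    for i in range(len(tokens) - 1):
        if tokens[i] == "if" and tokens[i + 1] == "to":
            previous = tokens[i - 1] if i > 0 else ""
            following = tokens[i + 2] if i + 2 < len(tokens) else ""
            if previous == "as":
                return "as if"            # Category 1: adverbial of manner
            if previous == "word":
                return "citation"         # Category 3: 'if' mentioned, not used
            if following in DETERMINERS:
                return "verbless clause"  # Category 2: 'to' is a preposition
            return "conditional"          # Category 4: if + to infinitive
    return None

lines = [
    "Holding her at arm's length, he looked into her face, as if to memorize every detail.",
    "other jurisdictions are also considered, even if to a lesser extent.",
    "But there's the damned word if to contend with",
    "If to be in crisis means that the whole system is on the brink of total collapse or explosion, then we probably do not have a crisis.",
]

for line in lines:
    print(classify_if_to(line), "|", line[:40])
```

The order of the checks matters: as if is tested first because it is by far the largest category, and the conditional reading is left as the residue once the other three have been ruled out.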
For Rimmer, frequency is an indicator of complexity. This premise is questionable as briefly illustrated in the two sentences, semantically equivalent, below.
a. That Diana married Charles is amazing
b. It is amazing that Diana married Charles
a is grammatically unmarked because it is reducible to the canonical Subject-Verb-Complement distribution of English. b features extraposition by means of the dummy it. Comparing usage data for a and b, the BNC (Biber et al, 1999: 674) tells us that the extraposed version is much more likely in both speech and writing, so the more frequent structure is not the formally simpler one. The relationship between frequency and complexity is elusive. Similarly, linguistic creativity is not just about being original. Novelty has to be licensed so an alternative approach is to examine the construction in context for justification of its use and effectiveness. Consider again the learner corpus example.
If to speak of Marc Chagel’s paintings, I would say that they are dream-like. (230).
Read as a single stand-alone sentence, there seems no particular motivation for the if + to infinitive clause. It seems interchangeable with a finite adverbial:
If we speak of Marc Chagel’s paintings, I would say that they are dream-like.
The context supplies an interpretation for the non-finite clause. The essay develops a theme of the freedom of personality. Several artists are invoked as symbols of free and independent thinkers. The writer shows her own freedom of thought by using the rare if + to infinitive structure rather than the canonical if + finite clause. The message is that freedom is a principle of language as well as art. By asserting her personal linguistic freedom, she thus aligns herself with the theme of the essay. The complexity of the structure resides not in its form or low-frequency but in its skillful embedding in context in order to enhance the argument.
There is no such opportunity to examine the full context of the BNC concordance lines. References are available for sources of concordance lines, e.g. the book or newspaper article, so in some cases the context is technically retrievable, but this would be an awkward task. The instant availability of context is an important advantage of manual corpora such as the one used in this study. However, qualitative analysis of even individual concordance lines affords some reward. The first example of if + to infinitive from the BNC invites an interesting contrast with its putative participial counterpart.
If to be in crisis means that the whole system is on the brink of total collapse or explosion, then we probably do not have a crisis.
If being in crisis means that the whole system is on the brink of total collapse or explosion, then we probably do not have a crisis.
There is no issue over the grammaticality of the if + participial clause. For example, unlike with the if + to infinitive, a subject is allowed: If Barclays bank being in crisis… The writer prefers the to infinitive on semantic grounds. Where there is a choice between infinitive and participial complements, the difference can be one of mood (Huddleston & Pullum, 2002: 1243). The infinitive emphasizes the potential, the participle the realized. Compare I tried to sleep (but I couldn’t drop off) with I tried sleeping (but it made no difference). The main clause then we probably do not have a crisis shows that the situation is not critical. This favours a potential interpretation, rendered by the to infinitive. The writer, a politician or senior manager one suspects, carefully stresses that the situation is under control. probably, indicating an allowance for faulty judgment, makes the tone more modest.
The second BNC example deals with art rather than industry so it is expected that the motivation for if + to infinitive will be style rather than public rhetoric.
If to do this, if to be a Whistler at best, in the art of poetry, is to reach the height of poetic expression, then Ezra and Eliot have approached it and tant pis for the rest of us.
The three to infinitive clauses are all elements of the hypotaxis. Note how they become progressively more complex formally.
to do this verb + pronoun
to be a Whistler at best verb + proper noun + adverbial
to reach the height of poetic expression verb + head noun + prepositional complement
The repetition and development create a sense of expectation and elegance to introduce the genius of Pound (dubbed by his first name Ezra for the sake of alliteration) and Eliot with a fanfare. The final clause tant pis for the rest of us immediately lowers the tone with its relative simplicity (there is not even a verb) and the humorous juxtaposition of a French expression with a colloquial phrase.
To conclude, syntactically, there is nothing demonstrably infelicitous about if + to infinitive as an adverbial. This contrasts with the patent error when if + to infinitive is a complement, * I don’t know if to go. In the learner corpus and even the BNC, qualitative analysis has proved an invaluable tool in evaluating if + to infinitive. Writers select this construction not because it is infrequent but because it is effective in a specific context of use. Frequency is a somewhat misleading concept when applied to the individual writing process because it suggests that structures are employed randomly. For example, when Biber et al (1999: 674) calculate that 80% of that-clauses are in post-predicate position, this does not mean that a writer will use a pre-predicate that-clause in every fifth that-clause. Statistics summarize facts about groups; they cannot take into account specific conditions of use. Again, this is a strong reason, to add to humanistic criteria, why corpus linguistics should operate at the level of individual texts.
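The gap between a group statistic and individual behaviour can be made concrete with a small calculation. In the Python sketch below the counts are invented purely for illustration (they are not taken from Biber et al or from the learner corpus) and the names writers and post_predicate_share are hypothetical. Three imaginary writers together yield the 80% figure, yet only one of them matches it individually.

```python
def post_predicate_share(post, pre):
    """Proportion of that-clauses in post-predicate position."""
    return post / (post + pre)

# Invented counts per writer: (post-predicate, pre-predicate) that-clauses.
writers = {
    "writer_a": (10, 0),
    "writer_b": (6, 4),
    "writer_c": (8, 2),
}

# Pool all writers to obtain the group-level proportion.
total_post = sum(post for post, _ in writers.values())
total_pre = sum(pre for _, pre in writers.values())
overall = post_predicate_share(total_post, total_pre)

# Compute the same proportion separately for each writer.
per_writer = {name: post_predicate_share(post, pre)
              for name, (post, pre) in writers.items()}

print(f"pooled share: {overall:.0%}")
for name, share in per_writer.items():
    print(f"{name}: {share:.0%}")
```

The pooled proportion says nothing about when or why any one writer chose the pre-predicate position, which is the information that only the individual text can supply.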
It has been noted how grammatical creativity often occupies the hinterland between error and innovation. It was also shown that examination of the context is essential for making an informed decision about the error/innovation distinction. Obaidul Hamid (2007) draws the same conclusion in a study investigating the corrigibility of second language learner errors. His results, compromised somewhat by the size of the data-set, suggest first that teachers’ interpretations of learners’ written mistakes are far from reliable and second that meaning can only be reconstructed by considering errors beyond the boundaries of the sentence. Similarly, grammatical creativity is identified most reliably in a context-rich environment.
The implication for teaching is the recognition that grammar is more than the sum of its parts. In an analogy with the natural sciences, Larsen-Freeman argues (1997: 143) that the complexity of systems resides in the interactions of components not the specific qualities of components. The parts of a system could be individually unremarkable; it is their combination and cooperation which is forceful. This holistic view of grammar contrasts strongly with the traditional stance that grammar is a check-list of forms that have an inherent status and potential to impart value to a text. For example, de Chazal links adverbials with ‘sophistication’ in advanced texts and advocates their use in successful academic writing.
‘The writer can select an appropriate [adverbial] form from the toolkit of structural items and generate their own language based on the given prototypical patterns, thereby meeting their semantic needs’. (2006: 41)
The reduction of grammar to a ‘toolkit’ grossly misrepresents and trivializes the nature of the grammatical resource and the writing process. If effective writing were just a matter of slotting idealized structures into sentences, it would be difficult to account for why writing is so challenging for many learners. It is doubtful that mastery of even such a diverse area as adverbials provides a magic key to the problem of language acquisition and the attainment of high-order skills. The results of this study do not support a one-to-one connection between creativity and specific syntactical constructions. Consequently, the compartmentalizing of grammar will not help learners to process grammar in context in order to be creative. Grammar should not be regarded as a body of knowledge but as a skill, what Larsen-Freeman (2003) later calls ‘grammaring’.
Grammaring is the process of applying forms to express functions that are embedded in a context which is dynamic in the sense that it is open to multiple interpretations. The way that writers/speakers can use grammar to shape a context is highly personalized.
‘Grammar is much more about our humanness than some static list of rules and exceptions suggests. Grammar allows us to choose how we present ourselves to the world, sometimes conforming to social norms yet all the while establishing our individual identities’. (Larsen-Freeman, 2003: 142)
In effect, each grammatical choice is unique for that individual in that context of use. Grammatical creativity can be viewed as the most advanced stage of grammaring. The learner needs to combine a deep knowledge of the grammatical system with awareness of how it can be manipulated within a specific context. The challenge for the teacher is to give advanced learners the skills and confidence to exploit their knowledge. Pedagogically, an approach which encourages self-expression and risk-taking represents a refreshing break from teaching grammar as a system of predetermined and inviolate rules. Humanistically, it allows learners to celebrate their individuality and creativity through the resource of grammar.
Biber, D., Johansson, S., Leech, G., Conrad, S. & Finegan, E. (1999). Longman grammar of spoken and written English. Essex: Pearson Education.
Carter, R. & McCarthy, M. (2006). Cambridge grammar of English. Cambridge: Cambridge University Press.
COBUILD advanced learner’s dictionary and wordbank (CD-ROM). (2006). Glasgow: Harper Collins Publishers.
Davies, A. (2003). The native-speaker: myth and reality. Clevedon: Multilingual Matters.
de Chazal. (2006). Symposium on academic writing. In B. Beaven (Ed.), IATEFL 2006: Harrogate conference selections (pp. 40 – 43). Canterbury: IATEFL.
Eastwood, J. (2005). Oxford learner’s grammar. Oxford: Oxford University Press.
Fleisher, N. (2006). The origin of passive get. English language and linguistics, 10 (2), 225 – 252.
Fowler, H. (2002). A dictionary of modern English usage. Oxford: Oxford University Press. (Original work published 1926).
Granger, S., Dagneaux, E. & Meunier, F. (2002). International corpus of learner English. Louvain, Belgium: UCL Presses Universitaires de Louvain.
Granger, S. (2007). International corpus of learner English [On-line]. Available: http://cecl.fltr.ucl.ac.be/Cecl-Projects/Icle/icle.htm
Hinkel, E. (2003). Simplicity without elegance: features of sentences in L1 and L2 academic texts. TESOL Quarterly, 37 (2), 275 – 301.
Huddleston, R. & Pullum, G. (2002). The Cambridge grammar of the English language. Cambridge: Cambridge University Press.
Jenkins, J. (2000). The phonology of English as an international language. Oxford: Oxford University Press.
Larsen-Freeman, D. (1997). Chaos/complexity science and second language acquisition. Applied Linguistics, 18 (2), 139 – 157.
Larsen-Freeman, D. (2003). Teaching language: from grammar to grammaring. Boston: Heinle.
Moore, J. (2005). Common mistakes at Proficiency … and how to avoid them. Cambridge: Cambridge University Press.
Obaidul Hamid, M. (2007). Identifying second language errors: how plausible are plausible reconstructions? English language teaching journal, 61 (2), 107 – 116.
Purpura, J. (2004). Assessing grammar. Cambridge: Cambridge University Press.
Rimmer, W. (2006a). Measuring grammatical complexity: the Gordian knot. Language Testing, 23 (4), 497 – 519.
Rimmer, W. (2006b). Grammaticality judgment tests: trial by error. Journal of Language and Linguistics, 5 (2), 246 – 261.
Summers, D. (2005). Longman dictionary of contemporary English. Essex: Pearson Education.
Swan, M. (2005). Practical English usage (3rd ed.). Oxford: Oxford University Press.