
Humanising Language Teaching
Year 1; Issue 8; December 1999

Ideas from the Corpora


English Learner Corpora And The Polish Learner

Przemysław Kaszubski
School of English
Adam Mickiewicz University
Poznań


The current issue of HLT being in the hands of a Polish editor, I could not contain my desire to focus on issues that address the interests of Polish learners of English, who, however, should be regarded here as no more than temporary representatives of the broader EFL learner community.

My major research interest is in corpora of English learner writing and in ways to exploit them to enhance writing instruction. Computerised learner corpora started to make it big in the early nineties, following the "corpus revolution" and the rise of the lexical approach. In 1994 Granger published a seminal paper in which she claimed that "one should not exaggerate the impact of native corpora on foreign language teaching [and that] having access to comprehensive frequency lists may well help course designers compile better lexical syllabi, but it will not give them access to learners' actual lexical problems" (Granger 1994: 25). Learner corpora were intended to fill that gap.

Compared with standard corpora of native English, interlanguage corpora are usually tiny, running in the range of hundreds of thousands of words, rather than hundreds of millions. A notable exception is the commercially developed 10-million-word Longman Learner Corpus (LLC), which has underpinned some important lexicographic releases, such as The Longman Language Activator (1993), or the Longman Dictionary of Contemporary English (3rd ed., 1995). Smaller learner corpora compiled by linguists tend to be more tightly controlled than the LLC and to document specific kinds of interlanguage (focus on one type of text and task, one proficiency level etc). It is this homogeneity, as well as their typical L1 orientation, which makes such resources especially valuable in research.

One of the major international collections built on strict sampling principles is the International Corpus of Learner English (ICLE), which contains argumentative essays acquired from learners in more than a dozen different EFL countries in Europe and beyond. Although the ICLE corpus is not yet available to the public, research on it has been carried out for years. On the strength of my role in the compilation of the Polish sub-corpus of ICLE (PICLE), I have been able to access both the learner and the native control data that are part of ICLE. The illustrations given below have largely been taken from that resource.

PICLE is not the only Polish EFL corpus in existence. Besides the Polish part of the Longman Learner Corpus, significant contributions to learner corpus research in Poland have been made at the University of Łódź and other academic centres. The reader wishing to find out more about PICLE, ICLE and learner corpora in Poland and elsewhere is advised to visit my web page: http://main.amu.edu.pl/~przemka.

Having computerised resources at hand is a major improvement on the situation that error and interlanguage analysts faced in the 1970s and 1980s. Even today, however, as attempts to use spelling and grammar checking software can easily demonstrate, the automation of error detection continues to challenge computers. To ensure the right "hits" we still rely on intuition, observation, published reports and/or chance. Fortunately, computers can be of help in speeding up the retrieval of data. Below are two examples of PICLE-based findings inspired by my personal classroom experience. The first is a misuse of the phrase "possibility to do sth", which Polish EFL writers often confuse with "chance/opportunity to do sth":

... the adoptive parents have influenced their child, without giving him any choice or possibility to "try out" other options.

For this reason we should reread a story because it gives us possibility to look at the literary work from a perspective.

Another typical feature of advanced written "Polglish" is the use of the idiomatic "in case of" (normally reserved for "emergency" contexts such as "damage" or "fire") instead of the correct "in the case of":

... the older they are the more difficult it is to adjust to another person. Especially, in case of women where their attractiveness decreases with their age.

Therefore doctors call marijuana "yeasts for brain". In case of many, it awakes creativness [sic!] in various fields.

Both cases can be attributed to word-for-word transfer from L1. "Give/have (a/the) possibility to do sth" (meaning "enable/can") seems motivated by the Polish "dać/mieć możliwość zrobienia czegoś", with the noun "possibility" inserted on account of its being the primary equivalent to the Polish word "możliwość". "In case of" and "in the case of" are apparently blended in the mind of a typical Polish user, who likely associates both with the phrase "w przypadku czegoś", which can serve both meanings and contexts in Polish.
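The retrieval step mentioned above can be illustrated with a minimal keyword-in-context (KWIC) sketch. It is not the tool used for PICLE: the function name `kwic`, the two search patterns and the toy "corpus" (two learner sentences quoted earlier) are assumptions made purely for illustration.

```python
import re

def kwic(text, pattern, width=40):
    """Return keyword-in-context lines for every regex match in text."""
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the matches line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Two learner sentences quoted in the article serve as a toy "corpus".
sample = (
    "For this reason we should reread a story because it gives us "
    "possibility to look at the literary work from a perspective. "
    "In case of many, it awakes creativness in various fields."
)

# Hypothetical search patterns for the two constructions discussed above.
POSSIBILITY_TO = r"\bpossibility to \w+"
IN_CASE_OF = r"\bin case of\b"  # does not match "in the case of"

for line in kwic(sample, POSSIBILITY_TO) + kwic(sample, IN_CASE_OF):
    print(line)
```

A search of this kind only retrieves candidate lines; deciding whether a given hit is actually a misuse still rests with the analyst.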

A learner corpus stored on computer is probably at its best when pitted against other, similarly encoded corpora, with a view to finding differences and similarities in the relative frequencies of words, collocations, idioms, grammatical structures, discourse features etc., and in their distribution in text. Statistically confirmed disparities between native and non-native use may indicate a characteristic foreign 'accent' that is present in writing too. Some of the global findings in this area point to learners' underuse of function words, overuse of common vocabulary, underuse of restricted collocations and overuse of certain sentence patterns. Specific quantitative findings often refer to learners sharing an L1. The phrase "in (the) case of" features very prominently in Polish EFL writing in terms of frequency, and the reason may be precisely the presence of the corresponding Polish equivalent, which, it should be noted, typifies written rather than spoken Polish.
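As a rough sketch of such a comparison, the snippet below normalises raw counts to a rate per 100,000 running words and computes the log-likelihood ratio (G2), one test commonly used for comparing an item's frequency across two corpora. The article does not say which test was applied to ICLE data, so the choice of statistic is an assumption, and the counts and corpus sizes are invented for illustration.

```python
import math

def per_100k(count, corpus_size):
    """Normalise a raw count to occurrences per 100,000 running words."""
    return count * 100_000 / corpus_size

def log_likelihood(count_a, count_b, size_a, size_b):
    """Log-likelihood ratio (G2) for one item observed in two corpora."""
    total = count_a + count_b
    # Expected counts if the item were equally frequent in both corpora.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2

# Invented figures, for illustration only (not taken from PICLE or any real corpus).
learner_count, learner_size = 50, 200_000
native_count, native_size = 120, 300_000

print(per_100k(learner_count, learner_size))  # 25.0
print(per_100k(native_count, native_size))    # 40.0

g2 = log_likelihood(learner_count, native_count, learner_size, native_size)
# 3.84 is the chi-square critical value for p < 0.05 at 1 degree of freedom.
print(g2 > 3.84)
```

Normalising first makes corpora of different sizes comparable; the test then indicates whether the gap is larger than chance alone would be likely to produce.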

Although the commonest English lexical words are said to be overused by EFL learners, closer inspection of meaning, wordforms and phraseology sometimes qualifies such generalisations. The verb (lemma) "MAKE", for instance, appears to be "liked" by Polish learners most in its canonical form "make" (i.e. in the infinitive or the present tense) but not at all in the form "made". A disambiguation of the latter (separation of past tense instances, identification of phrasal and other multiple-word uses, semantico-syntactic classification) reveals a chasm between non-native and native uses of the passive mono-transitive MAKE:

Frequencies per 100,000 running words

| | Polish intermediate EFL (Polish LLC) | Polish advanced EFL (PICLE) | native English (learners) | native English (professional writers) |
|---|---|---|---|---|
| Lemma MAKE | 267 | 276 | 329 | 249 |
| Wordform "made" | 37 | 31 | 101 | 89 |
| Total passive uses of "made" | 17 | 19 | 59 | 54 |
| (BE) made (monotransitive, e.g. An offer was made.) | 11 | 6 | 31 | 34 |

These numbers are not as straightforwardly connected to L1 transfer as the two phrases discussed earlier. However, while we might plan to cast a wider net to see whether Polish learners tend to avoid the passive voice in general, the finding on "made" has potential pedagogical value, as it refers to an individual, tangible item of the lexicon, which a student should easily be able to refer to and adjust in his/her productive use if necessary. One possible way of investigating the case further might be to study the associated restricted collocates:

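x = instance(s) of collocate attested in corpus

| | Polish intermediate EFL (Polish LLC) | Polish advanced EFL (PICLE) | native English (learners) | native English (professionals) |
|---|---|---|---|---|
| Different restricted noun collocates: | 6 | 3 | 16 | 16 |
| advance(s) | | | x | |
| application | | | | x |
| attempt(s) | | | x | x |
| changes | | x | x | |
| choice | | | | x |
| claim | | | x | |
| comment(s) | x | | | x |
| comparison(s) | x | | x | |
| decision(s) | | x | x | |
| difference | | | x | |
| discovery (-ries) | | | | x |
| effort | x | | | |
| errors | | | x | |
| examination | | | | x |
| films | | x | | |
| judgments | | | x | |
| lists | x | | | |
| mistake | | | x | |
| money | | | x | |
| notes | | | | x |
| observation | | | x | |
| point | | | | x |
| progress | x | | x | |
| recording(s) | | | | x |
| remark | x | | | |
| sacrifices | | | x | |
| sketch | | | | x |
| sounds | | | | x |
| start | | | | x |
| statement | | | | x |
| studies | | | x | |
| suggestions | | | | x |
| use(s) | | | | x |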

Although the frequencies of the above collocations (even if we group the data) are too low to guarantee accuracy, and some appear to be topic-prone ("films", "money"), what transpires from the chart is that formal argumentative native English writing features a wide range of passive uses of "made" co-occurring with nouns such as those listed. My point is that informing the Polish learner explicitly about this stylistic tendency and providing appropriate exemplification should be beneficial for his/her development as an EFL writer. I believe the argument holds equally well for any other learner variety of English.
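For readers who wish to try a similar count on their own texts, the following is a deliberately crude sketch of how candidate noun collocates of passive "made" might be pulled out with a regular expression. It is not the procedure behind the chart above: the pattern, the function name `made_collocates` and all sample sentences except "An offer was made." are assumptions, and none of the disambiguation steps described earlier (nor any part-of-speech tagging) is attempted.

```python
import re
from collections import Counter

# Crude pattern: a word, an optional auxiliary, an optional "not",
# a form of BE, then "made". The captured word is only a candidate
# collocate; it is not checked for being a noun.
PASSIVE_MADE = re.compile(
    r"\b([a-z]+)\s+"
    r"(?:(?:has|have|had|will|can|could|should|must|may|might|would)\s+)?"
    r"(?:not\s+)?"
    r"(?:is|are|was|were|be|been|being)\s+made\b",
    re.IGNORECASE,
)

def made_collocates(text):
    """Count the words standing immediately before a passive '... BE made'."""
    return Counter(m.group(1).lower() for m in PASSIVE_MADE.finditer(text))

# Toy input: the first sentence comes from the article, the rest are invented.
sample = (
    "An offer was made. Several attempts were made to change the law. "
    "No progress has been made."
)

collocates = made_collocates(sample)
print(sorted(collocates))  # the distinct candidate collocates
print(len(collocates))     # the "range" of collocates in this text
```

The number of distinct items corresponds to the first row of the chart; any real use would need the output checked by hand, since the pattern will both over- and under-match.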

+ + +

More examples of state-of-the-art learner corpus research, at various levels, can be found in Granger (ed. 1998).


References:

Benson, M., Benson, E. and Ilson, R. (1986) The BBI combinatory dictionary of English: a guide to word combinations. Amsterdam-Philadelphia: John Benjamins Publishing Company.

Granger, S. (1994) 'The learner corpus: a revolution in applied linguistics'. English Today 39, Vol.10, Nr 3: 25-29.

Granger, S. (ed.) (1998) Learner English on computer. London: Longman.

+ + +

Przemysław Kaszubski, M.A., assistant lecturer at the School of English, Adam Mickiewicz University, Poznań, Poland, has been a teacher of English writing to 1st- and 2nd-year Polish students of English for over 6 years. For the last 5 years, he has held the post of co-ordinator of the writing programmes in the School. Since 1995, he has been compiling and researching PICLE, the Polish sub-corpus of the International Corpus of Learner English; information about these corpus activities can be obtained from his Web page: http://main.amu.edu.pl/~przemka. Apart from publishing on learner corpora, he has also been involved in lexicographical projects, such as the Collins Polish-English English-Polish Dictionary and a recently co-authored dictionary of Polish-English Idioms.

