Humanising Language Teaching

"I thought Chunking was a city in China - until I discovered Michael Lewis"

In an excellent talk at this year's IATEFL conference, Jane Willis showed how multiword expressions that occur frequently in a corpus could be automatically identified by smart corpus-query software. She listed the 200 most common four-word clusters in Cobuild's corpus of Spoken English, and showed how many of these could be distributed among a small number of functional categories. Thus, for example, expressions like "a long time ago", "a number of years", and "towards the end of" all appeared several hundred times and could broadly be lumped under the heading of Time-markers.

Interestingly, the terms "corpus" and "chunking" were used repeatedly in her lecture without ever being defined - so she must have assumed that the audience was already fully up to speed on both points. This would certainly not have been the case even two or three years ago, and it is a good indication of how fully both corpus linguistics and the "Lexical Approach" have penetrated the consciousness of the ELT profession. And not before time.

The notion that much of our language output consists of "chunks" - or partially pre-assembled multiword units of various kinds - dates back at least to the 1930s, when A.S. Hornby and Harold Palmer first showed the importance of ready-made sequences in the way we store and process language. The advent of corpus linguistics helped to revive interest in this area, because what corpus data reveals above all is the variety of ways - whether grammatical, collocational, or idiomatic - in which words combine with each other. Most recently, the work of Michael Lewis has done a great deal to spread this message to a wider audience. So the emphasis now is on strategies for helping learners recognise these items when they see them, and at the Edinburgh IATEFL conference both Willis and Lewis stressed the need for raising students' consciousness in this area.
How can a corpus help here? Computers are famously good at counting, and corpus-query programs can rapidly identify any 2-word, 3-word, or n-word combination that occurs frequently in a text or text collection. By far the best piece of software for general use is WordSmith Tools, an inexpensive but powerful program for all forms of corpus study. I will say more about WordSmith in a later column (for details see the website of its creator, Mike Scott, at http://www.liv.ac.uk/~ms2928/wordsmit.html). But among its many functions is a tool for counting "clusters", which gives you a range of options. You can set the length of cluster you are interested in (in practice, two-word, three-word, and four-word groups are the most revealing), then either run the program on your whole corpus - which is what Jane Willis did for her IATEFL session - or focus on just one particular word and the way it combines.

I tried this on the word "way" in a small subset of the British National Corpus (BNC), and the four-word chunks this identified included things like "the way in which", "in the same way", and "on the way to". But the eighth most common cluster here was the expression "in the way of", and a further search of the corpus showed almost 1000 of these in the BNC. This phrase, in other words, is roughly in the same frequency band as words such as "amateur" or "bicycle", which suggests it may be worth learning about. Here are some of the concordance lines generated by this search:

xpect to be available in the way of food at Christ [source: BNC]

What we have here is a common device for specifying the particular type of thing you are talking about - used in much the same way as phrases like "when it comes to x" or "as far as x is concerned". Intuition alone is unlikely to lead us to this sort of expression, but corpus software does it with effortless efficiency.
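For readers who want to see what cluster counting involves, here is a minimal sketch in Python. It is not how WordSmith Tools works internally (the column does not describe that); it only illustrates the general idea of counting every n-word sequence, optionally keeping those that contain a chosen word, and then printing the matches in context. The function names (`tokenize`, `count_clusters`, `concordance`) and the three sample sentences are invented for illustration and are not taken from the BNC.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def count_clusters(tokens, n, node=None):
    """Count every n-word cluster; if `node` is given, keep only clusters containing it."""
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        cluster = tokens[i:i + n]
        if node is None or node in cluster:
            counts[" ".join(cluster)] += 1
    return counts


def concordance(tokens, phrase, width=4):
    """Return keyword-in-context lines: up to `width` words either side of each match."""
    target = phrase.lower().split()
    k = len(target)
    lines = []
    for i in range(len(tokens) - k + 1):
        if tokens[i:i + k] == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + k:i + k + width])
            lines.append(f"{left} [{phrase}] {right}".strip())
    return lines


# Invented sample text standing in for a real corpus (assumption, not BNC data).
sample = (
    "There was not much in the way of food. "
    "She asked what we had in the way of evidence. "
    "He stopped at a shop on the way to work."
)

tokens = tokenize(sample)

# Four-word clusters that include the word "way", most frequent first.
clusters = count_clusters(tokens, 4, node="way")
for cluster, freq in clusters.most_common(3):
    print(freq, cluster)

# Concordance lines for the cluster of interest.
for line in concordance(tokens, "in the way of"):
    print(line)
```

Note that this sketch ignores sentence boundaries, so some clusters run across a full stop; a real corpus tool would normally offer a way to prevent that. On a corpus of any size, the same approach surfaces high-frequency sequences that intuition alone tends to miss.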
An interesting question here is where the border lies between, on one side, multiword expressions that form a vital part of "native-like" fluency and, on the other side, worn-out cliches which students would be better off avoiding. We will have to save this for another time.

Michael Rundell is a lexicographer, and has been using corpora since the early 1980s. As Managing Editor of Longman Dictionaries for ten years (1984-94) he edited the Longman Dictionary of Contemporary English (1987, 1995) and the Longman Language Activator (1993). He has been involved in the design and development of corpus materials of various types, including the BNC and the Longman Learner Corpus. He is now a freelance consultant, and (with the lexicographer Sue Atkins) runs the "Lexicography MasterClass", providing training courses in all aspects of dictionary development and dictionary use (see http://ds.dial.pipex.com/town/lane/ae345).