In association with Pilgrims Limited
Humanising Language Teaching
IDEAS FROM THE CORPORA

Creativity and corpus linguistics: Using a Corpus to Write Stories

Ian Michael Robinson, Italy

Ian Robinson is a researcher at the University of Calabria in Italy. His main work at the university is teaching English for students involved in Social Work degree courses and his interests involve student autonomy, intercultural issues as well as corpus linguistics.
E-mail: ian.robinson@unical.it

Menu

Background
Results
Story writing
Final thoughts
References
Appendix

Background

This article is about a presentation given at the first international conference of Creativity and Innovation, held at the University of Calabria in Italy and organised by Professor Carmen Argondizzo. When I first heard about this conference I was attending the corpus linguistics summer school at Aston University in Birmingham, England, and my idea was to try to combine corpus linguistics with creativity. At more or less the same time I had been listening to a BBC podcast about the Brothers Grimm, and I thought it might be interesting to bring all these elements together. I wanted to see what results could be achieved by giving students the key words or phrases from fairy tales and having them write their own story. This was a way of combining the innovation of corpus linguistics with the creativity of writing stories, and so would fit nicely into the theme of the conference. The idea that creativity can occur even when students are given the constraint of using certain words is supported by Garr Reynolds, one of the most creative talents in the field of presentations, when he writes that ‘Constraints and limitations are wonderful allies. They lead to enhanced creativity and ingenious solutions’ (2010, p16).

When investigating language it is very difficult to get trustworthy data from a very limited source. Just because the verb ‘listen’ tends to be followed by ‘to’ in one specific text does not necessarily mean that this can be taken as a general rule. This is where a corpus comes in useful. A corpus is a group of texts collected together to aid analysis of the language. For this analysis, the larger the collection is, the more reliable the findings are likely to be. The trouble is that it can be laborious to analyse this data by hand. However, with the advent of modern computers the amount of data that can be stored and analysed has grown enormously. Nowadays a corpus can be described as ‘A collection of texts stored in an electronic database’ (Baker, Hardie and McEnery 2006, p48). A corpus can be very specific, such as the works of one particular author, or very general, such as the British National Corpus, which is available online and has a total of over 100,000,000 words. Other corpora have grown even beyond this and can be broken down into distinct fields such as informal written British English or academic spoken American English. One of the largest corpora is ukWaC, an ever-growing corpus that has over two billion words in it.

Jacob and Wilhelm Grimm, born in 1785 and 1786 respectively, collected German folk tales, which they published in a book called Kinder- und Hausmärchen, known in English-speaking countries as Grimm’s Fairy Tales. There are 209 stories, available both in book form and online; for this study I used the Margaret Hunt translation online. This collection makes a good choice for a corpus as it can be said to form a complete corpus on one specific area, just as the complete writings of any writer would.

The corpus that came from the Brothers Grimm amounts to 281,934 words, so it is not one of the biggest corpora. Having collected the data, i.e. downloaded it from the internet, I needed a programme with which to analyse it. This particular work was carried out using WordSmith Tools, which I found to be generally easy to use. For some reason the transfer from online webpages to a Word document was not as straightforward as I had hoped. Many of the line endings in the webpage paragraphs were treated as paragraph endings in the Word document and so I had to go through and reconnect them. Some words did not come through clearly and had to be retyped. This was rather time-consuming and might just be due to my limited knowledge of computers.

The various programmes available for corpus analysis can do some amazing things. The most commonly used features are probably word frequency and word collocation, and it is necessary to choose the programme that will allow you to do the analysis that you want. For this experiment I wanted to get the most frequent words found in the text and the words that are commonly used together. In his book The Lexical Approach, Lewis (1993) put forward the idea that people do not construct sentences by putting individual words together one at a time, but rather learn and use language in set chunks. EFL students do not necessarily understand the construction of the phrase ‘How old are you?’ when they first learn it, but they learn it as one chunk of language and they know what it means. I wanted to find chunks of language that were used in fairy tales so that these could be used in writing other tales. WordSmith allowed me to do this, although instead of using the word ‘chunks’ it refers to them as ‘clusters’.

Results

The word frequency list from the Grimm tales gave the words in the following table as the top ten words used in the tales:

1 The
2 And
3 To
4 He
5 A
6 Was
7 Of
8 It
9 You
10 In

Table One: Grimm tales top ten words.
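A frequency list of this kind can also be produced with a few lines of code. The following is a minimal sketch in Python, not the WordSmith procedure used in this study; the tokenising rule (lowercase words, with internal apostrophes kept) and the sample sentence are assumptions made purely for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its words (internal apostrophes are kept)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def top_words(text, n=10):
    """Return the n most frequent words as (word, count) pairs."""
    return Counter(tokenize(text)).most_common(n)


# Illustrative sentence only, not taken from the Grimm corpus.
sample = "The king saw the forest and the king went into the forest."
print(top_words(sample, 3))
```

Run on the full text of the tales instead of the sample sentence, `top_words` would give a ranked list comparable to Table One, although the exact counts depend on how words are split.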

This is probably not a great surprise as these are among the most commonly used words in the English language and so almost any corpus would give similar results. It was therefore necessary to eliminate the most common words from the list. This would give the keywords. A keyword is ‘A word which appears in a corpus statistically significantly more frequently than would be expected by chance when compared to a corpus which is larger’ (Baker, Hardie and McEnery 2006, p97).

I took the top 100 words from the Grimm tales and removed any word that appeared in the BNC top 300 words, and then also any that appeared in the ukWaC top 300. This should leave the keywords that were specific to the tales. This time the top 12 were the words in the next table:

1 King
2 Saw
3 Once
4 Himself
5 Father
6 Daughter
7 Answered
8 Cried
9 Let
10 Forest
11 Son
12 Wife

Table Two: Grimm tales top twelve keywords.
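The filtering step described above, removing from the top of the Grimm list any word that also appears near the top of a reference corpus, can be sketched as follows. This is a simple subtraction of word lists, not a statistical keyness test, and the short lists in the example are hypothetical stand-ins, not the real BNC or ukWaC frequency lists.

```python
def candidate_keywords(corpus_ranked, reference_lists, top_n=100):
    """Keep the top_n corpus words that appear in none of the reference lists."""
    common = set()
    for ref in reference_lists:
        common.update(word.lower() for word in ref)
    return [w for w in corpus_ranked[:top_n] if w.lower() not in common]


# Hypothetical toy data: words ranked by frequency, most frequent first.
grimm_ranked = ["the", "and", "to", "he", "king", "was", "saw", "once"]
bnc_top = ["the", "and", "to", "he", "was", "of"]
ukwac_top = ["the", "of", "and", "to", "in"]

print(candidate_keywords(grimm_ranked, [bnc_top, ukwac_top]))
```

With real data, `corpus_ranked` would be the full ranked word list from the tales and each reference list would hold the 300 most frequent words of the reference corpus.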

I was also interested in the language clusters, and WordSmith is able to give 2-, 3-, 4-, 5-, 6- and 7-word clusters as well as the individual word list. I looked for clusters that appeared at least ten times in the corpus.

Looking for the different chunks the programme came up with the following numbers:

3 words – 1576
4 words – 54
5 words – 22
6 words – 5
7 words – 1
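Counting clusters with a minimum frequency can be sketched in Python as below. This is an illustration under stated assumptions, not a reproduction of WordSmith: here a cluster is any run of n consecutive words, runs are allowed to cross sentence boundaries, and the sample text is invented. Different settings would give different totals from those reported above.

```python
import re
from collections import Counter


def clusters(text, n, min_freq=10):
    """Return (cluster, count) pairs for n-word runs seen at least min_freq times,
    most frequent first."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    return [(c, k) for c, k in counts.most_common() if k >= min_freq]


# Invented sample text; the threshold is lowered to 3 to suit its small size.
sample = "There was once a king. There was once a queen. There was once a tailor."
print(clusters(sample, 3, min_freq=3))
```

For the search described in this article the call would use `min_freq=10` and be repeated for each value of n from 3 to 7.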

The longest cluster had seven words in it and was ‘there was once upon a time a’. It is obvious that this seven word cluster also includes other strings of words such as the two 6 word clusters ‘there was once upon a time’ and ‘was once upon a time a’. It is also therefore obvious that it contains 5 word, 4 word and 3 word clusters. These clusters which are included in the longer cluster were not used later. This same process was used to remove all the 5 word clusters that appeared in the corresponding 6 word clusters and so on. This was decided just as a way to limit the amount of input and avoid unnecessary repetition.
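The removal of shorter clusters that sit inside longer ones can be expressed in a few lines of Python. The sketch below is one possible reading of the process, in which a cluster is dropped if its words appear as a consecutive run inside any longer cluster; the function names and the small example dictionary are assumptions for illustration.

```python
def contains(longer, shorter):
    """True if the words of `shorter` occur as a consecutive run inside `longer`."""
    return f" {shorter} " in f" {longer} "


def drop_contained(clusters_by_length):
    """Remove every cluster that occurs inside any longer cluster.

    `clusters_by_length` maps a cluster length n to a list of n-word strings.
    """
    result = {}
    for n, items in clusters_by_length.items():
        longer = [c for m, cs in clusters_by_length.items() if m > n for c in cs]
        result[n] = [c for c in items if not any(contains(big, c) for big in longer)]
    return result


# Small illustrative selection of clusters, keyed by length.
found = {
    7: ["there was once upon a time a"],
    6: ["there was once upon a time", "was once upon a time a",
        "it was not long before the"],
    5: ["once upon a time a", "it came to pass that"],
}
print(drop_contained(found))
```

Padding both strings with spaces in `contains` makes the comparison work on whole words, so that a cluster is not matched against part of a longer word.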

It was not practical or useful to give students all the clusters that were found, as they would have had to read through all 1576 of the 3-word clusters, and so a cut-off limit was chosen. WordSmith can give the clusters ordered by how frequently they appear in the tales, so the most frequent clusters appeared first and those that were only used ten times (the minimum limit that I set in my search) came last. So as not to overload the potential creative writers, I wanted to give only about the top twenty examples of each type of cluster. For the 4-word clusters this cut-off was slightly higher, as that was where the boundary between one frequency and the next fell.

Eventually a list of keywords and of 3-, 4-, 5-, 6- and 7-word clusters was compiled. This can be seen in the appendix.

Story writing

The second part of this work involved students using this list to create their own story. For this I asked for the help of students I was working with on the Social Work degree course and students from various degree courses who had enrolled on an English course at the Language Centre, all at the University of Calabria. Their level was, in general, A2/B1. The students were given the word lists and asked to write a story, which they could do individually, in pairs or in small groups. They were given neither a specific theme nor a word limit.

Along with the word list I had attached another paper briefly introducing myself and explaining what I wanted them to do:

I am Ian Robinson and I am a researcher in Political Science. I would like your help in a research project that I am doing. I would like you to write a short story in English. On the other page there are some lists of words and phrases, please try to use as many of these as possible in your story. I would appreciate it if you could either underline the words and phrases that you use or highlight them in some other way. When you have finished your story, could you please send it to me via email.

There was also a short part for them to sign which would allow me to use their work in my research.

I received 28 stories from the students, some written individually and some in pairs or groups; some of these arrived via email and others were handwritten and delivered by hand. These creative works averaged just over 156 words and almost 30% of the writing was made up of words on the prepared list. The average number of single words that were used was just over 16 and the average word cluster length was between 4 and 5 words.
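A figure such as the share of a story made up of listed words could be estimated automatically. The sketch below is an assumption about how such a figure might be calculated in Python, not a description of how the numbers above were obtained: each word of the story counts as covered if it is a listed keyword or falls inside an occurrence of a listed cluster. The story, keywords and clusters in the example are a small invented selection.

```python
import re


def list_coverage(story, keywords, clusters):
    """Return the fraction of story words that come from the keyword or cluster lists."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", story.lower())
    covered = [False] * len(tokens)
    # Mark every word that lies inside an occurrence of a listed cluster.
    for cluster in clusters:
        words = cluster.lower().split()
        k = len(words)
        for i in range(len(tokens) - k + 1):
            if tokens[i:i + k] == words:
                covered[i:i + k] = [True] * k
    # Mark single keywords wherever they occur.
    wanted = {w.lower() for w in keywords}
    for i, token in enumerate(tokens):
        if token in wanted:
            covered[i] = True
    return sum(covered) / len(tokens) if tokens else 0.0


story = "There was once a king who lived in the forest."
share = list_coverage(story, ["king", "forest"], ["there was once a", "in the forest"])
print(f"{share:.0%}")
```

Averaging this value over all the stories received would give an overall percentage, which could then be checked against the words the students themselves underlined.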

The example shown below was chosen not because it was the best or used the most words and clusters but because it happened to be the first that was sent to me.

The underlined words are the single keywords used from the list, while the words in italics are the clusters that were used.

There was once a king who lived in a beautiful castle in the forest. One day it come to pass that knocked at the door an old woman; she said to him that she was his mother, thought by all to be died.
In the middle of conversation the king’s daughter arrived. When the girl was in front of the old woman she asked her if she was a ghost, but she (old woman) said that in the midst of the forest there was a maiden that had taken care of her because she had been killed by wolf. The maiden give her to drink the water of life for the three time and so she revived.
When the king’s mother finished to tell her story toke in her arms the son and the granddaughter.
Ever since they lived happily.

Final thoughts

This project produced some creative and entertaining stories, but a result that is difficult to show here is the joy that the students demonstrated in writing these stories. I had tried to set a time limit but the students wanted to continue and were happy to do so. They were not told what type of story to write, but all of the stories could be categorised as fairy tales. This suggests that a corpus generated from a specific genre, in this case tales, can help students write in that genre and even guide them towards it. Feedback after the presentation at the conference was generally positive and people seemed pleased to see a practical use for corpus linguistics, one that could have a direct application in classrooms. Often corpus linguistics is seen as something that is only done by people worried about whether a certain verb can be followed by a specific adverb or preposition. This project shows that corpus linguistics is something that we can all do, and even have fun doing.

References

Baker, P., Hardie, A. and McEnery, T. (2006) A Glossary of Corpus Linguistics, Edinburgh University Press; Edinburgh

Lewis, M. (1993) The Lexical Approach, LTP

Reynolds, G. (2010) Presentation Zen Design, New Riders; Berkeley

Tales Collected by the Brothers Grimm: http://myweb.dal.ca/barkerb/fairies/grimm/ (accessed 28/08/2009)

Appendix

A STORY

Use as many of these words and phrases below as well as other words to write a story

king answered Mother morning sat eat
saw cried Door herself lay brother
once let Fell poor gold heart
himself forest Together tree tailor everything
father son Beautiful maiden whole
daughter wife Brought girl golden

Clusters

7-word clusters

there was once upon a time a

6-word clusters

it was not long before the
did not know what to do
and when he came to the

5-word clusters

it came to pass that
in the middle of the
when they came to the
went into the forest and
went to the king and
if I could but shudder
there was once a poor
and at the same time
and was just going to
have I not reason to
knocked at the door and
when he got to the

4-word clusters

for a long time
and when he had
in front of the
there was once a
I will give you
and as he was
and when she had
I do not know
if you do not
out of the window
as soon as he
he was forced to
and was about to
and said to the
he said to the
in the midst of
for the third time
said the old woman
that he could not
that he was to
and said if you
and said to him
and went to the
as soon as the
he went into the
he went to the
but he did not
by the hand and
on the ground and
he did not know
the king said to
the water of life
at the same time
they came to a

3-word clusters

out of the
the king's daughter
the king's son
and when the
that he had
then said the
there was a
to him and
and said I
one of them
if you will
and in the
and when they
when he was
and began to
in the forest
and I will
and then he
him and said
in the evening
when it was

    © HLT Magazine and Pilgrims