Creativity and Corpus Linguistics: Using a Corpus to Write Stories
Ian Michael Robinson, Italy
Ian Robinson is a researcher at the University of Calabria in Italy. His main work at the university is teaching English to students on Social Work degree courses, and his interests include student autonomy, intercultural issues and corpus linguistics.
E-mail: ian.robinson@unical.it
This article is based on a presentation given at the first international conference on Creativity and Innovation, held at the University of Calabria in Italy and organised by Professor Carmen Argondizzo. When I first heard about the conference I was attending the corpus linguistics summer school at Aston University in Birmingham, England, and my idea was to try to combine corpus linguistics with creativity. At about the same time I had been listening to a BBC podcast about the Brothers Grimm, and I thought it might be interesting to bring all these elements together. I wanted to see what results could be achieved by giving students the key words and phrases from fairy tales and having them write their own stories. This combined the innovation of corpus linguistics with the creativity of story writing, and so fitted the theme of the conference well. The idea that creativity can flourish even when students are constrained to use certain words is supported by Garr Reynolds, one of the most creative figures in the field of presentations, who writes that ‘Constraints and limitations are wonderful allies. They lead to enhanced creativity and ingenious solutions’ (2010, p16).
When investigating language, it is very difficult to get trustworthy data from a very limited source. Just because the verb ‘listen’ tends to be followed by ‘to’ in one particular text does not mean that this is a general rule. This is where a corpus becomes useful. A corpus is a collection of texts brought together to aid analysis of the language, and in general the larger the collection, the more reliable the findings. The problem is that analysing so much data by hand is extremely laborious. However, with the advent of modern computers, the amount of data that can be stored and analysed has grown enormously. Nowadays a corpus can be described as ‘A collection of texts stored in an electronic database’ (Baker, Hardie and McEnery 2006, p48). A corpus can be very specific, such as the works of one particular author, or very general, such as the British National Corpus (BNC), which is available online and contains over 100 million words. Other corpora are larger still, and some can be broken down into distinct subcorpora such as informal written British English or academic spoken American English. One of the largest is the UKwac, a corpus of over two billion words collected from the web.
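To make the ‘listen’ example concrete, the sketch below counts which word immediately follows a given node word across a whole text, the simplest form of collocation count. It is a minimal Python illustration, assuming plain-text input and a crude lowercase tokeniser; the function names are hypothetical, and this is not how Wordsmith works internally.

```python
import re
from collections import Counter


def tokenise(text):
    # Lowercase, then keep runs of letters and apostrophes (a deliberately crude tokeniser).
    return re.findall(r"[a-z']+", text.lower())


def right_collocates(text, node):
    """Count the word that immediately follows each occurrence of `node`."""
    tokens = tokenise(text)
    return Counter(nxt for word, nxt in zip(tokens, tokens[1:]) if word == node)


sample = "Listen to me. Listen to the wind, and listen carefully."
print(right_collocates(sample, "listen").most_common())
# [('to', 2), ('carefully', 1)]
```

Run over a single short text, such counts prove little; the same function applied to a large corpus is what turns an impression into evidence.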
Jacob and Wilhelm Grimm, born in 1785 and 1786 respectively, collected German folk tales and published them as Kinder- und Hausmärchen, known in English-speaking countries as Grimm’s Fairy Tales. The collection contains 209 stories, available both in print and online; for this study I used Margaret Hunt’s translation, which is available online. The tales make a good basis for a corpus because they form a complete body of texts in one specific area, just as the complete writings of a single author would.
The corpus drawn from the Brothers Grimm amounts to 281,934 words, which makes it small compared with the general corpora mentioned above. Once the data had been collected (that is, downloaded from the internet), I needed a programme with which to analyse it. This work was carried out using Wordsmith, which I found generally easy to use. The transfer from the web pages to a Word document, however, was not as straightforward as I had hoped. Many of the line breaks within the web-page paragraphs were treated as paragraph endings in the Word document, so I had to go through and rejoin them, and some words did not come through clearly and had to be retyped. This was rather time-consuming, although it may simply have been due to my limited knowledge of computers.
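The line-rejoining step could also be scripted rather than done by hand. The sketch below is a minimal Python illustration, assuming the text has been saved as plain text, that genuine paragraphs are separated by blank lines and that every other line break is spurious; `rejoin_lines` is a hypothetical helper, not part of Wordsmith or Word.

```python
import re


def rejoin_lines(text):
    """Rejoin hard-wrapped lines, treating only blank lines as real paragraph breaks."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Within each paragraph, collapse all whitespace (including stray newlines) to single spaces.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


raw = "There was once a\nking who had\nthree sons.\n\nThe eldest\nwas clever."
print(rejoin_lines(raw))
```

If the real paragraph breaks were lost as well, no script can recover them reliably, and some manual checking would still be needed.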
The various programmes available for corpus analysis can carry out a wide range of analyses. The most commonly used features are probably word frequency and collocation, and it is necessary to choose a programme that supports the analysis you want to do. For this experiment I wanted the most frequent words in the text and the words that are commonly used together. In his book The Lexical Approach, Lewis (1993) argued that people do not construct sentences by putting individual words together one at a time, but rather learn language in set chunks. EFL students do not necessarily understand the construction of the phrase ‘How old are you?’ when they first learn it, but they learn it as one chunk and know what it means. I wanted to find the chunks of language used in fairy tales so that they could be used to write other tales. Wordsmith allowed me to do this, although it calls them ‘clusters’ rather than ‘chunks’.
The word frequency list for the Grimm tales gave the following top ten words:
Rank | Word
1 | The
2 | And
3 | To
4 | He
5 | A
6 | Was
7 | Of
8 | It
9 | You
10 | In
Table One: Grimm tales top ten words.
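A frequency list like Table One can be approximated in a few lines. This Python sketch assumes plain-text input and a simple lowercase tokeniser, so its counts may differ from Wordsmith's; `frequency_list` is a hypothetical name.

```python
import re
from collections import Counter


def frequency_list(text, n=10):
    """Return the n most frequent word forms as (word, count) pairs."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(n)


print(frequency_list("The king and the queen and the son.", n=2))
# [('the', 3), ('and', 2)]
```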
This is not surprising, as these are among the most common words in English and almost any corpus would give similar results. It was therefore necessary to remove the most common general words from the list in order to arrive at the keywords. A keyword is ‘A word which appears in a corpus statistically significantly more frequently than would be expected by chance when compared to a corpus which is larger’ (Baker, Hardie and McEnery 2006, p97).
I took the 100 most frequent words in the Grimm tales and removed any word that also appeared among the 300 most frequent words in the BNC, and then any that appeared among the 300 most frequent words in the UKwac. What remained should be the words characteristic of the tales. This time the top twelve were the words in the following table:
Rank | Keyword
1 | King
2 | Saw
3 | Once
4 | Himself
5 | Father
6 | Daughter
7 | Answered
8 | Cried
9 | Let
10 | Forest
11 | Son
12 | Wife
Table Two: Grimm tales top twelve keywords.
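The subtraction step used to produce Table Two can be sketched as a simple filter. Note that this is the list-comparison procedure described above, not the statistical keyness test implied by the quoted definition. The lists in the usage example are toy placeholders, not the real Grimm, BNC or UKwac frequency lists, and `subtract_common_words` is a hypothetical helper.

```python
def subtract_common_words(corpus_top, *reference_tops):
    """Keep words from corpus_top (in frequency order) that appear in none of the reference lists."""
    excluded = set().union(*reference_tops)
    return [word for word in corpus_top if word not in excluded]


# Toy placeholder lists, NOT the real Grimm, BNC or UKwac frequency lists.
grimm_top = ["the", "and", "king", "to", "saw", "he", "it", "once"]
bnc_top = ["the", "and", "to", "he", "of"]
ukwac_top = ["the", "to", "it", "of"]
print(subtract_common_words(grimm_top, bnc_top, ukwac_top))
# ['king', 'saw', 'once']
```

Because the filter keeps the original order, the surviving words stay ranked by their frequency in the tales.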
I was also interested in language clusters, and Wordsmith can produce 2-, 3-, 4-, 5-, 6- and 7-word clusters as well as the single-word list. I looked for clusters that appeared at least ten times in the corpus.
Searching for clusters of each length, the programme found the following numbers:
3 words – 1576
4 words – 54
5 words – 22
6 words – 5
7 words – 1
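Counting clusters amounts to counting n-grams (runs of n consecutive words) and keeping those that occur at least a minimum number of times. The Python sketch below assumes the text is already tokenised and, unlike a dedicated tool, ignores sentence boundaries; `clusters` is a hypothetical name, and the default threshold of ten mirrors the limit used here.

```python
from collections import Counter


def clusters(tokens, n, min_freq=10):
    """Return n-word clusters occurring at least min_freq times, most frequent first."""
    grams = Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return [(gram, count) for gram, count in grams.most_common() if count >= min_freq]


tokens = "there was once a king there was once a queen".split()
print(clusters(tokens, 3, min_freq=2))
# [('there was once', 2), ('was once a', 2)]
```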
The longest cluster had seven words: ‘there was once upon a time a’. This cluster inevitably contains shorter ones, such as the two 6-word clusters ‘there was once upon a time’ and ‘was once upon a time a’, as well as 5-, 4- and 3-word clusters. Clusters contained within a longer cluster were not used later: the same process removed every 5-word cluster contained in a corresponding 6-word cluster, and so on down to the 3-word clusters. This was simply a way to limit the amount of input and avoid unnecessary repetition.
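The removal of clusters contained in longer ones can be expressed as a filter. This Python sketch compares word sequences rather than raw strings, so that, for example, ‘the’ is not treated as contained in ‘there’. It checks each cluster against all longer clusters at once rather than level by level, which should give the same result here; `contains` and `drop_contained` are hypothetical helpers.

```python
def contains(longer, shorter):
    """True if the words of `shorter` occur as a contiguous run inside `longer`."""
    lw, sw = longer.split(), shorter.split()
    return any(lw[i:i + len(sw)] == sw for i in range(len(lw) - len(sw) + 1))


def drop_contained(cluster_list):
    """Remove every cluster that is contained in a longer cluster on the same list."""
    return [
        c for c in cluster_list
        if not any(len(o.split()) > len(c.split()) and contains(o, c) for o in cluster_list)
    ]


found = [
    "there was once upon a time a",
    "there was once upon a time",
    "was once upon a time a",
    "it was not long before the",
    "in the forest",
]
print(drop_contained(found))
# ['there was once upon a time a', 'it was not long before the', 'in the forest']
```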
It was neither practical nor useful to give students every cluster found, since they would have had to read through all 1576 of the 3-word clusters, so a cut-off was chosen. Wordsmith can list the clusters in order of frequency, so the most frequent clusters appear first and those used only ten times (the minimum I set in my search) come last. To avoid overloading the writers, I gave only about the twenty most frequent clusters of each length; for the 4-word clusters the cut-off was slightly higher because it was placed at the boundary between one frequency and the next.
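The cut-off can be sketched as ‘take the top n, then keep going while the frequency stays the same’, which is one reading of why the 4-word cut-off ended up slightly higher. This Python sketch assumes the input is already sorted by descending frequency, as Wordsmith's output was; `top_with_ties` is a hypothetical name and the example frequencies are invented.

```python
def top_with_ties(ranked, n=20):
    """Take the first n (cluster, frequency) pairs, extending the cut so no frequency is split.

    Assumes `ranked` is already sorted by descending frequency and n >= 1.
    """
    if len(ranked) <= n:
        return list(ranked)
    boundary = ranked[n - 1][1]  # frequency of the nth item
    cut = n
    while cut < len(ranked) and ranked[cut][1] == boundary:
        cut += 1
    return list(ranked[:cut])


# Invented placeholder frequencies, for illustration only.
ranked = [("a", 30), ("b", 25), ("c", 12), ("d", 12), ("e", 12), ("f", 10)]
print(top_with_ties(ranked, n=3))
# [('a', 30), ('b', 25), ('c', 12), ('d', 12), ('e', 12)]
```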
Eventually a list of keywords and of 3-, 4-, 5-, 6- and 7-word clusters was compiled. It can be seen in the appendix.
Story writing
The second part of this work involved students using this list to create their own stories. For this I asked for help from the Social Work degree students I was teaching and from students on various degree courses who had enrolled in an English course at the Language Centre, all at the University of Calabria. Their level was generally A2/B1. The students were given the word lists and asked to write a story, either individually or in pairs or small groups. They were given neither a specific theme nor a word limit.
Along with the word list, I attached another sheet briefly introducing myself and explaining what I wanted the students to do:
I am Ian Robinson and I am a researcher in Political Science. I would like your help in a research project that I am doing. I would like you to write a short story in English. On the other page there are some lists of words and phrases, please try to use as many of these as possible in your story. I would appreciate it if you could either underline the words and phrases that you use or highlight them in some other way. When you have finished your story, could you please send it to me via email.
There was also a short section for them to sign, giving me permission to use their work in my research.
I received 28 stories, some written individually and some in pairs or groups; some arrived by email and others were handwritten and delivered by hand. The stories averaged just over 156 words, and almost 30% of the text was made up of words on the prepared list. On average, each story used just over 16 of the single keywords, and the clusters used averaged between four and five words in length.
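A figure such as the 30% share can be computed by marking every word in a story that belongs to a listed cluster or is a listed keyword. The article does not say exactly how the counting was done, so this Python sketch is only one assumed method: it uses a crude tokeniser, counts each word at most once even where matches overlap, and `list_coverage` is a hypothetical name.

```python
import re


def list_coverage(story, keywords, clusters):
    """Fraction of a story's tokens covered by listed keywords or cluster occurrences.

    `keywords` is assumed to be a set of lowercase words; `clusters` a list of phrases.
    """
    tokens = re.findall(r"[a-z']+", story.lower())
    covered = [False] * len(tokens)
    for cluster in clusters:
        words = cluster.lower().split()
        for i in range(len(tokens) - len(words) + 1):
            if tokens[i:i + len(words)] == words:
                for j in range(i, i + len(words)):
                    covered[j] = True
    for i, tok in enumerate(tokens):
        if tok in keywords:
            covered[i] = True
    return sum(covered) / len(tokens) if tokens else 0.0


story = "There was once a king who lived in the forest."
print(list_coverage(story, {"king", "forest"}, ["there was once a", "in the forest"]))
# 0.8
```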
The example shown below was chosen not because it was the best or used the most words and clusters but because it happened to be the first that was sent to me.
The underlined words are the single keywords used from the list, while the words in italics are the clusters that were used.
There was once a king who lived in a beautiful castle in the forest. One day it come to pass that knocked at the door an old woman; she said to him that she was his mother, thought by all to be died.
In the middle of conversation the king’s daughter arrived. When the girl was in front of the old woman she asked her if she was a ghost, but she (old woman) said that in the midst of the forest there was a maiden that had taken care of her because she had been killed by wolf. The maiden give her to drink the water of life for the three time and so she revived.
When the king’s mother finished to tell her story toke in her arms the son and the granddaughter.
Ever since they lived happily.
Final thoughts
This project produced some creative and entertaining stories, but one result that is difficult to show here is the joy the students showed in writing them. I had tried to set a time limit, but the students wanted to carry on writing and were happy to do so. They were not told what type of story to write, yet all of the stories could be categorised as fairy tales. This suggests that a corpus generated from a specific genre, in this case fairy tales, can help students write in that genre and may even steer them towards it. Feedback after the presentation at the conference was generally positive, and people seemed pleased to see a practical use for corpus linguistics, one with a direct application in the classroom. Corpus linguistics is often seen as something done only by people worried about whether a certain verb can be followed by a specific adverb or preposition. This project shows that corpus linguistics is something we can all do, and even have fun doing.
References
Baker, P., Hardie, A. and McEnery, T. (2006) A Glossary of Corpus Linguistics. Edinburgh: Edinburgh University Press.
Lewis, M. (1993) The Lexical Approach. Hove: Language Teaching Publications.
Reynolds, G. (2010) Presentation Zen Design. Berkeley: New Riders.
Tales Collected by the Brothers Grimm. http://myweb.dal.ca/barkerb/fairies/grimm/ (accessed 28/08/2009)
Appendix
A STORY
Use as many of these words and phrases below as well as other words to write a story
king | answered | mother | morning | sat | eat
saw | cried | door | herself | lay | brother
once | let | fell | poor | gold | heart
himself | forest | together | tree | tailor | everything
father | son | beautiful | maiden | whole
daughter | wife | brought | girl | golden
Clusters
7-word clusters
there was once upon a time a
6-word clusters
it was not long before the
did not know what to do
and when he came to the
5-word clusters
it came to pass that
went to the king and
and was just going to
in the middle of the
if I could but shudder
have I not reason to
when they came to the
there was once a poor
knocked at the door and
went into the forest and
and at the same time
when he got to the
4-word clusters
for a long time
and was about to
he went into the
and when he had
and said to the
he went to the
in front of the
he said to the
but he did not
there was once a
in the midst of
by the hand and
I will give you
for the third time
on the ground and
and as he was
said the old woman
he did not know
and when she had
that he could not
the king said to
I do not know
that he was to
the water of life
if you do not
and said if you
at the same time
out of the window
and said to him
they came to a
as soon as he
and went to the
he was forced to
as soon as the
3-word clusters
out of the
the king's daughter
the king's son
and when the
that he had
then said the
there was a
to him and
and said I
one of them
if you will
and in the
and when they
when he was
and began to
in the forest
and I will
and then he
him and said
in the evening
when it was