Waking up in the green ‘park’ campus of LU means that all you can see from your window are trees and the beautiful northern countryside. This immediately makes me want to get ready for another day full of new things to learn.
And new they are indeed.
After another hearty breakfast at the university ‘Market Place’ cafeteria, I head to a lecture by none other than Dr Andrew Hardie, the creator of CQPweb himself. The aim of the session is to explain the basic ideas underlying the calculation of statistical collocation.
What are collocations?
Hardie explains that collocation is the tendency for the occurrence of one word to make the occurrence of another word very likely. It is a textual phenomenon, one that can be reliably tested only in texts, rather than a psychological phenomenon.
How do we detect collocations?
Problems in detection:
- Operationalisation: there are many ways to do this, so which method should we choose?
- The concept of ‘nearby’ words. What does that include exactly? What do we count as ‘nearby’? Is it a four-word window? The same clause? The same paragraph? Everything in the same n-gram/cluster? Or do we go by a grammatical relationship between the items?
- The next question to consider is whether we are going to rely on concordance analysis or statistics. And if we do use statistics, which statistics?
If we assume that we are going to identify collocations using statistical scores, we also need to assume some basic terminology to put boundaries on the span of the collocations. Dr Hardie mentions the following:
- Node: the word we are studying
- Collocate: a feature that occurs with the node
- Span: the distance before and after the node
- Window: all the spans put together
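To make these terms concrete, here is a minimal sketch in Python of how collocates might be collected around a node. The function name `collocate_counts`, the default span of four words and the toy sentence are my own assumptions for illustration; this is not how CQPweb itself is implemented.

```python
from collections import Counter

def collocate_counts(tokens, node, span=4):
    """Count every token that falls within `span` words of each occurrence of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        # The window is the span to the left plus the span to the right of the node
        left = tokens[max(0, i - span):i]
        right = tokens[i + 1:i + 1 + span]
        counts.update(left + right)
    return counts

# Toy sentence, invented purely for illustration
tokens = "she drank strong tea and he drank strong coffee".split()
print(collocate_counts(tokens, "strong", span=1))
```

With a span of one, only the words immediately before and after each ‘strong’ are counted, so changing the span changes what counts as ‘nearby’.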
What do we count?
Collocation statistics rely on counts of co-occurrence between node and collocate. Therefore, Hardie goes on to explain, we also need to know the frequency of the node in the whole corpus, the frequency of the collocate in the whole corpus, the size of the window and the total number of tokens in the corpus.
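As a rough illustration of how these counts fit together, the sketch below computes a Mutual Information score, one widely used collocation statistic. Choosing MI here is my own assumption, as are the function name, the parameter names and the example figures, none of which come from the lecture. The idea is to compare the observed number of co-occurrences with the number we would expect by chance.

```python
import math

def mutual_information(cooccurrences, node_freq, collocate_freq, corpus_size, window_size):
    """Log ratio of observed co-occurrences to those expected by chance."""
    if cooccurrences == 0:
        return float("-inf")  # the pair never co-occurs, so the log is undefined
    # Expected co-occurrences if the collocate were spread evenly through the corpus;
    # window_size is the total number of positions around the node (left span + right span)
    expected = node_freq * collocate_freq * window_size / corpus_size
    return math.log2(cooccurrences / expected)

# Invented figures, for illustration only
score = mutual_information(
    cooccurrences=20, node_freq=100, collocate_freq=50,
    corpus_size=1_000_000, window_size=8,
)
print(round(score, 2))
```

A score above zero means the pair co-occurs more often than chance would predict, and a score below zero means less often.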
Other topics covered in the session include various statistical tests, and their advantages and suitability in analysis.
If you would like to know more, a good read on the subject is ‘Language is never, ever, ever, random’ by A. Kilgarriff.
I am sad to part with Dr Hardie, whose voice is very familiar to me, as I have followed his online YouTube tutorials on CQPweb. Luckily for me, the tea and coffee break provides the opportunity to continue the discussion over delicious pastries (not vegan, but the option of fresh fruit makes up for it).
You may not believe this, but the next lecture is given by Dr Stefan Gries, a quantitative corpus linguist working at the intersection of corpus linguistics, cognitive linguistics and computational linguistics. The lecture deals with the question of what quantitative methods can contribute to corpus linguistic analysis.
By presenting several case studies, Dr Gries discusses the application of CL and statistical methods to the study of linguistic features and language change.
Case study 1: Short-term language change in new forms of communication in CMC
The study looks at new spellings, for example the new ‘Internet spelling’ in Spanish speakers’ chat writing (e.g. muxo for mucho, aki for aquí). Some regularities were found amidst the irregular orthographic forms.
The study sampled dialects and a few different sites and genres, and the data was compared to a Spanish corpus. Speakers deleted certain letters, and these deletions are attributed to word frequency and to sociolinguistic factors, particularly the notion of coolness/slang.
Data and findings
The study looked specifically at linguistic features such as:
- Word-initial character repetitions (uhh)
- Whole-word repetition
- Word-final character repetition (bestttt).
Dr Gries reports that there is a strong correlation between the number of repetitions of a character and its particular characteristics.
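For readers curious how a feature such as word-final character repetition might be picked out of chat data, here is a minimal sketch using a regular expression. The function name `final_repetition` and the default threshold of three repeated characters are my own assumptions and are not taken from Dr Gries’ study.

```python
import re

def final_repetition(word, min_run=3):
    """Return the length of a run of identical characters at the end of `word`,
    or 0 if the run is shorter than `min_run`."""
    match = re.search(r"(\w)\1+$", word)
    run = len(match.group(0)) if match else 0
    return run if run >= min_run else 0

for word in ["bestttt", "best", "uhh"]:
    print(word, final_repetition(word))
```

Lowering `min_run` to two would also catch forms such as ‘uhh’, but at the cost of flagging ordinary spellings such as ‘tell’, so the threshold is a judgement call.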
It is a fascinating and advanced lecture, and Dr Gries’ expertise is evident as he discusses further case studies.
After another refreshing break of cookies, tea and coffee, I am excited to sit in on Dr John Flowerdew’s lecture titled Corpus-based Approaches to Language Description for Academic Writing.
The principle of frequency is described by Leech (2011: 14), who writes that “more frequent = more important to learn can scarcely be gainsaid as a general principle”. It is only a general principle, though: just because a word is found to be frequent, it doesn’t necessarily mean that it is important for learners to learn.
There are multiple units of analysis that we may consider:
- single words
- the idiom principle: the idea that language consists of recurrent patterns
- move analysis: a concept suggested by Swales, which holds that academic writing contains patterned moves that can be examined separately in a corpus (introduction, literature review, methodology, conclusion).
The argument for Specialist Corpora
The language of different genres needs to be considered when designing a project.
- Professional corpus-based Academic Language Description
Professional here means research articles, language written not necessarily by native speakers but by academics. Biber (2006) wrote about ‘University Language’ and how different disciplines have different vocabulary. In fact, Biber’s study drew on Hyland’s (2000) study titled Disciplinary Discourses: Social Interactions in Academic Writing. Why do different disciplines use language differently? The culture of different fields and their language use is investigated using discourse analysis, discourse-based interviews and CL. For example, sociology uses more verbs such as ‘suggest’ and ‘claim’, while biology uses verbs such as ‘describe’.
Flowerdew and Forest (2014) discuss signalling nouns in English further in their study. Examples of such nouns (nominalisations) include realization, role and causes. These nouns vary across disciplines in terms of frequency. Variation was also seen across genres (journals, textbooks, etc.).
Later that evening, we attend the summer school social dinner, which gives us the opportunity to discuss what we learnt today with scholars and get to know others’ areas of research. We enjoy a lovely (Dr Gablasova, here’s an example of a use of this word!) meal and get to speak to our lecturers. It is such a rare opportunity that I don’t quite realise how fast time has gone, and another day is over.