Verbs in the Written English of Chinese Learners: A Corpus-based Comparison between Non-native Speakers and Native Speakers

by

Xiaotian Guo

A thesis submitted to the University of Birmingham for the degree of DOCTOR OF PHILOSOPHY

Supervisor: Professor Susan Hunston

The Department of English
The School of Humanities
The University of Birmingham
October 2006

University of Birmingham Research Archive
e-theses repository

This unpublished thesis/dissertation is copyright of the author and/or third parties. The intellectual property rights of the author or third parties in respect of this work are as defined by the Copyright, Designs and Patents Act 1988 or as modified by any successor legislation. Any use made of information contained in this thesis/dissertation must be in accordance with that legislation and must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the permission of the copyright holder.

Abstract

This thesis consists of ten chapters, and its research methodology combines quantitative and qualitative approaches. Chapter One introduces the theme of the thesis: a demonstration of how a corpus-based comparative approach can detect the needs of learners by identifying the similarities and disparities between the learner English (the COLEC corpus) and native-speaker (NS) English (the LOCNESS corpus). Chapter Two reviews the literature on learner language and identifies the tasks of the research. Chapter Three introduces the data and technology. Chapter Four shows how two verb lemma lists can be made using WordSmith Tools, supported by other corpus and IT tools; the second part of the chapter focuses on how to make sense of these lists. Chapter Five deals with the individual forms of verbs, and the findings suggest that the learner English is less homogeneous than the NS English.
Chapter Six extends the research to verb–noun relationships in the learner English and the NS English, and the results show that the learners prioritise verbs over nouns. Chapter Seven studies the learners’ preferences in using the patterns of KEEP compared with those of the NSs, and finds that the learners have various problems in using this simple verb. This chapter also sets out my reservations about the traditional use of ‘overuse’ and ‘underuse’ and suggests a finer classification system. Chapter Eight compares the collocates of another frequently occurring verb, TAKE, and yields a similar finding: the learners have problems even with such simple vocabulary. Chapter Nine revisits the research findings of Chapters Four to Eight and discusses them in relation to the theme of the thesis. The concluding chapter, Chapter Ten, summarises the previous chapters and envisages how learner language studies will develop in the coming years.

Acknowledgements

First and foremost, I would like to thank my supervisor, Professor Susan Hunston. She spent a great deal of time on my thesis and guided me from the design of the research to the final version of each chapter. As an experienced supervisor and teacher, she knows very well when to leave me free to explore for something useful and when to bring my attention back to things of value. She rarely tells me what to do; instead, she offers suggestions, comments, and clues for further development, leaving me enough time to reflect and digest. Undoubtedly, the knowledge I have gained from her supervision will be the most valuable asset of my academic career.

Secondly, my thanks go to my beloved wife, Xiaorong (Wang). She sacrificed so much for my PhD study that I can hardly find appropriate words to express my gratitude. Unlike many students, who were funded by one means or another, I was self-sponsored.
Finance therefore became the greatest difficulty of my PhD study. To overcome this obstacle, she worked extremely hard and underwent great hardship and suffering. Although she deserves a long break after the submission of my thesis, the unfortunate damage caused to her health may take the rest of her life to mend. In this sense, any words of thanks are incredibly weak and inadequate.

Thirdly, my sincere thanks go to my colleagues and friends, who have supported me in many different ways. Without their help, my thesis could not have been completed by now. The names that follow are only some of them (given names first and surnames last, for consistency): Richard (Zhonghua) Xiao, Scott (Songlin) Piao, Wenzhong Li, Pernilla Danielsson, Seo-In Shin, and Frank (Maocheng) Liang for their help with IT and corpus technologies; Geoff Barnbrook, Antoinette Renouf, Wenzhong Li and Jinbang Du for their valuable comments and suggestions; Sylviane Granger, John Milton, Angela Hasselgren, Shichun Gui, Jianzhong Pu, and Michael Rundell for the articles, PhD theses and other information they sent me when I was in desperate need of them; Wenjin Zhao, Zequan Liu, Laiqi Zhang, Junhua Zhang and Yaodong Wang for their encouragement and support as friends. There are others who helped me in one way or another, but I am afraid I cannot list them all here.

Fourthly, I am grateful to my external examiner, Mike Scott, and my internal examiner, Martin Hewings, for their valuable comments and advice, and to the chair of my viva, Murray Knowles, for his valuable time. In addition, I am deeply indebted to my sister, who, together with my brother, looked after my parents while I was unable to fulfil my duty as a son. I also thank my wife’s family, Shulin and his family, for their encouragement and support. My special thanks go to my daughter, who accompanied me through the ups and downs of these years, especially when my wife had to work elsewhere.
She also helped me proofread the Chinese pinyin (any remaining errors are, of course, my own). Furthermore, thanks are overdue to the Great Britain-China Education Trust and the Sino-British Fellowship Trust for the £1000 fellowship, which was sent to me on the very day of the Chinese Spring Festival of 2003. It was the only funding I received throughout my PhD study. Although the sum was far from freeing me from financial strain, the very act of providing such a grant validated my study and greatly encouraged me to overcome the remaining difficulties. It meant a lot to me.

Last but not least, I must thank the University of Birmingham, especially the staff of the Department of English, the School of Humanities, the Information Service, the Academic Office and the International Office, for their unfailing and patient support.

Table of Contents

CHAPTER ONE
INTRODUCTION
1.1 THE THEME AND AIM OF THE RESEARCH
1.2 INTRODUCING COMPUTER LEARNER CORPUS RESEARCH
1.3 THE BACKGROUND TO THIS RESEARCH
1.4 THE IMPETUS OF THIS RESEARCH
1.5 THE FOCUS AND RESEARCH QUESTIONS OF THE RESEARCH
1.6 THE METHODOLOGY OF THE RESEARCH
1.7 TWO ASSUMPTIONS BEHIND THIS RESEARCH
1.8 THE STRUCTURE OF THE THESIS

CHAPTER TWO
A LITERATURE REVIEW OF LEARNER LANGUAGE STUDIES
2.1 EARLIER LEARNER LANGUAGE STUDIES
2.1.1 Error analysis recalled
2.1.2 Second language acquisition reviewed
2.1.3 Conclusion
2.2 COMPUTER LEARNER CORPORA: A NEW ERA
2.2.1 The International Corpus of Learner English
2.2.2 The Longman Learners’ Corpus
2.2.3 The Hong Kong University of Science and Technology Learner Corpus
2.2.4 The Chinese Learner English Corpus
2.2.5 Computer learner English studies as a ‘newborn baby’ of applied linguistics
2.3 TYPOLOGY OF CLC DATA
2.3.1 Synchronic vs. diachronic
2.3.2 Written vs. spoken
2.3.3 Un-annotated vs. annotated
2.4 CLEAN-TEXT POLICY AND ANNOTATION
2.5 LEARNER CORPUS ANNOTATION
2.6 CONTRASTIVE INTERLANGUAGE ANALYSIS AND ITS DATA PROCESSING APPROACHES
2.6.1 The notion of Contrastive Interlanguage Analysis (CIA)
2.6.2 Quantitative plus qualitative: approaching CLC data
2.7 LEARNER ENGLISH FEATURES
2.7.1 The informal and speechlike features of written learner English
2.7.2 Small vocabulary range, overuse of general vocabulary and the ‘teddy bear principle’
2.7.3 More open-choice-principled than idiom-principled
2.7.4 Proficiency level and fossilised errors
2.7.5 The essential role of L1 in L2 production
2.7.6 A narrower range of senses in the use of vocabulary
2.8 APPLICATIONS OF RESEARCH RESULTS
2.8.1 TeleNex
2.8.2 CALL Tools
2.8.3 Dictionary compilation
2.8.4 Textbook enhancement
2.8.5 Data-driven learning
2.9 SOME LIMITATIONS OF PREVIOUS CLC RESEARCH
2.9.1 Lack of systematic study of lexis
2.9.2 Lack of POS segmentation for multiple-POS words
2.9.3 Lack of semantic segmentation for multiple-sensed words
2.9.4 Lack of in-depth exploration in learner language feature identification
2.9.5 No linguistic standards to scale the level of learner English
2.9.6 Some reservations about the use of ‘overuse’ and ‘underuse’
2.9.7 Some reservations about error-tagging
2.10 CONCLUSION

CHAPTER THREE
THE DATA AND THE TOOLS
3.1 INTRODUCTION
3.2 THE DATA
3.2.1 The Learner Corpus – COLEC
3.2.2 The Native Speaker Corpus – LOCNESS
3.2.3 The back-up resources
3.2.3.1 The Bank of English
The Google search engine
3.3 THE WORDSMITH TOOLS
3.3.1 Concord
3.3.2 WordList
3.4 CONCLUSION

CHAPTER FOUR
MAKING AND MAKING SENSE OF TWO VERB LEMMA LISTS
4.1 INTRODUCTION
4.2 SOME ISSUES IN MAKING A VERB LEMMA LIST
4.2.1 The significance of making a verb lemma list
4.2.2 Some notions
4.2.3 The difficulties in making a verb lemma list
4.2.4 Two approaches to making a verb list
4.3 MAKING TWO VERB LEMMA LISTS
4.3.1 The lemma list archetype
4.3.2 Tagging the corpora
4.3.3 Editing the raw verb lemma lists
Dealing with small-frequency lemmas
Detecting wrongly used lemmas
4.4 MAKING SENSE OF THE TWO VERB LEMMA LISTS
4.4.1 A rational study
Some explorations in semantic theory applications in vocabulary teaching
Some pioneering work concerning the presentation of vocabulary to learners
Some explorations in verb classification based on syntactic constructions
Some explorations of the links between the known and unknown and between L1 and L2
4.4.2 Working out a design for the grouping of the verb lemmas of COLEC and LOCNESS
4.4.3 General principles of grouping the verb lemmas in COLEC and LOCNESS
Neighbouring concept groups (1)
4.4.3.2 Neighbouring concept groups (2)
Near antonymous groups
Six large family groups
Special concept groups
The miscellaneous groups
4.5 RESEARCH QUESTIONS REVISITED AND ANSWERED
4.6 CONCLUSION

CHAPTER FIVE
VERBS IN DIFFERENT FORMS COMPARED
5.1 INTRODUCTION
5.2 A GENERAL VIEW OF THE TOTAL FREQUENCY OF THE DIFFERENT FORMS OF VERBS
5.3 THE TOP 20 VERBS IN THEIR DIFFERENT FORMS IN LOCNESS AND COLEC
5.3.1 The top 20 verbs in their different forms in LOCNESS
5.3.2 The top 20 verbs in their different forms in COLEC
5.4 THE DIFFERENT FORMS OF THE TOP 20 VERBS COMPARED
5.4.1 The V-e forms of the top 20 verbs in the two corpora compared
5.4.2 The V-s forms of the top 20 verbs in the two corpora compared
5.4.3 The V-ing forms of the top 20 verbs in the two corpora compared
5.4.4 The V-ed forms of the top 20 verbs in the two corpora compared
5.4.5 The V-n forms of the top 20 verbs in the two corpora compared
5.4.6 Some summary remarks
5.5 EXAMINING THE MATCHED VERB FORM LISTS
5.5.1 Matching the V-i form lists
5.5.2 Matching the V-e form lists
5.5.3 Matching the V-s form lists
5.5.4 Matching the V-ing form lists
5.5.5 Matching the V-ed form lists
5.5.6 Matching the V-n form lists
5.5.7 Some remarks in summary
5.6 SOME PEDAGOGICAL IMPLICATIONS
5.6.1 Significance for the writer of teaching materials
5.6.2 Significance for the teacher and the learner
5.6.3 Significance for learner English level evaluation
5.6.4 Implications for further corpus design, construction and comparison
5.6.5 Some problems revealed concerning CLC studies
5.7 CONCLUSION

CHAPTER SIX
BETWEEN VERBS AND NOUNS
6.1 INTRODUCTION
6.2 A GENERAL VIEW OF THE DISPARITY BETWEEN THE TWO CORPORA IN TERMS OF THE SELECTION BETWEEN VERBS AND NOUNS
6.3 A DETAILED LOOK AT THE DISPARITY BETWEEN THE TWO CORPORA IN TERMS OF SELECTION BETWEEN VERBS AND NOUNS
6.3.1 Between the verb use and the noun use within the same word form
6.3.2 Between verbs and nouns with different word forms
6.3.3 Between verbs and nouns in prepositional phrases
Between verbs and nouns in simple prepositions
Between verbs and nouns in complex prepositions
6.4 DISCUSSION
6.5 CONCLUSION

CHAPTER SEVEN
USING PATTERNS AND PHRASES TO INTERPRET LEARNER ENGLISH
7.1 INTRODUCTION
7.2 INTRODUCING THE RATIO RELATIONSHIPS BETWEEN THE TWO CORPORA
7.3 DEFINING ‘PATTERN’ AND ‘PHRASE’
7.4 LOOKING AT THE PATTERNS OF KEEP IN COLEC AND LOCNESS
7.4.1 Interpreting the frequency relationships between COLEC and LOCNESS
A large frequency in COLEC vs. a large frequency in LOCNESS
A large frequency in COLEC vs. a small frequency in LOCNESS
A small frequency in COLEC vs. a large frequency in LOCNESS
A small frequency in COLEC vs. a small frequency in LOCNESS
No frequency in COLEC vs. a small frequency in LOCNESS
[...]

[...] – what native speakers of the language in question typically write or say (either in general or in a situation / in a certain text type). For language teaching, however, it is not only essential to know what native speakers typically say, but also what the typical difficulties of the learners of a certain language, or rather of certain groups of learners of this language, are.

As seen above, there [...]

[...] In an article by Schachter and Celce-Murcia (1977: 442), the prevalence of EA is vividly depicted thus:

“A cursory glance at the titles and abstracts in recent issues of journals such as this one [TESOL Quarterly] (and others such as Language Learning and IRAL) would indicate that the advocates of EA have prevailed and that EA currently appears to be the ‘darling’ of the 70’s.”

However, EA [...]

[...] on a study of verb-related features of Chinese learner English. The aim of the research is to demonstrate how a corpus-linguistic approach to learner English studies can help us identify the similarities and disparities between the written English of a group of non-native speakers (NNSs) and that of a group of native speakers (NSs). It is hoped that the identification of similarity and difference between [...]

[...] contribution of [the researchers] has provided them with any significantly new information. It was a significant advance for EA researchers to place the learner language (rather than L1 and L2) under examination. A central consensus among EA researchers was that the learner’s errors, instead of being seen as negative, should be treated as positive. The learner’s language was treated as “interlanguage” [...]
[...] KEEP, to investigate how closely the learners’ performance approximates that of the NSs in terms of patterns (in line with Hunston and Francis 1999). Chapter Nine summarises the findings of the research chapters and discusses the advances this research has made in learner corpus studies. The pedagogical implications of this research are also addressed in this chapter, and some possible studies in the area of learner [...]

[...] which CLC has emerged. Earlier research on learner language may be traced to EA. Before the EA era, it was generally maintained, for instance in CA, that the learner’s errors were undesirable because they were a sign of non-acquisition. Since CA researchers found a relationship between the learner’s errors and the differences between the learner’s mother tongue (L1) and their second language (L2), they tried [...]

[...] similarities and disparities between the learner English and the NS English in terms of the width and depth of verbs? (By the width of verbs, I mean the size of the verb vocabulary. By the depth of verbs, I mean the range of senses of verbs and the many words which, while belonging to other parts of speech, have a verbal function.)
2) What kinds of techniques could be used to answer the previous research question?
3) What are [...]

[...] research into NS corpora contribute to the description of the native language alone and provide “no information as to the relative difficulty and learnability of particular features to be taught”, and studies “based on the analysis of native-speakers behaviour fail to consider the productivity of particular features from the learner’s perspective”. In the words of Granger (1998b: 7), native corpora cannot [...]
[...] depth of learners’ vocabulary knowledge, whereas in fact both aspects “constitute equally important and vital components of the overall lexical ability”. Bearing this in mind, this thesis explores both the breadth and the depth of the learners’ lexicon with respect to verbs. In Chapters Four and Five, the research focuses on the breadth of the learners’ verb lexicon. Chapters Seven and Eight then [...]

[...] is their most important aspect) they are indispensable to the learner himself, because we can regard the making of errors as a device the learner uses in order to learn. It is a way the learner has of testing his hypothesis about the nature of the language he is learning.

In explaining how EA scholars conduct error analysis, Ellis (1994: 68-69) summarises the process in four stages, i.e. the [...]

[...] Louvain Corpus of Native English Essays
NL: native language
NNS: non-native speaker
NS: native speaker
POS: part of speech
SL: second language
SLA: Second Language Acquisition
TL: target language
