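[Table: vocabulary size and text coverage]
[Table: vocabulary size, % coverage, and density of unknown words. Surviving entries: a row for 2000 words + proper nouns; densities of 1 in every 10, 1 in every 16, 1 in every 25, and 1 in every 67 running words]
[Table: results for the 1st 1000 and 2nd 1000 words and their total, by researcher; rows include a long economics text and a range of texts]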
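The density of unknown words is simple arithmetic on coverage. If a reader knows a given percentage of the running words in a text, the remaining fraction is unknown, and on average one unknown word occurs in every 1/(unknown fraction) words. The Python sketch below shows the conversion in both directions; the function names are hypothetical. By this arithmetic, densities of 1 in every 10, 16, 25, and 67 correspond to roughly 90%, 93.75%, 96%, and 98.5% coverage.

```python
def unknown_word_density(coverage_percent: float) -> float:
    """Return N such that about 1 running word in every N is unknown.

    coverage_percent is the share of running words (tokens) the reader knows;
    every other token is treated as unknown.
    """
    if not 0 <= coverage_percent < 100:
        raise ValueError("coverage_percent must be at least 0 and below 100")
    unknown_fraction = (100 - coverage_percent) / 100
    return 1 / unknown_fraction


def coverage_from_density(n: float) -> float:
    """Inverse conversion: 1 unknown word in every n means (1 - 1/n) coverage."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (1 - 1 / n) * 100


for cov in (90, 95, 98):
    print(f"{cov}% coverage -> 1 unknown word in every {unknown_word_density(cov):.0f}")
for n in (10, 16, 25, 67):
    print(f"1 in every {n} unknown -> {coverage_from_density(n):.2f}% coverage")
```

The first loop reports 1 unknown word in every 10, 20, and 50 for 90%, 95%, and 98% coverage, which shows how quickly the load of unknown words falls as coverage rises.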
What is also interesting is the number of different words (word types) from the second 1000 that actually occurred in a mixture of different kinds of texts, compared with more homogeneous texts. In any one text, such as a novel or a textbook, around 400 to 550 of the second 1000 words from the GSL actually occurred. When a mixture of texts was looked at, however, around 700 to 800 of the second 1000 words occurred (Hirsh and Nation, 1992; Sutarsyah, Nation and Kennedy, 1994).
The second 1000 words behave in this way because they are lower-frequency words than the first 1000 and have a narrower range of occurrence. That is, their occurrence is more closely related to the topic or subject area of a text than that of the wide-ranging, more general-purpose words in the first 1000. But given a range of topics and genres, and enough texts, the second 1000 words are more generally useful than other lists of words.
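The comparison reported above, between the number of second-1000 word types found in a single text and in a mixture of texts, amounts to collecting the list members that occur at least once. A minimal Python sketch follows. The tokenizer (lowercase runs of letters) and the six-word list are assumptions for illustration, not the GSL itself, and the sketch matches exact word forms rather than word families.

```python
import re
from collections.abc import Iterable


def tokens(text: str) -> list[str]:
    # Crude tokenizer (an assumption): lowercase runs of the letters a-z.
    return re.findall(r"[a-z]+", text.lower())


def list_types_found(word_list: set[str], texts: Iterable[str]) -> set[str]:
    """Return the members of word_list that occur at least once in any of texts."""
    found: set[str] = set()
    for text in texts:
        found.update(t for t in tokens(text) if t in word_list)
    return found


# Hypothetical stand-in for the second 1000 words of a general service list.
second_1000 = {"account", "bank", "crop", "harvest", "loan", "storm"}
economics = "The bank gave a loan on the account."
farming = "The storm damaged the crop before the harvest."

print(len(list_types_found(second_1000, [economics])))           # a single text
print(len(list_types_found(second_1000, [economics, farming])))  # a mixture
```

The single text yields 3 of the 6 list words and the mixture yields all 6, a miniature version of the pattern described above: topic-bound words need a range of texts before most of them appear.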
After the 2000 high-frequency words of the GSL, what vocabulary does a second language learner need? The answer depends on what the learner intends to use English for. If the learner has no special academic purpose, then the learner should work on strategies for dealing with low-frequency words. If, however, the learner intends to go on to academic study in upper high school or at university, then there is a clear need for general academic vocabulary. This can be found in the 836-word University Word List (UWL) (Xue and Nation, 1984; Nation, 1990).
The UWL consists of words that are not in the first 2000 words of the GSL but that are frequent and of wide range in academic texts. Wide range means that the words occur not just in one or two disciplines, such as economics or mathematics, but across a wide range of disciplines. The word frustrate, for example, which is in the UWL, can be found in many different disciplines. The UWL is essentially a compilation from four separate studies: Lynn (1973), Ghadessy (1979), Campion and Elley (1971), and Praninskas (1972). Here are some items from it.
accompany formulate index major objective
biology genuine indicate maintain occur
comply hemisphere individual maximum passive
deficient homogeneous job modify persist
edit identify labour negative quote
feasible ignore locate notion random
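The UWL criteria described above (outside the first 2000 GSL words, frequent, and spread across many disciplines) can be expressed as a filter over a corpus grouped by discipline. This Python sketch is only an illustration: the thresholds, the function name, and the toy corpus are hypothetical and are not the criteria used in the four source studies.

```python
from collections import Counter


def academic_candidates(
    corpus_by_discipline: dict[str, list[str]],
    gsl_2000: set[str],
    min_frequency: int,
    min_disciplines: int,
) -> list[str]:
    """Return words outside gsl_2000 that are frequent and occur across disciplines."""
    frequency: Counter = Counter()
    disciplines: dict[str, set[str]] = {}
    for discipline, words in corpus_by_discipline.items():
        for word in words:
            if word in gsl_2000:
                continue  # general high-frequency words are already covered
            frequency[word] += 1
            disciplines.setdefault(word, set()).add(discipline)
    return sorted(
        w for w, f in frequency.items()
        if f >= min_frequency and len(disciplines[w]) >= min_disciplines
    )


# Toy corpus of pre-tokenized words, grouped by discipline (hypothetical).
corpus = {
    "economics": ["the", "index", "index", "major"],
    "biology": ["index", "hemisphere", "major"],
    "law": ["the", "index", "comply", "major"],
}
print(academic_candidates(corpus, gsl_2000={"the"}, min_frequency=2, min_disciplines=2))
```

This keeps index and major. Hemisphere and comply drop out only because the toy corpus is tiny, although both appear in the UWL sample above, which shows how much the result depends on corpus size and on the chosen thresholds.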
The value of the UWL can be seen when we look at the coverage of academic text that it provides.
[Table: coverage of texts by the first 2000 GSL words and by the UWL, with totals, by source; rows include an economics text]
Note the low coverage the UWL has of fiction. Newspapers and magazines, which are more formal, make use of more of the UWL. Very formal academic text makes the greatest use of it. The UWL is thus a word list for learners with a specific purpose, namely academic reading. The purpose behind setting up the UWL was to create a list of high-frequency words for learners with academic purposes, so that these words can be taught and directly studied in the same way as the words from the GSL.
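Coverage, as used above, is the percentage of running words in a text that belong to a given list. A minimal Python sketch follows, assuming a crude tokenizer and exact word-form matching; the two small word sets stand in for the GSL and the UWL and are hypothetical.

```python
import re


def coverage(text: str, *word_lists: set[str]) -> list[float]:
    """Percentage of running words (tokens) in text covered by each word list."""
    toks = re.findall(r"[a-z]+", text.lower())
    if not toks:
        return [0.0 for _ in word_lists]
    # True counts as 1 when summed, so this counts tokens found in each list.
    return [100 * sum(t in wl for t in toks) / len(toks) for wl in word_lists]


gsl_sample = {"the", "of", "prices", "is", "a"}   # hypothetical GSL stand-in
uwl_sample = {"index", "major", "indicate"}       # hypothetical UWL stand-in
text = "The index of prices is a major indicator"

gsl_cov, uwl_cov = coverage(text, gsl_sample, uwl_sample)
print(gsl_cov, uwl_cov, gsl_cov + uwl_cov)
```

This prints 62.5, 25.0, and a combined 87.5. Indicator is not counted because only exact forms are matched; a count based on word families would group it with indicate and raise the UWL figure.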
Word frequency lists
The major theme of this paper has been that we need to have clear sensible goals for vocabulary learning. Frequency information provides a rational basis for making sure that learners get the best return for their vocabulary learning effort. Vocabulary frequency lists which take account of range have an important role to play in curriculum design and in setting learning goals.
This does not necessarily mean that learners must be provided with large vocabulary lists as the major source of their vocabulary learning. It does mean, however, that course designers should have lists to refer to when they consider the vocabulary component of a language course, and that teachers need reference lists to judge whether a particular word deserves attention and whether a text is suitable for a class.
The availability of powerful computers and very large corpora now makes the development of such lists a much easier job than it was when Thorndike and Lorge (1944) and their colleagues manually counted 18,000,000 running words. Making a frequency list, however, is not simply a mechanical task, and judgements based on well-established criteria need to be made. The following list suggests several of the factors that would need to be considered in developing a resource list of high-frequency words.
1. Representativeness. The corpora that the list is based on should adequately represent the wide range of uses of language. In the past, most word lists have been based on written corpora. There needs to be a substantial spoken corpus involved in the development of a general service list. The spoken and written corpora used should also cover a range of representative text types. Biber's (1990) corpus studies have shown how particular language features cluster in particular text types. The corpora used should contain a wide range of useful text types so that the biases of a particular text type do not unduly influence the resulting list.
2. Frequency and range. Most frequency studies have recognised the importance of range of occurrence. A word should not become part of a general service list simply because it occurs frequently; it should occur frequently across a wide range of texts. This does not mean that its frequency has to be roughly the same across the different texts, but that it should occur in some form or other in most of the different texts or groupings of texts.
3. Word families. The development of a general service list needs a sensible set of criteria for deciding which forms and uses are counted as members of the same family. Should governor be counted as part of the word family represented by govern? When making this decision, the purposes of the list and the learners for whom it is intended need to be considered. As well as basing the decision on features such as regularity, productivity, and frequency (Bauer and Nation, 1993), the likelihood of learners seeing these relationships needs to be considered (Nagy and Anderson, 1984).
4. Idioms and set expressions. Some items larger than a word behave like high-frequency words. That is, they occur frequently as a unit (Good morning, Never mind), and their meaning is not clear from the meaning of the parts (at once, set out). If the frequency of such items is high enough to get them into a general service list in direct competition with single words, then perhaps they should be there. Certainly the arguments for idioms are strong, whereas set expressions could be included under one of their constituent words (but see Nagy, this volume).
5. Range of information. To be of full use in course design, a list of high-frequency words would need to include the following information for each word: the forms and parts of speech included in its word family; its frequency; its underlying meaning; its variations of meaning and collocations, and the relative frequency of these meanings and uses; and restrictions on its use with regard to politeness, geographical distribution, and so on. Some dictionaries, notably the revised edition of the COBUILD dictionary, include much of this information but still do not go far enough. This variety of information needs to be set out in a way that is readily accessible to teachers and learners.
6. Other criteria. West (1953: ix) found that frequency and range alone were not sufficient criteria for deciding what goes into a word list designed for teaching purposes. West also made use of ease or difficulty of learning (it is easier to learn another related meaning for a known word than to learn another word), necessity (words that express ideas that cannot be expressed through other words), cover (it is not efficient to be able to express the same idea in different ways; it is more efficient to learn a word that covers a quite different idea), stylistic level, and emotional words (West saw second language learners as initially needing neutral vocabulary). One of the many interesting findings of the COBUILD project was that different forms of a word often behave in different ways, taking their own set of collocates and expressing different shades of meaning (Sinclair, 1991). Careful consideration would need to be given to these and other criteria in the final stages of making a general service list.
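Criteria 2 and 3 above interact: frequency and range only become meaningful once word forms have been grouped into families. The Python sketch below counts both per family and keeps only families that are frequent and that occur in a minimum share of the texts. The family map, the thresholds, the function names, and the toy texts are all hypothetical; in a real list the family map would encode decisions such as whether governor belongs with govern.

```python
from collections import Counter, defaultdict


def family_stats(
    texts: dict[str, list[str]], family_of: dict[str, str]
) -> dict[str, tuple[int, int]]:
    """Map each family headword to (total frequency, number of texts it occurs in).

    Forms missing from family_of are treated as their own headword.
    """
    frequency: Counter = Counter()
    found_in: defaultdict = defaultdict(set)
    for name, words in texts.items():
        for form in words:
            head = family_of.get(form, form)
            frequency[head] += 1
            found_in[head].add(name)
    return {head: (frequency[head], len(found_in[head])) for head in frequency}


def general_service_candidates(
    stats: dict[str, tuple[int, int]], n_texts: int, min_freq: int, min_range_share: float
) -> list[str]:
    """Keep families that are frequent AND occur in at least min_range_share of texts."""
    return sorted(
        head for head, (freq, rng) in stats.items()
        if freq >= min_freq and rng / n_texts >= min_range_share
    )


texts = {
    "news": ["govern", "governed", "storm", "storm", "storm", "storm"],
    "novel": ["governs", "rain"],
    "lecture": ["government", "govern"],
}
family_of = {"governed": "govern", "governs": "govern", "government": "govern"}

stats = family_stats(texts, family_of)
print(stats["govern"], stats["storm"])
print(general_service_candidates(stats, len(texts), min_freq=3, min_range_share=0.5))
```

The govern family occurs 5 times across all 3 texts and is kept. Storm occurs 4 times but only in one text, so it is excluded: it is frequent without being of wide range, which is exactly the case criterion 2 rules out.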
With a continuing emphasis on communication in language teaching, there is a tendency to give less attention to the selection and checking of language forms in course design. Now that the benefits of form-focused instruction are being positively reassessed, we may see a change in attitude towards vocabulary lists and frequency studies. The benefits of giving attention to principles of selection and gradation in teaching, however, remain important no matter what approach to teaching is being used. A goal of this review of research findings on vocabulary size and frequency is to show that this information can bring considerable benefits to both teachers and learners.
Bauer, L. and I.S.P. Nation. 1993. Word families. International Journal of Lexicography 6, 4: 253-279.
Biber, D. 1990. A typology of English texts. Linguistics 27: 3-43.
Campion, M.E. and W.B. Elley. 1971. An Academic Vocabulary List. Wellington: NZCER.
Carroll, J.B., P. Davies and B. Richman. 1971. The American Heritage Word Frequency Book. New York: American Heritage Publishing Co.
Carter, R. and M. McCarthy (eds.) 1988. Vocabulary and Language Teaching. London: Longman.
D'Anna, C.A., E.B. Zechmeister and J.W. Hall. 1991. Toward a meaningful definition of vocabulary size. Journal of Reading Behavior 23: 109-122.
DeRocher, J.E. 1973. The Counting of Words: A Review of the History, Techniques and Theory of Word Counts with Annotated Bibliography. New York: Syracuse University Research Corp.
Dupuy, H.J. 1974. The Rationale, Development and Standardization of a Basic Word Vocabulary Test. Washington, D.C.: U.S. Government Printing Office.
Ellis, R. 1990. Instructed Second Language Acquisition. London: Blackwell.
Engels, L.K. 1968. The fallacy of word counts. IRAL 6: 213-231.
Fox, J. and J. Mahood. 1982. Lexicons and the ELT materials writer. English Language Teaching Journal 36, 2: 125-129.
Francis, W.N. and H. Kucera. 1982. Frequency Analysis of English Usage. Boston: Houghton Mifflin Company.
Fries, C.C. and A.A. Traver. 1960. English Word Lists. Ann Arbor: George Wahr.
Ghadessy, M. 1979. Frequency counts, word lists, and materials preparation: a new approach. English Teaching Forum 17, 1:24-27.
Goulden, R., P. Nation and J. Read. 1990. How large can a receptive vocabulary be? Applied Linguistics 11: 341-363.
Hazenberg, S. and J. Hulstijn. 1996. Defining a minimal receptive second-language vocabulary for non-native university students: An empirical investigation. Applied Linguistics 17, 1: in press.
Hirsh, D. 1992. The vocabulary demands and vocabulary learning opportunities in short novels. Unpublished MA thesis, Victoria University of Wellington, New Zealand.
Hirsh, D. and P. Nation. 1992. What vocabulary size is needed to read unsimplified texts for pleasure? Reading in a Foreign Language 8, 2: 689-696.
Hwang, K. 1989. Reading newspapers for the improvement of vocabulary and reading skills. Unpublished MA thesis, Victoria University of Wellington, New Zealand.
Hwang, K. and P. Nation. 1989. Reducing the vocabulary load and encouraging vocabulary learning through reading newspapers. Reading in a Foreign Language 6, 1: 323-335.
Hwang, K. and I.S.P. Nation. 1995. Where would general service vocabulary stop and special purposes vocabulary begin? System 23, 1: 35-41.
Jamieson, P. 1976. The acquisition of English as a second language by young Tokelau children living in New Zealand. Unpublished Ph.D. thesis, Victoria University of Wellington.
Joe, A., P. Nation, and J. Newton. 1996. Speaking activities and vocabulary learning. English Teaching Forum 34, 1: in press.
Judd, E.L. 1978. Vocabulary teaching and TESOL: a need for re-evaluation of existing assumptions. TESOL Quarterly 12, 1: 71-76.
Kucera, H. 1982. The mathematics of language. In The American Heritage Dictionary. Boston: Houghton Mifflin. 2nd ed.
Laufer, B. 1989. What percentage of text-lexis is essential for comprehension? In C. Lauren and M. Nordman (eds.), Special Language: From Humans Thinking to Thinking Machines. Clevedon: Multilingual Matters.
Liu Na and I.S.P. Nation. 1985. Factors affecting guessing vocabulary in context. RELC Journal 16, 1: 33-42.
Long, M. 1988. Instructed interlanguage development. In L. Beebe (ed.) Issues in Second Language Acquisition. New York: Newbury House.
Lorge, I. and J. Chall. 1963. Estimating the size of vocabularies of children and adults: an analysis of methodological issues. Journal of Experimental Education 32, 2: 147-157.
Lynn, R.E. 1973. Preparing word lists: a suggested method. RELC Journal 4, 1: 25-32.
McKeown, M.G. and M.E. Curtis (eds.) 1987. The Nature of Vocabulary Acquisition. Hillsdale, N.J.: Erlbaum.
Meara, P. and G. Jones. 1990. The Eurocentres Vocabulary Size Tests. 10KA. Zurich: Eurocentres.
McIntosh, X., M. Halliday and P. Strevens. 1961.
Milton, J. and P. M. Meara. 1995. How periods abroad affect vocabulary growth in a foreign language. ITL 107-108: 17-34.
Nagy, W.E. and R.C. Anderson. 1984. How many words are there in printed school English? Reading Research Quarterly 19: 304-330.
Nagy, W.E., P. Herman, and R.C. Anderson. 1985. Learning words from context. Reading Research Quarterly 20: 233-253.
Nation, I.S.P. 1982. Beginning to learn foreign vocabulary: a review of the research. RELC Journal 13: 14-36.
Nation, I.S.P. 1990. Teaching and Learning Vocabulary. New York: Newbury House.
Nation, I.S.P. 1993a. Using dictionaries to estimate vocabulary size: essential, but rarely followed, procedures. Language Testing 10, 1: 27-40.
Nation, I.S.P. 1993b. Vocabulary size, growth and use. In R. Schreuder and B. Weltens (eds.), The Bilingual Lexicon. Amsterdam/Philadelphia: John Benjamins. 115-134.
Nation, I.S.P. forthcoming. Teaching Listening and Speaking.
Paivio, A. and A. Desrochers. 1981. Mnemonic techniques in second language learning. Journal of Educational Psychology 73, 6: 780-795.
Praninskas, J. 1972. American University Word List. London: Longman.
Pressley, M., J.R. Levin and H. Delaney. 1982. The mnemonic keyword method. Review of Educational Research 52: 61-91.
Richards, J.C. 1974. Word lists: problems and prospects. RELC Journal 5: 69-84.
Rosenzweig, M.R. and D. McNeill. 1962. Inaccuracies in the semantic count of Lorge and Thorndike. American Journal of Psychology 75: 316-319.
Schmitt, N. and D. Schmitt. 1995. Vocabulary notebooks: theoretical underpinnings and practical suggestions. English Language Teaching Journal 49, 2: 133-143.
Schonell, F.J., I.G. Meddleton and B.A. Shaw. 1956. A Study of the Oral Vocabulary of Adults. Brisbane: University of Queensland Press.
Seashore, R.H. and L.D. Eckerson. 1940. The measurement of individual differences in general English vocabularies. Journal of Educational Psychology 31: 14-38.
Sinclair, J. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Sternberg, R.J. 1987. Most vocabulary is learned from context. In McKeown and Curtis, 89-105.
Sutarsyah, C. 1993. The Vocabulary of Economics and Academic English. Unpublished MA thesis, Victoria University of Wellington, New Zealand.
Sutarsyah, C., I.S.P. Nation and G. Kennedy. 1994. How useful is EAP vocabulary for ESP? A corpus based case study. RELC Journal 25, 2: 34-50.
Thorndike, E.L. and I. Lorge. 1944. The Teacher's Word Book of 30,000 Words. Teachers College, Columbia University.
Thorndike, E.L. 1924. The vocabularies of school pupils. In J. Carleton Bell (ed.) Contributions to Education. New York: World Book Co.
Webster's Third New International Dictionary. 1963. Massachusetts: G. & C. Merriam Co.
West, M. 1953. A General Service List of English Words. London: Longman, Green & Co.
Xue Guoyi and I.S.P. Nation. 1984. A university word list. Language Learning and Communication 3: 215-229.
Notre Dame Seishin University, 2-16-9 Ifuku-cho, Okayama, Japan 700
Tel 086 252 1155 Fax 255 7663 Home 086 223 0341