A Comparison of the Receptive and Productive Vocabulary Sizes of Some Second Language Learners

ROB WARING

Appeared in Immaculata: The occasional papers at Notre Dame Seishin University, 1997.



Introduction.

Vocabulary size has been of interest to researchers since Hellenic times (DeRocher et al., 1973), and over the years many studies have tried to estimate how many words people know. Many of these studies have focussed on the native speakers of a language (Goulden, Nation and Read, 1990) and many on second language learners (see, among others, Takala, 1985, for a study of Finnish children learning EFL; Jaatinen and Mankkinen, 1993; and Gui, 1982, for a study of Chinese learners of English). Some researchers have tested productive vocabulary (for example, Eringa, 1974; Holden, 1890) and some have tested aspects of receptive vocabulary (see Groot and Hoekstra, 1981; Seashore and Eckerson, 1940; Meara and Jones, 1990; Milton and Meara, 1995; Hartmann, 1941; Diack, 1975; and the following, cited in Teichroew, 1982: Marton, 1977; de Greve and van Passel, 1973; Chamberlain, 1965). Other studies have looked at vocabulary size for specific purposes (see Hirsh and Nation, 1992; Hwang and Nation, 1995).

Interestingly, studies that compare the receptive and productive vocabulary sizes of individuals are the hardest to find, and those that exist are somewhat dated (Eringa, 1974; Moesberger-Verhagen, 1980; Morgan and Oberdeck, 1930). Teichroew (1982, p. 19) states that the difference between receptive and productive vocabulary is 'of little significance'. Her conclusion comes from a study by Moesberger-Verhagen (1980, cited in Teichroew), who found that 29% of the 3200 words in Le Français Fondamental, a corpus of high-frequency French vocabulary, were known productively, while 42% were known receptively. Similarly, Eringa (1974) estimated that English high school learners with 6 years of French had a productive vocabulary of 1500-2000 words but a receptive vocabulary of 5000 words. Morgan and Oberdeck (1930, cited in Teichroew, 1982) studied receptive and productive vocabulary growth. They found that in the early stages of learning, receptive vocabulary was double the productive; as proficiency increased, the gap widened, but at more advanced levels it narrowed in favour of the productive. They state that "for a time the passive vocabulary develops faster than the active, but later this relatively slow development of the active is partly compensated for" (p. 218). All these studies compared how much was learned after a certain amount of exposure to an L2.

Vocabulary frequency profile studies are rare compared with vocabulary size studies. A vocabulary frequency profile measures the number of words known in each of several frequency bands, as a snapshot at one point in a learner's progression towards higher levels of language proficiency. It is not intended to provide a size figure, but to show how a learner's vocabulary is distributed across the bands. From a word frequency perspective, this kind of measure helps teachers and learners determine which words are well known and which need the most attention. The study below was conducted with this in mind.

THE STUDY

Intention
The intention of the study was to examine the receptive and productive vocabulary frequency profiles of second language learners. The aim was not to find out how many words they know, but how the two compare; in other words, 'how much larger, in percentage terms, is a person's receptive vocabulary than her productive vocabulary?' Two tests were administered, which tested the same words at several frequency levels, receptively and productively.

Subjects.
Seventy-six female Japanese learners of English took part in this study. They made up two complete classes at a private women's university in western Japan, and all were English majors in the first or second year of a four-year degree. Their proficiency ranged from elementary to upper intermediate, with the average being upper elementary. Because the study used a within-subjects design, comparing each individual's own receptive and productive vocabulary, there was no need to test for exact proficiency or prior knowledge.

Method.
The instruments.
The tests selected for this experiment were based on Nation's (1990) Receptive Vocabulary Levels Test and Laufer and Nation's (1995) Productive Vocabulary Levels Test. The two tests contain the same words, tested receptively and productively; testing the same words was necessary to avoid an effect for the particular words chosen. The tests measure knowledge of vocabulary at the 2000, 3000, 5000 and 10,000 word bands, and also include a level based on Xue and Nation's (1984) University Word List (UWL). (The 2000 word band tests words ranked 1-2000 in frequency, while the 3000 band tests words ranked 2001-3000 among the most frequently occurring words in English.) The word bands are loosely based on West's (1953) General Service List and Thorndike and Lorge's (1944) list. The test designs are outlined below.

Test design.
This levels test design had been used diagnostically with similar learners at the same university in the previous academic year. However, those data gave little insight into the learners' vocabulary profiles, because the scores were concentrated at the higher frequency bands and zero scores were very common at the UWL and 10,000 word levels. A general vocabulary frequency profile, rather than an academic one, was needed, so the UWL section of both tests was dropped. The 10,000 word level was also dropped: with these learners it was unlikely to produce a receptive score above 2 to 3 percent, and its difficulty risked intimidating them at the beginning of a course. Finally, a new 1000 word level test was made to measure knowledge of the most frequent words more accurately, as the original test does not discriminate well among them (Schouten-van-Parreren, 1996: p64).

The 3000 and 5000 word bands were taken verbatim from the sources mentioned above, with one exception. Only 17 items appeared at the 5000 word band on the published productive test (apparently in error), compared with 18 on the receptive test, so a new productive item was written for the missing word from the corresponding receptive test. All the items on the original 2000 word tests were used in the revised tests, but some of them come from the 1-1000 band and some from the 1001-2000 band. Items from the 1001-2000 band were kept on the revised 2000 word band test, and items from the 1-1000 band were moved to the new 1000 word band test. Because each band has 18 items, additional items were randomly selected from the relevant band to bring each band up to 18 words. The 1000 and 2000 band receptive and productive tests were then made from these words, using the same format as the originals.
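The reallocation just described can be sketched in a few lines. This is a minimal illustration, not the procedure actually used: it assumes each original item carries a frequency rank (its position in a ranked word list), and the helper names, item names and ranks in it are hypothetical placeholders rather than the study's words.

```python
import random

ITEMS_PER_BAND = 18  # every band in the revised tests has 18 items


def split_original_items(items):
    """Split (word, frequency_rank) pairs from the original 2000-band test
    into words ranked 1-1000 and words ranked 1001-2000."""
    band_1000 = [word for word, rank in items if rank <= 1000]
    band_2000 = [word for word, rank in items if 1000 < rank <= 2000]
    return band_1000, band_2000


def top_up(kept, candidates, rng):
    """Fill a band up to ITEMS_PER_BAND by drawing random candidates
    that are not already in the band."""
    pool = [word for word in candidates if word not in kept]
    needed = max(0, ITEMS_PER_BAND - len(kept))
    return kept + rng.sample(pool, needed)


# Invented placeholder items and ranks, not the study's data
original = [("itemA", 250), ("itemB", 1400), ("itemC", 900), ("itemD", 1800)]
low, high = split_original_items(original)

rng = random.Random(0)  # fixed seed so the random draw is reproducible
band_1000 = top_up(low, [f"extra1k_{i}" for i in range(50)], rng)
band_2000 = top_up(high, [f"extra2k_{i}" for i in range(50)], rng)
```

The kept items stay at the front of each band and only the shortfall is drawn at random, which mirrors the description of retaining every original item and topping each band up to 18.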

Below is an example set from the 1-1000 word level receptive test (the full word list appears in the appendix).

1 agree
2 poster _______ normal
3 colouring _______ have the same opinion
4 second _______ something put in food
5 average
6 result


In the receptive test, the subject had to choose which three of the six words matched the three meanings given. Each of the 6 sets at a band tested 3 words (the meanings on the right match words on the left), making 18 items per band. Across the four bands this gave 72 items tested receptively; the same words were tested productively in the other test, making 144 items in total across both tests. The receptive test was designed to be sensitive to any vocabulary knowledge the learner held, so the words within each set were clearly different from one another. An 'insensitive' test would have put very similar words in the same set (for example 'annoyed' and 'irritated'), so that the learner would need to know more about a word to select the correct item than in the current test.
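The structure and scoring of a receptive set can be sketched as follows. This is an illustrative sketch only: the class and field names are hypothetical, one point per correct match follows the Scoring section, and the answer key for the example set is inferred from the three productive example sentences rather than stated in the source.

```python
from dataclasses import dataclass


@dataclass
class MatchingSet:
    """One receptive item set: six words and three meanings to match."""
    options: list[str]   # the six words listed on the left
    key: dict[str, str]  # meaning -> the word that matches it

    def __post_init__(self):
        if len(self.options) != 6 or len(self.key) != 3:
            raise ValueError("a set has six words and three meanings")
        if not set(self.key.values()) <= set(self.options):
            raise ValueError("every correct answer must be one of the six words")

    def score(self, answers: dict[str, str]) -> int:
        """One point for each meaning matched to its correct word."""
        return sum(1 for meaning, word in answers.items()
                   if self.key.get(meaning) == word)


# The example set from the 1-1000 band; key inferred from the productive items
example = MatchingSet(
    options=["agree", "poster", "colouring", "second", "average", "result"],
    key={
        "normal": "average",
        "have the same opinion": "agree",
        "something put in food": "colouring",
    },
)

SETS_PER_BAND, MEANINGS_PER_SET, BANDS = 6, 3, 4
items_per_band = SETS_PER_BAND * MEANINGS_PER_SET  # 18
receptive_items = items_per_band * BANDS            # 72
total_items = 2 * receptive_items                   # 144: same words, both tests
```

A learner who matched "something put in food" with "result" but got the other two right would score 2 on this set.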

The productive test presented a sentence with a word missing, like a normal gap fill or 'C-Test', which the subject had to complete correctly. Three examples follow (the full word list appears in the appendix and the unmodified version of the test appears in Laufer and Nation, 1995).

They always seem to ag_______ about what to do on the weekend.
Scientists are worried about the amount of colo_______ in our food nowadays.
He's not a very bright child, he's about ave_______.

The beginning of each target word was given for two reasons. First, it prevented the learner from producing an alternative that might also fit the context. For example, if 'kettle' were being tested, a sentence such as 'She put the _____ on the hot stove' could elicit 'pan', 'pot' or any similar item, each of which would have to be judged correct. To restrict the subject to the desired item, the sentence 'She put the ket___ on the hot stove' would be used instead. This can generate only 'kettle', because the context rules out other words beginning with 'ket', such as 'ketchup', which is not normally put on a hot stove. Second, the test was designed to be insensitive to similar words or other members of the word family. For example, if 'accommodation' was the target word but 'accommodating' was supplied, it would be marked incorrect as the wrong word, even though the learner probably knew something about the word family that contains 'accommodate'. This makes the productive test more demanding than the receptive one.
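The effect of the letter cue, and the full-credit criterion for the exact word form, can be sketched as two small filters. This is a hypothetical illustration: the function names are invented, and the list of words that "fit the context" stands in for the human judgment that rules out items such as 'ketchup'. Partial credit, described under Scoring, is not modelled here.

```python
def candidates_for_cue(cue, context_fits):
    """Keep only the context-appropriate words that begin with the cue."""
    return [word for word in context_fits if word.startswith(cue)]


def full_credit(response, target):
    """Only the exact target form earns full credit; another member of
    the word family (e.g. 'accommodating' for 'accommodation') does not."""
    return response.strip().lower() == target.lower()


# Hypothetical words that fit 'She put the _____ on the hot stove'
fits_stove = ["pan", "pot", "kettle", "wok"]
print(candidates_for_cue("ket", fits_stove))  # ['kettle']
```

Without a cue (an empty prefix) every context-appropriate word survives, which is exactly the ambiguity the cue is meant to remove.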

The two tests could have been made equivalent in format by using the same matching format for both: the productive test would then have 3 words and 6 meanings, whereas the receptive test has 6 words and 3 meanings. However, because the same 3 meanings and 3 words would then appear in both tests, there would be a strong possibility of learning transferring from one test to the other. In the present study it was therefore decided to keep the original gap-fill design for the productive test, because it allows a wider range of productive possibilities and so reduces test-to-test learning effects.

Procedure.
The tests were administered as part of a normal class early in the academic year, with the diagnostic aim of finding out how vocabulary knowledge was distributed both within subjects and across each class as a whole. The data were intended to inform syllabus planning and to show the learners their own vocabulary frequency profiles. All the subjects were familiar with the test formats, as the classes had taken similar vocabulary tests before. Written reports after the test indicated that in general the learners found the results useful. The productive test was administered first, to ensure that the learners had not already seen the target words in the receptive test. A distractor exercise (not vocabulary based) was given between the two tests. The researcher waited until all subjects had finished, so unlimited time was given for both tests; those who finished early were given other work to do.

Scoring.
All the tests were scored by the researcher and checked by another native speaker of English. The receptive test was scored as one point for each correct answer. The productive test was scored differently, to allow partial knowledge to be shown. The first reason for this is that the productive test was more difficult, and the researcher wanted a more sensitive scoring method to compensate. For example, under strict scoring a spelling mistake would lead to a zero score even though the subject knew a lot about the word. The guidelines for spelling mistakes were as follows. If the subject misspelled the word by one letter and the overall shape of the word was similar to the target word, half a point was awarded. If the misspelling was too similar to another English word, it was not accepted and a zero was given, as the subject may have been attempting a different word. For example, 'apron' could be spelled 'aplon' or 'appron' to score half a point, but 'aprane', 'aplan' or 'applon' would score zero. Similarly, where a plural was needed but the 's' was omitted, or a present tense was given rather than a past, half a point was awarded. Full lists of the responses given half points were kept and referred to during marking to ensure consistency.

The second reason for the different scoring method was that strictly scored productive results would have been lower still, especially at the 5000 word band. Many very low scores could create a floor effect and make the receptive-to-productive ratio meaningless, since a few points more or fewer can alter such a ratio drastically.
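The spelling guidelines can be approximated in code. This is a hypothetical sketch, not the procedure used in the study, where marking was done by hand and the "overall shape" criterion was a human judgment. It assumes that "misspelled by one letter" means a Levenshtein edit distance of exactly 1, uses a small caller-supplied word set (`LEXICON`, invented here) to reject responses that are real but different English words, and accepts an explicit set of half-credit responses (such as a missing plural 's' or wrong tense), in the spirit of the lists the marker kept. Under these assumptions it reproduces the 'apron' examples above.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the fewest single-letter insertions, deletions
    or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def score_productive(response, target, lexicon, half_credit=frozenset()):
    """Hypothetical approximation of the marking guidelines:
    1.0 for the exact target; 0.5 for a listed half-credit response
    (e.g. a missing plural 's' or wrong tense); 0.0 for a different real
    English word; otherwise 0.5 for a one-letter misspelling, else 0.0."""
    response, target = response.strip().lower(), target.lower()
    if response == target:
        return 1.0
    if response in half_credit:
        return 0.5
    if response in lexicon:
        return 0.0
    return 0.5 if edit_distance(response, target) == 1 else 0.0


# A tiny illustrative word list standing in for "another word in English"
LEXICON = {"apron", "hug", "hog", "rang", "ring"}

apron_scores = {w: score_productive(w, "apron", LEXICON)
                for w in ["apron", "aplon", "appron", "aprane", "aplan", "applon"]}
```

The half-credit list is checked before the word list on purpose: 'ring' given for 'rang' is a real English word, but under the tense rule it should still earn half a point when the marker has listed it.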

DATA

Table 1 shows that the learners' receptive vocabulary is clearly larger than their productive vocabulary at all frequency bands. In fact, analysis of all 76 subjects' tests revealed that not a single subject scored higher on the productive test than on the corresponding receptive test at any of the frequency bands. That is, no subject's productive score exceeded her receptive score at any band.

Table 1: Mean scores (maximum 18 per band) on the 4 vocabulary frequency band tests, taken receptively and productively. (Standard deviations are in parentheses.)

Band         Receptive      Productive     Productive as %   Receptive as %
                                           of Receptive      of Productive
1000 band    16.01 (1.44)   10.32 (3.19)   64.5%             155%
2000 band    14.71 (1.71)    8.20 (2.23)   55.7%             179%
3000 band    14.62 (2.37)    4.61 (2.12)   31.5%             317%
5000 band     9.62 (2.61)    1.49 (1.39)   15.5%             645%
---------------------------------------------------------------------------
Total        54.96 (5.85)   24.62 (6.87)   44.8%             223%
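The last two columns of Table 1 are simple ratios of the band means, so they can be recomputed from the printed means. The sketch below does this, with values rounded rather than truncated. It also illustrates the floor effect mentioned under Scoring: adding a hypothetical half point to the 5000-band productive mean changes the receptive-to-productive ratio dramatically. The names `TABLE_1` and `ratios` are illustrative.

```python
# Band means from Table 1: (receptive, productive)
TABLE_1 = {
    "1000": (16.01, 10.32),
    "2000": (14.71, 8.20),
    "3000": (14.62, 4.61),
    "5000": (9.62, 1.49),
}


def ratios(receptive, productive):
    """Return (productive as % of receptive, receptive as % of productive)."""
    return 100 * productive / receptive, 100 * receptive / productive


derived = {band: ratios(rec, prod) for band, (rec, prod) in TABLE_1.items()}
for band, (p_of_r, r_of_p) in derived.items():
    print(f"{band} band: productive = {p_of_r:.1f}% of receptive, "
          f"receptive = {r_of_p:.0f}% of productive")

total_rec = sum(rec for rec, _ in TABLE_1.values())
total_prod = sum(prod for _, prod in TABLE_1.values())

# Floor effect: half a point more on a very low mean moves the ratio a lot
_, strict = ratios(9.62, 1.49)
_, plus_half = ratios(9.62, 1.49 + 0.5)
```

Because the productive means at the 5000 band are so close to zero, `strict` is about 646% while `plus_half` is about 483%.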

A repeated-measures ANOVA with one two-level factor (receptive vs productive) was calculated separately for each frequency band. All results were significant: 1000 word band, F(1, 75) = 279.9, p < .0001; 2000 word band, F(1, 75) = 648.2, p < .0001; 3000 word band, F(1, 75) = 1900.9, p < .0001; 5000 word band, F(1, 75) = 904.4, p < .0001. Given these data, it would be a mistake to say that a learner's receptive vocabulary is 2.23 times (223%) larger than her productive vocabulary, because that statement is too simplistic: the ratio depends on which vocabulary is being considered. A learner's receptive vocabulary is larger than her productive vocabulary at all bands, but by less at the high-frequency end than at the low-frequency end. In other words, if a word from the most frequent band is known receptively, there is a good chance (roughly 64.5%) that it will also be known productively. However, if an infrequent word from the 5000 band is known receptively, there is only a small chance (roughly 15.5%) that it will also be known productively.
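Each of these ANOVAs compares two conditions measured on the same 76 learners, so it has 1 and 75 degrees of freedom, and its F statistic equals the square of a paired-samples t statistic. The sketch below computes both from sums of squares, using invented scores for four hypothetical learners rather than the study's data, to show the equivalence. The function names are illustrative.

```python
import math
from statistics import mean, stdev


def paired_t(xs, ys):
    """Paired-samples t statistic for the differences xs[i] - ys[i]."""
    diffs = [x - y for x, y in zip(xs, ys)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def rm_anova_two_conditions(xs, ys):
    """F for a one-factor repeated-measures ANOVA with two conditions
    (df = 1 and n - 1), computed from the usual sums of squares."""
    n, k = len(xs), 2
    scores = list(xs) + list(ys)
    grand = mean(scores)
    ss_total = sum((s - grand) ** 2 for s in scores)
    ss_cond = n * sum((mean(c) - grand) ** 2 for c in (xs, ys))
    ss_subj = k * sum(((x + y) / 2 - grand) ** 2 for x, y in zip(xs, ys))
    ss_error = ss_total - ss_subj - ss_cond  # condition-by-subject residual
    return (ss_cond / 1) / (ss_error / (n - 1))


receptive = [16, 15, 17, 14]   # invented scores for four learners
productive = [10, 9, 12, 8]
t = paired_t(receptive, productive)
F = rm_anova_two_conditions(receptive, productive)
```

For these invented scores t is 23 and F is 529, that is, 23 squared.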

The data presented above show the profile for all subjects together. However, the subjects differed in vocabulary size, so the data were analysed further to see whether the same profile holds for learners with different vocabulary scores. Each subject's receptive and productive scores were added to give a total score, and on the basis of these totals the 76 subjects were divided into three groups of 25 or 26, termed the lower, middle and upper vocabulary size groups. The intention was to see whether the general result (a gap between receptive and productive vocabulary that widens towards the low-frequency end) holds for subjects with different vocabulary sizes. That is, does a person with a smaller total vocabulary, measured receptively and productively, still show the same differential distribution? To put it another way, is her receptive vocabulary at the 5000 band still about 6 times the size of her productive vocabulary, and does this 6:1 ratio also hold for people with larger vocabularies? The results of the analysis are given in Table 2. The results from Table 2 are graphically summarized in Figure 1.
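The grouping step can be sketched as ranking learners by combined score and cutting the ranking into consecutive groups of 25, 25 and 26. The paper does not say how tied totals at a group boundary were handled; this hypothetical sketch simply relies on Python's stable sort, so tied learners keep their input order. The learner IDs and scores are invented.

```python
def split_into_groups(totals, sizes=(25, 25, 26)):
    """Rank (learner_id, total_score) pairs from lowest to highest total and
    cut the ranking into consecutive groups of the given sizes."""
    if sum(sizes) != len(totals):
        raise ValueError("group sizes must add up to the number of learners")
    ranked = sorted(totals, key=lambda pair: pair[1])  # stable: ties keep input order
    groups, start = [], 0
    for size in sizes:
        groups.append(ranked[start:start + size])
        start += size
    return groups


# Invented combined scores (receptive + productive) for 76 learners
totals = [(f"S{i:02d}", 60 + (i * 37) % 41) for i in range(76)]
lower, middle, upper = split_into_groups(totals)
```

Putting the 26-learner group at the top is an assumption; the paper reports n = 26 for the upper group, which is consistent with it.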

Table 2: Mean receptive and productive scores (maximum 18 per band) by word frequency band for 3 groups with different vocabulary sizes. (Standard deviations are in parentheses.)

Band         Receptive      Productive     Productive as %   Receptive as %
                                           of Receptive      of Productive

Lower group (n=25)
1000 band    15.28 (1.72)    7.36 (2.37)   48%               207%
2000 band    13.36 (1.38)    6.24 (1.50)   47%               214%
3000 band    12.72 (2.25)    2.44 (1.04)   19%               521%
5000 band     8.36 (2.22)    1.00 (0.92)   12%               836%
---------------------------------------------------------------------------
Total        49.72 (4.36)   17.04 (3.04)   34%               292%

Middle group (n=25)
1000 band    15.88 (0.93)   10.58 (2.03)   67%               150%
2000 band    14.76 (1.54)    8.50 (1.27)   58%               174%
3000 band    14.60 (1.80)    4.84 (1.30)   33%               301%
5000 band     8.84 (2.43)    1.14 (0.84)   13%               775%
---------------------------------------------------------------------------
Total        54.08 (2.90)   25.06 (2.34)   46%               216%

Upper group (n=26)
1000 band    16.85 (1.12)   12.92 (2.28)   77%               130%
2000 band    15.96 (1.11)    9.79 (2.17)   61%               163%
3000 band    16.46 (1.33)    6.48 (1.55)   39%               254%
5000 band    11.58 (1.98)    2.29 (1.80)   20%               505%
---------------------------------------------------------------------------
Total        60.84 (3.50)   31.48 (4.58)   52%               193%

Table 2 and Figure 1 show that as a second language learner's vocabulary size increases, the gap between her receptive and productive vocabulary narrows somewhat but remains large. The profiles are roughly the same across the three groups, with the upper group showing a smaller difference at every word band and the lower group a consistently larger difference at every band.

DISCUSSION

Why would receptive vocabulary be larger than productive?

The data presented here give a clear indication that receptive vocabulary is larger than productive vocabulary, although it is not clear why this should be so. The data show that words are easier to access receptively than to use productively. A common and reasonable explanation is that a learner has more to do when using a word: the learner has to know not only its meaning but also its spelling or pronunciation. Crow (1986) discusses at some length the differences between what it takes to know a word receptively and productively, stating that 'a much larger body of knowledge is required' for productive knowledge (p. 242). This would mean that a person has more receptive knowledge than productive knowledge.

Figure 1: Productive vocabulary as a percentage of receptive vocabulary, by group and by frequency band. (A score of 100% represents equality between the two scores.)

Why is there a slope? Why is it this shape?


For each word one learns there are many aspects to master (Wesche and Paribakht, 1993). For reception, one needs the ability to recognize a word in its written and spoken forms and to recall its meaning when the word is met. Production requires much more: knowledge of the spelling and pronunciation; the ability to distinguish the various shades of meaning to a higher degree than for receptive control; an understanding of the meaning; the ability to use the word correctly with its different collocates; knowledge of how to mesh the word with the grammar rules and the sociolinguistic aspects of language production; and so on (Crow, 1986). It seems, then, that producing a word places much heavier demands on a learner than recognizing it, and the results found here may reflect how much heavier those demands are. From this perspective, productive control of any given word would seem to be acquired later than receptive control. Therefore the slope moves upwards and to the right as one's vocabulary size increases, as shown in Figure 1.

Hypothesizing from the data.

Advanced and beginner learners' vocabulary profiles were not measured in this study. However, these data suggest that a learner with a very large vocabulary, who scored 100% on both tests at the 1000 and 2000 bands, would still have a similar sloping profile at less frequent bands (say, 10,000 and above), albeit possibly flatter than a beginner's. Similarly, a beginning learner with a small vocabulary may have a profile that starts near the bottom of the Y axis in Figure 1 and reaches zero by the middle of the X axis or earlier, with a somewhat steeper line but the same sloping profile. It seems, then, that as one's vocabulary size increases this slope moves upwards and to the right. However, it is not clear whether the line flattens or becomes steeper as proficiency increases.

PROBLEMS

Further work must be done in this area, especially with learners outside the proficiency levels tested here. The upper group demonstrated almost complete receptive knowledge of the 1000 and 2000 word bands and had nearly reached the ceiling of 18 points at these bands. The upper group may also not have been stretched enough receptively at the 5000 word band, and a lower-frequency band may be needed to show their profile more clearly. Beginning learners also need to be studied. If learners with very small vocabularies are to be tested in the format used here, the current bands may not be suitable. One of the design features of these tests is that the words used to test a target word (in the definitions and sentence contexts) must come from the same band as the target or from a more frequent one. A new band at, say, the 500 word level would therefore be difficult to test using only vocabulary from that band, and an L1/L2 test may be more suitable.

Also, the receptive test in this study was decontextualized; given the arguments put forward by Read and Nation (1986), it would be worth confirming these findings in a more contextualized way. Further work should also check these findings with a more sensitive test battery that probes deeper knowledge of these items, such as knowledge of the shape of the word, its collocates, similar words and so on.

A weakness of this study is the use of different scoring methods for the two tests. In the researcher's opinion this was necessary, as a strictly marked productive test would probably yield lower productive scores and falsely widen the differences found here: under strict marking, even a small error in the learner's knowledge of the word must lead to rejection and a zero score. However, even a much more sensitive set of tests (what does the word look like, is it long or short, what is the approximate spelling, what is a similar word?) might not have generated scores close to the receptive ones.

CONCLUSION.

We have seen that comparisons of receptive and productive vocabulary size are meaningless if presented as a single fixed percentage, as in earlier studies (Eringa, 1974; Michel, 1972; Chamberlain, 1965), because the ratio between the two changes markedly with word frequency: in this study, receptive scores were about 1.5 times productive scores at the 1000 band but more than 6 times at the 5000 band. The gap itself may arise because words known receptively are simply not used (Blum and Levenston, 1978), because comprehension precedes production, or simply because production is more difficult than reception.


References


Blum, S. and E. A. Levenston. 1978. Lexical simplification in second language acquisition. Studies in Second Language Acquisition, 2, 2: 43-64.

Chamberlain, A. 1965. Learning a passive vocabulary. International conference on Modern Language Teaching.

Crow, J. 1986. Receptive Vocabulary Acquisition for Reading Comprehension. Modern Language Journal. 70, 3: 242-250.

D'Anna, C. A. and E. B. Zechmeister. 1991. Toward a meaningful definition of vocabulary size. Journal of Reading Behaviour: A Journal of Literacy. 23, 1: 109-122.

de Greve, M. and F. van Passel. 1973. Linguistique et Enseignement des langues etrangeres. Langue et Culture. Ed. Labor, Nathan.

DeRocher, J. E. et al. 1973. The Counting of Words: A Review of the History, Techniques and Theory of Word Counts with Annotated Bibliography. ERIC document.

Ellegard, A. 1960. Estimating vocabulary size. Word. 16, 2: 19-244.

Eringa, D. 1974. Enseigner, c'est choisir: vocabulaire-verwering. Levende Talen. 306: 260-267.

Groot, P. M. J. and J. G. Hoekstra. 1981. Tests of English vocabulary command for EFL students at University level. Toegepaste Taalwetenschap in Artikelen. 11: 98-136.

Goulden, R., P. Nation, and J. Read. 1990. How large can a receptive vocabulary be? Applied Linguistics. 11, 4: 341-363.

Gui, S. 1982. A survey of the size of vocabulary knowledge of Chinese students. Language Learning and Communication. 1, 2: 163-178.

Hartmann, G. W. 1941. A critique of the common method of estimating vocabulary size, together with some data on the absolute word knowledge of educated adults. Journal of Educational Psychology. 32: 351-358.

Hirsh, D. and P. Nation. 1992. What vocabulary size is needed to read unsimplified texts for pleasure? Reading in a Foreign Language. 8, 2: 689-696.

Holden, E. S. 1890. On the number of words used in speaking and writing. Bulletin of the Philosophical Society of Washington, II, (Appendix V1). p. 16.

Hwang, K. and P. Nation. 1995. Where would general service vocabulary stop and special purposes vocabulary begin? System. 23, 1: 35-41.

Jaatinen, S. and T. Mankkinen. 1993. The size of vocabulary of university students of English. In: K. Sajavaara and S. Takala (Eds.) Finns as learners of English: three studies. Jyvaskyla: Cross-Language Studies. 16.

Laufer, B. and P. Nation. 1995. Vocabulary size and use: lexical richness in L2 written production. Applied Linguistics. 16, 3: 307-322.

Marton, W. and A. Mickiewicz. 1977. Foreign vocabulary learning as problem number one of language teaching at the advanced level. Interlanguage Studies Bulletin. 2, 1: 33-47.

Meara, P. and G. Jones. 1990. The Eurocentres Vocabulary Size Test. Zurich: Eurocentres.

Michel, J. 1972. The Problem of time: some techniques for teaching vocabulary. ERIC report. 27.

Moesberger-Verhagen, T. 1980. Unpublished Master's thesis, University of Utrecht.

Morgan, B. and L. Oberdeck. 1930. Active and passive vocabulary. In: E. Bagster-Collins, Studies in Modern Language Teaching. New York.

Read, J. and P. Nation. 1986. Some Issues in the Testing of Vocabulary Knowledge. Paper presented at Quiryat Anavim, Israel, May 11-13.

Schouten-van-Parreren, C. 1996. Vocabulary Learning and Metacognition. In: K. Sajavaara and C. Fairweather (Eds.). Approaches to Second Language Acquisition. Jyvaskyla.

Seashore, R. and L. Eckerson. 1940. The measurement of individual differences in general English vocabularies. Journal of Educational Psychology. 31: 14-38.

Takala, S. 1985. Estimating students' vocabulary sizes in foreign language teaching. In: V. Kohonen, H. von Essen and C. Klein-Braley (Eds.) Practice and problems in language testing 8. Tampere: AFinL. A.

Teichroew, M. 1982. Receptive versus productive vocabulary: A survey. Interlanguage Studies Bulletin. 6, 2: 5-33.

Thorndike, E. and I. Lorge. 1944. The Teacher's Word Book of 30,000 Words. Teachers College, Columbia University.

Wesche, M. and T. S. Paribakht. 1993. Assessing vocabulary knowledge: depth versus breadth. Unpublished paper.

Xue, G. and P. Nation. 1984. Language Learning and Communication. 3, 2: 215-229.

Appendix.

The list of the words in the tests. The unmodified versions of the tests appeared in Nation (1990) and Laufer and Nation (1995).

1000 word band

agree
colouring
average
condition
develop
escape
rang
notice
storm
follower
pleasure
mean
wisdom
limited
twice
head
reality
throw

2000 word band

original
rub
moral
elect
manufacture
melt
decay
invite
roar
debt
pride
spoil
flesh
salary
temperature
birth
sport
victory

3000 word band

administration
angel
herd
bench
charity
province
darling
echo
slice
palm
scheme
thrill
encounter
illustrate
toss
annual
definite
savage

5000 word band

apron
mess
phase
sermon
stool
magnificent
apparatus
compliment
revenue
bruise
ledge
mortgage
blend
devise
hug
fragrant
gloomy
wholesome


Contact Info:
Rob Waring
Notre Dame Seishin University, 2-16-9 Ifuku-cho, Okayama, Japan 700
Tel 086 252 1155 Fax 255 7663 Home 086 223 0341
Email:Rob Waring

