Scales of Vocabulary Knowledge in Second Language Vocabulary Assessment



Appeared in Kiyo, The occasional papers of Notre Dame Seishin University. March 2002


There are many theories of how vocabulary is acquired. Almost all vocabulary acquisition researchers would agree that modeling vocabulary development is a complex task. Furthermore, they would probably agree that it is a multi-faceted phenomenon about which we know very little, despite more than a century of research. However, this has not stopped researchers and theorists from proposing models to explain vocabulary behaviour. One of the most commonly accepted views of vocabulary acquisition is that it occurs along a continuum of development. A continuum view of vocabulary development, at either the lexeme or the lexicon level, might be represented as follows.

Figure 1. A scale or continuum of knowledge


The fundamental idea is that as one acquires more knowledge of a given word, one moves along the continuum of knowledge. This basic notion has also been extended beyond the single word level to the general notion of receptive and productive vocabulary, which are placed at either end of the continuum. From this perspective, receptive knowledge of a word comes before productive knowledge and is essentially a prerequisite for it. In other words, it reflects the notion that 'one has to meet a word in reception before it can be produced'. This seems so utterly straightforward and sensible that the notion is rarely challenged (although later we will look at some ways in which it may not be so straightforward). This receptive / productive continuum can be illustrated as follows.

Figure 2. A continuum of receptive and productive vocabulary knowledge


It is not clear where this notion first arose, but proponents of this view are not hard to find, as it is probably the majority view at present and is presented in many recent writings in this area within applied linguistics. The views of several researchers follow.


"the distance between (receptive and productive word knowledge is) a line 'a continuum of knowledge'." (Melka 1997: 99-100).


            "it is possible that in learning a second language many vocabulary terms move along a continuum from passive to active" (Pigott, 1981: 4)


            "We should think of vocabulary knowledge as a continuum between the ability to make sense of a word and the ability to activate the word automatically for productive purposes" (Faerch, Haastrup and Phillipson, 1984: 100).


            "at one end of the continuum we have ... potential vocabulary ... moving along the continuum we have real vocabulary ... which includes those foreign language words that the learners have learned at some stage in the learning process, and that they can either only understand (passive real vocabulary) or both understand and use (active real vocabulary)" (Palmberg 1987: 201)


Henriksen (1996) has taken this a step further by creating a model of vocabulary acquisition based on development along three continua. Figure 3 shows how Henriksen suggests that vocabulary knowledge develops along continua, or in some kind of hierarchical order. The partial-precise continuum is a knowledge continuum whereby levels of word knowledge are operationalized at different levels of understanding or comprehension. The receptive-productive continuum is a control continuum, which describes the different levels of access or use ability operationalized through different kinds of receptive and productive tasks. The depth-of-knowledge continuum entails not only the word's referential meaning, but also its paradigmatic and syntagmatic relationships.


Figure 3: A model of vocabulary acquisition (Henriksen, 1996)


Henriksen says that word meaning is learned along the partial-precise continuum (I); thus this is a knowledge continuum. Knowledge moves from initial word recognition through rough characterisation or vagueness to mastery of finer shades of meaning. That is, the better the word meaning is known, the further along the continuum one moves.


Henriksen says that gvocabulary knowledge is often defined as precise comprehension, which is operationalized as the ability to translate the lexical items into the L1, the ability to find the right definition in a multiple-choice task, or the ability to give a target language paraphraseh (1996:7).  Similarly, the more paradigmatic and syntagmatic relationships that are known, the further along the depth-of-knowledge continuum (III) one moves. As one comes to know a word better, the learner is said to draw on and develop the knowledge between lexical items. Therefore, she hypothesizes that development along the depth-of-knowledge continuum is an important factor for the development of partial to precise meaning.  This is because the knowledge of a given word grows in relationship to other words and their relationships with others. 


The receptive-productive continuum is a control continuum that involves the control or accessibility aspect of lexical competence, or whether the learner is able to use a lexical item receptively or productively. For Henriksen, the difference between the partial-precise continuum and the receptive-productive continuum is that the partial-precise continuum is a knowledge continuum where different levels of declarative word knowledge are tapped, but the receptive-productive continuum has to do with how much control or access or use ability one has.  Thus explicit declarative knowledge moves along both the word meaning continuum (partial-precise) and the depth-of-knowledge continuum.


Much of the work cited above has been influential with researchers, as the notion has a certain intuitive appeal. This has led some researchers, particularly those involved in collecting second language vocabulary acquisition data, to suggest that continua might provide insights for vocabulary assessment and for the development of tests of vocabulary knowledge.


Continua and vocabulary assessment


Within the last decade or so, the notion of continua of vocabulary acquisition has metamorphosed into 'scales of vocabulary knowledge', so much so, in fact, that they have become a cottage industry of their own and are found in numerous recent studies (Joe, 1994, 1995; McNeill, 1996; Scarcella and Zimmerman, 1997, are but four examples of more than two dozen such studies). These scales can be used to assess the degree of vocabulary knowledge held by a learner. A typical scale of vocabulary knowledge is a modified self-judging Yes/No test with extra levels, often containing three or four stages of differing degrees of knowledge. Here is an example scale of vocabulary knowledge.


0. I do not know this word

1. I have seen this word before, but do not know its meaning

2. I have seen this word before and know its meaning a little

3. I know this word


One can immediately see a continuum of development in this scale. The theoretical construct underlying such scales assumes that word knowledge is not bipolar in nature (i.e. "known" or "unknown") but involves several stages of acquisition which can be measured by degrees. Knowledge goes from not known to known well through intermediary stages, and is thus essentially linear.


One of the ways in which these scales show promise is that as knowledge scales essentially tap declarative knowledge, they can tell us what is known about declarative, reportable word knowledge. Thus within vocabulary assessment, these scales are most often used by learners to self-assess their knowledge of a list of words. Proponents of these tests of vocabulary therefore suggest that a single scale can assess both breadth (how many words) and depth (how well known) aspects of vocabulary knowledge. Proponents suggest these tests can be used for assessing knowledge pre and post some treatment, and that a better analysis of what is going on can be achieved than with a standard yes/no test.
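The breadth/depth claim can be made concrete with a minimal Python sketch. It assumes the four-level example scale above is coded 0 to 3 and, purely for illustration, treats 'breadth' as the number of words rated above 0 and 'depth' as the number of words at each level. The function names, and the assumption that a Yes/No test credits only level 3 as "yes", are hypothetical and not taken from any of the studies cited.

```python
from collections import Counter

# Hypothetical coding of the four-level example scale above (0-3).
SCALE = {
    0: "I do not know this word",
    1: "I have seen this word before, but do not know its meaning",
    2: "I have seen this word before and know its meaning a little",
    3: "I know this word",
}


def breadth_and_depth(ratings):
    """Summarise one learner's self-ratings (dict of word -> level 0-3).

    Assumed, illustrative definitions:
      breadth = number of words rated above 0 (some knowledge claimed)
      depth   = number of words sitting at each level of the scale
    """
    for word, level in ratings.items():
        if level not in SCALE:
            raise ValueError(f"{word!r}: {level} is not a level on the scale")
    breadth = sum(1 for level in ratings.values() if level > 0)
    counts = Counter(ratings.values())
    depth = {level: counts[level] for level in SCALE}
    return breadth, depth


def as_yes_no(ratings):
    """Collapse the scale to a Yes/No test, assuming only level 3 counts as 'yes'."""
    return sum(1 for level in ratings.values() if level == 3)


ratings = {"ailment": 0, "surrogate": 1, "masterpiece": 2, "allow": 3, "since": 3}
print(breadth_and_depth(ratings))  # (4, {0: 1, 1: 1, 2: 1, 3: 2})
print(as_yes_no(ratings))          # 2
```

The graded version keeps a distribution that the dichotomous count discards, which is the extra information proponents point to; it says nothing yet about whether the levels are measured consistently.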


These scales are not new and often pre-date the notion of continua of development. They are variations of Thurstone scales dating from the 1930s, which attempted to measure degrees of response to a variety of stimuli, most notably in marketing research. Heim and Watts (1958, 1961) and Cook, Heim and Watts (1963) developed a scale as one of two tests used to assess the knowledge of vocabulary in verbal tests for adult English speakers. Eichholz and Barbe (1961) developed their vocabulary acquisition construct with the guiding principle of word acquisition in which "words cannot be classified as either known or unknown. Any word in an individual's vocabulary may be placed along a continuum whose extreme poles are known and unknown but which has intermediate stages of knowing." (p.2) They noted that the number of stages is arbitrary but that the stages may best be described as follows:


Figure 4: Eichholz and Barbefs test of word knowledge.

                   Unknown ..0..c.......1...cc.....2.c....|cc....3...cc.....4....c....5.. Known


                                                                               of Action


0      A word at this stage is completely unknown. The individual has never seen or heard of it.

1      The individual has seen or heard the word, but has shown little, if any, reaction to it.

2      The word is about to reach the threshold of action. The individual is now ready to act on it in some way.

3      Here the threshold of action has been crossed. The person has looked in a dictionary, asked about, or guessed at the meaning of the word: there has been an overt response to learn its meaning.

4      The word has become a part of a fund of words available for use. It still has only a vague, limited meaning and the person is able to define it only in a very vague way.

5      The word has become part of the individual's active vocabulary. He uses it with facility and has given it concrete meaning.


Thus, the selective perception and action of the individual determines the position of the word on the continuum.  This scale appears to be very difficult to read and remember and one could expect test takers to need to constantly refer to it for guidance.


Dale (1965) used a four-stage scale starting with 'I never saw the word before', moving on to 'I know there is such a word but I don't know what it means'. The third stage was a 'vague contextual placing of the word' and the final stage is 'when we have pinned the word down'. D'Anna and Zechmeister (1991) used a knowledge scale in their study of the vocabulary size of university undergraduates. The subjects had to rate their knowledge on the following scale.


Figure 5: DfAnna and Zechmeisterfs Knowledge Scale


1        have never experienced the word before

2        have seen or heard the word before, but do not know its meaning

3        have either seen or heard the word before and have a vague idea of its meaning

4        would be able to recognize the meaning of the word if given the word in a multiple-choice test which included the correct meaning and several incorrect meanings

5        know the meaning of the word well enough to give its definition.


Zimmerman (1997) used a four point Knowledge Scale test to assess levels of word knowledge.


Figure 6: Zimmermanfs Knowledge Scale

a)         I donft know the word

b)        I have seen the word before but I am not sure of the meaning

c)         I understand the word when I see it or hear it in a sentence, but I do not use it in my own speaking or writing

d)        I can use the word in a sentence


Within the last 5 or 6 years, the Vocabulary Knowledge Scale (VKS) (Paribakht and Wesche, 1993; Wesche and Paribakht, 1996) has gained significant currency in second language vocabulary assessment, and it (or a variant of it) is being used in a variety of different studies, some of which have already been mentioned. The particular aim of the VKS is to construct a "practical instrument for use in studies of the initial recognition and use of new words" (1996: 29). The theoretical framework for the construction of this instrument is based on Gass's (1988) five levels of vocabulary acquisition and reflects a view of initial vocabulary acquisition as a multi-stage, iterative process involving repeated exposures to new words in meaningful contexts (Paribakht and Wesche 1993: 155). It should be noted that the authors do not intend to claim that vocabulary knowledge is linear, particularly at higher levels, despite the use of a linear scoring scale.


The VKS differs from the other scales because it requires verifiable evidence of the knowledge claimed at higher levels. The scale is thus a control scale of how well words are known and what control one has over them. This is the scale.


Figure 7:  The Vocabulary Knowledge Scale from Wesche and Paribakht (1993)


I:          I don't remember having seen this word before

II:        I have seen this word before but I don't know what it means

III:       I have seen this word before and I think it means ________ (synonym or translation)

IV:       I know this word. It means __________ (synonym or translation)

V:        I can use this word in a sentence. e.g.: ___________________ (if you do this section, please also do section IV)


The basic idea of the scale is to measure progressive degrees of word knowledge. Level I is not really a level at all, but reflects what the subject does not know. Levels II, III and IV are a measure of recognition vocabulary and Level V, one of productive vocabulary.  When subjects provide evidence of their knowledge the score assigned to that test item is determined by a re-assignment of scores according to the following categories.


Figure 8: The VKS scoring categories: Assignment of Scores to Responses




For example, an unsuccessful attempt at Level V or IV will result in a score of 2, 3 or 4. If knowledge of the meaning of the word is shown in a Level V response, but the word is not appropriately used in the sentence context, a score of 3 is given. And so on.
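The re-assignment logic can be sketched in Python. This is a hypothetical reconstruction, not the published marking scheme: it assumes the score meanings commonly reported for the VKS (1 = word not familiar; 2 = familiar but meaning unknown; 3 = correct synonym or translation; 4 = semantically appropriate use in a sentence; 5 = semantically appropriate and grammatically accurate use). The function `vks_score` and its boolean parameters, which stand in for the marker's judgements, are illustrative only.

```python
from typing import Optional


def vks_score(category: int,
              meaning_correct: Optional[bool] = None,
              semantically_appropriate: Optional[bool] = None,
              grammatically_accurate: Optional[bool] = None) -> int:
    """Re-assign a VKS self-report category (1-5 for Levels I-V) to a score.

    The boolean arguments are hypothetical stand-ins for the marker's
    judgements; None is treated as a negative judgement.
    """
    if category == 1:
        return 1                      # not familiar at all
    if category == 2:
        return 2                      # familiar, meaning unknown (unverifiable)
    if category in (3, 4):
        # Synonym or translation supplied: correct -> 3, incorrect -> 2
        return 3 if meaning_correct else 2
    if category == 5:
        if semantically_appropriate and grammatically_accurate:
            return 5
        if semantically_appropriate:
            return 4
        # Sentence fails, but the meaning has been shown -> 3, otherwise 2
        return 3 if meaning_correct else 2
    raise ValueError("category must be 1-5 (Levels I-V)")


print(vks_score(5, True, True, False))   # 4
print(vks_score(5, True, False, False))  # 3
```

Under these assumptions a single Level V self-report can end up as 2, 3, 4 or 5 depending on three separate marker judgements, which is where the subjectivity of the re-assignment enters.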


Practical problems with these scales[i]


While these scales may seem attractive and have a certain intuitive appeal, there are several reasons why they might not best represent the nature of developing vocabulary knowledge, at least in the way that they have been used. For example, Read (1998) notes that "it is doubtful whether learners' developing knowledge of second language words can be meaningfully represented by a single linear scale" (2000:136). An obvious limitation of these scales is that they only test written and reading vocabulary. Nowhere is there a mention of oral or aural abilities. Nor do they measure how fluent the subject is with these words, or even deal with polysemous words. While the tests might be altered somewhat to accommodate these points, there are deeper and more serious matters to give us pause for concern.


            Are receptive and productive knowledge on a scale?

In these scales there is an assumption that receptive vocabulary is somehow lower or earlier than productive vocabulary. Although this may seem plausible, or even logical to some, it is still a matter for theorists to show[ii]. Moreover, the scales tend to be very heavily weighted in favour of receptive ability, and if there is only one item at the productive level (e.g. Level V in the VKS) there is insufficient evidence of the depth of knowledge of that ability, despite the scale being presented as a "depth" measure.


            Internal inconsistency

More seriously, the VKS and the other scales tend to be internally inconsistent in several ways. A variety of keywords are used in these scales as knowledge prompts, such as know, have seen, means and can use. An obvious advantage of this is that the scales then become multi-dimensional, as they test different aspects of word knowledge. However, because these terms are used inconsistently across levels, different types of word knowledge, as reflected in the choice of key words such as know and see, are not tested consistently at each level. Thus we are not able to see whether the same aspect of word knowledge is known consistently at each level. Moreover, a subject could know a word and its pronunciation without ever having seen it in writing. These scales appear to contain several sub-scales of word knowledge and control within a single scale. The results would therefore be unnecessarily complex and difficult to disentangle without arbitrary inferences about what they mean. There is therefore a pressing need for a single-dimension scale if we are to accept scales of vocabulary knowledge as valid measures of a phenomenon as complex as vocabulary. Using less complicated scales might prove to be more fruitful. In the following scale there is only one dimension (how 'well' the learner knows the word), which would make interpretation far easier.


                        A.  I do not know this word

                        B.  I know this word a little

                        C.  I know this word quite well

                        D.  I know this word very well


The above points also imply that these are often not really 'scales' in a linear sense, as it is possible to write at Level IV but not be able to do Level III on the VKS. Furthermore, it is possible to use a collocation such as contact lens and know what it means, but not know the individual meanings of contact and lens. Moreover, one could reproduce such a collocation with semantic and grammatical accuracy, but not be able to provide a synonym. This means that Level V can be seen as closer to Level II. This calls into question the supposed difference between "semantic appropriateness" and "semantic and grammatical appropriateness". If this is taken to mean collocational knowledge, how would one rate a perfectly grammatical sentence in which the target item is used in the wrong word class, or any of a series of other problems stemming from a lack of explicitness? Furthermore, asking subjects to respond repeatedly to the same target item may confuse learners and, moreover, severely limits the number of words that can be tested in a given time.


Level II (I have seen this word before but I don't know what it means) cannot be verified. It is possible that learners may think they have seen the word when in fact it was something else. A further problem implicit in Categories III and IV is the use of synonyms or translation equivalents as the criterion for knowing at that level. If this instrument were to be 'translated' into other languages, then we would have to be very careful in assuming that all languages have a rich store of synonyms and translation equivalents to call upon. For example, if such a test were constructed for German L2 speakers, as German has relatively few true synonyms, the task would be exceedingly difficult, and not because of a lack of linguistic knowledge.


Appropriate assignment of scores

The complexity of interpreting the multiple dimensions within a scale is further compounded by the marking scheme. In the VKS, for example, a subject who 'fails' at one Level is re-assigned a score on the basis of the re-assignments shown in Figure 8. The word may be rated 4 or 2 depending upon the subjective judgment of the marker, which seems a rather arbitrary re-assignment. No justification is offered as to how, and why, a failure at, say, Level IV makes it a Level III or a Level II word.


Score interpretation problems

The VKS was constructed for research into learning and was developed to assess small increments in knowledge pre- and post-treatment. When using these scales, the differences between the pre- and post-test scores are totaled and averaged to determine how much has been learned because of the treatment. However, what would a subject's score of 3.5 at pre-test and 3.7 at post-test mean? Can we say that a learner has 'gained' knowledge in the interim? If the scores were 2.1 and 4.8, it would be easier to assess how much knowledge had been gained, but the problem would remain as to what it meant. For example, two learners both with a mean rating of 3.0 might have completely different profiles. The scores in the following two hypothetical sets of data illustrate this problem.


Subject A         1 1 1 1 1 5 5 5 5 5  = 30 / 10  (average 3.0)

Subject B         2 4 1 4 2 3 2 5 3 4  = 30 / 10  (average 3.0)
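The point is easy to verify. The short Python sketch below (the function name `summarise` is illustrative) computes each hypothetical subject's mean rating alongside the number of words at each level from 1 to 5:

```python
from collections import Counter


def summarise(ratings):
    """Return the mean rating and how many words fall at each level 1-5."""
    counts = Counter(ratings)
    return sum(ratings) / len(ratings), [counts[level] for level in range(1, 6)]


subject_a = [1, 1, 1, 1, 1, 5, 5, 5, 5, 5]
subject_b = [2, 4, 1, 4, 2, 3, 2, 5, 3, 4]

print(summarise(subject_a))  # (3.0, [5, 0, 0, 0, 5])
print(summarise(subject_b))  # (3.0, [1, 3, 2, 3, 1])
```

The means are identical, but Subject A's ratings sit entirely at the two extremes while Subject B's are spread across the middle of the scale.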


Even though both subjects have the same mean score, a cursory look shows that they have very different profiles.  When we add in a second set of data, the interpretation becomes even more difficult. My own research (Waring 1999) with these kinds of scales found considerable variation between test times. Here are some typical data. How would one interpret these scores?


Test 1       0 0 0 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5   = 45 / 18 = 2.5

Test 2       1 3 4 0 1 5 1 2 2 3 4 1 1 5 4 3 5 4   = 49 / 18 = 2.7


It is difficult to see how one can read anything into this. When we compare the subject's performance pre- and post- some treatment, a total mean score cannot tell us which ratings for which words had changed because of the treatment. We do not know, for example, if there was a better understanding of the word form, a better understanding of the meaning, or an increase in ability to use the word. All this potentially vital information is lost in the search for a number upon which we can hang our statistical tests.
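A per-word comparison recovers some of what the mean hides. This minimal Python sketch uses the Test 1 and Test 2 data above and a hypothetical helper, `change_summary`, to count how many words rose, fell or stayed the same between the two administrations. Note that 'rose' and 'fell' still assume the levels are ordered.

```python
test_1 = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5]
test_2 = [1, 3, 4, 0, 1, 5, 1, 2, 2, 3, 4, 1, 1, 5, 4, 3, 5, 4]


def change_summary(pre, post):
    """Count words whose rating rose, fell or stayed the same."""
    if len(pre) != len(post):
        raise ValueError("pre and post must rate the same words")
    rose = sum(1 for a, b in zip(pre, post) if b > a)
    fell = sum(1 for a, b in zip(pre, post) if b < a)
    return {"rose": rose, "fell": fell, "same": len(pre) - rose - fell}


mean_1 = sum(test_1) / len(test_1)
mean_2 = sum(test_2) / len(test_2)
print(round(mean_1, 1), round(mean_2, 1))  # 2.5 2.7
print(change_summary(test_1, test_2))      # {'rose': 6, 'fell': 6, 'same': 6}
```

The mean rises by about 0.2, yet six words fell, six rose and six did not move, a pattern the single averaged figure cannot reveal.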


Furthermore, by converting the levels into ordinal data and averaging them, the researchers assume that there is a scale of knowledge following the number sequence. This in turn implies that the Levels are somehow equidistant. That is, a move from Level II to Level III should involve the same effort, amount of knowledge or suchlike as a move from Level III to Level IV. It has not been shown that the distances between the Levels are equal and thus worth an equal score. It is plausible to suggest an alternative interpretation of the Levels with a small distance between categories I and II and a wide one between II and III, narrowing again between III and IV, and a very large gap between IV and V. Other possible 'distances' could be equally plausible.
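The consequence of unequal distances can be shown with the Subject A and Subject B data above. The sketch below assumes three hypothetical placements of Levels 1 to 5 on a numeric scale: equal steps; one loosely following the pattern just suggested (gaps of 1, 3, 1 and 5); and one with a wide gap between the first two levels. All three keep the levels in the same order, so each is consistent with the ordinal information. The codings and the function name are invented for illustration.

```python
subject_a = [1, 1, 1, 1, 1, 5, 5, 5, 5, 5]
subject_b = [2, 4, 1, 4, 2, 3, 2, 5, 3, 4]

# Hypothetical, order-preserving ways of placing Levels 1-5 on a numeric scale.
codings = {
    "equal steps":      {1: 1, 2: 2, 3: 3, 4: 4, 5: 5},
    "wide gaps at top": {1: 0, 2: 1, 3: 4, 4: 5, 5: 10},
    "wide gap at 1-2":  {1: 0, 2: 3, 3: 4, 4: 5, 5: 6},
}


def recoded_mean(ratings, coding):
    """Mean rating after mapping each level to its assumed numeric position."""
    return sum(coding[r] for r in ratings) / len(ratings)


for label, coding in codings.items():
    print(label, recoded_mean(subject_a, coding), recoded_mean(subject_b, coding))
# equal steps 3.0 3.0
# wide gaps at top 5.0 3.6
# wide gap at 1-2 3.0 3.8
```

With equal steps the two subjects tie; under the second coding A is ahead, and under the third B is ahead. Which learner 'knows more' depends entirely on an untested assumption about the distances between Levels.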


Part of the source of the problem is that the data gathered from such scales are nominal, not ordinal, and cannot be added up and averaged as the researchers do. This is a problem because the scoring in the research using the tests is ordinal and numerical, which is a linear scale, and is thus diametrically opposed to the authors' intention of not having the test seen as a linear scale. Because the data are nominal, new forms of analysis need to be performed on these kinds of scales in order to determine what the data they generate mean. Some exploratory work has already begun in this area (Waring 1999, 2000; Rodriguez Sanchez).
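One way to summarise such data without doing arithmetic on the category labels is to cross-tabulate each word's pre-test category against its post-test category. The Python sketch below does this for the Test 1 and Test 2 data above. It is a generic categorical summary offered as an illustration, not a description of the analyses in Waring (1999, 2000), and the function name `transition_table` is hypothetical.

```python
from collections import Counter

test_1 = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5]
test_2 = [1, 3, 4, 0, 1, 5, 1, 2, 2, 3, 4, 1, 1, 5, 4, 3, 5, 4]


def transition_table(pre, post, levels):
    """Cross-tabulate each word's pre-test category against its post-test one.

    Rows are pre-test categories and columns post-test categories; the
    category labels are only compared for identity, never summed or averaged.
    """
    pairs = Counter(zip(pre, post))
    return [[pairs[(a, b)] for b in levels] for a in levels]


levels = range(6)
for level, row in zip(levels, transition_table(test_1, test_2, levels)):
    print(level, row)
# 0 [0, 1, 0, 1, 1, 0]
# 1 [1, 1, 0, 0, 0, 1]
# 2 [0, 1, 2, 0, 0, 0]
# 3 [0, 1, 0, 1, 1, 0]
# 4 [0, 1, 0, 0, 1, 1]
# 5 [0, 0, 0, 1, 1, 1]
```

Each cell counts the words that moved from a given pre-test category (row) to a given post-test category (column); the diagonal holds the six words whose category did not change.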


            Problems with validation

The lack of guidelines for scoring the definitions and sentences is a major problem within the VKS. It is to be presumed that a learner should write a general definition at Level III and a more detailed definition at Level IV. How would one score the following three definitions of diamond: as a 'gem', as a 'stone', or as 'a piece of jewellery'? How would one justify such a decision without being totally arbitrary? How would (could) one consistently apply these criteria to all words? At Level V, a subject could write the following responses to the prompt old and, according to the criteria for grammatical accuracy, should be given full marks as the use is perfect, but these responses hardly demonstrate that the subject actually knows, or can use, the word to any great extent.

                I am old

                I heard the word old in class today


The following sentence demonstrates this knowledge far better. However, the problem remains as to how one justifies the scoring.


                I know I am old because I am 78 and my skin is wrinkled.


McNiellfs (1996) results using a modified VKS with Hong Kong learners are relevant here.  McNeill found that his learners were able to write sentences illustrating their understanding of the word in very sophisticated ways. For example, some of them wrote the following for demographic  (at the end of each sentence is the definition given post hoc).


                Chinese immigrants are producing demographic problems in Hong Kong, especially in education ("great")

                There is a demographic difference between Hong Kong and Kowloon ("relief / contour")


This highlights the caution we must exercise if we are to assess written knowledge of words in this way, using definitions as an indicator of word knowledge.


If we were to use these scales in their present format, we would need to a) require subjects to write sentences showing they know the word, and b) add a second check test to ascertain whether the word is known in the sense it was intended to convey in the example sentence. Then we would need to set guidelines for assessing how well the check test confirms the meaning shown in the example sentence, and then reassign scores accordingly. We then run into the problem of finding a suitable test format for the check test. An interview may be performed, but there are time and other problems with this, as noted above. A multiple-choice test may be adequate, but the VKS was designed in part as a reaction to the problems with multiple-choice tests. These additional levels of checking leave an uncomfortably large element of subjective assessment, which may render the verification of knowledge too vague to be of real use.


            Can native speakers perform on this kind of test?

It has long been known that native speakers do not always obtain perfect scores on multiple-choice and cloze tests. It is unclear whether native speakers could use this instrument and provide the knowledge required, each time getting the maximum score. Without knowledge of these properties of the tests, it would be unreasonable to ask L2 subjects to do so.


            Bias to elinguistically awaref subjects

It has often been suggested in FLT circles that L2 learners 'know' more grammar than L1 speakers, in the sense that they are more linguistically aware of their declarative knowledge about the language, such as the tense system. Thus it may follow that this kind of task may be easier for formally trained L2 subjects than for some native speakers. Alternatively, some native speakers may have a great awareness of their own language and can perform with ease. Thus the task is at least in part dependent on one's own metacognitive awareness of the target language. This means that reporting devices such as those described here may need to be made comparable in task difficulty for L1 and L2 subjects. Additional problems arise in that subjects may prefer to show their declarative knowledge in ways other than those required by the tests. For example, a subject may prefer to give a dictionary definition, or some other form of evidence, rather than a synonym. This means that Levels III and IV may not be able to tap certain aspects of knowledge of subjects who have different preferences for demonstrating their knowledge. More importantly, this kind of test would probably favour advanced learners over beginners, who would be less able to find synonyms or translations and to write 'grammatically accurate' sentences.


            Construct definition

Furthermore, depending on the task required, the scales may not adequately tap the knowledge gained, despite their claims to greater discriminatory ability and to gathering both breadth and depth data. For example, if a subject is given a text to read and is being measured on the amount of incidental learning from the text, then a translation or synonym test, or even a sentence definition test, as part of the scale may not be a suitable measure of this knowledge. Furthermore, it has yet to be shown that there is a hierarchical order for demonstrating word knowledge, starting with an ability to translate, moving on to being able to supply a synonym, and finally to giving a sentence definition. Each of these tasks involves different linguistic abilities, and to assume that all words would follow the same steps and stages in the way the scales imply may misrepresent the very nature of vocabulary.


This therefore implies that if we are to construct vocabulary knowledge scales of any kind, then we should do so with a particular framework in mind. The VKS as it stands may be able to obtain a general score for the subject, but at the level of specific detail it may miss certain aspects of knowledge gains. This also suggests that these kinds of instruments may be better able to assess knowledge at either end of the scale than in the middle. It may be that some words, such as concrete nouns, can get to the higher categories very quickly whereas others never get there at all. Also, a subject may hold some level of knowledge about a word that is non-specific, but will not be able to show that knowledge or gain on the VKS.


            Test reliability

How reliable is the VKS? Wesche and Paribakht report high correlations (0.92 to 0.97) between the learners' responses and the way their responses were scored. This may have something to do with the words selected in the 1996 study. There are many low frequency (desensitize, surrogate, ailment, masterpiece) and high frequency items (since, because, lose, allow, sea) but not many in the middle. This would probably lead to consistently high ratings for the frequent words and consistently low ratings for low frequency words. The high frequency words cannot increase in ratings as they will be known, and it is unlikely that many low frequency items would change ratings from exposure to a single test. Therefore, the ratings pre- and post- are likely to remain the same and thus generate very high correlations. Independent validation of the reliability of this instrument should therefore be a priority.
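The concern that item selection alone can inflate a correlation can be illustrated with invented data. In the Python sketch below, two hypothetical ten-word sets each have exactly one word moving up a single level between pre- and post-test; one set contains only ratings at the extremes of the scale, the other only mid-range ratings. The Pearson correlation is computed by hand, and this is not a reproduction of Wesche and Paribakht's analysis.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical ratings: in each set exactly one word rises by one level.
extreme_pre = [1, 1, 1, 1, 1, 5, 5, 5, 5, 5]   # very rare or very common words
extreme_post = [1, 2, 1, 1, 1, 5, 5, 5, 5, 5]
middle_pre = [2, 3, 2, 3, 2, 3, 2, 3, 2, 3]    # mid-frequency words
middle_post = [3, 3, 2, 3, 2, 3, 2, 3, 2, 3]

print(round(pearson(extreme_pre, extreme_post), 2))  # 0.99
print(round(pearson(middle_pre, middle_post), 2))    # 0.82
```

The identical one-level change yields a far higher coefficient for the extreme set than for the mid-range set, so a very high correlation may say as much about the word sample as about the instrument.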


Conceptual problems with linearity in vocabulary assessment.


In the escalesf papers there seems considerable tension in the conceptualization as to what the stages, Levels, categories, or sections mean.  Wesche and Paribakht have attempted to conceptualize the instrument by saying that  gits purpose is not to estimate general vocabulary knowledge, but rather to track the early development of knowledge of specific words in an instructional or experimental situationh (1996: 33). From the comments above we can see that this instrument is attempting to identify certain levels of knowledge, but this knowledge does not follow a linear scale. 


It is instructive to look at this notion of linearity to see what insights we can derive from it.  As we have seen, a continuum or linear view of vocabulary development assumes we move from less to full knowledge of a word (Figures 1 and 2).  Thus we can see that it implies


a)      growth along the continuum

b)      there is a no-knowledge level and a mastery level

c)      parts of word knowledge are learned sequentially

d)      there is a movement from receptive types of knowledge to productive types


There are at least two levels at which vocabulary could be represented in this way: the word and the lexicon. It might be argued that it is the growth of the lexicon that moves along a continuum, with receptive vocabulary preceding productive vocabulary, and that the development of each word follows its own separate continuum. In seeking more incisive assessment instruments than vocabulary scales, we need to explore the notion of a continuum or scale of development in some detail to see how far it can take us, and what limitations bind it.


If we return to the notion of a single continuum on which receptive and productive vocabulary development lie (Figure 2), this view would hold that one's receptive knowledge must be complete before any aspect of production can proceed. This is very improbable, as learners can use a word quite well without really understanding all aspects of its meaning. Furthermore, learners constantly experiment with producing words they do not fully understand, in search of feedback on whether they used the word appropriately. The outcome of each attempt provides receptive input about whether the use succeeded, so production occurs before reception is complete. A single continuum from the receptive to the productive would also imply that it is impossible to have partial receptive knowledge and partial productive knowledge at the same time. Additionally, it would imply that in attrition, represented by a move backwards along the continuum, productive knowledge is lost before receptive knowledge. As we have seen, this view does not take into account the multi-dimensionality of the growth of the lexicon.
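The implications of the single continuum can be checked mechanically. In the sketch below one number between 0 and 1 locates a learner on the line of Figure 2; everything left of a hypothetical boundary R counts as receptive knowledge and everything to its right as productive knowledge. The value of R and the 0 to 1 units are assumptions made purely for illustration.

```python
"""Hypothetical sketch of the single-continuum view (Figure 2)."""

R = 0.5  # hypothetical point on a 0-1 continuum where receptive knowledge becomes complete


def knowledge_at(position):
    """Split a position in [0, 1] into (receptive, productive) shares, each scaled to 0-1."""
    receptive = min(position, R) / R
    productive = max(0.0, position - R) / (1 - R)
    return receptive, productive


def partial_in_both(position):
    """True if the position gives partial receptive AND partial productive knowledge."""
    receptive, productive = knowledge_at(position)
    return 0 < receptive < 1 and 0 < productive < 1


print(knowledge_at(0.25))  # (0.5, 0.0): half the receptive knowledge, no production yet
print(knowledge_at(0.75))  # (1.0, 0.5): production starts only once reception is complete
print(any(partial_in_both(p / 100) for p in range(101)))  # False

# Attrition is a move backwards: productive knowledge shrinks while receptive stays complete.
print(knowledge_at(0.6)[1] < knowledge_at(0.9)[1], knowledge_at(0.6)[0] == knowledge_at(0.9)[0])  # True True
```

Whatever value is chosen for R, no position yields partial knowledge on both sides, and moving backwards always removes productive knowledge first, which is exactly what makes the model implausible.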


An alternative view is to suggest that continua representing different aspects of word knowledge can serve as a better model. Such knowledge might be represented as follows:



This multi-continua view as shown above sees receptive knowledge as distinct and separate from productive knowledge, but this quickly leads us into very deep water. Such a model would mean that any knowledge gained on the receptive continuum will not affect the productive continuum, and vice versa. It also means that it is possible to have complete receptive control of a word but no productive control. The logical corollary is that it is also possible to have full productive knowledge without any receptive ability whatsoever. The model is therefore unsatisfactory, given that receptive and productive knowledge feed each other.


Another way to conceptualize this is to assume that a certain threshold of receptive knowledge is a prerequisite for the onset of productive knowledge. It might be illustrated thus.



This would seem an admirable solution, as one's receptive and productive vocabulary can develop with interaction, but again this view is fraught with dangers. The two knowledge sources develop independently once the threshold has been reached, and it is still possible within the bounds of this model to have much more productive ability than receptive. A further problem is that it is not clear quite where this threshold might be. Positioning the intersection point towards the left (receptive) end would imply that only minimal knowledge is needed to reach the threshold, whereas positioning it further to the right would imply that much more knowledge is needed. A more serious problem is that, as the two continua develop independently, knowledge gained beyond the threshold cannot interact with knowledge on the other continuum. For example, the feedback obtained from testing a hypothesis about a word by using it in speech cannot re-enter the system on the receptive continuum.
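The weaknesses of the threshold model show up in even a toy simulation. The sketch below counts knowledge in whole-number units on two separate continua: exposure ("meet") raises only receptive knowledge, and use ("use") raises productive knowledge only once receptive knowledge has reached a threshold. The threshold, the ceiling and the event sequences are hypothetical values chosen for illustration.

```python
"""Hypothetical sketch of the threshold view, in whole-number knowledge units (0-10)."""

T = 3     # hypothetical receptive threshold for the onset of production
MAX = 10  # hypothetical ceiling of each continuum


def simulate(events):
    """Return (receptive, productive) after a sequence of 'meet' and 'use' events."""
    receptive = productive = 0
    for event in events:
        if event == "meet":                      # exposure feeds only the receptive continuum
            receptive = min(MAX, receptive + 1)
        elif event == "use" and receptive >= T:  # production is possible only past the threshold
            productive = min(MAX, productive + 1)
        # In this model a "use" never feeds back into receptive knowledge.
    return receptive, productive


print(simulate(["use"] * 5))                 # (0, 0): no production below the threshold
print(simulate(["meet"] * 3 + ["use"] * 6))  # (3, 6): productive overtakes receptive
```

The second run shows both problems at once: productive knowledge ends up well ahead of receptive knowledge, and six episodes of use leave receptive knowledge untouched because there is no feedback path between the continua.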


We could go on for quite a while drawing refinements of these continua, in which the different knowledge sources interact at various points along the continuum, with feedback loops added, and so on. However, no linear diagram can represent the multifarious nature of the development of the lexicon. We must therefore be extremely cautious when looking at data resulting from the use of these scales of vocabulary knowledge, as their theoretical base is suspect.



The proponents of the scales of vocabulary knowledge that we have looked at are attempting to measure the various stages of acquisition. One of the problems with the VKS and the other scales is that they set out to measure stages of acquisition under the assumption that word knowledge is learned incrementally (i.e. an extra piece of knowledge is added onto the end of what is already there, much like a pay bonus is added to one's regular salary). Incremental learning of a word can involve incremental changes within each stage as well as increments of change between stages. The term 'incremental learning' implies a linear, step-by-step view of vocabulary acquisition. A more multi-dimensional term is 'accretive learning'.


In order for there to be stages in these kinds of instruments, the boundaries need to be explicitly defined. The problem facing these scales, then, is one of construct definition. In theorizing about the various stages of acquisition, the position usually taken is that the researcher decides what these stages are and develops a test to assess them. If the results prove problematic, the stages are revised and progress is made. Research into where these boundaries lie would show whether, in any real metacognitive and psycholinguistic sense, these stages actually exist as independent levels.


The essential problem may be that the question is being asked the wrong way round. Instead of the researcher trying to pin the subjects' knowledge into pre-set categories, would it not be prudent to have the subjects decide for themselves what their own knowledge is and report that? If we look at knowledge scale tests in this way, then their conceptualization changes completely, and so would the questions we ask and the amount of inference needed to ascertain whether a word is known or not.


The VKS and other scales seek to fulfill an objective assessment function and as such are not particularly successful. The basic problem is that it is all too easy to fall into the trap of assuming that a test is measuring only one level of information. The test formats currently available to us are probably not able to capture specific aspects of word knowledge either in a componential sense or in an acquisition order sense. There are simply too many variables involved. This is not to say that the work developing these scales has not been valuable, but that the instruments currently used to assess word knowledge have limited introspective power at the level of specifics.


Part of the problem with using these scales of vocabulary knowledge is the way in which the data derived from them are analyzed. If the data gathered from these scales are re-conceptualized in terms of knowledge states which are functionally independent from a testing perspective, and the nominal status of the data is retained, there is considerable promise of finding a suitable method for assessing the knowledge gained using such tests. This view of word knowledge is captured in Multi-State models of vocabulary assessment and has been reported elsewhere (Meara and Rodriguez Sanchez, 1993; Rodriguez Sanchez, 2000; Waring, 1999, 2000).
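One way to picture a multi-state analysis is as a transition matrix, in the spirit of the matrix models cited above: each cell holds the proportion of words that move from one knowledge state to another between two test occasions. The sketch below is a minimal illustration under stated assumptions; the three state labels and every probability in the matrix are invented, and the cited models may define states and estimate transitions differently.

```python
"""Hypothetical sketch of a multi-state (Markov-chain) view of vocabulary knowledge.
State labels and transition probabilities are invented for illustration only."""

STATES = ["unknown", "receptive", "productive"]  # nominal labels: no order is implied

# TRANSITIONS[i][j] = hypothetical proportion of words in state i found in state j
# one testing interval later. Each row sums to 1.
TRANSITIONS = [
    # to: unknown, receptive, productive
    [0.80, 0.15, 0.05],  # from unknown (a few words produced without being met; cf. note [ii])
    [0.10, 0.70, 0.20],  # from receptive (some words are forgotten)
    [0.05, 0.15, 0.80],  # from productive (some words slip back)
]


def step(counts, matrix):
    """Expected number of words in each state after one interval (row vector times matrix)."""
    n = len(counts)
    return [sum(counts[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def project(counts, matrix, intervals):
    """Apply the transition matrix repeatedly, e.g. across several post-tests."""
    for _ in range(intervals):
        counts = step(counts, matrix)
    return counts


start = [100.0, 0.0, 0.0]  # e.g. 100 target words, none known at the pre-test
for label, value in zip(STATES, project(start, TRANSITIONS, 1)):
    print(label, round(value, 1))  # unknown 80.0 / receptive 15.0 / productive 5.0
```

Because the states are treated as nominal categories, forgetting, skipping a state, or producing a word before it is fully understood are simply further cells of the matrix rather than violations of a scale.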


Considerable work remains. This paper has identified several problems with these scales, particularly with the ways in which they have been constructed and operationalized. Much more work has to be done to remove many of the more troublesome aspects of these scales. Until we know more about their properties, it is prudent to view research based on these tests with considerable caution.




Cook, J. M., A. W. Heim and K. P. Watts. The word-in-context: a new type of verbal reasoning test. British Journal of Psychology. 54 (3): 227-237. 1963.

Dale, E. Vocabulary measurement: techniques and major findings. Elementary English. 42: 895-901. 1965.

D'Anna, C. A., E. B. Zechmeister and J. W. Hall. Toward a meaningful definition of vocabulary size. Journal of Reading Behavior. 23 (1): 109-122. 1991.

Eicholtz, G. and R. Barbe. Vocabulary development. Elementary School Journal. 61: 414. 1961.

Faerch, C., K. Haastrup and R. Phillipson. Learner Language and Language Learning. Copenhagen: Multilingual Matters. 1984.

Gass, S. M. Second Language Acquisition and Linguistic Theory: The Role of Language Transfer. In Flynn, S. and W. O'Neil (Eds.). Linguistic Theory in Second Language Acquisition. Dordrecht: Kluwer Academic Press. 384-403. 1988.

Haastrup, K. and B. Henriksen. Vocabulary acquisition: from partial to precise understanding. In Haastrup, K. and A. Viberg (Eds.). Perspectives on Lexical Acquisition in Second Languages. Lund University. 97-126. 1998.

Heim, A. W. and K. P. Watts. A preliminary study on the self-judging vocabulary scale. British Journal of Psychology. 52 (2): 175-186. 1961.

Heim, A. W. and K. P. Watts. Preliminary note on the self-judging vocabulary scale. Psychological Reports. 4: 222. 1958.

Henriksen, B. and K. Haastrup. Describing learners' lexical competence across tasks and over time: a focus on research design. In Haastrup, K. and A. Viberg (Eds.). Perspectives on Lexical Acquisition in Second Languages. Lund University. 61-96. 1998.

Henriksen, B. Semantisation, Retention and accessibility: Key concepts in vocabulary learning. Paper presented at the AILA Congress, Jyvaskyla, Finland.  August 1996.

Joe, A. Text-based tasks and incidental vocabulary learning. Second Language Research. 11 (2): 95-111. 1995.

Joe, A. What effects do text-based tasks and background learning have on incidental vocabulary learning? MA Thesis. English Language Institute, Victoria University of Wellington. 1993.

McNeill, A. Vocabulary knowledge profiles: evidence from Chinese-speaking ESL speakers. Hong Kong Journal of Applied Linguistics. 1 (1): 39-63. 1996.

Meara, P. and I. Rodriguez Sanchez. Matrix models of vocabulary acquisition: an empirical assessment. In Wesche, M. and T. Paribakht (Eds.). Symposium on Vocabulary Research. Ottawa: CREAL. 1993.

Melka, F. Receptive vs. productive aspects of vocabulary. In Schmitt, N. and M. McCarthy (Eds.). Vocabulary: Description, Acquisition and Pedagogy. Cambridge: Cambridge University Press. 84-102. 1997.

Oller, J. Jr. Toward a Theory of Technologically Assisted Language Learning. CALICO Journal. 13 (4): 19-43. 1996.

Palmberg, R. Patterns of vocabulary development in foreign language learners. Studies in Second Language Acquisition. 9: 201-220. 1987.

Paribakht, T. and M. Wesche. Reading comprehension and second language development in a comprehension-based ESL program. TESL Canada Journal. 11 (1): 9-29. 1993.

Piggott, P. Vocabulary growth in EFL beginners. MA project, Birkbeck College, London. 1981.

Rodriguez Sanchez, I. Algebraic Models of Vocabulary Acquisition. PhD Thesis University of Wales, Swansea. UK. 2000.

Scarcella, R. and C. Zimmerman. ESL student performance on a test of academic lexicon. Studies in Second Language Acquisition. 20 (1): 27-49. 1998.

Waring, R. Tasks for Assessing Receptive and Productive Second Language Vocabulary. Ph.D. Thesis. University of Wales, Swansea, UK. 1999.

Waring, R. The "State Rating Task": an alternative method of assessing receptive and productive vocabulary. Kiyo, Notre Dame Seishin University: Studies in Foreign Languages and Literature. 24 (1): 125-154. 2000.

Wesche, M. and T. Paribakht. Assessing vocabulary knowledge: depth vs. breadth. Canadian Modern Language Review, 53 (1): 13-40. 1996.

Zimmerman, C. Historical trends in second language vocabulary instruction. In Coady, J. and T. Huckin (Eds.). Second Language Vocabulary Acquisition: A Rationale for Pedagogy. Cambridge: Cambridge University Press. 5-19. 1997.



[i] While the majority of the problems mentioned are specifically about the VKS, the other scales face these problems too. The VKS received most comment because it is the most frequently used assessment tool.

[ii] There are occasions when learners can produce words they have never met: through lexical inferencing; through the use of affix or derivational knowledge; or even from an understanding of how L1 words and L2 words are similar. For example, Spanish learners of French can use their knowledge of Spanish words ending in -ación and infer that the French equivalent will have the same root, with the suffix changing to -ation. In this way the learner does not have to meet the word in French to be reasonably sure of correct production.