Connectionism and Second Language Vocabulary.

ROBERT WARING

Abstract.
In Part 1 I will explore the nature of connectionism and point out some of the ways it seems to account for aspects of second language vocabulary knowledge at a micro-cognitive level which is not subject to introspection. From there, I will look at some of the limitations of this view and will show the connection with higher cognitive functions. In Part 2 a model of lexical storage will be presented to show this relationship. There will be some discussion of the limitations of the model and some suggestions for directions for future research.


Part 1: What is connectionism?
Connectionism as a term was first mentioned in Thorndike's study (1898) of the way cats learn in incremental stages. Connectionism as a paradigm of learning has its roots in associationism. Associationism dates from classical times but was substantially refined by the seventeenth-century philosophers Hobbes and Locke. The fundamental belief of associationism is that learning can be regarded as the formation of associations between previously unrelated pieces of information based on their contiguity. Connectionism is also based on this principle but encompasses much more, as outlined below. It borrows heavily from associationism and is a term that covers neural networks and Parallel Distributed Processing (PDP). Neural networks seek to explain cognition in biological or neurological terms, and PDP tries to show that information is not stored in the brain in one place but is distributed throughout the various parts of the brain which serve certain linguistic and non-linguistic functions. Generally PDP and connectionism are seen as being synonymous. Associationism, by contrast, does not contain many of the more advanced and sophisticated notions of connectionism (see Bechtel and Abrahamsen (1991) or Cohen et al. (1993) for reviews in this area).

There is no unified agreement on what exactly connectionism is; however, most connectionist models seem to share several properties. Connectionist architectures of cognition are loosely based on the architecture of the brain. Connectionists do not use neurological terms such as synapses and neurons directly, but instead use the terms nodes and networks, which are said to represent a crude but effective approximation of the neural state of the brain at a superficial level. These nodes are massively interconnected with other nodes to form a network of interconnections, hence the term connectionism. Each of these nodes can be connected to many different networks. The knowledge is stored in these interconnections and is associated with other kinds of knowledge contained in the network and in other networks, hence the relationship to associationism. Connectionists believe that these interconnections store the lexical information; however, this does not mean that the information is stored in one place (one cannot look inside the brain and find a particular word, for example), but rather in the interconnections between the nodes in the form of a network. One could visualize that the representation of a word might involve interconnections between various parts of the network, for example the phonological, semantic or orthographic parts. From this we can see that the knowledge is distributed among many interconnections. This distribution of information provides us with several advantages which will be discussed later.

Some connectionists believe that information in the brain is interrelated in the form of massively interconnected sub-networks rather than as a simple unified system. These sub-networks store information that can be accessed by other sub-networks. For example, a sub-network of morphological knowledge can connect with a sub-network of word roots, which in turn can connect to a semantic sub-network which stores meanings of words. While the exact make-up of these interconnections is not known, we do have some insights from our knowledge of the mental lexicon into what it might look like. Future research may be able to clarify this knowledge.

From the interaction of these inter-related networks we can form the meaning of a word and find the correct word to choose. If we have to find a past tense form, for example, the morphological network can be tapped to retrieve it. Each sub-network making up a 'word' as such would be connected to hundreds or thousands of other nodes making up a mini-network for that word. These sub-networks will be connected to areas of the brain that control the phonological, speech and auditory functions as well as the storing of lexical-specific information. The sum of all these interconnections for that word makes up the knowledge about that word which the learner has. Therefore, a well-known word will have a very intricate network of interconnections and less well-known words will have fewer interconnections. A different word would have a different set of nodes connected to hold that information - another mini-network. It may or may not share many of the same nodes as other words, depending on the make-up of that word. In essence, then, our lexicon (or lexicons) is made up of hundreds or thousands of these sub-networks, all massively interconnected to form the lexicon.

Within a network the nodes are organized into 'levels' such that any one node excites or inhibits other nodes at its own or different levels. Patterns, habits and rules are not stored in these interconnections; what is stored are the interconnection strengths that allow these patterns and rules to be recreated. Knowledge is seen at the micro-structure level rather than the macro-structure level of cognition. Therefore, the strength of the interconnections reflects the relative knowledge one has about an item of vocabulary. Prototypical representations of the lexical environment emerge as a natural outcome of the learning process. See Bechtel and Abrahamsen (1991), Broeder and Plunkett (1994) and Ney and Pearson (1990) for more detail in these areas. Learning, therefore, is a by-product of processing.



Developing the picture.
In order to bring the abstract to the concrete, the following diagram seeks to illustrate a learner's knowledge of the verb to see.


Diagram 1: A hypothetical partial representation of a learner's concept of 'see'.

The reader should immediately notice several things about this network and the limitations of representing the network in diagrammatic form. The first, and most obvious, is that the representation is incomplete and is only a partial representation of a learner's knowledge of see. That said, the concept of see in diagram 1 is distributed among many interconnections, some of which are thin and some of which are thick. The stronger the interconnection (thicker line), the more 'well-known' the information is; a thinner line represents less 'well-known' information. The learner is relatively sure that see means something like 'an image comes to my eyes' and that it collocates with some objects. She is less sure about her knowledge (partial knowledge) that the past tense saw is pronounced /sɔ:/, which is represented by the thinner line.
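The idea of thick and thin lines can be sketched in code as a set of weighted links. This is only a minimal illustration, not a connectionist simulation: the node labels, the numeric strengths, the 0.5 threshold and the function names `thick_lines` and `thin_lines` are all assumptions made for the example and are not taken from Diagram 1.

```python
# Hypothetical link strengths for a learner's knowledge of 'see'.
# 0.0 means no link; 1.0 means very well known. All values are invented
# for illustration and are not taken from Diagram 1.
see_network = {
    ("see", "meaning: an image comes to my eyes"): 0.9,
    ("see", "objects that collocate"): 0.8,
    ("see", "past tense form: saw"): 0.7,
    ("saw", "pronunciation"): 0.3,
}


def thick_lines(network, threshold=0.5):
    """Links at or above the threshold: the 'well-known' information."""
    return [link for link, strength in network.items() if strength >= threshold]


def thin_lines(network, threshold=0.5):
    """Links below the threshold but above zero: partial knowledge."""
    return [link for link, strength in network.items() if 0.0 < strength < threshold]


print(thin_lines(see_network))
```

In this sketch the only thin line is the link to the pronunciation of saw, mirroring the learner's partial knowledge described above. A link with strength 0.0, or no entry at all, would stand for no knowledge.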

Secondly, diagram 1 shows, for diagrammatic purposes only, nodes that have been labeled 'meaning', 'past tense ending', 'preposition use' and 'objects that collocate' with their sub-categories. The learner could assign labels to these quite differently and in fact not even have them categorized as shown, but in some completely different way - reflecting her own view of the word see. Alternatively, these nodes might not exist at all, or there could be no interconnections between them, reflecting no knowledge between these nodes and thus no knowledge of see. The diagram does not show all the other possible nodes about see - for example there are no nodes for its 'idiomatic use', the knowledge that see is pronounced the same as sea and so on.

Thirdly, each of the nodes and sub-categories, such as 'meaning', is shown as being connected to other parts of the network by the lines leaving the diagram. Therefore, the network is immensely complex in structure. It would take only a little imagination to conceive of a diagram which could represent knowledge of 'affix knowledge', 'words I have problems with', 'semantic networks', 'words to use when apologizing in German' and indeed many facets of vocabulary acquisition all linked together. Such a highly interconnected network would, of course, be beyond diagrammatic representation.

What can the model demonstrate?

Associative learning.
Clearly, the associative nature of vocabulary is shown here. Each network of knowledge is connected to many other networks. This model can also demonstrate how we could instantiate knowledge from the network in a schematic way. One piece of lexical information connects to another and can instantiate a related idea or word (see Rumelhart and Ortony, 1977 for a discussion of schema). Schema theory has shown us the importance of background knowledge and the relationship it has to comprehension (see Brewer and Treyens, 1981 for an example). Sometimes learners cannot comprehend a lexical item due to insufficient conceptual development or lack of background knowledge. This model can show the interconnections (or lack thereof) to non-linguistic knowledge that can hamper comprehension. By the same token, if a learner comes across a new word he may be able to guess its meaning from context and prior lexical knowledge. Clearly the richer the network of associations, the more chance there will be of comprehension. The learning of an L2 lexicon would involve deepening and enriching these networks and their interconnections.

Partial knowledge
The model can account for full, partial and incorrect storage of lexical knowledge. Knowledge that things are not something can also be accounted for in this model. For example, this learner may explicitly know that the past tense ending of see is not seed (is not /I:d/). This would be represented by drawing a line to that part of the diagram - either thickly or thinly depending on the strength of that knowledge.

Incremental learning
Interlanguage phenomena point to a learner system whereby learning is incremental, and done in successive and / or recursive steps. This model reflects this well as it can account for information that is part of neither the L1 nor the L2, but nevertheless is systematic and which the learner is constantly updating (or has fossilized) (see Klein, 1986).


Content addressability.
Word knowledge in this network is content addressable. This means that if a learner is asked for a word that means 'a round, hollow leather thing that you can play soccer with', or asked for the meaning of 'soccer ball', he can answer from both directions. Therefore, information stored in a connectionist architecture can be accessed in many ways.
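Access from both directions can be illustrated with a small sketch. It is a simplification under stated assumptions: the feature sets, the extra entry 'coin', and the function names `word_from_cues` and `features_from_word` are hypothetical, and a simple overlap count stands in for the spread of activation through a real network.

```python
# A toy lexicon: each word is linked to a set of features.
# The entries and features are invented for illustration only.
lexicon = {
    "soccer ball": {"round", "hollow", "leather", "used in soccer"},
    "coin": {"round", "metal", "used as money"},
}


def word_from_cues(cues, lexicon):
    """Go from a description to a word: pick the word whose stored
    features overlap most with the cues given."""
    return max(lexicon, key=lambda word: len(lexicon[word] & set(cues)))


def features_from_word(word, lexicon):
    """Go from a word to its description: return the stored features."""
    return lexicon.get(word, set())


print(word_from_cues({"round", "hollow", "leather"}, lexicon))
print(sorted(features_from_word("soccer ball", lexicon)))
```

The same stored links serve both lookups, which is the point of content addressability: there is no separate index for 'word to meaning' and 'meaning to word'.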

Individual variation.
Each learner will have a different network of associations and interconnections. Clearly the L1 can intrude on the transfer of L1 lexical knowledge. This model can account for learner variation, even with learners from the same L1 and with the same input having differing lexicons. Some SLVA researchers have proposed different lexicons serving different purposes, such as productive or receptive L2 lexicons. In this model there is no reason to assume that sub-networks for separate lexicons could not exist side by side or be interconnected. At the other extreme, those who say there is only one lexicon for all lexical knowledge, both from the L1 and the L2, could also be accommodated here.

Advantages of a distributed network.
The knowledge is stored in the interconnections, and each node can therefore connect to many networks. A major advantage of this is economy of the network, in the sense that a single node could be connected to many others, thus allowing one node to form many representations. This in turn allows the parallel processing of information, where the brain can process many things at once. Clearly we receive many different forms of input at any given time, all of which must be processed simultaneously.

Another advantage of a distributed system is that if one part of the system deteriorates (for example a given word is known but temporarily cannot be recalled) the whole system does not break down as the forgotten word will be connected to other words which could replace it. This is often called graceful degradation. For example, if a learner had learned collapse but when called upon to produce it cannot access it, a substitute could be found from within the network such as fall down. The network thus has built in redundancy in that the capacity to continue correct operation despite the loss of part of the information comes from the fact that the original network had encoded more information than was necessary to maintain the network.
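The substitution described above can be sketched as follows. This is an illustrative simplification, not a model of real retrieval: the concept label, the link strengths and the function name `retrieve` are assumptions made for the example.

```python
# Hypothetical links from one concept to the words that can express it.
# Strengths are invented for illustration.
concept_links = {
    "to fall suddenly": {"collapse": 0.8, "fall down": 0.6},
}


def retrieve(concept, links, unavailable=frozenset()):
    """Return the most strongly linked word that can currently be accessed,
    or None if no linked word is accessible."""
    candidates = {
        word: strength
        for word, strength in links.get(concept, {}).items()
        if word not in unavailable
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


print(retrieve("to fall suddenly", concept_links))
print(retrieve("to fall suddenly", concept_links, unavailable={"collapse"}))
```

When 'collapse' is temporarily inaccessible, the sketch falls back on 'fall down' rather than failing outright. Only when every linked word is unavailable does retrieval return nothing, which corresponds to a learner who has no network set up for that concept.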

Human-like behaviour.
One of the main achievements of a connectionist system is that it can process information and learn in ways which mirror some aspects of human learning and information processing, such as pattern matching, spontaneous generalization, stimulus categorization and concept learning. This makes these models very attractive to psychologists in particular. In addition, some of the models developed have been able to model at least some specific aspects of human performance (see Cohen et al., 1993, and Haberlandt, 1994 for numerous examples).

Generation from experience.
Connectionist systems have the ability to automatically or spontaneously generalize from experience. This could be done both productively and receptively. For example, if a learner knows the affix '-ist' can refer to a person doing a particular kind of job or work, then when the learner meets an unknown word ending in '-ist' he can guess that it refers to a person doing a certain type of work. Similarly, if a learner wanted to generate a word, '-ist' could be added to a person's job to create that word. For example, he might generate 'pianist' (if it was not known) from knowledge about piano, work, and -ist. Alternatively, he could create a novel word such as 'computerist'. Furthermore, overgeneralization of lexical applications can be explained in these terms.

Lack of lexical knowledge can be represented.
Beginning vocabulary learners often will not have an L2 network set up for some words/concepts, let alone one that can find substitutes when needed. This lack of a developed network could help to account for why learners are at a loss for words at times - the network simply has not been set up or it contains insufficient knowledge. It would therefore follow that lexical items which are not repeated or met frequently could have tenuous interconnections. Therefore there needs to be constant practice and reinforcement. It should be noted that this practice is not behaviourist in the sense that each item will need repeating many times and that this is the only way to learn. What it does mean is that the interconnections may need reinforcing in order to be strengthened.

Learning under this model.
As we learn, we constantly match new input to old information and adjust our knowledge store network according to the new information. Our processing of the input affects our future potential output in that the present knowledge store has been altered by new input and a new status quo is made until new input comes along to confirm the present state or lead us to review it again.

Connectionism rests on the assumption that we learn by trial and error in successive steps, incrementally and through exposure to input. Successive steps in the learning process alter the associative interconnections by strengthening or weakening them. The better known a piece of word knowledge is, the stronger the interconnection that makes up that part of the word's knowledge. This matches the view that a new word will not be learned completely on first meeting, but that knowledge of that word (such as its pronunciation, spelling, 'grammatical' features, collocates, register and so on) will grow incrementally with the number of times the word is met in various contexts. It will be a rare occasion that a new word is learned at one trial with all its features readily available for use. Connectionist networks do not in principle prevent this from happening, even though present connectionist simulations cannot do so.

The network or sub-networks making up the lexicon is ever changing and one could view it as never resting. Imagine for a moment we could take a snap shot of the network at rest. If a network representing say the word/concept 'do' were caught at rest, we could see that some interconnections were strong reflecting perceived well-known information (even if it is wrong) and others were weak reflecting less well known information. As new information is added, new interconnections are made to different nodes to account for this. The strength of these interconnections is altered by the input strengthening some interconnections, for example confirming that we in fact say 'do the washing' and weakening interconnections of other parts of the mini-network making up 'do'. For example, if a learner said 'do a crime' and was corrected, then the learner could then connect 'crime' with 'commit' rather than with 'do' making a new interconnection. This would not mean that the collocation would be 'learned' but that a link had been made and probably the learner will continue to use 'do' in preference to 'commit' until the network has been so altered through repeated exposure, practice and use to reflect the preference for 'commit' over 'do'.
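The gradual shift from 'do' to 'commit' can be sketched with a very simple update rule. This is an assumption made for illustration and is not a learning rule proposed in this paper or taken from any particular connectionist simulation: each exposure moves the confirmed link a fixed fraction of the way towards full strength and every competing link the same fraction towards zero. The starting strengths, the rate of 0.2 and the function name `adjust` are hypothetical.

```python
def adjust(strengths, confirmed, rate=0.2):
    """Return new link strengths after one piece of input.
    The confirmed link moves towards 1.0; all other links move towards 0.0."""
    updated = {}
    for link, strength in strengths.items():
        target = 1.0 if link == confirmed else 0.0
        updated[link] = strength + rate * (target - strength)
    return updated


# Hypothetical starting state: the learner strongly links 'crime' with 'do'
# and has no link yet between 'crime' and 'commit'.
crime_verbs = {"do": 0.8, "commit": 0.0}

# One correction creates the new link but does not yet change the preference.
after_one = adjust(crime_verbs, confirmed="commit")
print(after_one["do"] > after_one["commit"])

# Repeated exposure is needed before 'commit' becomes the stronger link.
exposures = 0
while crime_verbs["do"] >= crime_verbs["commit"]:
    crime_verbs = adjust(crime_verbs, confirmed="commit")
    exposures += 1
print(exposures, crime_verbs)
```

Under these assumptions a single correction makes a link to 'commit' without overturning the preference for 'do', and it takes more than one exposure before the preference changes, which is the behaviour described in the paragraph above.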

Evidence from second language data.
Very little work on connectionism has been done in second languages. Notable exceptions are Schmidt's review (1990); Broeder and Plunkett's study of developmental order for pronouns in L2s (1994); Blackwell and Broeder's work on frequency (1992); Gasser's work on word order (1988, 1990); Shirai's work on L1 transfer (1992); and Marchman's (1992) and Sokolik and Smith's (1992) work on critical periods. This lack of work does not mean a lack of interest, however, and is understandable in that the field is only 10 years old. Most of the research has been done in the first language and it is only very recently that work has started on a second language. An extensive search found no specific studies of second language vocabulary acquisition from a connectionist perspective. This may be due to the very complex and multi-faceted nature of SLVA and the fact that researchers may be more interested in the bigger picture of SLA rather than SLVA in particular.

What can the model not demonstrate?
The working mind, intention and higher cognitive functions.
One thing missing in many discussions of connectionism is the conscious working mind - the things that we call a consciousness, memory, intention and so on. These are often referred to as higher cognitive functions. These quite obviously exist in some form or another as we can all say we have them. Purist connectionism views these as the by-products of the processing of information, whereas more traditional views of cognition (the current dominant experimental paradigm) see the mind as being somehow broken into parts. This modular view says we have different forms of memory and storage and that these can be tested in certain ways to find out how our lexicons work. It is clear that there are levels of human processing for which PDP models may not be an appropriate level of analysis, at least given the current generation of PDP models. If the higher cognitive functions exist in PDP terms, we would need to be able to explain why there are parts of a PDP system which are transparent and why other parts are not.

Universal application.
It is not generally accepted even by PDP researchers that a connectionist (=PDP) model can account for all areas of human cognition, although many try to resist external explanations. The challenge for these researchers, then, is to develop a system to 'account for the phenomena which are handled rather well by rules but also, without additional mechanisms, give an elegant account of other phenomena as well' (Bechtel and Abrahamsen, 1991 p. 217).

Connectionist models are good at the lower levels of cognition such as content addressability, low-level perception and spontaneous generalization. However, there has been little success in discovering such examples at higher levels of cognition. It may be that we should not be trying to explain all things at all levels, but could instead fall back on the idea of levels (to be outlined below).

Capturing syntactic structure.
Fodor and Pylyshyn (1988) argue that a connectionist system cannot capture the representation of syntax well. Their example says that a PDP system can connect Joan, loves and florist in 'Joan loves the florist' to give it meaning, but it cannot discriminate this, in semantic terms, from the relationship in 'The florist loves Joan'. A network could add the representation but it could not disambiguate the two sentences. Therefore, they say that a PDP system is inadequate to the task of representing syntactic knowledge. This is despite Chomsky (1986) stating recently that the generatability of syntax is no longer the goal of generative linguistics.


Developmental sequences
Fodor and Pylyshyn state that the model is not good at learning in developmental stages, which a rule-based approach can capture. This ignores the fact that some aspects of language tend to be rule-governed and some do not. All languages have exceptions, such as go / went in English and suru (do) and kuru (come) in Japanese. PDP systems do in fact go through what seem to be developmental stages, as do first language learners. This is not clear for second language learners, however. In the first phase the systems tend to catch the irregularities by rote, and in the second phase they concentrate on rule-governed regularities. In the final stage the model strikes a balance between the two poles of regularity and irregularity and even overgeneralizes at times as children would do (e.g. feets instead of feet).

Non-human behaviour
Due to the very nature of these systems not being transparent, they cannot be tested empirically at the micro level of cognition and we are left with computer simulations of learning. These computer models cannot sufficiently model human behaviour exactly and indeed sometimes generate very non-human responses. In addition the computer simulations take a long time to learn whereas humans can learn at one trial and new simulations need to be developed to account for these inadequacies.

Differences from symbolic processing.
It is important to distinguish connectionism from a symbolic account of learning and knowledge storage. In symbolic systems word knowledge is couched in terms of parts of speech such as nouns, verbs, or semantic groups such as 'words for travel' and so on each having a label for the kind of knowledge stored - a symbol for that knowledge. Typically, symbolic systems have rules by which this information can be processed and rules which state what is impossible in a language.

Symbolic systems are context insensitive in that they are distinct from their environment. Elman (1991, p. 221) says 'this insensitivity allows for the expression of generalizations which are fully regular at the highest level of representation (e.g. purely syntactic), but they require additional apparatus to account for regularities which reflect the interaction of meaning with form and which are more contextually defined. Connectionist models on the other hand begin the task at the other end of the continuum. They emphasize the importance of context in the interaction of form with meaning'. Symbolic systems, therefore, are subject to the fallacy that things can only be referred to in symbolic terms and therefore do not connect themselves to the real world. That is, an alien listening to us via radio signal might learn the sounds of the language but not the semantics unless they could observe a word's relationship with the objects and events to which it refers. A network system, by comparison, can deal with anomalies by adding further assumptions. See Johnson-Laird et al. (1984) for a review in this area.

Symbolic systems such as the generative linguistic paradigm would account for linguistic knowledge in terms of nouns, subjects, objects and so on. These terms do not exist in purist (PDP) systems. PDP systems will accept that rules can be stored in a connectionist network, but they are not the foundation stone on which the network is made. This means that under a PDP paradigm the symbolic system loses its causal role in cognition, which is an unacceptable outcome to many linguists, as a typical UG proponent would see these rules as essential to human linguistic processing. However, it may be that aspects of human performance which appear so regular as to be conveniently summarized by rules (like the rules of grammar in a language) may arise out of the general properties of parallel distribution, which operate without any reference to such rules. Recent debates by Fodor and Pylyshyn (1988), the Jacobs and Schumann (1992) vs. Eubank and Gregg debate (1995), and reviews by Bechtel and Abrahamsen (1991) and Morris (1989) underscore these differences. These debates proceed on the basis that accepting one view means the other is unacceptable. Both sides tend to see things in extreme terms - a universal take-it-all-or-leave-it-all view (Pinker and Prince 1988). Neither side has produced evidence for this universality and clearly both have their limitations (see Cohen et al. (1993) for a review).

However, if one views the connectionist / symbolic argument in terms of a non-universal answer, then the situation changes somewhat and one can see the two as complementary rather than confrontational stances. Clearly much has been learned in cognitive terms about the workings of memory in relation to vocabulary learning in a second language (see Nation, 1990 for a review), but these findings offer little in the way of insights into the micro-view of cognition which connectionism seems to explain quite well. It seems, therefore, that the issue of whether the current symbolic paradigm or connectionism is the one and only explanation misses the point.

Summary.
Connectionist systems of vocabulary acquisition have many characteristics that are desirable in simulations of human cognition, for example graceful degradation, automatic generalization and so on. Many of these are found in other models of cognition, but it is unusual to find so many in one model. These models show the learning process over time. This is important as most studies in SLVA have been cross-sectional in nature.

There are parts of our cognitive apparatus which are open to inspection and are transparent in nature and empirically testable, such as memory span, lexical competence, attention and so on. There are other parts of our cognitive system which are not open to inspection, such as how we retrieve lexical information from our brain or how we process the auditory information and add it to our store. A connectionist account of lexical knowledge is good at describing the storehouse of vocabulary. That is, how the words are connected through their associations; how we may store and retrieve lexical knowledge; how lexical knowledge is schematic or associative, and how it can substitute for lack of knowledge, how we can guess the meaning of words and so on.

It seems that the connectionist architecture could operate at a lower 'impenetrable' level of cognitive activity which we are not able to access by introspection. In a sense it is unavailable to us and the interconnections are made automatically without our intervention. The transparent part of our cognitive system may operate at a higher level and would include what we know about memory and so on. This would lead to a two-level interdependent model of vocabulary acquisition. It would make sense to have a two-level hybrid system because the symbolic machine operates according to its own autonomous set of principles. This view is the one currently coming into fashion (see Kempen, 1992; Marcus et al. 1992, 1993; and Pinker, 1991).

Part 2: An account of lexical storage.

In the model of lexical storage to be outlined below, we can see a micro-structure of vocabulary knowledge stored in the connectionist-style network but linked to, and controlled in a sense by, the working memory and transparent cognitive systems. This working memory retrieves its information from the network storage area and uses that as a basis for making linguistic decisions. There is therefore a two-way highway going between these two.

The principal components of the model.
The principal components of the model outlined below are a sensory register, working memory (after Baddeley, 1990) and a storage area called the network / store. One is always faced with problems when attempting to represent ideas that are not two-dimensional diagrammatically. However, for the convenience of the reader I have provided a very rough outline in diagram 2 below.



Diagram 2: A representation of the interface between input data, working memory and memory networks.

In this model, input is received at the sensory level (the level at which information is registered on the retina or ear drums). The central executive will attend to some of this input and put it into working memory. The information considered most salient and unexpected is usually that which is attended to. This input then becomes conscious in that you are consciously (at varying levels) aware of it - the rest of the input is not attended to and is effectively discarded. This input is then compared with pre-existing lexical information in the network which, in keeping with schema theory, is in a constant state of expectation that the incoming data will match pre-existing vocabulary knowledge and thus confirm that knowledge and lead to comprehension.

As the reader can imagine, this is a simplistic account of the process. It will be expanded below. One can see from diagram 2 that the central executive functions as a mediator between the network and the input. It is primarily concerned with (among other things) what to attend to, what to ignore, how to put this into the network and so on. There is freedom of movement between the network and working memory and the higher cognitive functions allowing the flow of information back from the network into working memory for re-evaluation and reflection.


Features of Working Memory.
Working memory is modeled on Baddeley (1990) and has three major components. The central executive regulates, monitors and coordinates the operation of the other components. The phonological loop is divided into the articulatory control system, which can work at the sub-vocal level, and a phonological store, which holds speech-based information. The final part is the visuo-spatial sketch pad, which receives inputs either directly from visual perception or by retrieving information from long-term memory in the form of images. It is at this level that we are said to have intention and a 'consciousness'.

The central executive regulates what will be attended to (or not). Baddeley in recent work has said that the central executive closely resembles attentional control and thus possesses limited capacity and is of little use in the active processing of lexical information. It seems that attention-demanding tasks such as lexical problem solving, reading, word learning and writing all utilize this central executive. In addition, the central executive monitors the performance of sequences of actions to be performed in the right order. Baddeley sees the central executive as essentially similar to the Norman and Shallice (1980) model called the Supervisory Attentional System (SAS), which controls ongoing behaviour, maintains goals and resists distractions. The advantage of this system is that a working memory model treats the short-term storage of memory and general processing in a single framework.

Adjusting the Network (Learning).
When new lexical input is received, the input data are compared with pre-existing lexical knowledge to see if they match. The input can be processed at the level of content (facts about the story being read - the characters and the plot for example) or at the linguistic level (the finding of new words and expressions, the making of interconnections between previously unconnected words etc.). The central executive could also find that the network had previously tagged that item for further investigation, leading to the potential for a change in, say, reading behaviour (for example stopping and re-reading in order to find out more about the tagged word).

The default setting at the time of input is that the input will be easily predictable from the pre-existing lexical network and thus will require a minimal level of focal attentive processing. For example, a learner studying superlatives would instantiate that network and expect to read (probably unconsciously) such words as the biggest, most, and so on. If this is indeed the input he receives, then the learner matches this input (the words he's reading) with the current network related to that word / phrase or type of text and finds that it fits the network as he had expected. If it does confirm the pre-existing knowledge by fitting in the network smoothly, then the meaning of the message may be retained (the content of the message may later be accessible for report, review, reflection and so on). The lexical network is thus being adjusted by the strengthening of present interconnections confirming already known information and the addition of other networks.

In some circumstances the input is perceived or noticed to be different from the previously understood or learnt information. In the above example, this might be noticing that he had believed the superlative of good is bestest, when in fact it is best. A gap between pre-existing knowledge and new input is noticed. At this point the learner can readjust the network to accommodate this new information. Sometimes this will be easy if the network is highly developed, but considerable adjustment (along with possible confusion and 'thinking') may be necessary if the information is vastly different from that stored. This adjustment to the network is called 'learning'.
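One way to make the idea of 'noticing a gap' concrete is error-driven weight adjustment of the kind used in many connectionist simulations. The sketch below assumes a delta-rule update, which the model described here does not commit to. The cue units, the target value and the learning rate are all hypothetical choices made for illustration. The point is only that the size of the adjustment follows the size of the mismatch between what the network expected and what the input supplied.

```python
def delta_rule_update(weights, inputs, target, rate=0.25):
    """Return (new_weights, error) after one error-driven adjustment.

    error is the gap between the target (what the input shows)
    and the output (what the network currently expects).
    """
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output
    new_weights = [w + rate * error * x for w, x in zip(weights, inputs)]
    return new_weights, error


# Hypothetical cue units [GOOD, SUPERLATIVE]; the single output unit
# stands for the form "best". Weights start low: the learner does not
# yet expect "best" in this context.
weights = [0.1, 0.1]
cue = [1.0, 1.0]

errors = []
for _ in range(5):
    weights, error = delta_rule_update(weights, cue, target=1.0)
    errors.append(error)
```

Each pass through the loop stands for one further encounter with the correct form. The recorded gap shrinks with every encounter, so later adjustments are smaller than earlier ones.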

It should be clear that learning takes many forms. Any adjustment to the network is a form of learning: for example learning a new collocation, the spelling of a word, a different use of a word, or learning that the learner has incomplete information. Examples of some adjustments to the network could include:
• The activation of a new interconnection between existing nodes would reflect the linking of previously unconnected information. Initially this new link would probably be weak - showing unsure information, but could be a strong link depending on the interconnection relationships and the depth of processing.
• The activation of a completely new node for completely new information being added to the network.
• Forming, checking, rejecting and reformulating lexical hypotheses.
• Accounting for greater control, depth of analysis.
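The first two kinds of adjustment listed above can be sketched as operations on a small data structure. This is a toy illustration only. The class name LexicalNetwork, the initial link strength of 0.1 and the strengthening rate of 0.2 are hypothetical, and no claim is made that a real lexical network stores one word per node. Hypothesis testing and depth of analysis are not modelled.

```python
class LexicalNetwork:
    """Toy lexical network: nodes are items of knowledge,
    links between them carry a strength between 0 and 1."""

    def __init__(self):
        self.nodes = set()
        self.links = {}  # frozenset({a, b}) -> strength

    def strength(self, a, b):
        """Strength of the link between a and b (0.0 if unlinked)."""
        return self.links.get(frozenset((a, b)), 0.0)

    def connect(self, a, b, initial=0.1):
        """Link two items, adding any node that is new to the network.
        A new link starts weak; an existing link is left unchanged."""
        self.nodes.update((a, b))
        self.links.setdefault(frozenset((a, b)), initial)

    def strengthen(self, a, b, rate=0.2):
        """Move the link a fraction of the remaining way towards 1.0,
        as when input confirms what the network already holds."""
        key = frozenset((a, b))
        current = self.links.get(key, 0.0)
        self.links[key] = current + rate * (1.0 - current)
        self.nodes.update((a, b))


net = LexicalNetwork()
net.connect("heavy", "rain")         # new, weak link between two items
for _ in range(3):                   # three confirming encounters
    net.strengthen("heavy", "rain")
net.connect("torrential", "rain")    # a completely new node joins the network
```

After these steps the link between heavy and rain is stronger than the newly formed link between torrential and rain, and there is still no direct link between heavy and torrential.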

One of the functions of the central executive is to reconcile information between the lexical network and the input. The network contains many kinds of linguistic knowledge - from the L1, and from the L2 in terms of grammar, word associations, chunks, phonological and orthographic knowledge and so on. If new input does not match the present network, then some adjustment has to be made to account for the information. The central executive has an ability to look at the network and infer from linguistic or lexical patterns (guessing from context is an example) and to generalize (and thus overgeneralize). To do this there has to be communication between the network and working memory.

Limitations of the model.
The central executive.
We can find out a lot about the phonological and visuo-spatial components but only very little about the executive itself as it is not transparent. That is, we know a lot about the multitude of tasks it deals with such as demands of lexical tasks, allocation of attention and so on, but very little about how this is achieved. Moreover, this variety of functions poses problems in terms of describing the precise function of the executive. It may be best to compartmentalize the central executive into several specific processing systems. We cannot of course do away with it as there needs to be some way for us to control the chaos that would result from several operating mechanisms all working independently.

In addition, the central executive cannot easily distinguish between attentional processing, which demands attentional control, and automatic processing, which does not. Furthermore, it is not clear how some tasks can be done without requiring any working memory capacity (such as breathing when speaking). It may be, therefore, that the working memory system is more flexible than was first imagined.

The borderline between what is and what is not governed by working memory is not clear, nor indeed is the way that the two levels work together. It may be that there is a part of the central executive which acts as a huge switchboard regulating what happens where. At the moment this is supposition. This may not be a problem, as there will come a point in our experiments where the workings of the mind become impenetrable and we are left looking at phenomena without being able to look any deeper.

Predictability.
One problem is the inability of current versions of the model to predict what structures or aspects of word knowledge will be learnt next. Clearly this is important for SLVA and SLA in general. Again, the inability to predict might not be a theoretical problem, as we are not able to see the network. We may only need to look at the result of the network and how it acts to determine its function.

Problems with computer simulations.
Connectionist models are limited under present technology to investigation by computer simulation. Limits on what can be investigated are imposed by the program itself. It is assumed that as these computer simulations get more sophisticated, so will the results of the PDP studies. There is a need to accommodate some of the criticisms of connectionism, such as the present inability to learn in one trial. This does not appear to be a problem with the PDP theory, but more with the implementation in software.

Connectionist simulations have problems in representing time relationships, which are critical in the language domain, although see Elman (1990) and Jordan (1986). Sampson (1987) suggests that one reason for this is that it is the connectionist model itself that extends through time, via the gradual setting of the network, whereas in a model based on rules one can think of the rules as applying instantaneously. Thus it is difficult to treat time as an input-output feature, and to input data sequentially would cut across the very parallel processing nature of the models. Alternatively, it may be that since connectionist computer simulations are, by definition, devoid of a 'here-and-now' component and the context for it, the concept of time is external to the micro-level of cognition and can be left to the higher levels. Again this seems to be a problem with software rather than with theory.
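One commonly used way of giving a network some access to serial order is to feed the previous state of its hidden units back in alongside the next input, so that the same item has a different effect depending on what came before it. The sketch below is a minimal illustration under that assumption and is not a reconstruction of either of the models cited above. The weights, the two-unit layers and the names srn_step and run_sequence are all hypothetical.

```python
import math


def srn_step(x, context, w_in, w_ctx):
    """One time step: each hidden unit sees the current input
    and the previous hidden state (the context)."""
    hidden = []
    for j in range(len(w_in)):
        total = sum(w * xi for w, xi in zip(w_in[j], x))
        total += sum(w * ci for w, ci in zip(w_ctx[j], context))
        hidden.append(math.tanh(total))
    return hidden


def run_sequence(sequence, w_in, w_ctx):
    """Present items one at a time, carrying the hidden state forward."""
    context = [0.0] * len(w_in)
    for x in sequence:
        context = srn_step(x, context, w_in, w_ctx)
    return context


# Arbitrary illustrative weights: 2 input units, 2 hidden units.
W_IN = [[1.0, -1.0], [0.5, 0.5]]
W_CTX = [[0.8, 0.0], [0.0, 0.8]]
A, B = [1.0, 0.0], [0.0, 1.0]

state_ab = run_sequence([A, B], W_IN, W_CTX)  # A then B
state_ba = run_sequence([B, A], W_IN, W_CTX)  # B then A
```

The two sequences contain the same items, yet the final states differ because the order of presentation differs. Time is carried inside the network's state here rather than being treated as a separate input-output feature.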

Fluency and accuracy.
There has been little discussion about fluent or skilled vocabulary use. The learning of skills is an important area in SLVA. It has been noted that connectionist systems excel at pattern recognition. It could be that skills are highly organized patterns of behaviour and thus can fit under this paradigm. The problem is how to represent fluency in the model. Is it just dependent on the strength of the interconnection or is it something more complex? There is some light, however, as mentioned in the section above on the developmental stages of learning in a connectionist network.

Forms of lexical knowledge.
The lexicon of a second language learner is very complex. There is great debate as to whether a learner has a single multi-store lexicon or indeed several lexicons for different languages. This system needs to explain why that might be and how the information is stored. Is it just simply a matter of adding a new network or again is it something more complex?

Newness.
Connectionism is a radically different view from the more traditional paradigms and thus, as it is the 'new boy on the block', it is subject to all the stages of finding its way in a new world. It is also highly mathematical and can only be tested by computer simulation, and thus bears little resemblance to what actually happens in a class. Finally, some argue that it is a return to behaviourism in that stimuli and responses affect the nature of the networks, but as it offers a much more comprehensive explanation than a purely behaviourist view, these arguments falter somewhat.

Innateness of the language faculty.
Some UG proponents would argue that a connectionist system needs to explain the logical problem of language acquisition and that there must be some innate quality to language acquisition. It would seem at first glance that the two positions are opposed, with the symbolic camp needing innate rules and the connectionist camp not needing them. However, there is no reason to assume that we could not be born with a prewired lexical / syntactic and general cognitive network all set up for language acquisition with all the 'parameters' being set at an early age. If this is so, it would strengthen both sides' arguments rather than weaken one against the other.

Summary.
The model presented here can account for numerous aspects of SLVA. However, there are still many unexplained areas, specifically concerning the explicitness of the network and its relationship with the higher level cognitive functions. The model does hold some promise, although it will need considerable revision, preferably as a result of extensive research into the connectionist nature of second language vocabulary acquisition.

References.

Baddeley, A. 1990. Human Memory: Theory and Practice. Boston: Allyn & Bacon.
Bechtel, W. and A. Abrahamsen. 1991. Connectionism and the mind: An introduction to parallel processing in networks. Blackwell, Oxford.
Blackwell, A. and P. Broeder. 1992. Interference and facilitation SLA: A connectionist perspective. Paper presented to Seminar on Parallel Distributed Processing and Natural Language Processing, San Diego. UCSD.
Brewer, W. F. and J. C. Treyens. 1981. Role of schemata in memory for places. Cognitive Psychology. 13: 207-30.
Broeder, P. and K. Plunkett. 1994. Connectionism and Second Language Acquisition. In: N. Ellis (Ed.) Implicit and explicit Learning of Languages. Academic Press, London.
Chomsky, N. 1986. Reflections on Language. New York, Pantheon.
Cohen, G., G. Kiss and M. LeVoi. 1993. Memory. Current issues. London. The Open University.
Crick, F. and C. Asanuma. 1986. Certain aspects of the anatomy and physiology of the cerebral cortex. In J. L. McClelland et al.
Elman, J. L. 1991. Incremental learning or the importance of starting small. La Jolla: University of California, San Diego, Centre for Research in Language (Technical Report 9101).
Eubank, L and K. Gregg. 1995. Et in Amygdala ego. UG, (S)LA and Neurobiology. Studies in Second Language Acquisition. 17: 35-57.
Fodor, J. A. and Z. W. Pylyshyn. 1988. Connectionism and cognitive architecture: a critical analysis. Cognition. 28: 401-12.
Gasser, M. 1988. A connectionist model of sentence generation in a first and second language. Unpublished doctoral dissertation, University of California, Los Angeles.
Gasser, M. 1990. Connectionist models. Studies In Second Language Acquisition. 12: 179 - 99.
Haberlandt, K. 1994. Cognitive Psychology. Needham Heights, Mass.: Allyn & Bacon.
Jacobs, B. and J. H. Schumann. 1992. Language acquisition and the neurosciences: Towards a more integrative perspective. Applied Linguistics. 13: 282-301.
James, W. 1890. The Principles of Psychology. New York: Holt.
Jordan, M. 1986. Serial order: A Parallel Distributed Processing Approach. La Jolla: University of California, San Diego, Institute for Cognitive Science. (Report 8604).
Johnson-Laird, P. N., D. J. Herrmann, and R. Chaffin. 1984. Only connections: A critique of semantic networks. Psychological Bulletin. 96: 292-315.
Kempen, G. 1992. Second language acquisition as a hybrid learning process. In F. Engel, Bouwhuis, D. Bösser, T. and d'Ydewalle, S. (Eds.) Cognitive modelling and Interactive environments in Language learning. (pp. 139-44). Berlin: Springer.
Klein, W. 1986. Second Language Acquisition. Cambridge: Cambridge University Press.
Marchman, V. 1992. Language learning in children and neural networks: Plasticity, capacity and the critical period. San Diego, CA: Center for Research in Language (UCSD) (Technical report 9201).
Marcus, G. Brinkmann, U., Clahsen, H. Wiese, R., Woest, R. and Pinker, S. 1993. German Inflection: The exception that proves the rule. Cambridge, Mass.: MIT occasional paper 47.
Marcus, G., Pinker, S. Ullman, M., Hollander, M., Rosen, T and Xu, F. 1992. Overregularization in language acquisition. Monographs of the Society for Research in Child Development. 57 : 4 serial 228.
McClelland, J. L. 1981. Retrieving general and specific knowledge of specifics. Proceedings of the Third Annual Conference of the Cognitive Science society. 170-2.
McClelland, J. L. and D. Rumelhart and the PDP research group. 1986. Parallel Distributed Processing: Explorations in the micro - structure of cognition. Vols. I and II. Cambridge, MA: MIT Press.
McClelland, J., D. Rumelhart & G. Hinton. 1986. The appeal of parallel distributed processing. In McClelland, J. L. et al. (Eds.)
Morris, R. (Ed.) 1989. Parallel distributed processing. Oxford: Oxford University Press.
Nation, I. S. P. 1990. Teaching and Learning Vocabulary. New York: Heinle & Heinle.
Ney, J. and B. Pearson. 1990. Connectionism as a model of language learning. The Modern Language Journal. 74: 474-82.
Norman, D. A. and T. Shallice. 1980. Attention to Action: Willed and automatic control of behaviour. University of California, San Diego, CHIP Report 99.
Pinker, S. 1991. Rules of Language. Science 253: 530-35.
Pinker, S. and A. Prince. 1988. On language and connectionism: analysis of a parallel distributed processing model of language acquisition. Cognition. 28: 73-193.
Rumelhart, D. E and A. Ortony. 1977. The representation of knowledge in memory. In R. C. Atkinson, R. J. Herrnstein, G. Lindzey and R. D. Luce (Eds.). Steven's handbook of experimental psychology: Learning and cognition. Pp. 99-135. Hillsdale, NJ: Erlbaum.
Sampson, G. 1987. Review of Rumelhart, D., McClelland, J. L. and the PDP research group, Vols. I and II. Parallel Distributed Processing: Explorations in the micro - structure of cognition. Cambridge, MA: MIT Press. 1986. Language 63: 871 - 86.
Schmidt, R. W. 1988. The potential of parallel distributed processing for second language acquisition theory and research. University of Hawaii working papers in ESL, 7 (1); 55 - 66.
Shirai, Y. 1992. Conditions on transfer: a connectionist approach. Issues in Applied Linguistics. 3: 91-120.
Smolensky, P. 1986. Neural and conceptual interpretation of PDP models. In J. L. McClelland et al.
Sokolik, M. and M. Smith. 1992. Assignment of gender to French nouns in primary and secondary language: A connectionist model. Second Language Research. 8: 39-58.
Thorndike, E. 1898. Animal intelligence. New York, Macmillan.


Contact Info:
Rob Waring
Notre Dame Seishin University, 2-16-9 Ifuku-cho, Okayama, Japan 700
Tel 086 252 1155 Fax 255 7663 Home 086 223 0341
Email:Rob Waring