This paper will focus on recent findings from general cognitive theory, experimental psychology, connectionism, Second Language Acquisition (SLA) research and learning theory in general. I shall confine myself to adult second language acquisition research in arguing my position; however, references to first language acquisition will be made from time to time.
At the outset, it will be necessary to refresh ourselves on the relative
positions taken in SLA regarding innatist generative positions and
cognitive positions. From there, we will see that the UG position does
not adequately explain the interface between the input and the
triggering of the language learning system and neither does a purist
cognitive perspective. This leaves us with an interface gap between
linguistic and cognitive perspectives which needs to be filled in terms
of the linguistic input being received, the processing of it and its
storage in memory. I propose that there are cognitive factors at work
at this crucial juncture between intake of data and the processing of
input that leads to storage and retrieval. I shall try to demonstrate my
position by looking at the role of memory and the role of attention in SLA,
as well as conditions for fluency and control, depth of processing and
other related factors.
Despite the claims on both sides, there is no clear direct evidence to support either the cognitive or the UG-based view of SLA (there are claims and counter-claims on both sides for indirect, circumstantial and inferential evidence), and as neither side can accept the other's point of view, we can take it that there is no one explanation of SLA as yet. I shall hazard a prediction. Current accounts of language and language acquisition will not remain unchanged, but will move towards a compromise, with revision on all sides to take into account findings in all disciplines. Theoretical progress depends on a catholicity of perspectives. This does not mean that one cannot take a view that reflects a certain leaning toward one side, which has been done here, but it is the relative nature of this leaning that is at issue and is the concern of this paper.
1.1 Cognitive and the Innatist Perspectives.
1.1.1 Innatism in SLA.
In spite of nearly 40 years of dominance of the generativist / nativist paradigm in language acquisition, there are still two major problems that generativists have neither accounted for nor resolved in second language acquisition. The first is the relationship between the underlying competence and the observable performance. The second is the nature of the interaction between the input data through which language acquisition occurs and the proposed innate knowledge of language (Sampson, 1987). This is not to say that there is no evidence for an innatist position, but that the current evidence in SLA is sketchy. One of the arguments against cognition and for the generativist position has been that there have been no empirical models of language learning which could be tested in at least as rigorous a manner as the hypotheses of the generative paradigm can. This has led some researchers to claim that, in the absence of an alternative empirical model, the nativists 'won by default' (Sampson, 1987:878).
A central issue in the debate about language acquisition concerns the innateness of language. There is, as one would expect, a whole range of views of innatism, all with different emphases. The rationalist innateness hypothesis (for child first languages) states that "not only is the human species genetically 'pre-wired' to acquire language, but the kind of language is also determined. The principles that determine the class of human languages that can be acquired unconsciously, without instruction, in the early years of life has been referred to as Universal Grammar (UG). This Universal Grammar underlies the specific grammars of all languages" (Fromkin and Rodman, 1993: 412). While this may be true for first language learners, it is not necessarily true for second language learners. There is considerable debate regarding the availability of UG in the second language. It is this researcher's belief that UG is unavailable to the second language (L2) learner, or that if it is available, it does not play a significant role in the learning of a second language, whereas cognition is more of a factor in adult second language learning.
1.1.2 Cognitive perspectives in SLA.
Cognitive psychology tends to be concerned with conscious rather than unconscious processes (though there is some overlap), and with voluntary rather than involuntary responses. Generally, cognitive theories see linguistic knowledge as no different from other types of knowledge and view strategies responsible for knowledge development as general in nature, related to and involved in other types of learning. This is often contrasted with a rationalistic theory of L2 acquisition which "treats linguistic knowledge as unique and separate from other knowledge systems, and acquisition as guided by mechanisms that are (in part at least) specifically linguistic in nature" (Ellis, 1993: p. 347).
Contemporary cognitive science emphasizes knowing rather than responding. Firstly, cognitive scientists are concerned with finding scientific means for studying the mental processes involved in the acquisition and application of knowledge. Secondly, the cognitive sciences emphasize mental structure or organization, arguing that humans organize their knowledge and that new input is interpreted in this light. Piaget suggests that "all living creatures are born with an invariant tendency to organize their experience and that this tendency provides the impetus for cognitive development" (McLaughlin 1990:114). Thirdly, cognitivists are also concerned with the notion that the learner, of her own volition, is active, constructive and planful rather than a passive recipient of environmental stimulation. This of course does not mean that the learner adopts all these strategies all the time, nor that she can necessarily report the use of them or have control of them at all times.
The cognitive paradigm is also beset with problems. As a relative newcomer to the field of SLA research, it is often not taken seriously. A central issue is that researchers are dealing with unobservable data from an empiricist perspective (although see above). That is, one cannot look inside the brain and see what is happening or find a particular memory or a piece of linguistic knowledge. Cognitive theory cannot easily predict which structures will be learned, or automatized through practice; what will be restructured; which L1 (first language) structures will and will not be transferred and in light of this, direct classroom activities based on this paradigm are premature. Lightbown and Spada (1993:26) suggest that the cognitive paradigm is "incomplete without a linguistic framework". Other problems (not just confined to cognitive psychology) include behaviourist tendencies, ecological problems, fragmentation and artificiality (Haberlandt, 1994:28).
An emerging paradigm for second language learning is the cognitive notion of observe, hypothesize and experiment (Lewis 1993). Essentially, this view sees the learner as an active participant in the learning process, working with linguistic information and constructing her own interlanguage in a non-linear fashion (see later). In very crude and over-simplistic terms (to be expanded and clarified later), when the learner is introduced to input (visual or aural) she 'selects' some of the input to observe, attends to it and then either confirms it against pre-existing knowledge or, if the new information does not fit easily into the knowledge system, hypothesizes about what it means and attempts to fit it into the pre-existing knowledge system. When the learner hears the same item again, or wishes to produce the form or word, she experiments with it - testing to see if the newly acquired knowledge has fitted correctly into her system. If all is well, she can confirm her hypothesis and the resulting feedback provides valuable confirming input. If all is not well, she modifies her knowledge system in some way (such as forming a new hypothesis) to account for the new information. She tries again next time after forming a new hypothesis, creating an ever-recurring process of gradual development until her language goals are met.
2.0 Some Cognitive Factors Involved in Learning.
It will be necessary to provide the reader with an account of important
factors involved in cognitive learning processes, specifically:-
a comparison of symbolic and connectionist models.
facets of memory - a connectionist account of how information might be stored in, and retrieved from, memory by looking at ACT theory and schema theory.
the role of practice.
U-shaped development in SLA.
factors involved in cognitive processing, such as depth of processing, serial v's parallel processing; analysis, control and automaticity and interlanguage theory.
There are countless factors involved in the learning of a language. Affective factors involved in language learning (motivation, need, attitude etc.), although extremely important will not be discussed here (for reasons of space) and neither will pragmatic and societal factors.
2.1 Symbolic V's Connectionist Models.
At the outset, the reader should be aware of the symbolic nature of linguistic theories and the non-symbolism of purist connectionism. Symbolic processing involves the manipulation of representations of data, often referred to by labels such as 'verb', 'noun', 'adjective' and so on. Given the very nature and appeal of "symbols, rules and logic, the traditional view suffers from an unhuman-like brittleness. Linguistic and conceptual entities are assigned in all-or-none fashion to categories, rules typically apply in a fixed sequence, and deviations are not handled well. Connectionist models however, attempt to deal with such brittleness by avoiding it as there are no concrete symbols and rules as such; the entities that a connectionist system uses to characterize the world are fluid patterns of activation across portions of a network" (Gasser, 1990:179-80). Symbolic models often treat it as sufficient to describe a single point in this process, whereas connectionist models try to capture the process of moving from one state to another.
Connectionist models try to explain cognitive processing in computational terms (that is, in terms of data structures and the processes that operate on them, yielding input and output units). In general they do not make a fundamental distinction between competence and performance. There are several features which characterize connectionist models. Firstly, the memory system consists of a network of processing units joined by weighted connections (see later). Secondly, the behaviour of the system is loosely based on the neurophysiological structure of the brain - neurons and so on. Thirdly, longer term memories are held in the weights of connections between processing units. Fourthly, processing is parallel (see later). Lastly, control is distributed. This last item suggests that, in a connectionist account of learning, there is no central executive or decision maker to perform the operations involved (altering weights, making connections, retrieving and interacting with pre-existing knowledge, analyzing input etc.). However, the model outlined below proposes that there is a central executive that functions as a 'conscious' mind (in a very general sense) which processes information between the input and the distributed network.
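As a rough illustration of the first, third and fourth features listed above (weighted connections, memory held in the weights, and parallel processing), the following Python sketch may help. It is not drawn from any of the models cited: the unit names, the starting weights, the learning rate and the two function names are all invented for the example.

```python
# A toy network of processing units joined by weighted connections.
# Unit names, weights and the learning rate are illustrative assumptions only.

def spread_activation(weights, activation):
    """One parallel update: each unit sums its weighted input from the
    activation levels that held *before* the update began."""
    new_activation = {}
    for unit in activation:
        incoming = sum(
            activation[source] * weight
            for (source, target), weight in weights.items()
            if target == unit
        )
        new_activation[unit] = min(1.0, activation[unit] + incoming)
    return new_activation


def strengthen(weights, activation, rate=0.1):
    """Longer term learning: connections between co-active units gain weight."""
    return {
        (source, target): weight + rate * activation[source] * activation[target]
        for (source, target), weight in weights.items()
    }


weights = {("dog", "bark"): 0.5, ("dog", "cat"): 0.2, ("cat", "meow"): 0.5}
activation = {"dog": 1.0, "bark": 0.0, "cat": 0.0, "meow": 0.0}

activation = spread_activation(weights, activation)  # 'bark' and 'cat' become active
weights = strengthen(weights, activation)            # the dog-bark link grows stronger
```

Nothing in the sketch decides which connection to alter; every weight is adjusted by the same local rule, which is the sense in which control is said to be distributed.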
There are several types of connectionism. Some are localist approaches, in which particular units represent particular concepts and symbols are used to represent knowledge, such as Anderson's ACT model (1983). Others are distributed, non-symbolic models such as the PDP type of model (Rumelhart, McClelland and the Parallel Distributed Processing group, 1986), in which complex processes are distributed over many units, each unit playing a part in the representation of a concept. Therefore, in a distributed connectionist model, a concept is not in one place as it were (as in a localist approach), but distributed among the connections.
It is no exaggeration to state that connectionist theory has
revolutionized cognitive psychology (and other disciplines) and is
arguably the most important theoretical concept of cognition in the last
three decades. Other connectionist models include MacWhinney, Bates and
Kliegl's Competition Model (1984) and the Network model of Collins
and Loftus (1975). These models will not be investigated here. The
model proposed below takes a middle road - a symbolic / localist
connectionist model for memory storage, but has a Working Memory
central executive for the processing of input and the altering of
2.2 Facets of Memory.
2.2.1 Short-term, Long-term And Working Memory.
There are many models of memory in the cognitive sciences, as one would expect. In these models it is common to distinguish between short-term memory (STM), long-term memory (LTM) and working memory (WM). Early accounts proposed a two-store model comprising STM and LTM. These have been seen as useful 'lay' concepts of memory; however, research in experimental psychology has shown that there is no such clear distinction between STM and LTM.
In memory models that include a distinction between STM and LTM (see Atkinson and Shiffrin (1968) for an example), the input is typically processed in the following way. The senses register input (on the retina, for example) and this information is transferred to short term memory; some items are discarded, the rest is processed in some way, and some of it goes to LTM and is stored. There is also provision for items to return to working memory from long-term memory for re-evaluation and reflection. As with many models, distinctions often seem crisp at the time they are first proposed, but later doubts creep in. One of the problems with this account is the evidence that short term memory is not a single store with limited capacity, but is probably a collection of temporary storehouses (Squire, 1987). Matlin (1993), after reviewing the Kintsch and Buschke study of synonyms (1969), proposed that there is some uncertainty about what is stored in short and long term memory. STM seems to be primarily phonological and LTM seems to be primarily semantic, although the distinction is fuzzy.
In light of this, Baddeley (1990) developed the most comprehensive description of short term memory to date and labeled it working memory (WM). WM is seen as the place where conscious activity is done. The following example illustrates this (Carpenter and Just, 1989).
A familiar example used to illustrate the function of working memory is the storage of a telephone number between the time when it is looked up in a phone directory and the time when it is dialed.
We see that the antecedent telephone number needs to be remembered in order to process it (17 words later) in the final clause, and this information is temporarily stored in working memory.
Baddeley suggests that there are three parts to working memory: a phonological loop to store phonological information; a visuospatial sketch pad to store visual and spatial information; and a central executive which plays a major role in planning and controlling behaviour, working like a scheduler or supervisor. The executive decides which issues deserve attention and which should be ignored (discarded). The executive also selects strategies, figuring out how to tackle problems, and retrieves pre-existing information from the memory store. There is a central problem with this account (as Baddeley admits), which is that it is hard to confirm or refute using contemporary research techniques.
2.2.2 Adaptive Control of Thought Theory.
One of the leading symbolic processing accounts of fluency and automaticity in cognitive psychology is Anderson's Adaptive Control of Thought (ACT) theory (1983). Anderson suggests five connections: 1) chunking; 2) procedural and declarative knowledge; 3) strengthening (the more you repeat an action or word, the stronger the connections - in a sense learning by practice or rehearsal); 4) generalizing; and 5) discrimination between two or more items. In Anderson's model, input is perceived in terms of propositions (the smallest unit of knowledge that can be true or false). As an example, a sentence such as "Susan gave a white cat to Maria who is the president of the club" can be shown as in diagram 1. Each circle represents a single localist node in working memory.
Diagram 1: A Propositional Network Representing the Sentence "Susan gave a white cat to Maria who is the president of the club".
Note that the arrows denote the links between propositions, but the exact words are not represented. This is because the propositions are abstract (after all, we rarely remember exactly what someone told us at breakfast, although we may remember the content of the message). Anderson suggests that each of these propositions can be represented in its own network in memory. One could imagine a network for cat, for example.
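For readers who find a data structure easier to picture than a diagram, the example sentence can be sketched in Python as a set of linked propositions. The split into three propositions is one plausible decomposition and is not necessarily the one drawn in Diagram 1; the class name, field names and relation labels are likewise assumptions made for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    """The smallest unit of knowledge that can be true or false."""
    relation: str
    arguments: tuple


# "Susan gave a white cat to Maria who is the president of the club"
# stored as abstract propositions rather than as the exact words.
network = [
    Proposition("give", ("Susan", "cat", "Maria")),
    Proposition("white", ("cat",)),
    Proposition("president-of", ("Maria", "club")),
]


def propositions_about(concept, network):
    """Follow the links from one concept to every proposition it takes part in."""
    return [p for p in network if concept in p.arguments]


cat_links = propositions_about("cat", network)
maria_links = propositions_about("Maria", network)
```

Here `cat_links` holds the two propositions that mention the cat, which is roughly what is meant by imagining a network for cat.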
In reference to second language learning, Anderson says we have two types of knowledge, declarative and procedural. Procedural knowledge is the knowledge of how to do an action, for example, to know how to use a word or pronounce it and is something that is learned by accretion over time, by exposure and by performing the skill. Declarative knowledge is knowledge of facts and things (for example, "add '-ed' to walk in the past tense") and can be learnt in an all-or-none manner and either slowly over time or suddenly at one time. Generally declarative knowledge can be articulated and procedural knowledge cannot. There have been some problems with this account as there is no clear dividing line between these two types of knowledge. However, these could be seen on two continua from low to high levels of declarative knowledge on one continuum, and low to high levels of procedural knowledge on the other.
Anderson suggests that the transfer from declarative to procedural knowledge takes place in three broad stages. In the first stage, we store facts, such as the fact that the past tense of buy is bought, but this does not necessarily mean we can use them in conversation. In the second, associative, stage the learner tries to sort out the information to make it more efficient. Typical ways of doing this are 'composition' (combining several discrete items into one) and 'proceduralization' (applying a general rule to a particular situation). For example, if the learner knows (declaratively) that 's' must be added to third person present tenses, then the learner will formulate a rule such as, "if I want to use the third person in the present tense I have to add 's'". For past tenses a rule might be "add '-ed' to past tenses", and this could account for the overgeneralized forms that second language learners typically produce, such as 'goed' and 'ated'. In the final, 'autonomous', stage the procedures become increasingly automatized, while the mind continues to generalize and to discriminate more narrowly, for instance by not applying '-ed' to irregular verbs. We have thus seen that we move from declarative knowledge through a series of stages which proceduralize this knowledge by way of practice and restructuring. It should be pointed out that practice is not the only component and that the quality of this processing, not necessarily the quantity, is also a factor (see depth of processing and the role of practice below).
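The overgeneralization described above can be made concrete with a small Python sketch. It is only a caricature of the three stages, not Anderson's own formalism: the function names and the tiny store of irregular verbs are hypothetical, and real proceduralization is gradual rather than a switch between functions.

```python
# Stage 1 (declarative): an isolated fact the learner can state
# but cannot yet use fluently in conversation.
DECLARATIVE_FACTS = {"buy": "bought"}


def associative_past(verb):
    """Stage 2: one proceduralized rule ("add '-ed'") applied to every verb,
    which produces overgeneralized forms for irregular verbs."""
    return verb + "ed"


def autonomous_past(verb, irregulars):
    """Stage 3: the rule has been discriminated, so it no longer applies
    to verbs the learner knows to be irregular."""
    return irregulars.get(verb, verb + "ed")


known_irregulars = {"go": "went", "buy": "bought"}

overgeneralized = associative_past("go")                 # 'goed'
discriminated = autonomous_past("go", known_irregulars)  # 'went'
regular = autonomous_past("walk", known_irregulars)      # 'walked'
```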
So how does this affect SLA? Anderson sums up the situation by saying, "we speak a second language by using general rule-following procedures applied to the rule we have learned, rather than speaking directly, as we do in our native language. Not surprisingly, applying this knowledge is a much slower and more painful process than applying the procedurally encoded knowledge of our own language" (Anderson, 1983: 224, cited in Ellis 1993: 389).
We can see, therefore, that there is a great deal we can take from Anderson's model. The way information is stored in networks allows us to see storage and retrieval in schematic terms. It can account for slow and painful 'bottom-up' processing and for the superior 'top-down' and 'schematic' processing. This will be tackled in the next section.
2.2.3 Schema Theory and Instantiation.
Schema theory has been one of the most dominant forces in cognitive disciplines in recent years. A demonstration of schemata at work (or not at work) is given by Bransford and Johnson (1972). The reader is invited to read the following passage.
The procedure is actually quite simple. First, you arrange things into different groups depending on their makeup. Of course, one pile may be sufficient depending on how much there is to do. If you have to go somewhere else due to lack of facilities, that is the next step; otherwise you are pretty well set. It is important not to overdo the endeavor. That is, it is better to do a few things at once than too many. In the short run this may not seem important, but complications from doing too many can easily arise. A mistake can be expensive as well. The manipulation of the appropriate mechanisms should be self explanatory, and we need not dwell on it here. At first the whole procedure will seem complicated. Soon, however, it will become just another facet of life. It is difficult to foresee an end to the necessity for this task in the immediate future, but then one can never tell.
When a learner reads a passage such as this, she searches her background knowledge, both linguistic and semantic, in order to comprehend the passage as a whole. Individual sentences are not a problem, and indeed neither is the vocabulary. Sometimes she may try to read things into the text that are not there, or make deductions that are not warranted by the text. While reading, the learner is trying to fit together pieces of information; she may think it is about putting something together, but when pieces of the text do not fit that hypothesis, she is lost. Obviously, the passage is about one central thing, but the learner can't find it. If she is later asked to recall information from the text, she will not be able to recall much as there is no central focus for comprehension. However, if the learner knew beforehand that it was about doing the laundry, recall would be much better. Once the learner reads the passage again, all the background knowledge about the washing process helps her fit the linguistic puzzle together and comprehension is much easier. Bransford and Johnson found that if people know the topic beforehand, they will recall 73% more of the passage than if they had not known. This shows the importance of background knowledge, or schema, in understanding texts, either written or aural.
Schema theory holds that we store knowledge in a network of associations, in a similar way to that of ACT theory. For example, we all have a schema for 'offices' that might include items such as table, desk, lamp, window, papers, books, people working etc. If we read a description of someone's office and immediately after reading we are given a surprise test and asked to list everything we read that was in the office in as much detail as possible, we will list many correct items. Interestingly, we more often than not list items that were not in the office at all, but are typical examples of things we usually find in offices. In addition, we tend not to list items which we would not usually find in an office even if they were in fact there (Brewer and Treyens, 1981). This demonstrates two things. The first is that knowledge is stored in semantically linked networks. For example, we have a baseball schema, a disco schema, a studying English schema, an English verbs schema, a schema for Shakespeare and so on. Each one of these can be highly developed in some people and less developed in others, depending on our own view of the world and experience. For example, a beginner learner will probably have a less developed schema network for 'asking permission in English' than an advanced learner. These networks are primarily semantic rather than linguistic or syntactical ones. When trying to comprehend the meaning of a text, we do not pay attention to grammar and syntax; rather, we focus on meaning (Sachs, 1967). The second is that, given incomplete information, we can use our default schema to cover for lack of knowledge (see below).
So how are schemas activated and matched with pre-existing knowledge? If we read a sentence such as "John went into the restaurant", Schank and Abelson (1977) believe this is the time the 'restaurant schema' is invoked. When we are called upon to retrieve a part of this network, we instantiate (or activate) the relevant schema. For example, if we are reading a story about a wedding, our schema will have a network activated that includes bride, groom, white dress, reception, and wedding schema scripts (e.g. 'Do you take this woman .....') - all of which are connected in the network of associations to represent a schema. This does not mean that the schema is fixed; we would not have a perfect, inflexible schema for 'wedding', but a vague one - a 'more or less wedding schema' which allows for flexibility. This network of connections in a schema has various slots. If we read the word wedding, our 'wedding schema' will be activated. At this juncture, we have a network of connections, but all the slots are empty and waiting to be filled with relevant information from the actual story we are reading. So when we read that the bride's name is Mary, we fill the empty slot in our network with her name. When we find that the wedding will be in a church rather than a registry office, we instantiate that part, and so on for all the other details. If some of the slots don't get filled, this doesn't matter; as we saw above, we would rely on our default typical schema to fill them for us by inference or deduction. Any slot that is instantiated with plausible information seems to confirm that part of the schema as being reliable (or to strengthen the pre-existing connections). If, however, we read that the bride's family paid a huge amount of money to the groom's family, we have come across unexpected information - that which is not typical of our pre-existing schema. If we wish to comprehend this unexpected information, it leaves us with two choices.
One is to deduce that the wedding is not taking place in an English speaking country, because there the bride's family does not pay a dowry (maybe it is in South Asia). The other would require us to adjust our schema network to accommodate the new information, by creating new links and connections, so that the next time we read about a wedding, we have a ready-built slot connected to the wedding schema network that now includes the expectation of finding payment of a dowry somewhere in the story, waiting for instantiation. It should be noted that the above example illustrates the comprehension and storage of content (facts of a story) rather than linguistic information. It would be possible to conceive of a linguistic schema at work; for example, when studying a grammar rule one might have a 'present perfect tense' schema.
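The slot-filling account above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not a model from the schema literature: the slot names, the default values and the function name `instantiate` are all invented here, and the 'accommodation' step is reduced to adding one empty slot.

```python
# A 'more or less' wedding schema: slots with typical default values.
# None marks a slot with no default, waiting to be filled from the story.
WEDDING_SCHEMA = {
    "bride": None,
    "groom": None,
    "venue": "church",
    "dress": "white",
}


def instantiate(schema, story_facts):
    """Fill slots from the story; unfilled slots fall back on the defaults.
    Facts with no matching slot are accommodated by adding a new empty slot
    to a revised copy of the schema, ready for the next story."""
    instance = dict(schema)
    revised_schema = dict(schema)
    for slot, value in story_facts.items():
        if slot not in schema:
            revised_schema[slot] = None  # new slot awaiting future instantiation
        instance[slot] = value
    return instance, revised_schema


story = {"bride": "Mary", "venue": "registry office", "dowry": "large payment"}
instance, revised_schema = instantiate(WEDDING_SCHEMA, story)
```

After the call, `instance["dress"]` is still the default 'white' because the story never mentioned it, while `revised_schema` now carries a 'dowry' slot that the original schema lacked.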
Schemata are vital to semantic networks because they seem to account for how we store information and retrieve it; how we deal with unexpected information; how individuals are different in their knowledge stores; how we can store incorrect information; how we can mis-remember; how we can overgeneralize; and many other features of memory. However, several things should be noted. Firstly, it seems that schemata do not influence recall if that recall is tested immediately after the material is learned. It seems that a longer rather than shorter delay can facilitate integration with existing material (Harris, et al, 1989). Secondly, we do often select material that is inconsistent with our schema. Thirdly, we often can recall exact words and text. Fourthly, we often fail to apply background knowledge we need to interpret new material, and lastly, we may keep items isolated in memory rather than integrate them.
2.2.4 Summary of Facets of Memory.
In this section we have seen that memory networks are key factors in cognition. They play a central role in storing information ready for retrieval. There is a central executive which controls and directs the input into intake and into the network of stored knowledge. This network is made up of habits, knowledge of facts, knowledge of how to do things, knowledge of how we performed an act previously and so on. Some of this knowledge will be automatic, some controlled, some of it neither.
2.3 The Role of Practice.
'Practice makes perfect' is an oft-heard maxim, if not an article of faith, in many language classrooms. The position of cognitive scientists and cognitive psychologists working in SLA can be understood in terms of the role of practice and the way a learner develops her knowledge systems. The belief is that sub-skills become automated through practice and attention, allowing the allocation of resources to higher processing levels. In a sense, controlled processing lays stepping stones for the learner to move to more difficult and complex levels (see Shiffrin and Schneider (1977) for a review). Thus, complex tasks are seen as a hierarchy of sub-skills and sub-tasks, with the execution of one task requiring the execution of another. Levelt (1977) gives us an example of the process of carrying out a conversation. The first goal is to state intention by deciding on a topic and selecting a certain syntactic schema. This schema activates sub-activities such as formulating a series of phrases to express the intention. To activate these phrases, the learner needs to retrieve lexis, activate phonological patterns, utilize appropriate syntactic forms, meet pragmatic conventions and so on. Each component needs to be activated before the higher order goal can be realized. Here one can see that practice is important, but it is crucial to understand that the role of the speaker is active (but not necessarily conscious), which is different from the behaviourist view of practice.
2.4 'U' Shaped Development in SLA and Restructuring.
It would be a mistake to equate cognitive accounts of SLA with the notion of "practice makes perfect" in terms of habit formation in a behaviourist sense. Behaviourism and its sister Structuralism have long been discarded as viable sufficient conditions for language learning, especially because they cannot explain the U-shaped development of language learning. In U-shaped language development, learners do not learn things in sequence even if sequences exist. This means that the learner may not necessarily have formed a linguistic pattern nor have firmly established that pattern. Learners appear to forget forms and structures which they seemed to have previously mastered and had extensively practised.
McLaughlin (1990a) attempts to account for this U-shaped phenomenon by putting forward the notion of restructuring. This theory suggests that practice leads to an increase in sub-skill development, but can lead to a decrease in performance due to a restructuring of the system. This is the left side of the 'U' curve. The right side of the 'U' reflects a more complex internal representation of knowledge which increases to the level where a skill or knowledge becomes expertise. McLaughlin supports his view by going on to say that Clarke (1987) had found that even advanced learners who were still at the level of decoding (bottom-up processing) rather than using top-down approaches to reading had not restructured their knowledge and moved to a different processing level. This explained the very similar test results between an advanced group and a beginner group of learners on a cloze test. The advanced group were restructuring their network bottom-up, whereas the beginner group were working top-down. McLaughlin summarizes by saying that "acquiring a second language involves a process whereby controlled attention-demanding operations become automatic by practice" (1990a:125). This notion is closely related to that of chunking (see later). As learning is by accretion, an increasing number of chunks are compiled into an automatic procedure (proceduralization). This can lead to two effects: a decrease in performance due to restructuring of the knowledge system, or an improvement in performance as sub-skills become automatized.
Diagram 2: A U-Shaped curve of development (after McLaughlin)
2.5 Cognitive Processing.
2.5.1 Serial vs. Parallel Processing and Chunking.
Cognitive science shows that we can recognize images extremely quickly; indeed, science abounds with examples of fast recognition (a necessary survival skill when being chased by wild animals). Given this, why is it that we are slower than computers at logical deduction, number crunching and so on, and yet can process (and retain) extremely complex visual images that require massive amounts of processing in a tenth of a second (Hilderbrand, 1994 p. 167)?
The input from listening is both serial (sometimes known as sequential), where one sound is believed to precede another, and parallel (sounds uttered or received at the same time). Jusczky (1986) and Luce and Pisoni (1987) found that the sounds of many phonemes do not neatly follow one after another and that some of the sounds of spoken English are transmitted in parallel. This means that as sounds flow together and are received together, they are probably processed together. This must be contrasted with reading, where the characters are presented serially, although this does not mean that they are processed serially. For example, if you read
THEREDONATEAKETTLEOFTENCHIPS
you could read the letters serially (one by one), but the processing of its meaning requires much more than serial processing (Jusczky, 1986). You have to read it several times and then break it up into manageable chunks before making sense of it, and some parts at the end may be 'understood' before earlier occurring ones. You might read "There, Don ate a kettle of ten chips", or "There, donate a kettle of ten chips", or "The red on a tea kettle often chips". We have seen, therefore, that the input is serial or parallel and that the processing of it, in terms of making sense of it, requires more than serial processing. One has to work with several things in working memory at the same time.
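The chunking problem described above can be made concrete with a small sketch. It is offered only as an illustration and is not a model of human reading: the mini-lexicon and the function name `segmentations` are my own assumptions, chosen so that the readings quoted above can be recovered from one unbroken string of letters.

```python
from functools import lru_cache

# Hypothetical mini-lexicon: only the words needed for the quoted readings.
LEXICON = frozenset({
    "there", "the", "red", "don", "donate", "on", "ate", "a",
    "tea", "kettle", "of", "ten", "often", "chips",
})

def segmentations(text, lexicon=LEXICON):
    """Return every way of splitting `text` into words from `lexicon`."""

    @lru_cache(maxsize=None)
    def split_from(start):
        if start == len(text):
            return ((),)  # exactly one way to segment the empty remainder
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in lexicon:
                # Keep this word and combine it with every split of the rest.
                results.extend((word,) + rest for rest in split_from(end))
        return tuple(results)

    return [" ".join(words) for words in split_from(0)]

# Build the unbroken letter string from one of the quoted readings.
letters = "there don ate a kettle of ten chips".replace(" ", "")
readings = segmentations(letters)
for reading in readings:
    print(reading)
```

Note that whether a chunk such as 'the' or 'there' is correct at the start of the string can only be settled by what follows it, which is the sense in which later parts may be 'understood' before earlier ones.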
In Anderson's terms, the processing of the above sentence would
require the creation of a node in working memory for each part of the
utterance (say, 'there', 'ten', 'ate', 'red' and so on). The central
executive processes this information by interfacing these nodes with
pre-existing linguistic knowledge in the memory networks and tries to
make sense of it by pattern matching, lexical retrieval, and the
retrieval of pre-existing grammatical, syntactic and orthographic
knowledge, with the goal of fitting it smoothly into the network.
Because such huge amounts of information from various domains
(linguistic, semantic etc.) are retrieved at once, the processing is
in parallel. From the above we can therefore see that any model of
input processing that does not allow for parallel processing is
severely limited.
2.5.2 Analysis, Control and Automaticity.
A theory of L2 learning proposed by Bialystok (1979; 1981; 1988) explicitly states that the process of learning a language is no different from the learning of other kinds of information. There are two dimensions to language proficiency - an analyzed factor and an automatic factor. The analyzed factor deals with the extent to which a learner has control over her linguistic knowledge. Unanalyzed knowledge is therefore not usually available for report. Bialystok states that this unavailability for report is not linked with consciousness, but that analyzed knowledge does make articulatable knowledge possible and is therefore available in a formal (for example, classroom) setting. Automatic factors are concerned with the relative access which the learner has to knowledge: the more knowledge, the more automaticity. One of the problems with this model is that in its original form it did not explain the well attested phenomenon of developmental sequences by specifying which non-automatic and non-analyzed items of knowledge are developed first. Therefore, as there was a need to explain how learners develop their ability to use their knowledge to perform tasks (performance) and how learners acquire knowledge of L2 rules (competence), Bialystok and Sharwood-Smith (1985) proposed an amendment to the theory by distinguishing between knowledge and control.
Knowledge is said to have both a qualitative and a quantitative dimension. This now leaves us in the position that a cognitive account of L2 learning can distinguish between control procedures and their application, as they are seen separately (Ellis 1991:183). Control can apply to both explicit and implicit knowledge (see later), which allows the learner to access both unconscious and conscious knowledge about the rules of the language without affecting fluency. Newly acquired knowledge will be accessed through controlled processing and can be extended rapidly if free practice is given. The increases in control will be reflected in the accuracy of what the learner produces. However, it is still not clear what aspects of the qualitative dimension are governed by linguistic or cognitive factors. Logan (1989) suggests that part of the retrieval system we use, and the speed at which retrieval is done, can be accounted for under his instance theory.
2.5.3 Deep Processing.
One of the most influential papers on memory and information processing has been the approach involving deep processing (Craik and Lockhart, 1972; Craik 1979). Roediger (1980) found that by 1980, in just eight years, this article had been cited over 700 times. The belief is that the more deeply you process information, the better it is retained. Craik and Lockhart theorize that people can analyze stimuli at different levels. They proposed that the shallow levels involve analysis in terms of physical or sensory characteristics. The deeper levels involve meaning. Shallow processing is said to lead to short term retention and deeper processing to longer retention. When a learner analyzes through meaning, she may instantiate other related associations, images and past experiences related to the stimulus. The by-product of this is a memory trace - a connection. If the processing was shallow the trace is weak, and conversely it is stronger for deeper processing. The explanation for better retention with deeper processing comes from three sources. The first is the distinctiveness of the input from the other items, the second is the kind of rehearsal, and the third is the self-reference effect.
Distinctiveness of the input.
Research has shown that the more distinctive the stimulus is from the other memory traces, the better the retention (Craik, 1979). For example, Hunt and Elliot (1980) found that we remember words that are distinctive in their sequence of short and tall letters, such as lymph, khaki and afghan, better than words with a more common orthographic appearance, such as leaky, kennel, and airway.
Maintenance and Elaborative Rehearsal.
The effects of rehearsal, the process of recycling information, depend on the type of rehearsing being done, either maintenance rehearsal or elaborative rehearsal (Atkinson and Shiffrin, 1968). Maintenance rehearsal involves just repeating the information shallowly, whereas elaborative rehearsal involves a more meaningful analysis of the stimuli. Therefore, it is when the learner uses elaborative rehearsal that the rehearsal will be helpful. Reviews of dozens of studies conclude that deep processing produces higher recall than shallow processing (see Baddeley, 1990; Horton and Mills, 1984; Koriat and Melkman, 1987 for examples).
The Self-Reference Effect.
Relating information to ourselves leads to better recall (Rogers, Kuiper, Kirker, 1977). For example, when learning the word generous, if you process it deeply by making a connection with yourself (viewing yourself as a generous person, for example), you will be more likely to remember the word than if you had not (see also Brown et al's study of creative mental imagery, 1986; Katz's study of words related to creativity, 1987; and Reeder et al's study of long paragraphs of prose, 1987). It should be mentioned that this system works best for positive instances rather than negative instances - considering yourself to be mean, for example (Ganellen and Carver, 1985). Belzezza (1984) suggests that the reason why self-reference leads to better recall is that the self is treated as a rich and organized set of internal clues with which information can be associated (demonstrating close associations with schema and connectionist theories). Klein and Kihlstrom (1986) also suggest organization as an explanation. Others (Greenwald and Banaji, 1989) suggest that in fact the self is not so different from other knowledge sources, but that it is simply a very rich source of ideas.
Craik and Lockhart's original model focused on input processing (encoding) only, which led Moscovitch and Craik (1976) to add in a later paper that retrieval conditions should duplicate encoding conditions in order for deep processing to be effective. Bransford et al. (1979) found, as Moscovitch and Craik had intimated, that retrieval was better for 'shallowly' processed items which were encoded acoustically when the test reflected that encoding, and better even than for semantically processed items tested under free recall conditions. This appears to be a weakness in the model, albeit not an important one. If information was encoded shallowly, then it can be recalled if the testing conditions reflect the encoding process. If, however, the information was processed deeply and tested in the same way as it was encoded, there will be superior recall. If deeply processed information is tested in ways other than the way it was encoded (e.g. an item was list learned, but tested in speech), there will still be a better chance of recall than for shallowly processed information.
Depth of Processing and Connectionist models.
How does depth of processing fit a connectionist model? Rehearsing and the self-reference effect are close to connectionist principles. Both suggest a close working relationship with the network, as the central executive expands the network when analyzing input, making more connections and strengthening the links with otherwise poorly connected features relating to the feature to be stored ready for retention. We can see that the deeper the processing at the input analysis stage in working memory, the better the connections will be in the network, and the better the chance of recall. An important part of this process will be noticing, and noticing the gap (see below) between the input and stored knowledge - the deeper the processing when the gap is noticed, the better the retention. It should be obvious to the reader that the amount of effort, as well as the quality of effort, is also important.
2.5.4 Interlanguage Theory.
One of the most influential cognitive theories of SLA is that of Interlanguage theory. The term originally comes from Selinker (1972, 1992); however, other researchers use similar terms such as 'approximative systems' and 'idiosyncratic systems'. Interlanguage theory was the first to try to explain SLA and many later theories were developments of it. Interlanguage theory states that there are several factors responsible for SLA. These include language transfer, overgeneralizations and learning strategies, among others. The theory posits that an adult second language learner does not move smoothly from a first to a second language, but in fact creates a new, separate third language system independent from the L1 or the L2. This system is in constant flux and under continual pressure to restructure (see above) the pre-existing knowledge (in the interlanguage) in response to new linguistic input. The learner makes progress along a series of continua representing knowledge in certain linguistic and paralinguistic areas. The movement reflects the U-shaped curve discussed earlier. This construct has been subjected to both linguistic and cognitive interpretations and is important when describing the interface between input data, processing and pre-existing knowledge as, by definition, an interlanguage is made up of partial information and any model of cognition has to account for this.
Interlanguage processes have been discussed in terms of hypothesis testing (Corder, 1976), suggesting that a learner forms hypotheses about the structural properties of the target language on the basis of input received in an interface with pre-existing knowledge. "In this way a 'hypothesis grammar' is built which is then tested receptively and productively. Hypotheses are confirmed if learners' interpretations are plausible and productions accepted without comment or misunderstanding. They are disconfirmed if their understanding is defective and if their output fails to communicate and is corrected. In such cases learners may restructure their hypotheses, providing that they are sufficiently motivated to do so" (Ellis 1993:352). It is proposed that the linguistic information needed to modify such hypotheses can be obtained through noticing (see below).
2.6 Summary of Above.
At this juncture we should provide ourselves with a short review. We have seen that memory is multi-dimensional. Working memory is in constant contact with the memory storage networks, which are built in a schematic way. These networks are made up of (sometimes tenuous) connections between related information which can be retrieved extremely quickly by virtue of parallel processing. Some of this knowledge is automatic and some is controlled, but it is in a constant state of flux, with restructuring dependent on the attention paid to information contained in input. It is proposed that if this information is processed deeply, then this will lead to better retention, with the information becoming more 'well-known' knowledge. The processing can be either top-down or bottom-up, depending on the data to be processed.
3.0 A Symbolic / Locative Connectionist Account of the Interface
between Input and Pre-existing Knowledge.
3.0.1 What we need to account for.
We have seen from the above that there are many factors to take into account when proposing a new account of learning. These include accounting for the different kinds of knowledge we have (linguistic - analyzed and unanalyzed, controlled and automatic, declarative, procedural, reportable, partial knowledge, etc.); the 'U' shaped development of linguistic performance; the relationship between long term memory and working memory; and so on as outlined above. Any theory must address the following (among other) issues.
The capability to store complete, partial and incorrect information, untested hypotheses (or a store of items to check against incoming input), poorly processed information and so on.
The ability an L2 learner has to infer, generalize and hypothesize from limited input.
The ability a learner has to process linguistic information very quickly and in parallel with other information.
The access to pre-existing knowledge (some would say UG principles and parameters).
Accounting for attention, discarding of input, forgetting etc.
Accounting for a learner's explicit and implicit knowledge.
3.1 A Model for the Interface between Input and Pre-existing
Knowledge.
The principal components of the model outlined below are a sensory register, working memory (after Baddeley, 1990) and a storage area called the network (after Anderson, 1983). One is always faced with problems when attempting to represent non-two-dimensional ideas diagrammatically in two dimensions. However, for the convenience of the reader I have provided a very rough outline in diagram 3 below.
We are constantly bombarded with input: from the words you are reading, to the feeling of the chair under you or the pen in your hand, to the sound of the radio next to you. All this input is in essence noise until some of it is registered in your brain. Divided attention studies have shown us that basically we can only attend to one or two things at once (either the book or the radio, but not both effectively), depending on whether the input can be processed automatically (fluently) or not. However, with practice we can overcome these limits (Allport, 1989; Hirst, 1986). Humans do not seem to have in-built fixed limits to the number of tasks they can perform simultaneously.
Diagram 3: A representation of the interface between input
data, working memory and memory networks.
When input is received at the sensory level (the level at which information is registered on the retina or ear drums), the central executive decides what to attend to and puts it into working memory. The information considered most salient and unexpected is usually that which is attended to. This input then becomes conscious in that you are consciously (at varying levels) aware of it - the rest of the input is not attended to and is discarded. This input is then compared with pre-existing information in the network which, as explained above and in keeping with schema theory, is in a constant state of expectation that the incoming data will match pre-existing knowledge and thus confirm that knowledge and lead to comprehension. As the reader can imagine, this is a simplistic account of the process and will be expanded below. One can see from diagram 3 that the central executive decides what to attend to, what to ignore, how to put this into the network and so on. There is freedom of movement between the network and working memory, allowing the flow of information back from the network into working memory for re-evaluation and reflection.
3.2 Features of Working Memory.
The central executive is modeled on that of Baddeley (1990). It regulates what will be attended to (or not). This attention is best described as being on a continuum from low to high attention. If a learner can process the words she is reading very quickly, then attention in terms of comprehension will be low. If, however, she is learning a completely new alphabet, then she will be focusing intensely on each curve and line of each letter in order to decode each letter and word - high attentive processing. Non-attended input would be that which the central executive in memory decided not to attend to. This focusing can be on one aspect of the various inputs available to be attended to, or on several in parallel. It is speculated that there is a limit to the amount of input that one can focus (or be forced to focus) on at once; however, see the Allport (1989) and Hirst (1986) studies cited above. The more attention one focuses or concentrates on one feature of the input, the less attention one has for other aspects of the input. This attended-to input, or intake, can also be analyzed deeply or superficially, as in the depth of processing theory outlined above.
3.3 Attending to the input.
When the central executive in working memory decides to pay attention to a feature in the input it is also deciding not to pay attention to other input and thus discards it. The central executive can analyze the information in conjunction with the network either for bottom-up (discrete) items or top-down (from previous experience and world knowledge). Central to this is the notion of 'noticing' and the role of attention. In recent years 'noticing' has become a widely-accepted notion that can help a learner to use a feature of the language she had not previously used.
The notion of noticing has become an important explanatory phenomenon for adjustment of behaviour (learning). There are two levels of noticing. One is the instant the item is registered by the central executive ("Oh, there's that word 'likes' again"); the second is when something about the item is seen by the central executive ("Oh, look, it has an 's' on it, I wonder why?"). There is also another type of noticing, when the central executive notices that something in the input is different from previously stored information; this is called noticing the gap.
Researchers working in cognitive psychology, UG-based accounts of SLA and other such diverse fields of SLA inquiry, despite the epistemological differences, do seem to agree on the importance of noticing (Cook, 1993; Ellis, 1991, 1993; Fotos, 1993; Fotos and Ellis, 1993; Rutherford and Sharwood-Smith, 1985; Saleemi, 1992; White, 1991; however, see Long, 1994 forthcoming). Schmidt (1990) provides an excellent account of the issues involved in terms of focal awareness (Atkinson and Shiffrin, 1968), perception and episodic awareness (Allport, 1979), a detailed account of which is unnecessary here. Several researchers also agree that noticing is a first step in learning. Sharwood-Smith (1981), Rutherford (1987) and McLaughlin (1987) have considered that there are 4 major steps for general processing. Fotos (1993 pp. 386-87) provides us with a summary:-
1 a feature in the input is noticed (level 1 noticing), either
consciously or unconsciously;
2 an unconscious comparison is made between existing linguistic knowledge, also called interlanguage, and the new input;
3 new linguistic hypotheses are constructed on the basis of the differences between the new information and the current interlanguage (if there are differences this would be noticing of the gap - level 2 noticing); and
4 the new hypotheses are tested through attending to input and also through learner output using the new form.
In a related proposal Schmidt (1990), taking his data from Schmidt and Frota (1986), distinguishes between perceived information, or input, and information which is noticed by the learner, or intake. Intake (linguistic forms which are noticed) is critical for subsequent processing of the forms. He observed (Schmidt and Frota, 1986: 141) that "I heard them (previously unnoticed features of the language) and processed them from the beginning, but did not notice the form for five months. When I finally did notice the form, I began to use it". Thus noticing has been suggested to reflect an interface between the development of explicit knowledge of a feature and the eventual acquisition of it (implicit knowledge).
3.4 The Network in Place.
Before learning can be demonstrated under this paradigm, we need to know the state of the network prior to new input. The network consists of interconnected nodes, the strengths of which hold the information. Such information includes skills, lexical knowledge, phonological knowledge, grammatical knowledge, knowledge of systems, chunked information, knowledge of how to do things and so on.
Central to a connectionist account of information processing is the need to see learning in a completely different light from traditional approaches. What learners know is characterized by a labyrinth of interconnections in a network. The network consists of nodes which regulate input and output, and the connections between them. The weight of a connection between nodes reflects the strength of that connection. McClelland, Rumelhart, and Hinton (1986) proposed a theory of cognition that attempts to explain the way we can encode and decode information. In a connectionist network, information (or knowledge) is stored in the interconnections between nodes. Each node is connected to many other nodes, but not to all the nodes in the network. A connection can be excited or inhibited depending on the kind of processing happening. If a node is excited (via its input) due to its connections to other nodes, it stimulates another node via its output capability. These nodes are organized into 'levels' such that any one node excites or inhibits other nodes at its own or different levels (following Levelt above). Patterns, habits and rules are not stored in these connections; what is stored are the connection strengths that allow these patterns and rules to be recreated. Knowledge is seen at the micro-structure level rather than the macro-structure level of cognition. Therefore, the strength of the connections reflects the relative knowledge one has about an item. Learning, therefore, is a by-product of processing.
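To make the idea of excitation and inhibition a little more concrete, the following is a minimal sketch of one way activation might spread through weighted connections. The four nodes, the weights, the additive update and the clipping to the range 0 to 1 are all assumptions made for illustration; this is not the McClelland, Rumelhart and Hinton formulation.

```python
def spread(activation, weights):
    """Run one propagation step over the network.

    `weights[(source, target)]` is positive for an excitatory connection
    and negative for an inhibitory one. Activations are clipped to [0, 1].
    """
    incoming = {node: 0.0 for node in activation}
    for (source, target), weight in weights.items():
        incoming[target] += activation[source] * weight
    return {
        node: min(1.0, max(0.0, activation[node] + incoming[node]))
        for node in activation
    }

# Hypothetical four-node network; every number is illustrative only.
weights = {("a", "b"): 0.8, ("a", "c"): -0.6, ("b", "d"): 0.5}
activation = {"a": 1.0, "b": 0.0, "c": 0.2, "d": 0.0}

step_one = spread(activation, weights)
step_two = spread(step_one, weights)
print(step_one)
print(step_two)
```

In this sketch node 'a' excites 'b' and inhibits 'c', and 'd' only becomes active on the second step, once 'b' has been excited. Nothing about the pattern is stored in the nodes themselves; it is recreated from the connection strengths each time.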
If a learner is sure of a piece of information there will be a very strong link between the relative nodes that represent this information. If one has weak or partial knowledge then the connection strengths will be weak. If there is no knowledge, then there may be no connection, or alternatively a connection with no weight on it. In order to bring the abstract to the concrete, the following diagram seeks to illustrate a learner's knowledge of the verb see.
Diagram 4: A hypothetical partial representation of a learner's concept of 'see'.
The reader should immediately notice several things about this diagram.
1) The first and most obvious is that the representation is incomplete and is only a partial representation of a learner's knowledge of see. That said, the concept of see in diagram 4 is distributed among many connections, some of which are thin and some of which are thick. The stronger the connection (thicker line), the more 'well-known' the information is; a thinner line represents less 'well-known' information. For example, the learner is relatively sure that see means something like 'an image comes to my eyes' and that it collocates with some objects. She is less sure about her knowledge (partial knowledge) that the pronunciation of the past tense form saw is /sɔ:/, represented by the thinner line. In this model we can see that see is one node in the centre - a locative perspective. (Alternatively one could envisage see as distributed and existing as a function of the interconnections between the nodes and having no one central index point - a distributed view.)
2) The diagram shows nodes that, for diagrammatic purposes only, have been labeled 'meaning', 'past tense ending', 'preposition use' and 'objects that collocate', with their sub-categories. The learner could assign labels to these quite differently and in fact not even have them subcategorized as shown, but in some completely different way - reflecting her own view of the word see. Alternatively, these nodes might not exist at all, or there could be no connections between them, reflecting no knowledge between these nodes and thus no knowledge of see.
3) The diagram does not show all the other possible nodes about see - for example there are no nodes for its 'idiomatic use', the knowledge that see is pronounced the same as sea and so on.
4) Each of the nodes and sub-categories, such as 'meaning' are shown as being connected to other parts of the network by the lines leaving the diagram. Therefore, the network is immensely complex in structure.
5) Knowledge that things are not something can also be accounted for in this model. For example, this learner may explicitly know that the past tense ending of see is not seed (is not /I:d/). This would be represented by drawing a line to that part of the diagram - either thickly or thinly depending on the strength of that knowledge.
6) It would take only a little imagination to conceive of a diagram which could represent knowledge of 'the present perfect', 'formal letter writing styles', 'vocabulary networks', 'semantic networks', 'pragmatics' and indeed all facets of SLA all linked together. Such a highly interconnected network would, of course, be beyond diagrammatic representation.
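The points above can also be expressed as a small data structure. The sketch below is only one hypothetical way of writing down the locative view of the diagram: the class name, the numeric strengths, the `negated` flag for knowledge that something is not the case, and the 0.5 cut-off between 'well-known' and 'partial' are all my own assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Connection:
    source: str
    target: str
    strength: float        # 0.0 (no weight) to 1.0 (certain); illustrative values
    negated: bool = False  # True for knowledge that something is NOT the case

# A partial, hypothetical rendering of a learner's knowledge of 'see'.
SEE_NETWORK = [
    Connection("see", "meaning: an image comes to my eyes", 0.9),
    Connection("see", "objects that collocate", 0.8),
    Connection("see", "past tense: pronunciation of saw", 0.3),
    Connection("see", "past tense: seed", 0.7, negated=True),
    Connection("see", "idiomatic use", 0.0),
]

def describe(connection, threshold=0.5):
    """Label a connection as a thick line, a thin line or no knowledge."""
    if connection.strength == 0.0:
        return "no knowledge"
    label = "well-known" if connection.strength >= threshold else "partial"
    return f"{label} (negative)" if connection.negated else label

for connection in SEE_NETWORK:
    print(f"{connection.source} -> {connection.target}: {describe(connection)}")
```

A distributed view would drop the central 'see' entry altogether and let the concept emerge from the connections among the other nodes; that alternative is not attempted here.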
3.5 Types of Knowledge in the network.
Stored knowledge involves the storage of the input via an adjustment of the network links and connections, which is done by the central executive in working memory and which leads to a new status quo or, in real terms, a new expectation for that word if it is met again, in keeping with schema theory. This does not preclude the bringing back of this connection to working memory through quiet reflection, as shown by the two-way line in diagram 3. If subsequent reflection finds that the previous adjustment did not lead to a full matching with all other parts of the network, it will need to go back through the process again.
The input which is attended to can potentially become stored input; the central executive can also discard it (however, see below). There are several facets to pre-existing knowledge. Knowledge which existed prior to new input being received is called pre-existing knowledge, whether it is complete (e.g. knowledge of my own name) or incomplete (partial); understood or non-understood; analyzed or unanalyzed; subject to high levels of control or not; subject to a hypothesis or not; and tagged or not. All stored knowledge becomes pre-existing knowledge and it is with this that new input interacts. Pre-existing knowledge can be represented by several continua, which are exemplified below as continua for explanatory purposes only. The reader is asked to imagine that each node and connection of each network can be subject to varying degrees of the following aspects of knowledge.
The first aspect of pre-existing knowledge is a continuum running from complete knowledge (the store of information in networks that the learner is comparatively certain of - concrete information in a sense) to less well known and tenuous information. The second and third continua would include analyzed and unanalyzed knowledge, as well as knowledge over which the learner has varying levels of control (fluency) (after Bialystok, 1979). A fourth dimension would be knowledge that is subject to a hypothesis or not (after interlanguage theory). If the central executive processes input but cannot neatly fit it into pre-existing knowledge, the central executive at the level of consciousness can form a hypothesis to test when the same item is received in the input. This does not mean that this conscious hypothesis is necessarily open to introspection and can be explained or articulated by a learner if asked, although she may be able to articulate it. Another aspect would be knowledge that is tagged or not. This means the learner has consciously tagged a piece of information as incomplete in some way and is awaiting new input to see what can be learned from more exposure. This is related to the concept of valency. The central difference between knowledge that is tagged and that which is subject to a hypothesis is that the tagged input does not have a hypothesis to test - it is just awaiting more exposure to see what happens. Of course, knowledge can be untagged and not be subject to a hypothesis. A fifth dimension would be that of comprehended and uncomprehended knowledge. Non-understood knowledge is a type of stored knowledge which has been attended to and has entered the network but could not be fitted into the network (made sense of) and essentially is uncomprehended (this information is available for later reflection, however).
Understood knowledge is a type of stored knowledge: input which has been attended to, has entered the network and has been matched perfectly with the pre-existing system and thus comprehended. Understood knowledge requires no adjustment to the network's overall shape other than the strengthening of the network links through confirmation of the match of new input with the previously learnt information, and the storage of content information, such as the details of the story being read. It should be noted that a learner generally can focus on the linguistic system or the content, but not both at once.
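As a rough aid to the reader, the aspects of pre-existing knowledge just described could be recorded for any one item as follows. The field names, the use of numbers between 0 and 1 for the continua, and the wording of the status messages are my own assumptions rather than part of the model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class KnowledgeState:
    """Hypothetical record of one item of pre-existing knowledge."""
    completeness: float    # tenuous (0.0) to complete (1.0)
    analysis: float        # unanalyzed (0.0) to analyzed (1.0)
    control: float         # controlled (0.0) to automatic or fluent (1.0)
    comprehension: float   # uncomprehended (0.0) to comprehended (1.0)
    hypothesis: Optional[str] = None  # a hypothesis awaiting a test, if any
    tagged: bool = False              # marked as incomplete, no hypothesis needed

    def status(self):
        """Say what, if anything, the learner is waiting for."""
        if self.hypothesis is not None:
            return "awaiting input to test a hypothesis"
        if self.tagged:
            return "tagged as incomplete, awaiting more exposure"
        return "neither tagged nor subject to a hypothesis"

own_name = KnowledgeState(1.0, 1.0, 1.0, 1.0)
new_word = KnowledgeState(0.2, 0.1, 0.1, 0.3, tagged=True)
irregular = KnowledgeState(0.5, 0.6, 0.3, 0.8,
                           hypothesis="the past tense of 'see' is 'saw'")

for item in (own_name, new_word, irregular):
    print(item.status())
```

The point of keeping `hypothesis` and `tagged` separate is the distinction drawn above: a tagged item is merely awaiting more exposure, whereas an item under hypothesis has something specific to be confirmed or disconfirmed.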
The interaction between the types of knowledge held in the networks (or more correctly network of networks) and working memory will lead to a change in the network; that is, all input that is attended to will be stored (unless discarded at the working memory stage by the central executive). This does not mean that all input will be learned. For example, a learner could come across a new word and make it immediately available for output (say, speech). Alternatively she could tag the item as an unknown word, or subject it to a hypothesis, and wait for it to appear again, to obtain more information about it. She could also rate it as low priority and essentially discard it.
This discarding does not preclude, however, the possibility that the learner may remember having met the word when it is met again, allowing for incomplete discarding. Of course she could 'discard' a new word at the first meeting and at the second meeting react to it as if it were a new word, which means that it was completely discarded at the first meeting. Two types of 'discarding' must be distinguished. Complete discarding happens when a learner is asked if he has ever seen a word before (even though he had been introduced to it before) and he states that he has never met it before and has no recollection of it. Partial discarding happens when the learner remembers the previous input but only when it is brought to his attention. There would of course be no discarding if in the same situation he can recall the item without trouble. In this second sense there is provision in this model for the instantiation of pre-existing knowledge that had somehow been 'hidden' until the new input was detected. This instantiation can be of a small element of a network accessing limited information at first, with a later instantiation of the related information stored in the connections in the network to bring out more information about its meaning, pronunciation, use and so on.
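The distinction between the types of discarding can be stated as a simple decision rule. The function and parameter names below are hypothetical; the rule itself only restates the three cases described above.

```python
def classify_discarding(recalls_unprompted, recognizes_when_prompted):
    """Label a learner's response to a word he or she has met before."""
    if recalls_unprompted:
        return "no discarding"        # the item is recalled without trouble
    if recognizes_when_prompted:
        return "partial discarding"   # remembered only when pointed out
    return "complete discarding"      # treated as a completely new word

cases = {
    "recalls the word freely": classify_discarding(True, True),
    "remembers only when reminded": classify_discarding(False, True),
    "has no recollection at all": classify_discarding(False, False),
}
for description, label in cases.items():
    print(f"{description}: {label}")
```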
3.6 Adjusting the Network (Learning).
When new input is received, the input data is compared with pre-existing knowledge to see if it matches. The input can be processed at the level of content (facts about the story being read - the characters and the plot, for example) or at the linguistic level (the finding of new words and expressions, the making of connections between previously unconnected items, etc.). The central executive could also find that the network had previously tagged that item for further investigation, leading to the potential for a change in, say, reading behaviour (stopping and re-reading, for example, in order to find out more about the tagged word). The default setting at the time of input is that the understanding of this input will be easily predictable by the pre-existing schematic network and thus will require a minimal level of focal attentive processing. An example would be that if a learner were studying superlatives he would instantiate that network and expect (probably unconsciously) to read such words as the biggest, most, and so on. If this is indeed the input he receives, then the learner matches this input (the words he is reading) with the current network related to that word / phrase or type of text and finds that it fits the network as he had expected. If it does confirm the pre-existing knowledge by fitting in the network smoothly, then the meaning of the message may be retained (the content of the message may later be accessible for report, review, reflection and so on). The network is thus being adjusted by the strengthening of present connections, confirming already known information, and by the addition of other networks.
In some circumstances the input is perceived or noticed to be different from the previously understood or learnt information. Registering this unexpected input is a second form of noticing, called noticing the gap between the input and the previously stored information (Schmidt, 1990; Sharwood-Smith, 1981; Ellis, 1993; Fotos, 1993; Rutherford, 1987). In the above example, this might be noticing that he had believed the superlative of good to be bestest, when in fact it is best. A gap between pre-existing knowledge and new input is noticed. At this point the learner can readjust the network to accommodate this new information. Sometimes this will be easy if the network is highly developed, but considerable adjustment (along with possible confusion and 'thinking') may be necessary if the information is vastly different from that stored. This readjustment is called 'learning'. It should be clear that learning takes many forms. Any adjustment to the network is a form of learning: for example, learning facts and new information, either content or linguistic; learning that the learner does not know something; learning that the learner has incomplete information; and so on. Examples of adjustments to the network could include:
The activation of a new connection between existing nodes would reflect the linking of previously unconnected information. Initially this new link would probably be weak - showing unsure information, but could be a strong link depending on the connection relationships and the depth of processing.
The activation of a completely new node for completely new information being added to the network.
Forming, checking, rejecting and reformulating hypotheses.
Tagging items of knowledge as explained below.
Accounting for greater control, depth of analysis.
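The first two adjustments listed above can be pictured with a minimal sketch. It is an illustration only, not an implementation of any published connectionist model: the class name Network, the method adjust, the step sizes and the ceiling of 1.0 on a weight are all assumptions made for the example.

```python
from collections import defaultdict


class Network:
    """Toy weighted graph; all names and numeric values are illustrative."""

    def __init__(self):
        self.nodes = set()
        self.weights = defaultdict(float)  # (node_a, node_b) -> link strength

    def _key(self, a, b):
        # Store each undirected link under a single canonical key.
        return tuple(sorted((a, b)))

    def weight(self, a, b):
        # Read a link strength without creating the link.
        return self.weights.get(self._key(a, b), 0.0)

    def adjust(self, a, b, step=0.1):
        """Strengthen an existing link, or create a weak new one."""
        for name in (a, b):
            self.nodes.add(name)  # completely new information gets a new node
        key = self._key(a, b)
        self.weights[key] = min(1.0, self.weights[key] + step)
        return self.weights[key]


net = Network()
net.adjust("big", "biggest", step=0.5)  # assumed: an already established link
net.adjust("big", "biggest")            # matching input confirms and strengthens it
net.adjust("good", "best")              # newly noticed link, weak at first
```

On this sketch, confirmation and learning differ only in whether the link already carried weight before the input arrived, which is in keeping with the claim that any adjustment to the network is a form of learning.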
A connectionist model can explain the relationship between explicit and implicit knowledge. A simple example illustrates this. If a reader reads a sentence such as 'Did Leonardo da Vinci have knees?', he probably does not know explicitly whether da Vinci had knees or not, but can easily infer or spontaneously generate this knowledge from his knowledge of the world; prior to reading the sentence, however, he had probably never made the explicit connection. Such a question shows the status of information in the network. Prior to the question being asked, the link might have existed but had no weight on it (no direct or explicit knowledge was stored). After the question, there will either be a strengthened connection in the network representing Leonardo da Vinci, knees and past tense (and maybe a few more nodes too), or a new connection. From this we can see that knowledge comes partly from the environment as input, and partly from our existing world knowledge or schemata (see above). If we had been asked to draw a picture of da Vinci painting the Mona Lisa, rather than asked if he had knees or not, we would probably have drawn his knees without thinking about it. This internal or implicit knowledge is shown by the connections in our networks waiting to be 'given weights'. Implicit knowledge, therefore, is something we have sufficient information to 'know', consciously or unconsciously, but of which we are not aware until the connection is made. However, by extension or noticing we can get to know it explicitly. Obviously, this does not mean that we are born knowing everything and are waiting for connections to be made; it means we make connections all the time between previously unrelated information, or we create new nodes and connections for new information. It is this substantial revision by second language learners of their pre-existing knowledge (their first language) that causes such problems when learning a second language.
One of the functions of the central executive is to reconcile information between the network and the input. The network contains many kinds of linguistic knowledge - from the L1, and from the L2 in terms of grammar, word associations, chunks, phonological and orthographic knowledge and so on. If new input does not match the present network, then some adjustment has to be made to account for the information. The central executive has an ability to look at the network and infer from linguistic or lexical patterns (guessing from context is an example) and to generalize (and thus overgeneralize) (see above for examples). To do this there has to be communication between the network and the input. One way to do this would be to consciously make a hypothesis about the new word in the input and store it away in the network as 'tagged', awaiting the same input to test the hypothesis. A crude example of such a hypothesis may be "look or wait for a chance to test this: if I hear or read the third person singular present tense, look to see if there is an 's' on the verb". It should not be assumed that the learner can instantly recall a list of these 'tagged' hypotheses; however, when questioned, she might be able to say "I'm not sure about third person 's'". This is a reflection of the unarticulatable knowledge mentioned before. By tagging a particular item, it might be connected to a store in the network dealing with 'things to find out' and thus can be instantly instantiated when it is next noticed in the input. It is suggested that the deeper the processing of the hypothesis (the more well-formed it is), the better the chances of the hypothesis being proven right.
A second way would be to ignore it and decide it was not important enough, or that other things were more pressing. A third option could be to just 'tag' the item to see what the learner notices (level 2 noticing) about it the next time it is noticed (level 1 noticing) in the input, without making a hypothesis. A fourth option could be to 'tag' it for reflection and consideration later - in a sense, revision. There are doubtless many more options available in the interaction between the input, working memory and the network of previously stored knowledge.
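The four options just described can be set out as a small sketch of a 'things to find out' store. Everything in it is hypothetical and serves only to make the options concrete: the names Action, TaggedItem and ToFindOutStore, and the example items, are my own assumptions and are not drawn from the literature.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Action(Enum):
    HYPOTHESIZE = "tag with a hypothesis to test"
    IGNORE = "treat as low priority"
    TAG_ONLY = "tag and see what is noticed next time"
    REFLECT_LATER = "tag for later reflection"


@dataclass
class TaggedItem:
    item: str
    action: Action
    hypothesis: Optional[str] = None


@dataclass
class ToFindOutStore:
    """Hypothetical 'things to find out' store within the network."""
    items: dict = field(default_factory=dict)

    def handle(self, item, action, hypothesis=None):
        if action is Action.IGNORE:
            return None  # ignored input leaves no tag behind
        tagged = TaggedItem(item, action, hypothesis)
        self.items[item] = tagged
        return tagged

    def on_input(self, item):
        """Return the stored tag when a tagged item is met again, else None."""
        return self.items.get(item)


store = ToFindOutStore()
store.handle("third person -s", Action.HYPOTHESIZE,
             "third person singular present tense verbs take an 's'")
store.handle("nevertheless", Action.IGNORE)
store.handle("albeit", Action.TAG_ONLY)
```

When "third person -s" next appears in the input, on_input returns the tag together with its hypothesis, whereas an ignored item returns nothing and would be treated as new.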
3.7 Factors of Learning a Connectionist View Can Account For.
Much of learning a second language is fuzzy in the sense that it cannot be easily described, taught, learnt or explained explicitly, and is rather gray - such as the English article system or the French gender system. We do not have to have rules to be able to use such features; we can produce them at varying degrees of control and automaticity by 'feel', as it were.
We can also see that, due to the highly connected nature of the networks, the instantiation of one part of this knowledge system can immediately instantiate its connections to other parts of the network and locate its attributes. For example, if one is asked to try to remember all the words one would typically associate with 'holidays', then the word holiday will instantiate other words, such as happy, travel, free time, homework and so on for some learners, but for other learners it may instantiate part-time job, boredom, loneliness and so on (demonstrating learner differences). This concept is known as lexical mapping or mind maps - which are an extension of connectionism. Connectionist networks therefore can form pattern-like behaviour. It must be stressed that this does not necessarily mean that the system is learning rules. It is not. What is stored is rule-like behaviour (although explicit knowledge of rules may also be stored). The patterns are not stored as such, but are created in the connections made by the central executive, which adjusts the weights. In addition, the model can explain patterns that are formed that can link the various senses. For example, a learner can match a sound from a tape while reading a transcript of the text on the tape, or vice versa. If a learner reads a word she can probably access the phonological knowledge linked to it. This can be extended to the links between, say, the word pig and the sight of a pig, the representation of its smell and so on.
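The 'holiday' example can be sketched as a simple spreading of activation through weighted links. This is a toy illustration under assumed values, not a claim about how any particular connectionist model is implemented: the function name spread, the decay and threshold parameters and all the link weights are assumptions.

```python
def spread(links, start, decay=0.5, threshold=0.2):
    """Return the activation levels reached from `start`.

    links maps a word to {neighbour: weight between 0 and 1}.
    Activation weakens by `decay` at every link and stops below `threshold`.
    """
    activation = {start: 1.0}
    frontier = [start]
    while frontier:
        word = frontier.pop()
        for neighbour, weight in links.get(word, {}).items():
            level = activation[word] * weight * decay
            # Keep only the strongest activation found for each word.
            if level >= threshold and level > activation.get(neighbour, 0.0):
                activation[neighbour] = level
                frontier.append(neighbour)
    return activation


# Two learners with different (assumed) associations for the same word.
learner_a = {
    "holiday": {"travel": 0.9, "free time": 0.8, "homework": 0.6},
    "travel": {"happy": 0.9},
}
learner_b = {
    "holiday": {"part-time job": 0.9, "boredom": 0.7, "loneliness": 0.5},
}

words_a = set(spread(learner_a, "holiday"))
words_b = set(spread(learner_b, "holiday"))
```

The same starting word activates different sets of words for the two learners because only their link weights differ. No rule is stored anywhere in the sketch; the pattern arises from the connections alone.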
This model accounts for partial, incorrect and incomplete knowledge. In keeping with the fact that very few learners of a second language ever reach native speaker levels, a learner will continue to learn throughout her life and the knowledge system will be incomplete and partial. This is particularly true for the interlanguage system a learner has, which is by definition incomplete. This interlanguage can contain part-processed information, and even completely incorrect information that the learner has yet to learn is wrong. It could also include information for which the learner has reached a level of performance beyond which he feels he does not need to develop. This could be due to affective or linguistic factors which are beyond the scope of this paper.
The model accounts for how we can guess or make inferences with incomplete knowledge. If, for example, a learner knows that the past tense of some verbs ends in '-ed', then on meeting a new verb she could try to guess whether it is regular or not. She may be right or she may be wrong, but nevertheless she has used her pre-existing knowledge to guess and generalize. This also applies in grammaticality judgment tests. One does not necessarily need to be able to report why or how one knows; rather, a general feeling leads one to make the inference or spontaneous generalization (and overgeneralization). In addition, the model has the ability to demonstrate how we generate hypotheses to test various linguistic features of the target language and tag items for further investigation. Finally, in this model memory can function with inappropriate input and can make default assignments for missing information.
The model explains the relationship between skill, automatization and control in a way that is consistent with current theory, with particular attention to the parallel processing of the various types of pre-existing knowledge - phonological, orthographic, semantic, lexical, grammatical and so on - which are all interconnected. It is this interconnectedness which allows for extremely fast retrieval.
3.8 Some Problems with Connectionist Models.
One of the problems of current versions of connectionist models is the difficulty they have in clearly accounting for loss of information, both rapid forgetting and 'graceful decay'. There are two possible explanations for loss of knowledge. The first is that the connections were weak and tenuous to start with, implying the need for repeated processing of the item / skill, either through receptive input or production. A second explanation could be that the connections made were subject to a form of competition from other items in the network or from other items in the central executive in working memory. A second problem is the inability of current versions to predict what structures will be learnt next and other related aspects outlined in the introduction. Thirdly, cognitive learning theory is a powerful explanation of how learners develop their ability to use their L2 knowledge, but is not really able to provide an adequate explanation of how this knowledge is acquired in the first place, as it fails to recognize that linguistic factors also play a role. A fourth problem concerns time. Connectionist models have problems in representing time relationships, which are critical in the language domain. Sampson (1987) suggests that one reason for this is that it is the connectionist model itself that extends through time, via the gradual setting of the network, whereas in a model based on rules one can think of the rules as applying instantaneously. Thus it is difficult to treat time as an input-output feature, and to input data sequentially would cut across the very parallel processing nature of the models.
3.9 UG and Connectionism.
Under the rationalist paradigm one would assume that there is no role for UG in connectionist models, as UG is seen as biologically innate. However, this would be misleading, as in fact the network described here has an initial state - that of the nodes and connections in the network and, of course, general cognitive abilities. There is no reason to assume that UG could not be part of this network, acting as a determining and restricting factor that works with the central executive to instantiate or trigger parameters for languages. There are two possible explanations for this compatibility with UG-based accounts of SLA.
The first entails the notion that some of the nodes are relatively fixed
and that nodes vary in their degree of plasticity. Some connections might
start with fixed weights, while others can have their weights modified.
Sampson (1987) suggests that the "modifiability of these connections
might decrease over time as one way of modelling the increased
difficulty with which adults experience language learning" (p. 187). A
second explanation for the availability of UG-type constraints in a
connectionist system could be that nodes have to wait to have their
weights set on the basis of a relatively small set of inputs - a kind of
'parameter' setting. The probability of UG based constraints seems
remote, as connectionist networks cannot be 'pre-wired' to make direct
reference to complex syntactic constructs. Gasser (1990) gives us an example of
why. He says that the principles and parameters of the UG of
Government and Binding Theory are stated in terms of variables, such
as "θ-marking". This says that "if α subcategorizes the position
occupied by β, then α θ-marks β" (Sells, 1985). He continues by saying
that these cannot be wired into the network from the start. Even if
they could be, the network would not know what to do with them. This
does not, however, rule out compatibility with innate mechanisms.
Indeed, it could even be seen to be supportive. Walker (1989, cited in
Gasser, 1990) found that "it appears that the connectionist models
may be able to shed new light on the nativism issue because of their
ability to outperform previously conceived empirical systems".
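The first explanation above, in which some connections are fixed and the modifiability of others decreases over time, can be pictured with a minimal sketch. The exponential form of the decline, the half_life parameter and the function names plasticity and update are assumptions made purely for illustration; Sampson (1987) does not specify a formula.

```python
def plasticity(age, initial=0.5, half_life=10.0):
    """Modifiability of a connection, assumed to halve every `half_life` units."""
    return initial * 0.5 ** (age / half_life)


def update(weight, target, age, fixed=False):
    """Move a weight toward `target`; a fixed connection never changes."""
    if fixed:
        return weight
    return weight + plasticity(age) * (target - weight)


young = update(0.0, 1.0, age=0)                  # highly plastic connection
old = update(0.0, 1.0, age=30)                   # same input, less plastic connection
innate = update(0.8, 0.0, age=0, fixed=True)     # fixed weight resists the input
```

Given identical input, the older connection moves a smaller distance toward the target than the younger one, and the fixed connection does not move at all. That is one way of picturing both relatively fixed constraints and the greater difficulty adults have with language learning.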
7.0 Summary and Conclusion.
This paper has focused on the relationship between input data and pre-existing knowledge. There has been no attempt to explain societal, affective and pragmatic aspects of SLA. Connectionist models of information processing can account for a wide variety of factors involved in the processing of input and its relationship with memory storage. They are not without problems, however. Gasser (1990) provides a balanced account of connectionist models and their implications for second language learning:
It is now clear that some form of connectionism will figure in a general model of linguistic behaviour. The only question is whether it will be a minor one, relegated to low-level pattern matching tasks and the learning of exceptional behaviour, or whether the (PDP type of) connectionist account will supersede symbolic accounts, rendering them nothing more than approximations of the actual messy process (p. 186).
It is this writer's assertion that connectionist accounts of the interface between new input and pre-existing knowledge play a significant role in second language learning.