Clive Matthews,
University of East Anglia, Norwich, England
Abstract:
Most recent work in ICALL has tended to focus on syntactic structure. Clearly the grammar formalism chosen for such systems is of some importance. However, as this paper argues, little consideration seems to have been paid to such matters beyond the question of computational efficiency. Following previous work, the paper further argues for choosing a formalism that potentially meshes with work in SLA. Of all the main grammar formalisms being developed, GB theory, with its emphasis on Universal Grammar, has had the most impact on SLA research. Recent advances in "principle-based" parsing now make possible the integration of such work into ICALL.
INTRODUCTION
The theme of this conference is "bridges." Bridges help to reduce the boundaries between disciplines and this is something I have been recently advocating as necessary with reference to Intelligent CALL (ICALL) (Matthews, 1992a). It is a topic that I want to pursue in a little more detail in this paper. The connections I most want to explore are three-fold: between linguistic theory and ICALL, between Second Language Acquisition theory (SLA) and ICALL and, finally, between linguistic theory and SLA (Figure 1).
[Figure 1: the three connections (linguistic theory and ICALL, SLA and ICALL, linguistic theory and SLA), drawn as dotted lines]
The claim is that the links between these disciplines allow each to inform the other, so that findings and advances in one area will provide insights and gains in the others. It is only in this way that we can hope to enhance the effectiveness of CALL (intelligent or otherwise). Unfortunately, the indicated ties frequently exist more in theory than in practice; this is why the lines have been dotted in the diagram. As becomes apparent later, these often weak dependencies will only allow a few tentative conclusions in the overall thrust of our argument.
We start by justifying the connections. ICALL can be roughly characterized as the attempt to use techniques from Artificial Intelligence (AI) within CALL. The claim is that the resulting systems will be able to respond flexibly to the users of the software: not only will they be able to handle any input, including, crucially, the unexpected, but they will also be able to tailor their interactions to the individual.
The obvious AI research areas from which ICALL should be able to draw the most insight are Natural Language Processing (NLP) and Intelligent Tutoring Systems (ITS). Indeed, it is usual to conceive of an ICALL system in terms of the classical ITS architecture pictured below.
[Figure 2: the classical ITS architecture, comprising Expert, Student, and Tutor modules]
The Expert module contains the knowledge that is to be tutored, the Student module consists of a model of what the student knows regarding that domain (plus any other relevant details such as learning preferences) and the Tutor module determines what should be taught and how.
In this paper, our interest is in characterizing the Expert module and, in particular, an Expert module concerned with grammatical form.1 Consequently, it is the NLP side of ICALL that is most relevant to our cause rather than the peculiar ITS concerns of student modeling and tutoring strategies.
It is for this reason that we have drawn the link between ICALL and linguistic theory in the diagram since it would seem to be a truism that NLP will have close links with linguistic theory. That is, one would expect linguists to develop particular grammar frameworks (GFs) and descriptions of various languages using these GFs which the computational linguist then simply implements. However, this is a relationship that is sometimes more honored in the breach than in the observance; hence, the dotted line in the diagram. Usually the problem is that linguists do not work within a sufficiently precise formal framework as required for a computational implementation. This, however, is becoming less so and we will adopt the assumption that, since our ICALL system is going to need a GF in which to couch its grammatical descriptions, linguistic theory is a reasonable place to start the search.
Turning to the link between ICALL and SLA, I have argued previously that there is a stronger commitment to "going AI" than simply the importation of a set of clever programming techniques into CALL (Matthews, 1992a). That is, the majority view of AI sees it as an attempt to understand the (psychological) mechanisms underlying human intelligent behavior. To assess its achievements in this undertaking, AI must ally itself to those other disciplines which take human intelligence as the focus of their enquiry. These fields range, amongst others, from philosophy and linguistics to (cognitive) psychology and anthropology via visual perception and neuroscience. In brief, this "cognitive science" view of ICALL requires that it be informed by theories of psychology and, in particular, theories of SLA. As we shall have cause to remark later, there are various shortcomings in much SLA research which preclude such a beneficial relationship; again, this is why the line is dotted. However, that work which is relevant in SLA should provide a keen stimulus to ICALL.
The final connection, between linguistic theory and SLA, is based upon the (sometimes controversial) assumption that SLA, just like NLP, should be informed by linguistic theory. Of course, we are not claiming that linguistic theory will account for the full range of facts that constitute SLA; clearly, other social and psychological factors are involved. However, linguistic theory does play the crucial role of being able to describe
(within this limited domain) what is acquired and, given a suitably articulated theory, some indications as to how this might be achieved.
It is this complex network of connections and concerns that makes ICALL such a potentially rich area of study. As Alan Bailin notes:
[ICALL is] part of an important endeavor. [It] constitute[s] an experimental scientific investigation of [second] language teaching and learning... [which]...explicitly or implicitly make[s] general claims about the components of language teaching and learning. [Bailin, 1991]
Note that this passage clearly (and correctly) implies that bridges provide access between disciplines from both directions.
The underlying assumption of much of the above is that the direction of influence within this net is all towards ICALL. However, we have also suggested that this effect, in many cases, is rather weak. But, once the connections and the nature of the bridges have been noted, it becomes quite possible to look for influence also flowing in the other direction. Indeed, it should be expected.
What GF would one choose to characterize the (grammatical) Expert domain of an ICALL system? Here our main interest is in discussing some of the criteria that should be applied in making this decision. As this section has suggested, at least some of these criteria should revolve around considerations having to do with linguistic and SLA theory.
In order to be able to make the discussion more concrete, we will compare and contrast two frameworks, Definite Clause Grammars (DCGs: Pereira and Warren, 1980) and the current version of Chomsky's transformational grammar known either as Government and Binding (GB) theory or, increasingly, Principles and Parameters Theory (PPT: Chomsky, 1986). These GFs have, in part, been chosen because of their very different approaches to characterizing a language. It turns out that these differences also have interesting computational properties.
It is probably not an exaggeration to say that DCGs (and closely related frameworks) are currently the favored GF in ICALL. PPT, to the best of my knowledge, has not so far been used in ICALL.2 The burden of this paper is to argue that, all things considered, there are strong arguments in favor of adopting PPT as a GF for ICALL.
CRITERIA OF ADEQUACY
It may prove useful to start with a brief survey of (some of) the main GFs that have so far been utilized in ICALL:
(1) Various Augmented Phrase Structure frameworks (including DCGs) as used, for example, by Chen and Barry (1989), Schwind (1990), Labrie and Singh (1991), and Sanders (1991). Also included are systems embedded under PATR-II-like environments such as Levin, et al. (1991) and Chanier, et al. (1992).
(2) Augmented Transition Networks (ATNs) used by Weischedel, et al. (1978). Handke (1992) uses a Cascaded ATN variant.
(3) Lexical Functional Grammar (LFG) used by Feuerman, et al. (1987).
(4) Systemic Grammar used by Fum, et al. (1992).
(5) Tree Adjoining Grammar (TAG) used by Abeillé (1992).
(6) Incremental Procedural Grammar (IPG) used by Pijls, et al. (1987).
(7) Word Grammar used by Zähner (1991).
(8) Preference Semantics used by Wilks and Farwell (1992).3
Even the above list does not include some of the frameworks enjoying considerable support in recent linguistic theory; for instance, Categorial Grammar, Generalized Phrase Structure Grammar (GPSG), and Head-driven Phrase Structure Grammar (HPSG) to mention just a few.
What criteria should be applied when deciding on a GF for ICALL?4 Amongst a number that come to mind, we choose to highlight the following:
Computational effectiveness
Since the GF is to be incorporated within a computer system, it should be capable of an efficient computational implementation. This imposes various conditions.
For example, the GF should be associated with a grammar formalism in which the framework is to be formulated. It is usually assumed that this formalism, itself, should have a clear syntax and semantics so that we know whether we have accurately expressed the relevant parts of the GF (Pereira and Shieber, 1984). This has been shown to be the case with the formalism used for DCGs but has frequently been doubted with respect to the notation used within PPT. It is certainly true that there is a plethora of different notational forms used in expressing PPT within the linguistic literature. However, recent interest in principle-based parsers has had to address this problem and more precise formalisms are now being used — for example, Ed Stabler has formulated the whole of Chomsky's (1986a) Barriers framework using First Order Logic (Stabler, 1992).
A second computational requirement is that the GF be associated with well-defined and efficient parsing algorithms. Here, until recently, DCGs held the advantage over transformationally-based grammars. The reason is that the DCG formalism can be run almost directly as Prolog code. Given that there are now efficient Prolog compilers, DCGs can be compiled into impressively fast parsers. Equally, DCGs may also be associated with a whole range of different parsing strategies apart from the top-down, depth-first, left-to-right strategy that "comes for free" when running the grammar directly as a Prolog program.
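To see why this is so, it helps to recall the standard translation that the Prolog reader applies to a DCG rule: each non-terminal gains two extra arguments which thread the input string through as a difference list. Using the familiar textbook rule as an example:

% The DCG rule
s --> np, vp.
% is read by Prolog as the ordinary clause
s(S0, S) :- np(S0, S1), vp(S1, S).
% where S0 is the string to be parsed and S is what remains
% once an s has been consumed.

Since the result is just a set of Prolog clauses, parsing is simply Prolog execution and inherits the speed of the underlying compiler.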
The early history of transformational parsing, on the other hand, is not so impressive. Although various systems were developed in the mid 1960's based upon the early versions of transformational grammar, these tended to be highly inefficient due to their non-deterministic procedures and produced a large number of spurious analyses before finally finding a genuine candidate. Indeed, it is often reported that the MITRE system (Zwicky, et al., 1965) took 36 minutes to analyze an 11 word sentence. Matters have greatly improved since then. For example, Marcus's PARSIFAL parser was the first of a new generation of transformationally-based systems (Marcus, 1980). Adopting a deterministic procedure — via lookahead — PARSIFAL was respectably efficient. More recent systems based upon PPT perform with even greater efficiency and will be discussed in more detail below.
Accordingly, it is rather difficult to choose between our two exemplar GFs based on the criteria of this section. We return to other aspects of this question in the section "Rule- vs. Principle-based Parsing."
Linguistic perspicuity
GFs play two main roles within our disciplines. Their first, and most important, contribution is descriptive; they provide the tools with which to analyze the grammatical structures of language.
However, different GFs (and formalisms) tend to focus on or highlight different sets of linguistic phenomena and the decision to choose one framework over another might be made because one rather than the other facilitates the description of a particular phenomenon that we deem to be important.
As an example of this point consider the following. When introducing DCGs the first exemplar rule is often something like:
s --> np(Num), vp(Num).
The focus of attention here is on how this rule (in combination with others) determines subject-verb agreement. That is, the term Num is a variable which is intended to stand for the number of an item. Because the same variable is used with both np and vp, the rule ensures that the value — whatever it is, singular or plural — of the subject NP is the same as that of the verb which heads the VP.
Such agreement properties are easy to express in DCGs. It is no accident, then, that those researchers who have chosen DCGs, and their like, as their ICALL GF tend to focus on agreement facts within the language.
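A minimal fragment makes the point concrete. The following sketch runs under any standard Prolog; the lexical entries (dane, likes, and so on) are invented purely for illustration:

s --> np(Num), vp(Num).
np(Num) --> det, n(Num).
vp(Num) --> v(Num), np(_).
det --> [the].
n(sing) --> [dane].
n(plur) --> [danes].
v(sing) --> [likes].
v(plur) --> [like].

A query such as ?- phrase(s, [the, danes, like, the, dane]). succeeds, whilst ?- phrase(s, [the, danes, likes, the, dane]). fails because the shared variable Num cannot unify plur with sing.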
Now compare the DCG account of subject-verb agreement with that in PPT. This involves the mysterious "Rule R" which associates agreement features generated under the I-node with the (lexical) verb at the level of PF. This (morphological) rule will determine that, say, like + singular is realized as likes. The I-node is also co-indexed with the subject NP as part of specifier-head agreement. Accordingly, both NP and verb get associated with the same agreement features although via different mechanisms.
As this description makes clear, subject-verb agreement receives a more complex analysis within PPT when compared with a DCG. Accordingly, if all one were interested in was agreement properties of a language then a DCG would, ceteris paribus, be the obvious choice of GF. Of course, the overall decision is unlikely to be based upon such a simple example and there are other areas where DCGs are not so perspicuous and where PPT has an advantage. In the long term, choosing a GF is a matter of trading
disadvantages against advantages with respect to the particular features that interest the researcher.
Little consideration seems to have been paid to the question of the descriptive adequacy of the various GFs within SLA. In part, this is due to the already remarked upon weak links that tend to bind SLA theory to linguistic theory. Surveying the literature, by far the most consistent adherence has been to the various species of transformational grammar including PPT. However, even when some version of PPT is adopted, the reason has more often to do with concerns relating to the acquisitional claims associated with PPT — notably questions to do with innateness — than with the descriptive adequacy of the framework.
ICALL cannot neglect such issues. Unfortunately, the impression is that many researchers adopt a framework such as a DCG simply because it is an easily implementable theory and then shoe-horn their analyses into the required form. Hence the emphasis on different types of agreement error, even though such errors do not seem to loom large in discussions of student errors within SLA. We might make two comments on this situation. First, even if SLA cannot always supply particular GF-solutions to the questions of ICALL, it can help to constrain particular GF-answers developed solely within ICALL. Second, here is a clear area where the general informational flow is quite likely to be from ICALL towards SLA.
Acquisitional perspicuity
Besides the descriptive role of GFs, their other contribution is explanatory; they aim to provide justifications regarding actual linguistic acquisition and development. Surprisingly, however, the influence of linguistic theory on both First Language Acquisition (FLA) and SLA has been small. There was a flurry of activity for about a decade starting in the mid 1960's which was heavily influenced by the then major (indeed, probably only) GF, transformational grammar. Soon after this pioneering work, disenchantment with linguistic theory set in amongst both FLA and SLA researchers so that for the next decade little acquisitional work was informed by theoretical syntax. In the last few years, however, there has been a resurgence of interest in some of the new GFs. Pre-eminent amongst these has been the new incarnation of transformational grammar, PPT. There is now quite a large and rapidly expanding body of interesting work using PPT within FLA (see Atkinson, 1992, for a detailed survey) and SLA (see White, 1989, for an overview).
There are two aspects to the influence of PPT on such research. The first is acquisitional. Here the main claim is that FLA is mediated by innate structures. It is the question of whether such mechanisms are still available to the second language learner which has most exercised the SLA community. The second aspect is developmental, with a number of researchers beginning to use the PPT framework to account for actual maturational sequences observed in FLA (see, for example, Hyams, 1986, and Radford, 1990). This has not been investigated with the same vigor within SLA (although see du Plessis, et al. [1987] for some suggestive work relating to German).
Accounting for developmental sequences may well be an important consideration as far as the student and tutoring modules are concerned in the rest of the ICALL architecture.5 Again, we might make the same observation as at the end of the last section and note that the informational flow is quite likely to be from ICALL to SLA theory.
It might be thought that, given the drift of the last few pages, we are now in a position of being able to justify the already stated conclusion of this paper, namely that there are strong reasons for using PPT as a GF for ICALL. However, we adopt a far more tentative position. Certainly, given a commitment to bridges as argued for and given the state of current research, there are strong reasons why PPT should be accorded careful consideration as a GF for ICALL. However, this conclusion is drawn almost by default; other GFs just have not been applied with the same vigor. Adopting PPT on such grounds, then, would be premature.
The choice, therefore, between DCGs and PPT is not that clear when considering the second and third criteria above. However, there are some interesting factors which emerge when reconsidering the two GFs from the computational aspect. The arguments to follow are presented in terms of the properties of rule-based vs. principle-based grammars. A DCG is an example of the former and PPT, as the name suggests, of the latter. Before turning to these arguments we first need to distinguish between the two types of framework.
RULE- VS. PRINCIPLE-BASED FRAMEWORKS
It is easiest to see the distinction between the two types of approach by thinking in terms of particular grammatical constructions. Usually, the comparison involves constructions such as actives and passives. However, the same properties can be seen by considering something as simple as VP structure.
Rule-based frameworks work by defining specific rules for specific constructions; in our case this means a separate VP rule. A principle-based approach, on the other hand, sees a particular construction as resulting from the interaction of a number of simple, but relatively abstract, syntactic principles. As such, there is no one principle which solely defines the various properties of a VP. Diagrammatically we have:
[Figure 3: rule-based frameworks pair each construction with its own rule; principle-based frameworks derive each construction from the interaction of several abstract principles]
We now flesh out this abstract description a little. Consider how a VP such as the bracketed example below would be described in the two approaches:
The Danes [like Maastricht]
With a DCG the basic structure would be handled by the simple rule:
vp --> v, np.
which induces the following tree (with relevant lexical items):
[Figure 4: the VP tree licensed by this rule: V (like) followed by its NP object (Maastricht)]
As we see, one rule, one structure. The description of the same structure in terms of a principle-based theory, initially, looks far more complicated. Here we need to describe some of the modules and their principles that make up PPT.
First, X-bar theory. This can be thought of as describing the basic tree structures that are allowed in natural languages. Roughly it says that trees take the form head-argument or argument-head (assuming there to be an argument). For example, this module will license, amongst many others, the following (simplified) trees:
[Figure 5: three simplified trees licensed by X-bar theory: a bare V; V followed by an NP argument; an NP argument followed by V]
The first tree is an example where the verb (the head) does not take any arguments. The other two trees represent possible structures where the verb takes a single NP argument to the right or left respectively.
The part of the theory which determines whether a verb appears with an argument or not is known as Theta theory (θ-theory). θ-theory relates to questions of who did what to whom. So, a verb such as like involves someone doing the liking and something being liked. These are the θ-roles of the verb. The main principle of θ-theory, the θ-Criterion, (partially) states that each θ-role should be associated with a syntactic argument (which in this case means an NP). That is why an NP must appear in the VP headed by like in order to receive the verb's (internal) θ-role.
θ-theory, however, does not determine that the NP must follow the verb.6 This is accounted for in terms of Case theory. The main principle of Case theory states that all overt (i.e. pronounced) NPs must be assigned Case by a Case assignor (either Tense, V or P). Case is assigned under government — the assignor must govern the assignee — but it is also assumed that the assignment is directional. In particular, English is a language where the verb's Case is assigned rightwards. Accordingly, the only permissible tree which satisfies all the principles is that shown in Figure 4.
The other tree allowed by the combination of X-bar and θ-theory is ruled out by Case theory since the NP, which requires Case in order to escape the Case Filter, is in the wrong position to be assigned Case.
This might seem like a great deal of intellectual baggage to account for a simple construction — especially when compared with the DCG account — and would be so if it were not for the fact that the same principles are used to account for other constructions. Consider, for example, the complex NP:
The enemy's destruction of the city
With a DCG this will require the addition of a new rule, say:
np --> np_poss, n, pp.
With PPT there is no need for additional machinery. X-bar theory will determine the various possible tree structures for this string. This will be much as before except that "NP" will replace "VP." θ-theory determines that because destruction has two θ-roles to assign — just like the associated verb destroy — two NPs are required. Finally, Case theory requires that these NPs be assigned Case. Since nouns are not Case assignors, we account for the presence of the Case assigning possessive 's and the preposition of. What looks like a cumbersome theory when considering a single structure starts to take on a more compact aspect when its coverage is expanded.
RULE- VS. PRINCIPLE-BASED PARSING7
The examples in the last section give some idea of the difference in approach between the two types of theory. We now turn to some of their computational consequences when implemented as parsers. We choose to examine those that have especial significance for ICALL.
Grammar Size
Rule-based frameworks require a large number of rules to describe a language. This is not usually apparent when looking at prototype systems since they only cover a highly restricted portion of the language. However, those systems with a wide syntactic coverage of English use literally hundreds if not thousands of rules. Matters are even worse for languages such as Japanese with a freer word order than English. In such languages the same basic construction may require a number of different rules to describe each of the various permissible permutations due to the variable word order. This significantly increases the size of the rule set.
There is an obvious consequence for ICALL. As has been frequently pointed out, various learner errors appear to be due to transfer from the native language. One approach to handling such interference errors is by parsing the ill-formed input with a combination of both the native and target language grammars (see Schuster, 1986, for an example within a highly restricted domain). But if this entails a complete system having a full grammar for, say, Japanese as well as English, the rule base will be astronomical in size.
Clearly, just discovering and writing such a large number of rules is problematic. However, there is also a computational problem since the parsing algorithms associated with rule-based systems run as a function of grammar size; the larger the grammar, the slower the performance.8 The consequence is that as grammars become more complete they will become less efficient when incorporated as parsers.
Solutions to this problem might be found with specially devised algorithms or dedicated hardware. Alternatively, one could move towards a principle-based system where the many different combinations of the same set of a dozen or so principles (with parametric variation) can encode the same information as many thousands of rules. Of course, here there is a promissory note that there are efficient parsing algorithms which can make use of such a grammar.
Grammar Specificity
Not only are rule-based frameworks construction specific, they are also language specific. Each grammar is tailored to describe a specific language and, because of its nature, does not provide any easy way of stating connections between languages. Take as a simple example the different word order of English and Japanese. As we have already seen, English complements follow the verb; however, in Japanese they precede it:
gave a book to Shunsuke
Shunsuke ni hon o age-ta
Shunsuke DAT book ACC give-PAST
For rule-based grammars such differences have to be accounted for by stating separate rules which are rather different in form:
English: vp --> v, np, pp.
Japanese: vp --> np, np, v.
Clearly, on such an analysis it is hard to capture the fact that these two rules are actually describing the same construction in the two languages. The root of the problem here is the rule-based approach's emphasis on defining string sets. Because the English and Japanese strings — lexical items aside — are very different, so are the rules.
Principle-based theories, on the other hand, try to abstract out a set of deeper and more explanatory (universal) principles which underlie such constructions. The same principles apply in all languages. It is the notion of parametric variation which accounts for the differences between languages. In the example under consideration, the assumption is that the relevant module is Case theory. We have already noted that the Case Filter requires that all lexical NPs must have Case. This principle applies to both English and Japanese. The difference lies in how Case is assigned within the two languages. Here the crucial factor is the direction of the assignment. The assumption is that in English, Case is assigned to the right — this is why NP complements follow the verb in order to pass the Case Filter. In Japanese, Case-assignment is to the left so NP complements must precede the verb.
The idea, then, is to describe cross-linguistic variation in terms of a set of common principles but associated with parametric variation. Rather than write a completely new grammar for each language, as the rule-based approach has to, a principle-based parser simply has to determine the particular parameter settings for each language (plus its lexicon). The result is that languages are seen as related rather than unconnected objects. Clearly, as far as ICALL is concerned this is a preferable conclusion to the position that no languages have anything in common apart from being the result of string concatenation. It opens up, for example, the possibility of being able to give principled accounts of language transfer.
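The flavor of this can be conveyed with a deliberately crude sketch. This is in no sense an implementation of Case theory — the head_direction predicate and the toy lexicon are invented for illustration — but it shows how one description plus one parameter replaces two unrelated rules:

% The single parameter distinguishing the two languages.
head_direction(english, head_initial).
head_direction(japanese, head_final).

% One VP schema; surface order follows from the parameter setting.
vp(Lang) --> { head_direction(Lang, head_initial) }, v, np.
vp(Lang) --> { head_direction(Lang, head_final) }, np, v.

% Toy lexicon, purely for the sake of a runnable example.
v --> [give].
np --> [book].

Here both ?- phrase(vp(english), [give, book]). and ?- phrase(vp(japanese), [book, give]). succeed. Accounting for transfer — say, the head-final order produced by a Japanese learner of English — then becomes a matter of trying the alternative parameter value rather than invoking a wholly separate grammar.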
Ungrammaticality
Any NLP system will encounter ill-formed input from time to time and a robust system should be able to handle such cases. The main differences with ICALL are that (a) ill-formed examples are likely to be more common because of the nature of the users, (b) certain recovery strategies such as asking the user to try again are unlikely to result in any improvement, (c) the ill-formedness is likely to be of a higher degree and (d) some pedagogic response will, on occasion, be required.
Parsing ill-formed input is problematic for a rule-based system. This relates back to the previously noted emphasis within such approaches on the description of a distinguished string set defined over the words of the language. Once this set of well-formed strings — i.e. sentences — has been defined the job of the theory is over. Accordingly, ill-formed strings are "beyond the pale" of the theory and to handle them within a parser utilizing such a theory requires some additional machinery.
A standard solution to this problem is to introduce more rules designed specifically to handle ill-formedness. This is a familiar approach in ICALL. Of course, it has the drawback of increasing the size of the rule set with the attendant efficiency problems noted in the first section. It is also problematic to write enough rules to handle all possible ill-formed input. Finally, there is the problem that simply producing a rule to allow through an ill-formed example does not provide an explanation for that failure.
An alternative way of handling ill-formed input in a rule-based approach is by constraint relaxation. Take subject-verb agreement. Suppose a sentence fails to satisfy the well-formedness constraint that both the subject NP and verb heading the VP agree in number. The designer of the parser may specify that this particular constraint may be relaxed (with a record of the error being made). This was what the "failable" predicates of Weischedel, et al.'s (1978) German tutor were meant to achieve. Which predicates were to count as "failable" was left entirely at the discretion of the designer and, as such, provided no principled theory as to why certain predicates were failable and others not.
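The mechanism is easily sketched in DCG terms. The following toy fragment — it is not Weischedel, et al.'s actual code — reuses the np and vp rules of the earlier agreement example but makes the agreement check "failable": the check always succeeds, recording an error when the constraint is violated:

s(Errors) --> np(N1), vp(N2), { check_agree(N1, N2, Errors) }.

% The constraint holds: nothing to record.
check_agree(Num, Num, []).
% The constraint is relaxed: parse anyway, but log the violation.
check_agree(N1, N2, [agreement_error(N1, N2)]) :- N1 \= N2.

Now ?- phrase(s(E), [the, danes, likes, the, dane]). succeeds with E = [agreement_error(plur, sing)], giving the tutor something to report.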
Constraint relaxation approximates a principle-based parsing approach but on a less systematic footing. If the emphasis in rule-based approaches is on defining string sets, the emphasis within PPT is on defining the underlying abstract principles which underpin the language. As such, any string may be considered with an eye as to how many of these principles (and which) it satisfies. Theoretically it does not matter whether all the principles are satisfied since the emphasis is not on sentences per se but on the principles. Of course, we can define a sentence as a string that satisfies all the principles but this is a derivative notion. In terms of ungrammaticality, the more principles that a string violates, the more ill-formed it is. But even if a string fails a number of principles at least some structure will be assigned. So, taking the ill-formed string:
John a book to the librarian gave
A parser will, at least, be able to assign it an X-bar structure. The problem has to do with Case theory and the already mentioned direction of Case assignment; since it is rightwards, the NP does not get assigned Case, in violation of the Case Filter. Of course, it is quite easy to recover from this violation simply by assuming the alternative parameter setting of Case assignment being to the left.
Other principle violations can lead to far greater problems. This is, in part, due to the links between the various modules of the theory. For example, Case theory is defined relative to X-bar structure (the relevant structural notion being that of government). Accordingly, if a string cannot be assigned an X-bar structure there will also be Case theory violations. This would account for the extreme ungrammaticality of "word salad" strings such as:
the a to book librarian gave John
Such relationships between the modules of the theory also provide, in principle, a theory as to why certain violations seem to result in greater processing difficulties than others.
CURRENT IMPLEMENTATIONS OF PRINCIPLE-BASED PARSERS
This nexus of considerations indicates the potential of principle-based parsing for ICALL. Of course, such potential is of little worth if workable parsers cannot be developed.
Work has been proceeding for some time on implementing principle-based parsers. Wehrli's work on the parsing of French is an early example (Wehrli, 1983). In addition, a certain amount of research has attempted to combine insights from PPT within the Logic Grammar paradigm (see, for example, Stabler, 1987). There is even some work on Connectionist implementations (Rager and Berg, 1992).
The work with the highest profile, however, has been emanating (mainly) from the AI lab at MIT under the leadership of Robert Berwick. Of especial interest here are the parsing system PO-PARSER (Principle-Ordering Parser) (Fong and Berwick, 1991, and Fong, 1991), and the machine translation system UNITRAN (Universal Translator: Dorr, 1990 and 1991). Kazman (not of MIT) has also produced interesting work relating to FLA (Kazman, 1991). Using a parser called CPP (Constrained Parallel Parser) he has shown that by resetting various (adult) parameters all the sentences of two chosen
children between the ages of 24 and 29 months can be successfully parsed. Further, as the children age the parser with the child's initial settings fails on an increasing number of sentences whilst the adult settings produce increasingly successful parses.
Each of the above examples has been implemented, in the face of what seem like substantial architectural and computational design problems. That is, although it is reasonably easy to see how the various principles fit together conceptually within PPT, this does not tell us how a principle-based parser should proceed in constructing a parse. In fact, it might be thought that such parsers will encounter the same problems of non-determinism as the early transformational parsers mentioned above. Indeed, overgeneration is the major problem for such parsers.
The fault lies in the nature of the set of principles, each of which only contributes a small part to the overall effect of a structure. In terms of parsing, each principle will not constrain the final sentence structure to any great degree; it is the principles in combination which give the theory its force rather than any one individual axiom. Consequently, each of the generator principles — such as X-bar theory — may license many thousands of structures, each consistent with the input string. Even applying the filtering principles — such as Case and θ-theory — to such sets will still often result in large numbers of postulated structures.
Of course, such overgeneration will lead to problems of slow parsing. In order to overcome this, various control regimes are imposed upon the principles. One possibility involves different orderings on the application of the principles. Fong has shown that certain sequences produce parses orders-of-magnitude faster than others. Indeed, Fong's PO-PARSER also allows dynamic principle ordering so that the parser can change its sequencing depending on the sentence type being parsed; this typically also increases the speed of the parse.
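The idea behind principle ordering can be sketched abstractly. In the toy fragment below — which is emphatically not Fong's PO-PARSER — candidate structures are represented simply as lists of the properties they possess, and each principle is a check over those properties; the point is only that the filtering order determines how early bad candidates are discarded:

% Toy encoding: a candidate satisfies a principle if it carries the
% corresponding property.
satisfies(case_filter, Cand) :- member(case_ok, Cand).
satisfies(theta_criterion, Cand) :- member(theta_ok, Cand).

% Apply the principles in the given order, pruning failures at each step.
filter_all([], Cands, Cands).
filter_all([P|Ps], Cands0, Cands) :-
    include(satisfies(P), Cands0, Cands1),
    filter_all(Ps, Cands1, Cands).

(member/2 and include/3 are as in SWI-Prolog's standard libraries.) A call such as filter_all([case_filter, theta_criterion], [[case_ok, theta_ok], [theta_ok]], Parses) yields Parses = [[case_ok, theta_ok]]. If most candidates fall at the Case Filter, checking it first discards them before any more expensive principle is consulted; reversing the list does the same work in a worse order. This is the intuition behind the ordering effects Fong reports.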
An alternative control strategy is to co-routine the principles, interleaving one with the other. For example, the parser might start building a piece of structure based on X-bar theory, then break off to check this partial structure against Case and θ-theory, and then return to its structure building.
Other questions of some import relate to which particular levels of the grammatical theory are to be computed. For example, some of Johnson's parsers work without constructing either D- or S-Structure (Johnson, 1991).
Experimenting along these lines, principle-based parsers have now been built which run at least as efficiently as large rule-based systems, producing parses within a few seconds. One can only imagine that future research will lead to even further gains.
CONCLUSION
The ostensible theme of this paper has been the choice between two contrasting GFs as potential components of an ICALL system. As such, we have seen that there are a number of reasons converging on the choice of PPT. However, the deeper thesis has revolved around the nature of the proposed criteria. As an interdisciplinary exercise ICALL must be sensitive to criteria pertaining to related fields. In the particular case, the choice of GF has stemmed from considerations relating to linguistic theory and SLA as well as the more obvious concerns of computational efficiency. A more complete discussion might well branch out into more general questions of psycholinguistics and cognition. The central claim is that this wide-angle view should equally be applied to any aspect of ICALL — indeed, to CALL in general — whether it be other aspects of the knowledge domain or questions relating to the student and tutoring modules. Returning to the metaphor of the conference, the bridges linking ICALL to other disciplines are to be seen as furnishing essential supply lines of information rather than some kind of optional tourist attraction. The richness and complexity of the resultant theories will raise questions of the utmost difficulty but that is as it should be since it reflects the nature of second language learning.
NOTES
1 The majority of work in ICALL has concentrated on the grammatical domain with error correction being the main tutoring strategy. Such work appears to sit uneasily with the communicative methodologies that currently hold sway in pedagogical theory. Accordingly, ICALL has occasionally been dismissed as being irrelevant to learning needs. This claim misses various points. For example, any complete ICALL system should be able to detect and correct errors — just as any human teacher can do. Error detection can tell us much about the current state of knowledge of the language learner. Armed with such information, the human teacher may respond in various ways; for instance, choose to ignore it, offer overt correction, ask another question which may focus the learner on the problem area, or decide to move to another topic area more in keeping with the learner's current ability. Current systems are weak partly in the range of errors that they can detect and also in the (lack of) flexibility with which they can
respond. The claim is, then, that error detection and correction is a valid part of any complete ICALL system but that because of the limitations of current knowledge it appears to be the sole objective. Similar remarks can be made with regards to grammatical form; any system must be able to handle grammatical form, even for communicative purposes, since it is a crucial determinant of both semantic and pragmatic structure. Current ICALL concerns should, then, be seen as developing subsystems that will eventually take their (perhaps limited with respect to tutoring) part within the ultimate ICALL system.
2 At the conference I claimed that the only (passing) reference to its potential use within ICALL seemed to be Ghemri, 1991. It came as a (pleasant) surprise to find that the very next presentation, by Melissa Holland, described work using PPT developed at the U.S. Army Research Institute, Alexandria.
3 This last example is something of a maverick approach since it is based on the assumption that the mapping from text to semantic structure can be achieved without the mediation of a syntactic component.
4 Note that we are not claiming that the various grammar frameworks under consideration should, directly, form the basis of instruction. It is true, for example, that Fum, et al. (1992) and Pijls, et al. (1987) have chosen systemic grammar and IPG, respectively, because they believe that they provide a suitable pedagogic as well as linguistic/computational grammar. However, it seems quite clear that PPT does not fulfill this role (nor, for that matter do DCGs). If PPT is to form the grammatical base of an ICALL system, we will have to assume that there is an intervening pedagogic grammar which mediates between the computational/linguistic grammar and the user (see Chanier, et al., 1992, for some discussion):
Computational Grammar -> Pedagogic Grammar -> User
Clearly, there are considerable problems in determining the link between the computational and pedagogic grammars.
5 We remain neutral on this point since if the various principles of PPT are (innately) still available for SLA then an ICALL system may be able to simply ignore them. See Cook, 1989, for some discussion.
6 This is not true if θ-role assignment is directional.
7 The presentation closely follows the proselytizing papers of Berwick (1991) and Berwick and Fong (1990).
8 For example, the Earley algorithm for context-free languages can quadruple its parsing time when the grammar is doubled.
REFERENCES
Abeillé, A. (1992). "A Lexicalized Tree Adjoining Grammar for French and its Relevance to Language Teaching," M. Swartz and M. Yazdani (Eds.), 65-87.
Atkinson, M. (1992). Children's Syntax: An Introduction to Principles and Parameters Theory. Basil Blackwell: Oxford.
Bailin, A. (1991). "ICALI Research: Investigations in Teaching and Learning," CALICO Journal, 9, 5-8.
Berwick, R. (1991). "Principle-based Parsing," P. Sells, S. Shieber and T. Wasow (Eds.): Foundational Issues in Natural Language Processing. Bradford Books, MIT Press: Cambridge, Mass., 115-226.
______. (1991a). "Principles of Principle-based Parsing," R. Berwick, S. Abney and C. Tenny (Eds.), 1-37.
______, S. Abney and C. Tenny (Eds.) (1991). Principle-based Parsing: Computation and Psycholinguistics. Kluwer: Dordrecht.
______, and S. Fong (1990). "Principle-based Parsing: Natural Language Processing for the 1990's," P. Winston and S. Shellard (Eds.). 287-325.
Catt, M. and G. Hirst (1990). "An Intelligent CALI System for Grammatical Error Analysis," Computer Assisted Language Learning, 3, 3-26.
Chanier, D., M. Pengelly, M. Twidale and J. Self (1992). "Conceptual Modeling in Error Analysis in Computer-assisted Language Learning Systems," M. Swartz and M. Yazdani (Eds.), 125-150.
Chen, L. and L. Barry (1989). "XTRA-TE: Using Natural Language Processing Software to Develop an ITS for Language Learning," Fourth International Conference on Artificial Intelligence and Education, 54-70.
Chomsky, N. (1986). Knowledge of Language: Its Nature, Origin, and Use. Praeger: New York.
______. (1986a). Barriers. MIT Press: Cambridge, Mass.
Cook, V. (1989). "Universal Grammar Theory and the Classroom," System, 17, 169-181.
Dorr, B. (1990). "Machine Translation: A Principle-based Approach," P. Winston and S. Shellard (Eds.). 327-361.
______. (1991). "Principle-based Parsing for Machine Translation," R. Berwick, S. Abney and C. Tenny (Eds.), 153-183.
du Plessis, J., D. Solin, L. Travis and L. White (1987). "UG or not UG, That is the Question: A Reply to Clahsen and Muysken," Second Language Research, 3, 56-75.
Feuerman, K., C. Marshall, D. Newman and M. Rypa (1987). "The CALLE Project," CALICO Journal, 4, 25-34.
Fong, S. (1991). "The Computational Implementation of Principle-based Parsers," R. Berwick, S. Abney and C. Tenny (Eds.), 65-82.
______, and R. Berwick (1991). "The Computational Implementation of Principle-based Parsers," M. Tomita (Ed.) Current Issues in Parsing Technology. Kluwer: Dordrecht, 9-24.
Fum, D., B. Pani and C. Tasso (1992). "Naive vs. Formal Grammars: A Case for Integration in the Design of a Foreign Language Tutor," M. Swartz and M. Yazdani (Eds.), 51-64.
Ghemri, L. (1991). "Specification and Implementation of a GB Parser," C. Brown and G. Koch (Eds.) Natural Language Understanding and Logic Programming 3. North-Holland: Amsterdam, 111-126.
Handke, J. (1992). "WIZDOM: A Multiple-purpose Language Tutoring System Based on AI Techniques," M. Swartz and M. Yazdani (Eds.), 293-305.
Abstract:
Most recent work in ICALL has tended to focus on syntactic structure. Clearly the grammar formalism chosen for such systems is of some importance. However, as this paper argues, little consideration seems to have been paid to such matters beyond the question of computational efficiency. Following previous work, the paper further argues for choosing a formalism that potentially meshes with work in SLA. Of all the main grammar formalisms being developed, GB theory, with its emphasis on Universal Grammar, has had the most impact on SLA research. Recent advances in "principle based" parsing now make possible the integration of such work into ICALL.
INTRODUCTION
The theme of this conference is "bridges.' Bridges help to reduce the boundaries between disciplines and this is something I have been recently advocating as necessary with reference to Intelligent CALL (ICALL) (Matthews, 1992a). It is a topic that I want to pursue in a little more detail in this paper. The connections I most want to explore are three-fold: between linguistic theory and ICALL, between Second Language Acquisition theory (SLA) and ICALL and, finally, between linguistic theory and SLA (Figure 1).
0x01 graphic
5
The claim is that the link-, between these disciplines allow each to inform the other, so that findings and advances in one area will provide insights and gains in the others. It is only in this way that we can hope to enhance the effectiveness of CALL (intelligent or otherwise). Unfortunately, the indicated ties frequently exist more in theory than practice; this is why the lines have been dotted in the diagram. As becomes apparent later, these often weak dependencies will only allow a few tentative conclusions in the overall thrust of our argument.
We start by justifying the connections. ICALL can be roughly characterized as the attempt to use techniques from Artificial Intelligence (Al) within CALL. The claim is that the resulting systems will have the ability to respond flexibly to the users of the software so that not only will they be able to handle any input-including, crucially, the unexpected — they will also be able to tailor their interactions to the individual.
The obvious Al research areas from which ICALL should be able to draw the most insight are Natural Language Processing (NLP) and Intelligent Tutoring Systems (ITS). Indeed, it is usual to conceive of an ICALL system in terms of the classical ITS architecture pictured below.
0x01 graphic
The Expert module contains the knowledge that is to be tutored, the Student module consists of a model of what the student knows regarding that domain (plus any other relevant details such as learning preferences) and the Tutor module determines what should be taught and how.
6
In this paper, our interest is in characterizing the Expert module and, in particular, an Expert module concerned with grammatical form.1 Consequently, it is the NLP side of ICALL that is most relevant to our cause rather than the peculiar ITS concerns of student modeling and tutoring strategies.
It is for this reason that we have drawn the link between ICALL and linguistic theory in the diagram since it would seem to be a truism that NLP will have close links with linguistic theory. That is, one would expect linguists to develop particular grammar frameworks (GFs) and descriptions of various languages using these GFs which the computational linguist then simply implements. However, this is a relationship that is sometimes more honored in the breach than in the observance; hence, the dotted line in the diagram. Usually the problem is that linguists do not work within a sufficiently precise formal framework as required for a computational implementation. This, however, is becoming less so and we will adopt the assumption that, since our ICALL system is going to need a GF in which to couch its grammatical descriptions, linguistic theory is a reasonable place to start the search.
Turning to the link between ICALL and SLA, I have argued previously that there is a stronger commitment to "going AI" than simply the importation of a set of clever programming techniques into CALL (Matthews, 1992a). That is, the majority view of Al sees it as an attempt to understand the (psychological) mechanisms underlying human intelligent behavior. To assess its achievements in this undertaking, AI must ally itself to those other disciplines which take human intelligence as the focus of their enquiry. These fields range, amongst others, from philosophy and linguistics to (cognitive) psychology and anthropology via visual perception and neuroscience. In brief, this "cognitive science" view of ICALI, requires that it be informed by theories of psychology and, in particular, theories of SLA. As we shall have cause to remark later, there are various shortcomings in much SLA research which precludes such a beneficial relationship- again, this is why the line is dotted. However, that work which is relevant in SLA should provide a keen stimulus to ICALL.
The final connection, between linguistic theory and SLA, is based upon the (sometimes controversial) assumption that SLA, just like NLP, should be informed by linguistic theory. Of course, we are not claiming that linguistic theory will account for the full range of facts that constitute SLA; clearly, other social and psychological factors are involved. However, linguistic theory does play the crucial r6le of being able to describe
7
(within this limited domain) what is acquired and, given a suitably articulated theory, some indications as to how this might be achieved.
It is this complex network of connections and concerns that makes ICALL such a potentially rich area of study. As Alan Bailin notes:
[ICALL is] part of an important endeavor. [It] constitute[s] an experimental scientific investigation of [second] language teaching and learning... [which]...explicitly or implicitly make[s] general claims about the components of language teaching and learning. [Bailin, 1991]
Note that this passage clearly (and correctly) implies that bridges provide access between disciplines from both directions.
The underlying assumption of much of the above is that the only direction of influence within this net is all towards ICALL. However, we have also suggested that this effect, in many cases, is rather weak But, once the connections have been noted and the nature of the bridges, it becomes quite possible to look for the influence also flowing in the other direction. Indeed, it should be expected.
What GF would one choose to characterize the (grammatical) Expert domain of an ICALL system? Here our main interest is in discussing some of the criteria that should be applied in making this decision. As this section has suggested, at least some of these criteria should revolve around considerations having to do with linguistic and SLA theory.
In order to be able to make the discussion more concrete, we will compare and contrast two frameworks, Definite Clause Grammars (DCGS: Pereira and Warren, 1980) and the current version of Chomsky's transformational grammar known either as Government and Binding (GB) theory Or, increasingly, Principles and Parameters Theory (PPT: Chomsky, 1986). These GFs have, in part, been chosen because of their very different approaches to characterizing a language. It turns out that these differences also have interesting computational properties.
It is probably not an exaggeration to say that DCGs (and closely related frameworks) are currently the favored GF in ICALL. PPT, to the best of my knowledge, has not so far been used in ICALL.2 The burden of this paper is to argue that, all things considered, there are considerable arguments in favor of adopting PPT as a GF for ICALL.
8
CRITERIA OF ADEQUACY
It may prove useful to start with a brief survey of (some of) the main GFs that have so far been utilized in ICALL:
(1) Various Augmented Phrase Structure frameworks (including DCGS) as used, for example, by Chen and Barry (1989), Schwind (1990), Labrie and Singh (1991), and Sanders (1991). Also included are systems embedded under PATR-114ike environments such as Levin, et al. (1991) and Chanier, et al. (1992).
(2) Augmented Transitions Networks (ATNS) used by Weischedel, et al. (1978). Handke (1992) uses a Cascaded ATN variant.
(3) Lexical Functional Grammar (LFG) used by Feuerman, et al. (1987).
(4) Systemic Grammar used by Fum, et al. (1992).
(5) Tree Adjoining Grammar (TAG) used by Abeillé (1992).
(6) Incremental Procedural Grammar (IPG) used by Pijls, et al. (1987).
(7) Word Grammar used by Zähner (1991).
(8) Preference Semantics used by Wilks and Farwell (1992).3
Even the above list does not include some of the frameworks enjoying considerable support in recent linguistic theory; for instance, Categorical Grammar, Generalized Phrase Structure Grammar (GPSG), and Head-driven Phrase Structure Grammar (HPSG) to mention just a few.
What criteria should be applied when deciding on a GF for ICALL?4 Amongst a number that come to mind, we choose to highlight the following:
Computational effectiveness
Since the GF is to be incorporated within a computer system, it should be capable of an efficient computational implementation. This imposes various conditions.
9
For example, the GF should be associated with a grammar formalism in which the framework is to be formulated. It is usually assumed that this formalism, itself, should have a clear syntax and semantics so that we know whether we have accurately expressed the relevant parts of the GF (Pereira and Shieber, 1984). This has shown to be the case with the formalism used for DCGs but has frequently been doubted with respect to the notation used within PPT. It is certainly true that there is a plethora of different notational forms used in expressing PPT within the linguistic literature. However, recent interest in principle-based parsers has had to address this problem and more precise formalisms are now being used — for example, Ed Stabler has formulated the whole of Chomsky's (1986a) 13arriers framework using First Order Logic (Stabler, 1992).
A second computational requirement is that the GF be associated with well-defined and efficient parsing algorithms. Here, until recently, DCGs held the advantage over transformationally-based grammars. The reason is that the DCG formalism can be run almost directly as Prolog code. Given that there are now efficient Prolog compilers, DCGs can be compiled into impressively fast parsers. Equally, DCGs may also be associated with a whole range of different parsing strategies apart from the top-down, depth first, left-to-right strategy that "comes for free" when running the grammar directly as a Prolog program.
The early history of transformational parsing, on the other hand, is not so impressive. Although various systems were developed in the mid 1960's based upon the early versions Of transformational grammar, these tended to be highly inefficient due to their highly non-deterministic procedures and produced a large number of spurious analyses before finally finding a genuine candidate. Indeed, it is often reported that the MITRE system (Zwicky, et al., 1965) took 36 minutes to analyze an 11 word sentence. Matters have greatly improve d since then. For example, Marcus's PARSIFAL parser was the first of a new generation Of transformationally-based systems (Marcus, 1980). Adopting a deterministic procedure — via lookahead — PARSIFAL was respectably efficient. More recent systems based upon PPT perform with even greater efficiency and will be discussed in more detail below.
Accordingly, it is rather difficult to choose between our two exemplar GFs based on the criteria of this section. We return to other aspects of this question in the section "Rule- vs. Principle-based Parsing."
10
Linguistic perspicuity
GFs play two main roles within our disciplines. Their first, and most important, contribution is descriptive; they provide the tools with which to analyze the grammatical structures of language.
However, different GFs (and formalisms) tend to focus on or highlight different sets of linguistic phenomena and the decision to choose one framework over another might be made because one rather than the other facilitates the description of a particular phenomenon that we deem to be important.
As an example of this point consider the following. When introducing DCGs the first exemplar rule is often something like:
s -- > np(Num), vp(Num).
The focus of attention here is on how this rule (in combination with others) determines subject-verb agreement. That is, the term Num is a variable which is intended to stand for the number of an item. Because the same variable is used with both np and vp, the rule ensures that the value — whatever it is; singular or plural — of the subject NP is the same as that of the verb which heads the VP.
Such agreement properties are easy to express in DCGs. It is no accident, then, that those researchers who have chosen DCGs, and their like, as their ICALL GF tend to focus on agreement facts within the language.
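To make this concrete, the rule can be embedded in a minimal runnable fragment (our own sketch, not code from any of the cited systems); the shared Num variable does all the work:

    % Minimal DCG fragment enforcing subject-verb agreement.
    s --> np(Num), vp(Num).

    np(Num) --> det, n(Num).
    vp(Num) --> v(Num), np(_).

    det --> [the].
    n(singular) --> [dane].
    n(plural)   --> [danes].
    v(singular) --> [likes].
    v(plural)   --> [like].

    % ?- phrase(s, [the, danes, like, the, dane]).   % succeeds
    % ?- phrase(s, [the, dane, like, the, danes]).   % fails: number clash

An input whose subject and verb disagree in number simply fails to parse, which is precisely why such errors are so easy for DCG-based systems to detect.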
Now compare the DCG account of subject-verb agreement with that in PPT. This involves the mysterious "Rule R" which associates agreement features generated under the I-node with the (lexical) verb at the level of PF. This (morphological) rule will determine that, say, like + singular is realized as likes. The I-node is also co-indexed with the subject NP as part of specifier-head agreement. Accordingly, both NP and verb get associated with the same agreement features although via different mechanisms.
As this description makes clear, subject-verb agreement receives a more complex analysis within PPT when compared with a DCG. Accordingly, if all one were interested in was agreement properties of a language then a DCG would, ceteris paribus, be the obvious choice of GF. Of course, the overall decision is unlikely to be based upon such a simple example and there are other areas where DCGs are not so perspicuous and where PPT has an advantage. In the long term, choosing a GF is a matter of trading disadvantages against advantages with respect to the particular features that interest the researcher.
Little consideration seems to have been paid to the question of the descriptive adequacy of the various GFs within SLA. In part, this is due to the already remarked-upon weak links that tend to bind SLA theory to linguistic theory. Surveying the literature, by far the most consistent adherence has been to the various species of transformational grammar including PPT. However, even when some version of PPT is adopted, the reason has more often to do with concerns relating to the acquisitional claims associated with PPT — notably questions to do with innateness — than with the descriptive adequacy of the framework.
ICALL cannot neglect such issues. Unfortunately, the impression is that many researchers adopt a framework such as a DCG simply because it is an easily implementable theory and then shoe-horn their analyses into the required form. Hence the emphasis on different types of agreement error, even though these do not seem to loom large in discussions of student errors within SLA. We might make two comments on this situation. First, even if SLA cannot always supply particular GF-solutions to the questions of ICALL, it can help to constrain particular GF-answers developed solely within ICALL. Second, here is a clear area where the general informational flow is quite likely to be from ICALL towards SLA.
Acquisitional perspicuity
Besides the descriptive role of GFs, their other contribution is explanatory; they aim to provide justifications regarding actual linguistic acquisition and development. However, the influence of linguistic theory on both First Language Acquisition (FLA) and SLA has been surprisingly small. There was a flurry of activity for about a decade starting in the mid 1960's which was heavily influenced by the then major (indeed, probably only) GF, transformational grammar. Soon after this pioneering work, disenchantment with linguistic theory set in amongst both FLA and SLA researchers, so that for the next decade little acquisitional work was informed by theoretical syntax. In the last few years, however, there has been a resurgence of interest in some of the new GFs. Pre-eminent amongst these has been the new incarnation of transformational grammar, PPT. There is now quite a large and rapidly expanding body of interesting work using PPT within FLA (see Atkinson, 1992, for a detailed survey) and SLA (see White, 1989, for an overview).
There are two aspects to the influence of PPT on such research. The first is acquisitional. Here the main claim is that FLA is mediated by innate structures. It is the question of whether such mechanisms are still available to the second language learner which has most exercised the SLA community. The second aspect is developmental, with a number of researchers beginning to use the PPT framework to account for actual maturational sequences observed in FLA (see, for example, Hyams, 1986, and Radford, 1990). This has not been investigated with the same vigor within SLA (although see du Plessis, et al. [1987] for some suggestive work relating to German).
Accounting for developmental sequences may well be an important consideration as far as the student and tutoring modules are concerned in the rest of the ICALL architecture.5 Again, we might make the same observation as at the end of the last section and note that the informational flow is quite likely to be from ICALL to SLA theory.
It might be thought that, given the drift of the last few pages, we are now in a position to justify the already stated conclusion of this paper, namely that there are strong reasons for using PPT as a GF for ICALL. However, we adopt a far more tentative position. Certainly, given a commitment to bridges as argued for and given the state of current research, there are strong reasons why PPT should be accorded careful consideration as a GF for ICALL. However, this conclusion is drawn almost by default; other GFs just have not been applied with the same vigor. Adopting PPT on such grounds, then, would be premature.
The choice, therefore, between DCGs and PPT is not that clear when considering the second and third criteria above. However, there are some interesting factors which emerge when reconsidering the two GFs from the computational aspect. The arguments to follow are presented in terms of the properties of rule-based vs. principle-based grammars. A DCG is an example of the former and PPT, as the name suggests, of the latter. Before turning to these arguments we first need to distinguish between the two types of framework.
RULE- VS. PRINCIPLE-BASED FRAMEWORKS
It is easiest to see the distinction between the two types of approach by thinking in terms of particular grammatical constructions. Usually, the comparison involves constructions such as actives and passives. However, the same properties can be seen by considering something as simple as VP structure.
Rule-based frameworks work by defining specific rules for specific constructions; in our case this means a separate VP rule. A principle-based approach, on the other hand, sees a particular construction as resulting from the interaction of a number of simple, but relatively abstract, syntactic principles. As such, there is no one principle which solely defines the various properties of a VP. Diagrammatically we have:
[Figure: in a rule-based framework each construction is defined by its own rule; in a principle-based framework each construction results from the interaction of several abstract principles]
We now flesh out this abstract description a little. Consider how a VP such as the bracketed example below would be described in the two approaches:
The Danes [like Maastricht]
With a DCG the basic structure would be handled by the simple rule:
vp --> v, np.
which induces the following tree (with relevant lexical items):
[Figure 4: the tree for the VP "like Maastricht", with V "like" followed by NP "Maastricht"]
As we see, one rule, one structure. The description of the same structure in terms of a principle-based theory, initially, looks far more complicated. Here we need to describe some of the modules and their principles that make up PPT.
First, X-bar theory. This can be thought of as describing the basic tree structures that are allowed in natural languages. Roughly it says that trees take the form head-argument or argument-head (assuming there to be an argument). For example, this module will license, amongst many others, the following (simplified) trees:
[Figure: three simplified trees licensed by X-bar theory: a bare V; a V followed by an NP argument; an NP argument followed by a V]
The first tree is an example where the verb (the head) does not take any arguments. The other two trees represent possible structures where the verb takes a single NP argument to the right or left respectively.
The part of the theory which determines whether a verb appears with an argument or not is known as Theta theory (θ-theory). θ-theory relates to questions of who did what to whom. So, a verb such as like involves someone doing the liking and something being liked. These are the θ-roles of the verb. The main principle of θ-theory, the θ-Criterion, (partially) states that each θ-role should be associated with a syntactic argument (which in this case means an NP). That is why an NP must appear in the VP headed by like in order to receive the verb's (internal) θ-role.
θ-theory, however, does not determine that the NP must follow the verb.6 This is accounted for in terms of Case theory. The main principle of Case theory states that all overt (i.e. pronounced) NPs must be assigned Case by a Case assignor (either Tense, V or P). Case is assigned under government — the assignor must govern the assignee — but it is also assumed that the assignment is directional. In particular, English is a language where the verb's Case is assigned rightwards. Accordingly, the only permissible tree which satisfies all the principles is that shown in Figure 4.
The other tree allowed by the combination of X-bar and θ-theory is ruled out by Case theory since the NP, which requires Case in order to pass the Case Filter, is in the wrong position to be assigned Case.
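To make this interaction concrete, here is a toy Prolog rendering of the three modules (entirely our own sketch, with invented predicate names such as licensed/1; real principle-based parsers are far more sophisticated):

    % Toy encoding of three PPT modules (invented predicate names).

    % X-bar theory licenses bare heads and head-argument orders.
    xbar(vp(v(W)))        :- verb(W).
    xbar(vp(v(W), np(_))) :- verb(W).
    xbar(vp(np(_), v(W))) :- verb(W).

    verb(like).

    % Theta theory: 'like' assigns an internal theta-role, so an NP
    % argument must be present (in either position) to receive it.
    theta(vp(v(W), np(_))) :- internal_role(W).
    theta(vp(np(_), v(W))) :- internal_role(W).
    theta(vp(v(W)))        :- \+ internal_role(W).

    internal_role(like).

    % Case theory: in English the verb assigns Case rightwards, so
    % an overt NP must follow the verb.
    case(vp(v(_), np(_))).
    case(vp(v(_))).

    % A structure is licensed only if every module accepts it.
    licensed(T) :- xbar(T), theta(T), case(T).

    % ?- licensed(vp(v(like), np(maastricht))).   % succeeds
    % ?- licensed(vp(np(maastricht), v(like))).   % fails the Case Filter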
This might seem like a great deal of intellectual baggage to account for a simple construction — especially when compared with the DCG account — and it would be so were it not for the fact that the same principles are used to account for other constructions. Consider, for example, the complex NP:
The enemy's destruction of the city
With a DCG this will require the addition of a new rule, say:
np --> np_poss, n, pp.
With PPT there is no need for additional machinery. X-bar theory will determine the various possible tree structures for this string. This will be much as before except that "NP" will replace "VP." θ-theory determines that because destruction has two θ-roles to assign — just like the associated verb destroy — two NPs are required. Finally, Case theory requires these NPs to be assigned Case. Since nouns are not Case assignors, we account for the presence of the Case-assigning possessive 's and the preposition of. What looks like a cumbersome theory when considering a single structure starts to take on a more compact aspect when its coverage is expanded.
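If the toy encoding above is extended with a handful of lexical facts (again our own invention, meant to be loaded together with the previous fragment so that licensed/1 is available; for brevity only the internal of-argument is modeled, not the possessive), the nominal construction is licensed with no new structural rules:

    % Extending the toy encoding: the nominal projection reuses the
    % same principle predicates; only lexical facts are new.
    noun(destruction).
    internal_role(destruction).   % 'destruction', like 'destroy', has a role to assign

    xbar(np(n(W), pp(of, np(_))))  :- noun(W).
    theta(np(n(W), pp(of, np(_)))) :- internal_role(W).
    % Nouns are not Case assignors; the preposition 'of' supplies Case.
    case(np(n(_), pp(of, np(_)))).

    % ?- licensed(np(n(destruction), pp(of, np(the_city)))).   % succeeds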
RULE- VS. PRINCIPLE-BASED PARSING7
The examples in the last section give some idea of the difference in approach between the two types of theory. We now turn to some of their computational consequences when implemented as parsers. We choose to examine those that have especial significance for ICALL.
Grammar Size
Rule-based frameworks require a large number of rules to describe a language. This is not usually apparent when looking at prototype systems since they only cover a highly restricted portion of the language. However, those systems with a wide syntactic coverage of English use literally hundreds if not thousands of rules. Matters are even worse for languages such as Japanese with a freer word order than English. In such languages the same basic construction may require a number of different rules to describe each of the various permissible permutations due to the variable word order. This significantly increases the size of the rule set.
There is an obvious consequence for ICALL. As has been frequently pointed out, various learner errors appear to be due to transfer from the native language. One approach to handling such interference errors is to parse the ill-formed input with a combination of both the native and target language grammars (see Schuster, 1986, for an example within a highly restricted domain). But if this entails a complete system having a full grammar for, say, Japanese as well as English, the rule base will be astronomical in size.
Clearly, just discovering and writing such a large number of rules is problematic. However, there is also a computational problem since the parsing algorithms associated with rule-based systems run as a function of grammar size; the larger the grammar, the slower the performance.8 The consequence is that as grammars become more complete they will become less efficient when implemented as parsers.
Solutions to this problem might be found with specially devised algorithms or dedicated hardware. Alternatively, one could move towards a principle-based system where the many different combinations of the same set of a dozen or so principles (with parametric variation) can encode the same information as many thousands of rules. Of course, here there is a promissory note that there are efficient parsing algorithms which can make use of such a grammar.
Grammar Specificity
Not only are rule-based frameworks construction specific, they are also language specific. Each grammar is tailored to describe a specific language and, because of its nature, does not provide any easy way of stating connections between languages. Take as a simple example the different word order of English and Japanese. As we have already seen, English complements follow the verb; however, in Japanese they precede it:
gave a book to Shunsuke
Shunsuke ni hon o age-ta
Shunsuke-dat book-acc give-past
For rule-based grammars such differences have to be accounted for by stating separate rules which are rather different in form:
English: vp --> v, np, pp.
Japanese: vp --> np, np, v.
Clearly, on such an analysis it is hard to capture the fact that these two rules are actually describing the same construction in the two languages. The root of the problem here is the rule-based approach's emphasis on defining string sets. Because the English and Japanese strings — lexical items aside — are very different, so are the rules.
Principle-based theories, on the other hand, try to abstract out a set of deeper and more explanatory (universal) principles which underlie such constructions. The same principles apply in all languages. It is the notion of parametric variation which accounts for the differences between languages. In the example under consideration, the assumption is that the relevant module is Case theory. We have already noted that the Case Filter requires that all lexical NPs must have Case. This principle applies to both English and Japanese. The difference lies in how Case is assigned within the two languages. Here the crucial factor is the direction of the assignment. The assumption is that in English, Case is assigned to the right — this is why NP complements follow the verb in order to pass the Case Filter. In Japanese, Case-assignment is to the left so NP complements must precede the verb.
The idea, then, is to describe cross-linguistic variation in terms of a set of common principles but associated with parametric variation. Rather than write a completely new grammar for each language, as the rule-based approach has to, a principle-based parser simply has to determine the particular parameter settings for each language (plus its lexicon). The result is that languages are seen as related rather than unconnected objects. Clearly, as far as ICALL is concerned this is a preferable conclusion to the position that no languages have anything in common apart from being the result of string concatenation. It opens up, for example, the possibility of being able to give principled accounts of language transfer.
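A toy encoding of the idea (ours, not drawn from any of the systems above): the direction of Case assignment becomes a database fact, and a single parameterized description replaces the two language-specific rules:

    % Direction of Case assignment as a parameter.
    case_direction(english,  right).
    case_direction(japanese, left).

    % One parameterized VP description for both languages.
    vp(Lang) --> { case_direction(Lang, right) }, v(Lang), np(Lang).
    vp(Lang) --> { case_direction(Lang, left)  }, np(Lang), v(Lang).

    v(english)   --> [gave].
    v(japanese)  --> ['age-ta'].
    np(english)  --> [a, book].
    np(japanese) --> [hon, o].

    % ?- phrase(vp(english),  [gave, a, book]).      % succeeds
    % ?- phrase(vp(japanese), [hon, o, 'age-ta']).   % succeeds
    % ?- phrase(vp(english),  [a, book, gave]).      % fails: wrong setting

Adding a new language, on this view, is chiefly a matter of supplying its parameter settings and its lexicon rather than a fresh rule set.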
Ungrammaticality
Any NLP system will encounter ill-formed input from time to time and a robust system should be able to handle such cases. The main differences with ICALL are that (a) ill-formed examples are likely to be more common because of the nature of the users, (b) certain recovery strategies such as asking the user to try again are unlikely to result in any improvement, (c) the ill-formedness is likely to be of a higher degree and (d) some pedagogic response will, on occasion, be required.
Parsing ill-formed input is problematic for a rule-based system. This relates back to the previously noted emphasis within such approaches on the description of a distinguished string set defined over the words of the language. Once this set of well-formed strings — i.e. sentences — has been defined, the job of the theory is over. Accordingly, ill-formed strings are "beyond the pale" of the theory and to handle them within a parser utilizing such a theory requires some additional machinery.
A standard solution to this problem is to introduce more rules designed specifically to handle ill-formedness. This is a familiar approach in ICALL. Of course, it has the drawback of increasing the size of the rule set with the attendant efficiency problems noted in the first section. It is also a problematic exercise to write enough rules to handle all possible ill-formed input. Finally, there is also the problem that simply producing a rule to allow an ill-formed example through does not provide an explanation for the error.
An alternative way of handling ill-formed input in a rule-based approach is by constraint relaxation. Take subject-verb agreement. Suppose a sentence fails to satisfy the well-formedness constraint that both the subject NP and verb heading the VP agree in number. The designer of the parser may specify that this particular constraint may be relaxed (with a record of the error being made). This was what the "failable" predicates of Weischedel, et al.'s (1978) German tutor were meant to achieve. Which predicates were to count as "failable" was left entirely at the discretion of the designer and, as such, provided no principled theory as to why certain predicates were failable and others not.
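In DCG terms the idea might be sketched as follows (our own illustration; this is not Weischedel et al.'s actual code, and the predicate names are invented):

    % Agreement as a relaxable ("failable") constraint: when it is
    % violated, the parse still succeeds but the error is recorded.
    s(Errors) --> np(N1), vp(N2), { check_agreement(N1, N2, Errors) }.

    check_agreement(Num, Num, []).
    check_agreement(N1, N2, [agreement_error(N1, N2)]) :- N1 \= N2.

    np(plural)  --> [the, danes].
    vp(Num)     --> v(Num), [maastricht].
    v(singular) --> [likes].
    v(plural)   --> [like].

    % ?- phrase(s(E), [the, danes, likes, maastricht]).
    % E = [agreement_error(plural, singular)]

Which constraints may be relaxed in this way, and at what cost, is precisely what such systems leave to the designer's discretion.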
Constraint relaxation approximates a principle-based parsing approach but on a less systematic footing. If the emphasis in rule-based approaches is on defining string sets, the emphasis within PPT is on defining the underlying abstract principles which underpin the language. As such, any string may be considered with an eye as to how many of these principles (and which) it satisfies. Theoretically it does not matter whether all the principles are satisfied since the emphasis is not on sentences per se but on the principles. Of course, we can define a sentence as a string that satisfies all the principles but this is a derivative notion. In terms of ungrammaticality, the more principles that a string violates, the more ill-formed it is. But even if a string fails a number of principles at least some structure will be assigned. So, taking the ill-formed string:
John a book to the librarian gave
a parser will, at least, be able to assign it an X-bar structure. The problem has to do with Case theory and the already mentioned direction of Case assignment; since it is rightwards in English, the pre-verbal NP a book does not get assigned Case, in violation of the Case Filter. Of course, it is quite easy to recover from this violation simply by assuming the alternative parameter setting of Case assignment being to the left.
Other principle violations can lead to far greater problems. This is, in part, due to the links between the various modules of the theory. For example, Case theory is defined relative to X-bar structure (the relevant structural notion being that of government). Accordingly, if a string cannot be assigned an X-bar structure there will also be Case theory violations. This would account for the extreme ungrammaticality of "word salad" strings such as:
the a to book librarian gave John
Such relationships between the modules of the theory also provide, in principle, a theory as to why certain violations seem to result in greater processing difficulties than others.
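One way of rendering this idea computationally (a toy of our own devising, not a description of any implemented system) is to measure the degree of ill-formedness as the number of principles a candidate structure fails:

    :- use_module(library(lists)).   % for member/2

    % Degree of ill-formedness as a count of violated principles.
    xbar(vp(v(_), np(_))).
    xbar(vp(np(_), v(_))).
    theta(vp(v(_), np(_))).
    theta(vp(np(_), v(_))).
    case(vp(v(_), np(_))).           % rightward Case assignment only

    violations(Tree, Violated) :-
        findall(P,
                ( member(P, [xbar, theta, case]), \+ call(P, Tree) ),
                Violated).

    % ?- violations(vp(np(a_book), v(gave)), V).
    % V = [case]                     % mildly deviant, recoverable
    % ?- violations(word_salad, V).
    % V = [xbar, theta, case]        % fails everything: no structure at all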
CURRENT IMPLEMENTATIONS OF PRINCIPLE-BASED PARSERS
This nexus of considerations indicates the potential of principle-based parsing for ICALL. Of course, these considerations are of little worth if workable parsers cannot be developed.
Work has been proceeding for some time on implementing principle-based parsers. Wehrli's work on the parsing of French is an early example (Wehrli, 1983). In addition, a certain amount of research has attempted to combine insights from PPT within the Logic Grammar paradigm (see, for example, Stabler, 1987). There is even some work on Connectionist implementations (Rager and Berg, 1992).
The work with the highest profile, however, has been emanating (mainly) from the AI lab at MIT under the leadership of Robert Berwick. Of especial interest here are the parsing system PO-PARSER (Principle-Ordering Parser) (Fong and Berwick, 1991, and Fong, 1991), and the machine translation system UNITRAN (Universal Translator: Dorr, 1990 and 1991). Kazman (not of MIT) has also produced interesting work relating to FLA (Kazman, 1991). Using a parser called CPP (Constrained Parallel Parser) he has shown that by resetting various (adult) parameters all the sentences of two chosen children between the ages of 24 and 29 months can be successfully parsed. Further, as the children age, the parser with the child's initial settings fails on an increasing number of sentences whilst the adult settings produce increasingly successful parses.
Each of the above examples has been implemented, in the face of what seem like substantial architectural and computational design problems. That is, although it is reasonably easy to see how the various principles fit together conceptually within PPT, this does not tell us how a principle-based parser should proceed in constructing a parse. In fact, it might be thought that such parsers will encounter the same problems of non-determinism as the early TG-based parsers mentioned earlier. Indeed, overgeneration is the major problem for such parsers.
The fault lies in the nature of the set of principles, each of which contributes only a small part to the overall description of a structure. In terms of parsing, each principle will not constrain the final sentence structure to any great degree; it is the principles in combination which give the theory its force rather than any one individual axiom. Consequently, each of the generator principles — such as X-bar theory — may license many thousands of structures, each consistent with the input string. Even applying the filtering principles — such as Case and θ-theory — to such sets will still often result in large numbers of postulated structures.
Of course, such overgeneration will lead to problems of slow parsing. In order to overcome this, various control regimes are imposed upon the principles. One possibility involves different orderings on the application of the principles. Fong has shown that certain sequences produce parses orders-of-magnitude faster than others. Indeed, Fong's PO-PARSER also allows dynamic principle ordering so that the parser can change its sequencing depending on the sentence type being parsed; this typically also increases the speed of the parse.
An alternative control strategy is to co-routine the principles, interleaving one with the other. For example, the parser might start building a piece of structure based on X-bar theory, break off to check this partial structure against Case and θ-theory, and then return to its structure building.
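The flavor of such control regimes can be conveyed by a toy generate-and-filter loop (our own sketch; PO-PARSER itself is far more elaborate), in which the order of the principle list affects the cost of the search but not the set of surviving parses:

    :- use_module(library(lists)).   % for member/2

    % Generator: enumerate candidate structures for a two-word input.
    candidate([V, N], vp(v(V), np(N))).
    candidate([N, V], vp(np(N), v(V))).

    % Two filtering principles (toy versions).
    theta(vp(v(_), np(_))).
    theta(vp(np(_), v(_))).          % theta theory is order-blind
    case(vp(v(_), np(_))).           % Case assigned rightwards

    % Apply the principles in the given order; a cheap, selective
    % filter placed early prunes the search sooner.
    parse(Order, Words, Tree) :-
        candidate(Words, Tree),
        forall(member(P, Order), call(P, Tree)).

    % ?- parse([case, theta], [like, maastricht], T).
    % T = vp(v(like), np(maastricht))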
Other questions of some import relate to which particular levels of the grammatical theory are to be computed. For example, some of Johnson's parsers work without constructing either D- or S-Structure (Johnson, 1991).
Experimenting along these lines, principle-based parsers have now been built which run at least as efficiently as large rule-based systems, producing parses within a few seconds. One can only imagine that future research will lead to even further gains.
CONCLUSION
The ostensible theme of this paper has been the choice between two contrasting GFs as potential components of an ICALL system. As such, we have seen that there are a number of reasons converging on the choice of PPT. However, the deeper thesis has revolved around the nature of the proposed criteria. As an interdisciplinary exercise ICALL must be sensitive to criteria pertaining to related fields. In the particular case, the choice of GF has stemmed from considerations relating to linguistic theory and SLA as well as the more obvious concerns of computational efficiency. A more complete discussion might well branch out into more general questions of psycholinguistics and cognition. The central claim is that this wide-angle view should equally be applied to any aspect of ICALL — indeed, to CALL in general — whether it be other aspects of the knowledge domain or questions relating to the student and tutoring modules. Returning to the metaphor of the conference, the bridges linking ICALL to other disciplines are to be seen as furnishing essential supply lines of information rather than some kind of optional tourist attraction. The richness and complexity of the resultant theories will raise questions of the utmost difficulty but that is as it should be since it reflects the nature of second language learning.
NOTES
1 The majority of work in ICALL has concentrated on the grammatical domain with error correction being the main tutoring strategy. Such work appears to sit uneasily with the communicative methodologies that currently hold sway in pedagogical theory. Accordingly, ICALL has occasionally been dismissed as being irrelevant to learning needs. This claim misses various points. For example, any complete ICALL system should be able to detect and correct errors — just as any human teacher can. Error detection can tell us much about the current state of knowledge of the language learner. Armed with such information, the human teacher may respond in various ways; for instance, choose to ignore the error, offer overt correction, ask another question which may focus the learner on the problem area or decide to move to another topic area more in keeping with the learner's current ability. Current systems are weak partly in the range of errors that they can detect and also in the (lack of) flexibility with which they can respond. The claim is, then, that error detection and correction is a valid part of any complete ICALL system but that, because of the limitations of current knowledge, it appears to be the sole objective. Similar remarks can be made with regard to grammatical form; any system must be able to handle grammatical form, even for communicative purposes, since it is a crucial determinant of both semantic and pragmatic structure. Current ICALL concerns should, then, be seen as developing subsystems that will eventually take their (perhaps limited, with respect to tutoring) part within the ultimate ICALL system.
2 At the conference I claimed that the only (passing) reference to its potential use within ICALL seemed to be Ghemri, 1991. It came as a (pleasant) surprise to find that the very next presentation, by Melissa Holland, described work using PPT developed at the U.S. Army Research Institute, Alexandria.
3 This last example is something of a maverick approach since it is based on the assumption that the mapping from text to semantic structure can be achieved without the mediation of a syntactic component.
4 Note that we are not claiming that the various grammar frameworks under consideration should, directly, form the basis of instruction. It is true, for example, that Fum, et al. (1992) and Pijls, et al. (1987) have chosen systemic grammar and IPG, respectively, because they believe that they provide a suitable pedagogic as well as linguistic/computational grammar. However, it seems quite clear that PPT does not fulfill this role (nor, for that matter do DCGs). If PPT is to form the grammatical base of an ICALL system, we will have to assume that there is an intervening pedagogic grammar which mediates between the computational/linguistic grammar and the user (see Chanier, et al., 1992, for some discussion):
Computational Grammar -> Pedagogic Grammar -> User
Clearly, there are considerable problems in determining the link between the computational and pedagogic grammars.
5 We remain neutral on this point since if the various principles of PPT are (innately) still available for SLA then an ICALL system may be able to simply ignore them. See Cook, 1989, for some discussion.
6 This is not true if θ-role assignment is directional.
7 The presentation closely follows the proselytizing papers of Berwick (1991) and Berwick and Fong (1990).
8 For example, the Earley algorithm for context-free languages can quadruple its parsing time when the grammar is doubled.
REFERENCES
Abeillé, A. (1992). "A Lexicalized Tree Adjoining Grammar for French and its Relevance to Language Teaching," M. Swartz and M. Yazdani (Eds.), 65-87.
Atkinson, M. (1992). Children's Syntax: An Introduction to Principles and Parameters Theory. Basil Blackwell: Oxford.
Bailin, A. (1991). "ICALI Research Investigations in Teaching and Learning," CALICO Journal, 9, 5-8.
Berwick, R. (1991). "Principle-based Parsing," P. Sells, S. Shieber and T. Wasow (Eds.): Foundational Issues in Natural Language Processing. Bradford Books, MIT Press: Cambridge, Mass., 115-226.
______. (1991a). "Principles of Principle-based Parsing," R. Berwick, S. Abney and C. Tenny (Eds.), 1-37.
______, S. Abney and C. Tenny (Eds.) (1991). Principle-based Parsing: Computation and Psycholinguistics. Kluwer: Dordrecht.
______, and S. Fong (1990). "Principle-based Parsing: Natural Language Processing for the 1990's," P. Winston and S. Shellard (Eds.), 287-325.
Catt, M. and G. Hirst (1990). "An Intelligent CALI System for Grammatical Error Analysis," Computer Assisted Language Learning, 3, 3-26.
Chanier, T., M. Pengelly, M. Twidale and J. Self (1992). "Conceptual Modeling in Error Analysis in Computer-assisted Language Learning Systems," M. Swartz and M. Yazdani (Eds.), 125-150.
Chen, L. and L. Barry (1989). "XTRA-TE: Using Natural Language Processing Software to Develop an ITS for Language Learning," Fourth International Conference on Artificial Intelligence and Education, 54-70.
Chomsky, N. (1986). Knowledge of Language: Its Nature, Origin, and Use. Praeger: New York.
______. (1986a). Barriers. MIT Press: Cambridge, Mass.
Cook, V. (1989). "Universal Grammar Theory and the Classroom," System, 17, 169-181.
Dorr, B. (1990). "Machine Translation: A Principle-based Approach," P. Winston and S. Shellard (Eds.), 327-361.
______. (1991). "Principle-based Parsing for Machine Translation," R. Berwick, S. Abney and C. Tenny (Eds.), 153-183.
du Plessis, J., D. Solin, L. Travis and L. White (1987). "UG or not UG, That is the Question: A Reply to Clahsen and Muysken," Second Language Research, 3, 56-75.
Feuerman, K., C. Marshall, D. Newman and M. Rypa (1987). "The CALLE Project," CALICO Journal, 4, 25-34.
Fong, S. (1991). "The Computational Implementation of Principle-based Parsers," R. Berwick, S. Abney and C. Tenny (Eds.), 65-82.
______, and R. Berwick (1991). "The Computational Implementation of Principle-based Parsers," M. Tomita (Ed.) Current Issues in Parsing Technology. Kluwer: Dordrecht, 9-24.
Fum, D., B. Pani and C. Tasso (1992). "Naive vs. Formal Grammars: A Case for Integration in the Design of a Foreign Language Tutor," M. Swartz and M. Yazdani (Eds.), 51-64.
Ghemri, L. (1991). "Specification and Implementation of a GB Parser," C. Brown and G. Koch (Eds.) Natural Language Understanding and Logic Programming 3. North-Holland: Amsterdam, 111-126.
Handke, J. (1992). "WIZDOM: A Multiple-purpose Language Tutoring System Based on AI Techniques," M. Swartz and M. Yazdani (Eds.), 293-305.