LINGOL was originally conceived as a language intended for use by serious researchers. Due to its author's preoccupation with more mathematical pursuits during the past few years, LINGOL has not been exercised with anything but small-scale student-generated programs. Since the appearance of [Pratt1973], several more small-scale LINGOL programs have been written. More recently, the author has begun work on a large-scale program to see whether LINGOL really can be used as the research tool it was originally designed to be, without compromising those features that made it attractive to beginners (ease of use and low resource consumption). In this paper we briefly describe work to date on this project, and then concentrate on the implementation of the current version of LINGOL. This latter discussion will complement the one in [Pratt1973], which concerned the motivation for the language and that part of the LINGOL system accessible to the user. Inter alia, we shall present a new context-free parsing algorithm that, paradoxically perhaps, is responsible for the efficiency of the very context-dependent parser in LINGOL's cognitive component. The idea behind this algorithm is one that may be of value in structure-eliciting problems other than parsing.
SHRDLV is the obvious choice of names for such an imitation of SHRDLU, being its alphabetic successor (forbidding longer names). It has nothing to do with the author's first initial.
SHRDLV takes English sentences and translates each into a program which takes the action appropriate to the sentence, answering queries, noting assertions and obeying commands in a mindlessly dutiful way. For obeying commands to perform operations on the blocks accessible to the robot, there exists a set of high-level operations including "pickup b", "put b on c", "grasp", "ungrasp", "tower(b,c,d,...)" and the like which know about such issues as finding places to put things and making room for things. What has been built to date are the translator and the blocks world operations. The interpreter for the output of the translator is in its initial stages of construction.
Note that the translator has left some of the so-called "deep-structure" analysis of the sentences up to the interpreter. In particular, the interpreter is aware of failings of the translator; it turns out to be simpler to have the programs running under the interpreter compensate for such things as an "OBJ" appearing somewhere and denoting an unfilled object slot than to make the translator go to the trouble of seeing that everything it produces is in perfect health. Also, anaphoric reference is resolved by the interpreter, although we suspect that this is a bad thing and propose to shift this back to the translator.
Examples:
PICK UP A BLOCK.
(ACHIEVE (PICK YOU (CHOOSE BLOCK) UP))
536. MILLISECONDS.
WHAT ARE YOU HOLDING?
(QUERY (GRASP YOU (WHAT)))
928. MILLISECONDS.
WHAT COLOR IS IT?
(QUERY (BE IT (WHAT COLOR)))
853. MILLISECONDS.
PUT IT BESIDE THE PYRAMID.
(ACHIEVE (PUT YOU IT (BESIDE (DEF PYRAMID))))
560. MILLISECONDS.
WHERE IS THE RED BLOCK?
(QUERY (BE (DEF RED BLOCK) OBJ (WHAT LOC)))
594. MILLISECONDS.
WHAT IS IT ON?
(QUERY (BE IT (ON (WHAT))))
542. MILLISECONDS.
WHAT SHAPE IS THE BIG RED THING?
(QUERY (BE (DEF BIG RED) (WHAT SHAPE)))
1176. MILLISECONDS.
IF YOU PUT ONE BLOCK ON ANOTHER, CAN YOU THEN PICK UP THE LATTER?
(ASSUMING (PUT YOU (CHOOSE BLOCK) (ON (CHOOSE OTHER)))
(QUERY (POSSIBLE (PICK YOU (DEF LATTER) UP))))
1759. MILLISECONDS.
CAN YOU PUT A BLOCK ON A PYRAMID?
(QUERY (POSSIBLE (PUT YOU (CHOOSE BLOCK) (ON (CHOOSE PYRAMID)))))
976. MILLISECONDS.
HOW MANY BLOCKS CAN YOU PICK UP?
(QUERY (POSSIBLE (PICK YOU (CHOOSE (WHAT QTY) BLOCK) UP)))
1114. MILLISECONDS.
WHY DID YOU DO THAT?
(QUERY (PAST (DO YOU (DEF) (WHAT REASON))))
703. MILLISECONDS.
WHICH IS TALLER, THE RED CUBE OR THE GREEN ONE?
(CHOICE (BE (WHAT) (RANK HEIGHT 1)) (OR1 (DEF RED CUBE) (DEF GREEN)))
2244. MILLISECONDS.
For example, the program for DEF expects a list of predicates, some of which are atomic, like "RED" or "BLOCK", and some of which are implicit lambda-expressions (with bound variable X), like "(OWN I X)". It searches for an item satisfying all of the predicates, and takes obvious actions depending on how many it finds. Some predicates, like "(QTY 2)" or "(RANK HEIGHT 1)" are in the nature of global predicates rather than local ones. The former means that two are expected ("the two red cubes"), while the latter means maximize with respect to height among all candidates. DEF is intelligent enough to conduct the search efficiently, preferring to begin with predicates known or likely to apply to few items.
A similar operation is CHOOSE, which differs from DEF in that it does not suppose that the speaker has a particular item in mind, but only wants it to have certain properties. Thus CHOOSE, unlike DEF, feels free to carry out its own optimization of choice with respect to what the item is to be used for. This avoids such foolish things as having the arm "pick up a block" from the far end of the table when there was one right beside the hand.
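To illustrate the kind of ordering DEF is described as performing, here is a sketch only, in Python rather than the LISP of the actual system; the procedure def_search, the toy world of property lists and the expected-match counts are all invented for the illustration.
    # Hypothetical sketch of DEF-style search: order the local predicates by
    # estimated selectivity (fewest expected matches first), then filter the
    # candidate items through them so the candidate set shrinks quickly.
    def def_search(items, predicates, expected_matches):
        ordered = sorted(predicates,
                         key=lambda p: expected_matches.get(p, len(items)))
        candidates = items
        for p in ordered:
            candidates = [x for x in candidates if p(x)]
            if not candidates:
                break
        return candidates
    # Example: "the red block" -- RED is rarer than BLOCK in this toy world,
    # so it is applied first.
    world = [{"color": "red", "shape": "block"},
             {"color": "green", "shape": "block"},
             {"color": "red", "shape": "pyramid"}]
    is_red   = lambda x: x["color"] == "red"
    is_block = lambda x: x["shape"] == "block"
    print(def_search(world, [is_block, is_red], {is_red: 2, is_block: 5}))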
Space does not permit any sort of demonstration here of the Little Robot block manipulator, and the subject matter of this paper makes it somewhat irrelevant in any case. However, visitors to the MIT AI lab are more than welcome to watch it in action -- it makes an interesting demonstration to see a mechanical device solving complex logistics problems in finding space for things, navigating around things and dealing with several simultaneously nested sub-goals.
For the benefit of non-readers of [Pratt1973] we describe briefly the overall organization of LINGOL. The LINGOL system is envisaged as a translator from some natural source language to some target language (natural or artificial) of the user's choice, not necessarily a different language from the source language. There are two phases, one that elicits the surface structure of a sentence and one that produces the desired translation(s). The intention is that issues relevant to determining the intended surface structure should be kept separate from those of translation. This corresponds to the recommended practice when translating from English to French, say, of understanding each sentence (naturally taking into account previous sentences) before attempting the translation. No restrictions are made on where the LINGOL programmer draws this boundary, or to what extent the information in the two components is duplicated. In fact, he can omit the generative component entirely and put everything into the cognitive component, though at some cost in resource consumption. At run time, no firm commitment is made by the cognitive component to a particular choice of surface structure of an ambiguous sentence, allowing the generative component to pick and choose when the cognitive component has not had enough information to decide. At present LINGOL users are encouraged to try to make their cognitive component intelligent enough to make the right decision, and so far no LINGOL programs have attempted disambiguation in the generative component. One would expect this to change as people attempt more sophisticated programs.
A LINGOL program is a set of modules, each having three components: a context-free rule, a cognitive function and a generative function. Their respective roles are as follows. The CF rule specifies a general English construction, the cognitive component (or "critic") supplies the expertise about that construction, and the generative component supplies the information about the target language that may be relevant to this English construction. (Our tacit assumption of English as the source language reflects LINGOL's applications to date.)
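As a purely hypothetical picture (LINGOL's actual notation is LISP s-expressions; see [Pratt1973]), such a module might be sketched as a triple along the following lines, where the rule encoding and the two stand-in functions are inventions of the sketch.
    # Hypothetical picture of a LINGOL module, NOT the actual notation:
    # a context-free rule together with the two functions that accompany it.
    from collections import namedtuple
    Module = namedtuple("Module", ["rule", "cognitive", "generative"])
    np_module = Module(
        rule=("NP", ("DET", "NOUN")),              # the CF rule NP -> DET NOUN
        cognitive=lambda det, noun: 1.0,           # stand-in critic: plausibility of this use
        generative=lambda det, noun: ("DEF", noun))  # stand-in target-language fragment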
It is fashionable these days to avoid all reference to context-free grammars beyond warning students of computational linguistics that they are unfit for computer consumption. (See [Pratt1973] for arguments attacking context-free grammars as being too powerful for specifying programming languages, and [Pratt1969] for arguments attacking them as too weak for computational linguistics.) In LINGOL, their role is different from that in, say, the Harvard Predictive Analyzer [Kuno1965]. Instead of being used to encode all information about English, they form the basis of a pattern-directed non-deterministic programming language. This strategy has several advantages.
Before immersing ourselves in the technical details of the algorithm, let us consider the options open to us. The goal for the parser is to build the surface structure intended by the sentence's speaker. All it has to go on is the top and bottom of this tree, and the rules (grammar) constraining plausible surface structures. A decision must be made as to where growth should begin. Two extremes are the top-down approach, in which the tree is grown from its root, and the bottom-up, where growth begins from the bottom, that is, from the words of the sentence. Methods with the flavor of either (or both) of these extremes inherit their name(s). These methods have other aliases in the literature. Any scheme that claims to be doing "predictive" analysis, that is, that has expectations about what is coming next and uses those expectations to "drive" the program is essentially a top-down method. A program that finds substructures (say conceptual dependency structures [Schank1970]) and uses them to build bigger structures is a bottom-up method. Terms suggested to the author by R. Moore are "hypothesis-driven" for "top-down" and "data-driven" for "bottom-up". These concepts transcend phrase-structure grammars and may be applied to any system responsible for building a hierarchically organized structure.
No matter where the construction begins, we do not know how to carry it out in any straightforward way, even if we want nothing more than to satisfy the context-free rules of our grammar. We always run the risk of letting the construction wander down blind alleys. For some grammars and some sentences, the top-down method is less likely to run into cul-de-sacs, but the dual case can also arise. It is hardly surprising, given this state of affairs, to hear people wish that they could build structures both top-down and bottom-up in a way that somehow reduced the overhead. One seemingly ambitious form of this wish is to request a single algorithm that builds no node a bottom-up method would not consider, nor anything a top-down method would not build. An alternative and even more ambitious desideratum might be that no node N be built unless that part of the text seen to date is part of some sentence having a surface structure in which N participates. For a backup-less parser like LINGOL's present one, this is the strongest possible thing one could ask for as far as exploring cul-de-sacs is concerned. One might add to the above the requirement that the parser be able to cooperate with other processes such as routines written by the user, but with so many wishes stacked up the situation seems hopeless.
The current implementation of LINGOL achieves all of the above goals. To be more precise, every node it builds is built by both the Cocke-Kasami-Younger bottom-up algorithm and the ingenious Earley top-down algorithm, the two algorithms cited in [AhoUllman1972] as the canonical methods for parsing general context-free languages. That is, the work performed is the intersection of the work done by each of these methods, at least with respect to proposed phrases. Moreover, as each phrase is discovered, LINGOL is able to accept advice from other sources (namely the user's cognitive component) and use it to guide the parse.
Roughly speaking, the minimization of searching is accomplished by running the Cocke-Kasami-Younger algorithm and as each phrase is discovered asking an "oracle" whether the Earley algorithm would have discovered it. The remarkable thing is that this question can be answered in time independent of the length of the input, without having to go and actually run the Earley algorithm to see what it would have done. (The time is proportional to the size of the grammar, but in the LINGOL implementation, asking the question involves no more than forming the logical and of three 36-bit vectors for a grammar of a hundred non-terminals, a size that SHRDLV attains with the help of its cognitive component, which takes over much of the work the large CF grammars of the 60's used to do.)
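To sketch why the test costs only a few machine operations (illustrative Python only, using arbitrary-width integers where the implementation uses 36-bit words): suppose ancestors[A] is a precomputed bit vector with bit D set whenever D R* A (the left-most-character relation defined below), and goals[i] is the bit vector of goals pending at position i. Then the oracle's answer is a single AND. The names ancestors, goals and oracle_admits are inventions of the sketch.
    # Illustrative sketch of the oracle test; ancestors[A] and goals[i] are
    # bit vectors indexed by the (numbered) non-terminals of the grammar.
    def oracle_admits(A, i, ancestors, goals):
        # Would a top-down (Earley-style) parse ever want an A starting at i?
        # Yes exactly when some goal D at position i satisfies D R* A.
        return (goals.get(i, 0) & ancestors[A]) != 0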
Before discussing the construction of the oracle, let us sketch the version of the Cocke-Kasami-Younger algorithm we shall use. We assume that all rules are of the form either A->B or A->B C, where A, B and C are non-terminals, or of the form A->a where a is a terminal. The presence of A->B means that this is not really Chomsky normal form, and allows either the user or some preprocessor to turn an arbitrary grammar into this form using only the trick of replacing all but the first item on the right side of a rule having three or more items by a non-terminal which is itself rewritten to be the replaced non-terminals, and so on until all rules have right sides of length 1 or 2. (In [Earley1968] the notion of "state" is introduced, which elegantly plays the role of these introduced non-terminals. In Earley's notation, the state AB.CDE plays the role of the non-terminal that replaces the CDE. Everything we say in terms of our restricted grammars can be rephrased more elegantly in terms of Earley's states. The main advantage of our notation is that things look less complicated if the reader is not required to think about arbitrarily long states.) With this form of grammar we can and shall talk about the left and right sons of binary nodes and the "only" sons of unary nodes. For our purposes it will be convenient to refer to only sons as left sons.
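A sketch of that replacement trick follows (illustrative Python; the fresh non-terminal names are assumed not to collide with existing ones).
    # Sketch of the binarization described above: a rule A -> B C D E becomes
    # A -> B A.0, A.0 -> C A.1, A.1 -> D E, introducing fresh non-terminals
    # that play the role of Earley's "states".
    def binarize(rules):
        out = []
        for lhs, rhs in rules:                  # rhs is a tuple of symbols
            head, count = lhs, 0
            while len(rhs) > 2:
                new = "%s.%d" % (lhs, count)    # fresh name, assumed unique
                count += 1
                out.append((head, (rhs[0], new)))
                head, rhs = new, rhs[1:]
            out.append((head, rhs))
        return out
    print(binarize([("A", ("B", "C", "D", "E"))]))
    # -> [('A', ('B', 'A.0')), ('A.0', ('C', 'A.1')), ('A.1', ('D', 'E'))]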
In the following version of the algorithm, we use the notation ?x (where x is a
variable) to mean "find all x such that". Thus the code
for ?i+?j=5 do print(i,j)
will print all pairs of numbers summing to 5. This avoids cluttering up the
algorithm with details of searching that the programming reader will have no
difficulty filling in.
We shall employ "between-word" notation for positions rather than "at-word". That is, rather than saying that the first word in the sentence is at position 1, we will say it lies between positions 0 and 1. This avoids any possible ambiguity when referring to the string lying between positions i and j, and also simplifies naming the common boundary of two concatenated strings.
The Cocke-Kasami-Younger algorithm is:
parse: for each word w between positions j-1,j (scanning left to right) do
(for each rule ?A->w do note(A,j-1,j);
for each phrase (?B,?i,j) in turn on the queue do
(for each rule ?A->B do note(A,i,j);
for each rule ?A->?C B such that (C,?k,i) has been built do
note(A,k,j))).
note(A,i,j): build(A,i,j); enqueue(A,i,j).
We assume as given the primitives that build a node (representing a phrase
consisting of a non-terminal, a starting position and an ending position -- we
shall use "phrase" and "node" interchangeably) and that enqueue and dequeue
items.
To make this algorithm run in polynomial time (O(n^3), to be exact), "note"
is usually written as:
note(A,i,j): if (A,i,j) not yet built then (build(A,i,j); enqueue(A,i,j)).
The above suffices for context-free recognition. For parsing, nodes record (in
addition to the three items type, start and end) two additional items, namely
which rule was invoked when noting that node, and which phrases are its sons.
The former allows us to access the cognitive and generative components
associated with the rule at a later date, while the latter allows us to recover
the surface structure. (Having the rule present makes the syntactic category
of the node redundant, and in fact LINGOL omits it.)
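Restated as an illustrative sketch (Python rather than the LISP of the actual implementation; the grammar is assumed already in the restricted form above, the data-structure names are invented for the sketch, and re-derivations of an already-built phrase are simply ignored where a full parser would record them as alternative readings), the algorithm with node records looks roughly like this:
    from collections import deque
    def parse(words, lexical, unary, by_right):
        # lexical:  word -> [A, ...]        for rules A -> word
        # unary:    B    -> [A, ...]        for rules A -> B
        # by_right: B    -> [(A, C), ...]   for rules A -> C B
        built = {}                 # (A, i, j) -> node: the "not yet built" test
        ending_at = {}             # position i -> nodes ending at i
        queue = deque()
        def note(A, i, j, rule, sons):
            if (A, i, j) in built:          # a full parser would add the extra
                return                      # rule/sons as an alternative here
            node = {"cat": A, "start": i, "end": j, "rule": rule, "sons": sons}
            built[(A, i, j)] = node
            ending_at.setdefault(j, []).append(node)
            queue.append(node)
        for j, w in enumerate(words, start=1):    # word between positions j-1, j
            for A in lexical.get(w, []):
                note(A, j - 1, j, (A, w), [w])
            while queue:                          # phrases (B, i, j) in turn
                b = queue.popleft()
                B, i = b["cat"], b["start"]
                for A in unary.get(B, []):        # rules A -> B
                    note(A, i, j, (A, (B,)), [b])
                for A, C in by_right.get(B, []):  # rules A -> C B, (C, k, i) built
                    for c in ending_at.get(i, []):
                        if c["cat"] == C:
                            note(A, c["start"], j, (A, (C, B)), [c, b])
        return built
The test at the head of note is the memoized version given above; the oracle test of the next section slots in at the same point.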
We now introduce the oracle. Two things are required to construct this oracle: a readily accessible representation of the left-most-character relation, and the notion of a goal. We first deal with the goals. Associated with each position in the sentence is a set of goals. A goal is a desired non-terminal. If a phrase of the same type as some goal is discovered starting in the position with which the goal is associated, then that phrase can be used as a right son, in conjunction with some phrase discovered earlier, to form a new phrase. For example, if there is a rule A->B C and a B has been found ending in position i, then a goal C associated with position i will be created. Later, if a C is found starting in position i, it can be used with B to form an A.
Initially there is one goal, to find a Sentence (assuming this is the distinguished non-terminal of the grammar), which is associated with position 0. As the algorithm runs, other subgoals will be generated. While this sounds like a top-down method, no structures are built top-down -- they grow bottom-up and the goals are associated with the frontiers of the growing tree (with the exception of the Sentence goal). The goals appear just when and where you would expect them to; for example, when reading the sentence "Secretary of State Henry Kissinger announced today the cessation of war in the Middle East", after reading the fifth word the parser will have set up (inter alia) a goal associated with position 5 and wanting a verb phrase. In fact all goals for a given position are created at once, just before reading the word at that position. We defer for a bit the question of where goals come from, and look first at how they are used.
Let R be a binary relation on the non-terminals of the grammar such that A R B if and only if there exists a rule of the form A->B ... . Then R*, the reflexive transitive closure of R, is the left-most-character relation alluded to above. Thus A R* B holds just when there exists a derivation A =>* B ..., or, for those who prefer trees (myself included), when it is possible for B to occur on the left edge of some tree with root A that satisfies the productions of the grammar. (See [GriffithsPetrick1965] for other applications of this relation.)
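In the same illustrative style as the earlier sketches, R* can be precomputed once from the grammar as a bit matrix by closing R to a fixed point; the oracle sketched earlier then wants the transpose, i.e. for each A the set of D with D R* A. The function name and rule encoding are inventions of the sketch.
    # Illustrative sketch: the left-most-character relation as bit vectors.
    # Non-terminals are numbered 0..n-1; row[a] gets bit b set iff A R* B.
    def leftmost_closure(n, rules):
        # rules: (lhs, rhs) pairs with non-terminals given as integers < n;
        # a terminal leftmost symbol (anything else) contributes nothing to R.
        row = [1 << a for a in range(n)]           # reflexive: A R* A
        for lhs, rhs in rules:
            if isinstance(rhs[0], int):
                row[lhs] |= 1 << rhs[0]            # A R B for A -> B ...
        changed = True
        while changed:                             # close under composition
            changed = False
            for a in range(n):
                new, bits = row[a], row[a]
                while bits:                        # for each B with A R* B,
                    b = (bits & -bits).bit_length() - 1
                    bits &= bits - 1
                    new |= row[b]                  # fold in B's row
                if new != row[a]:
                    row[a], changed = new, True
        return row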
It is straightforward to incorporate this oracle and the goal machinery into
the algorithm. We assume primitives for storing and fetching goals which are
pairs (A,i) of non-terminals and positions. To introduce goals, append to the
innermost loop of "parse" the code:
for each rule ?A->B ?C
such that goal (?D,i) exists satisfying D R* A do
setgoal(C,j)
and precede the code for "note" with the test:
if goal(?D,i) exists such that D R* A holds then
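Put in the terms of the earlier parser sketch (illustrative only: by_left indexes the binary rules A -> B C by their left symbol B, ancestors is the transpose of the closure just described, the distinguished symbol is assumed to be numbered 0, and the names goals, setgoal, admitted and propose_goals are inventions of the sketch), the goal machinery amounts to little more than:
    # Illustrative goal machinery for the modified algorithm.
    SENTENCE = 0                        # assumed number of the distinguished symbol
    goals = {0: 1 << SENTENCE}          # initially: want a Sentence at position 0
    def setgoal(C, j):
        goals[j] = goals.get(j, 0) | (1 << C)
    def admitted(A, i, ancestors):
        # The oracle: does some goal D at position i satisfy D R* A?
        return (goals.get(i, 0) & ancestors[A]) != 0
    def propose_goals(B, i, j, by_left, ancestors):
        # Appended to the innermost loop of parse for each dequeued phrase
        # (B, i, j): each rule A -> B C whose A is wanted at i yields a goal C at j.
        for A, C in by_left.get(B, []):
            if admitted(A, i, ancestors):
                setgoal(C, j)
    # and note(A, i, j, ...) is now entered only if admitted(A, i, ancestors).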
To see that the algorithm as modified does no less than it has to (i.e. that it
overlooks nothing), suppose we have an initial segment of a sentence of the
language of the given grammar. Then in any tree for this sentence, we claim
that all phrases in the tree contained within the initial segment will have
been proposed by the time we have reached the end of the segment. We also
claim that every right son in the tree will have been generated as a goal just
before the parser reached the starting position of that phrase. (The induction
proofs of these claims are messy and probably inappropriate for this paper --
the interested reader is encouraged to fill in his own details.) It follows
from these two claims that the oracle will always answer in the affirmative
when a phrase that appears in the tree is proposed to the oracle, because every
phrase is the left-most character of either some non-left-son or of the root,
and all such possible goals will have been created by the time we propose this
phrase. Hence the phrase is correctly built.
To see that no node is built that could not participate in some surface structure of some completion of the sentence, we must assume that for each non-terminal of the grammar there exists a derivation starting with that non-terminal and ending with a string of all terminals. This follows immediately if we require that every non-terminal appear in at least one surface structure of some sentence of the language, a perfectly reasonable requirement. Suppose that we have just built some node (B,i,j). Then there exists some goal (A,i) such that A R* B holds. Hence there exists a tree with A at the root and B on the left edge. We can therefore extend the sentence so that the part starting with the B reduces to A. One more reduction is now possible, using the rule that gave rise to the goal in the first place. We continue up the tree in this fashion, progressively extending the sentence and satisfying more goals, until we satisfy the Sentence goal. At this point we have the desired sentence. This completes the proof of the claim that every node built has a chance of being used in the final surface structure.
What does this fancy algorithm buy as far as the practically minded user is concerned? One thing we do not claim is any improvement over the traditional O(n^3) speed limit for parsing sentences of length n. (See [Valiant1974] for an improvement to this situation -- he offers O(n^2.81...), though practical considerations make it much worse than most O(n^3) algorithms with respect to both speed and ability to take advice when parsing "typical" English sentences.) However, we do claim what amounts to an even better improvement in practice than going from O(n^3) to O(n^2), namely an improvement proportional to the number of non-terminals in the grammar. That is, there exist grammars for which the Earley algorithm may generate large state sets in its operation while our algorithm builds far fewer nodes. (However, there do not exist any grammars where Earley's algorithm runs in time O(n^a) while ours runs in time O(n^b) for b<a. There do exist such grammars when comparing our algorithm with the Cocke-Kasami-Younger algorithm.)
In even more practical terms, how does this affect parsers working with a purported grammar of English? We conducted an experiment to compare our algorithm with the Cocke-Kasami-Younger algorithm by the simple expedient of suppressing the test in the procedure "note" that makes our algorithm different from the other. Working with the grammar of English used in the "9-hour" French translator exhibited in [Pratt1973], we found an improvement of a factor of five in the number of nodes built altogether! In fact, with the new algorithm almost all of the nodes built were used in the final surface structures of the sentences we tried. Lack of time has not allowed us to compare the algorithm with Earley's, but the sort of thing that makes our parser better than Earley's in some grammars is exactly what arises in English grammars. For example, if one has the rules Sentence->Np Vp, Sentence->Wh Vp and Sentence->Vp, then Earley's algorithm will generate states corresponding to each of these rules even when the sentence begins with, say, "Why". Earley's algorithm is not smart enough to realize that the first and third rules can be ruled out here (we are making some obvious assumptions about what the rest of this simple grammar might look like.)
The style of programming used in the cognitive component is analogous to that described in [Pratt1973] for the generative component. The primary difference is that, since the structures are being built bottom-up at the time the cognitive component is invoked, it is not possible to declare variables high up in the tree for use by routines lower down, a facility that gives the generative component considerable power. This inability is inherent in the nature of any system that wants to do criticism on the spot without waiting for the rest of the sentence. If you don't know what's coming, you can't (other than by guessing) make assumptions about what the higher nodes using the one in question will look like. (This is not altogether true -- if the relation R* were represented explicitly as a collection of paths in R, it might be possible to set up variables on high as the goals are being generated, provided nodes being discovered below set their own sights on only one goal. It is likely that this would add substantially to the overhead of the system, however.)
Given that one's program can be expected to encounter many competing interpretations of a sentence, and that in many cases it will have to pass non-trivial judgement on them, the task of writing a program to deal with much of English can become very difficult. LINGOL allows the user to organize his program so that the burden of the bookkeeping associated with discovering and comparing all these possibilities is shifted to the system, allowing the user to concentrate on writing code to criticize individual situations. M. Marcus has suggested to the author that in so doing he is allowing the user to concentrate on the competence aspects of English by supplying him with packaged performance.
A long range goal in this regard is to develop a high level language version of LINGOL such that grammars may be given to either computers or people. Then if the computer "understood" English on the basis of that grammar, and if people could read it painlessly, it would make an ideal theory of English. Since people read procedures painfully slowly if at all, the high level language will have to be considerably less procedural than at present.
In our approach, the CF rules function as a crude approximation to English that permits LINGOL to rapidly select, from a huge set of possible structures for the sentence, a small plausible set for more detailed (and expensive) criticism by the cognitive component. The context-free representation for the "crude approximation" is chosen partly because it is not difficult to construct quite good approximations using context-free grammars, and partly because there exist remarkably efficient algorithms for exploring the space of possible surface structures of sentences of context-free grammars that can nevertheless accept advice on a step-by-step basis.
This situation of having the system do one's bookkeeping was what obtained in the hey-day of context-free parsers, of course. One side-effect of the later disenchantment with and abandonment of context-free grammars was to throw the baby out with the bath-water by reverting to doing much or all of the bookkeeping oneself. Not only does this require an indefinitely larger programming effort, it also requires of the programmer considerable sophistication in parsing techniques if his code is to operate as efficiently as one of the better context-free parsing algorithms, especially when that algorithm can cooperate effectively with the user's code in assisting it to reduce the search space even further.
A comparison of our system with that of Woods [Woods1969] is inevitable. Where Woods has augmented transition networks, we have augmented context-free grammars. One difference is in our parsing algorithm, a property of the implementation rather than of the LINGOL language. We would like to claim that it is more efficient than Woods', but in fairness we cannot tell without trying them out on comparable grammars, a test yet to be arranged. Another difference is in the language -- we take a static view of English, Woods a dynamic view. Both have their merits and demerits, but mainly from a human engineering rather than a computer science point of view. It all depends on how the programmer likes to think about English. One notable difference is that LINGOL has far fewer of its own concepts than does Woods' language, without losing any of the features of Woods' system. The idea here is that the constructs provided explicitly in Woods' system are provided in LISP anyway, so why duplicate them? Another difference is our separation of the cognitive and generative components, which we feel is valuable from an economic point of view.
Given grammar's language's sentence's initial segment assumed given in order to see no less overlooked than necessary. (Translation: To see that no less is overlooked than necessary, assume we are given an initial segment of a sentence of the language of the given grammar.) If you stumbled over this sentence then perhaps it is because the syntax is not there to speed things up for you. Conventional groupings of words have been rearranged in relatively unfamiliar, though not entirely ungrammatical or meaningless, ways and some "noise" words are gone. Nevertheless, with a little extra effort you should at least be able to parse the sentence correctly, and after a few passes you will begin to wonder why you ever had any trouble with it at all. Moreover, there seems no obvious reason why a program that could handle the original sentence could not equally well handle the above version. (I had difficulty restraining myself from replacing it with a more difficult sentence by the time I had typed it up.) The claim is that in the original version, part of the reason why you had less trouble with it was that it was phrased in a very conventional style that you have encountered frequently, allowing you to go straight to the places where you expect to find the information in the sentence. There are "noise" words all along the way, but to see that they are not really all that noisy, try replacing them with other noise words; the effect will be somewhat like switching all the street signs when navigating in one's car.
This then is a case for even more attention to syntax than is paid by either LINGOL or any other system known to this author, who strongly believes that extremely fast algorithms that waste no time at all in searching down blind alleys are possible. We have only to elicit the (presumably large) number of syntactic rules that seem to have gone unnoticed by traditional grammarians and computational linguists alike but which our subconscious parsers presumably use constantly to make the right choice every time. (We append the usual speculation disclaimer here.)
(The "even more attention" in the case of LINGOL refers more to attention to previously unnoticed constructions rather than to attention to noise words. This author has noticed that LINGOL runs considerably more slowly on sentences containing relatively few "closed class" words in the sense of [Thorne1968].)
Thus while it is conceivable that one can get by without syntax (though what this means exactly is surely open to debate), even if one does so one is faced with culling the desired structure out of a huge set of candidates without the benefit of syntactic information to reduce the search space.
Aho, A.V. and Ullman, J.D. 1972. The Theory of Parsing, Translation, and Compiling, Vol. 1: Parsing. Prentice-Hall, Englewood Cliffs, N.J.
Earley, J. 1968. An Efficient Context-free Parsing Algorithm. Ph.D. Thesis, Computer Science Dept., Carnegie-Mellon University, Pittsburgh, Pa.
Fahlman, S. 1973. A Planning System for Robot Construction Tasks. AI TR-283, MIT, Cambridge, Mass.
Griffiths, T.V. and Petrick, S.R. 1965. On the Relative Efficiencies of Context-free Grammar Recognizers. CACM 8, 5, 289.
Kuno, S. 1965. The Predictive Analyzer and a Path Elimination Technique. CACM 8, 7, 453-462.
Pratt, V.R. 1969. Translation of English into Logical Expressions. M.Sc. Thesis, University of Sydney.
Pratt, V.R. 1973. A Linguistics Oriented Programming Language. Proceedings of the Third International Joint Conference on Artificial Intelligence, Stanford University, Calif., August 1973.
Schank, R., Tesler, L. and Weber, S. 1970. Spinoza II -- Conceptual Case Based Natural Language Analysis. AI-109, Stanford University, Stanford, Calif.
Thorne, J., Bratley, P. and Dewar, H. 1968. The Syntactic Analysis of English by Machine. In Machine Intelligence 3, Michie, D. (ed.).
Valiant, L. 1974. Technical Report on a Fast Context-free Parsing Algorithm. Carnegie-Mellon University, Pittsburgh, Pa.
Winograd, T. 1971. Procedures as a Representation for Data in a Computer Program for Understanding Natural Language. Project MAC TR-84, MIT, Cambridge, Mass.
Woods, W.A. 1969. Augmented Transition Networks for Natural Language Analysis. Report No. CS-1 to the NSF, Aiken Computation Laboratory, Harvard University, Cambridge, Mass.