By way of preparation for the demo, Doug Lenat sent me his recent paper "Enabling Agents to Work Together," which I read and responded to as follows.
Date: Sun, 10 Apr 94 22:32:01 PDT
From: Vaughan Pratt
Message-Id: <9404110532.AA03997@Coraki.Stanford.EDU>
To: firstname.lastname@example.org
Subject: Visit
Cc: pratt

Hi, Doug. I've now read the paper "Enabling Agents to Work Together" that you sent me, and also your August 1990 CACM paper (which was hard to find since it was cited in your EATWT paper as appearing in July).
The papers didn't include any output from a CYC demo, so I'm not entirely clear as to what I should be expecting to see on Friday. Could you give me an idea of what to expect? Will you just be demonstrating CYC doing something under its own steam, or will there be an interactive session with CYC? If interaction, will you just be invoking some of its subroutines to demonstrate what they do, or will you be asking CYC questions? If questions, can CYC be queried in English (CycNL?) at all, or only in a formal language (CycL?). In either case, can CYC answer only prearranged questions, or can it field new questions?
If CYC can handle new questions, in what domains might it reasonably be expected to perform well? For example does it know about counting numbers, arithmetic, or lists, and if so, up to what level? Does it know that the world has meridians, latitudes, and poles? What does it know about travel, e.g. miles, gallons, and miles per gallon, or the concept of distance between two towns? Does it know how things move, such as that people can get places by walking? What other forms of transport does it know about? What does it know about things in the sky (sun, clouds, etc.), or the weather? What does it know about humans, e.g. does it know they have height, weight, organs, etc? Does it know anything about government, such as needing lots of votes in order to get elected? And are there other areas like these that CYC might be expected to handle reasonably well?
Also, how much reasoning ability does CYC have, approximately? For example if I told it that a was bigger than b and b bigger than c, could CYC tell on its own that a was bigger than c or would it need help? Examples of CYC solving concrete problems that demonstrate the range of its current reasoning abilities would help here.
I'm assuming the demo will work best if my expectations are well matched to CYC's current capabilities. If I arrive expecting too much I may go away disappointed, but if my expectations are set too low initially we may spend too long on things that don't do justice to CYC's full range of abilities.
I did not receive a reply.
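The transitivity question in the message above is the kind of inference a reasoner can make without help once "bigger than" is treated as transitive. The following is a rough illustration only. This minimal Python sketch derives a > c from a > b and b > c by computing the transitive closure of a set of (bigger, smaller) pairs. The representation and the function name `transitive_closure` are assumptions made for this example. They say nothing about how CYC stores or chains such facts.

```python
def transitive_closure(pairs):
    """Return every (x, z) derivable from the given (x, y) pairs by transitivity."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Join each (x, y) with each (y, z) to add (x, z); iterate over snapshots.
        for x, y in list(closure):
            for y2, z in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure

facts = {("a", "b"), ("b", "c")}  # a is bigger than b, b is bigger than c
derived = transitive_closure(facts)
print(("a", "c") in derived)  # True
```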
(i) A spreadsheet (database, relation) labeled "Activity" from one source was displayed. It contained a record in which the IDL, a fictitious organization in the Middle East, attacks and destroys a Palestinian village between 0010 and 0300 on 7/4/93. CYC inferred an inconsistency with a record in another spreadsheet, labeled "Organization" and drawn from a different source, claiming that IDL is a pacifist organization. Inconsistencies were indicated by coloring the responsible cells of the spreadsheet red.
(ii) A spreadsheet labeled "Organization" contained a record showing that IBM believes in capitalism. A spreadsheet labeled "Personality" contained a record showing the ideology of an IBM employee, James A. Cannavino, to be communist. These entries were flagged as inconsistent. Manually changing "communist" to blank removed the inconsistency, and CYC then guessed "capitalist" as a replacement entry. I suggested changing IBM's belief to blank instead, which also removed the inconsistency.
The database showed another inconsistency involving Cannavino's date of birth, listed as 1953. This turned out to be due to the record's date-last-updated being 1950, before he was even born. Changing the date-last-updated to 1954 removed the inconsistency.
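Simple constraints over rows can approximate the checks in examples (i) and (ii). Everything in the following Python sketch is hypothetical: the table layout, the field names, and the rule that a capitalist employer and a communist employee clash. These were chosen to mirror the demo and are not CYC's actual axioms. The sketch flags the offending (table, row, column) cells. It shows that blanking the ideology cell and correcting the update year clears the flags. It also proposes the employer's ideology as a default guess for the blank cell, roughly as CYC did.

```python
# Hypothetical rows modelled loosely on the demo's tables; field names are assumptions.
organization = {"IBM": {"ideology": "capitalist"}}
personality = {
    "James A. Cannavino": {
        "employer": "IBM",
        "ideology": "communist",
        "born": 1953,
        "last_updated": 1950,
    }
}

# Ideology pairs assumed (for illustration) to clash between employee and employer.
CLASHES = {frozenset({"capitalist", "communist"})}

def find_inconsistencies(organization, personality):
    """Return the (table, row, column) cells that violate one of the constraints."""
    flagged = []
    for name, rec in personality.items():
        # A record cannot have been last updated before its subject was born.
        if rec.get("born") and rec.get("last_updated") and rec["last_updated"] < rec["born"]:
            flagged += [("Personality", name, "born"), ("Personality", name, "last_updated")]
        # An employee's ideology should not clash with the employer's.
        employer = rec.get("employer")
        org_ideology = organization.get(employer, {}).get("ideology")
        if frozenset({rec.get("ideology"), org_ideology}) in CLASHES:
            flagged += [("Personality", name, "ideology"), ("Organization", employer, "ideology")]
    return flagged

def guess_ideology(organization, rec):
    """For a blank ideology cell, propose the employer's ideology as a default guess."""
    if rec.get("ideology"):
        return rec["ideology"]
    return organization.get(rec.get("employer"), {}).get("ideology")

print(find_inconsistencies(organization, personality))  # four flagged cells
cannavino = personality["James A. Cannavino"]
cannavino["ideology"] = ""        # blank the cell, as was done in the demo
cannavino["last_updated"] = 1954  # correct the update year
print(find_inconsistencies(organization, personality))  # []
print(guess_ideology(organization, cannavino))          # capitalist
```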
Guha feedback: ``There were really two functionalities being displayed here. The first is of course the ability to detect inconsistencies. The second (and possibly the more important of the two) was the integration of information across multiple structured information sources. The three tables involved could have been created (i.e., not just the actual filled tables but their schemas) by 3 different people, on different machines, never having spoken to each other, etc., and Cyc would still have automatically "lifted" the cells' entries and noticed the cross-table contradictions. As a trivial example of this, you may have noticed that the Personality table had a "beliefs" column that was closely tied in with the Organization table's heading "ideology". The information from these tables gets mapped into a "universal schema" (Cyc) and then, after inference, translated back into the schemas of these tables.''
Guha feedback (cont'd): ``It is worth remarking that this data -- these spreadsheets -- were prepared for us (i.e., not by us) by our DOD customer, as a sanitized version of classified DB's. It is also worth reiterating that no particular example of its behavior is "the big deal." Any single cell value it highlighted as being suspicious, or set of cell values it highlighted as being inconsistent, etc., could trivially be caught by an expert system rule, finer typing and constraints on the DB's original schema, etc. The point is, rather, to note the breadth of such constraints which might prove useful in SOME case someday (and, to a lesser extent, to note the shallowness of the searches involving such knowledge.)''
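Guha's point about integration can be sketched in three steps. First, map each source table's own column headings into shared predicates. Second, state each consistency rule once over those shared predicates. Third, translate any violation back into the source cells to be colored. The Python below is a hypothetical illustration using the IDL example from (i). `SCHEMA_MAP`, the column names, and the pacifist rule are assumptions. The dictionary-based "universal schema" is only a stand-in for CYC's ontology.

```python
# Hypothetical mappings from each table's own headings to shared predicates.
SCHEMA_MAP = {
    ("Organization", "ideology"): "ideologyOf",
    ("Activity", "perpetrator"): "perpetratorOf",
    ("Activity", "action"): "actionType",
}

def lift(table, rows):
    """Turn table cells into (predicate, row_key, value, source_cell) facts."""
    facts = []
    for key, row in rows.items():
        for column, value in row.items():
            pred = SCHEMA_MAP.get((table, column))
            if pred and value:
                facts.append((pred, key, value, (table, key, column)))
    return facts

def contradictions(facts):
    """One rule over shared predicates: a pacifist group does not carry out attacks."""
    pacifist = {subj: cell for pred, subj, val, cell in facts
                if pred == "ideologyOf" and val == "pacifist"}
    attacks = {subj for pred, subj, val, _ in facts
               if pred == "actionType" and val == "attack"}
    flagged = []
    for pred, event, actor, cell in facts:
        if pred == "perpetratorOf" and event in attacks and actor in pacifist:
            # Translate back to the source cells so they can be colored red.
            flagged += [cell, pacifist[actor]]
    return flagged

organization = {"IDL": {"ideology": "pacifist"}}
activity = {"event-1": {"perpetrator": "IDL", "action": "attack"}}
facts = lift("Organization", organization) + lift("Activity", activity)
print(contradictions(facts))
# [('Activity', 'event-1', 'perpetrator'), ('Organization', 'IDL', 'ideology')]
```

The rule mentions only `ideologyOf`, `perpetratorOf`, and `actionType`. A third table with different headings could therefore be brought in by adding entries to `SCHEMA_MAP`, without touching `contradictions`.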
The CYC demo was done with 20 images, each described by half a dozen CYC-L axioms. The request "Someone relaxing" yielded one image, 3 men in beachwear holding surfboards. CYC found this image by making a connection between relaxing and previously entered attributes of the image.
This inference was made using the following reasoning, shown in a window. (I asked if the reasoning could simply be emailed to me, but Guha said this would require the customer's permission, so I copied down what was in the window more or less verbatim. RA, X, G1 abbreviate longer gensyms, "allGenls" means "Subset", "allInstanceOf" means "memberOf.")
    1. (=> (logAnd (allInstanceOf RA RecreationalActivity)
                   (allInstanceOf X SentientAnimal)
                   (DoneBy RA X))
           (holdsIn RA (feelsEmotion X RelaxedEmotion Positive)))
    2. (=> (performedBy X Y) (doneBy X Y))
    3. (allInstanceOf G1 AdultMalePerson)
    4. (allGenls Vertebrate SentientAnimal)
    5. (allGenls Mammal Vertebrate)
    6. (allGenls Primate Mammal)
    7. (allGenls Person Primate)
    8. (allGenls HumanAdult Person)
    9. (allGenls AdultMalePerson HumanAdult)

These axioms, plus certain of the half dozen or so properties typed in for that photo, permitted the inference that G1, one of the three surfers, was relaxing.
Guha feedback: ``Plus some existing assertions about recreational activities. If given a picture (and corresponding caption) of someone actually surfing, rather than "standing, holding a surfboard", there would be both pro- and con- arguments over relaxing, and the con- argument would either dominate or at least tie.''
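The chain of `allGenls` axioms above amounts to walking up a class hierarchy until `SentientAnimal` is reached, after which axioms 1 and 2 apply. The following is a minimal Python sketch. It assumes a hypothetical activity instance `Activity1` of `RecreationalActivity` performed by `G1`, standing in for the photo's own properties and for the existing assertions about recreational activities, neither of which was shown. It treats `allGenls` as a transitive subclass relation and ignores the pro and con arguments Guha mentions.

```python
# allGenls edges from axioms 4-9: (subclass, superclass).
GENLS = {
    ("Vertebrate", "SentientAnimal"),
    ("Mammal", "Vertebrate"),
    ("Primate", "Mammal"),
    ("Person", "Primate"),
    ("HumanAdult", "Person"),
    ("AdultMalePerson", "HumanAdult"),
}
# Axiom 3, plus a hypothetical activity instance standing in for the photo's properties.
INSTANCE_OF = {("G1", "AdultMalePerson"), ("Activity1", "RecreationalActivity")}
PERFORMED_BY = {("Activity1", "G1")}  # hypothetical: G1 performs Activity1

def superclasses(cls):
    """cls together with every class reachable from it via GENLS edges."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for sub, sup in GENLS:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen

def is_instance(x, cls):
    """allInstanceOf: x belongs to some class that lies at or below cls."""
    return any(cls in superclasses(c) for (i, c) in INSTANCE_OF if i == x)

def relaxed():
    """Apply axiom 2 (performedBy implies doneBy), then axiom 1."""
    done_by = set(PERFORMED_BY)
    return {x for (act, x) in done_by
            if is_instance(act, "RecreationalActivity")
            and is_instance(x, "SentientAnimal")}

print(relaxed())  # {'G1'}
```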
Another photo showed a girl reclining on a beach. The request "find someone at risk for skin cancer" turned up both the 3-surfer photo and this one. The logic used here was that reclining at the beach implies suntanning and suntanning promotes risk of skin cancer.
Guha said that CYC supports nonmonotonicity (exceptions, e.g. "unless you are wearing sunblock"). When I asked if the image database contained any examples of nonmonotonicity, he replied that it didn't.
Guha correction: ``We can add the assertion that she is under a beach umbrella, and then it WON'T find her image to the skin cancer query. Then we can tell it that the umbrella is broken, has holes in it, etc., and her picture will be back. Then we can tell the system that it's cloudy out, and she'll be not found again. Etc.'' (It is surprising that such an example isn't already featured in the demo, given the importance attached in AI to nonmonotonic reasoning. -v)
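The umbrella sequence Guha describes is nonmonotonic: adding a fact can withdraw a conclusion, and adding another can restore it. The Python sketch below hard-codes one default with its exceptions: reclining on a beach implies suntanning, and suntanning implies skin-cancer risk. It is an assumption-laden illustration, not a general nonmonotonic logic such as default logic or circumscription, and not CYC's mechanism. The fact names are hypothetical.

```python
def suntanning(facts):
    """Default: someone reclining on a beach is assumed to be suntanning."""
    if not {"on_beach", "reclining"} <= facts:
        return False
    if "cloudy" in facts:
        return False  # exception: no direct sun
    if "under_umbrella" in facts and "umbrella_broken" not in facts:
        return False  # exception, itself defeated by a broken umbrella
    return True

def at_risk_of_skin_cancer(facts):
    """Suntanning promotes risk of skin cancer."""
    return suntanning(facts)

facts = {"on_beach", "reclining"}
for extra in [None, "under_umbrella", "umbrella_broken", "cloudy"]:
    if extra:
        facts.add(extra)  # each new fact may withdraw or restore the conclusion
    print(sorted(facts), at_risk_of_skin_cancer(facts))
# Prints True, False, True, False in turn.
```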
I tried retrieving some of the other photos in this way. This worked for two requests, but then I asked for "A tree", and it failed to find the picture captioned "A girl with presents in front of a Christmas tree." We then asked for "A Christmas tree" with no more luck. Apparently CYC-NL was translating "Christmas tree" to "trimmed Christmas tree"; Guha tested whether this was the problem by adding the adjective "trimmed" to the information about the photo in the image database. It still didn't find the picture, so we left this as an unresolved mystery.
Guha feedback: ``The system we were running was the experimental system (i.e., the one to which code changes, etc. are being made) and not the released one. These two problems [this and the one below about whether one can drink bread] have since been fixed.''
This 20-image database is the only demo involving CYC-NL, CYC's natural language component. Guha said that CYC-NL correctly parses 85% of two pages' worth of USA Today sentences, and gets the semantics right as well for 70%. I asked if we could look at these parses, but they were not available, being in Austin. I expressed a strong interest in seeing them at some point in the future.
Guha feedback: ``We are in the process of making CycNL usable for the more general purpose of just browsing the KB and this should be available in a couple of weeks. That should be of more interest than looking just at a couple pages of static already-parsed sentences.''
An example of CYC-NL translating from English to CYC-L's internal language was provided by the caption "A girl is on a white lounge chair" for an image not previously entered (if I understood correctly). CYC-NL's translation of this English sentence was
    (LogAnd (mtImageDepicts GirlLoungingAtBeachImageMt ChaiseLounge-1-G5055-365)
            (mtImageDepicts GirlLoungingAtBeachImageMt FemaleChild-1-G5054-364)
            (InstanceOf FemaleChild-1-G5054-364 FemaleChild)
            (on-2 FemaleChild-1-G5054-364 ChaiseLounge-1-G5055-365)
            (allInstanceOf ChaiseLounge-1-G5055-365 ChaiseLounge)
            (colorOfObject ChaiseLounge-1-G5055-365 WhiteColor))

That is, the girl-lounging-at-beach image depicts two particular objects, chaise-lounge-365 and female-child-364. The object female-child-364 is an instance of a female child and is on chaise-lounge-365. (Guha feedback: this is CYC's sense 2 of "on"; CYC has dozens of senses of "on".) The object chaise-lounge-365 is an instance of a chaise lounge and is white-colored. (Guha explained that without the "all" prefix, as in "InstanceOf", the containing class would be required to be the minimal containing class.)
Guha feedback: ``To be precise, Cyc-NL translates the input which is then processed further to take into account the context of the utterance, i.e., that the statement describes what is depicted in that image. The context, in other words, is that of telling Cyc about images. So if I say "there's a girl..." what I really mean is "The image explicitly depicts a girl..." and Cyc gets this.''
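The LogAnd translation above is a conjunction of ground assertions. It can be pictured as a small table of (predicate, argument, argument) tuples queried by pattern matching. The Python sketch below stores the six conjuncts verbatim and answers "what color is the thing the girl is on?". The `query` helper, with `None` as a wildcard, is a hypothetical simplification. It also ignores the context (image microtheory) handling Guha describes.

```python
# The six conjuncts of the LogAnd, stored as (predicate, arg1, arg2) tuples.
ASSERTIONS = [
    ("mtImageDepicts", "GirlLoungingAtBeachImageMt", "ChaiseLounge-1-G5055-365"),
    ("mtImageDepicts", "GirlLoungingAtBeachImageMt", "FemaleChild-1-G5054-364"),
    ("InstanceOf", "FemaleChild-1-G5054-364", "FemaleChild"),
    ("on-2", "FemaleChild-1-G5054-364", "ChaiseLounge-1-G5055-365"),
    ("allInstanceOf", "ChaiseLounge-1-G5055-365", "ChaiseLounge"),
    ("colorOfObject", "ChaiseLounge-1-G5055-365", "WhiteColor"),
]

def query(pred, a1=None, a2=None):
    """Return assertions matching pred; an argument given as None matches anything."""
    return [t for t in ASSERTIONS
            if t[0] == pred and a1 in (None, t[1]) and a2 in (None, t[2])]

# "What color is the thing the girl is on?"
girl = "FemaleChild-1-G5054-364"
support = query("on-2", girl)[0][2]
print(query("colorOfObject", support)[0][2])  # WhiteColor
```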
Other axioms for this photo included "The girl is on a beach" and "The girl is reclining." The request "someone relaxing" found this image by inferring from the fact that she was reclining that she was relaxing.
Guha feedback: ``The number of axioms entered by hand was until recently well over 2 million. The new smaller number is the result of serious compaction, generalization, cleaning up of redundancies, etc. Our staff comprises 22 individuals at present (19 FTEs); we hope to staff up to over 30 soon, assuming that is we "stay in business" at MCC.''
[I figure that if 15 people worked 250 days a year for six years putting in half a million axioms, this would be 22 axioms per person per day. This rate for declarative programming is better than twice the often-used figure of 10 lines of code per day for imperative programs.]
Guha feedback: ``This analysis is not very accurate for a few reasons: our staff size has gone up and down, but for the first 5 years in particular we had a much smaller staff. Also, the typical knowledge enterer will work on a topic for several days, then enter several hundred axioms in a burst, in a day.'' (So presumably the average rate is *considerably* better than twice. Also imperative programming is surely at least as bursty. -v)
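The bracketed rate estimate can be checked directly. This Python snippet uses only the figures stated there: 15 people, 250 days a year, six years, half a million axioms, and the 10-lines-per-day benchmark. As Guha notes, the actual staffing varied, so the result is only as good as those assumptions.

```python
people, days_per_year, years = 15, 250, 6
axioms = 500_000

person_days = people * days_per_year * years  # total person-days of effort
rate = axioms / person_days                   # axioms per person per day
print(person_days, round(rate, 1))            # 22500 22.2
print(round(rate / 10, 2))                    # 2.22 times the 10-lines-per-day figure
```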
I wanted to know what CYC knew, and asked how we could find out whether it knew certain things. I began by asking whether CYC knew that bread is food. Guha asked this question in the form
    (evaluate-term '(#%allGenls #%Bread #%Food))

and then

    (evaluate-term '(#%allGenls #%Bread #%EdibleStuff))

and CYC returned True in each case. I then asked if CYC considered bread to be drink. Guha typed

    (evaluate-term '(#%allGenls #%Bread #%Drink))

which returned NIL, but Guha said that this merely indicated no knowledge. To get positive information one needs positive data, so Guha added
    (#%MutuallyDisjointWith #%Bread #%Drink)

to CYC's axioms. CYC was unable to infer from this that bread was not a drink. Guha wasn't sure why, and after a bit of fiddling we dropped this question. [Guha feedback: fixed.]
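The NIL answer reflects a three-way distinction between provably true, provably false, and unknown. The following is a minimal Python sketch of that distinction. It assumes the class edges `Bread` to `Food` to `EdibleStuff`, since CYC's actual links are not known from the demo. It also assumes collections are nonempty, so that a declared disjointness between any class above one side and any class above the other refutes the subclass claim. `evaluate_genls` is a hypothetical stand-in for `evaluate-term`, not its real implementation.

```python
# Assumed subclass edges (subclass, superclass); CYC's actual links are not known.
GENLS = {("Bread", "Food"), ("Food", "EdibleStuff")}
DISJOINT = set()  # unordered pairs of collections declared mutually disjoint

def supers(cls):
    """cls together with every class above it in the GENLS hierarchy."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for sub, sup in GENLS:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen

def evaluate_genls(sub, sup):
    """True if provable, False if refuted by disjointness, None if unknown."""
    sub_up, sup_up = supers(sub), supers(sup)
    if sup in sub_up:
        return True
    # Assuming nonempty collections, disjointness between any class above sub
    # and any class above sup rules out sub being a subclass of sup.
    if any(frozenset({a, b}) in DISJOINT for a in sub_up for b in sup_up):
        return False
    return None

print(evaluate_genls("Bread", "Food"))   # True
print(evaluate_genls("Bread", "Drink"))  # None: no knowledge either way
DISJOINT.add(frozenset({"Bread", "Drink"}))  # cf. MutuallyDisjointWith
print(evaluate_genls("Bread", "Drink"))  # False
```

The disjointness of `Bread` and `Drink` says nothing about `Food` and `Drink` in general. `evaluate_genls("Food", "Drink")` therefore still returns `None` here, which is the right answer, since disjointness of a subclass does not propagate upward.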
I wanted to know if CYC knew that people needed food. To find this out, Guha asked CYC to show all axioms having "Dying" as a consequence. CYC found hang gliding and touching toxic substances but not starvation or anything related to food. Lots of axioms had "Eating" in their antecedent, but we didn't run across any bearing on the *need* for food, though we did run across many other items about food such as 8 ounces being the typical amount of soup that one eats.
Guha feedback: ``Cyc does know that lack of food causes hunger. And it knows that starvation is one way to cause death. It was missing the definition of starvation, in effect. This is exactly the sort of debugging involved in fleshing out the Cyc KB: get the answer to a question wrong, and see what it's missing, and add it.''
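Guha's query, which showed all axioms with "Dying" as a consequence, can be mimicked by backward chaining over rules indexed by consequent, and so can the gap it exposed. In this hypothetical Python sketch, the rule base is invented for illustration. It knows that lack of food causes hunger and that starvation causes dying, but nothing links food to starvation. A search back from `Dying` therefore never reaches `LackOfFood` until that missing link is added.

```python
# Hypothetical rule base: (antecedents, consequent) pairs, invented for illustration.
RULES = [
    (("HangGliding",), "Dying"),
    (("TouchingToxicSubstance",), "Dying"),
    (("Starvation",), "Dying"),
    (("LackOfFood",), "Hunger"),
]

def causes_of(goal, rules):
    """Backward-chain from goal: every condition from which goal can be derived."""
    found, frontier = set(), [goal]
    while frontier:
        current = frontier.pop()
        for antecedents, consequent in rules:
            if consequent == current:
                for a in antecedents:
                    if a not in found:
                        found.add(a)
                        frontier.append(a)
    return found

print("LackOfFood" in causes_of("Dying", RULES))  # False: Starvation is undefined
RULES.append((("LackOfFood",), "Starvation"))     # supply the missing definition
print("LackOfFood" in causes_of("Dying", RULES))  # True
```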
I then asked what CYC knew about the earth. CYC didn't have anything bearing on "Earth" under that name, but after some searching for axioms that might be relevant, Guha turned up one that mentioned PlanetEarth, which told us the name we should have used.
Did CYC know how big the earth was? CYC knew that PlanetEarth was bigger than PlanetVenus, but such comparisons with other planets were all CYC knew about the size of the earth. (To my surprise, no one else in my family knew the diameter of the earth even approximately, but all three knew that Venus was smaller, so at least on this detail CYC seems to be an excellent reflector of human knowledge.)
I asked what CYC knew about the sky. Guha said that CYC knew that the earth has a sky (we didn't formulate a question testing this) but doesn't know what color the sky is. As for air, CYC knows that the atmosphere has air as one of its constituents, and that air contains oxygen, CO2, gaseous water, etc., though not in what proportions.
Guha remarked at this point, if I understood him correctly, that 5-10% of CYC's knowledge consisted of axioms that someone typed in that held in Austin on a particular day.
Guha feedback correcting this: ``5-10% of the knowledge is of random specific information (such as the people who work on the project, etc.). Of course I should hope that 99% of it is true even in Austin, where many folks do, believe it or not, have common sense, and think that bread is edible, etc.''
Apropos of CYC's reasoning capability, Guha said that only a few of CYC's axioms are flagged as forward-chaining, e.g. male implies masculine (i.e. if you say someone is male then CYC immediately infers that he is also masculine rather than waiting for "masculine" to enter the arena in some other way).
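The distinction Guha drew can be sketched as two kinds of rules. Forward-chaining rules fire as soon as a fact is asserted, and their conclusion is stored. Backward-chaining rules are consulted only when a query needs them. The `KB` class below is a hypothetical Python illustration using one-argument predicates. The `father` to `male` rule is an invented example of a backward-only rule, and the depth limit is a crude guard against cyclic rules.

```python
class KB:
    """Toy knowledge base with eager (forward) and lazy (backward) rules."""

    def __init__(self):
        self.facts = set()
        self.forward = []   # (premise, conclusion): applied when a fact is asserted
        self.backward = []  # (premise, conclusion): applied only when queried

    def add_forward(self, premise, conclusion):
        self.forward.append((premise, conclusion))

    def add_backward(self, premise, conclusion):
        self.backward.append((premise, conclusion))

    def assert_fact(self, pred, x):
        if (pred, x) in self.facts:
            return
        self.facts.add((pred, x))
        for premise, conclusion in self.forward:
            if premise == pred:
                self.assert_fact(conclusion, x)  # infer and store immediately

    def ask(self, pred, x, depth=10):
        if (pred, x) in self.facts:
            return True
        if depth == 0:
            return False
        return any(self.ask(premise, x, depth - 1)
                   for premise, conclusion in self.backward if conclusion == pred)

kb = KB()
kb.add_forward("male", "masculine")  # flagged forward-chaining, as in Guha's example
kb.add_backward("father", "male")    # hypothetical backward-only rule
kb.assert_fact("male", "Pat")
print(("masculine", "Pat") in kb.facts)  # True: inferred at assertion time
kb.assert_fact("father", "Tom")
print(("male", "Tom") in kb.facts)       # False: nothing stored yet
print(kb.ask("male", "Tom"))             # True: derived on demand
```

One consequence of this design is that a forward rule never sees a fact that was only derived at query time. So `kb.ask("masculine", "Tom")` returns `False` even though Tom is provably male.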
I then wanted to know what CYC knew about cars, e.g. their number of wheels, range in miles, maximum velocity, etc. We found axioms indicating that the typical cost of a car was $6K to $80K, but none that contained any answers to my questions. Guha said that this remaining information would later on be obtained from extant databases once CYC had the ability to read them.
Guha feedback: ``Cyc does have the ability to read them already (as displayed in Demo 1). We do not have the relevant databases however. See the comment above about populating the KB with specific facts; that comment goes DOUBLE for data which is best held in a DB.''
The demo ended at 12:30, having taken two and a half hours.
Guha feedback: ``Actually, this is the fourth time this course is being taught. The first 2 times, it was taught by Doug and myself, the third time, it was taught by Doug, myself and Pat Hayes.''
Guha feedback: ``Don't forget, our goal, with building our NL front end, is to enable trained Cyc knowledge enterers to work faster. I hope that in future years it also extends to allow Cyc-uninitiated folks to sit down and converse with it, but that is not a high priority for us this year.''
The demo was very helpful in calibrating me on the level at which I should have been testing CYC. First, English was not available for any CYC application other than retrieving images from among the twenty in the prototype image database. Second, even with questions phrased in CYC's retrieval language (e.g. "Is bread food?" became (evaluate-term '(#%allGenls #%Bread #%Food))), or answered by associative search of CYC's half-million axioms (our main retrieval mode during the demo), the demo made clear that my expectations had been set way too high.
Guha feedback: ``The good thing about that, is that now your expectations have been set so LOW that you will be astounded at the progress we seem to have made the next time you take a look at it.'' (I will make myself available for this occasion when it arises. -v)
After looking at the axioms, it is clear to me that merely lowering expectations is not sufficient to come up with a suite of questions of which CYC could be expected to answer, say, 20%. The distribution of CYC's half-million axioms in "knowledge space" showed no discernible pattern that would allow me to construct such a suite, short of simply picking particular axioms out of CYC's database and carefully phrasing questions around those axioms. And even then, our experiences with "Is bread drink" and the Christmas tree indicated that CYC would still have difficulty with a sizable percentage of the questions in a suite constructed this way.
Guha feedback: ``Our goal (at least for the next few years) for Cyc is not a program that can simulate a child in its input/output behaviour. The goal is more to create a common sense substrate for information retrieval based on content. Also, I am not sure how to respond to your complaint about not being able to discern a pattern in what Cyc knows. Getting a good grasp of this is one of the hardest parts of our training people on the project and easily takes a couple of months. I should also point out that the NL part of the project has been around only for a year or so.''
Had my initial expectations been met, I would have continued with the following questions, which I would have thought ranged in difficulty (for computers) from very easy to difficult but by no means impossible.
Guha feedback: ``At least some of these questions can be posed to and answered by Cyc in its current state. If you are interested, I can try an experiment involving this. I'd be interested in hearing from you about any other program with which you have had more luck in getting these questions answered.'' (What other programs exist that claim to be as comprehensive in their general knowledge as claimed for CYC? -v)
Try these yourself or on your kids to get some idea of how difficult you think they are for people. Then estimate how long it will be before someone writes a computer program that can answer say 50% of questions at this general level of difficulty.
I tried them on my kids, and to my surprise they both enjoyed the whole test as a low-stress off-the-wall pop quiz. Since they found the questions so easy, they kept looking for trick aspects; here one might expect a computer to do much better than a person in finding lots of "trick" interpretations.
When you suspect that the answer is just a guess, it is fair game to ask "Why?", bearing in mind that each successive "Why?" may be an order of magnitude harder than its predecessor.
For the questions themselves, see the companion page, Commonsense Problems.
The impression one gets in reading and hearing about CYC from its authors is that CYC is well along the path to having comprehensive general knowledge. What is lacking here is a quantitative measure of how far along. As things stand right now there exists for example no way of telling whether adding an English front-end enhances the rate at which CYC can acquire general knowledge, since there is no way of measuring this rate. If one goes by mere axiom count then the recent compression from 2 million axioms to half a million would indicate a step backwards. Presumably this is unlikely, but in the absence of an objective measure of progress towards comprehensive general knowledge, how can it be demonstrated concretely that the compression had a substantially more beneficial effect than would have been achieved merely by removing the first 1.5M axioms?