peter: so it's our great pleasure to welcome today doug lenat from cycorp. doug, as many of you know, is one of the leaders in the field of ai, one of the original fellows of the ai association. i remember reading about his automated mathematician program in 1976. it's one of the things that really inspired me to go into the field. and doug has had a consistent vision and leadership in this field of common sense knowledge representation. he's been the moses of the field, striving to write down the knowledge in the book. since then, things have evolved and changed. and in terms of connecting the most people with their queries to knowledge, i guess google has taken over the field of being the mountain in terms of answering the most queries. but today, moses has come to the mountain.
and we reverently listen to what he has to say. douglas lenat: thank you, peter. thank you. that was an amazing introduction. as you could tell, we had some audio/visual problems. so the various demos and so on that i was going to show you, i won't be able to. but after the talk, if you're interested, come on up to the front.
and on a very, very small screen, you'll be able to see some of the stuff running live. but the powerpoint presentation is almost as good. so basically, i had a meeting with vint cerf recently where he encouraged me to use the word relevance as much as possible in this talk. so really what i'm going to talk about today is semantically determined, rather than syntactically determined, relevance. the basic problem is that even after 50 some years, software is still incredibly brittle. take medical diagnosis systems, even ones that are world class diagnostic experts: as a joke, we told one system about my old rusted out pontiac. and it asked questions like, are there spots on the body? yes. more on the trunk than elsewhere? no. what color? reddish brown. it diagnosed the car as having measles, with a very high confidence level. or a car loan approval system which granted a loan to someone who put down that they had 19 years of on-the-job experience at their last job, even though they were less than 19 years old, and so on.
i can go on and on, but you sort of have all these examples yourself, from your everyday life, of this kind of brittleness of software. in terms of google, you get queries all the time like this one, which if you basically type into google, you won't actually get the answer in any one place. and it's heartbreaking, because of course, there are lots of places on the web that have one or the other of these building heights. in fact, there are a lot of pages that have both of them. and because they just happen not to have a sentence of this form, they're not going to give you the answer. they essentially would make you get the answer yourself, by doing the arithmetic yourself. or if you have a query like this, some movie to take the kids to nearby here that's starting soon: why should you have to go to two, or three, or four different sites in order to put the information together to get the answer to a query like this? by the way, this is in case you were dubious about the fact that you couldn't answer that query. and the first hit actually seems to answer it, but it turns out they're talking about replicas of these objects in las vegas, not the actual originals, and so on. now there is actually a page you can go to and get the height of one and the height of the other. and so if you're able to do subtraction, you can get the answers to your question. so really what i'm talking about here is a combination of missed opportunities, where the software doesn't really understand what the user is asking and can't combine information across pieces of software that have been written, so that even if someone writes a program which is able to answer a certain kind of question, you can't just dump all those programs together and have them be as smart as someone who had all those capabilities.
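to make the missing final step concrete, here's a minimal sketch in illustrative python, not anything cyc or google actually runs: two facts found on two different pages, joined with one subtraction. the building names and (approximate) heights are just placeholders.

```python
# two facts that live on two different "pages"; a keyword engine can
# retrieve either page, but never performs the join and the arithmetic.
page_a = {"eiffel tower": 1063}   # height in feet, found on one page
page_b = {"sears tower": 1450}    # height in feet, found on another page

def height(name):
    """look the building up across every source we know about."""
    for source in (page_a, page_b):
        if name in source:
            return source[name]
    return None

def height_difference(x, y):
    hx, hy = height(x), height(y)
    if hx is None or hy is None:
        return None   # this is where a purely syntactic system stops
    return hy - hx    # positive when y is taller than x

print(height_difference("eiffel tower", "sears tower"))  # 387
```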
there's a real danger, and an increasing danger, in this kind of brittleness, namely, as programs get more and more power in the real world, you're giving power to what are in effect idiot savants. you wouldn't really take your child to a physician who was an idiot savant, who didn't understand that automobiles can't get measles, and so on. and similarly, every 20 years or so, there's a surge of sort of a media frenzy about forthcoming home robots. you may remember, 20 some years ago, it was nolan bushnell and androbot. but now, partly because of the roomba and so on, you see lots of articles about the impending home robots that will mind the baby and mow the lawn. and the trouble, of course, is that they'll just as blithely mow the baby, because they don't know. they don't care. they don't have common sense.
programs have the veneer of intelligence at most, not true intelligence. and sometimes when they have the veneer of intelligence, they're even more dangerous than when they don't. so to give you an example of what i mean by the veneer of intelligence, we could go back 40 years to the eliza or doctor program that joe weizenbaum did at mit, back when rogerian psychology was very popular. this is sort of reflection, so you say things like, i smoke. and it says things like, tell me more about the fact that you smoke. so one of my favorite examples was where we said, my dog's mother died recently. and the program said, tell me more about your mother. now the less you know about computer science, the deeper the psychiatric insight really appears to be. mostly, what's going on here is that the program simply doesn't know the word dog. so it sort of hears blah, blah, blah, mother. and then it says, tell me more about your mother. this is a lot like the gary larson far side cartoon that you may have seen, where this guy is talking to his dog, and it's what he says and what the dog actually hears. and for those of you who are dog people, i will show you the rarely seen cat version of this, of what you say to the cat and what the cat hears.
so basically, fast forward 40 years, and you say, well, surely in 40 years, we can do a lot better job than that. well, it turns out if you look at the turing test competitions that go on, they're still won by these annoying little chat bots like alice, where you say, what is the color of a blue car? and it gives you back some garbled eliza-like version of that. or you say, i'm going to ask you some simple questions, and it says, do you think your plan will succeed? or, where is sue's nose when sue's in her house? and it says, where it belongs. now this is actually not such a bad answer. but it then ruins it by going on to say, try searching the world wide web. anyway, obviously, if you go to encarta and ask that, you get some garbled-- well, actually, in this case, you get two hits, one on the history of automobiles, and then one, i guess because it doesn't understand capitalization and punctuation, on the central african republic. if you go to ask jeeves, you pretty much get the same results that you'd get from google, namely pages that happened to mention those words, without actually understanding the question. so the basic idea is, can we get the computer to understand semantically, not just syntactically? not just store information for portrayal, and depiction, and presentation to human beings, but actually understand the questions that are coming in, actually understand the material that it's displaying, and searching through, and indexing, to reason to decide what's relevant, and even better, to reason to decide how to arithmetically or logically combine information from two, or three, or five different sources to answer a query that isn't answerable on any one single page anywhere?
so ok, let's go about telling the computer all the sorts of things that you know about cars, and colors, and the eiffel tower, and heights of buildings, and movies, and so on. and there's actually a lot of stuff that people know. but still, we could write that stuff down and tell it to computers. so here's a couple sentences about kitchen appliances. now after you say this, does the system understand that microwaves and dishwashers are kitchen appliances? well, not really. i mean, it has those sentences. and if you ask it in just the right way, it'll tell you yes, it understands that. but really, for all intents and purposes, we could have typed this in, because it doesn't really know what the terms mean. and so you could say, well, we need to tell it more about each of these kinds of things. well, this first thing requires electricity, and the second one requires electricity and water. but remember, it doesn't really know english, so it doesn't know the meaning of those terms. so you have to tell it more about those things, like, buzqa is shipped to people's houses in liquid form through pipes, and so on. you keep on doing this again, and again, and again, keep explaining the meaning of these terms, and not just the terms, but also the relations, like requires. and slowly, after writing millions and millions of these assertions, it converges into a set of axioms that have only one model, namely the real world. and finally, when you've written enough, you can believe that the conclusions that would deductively come from all these assertions would be the same conclusions that you would believe about things in the real world, like pipes, and water, and liquids, and heights of buildings, and so on. so to bring this home to you guys, how do i think the results that you present could be more relevant if the search engine had some sort of understanding? i've actually included here, for old time's sake, some examples that were motivated by people in the audience, in some cases actually implemented by people in the audience, 5, 10, and in this case, 15 years ago. this was something that r.v. guha worked on when he was working on our project.
so here, someone is asking for pictures of someone smiling. and the cyc system was able to come up with a match on this particular captioned image, a man helping his daughter take her first step, where obviously none of these words are synonyms. so syntactic matching is not going to find this particular match for you. but on the other hand, as a human being, you understand things about the real world like: when you become happy, you smile. and when somebody you love accomplishes something, it makes you happy. and taking the first step is an accomplishment. and parents love their children. so if you believe these things, then it's a fairly short deductive proof to decide that this image is likely to be relevant to this particular query, i.e., that this image probably depicts someone who is smiling. and so cyc has these pieces of knowledge, represented in some machine-manipulable, formal language, basically predicate calculus form. and so we're talking about a three or four-step proof to decide that this query and this caption actually unify. so if you have these pieces of knowledge, finding this match is trivial. if you don't have these pieces of knowledge, finding this match is impossible. it's not like you could add another 15,000 servers or let your algorithm run another five seconds, and it would find this match. it'll never find this match without these pieces of knowledge.
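here's a minimal sketch of that short proof as a toy forward-chainer; the predicate names are invented for the example, and cyc's real representation and inference are far richer than this:

```python
# roughly the four pieces of knowledge from the talk: parents love their
# children; a first step is an accomplishment; when someone you love
# accomplishes something, you become happy; and happy people smile.
facts = {("parentOf", "man", "girl"),
         ("performs", "girl", "firstStep"),
         ("isAccomplishment", "firstStep")}

rules = [
    # (premise patterns, conclusion); "?" marks a variable slot
    ([("parentOf", "?x", "?y")], ("loves", "?x", "?y")),
    ([("loves", "?x", "?y"), ("performs", "?y", "?e"),
      ("isAccomplishment", "?e")], ("happy", "?x")),
    ([("happy", "?x")], ("smiling", "?x")),
]

def match(pattern, fact, env):
    """unify one pattern against one fact, extending the bindings."""
    if len(pattern) != len(fact):
        return None
    env = dict(env)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if env.get(p, f) != f:
                return None
            env[p] = f
        elif p != f:
            return None
    return env

def chain(facts, rules):
    """naive forward chaining to a fixed point."""
    derived, changed = set(facts), True
    while changed:
        changed = False
        for premises, conclusion in rules:
            envs = [{}]
            for pat in premises:
                envs = [e2 for e in envs for f in derived
                        if (e2 := match(pat, f, e)) is not None]
            for env in envs:
                new = tuple(env.get(t, t) for t in conclusion)
                if new not in derived:
                    derived.add(new)
                    changed = True
    return derived

# the query "pictures of someone smiling" now unifies with the caption:
print(("smiling", "man") in chain(facts, rules))  # True
```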
here's another example from the r.v. guha days, where the query was for pictures of strong and adventurous people. and the entire document here is a caption of just half a dozen words, a man climbing a rock face. and you would have to know things like: when you're doing rock climbing, you have to repeatedly lift your own body weight, so you have to be at least moderately strong. and if you put yourself at risk of dying like this, then you have to be at least moderately adventurous. and again, we're not talking about beating kasparov at chess doing 37-deep reasoning. this is a trivial kind of search, if you have those pieces of knowledge and if the computer is able to mechanically manipulate those to do deductions. there's nothing special about image retrieval. one recent example, from a project we're working on for the government, was an analyst's query for government buildings damaged in terrorist events in beirut during the 1990s. and the actual document talked about a 1993 pipe bombing of france's embassy.
so you have to know things like: embassies are government buildings. and 1993 is during the 1990s. and if there was a pipe bombing, then probably it was a terrorist event, and so on. so again, knowing a few things about the real world, you can answer this query. sometimes you need domain-dependent knowledge as well. to answer this one, you have to know things like, sa-7's are capable of shooting down low-flying aircraft, and so on. but still, you get the basic idea. we're talking about relatively short, relatively simple searches, if you have the pieces of knowledge. so this is a little thank you to guha for pushing us in this direction. not just finding information, but also consistency checking, and in some cases guessing at missing pieces of information, can be done this way.
so here, you can think of this as an excel spreadsheet or a relational database of employee information. and in the second row, we see things like: well, this person looks like they were hired before they were born. and they listed themself as their own emergency contact. and the person that listed them as their significant other is different from the person they listed as their significant other, and so on. and so this doesn't violate the data types of this particular data structure. it doesn't violate the constraints, let's say, of the spreadsheet. but it violates common sense. it violates your knowledge about the everyday world, and human attention should be called to this. and you could say, well, why isn't it the responsibility of whoever put this together? when they were putting this schema together, why didn't they preconceive all the different constraints? well, in reality, there aren't 8 columns, there are 80,000 columns. and they're not spread over one single table. they're spread over hundreds or even thousands of tables. and the people who put those databases together had no idea of the existence of each other. and it's your ability to read the column or relation headings and understand what they mean in human terms, in common sense terms, that enables you to decide that some of these things are contradictory with each other.
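as a minimal sketch, with invented field names and values, here's the kind of common sense check that no schema-level data type will catch:

```python
# the checks below come from knowledge about the everyday world
# (people are hired after they're born; significant-other should be
# symmetric), not from anything the schema itself declares.
from datetime import date

employees = {
    "pat": {"born": date(1980, 5, 1), "hired": date(1975, 3, 1),
            "emergency_contact": "pat", "significant_other": "lee"},
    "lee": {"born": date(1978, 1, 1), "hired": date(2001, 6, 1),
            "emergency_contact": "pat", "significant_other": "sam"},
}

def violations(name, rec):
    out = []
    if rec["hired"] < rec["born"]:
        out.append(f"{name} was hired before they were born")
    if rec["emergency_contact"] == name:
        out.append(f"{name} is their own emergency contact")
    other = rec["significant_other"]
    if employees.get(other, {}).get("significant_other") not in (None, name):
        out.append(f"{name} and {other} disagree about significant others")
    return out

for name, rec in employees.items():
    print(violations(name, rec))
```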
so how can our programs be intelligent, rather than just having the veneer of intelligence? and the answer is by having, and by being able to apply, not just store and display, this large corpus of knowledge, spanning pretty much everything from domain-dependent knowledge to what you would call the common sense that ogg the caveman had, like: if you've got some open container of liquid, and you turn it upside down, the stuff's going to fall out. so basically, i could go through lots of different examples from lots of different sub-fields. but partly because of peter and some of the other folks in the audience here, i chose some poignant examples from natural language understanding to drive home the point of why you really need to have this kind of knowledge if you want to semantically understand, not just store and display, this kind of information. a lot of these examples are actually 30 years old, from a class i took from terry winograd at stanford back in the early '70s. but basically, here's one where you know that the first kind of pen is probably a writing implement, and the second one isn't.
but what tells you that? is it the definition of these words? is it linguistic theory? no, it's your knowledge of how big they are, and where they usually are, and stuff like that. or here: the police watched the demonstrators because they feared violence. or, the police watched the demonstrators because they advocated violence. one of the they's is going to be the police. one of the they's is going to be the demonstrators. but how do you know which is which? the referent of that pronoun is basically determined in your mind by your model of police and demonstrators, and what they do in the context, and so on. mary and sue are sisters. if i say this to you, you probably assume i mean that they're each other's sisters, not that they each just have a sibling and they're not related to each other. it would be cruel and misleading if that's what i meant. on the other hand, if i say mary and sue are mothers, it would never cross your mind for a second that they are each other's mothers. why is that? well, it's because of your knowledge of biological reproduction. it's not the english language or linguistic theory that tells you this. or: every american has a mother; obviously, not the same mother. but in an almost identical sentence, every american has a president, and it is more or less the same president. john saw his brother skiing on tv. the fool didn't have a coat on. who's the fool? presumably, the person skiing. but it's your knowledge of skiing, and climate, and weather, and how televisions work that enables you to determine that. if i'd said, the fool didn't recognize him, now the fool would be john, the person watching television. again, your knowledge of the real world helps you to disambiguate this reference. i can go on and give you lots more examples.
but just one final example: almost every burns and allen routine is built around this kind of misunderstanding. so here's one where george is saying, "my aunt is in the hospital. i went to see her today, and i took her flowers." and gracie says, "that's terrible. you should have brought her flowers." and that's because there are a lot of words, like took, and sanction, and table, that actually reverse their meaning depending on the context in which they're used. and we rely on our knowledge of the world in order to tell us what is actually being intended by the speaker or the author. so when i talk about a large corpus of knowledge, what is this knowledge? we're talking about facts, and rules of thumb, and so on. but we have to represent this in some fashion that the machine can manipulate.
and so by using logic, by using predicate calculus as our representation, computers can do deductive reasoning, and incidentally inductive and abductive reasoning as well, themselves, on that represented knowledge. and because the sentences are composed of words, the full list of words or terms that we use in our language is something we refer to as the ontology. and because the grammar and syntax are formally regulated, we can refer to this as essentially a formal ontology.
so when people talk about a formal ontology, this is pretty much all they're really talking about: a restricted set of terms and a restricted set of grammar rules for how you can compose sentences out of this, and hopefully, restricted enough that the machine can logically create valid deductions out of all this. it's useful to organize the terms in your ontology in a kind of hierarchy or taxonomy. that gives you the power of generalization, the power of inheritance. so you can say things about vehicles, or trucks, or whatever, and some particular truck down here will inherit all that information, things like: this truck is probably driven by a trained adult human being, and probably can't control its altitude. and that kind of taxonomy can also help you to correctly place information. so if you say water vehicles slow down in bad weather, you look around at the neighboring parts of the ontology, and you say, well, that really applies not just to surface water vehicles; it really applies to surface vehicles in general. so even though you didn't have trucks in mind when you wrote that rule, it now applies to trucks, and in particular, to this truck over here, and the system will understand that that truck will probably have to slow down in bad weather. so you sort of get the idea of using the ontology to help the system figure out, and to help the human building the system figure out, where knowledge should be attached.
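here's a minimal sketch of that inheritance, with an invented toy hierarchy (cyc's actual genls lattice allows multiple parents and much more, which this toy skips):

```python
# child -> parent links; "genls" is cyc's name for this generalization
# relation, but the hierarchy and rule text here are illustrative.
genls = {
    "this truck": "truck",     # an instance, treated as a leaf for the sketch
    "truck": "surface vehicle",
    "boat": "surface water vehicle",
    "surface water vehicle": "surface vehicle",
    "surface vehicle": "vehicle",
}

# each assertion is attached at the most general node it's true of
rules_at = {
    "vehicle": ["is probably driven by a trained adult human"],
    "surface vehicle": ["slows down in bad weather"],
}

def inherited_rules(node):
    """walk up the taxonomy, collecting everything the node inherits."""
    out = []
    while node is not None:
        out += rules_at.get(node, [])
        node = genls.get(node)
    return out

print(inherited_rules("this truck"))
# ['slows down in bad weather', 'is probably driven by a trained adult human']
```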
when i say that we want to represent knowledge in logic, obviously, you know what i mean by representing knowledge in english. these are ways of representing, in a simple predicate calculus notation, that socrates is a man, or that men are mortal. these are just two alternate ways, at different levels of verbosity, of saying that men are mortal. and you can go on and write more and more complicated expressions to represent things like, everybody has a mother who's a female of their species, and so on. often what we do in a case like this, if we see the same kind of form occurring again, and again, and again, is we introduce a new predicate, in this case relationAllExists, so that what used to be a complicated-looking rule is now a ground atomic formula, in this case a simple ternary assertion in our language.
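to illustrate, here's a sketch of the two forms side by side; the cycl-flavored strings are approximations, and the lookup is toy bookkeeping rather than cyc's inference:

```python
# the same "everybody has a mother" knowledge, stated two ways: as a
# quantified rule, and as the single ternary ground atomic formula that
# relationAllExists collapses it to.
quantified_rule = ("(forAll ?X (implies (isa ?X Animal) "
                   "(thereExists ?M (and (mother ?X ?M) "
                   "(isa ?M FemaleAnimal)))))")

ground_assertions = [
    ("relationAllExists", "mother", "Animal", "FemaleAnimal"),
    ("relationAllExists", "biologicalFather", "Animal", "MaleAnimal"),
]

def required_range(relation, typ, store):
    """what must every instance of typ be related to, via relation?"""
    for tag, rel, domain, rng in store:
        if tag == "relationAllExists" and (rel, domain) == (relation, typ):
            return rng
    return None

print(required_range("mother", "Animal", ground_assertions))  # FemaleAnimal
```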
so slowly, the number of relations, the number of predicates, has increased to about 16,000 in cyc. and that number is slowly increasing, especially as we add new domains. but we increase that number kicking and screaming, not because we want to be able to say that we have a large number of different relations. and what do i mean by, the system can produce deductions? i mean that, just like you would expect a human being in this case to conclude that socrates is mortal, given that socrates is a man and men are mortal, if we ask our system, is socrates mortal? it should and will come back and say yes. and if you ask for the justification, it'll give you exactly the two-step justification that you would expect here. if we had our live demo running, i would show you some cute examples.
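a minimal sketch of that two-step deduction with its recorded justification; the isa naming is cyc-flavored, but the code is a toy, not cyc's engine:

```python
# one ground fact plus one type-level rule, and a prover that returns
# the justification chain rather than a bare yes/no.
facts = {("isa", "socrates", "man")}
rules = {("man", "mortal")}   # "men are mortal" as a type implication

def prove_isa(x, target):
    for (pred, subj, typ) in facts:
        if pred == "isa" and subj == x:
            if typ == target:
                return [f"{x} is a {typ} (given)"]
            if (typ, target) in rules:
                return [f"{x} is a {typ} (given)",
                        f"every {typ} is {target} (given), "
                        f"so {x} is {target}"]
    return None

print(prove_isa("socrates", "mortal"))
# ['socrates is a man (given)',
#  'every man is mortal (given), so socrates is mortal']
```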
we have about 50,000 common sense tests that we try to run every night, just to try to make sure that the system keeps its consistency. one of them is something like, can a can can-can? and if you ask that, it will say no. and if you ask why, it will say: because cans are inanimate objects, and doing the can-can generally requires an at least partially mental doer as the motive force behind the action, and so on. and similarly-- oh, good, we can actually show one of the ways that this would actually be asked, and one of the forms in which the argument will come out. here you see the justification in predicate calculus. you can press a button, and the system can generate mediocre english, understandable but by no means english we're proud of, to translate these predicate calculus assertions into english assertions like: inanimate objects can't be the doers of partially mental events. and cans are inanimate objects. and can-can dancing is at least a partially mental event, and so on. so you get the idea. and in terms of more complicated examples, we have something that we call the analyst's knowledge base, which intelligence analysts use to answer questions like, were there any attacks on targets of symbolic value to muslims since 1987 on christian holy days, or things like that? and i think i'll skip through some of that, but you can get the idea. here's an example where the analyst is asking, who had a motive for the assassination of rafic hariri? and they type short phrases in that get recognized well enough that novices who aren't familiar with cyc and its ontology, or ai and predicate calculus, can still get their question formed in a way that both they and the system believe it understands. and then when you ask the question, you get various answers, some of which are surprising, like, in this case, the us and israel being behind hariri's assassination. and if you ask for the sources there, it turns out, well, this is actually some editorial that appeared on al jazeera. and obviously, if you want, you can click over to the original source for that one. if you look at a more traditional western answer, like syria was behind hariri's assassination, you can ask for the justification there. and basically, you get something which says, well, syria opposed lebanese economic reform, and we think hariri advocated lebanese economic reform. it's in blue because the system isn't sure about this.
this is the kind of abductive reasoning. and if you ask whether or not this is true, the system will generate a kind of augmented query handed to google. and you'll find a set of articles, in this case 19 hits, all 19 of which are actually perfectly adequate for answering the question, was rafic hariri an advocate of lebanese economic reform? to take an even simpler example: if all you want are articles about his assassination, and you essentially type that into google, putting in different forms of the word assassinate and assassination, you get some large number of hits. but there are a fair number of false positives and negatives in the results that come back. to see some examples of false negatives, we basically had cyc use knowledge that it had about his assassination, like the fact that it occurred as a car bombing while he was traveling in a motorcade. so you put in some of those terms, and you actually get thousands more hits than you got before. and so there are really large numbers of false negatives that were simply missed by the previous query, because they happen not to use the word assassinate or assassination. similarly, to see some of the false positives: cyc knows when the assassination occurred. and in particular, it knows enough about causality to know that articles that came out years before the assassination are probably not about the assassination. in this case, it's a statement hariri is making about assassination many years before he, himself, was assassinated. this was a hit that was among those returned earlier. anyway, i can go on, but in this particular audience, i'm loath to talk too much about removing false positive and false negative errors, because probably everyone in this audience knows a little bit more, or a lot more, than i
do about that subject. by the way, i want to thank joel truher for pushing us in this direction several years ago. that's partly also how we met michael witbrock, who's with us now as our vp of research. joel came to us when he was at hotbot, and actually came to us with this really cool idea of taking ambiguous queries like this one. you can return the hits right away; in this case, about 26,000 hits got returned. but there's a mixture of hits about veterans, and veterinarians, and other things involving motorcycle race veterans, and so on, because you just asked the query as-is. and if the user happens to click on one of these, like the user happens to click on military veteran, then go ahead and augment the query, and ask it, in this case, to [unintelligible] with or's and and-not terms, to basically, hopefully, eliminate the unwanted veterinarian hits. and indeed, in this case, you get hundreds of thousands, not tens of thousands, of hits. and they're all about veterans. and similarly, if you had clicked on this one, then you get the symmetric augmented query. and again, you go from tens of thousands of hits to hundreds of thousands of hits. and they'd all be about veterinarians. and the other idea that joel had, which was a good idea, was to use the understanding of the query to suggest plausible follow-up queries. so you wouldn't suggest follow-up queries like, how do i train to become a veteran, and so on. so you basically get the idea there.
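a minimal sketch of that augmentation step; the sense-specific term lists are invented stand-ins for what cyc knows about each concept:

```python
# once the user clicks a sense, keep or-terms from that sense and
# and-not terms from the competing senses.
senses = {
    "military veteran": ["armed forces", "discharge", "va benefits"],
    "veterinarian": ["animal clinic", "dvm", "pet"],
}

def augment(query, chosen):
    keep = senses[chosen]
    drop = [t for s, terms in senses.items() if s != chosen for t in terms]
    ors = "(" + " OR ".join(f'"{t}"' for t in keep) + ")"
    nots = " ".join(f'-"{t}"' for t in drop)
    return f"{query} {ors} {nots}"

print(augment("vets", "military veteran"))
# vets ("armed forces" OR "discharge" OR "va benefits")
#      -"animal clinic" -"dvm" -"pet"
```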
now some of the queries that i showed you, like this one, basically require not just common sense knowledge, but up-to-date, database-type knowledge, the kind of knowledge you might get by visiting a website, in this case a theater listing website, or the google maps website, or the imdb database website, and so on. so how do we get that knowledge accessed via the cyc system as well? so this is really another kind of application of cyc: to access structured semantic knowledge in databases and websites out there online. here's an example in which someone was asking how different in age qusay and uday hussein were. and for the sake of argument, let's suppose that one structured source contains information about one brother, and one structured source contains information about the other brother. and obviously, using arithmetic and common sense, you as a human being could put these pieces of information together. and cyc, because it knows things like, objects age one year per year, can also answer this question, and basically, in the case of this question, come up with two years as the answer. but more than that, it can come up with 1966 over here, add it to this database, and put as the source this number 30 over here. and it can put the number 32 here, and put as its justification this number 1964 over here.
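a minimal sketch of that combination and write-back, using the slide's numbers; the reference year and the field names are assumptions made for the example:

```python
AS_OF = 1996  # assumed reference year; the slide's numbers imply it

source_1 = {"uday": {"birth_year": 1964}}   # knows one brother's birth year
source_2 = {"qusay": {"age": 30}}           # knows the other brother's age

def birth_year(person):
    rec = source_1.get(person, {})
    if "birth_year" in rec:
        return rec["birth_year"]
    rec = source_2.get(person, {})
    if "age" in rec:
        return AS_OF - rec["age"]            # objects age one year per year
    return None

print(abs(birth_year("uday") - birth_year("qusay")))  # 2

# the write-back step: each source gains the field it was missing, with
# the derivation recorded; after that you can throw cyc away entirely.
source_2["qusay"]["birth_year"] = birth_year("qusay")             # 1966
source_1["uday"]["age"] = AS_OF - source_1["uday"]["birth_year"]  # 32
```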
then you can do something cool, which is to throw away cyc entirely. and now you have these augmented structured sources that contain information they didn't contain before. so they're a little bit more complete than they were before this process happened. here's an example that occurred more recently than the one i just showed you, in which analysts were asking cyc what cities were particularly vulnerable to anthrax attacks. and you have to know things like the number of suitable zoonotic hosts residing near each large city in the us. and if you're not careful, you add things like the number of chickens and the number of pullets, and you get a wrong number: if you don't know that pullets are a kind of chicken, then you accidentally add them together, that sort of thing. by the way, in case you wonder, the lucky winner today is phoenix. and it's basically because phoenix is warm enough. and it has enough people. and it has some astonishingly large number of animals living near it, and some horribly small number of hospital beds per resident, and so on. so that makes it a particularly good target for anthrax attacks. and if you ask why philadelphia is unsuitable, it's because philadelphia was too cold on the day we ran this, and so on. it's worth mentioning that there is no one correct monolithic ontology. a lot of times, people mistake the cyc effort as claiming that there's a single correct monolithic set of knowledge to tell the system about. that's really not the case. cyc's axioms are divided into a vast number of locally consistent contexts, or what guha called microtheories.
and you can think of differentattributes like time, things true at one time and false atanother, things true at one level of granularity andfalse at another. so you end up with apparently,superficially contradictory statements or things believed byone group and not believed by another, like who killedrafic hariri and so on. if you didn't allow for thiskind of local consistency but global inconsistency, you'dquickly never be able to accommodate something asinherently inconsistent as the
human mind, let alone humanity'sworld wide web. there is a single correctmonolithic reasoning mechanism, namelytheorem proving. but in fact, it's so deadly slowthat really if we ever fall back on our theoremprover, we're doing something wrong. by now, we have over 1,000specialized reasoning modules. and almost all the time when cycis doing reasoning, it's running one or another of these
particular specialized modules. for instance, tva, which istransitiveviaarg, that was used in the "can a can can-can?"is used for rapidly answering questions thatinvolved transitive relations, and graph searching ontransitive graphs of relations, and so on. it's also worth mentioning thatalmost everything in the system is true by default,not absolutely true. you can later learned thingsthat will cause you to
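a minimal sketch of what such a module buys you: a transitive question becomes plain graph reachability instead of general theorem proving. the relation instances here are a toy:

```python
# genls is transitive, so "is a seagull an animal?" is just reachability
# in the genls graph.
genls = {
    "seagull": {"bird"},
    "bird": {"animal"},
    "moose": {"mammal"},
    "mammal": {"animal"},
}

def reaches(rel, start, goal):
    """depth-first search over a transitive relation's graph."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(rel.get(node, ()))
    return False

print(reaches(genls, "seagull", "animal"))  # True
print(reaches(genls, "seagull", "mammal"))  # False
```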
it's also worth mentioning that almost everything in the system is true by default, not absolutely true. you can later learn things that will cause you to disbelieve something that you used to believe after all. so we reason by argumentation. we gather up pro and con reasons, why we believe or don't believe something, and let different meta-level heuristics, if necessary, decide whether the system should believe something or not. there are also cases where an analyst or a typical user will want the pro and con reasons: in this case, who was behind a certain event, or whether bill clinton was a good president, or something. there is no single right answer. there are pro and con arguments in each case. a lot of times, people ask me things like, how many predicates, and concepts, and assertions are there altogether in the knowledge base? and so i have a slide like this to forestall those questions. but really, this is a red herring, and you shouldn't
really care about these numbers. to give you an example of why you shouldn't: a small number of what we call siblingDisjoint assertions in the knowledge base take the place of billions of class-level disjointness assertions, and really hundreds of trillions of instance-level non-membership assertions. so if you have a question like, is any seagull also a moose? now cyc should and can answer this question. by the way, the answer is no. and if cyc knows, let's say, 10,000 kinds of animals, that means there are about 100 million questions like this it ought to be able to answer. so option one is, we could add 100 million assertions to the system. then we could change this number to 103 million. it would look really impressive. but we're not here to look impressive. we're here to do as much as we can with the smallest number of axioms; what peano did, having five axioms for arithmetic, is really impressive. so option two is, we could basically add 50 million disjointWith assertions and one single assertion that says disjointWith is symmetric. a better option is to add 10,000 linnaean biological taxonomy assertions and one single siblingDisjointness assertion, which basically says: if you've got any two taxons, and you don't know that one is a specialization of the other, assume they're disjoint. so if you don't already know from those 10,000 assertions that seagulls and moose, one is a specialization of the other, just assume that they're disjoint taxons. so that's a really good rule. and with those 10,001 assertions and rules, you can answer the same 100 million queries. or, depending how you look at it (was bullwinkle a seagull?), you can answer hundreds of trillions of queries.
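here's a minimal sketch of that trick, with a toy taxonomy standing in for the 10,000 linnaean assertions:

```python
# a tiny taxonomy plus the single default rule: taxons are disjoint
# unless one specializes the other.
genls = {"seagull": "bird", "bird": "animal",
         "moose": "mammal", "mammal": "animal"}

def ancestors(t):
    out = set()
    while t in genls:
        t = genls[t]
        out.add(t)
    return out

def specializes(a, b):
    return a == b or b in ancestors(a)

def disjoint(a, b):
    # the one siblingDisjointness assertion, applied as a default
    return not (specializes(a, b) or specializes(b, a))

print(disjoint("seagull", "moose"))  # True: no seagull is a moose
print(disjoint("seagull", "bird"))   # False: seagulls are birds
```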
ok, so you get the basic idea. i don't have time to go into detail of what's in the knowledge base. but just to give you the rough flavor of a few things that are in there: we have dozens of ways of talking about the way that something that exists in time relates to something else that exists in time, like, starts after the start of. and using those kinds of relations, you can tell the system things and get the kind of deductive answers you'd expect, like: if sharon was in jerusalem pretty much all of 2005, and condi rice was there for 10 days during february of 2005, then yes, they must have been in the same city for at least a few days during that month of that year. and other pieces of knowledge would tell you that people with their respective positions would surely meet, even if there was no news story to that effect.
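a minimal sketch of the interval arithmetic behind that answer; rice's exact dates are assumed for the example:

```python
# "in the same city during that month" reduces to interval intersection.
from datetime import date

def overlap_days(a_start, a_end, b_start, b_end):
    start = max(a_start, b_start)
    end = min(a_end, b_end)
    return max(0, (end - start).days + 1)

sharon_in_jerusalem = (date(2005, 1, 1), date(2005, 12, 31))
rice_in_jerusalem = (date(2005, 2, 10), date(2005, 2, 19))  # assumed dates

print(overlap_days(*sharon_in_jerusalem, *rice_in_jerusalem))  # 10
```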
lots of senses of physical containment: so you want to be able to answer questions like, is the sonoran desert part of the sum of california, arizona, and mexico? actually, the answer there is yes, and so on. dozens of senses of physical containment, and if you don't distinguish these dozens of meanings of in, if you just use the word in, because in english, we just use the word in, then you will get some questions wrong that you otherwise would get right. even things like whether something is nailed into the wall or screwed into the wall: you get the answer wrong about what will happen if i pull this off the wall. so slowly, we had to add metaphysical distinctions that are not captured linguistically. of course, you can express them in phrases and sentences in english, but they happen not to be captured in a single english word, or a single japanese or chinese word. but still, they've turned out to be useful. and so that's why the number of predicates, even in our system, is fairly large. over 10,000 types of events, ranging from things like giving somebody something, to pumping fluid, to thinking, and so on. 400 ways of relating a participant in an event to that event, like something that's created during an event, or somebody who did the event, or something like that. lots of ways of talking about emotions, and contradictory emotions, and what led to various emotions, and what the impact of having an emotion is, and so on. lots of propositional attitudes, like knowing, dreading, believing, desiring, perceiving, and so on. all of these are modal. they go beyond first order logic. and so again, kicking and screaming, we had to extend our representation language to second order, and then eventually nth-order, predicate calculus. because otherwise, there are just lots of things that you can't express that you need to express, because human beings deal with them, like: israel wants egypt to believe that the united states would never dot, dot, dot. you have to be able to communicate and represent those. and if you can't, then your language is only going to represent a fraction of what human beings know and communicate with each other. and there are thousands of kinds of devices of various kinds, and device predicates, and so on.
so basically, the question is, how are we going to build this? we started in 1984. originally, my work in the '70s dealt with things like machine learning. the trouble with machine learning is, well, one of the good properties seemed to be that learning occurred at the fringe of what you already knew. so you learn some new things similar to what you know already, and here are the differences. so you could learn things that were one step away from what you already knew. so the more you know, the more rapidly you can learn. but unfortunately, if you're way over here on the x-axis, you're way over here on the y-axis. and a lot of our learning programs were there 40 years ago. a lot of our learning programs are still there today. they don't know much. they can't learn much. to the extent that they appear to learn, they're largely either doing statistical parameter fitting, which of course is extremely useful, but limited in terms of what you can learn; or they're discharging potential energy that was stored in them by their creators. and since i wrote a lot of those learning programs, i'll say unconsciously stored in them by their creators: potential energy in the form of a judicious representation to use, a perfect set of training data to give the system, a perfect choice of what variables to pay attention to, and so on. if kepler had had this little table of nine, or i guess in his day five, pairs of numbers, he would have come up with kepler's law in an afternoon rather than in a lifetime. so you get the appearance of learning without really deeply learning, if you try this approach. and over and over again, people who've tried to learn from scratch, to get programs to evolve and so on, have run into this problem. you're able to get parameterized learning of what you already know. but it's very hard to get the system to take off unless it already knows an enormous amount about the world.
so we have to prime the pump. so then we thought, well, we can get the system to understand english, to understand and process language. then we could just read all the online material. even in the '80s, we believed that something like the web would be coming, and there would be massive amounts of online material to read. but if you remember all those examples i gave you about why natural language understanding was so hard: basically, you have to already have a lot of common sense in order to benefit from reading natural language, except in isolated ways, which we'll see in a little bit. so the sad realization we came to in the early '80s was that to get the knowledge primed, to actually build enough of this in to prime the pump, we would have to manually add pieces of information one after another to the system, until we got enough in there that we could get natural language understanding, until we could get automatic learning to take place. so the calculation we did on the back of an envelope-- actually, minsky insisted on an actual envelope, so he could do the calculation on the back of it-- was that on the order of a person-millennium of effort is what it would take. and just about this time, admiral bobby inman came to see me. i was a professor at stanford. and he basically said, look, you've got like half a dozen graduate students here. you do the math. if you're really serious about this, you could work for 200 years and maybe get this done. or you could move to the wilds of austin, texas, have 50 people or 100 people work on this, and live to see the end of it. so it was a close decision, but i ended up deciding to move to texas. and to make a long story short, that's basically what we did. so we spent 10 years at mcc getting this pump primed. we've spent the last 12 years as a separate spin-out company called cycorp continuing to do that. and to make a long story short, after a couple decades of working on it, we got close enough to this crossover point that nowadays, most of the activity that we do in our company is not this manual, monks-in-cloisters scribing on illuminated manuscripts to add the 3,000,007th piece of information, but rather learning, by automatically extracting information from the web, and extracting it in many cases from natural language on the web. to give you an example of how we do that-- and this is stuff that was motivated by peter norvig and some other folks.
and again, you probably know the history of all this better than i do. but basically, every time you have an organization, for instance abu sayyaf, there are 100 things you want to know about it. who are its leaders? where is its headquarters? when was it founded? and so on. for each of those, we have various ways of generating some english sentence fragments that would basically be a way of saying, in this case, when the organization was founded. so you simply hand this to google, and you get your answer. in this case, you get something that says, in the early 1990s. we have a way of representing the early 1990s. some other source might actually have the date. various sources might have conflicting dates, and so on. so not all the information you get this way is reliable. in fact, only about 50% of it is reliable. here's another example, for the height of the eiffel tower, where basically you have various ways of fishing for this. and in case you wonder why there's only 50% reliability: why does it say the height of the eiffel tower is 36 feet? it's basically because if you go there, the very first hit that google gives you says the height of the eiffel tower is 36 feet. now it continues on after that. but still, if you just read the first part of the sentence, you get the wrong answer there. so the crossed-out line basically means cyc using knowledge of monuments and towers to know that 36 feet is probably the wrong answer, and hundreds of feet is probably the right answer.
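a minimal sketch of that fishing loop, with a stubbed-out search call standing in for google or altavista and an invented plausibility range for the class-level sanity check:

```python
# render a predicate as english templates, pretend-search, and reject
# candidate answers that violate what's already known about the class.
templates = ["the height of {x} is", "{x} is {y} tall", "{x} rises {y}"]

def fake_search(query):
    # stand-in for a real search api call; returns candidate numbers
    return [36, 986, 1063]

def plausible_height_ft(n):
    # class-level knowledge: monuments/towers run hundreds of feet
    return 100 <= n <= 2000

def fish_height(x):
    candidates = set()
    for t in templates:
        candidates.update(fake_search(t.format(x=x, y="")))
    return sorted(c for c in candidates if plausible_height_ft(c))

print(fish_height("the eiffel tower"))  # [986, 1063]; 36 is rejected
```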
here's another case where there's not a single marital status; there's a small number, like half a dozen different marital statuses. so for each one of those marital statuses, you do what we just said. and in the case of seeing if he's married, you generate various things like this. and you find out, yes, something talks about his wife, and so he's probably married, and so on. so we did some experiments about five years ago: could we actually populate cyc's knowledge base by doing this kind of fishing on the web? and we basically found that the answer was, by and large, yes. for various kinds of predicates, you could get fairly high rates of success. so remember, what we're talking about is translating from cyc's formal predicate calculus language to various english forms, handing those english forms to search engines (sometimes we use altavista, because it allows us to put in even longer queries, character-wise, than google does), and then, based on the results from that, translating those back into predicate calculus. and we're able to get hundreds of ground atomic formulas, ones that don't involve variables, per hour that way, which is pretty exciting. in case you wonder why it's, again, not 100% accurate on things like hats worn on heads: it turns out, if you look through the top, whatever it is, 10 or 20 hits, about half of these aren't hat on head. they're hat on something else, like hat on nose, or hat on legs, or something. and that actually goes back to one of peter's examples, about water flowing downhill. it turns out most people in the real world know that water flows downhill. so if you look on the web, an awful lot of the expressions on the web are water flowing uphill, used for metaphorical effect. the web is written for people who already know that water flows downhill. and it's confusing, or stupid, or bizarre to actually say that in writing.
and so if you're not careful, you end up getting that sort of 50% hit rate, 50% error rate. so what are we going to do about that? what we decided we would do is take the cycl that got produced, generate alternate paraphrases in english, generate different ways of saying the same things in english, negate half of those, hand that to novices, and tell the novices, these web volunteers, that they're playing a matching game very similar to the esp game that cmu has come out with for captioning images. and if you go to our website, you can actually play the game. and afterwards, i can show you a live demo of it. i did bring a few powerpoint slides in case something went wrong. so i'll show you a few on powerpoint slides. but basically, the people who play the game are told things like, the act of clenching one's fist expresses frustration. and they can agree with that, or not agree with that, or whatever. and if enough people agree with it, then the system believes that. and half the people were told that it expressed something else, just to make sure that we're not putting in stuff that people just say yes to all the time. and sometimes, there are order of magnitude questions like this: what's the rough order of magnitude of size of most liquid products? atom-sized, or well-sized, or shoebox-sized, or something like that? and once we know a whole bunch of things which are shoebox-sized, then pair-wise, we can ask volunteers which of them is bigger than which other, and so on. so you sort of get the basic idea.
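a minimal sketch of that acceptance logic; the threshold and vote tallies are invented for the example:

```python
# promote a statement to a belief only if agreement clears a threshold
# and its negated control version doesn't, which screens out volunteers
# who just say yes to everything.
def accept(statement_votes, control_votes, threshold=0.8):
    """statement_votes / control_votes are (yes, total) tallies."""
    s_yes, s_total = statement_votes
    c_yes, c_total = control_votes
    agree = s_yes / s_total
    control_agree = c_yes / c_total
    return agree >= threshold and control_agree <= 1 - threshold

# "clenching one's fist expresses frustration":
# 9/10 agreed, while only 1/10 agreed with the negated control
print(accept((9, 10), (1, 10)))  # True
```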
because cyc has no actual taboos, it occasionally asks you questions that are, like, embarrassing. but you win some, you lose some. this is actually a reasonable question, even if it's a little bit embarrassing. so you get the basic idea. so there's a kind of apparent paradox between what needs to be shared, if you're really going to have this semantic understanding, and the fact that there is no one correct ontology. so this is actually the beginning of the summary, so i'll try to wrap up in the next five minutes or so. so what needs to be shared? over the course of the last five decades, people have slowly moved down this list. a lot of the semantic web people still believe that something like sharing xml bags of keywords or xml terms is going to be enough. the trouble is that 12 different sites will differ on what the meaning of an employee is, or what the meaning of a company vehicle is, or what the meaning of a holiday is. and so if you're not careful, you have the appearance of understanding without real understanding. if all you're doing is trying to find relevant pages, that's not so much of an impediment. but if you're trying to answer arithmetic or logical questions by combining information, then small errors magnify as you combine the information to actually get the answer for the user. so you really need to share content. and you need to share not just the meaning of the terms, but the context in which various things were said. who believed this? when was it true? at what level of granularity was it true? and so on. so if you're not careful, if you just look at something like rdf, you have a handful of relations; even with something like daml+oil or owl, you have tens of relations. what we've found is that you need tens of thousands of different relations to really capture the nuances that will keep you from making those sorts of brittleness errors. you could think of this as the analog of, why do we have more than 5 or 50 words in english? basically, because if you try to limit yourself to that small a vocabulary, there's going to be an awful lot of misunderstanding among human beings.
when i say there is no correct ontology, i mean things like: are poinsettias red flowers? well, it turns out they're not really flowers at all. but if your spouse asks you to pick up those red flowers that he or she likes, and you come home, and you don't have them, and you say, i didn't pick them up because what you like are poinsettias, and poinsettias aren't flowers, that's not a good thing to do. it's not a good strategy. so there is an ontology of survival in everyday marriage, for instance, in which you're damn right poinsettias are red flowers. there is an ontology in which apes are monkeys, and an ontology in which monkeys are apes, and so on. so basically, this is where contexts come in, where in one context, one generalization relationship holds, and in another context, the converse or no relationship holds. so you really need to divide your knowledge base up into locally consistent contexts, much the same way that the earth is locally flat. even though you know that it's globally round and spherical, you act as though it were flat. and that's ok, because it is locally flat. in much the same way, our inference engine acts as though our knowledge base were consistent. and that's ok, because it's locally consistent. i'm going to skip this issue, but basically, there's no correct knowledge base: facts are believed at one time, or by one person. even things like, if it's raining, you should carry an umbrella, which you might think is pretty uncontroversial: well, that's really only true if we're talking about human beings after the invention of the umbrella, and not if you're about to go swimming, and not if you're someone who is basically dying of thirst, and things like that.
so each assertion has to be put in the proper context. and by now, we've identified about a dozen different facets, or attributes, or dimensions of context space, or microtheory space. i won't go into them here, but i have a long article about this if any of you are interested. and there are various calculi for the system automatically deciding things like: if you've got a piece of pennsylvania and a piece of 1985, is the statement still true? well, yes, in this case: thornburgh is still governor, and reagan is still president. but if we had said something like, there are 900,000 doctors in the us, it's not true that there are 900,000 doctors in lehigh county in february of 1985. so you have to be able to know when you can and can't draw these sorts of conclusions. just because i'm talking from 1:00 to 2:00 doesn't mean that i'm talking at any particular second. well, it may seem that way, but not at every single second during this hour. so there's a sort of complicated question of: if p is true in one context, and p implies q is true in another context, in what context can you validly infer q? and that turns out to be a very complicated question. and we're slowly making progress on the system automatically being able to answer that.
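a minimal sketch of that kind of lifting decision; the per-predicate persistence flags and toy part-of chains below are invented to give the flavor, and cyc's real lifting rules are far subtler:

```python
# some assertions survive restriction to a spatio-temporal sub-context
# (governorships), and some don't (counts over a whole country).
part_of = {"lehigh county": "pennsylvania", "pennsylvania": "us",
           "february 1985": "1985"}

def within(a, b):
    while a != b and a in part_of:
        a = part_of[a]
    return a == b

persists_in_subcontexts = {"governorOf": True, "doctorCountOf": False}

assertions = {
    ("governorOf", "pennsylvania", "thornburgh"): ("pennsylvania", "1985"),
    ("doctorCountOf", "us", 900000): ("us", "1985"),
}

def holds_in(assertion, region, time):
    a_region, a_time = assertions[assertion]
    if not (within(region, a_region) and within(time, a_time)):
        return False                      # outside the asserted context
    if (region, time) == (a_region, a_time):
        return True                       # exactly the asserted context
    return persists_in_subcontexts[assertion[0]]

print(holds_in(("governorOf", "pennsylvania", "thornburgh"),
               "lehigh county", "february 1985"))   # True
print(holds_in(("doctorCountOf", "us", 900000),
               "lehigh county", "february 1985"))   # False
```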
and so how do people harness our system? they extend the ontology. they add new vocabulary terms. they add new assertions, in many cases using those new terms. in very rare cases, they have to add new reasoning mechanisms to the set of 1,000 heuristic-level mechanisms that we've got. and at another level, what people are doing is making use of our ontology, which we've made available for free, even for commercial use, for anyone who wants to use it. they're also using the entire knowledge base of the several million assertions involving those terms, which we've made available for free for r&d purposes for anyone who is interested. so if any of you are interested, we encourage you to make use of researchcyc in your r&d projects. and if you have the time and want to play that factory game as one of our volunteers, you're more than welcome to do that. so opencyc contains about a million assertions, even though it mostly contains just the hundreds of thousands of concepts; the simple taxonomic assertions it contains alone are on the order of about a million. researchcyc pretty much contains the other couple million assertions.
and we have a moderate number of users: even though we haven't been advertising this in a big way, we already have 100,000 people using opencyc in various ways, and almost 100 different groups around the world who are using researchcyc for various purposes. so, in summary: i showed you some examples of questions that are just sort of heartbreaking, in the sense that google can almost, but not quite, answer them. the final arithmetic or logical step still has to be done by a human being. and we could break that bottleneck semantically if we could get the system to even partially understand the queries and the content. and you can do that by priming the pump, getting enough knowledge in there that the automatic mechanisms that you guys are all interested in could use it as grist, could use it as a starting base, to rule out statistically implausible and semantically implausible conclusions gotten by the learning process. so we pretty much have primed that pump over the last 22 years. and we're now at the point where we're focusing on that kind of learning and knowledge acquisition. we look forward to working with any of you who are interested, to help accelerate this process, so that we can achieve sergey's goal of general ai by 2020. [applause] douglas lenat: so i'll take a few questions now, if folks have any.
and those of you who want to see some of the stuff live on a very small screen, come on up and-- actually, we can try again to get it to project, because, using the washing machine repairman heuristic, this time it will work. yes? audience: have you tried data mining children's programs, like sesame street, where they don't typically talk about [inaudible]? douglas lenat: yes. so our original plan in 1984 involved an awful lot of human subjects work with young children, and so on, looking at children's books, looking at the-- i'm trying to remember what it's called. not why we learn. it's "why it's true," or something, a series of books for kids, and so on. and basically, what we found, to our chagrin, was that reading children's stories and talking to children was, in many ways, linguistically just as complicated as reading adult stories. and there are all sorts of additional complications, like, for no reason that we could tell, in children's stories it's ok for animals to talk with no explanation, but it's not ok for animals to fly with no explanation. so it's like, what the hell? so it basically became more complicated. and if you look, they're just very, very metaphor-laden. in fact, if anything, children's books, and children's science books, and so on are even more laden and riddled with metaphors and analogies, to try to reach kids, than college textbooks on the same subject, and so on. so somewhat grumblingly, we were forced to sort of cut back on that kind of work. however, one of the things we do is to constantly ask our people to come up with these common sense tests. remember, i said we have this large library of common sense tests that we're constantly asking the system, as a way of measuring progress. and a lot of those common sense tests are of the form: something that you noticed your kid saying the other day, that caused you to realize your kid knew something; then see if cyc knows that thing. so we definitely are interested in what kids know. it's just that much of the structured content, like sesame street and so on, is something that it's going to take cyc plus additional knowledge to really make effective use of. female speaker: can you repeat questions [inaudible]? douglas lenat: well, you can infer the question from that answer. but i will repeat the question next time. other questions? audience: you didn't talk about prediction in these artificially intelligent systems. there's a lot of
slides that point toward [unintelligible phrase] things like that. what about, say, a question of when will the virus [unintelligible] once the virus mutates [unintelligible] from all the research? douglas lenat: yeah. that's a good question. at one level, i basically want to tell you that in order to do a good job of that, we would have to do a vastly better job of integrating probabilistic or uncertain reasoning with the kind of logical inference that we do. so having said that-- by the way, we keep waiting for someone to do that. so five years ago, we were waiting for daphne koller to do that. we're still waiting, and we're still following folks who are doing that. michael, you're just waiting to tell us-- michael witbrock: we have some research projects we were working on [inaudible] douglas lenat: that was the next thing i was going to mention, which is, having said that we can't do that: yes, under the covers, slightly, we're starting to work on little projects to try and do some of that. and in some cases, we're able to come up with fairly provocative scenarios abductively, of essentially terrorist threats that the government should be watching out for, not so much working out the third or fifth decimal digit of probability, or even the first decimal digit of probability, but still coming up with things that are likely enough that it's worth the human worrying about this or that particular scenario. so we're just at the stage of grappling with this issue. and i look forward to the day when all of the issues that we're grappling with are sort of at this level, rather than the kind of slowly pulling the system from idiot savant to enough of a kind of artificial, but still ignorant, human being that it makes sense to try to get that next level of education done. michael witbrock: so that's the [unintelligible] things you can do-- douglas lenat: michael, do you want to come and use the microphone? this is michael witbrock, who's our vp of research. michael witbrock: so there's [unintelligible]
and things that you can do with purely deductive reasoning for predictions. so for example, cyc contains a large number of event types, so things like what goes on in a kidnapping. and one thing that we've got-- in fact, several projects, both in the past and ongoing at the moment with the government-- is trying to do that sort of event recognition. so if you've got some indication that you've got some sort of event going on, you can then take the roles which are instantiated in the event which has happened so far and use those to predict what's likely to happen in the future. and that is something that you can do usefully with purely deductive reasoning. you can do even better at it, we think, if you're able to do probabilistic reasoning, especially with respect to recognizing what event types are probably going on.
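a minimal sketch of that idea in python, assuming a script-like event template with named roles-- the event type, role names, and data below are invented for illustration, and cyc's actual event representation and deductive machinery are far richer:

```python
# hypothetical role-based event prediction: match the steps observed
# so far against an event-type script, carry the role bindings
# forward, and read the remaining steps off as predictions.

# an event type as an ordered script of steps, each mentioning roles.
KIDNAPPING_SCRIPT = [
    ("surveillance",    ["perpetrator", "victim"]),
    ("abduction",       ["perpetrator", "victim", "location"]),
    ("ransom demand",   ["perpetrator", "victim family"]),
    ("release or harm", ["victim"]),
]

def predict_remaining(script, observed_steps, bindings):
    """given the steps seen so far and the roles they instantiated,
    return the expected remaining steps with known bindings filled in."""
    predictions = []
    for step, roles in script[len(observed_steps):]:
        filled = {role: bindings.get(role, "<unknown>") for role in roles}
        predictions.append((step, filled))
    return predictions

# we have recognized surveillance and an abduction, binding two roles:
observed = ["surveillance", "abduction"]
bindings = {"perpetrator": "group-x", "victim": "person-y"}

for step, filled in predict_remaining(KIDNAPPING_SCRIPT, observed, bindings):
    print(step, filled)
```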
douglas lenat: one other example of that is: if you have a comprehensive knowledge base, you can do reliable statistics based on that. so one of the largest contracts we actually have gotten was from the department of defense, to build up a large terrorism database-- a large terrorism knowledge base in cyc. and once it's complete-- you can already do this, but not as accurately-- you can ask questions like: in cases in the last 15 years when hamas has abducted someone and ended up killing them, what was the number of hours or days between the abduction and the killing? and if you have a complete database, it's not rocket science to answer a question like that. so based on that, you can begin to make quantitative predictions about the future and so on.
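a hedged illustration of that kind of statistical question in python-- given a complete, structured event knowledge base, the interval question reduces to a simple aggregation. the records and dates below are fabricated sample data, not real incidents:

```python
from datetime import date
from statistics import mean

# (abduction_date, killing_date) pairs -- fabricated sample data
incidents = [
    (date(1996, 3, 1),  date(1996, 3, 4)),
    (date(1999, 7, 10), date(1999, 7, 11)),
    (date(2003, 5, 2),  date(2003, 5, 9)),
]

# interval in days between the two events in each case
intervals = [(killed - abducted).days for abducted, killed in incidents]
print(f"mean interval: {mean(intervals):.1f} days, "
      f"range {min(intervals)}-{max(intervals)} days")
```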
audience: --sort of evil purposes. so can google, but it seems like yours is a little bit more directed, right? because i mean, if i wanted to go to google and say, tell me a city that has lots of birds and few hospital beds, it'll be a lot of work; with yours, it seems like it would be a lot easier. and i'm not sure if there's a good answer to that question. the second question, which you can also address, is i noticed that you had some sort of product that you're pitching for security, called like secure cyc, or open secure cyc, or something like that. and i wonder if you could tell us about that. douglas lenat: ok, so i don't have a great answer, although i have an answer, for the can-this-be-used-for-evil question. it's a kind of radar gun, radar detector situation, where of course, the technology can be used for ill as well as for good. if you look back to electricity or almost any power source, the same thing can be said for them.
in fact, the first commercial application of electricity, the contract that was fought over bitterly by both westinghouse and edison, was for the electric chair. and when westinghouse won that contract, edison for many years tried to get the verb to westinghouse to mean to kill by electrocution. but yes, of course, any power source can be misused. but by and large, the us government is much better funded and much better informed than the terrorists. and i'd much rather see these tools in the hands of people working to safeguard this. because basically, yes, it may take the terrorist 15 minutes to answer the question of cities that are warm enough, and have enough animals, and so on using only google. but they're going to spend the 15 minutes. and so given that they're going to be getting these answers anyway, i don't think we should apply the kind of ostrich, head-in-the-sand approach of let's not develop
the technology because it could be used for ill. overall, if you think about what artificial intelligence could bring to the world, it's a kind of amplification of the human mind, in much the same way that physics and engineering have amplified our physical selves-- enabling us to do a lot more than our muscles can in terms of how far and how fast we can travel, and how far we can shout to another human being, and so on-- and in much the same way that biology and medicine amplify our physiological selves, so we live longer, less disease-ridden lives. so that kind of mental amplification would allow people to misunderstand each other less, by doing a better job at machine translation, for example, and real-time translation. it would enable people to be more creative, to search deeper, to search faster, to do more things in parallel. and if we as individuals are smarter, then i believe it's going to follow that we as a species are going to become smarter. and if you look back historically at when the last time was that our species got qualitatively smarter that way, you probably have to go all the way back to the creation of language. and just like we look at the pre-linguistic cave men and say, they weren't quite human, were they? i think in the distant future, people will look back at us at this very moment in time and say they weren't quite human either.
then in terms of your second question, as far as cycsecure: one of the commercial applications that we're trying to push-- it's not actually getting commercialized, but we'd like to see it commercialized-- is using cyc to come up with attack plans and defense plans that would work against, and defend, a particular network. so here's a plan that's a 30, or 40, or 50-step plan that would only idiosyncratically work on your network. and it only works because this person takes their lunch hour at a certain time, and these three machines are near each other physically, and so on. and maybe it involves real-world steps like calling in a false fire alarm and who knows what. so basically, use cyc's knowledge base and use just plain old ai planning to come up with attack plans and defense plans, using something like bugtraq, dod cert, and so on as the zeroth-level plans for what are the zeroth-level ways of causing problems or having vulnerabilities in
known commercial pieces of software out there, and having an ontology, which we have, of different kinds of attacks and different kinds of mischief and harm that people can cause to a website or to a company.
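a toy sketch of that planning idea in python-- known vulnerabilities become zeroth-level planning operators, and ordinary forward search chains them into a multi-step attack plan. the operators, facts, and goal below are invented; a real system would draw them from vulnerability feeds plus an ontology of attacks:

```python
from collections import deque

# name: (preconditions, effects) -- invented zeroth-level operators
OPERATORS = {
    "phish admin":      ({"admin reads email"}, {"have admin password"}),
    "log in as admin":  ({"have admin password"}, {"shell on server"}),
    "read customer db": ({"shell on server"}, {"have customer data"}),
}

def plan(initial, goal):
    """breadth-first search for an operator sequence reaching the goal."""
    frontier = deque([(frozenset(initial), [])])
    seen = {frozenset(initial)}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for name, (pre, eff) in OPERATORS.items():
            if pre <= state and (state | eff) not in seen:
                seen.add(state | eff)
                frontier.append((state | eff, steps + [name]))
    return None

print(plan({"admin reads email"}, {"have customer data"}))
# -> ['phish admin', 'log in as admin', 'read customer db']
```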
michael witbrock: so with respect to cycsecure-- douglas lenat: do you want to use the microphone? michael witbrock: --the reason that hasn't been commercialized is due to a lot of very silly, other-people's-startups-failing sorts of reasons. and we would very much like to, for example, have a good network [unintelligible] to try and push that forward again. so do contact either doug or me about that. and we can give you all sorts of information about cycsecure and talk more about how it might be useful. douglas lenat: yes, i didn't want to do any finger pointing, but basically, that's what happened: an under-capitalized company that tried to commercialize it failed. and now we're back to starting to commercialize it once again. audience: one way you might gain capital for that: given the whole plot where it depends on someone taking their lunch hour at a certain time, et cetera, have you thought about selling that to, say, the next mission impossible script? three more serious questions-- douglas lenat: we have actually had one x-files based
on our stuff. audience: ok. three questions if possible, some of which are related. what's the speed performance? douglas lenat: sorry, you're breaking up. no, that was a joke. in the case of the stuff we did for hotbot, we had to perform in something like 1/40 of a second. and that involved caching tables of information from the system so that we could do a good enough job in a small enough piece of time. and similarly, we have some other applications that we're working on right now that require that sort of sub-second response time. for some of the complicated queries that i showed you, typically, two or three-second response time is considered adequate. but we haven't ever really pushed too hard on that.
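a minimal sketch of that caching pattern in python-- precompute answers to the expensive, frequent queries offline, then serve lookups well inside the latency budget. the query names and the stand-in slow inference function are invented:

```python
import time

def slow_inference(query):
    """stand-in for an expensive kb inference call."""
    time.sleep(0.5)
    return f"answer({query})"

FREQUENT_QUERIES = ["q1", "q2", "q3"]

# offline: build the table once, when latency doesn't matter
cache = {q: slow_inference(q) for q in FREQUENT_QUERIES}

def answer(query):
    """online: cached queries return in microseconds; others fall back."""
    return cache[query] if query in cache else slow_inference(query)

start = time.perf_counter()
answer("q2")
print(f"served in {(time.perf_counter() - start) * 1000:.3f} ms")
```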
the good news from your point of view is that we've just got a contract from the us government specifically to speed up inference in large knowledge base systems. and so over the next year, we'll be having some workshops with researchers around the world, like andrei voronkov, geoff sutcliffe, and so on, to try to bring some of these techniques together and to harness some of their theorem provers to speed up, by at least one or two orders of magnitude, the way inference is getting done. a second thing to remember is that you really have to continue believing in moore's law at least for the next several years. so something that's a few times too slow right now won't be a few times too slow in the future. and all this is without parallelism. so if we had 17,000 machines working at once, i bet we could do a lot better than if we just had one machine, especially one little one-pound laptop working for
two seconds-- audience: no idea where you go to find that. douglas lenat: --but michael probably has a good answer to this. michael witbrock: so if you look at problems like those we're answering-- what terrorist events happened in this location, carried out by this organization, against that sort of target?-- those generally answer in somewhere between 10 and 100 milliseconds. there are some which take very much longer than that. there are some which take less time than that. i think across the kb content test, which is this large corpus of test queries, the average time to first answer for them is around a second at the moment. and it's going down. so it's slower than we would like it to be, especially since we like to be able to solve complicated questions
which require deep inference. but it's not unusable for many applications. douglas lenat: right. and i was about to say, there are a lot of applications where a delay of a second or two is perfectly acceptable, especially if you're getting a qualitatively better answer than you otherwise would. audience: the other two are, i guess, sort of related. i guess they sort of tie into reflection of meta information. how knowledgeable is cyc about knowing when it doesn't know the answer or can't come up with it? i guess it ties into how it's learning from reading the web and the like, but just in a general sense. but also, you mentioned the context of, say, pennsylvania 1995. how knowledgeable or aware of it would it be for dealing with a fairly dynamic situation where things are
constantly changing, like the position of the attacker is here? now one second later, the position of the attacker is here, et cetera. douglas lenat: a lot of our applications actually are exactly that sort of thing-- course-of-action analysis, and battlefield awareness, and things like that, where the situation is rapidly changing. and so we do have to represent dynamic situations, snapshots, and four-dimensional time-space worms of situations changing, and so on. in terms of cyc's knowledge of its own knowledge base, i would say it has excellent ability to represent all of that, and mediocre coverage in terms of what it actually does currently represent. but there's no reason why the system itself can't automatically learn a lot of that material on its own, by trying these questions systematically and recording
what kinds of questions it does and doesn't seem able to answer. michael witbrock: so in terms of driving usability-- we're talking about the answer time, that sometimes it takes a long time to answer questions even when the system doesn't know the answer. so one way that we are trying to improve usability with respect to that is: you can usually do some queries, of a type where cyc can answer very quickly, which allow it to work out whether it's likely to have an answer or not. so in our interfaces, we try to reflect the system's estimate of the likelihood that it would be able to answer a question of the sort that you're formulating, before you get to the end answer. and so we realized that this notion of knowing what you know, and knowing whether you can answer a question, is very important for the types of question answering that people do. this is a very important facility that human beings have. and we're trying to work out how to do that inside a logical system. and we've got some approaches to it, but we don't have a complete solution yet. and some of the common sense tests are things like: at this very moment that you're running the test, is george w. bush inhaling or exhaling? and of course, the right answer is, i don't know.
and any other answer is, in some sense, the wrong answer. so there are tests like that which basically depend on the right answer being: the system should know that it doesn't know enough to answer this question. and it should know that really, really quickly.
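a hedged sketch of that kind of fast answerability check in python-- before launching a deep, slow inference, run cheap tests: does the kb mention the query's terms at all, and is the question about a volatile fact the kb cannot possibly track? the kb contents and the volatility test below are invented for illustration:

```python
KB_FACTS = {("austin", "capital-of", "texas")}
KB_TERMS = {term for fact in KB_FACTS for term in fact}
VOLATILE = {"inhaling-or-exhaling", "current-position"}  # changes second to second

def answerability(query_terms):
    """quick verdict, computed before any expensive theorem proving."""
    if any(t in VOLATILE for t in query_terms):
        return "don't know (the fact changes faster than the kb is updated)"
    if not all(t in KB_TERMS for t in query_terms):
        return "unlikely (query mentions terms the kb has never seen)"
    return "worth attempting deep inference"

print(answerability(["austin", "capital-of"]))
print(answerability(["george-w-bush", "inhaling-or-exhaling"]))
```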
douglas lenat: let me take maybe one more question and then call it quits for this large assemblage. but as i said, i welcome you to come up and take a look at the system actually running. yes, please. audience: is there a web interface where you can do this, just type in some common sense question? douglas lenat: no. the question was, is there an interface where people can actually type in questions to the system? so we don't have something like that yet. but we're slowly inching in that direction. and i'd like to believe that in less than a year we'll have
some facility like that, at least in the form of structured questions-- sort of like i showed you with the cyc analyst knowledge base-- where you won't be able to have a blinking caret and type in anything you want, but you'll be able to have fragments of queries, and fill-in-the-blank fragments, and so on. and by pulling those together and filling in the blanks, there's a truly astronomical number of queries you will be able to ask, and then have the system go off and work on those queries.
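a small sketch of that fill-in-the-blank idea in python-- users compose queries from parameterized fragments rather than free text, so every completed template maps to a well-formed query, and the space of askable queries grows multiplicatively with the fragments. the template and slot values are illustrative only:

```python
from itertools import product

TEMPLATE = "which {event_type} events in {region} involved {actor_type}?"
SLOTS = {
    "event_type": ["kidnapping", "bombing"],
    "region": ["region-a", "region-b"],
    "actor_type": ["group-x", "group-y"],
}

# three small slots already give 2 * 2 * 2 = 8 well-formed queries;
# with hundreds of fragments the space becomes astronomically large.
for combo in product(*SLOTS.values()):
    print(TEMPLATE.format(**dict(zip(SLOTS.keys(), combo))))
```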
well, let me stop at this point and thank you all again. it's been great being here. and i hope to talk to some of you in the coming hour.