IBM’s double jeopardy

Tuesday, 8 February 2011 — 4:09am | Computing, Journalism, Science, Television

A few weeks ago, Colby Cosh—a friend of a friend of sorts who ordinarily writes reasonable things for a chap who still thinks the Edmonton Oilers are a real sports team—penned an article in his Maclean’s blog about Watson, IBM’s Jeopardy!-playing machine (“I’ll take ‘Cheap Publicity Stunts’ for $1000, Alex”, 16 January 2011), which I found dreadfully uninformed. The thrust of his argument is that Watson is a corporate “gimmick”—a fancy plea for media coverage by the faceless villains at IBM, with nothing of scientific interest going on underneath. Keep in mind that by the standards of this article, nothing in the “perpetually disappointing history of AI” will ever be interesting until we’ve graduated from tightly delimited objectives to Big Problems like the Turing Test:

Every article about Watson, IBM’s Jeopardy!-playing device, should really lead off with the sentence “It’s the year 2011, for God’s sake.” In the wondrous science-fiction future we occupy, even human brains have instant broadband access to a staggeringly comprehensive library of general knowledge. But the horrible natural-language skills of a computer, even one with an essentially unlimited store of facts, still compromise its function to the point of near-parity in a trivia competition against unassisted humans.

This isn’t far off from saying that particle physics will be perpetually disappointing until we’ve observed the Higgs boson, or that manned spaceflight is merely an expensive publicity stunt that will never be scientifically interesting until we’ve colonized the Moon: it leans heavily on popular culture as the ultimate barometer of scientific achievement, and it requires both ignorance of methodology and apathy towards specifics.

Colby and I had a five-minute skirmish about the article on Twitter, which as a format for debate is unwieldy as piss. I promised a proper response as soon as I cleared some other priorities off my plate. Those other priorities are still, to my annoyance, on my plate; but having finally paid good money to register my copy of MarsEdit, I’m thirsting for a scrap.

This topic will do as well as any. Reluctant as I am to swing the pretentious hammer of “I know what I’m talking about,” this really is (as the idiom goes) a chance for Faramir, Captain of Gondor, to show his quality. Computational linguistics happens to be my onetime research area, popular misunderstanding of science happens to be one of my favourite bugbears, and Kasparov’s anticomputer strategies against Deep Blue happened to make a cameo appearance in the meandering slop of my master’s dissertation. None of this matters a great deal, mind you. One should never be dismissive of journalists from a position of relative expertise; they’re the ones people actually read, and it’s vital to engage with what they say.

(It is a little game we play: they put it on the bill, I tear up the bill.)

When simplifications attack

What concerns me is not so much Colby’s perspective as a non-expert (invaluable), his resort to the familiar hand-waving sophistries of Dreyfus and Searle (expected), or even whether I should call him Colby when I don’t really know the fellow and haven’t gotten around to amending my unwritten style guide to arbitrate matters of semi-personal address (pedantic). The bigger problem, one that is endemic in journalism about science, is his exclusive reliance on popular simplifications by corporate PR, other journalists, and cherry-picked philosophers for pictures of what AI research is all about.

Surely it wouldn’t have hurt to consult a real computing scientist; there are plenty of those to choose from in the public sector with no vested interest in the fortunes of IBM. The only thing this would have jeopardized is a premeditated thesis founded on dismissive assertions about the entire field of research. Why talk to someone credible when they’re unlikely to agree with you?

Here, there are several bad assertions in play—all of which are traceable to the selective consultation of sources.

Let’s consider this one paragraph alone—the crux of Colby’s entire argument that nothing terribly fascinating is going on inside the box:

Jeopardy!, after all, doesn’t demand that much in the way of language interpretation. Watson has to, at most, interpret text questions of no more than 25 or 30 words—questions which, by design, have only a single answer. It handles puns and figures of speech impressively, for a computer. But it doesn’t do so in anything like the way humans do. IBM’s ads would have you believe the opposite, but it bears emphasizing that Watson is not “getting” the jokes and wordplay of the Jeopardy! writers. It’s using Bayesian math on the fly to pick out key nouns and phrases and pass them to a lookup table. If it sees “1564” and “Pisa”, it’s going to say “Galileo”.

Now let’s put some numbers beside the assertions:

  1. Jeopardy! is a trivia game, and all there is to trivia is looking up keywords. We know computers can do that.
  2. When Watson handles wordplay, it doesn’t do it like humans do. It isn’t really thinking; it doesn’t really understand the puns. Furthermore, this somehow matters.
  3. IBM would like us to believe that Watson really gets the jokes. If Watson doesn’t really get the jokes, the project is a hollow exercise in corporate self-promotion.

The first assertion vastly understates the complexity of what Jeopardy! demands. The nature of the game—a time-constrained, multi-agent affair—radically alters the straightforward problem of answering a question (or in this case, questioning an answer). Even simple pattern-matching is far from trivial when every millisecond counts.

Let’s run with Colby’s caricature for a moment. With a database of facts as gargantuan as the one Watson requires, looking up “1564” in conjunction with “Pisa” is a surprisingly time-consuming task, never mind the inference to Galileo’s date of birth. This isn’t something tractable via faster processors or larger memory banks: there are theoretical lower bounds on the efficiency of searching and sorting algorithms in proportion to the dataset’s size. Exhaustive traversals that perform perfectly on small scales are out of the question here. The algorithms have to take shortcuts and make approximate guesses. Semantic associations must be efficiently structured in the software’s abstract maps as well as the physical database in order to best distribute searches in parallel. When you consider these factors, drawing semantic inferences from the natural-language clues becomes a heuristic necessity if the approximate search queries are to be any good.
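
To make this concrete, here is a deliberately tiny Python sketch (mine, and nothing like DeepQA’s actual architecture) of what “picking out key nouns and passing them to a lookup table” already involves once you want ranked, confidence-weighted candidates rather than a single exact match; every term, weight, and name in it is invented for illustration.

```python
# Toy illustration only: a keyword-to-candidate index with heuristic scoring.
# Nothing here resembles DeepQA's real pipeline; every name and weight is
# invented for the example.
from collections import defaultdict

# A miniature "fact store": each key term points to candidate answers with a
# rough association weight (how strongly the term suggests the candidate).
INDEX = {
    "1564": {"Galileo Galilei": 0.6, "William Shakespeare": 0.6, "Michelangelo": 0.3},
    "Pisa": {"Galileo Galilei": 0.8, "Leaning Tower of Pisa": 0.9},
    "astronomer": {"Galileo Galilei": 0.7, "Johannes Kepler": 0.7},
}

def rank_candidates(key_terms):
    """Combine evidence from every extracted key term and rank the candidates.

    Even this toy scorer hints at why brute force is off the table: a real
    store holds hundreds of millions of associations, so candidates have to
    be reached through indexes and pruned aggressively, never enumerated.
    """
    scores = defaultdict(float)
    for term in key_terms:
        for candidate, weight in INDEX.get(term, {}).items():
            scores[candidate] += weight  # naive additive evidence combination
    total = sum(scores.values()) or 1.0
    # Normalize into a crude "confidence" so the buzz/no-buzz decision
    # downstream has something to threshold against.
    ranked = [(candidate, score / total) for candidate, score in scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    # "Born in Pisa in 1564, this astronomer..." reduced to key terms.
    print(rank_candidates(["1564", "Pisa", "astronomer"]))
```

Scale that toy index up to the hundreds of millions of associations a system like Watson carries, bound the response time by a human thumb on a buzzer, and naive additive scoring has to give way to indexed, parallelized, semantically informed search.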

Crucially, the time constraint on a response is not a static value, but a dynamic one that depends on the performance of the other competitors. This is why a match against the most successful Jeopardy! players in history is an essential proof of concept. Every contestant who appears on the television show has to pass a solo audition first, and any of them could tell you—particularly if they meet with little success—that in a competitive setting, the game becomes a different kettle of fish.

This is to say nothing of the other decisions Watson has to make in order to be competitive in a live test. It has to assess the risk of answering a question, considering not only its confidence in its own correctness but the standing scores of both itself and the other players. It has to set wagers for Daily Doubles and Final Jeopardy!, which requires an assessment of confidence based on the category title alone; in the case of a Daily Double, this will also have to consider the money still up for grabs on the board. One of the reasons Ken Jennings had such an astonishing run on the show was that he was able to make excellent strategic wagers on the fly.
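
For a flavour of the strategic layer alone, here is a hedged sketch of the kind of decision rules involved; the thresholds and the wagering arithmetic are my own inventions, not IBM’s.

```python
# A cartoon of the strategic layer only, not IBM's actual logic. The
# thresholds and wagering rules below are invented for illustration.

def should_buzz(confidence, clue_value, threshold=0.6):
    """Buzz only when confidence clears a threshold and the expected value of
    answering is positive: a wrong answer deducts the clue's value and hands
    the rebound to the opponents."""
    expected_value = confidence * clue_value - (1 - confidence) * clue_value
    return confidence >= threshold and expected_value > 0

def final_wager(my_score, best_opponent_score, confidence):
    """Choose a Final Jeopardy! wager from the standings and a rough confidence
    in the category alone (a vast simplification: real wagering strategy also
    models what the opponents are likely to bet)."""
    if my_score > best_opponent_score:
        # Cover bet: if we answer correctly, the runner-up cannot catch us
        # even by doubling their score.
        cover = 2 * best_opponent_score - my_score + 1
        return max(0, min(my_score, cover))
    # Trailing: wager enough to overtake the leader, and risk everything only
    # when confidence in the category is high.
    deficit = best_opponent_score - my_score
    return my_score if confidence >= 0.7 else min(my_score, deficit + 1)

print(should_buzz(confidence=0.85, clue_value=800))   # True: answer it
print(final_wager(my_score=15000, best_opponent_score=10000, confidence=0.4))
```

Even this cartoon makes the dependency plain: none of these decisions can be made sensibly without a calibrated confidence estimate attached to every candidate answer.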

Contrary to what Colby suggests, if the structured decomposition of the process of taking a Jeopardy! clue all the way from answer to question is able to match and surpass the blazing speed of human intuition at its best, that would be a tremendous accomplishment indeed. Without the capacity to parse natural language in terms of meaningful semantic chunks—a task well beyond mere symbol manipulation—Watson wouldn’t have a prayer of displaying a fraction of the competence that it has already shown.

Trapped in the Chinese Room

The second assertion is a real howler, and one that has become downright boring to swat aside over the course of the past thirty years. That’s right, folks: say hello to John Searle and the Chinese Room. The Chinese Room objection to AI is this: a computer that converses in Chinese is like an English speaker who knows no Chinese, but who sits in a room looking up symbol tables and matching the syntactic elements correctly to produce replies. Even if the replies look perfect to the outsider, argued Searle, you couldn’t say that the symbol-manipulating occupant of the room (i.e. the computer) understands Chinese.

In a general sense, the Chinese Room stands for a whole class of arguments that boil down to saying, it doesn’t matter how well the computer performs—it’s not really thinking because on the inside, it’s not processing information in the same way humans do. Colby makes an argument about Watson identical to the Chinese Room when he says that the system doesn’t “get” the jokes and puns in Jeopardy!’s more puzzling clues. Apparently, it doesn’t matter if Watson solves the clues correctly: it still isn’t behaving like a human inside the box, so the whole shebang is just smoke and mirrors.

The logic of the Chinese Room is spurious in many respects, and I won’t go through all of the embedded fallacies here. For those of you new to the debate, here are two of the more serious ones. The first is that the analogy is false. The appeal of the argument comes from how it personifies a particular component of the system to highlight its dissimilarity to real human understanding. This fallacy endures unchecked because its proponents are free to move the goalposts however they like: no matter how robust the system is, the critics can isolate a piece of the syntactic machinery, put a human face on it, and complain about the absence of high-level, humanlike semantics. The second fallacy lies in the deceptive assertion that the syntactic internals of a computer are completely unlike the internals of the human mind. In truth, we still know next to nothing about how the latter works. Our understanding of how we get from the low-level operations of neuroscience to the high-level processes of cognitive psychology is at least as discontinuous as our best notions of how semantic structures might emerge from the symbolic structures of computer systems.

I alluded to this in my initial salvo on Twitter:

Shockingly poor article by @colbycosh on Watson, IBM’s #Jeopardy AI. Apparently, Chinese Room fallacies never get old. http://t.co/VHzLzTX

To which Colby offered this astonishing reply:

@Nicholas_Tam It’s got nada to do with the Chinese room. The Turing test is the one most everyone agrees on & there’s NO progress toward it.

What do you want me to do? LEAVE? Then they'll keep being wrong!

Completely apart from the fact that one of Colby’s objections was precisely the Chinese Room, there’s a logical contradiction here along with a factual error. (Not bad, all in all, for 140 characters or less.) The contradiction arises from the failure to distinguish between external behaviours and internal thought processes. Let’s suppose, for a moment, that the goal for whichever AI system we’re talking about is to pass the Turing Test—that is, to be misidentified as the human being in a double-blind question-and-answer test where the questioner knows that one respondent is human and the other is a machine. If you read the original paper in Mind where Alan Turing introduced his “imitation game”, you’ll see that his whole point was to black-box the internals and take them out of the picture. The premise of the Turing Test is that if you can’t tell the difference between man and machine in terms of external behaviour, then functionally there may as well be no difference at all; this suffices as intelligence.

The Chinese Room argument, on the other hand, is a direct attack on the validity of the Turing Test. It seeks to establish that thoughts don’t supervene on actions: that is to say, identical external behaviours do not imply identical internal machinations.

Turing’s and Searle’s positions are more or less incommensurable. You can’t have it both ways. You can’t hold up the Turing Test (which is entirely about exterior performance) as the standard of achievement while griping, as Searle does, that even in a successful performance that passes for humanlike, symbol manipulation doesn’t really count. Contrariwise, Turing ventured that if a machine’s behaviour is indistinguishable from a human’s, it’s pointless to squabble over whether it qualifies as intelligent; from the available evidence, we might as well treat it as such.

If you accept the Chinese Room argument—and you really shouldn’t—the only function of bringing up the Turing Test at all is to set up a straw man. It has not escaped me that this may have been the intent.

Acting inside the box

Unfortunately for this transparent rhetorical tactic, the Turing Test is not the accepted benchmark for artificial intelligence research, nor is it even a commonly desired objective. AI is not one monolithic project that either has or hasn’t been achieved.

The goals of AI research have historically diversified along two separate axes (a schema for thinking about AI that most students of intelligent systems pick up from Russell and Norvig). The first key distinction is between acting (what a system does on the outside) and thinking (how a system gets there on the inside). The second distinction is between performing like humans and performing rationally or optimally (which may be entirely unlike humans, but may provide solutions to well-defined problems that outstrip the capacities of human agents).

This yields four quadrants that loosely circumscribe your garden-variety intelligent agents: systems that aim to think like humans, act like humans, think rationally, or act rationally. (Think of these categories more as design goals than as discrete kinds of agents, which in practice lie all over the map.) The first quadrant, systems that think like humans, is the area of interest for much of cognitive science. This is the type of system that the Chinese Room argument contends will in principle never succeed; Hubert Dreyfus’s objection, the thesis that human thought is fundamentally unformalizable, applies specifically to this category as well. The second quadrant, systems that act like humans, is the one where the Turing Test applies.

It must be said that the Turing Test is relevant here with specific reference to the indistinguishability of external behaviours—not to the requirement of aptitude in natural languages, as Colby seems to believe. Turing’s original imitation game was framed purely in terms of language, which remains an overwhelming challenge to this day, but it has since been expanded to other problem domains. (Jeopardy! is one of them.) To pluck out one example, natural language is hardly suitable as a test for computer vision, the branch of AI concerning how computers can perceive objects in photographs or positions in 3D space from the raw data of images. It would be preposterous to say that a robust system in computer vision fails as AI, or to marginalize its significance as a scientific accomplishment, simply because it can’t pass for a human on the telephone.

Natural language is a particular problem domain—indeed, an umbrella category for all sorts of subproblems that are fascinating in their own right. It is not the essence of the Turing Test, nor is there any consensus that linguistic aptitude is the essence of intelligence.

It’s convenient for our discussion, however, that Jeopardy! involves natural language to the extent that it does. It should attract comparisons to Turing’s imitation game, and it has. Yet it bears mentioning that whether a system is really thinking is a completely incidental consideration for the vast majority of practical work in AI, just as it was for Turing. Nobody says, “Let’s build a system that possesses general intelligence.” What they actually say is this: “Let’s identify a chunky, intuitive problem that demands high-level thought and see if we can’t build a system to break it down and tackle it.”

Watson’s aim is clear: perform well enough in Jeopardy! to defeat the best human players. Any consequences for our beliefs about the nature of human intelligence are a byproduct and not the ultimate goal. That said, it is perfectly valid to speak of a Jeopardy! Turing Test. Watson would clearly fail the test not if it fell short of champion-level play, but if it ventured solutions to clues that don’t even make sense as guesses. (Consider the early test at about 1:50 into this video. The clue, from the category on I Love Lucy: “It was Ricky’s signature tune and later the name of his club.” Watson: “What is song?”)

But if indistinguishability from human-level performance is what we are looking for, Watson is already doing fairly well. There is a very important difference between defeating humans in Jeopardy! and passing for a human player, although the goals are intertwined. There is an even wider gulf between passing for a human Jeopardy! player and passing for a human being in toto. Everybody knows the latter goal is as far off as colonizing Mars, and nowhere in the promotional materials does IBM suggest otherwise.

Colby has a problem with this:

So why, one might ask, are we still throwing computer power at such tightly delimited tasks, ones that lie many layers of complexity below what a human accomplishes in having a simple phone conversation?

And one might also ask, why study nuclear physics when we seem to be no closer to harnessing fusion power than we were fifty years ago? First of all, in both cases, we are substantially closer in terms of how we understand the problem, even if our estimates for when the endpoint will show up on the horizon haven’t necessarily shortened. The achievements that scientists think of as the most significant may not be fixtures in popular culture, but that doesn’t mean they were pointless. Far more importantly: computing science, like nuclear physics, is inherently interesting. Designing AI systems for delimited problem spaces is an activity that leads us to all sorts of discoveries about the nature and structure of those problems, and of the minutiae of problem-solving processes in general. We learn all sorts of things about comparative strategies for structuring, representing, and manipulating information—and how they measure up to the relatively black-boxed processes of human minds.

So to answer Colby’s question:

@Nicholas_Tam So we can’t test AI by scrutiny of interior process OR the curtained-black-box Turing test? What’s left, religious revelation?

We “test” AI in the context of its performance with respect to well-defined goals. Those goals could certainly involve a Turing Test, be it for answering natural-language questions or some other specified task. Whether an artificial system has a human-like mind of its own, along with everything that implies—consciousness, self-awareness, semantic understanding—is a problem we leave to the philosophers; and no, it’s not empirically testable. But neither is the problem of whether other humans have minds.

The inverted pyramid scheme

Now let us turn to the third assertion: that IBM is making outlandish promotional claims that oversell Watson in the name of fuelling a publicity blitz.

What does it mean to say that something is a “gimmick”? We mean to accuse it of being all dressing and no salad. We mean to expose its failure to accomplish what we are told it does on the surface. We mean to insist that we will not be duped into believing that something humdrum is, in truth, extraordinary.

The trouble for Colby’s argument is that Watson is extraordinary—just not in the way that he thinks IBM has misled him to expect. “AI researchers have arguably the highest conceivable standards to meet when it comes to thinking about thinking,” remarked one commenter at Maclean’s, “and it’s hard to fault them for failing to live up to the naive expectations of science fiction.” Colby replied: “By ‘the naive expectations of science fiction’ I presume you mean ‘the naive expectations deliberately created by IBM promotional materials and employees’.”

I received a similar response:

@Nicholas_Tam Maybe you should look at the IBM ads. Your claims for Watson are a LOT more modest than theirs.

At the time of our repartee, I was admittedly only familiar with IBM’s own materials in passing; most of what I knew about Watson was from sources that discussed it in greater detail. I found it odd that Colby’s point of engagement was exclusively with the advertising and not the technology itself, but this was understandable: he was making a statement about hype, after all, and it’s very common nowadays that the implications of scientific accomplishments are exaggerated in the public sphere. (Refer to Jorge Cham’s excellent illustration of the science news cycle, which concerns university research but applies equally well to corporate and governmental laboratories.)

By and large, this is a product of two sets of behaviour—one on the part of journalistic reporting, the other on the part of the research organizations. Let’s begin with the journalists.

The dominant template for journalistic narrative is the inverted pyramid: begin with the most important information, and continue to points that are less and less essential on the assumption that the reader could stop at any time. (Before the age of desktop publishing, this also made it easy for newspaper editors to literally snip away the last paragraph or two when assembling the columns on the page.) The trouble is the gulf between what journalists deem most relevant to non-expert readers and what scientists consider to be important contributions to their field.

The end result is sensationalism—and too many articles about science wind up looking like Martin Robbins’ parody. They begin with far-reaching implications that may or may not be related to the research at hand, and work their way down to the specifics that matter most. This is a narrative framework that is seriously divorced from the reality of research, which operates on the level of local challenges and goals. (This post by Greg Lusk on the inverted pyramid and the conflicting priorities of journalists and scientists is highly relevant here.)

Because long-term, big-picture implications like the performance gap between artificial and human intelligence (in Watson’s case) become the centrepiece of the story, they become the focus of media attention and debate, often with no consideration of the specifics of what has been accomplished. And this is why we see casual expressions of dissent like Colby Cosh’s criticisms of Watson: wildly off the mark, selectively researched from Wikipedia with an a priori verdict already in mind, and laced with a sprinkle of pseudo-expert mumbling about Bayesian combinatorics that are far more involved than the author makes them out to be. Criticisms like these respond to the news stories, not to the science.

Of greed and gimmickry

Colby is convinced, however, that his projected misunderstandings of what Watson claims to achieve are fundamentally IBM’s fault. And it’s no use pretending that IBM isn’t a self-interested organization: like NASA in their recent fiasco over arsenic-based lifeforms (a discredited paper, but one that was widely misreported when people still thought it looked shipshape), if people take their promotional materials and statements to the press the wrong way, they have no incentive to correct anyone so long as their project is still cast in a positive light. Watson is a proof of concept for IBM’s enterprise hardware and the DeepQA question-answering system, both of which the company intends to license and sell.

Not all of the problems with science journalism are the fault of journalists: research laboratories, public as well as private, are often complacent about inaccuracies in secondary reporting because the attention, and the concomitant prospects for funding, are too attractive to throw away.

Let’s be very clear about one thing, however: IBM’s profit motive as an organization does not negate the intellectual interests of its researchers. As fashionable as it is these days to appeal to the trope of corporations that are only responsible to their shareholders and therefore can’t be interested in anything but the bottom line, the truth is that corporate laboratories in private industry are invaluable centres of research. Projects like Watson attract contributions from university scientists not because they all want to see IBM succeed, and not even necessarily because the pay is so much better (though it is), but because they provide access to hardware that enables large-scale work. Computing scientists in industry are taken every bit as seriously as their compatriots in the university world, and the two regularly cooperate on grand initiatives.

But what does that say about the marketing? Complacency aside, is IBM actively making Watson sound like a much bigger deal than it is?

I have now combed through IBM’s promotional videos, articles, and FAQs, and I would like to retract my earlier concession that their claims may have gone too far. IBM’s statements about Watson are fair reflections of what AI can realistically achieve and what a successful performance by Watson will demonstrate. About the most outlandish thing they say—the one that treads the furthest into the minefield of the philosophy of AI—is that Watson performs well in Jeopardy! because it understands natural language. And strictly speaking, it does. The clues in Jeopardy! are undeniably in natural language, and differ from formal or heavily restricted sentences by a significant degree of complexity. About the only restriction on the clues is length. Discard the puns and puzzles and you still have challenging problems like binding indefinite pronouns to objects (or classes of objects) that fit.
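
To pick on that last example, here is a toy sketch of what binding an indefinite pronoun to a fitting class of objects amounts to computationally: guess the expected answer type from the clue’s phrasing, then keep only the candidates that fit. The patterns and the little type catalogue are my own stand-ins, not DeepQA’s.

```python
# Toy illustration of one such subproblem: binding a clue's indefinite pronoun
# ("this astronomer", "it", "he") to the class of objects that fits. The
# patterns and the type catalogue are invented; DeepQA's real type-coercion
# machinery is far more elaborate.
import re

TYPE_OF = {
    "Galileo Galilei": "astronomer",
    "Leaning Tower of Pisa": "structure",
    "Babalu": "song",
}

def lexical_answer_type(clue):
    """Guess what kind of thing the clue wants from phrasing like
    'this astronomer' or 'this city'; return None when no cue is found."""
    match = re.search(r"\bthis ([a-z]+)", clue.lower())
    return match.group(1) if match else None

def filter_by_type(candidates, clue):
    """Keep only candidates whose catalogued type fits the clue's expected
    type; if no expected type was detected, keep everything."""
    expected = lexical_answer_type(clue)
    if expected is None:
        return candidates
    return [c for c in candidates if TYPE_OF.get(c) == expected]

print(filter_by_type(["Galileo Galilei", "Leaning Tower of Pisa"],
                     "Born in Pisa in 1564, this astronomer had a famous spat with the Church"))
```

None of this requires the system to “get” anything in the phenomenological sense; it only requires representations rich enough to make the pronoun (or the pun) tractable.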

Whether Watson’s “understanding” of natural language is analogous to that of humans doesn’t figure into the discussion here. Nobody is saying that Watson actually has a conscious mind; AI researchers don’t think on those airy-fairy ontological terms when they are designing systems for specific tasks. They participate in the debates over the philosophy of artificial minds, yes, and they’re usually on the optimistic side, but everyone is aware of the separation between that conversation and the immediate challenge of defeating humans on a robust, open-domain answer-questioning game show.

We are not even remotely in Dreyfus territory. Still, I can understand why layperson readers might think we are when they read the story in The Globe and Mail and come across a juicy quotation like this:

“We can use computers to find documents with keywords, but the computers don’t know what those documents say,” Dr. Ferrucci says. “What if they did?”

People whose notions about AI come entirely from Battlestar Galactica could easily misread Ferrucci’s statement as referring to sentience or consciousness. But anybody who knows a thing or two about AI can read this and correctly interpret it to refer to semantic-level knowledge representation—concepts on a larger scale than string matching or keyword search. It’s entirely agnostic on the problem of whether artificial minds can exist. I’m not deliberately reading this as a modest apologist: this is actually what Ferrucci is obviously saying.
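
Here is a minimal sketch of the contrast I take Ferrucci to be drawing, with invented documents and relations standing in for the real thing: keyword search returns documents that happen to contain a string, while a semantic-level representation lets the system answer from the relations it has extracted.

```python
# A hedged sketch of the contrast I read Ferrucci as drawing. The documents,
# relations, and helper functions are invented for illustration.

DOCUMENTS = [
    "Galileo Galilei was born in Pisa in 1564.",
    "The Leaning Tower of Pisa began to tilt during construction.",
]

# The same content lifted into (subject, relation, object) triples: a crude
# stand-in for semantic-level knowledge representation.
TRIPLES = [
    ("Galileo Galilei", "born_in", "Pisa"),
    ("Galileo Galilei", "born_in_year", "1564"),
    ("Leaning Tower of Pisa", "located_in", "Pisa"),
]

def keyword_search(query):
    """String matching: returns whole documents, not answers."""
    return [doc for doc in DOCUMENTS if query.lower() in doc.lower()]

def who_was_born_in(place):
    """Relation-level lookup: returns the entities themselves."""
    return [subj for subj, rel, obj in TRIPLES if rel == "born_in" and obj == place]

print(keyword_search("Pisa"))    # two documents that merely mention "Pisa"
print(who_was_born_in("Pisa"))   # ['Galileo Galilei']
```

Neither function is “conscious” of anything; the difference lies entirely in what the representation lets you ask.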

If you get all your science from Hollywood and you think cloning has to do with developed bodies and selves rather than the raw data in your genes, it’s not the responsibility of geneticists to clarify their work for you every time they speak. Similarly, you can’t expect scientists and engineers in AI to explicitly backpedal from the philosophical question of conscious machines every time they talk about their work.

Or can you? What we desperately need is a greater public understanding of what scientists do, and what they mean when they use everyday words to talk about their fields. Readers dive into news stories about science with popular preconceptions that are often wrong, but nobody takes up the responsibility of correcting them until the discourse goes seriously awry. We’ve seen this before with how the hysteria over genetically modified foods or embryonic stem cell research obfuscated the real issues deserving of policy attention. There are even some dark corners of the world where creationists are wreaking havoc on schools because they still think evolution by natural selection is some kind of affront to their god.

Sooner or later, this will happen with AI: we’ll explore the possibility of delegating something big and very public to an autonomous system, and legitimate policy concerns will drown in a sea of hysteria about machines taking over the world. If scientifically knowledgeable people do not shoulder the burden of sober clarification, that role will fall to contrarian journalists who don’t really know what they’re talking about, but still take pleasure in posturing as the voice of reason in the room.

If you are going to take the position of someone who sees through the publicity and understands the underlying science, you have to understand the underlying science. No matter how bombastic IBM’s promotional claims are, or how submissively the media repeats the press releases with a dash of unchecked sensationalism on top, Watson is more than a “gimmick” if it’s computationally interesting—and by any informed and reasonable standard, it is. Watson is a nontrivial system, and Jeopardy! is a nontrivial pursuit.

8 rejoinders to “IBM’s double jeopardy”

  1. Stan

    Thanks for this thoughtful rebuttal to Cosh’s uninformed blog. There was so much wrong with his thinking I couldn’t figure out where to start, but you’ve done a nice job.

    I love the videos IBM has posted on YouTube about Watson. And in the interest of fairness they probably show Watson getting as many wrong as it gets right. (Search IBM Jeopardy on YouTube.)

    Tuesday, 8 February 2011 at 4:36am

  2. Note that Dreyfus has restricted his argument to GOFAI-style efforts and concedes that connectionist-style programs may produce actual AI.

    I’ll have more comments when it isn’t 3 AM.

    Tuesday, 8 February 2011 at 8:03am

  3. Okay, I’ve read your whole post carefully (though not what it links to), and I have a few comments:

    manned spaceflight is merely an expensive publicity stunt that will never be scientifically interesting until we’ve colonized the Moon

    That’s actually a fairly defensible position. Manned flights taught us a lot, but it’s not clear that we wouldn’t have learned about as much from unmanned sample-and-return missions. And I’d settle for colonizing Earth orbit.

    This isn’t something tractable via faster processors or larger memory banks: there are theoretical lower bounds on the efficiency of searching and sorting algorithms in proportion to the dataset’s size.

    It is, however, embarrassingly parallel. A DNA-based system suitably primed really might solve these problems lightning-fast because of its extreme hardware parallelism.

    mere symbol manipulation

    You and I know what that means, but “symbol” is a very equivocal term ranging from mere strings to enormous meme-complexes, and so dangerous in public conversations like this one.

    the Chinese Room

    Without saying more I’ll just register that while the Systems Reply is unquestionably correct, the Robot Reply may well have merit as well. It may just not be possible to pass the Turing test if you aren’t sensorily grounded in the world of medium-scale, medium-complexity objects as humans are.

    to think like humans

    Alas, we know bugger-all about how humans think, in the sense of what Freud called secondary-process thinking, ordinary conscious cognition. Bateson said that while we know quite a bit about primary-process thinking, we adopt the convention that we know all about secondary process and so don’t explore it scientifically. What is really the case is that we know a lot about how we think we think.

    expanded to other problem domains

    I think it’s a serious mistake to talk of “a Turing test for the vision domain”. Vision and conversation just aren’t equivalent, for this reason: there is a large subset of humans that inherently will do worse on even the most mundane visual-object recognition test than even the crudest AI program, namely the blind. But nobody says that blind people aren’t intelligent. On the other hand, when we are confronted with an adult who cannot converse sensibly, in detail, and for any reasonable amount of time about any culturally relevant domain in such a way as to convince a human being that they have a point of view, we do indeed say that they are defective in intelligence. There are obvious exceptions like stuttering, Broca’s aphasia, and locked-in syndrome, but I think the definition can be cleaned up to avoid them.

    Watson performs well in Jeopardy because it understands natural language

    Well, no, it doesn’t. The core activity for which natural language is employed is the kind of conversation I just mentioned. Everything else counts as one sort of language game or another, which you may or may not be able to perform well, or at all (some anglophones can’t speak Pig Latin, for example). But if you can’t converse, you don’t speak or understand a natural language. (Speaking includes signing, naturally.)

    Wednesday, 9 February 2011 at 2:29am

  4. Thanks for the extensive comments. I often find it very easy to trip up when writing on subjects like this; after some familiarity with the field, or any field, a whole bushel of assumptions about how to refer to certain things become second-nature to the point of impeding communication with others.

    I don’t think the effects of spaceflight on human physiology are something that unmanned missions would satisfyingly replicate. Broadly speaking, the preparations that we sometimes represent as ancillary steps towards an ultimate objective like colonization (self-sustaining ecologies, materials and propulsion engineering, psychological studies of humans in long-term isolation, etc.) are interesting in their own right, not just incidental byproducts.

    Regarding computer vision, what I was getting at is that thinking about AI in the context of the Turing Test is a severely limited notion of AI that cuts out many of the interesting problems. We wouldn’t say that the blind lack intelligence because they lack visual perception, but I would still classify the cognitive chunking of perceived sense data as an intelligent activity. (Sufficient but not necessary, so to speak; and I think much the same applies to speech and aphasic disorders.)

    It strikes me that the unquestioned precondition of the verbal Turing Test is that it posits, as the human subject, a normative human being. The specified language-game in play is (as you note) a particular constraint on what is being tested, just like the decision on whether audiovisual I/O figures into the problem domain. What do we do with detecting sarcasm, for instance, or body-language clues? We get into all sorts of trouble if we require those as necessary conditions of intelligence as soon as we consider the autism spectrum.

    I agree that perception (as an internal process) is a task that maps very poorly to comparisons of external output, so maybe it wasn’t the best example in the context of what I was saying. Ultimately, though, if ‘the Turing Test’ has developed a sense of generality over the years, it’s been with respect to what’s being tested, while fundamentally remaining a comparison of actions rather than processes.

    We absolutely know bugger-all about Freud’s secondary process among humans, and until that changes substantially, I don’t expect the debate over machine consciousness to get anywhere. While my knowledge of this area is fairly cursory, my reading of the “systems that think like humans” approach—modelling brain activity in neural nets and so forth—is that it isn’t too dissimilar from approaching natural language bottom-up from Chomskyan formal grammars: we can get to a good understanding of the structures that we’ve isolated and abstracted away as the relevant ones, but behaviour on the level of natural language (or natural cognition) is a domain so far up the hierarchy that it will only yield to a different, non-reductionistic methodology.

    I hadn’t considered DNA-based systems from the perspective of massive parallelism. That’s a fascinating way to look at it. What would be a good example of highly efficient human problem-solving that isn’t as parallel as search?

    Wednesday, 9 February 2011 at 3:30am

  5. effects of spaceflight on human physiology

    Quite so.

    severely limited notion of AI that cuts out many of the interesting problems

    I agree that it’s a severe limitation, but the natural reply is that non-Turing AI is something other than actual artificial intelligence. It’s some part of the study of artificial humanity, as is, say, making robots that walk like humans, just not intelligence.

    Wednesday, 9 February 2011 at 6:10am

  6. Ralph Dratman

    – – – – – – –
    “Whether an artificial system has a human-like mind of its own […] is not empirically testable. But neither is the problem of whether other humans have minds.”
    – – – – – – –

    I’d like to add that the question of whether even my own human self “has a mind” does not yield any obvious answer.

    In writing the previous sentence, I am not stating anything arcane, abstract or erudite. It’s just that, having made a slow, intense, line-by-line study of Wittgenstein’s Philosophical Investigations at the arguably too-tender age of 17, I have ever since found it impossible to take seriously statements such as “I know I have a mind, but I can’t be sure you do.”

    My difficulty with such utterances, post Wittgenstein, is that I can never figure out what such a sentence is trying to assert. The claim that some entity “has a mind” is not a falsifiable proposition — not even when reference is made to the close, personal voice in my head which is envisioned as being attached to a sort of undetectable entity called “my mind.”

    Poppycock. Intrinsically undetectable things are not things. The term “mind” in this context is nothing but a sort of linguistic placeholder, something like the word “it” in the sentence “it is raining.” The only significant difference between the two examples is that no one ever seems to argue over the existence of an “it” that is out there doing the raining.

    Finally, about the content-free tale of the “Chinese Room,” consider this: even the human brain doesn’t “understand” what it is doing, any more than my arm “understands” what it is lifting or my lips “understand” what I am saying. Only with respect to a functioning, interacting being — not an anatomical part of one — can we sensibly use the term, “understand.”

    Monday, 21 February 2011 at 8:42pm

  7. Hector

    What you say toward the end about making sure you know your science, if you’re writing about scientific questions, goes also for philosophical topics. And there’s a whole lot more to the breadth of philosophical critique of AI than Dreyfus and Searle. Very few in philosophy or in AI research are naïve enough to think that brains literally are computers in the sense that the digital machines common to life in the early 21st century are computers; brains are not electrical circuitry which emulates formal systems. Some of the tasks carried out by human neurocircuitry do resemble certain programming architectures, usually in isolated perceptual tasks, e.g., David Marr’s work on visual computation. Modular theories of mind and biological teleosemantics suggest that “the mind” understood in terms of the brain will break down into particular capacities with specific etiologies. The hardware arose by whichever course of development selection saw fit to lay down, and it is highly unlikely that this course in any way resembles the design of computers as we know them, even if the principles of computability are useful in explaining certain aspects of neural function.

    Much more than that, what troubles me is the prevailing focus on artificial intelligence rather than on artificial creativity. Intelligence is not the main thing standing in need of explanation because it basically comes down to problem-solving. There exist computers and programs that are very capable with problems for which they already store ready-made solutions or strategies. But humans survived because we are so adept at devising novel strategies to handle new kinds of problems and situations. And solving new problems requires creativity. The best work on “AC” that I know of is the work on analogical processing architectures like COPYCAT and METACAT, which is a bit old by now but still instructive. I really like the way Hofstadter and his FARG approach the question of creativity.

    Beyond all this though, beyond the reach of psychology and biology, there are far more substantial philosophical objections to be considered. Wittgenstein has already been mentioned, and his insights on clarity and thinking indirectly inspire some of the work on analogy I just cited. If AI researchers are chasing after “the mind” or “consciousness” then they’re after something that doesn’t exist. I agree with the author here that most actual researchers don’t set such lofty goals for themselves, and surely he’s correct that they shouldn’t. It’s much more practical, especially in the short term, to focus on tangible milestones. But to really get a grip on the meaty philosophical objections, researchers will eventually have to engage with the intersection of ontology, philosophy of computing, and epistemic phenomenology. The whole thrust of applied computability theory tends to take psychological representationalism for granted. That is, the target phenomenon (“thinking”) to be explicated computationally is conceived almost entirely in terms of how information gets represented. Digital devices do this so readily, and they are just arrangements of electrical activity standing in for mathematical steps. The activity encodes the calculation, and thus represents it. But there are whole traditions of philosophical discussion which reject this notion of representation as characteristic of lived being. Representation is a small subset of the texture of our experience; at best it’s a cognitive skill rather than a perceptual or phenomenological one. So to tie the entire effort to create artificial thought into modelling this one small aspect is bound to fail, no matter how “high-level” you take representation to be.

    Now there are voices of dissent and voices of unity, and I prefer the latter almost every time. Some philosophers propose naturalizing phenomenology, à la Francisco Varela’s work. Some propose constructing a modular theory of thought out of algorithmically-described perceptual input, like Tyler Burge. There are a few very broad lessons from 20th-century philosophy to keep in mind:

    (1) Our practical engagement with the environment. This governs everything from externalism and distributed cognition to Heidegger’s extremely useful distinction between Vorhandensein and Zuhandensein. Any characterization of our lived embodiment must find a way to refer to the shared interplay between the body and its broader world, in which our praxis is rooted. Carnap was one of the clearest writers on the way theoretical commitments, which we take to describe the laws which govern our experience, get their force from our practical commitments, which determine for us which theoretical descriptions can work. Practice is so much more basic than theory, and we simply have to be able to incorporate this insight into the practice of theory.

    (2) Related to the prior point, the world in which we live contributes to (and perhaps determines) the shape of the skills we need to live there, and in that sense the way humans have come to think is a product of the kinds of situations we’ve faced as a species. The structure of the environment (partly) shapes how we think. But a subset of that environment is perhaps equally important in shaping how we think, and that would be the social environment. The normativity of conceptual content, i.e. the fact that words and thoughts have shared public meanings, depends less on how the world looks than on how social creatures interact verbally. So finding a way to get machines to become genuinely social will be key.

    (3) Physical theories of information. Perhaps no concept stands in greater need of clarification in our time than the concept “information”. Physicists now use it to describe — and sometimes to explain — everything from entropy to entanglement. What ultimate role does it play in quantum theory? We already know that selection makes use of quantum effects in some animals (the visual systems of certain birds, if I recall). How does that play into what the brain does, if at all? If information is the actual “stuff” of reality, is it extended? If so, in what sense? If not, does this invert the original Cartesian criterion for a material thing? What might that imply for our concept of thinking things? If we’re going to put information on such central footing, we need to think about what kinds of explanations it can support.

    Sunday, 27 February 2011 at 6:24am