rubbish chat bots and beyond...


Much of the need for grounding in stronger AI is blindingly obvious from the comically weak AI on display in the Loebner competition transcripts. In some senses, the value of these efforts is as a warning to others about the limits of statistical and pattern-matching methods.

bot : It's all wonderful, How are they treating you?
judge : horribly
bot : Many countries still treat women horribly.
judge : like what countries                            
bot : Congress is not likely to represent other countries
(Loebner 2010)



The Loebner competition is a yearly event that allows the best of the chat bots to take their chances in a Turing Test.

Blay Whitby has got a proper bee in his bonnet about the Turing Test and goes to some length to discredit it. At least he did in his 1997 paper "Why The Turing Test is AI's Biggest Blind Alley".


In the paper there is an interesting claim: that a failed Turing Test has no marginal value as a guide to incrementally "more intelligent" behaviour:

"First, it is all or nothing: it gives no indication as to what a partial success might look like. Second, it gives no direct indications as to how success might be achieved." (Whitby 1997)


So he claims that there is no value in failing a Turing Test, but that claim seems without substance. When you fail a driving test, you don't start to learn to drive all over again. You try to learn from the experience and deal with the mistakes.

He also goes on to claim that the test provides no indications as to how success might be achieved. However, from just a brief look at the Loebner competition transcripts, it's pretty clear that bots can talk about facts all day, but it is talking about thoughts that gets them into trouble.

The chat bots' poor performance suggests the need for some artificial shared-reality grounding, some ability to improvise responses, and that statistically derived statements are a give-away.

Another observation is that spotting common patterns in text was probably a reasonable strategy for providing some contextual sense to the early bots.

For example:

human: Hi, I'm Tom. How are you?
bot: Hi Tom. I am good thanks.


can be easily manipulated to spot naive bots, for example:

human: Hi, I'm ejfweojfeiof. How are you?
bot: Hi ejfweojfeiof. I am good thanks.
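
The template trick above can be sketched in a few lines. This is a minimal, hypothetical illustration of ELIZA-style pattern matching, not any particular bot's actual code: a regex captures whatever follows "I'm" and echoes it back, with no check that the captured text is a plausible name.

```python
import re

# Hypothetical sketch of a naive template rule: capture the "name" slot
# and splice it straight into a canned response.
RULES = [
    (re.compile(r"hi, i'm (\w+)\. how are you\?", re.IGNORECASE),
     "Hi {0}. I am good thanks."),
]

def reply(utterance):
    """Return the first matching template's response, else a generic fallback."""
    for pattern, template in RULES:
        match = pattern.match(utterance)
        if match:
            return template.format(*match.groups())
    return "Tell me more."  # generic fallback when nothing matches

print(reply("Hi, I'm Tom. How are you?"))           # → Hi Tom. I am good thanks.
print(reply("Hi, I'm ejfweojfeiof. How are you?"))  # → Hi ejfweojfeiof. I am good thanks.
```

Because the rule only cares about the surface shape of the sentence, gibberish in the name slot is echoed back just as happily as a real name, which is exactly the give-away described above.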


Obviously most of the current generation of bots would not fall at the first hurdle like that, but it's easy to generate incongruent juxtapositions:

judge: Hi
ALICE: Hello there.
judge: Are you a woman or a man?
ALICE: I am female.
judge: As a man, how do you feel about football?
ALICE: Do you think I could ever be a man how do me feel about football?
(ALICE, three-time winner of the Loebner Prize)



Notes:

Obviously, if you have failed your driving test because you don't recognise a car, don't know how to start a car, and don't have arms or legs either, then there is some question as to whether there is value in paying the fee for the re-test.

However, in the driving-test analogy, the current state of affairs with Turing Test performance is that the car is started and the doors are closed, but we are randomly pressing buttons and pulling levers, hoping for the best. (OK, it's probably slightly better than that.)

Where the Turing Test differs from the driving test is that catastrophic failure is not dangerous to pedestrians.

Though Blay Whitby makes the possibly valid point that the Turing Test is a waste of time and a distraction, I like it because it illustrates how very, very interesting and complex human beings are. It makes me slightly proud to be human.