Hello, Google Duplex? No Artificially Intelligent Calls, Please
By Geoffrey K. Pullum, May 13, 2018
Google’s AI lab has just announced an automated conversation system that appears to take us a giant leap toward the performance of HAL, the computer that so disastrously set its own policy on a voyage to Jupiter in the famously chilling scene in 2001: A Space Odyssey. Forgive me for being doubtful about both the system’s capabilities and its promise.
You can read a nontechnical account of what Google claims to have done in a post on the Google AI Blog, complete with audio snippets of the system in operation. The claim is that the system, called Google Duplex, is capable of making unsupervised calls to real businesses staffed by human beings, and can successfully make various kinds of appointments that largely depend on customizable scripts — booking a haircut or reserving a restaurant table, for example. The post even includes a photo of the two chief engineers on the project enjoying a meal at a table that the robot phone call booked for them. (Duplex is so Californian that it actually says “awesome” when the booking is confirmed.)
The synthesized voices do sound uncannily natural, despite being generated with glued-together speech snippets, like all contemporary speech synthesis. One notable innovation is that Duplex, trained on real phone conversations, deliberately inserts pause-fillers like hmm and uh at appropriate places. It is an interesting fact about natural human speech that it essentially always includes pause-fillers, hesitations, repetitions, and repairs. Speech with no disfluencies (like HAL’s) sounds eerie, unlike an ordinary person. We subconsciously expect hitches and flubs and fillers like umm now and then. (Nick Enfield’s interesting recent book, How We Talk: The Inner Workings of Conversation, is largely concerned with this side of conversational speech.)
One danger that I see with Duplex’s reproduction of naturalness is that people may think there is intelligence back there, behind the natural-sounding synthesized voice. There isn’t. “At the core of Duplex,” the blog post tells us, “is a recurrent neural network (RNN).” The Wikipedia article on RNNs discusses dozens of types of the things, but will be mostly unintelligible to anyone outside computer science and artificial intelligence. I once asked Mark Liberman to provide an elementary introduction to RNNs for the general public, and the piece he wrote in response, on Language Log, might be the best popular introduction.
Suffice it to say that RNNs can do amazingly sophisticated pattern reproduction, and play a major role in the development of the guessing game that underlies contemporary speech recognition. But anyone expecting actual knowledge, common sense, understanding, syntax, meaning, or coherence from RNNs is in for a crashing disappointment. Their elaborate statistical attempts to reproduce patterns found in training material are based purely on computation and recomputation of probabilities of sequences, but without the faintest understanding of what those sequences represent. An RNN can be trained to produce text reminiscent of Shakespeare, but what you get is stuff like this:
Second Lord: They would be ruled after this chamber, and my fair nues begun out of the fact, to be conveyed, Whose noble souls I’ll have the heart of the wars.
Clown: Come, sir, I will make did behold your worship.
VIOLA: I’ll drink it.
Sorta Shakespeare-ish, in a hard-to-define way; closer than monkeys banging on a typewriter. But monkeys are far more aware of what’s going on.
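The principle can be seen in miniature in a few lines of Python. This is a minimal sketch, not an RNN and certainly not anything taken from Duplex: it is a far cruder character-level Markov chain, and the names train, generate, and SAMPLE, along with the tiny training text, are invented here purely for illustration. It counts which character tends to follow each three-character context and then samples from those counts, producing plausible-looking strings with no notion of what any of them mean.

```python
import random
from collections import Counter, defaultdict


def train(text, order=3):
    """Count which character follows each `order`-character context."""
    counts = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        counts[context][text[i + order]] += 1
    return counts


def generate(counts, seed, length=80, rng=None):
    """Extend `seed` one character at a time by sampling from the counts.

    The seed length is assumed to equal the order used in training.
    """
    rng = rng or random.Random(0)
    order = len(seed)
    out = seed
    for _ in range(length):
        followers = counts.get(out[-order:])
        if not followers:  # context never seen with a follower: stop
            break
        chars = list(followers)
        weights = [followers[c] for c in chars]
        out += rng.choices(chars, weights=weights)[0]
    return out


# Tiny hypothetical training text; a real system would use vastly more.
SAMPLE = (
    "come, sir, i will make your worship welcome. "
    "come, my lord, i will drink it. "
    "sir, my lord will have the heart of the wars. "
)

model = train(SAMPLE, order=3)
print(generate(model, seed="com", length=60))
```

Every four-character stretch of the output is guaranteed to occur somewhere in the training text, which is why the result looks locally sensible; nothing in the program constrains the whole to mean anything. An RNN conditions on far more context through a learned internal state, but the output is likewise driven by estimated sequence probabilities.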
When RNNs go rogue, they go fantastically rogue: Read some of the astonishing results reported on Language Log under the (RNN-generated) category heading Elephant Semifics.
I’m profoundly skeptical about whether Google Duplex phone calls have more of a future than, say, disco or cold fusion. And I note a sensible comment by Lauren Weinstein addressing an additional worry concerning the possible dishonest use of Google’s new technology:
[T]he use of embedded “uh”s and other artifacts to try fool the listener into believing that they are speaking to a human may well engender blowback as these systems are deployed. My sense is that humans in general don’t mind talking to machines so long as they know that they’re doing so. I anticipate significant negative reactions by many persons who ultimately discover that they’ve been essentially conned into thinking they’re talking to a human, when they actually were not. It’s basic human nature — an area where Google seems to have a continuing blind spot. Another problem of course is whether this technology will ultimately be leveraged by robocallers (criminal or not) to make all of our lives even more miserable …
Indeed. Last Saturday, I had my landline phone disconnected forever, because it receives nothing but cold calls from telemarketers and scammers, which I hate. I’d hate them even more if, instead of providing employment for workers in New Delhi, they were made by an app based on Google Duplex.