Monday, March 23, 2015

The Rise of the Robot Writers


I'm fascinated by a recent piece in the NY Times Sunday Review on a subject that I’ll call Robot Writers. These are software programs that are designed to produce text that is indistinguishable from that produced by human writers. And they’re good at it.

According to the author, Shelley Podolny, “a shocking amount of what we’re reading is created not by humans but by computer algorithms.” And this material is not just on the drawing boards or inside some R&D lab. The material is in print and being published every day. It’s a certainty that you’ve already read plenty of it and never sensed a difference.

There are several commercially available programs – with names like Quill, Wordsmith and Quakebot – that are being used in industries that must produce a lot of copy very quickly, usually based on predictably structured data. Think journalism, but also a lot of factual business and technical writing.

Consider this passage: “Tuesday was a great day for W. Roberts, as the junior pitcher threw a perfect game to carry Virginia to a 2-0 victory over George Washington at Davenport Field.” If you read this in a newspaper, I’ll bet you would not suddenly sense that it was written by an algorithm, which it was. A sports story like this is easy for software to emulate, since it takes data (the score for example) and wraps it in predictable sports language (“carry to victory” etc.).
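To make that "data wrapped in predictable language" idea concrete, here is a minimal sketch of a template-based approach in Python. It is only an illustration. I don't know how Quill, Wordsmith or Quakebot work internally, and the field names, the template and the `write_recap` function are all hypothetical, invented for this example.

```python
from dataclasses import dataclass, asdict


@dataclass
class GameResult:
    # Hypothetical structured data for one game; field names are assumptions.
    day: str
    pitcher: str
    pitcher_class: str
    feat: str
    winner: str
    loser: str
    winner_score: int
    loser_score: int
    venue: str


# The "predictable sports language": a fixed sentence with slots for the data.
TEMPLATE = (
    "{day} was a great day for {pitcher}, as the {pitcher_class} pitcher "
    "threw {feat} to carry {winner} to a {winner_score}-{loser_score} "
    "victory over {loser} at {venue}."
)


def write_recap(game: GameResult) -> str:
    """Fill the sentence template with the fields of one game record."""
    return TEMPLATE.format(**asdict(game))


game = GameResult(
    day="Tuesday",
    pitcher="W. Roberts",
    pitcher_class="junior",
    feat="a perfect game",
    winner="Virginia",
    loser="George Washington",
    winner_score=2,
    loser_score=0,
    venue="Davenport Field",
)

recap = write_recap(game)
print(recap)
```

Feed it a different box score and the same sentence comes out with new names and numbers. Real systems presumably choose among many templates and vary the wording, but the basic idea of slotting data into stock phrases is the same.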

Weather reports are another area that lends itself to this, as in “Grass pollen levels for Friday have increased from the moderate to high levels of yesterday with values of around 6 to 7 across most parts of the country.”

Wikipedia, from which that last sample was taken, calls this subject Natural Language Generation, and it has a history that goes back more than a decade. What’s new recently is that the programs have become commonplace all around us – at newspapers, magazines and publishing houses that work with structured information like annual corporate reports.

But the Robot Writers, predictably, are also becoming more capable and powerful. As they evolve, they’ll be used for more and more information that we consume daily, not just sports summaries and the weather. They’ve already produced novels and poetry. One thing I find mildly disturbing is that, for all their use in journalism right now, no publisher that I know of is putting any kind of identifying label or disclaimer on this content. The assumption appears to have been made that content is content, no matter who (or what) produced it.

This trend presents several ethical issues and dilemmas. First, hard-working and talented reporters and writers are threatened by this, since clearly the economics will favor software over human beings. Of course, we’ve seen this before, as I don’t need to tell you. But the growing encroachment of technology into the creative arts is something we should think twice about.

Second, is it right to replace the empathy and insights of a human interpreter of the news or a story with the mindless logic of a computer? I know that sounds like a conversation from Star Trek, but the fact is that we get our world view from so many sources that are based on human communication, and I for one believe there is a loss if that interpretation is removed. A reporter’s empathy in telling the story means something, and it is missed when it’s gone. Of course, you may say that the removal of potential human prejudice and hidden agendas may be a good thing, and that view has some merit. But agendas and prejudices can also be unconsciously (or consciously) programmed into the software by the companies that produce it. And I still believe that human-to-human communication is always going to be better than any other kind.

We’re already talking to our machines today, with Siri and other robot helpers; they are becoming more numerous, not less, and as we become more comfortable with them, we’ll think less and less about what’s missing. I foresee a day soon when we turn on the evening news and the broadcaster is not a human but a computer-generated representation of one. Perhaps we can each select the avatar of our choice. Will we have lost something every time that one side of a formerly human encounter is replaced with a computer? I think so.

As Podolny said in the Times, “The next evolutionary step always seems logical… We rarely step back to reflect on whether, ultimately, we’re giving up more than we’re getting.”

And by the way, in case you're wondering, this blog is still being produced by a human. If that changes, I'll let you know.

----------------------------------------------------------------------------------------------
The Times also produced an online quiz, asking you to distinguish human from computer text. Try it and see how you do.


2 comments:

David D said...

Formulaic writing has been part of our culture for years. Jackie Collins and the Hardy Boys and Zane Grey and any pulp or romance fiction you can think of were mostly a smart programmer's construction and recycling of descriptions and plots. Cheap porn is particularly notorious for just providing pre-programmed plot material in between the pre-programmed hot stuff. Anyone who knows baseball or grew up with TV weathermen can look at a box score or a map with an occluded front and read what's there. Why not let computers spit out the daily, workmanlike doses of informative prose to tell the simple tales, and leave the important work to the artists? Anyone who wants to read the real deal will go to stories such as John Updike's "Hub Fans Bid Kid Adieu." No computer wrote that.

Nick P. said...

Your premise is that human to human communication is always better than machine to human communication. In certain areas this is true but I don't believe it's always true. As a daily reader of a printed newspaper (which I realize gives my age away, making me from roughly the Paleozoic era), I have read many articles that are very poorly written. Not only do they sometimes report data incorrectly, but they often offer no insight beyond the surface information. I don't know if machines can do that yet (I suspect they can), but give it some time and I'm pretty certain they will. Ultimately, you have to make a judgment on which way is best but it's probably a little too soon to do that yet.