Computer-generated writing that looks real; real writing that looks computer-generated

https://statmodeling.stat.columbia.edu/2020/03/11/computer-generated-writing-that-looks-real-real-writing-that-looks-computer-generated/

You know that thing where, if you stare at a word for long enough, it starts to just look weird? The letters start to separate from each other, and you become hyper-aware of the arbitrariness of associating a concept with some specific combination of sounds? There’s gotta be a word for this.

Anyway, I was reminded of that dissociation or uncanny valley after reading a passage from an article by John Seabrook on computer programs that can write text: kinda like that autocomplete thing on your phone, but instead of just suggesting one word or phrase at a time, they will write whole sentences or paragraphs.

The article is fun to read, and I have some further thoughts on it relating to statistical workflow, but for now I just wanted to point out this passage recounting how Seabrook pushed a button to train a language-learning bot on some writings of the linguist Steven Pinker and then asked the bot to complete an email that Pinker had started. Here’s Seabrook:

I [Seabrook] put some of his reply into the generator window, clicked the mandala, added synthetic Pinker prose to the real thing, and asked people to guess where the author of “The Language Instinct” stopped and the machine took over.

Being amnesic for how it began a phrase or sentence, it won’t consistently complete it with the necessary agreement and concord—to say nothing of semantic coherence. And this reveals the second problem: real language does not consist of a running monologue that sounds sort of like English. It’s a way of expressing ideas, a mapping from meaning to sound or text. To put it crudely, speaking or writing is a box whose input is a meaning plus a communicative intent, and whose output is a string of words; comprehension is a box with the opposite information flow. What is essentially wrong with this perspective is that it assumes that meaning and intent are inextricably linked. Their separation, the learning scientist Phil Zuckerman has argued, is an illusion that we have built into our brains, a false sense of coherence.

You can click through to find out where the human ended and the computer started in that passage. What’s interesting to me is that, when I read the paragraph expecting some parts of it to be computer-generated, it all looked computer-generated! Right from the start: “Being amnesic for how it began a phrase or sentence, it won’t consistently complete it with the necessary agreement…”: this looks like word salad (or, I guess I should say, phrase salad) already.

It’s a Blade Runner thing: once you get it into your head that your friend might be a replicant, you just start looking at the person differently. The existence of stochastic writing algorithms makes us aware of the stochastic algorithmity of what we read and write. It makes us think of the algorithms that each of us uses to express our thoughts.

This shouldn’t be news to me. My own experience as a writer has already made me a more aware reader (I can see the seams where the writing has been put together, as it were), so it’s no surprise that thinking about algorithms sharpens that awareness even more.

P.S. I’ve been critical of Pinker on this blog from time to time, so let me clarify here that (a) Seabrook’s example of completing Pinker’s paragraph is amusing, and (b) I’m pretty sure the same thing would happen if someone were to run one of my paragraphs through a machine learner.

First, my writing (like Pinker’s) is patterned: I have a style, and there are some things I like to write about. Hence it’s no surprise that it should be possible to write something that looks like what I write—at least for short stretches.

Second, if you stare at just about any sentence I write, it will start to fall apart. Each sentence works as part of the larger story. Trying to make too much sense from any one sentence is like staring at a single brushstroke in a painting. Ultimately, it’s just a collection of words, or some paint, and it can’t live for long after being plucked from its contextual tree.

OK, I did that last bit on purpose, wandering off and writing somewhat incoherently. The point is that the phrase “Being amnesic for how it began a phrase or sentence, it won’t consistently complete it with the necessary agreement…” is, to be fair, no more gobbledygooky than lots of sentences that I write.

My point here is not to criticize Pinker at all but rather just to use Seabrook’s particular example to demonstrate the uncanny valley that arises when we consider the possibility that a sentence isn’t real. I guess it’s similar to what’s happening with fake photos, fake videos, fake news stories, and so forth, which all make us more aware of the fakeyness of the real.