Vicky Pollard and the NIH Slogan

I love this. An article in Great Britain, or as Our Leader would say, "a Great British article," recently asserted:
Britain's teenagers risk becoming a nation of "Vicky Pollards" held back by poor verbal skills, research suggests.

And like the Little Britain character the top 20 words used, including yeah, no, but and like, account for around a third of all words, the study says.
(BBC, "UK's Vicky Pollards 'left behind'")

Vicky Pollard, it turns out, is a character on a British TV show, an obnoxious glue-sniffing juvenile delinquent. Her stammering speech seems to be the British equivalent of Valley-girl talk, maybe worse.

The BBC story had a few other nuggets.
Employers often complained that new employees were unable to answer the telephone in the formal way required of them for work and that they were also intimidated by speaking formally in meetings, the professor added.

He put this down to a lack of training and the overuse of technologies such as computer games and MP3 players.

"This trend, known as technology isolation syndrome, could lead to problems in the classroom and then later in life.

"Employers are already complaining that first jobbers are lacking basic verbal communication and it seems things could be set to get worse.

"Kids need to get talking and develop their vocabulary."

Technology isolation syndrome. Good one, that. Watch for that to catch on over here soon.

OK, so this is one of those things that sounds so plausible, you figure it's just obviously true.

But this morning I was reading Language Log, which is one of my favorite websites, and I came across a kind of replication of the analysis, with a twist. This time the subject of the analysis was not teenagers, but the BBC article that reported the story in the first place.
I took the entire text of the actual BBC article reporting this news of verbal poverty, ... computed the top 20 most frequent words in it, and worked out what percentage of the total it was. The answer is between 36 and 40 percent. (The difference depends on how much you collapse different word forms together into lexemes. Collapsing genitives and plurals with non-genitive singulars makes hardly any difference to the results, but treating is, are, was, and were as different words rather than as representatives of the verb be lowers the figure slightly. If you do the collapsing, the top 20 words make up over 39.5% of the text. If you don't, the top 20 account for just over 36%.)

So this is the situation. This staggeringly stupid news report states that Britain's teenagers are "held back by poor verbal skills" because the evidence shows that the top 20 words in their speech account for 33% of all the words they use — the implication being that they aren't using enough words, they're just repeating a few words like "yeah" and "no" and "but" and "like". But in the staggeringly stupid article itself, the top 20 words account for substantially more than that. So Britain's science writers (at least at the BBC) are even more verbally retarded.
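
The count described in that passage is easy to try on any text. Here is a minimal Python sketch of it. The function name top_n_share and the tokenizing rule are assumptions made for illustration, not what Language Log actually ran. It also does no collapsing of word forms into lexemes, so it corresponds to the "don't collapse" figure in the quote rather than the higher one.

```python
import re
from collections import Counter


def top_n_share(text, n=20):
    """Return (share, top) where share is the fraction of all word tokens
    accounted for by the n most frequent words, and top lists those words
    with their counts."""
    # Assumed tokenizer: lowercase runs of letters, with an optional
    # internal apostrophe so that "don't" stays one word.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not words:
        return 0.0, []
    top = Counter(words).most_common(n)
    share = sum(count for _, count in top) / len(words)
    return share, top


if __name__ == "__main__":
    # Placeholder text; paste in any article to repeat the experiment.
    sample = "yeah but no but yeah but no but like yeah"
    share, top = top_n_share(sample, n=3)
    print(f"top 3 words cover {share:.0%} of the sample: {top}")
```

Run on a full article with n=20, this gives the kind of percentage being compared in the quote. Different tokenizing choices will move the number a little.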

There are two lessons to be learned here. The first has to do with the dependence of the Zipf distribution on sample size. But I don't think we'll get very far going on about that.

The real lesson has to do with the willingness of people to believe anything that confirms their expectations. By invoking the stereotype of "Vicky Pollard" (and you must realize, all of the UK watches the same few television channels, so they all know who she is), the author made it seem very reasonable to conclude that British teens are basically a bunch of blabbering fools.

In our controversy in Montgomery County, we have seen a kind of artificial amplification of this effect. The analysis just reported allowed a perfect sampling of all the words in the article. But in our discussions, let's say we want to confirm or deny some facts from the peer-reviewed scientific research. We could do a meta-analysis of aggregated reports to find out, say, if anal sex is indeed riskier than vaginal sex, or to find out if learning about homosexuality makes one more likely to become gay. In that case, we would take all the published papers on the topic, aggregate their results, including the p-values obtained, and using some well-established methods we could come up with an estimate of confidence for each hypothesis.
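
To make "well-established methods" concrete, here is a sketch of one of them, Fisher's method for combining p-values from independent studies. Picking Fisher's method is an assumption (the paragraph above does not name a technique), and the p-values in the example are made-up placeholders, not results from any real study.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined_p). Under the null hypothesis the
    statistic -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom, where k is the number of studies.
    """
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # For an even number of degrees of freedom (2k) the chi-square tail
    # has a closed form: P(X > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


if __name__ == "__main__":
    # Hypothetical p-values, invented purely to show the mechanics.
    made_up = [0.04, 0.20, 0.11, 0.30, 0.07]
    stat, combined = fisher_combined_p(made_up)
    print(f"chi-square statistic {stat:.2f}, combined p-value {combined:.4f}")
```

The method only makes sense when there are independent studies to feed it, which is exactly the catch.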

But we can't do that because, basically, there's no data on those questions. The CRC says anal sex is incredibly risky, but out of the other side of their mouth they warn that we just don't know if anal sex is safe. The fact is, nobody would permit research that randomly assigned subjects to the "anal" and "vaginal" groups, and told them to do nothing but that for, say, a year, and then measured which group had more STDs. There is no experimental research, only some inferential chains of reasoning that might lead one to conclude that anal intercourse is riskier.

And of course there's no evidence in the world that teaching students about sexual orientation will turn them gay.

So what happens is that we find people taking a sentence here or there and repeating it over and over again, because it seems plausible in the context of their expectations. An example is the CRC's emphasis on the sentence from an NIH report that "The highest rate of [HIV] transmission is through anal exposure." They want to use this one sentence (and another one that a Surgeon General possibly uttered in the 1980s) to begin teaching students about anal sex and how horribly dangerous it is. But the NIH report on condom effectiveness specifically says: "The workshop addressed condom effectiveness in preventing infections transmitted via penile-vaginal intercourse." The workshop wasn't about anal transmission at all; it didn't consider it and examined no data on the subject. Yet the CRC takes this one sentence out of context and throws it around as if it were a major finding of the workshop.

This is just the "Vicky Pollard" effect; the BBC's report seemed plausible, and so millions of readers went away taking it as fact that young people can barely speak the language, that schools are underemphasizing verbal skills. It seemed plausible because the TV show had people primed already to believe it. But the relationship between frequency and rank is especially well-studied in language, and this same result would pop out in almost any corpus of text. You would find this same thing, to some degree, in Moby Dick or the Bible.
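
For a rough sense of why the result pops out of almost any text, here is a small sketch under an idealized assumption: that word frequency is exactly proportional to 1/rank (the textbook form of Zipf's law) over a vocabulary of a given size. Real corpora only approximate this, and the function name zipf_top_share and the vocabulary sizes are illustrative choices.

```python
def zipf_top_share(vocab_size, top=20, exponent=1.0):
    """Share of all word tokens taken by the `top` most frequent words,
    assuming frequency is proportional to 1 / rank**exponent."""
    weights = [1.0 / rank ** exponent for rank in range(1, vocab_size + 1)]
    return sum(weights[:top]) / sum(weights)


if __name__ == "__main__":
    # Smaller samples tend to have smaller vocabularies, so the
    # top-20 share goes up as the vocabulary shrinks.
    for vocab_size in (500, 2_000, 10_000, 50_000):
        share = zipf_top_share(vocab_size)
        print(f"{vocab_size:>6} distinct words: top 20 cover {share:.1%}")
```

Under that assumption, a vocabulary of 10,000 distinct words puts the top 20 at roughly 37 percent of everything said or written, with no help from "yeah" or "like".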

In the same way, the NIH sentence takes on a vibrant life of its own among people who easily associate anal sex with homosexuality; the fact that they make that association tells you that the negative stereotype is primed and accessible. To those people, this sentence seems deeply important and true.

We don't need slogans in all of this; we need knowledge. Our students should be presented with known facts, not dark warnings without substance.


