Fort McHenry II: How to read a Poll. (No, seriously, I got help from a Math Prof. and everything!)

Thursday, August 23, 2012


It’s a truth of American Political Life that a lot of Americans just don’t trust polls, or polling for that matter.

Usually, it has to do with feelings. There’s a gut level feeling a lot of Americans have, and when a poll doesn’t match their gut they tend to discount not just the poll, but polling in general.

It’s not really fair if you think about it. Polling, like Statistics itself, is actual, provable science. Polling does work…when it’s allowed to work.

Let’s take an example: a poll released on August 21st, 2012 by the firm of Foster McCollum White Baydoun, which, according to Nate Silver of the New York Times’ respected FiveThirtyEight blog, conducts polls for Democratic candidates as well as independently.

They have Mitt Romney with a lead in Florida.

Okay. Not too weird. Mitt and the President have been flipping Florida back and forth in recent polls, though most of the latest ones give Obama the lead.

Foster, McCollum, White and Baydoun gave Mitt Romney…a 14.6 point lead.

Huh?

Excuse me??

If I have one tremendous advantage in life, it's that if I have any kind of mathematical question, I can pick up the phone and call an actual Professor of Mathematics and get my questions answered.

Of course, the fact that this Math Professor is also my Father, kinda guarantees he'll take my calls, at least every once in a while.

By way of background, Dad is a former Professor of Mathematics at the University of Maryland-College Park, former Chair of the Math Department at the University of Maryland-College Park, and currently a Visiting Professor of Mathematics at his Alma Mater, Rice University in Houston, Texas.

(Side note: Dad has a fairly impressive Mathematical resume, but he’d want me to emphasize that he’s not a Statistician.)

“Dad,” I asked. “What the @#$’s up with this poll??”

“Read me the internals,” he replied.

Now, what are the internals? “Internals” is shorthand for all the concepts we’re going to be talking about: sample size, margin of error, etc.

I cracked open the document from Foster, McCollum, White and Baydoun, and read him what I could find…

…and then, he started laughing. Out loud. On the phone.

Unfortunately, Dad tends to speak in precise Mathematical terminology, only some of which I can understand without the aid of a dictionary. Fortunately, peering further down into Nate’s column, I got a good easy-to-understand reason why Dad was laughing so hard:

Once in a great while, a poll comes along with methodology that is so implausible that it deserves some further comment. The Foster McCollum White Baydoun poll of Florida is one such survey.

The poll was weighted to a demographic estimate that predicts that just 2 percent of Florida voters will be 30 or younger. It’s a decent bet that turnout will be down some among younger voters this year, but that isn’t a realistic estimate. In 2008, according to exit polls, 15 percent of voters in Florida were between 18 and 30.

The poll also assumed that 10 percent of voters will be between the ages of 31 and 50. In 2008, the actual percentage was 36 percent, according to the exit survey.

The poll projected Latinos to be 7 percent of the turnout in Florida, against 14 percent in 2008. And it has African-American turnout at 10 percent, down from 13 percent.

If the turnout numbers look something like that in November, then Mr. Obama will lose Florida badly. He’ll also lose almost every other state; his electoral map might look a lot like Walter Mondale’s.

But the share of voters 50 and younger in Florida is not going to drop all the way from half the electorate to roughly one-tenth of it, as the poll assumed. That is far beyond the range you can get from reasonable disagreement about methods, or from sampling error. It looks like the result from a badly-designed statistical model that never got a sanity check.

See? Right at the end there, he said it plain as day. It was a badly designed poll from the outset, all but guaranteeing a result no one should believe.
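If you’d like to check Nate’s “half the electorate to roughly one-tenth” arithmetic yourself, it takes two additions. This snippet uses only the age shares quoted above; nothing else is assumed.

```python
# Age shares quoted above, as percent of Florida voters.
poll_model = {"30 or younger": 2, "31 to 50": 10}   # the poll's turnout model
exit_poll_2008 = {"18 to 30": 15, "31 to 50": 36}   # 2008 Florida exit poll

# Share of voters aged 50 or younger under each assumption.
young_share_model = sum(poll_model.values())        # 12
young_share_2008 = sum(exit_poll_2008.values())     # 51

print(f"Voters 50 or younger, poll's model:   {young_share_model}%")
print(f"Voters 50 or younger, 2008 exit poll: {young_share_2008}%")
```

About half the electorate in 2008, about one-eighth in the poll’s model.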

But what’s worse? Nate Silver still uses this data in his polling aggregator.

So does everyone else.

All of a sudden, the polling shows a closer election because of what one poll did to Florida, even though its results are more than suspect.

Unlike me, you may not have a PhD in Mathematics on speed dial. You just read the results of these polls, and react to the consequences.

But what if you could look at a poll’s so-called “internals” and decide for yourself whether it’s worth anything?

What if you could read a poll, and not have to depend on anyone on CNN or MSNBC to do it for you?

Here are a few helpful hints:

First, you need to understand how statistics work, and there’s one image I’ll come back to again and again.

I want you to imagine a bowl of Vegetable Soup, filled with tomatoes, beans, onions, what have you. It’s a big bowl, and you want to get a good taste of what it’s like. So you dip your spoon in.

Well, that spoonful of soup? That’s pretty much what a poll is. When you went in, you hopefully got a representative sampling of tomatoes, beans, onions, and what have you. Odds are, though…you didn’t. It’s a tiny spoon, and it’s a big bowl. Maybe you get more broth than beans. Maybe you get more onions than tomatoes. Mathematically this is the concept behind margin of error, another thing you see referenced in polls all the time.

Of course, if you used a bigger spoon, you’d get a better sampling of what’s in the soup, and thus a lower margin of error.
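To put numbers on the spoon analogy, here’s a minimal sketch of the textbook margin-of-error formula for a single percentage. It assumes a simple random sample, 95% confidence (z ≈ 1.96), and a 50/50 split, which gives the largest margin. The function name `margin_of_error` is just for illustration; real pollsters adjust this figure for weighting and other design choices.

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate margin of error, in percentage points, for a share p
    estimated from a simple random sample of n respondents."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

# A bigger spoon (a larger sample) gives a smaller margin of error.
for n in (100, 400, 1000, 2500):
    print(f"n = {n:>5}: +/- {margin_of_error(n):.1f} points")
```

This prints roughly ±9.8, ±4.9, ±3.1 and ±2.0 points. Notice that quadrupling the sample only halves the margin of error, because n sits under a square root.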

Which brings us to our next concept…

The first thing a Pollster does before taking a poll is create a model.

Model, you hear that all the time. What does it mean? In our Vegetable Soup scenario, it means the Pollster is going to guess in advance how many beans are in your bowl, how many tomatoes, onions, etc.

In Politics, the Pollster is going to guess in advance who’s going to show up at the polls: how many old people, how many young, how many blacks, whites, Latinos, etc. Sometimes, they use data based on previous elections. Sometimes, they’ll stretch outward and try to guess the future.

But the key word is…guess. It may be a guess backed by a lot of empirical data, but in the end, it’s still a guess.

Now, I know what you’re thinking. If the Pollster is guessing what’s in the bowl before he’s even taken a bite, what’s the point of the poll?

And thus, you discover one of the dangers of watching a lot of polls. You are dependent on the models they use, which, nine times out of ten, you’re not going to see. (And that tenth time, you may not understand without a PhD in Mathematics on speed dial).

You saw this a lot in the 2008 Election. It was a wave election, a change election, one that President Obama won, but one in which he had been trailing in a lot of early polling, especially during the primaries. Why? Because a lot of the early modeling was based on a false assumption. It was based on who showed up in 2004, not who was going to show up in 2008.

For the purposes of this election, as in the example above, if the Pollster decides a lot more old folks than young are going to show up at the polls in Florida, then guess what…President Obama is going to get swamped. Of course, guess what happens if the reverse is true?

So long story short, before you go panicking about any one poll, knowing a little about the Pollster’s model matters. If you don’t trust the model, you can’t trust the poll.
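Here’s a minimal sketch of what a turnout model does to the final number. The support rates by age group are made up for illustration (they are not from any real poll); the age shares are the ones quoted from Nate Silver’s column, with the over-50 share filled in as whatever is left over. The function name `weighted_support` is hypothetical.

```python
# HYPOTHETICAL Obama support by age group (percent), for illustration only.
support = {"18-30": 60.0, "31-50": 52.0, "over 50": 44.0}

# Two turnout models (percent of voters). Young and middle shares are the ones
# quoted above; "over 50" is simply the remainder.
models = {
    "2008 exit poll": {"18-30": 15, "31-50": 36, "over 50": 49},
    "FMWB-style model": {"18-30": 2, "31-50": 10, "over 50": 88},
}

def weighted_support(support, turnout):
    """Topline = each group's support, weighted by its assumed share of voters."""
    total = sum(turnout.values())
    return sum(support[group] * turnout[group] for group in turnout) / total

for name, turnout in models.items():
    print(f"{name}: Obama at {weighted_support(support, turnout):.1f}%")
```

Under these made-up numbers, Obama comes out at about 49.3% with the 2008 turnout and about 45.1% with the poll’s turnout. Every age group answered exactly the same way; the only thing that changed is the pollster’s guess about who shows up.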

Next, you have to look at the sample size.

A couple of weeks ago, there was a Poll that showed Mitt Romney with a healthy lead nationally. The lead made no kind of sense to me at the time (or ever). I mentioned it to my Dad, and as always, he asked me to crack open those internals.

I told my Father that this poll was conducted with a sample size of a thousand respondents, with a margin of error of plus or minus four points.

A half hour later, when Dad stopped laughing out loud, he was able to tell me that the poll was a joke.

Sample Size is just how many people answered all the pollster’s questions. The Pollster will start out calling 6,000-7,000 people. Most of them won’t be home, which is why you call 6,000-7,000 people to get 1,000-1,500 respondents.

The poll Dad and I were laughing about had a sample of 1,000 people...nationally. That’s a decent size for a State poll (the Foster, McCollum, White and Baydoun survey, for example, had 1,503 respondents, but they were only polling Florida). But this poll wasn’t covering one state; it was covering all 50. Hence Dad’s laughter. It was a frighteningly small sample to give you an impression of anything. It’d be like sampling the soup with the end of a toothpick instead of a spoon.
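There’s a second tell in those internals. With 1,000 respondents, the textbook formula (simple random sample, 95% confidence) gives about ±3.1 points, not ±4. One common reason a reported margin is larger is weighting, which statisticians summarize with a “design effect”: how much bigger the variance is than a plain random sample’s. We don’t know how this pollster got its ±4, so this sketch simply assumes that’s the reason and backs out how many “effective” respondents the poll is really worth.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    # Textbook 95% margin of error (percentage points), simple random sample.
    return 100 * z * math.sqrt(p * (1 - p) / n)

n = 1000
reported_moe = 4.0                     # as reported for the national poll
nominal_moe = margin_of_error(n)       # about 3.1 points for n = 1,000

# ASSUMPTION: the gap between reported and nominal MoE comes from weighting.
design_effect = (reported_moe / nominal_moe) ** 2
effective_n = n / design_effect

print(f"Nominal MoE for n = {n}: +/- {nominal_moe:.1f} points")
print(f"Implied design effect: {design_effect:.2f}")
print(f"Effective sample size: about {effective_n:.0f} respondents")
```

Under that assumption, the poll behaves like a simple random sample of only about 600 people.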

For something like a State Poll, a thousand respondents is a good-sized sample. For a National one, you want something a bit larger. Problem is, most Pollsters work for news organizations, and news organizations have deadlines. You can probably put a poll into the field, calling 6,000 people and getting 1,000 respondents, overnight (at most in two or three days). But of course, the quicker the turnaround, the worse the poll is going to be: the smaller the sample, and the higher the margin of error.

So remember, your average Newspaper Poll isn’t interested in the quality of the poll, they’re interested in beating the other guy with the results. Speed is all that matters, even if it ruins the poll.

How the Poll is gathered matters.

This is another area that allows Pollsters to play games with the results. Calling people on the Telephone and asking them stuff is the traditional way of putting a poll in the field. That’s all well and good, but the problem is that the kind of phones people use has changed over the last 5 years or so. Young people are more likely to be on cell phones instead of land lines. Older folks (not all, but a lot) tend to be on land lines, not cell phones.

So what do you think happens if a Pollster calls nothing but landlines?

Yeah, your sample will skew older, and affect the result.
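To see how fast that happens, here’s a toy calculation with made-up numbers: an electorate that is half younger and half older voters, where (hypothetically) 30% of younger voters and 80% of older voters can be reached on a landline. It also assumes everyone reached is equally willing to answer, which real polls can’t count on.

```python
# HYPOTHETICAL numbers for illustration only.
population = {"younger": 0.5, "older": 0.5}      # share of all voters
landline_reach = {"younger": 0.3, "older": 0.8}  # share reachable by landline

# A landline-only poll reaches each group in proportion to population * reach.
reached = {g: population[g] * landline_reach[g] for g in population}
total = sum(reached.values())
sample_mix = {g: reached[g] / total for g in reached}

for g, share in sample_mix.items():
    print(f"{g}: {population[g]:.0%} of voters, {share:.0%} of the landline sample")
```

With these numbers, older voters make up 50% of the electorate but about 73% of the landline sample. Pollsters can try to weight that back, but that puts you right back to trusting their model.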

The same thing happens with Registered Voters versus Likely Voters. Poll one group and you get one answer; poll the other and you may get a different one. Every one of these choices will affect the outcome of the poll, and these decisions are made in advance of the poll being taken.

State Polls are more important to you than National Polls.

This is something Nate Silver harps on over and over and over again…and he’s right. Why? It’s simple 9th Grade Civics.

Despite everything you may have heard, we the people do not decide the results of Presidential elections. We only indirectly decide them.

The winner is not determined by the total number of votes cast. The winner is decided by Electors, as in the Electoral College, and in almost every state all of the Electors go to whichever candidate wins that state on November 6th. It is mathematically possible for a Presidential candidate to lose the Popular Vote and still win. It has happened three times in our history, the last being a particularly nasty event in December of 2000.
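A toy example shows how that works. This sketch uses three hypothetical states with made-up vote counts and awards each state’s Electors winner-take-all, which is how almost every state does it (Maine and Nebraska, which can split theirs, are ignored here).

```python
# HYPOTHETICAL three-state country; each state's Electors all go to its winner.
states = {
    # name: (electors, votes for A, votes for B)
    "State 1": (10, 520_000, 480_000),
    "State 2": (10, 510_000, 490_000),
    "State 3": (10, 200_000, 800_000),
}

popular = {"A": 0, "B": 0}
electoral = {"A": 0, "B": 0}
for electors, votes_a, votes_b in states.values():
    popular["A"] += votes_a
    popular["B"] += votes_b
    electoral["A" if votes_a > votes_b else "B"] += electors

print("Popular vote:", popular)
print("Electoral votes:", electoral)
```

Candidate B wins the popular vote by over half a million, but Candidate A narrowly wins two states and takes the Electoral College 20 to 10. That’s why the State Polls are the ones to watch.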

In the end, the State Polls matter more because it is by winning our individual States that we choose the Electors. National Polls are interesting in a general “taking your temperature” kind of a way. But what really matters is the States.

How the questions are asked matters.

This is trickier to explain, because it’s hard to quantify. Simply put, how you phrase the questions, and in turn how you interpret the answers, matters. The simplest turn of phrase in a question can affect how the respondent answers. The most extreme example of this is so-called “Push Polling,” where the question is phrased to all but guarantee a particular answer. For example:

“Did you know Candidate X favors the kicking of puppies?”

Who’s NOT going to react negatively to that?

Questions can be asked in far more subtle ways that can pull answers one way or the other. Pollsters know this. They’ve been doing this for years, and the hucksters of the world (Rasmussen, anyone?) can potentially skew things any way they want.

The individual ups and downs of a poll don’t matter; all that matters is the trendline.

You’ve seen it yourself. If you look at a polling graph, it looks as though it’s made of broken glass, with hundreds of little jags and shards.

Polls blip up and polls blip down, and much hay (i.e., panic) is made over those individual blips.

My father calls those blips “squiggles.”

He has one bit of Political advice that doesn’t vary from year to year:

Stop looking at the squiggles.

Don’t tell me who went up a point and who went down a point day to day; that’s useless. Yet media outlets do it all the time. A poll comes out Monday, showing Obama up by 5. The pundits’ Sturm und Drang begins. What is Mitt Romney doing wrong, blah-blah-blah. 24 hours later, another poll comes out (from a different polling company with a different sample, questionnaire and margin of error, mind you) showing Obama only up by 4. All of a sudden, it’s where did that point go? Why did Obama lose a point? What has he done wrong?
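Here’s roughly how big the “squiggle” from sampling noise alone can be in that Monday/Tuesday example. This sketch assumes (hypothetically) that each poll interviewed 1,000 people, that the race is two-way with no undecideds, and that both are simple random samples. Two different pollsters with different methods would add even more noise on top of this.

```python
import math

def lead_se(lead_pts, n):
    """Standard error (in points) of one poll's lead in a two-way race with
    no undecideds, assuming a simple random sample of n respondents."""
    p = 0.5 + lead_pts / 200   # e.g. a 5-point lead means 52.5% vs 47.5%
    return 100 * 2 * math.sqrt(p * (1 - p) / n)

# Monday: Obama up 5. Tuesday (different pollster): Obama up 4.
# HYPOTHETICAL assumption: both polls interviewed 1,000 people.
monday, tuesday, n = 5.0, 4.0, 1000

# The two polls are independent, so their variances add.
se_change = math.sqrt(lead_se(monday, n) ** 2 + lead_se(tuesday, n) ** 2)
noise_band = 1.96 * se_change  # 95% range for the change from noise alone

print(f"Observed change in the lead: {monday - tuesday:.0f} point")
print(f"95% range from sampling noise alone: +/- {noise_band:.1f} points")
```

Under those assumptions, the lead could move by almost 9 points between two polls without anything real happening. A one-point drop is well inside that band.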

Are you kidding me?

This is the functional equivalent of weighing yourself every day, and freaking out over gaining 3/10ths of a pound.

Worse, it’s the equivalent of weighing yourself every day on your scale at home, then weighing yourself on some other scale, and freaking out over the difference. Please, don’t do it.
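So what does looking at the trendline instead of the squiggles mean in practice? The simplest version is a rolling average: average the last few polls and watch how that number moves. This sketch uses made-up daily leads and a window of 5 chosen arbitrarily; the function name `rolling_average` is hypothetical, and real aggregators use more sophisticated weighting than this.

```python
# HYPOTHETICAL daily poll leads (points) for one candidate; made-up numbers.
leads = [5, 3, 6, 4, 2, 5, 4, 6, 3, 5]

def rolling_average(values, window=5):
    """Average of each run of `window` consecutive values."""
    return [sum(values[i - window:i]) / window
            for i in range(window, len(values) + 1)]

print("Squiggles:", leads)
print("Trendline:", rolling_average(leads))
```

The squiggles jump around between 2 and 6 points, while the trendline stays between 4.0 and 4.6. That steadier number is the one worth watching.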

Finally, remember the main thing: only one poll matters, the one on November 6th, 2012.

We showed ’em once in 2008, and we’ll show ’em again in 2012. If our people show up at the polls, if we rush the barricades, if we vote in overwhelming numbers, we can’t lose. Period.

So, that's it. This article won’t make you an expert on polling, but it should give you a clearer understanding of how it works.

If you want to look at the sites my Dad really likes as far as the Math goes (and God help you if you do), he loves the Princeton Election Consortium and Dr. Sam Wang. Dad loves what Dr. Wang is doing with neuroscience and mathematics. Right now, Dr. Wang is predicting an 89% chance of an Obama win. Dr. Wang has refined his formula, and he’s been right for the last twenty years.

Dad also likes Nate Silver and Fivethirtyeight.com at the New York Times. The only reservation he and I share is that Nate has decided to factor the state of the economy into his model, and neither of us is sure what economic data he’s using, how much of it, or of what quality. Basically, we figure what he did in 2008 wasn’t broke, so why fix it?
